Migrating blog comments from Disqus to Staticman

In this age of controversial social media platforms, having a blog is one of the few remaining opportunities to keep ownership over your content. There are several good solutions around to publish and host one, but Jekyll and GitHub Pages are a great (and free) combination for people like myself who are happy hacking a little bit - except for not providing a comment system out of the box.

For years, I filled that gap with Disqus - a service that hosts your comments in exchange for a bit of advertising space. It was great at first, but over time ads became heavier, users were pushed towards creating accounts and abusively tracked. Moreover, hosting comments externally affects search engine indexing, and over time this all caused people to comment less and less, so I decided to bring the comments back to my blog.

A comment system isn’t a very complicated app, but it would be another database that I’d have to care for, and a departure from Jekyll’s static generation model that served me so well. The ideal solution would be to store comments in the same place I store posts: a trusty GitHub repository. Jekyll can read data files to show the comments, and all I needed was to host an app somewhere that would create those files when a new comment is written.

I almost coded that app myself, but Eduardo Bouças wrote and kindly shared Staticman, which does precisely that. Sure, I still had to host/configure it, adapt the blog to send it the comments (and read them from the repository files), and migrate the old comments from Disqus. These things combined took me a couple days, so I thought I’d share the process here.

Hosting Staticman

It’s a good idea to first familiarize oneself with how Staticman works, but the gist is that your blog’s “new comment” form sends the POST to Staticman (instead of sending to the blog itself); Staticman has a GitHub API key that allows it to add the data file containing the post data to your website. That will trigger a rebuild (in the same way that a new blog post would), and Jekyll will show the new comment.

A sequence diagram illustrating how the website (browser), the Staticman instance and the git provider (GitHub) interact: the browser POSTs a content submission to the Staticman, which grabs the config from GitHub, creates the pull request on it and sends back an OK response; once a merge happens, GitHub deploys the site with the newly submitted content

If you want to moderate the comments (like I do), it can create a pull request instead of merging the data directly. You review the pull request and merge it to approve, or discard to reject - a very familiar environment for most programmers these days. It supports other git providers such as GitLab, but I’ll focus on GitHub.

You will need to host it somewhere. It’s a lightweight, database-less Node.js app, so there are lots of options and not a lot of configuration involved. My choice is a DigitalOcean droplet (you can check my recent blog post on cost-effective hosting for details).

The official instructions are clear once you figure the moving parts. Your server will contain two RSA keys: a GitHub API key so the server can act on your behalf), and a private key that (I suppose) is used to store local secrets.

A few gotchas I ran into:

There are two configuration files: the API configuration (config.production.json) and the site configuration (staticman.xml). The first contains secrets such as API keys and should only reside on your Staticman server; the other goes on your blog’s repository, telling Staticman what to do when it receives a comment, and can be public (here is mine).
The docs currently state that the GitHub Application ID in config.production.json is githubAppId; actually, it’s gitHubAppID.
Both RSA keys were triggering a node-rsa error. In order to fix it, I changed the code (here and here).
Thanks to GitHub’s support for Let’s Encrypt, my blog runs over https (TLS), which means it cannot post data to a regular http server. My go-to solution for those cases is to run the application behind nginx, configuring it to terminate the secure connection and use certificates that Let’s Encrypt provides for free.

If you use Ansible (or are comfortable reading Ansible files), here is the playbook that installs/configures the Staticman and nginx, with Supervisor to keep it running and Certbot to keep the certificates up to date.

Creating and showing comments

At this point you should have a working Staticman server, so the next step is to add a form to your blog that sends the comment to it. The form should have the same fields that Staticman expects, and you can use JavaScript to send the data to the server and show the comment immediately after it’s created.

I based mine on a few examples I saw online, most notably this one. It uses jQuery to send the data to the server and show the comment - not my choice in 2024, but I already have legacy JQuery code on the blog anyway, so I rolled with it.

You will know it is working when a post results in a pull request on your blog’s repository like this one. Merging it will add the comment to your blog’s _data directory, and the next step is to show it in the post’s page.

Again I borrowed a lot from Avglinux’s example, fixing a couple issues with the threaded replies and adjusting to my blog’s style. I also replaced the Liquid strip_html filter with a custom one that sanitizes it instead, so I can allow some HTML tags alongside the Markdown, while still keep the blog safe from JavaScript injection, cross-site scripting attacks and the like.

This PR contains all the code mentioned above; feel free to peruse and copy any of them; possibly checking the latest versions as this post gets older.

Migrating comments from Disqus to Staticman

With this in place, all that was left to do was to migrate the comments from Disqus.

Disqus allows you to export the comments to an XML file (documented here), but in order to import them anywhere else, a conversion is needed. I found a few recipes (1, 2, 3, 4) online, but none of those worked for me, so I threw together some JavaScript code that does the job:

You can just run it, making the needed adjustments for your staticman.yml configuration (e.g., if you changed the filename structure or added other fields that you want to import or generate) and put the generated comments directory under your _data directory in your blog’s repository, like I did here.

The code documents some of the shenanigans I found (odd terminology, invalid characters, etc.). It’s worth noticing that not every bit of information needed by Staticman is available in the XML, so a few choices were made:

I kept the comment _id as its original Disqus ID (instead of generating a UUID, which would change the values at each migration run and require an extra lookup for comment replies). Doing so made the replying_to_uid field odd, but it will correctly point to the _id of the comment being replied to, and Staticman is fine with that.
createdAt is an ISO 8601 date with seconds precision, which is easy to convert to the date Staticman field (which is, by default, a Unix time), but the comment filenames are based on the timestamp in milliseconds. In order to improve uniqueness in the case of same-second comments, I filled the ms using the _id (once again keeping successive migration runs idempotent).
My blog uses Gravatar to display user pictures (if they create one on the site; a generated pattern otherwise) based on a hash of their e-mail. Unfortunately, Disqus doesn’t export users’ emails, so instead of leaving it blank (which would give all users the same pattern), I hash the Disqus username, so the same user will always have the same pattern across the site.

Conclusion

As I said before, it took me a while to figure out all these pieces, but I’m happy with how it turned out: I own the comments (which I can keep hosting if I ever switch away from Jekyll), they are indexed by search engines, and I can moderate them in a familiar environment (pull requests).

There is the burden/cost of hosting the server, but I share it with other apps, so it’s effectively free for me. I did not (yet) set up email notifications for replies on users’ comments or a spam filter, but that can be done with Mailgun and Akismet - and those services have generous free tiers.

The only caveat is that Staticman doesn’t seem to be actively maintained, despite its numerous forks/users. That is a sign of maturity, but also makes me wary of yet-undiscovered vulnerabilities. But with its minimal code (and thus attack surface) and Dependabot on my fork warning me about vulnerabilities found in its dependencies, I think it’s worth the risk. Worst come to worse, it can always be replaced by a custom solution, since the comments are not locked in a proprietary system anymore - something I’ll never give up again.

chester's blog

technology, travel, comics, books, math, web, software and random thoughts

Migrating blog comments from Disqus to Staticman

Hosting Staticman

Creating and showing comments

Migrating comments from Disqus to Staticman

Conclusion