In this age of controversial social media platforms, having a blog is one of the few remaining opportunities to keep ownership over your content. There are several good solutions around to publish and host one, but Jekyll and GitHub Pages are a great (and free) combination for people like myself who are happy hacking a little bit - except for not providing a comment system out of the box.
For years, I filled that gap with Disqus - a service that hosts your comments in exchange for a bit of advertising space. It was great at first, but over time ads became heavier, and users were pushed towards creating accounts and were abusively tracked. Moreover, hosting comments externally affects search engine indexing. All of this caused people to comment less and less, so I decided to bring the comments back to my blog.
A comment system isn’t a very complicated app, but it would be another database that I’d have to care for, and a departure from Jekyll’s static generation model that served me so well. The ideal solution would be to store comments in the same place I store posts: a trusty GitHub repository. Jekyll can read data files to show the comments, and all I needed was to host an app somewhere that would create those files when a new comment is written.
I almost coded that app myself, but Eduardo Bouças wrote and kindly shared Staticman, which does precisely that. Sure, I still had to host and configure it, adapt the blog to send it the comments (and read them from the repository files), and migrate the old comments from Disqus. These things combined took me a couple of days, so I thought I’d share the process here.
Hosting Staticman
It’s a good idea to first familiarize oneself with how Staticman works, but the gist is that your blog’s “new comment” form sends the POST to Staticman (instead of sending it to the blog itself); Staticman has a GitHub API key that allows it to add a data file containing the comment to your blog’s repository. That will trigger a rebuild (in the same way that a new blog post would), and Jekyll will show the new comment.
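To make that flow concrete, here is a minimal Python sketch of what Staticman conceptually does with a submitted comment: check the fields and turn them into a data file path plus content. It is not Staticman’s actual code; the _data/comments/{slug} path, the comment-{timestamp} filename and the field lists are assumptions modeled on common staticman.yml examples, and the file is written as JSON only to stay within the standard library (your configuration probably uses YAML).

```python
import json
import time
import uuid

# Assumed settings, modeled on a typical staticman.yml (adjust to yours).
COMMENTS_DIR = "_data/comments"
ALLOWED_FIELDS = {"name", "email", "url", "message", "replying_to_uid"}
REQUIRED_FIELDS = ("name", "email", "message")


def comment_to_data_file(slug, fields, now_ms=None):
    """Return (path, content) of the data file a comment would become."""
    missing = [f for f in REQUIRED_FIELDS if not fields.get(f)]
    if missing:
        raise ValueError(f"missing required fields: {missing}")
    unknown = sorted(set(fields) - ALLOWED_FIELDS)
    if unknown:
        raise ValueError(f"fields not allowed: {unknown}")
    ts_ms = now_ms if now_ms is not None else int(time.time() * 1000)
    # A random UUID identifies the comment; "date" is a Unix time in seconds.
    entry = {"_id": str(uuid.uuid4()), **fields, "date": ts_ms // 1000}
    # One file per comment, named after the millisecond timestamp.
    path = f"{COMMENTS_DIR}/{slug}/comment-{ts_ms}.json"
    return path, json.dumps(entry, indent=2)
```

In the real flow, Staticman commits such a file (or opens a pull request with it) through the GitHub API, and the site rebuild picks it up.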
If you want to moderate the comments (like I do), it can create a pull request instead of merging the data directly. You review the pull request and merge it to approve, or discard to reject - a very familiar environment for most programmers these days. It supports other git providers such as GitLab, but I’ll focus on GitHub.
You will need to host it somewhere. It’s a lightweight, database-less Node.js app, so there are lots of options and not a lot of configuration involved. My choice is a DigitalOcean droplet (you can check my recent blog post on cost-effective hosting for details).
The official instructions are clear once you figure out the moving parts. Your server will contain two RSA keys: a GitHub API key (so the server can act on your behalf) and a private key that (I suppose) is used to store local secrets.
A few gotchas I ran into:
- There are two configuration files: the API configuration (config.production.json) and the site configuration (staticman.yml). The first contains secrets such as API keys and should only reside on your Staticman server; the other goes in your blog’s repository, telling Staticman what to do when it receives a comment, and can be public (here is mine).
- The docs currently state that the GitHub Application ID in config.production.json is githubAppId; actually, it’s gitHubAppID.
- Both RSA keys were triggering a node-rsa error. In order to fix it, I changed the code (here and here).
- Thanks to GitHub’s support for Let’s Encrypt, my blog runs over https (TLS), which means its comment form cannot post data to a plain http server. My go-to solution for those cases is to run the application behind nginx, configuring it to terminate the secure connection and use certificates that Let’s Encrypt provides for free.
If you use Ansible (or are comfortable reading Ansible files), here is the playbook that installs and configures Staticman and nginx, with Supervisor to keep Staticman running and Certbot to keep the certificates up to date.
Creating and showing comments
At this point you should have a working Staticman server, so the next step is to add a form to your blog that sends the comment to it. The form should send the fields that Staticman expects (the ones allowed in your staticman.yml), and you can use JavaScript to send the data to the server and show the comment immediately after it’s created.
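Whatever library the form uses, the request boils down to a URL-encoded POST. The Python sketch below builds (without sending) that request, assuming Staticman’s v3 entry endpoint layout (/v3/entry/github/{user}/{repo}/{branch}/{property}) and the fields[...] / options[...] naming described in its docs; the host, user, repository, branch and field names are hypothetical placeholders.

```python
from urllib.parse import urlencode
from urllib.request import Request

# Hypothetical values: replace with your own Staticman host and repository.
STATICMAN_HOST = "https://staticman.example.com"
GITHUB_USER = "your-user"
REPO = "your-blog"
BRANCH = "main"
PROPERTY = "comments"  # the top-level key in staticman.yml


def build_comment_request(slug, name, email, message, replying_to=""):
    """Build (but do not send) the POST a comment form would submit."""
    url = f"{STATICMAN_HOST}/v3/entry/github/{GITHUB_USER}/{REPO}/{BRANCH}/{PROPERTY}"
    form = {
        "fields[name]": name,
        "fields[email]": email,
        "fields[message]": message,
        "fields[replying_to_uid]": replying_to,  # empty for top-level comments
        "options[slug]": slug,  # used by the path template in staticman.yml
    }
    body = urlencode(form).encode("utf-8")
    return Request(
        url,
        data=body,
        method="POST",
        headers={"Content-Type": "application/x-www-form-urlencoded"},
    )
```

Passing the result to urllib.request.urlopen would actually send it; in the browser, the jQuery code does the equivalent with the form’s fields.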
I based mine on a few examples I saw online, most notably this one. It uses jQuery to send the data to the server and show the comment - not my choice in 2024, but I already have legacy jQuery code on the blog anyway, so I rolled with it.
You will know it is working when submitting a comment results in a pull request on your blog’s repository like this one. Merging it will add the comment to your blog’s _data directory, and the next step is to show it on the post’s page.
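Showing comments with threaded replies mostly means grouping the flat list of data files by parent. A Jekyll blog would typically do this in a Liquid template; the minimal Python sketch below only illustrates the logic, assuming each comment has _id, date and an optional replying_to_uid holding its parent’s _id (empty for top-level comments).

```python
from collections import defaultdict


def thread_comments(comments):
    """Group flat comment dicts into nested (comment, replies) pairs, oldest first."""
    children = defaultdict(list)
    for c in sorted(comments, key=lambda c: c["date"]):
        # Top-level comments have no (or an empty) replying_to_uid.
        children[c.get("replying_to_uid") or ""].append(c)

    def build(parent_id):
        return [(c, build(c["_id"])) for c in children.get(parent_id, [])]

    return build("")
```

Replies whose parent is missing are silently left out, which is one of the edge cases a template also has to decide how to handle.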
Again I borrowed a lot from Avglinux’s example, fixing a couple of issues with the threaded replies and adjusting it to my blog’s style. I also replaced the Liquid strip_html filter with a custom one that sanitizes the HTML instead, so I can allow some HTML tags alongside the Markdown while still keeping the blog safe from JavaScript injection, cross-site scripting attacks and the like.
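The custom filter itself isn’t reproduced here. As an illustration of the allowlist approach it describes, here is a minimal Python sketch built on the standard library’s html.parser; the tag and attribute allowlist is a hypothetical example. Disallowed tags are dropped (keeping their text), script and style contents are discarded, all text is re-escaped, and links survive only with http(s) URLs.

```python
from html import escape
from html.parser import HTMLParser

# Hypothetical allowlist: adjust to the tags you want to permit in comments.
ALLOWED_TAGS = {"a", "b", "blockquote", "code", "em", "i", "p", "pre", "strong"}
ALLOWED_ATTRS = {"a": {"href"}}
SAFE_SCHEMES = ("http://", "https://")
DROP_CONTENT = {"script", "style"}  # discard these tags and everything inside


class CommentSanitizer(HTMLParser):
    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.out = []
        self.skip_depth = 0  # > 0 while inside <script> or <style>

    def handle_starttag(self, tag, attrs):
        if tag in DROP_CONTENT:
            self.skip_depth += 1
            return
        if tag not in ALLOWED_TAGS or self.skip_depth:
            return  # drop the tag itself; its text still goes through handle_data
        kept = []
        for name, value in attrs:
            if value is None or name not in ALLOWED_ATTRS.get(tag, set()):
                continue
            if name == "href" and not value.strip().lower().startswith(SAFE_SCHEMES):
                continue  # rejects javascript:, data: and relative URLs
            kept.append(f' {name}="{escape(value, quote=True)}"')
        self.out.append(f"<{tag}{''.join(kept)}>")

    def handle_endtag(self, tag):
        if tag in DROP_CONTENT:
            self.skip_depth = max(0, self.skip_depth - 1)
        elif tag in ALLOWED_TAGS and not self.skip_depth:
            self.out.append(f"</{tag}>")

    def handle_data(self, data):
        if not self.skip_depth:
            self.out.append(escape(data))  # text is always re-escaped


def sanitize_html(text):
    """Return text with only allowlisted tags and attributes left in place."""
    parser = CommentSanitizer()
    parser.feed(text)
    parser.close()
    return "".join(parser.out)
```

This sketch does not balance unclosed tags, so a stray <b> can still affect the surrounding layout, though it cannot run scripts.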
This PR contains all the code mentioned above; feel free to peruse and copy any of it, though you may want to check the latest versions as this post gets older.
Migrating comments from Disqus to Staticman
With this in place, all that was left to do was to migrate the comments from Disqus.
Disqus allows you to export the comments to an XML file (documented here), but importing them anywhere else requires a conversion. I found a few recipes (1, 2, 3, 4) online, but none of them worked for me, so I threw together some JavaScript code that does the job.
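The JavaScript code isn’t reproduced here. As an illustration of its first step, here is a minimal Python sketch that reads a Disqus export with xml.etree.ElementTree and flattens its posts into plain dicts. The element names and namespaces reflect the Disqus export format as I understand it (verify them against your own file), and unlike the original code it does not clean up invalid characters, so a malformed export will raise ParseError.

```python
import xml.etree.ElementTree as ET

# Namespaces used by Disqus exports (assumption: check your export's root element).
NS = {"d": "http://disqus.com", "dsq": "http://disqus.com/disqus-internals"}
DSQ_ID = "{http://disqus.com/disqus-internals}id"


def _flag(elem, name):
    """True if a boolean child element such as <isDeleted> says "true"."""
    return elem.findtext(f"d:{name}", default="false", namespaces=NS).strip() == "true"


def parse_disqus_export(xml_text):
    """Return a list of comment dicts, skipping deleted and spam posts."""
    root = ET.fromstring(xml_text)
    # Map each thread's internal id to the page URL it belongs to.
    links = {
        t.get(DSQ_ID): t.findtext("d:link", default="", namespaces=NS)
        for t in root.findall("d:thread", NS)
    }
    comments = []
    for post in root.findall("d:post", NS):
        if _flag(post, "isDeleted") or _flag(post, "isSpam"):
            continue
        thread = post.find("d:thread", NS)
        parent = post.find("d:parent", NS)
        thread_id = thread.get(DSQ_ID) if thread is not None else None
        comments.append({
            "_id": post.get(DSQ_ID),
            "link": links.get(thread_id, ""),
            "replying_to_uid": parent.get(DSQ_ID) if parent is not None else "",
            "name": post.findtext("d:author/d:name", default="", namespaces=NS),
            "username": post.findtext("d:author/d:username", default="", namespaces=NS),
            "createdAt": post.findtext("d:createdAt", default="", namespaces=NS),
            "message": post.findtext("d:message", default="", namespaces=NS),
        })
    return comments
```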
You can just run it, making the needed adjustments for your staticman.yml configuration (e.g., if you changed the filename structure or added other fields that you want to import or generate), and put the generated comments directory under the _data directory in your blog’s repository, like I did here.
The code documents some of the shenanigans I found (odd terminology, invalid characters, etc.). It’s worth noting that not every bit of information needed by Staticman is available in the XML, so a few choices were made:
- I kept the comment _id as its original Disqus ID (instead of generating a UUID, which would change the values at each migration run and require an extra lookup for comment replies). Doing so makes the replying_to_uid field name a bit misleading, but it will correctly point to the _id of the comment being replied to, and Staticman is fine with that.
- createdAt is an ISO 8601 date with second precision, which is easy to convert to Staticman’s date field (which is, by default, a Unix time), but the comment filenames are based on the timestamp in milliseconds. In order to improve uniqueness in the case of same-second comments, I filled the milliseconds using the _id (once again keeping successive migration runs idempotent).
- My blog uses Gravatar to display user pictures (their own picture if they have set one up on Gravatar; a generated pattern otherwise) based on a hash of their e-mail. Unfortunately, Disqus doesn’t export users’ e-mails, so instead of leaving it blank (which would give all users the same pattern), I hash the Disqus username, so the same user will always have the same pattern across the site.
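The three choices above can be sketched in a few lines of Python. The milliseconds scheme (the last three digits of the numeric Disqus ID) is a hypothetical stand-in for "filling the ms using the _id", and storing an md5 hash in the email field assumes a staticman.yml that applies an md5 transform to e-mails; adjust both to your configuration.

```python
import hashlib
from datetime import datetime, timezone


def derive_staticman_fields(disqus_id, created_at, username):
    """Derive _id, date, filename and avatar hash for a migrated comment.

    created_at is assumed to be a UTC ISO 8601 string with second
    precision, e.g. "2015-03-01T12:34:56Z".
    """
    dt = datetime.strptime(created_at, "%Y-%m-%dT%H:%M:%SZ").replace(tzinfo=timezone.utc)
    unix_seconds = int(dt.timestamp())
    # Hypothetical scheme: the last three digits of the Disqus ID become the
    # milliseconds, so reruns always produce the same filename for a comment.
    filename_ms = unix_seconds * 1000 + int(disqus_id) % 1000
    # Hash the Disqus username in place of the (unavailable) e-mail address.
    avatar_hash = hashlib.md5(username.strip().lower().encode("utf-8")).hexdigest()
    return {
        "_id": disqus_id,  # keep the original Disqus ID
        "date": unix_seconds,
        "filename": f"comment-{filename_ms}",
        "email": avatar_hash,
    }
```

For example, a comment with Disqus ID 1234567 created at 2015-03-01T12:34:56Z gets date 1425213296 and filename comment-1425213296567.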
Conclusion
As I said before, it took me a while to figure out all these pieces, but I’m happy with how it turned out: I own the comments (which I can keep hosting if I ever switch away from Jekyll), they are indexed by search engines, and I can moderate them in a familiar environment (pull requests).
There is the burden and cost of hosting the server, but I share it with other apps, so it’s effectively free for me. I have not (yet) set up e-mail notifications for replies to users’ comments or a spam filter, but those can be done with Mailgun and Akismet, and both services have generous free tiers.
The only caveat is that Staticman doesn’t seem to be actively maintained, despite its numerous forks and users. That is a sign of maturity, but it also makes me wary of yet-undiscovered vulnerabilities. Still, with its minimal code (and thus attack surface) and Dependabot on my fork warning me about vulnerabilities found in its dependencies, I think it’s worth the risk. If worst comes to worst, it can always be replaced by a custom solution, since the comments are not locked in a proprietary system anymore - something I’ll never give up again.