Migrate From Tumblr
I recently set up this new blog for myself. In the past I’ve gone through several blogging platforms. LiveJournal was probably my first, followed by Blogger, an experimental weekend jaunt on WordPress, and then Tumblr, where I last wrote in 2014. I really liked some of the content I had on Tumblr, and they had a way to export all of my posts, so I exported them to JSON following their API guide.
This site is built with Hugo, which uses Markdown files as posts; the build process generates static HTML files that I just rsync up to my server running nginx.
Migrating the posts themselves
To convert that JSON export I got from Tumblr into Hugo posts, I wrote a quick bash script.
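This script takes a file, oldblog.json, that’s in the current directory and generates a Markdown file for each post. It requires bash 4.x (so if you’re on macOS, brew install bash and make sure your PATH uses that one) and jq. The date conversion also relies on GNU date options (-d and --iso-8601), so the stock macOS date will likely not work there either.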
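A preflight check can catch a too-old bash or a missing jq before any files are written. The following is only a sketch of that idea: the helper names bash_is_new_enough and missing_tools are hypothetical and are not part of the migration script.

```bash
#!/usr/bin/env bash
# Preflight sketch: check the interpreter and tools the migration script relies on.
# Both function names are made up for this illustration.

# Succeeds when the given major version (default: the running bash) is at least 4.
# mapfile, which the script uses, was added in bash 4.0.
bash_is_new_enough() {
  local major="${1:-${BASH_VERSINFO[0]}}"
  (( major >= 4 ))
}

# Prints each listed command that is not on PATH; returns non-zero if any is missing.
missing_tools() {
  local tool rc=0
  for tool in "$@"; do
    if ! command -v "$tool" >/dev/null 2>&1; then
      echo "$tool"
      rc=1
    fi
  done
  return "$rc"
}

bash_is_new_enough || echo "bash 4+ required, found $BASH_VERSION" >&2
missing_tools jq date >&2 || echo "install the tools listed above" >&2
```

Running this with the same bash that will run the migration script shows whether PATH is picking up the Homebrew bash rather than the system one.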
#!/usr/bin/env bash
set -euo pipefail

# output alias URLs as json
get_urls() {
  local id="$1"
  query_with_id "$id" '[.url, .["url-with-slug"]]'
}

# given a post ID, run a jq query against it to read field(s)
query_with_id() {
  local id="$1"; shift
  jq --arg id "$id" '.posts[] | select(.id == $id)' "$file" \
    | jq "$@"
}

file="oldblog.json"

# get all of the post IDs from the json file
ids="$( jq --raw-output '.posts[].id' "$file")"
mapfile -t ids <<< "$ids"
echo "got ${#ids[@]} ids"

# iterate over each post and generate the file
for id in "${ids[@]}"; do
  aliases="$( get_urls "$id" | sed -E 's|https://blog.spike.cx||' )"
  title="$( query_with_id "$id" '.["regular-title"]' )"
  body="$( query_with_id "$id" --raw-output '.["regular-body"]' )"
  slug="$( query_with_id "$id" --raw-output '.slug' )"
  date="$( query_with_id "$id" --raw-output '.["date-gmt"]' )"
  # convert that date into ISO-8601 format
  date="$( date -d "$date" --iso-8601=seconds )"
  outfile="./content/posts/${slug}.md"
  # the heredoc body and its END marker must stay unindented
  cat <<END > "$outfile"
+++
title = $title
draft = true
date = "$date"
aliases = $aliases
+++
> Note: this post was migrated from my old Tumblr-backed blog
$body
END
done
I did wind up having to massage some of the files. Although I used Markdown for most posts, one was HTML and none of them specified the language for any blocks of code.
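Finding the code blocks that lack a language is easier with a small helper than by eye. The sketch below assumes the migrated posts use triple-backtick fences; the function name find_unlabeled_fences is hypothetical.

```bash
#!/usr/bin/env bash
# Sketch: list "file:line" for every opening ``` fence that has no language tag.
# Assumes fenced code blocks; indented code blocks are not detected.
find_unlabeled_fences() {
  awk '
    FNR == 1 { in_block = 0 }            # reset the state at the start of each file
    /^```/ {
      if (!in_block) {
        # an opening fence followed only by whitespace has no language
        if ($0 ~ /^```[[:space:]]*$/) print FILENAME ":" FNR
        in_block = 1
      } else {
        in_block = 0                     # this fence closes the current block
      }
    }
  ' "$@"
}

# Usage: find_unlabeled_fences content/posts/*.md
```

Each reported line is a fence to edit by hand, since picking the right language still takes a look at the code.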
Redirecting from the old post URLs
Initially, I had planned on letting Hugo handle the redirects from the old post URLs to the new ones, since Tumblr uses a slightly different URL structure. Because I was using the same domain, this seemed like it should work pretty well. But Hugo does its redirects client-side, and I wanted search engines to pick up the change, so I put the redirects in my nginx config instead.
Tumblr uses a structure like /post/<post-id>/<slug> (with the <slug> being optional), whereas this site uses /posts/<slug>, so I decided to match any URL starting with /post/<post-id> and redirect it.
To do this, I used the following one-liner:
jq --raw-output '.posts[] | "rewrite ^\( .url | gsub("https://blog.spike.cx"; "")) /posts/\(.slug) permanent;"' oldblog.json
I use the url field in the post, remove the scheme and domain from it, and output an nginx rewrite directive.
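For a single post, the same transformation can be sketched in plain bash, which makes each step easier to see. The helper name rewrite_line is hypothetical, and the sample URL and slug are taken from one of the migrated posts. Unlike jq's gsub, the parameter expansion here strips the domain only when it is a leading prefix.

```bash
#!/usr/bin/env bash
# Sketch: build one nginx rewrite directive from a post's url and slug.
rewrite_line() {
  local url="$1" slug="$2"
  # drop the scheme and domain, leaving only the path (e.g. /post/<post-id>)
  local path="${url#https://blog.spike.cx}"
  printf 'rewrite ^%s /posts/%s permanent;\n' "$path" "$slug"
}

rewrite_line 'https://blog.spike.cx/post/60548255435' 'testing-bash-scripts-with-bats'
# prints: rewrite ^/post/60548255435 /posts/testing-bash-scripts-with-bats permanent;
```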
So now when a client requests an old post’s URL, they get a 301 permanent redirect to the updated post location. The one-liner results in a series of lines that I pasted into the nginx config:
rewrite ^/post/74130342713 /posts/experiments-with-elixir-and-dynamo-with-otp permanent;
rewrite ^/post/60548255435 /posts/testing-bash-scripts-with-bats permanent;
rewrite ^/post/59389300605 /posts/new-things-learned-from-minecraft-docker-project permanent;
This shows it in action:
$ curl -XGET -I 'https://blog.spike.cx/post/60548255435/'
HTTP/1.1 301 Moved Permanently
Server: nginx/1.10.3 (Ubuntu)
Date: Mon, 26 Feb 2024 05:11:28 GMT
Content-Type: text/html
Content-Length: 194
Location: https://blog.spike.cx/posts/testing-bash-scripts-with-bats/
Connection: keep-alive
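To check many old URLs without reading full header dumps, the response can be reduced to its status code and Location. This is a sketch rather than something from the original setup; summarize_redirect is a hypothetical helper that reads headers such as those above on stdin.

```bash
#!/usr/bin/env bash
# Sketch: reduce a block of HTTP response headers to "<status> <location>".
summarize_redirect() {
  awk '
    { sub(/\r$/, "") }                              # headers arrive with CRLF line endings
    NR == 1 { status = $2 }                         # status line, e.g. "HTTP/1.1 301 Moved Permanently"
    tolower($1) == "location:" { location = $2 }    # header names are case-insensitive
    END { print status, location }
  '
}

# Usage: curl -sI 'https://blog.spike.cx/post/60548255435/' | summarize_redirect
```

Fed the headers shown above, it prints 301 followed by the new /posts/ URL, so anything other than a 301 stands out when looping over the old post IDs.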