The LJ Import Saga

Final Part - The Machines Take Over

This is a follow-up post to Trying to automate the LiveJournal entries

Let’s start at the beginning: the main issue I found was getting content into Plume easily. I’m sure there is a way to just dump it into the database, but I don’t know enough about that to take the risk. So I looked at a couple of CSV-to-API services I found online, either free or cheap.

The first one I came across was EasyCSV; it seemed to work but had a high failure rate for some reason, and the free tier only allowed pushing 3 rows at a time. I wasn’t sure whether this was a Plume issue, an EasyCSV issue, or a problem with my data, and I wasn’t about to pay to find out.

I remembered that a while ago I used Parabola to manage some data at work, and, well, long story short, that’s how I crashed my Plume server at 2 API requests a minute.

The problem here isn’t so much the API injection; Plume can handle that pretty well. The issue is how many servers this one is federated with: each new post triggers an influx of federation traffic, the server struggles to respond to requests, and it gets temporarily knocked offline.

I discovered this by doing 10 API calls per minute; it didn’t go well. The happy zone tends to be one request per minute: the post goes into the API, the server calls out to its federated peers and receives the return traffic, then the cycle repeats 60 seconds later.

It can mostly handle 2 requests per minute but that is a tiny bit 🌶️
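The pacing above can be sketched as a simple throttled loop. This is a minimal illustration of what Parabola is doing for me, not Plume’s real client; `post_to_plume` is a stand-in for whatever actually fires the API call.

```python
import time

def post_with_pacing(rows, post_to_plume, delay_seconds=60):
    """Send each row to the API, pausing between calls so the
    federation traffic each new post triggers can settle before
    the next request goes out."""
    for row in rows:
        post_to_plume(row)        # fire the API call for this post
        time.sleep(delay_seconds) # one request per minute is the happy zone
```

Halving `delay_seconds` to 30 gives the mostly-fine-but-spicy 2 requests per minute.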

In terms of setting up and cleaning the data, Parabola makes that so simple using a workflow and transformation points.

parabola workflow

The workflow first takes the CSV (obvs) and removes any empty posts, which in theory there shouldn’t be. Then it removes any LJ posts marked private (never public) or usemask (friends only). Next it cleans the date formats into something Plume can accept, and for posts that don’t have a subject (which is required in Plume), the LiveJournal itemid identifier is used instead (that’s why you see posts titled 105786).
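Those transformation steps look roughly like this in code. The field names (`event`, `security`, `eventtime`, `subject`, `itemid`) follow the LiveJournal export format as I understand it; treat them as assumptions, and the date format is just one common LJ export shape.

```python
from datetime import datetime

def clean_row(row):
    """Return a cleaned row dict, or None if the post should be skipped."""
    body = (row.get("event") or "").strip()
    if not body:
        return None  # drop empty posts (in theory there shouldn't be any)
    if row.get("security") in ("private", "usemask"):
        return None  # drop private and friends-only posts
    # LJ exports dates like "2003-05-14 22:10:00"; convert to ISO 8601 for Plume
    when = datetime.strptime(row["eventtime"], "%Y-%m-%d %H:%M:%S")
    row["eventtime"] = when.isoformat()
    # Plume requires a subject, so fall back to the LJ itemid
    if not row.get("subject"):
        row["subject"] = str(row["itemid"])
    return row
```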

From there some additional clean-up is done: removing <lj user="(NAME)"> tags and converting them back to plain text. The last step before the API call is a handler that avoids repost attempts if a call fails and limits the post volume if it can’t process the job in under an hour.
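The tag clean-up is a one-line regex job. A minimal sketch, assuming the tag always carries a quoted username and may or may not be self-closing:

```python
import re

# Matches <lj user="NAME"> (optionally self-closing) and keeps just NAME
LJ_USER_TAG = re.compile(r'<lj\s+user="([^"]+)"\s*/?>', re.IGNORECASE)

def strip_lj_user_tags(text):
    """Convert <lj user="NAME"> tags back to plain-text usernames."""
    return LJ_USER_TAG.sub(r'\1', text)
```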

I changed the final API call to include extra fields like the itemid (for sanity checks) and current_mood (for the lolz) in the tags of the post on Plume. I also thought it would be interesting to keep current_music in the post, which is why you see Listening to: in the post; sometimes it is empty and sometimes it has the track that the LJ client noted at the time of the original post.
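Folding those extras into the payload looks something like this. The LJ field names are from the export; the shape of the Plume payload here is an assumption for illustration, not the real API schema.

```python
def build_post(row):
    """Assemble a post payload: itemid and mood become tags,
    current_music is kept in the body as a 'Listening to:' line."""
    tags = [str(row["itemid"])]           # itemid tag, for sanity checks
    if row.get("current_mood"):
        tags.append(row["current_mood"])  # mood tag, for the lolz
    body = row.get("event", "")
    music = row.get("current_music", "")
    body += f"\n\nListening to: {music}"  # sometimes empty, sometimes a track
    return {"title": row.get("subject", ""), "content": body, "tags": tags}
```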

After 2004 I might remove that, as I didn’t always use that field, so it may not be worth keeping.

I’ve done a few sanity checks so far; I learned a few things about myself, and for the most part this seems to be running well.

So for the next month or so, I’m going to be doing some short daily runs to import content.