Rambling Labs Blog Ramblings on software development

  • Migrating your blog posts to Markdown with Upmark and Nokogiri

    As I said in my last post, for our new site, we changed our blog engine from WordPress to the Postmarkdown gem. At the end of that post, I mentioned that we had to migrate the old posts from WordPress to Markdown.

    To do this, we built a Ruby script using the Upmark and Nokogiri gems. Nokogiri parses HTML and XML, among other things, while Upmark generates Markdown from HTML.

    First, we exported our old blog posts from WordPress to an XML file that looks like this:

    <?xml version="1.0" encoding="UTF-8" ?>
    <!-- This is a WordPress eXtended RSS file generated by WordPress as an export of your site. -->
    <!-- ... -->
    <rss version="2.0">
      <!-- Namespace declarations on the rss element (xmlns:content, xmlns:wp, ...) omitted -->
      <channel>
        <title>Rambling Labs</title>
        <pubDate>Fri, 23 Dec 2011 18:49:41 +0000</pubDate>
        <!-- ... -->
        <!-- Several items in the following format -->
        <item>
          <title>The Name of the post</title>
          <pubDate>Mon, 05 Dec 2011 19:30:17 +0000</pubDate>
          <guid isPermaLink="false">http://www.ramblinglabs.com/?p=8</guid>
          <content:encoded><![CDATA[<!-- A lot of HTML -->]]></content:encoded>
          <!-- ... -->
        </item>
        <!-- ... -->
      </channel>
    </rss>

    Then, in the script, we read the items with Nokogiri:

    require "nokogiri"
    require "upmark"
    File.open("export.xml") do |file|
      items = Nokogiri::XML(file).xpath("//channel//item")

    After that, we convert the HTML to Markdown with Upmark:

      # ...
      items.each do |item|
        content = Upmark.convert(item.at_xpath("content:encoded").text)
      # ...

    Finally, we write the files (in app/posts for Postmarkdown) with these lines, also inside the loop:

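        # Time.parse needs `require "time"` at the top of the script.
        date_str = item.at_xpath("wp:post_date_gmt").text + " +0000"
        name = item.at_xpath("wp:post_name").text.strip
        date = Time.parse(date_str).utc
        # Format only the date, so a "%" in the post name is not read as a strftime directive.
        filename = "#{date.strftime('%Y-%m-%d-%H%M%S')}-#{name}.markdown"
        path = File.join("app/posts", filename)
        File.open(path, "w") do |f|
          f.puts content
        end
      end
    end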
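
    The naming step can also be tried on its own. The sketch below wraps it in a hypothetical helper, post_filename, which is not part of our script; the date and post name are made-up sample values. It keeps the post name out of the strftime format string, so a name containing "%" (for example a percent-encoded slug) is copied through unchanged.

```ruby
require "time" # provides Time.parse

# Hypothetical helper (not in the original script): builds the Postmarkdown
# filename from the raw wp:post_date_gmt and wp:post_name strings.
def post_filename(post_date_gmt, post_name)
  # WordPress exports post_date_gmt without an offset, so mark it as UTC.
  date = Time.parse("#{post_date_gmt} +0000").utc
  "#{date.strftime('%Y-%m-%d-%H%M%S')}-#{post_name.strip}.markdown"
end

puts post_filename("2011-12-05 19:30:17", " the-name-of-the-post ")
# prints 2011-12-05-193017-the-name-of-the-post.markdown
```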

    Pretty cool, huh?!
