Rambling Labs Blog Ramblings on software development

  • Migrating your blog posts to Markdown with Upmark and Nokogiri

    As I said in my last post, for our new site, we changed our blog engine from WordPress to the Postmarkdown gem. At the end of that post, I mentioned that we had to migrate the old posts from WordPress to Markdown.

    To do this, we built a Ruby script using the Upmark and Nokogiri gems. Nokogiri parses HTML and XML (among other things), while Upmark generates Markdown from a given HTML string.

    First, we exported our old blog posts from WordPress to an XML file that looks like this:

    <?xml version="1.0" encoding="UTF-8" ?>
    <!-- This is a WordPress eXtended RSS file generated by WordPress as an export of your site. -->
    <!-- ... -->
    <rss version="2.0"
         xmlns:excerpt="http://wordpress.org/export/1.1/excerpt/"
         xmlns:content="http://purl.org/rss/1.0/modules/content/"
         xmlns:wfw="http://wellformedweb.org/CommentAPI/"
         xmlns:dc="http://purl.org/dc/elements/1.1/"
         xmlns:wp="http://wordpress.org/export/1.1/"
            >
      <channel>
        <title>Rambling Labs</title>
        <link>http://www.ramblinglabs.com</link>
        <description></description>
        <pubDate>Fri, 23 Dec 2011 18:49:41 +0000</pubDate>
        <language>en</language>
        <wp:wxr_version>1.1</wp:wxr_version>
        <wp:base_site_url>http://www.ramblinglabs.com</wp:base_site_url>
        <wp:base_blog_url>http://www.ramblinglabs.com</wp:base_blog_url>
    <!-- ... -->
    <!-- Several items in the following format -->
        <item>
          <title>The Name of the post</title>
          <link>http://www.ramblinglabs.com/2012/12/the-name-of-the-post/</link>
          <pubDate>Mon, 05 Dec 2011 19:30:17 +0000</pubDate>
          <dc:creator>the_creator</dc:creator>
          <guid isPermaLink="false">http://www.ramblinglabs.com/?p=8</guid>
          <description></description>
          <content:encoded><![CDATA[<!-- A lot of HTML -->]]></content:encoded>
          <!-- ... -->
        </item>
    <!-- ... -->
        </channel>
    </rss>
    

    Then, in the script, we read the items with Nokogiri:

    require "nokogiri"

    File.open("export.xml") do |file|
      items = Nokogiri::XML(file).xpath("//channel//item")
      # ... the rest of the script goes here, while `items` is in scope
    end
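
    The namespaced lookups used in this script (content:encoded, wp:post_name) work without extra setup because Nokogiri registers the namespace prefixes declared on the root element for XPath queries. Here is a minimal sketch of that behavior against a made-up, trimmed-down export; SAMPLE_WXR and extract_posts are hypothetical names used only for illustration, not part of our script:

    ```ruby
    require "nokogiri"

    # A made-up, trimmed-down export containing a single item.
    SAMPLE_WXR = <<~XML
      <?xml version="1.0" encoding="UTF-8" ?>
      <rss version="2.0"
           xmlns:content="http://purl.org/rss/1.0/modules/content/"
           xmlns:wp="http://wordpress.org/export/1.1/">
        <channel>
          <title>Rambling Labs</title>
          <item>
            <title>The Name of the post</title>
            <wp:post_name>the-name-of-the-post</wp:post_name>
            <content:encoded><![CDATA[<p>Hello</p>]]></content:encoded>
          </item>
        </channel>
      </rss>
    XML

    # Hypothetical helper: returns one hash per <item> in the export.
    def extract_posts(xml)
      Nokogiri::XML(xml).xpath("//channel/item").map do |item|
        {
          title: item.at_xpath("title").text,
          # Prefixes declared on the root element can be used directly.
          name:  item.at_xpath("wp:post_name").text,
          # The CDATA section comes back as plain text (the post's HTML).
          html:  item.at_xpath("content:encoded").text
        }
      end
    end

    p extract_posts(SAMPLE_WXR)
    ```

    If the export declared its namespaces somewhere other than the root element, the prefixes would have to be passed to xpath explicitly as a hash.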
    

    After that, we convert each item's HTML content to Markdown with Upmark:

      # ...
      items.each do |item|
        content = Upmark.convert(item.at_xpath("content:encoded").text)
      end
      # ...
    

    Finally, we write each post to its own file (Postmarkdown reads them from app/posts) with these lines, which also go inside the loop:

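        # Time.parse needs `require "time"` at the top of the script.
        date_str = item.at_xpath("wp:post_date_gmt").text + " +0000"
        name = item.at_xpath("wp:post_name").text.strip
    
        date = Time.parse(date_str).utc
        # Append the name after formatting so a "%" in it is not read as a directive.
        filename = date.strftime("%Y-%m-%d-%H%M%S") + "-#{name}.markdown"
        path = File.join("app/posts", filename)
    
        File.open(path, 'w') do |f|
          f.puts content
        end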
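
    One thing to watch in the filename step: a post name can contain % characters (WordPress percent-encodes non-ASCII slugs), and strftime would read those as format directives if the name is part of the format string. Here is a minimal sketch of a helper that formats only the timestamp and appends the name afterwards; post_filename is a hypothetical name, not part of our script:

    ```ruby
    require "time"

    # Hypothetical helper: builds the "YYYY-MM-DD-HHMMSS-name.markdown"
    # filename from the wp:post_date_gmt and wp:post_name values.
    def post_filename(post_date_gmt, post_name)
      # post_date_gmt carries no zone, so mark it as UTC before parsing.
      date = Time.parse("#{post_date_gmt} +0000").utc
      # Format only the timestamp; the name is appended untouched.
      "#{date.strftime('%Y-%m-%d-%H%M%S')}-#{post_name.strip}.markdown"
    end

    puts post_filename("2011-12-05 19:30:17", "the-name-of-the-post")
    # prints 2011-12-05-193017-the-name-of-the-post.markdown
    ```

    Keeping the name out of the format string means the timestamp part is the only text strftime ever interprets.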
    

    Pretty cool, huh?!
