For a while now I’ve been wanting to write something that would take a Microsoft Word document and transform it into postable HTML.

Word has better formatting, editing, navigation, undo/redo, and spell check than any standard blogging product I’ve found.

It took me all of today (oh boy, was my XSLT rusty), but I finally got something pretty decent working. You can get it here.

Ironically, this post was not written with Word, but most things from here on out will be.

Basic Features

For kicks, I am using the O’Reilly Word document template (you can Google for “O’Reilly ORA.dot” to find a link to it), which, once you get used to it, is pretty fantastic. It has nice simple hot-keys and nice simple markup.

For my transform, for instance, I’m taking all the ORA “code blocks” from the Word document and turning each new level of indentation into a new “ul” block. This avoids the WordPress and HTML whitespace normalization issues, and also means you can control how much indentation depth you want displayed.
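The indentation idea can be sketched in a few lines. This is a Python illustration, not the actual XSLT transform: the function name, the `indent_width` and `max_depth` parameters, and the assumption that indentation is made of leading spaces are all made up for the example.

```python
from html import escape


def code_lines_to_nested_ul(lines, indent_width=4, max_depth=None):
    """Render indented code lines as nested <ul> blocks, one per indent level.

    Hypothetical helper: assumes indentation is leading spaces and that
    every `indent_width` spaces is one level.
    """
    out = []
    depth = 0  # number of <ul> elements currently open
    for line in lines:
        stripped = line.lstrip(" ")
        if not stripped:
            continue  # blank lines are skipped in this sketch
        level = (len(line) - len(stripped)) // indent_width + 1
        if max_depth is not None:
            level = min(level, max_depth)  # cap the displayed depth
        while depth < level:  # deeper: open new blocks
            out.append("<ul>")
            depth += 1
        while depth > level:  # shallower: close blocks
            out.append("</ul>")
            depth -= 1
        out.append("<li>" + escape(stripped) + "</li>")
    out.extend("</ul>" for _ in range(depth))  # close whatever is still open
    return "\n".join(out)


print(code_lines_to_nested_ul(["if x:", "    y()", "z()"]))
```

Because the indentation is carried by the nesting rather than by literal spaces, nothing is lost when the blog engine collapses whitespace, and `max_depth` stands in for the "how much depth to display" knob. Strictly valid HTML would put each inner "ul" inside an "li"; the sketch keeps the simpler one-block-per-level shape described above.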

How to Use it

Create a Word doc! Note: If you are going to use my transform directly, you will have to download the O’Reilly template and create your document based on its styles.

In Word:

  • You need to turn off merge tracking (you only ever need to do this once): under Menu:Tools/Options/Security, uncheck “Store random number to improve merge accuracy”.
  • Then, save the document as XML: Menu:File/Save As/Save As Type:“XML Document”
  • If you have a transform all ready to go, you can then select the “Transform…” button.
  • Assuming you have saved your document in .doc form, go ahead and click the “Continue” button on the dialog that pops up.
  • Exit Word

Rename the .xml file so it has an .html extension, and open that file in a browser to verify its contents.

You can now paste the contents of that file into your WordPress (etc.) blog.

Pretty neat!

Future Work

There’s still some stuff worth doing:

  • I’d like to turn the cross references into links.
  • I’d like to handle embedded clip art and other images (maybe I will save the file once as an HTML page, use that to generate standardized image files, and then in the transform just link to the corresponding filenames).
  • I’d like to learn how to group blocks of sibling elements together properly so that I can turn linear XML lists of bulleted paragraphs into groupings of ul/li elements.
  • I’d like to break the O’Reilly habit and create my own markup scheme in Word that I can use for the transform instead.
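The sibling-grouping item above is the classic "adjacent grouping" problem: consecutive bulleted paragraphs need to end up inside one shared "ul". XSLT 2.0 has xsl:for-each-group with group-adjacent for this, while XSLT 1.0 needs a more roundabout approach. As a language-neutral sketch of the logic, here it is in Python. The (style, text) tuples and the "Bullet" style name are stand-ins for illustration, not the real Word XML or the actual ORA style names.

```python
from html import escape
from itertools import groupby


def group_bullets(paragraphs, bullet_style="Bullet"):
    """Wrap each run of consecutive bullet paragraphs in a single <ul>.

    `paragraphs` is a hypothetical flat list of (style_name, text) tuples,
    in document order, standing in for the linear paragraph list in the XML.
    """
    out = []
    # groupby only merges *adjacent* items with the same key, which is
    # exactly the "block of siblings" behaviour wanted here.
    for is_bullet, run in groupby(paragraphs, key=lambda p: p[0] == bullet_style):
        if is_bullet:
            out.append("<ul>")
            out.extend("<li>" + escape(text) + "</li>" for _, text in run)
            out.append("</ul>")
        else:
            out.extend("<p>" + escape(text) + "</p>" for _, text in run)
    return "\n".join(out)


doc = [
    ("Body", "Intro"),
    ("Bullet", "one"),
    ("Bullet", "two"),
    ("Body", "Outro"),
]
print(group_bullets(doc))
```

The key point is that the grouping key is computed per paragraph and only neighbouring paragraphs with the same key are merged, so two bullet runs separated by body text become two separate lists.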

I’ll update the template as I improve it.

Closing Words

Much thanks to Miloslav Nic for the Zvon tutorials.

Much thanks to NetBeans for a nice XSLT editor. While it does once again prove Java slow, there are indeed a whole lot of nice features to the editor.

This whole effort makes me think about all the neat applications this could have in games. Primary example: turning an rtc script or conversation system document into localizable string tables for in-game subtitle display.
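To make the string table idea a little more concrete, here is a rough Python sketch. Everything about it is an assumption for illustration: the "SPEAKER: line" script format, the function name, and the "CONV_001"-style IDs are invented, and a real pipeline would start from the document XML rather than plain text lines.

```python
def build_string_table(script_lines, prefix="CONV"):
    """Split a dialogue script into a string table plus a keyed script.

    Hypothetical format: each dialogue line looks like "SPEAKER: text".
    Returns (table, keyed_script), where table maps string IDs to the
    text to localize and keyed_script lists (speaker, string_id) pairs.
    """
    table = {}
    keyed_script = []
    for line in script_lines:
        speaker, sep, text = line.partition(":")
        if not sep or not text.strip():
            continue  # not a dialogue line in this made-up format
        string_id = f"{prefix}_{len(table) + 1:03d}"  # sequential ID
        table[string_id] = text.strip()
        keyed_script.append((speaker.strip(), string_id))
    return table, keyed_script


table, keyed = build_string_table(["GUARD: Halt!", "", "HERO: Not today."])
for speaker, string_id in keyed:
    print(speaker, string_id, table[string_id])
```

The table is what would go to the translators, while the keyed script is what the game would use to look up the subtitle for each line at runtime.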