I recently had to convert a few Microsoft Word documents to DITA XML and thought it would be worth sharing my notes on the steps I took. To summarize, I opened each Word document with OpenOffice 3.1, saved it as a DocBook XML document, and then converted that to DITA with the XSLT stylesheet from a DITA plugin that I found. Images were a little more trouble, but at least I was able to eventually automate that part as well, dispelling my worries that I’d have to add all the image references…
The New York Times article A Cookbook of One’s Own From the Internet (registration required) describes how Kamran Mohsenin, the founder of a photography web site, took an interesting step beyond personalized calendars: personalized cookbooks using recipes from epicurious.com, a web site that has 25,000 recipes from Gourmet and Bon Appetit magazines. (I grew up with both of these magazines around the house, because my parents were big fans.) This reminds me of a quote I just read near the end of…
There are many reasons to like the Darwin Information Typing Architecture, but much of the praise for it lately seems a bit misguided. For a lot of XML products and services companies, DITA is the new bottle in which to put their old wine. They talk about how DITA is great because it lets you:
I recently asked if anyone knew of applications that pull meta[@name and @content] metadata out of HTML head elements, and I got a few interesting answers. To extract such data, writing a short XSLT stylesheet that reads the output of John Cowan’s TagSoup would be easy, but lately I’ve been thinking: with a slight change to those meta elements, they’d be RDFa, which can store more versatile metadata that is easier to get out (see Getting Those Triples).
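As a rough illustration of the kind of extraction described above, here is a minimal sketch in Python instead of the XSLT-plus-TagSoup approach the post mentions. It uses only the standard library's html.parser module. The names MetaExtractor and extract_meta are hypothetical, made up for this example, and the sketch assumes the page has an explicit head start tag.

```python
from html.parser import HTMLParser


class MetaExtractor(HTMLParser):
    """Collect (name, content) pairs from meta elements inside head.

    Hypothetical helper for illustration; not part of any library.
    """

    def __init__(self):
        super().__init__()
        self.in_head = False
        self.pairs = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names for us.
        if tag == "head":
            self.in_head = True
        elif tag == "meta" and self.in_head:
            attr = dict(attrs)
            # Roughly mirrors meta[@name and @content]:
            # keep the element only if both attributes carry a value.
            if attr.get("name") is not None and attr.get("content") is not None:
                self.pairs.append((attr["name"], attr["content"]))

    def handle_endtag(self, tag):
        if tag == "head":
            self.in_head = False


def extract_meta(html_text):
    """Return a list of (name, content) tuples found in the head."""
    parser = MetaExtractor()
    parser.feed(html_text)
    parser.close()
    return parser.pairs


if __name__ == "__main__":
    sample = (
        '<html><head><title>Example</title>'
        '<meta name="author" content="A. Writer">'
        '<meta charset="utf-8">'
        '</head><body></body></html>'
    )
    for name, content in extract_meta(sample):
        print(name, "=", content)
```

Because html.parser tolerates unclosed tags, this copes with a fair amount of sloppy markup, but a page that omits the head tag entirely would yield nothing, which is the sort of case where a cleanup step such as TagSoup earns its keep.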