Trying to solve the *ONE* XML PITA

My favourite gripe with XML is that you can't reasonably print it. Laugh at me if you want, but the SGML toolchain (Emacs, PSGML, openjade) is still my favourite way of publishing DocBook documents for one simple reason: it delivers word processor output. Not that I like word processors, but most publishers in the biomedical field expect submitted manuscripts in M$ Word format, and offering them a FOP-formatted PDF document won't do me any good. openjade generates RTF (among other formats) for direct consumption, or you can use OpenOffice to convert the RTF to M$ Word .doc files. There used to be tools that generated RTF from XML, but they were either non-free or didn't work.

Today I gave a different approach a try: converting DocBook to OpenDocument Format. To make things a little harder, I attempted this on my office computer, which much to my regret runs Windows XP. As mentioned previously, it runs the Cygwin tools to maintain a certain level of sanity.

docbook2odf is a free (GPL) tool available from a Slovak company that initiated and maintains it. I followed the (terse) instructions on their homepage to get the tool up and running.

The first step is installing the dependencies. The tool is implemented as a Perl script and uses Sablotron to process the XML documents. That is, you need both Sablotron and the Perl interface to it, neither of which ships with Cygwin.

Sablotron is available at Ginger Alliance (Sablotron 1.0.3-sources) and builds on Cygwin out of the box. The XML::Sablotron 1.01 module is available at the same site or via CPAN. Building this module requires the following modification to Makefile.PL:

< LIBS => $libs,
> LIBS => $libs . " -liconv -lreadline -L/usr/lib/gcc/i686-pc-cygwin/3.4.4 -lgcc -lstdc++",
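
If you'd rather script that change than edit the file by hand, it can be sketched as a small shell helper. This assumes the line in Makefile.PL reads exactly `LIBS => $libs,` as in the diff above; `patch_libs` is a name I made up for this example, and the gcc library path is the one from my Cygwin install, so it will probably differ on yours.

```shell
#!/bin/sh
# Sketch only: patch_libs is a hypothetical helper, not part of XML::Sablotron.
# It rewrites the "LIBS => $libs," line in the given Makefile.PL so that the
# extra libraries from the diff above are appended.
# The original file is kept as FILE.orig.
patch_libs() {
    sed -i.orig \
        's|LIBS => \$libs,|LIBS => $libs . " -liconv -lreadline -L/usr/lib/gcc/i686-pc-cygwin/3.4.4 -lgcc -lstdc++",|' \
        "$1"
}

# Typical use from the unpacked module directory (standard MakeMaker steps):
#   patch_libs Makefile.PL
#   perl Makefile.PL && make && make test && make install
```

If the sed pattern does not match (for instance because the spacing in your copy differs), the file is left unchanged, so compare Makefile.PL with Makefile.PL.orig before building.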

You'll also need the following Perl modules, both of which are available at CPAN: File::Which (File-Which-0.05.tar.gz) and Archive::Zip (Archive-Zip-1.18). Both build and install without a hitch.
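
A quick way to confirm that the modules installed so far can actually be loaded is to ask Perl for each one in turn. The sketch below assumes `perl` is on the PATH; the function names are mine and not part of any of these packages.

```shell
#!/bin/sh
# have_perl_module NAME: succeed if "perl -MNAME" can load the module.
# (Illustrative helper; the name is made up for this example.)
have_perl_module() {
    perl -M"$1" -e 1 >/dev/null 2>&1
}

# missing_perl_modules NAME...: print each module that fails to load,
# one per line. Prints nothing if every module loads.
missing_perl_modules() {
    for module in "$@"; do
        have_perl_module "$module" || printf '%s\n' "$module"
    done
}

# Example:
#   missing_perl_modules XML::Sablotron File::Which Archive::Zip
```

An empty result means all three modules are in place; anything printed still needs to be built and installed.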

A major hurdle turned out to be PerlMagick. This Perl interface to the well-known image conversion tool requires ImageMagick 6.3.2, which I built from source. I recommend running the Cygwin setup tool again first and installing any graphics libraries that you can get hold of; you should at least have JPEG and TIFF support. The PerlMagick interface shipped with the ImageMagick sources would not build out of the box, so to save myself the hassle I configured ImageMagick like this: ./configure --without-perl. Have a cup of coffee until the build and the installation are done.
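
Once ImageMagick is installed, it is worth checking that JPEG and TIFF support actually made it into the build. A rough sketch: `convert -list format` prints the supported formats, and the hypothetical `has_format` helper below searches that listing. The exact column layout may vary between ImageMagick versions, so treat the pattern as an assumption.

```shell
#!/bin/sh
# has_format NAME: read a format listing on stdin (as printed by
# "convert -list format") and succeed if NAME appears in the first column.
# A trailing "*" after the name is allowed, since ImageMagick uses it
# as a marker on some formats.
has_format() {
    grep -Eq "^[[:space:]]*$1\\*?[[:space:]]"
}

# Example, run after the build and installation are done:
#   for fmt in JPEG TIFF; do
#       convert -list format | has_format "$fmt" || echo "no $fmt support"
#   done
```

If a format is reported missing, install the corresponding library through the Cygwin setup tool and rebuild ImageMagick.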

I had downloaded PerlMagick-6.32 separately (before I noticed that the ImageMagick sources include the module anyway), so I used that copy to try to fix the build failure. I changed Makefile.PL to read:

# Linker flags for building an executable
'LDFLAGS' => "-L/home/Administrator/ImageMagick-6.3.2/magick/.libs -lMagick $Config{'ldflags'}",

# Linker flags for building a dynamically loadable module
'LDDLFLAGS' => "-L/home/Administrator/ImageMagick-6.3.2/magick/.libs -lMagick $Config{'lddlflags'}",

# Install PerlMagick binary into ImageMagick bin directory
'INSTALLBIN' => '/usr/local/bin',

# Library specification
'LIBS' => [ '-L/home/Administrator/ImageMagick-6.3.2/magick/.libs -L/usr/lib64 -lfreetype -llcms -ltiff -lfreetype -ljpeg -lgs -lpng -lXext -lXt -lSM -lICE -lX11 -lbz2 -lrsvg-2 -lgdk_pixbuf-2.0 -lpng12 -lm -lgobject-2.0 -lgmodule-2.0 -ldl -lglib-2.0 -lxml2 -lgvc -lz -lpthread -lm -lpthread -lMagick' ],

You may have to fiddle with the library paths to make this work on your system.
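
Since the hard-coded paths above are specific to my machine, one way to find the right `-L` directory on yours is to search the ImageMagick build tree for the freshly built library. This is only a sketch: `find_magick_libdir` is a hypothetical helper, and it assumes libtool left the library in a `.libs` directory, as in the paths above.

```shell
#!/bin/sh
# find_magick_libdir SRCDIR: print the directory under SRCDIR that holds the
# built libMagick library (libtool puts it in a ".libs" subdirectory).
# Returns non-zero and prints nothing if no such library is found.
find_magick_libdir() {
    lib=$(find "$1" -type f -name 'libMagick.*' -path '*/.libs/*' 2>/dev/null | head -n 1)
    [ -n "$lib" ] && dirname "$lib"
}

# Example:
#   libdir=$(find_magick_libdir /home/Administrator/ImageMagick-6.3.2) \
#       && echo "-L$libdir -lMagick"
```

The printed flag can then replace the `-L/home/Administrator/...` entries in LDFLAGS, LDDLFLAGS and LIBS.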

Finally, after half an afternoon of downloading and compiling, I ran a test on one of my paper drafts, which I had quickly converted from SGML to XML. The structure of the document is well preserved by applying the appropriate styles, and the default layout looks pretty nice (except for the nag ad running down the right-hand side of each page). It'll need some configuration, but it is a good start.

