Validating XML documents with PUBLIC identifiers and catalogs
And indenting them, and changing their encoding...
To check the validity of XML files, I’ve used the stdinparse utility that comes with Xerces C for years, but no more. While creating some DITA files, I wanted to validate them using the document’s PUBLIC identifier and not its SYSTEM identifier. (I didn’t use PUBLIC identifiers much in the days between SGML and DITA. They’re useful for DITA work because the DITA Open Toolkit automates the assembly of multiple pieces, and sharing pieces in multiple places is easier with PUBLIC declarations, especially if you’re assembling a system that will run on a machine other than your own.)
I did some searches, and it turned out that I’d put the perfect utility on my Windows machine’s hard disk years ago: xmllint, which is part of libxml2. It also appears to be included in some Linux distributions, or is only an apt-get away. It’s written in C, so it’s fast, and binaries are easy to find for Windows and Linux.
Once you set the SGML_CATALOG_FILES environment variable to point to your catalog, the -catalogs switch tells xmllint to use that catalog. For example:
set SGML_CATALOG_FILES=c:/usr/local/DITA-OT1.4.1/catalog-dita.xml
xmllint -noout -valid -catalogs myditafile.xml
The -noout switch tells xmllint not to output the document itself, -valid tells it to validate the document against the DTD named in its DOCTYPE declaration, and -catalogs tells it to use the catalog defined in SGML_CATALOG_FILES.
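The same setup works on Linux or macOS, with export in place of set. Here is a minimal, self-contained sketch of the whole round trip, assuming a POSIX shell and an xmllint built with catalog support. The PUBLIC identifier -//EXAMPLE//DTD Note//EN, the files note.dtd, catalog.xml and note.xml, and the deliberately unreachable SYSTEM URL are all hypothetical stand-ins for the DITA catalog and DTDs. The sketch uses the double-dash spellings from xmllint's man page and adds --nonet, so the document can only validate if xmllint finds the DTD through the catalog's PUBLIC entry.

```shell
#!/bin/sh
# Minimal sketch: validate a document through its PUBLIC identifier via a catalog.
# The PUBLIC identifier and all file names below are hypothetical examples.
set -e
work=$(mktemp -d)
cd "$work"

# A tiny DTD standing in for the real DITA DTDs.
cat > note.dtd <<'EOF'
<!ELEMENT note (to)>
<!ELEMENT to (#PCDATA)>
EOF

# An OASIS XML catalog mapping the PUBLIC identifier to the local DTD.
# The uri is resolved relative to the catalog file's own location.
cat > catalog.xml <<'EOF'
<?xml version="1.0"?>
<catalog xmlns="urn:oasis:names:tc:entity:xmlns:xml:catalog">
  <public publicId="-//EXAMPLE//DTD Note//EN" uri="note.dtd"/>
</catalog>
EOF

# The SYSTEM identifier points nowhere useful; only the PUBLIC one can work.
cat > note.xml <<'EOF'
<?xml version="1.0"?>
<!DOCTYPE note PUBLIC "-//EXAMPLE//DTD Note//EN" "http://example.invalid/note.dtd">
<note><to>Bob</to></note>
EOF

# Linux/macOS equivalent of the Windows "set" line.
export SGML_CATALOG_FILES="$work/catalog.xml"

# --nonet refuses network fetches, so success means the catalog supplied the DTD.
xmllint --noout --valid --nonet --catalogs note.xml && echo "note.xml is valid"
```

If validation succeeds, xmllint itself prints nothing and only the echo runs. Deleting the public entry from catalog.xml should make the same command fail, since with --nonet there is no other route to the DTD.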
xmllint has a lot of other nice switches. If you omit the -noout switch, there are some handy transformations you can easily perform on the document. You can indent it with -format, and -encode lets you specify a new encoding for the output, as Dave Holden pointed out when I described some simple XSLT stylesheets I once used to convert the encoding of XML documents. The -noblanks switch drops ignorable white space, -relaxng validates the document against a RELAX NG schema, -schema validates it against a W3C XML Schema, and there are dozens more switches.
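To try a few of those switches together, here is a minimal sketch, again assuming a POSIX shell. The files in.xml, doc.xsd and doc.rng are hypothetical examples made up for the demonstration, and the sketch uses the man page's double-dash spellings.

```shell
#!/bin/sh
# Minimal sketch of -format, -encode, -schema and -relaxng; file names are hypothetical.
set -e
work=$(mktemp -d)
cd "$work"

# Unindented UTF-8 input; \303\251 is the UTF-8 byte sequence for "é".
printf '<?xml version="1.0" encoding="UTF-8"?>\n<doc><p>caf\303\251</p><p>two</p></doc>\n' > in.xml

# Indent the document (two spaces per level by default).
xmllint --format in.xml > formatted.xml

# Indent and re-encode as ISO-8859-1; the XML declaration is rewritten to match.
xmllint --format --encode ISO-8859-1 in.xml > latin1.xml

# A tiny W3C XML Schema for the same document.
cat > doc.xsd <<'EOF'
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="doc">
    <xs:complexType>
      <xs:sequence>
        <xs:element name="p" type="xs:string" maxOccurs="unbounded"/>
      </xs:sequence>
    </xs:complexType>
  </xs:element>
</xs:schema>
EOF

# An equivalent RELAX NG schema (XML syntax).
cat > doc.rng <<'EOF'
<element name="doc" xmlns="http://relaxng.org/ns/structure/1.0">
  <oneOrMore>
    <element name="p"><text/></element>
  </oneOrMore>
</element>
EOF

# Validate against each schema without echoing the document;
# xmllint reports success or errors on stderr and exits nonzero on failure.
xmllint --noout --schema doc.xsd in.xml
xmllint --noout --relaxng doc.rng in.xml
```

Because the input has no DTD, neither schema run needs -valid; each switch names the schema file directly.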
I can’t believe this was sitting on my hard disk for so long without my noticing how useful it can be.
1 Comment
By Caustic Dave on January 27, 2008 11:18 AM
Oh yeah. xmllint is one of my favorite utilities. It has saved me from doom many times.
I wish there was something like it for Javascript.