
NAME

herold — Converter to transform HTML to DocBook 4.2 XML

SYNOPSIS

herold [--in=file] [--out=file] [--source-encoding=name] [--destination-encoding=name] [--no-tables] [--log-level=level] [--root-element=element] [--sysid=url]

DESCRIPTION

This program is part of the dbdoclet suite.

herold is a converter that transforms HTML to DocBook XML. It uses the transformation engine of dbdoclet, which generates DocBook from javadoc comments.

The software is written in Java, so you must have a Java 2 environment installed on your system to use it. The converter itself is developed and tested with JDK 1.4.2.

The converter tries to parse real-world HTML and transform it into valid DocBook XML. So far the main focus has been on generating valid DocBook, so you might lose some information from your HTML files.

Because the layout of HTML pages is often based on tables, the result of the transformation may be poor or even useless with respect to the content of the document. One possible remedy is the --no-tables option, which suppresses all tables from the HTML source.

OPTIONS

--in, -i <FILE> <FILE> ...

Defines one or more input HTML files. If not specified, the names of the input files are read from stdin.

--out, -o <FILE>

Sets the output file name. If not specified, the output is printed to stdout.

--source-encoding, -encoding <NAME>

Specifies the encoding of the source files, such as ISO-8859-1.

--destination-encoding, -docencoding <NAME>

Specifies the encoding of the generated DocBook XML files. Example: -docencoding ISO-8859-1

--log-level, -l <LEVEL>

Sets the logging level. Possible values are error, warn, info or debug.

--no-tables, -T

Suppresses all tables while transforming.

--root-element, -r <ROOT ELEMENT>

The root element for the resulting document.

--sysid, -s <SYSTEM ID>

The system identifier for the resulting document.

EXAMPLES

herold --in=index.html --out=Article.xml

find . -name "*.html" | herold --root-element=article --out=Article.xml

find . -name "*.html" | herold --root-element=book --out=Book.xml

herold --in=file1.html --in=file2.html --out=Article.xml

herold --source-encoding=ISO-8859-1 \
        --destination-encoding=UTF-8 --in=Article.html --out=Article.xml

herold --in=file.html \
        --sysid=file:///usr/share/dbdoclet/docbook/dtd/docbookx.dtd --no-tables \
        --out=Article.xml
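
The examples above merge all input files into a single output document. If one DocBook file per HTML page is wanted instead, a small wrapper can call herold once per file. The following is only a sketch: the function name convert_tree and the HEROLD variable are hypothetical helpers and not part of herold, and it assumes herold is on the PATH and accepts the --in, --out and --root-element options as described above.

```shell
# Sketch only: convert_tree and HEROLD are illustrative names, not part of herold.
# HEROLD lets the command be overridden, e.g. with a full path to the launcher.
HEROLD="${HEROLD:-herold}"

convert_tree() {
    # $1: directory searched for *.html files
    # $2: directory that receives one .xml file per HTML file
    src_dir="$1"
    out_dir="$2"
    mkdir -p "$out_dir" || return 1
    find "$src_dir" -name "*.html" | while IFS= read -r html; do
        base=$(basename "$html" .html)
        "$HEROLD" --in="$html" --out="$out_dir/$base.xml" --root-element=article
    done
}
```

A call such as convert_tree ./site ./docbook would then write ./docbook/index.xml for ./site/index.html. Files with the same base name in different subdirectories overwrite each other in this sketch.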

SEE ALSO

dbdoclet(1), http://www.dbdoclet.org.

AUTHOR

Michael Fuchs - michael.fuchs@unico-group.com
