HTML-XML-utils is a set of small C programs (filters) that read HTML and XML files and transform or inspect them. They can add a table of contents, an alphabetical index, a bibliography, cross-references, or numbered headings, and they can remove, count, or pretty-print elements, among other things. When reading HTML, they assume the input is correct HTML 4.0 or close to it.
Some of the utilities included are:
asc2xml - convert from UTF-8 to &#nnn; entities
xml2asc - convert from &#nnn; entities to UTF-8
hxaddid - add IDs to selected elements
hxcite - replace bibliographic references by hyperlinks
hxcite-mkbib - expand references and create bibliography
hxclean - apply heuristics to correct an HTML file
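
To illustrate the kind of conversion the first two utilities in the list perform, here is a minimal Python sketch of translating between UTF-8 text and decimal &#nnn; character references. It is not the code of the utilities themselves (those are written in C); the function names to_numeric_entities and from_numeric_entities are hypothetical, and the choice to leave ASCII-range references such as &#60; untouched is an assumption made here so that markup characters are not accidentally introduced.

```python
import re


def to_numeric_entities(text: str) -> str:
    """Replace every non-ASCII character with a decimal &#nnn; reference."""
    return "".join(ch if ord(ch) < 128 else f"&#{ord(ch)};" for ch in text)


def _decode(match: re.Match) -> str:
    # Only decode references outside the ASCII range (an assumption of this
    # sketch), so that e.g. &#60; is not turned into a literal "<".
    code = int(match.group(1))
    return chr(code) if 127 < code <= 0x10FFFF else match.group(0)


def from_numeric_entities(text: str) -> str:
    """Replace decimal &#nnn; references above 127 with the actual characters."""
    return re.sub(r"&#(\d+);", _decode, text)


if __name__ == "__main__":
    encoded = to_numeric_entities("café")
    print(encoded)
    print(from_numeric_entities(encoded))
```

The result of the first function is pure ASCII, which is useful when a file has to pass through tools that do not handle UTF-8; the second function reverses the process.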
Source: http://linuxpoison.blogspot.com/2012/01/135781677518703.html