HTML Parser
Introduction
This parser assumes that the input document is an HTML document. It automatically converts HTML tags that have a native kramdown element representation into that representation – all other tags, as well as processing instructions and HTML comments, are parsed as is.
Conversion of HTML tags to native kramdown elements
Here is a description of the HTML tags that have native representations and how they are converted:
em strong blockquote hr br a img p thead tbody tfoot tr td th ul ol dl li dt dd - These HTML tags are just transformed into their native representation and don’t need any further processing.
b i - The HTML <b> tag is converted to the strong element and the <i> tag to the em element.
h1 h2 h3 h4 h5 h6 - These six header tags are all mapped to the single header element with different header levels.
code pre - The code tag is converted to a codespan element and the pre tag to a codeblock element. All child tags are removed and only the contained text is used.
table - The table tag is converted to its native element counterpart if its cells contain only span-level elements; otherwise it is not converted.
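The mapping described above can be pictured as a simple lookup. The following is only a minimal sketch of that idea, not kramdown's actual implementation: the names DIRECT_TAGS, RENAMED_TAGS and native_element_for are hypothetical, and the table tag is left out because its conversion depends on the content of its cells.

```ruby
# Hypothetical lookup tables built from the list above; these are
# illustration names, not part of kramdown's API.
DIRECT_TAGS = %w[em strong blockquote hr br a img p thead tbody tfoot
                 tr td th ul ol dl li dt dd].freeze
RENAMED_TAGS = { 'b' => :strong, 'i' => :em,
                 'code' => :codespan, 'pre' => :codeblock }.freeze

# Returns [element_type, options] for a tag name, or nil when the tag
# has no native representation in this sketch and would stay HTML.
def native_element_for(tag)
  tag = tag.downcase
  if DIRECT_TAGS.include?(tag)
    [tag.to_sym, {}]
  elsif RENAMED_TAGS.key?(tag)
    [RENAMED_TAGS[tag], {}]
  elsif (match = /\Ah([1-6])\z/.match(tag))
    # All six header tags share one element type; only the level differs.
    [:header, { level: match[1].to_i }]
  end
end
```

With this sketch, native_element_for('b') yields the strong element type, native_element_for('h3') yields a header with level 3, and native_element_for('span') yields nil because span has no native counterpart.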
The following general transformations are also applied:
- Direct text children of the following HTML tags are removed:
  html head hgroup ol ul dl table colgroup tbody thead tfoot tr select optgroup
- Text elements containing only whitespace and appearing either directly after the start tag, directly before the end tag or between block-level child elements are removed from the following HTML tags:
  body section nav article aside header footer address div li dd blockquote figure figcaption td th fieldset form
- Leading and trailing whitespace in the content of the following HTML tags is removed:
  address article aside blockquote body caption dd div dl dt fieldset figcaption form footer header h1 h2 h3 h4 h5 h6 legend li nav p section td th
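To make the three whitespace rules concrete, here is a minimal sketch that applies them to a flat list of children. It is not kramdown's implementation: Child, block? and clean_children are hypothetical names, children are modelled as either a String (text) or a Child struct, and the order in which the rules are applied is an assumption.

```ruby
# Hypothetical stand-in for a child element; `block` marks block-level tags.
Child = Struct.new(:tag, :block)

REMOVE_TEXT = %w[html head hgroup ol ul dl table colgroup tbody thead
                 tfoot tr select optgroup].freeze
REMOVE_BLANK = %w[body section nav article aside header footer address div
                  li dd blockquote figure figcaption td th fieldset form].freeze
STRIP_CONTENT = %w[address article aside blockquote body caption dd div dl dt
                   fieldset figcaption form footer header h1 h2 h3 h4 h5 h6
                   legend li nav p section td th].freeze

def block?(node)
  node.is_a?(Child) && node.block
end

def clean_children(tag, children)
  result = children.dup

  # Rule 1: drop every direct text child.
  result.reject! { |c| c.is_a?(String) } if REMOVE_TEXT.include?(tag)

  # Rule 2: drop whitespace-only text at either edge or between two
  # block-level children.
  if REMOVE_BLANK.include?(tag)
    kept = []
    result.each_with_index do |c, i|
      blank = c.is_a?(String) && c.strip.empty?
      edge = i.zero? || i == result.length - 1
      between = !edge && block?(result[i - 1]) && block?(result[i + 1])
      kept << c unless blank && (edge || between)
    end
    result = kept
  end

  # Rule 3: trim leading and trailing whitespace of the content.
  if STRIP_CONTENT.include?(tag)
    result[0] = result[0].lstrip if result.first.is_a?(String)
    result[-1] = result[-1].rstrip if result.last.is_a?(String)
  end

  result
end
```

For example, passing 'ul' with a newline, an li child and some stray text keeps only the li child, while passing 'p' with padded text around an em child keeps all three children and trims only the outer whitespace.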
Options
The HTML parser doesn’t have any options.