HTML Parser
Introduction
This parser assumes that the input document is an HTML document. It automatically converts HTML tags that have a native element representation to that representation; all other tags, as well as processing instructions and HTML comments, are parsed as is.
Conversion of HTML tags to native kramdown elements
Here is a description of the HTML tags that have native representations and how they are converted:
em strong blockquote hr br a img p thead tbody tfoot tr td th ul ol dl li dt dd
- These HTML tags are just transformed into their native representation and don’t need any further processing.
b i
- The HTML <b> tag is converted to the strong element and the <i> tag to the em element.
h1 h2 h3 h4 h5 h6
- These six header tags are all mapped to the single header element with different header levels.
code pre
- The code tag is converted to a codespan element and the pre tag to a codeblock element. All child tags are removed and only the contained text is used.
table
- The table tag is converted to its native element counterpart if its cells contain only span-level elements; otherwise it is not converted.
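The conversion rules above can be read as a lookup from tag name to native element type. The following is a minimal sketch in plain Ruby, not kramdown's actual implementation: the names DIRECT_TAGS and native_element_for are hypothetical, and the table tag is left out because its conversion depends on the content of its cells.

```ruby
# Hypothetical lookup mirroring the conversion rules described above.
# Illustration only; this is not kramdown's internal data structure.
DIRECT_TAGS = %w[em strong blockquote hr br a img p thead tbody tfoot tr td th
                 ul ol dl li dt dd].freeze

# Returns [element_type, options] for a tag name, or nil when the tag has no
# native representation in this sketch and would be kept as an HTML element.
def native_element_for(tag)
  tag = tag.downcase
  case tag
  when *DIRECT_TAGS   then [tag.to_sym, {}]   # same-named native element
  when 'b'            then [:strong, {}]
  when 'i'            then [:em, {}]
  when /\Ah([1-6])\z/ then [:header, { level: Regexp.last_match(1).to_i }]
  when 'code'         then [:codespan, {}]
  when 'pre'          then [:codeblock, {}]
  end
end
```

For example, native_element_for('h3') yields the header type with level 3, while a tag such as span yields nil and would stay an HTML element.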
The following general transformations are also applied:
- Direct text children of the following HTML tags are removed:
  html head hgroup ol ul dl table colgroup tbody thead tfoot tr select optgroup
- Text elements containing only whitespace and appearing either directly after the start tag, directly before the end tag or between block-level child elements are removed from the following HTML tags:
  body section nav article aside header footer address div li dd blockquote figure figcaption td th fieldset form
- Leading and trailing whitespace in the content of the following HTML tags is removed:
  address article aside blockquote body caption dd div dl dt fieldset figcaption form footer header h1 h2 h3 h4 h5 h6 legend li nav p section td th
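To make two of these transformations concrete, here is a small sketch in plain Ruby, assuming a simplified tree in which each node holds a tag name and a list of children that are either nodes or strings. Node and clean_whitespace are hypothetical names, not part of kramdown. The sketch covers only the first and third rules; the second rule is omitted because it also needs to know which children are block-level.

```ruby
# Minimal stand-in for a parsed HTML node; children are Nodes or Strings.
Node = Struct.new(:tag, :children)

# Tag lists copied from the rules above.
DROP_TEXT_CHILDREN = %w[html head hgroup ol ul dl table colgroup tbody thead
                        tfoot tr select optgroup].freeze
STRIP_CONTENT = %w[address article aside blockquote body caption dd div dl dt
                   fieldset figcaption form footer header h1 h2 h3 h4 h5 h6
                   legend li nav p section td th].freeze

# Returns a cleaned copy of the tree rooted at node.
def clean_whitespace(node)
  kids = node.children.map { |c| c.is_a?(Node) ? clean_whitespace(c) : c.dup }

  # Rule 1: drop every direct text child of the listed tags.
  kids.reject! { |c| c.is_a?(String) } if DROP_TEXT_CHILDREN.include?(node.tag)

  # Rule 3: strip leading and trailing whitespace of the content.
  if STRIP_CONTENT.include?(node.tag)
    kids[0] = kids.first.lstrip if kids.first.is_a?(String)
    kids[-1] = kids.last.rstrip if kids.last.is_a?(String)
    kids.reject! { |c| c == '' } # discard text that became empty
  end

  Node.new(node.tag, kids)
end

list = Node.new('ul', ["\n  ", Node.new('li', ['  item one ']), "\n"])
cleaned = clean_whitespace(list)
```

In this example the ul node loses both of its whitespace text children, and the text inside the li node is trimmed at both ends.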
Options
The HTML parser doesn’t have any options.