HTML Converter
Introduction
This converter converts a kramdown element tree into an HTML fragment and supports all available element types. Below is a list of additional features of the HTML converter as well as some additional information.
Automatic Generation of Header IDs
kramdown supports the automatic generation of header IDs if the option auto_ids is set to true (which is the default). This is done by converting the untransformed, i.e. plain, header text (or, if the option auto_id_stripping is set, only the text from real text elements) via the following steps:
- All characters except letters, numbers, spaces and dashes are removed.
- All characters from the start of the line until the first letter are removed.
- Everything except letters and numbers is converted to dashes.
- Everything is lowercased.
- If nothing is left, the identifier section is used.
- If an identifier created this way already exists, a dash and a sequential number are added (first -1, then -2 and so on).
Note that the option auto_id_stripping will be removed in version 2.0 because this will be the default behaviour!
Following are some examples of header texts and their respective generated IDs:
Sample header | generated ID | generated ID with auto_id_stripping |
# This is a header | this-is-a-header | this-is-a-header |
## 12. Another one 1 here | another-one-1-here | another-one-1-here |
### Do ^& it now | do--it-now | do--it-now |
# Hallo | hallo | hallo |
# Hallo | hallo-1 | hallo-1 |
# 123456789 | section | section |
# <i>test</i> | itesti | test |
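The steps listed above can be sketched in plain Ruby. This is only an illustrative sketch, not kramdown's actual implementation: the method name sketch_header_id and the used_ids hash are hypothetical, "letters" is assumed to mean ASCII letters, and the input is assumed to be the header text without the leading # markers.

```ruby
# Hypothetical helper that follows the documented steps one by one.
# used_ids is a Hash.new(0) that counts how often each ID has been generated.
def sketch_header_id(text, used_ids)
  id = text.gsub(/[^a-zA-Z0-9 -]/, '') # keep only letters, numbers, spaces and dashes
  id = id.sub(/\A[^a-zA-Z]+/, '')      # drop everything before the first letter
  id = id.gsub(/[^a-zA-Z0-9]/, '-')    # convert remaining spaces to dashes
  id = id.downcase                     # lowercase everything
  id = 'section' if id.empty?          # fallback identifier

  # Append -1, -2, ... when the identifier has been used before.
  count = used_ids[id]
  used_ids[id] += 1
  count.zero? ? id : "#{id}-#{count}"
end

used_ids = Hash.new(0)
headers = ['This is a header', '12. Another one 1 here', 'Do ^& it now',
           'Hallo', 'Hallo', '123456789', '<i>test</i>']
ids = headers.map { |header| sketch_header_id(header, used_ids) }
puts ids
```

Run against the sample headers from the table, this sketch yields the IDs of the "generated ID" column (without auto_id_stripping), including do--it-now, hallo-1 and section.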
The automatic creation of header IDs is not part of standard Markdown. The rules on how header text is converted to an identifier are based on the rules specified by Pandoc.
Such generated IDs are only used if no ID has been set manually before.
Standalone Images
By assigning the reference name “standalone” to an image that is the sole content of a paragraph, the image will be rendered inside a <figure> element instead of inside a paragraph.
Code Blocks
A code block is wrapped in both <pre> and <code> tags.
Showing Whitespace in a Code Block
It is sometimes useful to visualize whitespace within a code block. This can be achieved by adding the class show-whitespaces to a code block (using a block IAL).
Here is an example where the whitespaces are shown:
⋅⋅leading⋅tab and⋅space
trailing⋅⋅⋅tab⋅and⋅space⋅⋅⋅⋅
When showing whitespace in a code block, all spaces are replaced with the entity ⋅ and additionally spaces and tabs in the code block are marked up using HTML span tags and the following CSS classes:
- ws-space-l, ws-tab-l: leading spaces/tabs
- ws-space-r, ws-tab-r: trailing spaces/tabs
- ws-space, ws-tab: spaces/tabs in between
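To make the class assignment concrete, here is a rough sketch of how a single line of code could be marked up. It is an assumption-laden illustration rather than kramdown's own code: the method name mark_whitespace is hypothetical, the dot is assumed to be the character U+22C5, and each run of spaces or each single tab is assumed to get its own span.

```ruby
# Hypothetical helper: wraps whitespace runs of one line in span tags.
def mark_whitespace(line)
  line.gsub(/(\A[ \t]+)|([ \t]+\z)|([ \t]+)/) do |ws|
    # Leading runs get the suffix -l, trailing runs -r, all others no suffix.
    suffix = if $1 then '-l' elsif $2 then '-r' else '' end
    ws.scan(/\t|[ ]+/).map do |run|
      kind = run.start_with?("\t") ? 'tab' : 'space'
      # Spaces are shown as dots (assumed U+22C5), tabs are left untouched.
      %(<span class="ws-#{kind}#{suffix}">#{run.tr(' ', "\u22C5")}</span>)
    end.join
  end
end

marked = mark_whitespace("  a b\t")
puts marked
```

For the sample line, the two leading spaces end up in a ws-space-l span, the space between a and b in a ws-space span, and the trailing tab in a ws-tab-r span.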
Automatic Syntax Highlighting
kramdown supports setting a code language for code blocks and code spans, see Language of Code Blocks. This language will be used for highlighting code blocks and spans.
The actual syntax highlighting is configurable via the option ‘syntax_highlighter’. The default value is ‘rouge’ (see the options list below), which means that the Rouge syntax highlighter is used. Rouge is Pygments compatible, i.e. it supports all of Pygments' CSS themes.
Another available syntax highlighter is Coderay.
Math Support
kramdown supports the use of various math engines. The default math engine is MathJax (which can also be used with KaTeX).
For proper functionality, the HTML template must be configured to link to the engine’s Javascript and CSS. Note that the CSS includes references to webfonts.
Also available are precompiling versions to eliminate the need for client-side Javascript. Those are Mathjax-Node, KaTeX, and SsKaTeX. Each one requires a Javascript engine installed where kramdown runs, in order to perform the precompilation. The resulting pages still require CSS and fonts, but no Javascript anymore.
Alternative math engines are Ritex and itex2MML, both of which output MathML.
Emphasis
kramdown uses the HTML element em for lightly emphasized text and the element strong for strongly emphasized text.
Definition Lists
kramdown allows the automatic generation of element IDs for terms of a definition list. The algorithm is the same as with headers (see above), except that the last two points about adding “section” and unique IDs are not applied. Also, the algorithm works only on the real text, as if auto_id_stripping was activated.
The automatic generation of IDs is activated by assigning the reference name “auto_ids” to a definition list. This will generate plain IDs without a prefix. By using a reference name of the format “auto_ids-PREFIX”, the prefix is used.
Such generated IDs are only used if no ID has been set manually.
Footnotes
If a document contains footnotes, they are automatically placed at the end of the document.
By assigning the reference name “footnotes” to an ordered or unordered list, the list will be replaced with the footnotes, instead of placing the footnotes at the end of the document.
Automatic “Table of Contents” Generation
kramdown supports the automatic generation of the table of contents of all headers that have an ID set. Just assign the reference name “toc” to an ordered or unordered list by using an IAL and the list will be replaced with the actual table of contents, rendered as nested unordered lists if “toc” was applied to an unordered list or else as nested ordered lists. All attributes applied to the original list will also be applied to the generated TOC list and it will get an ID of markdown-toc if no ID was set.
When the auto_ids option is set, all headers will appear in the table of contents as they all will have an ID. Assign the class name “no_toc” to a header to exclude it from the table of contents.
Here is an example that generates a “Table of Contents” as an unordered list:
# Contents header
{:.no_toc}
* A markdown unordered list which will be replaced with the ToC, excluding the "Contents header" from above
{:toc}
# H1 header
## H2 header
For a “Table of Contents” as an ordered list:
1. The generated Toc will be an ordered list
{:toc}
# H1 header
## H2 header
Options
The HTML converter supports the following options:
auto_ids
- Use automatic header ID generation
If this option is true, ID values for all headers are automatically generated if no ID is explicitly specified.
Default: true
Used by: HTML/Latex converter
auto_id_prefix
- Prefix used for automatically generated header IDs
This option can be used to set a prefix for the automatically generated header IDs so that there is no conflict when rendering multiple kramdown documents into one output file separately. The prefix should only contain characters that are valid in an ID!
Default: ‘’
Used by: HTML/Latex converter
auto_id_stripping
- Strip all formatting from header text for automatic ID generation
If this option is true, only the text elements of a header are used for generating the ID later (in contrast to just using the raw header text line).
This option will be removed in version 2.0 because this will be the default then.
Default: false
Used by: kramdown parser
transliterated_header_ids
- Transliterate the header text before generating the ID
Only ASCII characters are used in header IDs. This is not good for languages with many non-ASCII characters. By enabling this option the header text is transliterated to ASCII as well as possible so that the resulting header ID is more useful.
The stringex library needs to be installed for this feature to work!
Default: false
Used by: HTML/Latex converter
template
- The name of an ERB template file that should be used to wrap the output or the ERB template itself.
This is used to wrap the output in an environment so that the output can be used as a stand-alone document. For example, an HTML template would provide the needed header and body tags so that the whole output is a valid HTML file. If no template is specified, the output will be just the converted text.
When resolving the template file, the given template name is used first. If such a file is not found, the converter extension (the same as the converter name) is appended. If the file still cannot be found, the template name is interpreted as the name of a template that is provided by kramdown (without the converter extension). If the file is still not found, the template name is checked for the prefix ‘string://’; if it starts with this prefix, the prefix is removed and the rest is used as template content.
kramdown provides a default template named ‘document’ for each converter.
Default: ‘’
Used by: all converters
footnote_nr
- The number of the first footnote
This option can be used to specify the number that is used for the first footnote.
Default: 1
Used by: HTML converter
entity_output
- Defines how entities are output
The possible values are :as_input (entities are output in the same form as found in the input), :numeric (entities are output in numeric form), :symbolic (entities are output in symbolic form if possible) or :as_char (entities are output as characters if possible, only available on Ruby 1.9).
Default: :as_char
Used by: HTML converter, kramdown converter
smart_quotes
- Defines the HTML entity names or code points for smart quote output
The entities identified by entity name or code point that should be used for, in order, a left single quote, a right single quote, a left double and a right double quote are specified by separating them with commas.
Default: lsquo,rsquo,ldquo,rdquo
Used by: HTML/Latex converter
toc_levels
- Defines the levels that are used for the table of contents
The individual levels can be specified by separating them with commas (e.g. 1,2,3) or by using the range syntax (e.g. 1..3). Only the specified levels are used for the table of contents.
Default: 1..6
Used by: HTML/Latex converter
syntax_highlighter
- Set the syntax highlighter
Specifies the syntax highlighter that should be used for highlighting code blocks and spans. If this option is set to nil, no syntax highlighting is done.
Options for the syntax highlighter can be set with the syntax_highlighter_opts configuration option.
Default: rouge
Used by: HTML/Latex converter
syntax_highlighter_opts
- Set the syntax highlighter options
Specifies options for the syntax highlighter set via the syntax_highlighter configuration option.
The value needs to be a hash with key-value pairs that are understood by the used syntax highlighter.
Default: {}
Used by: HTML/Latex converter
math_engine
- Set the math engine
Specifies the math engine that should be used for converting math blocks/spans. If this option is set to nil, no math engine is used and the math blocks/spans are output as is.
Options for the selected math engine can be set with the math_engine_opts configuration option.
Default: mathjax
Used by: HTML converter
math_engine_opts
- Set the math engine options
Specifies options for the math engine set via the math_engine configuration option.
The value needs to be a hash with key-value pairs that are understood by the used math engine.
Default: {}
Used by: HTML converter
footnote_backlink
- Defines the text that should be used for the footnote backlinks
The footnote backlink is just text, so any special HTML characters will be escaped.
If the footnote backlink text is an empty string, no footnote backlinks will be generated.
Default: ‘&#8617;’
Used by: HTML converter
footnote_backlink_inline
- Specifies whether the footnote backlink should always be inline
With the default of false the footnote backlink is placed at the end of the last paragraph if there is one, or an extra paragraph with only the footnote backlink is created.
Setting this option to true tries to place the footnote backlink in the last, possibly nested paragraph or header. If this fails (e.g. in the case of a table), an extra paragraph with only the footnote backlink is created.
Default: false
Used by: HTML converter
typographic_symbols
- Defines a mapping from typographical symbol to output characters
Typographical symbols are normally output using their equivalent Unicode codepoint. However, sometimes one wants to change the output, mostly to fall back to a sequence of ASCII characters.
This option allows this by specifying a mapping from typographical symbol to its output string. For example, the mapping {hellip: ...} would output the standard ASCII representation of an ellipsis.
The available typographical symbol names are:
- hellip: ellipsis
- mdash: em-dash
- ndash: en-dash
- laquo: left guillemet
- raquo: right guillemet
- laquo_space: left guillemet followed by a space
- raquo_space: right guillemet preceded by a space
Default: {}
Used by: HTML/Latex converter
remove_line_breaks_for_cjk
- Specifies whether line breaks should be removed between CJK characters
Default: false
Used by: HTML converter