Version 0.8.3 of LaTeXML, the popular LaTeX to XML converter, was just released.

For those who don’t know it, LaTeXML (also written LaTexml) is a cross-platform program that aims to achieve a faithful translation of a LaTeX document into an XML file. LaTeXML itself can later use that XML file to generate other formats, such as HTML(5) webpages, ePub ebooks, and even DOC/ODT documents [1].

LaTeXML workflow. (CSS3 logo by daPhyre, XML logo by me; all the others belong to their respective owners)

Work on LaTeXML started at NIST in the context of the DLMF, as a tool for publishing math-heavy LaTeX documents on the web. It has evolved a lot since then, and it can now convert the majority of the documents on arXiv without errors [2].

The new version, 0.8.3, closes 81 bugs and brings many enhancements and performance improvements, along with some new experimental features. It also adds 50+ new bindings, so it supports many more LaTeX packages and your documents have a better chance of being converted properly.

Installation

Detailed installation instructions can be found on the official website.

Debian & Co. (Ubuntu, etc.)

Stable version in official repos

LaTeXML can be found in the official repositories of Debian and Debian-based systems like Ubuntu. You can promptly install it via:

$ sudo apt install latexml

This is the preferred method; however, the package in the repositories might be outdated.

Stable version in unofficial PPA (only Ubuntu)

Ubuntu users can also get the latest stable version of LaTeXML via an external PPA.

$ sudo add-apt-repository ppa:matteosecli/latexml
$ sudo apt update
$ sudo apt install latexml

Bleeding-edge version in unofficial PPA (only Ubuntu)

Ubuntu users can easily get the bleeding-edge version of LaTeXML via an external PPA that builds daily packages based on the latest code on GitHub.

$ sudo add-apt-repository ppa:matteosecli/latexml-daily
$ sudo apt update
$ sudo apt install latexml

Be careful! The bleeding-edge version might be highly unstable and have experimental features that might not survive in the next stable version.

Be careful! Do not use the stable PPA ppa:matteosecli/latexml and the bleeding-edge PPA ppa:matteosecli/latexml-daily at the same time, as you might end up with unexpected results.
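To check whether both PPAs are enabled at once, you can look at the APT sources directory. The following is a minimal sketch, not an official tool: the function name enabled_latexml_ppas is made up for illustration, and it assumes that PPAs are recorded in *.list or *.sources files under /etc/apt/sources.list.d whose URLs contain matteosecli/latexml/ or matteosecli/latexml-daily/.

```shell
#!/bin/sh
# List which of the two LaTeXML PPAs look enabled in an APT sources directory.
# Assumption: each PPA shows up as a URL containing "matteosecli/<name>/"
# in a *.list or *.sources file; commented-out lines are ignored.
enabled_latexml_ppas() {
    dir="${1:-/etc/apt/sources.list.d}"
    for ppa in latexml latexml-daily; do
        for file in "$dir"/*.list "$dir"/*.sources; do
            [ -f "$file" ] || continue
            # Match only lines whose first non-blank character is not "#".
            if grep -Eq "^[[:space:]]*[^#[:space:]].*matteosecli/$ppa/" "$file"; then
                echo "ppa:matteosecli/$ppa"
                break
            fi
        done
    done
}

enabled_latexml_ppas
```

If both PPAs are printed, you can drop one of them with sudo add-apt-repository --remove followed by the PPA name, and then run sudo apt update again.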

macOS

The easiest way to get LaTeXML on macOS is via Homebrew (if you don’t know what Homebrew is or you don’t have it on your system, visit the official website). However, the official Homebrew package lacks many of the Perl modules LaTeXML needs, so a bare installation via Homebrew would give you a non-functional LaTeXML.

So, before installing LaTeXML, it is necessary to install all the missing dependencies; the easiest way to do that is via CPANM:

$ brew install cpanm
$ cpanm Archive::Zip DB_File File::Which Getopt::Long Image::Size IO::String JSON::XS LWP MIME::Base64 Parse::RecDescent Pod::Parser Text::Unidecode Test::More URI XML::LibXML XML::LibXSLT UUID::Tiny
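To see which of these modules are still missing, before or after running cpanm, a small loop like the one below can help. It is only a sketch: the function name missing_perl_modules is made up for illustration, and it assumes that the perl on your PATH is the same one LaTeXML will run under.

```shell
#!/bin/sh
# Print the names of the Perl modules (given as arguments) that fail to load.
missing_perl_modules() {
    for mod in "$@"; do
        # "perl -MFoo -e1" exits non-zero when module Foo cannot be loaded.
        perl -M"$mod" -e1 >/dev/null 2>&1 || echo "$mod"
    done
}

# Same module list as in the cpanm command.
missing_perl_modules Archive::Zip DB_File File::Which Getopt::Long Image::Size \
    IO::String JSON::XS LWP MIME::Base64 Parse::RecDescent Pod::Parser \
    Text::Unidecode Test::More URI XML::LibXML XML::LibXSLT UUID::Tiny
```

An empty output means that every module loads. Note that if perl itself is not found, every module is reported as missing.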

Then, you can install LaTeXML via:

$ brew install imagemagick --with-perl
$ brew install latexml

Tip: you can install the bleeding-edge version of LaTeXML by replacing the last command with brew install latexml --HEAD.

Warning: if you already have ImageMagick installed via Homebrew, Homebrew will complain about it; in this case, first remove ImageMagick via brew uninstall --ignore-dependencies imagemagick. The problem is that, by default, the ImageMagick version installed via Homebrew lacks the PerlMagick module required by LaTeXML, so you have to build it from scratch with the option --with-perl.

From source

You can easily install LaTeXML from source on both Linux and macOS. First of all, install the dependencies via CPANM:

$ cpanm Archive::Zip DB_File File::Which Getopt::Long Image::Size IO::String JSON::XS LWP MIME::Base64 Parse::RecDescent Pod::Parser Text::Unidecode Test::More URI XML::LibXML XML::LibXSLT UUID::Tiny

Then, download LaTeXML:

$ wget https://dlmf.nist.gov/LaTeXML/releases/LaTeXML-0.8.3.tar.gz
$ tar zxvf LaTeXML-0.8.3.tar.gz
$ cd LaTeXML-0.8.3

Tip: if you want to install the bleeding-edge version instead of the stable one, replace the previous commands with the following: git clone https://github.com/brucemiller/LaTeXML.git && cd LaTeXML.

Build it as follows:

$ perl Makefile.PL
$ make
$ make test

And finally, install it:

$ sudo make install
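After installing, it can be worth confirming that the executables ended up on your PATH. The snippet below is a minimal sketch; check_tools is a hypothetical helper name, and the list of commands is limited to the ones used in this post.

```shell
#!/bin/sh
# Report whether each named command is on PATH.
# Returns non-zero if at least one of them is missing.
check_tools() {
    status=0
    for tool in "$@"; do
        if command -v "$tool" >/dev/null 2>&1; then
            echo "$tool: found"
        else
            echo "$tool: missing"
            status=1
        fi
    done
    return "$status"
}

check_tools latexml latexmlpost latexmlc \
    || echo "LaTeXML does not seem to be fully installed" >&2
```

Once the commands are found, latexml --VERSION prints the installed version, which should match the release you just built.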

Examples

Detailed usage examples and available options can be found in the official manual.

Below are some popular examples.

LaTeX to HTML5

Convert a document.tex LaTeX document into a webpage:

$ latexmlc document.tex --format=html5 --destination=document.html

or, if you want to keep the intermediate XML for further processing,

$ latexml document.tex
$ latexmlpost --format=html5 --destination=document.html document.xml

The resulting document embeds math formulas as MathML, which unfortunately is not supported by all the major browsers [3]. For browsers that don’t have native MathML support, you can conditionally activate MathJax by converting via:

$ latexmlc document.tex --format=html5 --javascript=LaTeXML-maybeMathJax.js --dest=document.html

Tip: if instead you always want to use MathJax, even with browsers that support MathML, you have to use the option --javascript="https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.0/MathJax.js?config=MML_HTMLorMML".

The output of LaTeXML isn’t particularly appealing out of the box; however, it is highly customizable, and you can include custom CSS and JavaScript to make the output “beautiful”, as TeXify does, for example.
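When you have several documents to convert, the one-step latexmlc command from the beginning of this section can be wrapped in a loop. The following is a sketch under a few assumptions: convert_all is a made-up helper name, and the LATEXMLC variable is just a convention of this snippet (LaTeXML itself does not read it) that lets you do a dry run by setting LATEXMLC=echo.

```shell
#!/bin/sh
# Convert each .tex file given as an argument to an HTML5 page next to it,
# e.g. notes/a.tex -> notes/a.html. Stops at the first failed conversion.
convert_all() {
    for tex in "$@"; do
        html="${tex%.tex}.html"   # strip the .tex suffix, append .html
        "${LATEXMLC:-latexmlc}" "$tex" --format=html5 --destination="$html" || return 1
    done
}

# Usage: sh convert.sh chapter1.tex chapter2.tex
convert_all "$@"
```

Each file is converted independently, so cross-references between documents are not resolved by this loop.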

LaTeX to ePub

Convert a document.tex LaTeX document into an ePub3 ebook:

$ latexmlc document.tex --destination=document.epub

Unfortunately, even though the ePub3 standard supports MathML, support among ePub3 readers is even worse than among browsers [4]: only a tiny number of them can render MathML.

You can partly work around the problem by bundling a slimmed-down MathJax in your ePub3, but it’s not a definitive solution: not all ePub readers support JavaScript, and MathJax rendering is quite resource-intensive anyway.

Detailed Changelog

[0.8.3] - 2018-08-08

Implemented enhancements

  • spieman.cls spoils document title #997
  • Footnotes should be counted chapterwise for documentclass book. #993
  • [rawtex] mmaauth.cls #985
  • ucs.sty raw support #982
  • [binding-request] subcaption.sty #943
  • Improve fragile macro support (textcase.sty) #932
  • Some issues with the environment thebibliography #930
  • Support and document installation of LaTeXML via homebrew on Mac OS X #929
  • Document examples of math rendering use cases #923
  • Update mathjax link in documentation #922
  • Add link to references to table of content in HTML output #914
  • Fails to parse / binding required for ltxcmds.sty #905
  • Contributing instructions #904
  • Boolean in LaTeX based on LaTeXML use? #900
  • Warnings about vertical bars in $q_{A|A}$ #899
  • There are some samples or corpus for test-kit? #835
  • Missing autoref labels #822
  • LaTeXML should set explicit spacing for operators outside the MathML dictionary #789
  • listings in figures? #779
  • Request for LaTeXML::Util::Test Documentation #722
  • Safer ImageMagick conversions #663
  • Secure IO in LaTeXML #606
  • Binding request: siunitx & friends, enumitem, type1cm, listingsutf8, braket #580
  • Package Minitoc not supported #541
  • need a binding for endnotes package #520
  • LaTeXML speedups and profiling #480
  • binding for etoolbox #478
  • MathParser enhancement – parametric math grammars #438
  • integrate support for minted.sty #291
  • Implement algorithm2e bindings #233
  • LaTeXML support for CVs #200
  • Expose math lexemes in final output #947 (dginev)
  • moderncv.sty binding #924 (dginev)

Fixed bugs

  • bold and italic markup is sometimes lost in index terms #1023
  • problem with prebuilt debian (ubuntu) installation #1022
  • False error on underscore in included file names #1019
  • {quote} environment breaks list environments ({itemize}, {enumerate} etc.) #1018
  • Blank lines are not preserved in {alltt} environment #1017
  • “\AA” in math mode breaks math-to-image-conversion with “latexmlpost --mathimages” #1012
  • Incorrect references to tables, figures and equations #1009
  • CSName parameters should (sometimes?) expand protected macros #1006
  • Incorrect html5 figure breaks css #1005
  • Incorrect newline command in parbox and minipage #1004
  • Disappearing parent with context toc #1003
  • pgfparse of deg function in a TikZ picture #1002
  • author name splitting fails in some cases when converting a bib-file with latexml #999
  • captions on figures in margins #998
  • \noindent and list environments: class="ltx_noindent" in wrong <para> in some cases #994
  • hyperref.sty.ltxml throws error for \url containing underscores in \footnote #992
  • pifont.sty.ltxml maps \ding{…} to wrong symbols #988
  • Section embedded within other section in sample2e.tex #984
  • Padding in aligned environment not correct after fourth column #968
  • Recent citation regressions #959
  • Infix \vert produces wrong math output #958
  • babel.def throws an error #957
  • svmult.sty: version update + dual heading numbering #955
  • No entries for Bibliography and appendix sections in TOC #954
  • Infinite loop while Building #951
  • Errors thrown by latexml when URLs defined in baseurl contain unescaped underscores #950
  • [bug] Incorrect <title> in HTML output #946
  • [bug] Inconsistent chapter numbering in ‘book’ #941
  • Some issues with post-processing of included graphics #939
  • http://dlmf.nist.gov/LaTeXML/ is down #935
  • ImageMagick 7 incompatibilities (was Rotation of an included graphics leads to error in latexmlpost #934
  • Math fonts needs surrounding {-} #931
  • LaTeXML generates invalid content MML #928
  • Math parsing problem of \boldsymbol{0} ? #926
  • The token T_CS[\@ifpackagelater] is not defined #925
  • Wrong encoding in HTML output of references #919
  • latexmlpost: parser error : Input is not proper UTF-8, indicate encoding ! #918
  • Bib-file parsing issues #917
  • Add a linebreak after “Appendix A” in HTML output #913
  • --graphicsmap is not respected if destination file of different format already exists #901
  • Small formatting issue with captions in float package #898
  • --graphicsmap=pdf.svg doesn’t seem to work #894
  • Test Failure: t/81_babel.t #891
  • booktabs.sty.ltxml produces extra lines #888
  • Alignment.pm: regrouping rows #882
  • Fatal error #878
  • LaTeXML package broken on Ubuntu #877
  • Enhancement for LaTeX.pool.ltxml #876
  • Nested emph #874
  • Jats Conversion breaks References #870
  • Output format JATS adds linebreak in Author fields #864
  • v0.8.2 not in Debian (unstable) apt sources #846
  • href: LaTeXML failing on URLs containing underscores #831
  • Latexmlc (LaTeXML 0.8.2) epub generation fails on Windows 10 #827
  • Two more problems with algpseudocode #826
  • bug in endnote conversion in case of existance of $$ in endnote #819
  • Strange cause of fatal error misdefined \pgfkeyscurrentkeyRAW #818
  • \textrm{} inside mathescape inside listings causes failures #816
  • captionpos=b in listings results in malformed:ltx:toccaption error #815
  • Unexpected pgfkeys and parsing a list as a floating point number in pgfplot’s axis environment #813
  • Fatal:expected:id Cannot find a node with xml:id #812
  • --mathimagemagnification=1.25 invalid? #810
  • Cannot use non-html5 equations #809
  • Latexmlc epub generation fails on Windows #806
  • listings package and misparsed strings #795
  • latexmlc does not open the documentation #792
  • Tikz “\clip” does not seem to be taken into account #791
  • Support href inside math expressions #788
  • \graphicspath again #787
  • [braket] Some commands not working in \Braket{} with SVG output #786
  • [amsmath] Issues with \DeclareMathOperator #785
  • LaTeXML hanging with TIKZ example #784
  • latexmlc segmentation fault. #764
  • \halign in TikZ and image generation #760
  • latexml omits parentheses from tag #533
  • Ensure raw binmode when writing archive files in latexmlc #960 (dginev)
  • Robust preloads for daemon reruns, unlink cache for mathsvg #889 (dginev)

Closed issues

  • W3C validator emits warnings for LaTeXML-generated HTML5 #1016
  • pgfmath regression #991
  • Postprocessing doesn’t show errors during processing of bibliography #916
  • eso-pic.sty fails to parse #909
  • --graphicsmap option isn’t validated #893
  • Make “latexml --help” do something helpful #892
  • latexmlmath breaks at “%” #884
  • Typo in TeX.pool.ltxml #875
  • Weird @filelist behaviour when loading cls file #857
  • Invocation() of a macro created via \let references original macro #849
  • Suspicious errors when running babel tests #848
  • \stackrel: behaviour incompatible to LaTeX #838
  • Reopening issue #831 - href: LaTeXML failing on URLs containing underscores #837
  • Change to new MathJax CDN by April 30 #834
  • Rendering error for the input \mathop{\Gamma} #833
  • Errors with algpseudocode #825
  • optionally pull back crossrefs in bibliographies. #800
  • Typo in the manual #797

Merged pull requests

  • Paranoid perl invocation for epub test #1031 (dginev)
  • Also use same perl between Test.pm and system() #1030 (dginev)
  • pass @INC to latexmlc invocation in tests #1029 (dginev)
  • Fixed pod error as reported by CPANTS. #1028 (manwar)
  • Texlive more tests #1027 (tkw1536)
  • Improve enumitem binding #1026 (tkw1536)
  • Add csquotes binding #1025 (tkw1536)
  • Support for expandable environment names #1024 (tkw1536)
  • Update Perl Versions Tested By Travis #1014 (tkw1536)
  • Cleanup and Refactor Locators #1013 (tkw1536)
  • Add Dockerfile for use with Docker #1008 (tkw1536)
  • wiki.sty and turing.sty bindings #990 (dginev)
  • add copy of utf8.def as utf8x.def.ltxml #986 (dginev)
  • \@[email protected] definition from latex.ltx, allows loading ucs.sty #983 (dginev)
  • etoolbox.sty support #978 (dginev)
  • Adding textcase.sty test #977 (dginev)
  • Fix misc fatal:misdefined:ANON problems #973 (dginev)
  • avoid Fatal for tikz matrices #972 (dginev)
  • improved \endinput macro #971 (dginev)
  • Integrate logs from separate bibliography converts in post-processing #969 (dginev)
  • update homepage link #967 (dginev)
  • Initialize branch TagsNTitles; replaces attributes @refnum,@frefnum,@… #966 (brucemiller)
  • using IPC::Open3 to silence the status write of latexmlc during testing #965 (dginev)
  • when core returns an empty document, ensure post still succeeds #964 (dginev)
  • make DirectoryList tolerant of spaces #963 (dginev)
  • Teasing away Perl-level die for error arxiv doc 1802.05385 #962 (dginev)
  • Teasing away Perl-level dies for 100+ error document #961 (dginev)
  • Soften heuristic, was breaking mathbf macro #956 (dginev)
  • Infinite building loop #953 (dginev)
  • Two more wrongly encoded cases in LaTeXML.pm #949 (dginev)
  • More encoding cleanup #940 (dginev)
  • binmode not needed if we always output bytes #938 (dginev)
  • Small typo in index.tex #937 (matteosecli)
  • Use utf8 when writing to stdout in latexml #920 (dginev)
  • Use library to generate XML in maketests instead of system() #906 (bfirsh)
  • Fix first usage example #890 (bfirsh)
  • Ensure mathsvg produces svg in whatsout=math when html format #886 (dginev)
  • Preserve image graphic in fragment when not found #885 (dginev)
  • --mathsvg for latexmlc, math whatsout #883 (dginev)
  • update Mailing list #881 (kohlhase)
  • Safe no-op for \advance when missing definition #880 (dginev)
  • Fix typo in enspace #879 (dginev)
  • --includestyles fix for latexmlc and 1011.5551 fixes #873 (dginev)
  • Add error when readBalanced fails to balance at end of Mouth #872 (dginev)
  • [feature] First iteration of TEI-implementation for LaTeXML. #871 (Thanathan-k)
  • Refactor LaTeXML::Core::Box blessed objects #869 (tkw1536)
  • Bugfix for older Parse::RecDescent versions #868 (tkw1536)
  • Fix missing runtime_class warning #867 (tkw1536)
  • Fix extra space in JATS output (fixes #864) #865 (tkw1536)
  • Also record labels as metadata for longtables, when requested in lxRDFa #862 (dginev)
  • Keyval Improvements and xkeyval support #860 (tkw1536)
  • Rewrite tools/makemanifest in Perl #859 (tkw1536)
  • Set @filelist macro before reading file (fixes #857) #858 (tkw1536)
  • Sort entries in Manifest File #854 (tkw1536)
  • Add TexLive to Travis Tests #852 (tkw1536)
  • Add Makefile.old to gitignore #850 (tkw1536)
  • Change shebang to /usr/bin/env/perl #847 (tkw1536)
  • More updates to archlinux setup instructions #845 (tkw1536)
  • Improvements to doc/ scripts #843 (tkw1536)
  • Add archlinux setup instructions #842 (tkw1536)
  • Fix typo in PREFIX_ALIAS #840 (dginev)
  • add onlyPreamble checks for \documentclass and \documentstyle #839 (dginev)
  • Consistently use the XML::XML_NS namespace uri #824 (dginev)
  • Fix bug with wrongly checking all files inside archive input #814 (dginev)
  • Fix typo in command line call #808 (ianhbell)
  • Fixed a typo in manual.tex following issue #797 #798 (matteosecli)
  • fixes typo in LaTeXML::Core::Token documentation #796 (teepeemm)
  • Fixes #792 (regression with latexmlc documentation) #793 (dginev)

  1. DOC/ODT support is provided by an external plugin

  2. Take a look at the conversion statistics on CorTeX

  3. The status of MathML support can be checked here

  4. MathML Support on ePub3 Reading Systems