A six-part series: How do TeX macros actually work?
About this series
This article series has an ambitious goal: to explain, step-by-step, how TeX macros, such as LaTeX commands, actually work—by exploring processes which take place inside TeX engine software. It tries to tell the story of TeX’s processing behaviour:
- reading input characters and using category codes;
- the production of character and command tokens—and the formulae TeX uses;
- how TeX identifies, then processes, TeX commands;
- the internal details of storing macro definitions, and macro arguments, as token lists, illustrated with graphics produced from internal TeX data;
- and concludes with exploring what macro expansion actually means—using real data from inside a TeX engine.
However, because TeX engines are such complex software programs, we cannot hope to cover everything; instead, we have tried to address the most important core features of TeX’s macro-processing capabilities.
Each article has a navigation bar before and after the text so that you can quickly jump to another article in the series. The six parts are:
Part 1: Category codes
This article examines the reasoning behind TeX’s concept of category codes: what they are and how TeX uses them to filter its input into content for typesetting and commands to be executed.
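As a small illustration of category codes in practice, the following plain TeX sketch writes the category codes of a few characters to the terminal and log file. It assumes the default plain TeX settings; other formats, or documents that change category codes, may report different values.

```tex
% Plain TeX sketch: report the category codes of a few characters.
% The values in the comments assume plain TeX's default settings.
\message{backslash: \the\catcode`\\}   % 0  (escape character)
\message{left brace: \the\catcode`\{}  % 1  (begin group)
\message{letter a: \the\catcode`a}     % 11 (letter)
\message{digit 7: \the\catcode`7}      % 12 (other)
\message{percent: \the\catcode`\%}     % 14 (comment character)
\bye
```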
Part 2: Reading input through TeX’s “eyes”
Through a series of graphics we use the time-tested analogy of TeX having “eyes” with which to read (scan) its input. We explore examples of TeX’s use of category codes to create character tokens and how TeX recognizes and processes commands by using category code 0 (“escape character”).
Part 3: From input text to commands
This article takes an in-depth look at how TeX recognizes and processes commands detected in the input. We explore how TeX stores and retrieves information about commands (command codes and command modifiers) and survey a few internal variables that TeX uses to store information about items read in from the input. Some of the article is quite low-level material which can be skipped on a first reading.
Part 4: The structure of a macro
We introduce and use the following “framework” for describing the structure of macros:
<TeX macro primitive><macro name><parameter text>{<replacement text>}
We then explore a range of examples to demonstrate the role and purpose of a macro’s <parameter text> as a “token template” which can be constructed through the use of tokens acting as delimiters.
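To make the framework concrete, here is a minimal plain TeX sketch of a macro whose parameter text uses delimiter tokens. The macro name \fullname and the choice of a comma and a period as delimiters are our own, purely for illustration.

```tex
% <TeX macro primitive> = \def
% <macro name>          = \fullname  (hypothetical name)
% <parameter text>      = #1,#2.     (comma and period act as delimiters)
% <replacement text>    = #2 #1
\def\fullname#1,#2.{#2 #1}

% #1 is everything up to the comma, #2 everything up to the period.
\fullname Knuth,Donald.   % typesets: Donald Knuth
\bye
```

The input must match the “token template” exactly: if the comma or the period is missing, TeX keeps reading further input while searching for the delimiter.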
Part 5: Token lists and TeX’s internal macro storage
This article explores, in detail, how TeX uses token lists to store macro definitions. Using extensive diagrams generated with a specially modified version of TeX, we explore the specialized tokens that TeX uses to identify and process a user’s macro arguments.
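Even without a modified TeX engine, you can ask TeX for a textual rendering of a stored macro definition. The sketch below uses the primitive \meaning with a hypothetical macro named \pair; note that this shows only a character representation, not the internal token list itself.

```tex
% \pair is a hypothetical macro used only for illustration.
\def\pair#1#2{(#1, #2)}

% \meaning expands to a character-by-character description
% of the stored definition.
\message{\meaning\pair}  % writes: macro:#1#2->(#1, #2)
\bye
```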
Part 6: Macro expansion and processing
In Part 6 we use some detailed graphics to explain and explore the exact meaning of macro expansion and the consequences of TeX’s tokenization of macro arguments prior to feeding them into a macro’s <replacement text>.
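One consequence of arguments being tokenized before they reach the replacement text can be sketched in plain TeX as follows. The macro name \retilde is hypothetical, and the example assumes plain TeX’s default setting in which ~ is an active character (category code 13).

```tex
% Plain TeX sketch; \retilde is a hypothetical macro name.
% In plain TeX, ~ has category code 13 (active) and acts as a tie.
\def\retilde#1{{\catcode`\~=12 #1}}

% The argument a~b is tokenized while TeX reads it, before the
% replacement text is processed, so ~ is already an active-character
% token: the \catcode change inside the macro comes too late.
\retilde{a~b}

% Here ~ is read from the input after the \catcode change,
% so it becomes an ordinary character token of category 12.
{\catcode`\~=12 a~b}
\bye
```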
A short note: Using “TeX” not LaTeX
As discussed in the article What’s in a Name: A Guide to the Many Flavours of TeX, a wide range of terms is used to refer to TeX, LaTeX and their derivatives. Consequently, it is worth briefly clarifying what we mean by “TeX” within the context of this series.
“TeX” is, somewhat confusingly, both the name of an executable program and the name of a typesetting language. To distinguish between the two, the term TeX engine is used for an executable TeX program, as opposed to the typesetting language. Some of the specific data, information and details used within this series are derived from a detailed examination of the source code to Knuth’s original TeX software, but the principles described are common to all TeX engines. So, throughout our discussion, “TeX” should be understood to mean one of the executable TeX engines, such as Knuth’s original TeX, pdfTeX, XeTeX or LuaTeX.
Within the articles we use the TeX primitive command \def to define our macro examples: we don’t use the LaTeX command \newcommand, which is almost certainly more familiar to most Overleaf users. There’s a very good reason for this: our objective is to understand the fundamental principles underlying TeX’s macro behaviour, and to do that we need to use the core commands (primitives) built into TeX software. LaTeX commands, such as \newcommand, are themselves macros: commands with specific programmed behaviour which, ultimately, are constructed from layers of lower-level TeX primitive commands. To better understand the fundamental behaviour of TeX we have to use TeX primitives, not LaTeX macros.
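For readers more used to \newcommand, the following LaTeX sketch places a \newcommand definition next to a roughly comparable definition written with the \def primitive. The names \greetA and \greetB are hypothetical, and the comparison is only approximate: \newcommand performs extra work, such as checking that the name is not already in use.

```tex
\documentclass{article}

% LaTeX interface: checks that \greetA is not already defined.
\newcommand{\greetA}[1]{Hello, #1!}

% Roughly comparable primitive definition: no checks are made.
% \long allows paragraph breaks in the argument, which the
% unstarred \newcommand also allows.
\long\def\greetB#1{Hello, #1!}

\begin{document}
\greetA{TeX} \greetB{TeX}
\end{document}
```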
Examples and graphics
Instead of relying solely on a suite of example macros designed to demonstrate various features, edge cases and behaviours of TeX, we also use an extensive array of graphics to look inside TeX itself to see how and why its macro processing works the way it does. Many of the graphics (token lists/node diagrams) have been prepared using a specially modified version of Knuth’s original TeX.
Overleaf adapted Knuth’s TeX with additional code (written in C) that “hooks into” TeX’s macro processing and explores data, and data structures, which are normally inaccessible to users. Each time a macro is called, the modified TeX engine generates additional output files containing data in a format which can be processed using Graphviz, an open-source graph-visualization program. The end result is a set of graphics (node-list diagrams) which show exactly how TeX stores a macro’s definition, together with a graphical representation of any arguments supplied by the user when the macro was called.
Out of necessity, this series’ objectives require discussion of a wide range of topics, many of which are quite low-level and, initially, might seem very distant from the task of typesetting your documents. Hopefully, after taking a deeper dive, you’ll come through with a foundation for building a better understanding that will, in the end, save you a lot of time and, perhaps, reduce frustration too. It is also our hope that the specially generated graphics which accompany this series offer a uniquely valuable insight to support readers in their quest to better understand TeX macros.
Overleaf’s customized version of TeX
The video gives a short demonstration of Overleaf’s modified version of TeX, adapted to generate Graphviz node diagrams (.gv files); no other aspect of TeX’s behaviour is affected by those changes. The .gv files contain representations of the TeX token lists used to store macro definitions and macro arguments. The Graphviz visualization is exported to SVG, which is then imported into Inkscape for further annotation prior to incorporating the graphic into an article.