A six-part series: How do TeX macros actually work?

About this series

This article series has an ambitious goal: to explain, step-by-step, how TeX macros, such as LaTeX commands, actually work—by exploring processes which take place inside TeX engine software. It tries to tell the story of TeX’s processing behaviour:

reading input characters and using category codes;
the production of character and command tokens—and the formulae TeX uses;
how TeX identifies, then processes, TeX commands;
the internal details of storing macro definitions, and macro arguments, as token lists, illustrated with graphics produced from internal TeX data;
and concludes with exploring what macro expansion actually means—using real data from inside a TeX engine.

However, TeX engines are complex programs, so we cannot hope to cover everything; instead, we have tried to address the most important core features of TeX’s macro-processing capabilities.

Navigation bar

Each article has the following navigation bar before and after the text so that you can quickly jump to another article in the series:

Part 1 Part 2 Part 3 Part 4 Part 5 Part 6

Part 1: Category codes

This article examines the reasoning behind TeX’s concept of category codes: what they are and how TeX uses them to filter its input into content for typesetting and commands to be executed.

Part 2: Reading input through TeX’s “eyes”

Through a series of graphics we use the time-tested analogy of TeX having “eyes” with which to read (scan) its input. We explore examples of TeX’s use of category codes to create character tokens and how TeX recognizes and processes commands by using category code 0 (“escape character”).
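As a small illustration of the category codes mentioned above, the following plain TeX sketch asks TeX to report the codes it currently assigns to a few characters. It assumes an unmodified plain TeX format (other formats or packages may change these values), and it writes the results to the terminal and log file rather than to the typeset page.

```tex
% Plain TeX sketch; run with `tex` or `pdftex`.
% \catcode`\<char> is the category code currently assigned to <char>;
% \the converts that number into characters that \message can print.
\message{backslash: \the\catcode`\\}   % prints 0  (escape character)
\message{left brace: \the\catcode`\{}  % prints 1  (begin group)
\message{letter a: \the\catcode`\a}    % prints 11 (letter)
\message{at sign: \the\catcode`\@}     % prints 12 (other), assuming plain TeX defaults
\bye
```

The backslash reports category code 0, which is what allows TeX to treat it as the start of a command name.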

Part 3: From input text to commands

This article takes an in-depth look at how TeX recognizes and processes commands detected in the input. We explore how TeX stores and retrieves information about commands, using command codes and command modifiers, and we survey a few internal variables that TeX uses to store information about items read in from the input. Some of the article is quite low-level material which can be skipped on a first reading.

Part 4: The structure of a macro

We introduce and use the following “framework” for describing the structure of macros:

<TeX macro primitive><macro name><parameter text>{<replacement text>}

We then explore a range of examples to demonstrate the role and purpose of a macro’s <parameter text> as a “token template” which can be constructed through the use of tokens acting as delimiters.
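To make the framework concrete, here is a minimal plain TeX sketch of a macro whose <parameter text> uses delimiter tokens. The macro name \pair is hypothetical and chosen only for illustration; it does not come from the series itself.

```tex
% Plain TeX sketch mapping one definition onto the framework:
%   <TeX macro primitive> = \def
%   <macro name>          = \pair        (hypothetical name)
%   <parameter text>      = (#1,#2)      the tokens ( , ) act as delimiters
%   <replacement text>    = first=#1, second=#2
\def\pair(#1,#2){first=#1, second=#2}

% The call must match the token template: "(" then #1, "," then #2, then ")".
\pair(x,y)   % typesets: first=x, second=y
\bye
```

Because the parentheses and comma are part of the template, a call such as \pair{x}{y} would not match it and TeX would report an error.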

Part 5: Token lists and TeX’s internal macro storage

This article explores, in detail, how TeX uses token lists to store macro definitions. Using extensive diagrams generated with a specially modified version of TeX, we explore the specialized tokens that TeX uses to identify and process a user’s macro arguments.

Part 6: Macro expansion and processing

In Part 6 we use some detailed graphics to explain and explore the exact meaning of macro expansion and the consequences of TeX’s tokenization of macro arguments prior to feeding them into a macro’s <replacement text>.

A short note: Using “TeX” not LaTeX

As discussed in the article What’s in a Name: A Guide to the Many Flavours of TeX, a wide range of terms are used to refer to TeX, LaTeX and their derivatives. Consequently, it is worth briefly clarifying what we mean by “TeX” within the context of this series.

“TeX” is, somewhat confusingly, both the name of an executable program and the name of a typesetting language. To distinguish between the two, the term TeX engine is used for an executable TeX program, as opposed to the typesetting language. Some of the specific data, information and details used within this series are derived from a detailed examination of the source code of Knuth’s original TeX software, but the principles described are common to all TeX engines. So, throughout our discussion, “TeX” should be understood to mean one of the executable TeX engines, such as Knuth’s original TeX, pdfTeX, XeTeX or LuaTeX.

Within the articles we use the TeX primitive command \def to define our macro examples: we don’t use the LaTeX command \newcommand, which is almost certainly more familiar to most Overleaf users. There’s a very good reason for this: our objective is to understand the fundamental principles underlying TeX’s macro behaviour, and to do that we need to use the core commands (primitives) built into TeX software. LaTeX commands, such as \newcommand, are themselves macros: commands with specific programmed behaviour which, ultimately, are constructed from layers of lower-level TeX primitive commands. To better understand the fundamental behaviour of TeX we have to use TeX primitives, not LaTeX macros.
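For readers more used to \newcommand, the following LaTeX sketch defines the same one-argument macro both ways and asks TeX to report what it stored. The names \greetA and \greetB are hypothetical, and \typeout writes to the terminal and log file rather than to the page.

```tex
\documentclass{article}
% \greetA and \greetB are hypothetical names used only for this comparison.
\def\greetA#1{Hello, #1!}            % TeX primitive
\newcommand{\greetB}[1]{Hello, #1!}  % LaTeX macro built on top of primitives

% \meaning shows the definition TeX holds for each command.
\typeout{\meaning\greetA}   % macro:#1->Hello, #1!
\typeout{\meaning\greetB}   % \long macro:#1->Hello, #1!

\begin{document}
\greetA{world} \greetB{world}
\end{document}
```

Both commands end up as ordinary TeX macros with the same parameter and replacement text; the only visible difference is the \long prefix that \newcommand adds, which allows the argument to contain paragraph breaks.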

Examples and graphics

Instead of relying solely on a suite of example macros designed to demonstrate various features, edge cases and behaviours of TeX, we also use an extensive array of graphics to look inside TeX itself to see how and why its macro processing works the way it does. Many of the graphics (token lists/node diagrams) have been prepared using a specially modified version of Knuth’s original TeX.

Overleaf adapted Knuth’s TeX with additional code (written in C) that “hooks into” TeX’s macro processing and explores data, and data structures, which are normally inaccessible to users. Each time a macro is called, the modified TeX engine generates additional output files containing data in a format which can be processed using Graphviz, an open-source graph-visualization program. The end result is a set of graphics (node-list diagrams) which show exactly how TeX stores a macro’s definition, together with a graphical representation of any arguments supplied by the user when the macro was called.

Out of necessity, this series’ objectives require discussion of a wide range of topics, many of which are quite low-level and, initially, might seem very distant from the task of typesetting your documents. Hopefully, after taking a deeper dive, you’ll come through with a foundation for building a better understanding that will, in the end, save you a lot of time and, perhaps, reduce frustration too. It is also our hope that the specially generated graphics which accompany this series offer a uniquely valuable insight to support readers in their quest to better understand TeX macros.

Overleaf’s customized version of TeX

The video gives a short demonstration of Overleaf’s modified version of TeX, adapted to generate Graphviz node diagrams (.gv files); no other aspect of TeX’s behaviour is affected by those changes. The .gv files contain representations of the TeX token lists used to store macro definitions and macro arguments. The Graphviz visualization is exported to SVG, which is then imported into Inkscape for further annotation prior to incorporating the graphic into an article.