Tip 83: Use m4 in the middle of your documents

16 March 12.

level: macro hacker
purpose: eliminate any and all repetition

Part of a series of tips on POSIX and C. Start from the tip intro page, or get 21st Century C, the book based on this series.

The m4 macro language is mostly interesting because its macros are intended to be put anywhere in any text file. We'll bulletproof the example in a little bit, but let's say that we have the macro file:

m4_divert(-1)
m4_define(Emph, <em>$1</em>)
m4_divert(1)</body></html>
m4_divert(0)<html><head><meta charset="utf-8" /></head><body>

and the text file

Welcome to my Emph(lovely) web site.

then after you run m4 -P macros.m4 text > out.html you'd wind up with:

<html><head><meta charset="utf-8" /></head><body>
Welcome to my <em>lovely</em> web site.
</body></html>

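What just happened:

The m4_divert(-1) line sends everything after it to a diversion that is discarded, so the definitions themselves leave nothing in the output. Text sent to diversion 1 is held until the end of the input, which is how the closing tags wind up at the bottom. The m4_divert(0) line switches back to normal output for the header and for the text file that follows.

The expansions are aggressive: if your macro doesn't have parens after it, it'll still get expanded, so if you happen to have Emph in plain text, that'll get turned into HTML tags. If we're going to have m4 operate on an arbitrary text or code file, we need to make certain that it doesn't surprise us. E.g., use macro names that don't make sense as standalone strings.

Notice also that we're using m4 -P, which puts that m4_ tag at the head of every built-in macro name. Otherwise, if you use the word divert in your text, it gets eaten. You may also find stray line breaks due to expansions; use m4_dnl (delete to new line) to prevent those.

Here's an m4 file with some further protections and tricks built in: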

m4_divert(-1)
m4_changequote(`‹',`›') # m4 eats all quote-endquote markers, so make sure
                        # they will never appear in your text by using odd ones.
                        # Notice how these aren't the plain <> signs; 
                        # vim users, try :help digraph. 
                        # I also wrote a vim macro to write (‹ and ›) for me.
                        # To avoid sad surprises, wrap all macro inputs in these.

m4_changecom(‹m4 comment:›) #Octothorpes appear in plain text.

#A macro to define new macros.
m4_define(newXML, ‹m4_define($1, <$2>‹$›‹1›</$2>)m4_dnl›)

newXML(Emph,em)
newXML(Pp,p)

m4_divert(1)</body></html>
m4_divert(0)<html><head><meta charset="utf-8" /></head><body>

# Let's throw some sample uses here, so we can test the
# m4 file with itself. When we're happy, move the m4 divert(0)
# line to the end so these get sent to /dev/null.
Pp(‹Dear reader,›)

Pp(‹HTML was Emph(originally) designed to be handwritten,
but now generating well-formed documents is just a pain.›)

The language has a few more tricks, including optional arguments, if/then branching, and loops, but even if you keep it simple, you can fix a lot of annoyances with just a few lines of macro definitions.

