Tip 83: Use m4 in the middle of your documents
16 March 12. [link] PDF version
level: macro hacker
purpose: eliminate any and all repetition
Part of a series of tips on POSIX and C. Start from the tip intro page, or get 21st Century C, the book based on this series.
The m4 macro language is mostly interesting because its macros are intended to be put anywhere in any text file. We'll bulletproof the example in a little bit, but let's say that we have the macro file:
m4_divert(-1) m4_define(Emph, <em>$1</em>) m4_divert(1)</body></html> m4_divert(0)<html><head><meta charset="utf-8" /></head><body>
and the text file
Welcome to my Emph(lovely) web site.
then after you run m4 macros.m4 text > out.html you'd wind up with:
<html><head><meta charset="utf-8" /></head><body> Welcome to my <em>lovely</em> web site. </body></html>
What just happened:
- We just automated a whole lot of cruft at once. There is just no way to produce HTML, XML, or even certain blocks of LATEXwithout help from something that automates redundancy, and m4 will do that for you.
- The
m4_define
is the simplest macro definition: you give the macro definer a name and an expansion, where the expansion can have the usual positional parameters like $1, $2, .... That's all it takes for m4 to know what to do when it gets into your text and sees Emph(lovely). - Diversions:
m4_divert(-1)
sends output to /dev/null. So after that line m4 is reading in macro definitions but isn't writing anything.m4_divert(1)
stores output into buffer 1.m4_divert(0)
writes to standard out, which should be the normal course of affairs. At the end of the text, the buffers get written to output in order, which is how we got the end tags at the end of the file. Or, write a footer.m4 to put on the command line after your main text. Usem4_undivert(1)
to empty out buffer 1 sooner.
The expansions are aggressive: if your macro doesn't have parens after it, it'll still get
expanded, so if you happen to have Emph in plain text, that'll get turned into HTML
tags. If we're going to have m4 operate on an arbitrary text or code file, we need to make
certain that it doesn't surprise us. E.g., use macro names that don't make sense as standalone
strings. Notice also that we're using m4 -P, which puts that m4_
tag at the head of
every function name. Otherwise, if you use the word divert in your text, it gets
eaten. You may also find stray line breaks due to expansions; use m4_dnl
to prevent
those (delete to new line). Here's an m4 file with some further protections and tricks built in:
m4_divert(-1) m4_changequote(`‹',`›') # m4 eats all quote-endquote markers, so make sure # they will never appear in your text by using odd ones. # Notice how these aren't the plain <> signs; # vim users, try :help digraph. # I also wrote a vim macro to write (‹ and ›) for me. # To avoid sad surprises, wrap all all macro inputs in these. m4_changecom(‹m4 comment:›) #Octothorpes appear in plain text. #A macro to define new macros. m4_define(newXML, ‹m4_define($1, <$2>‹$›‹1›</$2>)m4_dnl›) newXML(Emph,em) newXML(Pp,p) m4_divert(1)</body></html> m4_divert(0)<html><head><meta charset="utf-8" /></head></body> # Let's throw some sample uses here, so we can test the # m4 file with itself. When we're happy, move the m4 divert(0) # line to the end so these get sent to /dev/null. Pp(‹Dear reader,›) Pp(‹HTML was Emph(originally) designed to be handwritten, but now generating well-formed documents is just a pain.›)
The language does a few more tricks: optional arguments, if/thens, loops, but if you keep
it simple, you can fix a lot of annoyances with just a few lines of macro definitions.
[Previous entry: "Tip 82: Insert NA, NaN, and other markers into your data set"]
[Next entry: "Tip 84: Use m4 to automate OOP boilerplate"]