m4 without the misery

23 January 15.

I presented at the DC Hack and Tell last week. It was fun to attend, and fun to present. I set the bar low by presenting my malcontent management system, which is really just a set of shell and m4 scripts.

I have such enthusiasm for m4, a macro language from the 1970s that is part of the POSIX standard, because there aren't really m4 files, the way there are C or Python or LaTeX files. Instead, you have a C or Python or LaTeX file that happens to have some m4 strewn about. Got something that is repetitive that a macro or two could clean up? Throw an m4 macro right there in the file and make immediate use. And so, m4 is the hammer I use to even out nails everywhere. C macros can't generate other C macros (see example below), and LaTeX macros are often fragile, so even where native macros are available it sometimes makes sense to use m4 macros instead. Even the markup for this column is via m4.





What discussion there was after the hack and tell was about how I can even use m4, given its terrible reputation as a byzantine mess. So an inquiry must be made about what I'm doing differently that makes this tolerable. I was in a similar situation with the C programming language, and my answer to how I use C differently from sources that insist that it's a byzantine mess turned into a lengthy opus.

M4 is simpler, so my answer is only a page or two.

I assume you're already familiar with m4. If not, there are a host of tutorials out there for you. I have two earlier entries, starting here. Frankly, 90% of what you need is m4_define, so if you get that much, you're probably set to start trying things with it.

As above, m4 passes not-m4-special text without complaint, but it is very aggressive in substituting anything that m4 recognizes. This leads to the advice that for every pair of parens, you should have a pair of quote-endquote markers to protect the text, which leads to m4-using files with a million quote-endquote markers.

I've found that this advice is overcautious by far.

In macro definitions, the `laziness' of the expansion is critical (do I evaluate $# when the macro is defined, when it is first called, or by a submacro defined by this macro?), and the quote-endquote markers are the mechanism to control that timing. This is a delicate issue that every macro language capable of macro-defining macros runs into. My only advice is to read the page of the manual on how macro expansion occurs very carefully. The first sentence is a bit misleading, though, because the scan of the text is itself treated as a macro expansion, so one layer of quote-endquote markers is stripped, dnl is handled, et cetera. But because I am focused on writing my other-language text with support from m4, not building a towering m4 edifice, my concern with careful laziness control is not as great.

So my approach, instead of putting hundreds of quotes and endquotes all over my document, is to know what the m4 specials are, and make sure they never appear in my text unless I made an explicit choice to put them there.

The specials

Outside of macro definitions themselves (where dollar signs matter), there are five sets of m4-special tokens. There's a way to handle each of them.

So, let's reduce the lessons from this list:

Is that too much to remember? Are you a bash or zsh user? Here's a function to paste onto the command line or into your .bashrc or .zshrc:

m5 () { cat $* | sed 's/,/<|,|>/g' | sed 's/\~\~/,/g' | \
               m4 -P <(echo "m4_changecom()m4_changequote(<|, |>)") -
      }

Now you can run things like m5 myfile.m4 > myfile.py.

At this point, unless you are writing m4 macros to generate m4 macros, you can write your Python or HTML or what-have-you without regard to m4 syntax, because as long as you aren't writing m4_something, <|, |>, or ~~ in your text, m4 via this pipeline just passes your text through either to your defined macros or standard output without incident.

Are there ways to break it? Absolutely. Can you use these steps to more easily build macros upon macros upon macros? Yes, but that's probably a bad idea in any macro system. Can you use this to replace repetitive and verbose syntax with something much simpler, more legible, and maintainable? Yes, when implemented with appropriate common sense.

An example

Here is a sample use. C macro names have to be plain text: we can't use macro tricks when naming macros. But we can use m4 to write C macros without such restrictions. This example is not especially good form (srsly) but gives you the idea. Cut and paste this entire example onto your command line to create pants_src.c, pantsprogram, pants, and octopants.


#The above shell function again:
m5 () { cat $* | sed 's/,/<|,|>/g' | sed 's/\~\~/,/g' | \
             m4 -P <(echo "m4_changecom()m4_changequote(<|, |>)") -
      }

# Write m4-imbued C code to a file
cat << '-- --' > pants_src.c

#include <stdio.h>

m4_define(Def_print_macro~~
  FILE *f_$1 = NULL;
  #define print_to_$1(expr, fmt)                   \
    {if (!f_$1) f_$1 = fopen("$1", "a+");          \
    fprintf(f_$1, #expr "== " #fmt "\n", (expr));  \
    }
)

int main(){
    Def_print_macro(pants)
    Def_print_macro(octopants)

    print_to_pants(1+1, %i);
    print_to_octopants(4+4, %i);

    char *ko="khaki octopus";
    print_to_octopants(ko, %s);
}

-- --

# compile. Use clang if you prefer.
# Or just call "m5 pants_src.c" to view the post-processed pure C file.
m5 pants_src.c | gcc -xc - -o pantsprogram

#Run and inspect the two output files.
./pantsprogram
cat pants
echo
cat octopants
