Modeling with Data

Tip 34: Use the shell's for loops to operate on a set of files

08 December 11.

level: you use the shell enough to be frustrated by it
purpose: do the same thing to a bunch of files

Part of a series of tips on POSIX and C. Start from the tip intro page, or get 21st Century C, the book based on this series.

Continuing on with the discussion of getting more from the shell (begun in Tip #32), let's get to some proper programming, with if statements and for loops.

But here's your first tip about programming using your shell's language: don't do it. The shell is Turing complete, and has variables and functions that look like those in any other language, but it's especially easy to write unmaintainable code in the shell. Scope is awkward--pretty much everything is global. It's a macro language, so all those things that they warned you about when you write two lines of C preprocessor code are relevant for every line of your shell script. There are little tricks that will easily catch you, like how you can't have spaces around the = in onething=another, but you must have spaces around the [ and ] in if [ -e ff ] (because they aren't punctuation--[ is a command and ] is its closing argument, and they just happen to not have any human-language characters in them). Write shell scripts to automate what you would type at the command line, and if you need to go further, take the time to switch to Perl, Python, &c.
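Here is a minimal sketch of those two gotchas. The names onething and ff are the ones from the paragraph above; the status variable is made up for illustration.

```shell
# Assignment: no spaces around the =.
onething=another
# With spaces, `onething = another` would instead try to run a command
# named onething with the arguments "=" and "another".

# Test: spaces are required around [ and ], because [ is a command
# and everything after it, including ], is an argument to that command.
if [ -e ff ]; then
    status="ff exists"
else
    status="ff does not exist"
fi
echo "$onething: $status"
```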

for loops

My vote for greatest bang for the buck from having a programming language that you can type directly onto the command line goes to running the same command on several files. Here, let's back up every .c file the old-fashioned way, by copying it to a new file with a name ending in .bkup:

for file in *.c;
do
 cp "$file" "${file}.bkup";
done

You see where the semicolon is: at the end of the list of files the loop will use, on the same line as the for statement. I'm pointing this out because I find it to be hopelessly counterintuitive, especially when we cram this onto one line:

for file in *.c; do cp "$file" "${file}.bkup"; done

It somehow bothers me that the do is right there with the command, but there you have it.
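One hedge worth keeping in mind with loops like this: if a file name has a space in it, an unquoted $file gets split into two arguments. Here is a small sketch that runs in a scratch directory so nothing real is touched. The file names are made up, and mktemp -d is assumed to be available (it is common, but not POSIX).

```shell
# Work in a scratch directory (mktemp -d is widespread but not POSIX).
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Two made-up source files, one with a space in its name.
touch main.c "my notes.c"

# Quoting "$file" keeps a name with a space in it as a single argument.
for file in *.c; do
    cp "$file" "${file}.bkup"
done

ls
```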

For your scientific computing needs, the for loop is useful for dealing with a sequence of N runs. By way of a simple example, let's search our C code for digits, and write each line that has a given number to a file:

for i in 0 1 2 3 4 5 6 7 8 9; do grep $i *.c > lines_with_${i}; done
wc -l lines_with*  #a v. rough histogram of your digit usage.

Testing against Benford's law is left as an exercise for the reader.

The curly braces in ${i} are there to distinguish what is the variable name and what is subsequent text; you don't need it here, but you would to make a file name like ${i}lines.
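A quick sketch of the difference, using a made-up second variable named ilines to show what the shell looks up when the braces are missing:

```shell
i=3
ilines="(a different variable)"   # hypothetical, only here for contrast

with_braces="${i}lines"    # expands i, then appends the text "lines"
without_braces="$ilines"   # expands the variable named ilines instead

echo "$with_braces"
echo "$without_braces"
```

If ilines were not set at all, the second form would quietly expand to an empty string, which is the usual way this bites.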

You may have the seq command installed on your machine--it's common on GNU and BSD systems but is not part of the POSIX standard. Then we can use backticks to generate a sequence:

for i in `seq 0 9`; do grep $i *.c > lines_with_${i}; done
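If seq isn't available, one portable fallback (a sketch, not the only way to do it) is a while loop with POSIX arithmetic expansion. The digits variable is just there to collect the values for display.

```shell
# Count from 0 to 9 using only POSIX shell features.
i=0
digits=""
while [ "$i" -le 9 ]; do
    digits="$digits$i"   # stand-in for the real loop body
    i=$((i + 1))         # POSIX arithmetic expansion
done
echo "$digits"
```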

Running your simulation a thousand times is now trivial:

for i in `seq 1 1000`; do ./run_sim > ${i}.out; done

#or append all output to a single file:
for i in `seq 1 1000`; do echo run $i >> sim_out; ./run_sim >> sim_out; done
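To try the append-to-one-file pattern without a real simulation on hand, here is a small sketch with a shell function standing in for ./run_sim. The function body, the three-run count, and the scratch directory are all assumptions for illustration.

```shell
# Hypothetical stand-in for ./run_sim: prints one line of fake output.
run_sim() { echo "result from process $$"; }

# Scratch directory for the output (mktemp -d is common but not POSIX).
outdir=$(mktemp -d)

for i in 1 2 3; do
    echo "run $i" >> "$outdir/sim_out"   # header line marking each run
    run_sim >> "$outdir/sim_out"         # >> appends; > would overwrite
done

# Count the header lines to confirm that every run was recorded.
runs=$(grep -c '^run ' "$outdir/sim_out")
echo "$runs runs recorded"
```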

