Tip 65: Easy threading with Pthreads

09 February 12. [link] PDF version

level: Computationally-intensive work
purpose: Use all the processors you've got

Part of a series of tips on POSIX and C. Start from the tip intro page, or get 21st Century C, the book based on this series.

If your computer is less than about five years old and is not a telephone, then it has several cores, which are independent pipelines for processing. Run top right now and hit the 1 key to break out the list of CPUs (though I'm not sure if that's POSIX-standard), and you can see how many you have and how busy they are.

Threading is billed as a complex and arcane topic, but when I first implemented threads in a program, I was delighted by how easy it is to get a serial loop to become a set of threads.

POSIX has had support for threads for forever. C11 adds a C-standard thread library, which I'd like to cover for you, but I'm not yet sure if there's an extant standard C library that implements the threading portion of the new standard.

So the syntax is really not a thing. The hard part is in dealing with the details of how threads interact.

The simplest case, sometimes called a process which is embarrassingly parallel, is when you are applying the same task to every element of an array, and each is independent of the other. Any shared data is used read-only. As per the nickname for this kind of thing, this is the easy case, and I'll show you the syntax for making it work here.

The first complication is when there is some resource that is writeable, and is shared across elements. For example, say that we'd like to write a message to stdout as events happen, and we have a badly implemented version of printf. If two threads write to stdout at once, then the messages may mangle each other. Instead of

Adding another agent.
Removing an agent.

we might get mangling like

AddingRem anooving ather agent.n age

I added that badly implemented caveat because POSIX dictates that printf and family must be thread-safe, meaning that this sort of mangling can't happen. I can't find a straight answer, but I'd be surprised if the Windows standard library's printf family isn't also thread-safe.

Here, we need a mutex, which locks a resource (here, stdout) while in use by one thread, and tells the other threads that want to use the resource to hold on until the prior thread is done and releases the lock. Using mutexes will come up next episode.

When you have multiple mutexes that may interact, or you have one thread reading data that was written by another thread, then you're up to rocket science. It's easy for threads to get into states where both are waiting for the other to release a mutex, or for our expectations about the rate at which data gets read or written, and debugging this sort of thing is now a question of the luck of replicating the surprising order of execution when the debugger is running. But I expect that the simple stuff I'll cover here will already be enough for you to safely speed up your code.

The pthreads checklist

You have a for loop over elements, such as an operation on every element of an array. As above, no iteration has any bearing on any other. If the iterations were run in random order, you wouldn't really care, as long as every array element gets hit once and only once.

We're going to turn that serial for loop into parallel threads. It's not trivial, but it's not all that difficult either. We are going to write a single function that will be applied to each element of the array, using pthread_create, and then use pthread_join to wait for each thread to return. At the end of that disperse/gather procedure, the program can continue as if nothing special had happened.

  1. For GCC and Clang, add -pthreads to the compiler line. If you're using a makefile, then set something like CFLAGS=-pthreads -g -Wall.

  2. #include <pthreads.h>

  3. Write a wrapper function, which will be called by every thread. It has to have a signature of the form void * your_function (void *). That is, it takes one void pointer in, and spits one void pointer out. If you started with a for loop, paste the body of the loop into this wrapper, and do the appropriate surgery so that you are acting on the function input instead of element i of the array.

  4. Disperse the threads: your for loop now applies pthread_create to each array element. See the example below.

  5. Gather the threads: Write a second for loop to call pthread_join to gather all of the threads and check their return values.

OK, here's the example. I am sorry to say that it is a word counter, which is such a typical example. However, it's at least a pretty zippy one, which runs about 3x faster than wc. [The definitions of a word also differ, so it'll be hard to seriously compare, though.] You should at this point be able to follow the above gather/disperse procedure. I'll point out several little details after the text.

You'll notice that I call in fstr.h, over from Tip #52, so compile with that.

#include "fstr.h"
#include <pthread.h>
#include <string.h> //strtok_r

typedef struct{
    int wc;
    char *docname;
} wc_struct;

void *wc(void *voidin){
    wc_struct *in = voidin;
    fstr_t *doc = fstr_new(in->docname);

    wc_struct *out = calloc(1, sizeof(wc_struct));

    char *scratch;
    char *delimiters = " `~!@#$%^&*()_-+={[]}|\\;:'\",<>./?\n";
    char *txt = strtok_r(doc->data, delimiters, &scratch);
    if (!txt) return out; //zero elements.
    while (txt) {
        txt = strtok_r(NULL, delimiters, &scratch);
    return out;

int main(int argc, char **argv){
    argv++; //step past the name of the program.

    pthread_t threads[argc];
    wc_struct s[argc];
    for (int i=0; i< argc; i++){
        s[i] = (wc_struct){.docname=argv[i]};
        pthread_create(&threads[i], NULL, wc, &s[i]);

    int values[argc];
    void *rvalue;
    for (int i=0; i< argc; i++){
        pthread_join(threads[i], &rvalue);
        values[i] = ((wc_struct*)rvalue)->wc;

    for (int i=0; i< argc; i++) printf("%s:\t%i\n",argv[i], values[i]);

To do:

Now that you have the form down (or at least, you have a template you can cut and paste), check your code for embarrassingly parallel loops, and thread `em.

Above, I gave each row of an array one thread; how would you split something into a more sensible number, like two or three threads? Hint: you've got a struct, so you can send in extra info, like start/end points for each thread.

[Previous entry: "Tip 64: Consistency-check external variables"]
[Next entry: "Tip 66: Protect threaded resources with mutexes"]