Raking with structural zeros

16 January 13.

It's common to need certain cells fixed at zero (structural zeros), which produces a not-quite-rectangular grid of values. The usual example around my workplace is that people under 15 can't be married. We want our grid of options to exclude the cells (14, married), (14, divorced), ... (0, married), (0, divorced), but otherwise be rectangular.

By the way, if a row in the table of margins falls in one of the excluded cells, it gets thrown out as bad data. I'm not sure whether there's a better way to deal with such rows; feel free to leave your suggestions in the comments.

Here's an example. It uses almost the same synthesis problem as the previous entry (on raking to complete missing data), but without the NaN trick this time, and with extra data at (3, 1) to keep things interesting.

Apophenia uses a lot of SQL on the back end, so it's natural to express the structural zeros via SQL. In this case, we have only one structural zero, at row=1 and col=1.

apop_text_to_db -O  -d="|" '-' margins sample.db <<"----------"
row | col | weight
  1 |  1 |   2.5
  1 |  2 |   2.5
  2 |  1 |   7.5
  3 |  1 |   7.5
  2 |  2 |   7.5
----------

cat <<"----------" > rake.c
#include <apop.h>

int main(){
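    apop_db_open("sample.db");   //the database apop_text_to_db just wrote
    apop_data_show(
        apop_rake(.margin_table="margins", .count_col="weight",
            .contrasts=(char*[]){"row", "col"}, .contrast_ct=2,  //match the row totals and the column totals
            .structural_zeros="row=1 and col=1"                  //SQL condition for the cells fixed at zero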
    ));
}
----------

export CFLAGS="-g -Wall -O3 `pkg-config --cflags apophenia`"
export LDLIBS="`pkg-config --libs apophenia`" CC=c99
make rake
./rake


