9 December 14.

A table of narratives and distributions


Remember your first probability class? At some point, you covered how, if you take sets of independent and identically distributed (iid) draws from a pool of items, it can be proven that the means of those sets converge to a Normal distribution as the sets grow (the Central Limit Theorem).

But if you're like the great majority of the college-educated population, you never took a probability class. You took a statistics class, where the first few weeks covered probability, and the next few weeks covered other, much more structured models. For example, the derivation of the linear regression formula is based on assumptions regarding an affine relation between variables and the minimization of an objective that is sensible, but just as sensible as many other possible objectives.

Another way to put it is that the Normal Distribution assumptions are bottom-up, with statements about each draw that lead to an overall shape, while the linear regression is top-down, assuming an overall shape and deriving item-level information like the individual error terms from the global shape.

There are lots of bottom-up models with sound microfoundations, each a story of the form “if each observation experienced this process, then it can be proven that you will observe a distribution like this'': Polya urns, Poisson processes, orthogonal combinations of the above. In fact, I'm making a list.

Maybe you read the many posts on this blog [this post, et seq] about writing and using functions to transform well-defined models into other well-defined models. A chain of such transformations can lead to an increasingly nuanced description of a certain situation. But you have to start the chain somewhere, and so I started compiling this list.

I've been kicking around the idea of teaching a probability-focused stats class (a colleague who runs a department floated the idea), and the list of narrative/distribution pairs linked above would be the core of the first week or two. You may have some ideas of where you'd take it from here; me, I'd probably have the students code up some examples to confirm that convergence to the named distribution occurs, which leads to discussion of fitting data to closed-form distributions and testing claims about the parameters; and then start building more complex models from these basic models, which would lead to more theoretical issues like decomposing joint distributions into conditional parts, and estimation issues like Markov Chains. Every model along the way would have a plausible micro-story underlying it.
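For a flavor of the kind of in-class convergence check I have in mind, here is a minimal sketch (Python with NumPy is my arbitrary choice here, as are the pool of items and the sample sizes): draw many sets of iid values from a small pool, take each set's mean, and compare the result to what the Central Limit Theorem predicts.

import numpy as np

# A toy convergence check: means of iid draws from an arbitrary pool of items
# approach a Normal distribution (the Central Limit Theorem).
rng = np.random.default_rng(42)
pool = np.array([0, 0, 1, 3, 10.5])        # any finite pool of items
n_draws, n_sets = 500, 10_000              # draws per set, number of sets

means = rng.choice(pool, size=(n_sets, n_draws)).mean(axis=1)

# The CLT predicts mean = pool mean, sd = pool sd / sqrt(n_draws).
mu, sigma = pool.mean(), pool.std() / np.sqrt(n_draws)
print(f"observed mean {means.mean():.4f}  vs  predicted {mu:.4f}")
print(f"observed sd   {means.std():.4f}  vs  predicted {sigma:.4f}")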

This post is mostly just to let you know that the list of narrative/distribution pairs mentioned above exists for your reference. But it's also a call for your contributions, if your favorite field of modeling includes distributions I haven't yet mentioned, or if you otherwise see the utility in expanding the text further.

I've tried to make it easy to add new narrative/distribution pairs. The project is hosted on GitHub, which makes collaboration pretty easy. You don't really have to know a lot about git (it's been on my to-do list to post the git chapter of 21st Century C on here, but I'm lazy). If you have a GitHub account and fork a copy of the repository underlying the narrative/distribution list, you can edit it in your web browser; just look for the pencil icon.

Technical

The formatting looks stellar on paper and in the web browser, if I may say so, including getting the math and the citations right. There isn't a common language that targets both the screen and the printed page for this sort of application, so I invented one, intended to be as simple as possible:
Items(
∙ Write section headers like Section(Title here)
∙ Write emphasized text like em(something important)
∙ Write citation tags like Citep(fay:herriot)
∙ and itemized lists like this.
)

See the tech guide associated with the project for the full overview.

Pandoc didn't work well for me, and Markdown gets really difficult when you have technical documents. When things like ++ and * are syntactically relevant, mentioning C++ will throw everything off, and the formatting of y = a * b * c will be all but a crapshoot.

On the back end, for those who are interested, these formatting functions are m4 macros that expand to LaTeX or HTML as needed. I wrote the first draft of the macros for this very blog, around June 2013, when I got tired of all the workarounds I used to get LaTeX2HTML to behave, and started entrusting my math rendering to MathJax. The makefiles prep the documents and send them to LaTeX and BibTeX for processing, which means that you'll need to clone the repository to a box with make and LaTeX installed to compile the PDF and HTML.

But the internals are no matter. This is a document that could, with further contributions from you, become a very useful reference for somebody working with probability models—and not just students, because, let's all admit it, working practitioners don't remember all of these models. It is implemented using a simple back-end that could be cloned off and used for generating collaborative technical documents of equal (or better!) quality in any subject.


5 November 14.

Why I capitalize distribution names


There are two ways to think about what a parameterized statistical distribution is.

$\def\Re{{\mathbb R}}$ As a single point: Here, the Normal distribution is a mapping of the form $f:(x, \mu, \sigma) \to \Re^+$. More specifically, it is $f(x, \mu, \sigma) = \frac{1}{\sigma \sqrt{2\pi} } \exp(-\frac{(x-\mu)^2}{2\sigma^2})$. Within the infinite space of functions, this is a single point. We often fix certain parameters, and get a function of fewer dimensions, like $f(x, \mu=0, \sigma=1) = \frac{1}{\sqrt{2\pi} } \exp(-\frac{x^2}{2})$.

As a family: Under this perspective, when we fix, say, $\mu=2$, $\sigma=1$, we get a Normal Distribution. When we fix $\mu=3$, $\sigma=1$, we get a different Normal Distribution. Here, there is a meta-function of the form $N:(\mu, \sigma) \to (f:x\to\Re^+)$, which defines a family of functions, and produces a series of Normal distribution functions depending on the values to which $\mu$ and $\sigma$ have been fixed.
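To make the contrast concrete, here is a minimal sketch in Python (my choice of language, not anything implied by the math): the single-point reading is one function over $(x, \mu, \sigma)$; the family reading is a meta-function that takes $(\mu, \sigma)$ and hands back a function of $x$ alone.

from math import exp, pi, sqrt

# Single-point reading: one function f(x, mu, sigma).
def normal(x, mu, sigma):
    return exp(-(x - mu)**2 / (2 * sigma**2)) / (sigma * sqrt(2 * pi))

# Family reading: N(mu, sigma) returns a function of x alone.
def N(mu, sigma):
    return lambda x: normal(x, mu, sigma)

print(normal(0, 0, 1))   # parameters fixed at call time
print(N(0, 1)(0))        # parameters fixed up front; same number either way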

Both of these approaches are coherent, and if you go with either, I respect you fully. Almost any Wikipedia page about a distribution will jump back and forth between these two interpretations, so by the end it's impossible to say whether a Normal Distribution has the form $f(x, \mu, \sigma)$ or $f(x)$. Any given Wikipage is edited by several people, so finding anacoluthons on Wikipedia is something of a fish-in-a-barrel exercise. But my web analytics software tells me that a large percentage of the readers of this blog are individual human beings; if that's you, I recommend picking one interpretation or the other and sticking with it.

I prefer the single-point characterization over the family. At the least, the meta-function is confusing, and implies the two-step estimation process of fixing the parameters, then grabbing a data point. This is a certain type of workflow that may or may not be what we want.

Of course, this gets into the Bayesian versus Frequentist debate. The stereotypical Frequentist believes that there is a true value of $(\mu, \sigma)$, and our job is to find it. This more closely aligns with a search for a single Normal distribution in the family of Normals. The stereotypical Bayesian doesn't know what to believe, and thinks that reality may even be an amalgam of many different values of $(\mu, \sigma)$. Either perspective works under either the single-point or family interpretation—as they say, mathematics is invariant under changes in notation—but the Frequentist approach more closely aligns with the two-step estimation process of the family interpretation, and the Bayesian approach is much easier to express under the single-point interpretation. My earlier post about Bayesian updating, with its frequent integrals of $f(x, \mu, \sigma)$ over parameters, certainly would have been more awkward via the family interpretation.

Grammatically, this has a clear implication. If the Normal Distribution is the name for that single expression up there, then its name should be capitalized as a proper noun, like London or Jacob Bernoulli, which are also unique entities. If a normal distribution is one of a family of functions, then it is a class of entities, like cities or people, and should be lower case.

By the way, I used to write “Normal Distribution,'' but none of the style books would be OK with that. The c in the city of London isn't capitalized; same with the d in the Normal distribution.

There's a bonus of consistency, because so many statistical models are capitalized anyway:

  • Gaussian
  • Poisson
  • OLS
  • F distribution
  • Normal distribution [because writing “the normal distribution'' in lower case can easily confuse the reader (and I prefer it over Gaussian because I'll always choose descriptive over appellative).]

At which point, the few distributions that would be lower-cased under the family interpretation start to stand out and look funny.


3 November 14.

The formalization of the conversation in a social network


Last time, I ran with the definition of an academic field as a social network built around an accepted set of methods. The intent there was to counter all the dichotomies that are iffy or even detrimental (typically of a form pioneered by Richard Pryor: "Our people do it like this, but their people do it like this".)

This time, I'm going to discuss peer-reviewed journals from this perspective, to clarify all the things journals aren't. The short version: if journals are the formalized discussion of a social network built around a certain set of methods, then we can expect that the choice of what gets published will be based partly on relatively objective quality evaluation and partly on social issues. It's important to acknowledge both.

Originally, journals were literally the formalized discussions within a social network. Peer review was (and still is) a group of peers in a social network deciding whether a piece of formalized discussion is going to be useful and appropriate to the group.

An idea that exists only in my head is worthless—somebody somewhere has to hear it, understand it, and think about using it. Because a journal is a hub for the social network built around a known set of tools, I have a reasonable idea of which journal to pick given the methods I used, and what tools readers will be familiar with; readers who prefer certain methods know where to look to learn new things about those methods. So journals curate and set social norms, both of which are important to the process of communicating research.

Factual validity

Something that is incorrect will be useless or worse; work that is sloppily done is unlikely to be useful. So an evaluation of utility to the social network requires evaluating basic validity.

Among non-academics, I get the sense that this is what the peer review process is perceived to be about: that a paper that is peer reviewed is valid; one that isn't is up for debate.

If you think about this for a few seconds, this is prima facie absurd. The reviewers are one or two volunteers who will only put a few hours into this. Peer reviewers do not visit the lab of the paper author and check that all the phosphate was cleaned out of the test tubes. They rarely double-code the statistics work to make sure that there are no bugs in the code. If there is a theorem with a four-page proof in the appendix, you've got low odds that any reviewer read it. I have on at least one occasion directly stated in a review that I did not have time to check the proof in the appendix, and this has never seemed to affect the editors' decisions either way.

The most you can expect from a few hours of peer review is a (nontrivial and important) verification that the author hasn't missed anything that a person having ordinary skill in the art would catch. Deeper validity comes from a much deeper inquiry that is more likely to happen outside the formalized discussion of a journal.

Prestige

If a journal is the formalized discussion of a social network built around a certain set of methods, we see why journal publications are the gold standard in tenure reviews and other such very important affairs. Academics don't get hired for their ability to discover Beautiful Truths, they get hired for their ability to convince grant making bodies to give grants, to convince grad students and potential new hires to attend this department, and so on. These things require doing good work that has social sway. Each journal publication is a statement that there is a well-defined group of peers who think of your work positively, and publications in more far-reaching journals indicate a more far-reaching network of peers.

Choice of inquiry

Sorry if that sounds cynical, but even in mathematics, whose infinite expanse exists outside of human society, the choice of which concepts are most salient and which discoveries are truly important is made by people based on what other people also find to be salient.

Maybe you're familiar with the Beauty Contest, which was a story Keynes made up to explain how money works: the newspaper publishes photos of a set of gals, and readers mail in their vote, not for the one who is most beautiful, but for the one who they expect will win the contest. Who you like doesn't matter—it's about who you think others will like. No wait, that isn't it either: what's important is who you think other people will think other people will like. Infinite regress ensues.

When you're chatting with a circle of friends, you don't pick topics that are objectively interesting—that's meaningless. You pick topics of conversation that you expect will be of interest to your friends. Now let's say that you know that after the meeting, your friends will go to RateMyFriends.com and vote on how interesting you would be to other potential friends. Then you will need to pick topics that your friends think will be of interest to other potential friends. You're well on your way to the Beauty Contest (depending on the rating strategy used by raters on RateMyFriends).

The Beauty Contest easily leads to bland least-common-denominator output. You're going to pick the most typically attractive looking gal out of the newspaper, and are going to avoid conversation topics that most would find quirky or odd.

What if day-glo '80s leggings are trendy this year? You might pick the gal in fluorescent lime green not because her attire is objectively attractive (a view which I really can't endorse), but because the setup of the Beauty Contest pushes you to select contestants who follow the current trends. It's not hard to find examples, especially in the social sciences, where a subject takes on its own life, as this quarter's edition publishes papers that respond to last quarter's papers, which are primarily a response to the quarter before.

Diversity

Even the part where we get a fresh pair of eyes to notice the things the author missed or easy-to-spot blunders is limited, because we're still asking peers. If you ask an anthropologist to read an Econ paper, the anthropologist will tear apart the fundamental assumptions; if you ask an economist to read an Anthro paper, she'll tear apart the fundamental assumptions.

But because journals are the formalized discussions of already-formed social networks, we can't expect a lot of cross-paradigm discussion in the journals or in-depth critiques of the social network's fundamental assumptions.

In the software development industry (which often refers to itself as `the tech industry'), you'll find more than enough long essays about the myth of meritocracy. To summarize: even in an industry that is clearly knowledge-heavy and where there are reasonably objective measures of ability, homophily is still a common and relevant factor. Given that fact of life, promoting the network as a meritocracy does a disservice, implying that whoever won out must have done so because they are the best here in this, the best of all possible worlds. If a person didn't get hired, or their code didn't get used, then it must be because the person or the code didn't have as much merit as the winner. The possibility that the person who wasn't picked does better work but wasn't as good a cultural fit as the person who got picked is downplayed.

Academics, in my subjective opinion, are much more likely to be on guard against creeping demographic uniformity. But an academic field is a social network built around an accepted set of tools, and this definition directly constrains the breadth of methodological diversity. Journals will necessarily reflect this.

The fiction of journals as absolute meritocracy still exists, especially among non-academics who have never submitted to a journal and read an actual peer review, and it has the same implications, that if a work doesn't sparkle to the right peers in the right social network, it must be wrong. And it's especially untrue in the present day, when more good work is being done than there is space in traditional paper journals to print it all.

Conclusion segment

I do think that there is much meritocracy behind a journal. A journal editor is the social hub of a network, so you could perhaps socialize your way into such a job, but you're going to kill the journal if you can't hold technical conversations with any author about any aspect of the field. As a journal reviewer, I have seen a good number of papers that can be established as fatally flawed even after a quick skim. But I would certainly like to see a world where the part about improving the quality of inquiry and the part about gaining approval by a predefined set of peers is more separated than it is now.

Social networks aren't going away, so journals supporting them won't go away. But there are many efforts being made to offer alternatives. It's a long list, but the standouts to me are the Arxiv and the SSRN (Social Science Research Network). These are sometimes described as preprint networks, implying that they are just a step along the way to actual peer-reviewed publication, but if the approval of a social network is not essential for your work, then maybe it's not necessary to take that step. Especially in the social sciences, where review times can sometimes be measured in years, these preprint networks are increasingly cited as the primary source. Even the Royal Society, who started this whole journal thing when it was a homophilic society in the 1600s, has an open journal that “...will allow the Society to publish all the high-quality work it receives without the usual restrictions on scope, length or [peer expectations of] impact.''

PS: Did you know I contribute to another blog on social science and public policy? In this entry and its follow-up I discuss other aspects of the journal system. I wrote it during last year's government shutdown, when I had a lot of free time.


30 October 14.

The difference between Statistics and Data Science


An academic field is a social network built around an accepted set of methods.

Economics has grown into the study of human decision making in all sorts of aspects. At this point, nobody finds it weird that some of the most heavily-cited papers in the Economics journals are about the decision to commit crimes or even suicide. These papers use methods accepted by economists, writing down utility functions and using certain mathematical techniques to extract information from these utility functions. Anthropologists also study suicide and crime, but using entirely different methods. So do sociologists, using another set of tools. To which journal you submit your paper on crime depends on matching the methods you use to the methods readers will be familiar with, not on the subject.

A notational digression: I hate the term `data science'. First, there's a general rule (that has exceptions) that anybody who describes what they're doing as “science'' is not a scientist—self-labelling like that is just trying too hard. And are we implying that other scientists don't use data? Is it the data or the science that statisticians are lacking? Names are just labels, but I'll hide this one under an acronym for the rest of this. I try to do the same with the United States DHS.

I stress that the distinction is about the set of salient tools because I think it's important to reject other means of cleaving apart the Statistics and DS networks. Some just don't work well, and some are as odious as any other “our people do it like this, but their people do it like this'' kind of generalization. These are claims about how statisticians are too interested in theory and too willing to assume a spherical cow, or that DSers are too obsessed with hype and aren't careful with hypothesis testing. Hadley explains that “...there is little work [in Statistics] on developing good questions, thinking about the shape of data, communicating results or building data products'', which is a broad statement about the ecosystem that a lot of statisticians would dispute, and a bit odd given that he is best known for building tools to help statisticians build data products. It's not hard to find people who say that DS is more applied than Stats, which is an environment issue that is hard to quantify and prone to observation bias. From the comment thread of this level-headed post: “I think the key differentiator between a Data Scientist and a Statistician is in terms of accountability and commitment.''

Whatever.

We can instead focus on characterizing the two sets of tools. What is common knowledge among readers of a Stats journal and what is common knowledge among readers of a DS journal?

It's a subjective call, but I think it's uncontroversial to say that the abstract methods chosen by the DSers rely more heavily on modern computing technique than commonly-accepted stats methods, which tend to top out in computational sophistication around Markov Chain Monte Carlo.

One author went to the extreme of basically defining DS as the practical problems of data shunting and building Hadoop clusters. I dispute that any DSer would really accept such a definition, and even the same author effectively retracted his comment a week later after somebody gave him an actual DS textbook.

If you want to talk about tools in the sense of using R versus using Apache Hive, the conversation won't be very interesting to me but will at least be a consistent comparison on the same level. If we want to talk about generalized linear models versus support vector machines, that's also consistent and closer to what the journals really care about.

The basic asymmetry, that the price of admission for using DS techniques is greater computational sophistication, will indeed have an effect on the people involved. If we threw a random bunch of people at these fields, those who are more comfortable with computing would sort themselves into DS and those less comfortable into Stats. We wind up with two overlapping bell curves of computing ability, such that it is not challenging to find a statistician-DSer pair where the statistician is a better programmer, but in expectation a randomly drawn DSer writes better code than a randomly drawn statistician. So there's one direct corollary of the two accepted sets of methods.

Three Presidents of the ASA wrote on the Stats vs DS thing, and eventually faced the same technical asymmetry:

Ideally, statistics and statisticians should be the leaders of the Big Data and data science movement. Realistically, we must take a different view. While our discipline is certainly central to any data analysis context, the scope of Big Data and data science goes far beyond our traditional activities.

This technical asymmetry is a real problem for the working statistician, and statisticians are increasingly fretting about losing funding—and for good reason. Methods we learned in Econ 101 tell us that an unconstrained set leads to an unambiguously (weakly) better outcome than a constrained set.

If you're a statistician who is feeling threatened, the policy implications are obvious: learn Python. Heck, learn C—it's not that hard, especially if you're using my C textbook, whose second edition was just released (or Modeling with Data, which this blog is ostensibly based on). If you have the grey matter to understand how the F statistic relates to SSE and SSR, a reasonable level of computing technique is well within your reach. It won't directly score you publications (DSers can be as snobby about how writing code is a “mere clerical function'' as the statisticians and US Federal Circuit can be), but you'll have available a less constrained set of abstract tools.

If you are in the DS social network, an unconstrained set of tools is still an unambiguous improvement over a constrained set, so it's worth studying what the other social network takes as given. Some techniques from the 1900s are best left in the history books, but now and then you find ones that are exactly what you need—you won't know until you look.

By focusing on a field as a social network built around commonly accepted tools, we see that Stats and DS have more in common than differences, and can (please) throw out all of the bigotry that comes with searching for differences among the people or whatever environment is prevalent this week. What the social networks will look like and what the labels will be a decade from now is not something that we can write a policy for (though, srsly, we can do better than “data science''). But as individuals we can strive to be maximally inclusive by becoming conversant in the techniques that the other social networks are excited by.

Next time, I'll have more commentary derived from the above definition of academic fields, then it'll be back to the usual pedantry about modeling technique.


23 October 14.

Bayes v Kolmogorov


$\def\Re{{\mathbb R}} \def\datas{{\mathbb D}} \def\params{{\mathbb P}} \def\models{{\mathbb M}} \def\mod#1{M_{#1}}$

We have a likelihood function that takes two inputs, which we will name the data and the parameter, and which gives the nonnegative likelihood of that combination, $L: d, p \to \Re^+$. [I wrote a lot of apropos things about this function in an early blog post, by the way.]

The two inputs are symmetric in the sense that we could slice the function either way. Fixing $p=\rho$ gives a function of the data alone, $L_\rho: d\to \Re^+$; fixing $d=\delta$ gives a function of the parameter alone, $L_\delta: p \to \Re^+$.

But the inputs are not symmetric in a key way, which I will call the unitary axiom (it doesn't seem to have a standard name). It's one of Kolmogorov's axioms for constructing probability measures. The axiom states that, given a fixed parameter, some value of $d$ will be observed with probability one. That is, \begin{equation} \int_{\forall \delta} L_\rho(\delta) d\delta = 1, \forall \rho. \end{equation} In plain language, when we live in a world where there is one fixed underlying parameter, one data point or another will be observed with probability one.
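Here is a quick numerical sanity check of the axiom, using the Normal density as a stand-in for $L$ (the choice of density, and the use of SciPy, are mine; the $\pm 30\sigma$ limits capture essentially all of the mass): for any fixed $\rho$, integrating over the data gives one.

from scipy.integrate import quad
from scipy.stats import norm

# For each fixed parameter rho = (mu, 1), integrate the density over the data.
for mu in (3, 4, -10):
    total, _ = quad(lambda d: norm.pdf(d, loc=mu, scale=1), mu - 30, mu + 30)
    print(f"mu = {mu:+d}: integral over the data = {total:.6f}")   # ~1 each time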

This is a strong statement, because we read the total density as an indication of the likelihood of the parameter taking on the given value. I tell you that $p=3$, and we check the likelihood and see that the total density on that state of the world is one. Then you tell me that, no, $p=4$, and we refer to $L(d, 4)$, and see that it integrates to one as well.

Somebody else comes along and points out that this may work for discrete-valued $p$, but a one-dimensional slice isn't the right way to read a continuous density, insisting that we consider only ranges of parameters, such as $p\in[2.75,3.25]$ or $p \in [3.75,4.25]$. But if the integral over a single slice is always one, then the double integral is easy: $\int_{\rho\in[2.75,3.25]}\int_{\forall \delta} L(\delta, \rho) d\delta d\rho$ $=\int_{\rho\in[2.75,3.25]} 1 d\rho$ $=.5$, and the same holds for $p \in [3.75,4.25]$. We're in the same bind, unable to use the likelihood function to put more density on one set of parameters compared to any other of the same size.

This rule is asymmetric, by the way, because if we had all the parameters in the universe, whatever that might mean, and a fixed data set $\delta$, then $\int_{\forall \rho} L_\delta(\rho) d\rho$ could be anything.

Of course, we don't have all the data in the universe. Instead, we gather a finite quantity of data, and find the more likely parameter given that subset of the data. For example, we might observe the data set $\Delta=\{2, 3, 4\}$ and use that to say something about a parameter $\mu$. I don't want to get into specific functional forms, but for the sake of discussion, say that $L(\Delta, 2)=.1$; $L(\Delta, 3)=.15$; $L(\Delta, 4)=.1$. We conclude that three is the most likely value of $\mu$.

What if we lived in an alternate universe where the unitary axiom didn't hold? Given a likelihood function $L(d, p)$ that conforms to the unitary axiom, let $$L'(d, p)\equiv L(d, p)\cdot f(p),$$ where $f(p)$ is nonnegative and finite but otherwise anything. Then the total density on $\rho$ given all the data in the universe is $\int_{\forall \delta} L_{\rho}(\delta)f(\rho) d\delta = f(\rho)$.

For the sake of discussion, let $f(2)=.1$, $f(3)=.2$, $f(4)=.4$. Now, when we observe $\Delta=\{2, 3, 4\}$, $L'(\Delta, 2)=.01$, $L'(\Delta, 3)=.03$, $L'(\Delta, 4)=.04$, and we conclude that four is the most likely value of $\mu$.
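The same toy numbers as a few lines of Python, just to restate the arithmetic (the dictionaries below are simply the values given above):

# L(Delta, p) and the weighting f(p) from the text, for p = 2, 3, 4.
L = {2: 0.10, 3: 0.15, 4: 0.10}
f = {2: 0.10, 3: 0.20, 4: 0.40}

Lprime = {p: round(L[p] * f[p], 3) for p in L}   # L'(Delta, p) = L(Delta, p) * f(p)
print(max(L, key=L.get))              # 3: most likely under L alone
print(max(Lprime, key=Lprime.get))    # 4: most likely after weighting by f
print(Lprime)                         # {2: 0.01, 3: 0.03, 4: 0.04}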

Bayesian updating is typically characterized as a composition of two functions, customarily named the prior and the likelihood. In the notation here, these are $f(p)$ and $L(d, p)$. Without updating, all values of $p$ are equally likely in the world described by $L$, until data is gathered. The prior breaks the unitary axiom, and specifies that, even without gathering data, some values of $p$ are more likely than others. When we do gather data, our prior belief that some values of $p$ are more likely than others advises our beliefs.

Our belief about the relative preferability of one value of $p$ over another could be summarized into a proper distribution, but once again, there is no unitary axiom requiring that a distribution over the full parameter space integrate to one. For example, the bridge from the Bayesian-updated story to the just-a-likelihood story is the function $f(\rho)=1, \forall \rho$. This is an improper distribution, but it does express that each value of $p$ has the same relative weight.

In orthodox practice, everything we write down about the data follows the unitary axiom. For a given observation, $L'(\delta, p)$ is a function of one variable, sidestepping any issues about integrating over the space of $d$. We may require that this univariate function integrate to one, or just stop after stating that $L'(\delta, p) \propto f(p)L(\delta, p)$, because we usually only care about ratios of the form $L'(\delta, \rho_1)/L'(\delta, \rho_2)$, in which case rescaling is a waste of time.

In a world where all parameters are observable and fixed, the unitary axiom makes so much sense it's hard to imagine not having it. But in a meta-world where the parameter has different values in different worlds, the unitary axiom implies that all worlds have an equal slice of the likelihood's density. We usually don't believe this implication, and Bayesian updating is our way of side-stepping it.


14 July 14.

Microsimulation games, table top games

link PDF version

I wrote a game. It's called Bamboo Harvest, and you can see the rules at this link. You can play it with a standard deck of cards and some counters, though it's much closer to the sort of strategic games I discuss below than poker or bridge. I've played it with others and watched others play it enough to say it's playable and pretty engaging. Ms NGA of Baltimore, MD gets really emotional when she plays, which I take as a very good sign.

Why am I writing about a game on a web page about statistical analysis and microsimulation? I will leave to others the topic of Probability theory in table top games, but there is also a lot that we who write economic models and microsimulations of populations can learn from game authors. After all, the designers of both board games and agent-based models (ABMs) have the same problem: design a set of rules such that the players in the system experience an interesting outcome.

Over the last few decades, the emergent trend among board games has been so-called Eurogames, which are aimed at an adult audience, seek greater interaction among players, and typically include an extensive set of rules regarding resource trading and development. That is, the trend has been toward exactly the sort of considerations that are typical to agent-based models.

A game that has resource exchange rules that are too complex, or that is simple enough to be easily `solved', will not have much success in the market. In most games, the optimal move in any given situation could theoretically be solved for by a hyperrational player. But the fact that players find them to be challenging demonstrates that the designers have found the right level of rule complexity for a rational but not hyperrational adult. We seek a similar complexity sweet spot in a good ABM. Readers can't get lost in all the moving parts, but if the model is so simple that readers know what your model will do before it is run—if there's no surprise—then it isn't worth running.

Of course, we are unconcerned as to whether our in silico agents are having any fun or not. Also, we get to kill our agents at will.

Simulation designers sometimes have a sky's-the-limit attitude, because processor time is cheap, but game designers are forced by human constraints to abide by the KISSWEA principle (keep it simple, stupid, without extraneous additions). It's interesting to see what game designers come up with to resolve issues of simultaneity, information provision and hiding, and other details of implementation, when the players have only counters and pencil and paper.

Market and supply chain

Settlers of Catan is as popular as this genre of games gets—I saw it at a low-end department store the other day on the same shelf as Monopoly and Jenga. It is a trading game. Each round a few random resources—not random players—are productive, which causes gluts and droughts for certain resources, affecting market prices. The mechanics of the market for goods are very simple. Each player has a turn, and they can offer trades to other players (or all players) on their turn. This already creates interesting market dynamics, without the need for a full open-outcry marketplace or bid-ask book, which would be much more difficult to implement at the table or in code. How an agent decides to trade can also be coded into an artificial player, as demonstrated by the fact that there are versions of Settlers you can play against the computer.

Some games, like Puerto Rico, Race for the Galaxy, Bootleggers, and Settlers again, are supply chain games. To produce a victory point in Puerto Rico, you have to get fields, then get little brown immigrants to work the fields (I am not making this up), then get a factory to process the crops, then sell the final product or ship it to the Old World. There may be multiple supply chains (corn, coffee, tobacco). The game play is basically about deciding which supply chains to focus on and where in the supply chain to put more resources this round. The game design is about selecting a series of relative prices so that the cost (in time and previous supply-chain items) makes nothing stand out as a clear win.

One could program simple artificial agents to play simple strategies, and if one strategy is a runaway winner (produce only corn!) then that is proof that a relative price needs to be adjusted and the simulation redone. That is, the search over the space of relative prices maximizes an objective function regarding interestingness and balance. ABMers will be able to immediately relate, because I think we've all spent time trying to get a simple model to not run away with too many agents playing the same strategy.
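Here is a toy sketch of that balancing loop in Python (everything in it, from the strategy names to the payoff rules, is invented for illustration): simulate many noisy games between two pure strategies and search over a relative price for the value that keeps either one from being a runaway winner.

import random

def play_game(corn_price, rng):
    # Corn is cheap and steady; coffee is expensive but pays off big.
    corn_score = 10 * corn_price + rng.gauss(0, 2)
    coffee_score = 6 + rng.gauss(0, 4)
    return "corn" if corn_score > coffee_score else "coffee"

def corn_win_rate(corn_price, n_games=5000, seed=7):
    rng = random.Random(seed)
    wins = sum(play_game(corn_price, rng) == "corn" for _ in range(n_games))
    return wins / n_games

# Grid search over the relative price of corn: pick the price whose win rate
# is closest to a coin flip, i.e. where neither pure strategy runs away.
prices = [p / 100 for p in range(40, 81)]
best = min(prices, key=lambda p: abs(corn_win_rate(p) - 0.5))
print(f"most balanced corn price: {best}, corn win rate {corn_win_rate(best):.2f}")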

I'm not talking much about war games, which seem to be out of fashion. The central mechanism of a war game is an attack, wherein one player declares that a certain set of resources will try to eliminate or displace a defending resource, and the defender then declares what resources will be brought to defense. By this definition, Illuminati is very much a war game; Diplomacy barely is. Design here is also heavily about relative prices, because so much of the game is about which resources will be effective when allocated to which battles.

Timing

How does simultaneous action happen when true simultaneity is impossible? The game designers have an easy answer to simultaneously picking cards: both sides pick a card at a leisurely pace, put the card on the table, and when all the cards are on the table, everybody reveals. There are much more complicated means of resolving simultaneous action in an agent-based model, but are they necessary? Diplomacy has a similar simultaneous-move arrangement: everybody picks a move, and an arbitration step uses all information to resolve conflicting moves.

Puerto Rico, San Juan, and Race for the Galaxy have a clever thing where players select the step in the production chain to execute this round, so the interactive element is largely in picking production chain steps that benefit you but not opponents. Setting aside the part where agents select steps, the pseudocode would look like this:

for each rôle:
    for each player:
        player executes rôle

Typical program designs make it really easy to apply a rôle function to an array of players. Josh Tokle implements a hawk and dove game via Clojure. His code has a game-step where all the birds play a single hawk-and-dove game from Game Theory 101, followed by all executing the death-and-birth-step, followed by all taking a move-step.

It's interesting when Puerto Rico and Race for the Galaxy have this form, because it's not how games usually run. The usual procedure is that each player takes a full turn executing all phases:

for each player:
    for each rôle:
        player executes rôle

I'd be interested to see cases where the difference in loop order matters or doesn't.
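As a toy illustration of when it matters (all of the rules here are invented): give two rôles that touch a shared pool, and run both loop orders. When a rôle's effect depends on state that other players change in the same phase, the orders diverge; when each player's turn touches only their own state, they coincide.

# Two roles acting on a shared pool: "harvest" removes one unit from the pool,
# "score" awards a player points equal to whatever pool remains.
def run(order, n_players=3, pool=10):
    scores = [0] * n_players
    if order == "role-major":                 # for each role: for each player
        for role in ("harvest", "score"):
            for i in range(n_players):
                if role == "harvest":
                    pool -= 1
                else:
                    scores[i] += pool
    else:                                     # for each player: for each role
        for i in range(n_players):
            for role in ("harvest", "score"):
                if role == "harvest":
                    pool -= 1
                else:
                    scores[i] += pool
    return scores

print(run("role-major"))     # [7, 7, 7]: everyone scores after all harvesting
print(run("player-major"))   # [9, 8, 7]: earlier players score a fuller pool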

Topology

One short definition of topology is that it is the study of what is adjacent to what.

The Eurogamers seem to refer to the games with very simple topologies as abstracts—think Go or Chess. Even on a grid, the center is more valuable in Chess (a center square is adjacent to more squares than an edge square) and the corners are more valuable in Go (being adjacent to fewer squares $\Rightarrow$ easier to secure).

Other games with a board assign differential value to areas via other means. War games typically have maps drawn with bottlenecks, so that some land is more valuable than others. Small World has a host of races, and each region is a valuable target for some subset of races.

I'm a fan of tile games, where the map may grow over time (check out Carcassonne), or what is adjacent to what changes over the course of the game (Infinite City or Illuminati).

Other games have a network topology; see Ticket to Ride, where the objective is to draw long edges on a fixed graph.

War games often extol complexity for the sake of complexity in every aspect of the game, so I'm going to set those aside. But the current crop of Eurogames tends to focus on one aspect (topology or resource management or attack dynamics) and leave the other aspects to a barebones minimum of complicatedness. Settlers has an interesting topology and bidding rules, and the rest of the game is basically just mechanics. Carcassonne has the most complex (and endogenous) topology of anything I'm discussing here, so the resource management is limited to counting how many identical counters you have left. Race for the Galaxy, Puerto Rico, and Dominion have crazy long lists of goods and relative prices, so there is no topology and very limited player interaction rules—they are almost parallel solitaire. A lot of card games have a complete topology, where every element can affect every other.

An example: Monopoly

Back up for a second to pure race games, like Pachisi (I believe Sorry! is a rebrand of a Pachisi variant). Some have an interactive element, like blocking other opponents. Others, aimed at pre-literate children, like Chutes and Ladders or Candyland, are simply a random walk. Ideally, they are played without parental involvement, because adults find watching a pure random walk to be supremely dull. Adults who want to ride a random walk they have no control over can invest in the stock market.

Monopoly is a parallel supply chain game: you select assets to buy, which are bundled into sets, and choose which sets you want to build up with houses and hotels. On top of this is a Chutes and Ladders sort of topology, where you go around a board in a generally circular way at random speed, but Chance cards and a Go to Jail square may cause you to jump position.

The original patent has an explanation for some of these details—recall that Monopoly was originally a simulation of capital accumulation in the early 20th century:

Mother earth: Each time a player goes around the board he is supposed to have performed so much labor upon mother earth, for which after passing the beginning-point he receives his wages, one hundred dollars[...].

Poorhouse: If at any time a player has no money with which to meet expenses and has no property upon which he can borrow, he must go to the poorhouse and remain there until he makes such throws as will enable him to finish the round.

You have first refusal on unowned properties that your token lands on (then they go up for auction, according to the official rules that a lot of people ignore), and you owe rent when your token lands on owned properties, and Mother earth periodically pays you \$200. All of these cash-related events are tied to the board movement, which is not the easiest or most coherent way to cause these events to occur. E.g., how would the game be different if you had a 40-sided die and randomly landed on squares all around the board? Would the game be more focused if every player had a turn consisting of [income, bid on available land, pay rent to sleep somewhere] phases?

The confounding of supply chain game with randomization via arbitrary movement is what makes it successful, because the Chutes and Ladders part can appeal to children (the box says it's for ages 8 and up), while the asset-building aspects are a reasonable subgame for adults (although it is unbalanced: a competent early leader can pull unsurpassably ahead). But it is the death of Monopoly as a game for adults, because there are too many arbitrary moving parts about going around an arbitrary track.

I can't picture a modern game designer putting together this sort of combination of elements. I sometimes wonder if the same sort of question could be asked of many spatial ABMs (including ones I've written): is the grid a key feature of the game, or just a mechanism to induce random interactions with a nice visualization?

Conclusion

Microsimulation designers and for-fun game designers face very similar problems, and if you're writing microsimulations, it is often reasonable to ask: how would a board game designer solve this problem? I discussed several choices for turn order, trading, topology, and other facets, and in each case different choices can have a real effect on outcomes. In these games that are engaging enough to sell well, the game designers could only select a nontrivial choice for one or two facets, which become the core of the game; the other facets are left to the simplest possible mechanism, to save cognitive effort by players.

Also, now that you've read all that, I can tell you that Bamboo Harvest focuses on a shifting-tiles topology, with a relatively simple supply chain. We decided against marketplace/trading rules.