Diving into the deep end of ABMS

I don’t recall when I first discovered the field of Computational Social Science, but I do recall being stunned by the fact I hadn’t considered the application of computer science, and programming, to the modeling of social phenomena. I was stunned because ever since I discovered Economics my freshman year of college, I’ve had two primary interests in life: Economics and Computer Science. Having discovered Computational Social Science, and the sub-field of Computational Economics, I realized that the application of computer horsepower to solving economic problems made remarkable sense. 

While my interest in the field was immediately piqued, I had no idea where to start. Well, that, and I still had to keep up with classes and work and balance both with being a good husband to my wife.

Numerous web searches yielded little direction–what to do, how to do it, where to learn it, methodology, best practices, how to ensure rigor, how to reduce bias, and so on. . . Even now, finding those who are doing agent-based modeling (ABM or ABMS) and tapping into what they know about what to do and what not to do is remarkably difficult.

However, with each new day there are new blogs, new books, new tools. . .this field is *literally* exploding.

Can I share with you one of my first, and most recent, observations? Very few people really know what they’re doing in ABM yet. By which I mean that new ground is being broken all over the place. Models are being crafted in NetLogo, Repast, MASON, Swarm, and hacked together in R, C#, C++, C, Fortran, and virtually every other letter in the alphabet. At the same time people are trying to figure out *how* to do ABMS. What should the research/modeling process look like? How are results reported? Verified? How do you convey the assumptions made and the reasoning behind those assumptions in a way that doesn’t sound like you’re smoking something on the job?

So methodology is a major issue for this newborn field at this point.

From my perspective, the other major issue is that the vast majority of the tools available are poorly suited for doing ABMS well. Every existing ABMS platform or toolkit has serious shortcomings: C, Java, or C++ platforms present a learning curve *much* too steep for most social scientists to climb, while NetLogo, R, and Python–being much easier to learn and utilize–aren’t fast enough to carry out anything beyond the simplest of simulations.

What are we to do? The only answer I’ve got is, “I don’t actually know.”

See, I’m in the same boat as everyone else. I want to do ABMS, but I have zero interest in learning C++ or C or even Java (especially Java, <shudder>). . .though under duress I might do so. I am interested in learning R and Python but I’m afraid that their ease-of-use comes at the cost of speed. Speed is incredibly important, aaalmost more so than ease-of-implementation. Speed is what gives us, as social simulators, the ability to lay the groundwork of a virtual world which yields emergent phenomena (both expected and unexpected) as we watch on our LCD screens. Speed gives us more agents, with more rules, interacting with each other in a more elaborate space. In other words, speed gives us MORE.

But there’s a tradeoff. Speed typically comes at the expense of ease-of-implementation. If speed were the only goal, we’d write all of our simulation programs in assembly (which very few people enjoy, and no one has the time for). Ease-of-implementation is especially important given that many social simulators are not, by trade, computer scientists. We are not, as a group, well-versed in algorithm design and “best practices” for software development.

It’s probably overkill to suggest that all of us need to get some kind of bachelor’s or master’s in computer science in order to ensure we bring a degree of rigor and expertise to the writing of ABMs. But short of that, how do we ensure that a sufficient degree of transparency exists to encourage rigorous standards of professionalism in creating ABMs, and be as efficient as possible at the same time?

The truth is that the average social scientist’s opportunity cost of time is simply too high to learn the more difficult languages like C++, C, and Java. This is the case now, but will become more obvious in the future. The reason NetLogo is so popular–as slow and inflexible as it is–is that it is the easiest of all possible ways to do ABMS. And by that I mean that its programming language is extremely simple, and getting meaningful visualizations of results out of NetLogo is easy. In short, NetLogo presents the lowest cost alternative for PhDs looking to get into modeling.

The problem is that NetLogo is a toy, pure and simple. It simply does *not* have the power to run a model of any degree of complexity or scale. I don’t say this to denigrate what NetLogo offers. It’s a great tool for getting one’s feet wet, but that’s about it.

And after one has gotten one’s feet wet with NetLogo, there aren’t a lot of appealing options for creating more complex, larger models without expending significantly more effort (primarily on the education side of things, although it is true that implementation in languages other than NetLogo is much more challenging).

Thus, the community has got to realize that Repast and other Java-based offerings are not a sufficient “next step” after NetLogo. Why not? Because Java (and C and C++ for that matter) is still too difficult to learn relative to its speed, scalability, and ease-of-implementation. Even when you know what you’re doing in Java, programming in it is a relatively tedious process (as is programming in many languages).

So we’re left with R, Python, Haskell, F#, and similar languages. What these languages have in common is that they’re “higher level” than Java, C, and C++. They’re not as fast, in general, as C and C++, but they’re *much* easier to learn and use.
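As a rough illustration of the ease-of-use side of that tradeoff, here is a minimal sketch in Python of a toy grid model loosely inspired by Sugarscape. It is not a faithful Sugarscape implementation: the function name `run_toy_model`, the grid size, the sugar cap of 4, the starting wealth of 5, and the movement and regrowth rules are all assumptions made up for this example.

```python
import random


def run_toy_model(n_agents=50, size=20, steps=100, seed=0):
    """Toy grid model: agents wander, harvest sugar, and pay a metabolic cost.

    Every rule and parameter here is an illustrative assumption.
    """
    rng = random.Random(seed)
    # Each cell starts with 0-4 units of sugar.
    sugar = [[rng.randint(0, 4) for _ in range(size)] for _ in range(size)]
    # Each agent has a grid position and a sugar stockpile ("wealth").
    agents = [
        {"x": rng.randrange(size), "y": rng.randrange(size), "wealth": 5}
        for _ in range(n_agents)
    ]
    for _ in range(steps):
        rng.shuffle(agents)  # random activation order each step
        for a in agents:
            # Move one cell in a random direction; the grid wraps around.
            dx, dy = rng.choice([(0, 1), (0, -1), (1, 0), (-1, 0)])
            a["x"] = (a["x"] + dx) % size
            a["y"] = (a["y"] + dy) % size
            # Harvest all sugar on the cell, then pay 1 unit of metabolism.
            a["wealth"] += sugar[a["y"]][a["x"]] - 1
            sugar[a["y"]][a["x"]] = 0
        # Agents whose stockpile has run out are removed.
        agents = [a for a in agents if a["wealth"] > 0]
        # Sugar grows back by 1 unit per step, up to a cap of 4.
        for row in sugar:
            for i in range(size):
                row[i] = min(4, row[i] + 1)
    return agents


survivors = run_toy_model()
print(len(survivors), "agents survived")
```

The whole model fits in a few dozen lines, which is the appeal. The cost shows up in the nested loops: the work per step grows with the number of agents and grid cells, and in plain Python that is exactly where the speed concern starts to bite as models get larger.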

It’s in these languages that I see the future of modeling. That, or someone will develop a really cool high-performance framework in C++ which will be drag-and-drop and incredibly easy to use.

On another note, I’m getting set up to do an undergraduate research project this spring utilizing R and F# in reconstructing the Sugarscape model. I hope to create two separate implementations of the model–one in R and one in F#–and compare the difficulty of implementation, and the performance of the simulation between them. More on that later. . .

