Using R and Other Languages (like Fortran)

R has taken the world by storm. Not too long ago the world of statistical analysis was a calm, serene place where expensive commercial statistical packages kept the peace. Between them, SAS, SPSS, and Stata were the beginning and end of statistical analysis software in businesses and academic settings all over. And they charged extortionate prices for their capabilities.

R changed all that. Well, truthfully, as far as I know SAS, SPSS, and Stata are still out there charging an arm and a leg to those who learned the “old ways” and don’t want to change, but R is the new standard. It’s not perfect, but it’s rapidly getting there. It can do everything those other platforms can do and more, and it does it all as well or even better.

The really attractive thing about R is that it is not just a platform for statistical analysis; it is also a programming platform of sorts. It has its own scripting language, which can be used to write new functions and bundle them into packages. These packages can then be easily shared with and used by the entire community. And anyone writing functions and packages has at their disposal every other function and package created by the rest of the community. These aspects have helped make R immensely attractive to researchers, and it has developed what can only be called a ginormous community.

R has some smaller, less obtrusive “cons,” but only one major con: it is slow for computationally challenging tasks.

This becomes an issue for our purposes in agent-based modeling and simulation.

We need to create virtual environments of 500×500 cells (at a minimum) and let a large number of agents wander over them according to their internal rule sets. This is no small task once we consider what we’re really saying. A grid measuring 500×500 has a total of 250,000 cells in it. If we have, say, 500 agents running around the virtual landscape, then that’s 500 agents’ worth of rules and individual decisions, learning, living, eating, harvesting, fighting, and dying.
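To make that scale concrete, here is a minimal sketch in Python of agents wandering over a grid. It is only an illustration under my own assumptions: the random-walk rule, the wrap-around edges, and the names make_agents and step are all invented for this example and are not the rule sets the project will actually use.

```python
import random

GRID_SIZE = 500      # cells per side, from the text
NUM_AGENTS = 500     # agent count, from the text


def make_agents(num_agents, size, rng):
    """Place each agent on a random cell; returns a list of [row, col]."""
    return [[rng.randrange(size), rng.randrange(size)] for _ in range(num_agents)]


def step(agents, size, rng):
    """Move every agent one cell in a random direction (or let it stay put).

    The grid wraps around at the edges, an assumption made for simplicity.
    """
    for agent in agents:
        agent[0] = (agent[0] + rng.choice((-1, 0, 1))) % size
        agent[1] = (agent[1] + rng.choice((-1, 0, 1))) % size


rng = random.Random(42)          # fixed seed so runs are repeatable
agents = make_agents(NUM_AGENTS, GRID_SIZE, rng)
for _ in range(100):             # 100 time steps = 50,000 agent moves
    step(agents, GRID_SIZE, rng)

print(GRID_SIZE * GRID_SIZE)     # total number of cells: 250000
```

Even this toy version makes 50,000 agent moves in 100 time steps, and real agents do far more per step than pick a random direction.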

The kicker is that the simulation will only run as fast as the code allows it to. It doesn’t necessarily matter how fast your computer is if the language you’re using is simply slow in execution. Another source of slow execution is poor program design and coding choices. Nothing eats up system resources and/or slows down programs like bad programming.
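As a rough illustration of how much a design choice can matter, the sketch below counts the work done by two ways of answering the same question: how many agents are standing on a cell with food? The food layout, the agent positions, and both function names are made up for this example. The slow version scans the whole grid for every agent; the fast one indexes straight into it.

```python
def agents_on_food_slow(food, agents):
    """Scan every cell for every agent: rows * cols * agents comparisons."""
    size = len(food)
    count = 0
    checks = 0
    for row, col in agents:
        for r in range(size):
            for c in range(size):
                checks += 1
                if r == row and c == col and food[r][c]:
                    count += 1
    return count, checks


def agents_on_food_fast(food, agents):
    """Index straight into the grid: one lookup per agent."""
    count = 0
    checks = 0
    for row, col in agents:
        checks += 1
        if food[row][col]:
            count += 1
    return count, checks


# A small 20x20 grid keeps the slow version quick to run.
size = 20
food = [[(r + c) % 2 == 0 for c in range(size)] for r in range(size)]
agents = [(i % size, (3 * i) % size) for i in range(10)]

slow_count, slow_checks = agents_on_food_slow(food, agents)
fast_count, fast_checks = agents_on_food_fast(food, agents)
print(slow_checks, fast_checks)  # 4000 checks versus 10
```

Both versions give the same answer, but on a 500×500 grid with 500 agents the slow one would make 125,000,000 checks per time step where the fast one makes 500. No language is fast enough to rescue that kind of design.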

And here’s where Fortran comes in.

Fortran is not good at a lot of the things that R is good at. It is absolutely a beast to do graphics of any sort with (for example). However, the Fortran language is very concise and the compiled code is extremely FAST.

Naturally, we’d like to know if we can use R and Fortran together. There are two ways this might work: either R calls Fortran and then displays the results, or Fortran calls R for all the graphics stuff that it’s bad at.

ASIDE: There are some who argue that C and C++ are so advanced and capable these days that Fortran is no longer necessary. Most of the arguments I’ve read against Fortran essentially claim that performance differences are negligible. This might be true. On the other hand, I’ve read other opinions which say that Fortran is much more readable than either C or C++ and much easier to use for mathematically intensive programs. This is certainly true, at least so far in my experience. To my mind, the primary argument for adopting C/C++ is that their user base is far broader than Fortran’s (in every arena other than high-performance computing: supercomputers and the like). There are more libraries which support programming in C/C++ and bigger user communities to provide help. Fortran is actually a remarkable language in many ways, but getting it to interface with other platforms (like R) is a pain. It all just feels like a hack job. More on that later. The point is that I’m inclined to agree that the performance differences between C/C++ and Fortran are minimal, but Fortran is more readable and concise than the other two.

The good news is that both approaches are apparently possible. The bad news is that neither of them is ideal. Both of them involve various ways of working around the reality that getting R and Fortran to work together is a square peg, round hole kind of situation.

R CALLING FORTRAN

The approach I’m beginning with is the R CALLS FORTRAN solution. Running some code my mentor provided demonstrates that this approach works. R sets up the virtual environment, initializes agents, and passes everything to Fortran to run with. Then, as the Fortran code runs, it returns updates to R, which R then plots. Finally, when the Fortran code is complete, R runs some statistical analyses of the data it created.

This works, which is good. It even works relatively well. The only problem is that only Fortran subroutines can be called, not Fortran functions. This may limit a programmer’s ability to make the most of all that Fortran has to offer in terms of speed and power. Additionally, the package which provides this feature doesn’t support all the “newer” Fortran features such as those found in Fortran 2003 and 2008. It doesn’t even officially support the Fortran 95 standard. Thankfully, word has it that Fortran 90/95 features should work fine.

[note to self: add more here. specifically, how to get it to work with an illuminating example]

FORTRAN CALLING R

Apparently this functionality is possible by way of a package called RFortran. Here the problem is that RFortran requires the use of the Intel Fortran compiler (which I plan to purchase, but which might not be possible for others).

[add more here]

BOTTOM LINE: FORTRAN and R

After all my research, I’m a little unnerved by the poor state of support for interoperability between R and Fortran. It works, but it certainly isn’t for the faint of heart. It’s basically a pain to find out how to install what you need and then how to configure and use it. There are innumerable nuances and gotchas that make using R and Fortran together miserable for beginners.

And remember, I’m a fan of both R and Fortran. Individually I think they’re great—but getting them to work together is a royal pain in the butt.

Alternatives

If Fortran and R interoperability is poor, what are our other options?

Remember, the point of my undergraduate research project is to determine what other options exist for those agent-based modelers who’ve outgrown NetLogo’s capabilities. We’re also trying to determine if there are any options for more computationally intensive agent-based modeling other than Repast or MASON.

So we’re still in search of a language, platform, or combination of languages which offer a reasonable learning curve, lots of speed, and lots of flexibility.

Other common languages are: Python, C, C++, D, Java, C#, and F#. Do any of these others fit our needs?

We’ll have to explore our options further. One potential combination I’m looking into is R and C++ (via Rcpp). I hear that it’s relatively easy to use and offers a ton of flexibility and all the power that C++ has to offer. My first attempt at figuring out how to make it work was not successful, but I’ll keep digging. One of the nice benefits of Rcpp (and the related packages inline and RInside) is that its authors, Dirk Eddelbuettel and Romain François, keep it well maintained.

Other languages or combinations I’m going to explore at some point are Python + Fortran, Python + C++, Java, C#, and F#.

For Now…

I am somewhat bound to use R and Fortran to their fullest because that is what my undergraduate research project proposal states, and what my undergraduate research grant was awarded for. I’m looking forward to learning Fortran, although I’m a little sad that it’s so poorly supported when it comes to graphics libraries and interoperability with R. C++, being far more popular, is also far better supported by a broad community of people. There are many quality open source libraries and free IDEs and debuggers.

What I’m discovering is that open source projects often suffer from a dearth of reliable, quality documentation. The R project is a good example of this. Some packages and features are well documented while others offer little to no documentation. Calling help(some_function) sometimes reveals a wealth of information about the parameters of the function, how it works, and examples (from beginner to expert level). But all too often the documentation provided with the package is totally inadequate. In sheer desperation, I’ll find myself Googling like crazy, hopeful that someone has generously put up a tutorial on how to use some new awesome package I’ve discovered. I often find neither a decent package author’s website nor a tutorial written by some helpful web citizen. It’s immensely frustrating and I suspect it breeds mediocrity amongst users and the software itself.

It never fails to amaze me how some apparently incredible piece of programming is so poorly documented that none but experts can guess how to use it. Oftentimes the really good programmers see it as being beneath them to create quality documentation for their work, and it really pisses me off. The Rcpp package is a good example of one which apparently offers numerous examples, but no directions on how to locate or load them. Instead, it is assumed that someone looking into Rcpp knows where R stores packages. And that is the crux of the matter: there is simply far too much assumption going on in the computer world. It seems that almost all software, packages, and platforms assume that their users all have the same level of skill or knowledge. What an IMMENSELY DUMB assumption.

So, open source = poorly documented (more often than not this is the defining feature of open source).

Also, I’m finding that trying to get two languages to play well together is probably a violation of our stated goals (ease of use). For most of those who find NetLogo to be an acceptable level of difficulty, the two-language situation is simply not tolerable. It might be if there were some exceptionally well-documented and well-maintained “glue” projects to link two or more languages together, but there aren’t.

It is becoming more and more clear to me that single-language solutions are probably what we’re going to end up with. At this point it seems likely to be far more tolerable to work entirely in C++ than to put up with the nuances and irritations of R and Fortran.

Still, we forge onward, feeling a bit depressed to have our R + Fortran bubble burst so soon.

Posted on February 5, 2011 at 2:52 am in musings, programming
