Real Economists do not use insert-user-friendly-package-here.
--Variations on Real Programmers Don't Use Pascal
- Kids Don't Follow♭The Replacements 1982
(Nearly) everyone is a computer user, or a user of computer-like devices. But few students of economics are computer programmers. Casual observation over two decades suggests that as the use of computers (and computer-like gadgets) has risen, knowledge of how computers work has fallen. Over time computers have become so easy to use that understanding how they work has become less vital to success in many fields or majors, economics included. Yet original computer programming is probably more important to current economic research than ever before. Let's call this the programming gap.
- Why does economics have a programming gap?
First, as economics is taught to undergraduates, programming would appear to be a peripheral skill. We require calculus and linear algebra, and for decades most undergraduate programs have supplemented this with a mathematics for economics course. On the other hand, I know of no departments that require a computer programming course and only a few that offer Computational Economics, let alone require the course to graduate. Undergraduate econometrics involves some computer use, but it never involves programming. A bright econ major with a B.A. would be justified in thinking that economics is done mainly with pen-and-paper and Stata or Excel. Second, introductory computer science courses focus on issues that are not relevant to an economist and neglect those that are, especially how to solve mathematical models numerically. Computer science as a discipline has evolved away from scientific computing, traditionally called numerical analysis. The things that a student of economics needs to know about computing have little to do with what a computer science professor works on or teaches to her students. As an analogy, if some other major relied heavily on input-output analysis, most economics departments would have trouble finding anyone qualified and interested in teaching it. This means that even economics degree holders with some programming background have never applied that skill to economics and still think it is unnecessary to contribute to the field.
- Fixing a Hole♭The Beatles 1967
The objective of this book is to reduce the programming deficit in economics education. It introduces programming to the econ student, perhaps an advanced undergraduate or a graduate student just realizing that most modern research does not end with manipulating symbols on paper (although it still starts there). It assumes the reader uses the Internet, produces documents, perhaps works with a spreadsheet, and runs packages such as Stata™. The book also assumes that the reader has a good understanding of economic theory and calculus. But it also assumes that the student has only a vague idea of how any of the computer applications they use work, and even less insight into mathematics performed by a computer.
- Everyday I Write the Book♭Elvis Costello 1985
As an undergraduate I enjoyed computing science and to a lesser extent economics. When I went to graduate school in economics I did not think the formal computer background would play much of a role. But over time I pursued a comparative advantage in work that required original programming. Although FORTRAN was the first language I was taught, the computer science antipathy towards it rubbed off on me and I have typically avoided it. When I started to program as an economist, C was not a particularly good option for computational science, so I returned to Pascal, often considered a 'teaching language'. Compilers for these languages to run on personal computers were not always affordable or reliable in the 1980s. Gauss emerged as a convenient solution, which I used for work on my dissertation.
- Why Ox? Why not insert-your-favorite-language?
To present programming and computational economics this book uses Ox, a computer language created by Jurgen Doornik. Ox is currently at version 7, and has been developed since the early 1990s. Ox has these key features for our purpose: it is free to students and academic researchers; it is portable; and it is suitable for both learning computational economics and doing it. Despite this, it is unfortunately not one of the major languages used in economics.
Why computing continues to have almost no role in formal economic curricula is a mystery to me. One reason is that it is closely tied to ever-changing technology, so what seemed important to teach 20 years ago is not, and the same may be true of what seems important now. But in many ways the fundamental aspects of computing in economics are not changing any more quickly than other tools. And certainly the barriers to hands-on training have completely disappeared. (Readers of a certain age will remember getting a Dickensian portion of CPU minutes on the campus mainframe to run regressions.)
Another reason is the tradition of teaching the way you were taught. Most academic economists who do original programming in their research learned the methods on their own. It is then natural for them to leave computing out of their formal teaching, even at the graduate level in fields that require intensive programming. Students start with code given to them by advisors or older students and then tweak it (or, these days, download some Matlab code for a published paper and try to figure out what it does). This perpetuates a cottage-industry approach to computational economics. Unlike disciplines that work on large-scale, multiple-author computing problems (like cosmology), economists work in groups of two or three. Code is handed down and modified or extended through trial-and-error.
Older cohorts believed that real programmers use FORTRAN (or more recently C). One perfectly valid reason for using any language is that the programmer has built up language-specific capital. But a good reason to stay with one language, chosen in the past at a very different stage of computer development, is not a valid reason to adopt that language fresh rather than something else. Advisors transmit this view to their students. Since formal teaching of FORTRAN is non-existent, self-study is the solution.
Now consider what the novice student programmer confronts. Code available in economics, especially when written by self-taught programmers, is usually badly documented (let alone coherently indented). No data structure more complicated than multi-dimensional arrays will be used. So the code and the coding style to be learned are far removed from the mathematics they implement. Any attempt to teach programming to economists with this starting point quickly bogs down in nested loops, endless assignment statements and black-box math libraries that may or may not be available to students located elsewhere.
So these patterns combine to produce the strange fact that the typical economics student in 2013 understands no more, and perhaps even less, about numerical mathematics than their counterparts in the past, while the practice of economics relies on programming and numerical methods more every year. The result is a widening gap in programming skills among economics students at the point they start to do research.
That assumed reader is a fair description of the median student I encounter in my classes. That person is somewhat reluctant to admit they do not know the difference between compiled and interpreted languages, or how zeros and ones can represent real numbers. However, when I ask a class if they learned to invert matrices using cofactor matrices this assumed student nods. They ploughed through those complex formulas in their math econ class and are ready to do it again. It seems I am always the first person to ever tell them the truth: no one computes the inverse of a matrix this way and the knowledge is useless. They are taught that way because forty years ago it was the only recourse a student might have to invert a matrix, and the math econ textbooks have put cofactor matrices and Cramer's rule in the canon. It seems to me that it would be better if the student knew that computers solve linear systems with matrix decomposition, even if they can't do it on paper for the 3x3 case.
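To make the contrast with cofactor formulas concrete, here is a minimal sketch of how a linear system Ax = b is typically solved numerically: Gaussian elimination with partial pivoting, which is the computation underlying an LU decomposition. It is written in plain Python rather than Ox purely for illustration, and the function name solve_linear_system is a hypothetical choice made for this example, not part of any library.

```python
def solve_linear_system(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting.

    a is a list of n rows (each a list of n numbers); b is a list of n numbers.
    No inverse and no determinant is ever formed.
    """
    n = len(a)
    # Work on an augmented copy [a | b] so the caller's data are untouched.
    aug = [[float(v) for v in row] + [float(rhs)] for row, rhs in zip(a, b)]

    # Forward elimination: reduce the system to upper-triangular form.
    for col in range(n):
        # Partial pivoting: use the row with the largest entry in this column.
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("matrix is singular to working precision")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for row in range(col + 1, n):
            factor = aug[row][col] / aug[col][col]
            for k in range(col, n + 1):
                aug[row][k] -= factor * aug[col][k]

    # Back substitution: solve the triangular system from the last row up.
    x = [0.0] * n
    for row in range(n - 1, -1, -1):
        tail = sum(aug[row][k] * x[k] for k in range(row + 1, n))
        x[row] = (aug[row][n] - tail) / aug[row][row]
    return x


# A 3x3 example whose exact solution is (2, 3, -1);
# the computed solution agrees with it up to rounding error.
solution = solve_linear_system(
    [[2, 1, -1], [-3, -1, 2], [-2, 1, 2]],
    [8, -11, -3],
)
```

The work grows with the cube of the matrix dimension, whereas expanding cofactors recursively grows factorially, which is one reason the textbook formula is not used in practice.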
In economics the major text on computational methods is Numerical Methods in Economics by Kenneth Judd. The book is comprehensive in its coverage of algorithms for solving economic models. However, it does not discuss computer programming at all. The algorithms are step-by-step mathematical expressions in pseudo code, and an experienced programmer can easily implement them in their favorite language. But the inexperienced programmer will not know where to start. If they take their advisor's advice and start teaching themselves FORTRAN they will soon discover the gap between elegant vector notation and the tedium of three levels of DO loops. Further, Judd's book is comprehensive and supported by research on numerical analysis, but its emphasis is not on practical issues. For example, in one sentence Judd mentions that optimization algorithms can be made to respect bounds on parameters to keep them feasible (such as not trying to compute log(-2)) by non-linear transformations. This book devotes a whole chapter to the implementation of this idea because it is essential for model building and estimation.
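The idea of keeping a parameter feasible through a non-linear transformation can be sketched in a few lines. The example below is an illustration in plain Python, not the implementation developed later in this book: the toy objective, the exponential transformation and the crude gradient-ascent routine are all assumptions chosen to keep the sketch short. The optimizer searches over an unrestricted value x, while the objective only ever sees sigma = exp(x), which is positive for every x, so log is never asked to evaluate a negative number.

```python
import math


def objective(sigma):
    """Toy objective defined only for sigma > 0; its maximum is at sigma = 1."""
    return math.log(sigma) - sigma


def to_feasible(x):
    """Map any real number x to a strictly positive parameter value."""
    return math.exp(x)


def maximize_unconstrained(func, x0, step=0.5, iterations=200, h=1e-6):
    """Plain gradient ascent using a central-difference derivative.

    The routine knows nothing about bounds: x may wander anywhere on the
    real line. Feasibility is handled entirely by the transformation.
    """
    x = x0
    for _ in range(iterations):
        slope = (func(x + h) - func(x - h)) / (2.0 * h)
        x += step * slope
    return x


# Optimize over the free value x, then map back to the structural parameter.
x_star = maximize_unconstrained(lambda x: objective(to_feasible(x)), x0=2.0)
sigma_star = to_feasible(x_star)
```

Had the same routine been applied to sigma directly, nothing would stop a large step from proposing a negative value, at which point math.log raises an error. Other bounds call for other transformations; a parameter restricted to the interval (0, 1), for instance, is commonly handled with the logistic function.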
This book is part prequel to Judd and part companion to it. It discusses the process of creating, testing and describing a program. It also introduces topics, such as object-oriented programming, that help a programmer write a good program regardless of the algorithm it implements. And it covers high performance computing issues so that a student of economics can move their code from the laptop to the cluster or cloud. To be complete, there is a great deal of overlap with Judd when it concerns basic algorithms in digital mathematics.
But in the early 1990s I learned a lesson about programming languages and economics. When trying to solve a very large (for the time) problem I overwhelmed what a PC could do. But I was able to secure some precious hours of CPU time on a 'supercomputer'. Of course, as a DOS program, Gauss was not an option. So I realized I had backed myself into a corner. From then on I knew that I would avoid languages that were not available on multiple platforms. I bit the bullet and translated my code into Pascal, which served me well over 15 years, multiple machines, and architectures despite its minority-language status. Learning to use MPI (Message Passing Interface) for parallel execution in the 1990s was key. The MPI library was available in FORTRAN and C. But this was no problem, because a little bit of C programming allowed me to access it from within Pascal and my model building continued apace. And I was even able to write Pascal code that a few other people used in C and FORTRAN. The lesson I learned: computer language popularity was less important than portability and inter-operability with other languages.
Around 1998 a grad student thought I should look at Ox. I did, and thought it was interesting, but I was wary of the corner another matrix-based language had put me in. Further, there was no hope that it could support the large-scale parallel execution I needed. But two years later I was planning to teach a short course on numerical methods at a different university. I had access to a computer lab and wanted to have assignments and demonstrations. But there was no hope of getting a licensed program such as Gauss installed. I remembered Ox, and the fact that it was free and easy to install made it a perfect solution. I used it and found it ideal for that purpose. I still viewed it as primarily for teaching and small-scale work. Only later did I start trying to use it for research. And once again the problem I was working on became much larger than a PC could handle. So with some effort I once again accessed MPI routines written in C from within Ox and was able to take advantage of high performance computing resources without translating my code.
The last piece of the whole story is object-oriented programming (OOP), an approach with which I was only vaguely familiar since my formal training took place before OOP had become a standard approach. With a desire to create a package for solving dynamic programs available to others, I saw a big tradeoff. My usual approach, which was the same as that of most economists, was to write a program specific to the problem at hand. I had 'libraries of routines' to use, but there was no way to define the problem as the program executed. So if another person was to use my code they would have to fiddle with the knobs and switches of the code and re-compile the result. Soon it became clear that this was onerous and very hard to make flexible and general. The only people who might use it would have to be guided by me. Eventually, I came to see a way around this problem using OOP in Ox. That large-scale (and on-going) project forced me to think carefully about distribution, documentation and efficiency.
Economists trained before the 1980s who wrote scientific programs but otherwise had no computer science training almost invariably used FORTRAN, which was synonymous with scientific programming. Students (naturally) often adopt the languages used by their professors. So FORTRAN had momentum even when other languages and platforms, such as C, became as good at scientific programming. Adoption of FORTRAN in economics has slowed markedly in recent years. Younger researchers would likely have used C and would be able to find mathematical packages in C.
When PCs came on the scene in the 1980s the basic languages like FORTRAN and C were not readily available for them. One of the first PC-based languages was Gauss, which was quite popular in economics through the 1990s. However, Gauss was a commercial program that did not run on "mainframe" computers. By the 2000s, Matlab was starting to replace it, as was the open-source statistical platform R. Stata has introduced a matrix language in order to support more general programming than its original data-set orientation. Lately, use of Python in scientific computing has been growing.
Each year one or two students ask me, "Why do you use Ox rather than X," where the value of X slowly evolves. No single choice could possibly suit every potential reader of this book. Most people who start to program do not survey all the available options, weigh the pros and cons and then pick the optimal choice. One reason is that they have no way to weigh the tradeoffs between features and capacities. So nearly everyone relies on trusted advice.