Sunday, March 19, 2006
Global variables
In many circles it is considered best practice to avoid global variables when writing code. I've certainly held this view for a long time, but, as with everything else this year, it is time to question it! As a long-time C/C++ programmer, I thought I had a good understanding of the different approaches to structuring code, but the freedom afforded by a block-structured language that treats functions as first-class objects (e.g. Scheme, but not Pascal) is really amazing. The key idea is that variables can be lexically nested, and access to them can be controlled by functions passed back to the client. This is Parnas' data hiding to the max! Add a bit of garbage collection and much of the minutiae hindering code development simply evaporates.
The elegance of this approach, as opposed to OO, is that it is completely uniform: there is just a single mechanism being used. This is in complete contrast to the multiplicity of data hiding mechanisms available in C, C++ or Java. C is the simplest (and I've got no intention of discussing member access privileges, the differences between module, function and class scope, all the different uses of 'static' in C++, or namespaces), so let's discuss C's mechanisms for data hiding: void pointers (or security by obscurity), and static variable scope (in functions or modules). Not much can be said about security by obscurity; it's a good idea for as long as you can get away with it ;-)
There is almost never a good reason to declare static variables within a function. Indeed, it should never be done in a library function, as suddenly the entire library can no longer simply be used in a threaded or reentrant environment. Similarly, there are few good reasons to allow static module variables, as doing so makes a strong assumption about how that module is used.
Inevitably I do end up using static module variables, to avoid the programming overhead of declaring a structure to hold all of the module variables, then adding an additional argument to every function in the module to reference this structure, then adding indirections (->) to access the data, etc... only to find that after three revisions of the code two of the members are no longer referenced by any function and can be removed, but that to add further functionality another four members, each highly stateful and closely replicating (but not identical to) existing members, must be added to the structure, etc... There is just so much pain in trying to write good code!
A lot of this pain can be alleviated by allowing nested functions and taking advantage of lexical scoping. Pascal allows this (and I'm kind of embarrassed to say that after 15 years of hard-core C I'd forgotten it), but what Pascal lacks is the ability to return functions as the result of a computation. This imposes a big limitation on what can be expressed in the language: it is certainly not possible to process code and data interchangeably, and arguably it forces a multiplicity of data access mechanisms to be built into the language. C does allow (pointers to) functions to be returned from a function, but the function's environment must be defined and managed by hand (and that is where a lot of the pain of programming comes from).
What is really disturbing is that these problems have been well understood since the 1960s (see Joel Moses, "The Function of FUNCTION in LISP, or Why the FUNARG Problem Should be Called the Environment Problem", MIT AI Memo 199). Why do we persist, after 35 years, with languages that are so limited?
I'm going to keep on writing programs in C. But I'm going to stop trying to force abstraction and composability. I'm going to stop pretending that every program I write will be reused or extended arbitrarily. I'm going to use more global variables when it is reasonable to do so, particularly if it allows me to write less code. I'm going to place all the code in a single file; when it becomes unmanageable, the complexity of the program has most probably reached a natural bound, and further development should not occur until that complexity has been removed. And when I have the opportunity I'm going to throw away more code, and rewrite it in a better language where the minutiae look after themselves...