Monday, March 20, 2006

 

Monads in C

I'm trying really hard to get my head around monads, and I thought it might be a good idea to try and implement the array monad in C to solve a simple problem: initialise an array of 5 elements (say 5 down to 1), and calculate the sum. The C code for doing this directly is trivial:
void init( int a[ 5 ] )
{
  int i;
  for( i = 0 ; i < 5 ; ++i )
    a[ i ] = 5 - i;
}

int sum( int a[ 5 ] )
{
  int i, s;
  for( i = s = 0 ; i < 5 ; ++i )
    s += a[ i ];
  return s;
}
After two hours and over 200 lines of code I've now got a monad implementation that doesn't yet sum the array entries, and leaks memory like a sieve. You can find the C code here - any feedback is most welcome!

Updated: 21/March
I now have a working monad implementation that does sum the array entries! It is only 439 lines of code, and still leaks memory like a sieve. Actually, there is a full implementation of not one, but two monads in this code! Both the array transformer and array reader monads are implemented, along with the coercion operator for mapping a reader into a transformer. I apologise for using #include <stdio.h> instead of writing a monad for I/O, but I was keen to get this code out and didn't have time to write another 2000 lines of C code for the pure implementation. Otherwise the implementation is purely functional, with the entire computation being performed by:

printf( "The sum is: %d\n", xblock( 0, xbind( build( ), summer( ) ) ) );
The function build() creates an array transformer monad that initialises the array, and the function summer() creates the object that performs the actual summation. These are combined using the binding operator xbind(), and finally the computation is performed using xblock(). Note that the actual array is not created until control is passed to xblock(), so one interpretation of the array transformer monad is that it represents the intention to perform an action at a later stage, when xblock() is invoked.
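For readers who want to see the shape of the idea without opening the linked files, here is a minimal sketch of a deferred array action in C. It is not the code linked above: every name in it (action, kont, unit, assign, fetch, bind, block, init_from, sum_from, program) is a hypothetical chosen for illustration, and bind() and block() only play roles similar to xbind() and xblock(). It assumes a fixed array of 5 ints and, to stay short, never frees the nodes it allocates.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

#define N 5

typedef struct action action;
typedef action *(*kont)(int result, void *env);

/* A deferred array action together with the data it closes over. */
struct action {
    int (*run)(action *self, int *a);
    int i, v;    /* payload for unit / assign / fetch */
    action *m;   /* bind: the action to run first */
    kont k;      /* bind: builds the next action from m's result */
    void *env;   /* bind: environment handed to k */
};

/* Nodes are never freed: a deliberate shortcut to keep the sketch short. */
static action *node(int (*run)(action *, int *)) {
    action *n = calloc(1, sizeof *n);
    assert(n != NULL);
    n->run = run;
    return n;
}

static int run_unit(action *s, int *a)   { (void)a; return s->v; }
static int run_assign(action *s, int *a) { a[s->i] = s->v; return 0; }
static int run_fetch(action *s, int *a)  { return a[s->i]; }
static int run_bind(action *s, int *a) {
    int r = s->m->run(s->m, a);       /* run the first action */
    action *next = s->k(r, s->env);   /* pick the next action from its result */
    return next->run(next, a);
}

static action *unit(int v) { action *n = node(run_unit); n->v = v; return n; }
static action *assign(int i, int v) {
    assert(i >= 0 && i < N);
    action *n = node(run_assign); n->i = i; n->v = v; return n;
}
static action *fetch(int i) {
    assert(i >= 0 && i < N);
    action *n = node(run_fetch); n->i = i; return n;
}
static action *bind(action *m, kont k, void *env) {
    action *n = node(run_bind); n->m = m; n->k = k; n->env = env; return n;
}

/* Create the array (every element v), run m against it, return m's result. */
static int block(int v, action *m) {
    int a[N];
    for (int i = 0; i < N; ++i) a[i] = v;
    return m->run(m, a);
}

/* init_from(i): store N-i at index i, then carry on from i+1. */
static action *init_from(int i);
static action *k_init(int r, void *env) { (void)r; return init_from((int)(intptr_t)env); }
static action *init_from(int i) {
    if (i == N) return unit(0);
    return bind(assign(i, N - i), k_init, (void *)(intptr_t)(i + 1));
}

/* sum_from(i, acc): fetch element i, add it to acc, carry on from i+1. */
typedef struct { int i, acc; } sum_env;
static action *sum_from(int i, int acc);
static action *k_sum(int x, void *env) {
    sum_env *e = env;
    return sum_from(e->i + 1, e->acc + x);
}
static action *sum_from(int i, int acc) {
    if (i == N) return unit(acc);
    sum_env *e = malloc(sizeof *e);
    assert(e != NULL);
    e->i = i; e->acc = acc;
    return bind(fetch(i), k_sum, e);
}

/* The whole program: initialise, then sum. Nothing touches an array yet. */
static action *k_then_sum(int r, void *env) { (void)r; (void)env; return sum_from(0, 0); }
static action *program(void) { return bind(init_from(0), k_then_sum, NULL); }
```

Calling block(0, program()) from main() evaluates to 15. Until that call, program() has only built a description of the work; the array itself exists only inside block(). Most of the bulk is the hand-written environments (the env pointer and sum_env) that a language with closures would supply for free.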

There is also a cut-down implementation using just the array transformer monad. It is 'just' 347 lines of code ;-)

Updated: 22/March
Here is a Python implementation of the array transformer monad. The code is much simpler, given that Python supports closures. All of the cruft that was necessary in the C implementation just disappears. The function summer() could be rewritten to use lambda functions (left as an exercise for the reader!).


Sunday, March 19, 2006

 

Global variables

In many circles it is considered best practice to avoid the use of global variables when writing code. I've certainly adhered to this viewpoint for a long time, but as with all things this year, it is time to question it! As a long-time C/C++ programmer, I thought I had a good understanding of the different approaches to structuring code, but the freedom afforded by a block-structured language that treats functions as first-class objects (e.g. Scheme, but not Pascal) is really amazing. The key idea is that variables can be lexically nested, and access to them can be controlled by functions passed back to the client. This is Parnas' data hiding to the max! Add a bit of garbage collection and much of the minutiae hindering code development just evaporates.

The elegance of this approach, as opposed to OO, is that it is completely uniform: there is just a single mechanism being used. This is in complete contrast to the multiplicity of data hiding mechanisms available in C, C++ or Java. C is the simplest (and I've got no intention of discussing member access privileges, the differences between module/function/class scope, all the different uses of 'static' in C++, or namespaces), so let's discuss C's mechanisms for data hiding: void pointers (or security by obscurity), and static variable scope (in functions or modules). Not much can be said about security by obscurity: it's a good idea for as long as you can get away with it ;-)

There is almost no good reason to ever declare static variables within a function; indeed, it should never be done within a library function, as suddenly that entire library can no longer be simply used in a threaded or reentrant environment. Similarly, there are few good reasons to allow static module variables, as doing so makes a strong assumption about how that module is used. Inevitably I do end up using static module variables to avoid the programming overhead of declaring a structure to hold all of the module variables, then adding an additional argument to every function in the module to reference this structure, then adding indirections (->) to access the data, and so on. Then I find that after three revisions of the code two of the members are no longer referenced by any of the functions and can be removed, but that to add further functionality another four members, each highly stateful and closely replicating (but not identical to) existing members, must be added to the structure... There is just so much pain in trying to write good code!

A lot of this pain can be alleviated by allowing nested functions, and taking advantage of lexical scoping. Pascal allows this (and I'm kind of embarrassed to say that after 15 years of hard-core C I'd forgotten this), but what Pascal lacks is the ability to return functions as results of a computation. This imposes a big limitation on what can be expressed in the language (it is certainly not possible to process code and data interchangeably) and arguably it forces a multiplicity of mechanisms for data access to be built into the language. C does allow (pointers to) functions to be returned by a function, but it is necessary to manually define and manage the function environment (and hence a lot of the pain of programming).
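To show what managing the function environment by hand looks like, here is a minimal sketch of a counter "closure" in standard C. The names counter, counter_next and make_counter are hypothetical; the structure simply pairs a function pointer with the one variable that a real closure would capture.

```c
#include <assert.h>
#include <stdlib.h>

/* A hand-rolled closure: a code pointer paired with the variable it captures. */
typedef struct counter counter;
struct counter {
    int (*next)(counter *self);  /* the "function" handed back to the client */
    int count;                   /* the captured variable (its environment) */
};

/* Returns the current count, then advances it. */
static int counter_next(counter *self) { return self->count++; }

/* Returns a new counter starting at `start`; the caller must free() it. */
static counter *make_counter(int start) {
    counter *c = malloc(sizeof *c);
    assert(c != NULL);
    c->next = counter_next;
    c->count = start;
    return c;
}
```

A client calls it as c->next(c), passing the environment back in explicitly on every call, and must remember to free(c) afterwards. Note also that count stays visible to anyone holding the pointer, so the data hiding is by convention only; in Scheme the variable would simply be out of reach.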

What is really disturbing is that these problems have been well understood since the 1960s (see The Function of FUNCTION in LISP, or Why the FUNARG Problem Should be Called the Environment Problem, AI Memo 199). Why do we keep persisting, after 35 years, with languages that are so limited?

I'm going to keep on writing programs in C. But I'm going to stop trying to force abstraction and composability. I'm going to stop pretending that every program I write will be reused or extended arbitrarily. I'm going to use more global variables when it is reasonable to do so, particularly if it allows me to write less code. I'm going to place all the code in a single file; when it becomes unmanageable, the complexity of the program has most probably reached a natural bound, and further development should not occur until the complexity has been removed. And when I have the opportunity I'm going to throw away more code, and rewrite it in a better language where the minutiae look after themselves...

