Random Thoughts: 2005-10-23

Saturday, October 29, 2005

Touring El-Torito (viewing Minix 3 sources)

Just released a simple utility to rip the 'hidden' files off the Minix 3 CD image. You can get the utility here - all sources (and VC6 project files) are included in the ZIP archive.

Briefly, the Minix 3 CD takes a very liberal view of the standard CD formats, and appends raw filesystem images (/ and /usr) after the ECMA-119/ISO-9660 data. The rip3 utility accesses this data, which is otherwise hidden.

The rip3 project was fun to write, and had a number of unexpected hurdles. The project started as an excuse to read the ECMA-119 (ISO-9660) and El-Torito specifications, both of which are quite palatable. The initial intent was to read data directly from the CD as a raw block device. The ECMA-119 specification divides the CD into system and data areas, with the system area occupying the first 16 sectors of the disc. The fun starts with the data area, which contains all of the file data and supporting structures, including a catalogue of 'volume descriptors' at the head of the data area. The ECMA-119 specification makes very few assumptions about the ordering of the descriptors, whereas El-Torito asserts very definitely that the primary volume descriptor will be at sector 16, and that the boot record descriptor will be at sector 17! These assumptions allow huge simplifications in the rip3 code.
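As a rough illustration of those two fixed positions, here is a minimal sketch in C of how the descriptors at sectors 16 and 17 might be recognised in an ISO image file. The function names are hypothetical and are not taken from the rip3 sources; the byte offsets (type in byte 0, the identifier "CD001" in bytes 1-5, the boot system identifier starting at byte 7) follow the ECMA-119 and El-Torito layouts.

```c
#include <stdio.h>
#include <string.h>

#define SECTOR_SIZE 2048

/* Returns 1 if the sector looks like a volume descriptor of the given type. */
int is_descriptor(const unsigned char *sector, unsigned char type)
{
    /* Byte 0: descriptor type; bytes 1-5: standard identifier "CD001". */
    return sector[0] == type && memcmp(sector + 1, "CD001", 5) == 0;
}

/* Returns 1 if the sector is an El-Torito boot record (descriptor type 0). */
int is_el_torito_boot_record(const unsigned char *sector)
{
    /* The boot system identifier starts at byte 7 and is zero padded. */
    return is_descriptor(sector, 0)
        && memcmp(sector + 7, "EL TORITO SPECIFICATION", 23) == 0;
}

/* Reads sector n of an image file into buf; returns 1 on success. */
int read_sector(FILE *fp, long n, unsigned char *buf)
{
    if (fseek(fp, n * SECTOR_SIZE, SEEK_SET) != 0)
        return 0;
    return fread(buf, 1, SECTOR_SIZE, fp) == SECTOR_SIZE;
}
```

With an image opened in binary mode, one would expect read_sector(fp, 16, buf) followed by is_descriptor(buf, 1) to succeed for the primary volume descriptor, and sector 17 to pass is_el_torito_boot_record on a bootable disc.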

The primary volume descriptor contains all sorts of interesting information about the CD, including the volume space size ('the number of logical blocks in which the volume space of the volume is recorded'). Now all of the interesting Minix file data is stored after the volume space. My initial thought was that after opening the CD as a raw device (cf. CreateFile( "\\.\D:", ... ), with a suitably aligned buffer as required by FILE_FLAG_NO_BUFFERING, which is assumed for raw access) I could treat the CD as a block device, and randomly access data anywhere on the disc. This failed, as the Microsoft CD device driver prevents access beyond the end of the volume... quite different from Unix, where dd can slurp data freely... Fortunately, almost no effort was required to change the code to access an ISO file image, where seeking 'beyond the bounds' was quite acceptable!
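To make the 'end of the volume' concrete, the sketch below computes the byte offset just past the volume space from a primary volume descriptor. It assumes the ECMA-119 layout, where the volume space size sits at byte offset 80 and the logical block size at byte offset 128 (each stored in both byte orders, little-endian copy first). The helper names are hypothetical, and whether the Minix images begin exactly at this offset is an assumption rather than something stated above.

```c
#include <stdint.h>

/* Decodes a little-endian 16-bit value. */
uint16_t le16(const unsigned char *p)
{
    return (uint16_t)(p[0] | (p[1] << 8));
}

/* Decodes a little-endian 32-bit value. */
uint32_t le32(const unsigned char *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8)
         | ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* Byte offset of the first byte past the ISO volume space. */
uint64_t end_of_volume(const unsigned char *pvd)
{
    uint32_t blocks = le32(pvd + 80);       /* volume space size   */
    uint16_t block_size = le16(pvd + 128);  /* logical block size  */
    return (uint64_t)blocks * block_size;
}
```

For a descriptor recording 10000 blocks of 2048 bytes, end_of_volume gives 20480000, which is where a seek into the image file would start looking for appended data.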

Moral of the story: don't trust the metadata! Seems to apply to the semantic web as well :-)

An ugly consequence of attempting to access the raw CD device was the heavy use of Windows-specific functions (e.g. CreateFile), so I next turned my attention to cleaning up the code to just use the standard C I/O functions fopen() etc... The code changes were quite simple, but for some reason I started getting short data reads, where fread() returned less data than requested. I've not diagnosed the problem, but a characteristic of the code was the use of fseek() prior to each read. Note to self: create a simple test case for the problem... in the meantime, the code still uses the Windows API, and hence won't be directly portable to other systems (quite apart from the case-insensitivity issues discussed below).
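The short reads remain undiagnosed. One thing worth ruling out (a guess, not a finding) is the stream being opened in text mode: on Windows, fopen() with "r" rather than "rb" translates line endings and treats a 0x1A byte as end-of-file, either of which can make fread() return early on binary data. A minimal seek-then-read helper for such a test case might look like this; read_at is a hypothetical name, and the stream is assumed to have been opened with "rb".

```c
#include <stdio.h>

/* Reads exactly len bytes starting at offset; returns 1 on success,
 * 0 on a seek failure, a read error, or a premature end of file. */
int read_at(FILE *fp, long offset, void *buf, size_t len)
{
    unsigned char *p = buf;
    size_t got = 0;

    if (fseek(fp, offset, SEEK_SET) != 0)
        return 0;

    /* Keep reading until the request is satisfied or fread() gives up. */
    while (got < len) {
        size_t n = fread(p + got, 1, len - got, fp);
        if (n == 0)
            return 0;
        got += n;
    }
    return 1;
}
```

Note that a long offset limits this to images under 2 GB on platforms where long is 32 bits, which covers a CD image.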

Finally, I was able to access the raw Minix file systems. Writing the code to interpret the inodes, etc..., and access the file data was fun. Despite 'knowing' the theory for years, getting down and dirty with the code certainly tested my knowledge. To stick with the moral of today's post, I made the fatal mistake of trusting the metadata - in this case, the following comment in /usr/src/servers/fs/super.h:

/* Super block table.  The root file system and every mounted file system
 * has an entry here.  The entry holds information about the sizes of the bit
 * maps and inodes.  The s_ninodes field gives the number of inodes available
 * for files and directories, including the root directory.  Inode 0 is 
 * on the disk, but not used.  Thus s_ninodes = 4 means that 5 bits will be
 * used in the bit map, bit 0, which is always 1 and not used, and bits 1-4
 * for files and directories.  The disk layout is:
 *
 *    Item        # blocks
 *    boot block      1
 *    super block     1    (offset 1kB)
 *    inode map     s_imap_blocks
 *    zone map      s_zmap_blocks
 *    inodes        (s_ninodes + 'inodes per block' - 1)/'inodes per block'
 *    unused        whatever is needed to fill out the current zone
 *    data zones    (s_zones - s_firstdatazone) << s_log_zone_size
 *
 * A super_block slot is free if s_dev == NO_DEV. 
 */

The statement 'Inode 0 is on the disk, but not used' is only partially true: inode 0 is represented in the inode map, but does not appear in the inode table (the blocks of inodes following the zone map). In other words, the first entry in the inode table is inode 1, which anchors the root directory of the file system. Prior to this observation, the extraction was reading weird data (i.e. didn't work!) from all over the image.
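The off-by-one is easy to show in code. The sketch below computes the byte offset of inode n within a file system image, following the disk layout in the comment above (one boot block, one super block, then the two bit maps). The struct and its field names are hypothetical stand-ins for values read from the super block, and the on-disk inode size is passed in as a parameter rather than assumed.

```c
#include <stdint.h>

/* Hypothetical subset of super block values needed to locate an inode. */
struct sb_layout {
    uint32_t block_size;   /* bytes per block                 */
    uint32_t imap_blocks;  /* blocks used by the inode map    */
    uint32_t zmap_blocks;  /* blocks used by the zone map     */
    uint32_t inode_size;   /* bytes per on-disk inode         */
};

/* Returns the byte offset of inode ino, or -1 for inode 0,
 * which has a bit in the inode map but no slot in the table. */
int64_t inode_offset(const struct sb_layout *sb, uint32_t ino)
{
    uint64_t table_start;

    if (ino == 0)
        return -1;

    /* Boot block + super block + inode map + zone map precede the table. */
    table_start = (uint64_t)(2 + sb->imap_blocks + sb->zmap_blocks)
                * sb->block_size;

    /* The first slot holds inode 1, hence the (ino - 1). */
    return (int64_t)(table_start + (uint64_t)(ino - 1) * sb->inode_size);
}
```

Indexing with ino instead of ino - 1 shifts every lookup by one inode, which matches the symptom of reading data from all over the image.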

The final hurdle was working around the (brain-dead) case-insensitivity of the Windows file system. Since the Minix file system is case-sensitive (like any reasonable system), conflicts can arise when a directory contains two names differing only in case. Luckily, in the Minix 3.1.1 release this only occurs once, inside the nvi sources. This is well enough away from the interesting kernel source code that it is unlikely to affect anybody. The code also handles the MS-DOS 'reserved names' (e.g. aux, con, prn, etc...) safely - I only wish I'd written this code earlier, since the original ACK code contained files inconveniently named 'aux.c' that refused to be extracted on Windows. Whilst not a big deal to work around, it was certainly an inconvenience.
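For illustration, a check for the reserved device names could be sketched as follows. This is not the rip3 code: the function name is hypothetical, and the list (aux, con, prn, nul, com1-com9, lpt1-lpt9) is the commonly documented set. The name is reserved regardless of case or extension, so only the part before the first dot is compared.

```c
#include <ctype.h>
#include <string.h>

/* Compares the first n characters of a and b, ignoring case. */
static int same_ignoring_case(const char *a, const char *b, size_t n)
{
    size_t i;
    for (i = 0; i < n; i++)
        if (tolower((unsigned char)a[i]) != tolower((unsigned char)b[i]))
            return 0;
    return 1;
}

/* Returns 1 if the base name (text before the first '.') is reserved. */
int is_reserved_dos_name(const char *name)
{
    static const char *const plain[] = { "aux", "con", "prn", "nul" };
    static const char *const numbered[] = { "com", "lpt" };
    size_t base_len = strcspn(name, ".");
    size_t i;

    if (base_len == 3) {
        for (i = 0; i < 4; i++)
            if (same_ignoring_case(name, plain[i], 3))
                return 1;
    } else if (base_len == 4 && name[3] >= '1' && name[3] <= '9') {
        for (i = 0; i < 2; i++)
            if (same_ignoring_case(name, numbered[i], 3))
                return 1;
    }
    return 0;
}
```

An extractor could then rename any matching file before creating it, for example by prefixing an underscore; the renaming policy is left open here.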

In summary, the tour through El-Torito was all too short. I'd love to return to those exotic environs, and other exciting places, in the near future...

# posted by Michael Kennett @ 1:44 am 0 comments

Thursday, October 27, 2005

Charles Petzold

Saw a reference on slashdot to a talk by Charles Petzold titled 'Does Visual Studio Rot the Mind?'. This latest rumination laments the artificial barrier that tools like Visual Studio place between the programmer and their code. This barrier is more than the automatically generated code that 'wizards' construct, and that we all struggle with reading once a project has moved into maintenance. The barrier is also the way in which the tool changes the way we write code, so that 'features' like IntelliSense completion work. The tool becomes the master, and begins to dictate how we should work with no consideration given to elegance or simplicity of the code. Petzold discusses these issues eloquently, and concludes the article with his experiences in starting to write ANSI C code in notepad: 'It’s just me and the code, and for awhile, I feel like a real programmer again.'

I have similar feelings whenever I get the opportunity to tinker with Minix. The system is small enough that I can keep large chunks of it in my head, and provides just the basic tools required for writing and compiling code. There is a slow release cycle, so I never worry about keeping up with the next upgrade or the latest changes, and am confident that anything I do in the next month or 2 or 6 will still be relevant. This is a very different environment from Windows, Linux or the *BSD communities where there is continual change, and a large amount of effort needs to be spent just keeping up with the change. I use all three of these systems regularly (and have for many years), but I enjoy booting up my old Minix system, and just writing code with black-and-white glowing characters.

Petzold belongs to a select group of authors who are able to make reading about programming interesting and fun, and is up there with W. Richard Stevens (author of Advanced Programming in the UNIX Environment and other classics) and Andy Tanenbaum. I still have a copy of Petzold's OS/2 Presentation Manager Programming sitting on my bookshelf, which was an invaluable reference years ago when I was making the transition from being a 'Unix guy' to an 'OS/2 guy', and laid solid foundations for me to start programming in the Windows world.

Thanks Charles for all your work.

# posted by Michael Kennett @ 6:24 pm 0 comments

Monday, October 24, 2005

Minix 3

Today is the launch of Minix 3. It's a great little system (albeit not as little as Minix 2 and earlier versions), and I hope it takes off.

# posted by Michael Kennett @ 6:05 pm 0 comments

Markov-oriented Programming

Just been looking through some code with a reasonably complex hierarchy of objects, where each object is reasonably stateful, and no comments at all (joy...). Somehow I have to make sense of all this. In the midst of this, the thought occurred to me that having a 'Markov' model for object state would make this task much, much simpler - after construction of the object, any method should be applicable without having to know the history of the object (i.e. all previous method invocations). This is inspired by Markov Processes (from probability theory) that have no memory... this all seems completely contradictory to the standard OO approach, which attempts to make objects stateful. However, many simple objects are effectively Markov - consider a file object, which is created/initialised with a filename. The operating system maintains a file pointer, but at any point in time it is possible to apply the read(), seek(), write() etc... methods to the object. Such simplicity would be greatly welcome in the code that I'm currently digesting: complex initialisation sequences are required, and it is necessary to perform a 'foo()' before a 'bar()'; just don't even think about performing 'baz()' until all the planets are aligned. Having formal models of object state would help. Perhaps we can all find inspiration from communication protocols (e.g. the TCP/IP state transitions), and develop simple automata models. It would be nice to give this more thought.
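One way such an automaton model might be written down is as an explicit transition table, so that the 'foo() before bar()' rules live in one place instead of being scattered through the methods. The sketch below is purely illustrative: the states, the events and the particular rules are invented to match the foo/bar/baz example, and are not taken from any real code.

```c
/* Hypothetical object states and the events (method calls) applied to them. */
enum state { CREATED, READY, RUNNING, NUM_STATES };
enum event { EV_FOO, EV_BAR, EV_BAZ, NUM_EVENTS };

#define INVALID (-1)

/* transition[s][e] is the state reached by event e in state s,
 * or INVALID if that call is not allowed in that state. */
static const int transition[NUM_STATES][NUM_EVENTS] = {
    /*              foo      bar      baz     */
    /* CREATED */ { READY,   INVALID, INVALID },
    /* READY   */ { READY,   RUNNING, INVALID },
    /* RUNNING */ { INVALID, RUNNING, RUNNING },
};

/* Applies an event; returns 1 and updates *s if the call is legal,
 * returns 0 and leaves *s unchanged otherwise. */
int apply(enum state *s, enum event e)
{
    int next = transition[*s][e];
    if (next == INVALID)
        return 0;
    *s = (enum state)next;
    return 1;
}
```

The table makes the legal call orders visible at a glance: the current state alone decides what may happen next, which is the 'no memory' property described above.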

# posted by Michael Kennett @ 4:10 pm

Sunday, October 23, 2005

Clement Greenberg

A rather pompous mathematics lecturer I once had often pontificated about life. One of his favourite pastimes was to criticise art subjects with comments similar to: a second rate engineer is more useful to society than a second rate art critic. For a long time I shared similar views (but never fully embraced the economic rationalist view that production is a measure of the meaning of life), but I have seen the light, and now accept that art washes away from the soul the dust of everyday life (Picasso). All of this rambling has been prompted by stumbling across a webpage on Clement Greenberg, who is reputedly "the greatest art critic in the second half of the 20th century" (i.e. certainly not second rate). I'm certainly going to try and spend some time reading his essays, and spend less time writing software :-)

# posted by Michael Kennett @ 12:07 pm 1 comments
