Deconstruction of a 16 byte demo

I occasionally see cool demoscene stuff on Twitter, and I thought it'd be fun to deconstruct a simple example to see how it worked.

I'm not very familiar with x86 assembler and DOS programming, so this will go slow - I'll talk about all my findings along the way and how I worked things out. If you are familiar with this area, this will bore the hell out of you. But if you're not, you might enjoy following along.

I decided to pick apart a masterpiece called kasparov 16b by the group Desire. It's a 16 byte DOS executable that displays a 4-colour scrolling chessboard.

Check out the GIF here before proceeding. I will not dwell on how absolutely remarkable it is that 16 bytes is enough to produce a plausible animated effect.

I worked through this mostly on Windows 10, although I used WSL for things like supporting scripts when working stuff out.

DOSBox

pouet.net provides .asm source code and a built .com file for the demo. The .com file worked perfectly in DOSBox, although the scrolling was a bit slow until I increased the simulated cycle speed.

Building x86 Assembler

I decided to use NASM. I don't know this space at all, but it's easy to use with approachable docs. Building the .asm was very easy - just a matter of running nasm kasparov.asm -o kasparov.com. Sure enough, it produced a 16-byte .com file!

Grokking the code

I did some assembler in university, but not x86. I've always wanted to learn a little more, and there are only 8 instructions to understand. How hard can it be?

Here's the program.

1   X: add al,13h
2   int 10h
3   mov al,cl
4   xor al,ch
5   or ax,0CDBh
6   loop X
7   dec dx
8   jmp short X

A detailed rundown of what I worked out follows.

Init Video

1 X: add al,13h

Note that there's no prelude, header, data segment or anything like that - this gets right down to business and starts issuing instructions.

X is a label, which means this instruction can be jumped to - we'll get to that later.

The file is written in "Intel Syntax" which means that the destination comes before the source.

This instruction adds 13h (hex notation is designated by the trailing h) to the lower byte of the AX register (designated by AL).

The AX register is initialised by DOS to zero, so this effectively sets AX to 13h. Why the program uses add rather than just directly setting the register is an optimization we'll get to later on.
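As a first taste of the byte-pinching involved, here's how NASM encodes a few candidate instructions in 16-bit code - the machine-code bytes are shown as comments (the full story of why add was chosen comes later):

add al, 13h     ; 04 13    - 2 bytes (what the demo uses)
mov al, 13h     ; B0 13    - 2 bytes
mov ax, 13h     ; B8 13 00 - 3 bytes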

Intel 80386 and higher CPUs have 4 general purpose 32 bit data registers: EAX, EBX, ECX and EDX (there are other registers too, but these four are the ones that matter here).

They can be used as 32 bit, 16 bit, or 8 bit registers, depending on how you refer to them (there's a short sketch after this list):

  • All 32 bits: EAX EBX ECX EDX
  • The least significant 16 bits: AX BX CX DX
  • Bits 8-15: AH BH CH DH
  • The least significant 8 bits: AL BL CL DL
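Here's a quick NASM sketch (not part of the demo) to make the aliasing concrete:

mov ax, 1234h   ; AX = 1234h, so AH = 12h and AL = 34h
mov al, 0FFh    ; AX is now 12FFh - only the low byte changed
mov ah, 00h     ; AX is now 00FFh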

2 int 10h

This triggers interrupt 10h. Triggering an interrupt is how DOS programs ask the BIOS, DOS, or some piece of hardware to do things. The CPU jumps to some other preloaded function which interprets the arguments to the interrupt based on the values of certain registers. More modern operating systems and hardware now have different mechanisms for making system calls but the principle is the same.

On PCs, interrupt 10h is installed by the BIOS when the machine powers on, and provides a simple API for displaying graphics. There's a basic API that you can assume all PCs will support, plus some extensions which will be hardware specific. I'm not sure if the BIOS of modern PCs still loads video functions at this interrupt, but DOSBox emulates it.

Interrupt 10h does different things depending on the AH register. In this case, it's 0, which means: set the video mode according to the value of the AL register.

This instruction is called in a loop as we'll see later, but the first time around, register AL is 13h, which selects VGA mode 13h - 320x200 pixels in 256 glorious colours.
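For comparison, here's a sketch of how a less size-obsessed .com program might set the same mode:

org 100h        ; DOS loads .com programs at offset 100h

mov ax, 0013h   ; AH = 00h (set video mode), AL = 13h (the mode to set)
int 10h         ; ask the BIOS video services to switch mode
ret             ; return to DOS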

Mysterious XORing and ORing

3 mov al,cl
4 xor al,ch
5 or ax,0CDBh

In the first instruction, we copy the contents of CL to AL. The first time this instruction runs, the program hasn't touched CX, but DOS has initialized it to 00FFh, so AL will now be FFh.

We then xor AL with CH. Or in other words, set AL = CL xor CH.

In the third instruction, we do AX = AX bitwise-or 0CDBh (a constant). Or in other words, breaking down the AX register into two 8-bit registers:

  • AH = AH bitwise-or 0Ch
  • AL = AL bitwise-or 0DBh

AH prior to this instruction was 0, so now it's 0Ch. AL was FFh, so it remains FFh.

Let's set aside what the point of all this is for a moment; the post-condition is that AH = 0Ch and AL = FFh.
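To make the dataflow concrete, here are the same three instructions annotated with a worked example from a later iteration, arbitrarily picking CX = 0102h:

mov al, cl      ; CL = 02h, so AL = 02h
xor al, ch      ; CH = 01h, so AL = 02h xor 01h = 03h
or  ax, 0CDBh   ; AL = 03h or 0DBh = 0DBh; AH = AH or 0Ch = 0Ch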

Looping around

6 loop X

This instruction decrements CX and jumps back to X unless CX has become 0.
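It behaves roughly like this pair of instructions, except that loop doesn't touch the flags:

dec cx          ; CX = CX - 1
jnz X           ; jump back to X if CX is not zero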

Whether it decrements CX (a 16 bit register) or ECX (a 32 bit register) depends on the address-size attribute of the instruction. To the best of what I could find out, unless the attribute is explicitly set in the code (which it isn't here), this defaults to 16-bit when the CPU is in real mode. And I'm old enough to remember that DOS programs run in real mode - some refused to run once Windows had started and switched the CPU to protected mode.

CX is a 16 bit register that the program hasn't explicitly set, so at this point it's still 00FFh, or 255 in decimal. This means there will be 255 iterations of the code from X down to this line, with CX decremented on each iteration. When CX hits 0, control proceeds to the next instruction.

What's the point of this looping?

Writing pixels

Well, in all subsequent iterations, AH will always be 0Ch. We only ever write to it on line 5 as part of or ax,0CDBh.

As explained above, AH controls what interrupt 10h does - it selects which functionality is invoked. In the first call, it was 0, which caused the video mode to be set, but now it's 0Ch. And we now know that this interrupt is being called on every iteration of a loop.

When AH = 0Ch, interrupt 10h (on line 2) changes the color of a single pixel, using

  • AL to set color
  • CX to set column
  • DX to set row

Okay, now things are starting to make sense. We're writing coloured pixels in a loop - not changing the row but decrementing the column. And we're doing some funky XORing to change the color every iteration. So this loop draws a long horizontal patterned line. That makes sense for something that's supposedly drawing a chessboard - the colour sequence must be responsible for the tiled appearance somehow.
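Spelled out conventionally, a single call of that pixel-plotting service would look something like this sketch (note BH, the display page, which the demo never sets - it just inherits whatever DOS left there):

mov ah, 0Ch     ; BIOS function 0Ch - write graphics pixel
mov al, 2Ah     ; colour (a palette index)
mov bh, 0       ; display page
mov cx, 160     ; column
mov dx, 100     ; row
int 10h         ; plot one pixel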

Last bit of the code:

7 dec dx
8 jmp short X

This decrements DX, which sets the pixel row for interrupt 10h, and jumps back to the top of the program. This effectively makes the whole program loop forever.

Note that when DX reaches 0, decrementing it here causes an integer underflow and DX wraps around from 0 to 65535 (the highest 16 bit number). There will then be 65535 iterations before DX again becomes 0 and wraps around again. This behaviour also applies to the CX register, and means that after the initial 255 iterations of the inner loop (from lines 1 to 6) there will be 65536 inner loop iterations for every iteration of the outer loop: the inner loop is re-entered with CX = 0, the first loop decrements it to 65535, and it then counts all the way back down to 0.
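The wraparound is just ordinary fixed-width arithmetic. In NASM terms:

mov dx, 0       ; DX = 0000h
dec dx          ; DX = FFFFh (65535) - subtracting 1 from 0 wraps around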

DX is initialized by DOS to the value of the CS register, which effectively makes it something random when the program starts.

So now we have the broad outline - the program sets VGA mode, then does something like this pseudo-code. I've abstracted the XORing and ORing into a function get_color for readability, and the graphics operations into write_pixel.

// x is CX and y is DX - they start as whatever DOS sets them to.
uint16 x = 255
uint16 y = an undefined 16 bit integer

while true:
    // Write a row of pixels from x down to 1.
    //
    // The first row will be 255 pixels long.
    // Rows after the first will be 65536 pixels long.
    //
    // When writing rows after the first, x wraps around from 0
    // to 65535 on the first iteration.
    do:
        c = get_color(x)
        write_pixel(x, y, c)
        x--
    while x > 0

    // Decrement the row at which the next line will be drawn.
    y--

So we just need to understand why that code produces something that looks like a scrolling chessboard, which is the subject of the next post :)
