Analysis of Revision'2020 PC 256 byte intro winner "Memories" by HellMood / DESiRE

Update Hellmood has be kind enough to just release this long article on the demo. Still this article may have some value and it is a lot of fun write to nail down the little details. 4/20/20

20 April 2020

It's that time of the year again: Revision 2020 party. The past weekend (Easter weekend) the Coronavirus has forced the demo party to be remote which meant we get to experience it as a first class citizen from the other end of the world (Argentina 🇦🇷). One of my favorite categories is the PC 256 byte intro. I'm always amazed at what smart people can accomplish with limited resources. This year's winner is, in my opinion incredible so I though I needed to have a deep look into how it worked.

The intro is called "Memories" and you can find it here: Pouet. You can also watch the video in YouTube. If you download the code you'll notice that the author was kind enough to provide the source code! This is really important and great in my opinion because it allows people to study it and learn from it. Surprisingly this doesn't seem to be the norm although I see that more and more people are releasing their source code or at least the shaders are shared via ShaderToy.

The result of my investigations is this article. It's quite long and is a work in progress. Even though the main idea of the code is easy to grasp each effect is its own world and the code is full of subtelties.

Pre-requisites

This article might interest people who want to understand a little how these tiny demos/intros work. What's great is that you need very little to be able to play with this particular demo. You need to install two tools: Dosbox (recommended version 0.74) and NASM.

I'm on Linux but you can deduce how to do the same in Windoze/MacOS. Install Dosbox and NASM with:

$ sudo apt-get install dosbox nasm

Once you have installed both tools you can watch the demo with

dosbox -conf dosbox-0.74-3.conf memories.com

To compile the demo just run:

nasm memories.asm -fbin -o memories.com

Just for reference here's an animated GIF with the 8 effects:

You can also watch the video in YouTube.

To undestand the code you'll need:

Some knowledge of 8086 assembly (or similar). In Appendix 1 I've included the main registers of the 8086 and some links so that you can learn a bit more.
Some knowledge of the 8086 architecture You need to know what segments are and how DOS interrupts work. Fortunately you don't need to know too much to understand most of the code so don't worry too much.
VGA mode 13h Even though VGA modes are pretty tricky, fortunately for us this demo uses whats known as mode 13h (see Appendinx 2) which is pretty straightforward.
Jupyter notebook To understand parts of this demo I used a Jupyter notebook to explore a little the functions that the demos uses. If you have experience you can play and tweak it.

Main idea of the Algorithm

So, the structure of the demo is the following: after a short initialization of the registers, video mode and timer interrupt we have the main loop of the intro which goes over all pixels of the 320x200 screen continuously. For each pixel an effect routine is called which returns the color it should be painted. This is the same way in which shaders work so in a sense each effect is a 'mini-shader'. The effect routine which is picked for the current pixel varies with time. Initially it uses one 'shader' and as time passes it start to mix pixels from the current shader and the next until all pixels start coming from the next shader. This is how the current shaders and the next are blended. You can see this clearly in the demo. The process repeats until the last effect is displayed. Simple! Right? Weeeell... so yeah, that's the main idea and its pretty straightforward to understand. But, as they say, the devil is in the details.

To understand the main idea lets go over a simplified version of the demo code. I've removed all sound related code to make things a bit easier to follow. Here's the simplified version with more comments:

  1 org 100h                ; program is loaded at 0x0100
  2 s:
  3     mov al, 0x13        ; 320x200, 256 color 1 page graphics 0x13
  4     int 0x10            ; int 10h. Assumes AH = 0x00
  5 
  6     xchg bp, ax         ; load BP (0x0000 at start) into AX
  7                         ; (efectively clearing it) (BP = 0x0013)
  8     push 0xa000-10      ; 0xA000:0000 is the begging of VGA Memory
  9     pop  es             ; ES = 0x9FF6 (0xA000 - 0x10)
 10 
 11     mov ax, 0x251c
 12     mov dl, timer       ; assume DH=1, mostly true on DosBox
 13     int 0x21            ; AX = 0x251c , DX = 0x0145 (org 0x100 + offset to
 14                         ; timer function):
 15 
 16 ; ------------------------------------------------------------------------------
 17 ; Main loop and initial values of key registers.
 18 ; DI has 0x0000 and has the offset into VGA RAM
 19 ; BP has 0x0013 and is our 'time' variable updated in our interrupt callback.
 20 ; ES starts with 0xA000 - 0x10 (0x9FF6) points to the start segment of VGA
 21 ; ------------------------------------------------------------------------------
 22 
 23 top:
 24 
 25     mov ax,0xcccd
 26     mul di
 27 
 28     ; Choose which effect to play next. The value is stored in AL and is
 29     ; an index to the 'table:' entry list. This code is responsible
 30     ; of the fading in / out of effects. It's a function of the
 31     ; DI and BP registers.
 32     add al,ah
 33     xor ah,ah
 34     add ax,bp
 35     shr ax,9
 36     and al,15
 37 
 38     xchg bx, ax
 39     mov bh, 1               ; Load offset to effect to BX
 40     mov bl, [byte bx+table]
 41     call bx                 ; Call effect
 42 
 43     stosb                   ; Store pixel color AL -> ES:[DI++]
 44     inc di
 45     inc di
 46     jnz top
 47 
 48     ; Set frequency for the timer interrupt routine
 49     mov al,tempo
 50     out 40h,al              ; This is the 'counter divisor'. It controls
 51                             ; how often the interrupt fires.
 52 
 53     in al,0x60              ; Read from scan-code from keyboard.
 54 
 55     dec al                  ; Compare the keycode with 1 (ESC key)
 56     jnz top
 57 
 58     ret                     ; exit
 59 
 60 ; Table of offsets to effects -----------------------------------
 61 table:
 62     db fx2-s,fx1-s,fx0-s,fx3-s,fx4-s,fx5-s,fx6-s,sounds-s,stop-s
 63 stop:
 64     pop ax
 65     ret
 66 
 67 timer:
 68     inc bp
 69     iret
 70 
 71 fx0:
 72     ...
 73 ret
 74 
 75 fx1:
 76     ...
 77 ret

Let's dive into the details.

Lines 1 - 13: initialization

This code is pretty straightforward:

lines 3-4 tell the computer to switch to VGA mode 0x13 (13h) which is a video mode with resoltion 320x200 and 256 colors. In this mode the color for each pixel on the screen is specified by a single byte in video ram which starts at [0xA000:0000]. See Appendix 2 for more details. To avoid using 2 bytes this code assumes that the high part of AX (AH) is zero which is the case when the program starts.
line 6 initialized BP with 0x13. We'll see why later. (@TODO)
lines 8-9 Load the extra segment to the start of the VGA memory (0xA000) minus 0x10 (we'll see why. (@TODO)
lines 11-13 setup the timer routine. This basically tells the computer to call the function timer: every so often. This is the function that basically increases our 'time' variable (the BP register) and also plays the next sound (not included in code for now).

Lines 14 - 58: main loop

This code is the main part of the demo and gets looped continuously to calculate the color of each pixel. To undestand it a little you need to keep in mind the initial values of some of the registers that are preserved throught the code. These are:

DI this is the offset into VGA RAM. When DI is zero its at the top left pixel of the screen. When DI is 319 its at the top right pixel and so on. DI basically goes from 0 - 0xFFFF continuously.
BP this is a 'time' variable that is increased every time the timer function is called. This register is used by the shaders to change with time. We'll see more later.
ES the extra segment is fixed at 0xA000 which is the start segment of the VGA RAM. Not modified again.

Now lets have a look at this code a bit. Now we have some pretty subtle code here.

lines 25-26 This code is pretty important and it took me a while to figure what it was doing. These two lines simply takes the content of DI (the offset to the current pixel we are looking at) and multiplies it by the 'magic' number 0xCCCD. Since we are multiplying two 16 bit number the result is a 32 bit number which is stored in DX:AX. The low 16 bits will be overriden in the next lines so let's forget about it. But the value of DX is used by the shaders and it's important to understand what it stores. This took me a night to figure out but I think I have a better idea. Have a look at Appendix 3 to see how it's calculated.
lines 28-36 This code takes the low 16 bits of the previous multiplication and does some magic to it. See Appendix 3 to understand the details but the end result is that the low 8 bits of AX (AL) has a number between 0 - 0xF. If you think of this as a function of DI it basically has an almost fixed number for a while and as the DI increases it starts oscillating between that value and the next until it finally settles to the next value. This is used to switch between shaders. So, the sequence looks something like this (as a function of time):

0 0 0 0 0 ... 0 0 1 0 0 1 0 0 1 0 0 ... 1 1 1 0 1 1 1 0 1 ... 1 1 1 1 1 1 ... 1 1 1 1 2 1 ...

lines 38 - 46 The register AX is copied onto BX and will be used as the location of the shader to call for the curren pixel. The value of AL is basically an offset into the array of shaders stored in lines 61-62. Once we have in BX the address of the 'shader' responsible for calculating the color of the current pixel line 41 calls it and lines 43-46 store the pixel on the screen, increase DI by 3 (not sure why we need it to be 3. As long as its odd the result is the same.)
lines 48 - 58 This code sets the 'counter divisor' to make the interrupt trigger more often (quicker) or slower.
lines 53 - 58 poll the keyboard and if the user hits ESC exits otherwise it goes back to top: and the continues.

Lines 61-70: Misc code

lines 61-62 hold the offsets to each of the shader functions and the final 'stop' routine which exists the demo once it finishes calling all shaders. This same offset seem to be used for the sound but I'm ignoring all that for now.
lines 63-66 is the exit code of the intro
lines 67-70 this is the interrupt handler for the timer. This gets called many times per second (dependent on the value of the tempo variable). In our simplified version this increases the value of BP (time) and returns. In the full demo this part of the code takes care of the music.

OK, great! That was easy! Right? Well ... it took a while to figure out but the fun is just starting. Now lets dive into each individual shader.

Effect: Board of chessboards

This is the first effect to display (in the code its labelled as fx2). To understand the way effects works and not have to worry about the rest I've made a simpler version of the demo that only plays one effect. You should now understand most of the code in there and we can focus on only the effect. You can play with it: it's the file called fx2.asm. To compile it you can do:

nasm fx2.asm -fbin -o fx2.com

and run it in the same way as you ran the intro:

dosbox -conf dosbox-0.74-3.conf fx2.com

This is what you should see when you run it:

The resulting code is 51 bytes! Let's take a look at fx2.asm:

  1 %define targetFPS               65
  2 %define tempo                   1193182/256/targetFPS 
  3 
  4 org 100h
  5 s:
  6     mov al, 0x13        ; initialization code
  7     int 0x10
  8     push 0xa000-10 
  9     pop  es
 10     mov ax, 0x251c
 11     mov dl, timer
 12     int 0x21  
 13 
 14 top: 
 15     mov ax, 0xcccd
 16     mul di              ; di * 0xcccd -> dx:ax
 17 
 18     ; Call effect
 19     call fx2
 20 
 21     ; Store AL register in ES:[DI] and increase DI.
 22     stosb
 23     inc di
 24     inc di
 25     jnz top
 26 
 27 ; setup frequency of timer
 28     mov al, tempo
 29     out 40h, al
 30 
 31 ; read a character from keyboard into AL
 32     in al,0x60
 33 
 34 ; return to top of loop if AL != 1
 35     dec al
 36     jnz top
 37 
 38     ret
 39 
 40 timer:
 41     inc bp
 42     iret
 43 
 44 ;
 45 ; Chessboard effect
 46 ;
 47 fx2:
 48     xchg dx,ax
 49     sub ax,bp
 50     xor al,ah
 51     or al,0xDB      ; limit to 4 posibble colors
 52     add al,13h
 53 ret

To simplify the code I've removed all other effects and constants, and also the blending of effects code. You should understand everything that is happening here from my previous explanation so now we can focus on lines 47 - 53 that are the 'meat' of the shader. Let's see how they work.

For starters lets look at the last two lines (51-52). If you look at the binary representation of 0xDB you get this: 1101 1011. When you perform the OR function of any 8 bit number with this you can only get 4 possible values: 1101 1011, 1101 1111, 1111 1011 or 1111 1111. In Hex that would be: 0xDB, 0xDF, 0xFB or 0xFF. Now if we add 0x13 to this numbers you can only get the following four numbers: 0xEE, 0xF2, 0x0E and 0x12. So, the first conclusion is this shader can return 4 possible colors and if we look at the color chart in Apendix 2 you'll see that they are: a dark color 1 (0xEE), black (0x12), a bright yellow (0x0E) and dark color 2 (0xF2). If you look at the closeup this can be seen (although the two dark colors are very close to each other so I've tweaked a little the colors of the image so you can see it):

So, that leaves us with only 3 lines! We're close. Line 48 simply exchanges the value of DX and AX. Since at the top of the loop DX is calculated again it doesn't matter that it gets overriden by AX. So now the value that was calculted at the beginning of the loop is in AX. If you comment line 49 and run the code you'll see the chessboard pattern but it won't move. That's because if you remember BP register has the 'time'. So substracting BP from AX basically shifts the whole pattern. So, we can ignore it for now. All the magic happens in line 50 which modifies what's in the accumulator (AX).

So, as I mentioned in the previous section, the value of DX is special. The register's high byte (DH) has the y coordinate of the pixel we are looking at and the low byte (DL) has the equivalent of the x coordinate. The only difference is that instead of beeing going from 0 - 319 it goes from 0 - 255. You can take a look at the Jupyter notebook in Appendix 3 to understand this in more detail. So now we need to understand what happens when we XOR these two numbers. To try to understand how it works you can do all possible XOR of the first nibble (4 bits). So we can create an XOR table.

This table might look a bit confusing at first but it's not so complicated. At the top we see the numbers from 0 - 15 (0x0 - 0xf). The same thing with the rows. At the intersection of the two numbers you also see number which is the result of doing the XOR between the row and the column number. This function is symetric across the diagonal since A XOR B is the same as B XOR A. Also at the intersection you can see a small bar with white/red pixes. That is basically the binary representation of the number. So, let's do an example: let's do 0x4 XOR 0x9. If we look at the table the interesction of row 4 and column 9 shows 13 (0xD) which has binary representation 1101. Finally there are 4 graphs below that should look familiar! The first graph shows the value of the first bit for each table. The second graph, the second, and so on. So, continuing with the example 0x4 XOR 0x9 we see that since the result in binary is 1101 then the intersection of row 4 with col 9 (an row 9, col 4) of the first, second and last graph will be red wheras it will be white for the third graph (3rd bit). You can explore this in the Jupyter notebook I include in the repo (and reproduce in Appendix 3)

Let's see how all this works together. We'll look with a little detail how the first row of pixels is rendered. To simplify a little more the analysis we'll set BP=0. During the first row er have: DI = 0, 1, ... 319, DH = 0, 0, ... 0, DL = 0, 0, 1, 2, 3, 4, 4, 5, 6, 7, 8, 8, 9, 10, ..., 253, 254, 255. So, we need to look at this:

 48     xchg dx,ax
 49     sub ax,bp
 50     xor al,ah
 51     or al,0xDB      ; 0xDB = 1101 1011
 52     add al,13h

We can ignore line 49 (BP=0). What determines which of the 4 colors we get for each pixel are the 2nd bit of the high nibble of AL XOR AH and the 3rd bit of the low nibble of the XOR. By looking at the 'XOR' table and the patterns we see that those bits will mix the second and third graphs. You'll probably need to go through it manually or play around with the Jupyter notebook I've provided.

Effect: Circles Zooming

To understand how this effect works we need to dive into a bit more details I've skipped so far. Again, the secret is de content of the DX register. One of the things I hadn't been cleared is that if you look at the initialization of the ES register you'll notice that its not initialized at 0xA000 which is the start of the first line. Its initialized to 0xA000 - 0x10. Since thats a segment address thats exactly 0x80 bytes before the start of VGA RAM. So the first row we are really drawing the second half of it because the first half will never be seen. The following graph shows how the memory is layed out and what values the x and y value take. The table lists for each value of DI what is the content of DH (y coordinate) and DL (x coordinate).

The effect is basically a group of concentric circles centerd at coordinate (100,160). The equation of circle centerd in coordinate x0, y0 is:

(x-x0)² + (y-y0)² = R²

If you look at the table you'll notice that x=0 happens for DI=140 so the equation of a circle in this coordinate space is:

(x)² + (y-y0)² = R²

That means that all coordinates x,y that fall on a circle will have a fixed value. We use this fixed value (R²) will correspond to a color. If we limit the palette we'll get the effect we see in the demo. Let's look at the source code:

  3 fx1:
  4 	mov al, dh
  5 	sub al, 100
  6 	imul al
  7 	xchg dx, ax
  8 	imul al
  9 	add dh, ah
 10 	mov al, dh
 11 	add ax, bp
 12 	and al, circles_pattern
ret

lines 4-5 copies the content of the y coordinate (DH) -> AL, we substract 100 (y0) and then we multiplay by itself to get (y-y0)². Line 7 saves the value of AX -> DX. And leaves the x coordinate in AL. Line 8 multiplies by itself and stores in AX so Ax = AL * AL. One important point is that imul performs a signed multiplication. Let's understand this with a couple of examples. If you have AL = 0x20 (32 in decimal notation) for example, then imul al would yield ax = 0x400. Nothing strange here. Now let's see what happens if al = 0xf0 (240 decimal). A regular multiplication would yield ax = E100 (57600 decimal) but the imul instruction interprets 0xf0 = - 0x10 and so -0x10 * -0x10 = 0x100 and therefore returns a totally different result.

Effect: Tilted plane scrolling

  1 fx0: ; tilted plane, scrolling
  2 	mov ax,0x1329
  3 	add dh,al
  4 	div dh
  5 	xchg dx,ax
  6 	imul dl
  7 	sub dx,bp
  8 	xor ah,dl
  9 	mov al,ah
 10 	and al,tilt_plate_pattern
 11 ret

Pending.

Effect: Parallax checkerboards

  1 fx3: ; parallax checkerboards
  2 	mov cx,bp
  3 	mov bx,-16
  4 fx3L:
  5 	add cx,di
  6 	mov ax,819
  7 	imul cx	 
  8 	ror dx,1	 
  9 	inc bx	 
 10 	ja fx3L
 11 	lea ax,[bx+31]	 
 12 ret

Pending.

Effect: Sierpinski rotozoomer

  1 fx4: ; sierpinski rotozoomer	
  2 	lea cx,[bp-2048]
  3     sal cx,3
  4 	movzx ax,dh
  5 	movsx dx,dl
  6 	mov bx,ax
  7 	imul bx,cx
  8 	add bh,dl
  9 	imul dx,cx
 10 	sub al,dh
 11 	and al,bh
 12 	and al,0b11111100
 13 	salc				; VERY slow on dosbox, but ok
 14 	jnz fx4q
 15 	mov al,sierp_color
 16 	fx4q:
 17 ret

Pending.

Effect: Raycast bent tunnel

  1 fx5: ; raycast bent tunnel
  2 	mov cl,-9
  3 	fx5L: 
  4 	push dx
  5 		mov al,dh
  6 		sub al,100
  7 		imul cl
  8 		xchg ax,dx	
  9 		add al,cl
 10 		imul cl
 11 		mov al,dh
 12 		xor al,ah
 13 		add al,4
 14 		test al,-8
 15 	pop dx
 11 	loopz fx5L
 17 	sub cx,bp
 18 	xor al,cl
 19 	aam tunnel_pattern; VERY slow on dosbox, but ok
 10 	add al,tunnel_base_color
 21 ret

Pending.

Effect: Ocean night / to day sky

  1 fx6: ; ocean night / to day sky
  2 	sub dh,120
  3 	js fx6q
  4 	mov [bx+si],dx
  5 	fild word [bx+si]
  6 	fidivr dword [bx+si]
  7 	fstp dword [bx+si-1]
  8 	mov ax,[bx+si]
  9 	add ax,bp
 10 	and al,128
 11 	dec ax
 12 fx6q:
 13 ret

Pending.

About the Author

Cesar Miquel has been involved with computers since he got his first computer in the 1980′s. He's gone from programming assembly language for the 6809 and 6502 processors, writting C and C++ code for industrial applications, running numerical simulations to web development. Since 1995 he has been mostly using open source software.

Currently he is a partner at Easytech, an Argentinian based company involved in developing web based applications and sites for the enterprise. There he currently leads the development team involved in developing enterprise web applications and sites.

He also holds a PHD in Physics from the University of Buenos Aires were he has done research in quantum computers. He has several publications in scientific journals including one in Nature.

You can find me: Twitter / Instagram / Tumblr / LinkedIn / Easytechgreen

Appendix 1: the 8086 and its architecture

I'm not going to talk mucho about the 8086 and its architecture. I'll just point out a couple of salient features that you need to know in order to follow the code.

The 8086 has the following eight 16 bit registers: AX, BX, CX, DX, SP, BP, SI, DI
Memory addresses in the 8086 are given by a combination of a 'segment' register plus an offset. Usually they are writen [A000:04F0]. A000 is the segment anf 04F0 the offset. To convert that to an absolute address you ned to muliply the segment by 16 and add the offset. So [A000:04F0] corresponds to A04F0.
The 8086 has 4 segment registers that are used in addition to the General purpose registers.
You can search for the most common instructions online.
The following table sums up the reisters and their common use

This image taken from this presentation which has a lot more details on the 8086 processor

Some links 🔗

[Intro and resources to 8086 assembly](Introduction to x86 Assembly Language) - lecture notes from course CIS-77 Introduction to Computer Systems
Modes of Memory Addressing on x86 - lecture notes from same course as above
The list of all interrupts that are currently supported by the 8086 assembler emulator

Appendix 2: VGA mode 13h

For some of the things I needed the RGB colors of all 256 colors. Since I couldn't find them I wrote a little PHP code to go over the image and extract them to RGB values. You can see the code in this gist and download a CSV file with them.

Apendix 3: Jupyter notebook

This Jupyter notebook will be useful to understand some of the ideas behind each of the different effects that appear in the demo. We'll start by looking at the values of the DX registers that is used in all effects. Then we'll go over each individual effect and try to explore the math that goes behind it.

The notebook is under the folder notebook/ and you can download it here: memories-notebook.ipynb.

Jupyter notebook to analyze the different effects used in the demo

This Jupyter notebook will be useful to understand some of the ideas behind each of the different effects that appear in the demo. We'll start by looking at the values of the DX registers that is used in all effects. Then we'll go over each individual effect and try to explore the math that goes behind it.

Analyzing the contents of DX

In the program, all the effects use the DX register which is pre-loaded when the loop starts with the high word of the result of multiplying the DI register with the magic number 0xcccd. To understand why the author uses this number we need to dig a bit into this bit of code:

    mov    ax, 0xcccd
    mul    di

Remember that DI is basically a counter. When you multiply DI (a 16 bit number) with 0xcccd (another 16 bit numbre) you get a 32 bit number. The high part is stored in DX and the lower part in AX. For the code of each effect the value of AX is not relevant so lets focus on DX. Lets look at how it behaves as a function of DI. For this let's do couple of simple graphs. Lets import the Matplotlib and NumPy libraries :

import matplotlib.pyplot as plt
import numpy as np

and set up two arrays to hold the top (DH) and bottom registers (DL) part of DX

result_dh = []
result_dl = []

Lets loop through the first n bytes of DI and calculate DX

n = 0xffff;
for di in range(0, n, 1):

    # mov ax, 0xcccd
    # mul di
    m = (di+160) * 0xcccd
    dx = (m & 0xffff0000) >> 16
    dx = dx & 0xffff
    ax = (m & 0xffff)
    
    result_dl.append(dx & 0xff)
    result_dh.append((dx & 0xff00) >> 8)

Now lets plot the low and high bytes of DX as a function of DI

offset = 0
plt.figure(figsize=(14,5))
plt.subplot(1,2,1)
plt.plot(result_dl[offset:offset+700])
plt.title("DL register as a functoin of DI")

plt.subplot(1,2,2)
plt.plot(result_dl[offset:offset+16])
plt.title("DL register as a functoin of DI (detail)")

plt.show()

plt.plot(result_dh[offset:offset+640*2])
plt.title("DH register as a function of DI")
plt.show()

So we see the following important properties:

Both functions have a period of 320 which is the width of the screen.
In the case of DH we see that its an increasing function which basically adds 1 after 320 bytes. So, in effect this is basically the y coordinate of the pixel at DI.
In the case of DL we see its a sawtooth with period 320. The values go from 0 to 255. Since there are only 256 possible values and the function reaches its maximum when DI is 320 then some values must repeat. If you look closely this can be seen. You can think of this as the x coordinate except that instead of going up to 319 it goes up to 255.

After HellMood published his article this is what he calls the "Rrrola trick". In this way you can get an estimate to y,x without the need to divide and move the results around which require more bytes

Board of Chessboards effect

The first efffect that appear in the demo is generated by this code:

140 fx2: ; board of chessboards
141     xchg dx,ax
142     sub ax,bp
143     xor al,ah
144     or al,0xDB
145     add al,13h
146 ret

If you read the main text you'll learn that the most important thing happening here is the line xor al,ah. To explore this let's look at what the XOR of two 8 bit numbers will look like.

Calculating the XOR of two bytes

Lets calculate the table of all possible XOR between two nibbles (4 bit numbers). We can calculate this easily like this:

matrix = []
for row in range(0, 16):
    matrix.append( map(lambda col: (row ^ col), range(0, 16)) )
    
print np.array(matrix)

[[ 0  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15]
 [ 1  0  3  2  5  4  7  6  9  8 11 10 13 12 15 14]
 [ 2  3  0  1  6  7  4  5 10 11  8  9 14 15 12 13]
 [ 3  2  1  0  7  6  5  4 11 10  9  8 15 14 13 12]
 [ 4  5  6  7  0  1  2  3 12 13 14 15  8  9 10 11]
 [ 5  4  7  6  1  0  3  2 13 12 15 14  9  8 11 10]
 [ 6  7  4  5  2  3  0  1 14 15 12 13 10 11  8  9]
 [ 7  6  5  4  3  2  1  0 15 14 13 12 11 10  9  8]
 [ 8  9 10 11 12 13 14 15  0  1  2  3  4  5  6  7]
 [ 9  8 11 10 13 12 15 14  1  0  3  2  5  4  7  6]
 [10 11  8  9 14 15 12 13  2  3  0  1  6  7  4  5]
 [11 10  9  8 15 14 13 12  3  2  1  0  7  6  5  4]
 [12 13 14 15  8  9 10 11  4  5  6  7  0  1  2  3]
 [13 12 15 14  9  8 11 10  5  4  7  6  1  0  3  2]
 [14 15 12 13 10 11  8  9  6  7  4  5  2  3  0  1]
 [15 14 13 12 11 10  9  8  7  6  5  4  3  2  1  0]]

Each element in this matrix is also a nible so we can look at its binary representation. So, if we print this same matrix but each element has the binary representation we can create 4 new matrices with a 0 or 1 in it corresponding to the first, second, etc bits. Let's look at how this 4 matrices look like:

import matplotlib
import copy

cmap = matplotlib.colors.ListedColormap(["white", [.12,.1,.17,1]])

plt.figure(figsize=(10,10))
i = 1
for mask in [0b1000, 0b0100, 0b0010, 0b0001]:
    bits = copy.deepcopy(matrix)
    for row in range(0, 16):
        for col in range(0, 16):
            bits[row][col] = matrix[row][col] & mask

    plt.subplot(2, 2, i)
    plt.imshow(bits, interpolation='none', cmap=cmap, origin='upper' )
    plt.xticks([0,15],['0', '0xf'])
    plt.yticks([0,15],['0', '0xf'])
    plt.axis("off")
    plt.title("Values of matrix with bit %d" % (i))
    i = i + 1

plt.show()

These should start to look familiar.

Simulating the effect in Python code

To study a bit more we can write the effect as Python code. I've named all my variables with the name of the corresponding register. You'll notice that often I need to do an AND (&) operation with 0xFF or 0xFFFF to make sure the variable doesn't exceed the corresponding register size. I've commented each section of code with the corrresponding part of the orginal assembler effect.

We'll need to read the VGA palette values to use from here on. Let's read the palette froma Git repo I have with a CSV of all the colors and display the first 64 colors:

# Read VGA palette: the row number is the index into the array and the first 3 columns are the RGB values
vga_palette = np.genfromtxt('https://gist.githubusercontent.com/cesarmiquel/1780ab6078b9735371d1f10a9d60d233/raw/fb1ab3d46c81f1d43a11838f82cde8df767568d7/vga-palette.csv', delimiter=',')

rgb = []
numrows = 16
for color in range(0, 16 * numrows):
    rgb_pixel = vga_palette[ color ]
    rgb.append(int(rgb_pixel[0]))
    rgb.append(int(rgb_pixel[1]))
    rgb.append(int(rgb_pixel[2]))

rgb_m = np.array(rgb)
rgb_m.shape = (numrows, 16, 3)

plt.figure(figsize=(8,8))
plt.imshow(rgb_m, interpolation='none',origin='upper' )
plt.axis("off")
plt.show()

# ---------------------------------------------------------
# This code simulates the effect
# ---------------------------------------------------------
vga_memory = []

bp = 0                # you can change bp manually to see what happens

for di in range(0, 0xffff + 1):

    # mov ax, 0xcccd
    # mul di
    m =  di * 0xcccd
    dx = (m & 0xffff0000) >> 16
    dx = dx & 0xffff
    
    # Start of effect code ------------------------------------
    # xchg  ax, dx
    ax = dx
    
    # sub ax, bp
    ax = ax - bp
    
    # xor al, ah     ; calculate x XOR y
    al = 0xff & ax
    ah = (0xff00 & ax) >> 8
    al = al ^ ah 
    
    # or al, 0xDB
    al = al | 0xDB
    
    # add al, 13h
    al = al + 0x13
    al = al & 0xff # convert to 8bit
    
    # End of effect code ---------------------------------------
    
    # Save byte to Video RAM
    vga_memory.append(al)

    
# Plot the contents of VGA RAM with correct colors
bits = np.mat(vga_memory[160:320*200+160])
bits = bits.reshape(200, 320)

rgb = []
for row in range(0, 200):
    for col in range(0, 320):
        rgb_pixel = vga_palette[ bits[row, col] ]
        rgb.append(int(rgb_pixel[0]))
        rgb.append(int(rgb_pixel[1]))
        rgb.append(int(rgb_pixel[2]))

rgb_m = np.array(rgb)
rgb_m.shape = (200, 320, 3)

plt.figure(figsize=(16,16))
plt.imshow(rgb_m, interpolation='none',origin='lower' )
plt.xticks([0,320],['0', '320'])
plt.yticks([0,00],['0', '200'])
#plt.axis("off")
plt.show()

Zooming circles effect

	mov al,dh		; get Y in AL
	sub al,100		; align Y vertically
	imul al			; AL = Y²
	xchg dx,ax		; Y²/256 in DH, X in AL
	imul al			; AL = X²
	add dh,ah		; DH = (X² + Y²)/256
	mov al,dh		; AL = (X² + Y²)/256
	add ax,bp		; offset color by time
	and al,8+16		; select special rings

# ---------------------------------------------------------
# This code simulates the effect
# ---------------------------------------------------------

# Simulate signed 8 bit multiplation
# for two signed bytes as input and 16 bit out
def imul8(a, b):
    if a > 0x80:
        a = a - 0x100
    if b > 0x80:
        b = b - 0x100
    return (a * b) & 0xffff

vga_memory = []
bp         = 0                # you can change bp manually to see what happens

for di in range(0, 0xffff + 1):
    
    # mov ax, 0xcccd
    # mul di
    m = di * 0xcccd
    dx = (m & 0xffff0000) >> 16
    ax = (m & 0xffff)

    dl = dx & 0xff
    dh = (dx & 0xff00) >> 8
    al = ax & 0xff
    ah = (ax & 0xff00) >> 8
    
    # Start of effect code ------------------------------------
    # mov al, dh
    al = dh

    # sub al, 100
    al = (al - 100)

    # imul al
    ax = imul8(al, al)
    
    # xchg dx, ax
    temp = dx
    dx = ax
    ax = temp
    
    dl = dx & 0xff
    dh = (dx & 0xff00) >> 8
    al = ax & 0xff
    ah = (ax & 0xff00) >> 8

    # imul al
    ax = imul8(al, al)
    al = ax & 0xff
    ah = (ax & 0xff00) >> 8
    
    # add dh, ah
    dh = (dh + ah) & 0xff
    
    # mov al,dh     ; AL = (X² + Y²)/256
    al = dh
    
    # add ax,bp     ; offset color by time
    ax = (ah << 8) + al + bp
    ax = ax & 0xffff
    al = (ax & 0xff)
    
    # and al,8+16   ; select special rings
    al = al & 0b00011000

    # End of effect code ---------------------------------------
    
    # Save byte to Video RAM
    vga_memory.append(al)
    
# Plot the contents of VGA RAM with correct colors
bits = np.mat(vga_memory[160:320*200+160])
bits = bits.reshape(200, 320)

rgb = []
for row in range(0, 200):
    for col in range(0, 320):
        rgb_pixel = vga_palette[ bits[row, col] ]
        rgb.append(int(rgb_pixel[0]))
        rgb.append(int(rgb_pixel[1]))
        rgb.append(int(rgb_pixel[2]))

rgb_m = np.array(rgb)
rgb_m.shape = (200, 320, 3)

plt.figure(figsize=(16,16))
plt.imshow(rgb_m, interpolation='none',origin='upper' )
plt.xticks([0,320],['0', '320'])
plt.yticks([0,200],['0', '200'])
plt.axis("off")
plt.show()

An important thing to notice is the use of imul instead of the regular multiplication opcode mul. Below we plot the result of doing x*2 when x goes from 0 to 0xff and on the same plot we show the result of doing imul x.

result = []
for a in range(0, 0xff):
    result.append([a*a, imul8(a,a)])
plt.plot(result)
plt.xticks([0,128,255],['0', '0x80', '0xff'])
plt.yticks([0,0x4000, 0x8000, 0xc000, 0xffff],['0', '0x4000', '0x8000', '0xc000', '0xffff'])
plt.show()

Name		Name	Last commit message	Last commit date
Latest commit History 22 Commits
imgs		imgs
notebook		notebook
original		original
.gitignore		.gitignore
dosbox-0.74-3.conf		dosbox-0.74-3.conf
function.php		function.php
fx2.asm		fx2.asm
fx2.com		fx2.com
memories-test.asm		memories-test.asm
readme.md		readme.md
run.sh		run.sh

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Analysis of Revision'2020 PC 256 byte intro winner "Memories" by HellMood / DESiRE

Pre-requisites

Main idea of the Algorithm

Lines 1 - 13: initialization

Lines 14 - 58: main loop

Lines 61-70: Misc code

Effect: Board of chessboards

Effect: Circles Zooming

Effect: Tilted plane scrolling

Effect: Parallax checkerboards

Effect: Sierpinski rotozoomer

Effect: Raycast bent tunnel

Effect: Ocean night / to day sky

About the Author

Appendix 1: the 8086 and its architecture

Some links 🔗

Appendix 2: VGA mode 13h

Apendix 3: Jupyter notebook

Jupyter notebook to analyze the different effects used in the demo

Analyzing the contents of DX

Board of Chessboards effect

Calculating the XOR of two bytes

Simulating the effect in Python code

Zooming circles effect

About

Releases

Packages

Contributors 2

Languages

cesarmiquel/memories-256b-msdos-intro

Folders and files

Latest commit

History

Repository files navigation

Analysis of Revision'2020 PC 256 byte intro winner "Memories" by HellMood / DESiRE

Pre-requisites

Main idea of the Algorithm

Lines 1 - 13: initialization

Lines 14 - 58: main loop

Lines 61-70: Misc code

Effect: Board of chessboards

Effect: Circles Zooming

Effect: Tilted plane scrolling

Effect: Parallax checkerboards

Effect: Sierpinski rotozoomer

Effect: Raycast bent tunnel

Effect: Ocean night / to day sky

About the Author

Appendix 1: the 8086 and its architecture

Some links 🔗

Appendix 2: VGA mode 13h

Apendix 3: Jupyter notebook

Jupyter notebook to analyze the different effects used in the demo

Analyzing the contents of DX

Board of Chessboards effect

Calculating the XOR of two bytes

Simulating the effect in Python code

Zooming circles effect

About

Resources

Stars

Watchers

Forks

Releases

Packages 0

Contributors 2

Languages

Packages