Sega Megadrive – 8: Animated Sprites

This didn’t take nearly as long as I expected; I managed to knock it out in one afternoon, all without looking at an instruction set reference. This stuff is finally beginning to sink in!

There are several options for animating a sprite, each with its own advantages, quirks and setbacks. The obvious one is to load all tiles for all frames into VRAM, and modify the tile ID in the sprite attribute table each frame. This is the fastest method (in terms of clock cycles), but consider a 32×32 sprite: that’s 16 tiles (512 bytes) per frame, so just 8 frames of animation costs 4 KB of VRAM for a very short animation (and I’m pretty sure the majority will end up having more than 8 frames each). The second method is to reserve space for just one frame in VRAM, and upload a new frame during vblanking as required. Straight from ROM, I’d imagine this to be a little slow, which leads us to the third method: caching it to RAM first.
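To put numbers on the VRAM trade-off, here is a rough host-side sketch in Python (not part of the ROM build). It assumes 8×8 tiles at 32 bytes each, as used throughout these articles; the function names are made up for illustration.

```python
TILE_BYTES = 32  # one 8x8 tile at 4 bits per pixel


def frame_bytes(width_px, height_px):
    """VRAM bytes needed for one frame of a sprite of the given pixel size."""
    tiles = (width_px // 8) * (height_px // 8)
    return tiles * TILE_BYTES


def vram_cost(width_px, height_px, frames_resident):
    """VRAM bytes used when `frames_resident` frames are kept in VRAM at once."""
    return frame_bytes(width_px, height_px) * frames_resident


all_frames = vram_cost(32, 32, 8)  # method 1: every frame resident
one_frame = vram_cost(32, 32, 1)   # methods 2 and 3: one frame, re-uploaded on change
print(all_frames, one_frame)
```

Keeping a single frame resident costs the same 512 bytes no matter how long the animation gets, which is the whole appeal of the upload-on-change approach.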

Without any facts and figures to back up my claims of ROM access being slower than RAM (perhaps I’ll write a test later and see for sure), I’ll opt for uploading a new frame to VRAM during vblanking, straight from ROM. If it turns out to be slow later I’ll rethink the method, but for now I’d like to keep the code as simple as I can, without forking out the VRAM cost (since I know for sure that will be an issue later down the line).

I’ve created a small test sprite – 4 frames of 32×32 (16 tiles), each containing one letter of the word SEGA – exported one at a time in exactly the same way as in the previous articles (convert to Indexed Colour Mode in GIMP, save as BMP, open in BMP2Tile, set Sprite Output Mode, press the * key to select the whole bitmap, Save Tiles in ASM):

Let’s go!

Organising Sprite Tile Data

Again, I’m keen to use the preprocessor to make this as painless as possible. This time round, I need a bit more info than the sprite’s size – I’ll need to know each frame’s size in bytes and tiles, too:


SegaLogo:                                             ; Sprite start address

dc.l    $00000000
dc.l    $00000000
dc.l    $00000000
dc.l    $00000000
dc.l    $00000000
dc.l    $00000001
dc.l    $00000001
dc.l    $00000011

; Rest of frame 1...


dc.l    $00000000
dc.l    $00000000
dc.l    $00000000
dc.l    $00000000
dc.l    $00000000
dc.l    $00000001
dc.l    $00000001
dc.l    $00000011

; etc...

SegaLogoEnd                                           ; Sprite end address
SegaLogoSizeB:      equ (SegaLogoEnd-SegaLogo)        ; Animated sprite size in bytes
SegaLogoSizeT:      equ (SegaLogoSizeB/32)            ; Animated sprite size in tiles
SegaLogoSizeF:      equ (SegaLogoSizeT/16)            ; Animated sprite size in frames
SegaLogoOneFrameT:  equ (SegaLogoSizeT/SegaLogoSizeF) ; Size of one frame in tiles
SegaLogoOneFrameB:  equ (SegaLogoSizeB/SegaLogoSizeF) ; Size of one frame in bytes
SegaLogoTileID:     equ (SegaLogoVRAM/32)             ; ID of first tile
SegaLogoDimentions: equ (%1111)                       ; Sprite dimensions (4x4)

I’ve also added the sprite dimensions in a binary nybble, ready for use in the sprite attribute table, since it’s very much a static value and belongs here.
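The chain of equ definitions is easy to check by hand with a small Python model. This is only a sketch: the VRAM address 0x0C00 is a made-up example value, and the dictionary keys simply mirror the suffixes of the assembler symbols.

```python
TILE_BYTES = 32        # bytes per 8x8 tile
TILES_PER_FRAME = 16   # a 4x4 grid of tiles


def sprite_sizes(total_bytes, vram_addr):
    """Mirror the SegaLogo* equ chain for an animated sprite."""
    size_t = total_bytes // TILE_BYTES    # SegaLogoSizeT
    size_f = size_t // TILES_PER_FRAME    # SegaLogoSizeF
    return {
        "SizeB": total_bytes,
        "SizeT": size_t,
        "SizeF": size_f,
        "OneFrameT": size_t // size_f,
        "OneFrameB": total_bytes // size_f,
        "TileID": vram_addr // TILE_BYTES,
    }


# Four frames of 16 tiles each, loaded at a hypothetical VRAM address.
sizes = sprite_sizes(total_bytes=4 * 16 * 32, vram_addr=0x0C00)
print(sizes)
```

For the four-frame SEGA sprite this gives 2048 bytes in total, 64 tiles, 4 frames, and 16 tiles (512 bytes) per frame.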

As for its assetmap entry:

SegaLogoVRAM:   equ Sprite2VRAM+Sprite2SizeB
                ; Remember, what comes next needs to be at (SegaLogoVRAM+SegaLogoOneFrameB),
                ; we only keep one anim frame in VRAM at a time

Sprite Animation Data

I need to define the order in which the frames appear, and for how long they appear. I’ve chosen a very simple (if a little crude) method of an array of bytes, each representing the frame ID to show for each game frame:

SegaLogoAnimData:    ; Animation data (which sprite frame gets displayed for each game frame)

dc.b    0, 0, 0, 0, 1, 1, 1, 1, 2, 2, 2, 2, 3, 3, 3, 3

SegaLogoAnimDataEnd  ; End of animation data
SegaLogoAnimNumFrames: equ (SegaLogoAnimDataEnd-SegaLogoAnimData) ; Number of frames in animation data

At the very least, this would compress down well if/when I choose to implement some compression for the art assets. Grabbing the number of frames using the preprocessor is easy too – it’s just the number of bytes it takes up. This allows for up to 256 frames in the image, which I’m pretty sure won’t be a restriction.
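Writing the byte array by hand gets tedious for longer animations. A minimal Python sketch of a generator follows, assuming the animation is described as (frame ID, duration in game frames) pairs; that pair format and both function names are my own invention for illustration.

```python
def expand_anim(steps):
    """Expand (frame_id, duration) pairs into one frame ID per game frame."""
    data = []
    for frame_id, duration in steps:
        if not 0 <= frame_id <= 255:
            raise ValueError("frame IDs must fit in a byte")
        data.extend([frame_id] * duration)
    return data


def to_dc_b(data):
    """Format the expanded data as a single dc.b line."""
    return "dc.b    " + ", ".join(str(value) for value in data)


anim = expand_anim([(0, 4), (1, 4), (2, 4), (3, 4)])
print(to_dc_b(anim))
```

With four game frames per sprite frame this reproduces the dc.b line above.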

Advancing a Sprite Frame

First, I’ll need a counter to keep track of the current frame. This means some organisation of the memory map – I’ve based it on the VRAM asset map method, and added a few defines for various type sizes to make it easier, since it looks like I’m in this for the long run:

; **********************************************
; Various size-ofs to make this easier/foolproof
; **********************************************
SizeByte:    equ 1
SizeWord:    equ 2
SizeLong:    equ 4
SizeTile:    equ 32
SizePalette: equ 64

; ************************************
; System stuff
; ************************************
hblank_counter        equ 0x00FF0000                ; Start of RAM
vblank_counter        equ (hblank_counter+SizeLong)

; ************************************
; Game globals
; ************************************
segalogo_anim_frame   equ (vblank_counter+SizeLong) ; vblank_counter is a long, so skip 4 bytes
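The rule behind this memory map is that each symbol starts where the previous one ends, so the size added must be the size of the previous variable. A small Python sketch of that bookkeeping (the RamMap class is hypothetical, purely to show the arithmetic):

```python
class RamMap:
    """Hand out consecutive RAM addresses, one variable after another."""

    def __init__(self, base=0x00FF0000):
        self.next_addr = base
        self.symbols = {}

    def alloc(self, name, size):
        """Reserve `size` bytes for `name` and return its address."""
        self.symbols[name] = self.next_addr
        self.next_addr += size
        return self.symbols[name]


SIZE_BYTE, SIZE_LONG = 1, 4

ram = RamMap()
ram.alloc("hblank_counter", SIZE_LONG)
ram.alloc("vblank_counter", SIZE_LONG)
ram.alloc("segalogo_anim_frame", SIZE_BYTE)
for name, addr in ram.symbols.items():
    print(f"{name:22s} 0x{addr:08X}")
```

Both counters are incremented with addi.l, so each occupies four bytes and the animation frame counter lands at 0x00FF0008.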

To advance the animation by one frame, we need to perform the following steps:

  • Get the current and next frame IDs from the anim data array
  • Increment the frame counter (and wrap round to zero if at end of anim)
  • If the current and next are the same, do nothing, otherwise…
  • Multiply the frame ID with the size of one frame, then add to the address of sprite tile data to get the ROM address of the new frame
  • Move the new sprite frame to VRAM (LoadTiles will do the job)

Here’s the resulting code in one dump, but hopefully the comments should explain each step well enough:

AnimateSpriteFwd:
  ; Advance sprite to next frame
  ; d0 (w) Sprite address (VRAM)
  ; d1 (w) Size of one sprite frame (in bytes)
  ; d2 (w) Number of anim frames
  ; a0 --- Address of sprite data (ROM)
  ; a1 --- Address of animation data (ROM)
  ; a2 --- Address of animation frame counter (RAM, writeable)

  clr.l  d3              ; Clear d3
  move.b (a2), d3        ; Read current anim frame number (d3)
  addi.b #0x1, (a2)      ; Advance frame number
  cmp.b  (a2), d2        ; Check new frame number against num anim frames
  bne    @NotAtEnd       ; Branch if we haven't reached the end of anim
  move.b #0x0, (a2)      ; At end of anim, wrap frame counter back to zero
  @NotAtEnd:

  clr.l  d4              ; Clear d4 and d5, only their low bytes get written below
  clr.l  d5
  move.b (a1,d3.w), d4   ; Get original frame index (d4) from anim data array
  clr.l  d2              ; Clear d2 so it can be used as a word index
  move.b (a2), d2        ; Read next anim frame number (d2)
  move.b (a1,d2.w), d5   ; Get next frame index (d5) from anim data array

  cmp.b  d4, d5          ; Has anim frame index changed?
  beq    @NoChange       ; If not, there's nothing more to do

  ; spriteDataAddr = spriteDataAddr + (sizeOfFrame * newFrameIndex)
  move.l a0, d2          ; Move sprite data ROM address to d2 for the maths
  move.w d1, d4          ; Move size of one sprite frame to d4 (can't trash d1, it's needed later)
  mulu.w d5, d4          ; Multiply with new frame index to get new ROM offset (long result in d4)
  add.l  d4, d2          ; Add to sprite data address
  move.l d2, a0          ; Back to address register

  lsr.w  #0x5, d1        ; Bytes to tiles (divide by 32), assuming LoadTiles takes a tile count as in the first-frame load
  jsr    LoadTiles       ; New tile address is in a0, VRAM address already in d0, num tiles in d1
  @NoChange:
  rts


No new trickery, it uses all of the familiar opcodes. It does require a large number of parameters, though. If any routines require any more than this, then perhaps it’s time to start passing them in via the stack.
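The five steps listed earlier can also be modelled on the host to check the wrap-around and the upload-only-on-change behaviour. This Python sketch is a model of the intended logic, not a translation of the routine; the names advance_frame and run are hypothetical.

```python
SEGA_LOGO_ANIM = [0, 0, 0, 0, 1, 1, 1, 1, 2, 2, 2, 2, 3, 3, 3, 3]
FRAME_BYTES = 512  # one 4x4-tile frame at 32 bytes per tile


def advance_frame(counter, anim_data, frame_bytes):
    """Advance the counter; return (new_counter, ROM offset to upload or None)."""
    current_id = anim_data[counter]
    new_counter = (counter + 1) % len(anim_data)  # wrap at end of anim
    next_id = anim_data[new_counter]
    if next_id == current_id:
        return new_counter, None                  # same frame, nothing to upload
    return new_counter, next_id * frame_bytes     # offset from start of tile data


def run(game_frames):
    """Collect the ROM offsets uploaded over a number of game frames."""
    counter = 0
    uploads = []
    for _ in range(game_frames):
        counter, offset = advance_frame(counter, SEGA_LOGO_ANIM, FRAME_BYTES)
        if offset is not None:
            uploads.append(offset)
    return uploads


print(run(16))
```

Over one full pass of the 16-entry animation only four uploads happen, one per letter, with the last one wrapping back to offset 0.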

Putting it to Use

First, we need to load in the first frame ready:

lea      SegaLogo, a0           ; Move animated sprite address to a0
move.l   #SegaLogoVRAM, d0      ; Move VRAM dest address to d0
move.l   #SegaLogoOneFrameT, d1 ; Move number of tiles (in one anim frame only) to d1
jsr      LoadTiles              ; Jump to subroutine

…and not forgetting the sprite attribute entry, with its own palette (just contains transparency and blue):

dc.w 0x0000             ; Y coord (+ 128)
dc.b SegaLogoDimentions ; Width (bits 2-3) and height (bits 0-1) in tiles
dc.b 0x00               ; Index of next sprite (linked list)
dc.b 0x40               ; H/V flipping (bits 3/4), palette index (bits 5-6), priority (bit 7)
dc.b SegaLogoTileID     ; Index of first tile
dc.w 0x0000             ; X coord (+ 128)

Then all that’s needed is to advance the sprite animation during vblanking:

move.l  #SegaLogoVRAM, d0          ; Move sprite VRAM address to d0
move.l  #SegaLogoOneFrameB, d1     ; Move size of one anim frame (in bytes) to d1
move.l  #SegaLogoAnimNumFrames, d2 ; Move number of anim frames (size of anim data in bytes) to d2
lea     SegaLogo, a0               ; Move address of sprite data (ROM) to a0
lea     SegaLogoAnimData, a1       ; Move address of anim data (ROM) to a1
lea     segalogo_anim_frame, a2    ; Move address of current anim frame (RAM) to a2
jsr     AnimateSpriteFwd           ; Advance sprite animation

To slow down the game loop to see that each frame is displayed correctly (just for debugging purposes), I can actually make use of the delay function I wrote and abandoned in the previous article:

move.l #0x18, d0
jsr WaitFrames

That’s all I’ve got. There are plenty of improvements that could be made over time, such as writing the equivalent AnimateSpriteRev, and passing an animation speed param to slow it down or speed it up (which I’ll implement sooner rather than later). Here’s a dodgy GIF showing the obvious:


Source Code

Assemble with:

asm68k.exe /p spritetest.asm,spritetest.bin

Sega Megadrive – 7: Gamepad Input and the Game Loop

Since I plan for the next few articles to be about plane scrolling and sprite animation, I’d like to experiment with a few of the prerequisites first – basic pad input, timing, and the main game loop. I already have a sprite on screen, plus some subroutines for setting its X and Y coords, so I’ll aim to move it around the screen using the D-pad at various speeds. Hopefully this won’t take long.

Timing seems pretty awkward; in modern games programming I’m used to recording a frame’s delta time and using that to determine how far a character should move in one frame. This would require some extra maths which could bog the 68k down, and since the VDP has a fixed refresh rate the common technique seems to be to use hard-coded speed values, but wait for vsync at the end of the game loop. It sounds very hacky to me, since I was taught to use time deltas to achieve FPS independence all through my programming career, but let’s see how it goes.

Polling gamepad input

Gamepads are interacted with through the port control and port data addresses. These are generic 9-pin serial ports, used to connect pads, joysticks, light guns, even modems, but for simplicity I’ll assume we only have gamepads connected for now. I’ll also assume we’re not interested in port C, which is the EXT port on the back of the American Genesis model 1.

To read a pad’s state, we need to read from its data port – there’s one per port, 0x00A10003 and 0x00A10005. Only a byte at a time can be read from these addresses, and to tell the port whether we want the upper or lower byte returned we have to write bit 6 (0x40) to it first. A typical read for all of the buttons goes something like this:

  • Read one byte from data port to a register (contains 00SA0000)
  • Shift the data to the upper byte
  • Write bit 6 ON to the port
  • Read one byte again to the register (contains 00CBRLDU)
  • Write bit 6 OFF to the port to put it back to normal

This should be pretty simple in code:

ReadPad1:
   ; d0 (w) - Return result
   move.b pad_data_a, d0    ; Read upper byte from data port
   rol.w  #0x8, d0          ; Move to upper byte of d0
   move.b #0x40, pad_data_a ; Write bit 6 to data port
   move.b pad_data_a, d0    ; Read lower byte from data port
   move.b #0x00, pad_data_a ; Put data port back to normal
   rts

After calling the subroutine, d0 should contain a word with bits representing Up, Down, Left, Right, A, B, C and Start, in this format: 00SA0000 00CBRLDU. I’ve added some defines to make it easy to BTST the word to check if a particular button is being held down:
pad_button_up    equ 0x0
pad_button_down  equ 0x1
pad_button_left  equ 0x2
pad_button_right equ 0x3
pad_button_a     equ 0xC
pad_button_b     equ 0x4
pad_button_c     equ 0x5
pad_button_start equ 0xD
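To check the bit positions against the 00SA0000 00CBRLDU layout, here is a small Python decoder. It assumes the buttons are active low (a cleared bit means the button is held), which is how the game loop code treats them when it branches on a set bit as "button off"; the function name is hypothetical.

```python
PAD_BUTTONS = {
    "up": 0x0, "down": 0x1, "left": 0x2, "right": 0x3,
    "b": 0x4, "c": 0x5, "a": 0xC, "start": 0xD,
}


def pressed_buttons(pad_word):
    """Return the set of buttons whose bit is clear in the pad word."""
    return {name for name, bit in PAD_BUTTONS.items()
            if not (pad_word >> bit) & 1}


ALL_RELEASED = 0x303F  # every button bit set: 00110000 00111111
held = pressed_buttons(ALL_RELEASED & ~(1 << 0xC) & ~(1 << 0x3))
print(sorted(held))
```

Clearing bits 0xC and 0x3 in an otherwise idle word reports A and Right as held, matching the defines.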

Getting the state of a 6 button pad is a little more complex – it requires bit 6 to be set ON, and then OFF, and then ON again to retrieve the 3rd byte of data. For the moment I’ll leave it out; I don’t own any 6 button controllers for testing anyway.

Waiting for vertical blanking

In order to update a sprite’s position without causing any tearing or flickering, it’s best to modify it during vertical blanking. This is the period during which the electron beam has reached the bottom-right hand side of the screen and is in the process of moving back up to the top-left. To test for this state, we need to poll the VDP’s status register. This is as simple as reading a word from the VDP control port. The word’s bits represent the following:

  • 0: Region mode: OFF=NTSC, ON=PAL
  • 1: ON during a DMA operation
  • 2: ON during horizontal blanking
  • 3: ON during vertical blanking
  • 4: ON during odd frame in interlaced mode
  • 5: ON whilst two sprites have non-transparent pixels colliding
  • 6: ON whilst too many sprites are on a single scanline
  • 7: ON during a vertical interrupt
  • 8: ON if FIFO is full
  • 9: ON if FIFO is empty
  • 10-15: Unused
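The list above maps directly onto bit masks. A minimal Python sketch of a decoder follows; the short flag names are my own labels for the entries in the list.

```python
STATUS_BITS = {
    0: "pal", 1: "dma_busy", 2: "hblank", 3: "vblank", 4: "odd_frame",
    5: "sprite_collision", 6: "sprite_overflow", 7: "vint",
    8: "fifo_full", 9: "fifo_empty",
}

VBLANK_MASK = 1 << 3  # the value to AND the status word with


def decode_status(word):
    """Return the names of all flags that are ON in a VDP status word."""
    return {name for bit, name in STATUS_BITS.items() if word & (1 << bit)}


print(hex(VBLANK_MASK), sorted(decode_status(0x0208)))
```

Vertical blanking is entry 3 in the list, so its mask is 0x0008.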

We’re interested in bit 3 (mask 0x0008), which will get turned ON whilst the screen is being blanked to perform a vertical retrace, and OFF whilst the screen is active and drawing:

WaitVBlankStart:
   move.w vdp_control, d0 ; Move VDP status word to d0
   andi.w #0x0008, d0     ; AND with bit 3 (vblank), sets the zero flag if the bit is clear
   beq    WaitVBlankStart ; Loop while the bit is clear (vblank hasn't started yet)
   rts

WaitVBlankEnd:
   move.w vdp_control, d0 ; Move VDP status word to d0
   andi.w #0x0008, d0     ; AND with bit 3 (vblank), sets the zero flag if the bit is clear
   bne    WaitVBlankEnd   ; Loop while the bit is set (still in vblank)
   rts

Waiting for the vertical blanking has a second advantage – it happens once every 50th (PAL) or 60th (NTSC) of a second, so it forces our game loop to run at a maximum of 50 or 60 FPS.

Putting it all together

I’m assuming the whole idea is to ensure that the game code is fast enough to execute inside one whole frame, before the VBlank occurs, to keep it running at 50 (PAL) or 60 (NTSC) frames per second. If it oversteps the mark, it’ll have to wait until the next VBlank, which will halve the framerate to 25 or 30. This all sounds awkward to me, but let’s see how it pans out when I make a start on the actual game.
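The effect of missing a VBlank is easy to quantify. A rough Python sketch, assuming the loop always waits for the next VBlank once its work is done (the function name and the fractional "work_frames" parameter are hypothetical):

```python
import math


def effective_fps(refresh_hz, work_frames):
    """Frame rate when each loop iteration takes `work_frames` display frames of work."""
    frames_per_loop = max(1, math.ceil(work_frames))  # always wait for a whole VBlank
    return refresh_hz / frames_per_loop


print(effective_fps(60, 0.8), effective_fps(60, 1.1), effective_fps(50, 1.1))
```

Work that fits inside one frame runs at the full refresh rate; overrunning by even a small amount costs a whole extra frame and halves the rate.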

Building on the code from the last article, I can now write a game loop which will check the gamepad data and set the sprite’s X and Y coordinates during vertical blanking, whilst maintaining 50 or 60 frames per second. I’ve also added a check for the A button, which will increase the speed of the sprite’s movement:

 move.l #0x80, d4 ; Store X pos in d4
 move.l #0x80, d5 ; Store Y pos in d5

 ; ************************************
 ; Main game loop
 ; ************************************
GameLoop:

 ; ************************************
 ; Read gamepad input
 ; ************************************
 jsr ReadPad1 ; Read pad 1 state, result in d0

 move.l #0x1, d6              ; Default sprite move speed in d6

 btst   #pad_button_a, d0     ; Check A button
 bne    @NoA                  ; Branch if button off
 move.l #0x2, d6              ; Double sprite move speed
 @NoA:

 btst   #pad_button_right, d0 ; Check right button
 bne    @NoRight              ; Branch if button off
 add.w  d6, d4                ; Increment sprite X pos by move speed
 @NoRight:

 btst   #pad_button_left, d0  ; Check left button
 bne    @NoLeft               ; Branch if button off
 sub.w  d6, d4                ; Decrement sprite X pos by move speed
 @NoLeft:

 btst   #pad_button_down, d0  ; Check down button
 bne    @NoDown               ; Branch if button off
 add.w  d6, d5                ; Increment sprite Y pos by move speed
 @NoDown:

 btst   #pad_button_up, d0    ; Check up button
 bne    @NoUp                 ; Branch if button off
 sub.w  d6, d5                ; Decrement sprite Y pos by move speed
 @NoUp:

 ; ************************************
 ; Update sprites during vblank
 ; ************************************

 jsr    WaitVBlankStart ; Wait for start of vblank

 move.w #0x0, d0        ; Sprite ID
 move.w d4, d1          ; X coord
 jsr    SetSpritePosX   ; Set X coord
 move.w d5, d1          ; Y coord
 jsr    SetSpritePosY   ; Set Y coord

 jsr    WaitVBlankEnd   ; Wait for end of vblank

 jmp    GameLoop        ; Back to the top

More timing

I’ve been experimenting with some code which delays for a set number of frames. It’s currently of no use to me, but along the way it’s forced me to take a more detailed look at the h/v-sync interrupts and the 68000 status register, so I’ll share my findings.

First, a correction. My original code for VDP registers 1 and 2 makes the following claims:

dc.b 0x20 ; 0: Horiz. interrupt on, plus bit 2 (unknown, but docs say it needs to be on)
dc.b 0x74 ; 1: Vert. interrupt on, display on, DMA on, V28 mode (28 cells vertically), + bit 2

The values aren’t very useful, and the comments aren’t strictly true. I’ve had a good read of a document (in references) laying out each bit of each register and the following makes better sense:

dc.b 0x14 ; 0: Horiz. interrupt on, display on
dc.b 0x74 ; 1: Vert. interrupt on, screen blank off, DMA on, V28 mode (28 cells vertically), Genesis mode on

The mysterious “bit 2” of the first two registers are actually compatibility modes for the SEGA Master System. Bit 2 of register 1 OFF sets 8 colours per palette, and bit 2 of register 2 OFF puts the VDP in SMS display mode. The first register needed fixing up to turn on horizontal sync interrupts.
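Breaking the register bytes into individual bits makes the correction easier to follow. In this Python sketch the bit positions for each feature are an assumption based on common VDP register references rather than anything stated in this article, and only the features discussed here are named.

```python
# Assumed bit positions (not taken from the article itself).
REG0_BITS = {4: "horizontal interrupt on", 2: "full palette (SMS compatibility off)"}
REG1_BITS = {
    6: "display on", 5: "vertical interrupt on", 4: "DMA on",
    3: "V30 mode (off = V28)", 2: "Genesis mode on",
}


def describe(value, names):
    """List the meaning of every set bit in a register byte, high bit first."""
    result = []
    for bit in range(7, -1, -1):
        if value & (1 << bit):
            result.append(names.get(bit, f"bit {bit} (not covered here)"))
    return result


print(describe(0x20, REG0_BITS))
print(describe(0x14, REG0_BITS))
print(describe(0x74, REG1_BITS))
```

Under these assumptions the old value 0x20 sets neither the horizontal interrupt bit nor bit 2, whereas 0x14 sets both.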

The horizontal and vertical sync interrupts are jumped to each time the electron beam reaches the right-hand side of the screen, and when it reaches the bottom-right hand corner of the screen, respectively. As far as I can find, the horizontal interrupt is the most frequently occurring event we can monitor. By reserving an integer’s worth of RAM, we can increment a counter every time it fires, and use that as a system tick count. Surprisingly, this is the first time I’ve even used main memory – everything I’ve done so far transfers data straight from cartridge ROM into the VDP’s arena. I’ll need a memory map:

hblank_counter        equ 0x00FF0000  ; Start of main RAM
vblank_counter        equ 0x00FF0004

I’ll start off simple, but perhaps later I could come up with some sort of macro to allow me to specify how much to allocate, and the address could be incremented automatically – like a simplified version of the art asset size/address defines.

The interrupts have already been defined in the init code, so we just need to put them to good use:

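   ; Horizontal interrupt handler (label as defined in the init code)
   addi.l #0x1, hblank_counter    ; Increment hinterrupt counter
   rte                            ; Return from exception

   ; Vertical interrupt handler (label as defined in the init code)
   addi.l #0x1, vblank_counter    ; Increment vinterrupt counter
   rte                            ; Return from exception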

ADDI is one of the rare opcodes that can operate directly on a memory address, without having to load the value into a register first. This is a good thing, since the work done inside the interrupts must be absolutely minimal; we have very few clock cycles available before the electron beam has finished resetting. Next, interrupts must be enabled via the status register. In my original init code, I initialised this register to 0x2700 as per some sample code, with little thought as to what it was up to. I’ve found some information about its bits (numbered here with bit 0 as the least significant):

  • 15 – Trace exception
  • 14 – Unused
  • 13 – Supervisor mode (always enable)
  • 12 – Unused
  • 11 – Unused
  • 10 – Interrupt level (zero for all interrupts enabled)
  • 9 – Interrupt level
  • 8 – Interrupt level
  • 7 – Unused
  • 6 – Unused
  • 5 – Unused
  • 4 – CCR Extend
  • 3 – CCR Negative
  • 2 – CCR Zero
  • 1 – CCR Overflow
  • 0 – CCR Carry

I’ve encountered Supervisor Mode before, on the Atari ST. It allows non-user-mode operations to be called (OS traps and such), but I’m not sure what functionality is prohibited if it were turned off on the Megadrive. Bottom line, it needs to be ON. I also need to ensure that the three interrupt level bits are OFF – these determine the lowest interrupt level that is allowed to fire, and at the moment I don’t know which interrupts qualify for what level so I’ve enabled them all. With this in mind, I’ve corrected the init code to read:

; Init status register (no trace, supervisor mode, all interrupt levels enabled, clear condition code bits)
move #0x2000, sr
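To see how the two init values differ, the status register word can be assembled from its fields. A small Python sketch, using the standard 68000 layout (trace in bit 15, supervisor in bit 13, interrupt mask in bits 8-10); the function name is hypothetical.

```python
def status_register(supervisor, interrupt_mask, trace=False):
    """Build a 68000 status register word with the condition codes cleared."""
    if not 0 <= interrupt_mask <= 7:
        raise ValueError("interrupt mask is a 3-bit field")
    return (int(trace) << 15) | (int(supervisor) << 13) | (interrupt_mask << 8)


old_value = status_register(supervisor=True, interrupt_mask=7)
new_value = status_register(supervisor=True, interrupt_mask=0)
print(hex(old_value), hex(new_value))
```

The only difference between 0x2700 and 0x2000 is the interrupt mask: 7 blocks the h/v interrupts, 0 lets every level through.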

Now the h/v-sync interrupts should fire periodically, and we can fashion some sort of time delay out of them:

WaitFrames:
   ; d0 - Number of frames to wait

   move.l  vblank_counter, d1 ; Get start vblank count

   @Wait:
   move.l  vblank_counter, d2 ; Get end vblank count
   sub.l   d1, d2             ; Calc delta, result in d2
   cmp.l   d0, d2             ; Compare with num frames
   bge     @End               ; Branch to end if greater or equal to num frames
   jmp     @Wait              ; Try again

   @End:
   rts


I’ll probably use it to add delays between the startup game states (startup logo to main menu to first level) since it’s pretty useless for anything in the main game loop. Even if I don’t use it, I learned something along the way.
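Two bits of arithmetic sit behind the delay routine: the delta between two readings of a 32-bit counter, and how long a given frame count lasts in real time. A quick Python sketch of both (function names are hypothetical):

```python
def frames_elapsed(start_count, end_count):
    """Delta between two readings of a 32-bit counter, allowing for wrap-around."""
    return (end_count - start_count) & 0xFFFFFFFF


def delay_seconds(frames, refresh_hz):
    """Real-time length of a delay measured in display frames."""
    return frames / refresh_hz


print(frames_elapsed(0xFFFFFFFE, 0x00000003))
print(delay_seconds(0x18, 60), delay_seconds(0x18, 50))
```

A wait of 0x18 frames is 24 frames, which is 0.4 seconds on an NTSC machine and 0.48 seconds on PAL.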


Source code

Assemble with:

asm68k.exe /p spritetest.asm,spritetest.bin


Sega Megadrive – 6: Scary Monsters and Nice Sprites

Sprites! Wonderful little things. I have fond memories of getting one on screen on the Commodore 64 back when I was wearing size 9’s, after typing in several hundred lines of code from some tutorial in a microcomputer magazine.

Armed with a few VDP skills – loading patterns, organising artwork into VRAM, setting tile IDs – basic sprite work seems to be pretty easy. Sprite tiles are uploaded to VRAM in the same way as plane A/B patterns, and are again referred to using tile IDs. The structure to describe a sprite’s attributes is a little more complex, as are some of the rules for displaying and sorting sprites, but I’m confident I can wrap the basics up in just a few paragraphs. Sprites can be displayed at any X/Y screen coordinates; they’re not tied to cells like the other planes. They have their own plane, too.

Sprites – The basics

Sprites use the same pattern data as the A/B planes, and can be made up of more than one pattern, too. The VDP supports a grid of up to 4×4 patterns, and will manage their positioning for us to fit together, so it’s possible to have a sprite of up to 32×32 pixels in size. It’s feasible to support larger sizes, but they would have to be positioned manually. All patterns in the sprite must share one palette.

First we need a sprite for testing. I’ve found some great free sprites after some hunting around, and settled on this little monster – he’s a 32×32, 16 colour beast under the Creative Commons license (source in References section). After tidying up the PNG file in GIMP, I converted it to indexed mode, 16 colour (optimised palette):

After importing it into Bmp2Tile, set Sprite Output Mode, press the * key to select the whole image, and then export both the tiles and palette. The palette looks garbled in the preview, but it does export correctly; either I don’t understand how to make it show properly, or Windows 7 doesn’t correctly handle DJGPP/Allegro applications.

Loading sprite artwork

Sprites use the same VRAM pattern memory as the A/B planes, so for moving them to VRAM I’ll simply rename the LoadFont subroutine to something more generic like LoadTiles. Now that we’re dealing with more than one art asset, it might be worth figuring out how to organise it all into VRAM. We can modify the preprocessor tricks to do this all for us, and move these addresses to a separate file so they can be laid out neatly. Here’s a quick example of how I’ve arranged the Pixel Font and a few tiles for various sprites, of various sizes:


; ************************************
; Art asset VRAM mapping
; ************************************
PixelFontVRAM:  equ 0x0000
Sprite1VRAM:    equ PixelFontVRAM+PixelFontSizeB
Sprite2VRAM:    equ Sprite1VRAM+Sprite1SizeB
Sprite3VRAM:    equ Sprite2VRAM+Sprite2SizeB

; ************************************
; Include all art assets
; ************************************
    include 'assets\fonts\pixelfont.asm'
    include 'assets\sprites\sprite1.asm'
    include 'assets\sprites\sprite2.asm'
    include 'assets\sprites\sprite3.asm'

; ************************************
; Include all palettes
; ************************************
    include 'assets\palettes\paletteset1.asm'


PixelFont: ; Font start address

    dc.l    $01111100
    dc.l    $11000110
    dc.l    $10111010
    dc.l    $10000010
    dc.l    $10111010
    dc.l    $10101010
    dc.l    $11101110
    dc.l    $00000000

; ...etc

PixelFontEnd                                 ; Font end address
PixelFontSizeB: equ (PixelFontEnd-PixelFont) ; Font size in bytes
PixelFontSizeW: equ (PixelFontSizeB/2)       ; Font size in words
PixelFontSizeL: equ (PixelFontSizeB/4)       ; Font size in longs
PixelFontSizeT: equ (PixelFontSizeB/32)      ; Font size in tiles
PixelFontTileID: equ (PixelFontVRAM/32)      ; ID of first tile



Sprite1: ; Sprite start address

    dc.l    $11111111    ;  Tile: 1
    dc.l    $10000001
    dc.l    $10000001
    dc.l    $10000001
    dc.l    $10000001
    dc.l    $10000001
    dc.l    $10000001
    dc.l    $11111111

    dc.l    $11111111    ;  Tile: 2
    dc.l    $10000001
    dc.l    $10000001
    dc.l    $10000001
    dc.l    $10000001
    dc.l    $10000001
    dc.l    $10000001
    dc.l    $11111111

Sprite1End                                 ; Sprite end address
Sprite1SizeB: equ (Sprite1End-Sprite1)     ; Sprite size in bytes
Sprite1SizeW: equ (Sprite1SizeB/2)         ; Sprite size in words
Sprite1SizeL: equ (Sprite1SizeB/4)         ; Sprite size in longs
Sprite1SizeT: equ (Sprite1SizeB/32)        ; Sprite size in tiles
Sprite1TileID: equ (Sprite1VRAM/32)        ; ID of first tile

I’ve also moved the palettes to their own file for consistency. Now asset files can be added and removed at will, and it should be simple to keep them organised correctly in VRAM. It’s not an all-round solution: if the game gets big we have neither the space nor the need to fit everything in VRAM at once, so I’ll need to come up with a more dynamic solution if and when the time comes. For now, this is perfect for my needs.
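The asset map is just a running total: each asset starts where the previous one ends, and its tile ID is its address divided by 32. A minimal Python model of that chain follows; the asset sizes are made-up example values and build_vram_map is a hypothetical name.

```python
TILE_BYTES = 32


def build_vram_map(assets, base=0x0000):
    """Lay assets out back to back; return (map, first free address)."""
    addr = base
    vram = {}
    for name, size_bytes in assets:
        vram[name] = {"addr": addr, "tile_id": addr // TILE_BYTES}
        addr += size_bytes
    return vram, addr


# Hypothetical sizes: a 96-glyph font and two 16-tile sprites.
vram_map, next_free = build_vram_map([
    ("PixelFont", 96 * TILE_BYTES),
    ("Sprite1", 16 * TILE_BYTES),
    ("Sprite2", 16 * TILE_BYTES),
])
for name, entry in vram_map.items():
    print(f"{name:10s} 0x{entry['addr']:04X} tile {entry['tile_id']}")
```

Adding or removing an entry shifts everything after it automatically, which is the same property the chained equ definitions give the assembler.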

Drawing sprites

Sprites are drawn by filling in details in a sprite attribute table. The VDP’s memory contains an area specifically for this attribute data – at 0xE000 – set in register 5 during initialisation. Each entry is 8 bytes long, and looks a little something like this:



Y   = Y coord (from -128 to screen height + 128)
H/V = Sprite grid dimensions, in tiles
N   = Index of next sprite attribute (a linked list next ptr)
D   = Draw priority
P   = Palette index
F   = Flip bits (vert. and horiz.)
T   = Index of first tile in sprite
X   = X coord (from -128 to screen width + 128)

The sprite window’s coordinate system has a 128 pixel border, presumably to allow sprites to be partially or fully hidden off screen, so for the sprite to be visible in the top-left corner, coords of 128,128 must be set. The X and Y coordinates (of the top-left corner of the sprite) are defined in 10 bits each, although I’m unsure as to why they are at opposite ends of the structure. The 4 bits for the grid dimensions define how large the sprite will be, in tiles. It accepts all combinations from 1×1 to 4×4, and the positioning of subsequent tiles will be handled by the VDP for us automatically, as well as flipping for the entire sprite as a whole. I’m unsure what range the draw priority accepts; I’ve left it as zero for now since it’s out of the scope of this article. There are two bits for the H and V flipping (I won’t be using those yet), and the index of the first pattern tile in the sprite.

This leaves us with the N – the index of the next sprite attribute struct. The VDP will draw subsequent sprites by jumping through these next pointers, until it hits 0. This is also used for the drawing order – strangely, the VDP draws front-to-back, unlike most graphics APIs I’ve worked with on other platforms where the drawing is done back-to-front, which seems to make logical sense to me. So with this in mind, expect the first sprite in the linked list to be drawn on top, and the second to be drawn underneath it.
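To make the 8-byte layout concrete, here is a Python sketch that packs one entry from its fields. The field order follows the legend above; the exact bit positions (width in bits 2-3 and height in bits 0-1 of the size byte, priority in bit 15 and palette in bits 13-14 of the tile word) are my reading of the hardware layout and should be treated as an assumption, as should the function name.

```python
import struct


def pack_sprite(x, y, width_tiles, height_tiles, tile_id,
                palette=0, priority=False, hflip=False, vflip=False, link=0):
    """Pack one sprite attribute entry (8 bytes, big-endian)."""
    size = ((width_tiles - 1) << 2) | (height_tiles - 1)
    tile_word = ((int(priority) << 15) | (palette << 13) |
                 (int(vflip) << 12) | (int(hflip) << 11) | (tile_id & 0x7FF))
    return struct.pack(">HBBHH",
                       (y + 128) & 0x3FF,   # Y coord with the 128 pixel border
                       size,                # grid dimensions
                       link & 0x7F,         # index of next sprite
                       tile_word,           # flags plus first tile index
                       (x + 128) & 0x3FF)   # X coord with the 128 pixel border


entry = pack_sprite(x=0, y=0, width_tiles=4, height_tiles=4, tile_id=0)
print(entry.hex())
```

A 4×4 sprite in the top-left corner of the screen packs to 0080 0f00 0000 0080, i.e. coordinates of 128,128 and a size nybble of %1111.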

Here’s an example:

SpriteDesc1:
dc.w 0x0080        ; Y coord (+ 128)
dc.b %00001111     ; Width (bits 2-3) and height (bits 0-1)
dc.b 0x00          ; Index of next sprite (linked list)
dc.b 0x00          ; H/V flipping (bits 3/4), palette index (bits 5-6), priority (bit 7)
dc.b Sprite1TileID ; Index of first tile
dc.w 0x0080        ; X coord (+ 128)

Prefixing a value with % allows it to be specified in raw binary, useful for defining the width/height bits. Here I’ve defined a sprite made of a 4×4 grid of tiles. The struct then needs moving to the VDP, using a quick subroutine:

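LoadSpriteTables:
   ; a0 - Sprite data address
   ; d0 - Number of sprites
   move.l    #vdp_write_sprite_table, vdp_control

   subq.w    #0x1, d0                ; Minus 1, since dbra loops until d0 reaches -1
   @AttrCopy:
   move.l    (a0)+, vdp_data         ; Each sprite attribute entry is 8 bytes: two longs
   move.l    (a0)+, vdp_data
   dbra    d0, @AttrCopy

   rts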


…which is simply used with:

lea     SpriteDesc1, a0     ; Sprite table data
move.w  #0x1, d0            ; 1 sprite
jsr     LoadSpriteTables

Providing the 16 tiles have been loaded into VRAM, as well as the correct palette, we should have a big bad monster:

Moving sprites

Since sprites can be positioned at any X/Y coord, part of the point of them is to be able to move them about at runtime, so we’ll need subroutines to modify the X and Y coords. Not too difficult, just write to the correct addresses in the sprite attribute table:

SetSpritePosX:
   ; Set sprite X position
   ; d0 (b) - Sprite ID
   ; d1 (w) - X coord
   clr.l    d3                          ; Clear d3
   move.b    d0, d3                     ; Move sprite ID to d3

   mulu.w    #0x8, d3                   ; Sprite array offset
   add.b    #0x6, d3                    ; X coord offset
   swap    d3                           ; Move to upper word
   add.l    #vdp_write_sprite_table, d3 ; Add to sprite attr table

   move.l    d3, vdp_control            ; Set dest address
   move.w    d1, vdp_data               ; Move X pos to data port

   rts


SetSpritePosY:
   ; Set sprite Y position
   ; d0 (b) - Sprite ID
   ; d1 (w) - Y coord
   clr.l    d3                          ; Clear d3
   move.b    d0, d3                     ; Move sprite ID to d3

   mulu.w    #0x8, d3                   ; Sprite array offset
   swap    d3                           ; Move to upper word
   add.l    #vdp_write_sprite_table, d3 ; Add to sprite attr table

   move.l    d3, vdp_control            ; Set dest address
   move.w    d1, vdp_data               ; Move Y pos to data port

   rts


Used with:

move.w  #0x0,  d0      ; Sprite ID
move.w  #0xB0, d1      ; X coord
jsr     SetSpritePosX  ; Set X pos
move.w  #0xB0, d1      ; Y coord
jsr     SetSpritePosY  ; Set Y pos
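The address arithmetic in the two routines can be checked on the host. This Python sketch assumes the usual VDP control word format for a VRAM write (write flag in bit 30, address bits 0-13 in the upper word, address bits 14-15 in the lowest two bits) and assumes vdp_write_sprite_table is the control word for 0xE000; neither is spelled out in this article, and the function names are hypothetical.

```python
SPRITE_TABLE_VRAM = 0xE000
SPRITE_ENTRY_BYTES = 8
FIELD_OFFSETS = {"y": 0, "x": 6}  # byte offsets inside one entry


def vram_write_command(addr):
    """Control port long for a VRAM write at `addr` (assumed format)."""
    return 0x40000000 | ((addr & 0x3FFF) << 16) | ((addr >> 14) & 0x3)


def sprite_field_command(sprite_id, field):
    """Same shortcut as the routines: swap the offset into the upper word and add."""
    offset = sprite_id * SPRITE_ENTRY_BYTES + FIELD_OFFSETS[field]
    return vram_write_command(SPRITE_TABLE_VRAM) + (offset << 16)


print(hex(vram_write_command(SPRITE_TABLE_VRAM)))
print(hex(sprite_field_command(1, "x")))
```

The shortcut of adding the swapped offset to the base command only works while the offset does not carry past the 14 address bits in the upper word, which holds comfortably for a sprite table at 0xE000.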

Just for good measure, I’ve added another monster friend to demonstrate drawing two sprites, making sure to set the next sprite ID in the linked list, and terminating the second sprite with a 0:

dc.w 0x0000        ; Y coord (+ 128)
dc.b %00001111     ; Width (bits 2-3) and height (bits 0-1) in tiles
dc.b 0x01          ; Index of next sprite (linked list)
dc.b 0x00          ; H/V flipping (bits 3/4), palette index (bits 5-6), priority (bit 7)
dc.b Sprite1TileID ; Index of first tile
dc.w 0x0000        ; X coord (+ 128)

dc.w 0x0000        ; Y coord (+ 128)
dc.b %00001111     ; Width (bits 2-3) and height (bits 0-1) in tiles
dc.b 0x00          ; Index of next sprite (linked list)
dc.b 0x20          ; H/V flipping (bits 3/4), palette index (bits 5-6), priority (bit 7)
dc.b Sprite2TileID ; Index of first tile
dc.w 0x0000        ; X coord (+ 128)

Here’s the finished result:

Check those badasses out.

There’s plenty more I could expand on – draw priorities and sorting, limitations of sprite drawing, subroutines to add and remove sprites at runtime – all in good time.


Source code

Assemble with:

asm68k.exe /p spritetest.asm,spritetest.bin


Sega Megadrive – 5: Fonts and Text

The Hello World example was pretty simplistic, with only the necessary font glyphs created and all of the tile IDs hard coded to write the phrase. It can be taken a few steps further without too much work, allowing us to write arbitrary strings at any tile coordinates, in a variety of colours.

First, we need a complete font. I’ll abandon my embarrassing programmer art and instead convert a nice, tidy, open-source font to the pattern format, and keep it in a separate file to be loaded and dumped at any time using some Load/Unload routines – like how I’d expect to work with other art assets in the future. This also means I’ll need to organise some locations in VDP memory to store arbitrary pieces of art, since up until now I’ve been uploading patterns to VRAM address 0x0000, and this won’t work when dealing with more than one asset.

Secondly, I’ll write a text display subroutine, which accepts the font, string, X and Y coordinates and colour palette as parameters, which will be used to build the tile descriptor words before sending them to the VDP.

The font

I’ll need a nice font. For now I’ll be converting the font into a bitmap format and using a tool to convert each glyph into an assembly snippet, but if I need to do this sort of thing often I might put my C++ skills into gear and write a tool to do it automatically. I like tools. I won’t use every known character – after all, we’ve only got 64kb of graphics memory to play with, and it’s unlikely I’ll be making use of characters other than alphanumerical, full-stop and comma, and a few others. If I happen to need any more, I can always go back and add them at a later date.

The font needs to be perfectly legible at a size of 7×7 (8×8 tiles, but leaving a one-line space). It wasn’t easy to find one that matches the specification and is also free to use, but lo and behold I found an absolute beauty – a 7×7 pixel font under the Creative Commons license (link to the font and license in references section):

It needs a bit of tidying up first – I won’t make use of the smaller alpha characters, nor will I need all of those special characters, and a few of them don’t look like they’re 7×7 pixels in size either, but it’s a great start. Here’s my trimmed and corrected version – I’ve removed unnecessary characters, resized the brackets, created a new forward-slash from scratch, and aligned each character to an 8×8 grid (taking care that the bottom and right line of each cell remained blank, except for the comma):

Next, it needs converting to pattern data. For this, I used a tool called BMP2Tile, which dumps out tile data in assembly. To use this, I exported the font as a BMP file, opened it up in BMP2Tile, pressed the * key to select the entire image, then File -> Save Tiles -> In ASM. It dumps out a file containing each tile in ASM format, but it needed a few corrections making. I removed the size metadata (I’ll be writing my own) and replaced all 0’s with 1’s, and all F’s with 0’s so that the background is transparent, and the text will use colour 1. I could also go one step further and fill in the font face with a different colour, I might backtrack and do this at a later date, but for now I don’t want to waste any more palette entries on just a font.

Font attributes

As mentioned earlier, I’ll need to solve the problem of fitting more than one asset into the VDP at a time – I can’t just write artwork to VRAM address 0x000, there needs to be some organisation of what will fit where. To do this, and be able to refer to the correct tile IDs when setting up plane tiles, we need to know the address of the font, the size of the font in tiles, and the index of the first tile. Instead of sitting there counting it all, we can make use of the assembler’s preprocessor:

PixelFont: ; Font start address

dc.l    $01111100
dc.l    $11000110
dc.l    $10111010
dc.l    $10000010
dc.l    $10111010
dc.l    $10101010
dc.l    $11101110
dc.l    $00000000

; Rest of font data...

PixelFontEnd:                                ; Font end address
PixelFontSizeB: equ (PixelFontEnd-PixelFont) ; Font size in bytes
PixelFontSizeW: equ (PixelFontSizeB/2)       ; Font size in words
PixelFontSizeL: equ (PixelFontSizeB/4)       ; Font size in longs
PixelFontSizeT: equ (PixelFontSizeB/32)      ; Font size in tiles
PixelFontVRAM:  equ 0x0100                   ; Dest address in VRAM
PixelFontTileID: equ (PixelFontVRAM/32)      ; ID of first tile

Now we have some defines for all of the font’s sizes and addresses in various units, and they’ll be correct wherever we include the font file in code. I’ve chosen the arbitrary VDP address 0x0100 to upload the font to, simply as a demonstration (and to make sure the addressing works correctly when I implement the code), but I’m sure when I start making use of more artwork I’ll need to sit and plan the VDP’s memory layout properly.
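
The same trick should extend to further assets. As a minimal sketch, assuming a hypothetical second tile set sitting between the labels LogoTiles and LogoTilesEnd (neither exists in this project), it could be placed directly after the font in VRAM like so:

```asm
LogoTiles:                                          ; Hypothetical asset start address
   ; Tile data goes here...
LogoTilesEnd:                                       ; Hypothetical asset end address
LogoTilesSizeB:  equ (LogoTilesEnd-LogoTiles)       ; Size in bytes
LogoTilesSizeT:  equ (LogoTilesSizeB/32)            ; Size in tiles
LogoTilesVRAM:   equ (PixelFontVRAM+PixelFontSizeB) ; Starts where the font ends
LogoTilesTileID: equ (LogoTilesVRAM/32)             ; ID of first tile
```

Chaining the addresses this way means nothing needs recounting if the font grows, although a hand-planned layout may still turn out to be the better choice.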

LoadFont subroutine

This shouldn’t be too difficult, I did it in the last article, but this time we need to specify arbitrary fonts of any size, from any location, to any destination. This means we need to pass some parameters to a subroutine. There are a few ways of achieving this – move the parameters to registers, or push data to the stack. Moving params to registers is the quickest (in terms of clock cycles) method, but we only have a limited number of registers, and when the game code starts to get complex it would be difficult to juggle all of the registers around. The latter method allows us to specify a large number of parameters, but since the subroutine would still need to make use of some registers internally we’d need some way of backing up and restoring them when entering and exiting the subroutine. For simplicity’s sake, I’ll go with the former method – moving parameters to registers – and if it starts to cause issues at a later date I’ll backtrack and change it.
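
For reference, backing up and restoring registers around a subroutine is usually done with the movem opcode. This is only a sketch of the idea, with a made-up subroutine name and register list, and it isn’t used in the code that follows:

```asm
ExampleSubroutine:                  ; Hypothetical name, for illustration only
   movem.l  d0-d3/a0, -(sp)         ; Push d0-d3 and a0 to the stack
   ; ... subroutine body, free to overwrite d0-d3 and a0 ...
   movem.l  (sp)+, d0-d3/a0         ; Pop them back off again
   rts                              ; Return to caller
```

The cost is a handful of extra cycles on every call, which is the trade-off against the register method described above.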

Here’s what I came up with:

LoadFont:
   ; a0 - Font address (l)
   ; d0 - VRAM address (w)
   ; d1 - Num chars (w)

   swap     d0                   ; Shift VRAM addr to upper word (only valid for addresses below 0x4000)
   add.l    #vdp_write_tiles, d0 ; VRAM write cmd + VRAM destination address
   move.l   d0, vdp_control      ; Send address to VDP cmd port

   subq.w   #0x1, d1             ; Num chars - 1 (word-sized, to match dbra)
@CharCopy:
   move.w   #0x07, d2            ; 8 longwords in tile, minus 1 for counter
@LongCopy:
   move.l   (a0)+, vdp_data      ; Copy one line of tile to VDP data port
   dbra     d2, @LongCopy        ; Next line of this tile
   dbra     d1, @CharCopy        ; Next tile

   rts                           ; Return to caller


I’ve also defined the VDP control and data ports, as well as the VDP tile write command + address, since they’re likely to be used often. Using the subroutine should be pretty simple:

; Load font
lea        PixelFont, a0       ; Move font address to a0
move.l    #PixelFontVRAM, d0   ; Move VRAM dest address to d0
move.l    #PixelFontSizeT, d1  ; Move number of characters (font size in tiles) to d1
jsr        LoadFont            ; Jump to subroutine
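
One caveat with the swap-and-add approach in LoadFont: it only produces a valid command for VRAM addresses below 0x4000, because the VDP expects the top two address bits in bits 0-1 of the command longword rather than next to the others. That’s fine for a font at 0x0100. For higher addresses, a conversion along these lines should do the job. It’s a rough sketch rather than code from the project source, and it hard-codes 0x40000000 as the VRAM write bits:

```asm
   ; d0.w = VRAM address (0x0000-0xFFFF), leaves the write command in d0.l
   and.l    #0x0000FFFF, d0   ; Discard anything in the upper word
   lsl.l    #0x2, d0          ; Address bits 14-15 move up into bits 16-17
   lsr.w    #0x2, d0          ; Lower word: address bits 0-13 back in place, top two bits now zero
   swap     d0                ; Bits 0-13 to the upper word, bits 14-15 to bits 0-1
   or.l     #0x40000000, d0   ; Add the VRAM write bits
   move.l   d0, vdp_control   ; e.g. 0xC000 becomes 0x40000003
```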

As long as a palette has been uploaded too, we can use the Regen debugger to view the contents of VRAM and confirm that everything is in its right place:

Mapping ASCII characters

My aim is to be able to write arbitrary strings, defined in the ROM somewhere. The assembler encodes text characters as ASCII, which means I’ll need some method of converting each ASCII character to the font’s tile IDs. In my first attempt at this, I was only using alpha characters, and since character A in ASCII is 65 I could get away with just subtracting 65 from each byte in the string. Now that I’ve introduced numerical and special characters, I’ll need to come up with something else. I intend to ensure that every font I make sticks to the same characters and layout, so the simplest method would be to create a table which maps ASCII codes to tile IDs of the font. It certainly won’t be the fastest method, since it means using a lookup table when drawing every character, but it’ll do for now. Perhaps a better method would be to encode the string itself to match the font tile IDs, but that would complicate development. If I need to do some optimisation, I’ll look into it.

ASCIIStart: equ 0x20 ; First ASCII code in table

ASCIIMap:
dc.b 0x00   ; SPACE (ASCII code 0x20)
dc.b 0x28   ; ! Exclamation mark
dc.b 0x2B   ; " Double quotes
dc.b 0x2E   ; # Hash
dc.b 0x00   ; UNUSED
dc.b 0x00   ; UNUSED
dc.b 0x00   ; UNUSED
dc.b 0x2C   ; ' Single quote
dc.b 0x29   ; ( Open parenthesis
dc.b 0x2A   ; ) Close parenthesis
dc.b 0x00   ; UNUSED
dc.b 0x2F   ; + Plus
dc.b 0x26   ; , Comma
dc.b 0x30   ; - Minus
dc.b 0x25   ; . Full stop
dc.b 0x31   ; / Slash or divide
dc.b 0x1B   ; 0 Zero
dc.b 0x1C   ; 1 One
dc.b 0x1D   ; 2 Two
dc.b 0x1E   ; 3 Three
dc.b 0x1F   ; 4 Four
dc.b 0x20   ; 5 Five
dc.b 0x21   ; 6 Six
dc.b 0x22   ; 7 Seven
dc.b 0x23   ; 8 Eight
dc.b 0x24   ; 9 Nine
dc.b 0x2D   ; : Colon
dc.b 0x00   ; UNUSED
dc.b 0x00   ; UNUSED
dc.b 0x00   ; UNUSED
dc.b 0x00   ; UNUSED
dc.b 0x27   ; ? Question mark
dc.b 0x00   ; UNUSED
dc.b 0x01   ; A
dc.b 0x02   ; B
dc.b 0x03   ; C
dc.b 0x04   ; D
dc.b 0x05   ; E
dc.b 0x06   ; F
dc.b 0x07   ; G
dc.b 0x08   ; H
dc.b 0x09   ; I
dc.b 0x0A   ; J
dc.b 0x0B   ; K
dc.b 0x0C   ; L
dc.b 0x0D   ; M
dc.b 0x0E   ; N
dc.b 0x0F   ; O
dc.b 0x10   ; P
dc.b 0x11   ; Q
dc.b 0x12   ; R
dc.b 0x13   ; S
dc.b 0x14   ; T
dc.b 0x15   ; U
dc.b 0x16   ; V
dc.b 0x17   ; W
dc.b 0x18   ; X
dc.b 0x19   ; Y
dc.b 0x1A   ; Z (ASCII code 0x5A)

There we go, ASCII characters from 0x20 to 0x5A, mapped to font tile IDs. When looking them up, I’ll need to subtract 0x20 from the ASCII code to get the table index, so I’ve also defined this for readability.

Drawing text

The methods used to get the text on screen should be very similar to the previous article – set up the Plane A tile IDs. We already have the ID of the first tile in VRAM (PixelFontTileID), so we just need to offset that by the tiles in the ASCII map. For the time being, I’ll be looking up the table whilst it is still in ROM, but I have doubts about the speed of reading data from cartridge so in future I may move the table into a location in main RAM to make the lookups faster (unless, of course, I discover that there’s no major difference). The same may go for the string data itself.

The first step is to calculate the destination address in VRAM. Since I plan to support specifying the X and Y coordinates in tiles, the address needs to be offset by 64 tiles for each row (the plane is 64 tiles wide here, with the display in H40 mode), plus 1 for each column:

DrawTextPlaneA:
   ; a0 (l) - String address
   ; d0 (w) - First tile ID of font
   ; d1 (bb)- XY coord (in tiles)
   ; d2 (b) - Palette

   clr.l    d3                     ; Clear d3 ready to work with
   move.b   d1, d3                 ; Move Y coord (lower byte of d1) to d3
   mulu.w   #0x0040, d3            ; Multiply Y by plane width (64 tiles per row) to get Y offset
   ror.l    #0x8, d1               ; Rotate X coord from upper to lower byte of d1
   add.b    d1, d3                 ; Add X coord to offset (X must be below 64, so the byte add cannot carry)
   mulu.w   #0x2, d3               ; Convert to bytes (each tile descriptor is one word)
   swap     d3                     ; Shift address offset to upper word
   add.l    #vdp_write_plane_a, d3 ; Add PlaneA write cmd + address
   move.l   d3, vdp_control        ; Send to VDP control port

It’s the most complex thing I’ve written yet, but hopefully the comments explain it well enough. There’s a new opcode here – ror (rotate right) – which rotates bits to the right by a specified count (1 to 8 when the count is an immediate value). Here, ror.l #0x8, d1 is used to move the X coord from the upper to the lower byte of the word in d1, since the swap opcode can only operate on a longword, swapping its two words around. Bits that fall off the least significant end are brought back round to the most significant end, whose position is determined by the operation size (so a byte-sized ror operation with a count of 1 on 0000 0001 would give us 1000 0000). There’s also a corresponding rol (rotate left) opcode, which is used in the next step. The offset is converted to bytes (since the tile descriptors are 1 word in size) and added to the ‘write to plane A’ VDP command + address, which I’ve defined for ease of use.
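
To make the wrap-around behaviour a bit more concrete, here’s a small worked sketch. The register values are only illustrative, and the results in the comments follow from the rotate rules described above:

```asm
   move.w   #0x0501, d1   ; d1.w = 0x0501 (X = 5, Y = 1)
   ror.w    #0x8, d1      ; d1.w = 0x0105, X is now in the lower byte
   rol.w    #0x8, d1      ; d1.w = 0x0501 again

   move.b   #0x01, d0     ; d0.b = %00000001
   ror.b    #0x1, d0      ; d0.b = %10000000, bit 0 wrapped round to bit 7
```

Note that the word-sized rotate leaves the upper word of d1 untouched, whereas the longword version used in the subroutine rotates the Y byte up into the top byte of the register.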

Next, we need to set up the word-sized tile descriptor, which contains the palette ID, the pattern ID, and flip bits (not used here). The palette ID fits into two bits, and belongs in bits 13 and 14 of the tile descriptor word (bit 15 is the priority flag), so we’ll start with that. I can use the rol opcode for this, but since it can only move bits a maximum of 8 places at a time with an immediate count, it’ll need doing twice in order to shift the ID up 13 bits:

   clr.l    d3                     ; Clear d3 ready to work with again
   move.b   d2, d3                 ; Move palette ID (lower byte of d2) to d3
   rol.l    #0x8, d3               ; Shift palette ID up 8 bits...
   rol.l    #0x5, d3               ; ...and 5 more, to bits 13 and 14 (can only rol up to 8 places in one instruction)
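
As a worked example of where the bits end up, here’s the same pair of rotates with palette 2 plugged in. The pattern ID of 9 is made up purely for illustration:

```asm
   moveq    #0x2, d3      ; Palette 2                        -> d3 = 0x00000002
   rol.l    #0x8, d3      ; Rotate left 8                    -> d3 = 0x00000200
   rol.l    #0x5, d3      ; Rotate left 5 more (13 in total) -> d3 = 0x00004000
   or.w     #0x0009, d3   ; Combine with a pattern ID of 9   -> d3 = 0x00004009
```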

Now we need to loop round each byte in the string, adding the pattern ID of the text glyph to d3, before sending the complete tile descriptor word to the VDP. Our exit case for the loop will be a string terminator 0x0 (so we also need to make sure our strings actually end in 0x0), and along the way we need to convert the ASCII byte to a pattern ID using the ASCII table:

   lea      ASCIIMap, a1           ; Load address of ASCII map into a1
   clr.l    d2                     ; Palette is already in d3, so clear d2 for use as a table index

@CharCopy:
   move.b   (a0)+, d2              ; Move ASCII byte to lower byte of d2
   cmp.b    #0x0, d2               ; Test if byte is zero (string terminator)
   beq.b    @End                   ; If byte was zero, branch to end

   sub.b    #ASCIIStart, d2        ; Subtract first ASCII code to get table entry index
   and.w    #0x6000, d3            ; Keep only the palette bits from the previous pass
   move.b   (a1,d2.w), d3          ; Move tile ID from table (index in lower word of d2) to lower byte of d3
   add.w    d0, d3                 ; Offset tile ID by first tile ID in font
   move.w   d3, vdp_data           ; Move palette and pattern IDs to VDP data port
   jmp      @CharCopy              ; Next character

@End:
   rts                             ; Return to caller


Hopefully it should be self-explanatory, with the exception of that move.b  (a1,d2.w), d3 line. The parentheses mean that an offset is applied to the source address of the move – so we’re moving the byte at address a1 + d2 to d3. This is how array access is done in 68k assembler. I haven’t yet tested it, but I’m assuming the same can be done for destination addresses, so offsets into an array can be written to as well.
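
As far as the 68000 instruction set goes, the indexed form is a valid destination for move too. A small sketch, assuming a hypothetical scratch buffer at the start of work RAM (0x00FF0000) that nothing else is using:

```asm
ScratchBuffer: equ 0x00FF0000   ; Hypothetical buffer at the start of work RAM

   lea      ScratchBuffer, a1   ; Base address of the array
   move.w   #0x0004, d2         ; Index of the element to write
   move.b   #0x2A, (a1,d2.w)    ; Write 0x2A to the byte at a1 + d2
   move.b   (a1,d2.w), d3       ; Read it back into the lower byte of d3
```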

The subroutine relies on the string being zero-terminated, else it will continue to loop until it finds one and just displays garbage. For each string, I’ll need to remember to append the zero manually, unlike in languages like C where strings inside double-quotes are automatically one byte longer than the string was defined, to hold the terminator.
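
A string definition therefore ends up looking something like the sketch below. The label name is made up, and the even directive is an assumption about the assembler (I believe ASM68K supports it); it pads to a word boundary so that any code or word-sized data following an odd-length string stays aligned:

```asm
ExampleString:               ; Hypothetical label
   dc.b "HELLO WORLD!",0     ; 12 characters + terminator = 13 bytes
   even                      ; Pad to the next word boundary
```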


Since the font includes the " character, if we want to use it in a string constant we need the equivalent of an ‘escape character’ in C, which is to prefix the " with another ". This seems to be specific to the ASM68K assembler; other assemblers use the C escape characters.

Here’s the finished result, showing off a few different strings, colour palettes and X/Y coordinates:

; Load font
lea       PixelFont, a0        ; Move font address to a0
move.l    #PixelFontVRAM, d0   ; Move VRAM dest address to d0
move.l    #PixelFontSizeT, d1  ; Move number of characters (font size in tiles) to d1
jsr       LoadFont             ; Jump to subroutine

; Draw text
lea       String1, a0          ; String address
move.l    #PixelFontTileID, d0 ; First tile id
move.w    #0x0501, d1          ; XY (5, 1)
move.l    #0x0, d2             ; Palette 0
jsr       DrawTextPlaneA       ; Call draw text subroutine

lea       String2, a0          ; String address
move.l    #PixelFontTileID, d0 ; First tile id
move.w    #0x0502, d1          ; XY (5, 2)
move.l    #0x1, d2             ; Palette 1
jsr       DrawTextPlaneA       ; Call draw text subroutine

lea       String3, a0          ; String address
move.l    #PixelFontTileID, d0 ; First tile id
move.w    #0x0503, d1          ; XY (5, 3)
move.l    #0x2, d2             ; Palette 2
jsr       DrawTextPlaneA       ; Call draw text subroutine

lea       String4, a0          ; String address
move.l    #PixelFontTileID, d0 ; First tile id
move.w    #0x0504, d1          ; XY (5, 4)
move.l    #0x3, d2             ; Palette 3
jsr       DrawTextPlaneA       ; Call draw text subroutine

lea       String5, a0          ; String address
move.l    #PixelFontTileID, d0 ; First tile id
move.w    #0x0106, d1          ; XY (1, 6)
move.l    #0x3, d2             ; Palette 3
jsr       DrawTextPlaneA       ; Call draw text subroutine

lea       String6, a0          ; String address
move.l    #PixelFontTileID, d0 ; First tile id
move.w    #0x0107, d1          ; XY (1, 7)
move.l    #0x3, d2             ; Palette 3
jsr       DrawTextPlaneA       ; Call draw text subroutine

  ; Text strings (zero terminated)
  dc.b "0123456789",0
  dc.b ",.?!()""':#+-/",0
  dc.b "OVER THE LAZY DOG",0

  ; Include art assets
  include 'fonts\pixelfont.asm'

There are plenty of improvements which can be made in future – there’s only support for uppercase letters (although the ASCII table could map any lowercase characters to the uppercase pattern IDs just for completeness), and there’s no text wrapping at the end of a line (although perhaps some higher-level UI code could handle that). It would also be quite easy to specify a font’s colour in the LoadFont subroutine, which would just replace any 1’s in the patterns as it copies.
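
An alternative to growing the table for lowercase support would be to fold lowercase letters to uppercase before the lookup. This is an untested sketch, assuming it sits in the draw loop after the terminator test and before the ASCIIStart subtraction, with the ASCII byte in d2; the @NotLower label is made up:

```asm
   cmp.b    #0x61, d2   ; Below 'a' (0x61)?
   bcs.b    @NotLower   ; Yes - leave it alone
   cmp.b    #0x7A, d2   ; Above 'z' (0x7A)?
   bhi.b    @NotLower   ; Yes - leave it alone
   sub.b    #0x20, d2   ; 'a'-'z' become 'A'-'Z'
@NotLower:
```

Without something like this, a lowercase character would index past the end of the table, which stops at Z (0x5A).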

It’s also unlikely that the code will be the fastest and most optimal method to do this sort of thing, but I’m still learning.



This source contains some corrections and improvements to init.asm posted previously:


Sega Megadrive – 4: Hello, world!

Time to get serious. I’ve got as far as getting my assembler, emulator and debugger working, I’ve learned some basics of 68000 assembly language, and the Megadrive is now initialised and ready to do something. Unfortunately this step wasn’t any easier, the VDP is a complicated beast to get going and has many quirks. Anyway, the aim of this article – however long – is to explain how I got “HELLO WORLD” on screen.

It’s not as simple as printf("Hello, world!"). The machine has no standard I/O library, no debug text system, and no concept of a font whatsoever. Tiles representing text glyphs need to be created from scratch and moved to the correct positions on the VDP, as do the colour palettes used to paint them. I’ll make a start with all of the theory that I’ve learned on palettes, patterns and planes first.


The Megadrive’s VDP represents a colour in 9 bits, using 3 bits each for the red, green and blue components. With 3 bits, each component has 8 possible values, therefore the VDP is capable of displaying 512 colours. Colours must be predefined, and are stored in a section of VDP memory in tables of 16 colours – called palettes. This section of memory is called CRAM (colour RAM), and there’s space for 64 colour entries, therefore the VDP can store 4 palettes of 16 colours at any one time. The palettes can be swapped in and out from main RAM at any time, so this isn’t a global restriction throughout the life of the program. A typical palette is defined something like this:

   dc.w 0x0000 ; Colour 0 - Transparent
   dc.w 0x000E ; Colour 1 - Red
   dc.w 0x00E0 ; Colour 2 - Green
   dc.w 0x0E00 ; Colour 3 - Blue
   dc.w 0x0000 ; Colour 4 - Black
   dc.w 0x0EEE ; Colour 5 - White
   dc.w 0x00EE ; Colour 6 - Yellow
   dc.w 0x008E ; Colour 7 - Orange
   dc.w 0x0E0E ; Colour 8 - Pink
   dc.w 0x0808 ; Colour 9 - Purple
   dc.w 0x0444 ; Colour A - Dark grey
   dc.w 0x0888 ; Colour B - Light grey
   dc.w 0x0EE0 ; Colour C - Turquoise
   dc.w 0x000A ; Colour D - Maroon
   dc.w 0x0600 ; Colour E - Navy blue
   dc.w 0x0060 ; Colour F - Dark green

…and that looks like this:

The colour names were guesses, I’m no artiste. Entry 0 of any palette is used to determine a transparent pixel, and is used as the background colour by default.
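
Reading the palette above, each colour word is laid out as 0x0BGR: one nybble each for blue, green and red, where only the top 3 bits of each nybble are used (so each component takes even values from 0x0 to 0xE). A few annotated entries as a sketch; the grey at the end is a made-up example rather than one from the palette:

```asm
   ;        0BGR
   dc.w   0x000E ; Full red  (R = E)
   dc.w   0x0E00 ; Full blue (B = E)
   dc.w   0x008E ; Orange    (G = 8, R = E)
   dc.w   0x0666 ; Mid grey  (B = G = R = 6) - hypothetical extra colour
```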


Patterns are blocks of image data 8 x 8 pixels in size. Each pixel colour is represented using one nybble – the ID of the colour inside a palette – so a pattern can be represented in 8 longwords of data. Here’s an example, the letter H:

   dc.l 0x11000110
   dc.l 0x11000110
   dc.l 0x11000110
   dc.l 0x11111110
   dc.l 0x11000110
   dc.l 0x11000110
   dc.l 0x11000110
   dc.l 0x00000000

Assuming this pattern uses the palette given in the example above (so colour 1 represents red) and the background colour was white (colour 0 represents transparency, regardless of the value in the palette), we’d expect it to look like this if laid out on a grid:

If the 1’s were replaced with 2, it would be a green H, and if the 0’s were replaced with D it would sit on a maroon background. I haven’t utilised all of the space – there’s one line blank to the right and bottom of the glyph – to ensure that when font patterns are sat adjacent to each other there’s a very small gap, to ensure they are legible.


A plane is a kind of canvas, and the Megadrive’s VDP has 4 of them – two scrolling planes (plane A and plane B), a window plane, and a sprite plane. The scrolling and window planes can display grids made up of tiles of image patterns, positioned at predetermined cells depending on the VDP’s display mode (32×28 or 40×28 cells). The two scrolling planes can scroll lines of pixels (or groups of lines), or the entire contents left or right. The window plane is still a mystery to me, it can be moved around using the X and Y position in VDP registers 17 and 18, but it cannot overlap plane A. I don’t quite understand how it CAN’T overlap plane A, since the A and B planes can only scroll and not move around in their entirety. I’ll revisit this later.

The sprite plane can display patterns at arbitrary X and Y coordinates, and flip them vertically or horizontally. It also features priorities for each sprite, so their draw order can be defined. I’ll write up more about sprites in a further article, there’s quite a lot to them and since I’ll be doing the text display on plane A they’re beyond the scope of this post.

Preparing the VDP for writing data

This bit hurt my brain. In its basic form, moving palette and pattern data to the VDP comprises two operations: set the operation type and destination address through the control port, then move the data through the data port. Sounds simple, but the operation type and address need to be amalgamated into one longword, with a rather obscure bit structure. I’ll try to explain as best as I understand it myself. Here’s the operation/address longword split up into bits and nybbles:

BBAA AAAA AAAA AAAA 0000 0000 BBBB 00AA


The A’s hold the destination address, the B’s hold the operation type, and the 0’s are always 0. Let’s start with the address. The bits for the destination address need to be laid out in this pattern:

--DC BA98 7654 3210 ---- ---- ---- --FE

where 0 is the least significant bit, F is the most significant. For example, if we wanted to write to the VDP’s memory at address 0xC000 (which is the address of Plane A’s tile information, set via register 2 in our initialisation code), we’d first convert the address to a binary word:

1100 0000 0000 0000

and then rearrange it according to the bit template above:

0000 0000 0000 0000 0000 0000 0000 0011

Next, we need to set up the other bits to describe the type of operation we’re performing. Using six bits (written here with bit 0 on the left), we can describe the following operations:

  • 000000 – VRAM Read
  • 100000 – VRAM Write
  • 000100 – CRAM Read
  • 110000 – CRAM Write
  • 001000 – VSRAM Read
  • 101000 – VSRAM Write

These also need to be laid out into a specific order:

10-- ---- ---- ---- ---- ---- 5432 ----

So if we need to write to a VRAM address, we get:

01-- ---- ---- ---- ---- ---- 0000 ----

Put the address and the operation type together, and we get:

0100 0000 0000 0000 0000 0000 0000 0011

which in HEX is 0x40000003. Now we can move it to the VDP’s control port (I/O address 0x00C00004) to tell it we’re about to write data to VRAM address 0xC000:

move.l #0x40000003, 0x00C00004
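
The shuffling can also be left to the assembler. This is a sketch only: the three equate names are made up, and it assumes the assembler’s expression evaluator supports the &, << and >> operators:

```asm
PlaneAAddr:      equ 0xC000      ; VRAM address we want to write to
VRAMWriteBits:   equ 0x40000000  ; Operation type bits for a VRAM write
PlaneAWriteCmd:  equ VRAMWriteBits+((PlaneAAddr&0x3FFF)<<16)+((PlaneAAddr&0xC000)>>14)

   move.l #PlaneAWriteCmd, 0x00C00004 ; Works out to the same 0x40000003
```

The lower 14 address bits are shifted up into the top word, and the top 2 address bits are shifted down to the bottom of the longword, exactly as in the template.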

I’m really not sure why this has to be so complex. Perhaps the bits are laid out in order of importance, so that they can be immediately acted on before the rest of the data is received. Perhaps we’re able to write a single word or byte to describe certain operations plus a small amount of data, so the bit layout needs to support this. For example, you only need to write a word of data to tell the VDP to change a register value. In any case, working this out is a bit of a pain when working regularly with the VDP, so I managed to find a javascript tool to calculate the longword for me. You’ll find it in the references section below.

Once the operation type and destination address have been written to the control port, the data itself can now be written to the data port. The VDP data port accepts data in bytes or words only, so if we need to write more data than that (which in 99% of cases, we will) then we could either increment the address manually and write it to the control port again, or make use of a feature called autoincrement. Autoincrement will – as the title vaguely suggests – automatically increment the destination address after each write to the port. Not only does this mean we can feed the data port a whole stream of information in one go, but it also means we can perform a longword write to the port, and it will be treated as two separate word writes. To enable autoincrement, we set the autoincrement register (VDP register 15) to the number of bytes we’d like it to increment by, which I’ll set as 2 and leave it:

   move.w #0x8F02, 0x00C00004   ; Set autoincrement to 2 bytes

Writing the data

Writing a palette

Let’s start with writing the palette. Palette 0 belongs in address 0x0000 of CRAM, so first we need to set up the VDP to write to CRAM (operation type 110000). Using the bit template above, a write operation to CRAM address 0x0000 gives us 0xC0000000:

   move.l #0xC0000000, 0x00C00004 ; Set up VDP to write to CRAM address 0x0000

Next, assuming that autoincrement is still set to 2 bytes, we can move the palette data to the VDP’s data port at 0x00C00000 in one big loop:

   lea Palette, a0          ; Load address of Palette into a0
   move.l #0x07, d0         ; 32 bytes of data (8 longwords, minus 1 for counter) in palette

@Loop:
   move.l (a0)+, 0x00C00000 ; Move data to VDP data port, and increment source address
   dbra d0, @Loop           ; Decrement d0, and loop until it reaches -1

A new opcode here – LEA (load effective address) – which is a quicker way (both typing and CPU cycles) of loading the address of a label into an address register, versus using move.l.

We now have the opportunity to get our very first thing on screen, and confirm that everything blindly coded so far (the header, the initialisation code, the palette upload) is correct – we can use VDP register 7 to set the background colour to one of the colours in this palette. Bits 0-3 (first nybble) of register 7 represent the colour ID, and bits 4-5 represent the palette ID. So, using the example palette data above, we can set the background colour to pink (colour 8) using:

   move.w #0x8708, 0x00C00004  ; Set background colour to palette 0, colour 8

Build and run the ROM, and here we go:

Finally, 267 lines of code later, and we have something on screen! Fortunately, getting something a little more interesting than a big coloured window didn’t involve much work from this point.

Writing the patterns

The next step in the Hello World adventure is to design – and move to the VDP – some patterns representing all of the letters required to write the phrase. We’ll need H, E, L, O, W, R, and D.

Characters:
   dc.l 0x11000110 ; Character 0 - H
   dc.l 0x11000110
   dc.l 0x11000110
   dc.l 0x11111110
   dc.l 0x11000110
   dc.l 0x11000110
   dc.l 0x11000110
   dc.l 0x00000000

   dc.l 0x11111110 ; Character 1 - E
   dc.l 0x11000000
   dc.l 0x11000000
   dc.l 0x11111110
   dc.l 0x11000000
   dc.l 0x11000000
   dc.l 0x11111110
   dc.l 0x00000000

   dc.l 0x11000000 ; Character 2 - L
   dc.l 0x11000000
   dc.l 0x11000000
   dc.l 0x11000000
   dc.l 0x11000000
   dc.l 0x11111110
   dc.l 0x11111110
   dc.l 0x00000000

   dc.l 0x01111100 ; Character 3 - O
   dc.l 0x11101110
   dc.l 0x11000110
   dc.l 0x11000110
   dc.l 0x11000110
   dc.l 0x11101110
   dc.l 0x01111100
   dc.l 0x00000000

   dc.l 0x11000110 ; Character 4 - W
   dc.l 0x11000110
   dc.l 0x11000110
   dc.l 0x11000110
   dc.l 0x11010110
   dc.l 0x11101110
   dc.l 0x11000110
   dc.l 0x00000000

   dc.l 0x11111100 ; Character 5 - R
   dc.l 0x11000110
   dc.l 0x11001100
   dc.l 0x11111100
   dc.l 0x11001110
   dc.l 0x11000110
   dc.l 0x11000110
   dc.l 0x00000000

   dc.l 0x11111000 ; Character 6 - D
   dc.l 0x11001110
   dc.l 0x11000110
   dc.l 0x11000110
   dc.l 0x11000110
   dc.l 0x11001110
   dc.l 0x11111000
   dc.l 0x00000000

Don’t stare at it too long, it makes funny patterns on your brain. There’s one character missing – the SPACE in between the words. That’s sort of already been implemented for us – a whole pattern of 0’s (transparency) will do the job, the VDP’s VRAM is already full of zeroes, and every tile ID on planes A, B and W is already set to zero, so an entire screen of blank patterns is already being displayed. If we skip the first pattern (32 bytes) when we write the font to the VDP, then pattern ID 0 will be a blank space.

So, we need to write this data to VRAM (that’s operation type 100000) at an offset of 0x20 (skips the first pattern). Using the bit template in the last section, that should give us the VDP command 0x40200000. 7 characters, 32 bytes each, that’s 56 longwords – let’s go:

   move.l #0x40200000, 0x00C00004 ; Set up VDP to write to VRAM address 0x0020
   lea Characters, a0             ; Load address of Characters into a0
   move.l #0x37, d0               ; 32*7 bytes of data (56 longwords, minus 1 for counter) in the font

@Loop:
   move.l (a0)+, 0x00C00000       ; Move data to VDP data port, and increment source address
   dbra d0, @Loop                 ; Decrement d0, and loop until it reaches -1

Again, this assumes we haven’t touched the autoincrement register, and it’s still set to 2 bytes. Now the font data is in the VDP’s memory, sitting dormant until we set one of the planes up to paint them.

Matching Patterns to Tiles

As mentioned before, pattern #0 is already being drawn to every tile of planes A, B and W. To get some of these characters on screen, we need to change those tiles’ pattern IDs to those of the patterns we’d like to draw. The data that describes how each tile is drawn lives in VRAM, and there’s a block of data for each plane – the addresses for these are set up in VDP registers 2, 3, and 4, for planes A, W and B respectively. For this article, I’ll be drawing the text to plane A, which has been set to address 0xC000 in VDP register 2. All information needed to describe the tile fits into one word, and again we need to shuffle some bits around to match a template:

ABBC DEEE EEEE EEEE

  • Bit A – Low or high plane (I don’t quite understand this yet)
  • Bits B – Colour palette ID (0, 1, 2 or 3)
  • Bit C – Vertical flip (0 = drawn as-is, 1 = flip the tile vertically)
  • Bit D – Horizontal flip (0 = drawn as-is, 1 = flip the tile horizontally)
  • Bits E – the ID of the pattern to be drawn
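
Selecting a different palette is then just a case of adding the right value to the pattern ID. A sketch, using made-up equate names and assuming a second palette had already been uploaded to CRAM (which it hasn’t in this article):

```asm
TilePalette0: equ 0x0000   ; Bits B = 0
TilePalette1: equ 0x2000   ; Bits B = 1
TilePalette2: equ 0x4000   ; Bits B = 2
TilePalette3: equ 0x6000   ; Bits B = 3

   move.l #0x40000003, 0x00C00004          ; Set up VDP to write to VRAM address 0xC000 (Plane A)
   move.w #(TilePalette1+0x0001), 0x00C00000 ; Low plane, palette 1, no flipping, pattern ID 1 (0x2001)
```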

So, if we’d like to draw a pattern using colour palette 0, with no flipping, then it’s as easy as writing the pattern ID to the tile’s address. Let’s test it by setting plane A tile 0 to pattern ID 1, which should be the letter H. First, we need to put together the VDP command to write to VRAM (operation type 100000) at address 0xC000 using the bit template – this should give us 0x40000003.

   move.l #0x40000003, 0x00C00004 ; Set up VDP to write to VRAM address 0xC000 (Plane A)
   move.w #0x0001, 0x00C00000     ; Low plane, palette 0, no flipping, tile ID 1

Assemble and run, and we should see the Letter H in the top-left hand corner of the screen:

To keep this article simple, I won’t delve into changing the palette or applying flipping, there’s no need yet.

Now it should be simple to display the rest of the characters; assuming autoincrement is still set to 2 we can write to consecutive tiles one by one:

   move.l #0x40000003, 0x00C00004 ; Set up VDP to write to VRAM address 0xC000 (Plane A)

   ; Low plane, palette 0, no flipping, plus tile ID...
   move.w #0x0001, 0x00C00000     ; Pattern ID 1 - H
   move.w #0x0002, 0x00C00000     ; Pattern ID 2 - E
   move.w #0x0003, 0x00C00000     ; Pattern ID 3 - L
   move.w #0x0003, 0x00C00000     ; Pattern ID 3 - L
   move.w #0x0004, 0x00C00000     ; Pattern ID 4 - O
   move.w #0x0000, 0x00C00000     ; Pattern ID 0 - Blank space
   move.w #0x0005, 0x00C00000     ; Pattern ID 5 - W
   move.w #0x0004, 0x00C00000     ; Pattern ID 4 - O
   move.w #0x0006, 0x00C00000     ; Pattern ID 6 - R
   move.w #0x0003, 0x00C00000     ; Pattern ID 3 - L
   move.w #0x0007, 0x00C00000     ; Pattern ID 7 - D

A tidier way would be to have a table of the pattern IDs and use a loop to write the data, but since the next article will be about writing a proper text display routine there’s no real need to complicate this supposedly “simple” example any further.
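For the curious, a minimal sketch of that table-and-loop approach might look something like this. The label names are hypothetical, and it still assumes autoincrement is set to 2:

```asm
   move.l #0x40000003, 0x00C00004 ; Set up VDP to write to VRAM address 0xC000 (Plane A)
   move.l #HelloPatternIDs, a0    ; Load address of the table into a0 (hypothetical label)
   move.l #0x0A, d0               ; 11 entries in the table (minus 1 for counter)
   @WriteTile:
   move.w (a0)+, 0x00C00000       ; Write one tile word, and increment the source address
   dbra d0, @WriteTile

   ; ...the rest of the code goes here, the table must be kept out of the execution path...

HelloPatternIDs:
   dc.w 0x0001, 0x0002, 0x0003, 0x0003, 0x0004 ; H E L L O
   dc.w 0x0000                                 ; Blank space
   dc.w 0x0005, 0x0004, 0x0006, 0x0003, 0x0007 ; W O R L D
```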
Here’s the finished result:



Sega Megadrive – 3: Awaking the Beast

This bit was difficult. When the Megadrive is turned on, you get a blank slate. Nothing is initialised for you – the RAM is full of garbage, the controller ports are dead, and the VDP is cold, alone and scared – you have to restore some sanity and set each piece up one by one. What makes it even more difficult, is that you get no visual feedback that it’s been done correctly until you’ve set up enough things to start displaying something on screen – and that takes a LOT of code.

I’ve found various tutorials and code samples showing how to initialise the Megadrive, to the point where we can begin doing some VDP work and get a few pixels showing. Unfortunately they were a little complex for me, I lost some hair trying to get it to work with my chosen assembler, a lot of things were left unexplained, and I’ve had to do some research to fill in the gaps. Now that I know how each step works I’ve since rewritten the code, breaking things down into smaller steps and commenting every line. Here’s each step explained:

1. Checking the Reset Button

The first thing to figure out is if we need to do anything at all. If the player pressed the reset button, then everything will already have been set up and we can just jump straight to the action again. From all the sample code I’ve seen, two separate reset indicators are checked – one is the physical button on the console, but I can’t find any information about the other one. Perhaps it has something to do with the expansion port, so that future addon hardware (the MegaCD or the 32X) can trigger a software reset. Anyway, here’s how to check:

EntryPoint:          ; Entry point address set in ROM header
   tst.w 0x00A10008  ; Test mystery reset (expansion port reset?)
   bne Main          ; Branch if Not Equal (to zero) - to Main
   tst.w 0x00A1000C  ; Test reset button
   bne Main          ; Branch if Not Equal (to zero) - to Main

If the results of the test are non-zero, then a soft reset has occurred and we can branch straight to Main, skipping all of this initialisation.

We test two addresses – they’re not addresses in main memory, but mapped to some specific hardware ports. Addresses starting from 0x00A00000 are not those of main RAM, but are the system I/O areas, which point to various ports or the memory of other coprocessors within the Megadrive. Most of the system I/O addresses can be found in a technical manual straight from Sega themselves, which can be found in the references at the bottom of this post.

2. Clearing the RAM

When the system is powered up, the RAM could be in any old state. Most good emulators clear it when loading a ROM, but this isn’t going to be of much help when I finally get hold of some development hardware and start scratching my head at the garbled mess on screen. We know the Megadrive’s RAM is 64kb in size, and technically we know where its address mappings begin and end since we’ve defined that in the ROM header, but it seems to be common practice to rely on the machine’s ability to wrap around the end of the physical addresses back to the beginning, and clear it from 0x00000000 backwards.

If we put 0x00000000 into an address register, and then use pre-decrement when writing a zero longword to that address, we’ll wrap around to the end of memory and clear the last longword:

move.l #0x00000000, d0     ; Place a 0 into d0, ready to copy to each longword of RAM
move.l #0x00000000, a0     ; Starting from address 0x0, clearing backwards
move.l #0x00003FFF, d1     ; Clearing 64k's worth of longwords (minus 1, for the loop to be correct)
@Clear:
move.l d0, -(a0)           ; Decrement the address by 1 longword, before moving the zero from d0 to it
dbra d1, @Clear            ; Decrement d1, repeat until depleted

I’ve purposely written a whole longword to the d1 register, where just a word-sized MOVE would suffice for the loop count 0x3FFF. This is because I have no idea if the registers will have been cleared or not when the system was powered on. Better safe than crashy.

3. Writing the TMSS

The TradeMark Security System – or TMSS – was a feature put in by Sega to stop unlicensed developers from releasing games for their system, and acts as a kind of killswitch for the VDP. It’s the pinnacle of security systems, a very sophisticated encryption key which is almost uncrackable. You write the string “SEGA” to 0x00A14000.

This was only implemented in the second hardware version of the Megadrive, so we need to test the system’s version number at mapped I/O address 0x00A10001 before proceeding. This points to a byte of read-only memory, possibly on another chip, which stores the version ID (bits 0-3), CPU clock/region (bit 6 on = 7.60MHz PAL, off = 7.67MHz NTSC), and domestic/overseas model (bit 7). We only need to test the bottom four bits (one nybble):

   move.b 0x00A10001, d0      ; Move Megadrive hardware version to d0
   andi.b #0x0F, d0           ; The version is stored in the lowest four bits, so mask it with 0F
   beq @Skip                  ; If version is equal to 0, skip TMSS signature
   move.l #'SEGA', 0x00A14000 ; Move the string "SEGA" to 0xA14000
   @Skip:

I’m unsure at what point the signature is checked and the VDP killswitch activated, whether it’s after a certain time or when the first VDP command is sent. Either way, the VDP is now safe. There’s also a new opcode there – ANDI (immediate logical AND), which ANDs an immediate value with the destination (d0 here), storing the result back in the destination.
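The same version byte can be used to tell PAL and NTSC machines apart, going by the bit 6 description above. This is only a sketch with made-up label names, and it borrows the BTST (bit test) opcode, which sets the zero flag if the tested bit is 0:

```asm
   move.b 0x00A10001, d0   ; Move Megadrive hardware version byte to d0
   btst #0x6, d0           ; Test bit 6 (CPU clock/region)
   beq @IsNTSC             ; Bit is 0, so this is an NTSC machine
   ; PAL-specific setup would go here
   jmp @RegionDone
   @IsNTSC:
   ; NTSC-specific setup would go here
   @RegionDone:
```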

4. Initialising the Z80

Next, we can begin initialising each of the Megadrive’s coprocessors, starting with the Zilog Z80. The Z80 is the same 8-bit chip used in the Sega Master System, and in the Megadrive it acts as both a controller for the PSG and FM sound chips, and a backwards compatibility processor for playing Master System games (with an appropriate adapter for the cartridge). The Z80 has its own set of registers, and various command and data ports for sending it instructions and information, as do the other coprocessors. It also has 8kb of RAM to itself. To send it commands, or some data, we can simply MOVE values to mapped I/O addresses.

The Z80 needs a few things doing – first, we need to request access to its bus, so that it can listen to us. We request – or release – control of the bus by writing 0x0100 or 0 to its BUSREQ port, and then wait in a loop until we have control, by reading this same port. We also need to stop it running by holding it in a reset state – this time by writing 0x0100 to its RESET port. Whilst we’re holding it in this state, we can freely write a program to its RAM. Finally, we release control of the bus and let go of the reset state, and it can then be left alone to act on the data.

   move.w #0x0100, 0x00A11100 ; Request access to the Z80 bus, by writing 0x0100 into the BUSREQ port
   move.w #0x0100, 0x00A11200 ; Hold the Z80 in a reset state, by writing 0x0100 into the RESET port
   @Wait:
   btst #0x0, 0x00A11100   ; Test bit 0 of A11100 to see if the 68k has access to the Z80 bus yet
   bne @Wait               ; If we don't yet have control, branch back up to Wait

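Here’s a new opcode, BTST (bit test). Where TST checks a whole value against zero, BTST checks a single bit (bit 0 in this case) and sets the zero flag if that bit is 0, so the BNE keeps branching whilst the bit is still set.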

Now the 68000 has access to the Z80’s bus, and the chip is held in a reset state, so we can write the program data to its memory. This is mapped from 0x00A00000.

   move.l #Z80Data, a0      ; Load address of data into a0
   move.l #0x00A00000, a1   ; Copy Z80 RAM address to a1
   move.l #0x29, d0         ; 42 bytes of init data (minus 1 for counter)
   @Copy:
   move.b (a0)+, (a1)+      ; Copy data, and increment the source/dest addresses
   dbra d0, @Copy

   move.w #0x0000, 0x00A11200 ; Release reset state
   move.w #0x0000, 0x00A11100 ; Release control of bus

Now the chip starts running again, and begins executing the program written to its memory. I keep glossing over this ‘program’ since I don’t yet have any clue as to what it does! I’ll get some documentation and dissect it bit by bit once I start doing some audio work.

Z80Data:
   dc.w 0xaf01, 0xd91f
   dc.w 0x1127, 0x0021
   dc.w 0x2600, 0xf977
   dc.w 0xedb0, 0xdde1
   dc.w 0xfde1, 0xed47
   dc.w 0xed4f, 0xd1e1
   dc.w 0xf108, 0xd9c1
   dc.w 0xd1e1, 0xf1f9
   dc.w 0xf3ed, 0x5636
   dc.w 0xe9e9, 0x8104
   dc.w 0x8f01

5. Initialising the PSG

This one is the Programmable Sound Generator. It can generate square waves and white noise for procedurally creating sounds. As with the Z80 program, I have no idea what the sample data does yet, I’ll look into it at a later date. Copying data to the PSG is a lot simpler than the Z80, since we can just write the data a byte at a time to a single I/O address, without requesting bus access:

   move.l #PSGData, a0      ; Load address of PSG data into a0
   move.l #0x03, d0         ; 4 bytes of data (minus 1 for counter)
   @Copy:
   move.b (a0)+, 0x00C00011 ; Copy data to the PSG port
   dbra d0, @Copy

PSGData:
   dc.w 0x9fbf, 0xdfff

6. Initialising the VDP

The VDP – or Video Display Processor – is the most complex of the coprocessors. It’s a dedicated graphics chip for displaying sprites and patterns, and warrants its own chapter, which I’ll write up in the next post – getting something on screen.

The VDP has its own set of registers (24 of them), as well as 64kb of dedicated RAM. Communication with the VDP is via two ports – the control port and the data port, which are I/O addresses mapped to 0x00C00004 and 0x00C00000 respectively. The control port is used for setting registers, and supplying a VDP RAM address ready to send data through the data port. The VDP can only send and receive data in bytes or words, but we can make use of a feature which automatically increments the destination address for us, and it will treat a longword write as two separate word writes. More about this feature in the next post.

Each of the VDP’s registers is used to set its various graphics modes, plane addresses and scrolling settings, amongst other things. We initialise the VDP by setting all of these registers, using a word-size command sent to the control port:

  • The top three bits are the command – binary 100, so the word starts from 0x8000 for set register value
  • The next five bits are the register number – so 0x80XX = set register 0, 0x81XX = set register 1, and so on up to 0x97XX for register 23
  • The bottom byte is the data – so 0x82FF writes FF into register 2
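As a quick example of a single register write following that template, this sets register 15 (the autoincrement value) to 2 bytes. The value 2 is just picked for illustration:

```asm
   ; 0x8000 (set register) + 0x0F00 (register 15) + 0x02 (the data) = 0x8F02
   move.w #0x8F02, 0x00C00004 ; Set VDP register 15 (autoincrement) to 2
```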

To make things easier, we just keep one big table of all of the VDP’s register values, and copy the whole lot in one go:

 move.l #VDPRegisters, a0 ; Load address of register table into a0
 move.l #0x17, d0         ; 24 registers to write (minus 1 for counter)
 move.l #0x00008000, d1   ; 'Set register 0' command (and clear the rest of d1 ready)
 @Copy:
 move.b (a0)+, d1         ; Move register value to lower byte of d1
 move.w d1, 0x00C00004    ; Write command and value to VDP control port
 add.w #0x0100, d1        ; Increment register #
 dbra d0, @Copy

Explanations (albeit short explanations) of the VDP registers can be found in chapter 4 of the SEGA2 doc (I’ve added a link to an HTML version in the references). Below is the minimum of things enabled to get started, but these registers will be revisited quite often as I work with more graphics features.

VDPRegisters:
   dc.b 0x14 ; 0: Horiz. interrupt on (bit 4), plus bit 2 (unknown, but docs say it needs to be on)
   dc.b 0x74 ; 1: Vert. interrupt on, display on, DMA on, V28 mode (28 cells vertically), + bit 2
   dc.b 0x30 ; 2: Pattern table for Scroll Plane A at 0xC000 (bits 3-5)
   dc.b 0x40 ; 3: Pattern table for Window Plane at 0x10000 (bits 1-5)
   dc.b 0x05 ; 4: Pattern table for Scroll Plane B at 0xA000 (bits 0-2)
   dc.b 0x70 ; 5: Sprite table at 0xE000 (bits 0-6)
   dc.b 0x00 ; 6: Unused
   dc.b 0x00 ; 7: Background colour - bits 0-3 = colour, bits 4-5 = palette
   dc.b 0x00 ; 8: Unused
   dc.b 0x00 ; 9: Unused
   dc.b 0x00 ; 10: Frequency of Horiz. interrupt in Rasters (number of lines travelled by the beam)
   dc.b 0x08 ; 11: External interrupts on, V/H scrolling on
   dc.b 0x81 ; 12: Shadows and highlights off, interlace off, H40 mode (40 cells horizontally)
   dc.b 0x34 ; 13: Horiz. scroll table at 0xD000 (bits 0-5)
   dc.b 0x00 ; 14: Unused
   dc.b 0x00 ; 15: Autoincrement off
   dc.b 0x01 ; 16: Vert. scroll 32, Horiz. scroll 64
   dc.b 0x00 ; 17: Window Plane X pos 0 left (pos in bits 0-4, left/right in bit 7)
   dc.b 0x00 ; 18: Window Plane Y pos 0 up (pos in bits 0-4, up/down in bit 7)
   dc.b 0x00 ; 19: DMA length lo byte
   dc.b 0x00 ; 20: DMA length hi byte
   dc.b 0x00 ; 21: DMA source address lo byte
   dc.b 0x00 ; 22: DMA source address mid byte
   dc.b 0x00 ; 23: DMA source address hi byte, memory-to-VRAM mode (bits 6-7)

7. Initialising the Controller Ports

The controller ports are generic 9-pin I/O ports, and are not particularly tailored to any device. They have five mapped I/O addresses each – CTRL, DATA, TX, RX and S-CTRL:

  • CTRL controls the I/O direction and enables/disables interrupts generated by the port
  • DATA is used to send/receive data to or from the port (in bytes or words) when the port is in parallel mode
  • TX and RX are used to send/receive data in serial mode
  • S-CTRL is used to get/set the port’s current status, baud rate and serial/parallel mode.

The SEGA2 doc mentions three controller ports – Controller 1, Controller 2, and EXP. I’m guessing EXP is the 9-pin expansion port on the back of the version 1 Genesis, perhaps intended for basic non-joypad peripherals that didn’t require the full expansion port on the bottom of the unit.

   ; Set IN I/O direction, interrupts off, on all ports
   move.b #0x00, 0x00A10009 ; Controller port 1 CTRL
   move.b #0x00, 0x00A1000B ; Controller port 2 CTRL
   move.b #0x00, 0x00A1000D ; EXP port CTRL

8. Clearing the Registers and Tidying Up

Now everything should be initialised ready for some real work, but it would be best if the actual game code could start with a clean slate. Some rubbish is still in the registers, so let’s clear it:

   move.l #0x00FF0000, a0    ; Move address of the start of RAM to a0 (RAM was cleared to zero in step 2)
   movem.l (a0), d0-d7/a1-a7 ; Multiple move 0 to all registers

Here’s a very useful opcode – MOVEM (move multiple). It can move data to/from a list of registers or register ranges, separated by slashes, for example d0/d3/d5 or a3-a5. A common use for it would be to backup/restore all of the registers to/from the stack, in a single instruction.
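A minimal sketch of that backup/restore pattern, using a hypothetical subroutine name:

```asm
SomeRoutine:
   movem.l d0-d7/a0-a6, -(sp) ; Back up all registers (except the stack pointer) to the stack
   ; ...do some work which tramples over the registers...
   movem.l (sp)+, d0-d7/a0-a6 ; Restore them all again
   rts
```

The register list has to match on the way in and out, otherwise the stack pointer ends up in the wrong place for the RTS.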

Next, the status register. The only thing I currently understand about the status register is that certain opcodes can leave the results of an operation in it, like a return value in C/C++. After some reading, it turns out that it also selects which stack pointer is in use (so that the jump to an interrupt routine doesn’t trample over the real stack), sets which interrupts are enabled, and enables or disables tracing (calls a routine after every opcode, useful for storing callstacks for an exception handler).

   ; Init status register (no trace, A7 is Interrupt Stack Pointer, no interrupts, clear condition code bits)
   move #0x2700, sr
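Broken down bit by bit, that value looks like the following. The second MOVE is only a sketch of how interrupts could be let through later on, and picking mask level 3 is my own assumption:

```asm
   ; 0x2700 = %0010011100000000
   ;   Bit 15    = 0   - tracing off
   ;   Bit 13    = 1   - supervisor mode
   ;   Bits 10-8 = 111 - interrupt mask level 7 (all maskable interrupts blocked)
   ;   Bits 4-0  = 0   - condition code bits cleared
   move #0x2700, sr

   ; Later, once the interrupt routines are ready to be called:
   move #0x2300, sr ; Same again, but mask level 3 - interrupts of level 4 and above get through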

And that’s it! The system is initialised, albeit in a very minimal state, ready to do some work. I’ll come back and amend the init code later if I need more functionality out of the machine. Now to jump to the main game code, which I’ve labelled as __main in a separate ASM file. I’ve also labelled the JMP itself as Main, so that we branch here if the reset button has been pressed and the initialisation is skipped:

Main:
   jmp __main ; Jump to the game code!




Sega Megadrive – 2: So, assembly language, then…

I’ve been toying with the idea of learning an assembly language for some considerable time. I tried – and failed – to get to grips with 68k ASM on the Atari STe, but that was mostly not being able to figure out how to get the DevPac IDE to stop crashing. Perhaps I had a bad disk, or not enough RAM in my STe (I think it was the measly 512k model). I’ve since given 68k a second shot, on the Megadrive, and this stuff is finally beginning to sink in. This post shows the things I’ve learned so far, some of the troubles I ran into, and some of things I still find confusing.

I’ve already got 10 or so years (three of those professionally) of C and C++ programming under my belt, so I’ve had a good head start, and I’m hopeful that this won’t be too tricky to learn. I’m already familiar with some of the more advanced concepts of programming, such as working with raw bytes, bitwise operations, address alignment, and the best types of coffee to buy to make coding sessions more productive. So, here goes…

68k Assembly – The Basics

One line of 68k assembly code equals one CPU instruction (called an opcode) plus its parameters, so it’s an almost bare-metal experience working directly with the hardware. It’s one step up from working with machine code directly. Therefore, the programs used to create binaries only assemble the code into CPU instructions, there’s no real compiling involved. Fortunately, that means assembling is really fast, and you know exactly what you’re getting. Unfortunately, that means you have to do all of the hard work yourself, there’s limited language ‘features’ to help out – functions, enums, classes and structs, templates – just forget about them.

The purpose of most opcodes is to perform an operation on one or more bytes of data. This could be to move bytes from one location to another, or perform some arithmetic on them. The CPU is incapable of performing most tasks on the data whilst it is in main RAM, instead it has its own localised storage spaces (physically on the chip) where data is temporarily stored so it can be manipulated. These spaces are called registers, and the 68000 has 16 of them. 8 of them are general purpose registers – this is where the majority of arithmetic work will be done. Each general purpose register is 32 bits in size. The other 8 are address registers, and are only used for storing addresses of main memory for fetching or returning data from it, so they’re basically pointers that are attached to the CPU.

The general purpose registers have names d0 – d7, and the address registers a0 – a7. So, the fourth general purpose register is called d3, and the second address register is called a1. Some registers have aliases for ease of use. For example, a7 is commonly used as the stack pointer, and can also be referred to in code as ‘sp’.

Opcodes can perform operations using data from varying sources – main memory, one or more registers, or an immediate value (an integer, hex value, or binary value). Here’s a few examples of the MOVE opcode, it takes the first parameter, and moves it to the register or address in the second parameter:

 move.l #$10, d0   ; Moves the hex value 0x10 (decimal 16) to register d0
 move.l #%0101, d0 ; Moves the binary value 0101 (decimal 5) to register d0
 move.l #12, d0    ; Moves the decimal value 12 to register d0
 move.l d1, d0     ; Moves the value stored in register d1 to register d0
 move.l 0x8000, d0 ; Moves the value stored at address 0x8000 to register d0
 move.l d0, 0x8000 ; Moves the value stored in register d0 to address 0x8000
 move.l (a0), d0   ; Moves the value stored at the address in a0 to register d0
 move.l d0, (a0)   ; Moves the value stored in register d0 to the address stored in register a0

The first three examples show how to move immediate values to a register, signified by the # symbol before the value. An immediate value can be a hex value (prefixed with either $ or 0x), a binary value (prefixed with %), or a decimal value (no prefix). So to specify the immediate hex value 12, use #$12 or #0x12, to specify the binary value 0011 use #%0011, or for the decimal value 128 use #128. Example 4 shows how to move the contents of a register to another, and examples 5 and 6 show how to move the contents stored at an address in main memory to a register, and vice versa. Examples 7 and 8 show the same thing, but that main memory address is stored in the register a0. The brackets around register (a0) specify that the value at the address stored in a0 is to be moved, not the address itself, similar to the dereference operator in C/C++. Omitting the brackets would just move the address.

Not all opcodes can deal with data from all sources. Some can only operate on data in registers, some may or may not be able to use immediate values, and only a select few opcodes can deal with data straight from main memory. A list of all of the 68k’s opcodes, including details of their usage and which source/destination values are permitted, is in the 68k Instruction Set PDF in the references section below.

The .l after the opcode is the size of the operation, in this case moving a longword of data (4 bytes). Opcodes can operate on bytes (.b, 8 bits), words (.w, 2 bytes) or longwords (.l, 4 bytes). Not all opcodes can operate on all data sizes, I’ve been checking the Instruction Set for which sizes are supported.
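A quick illustration of how the operation size affects what ends up in a register (the values are arbitrary):

```asm
   move.l #0x11223344, d0 ; Longword - all of d0 is written: d0 = 0x11223344
   move.w #0xAAAA, d0     ; Word - only the bottom two bytes change: d0 = 0x1122AAAA
   move.b #0xBB, d0       ; Byte - only the bottom byte changes: d0 = 0x1122AABB
```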

A few opcodes

I’ve been doing this for three months, and so far I’ve only used about 10 opcodes. It’s impressive how simple low-level computing like this can be, and even more impressive looking at some of the amazing games created with so few building blocks. Here’s a small guide to some of the opcodes I’ve found to be most useful:


The four basic arithmetic opcodes – add, subtract, multiply and divide. Add does exactly what it says on the tin. It adds the value in the first parameter (immediate value or register contents) to the register in the second parameter, and stores the result in that register. There’s a couple of variants of it – ADDI means add immediate, which only adds an immediate value to the contents of a register, ADDA adds a value to an address (NOT the value stored at the address, just the address itself), ADDQ which can very quickly add small immediate values (1 – 8), and ADDX which I’ve yet to figure out. There’s several variants because some are more expensive than others. I haven’t yet done any real optimisation to my code, but I guess paying attention to these small differences in opcode variants would be a good start when I get round to it. If I only needed to add 4 to a value, ADDQ would be faster than ADDI, for example.

add.l #0x10, d1  ; Adds the value 0x10 to register d1, and stores the result in d1
                 ; - longword size operation, so it uses all of the data in
                 ; the register

add.w d1, d2     ; Adds the contents of d1 to the contents of d2, and stores the
                 ; result in d2 - word size operation, so the top two bytes of both
                 ; registers are not referenced, and remain intact

addq.b #0x5, d3  ; Quickly adds 5 to the value in d3, storing the result in d3
                 ;  - byte size operation, so the upper three bytes are not
                 ; referenced, and stay intact

The last example is of byte size, so if d3 contained 0x000000FF the result would become 0x00000004, and would NOT roll to 0x00000100. It would need to be a word or longword size operation to do that.

MUL, SUB and DIV are used pretty much the same as ADD, and also have several variants (multiply and divide come in unsigned and signed flavours, MULU/MULS and DIVU/DIVS). The Instruction Set doc shows each of their nuances and acceptable operation sizes.


CLR stands for clear. It sets a register (or data at an address), or part of a register depending on the operation size, to zero. It only takes one parameter, and that’s the register or address:

clr.l d0     ; Clears the whole of d0
clr.w (a0)   ; Clears one word (2 bytes) of data at the address in a0
clr.b d0     ; Clears the bottom byte of d0, leaving the rest intact


JMP means jump. It moves the program counter (the pointer to the current instruction) to another location, and continues executing. The address can be specified in hex, or more conveniently, using a label:

SomeLabel:
   jmp SomeLabel   ; An infinite loop!


JSR means jump to subroutine. It does the same as JMP, but stores the address of the next instruction (by pushing it to the stack) before jumping, so that it can return later. RTS, meaning return from subroutine, pops that address from the stack and does the jump back:

   move.l #0x8, d0  ; Do something useful
   jsr Label        ; Jump to Label
   move.l #0x12, d0 ; Will return here when RTS is called

Label:
   move.l #0x04, d0 ; Do something else
   rts              ; Return back


DBRA means decrement and branch. It does the same as a jump, but decrements a register first (treating it as a word), and only branches if the result hasn’t dropped below zero. Once the register has gone past zero to -1, it doesn’t branch, and just continues to the next line. That means the loop body runs one more time than the starting value, so a starting value of 6 gives 7 iterations. It’s a common tool used for implementing loops:

   move.w #0x6, d0 ; Looping round 7 iterations (includes the 0th iteration)
Label:
   add.l #0x1, d1  ; Add 1 to register d1
   dbra d0, Label  ; Decrement d0, and jump back up to Label unless it has
                   ; dropped below zero
   clr.l d1        ; Loop has finished, clear d1

CMP and Bcc

Bcc, meaning branch on condition, is a collection of various branch opcodes which only branch if the condition code of the status register adheres to some condition. The status register seems to be the state of the CPU after an operation, and each opcode leaves its condition code in a different state after execution, as a sort of return value. For example, the CMP opcode (meaning compare) subtracts one value from the other, throws the answer away, and leaves flags describing the result (zero, negative and so on) in the status register’s condition code. After that, the Bcc variant BEQ (branch if equal, meaning the result was zero) can test the result of that comparison, and branch or not based on it. It’s a common way to implement an IF statement.

Here’s a demonstration of most of the above opcodes, including a CMP and BEQ. It’s a subroutine which counts the number of characters in a null-terminated string, by iterating through each byte and checking if it is 0, whilst keeping count of each iteration:

GetStringLength:
   clr.l d0          ; Clear d0, ready to begin counting
   @FindTerm:
   move.b (a0)+, d1  ; Move byte from address in a0 to d1, and then increment the address by 1 byte
   cmp.b #0x0, d1    ; Test if byte is zero
   beq.b @End        ; If byte was zero, branch to end
   addq.l #0x1, d0   ; Increment counter
   jmp @FindTerm     ; Jump back to FindTerm to loop round again
   @End:
   rts               ; End of search, return back. Result is in d0

Example usage:

   move.l #StringAddr, a0  ; Move address of string to a0
   jsr GetStringLength     ; Jump to the GetStringLength subroutine
                           ; Length of string will now be stored in d0

StringAddr:
   dc.b "HELLO WORLD", 0   ; A zero-terminated string

In the example, I’ve also introduced two new concepts. One is the + symbol after moving a value from (a0). This means post-increment; the address in a0 will be incremented by 1 byte after it has been read, similar to int a = b++ in the C++ language. The second concept is the @ symbol before the label FindTerm. This means the label is local – when referencing the label @FindTerm, it uses the address of the most recently defined @FindTerm label. This means you can have duplicate label names (loop could be a common name, perhaps) without any ambiguity.

That’s it for now. It doesn’t look like much, but I’ve managed to get as far as drawing text and sprites with no other opcodes than the ones listed, so they’re pretty powerful. There’s a few others I’ve touched briefly, like ROL and ROR, which rotate bits left or right, but they don’t become useful until dealing with VDP addresses.


Sega Megadrive – 1: Getting Started

My favourite videogames console of all time – the Sega Megadrive. I’ve been pretty excited about getting started on this machine for many years now, and it has been the catalyst which finally kicked me into learning some assembly language.

Now, I’ve jumped the gun a bit, since I was originally planning to work on these platforms in chronological order, which means the Nintendo Entertainment System is going to wait in the queue for a while (don’t be fooled, the Sega Master System II was released AFTER the Megadrive, because Sega are nuts like that). I also don’t yet have any development hardware (I’m currently in negotiations with some sellers, though), so I’ll be starting out with a PC emulator with debugging features. The Sonic disassembly packages over at Sonic Retro contain a modified (fixed for Windows 7) version of SN Systems’ 68000 assembler, which was a low cost alternative to Sega’s tools at the time, and used by many Megadrive developers.

The point is, I’m just too excited to leave this console alone, and if anything will kickstart my motivation for this project with a flying leap it will be the Megadrive.

The Sega Megadrive technical specifications

A quick and naive list of the console’s basics, but it’s all I need to know to get started:

  • CPU: Motorola 68000 at 7.67 MHz (NTSC) or 7.60 MHz (PAL)
  • Slave CPU: Zilog Z80
  • Main memory: 64kb
  • Video: Yamaha YM7101 VDP (Video Display Processor)
  • Video memory: 64kb
  • Audio: Yamaha YM2612 FM chip, Texas Instruments SN76489 chip
  • Game media: Cartridge
  • Programming language: 68k Assembler language
  • Known development hardware: Official Sega Genesis dev unit, Cross Products MegaCD unit

The Tools

As mentioned, I’ve managed to get hold of the SN Systems ASM68K assembler, a command line tool for MS-DOS. Since there was no official IDE or text editor included, nor can I find any clues as to what was commonly used at the time, I’ll be using Microsoft Visual Studio, simply because I’m familiar with its keyboard shortcuts.

Until I can get hold of some hardware, I’ll be making use of a PC emulator which has some debugging features. After some searching around, it seems the MAME emulator MESS does a good job, Gens with the KMod plugin is capable of debugging, and I’ve also had Regen recommended to me on the forums. I’m inclined to start with MESS since it uses the same debugging shortcut keys as Visual Studio.

Testing the Assembler

Since documentation seems scarce, I’ve used the Sonic the Hedgehog disassemblies from Sonic Retro as a guide. The package contains a batch file used to build the Sonic source, and the assembler bit simply boils down to:

ASM68K.EXE source.asm,destination.bin

Let’s test something out:

Loop:
   move.l #0xF, d0 ; Move 15 into register d0
   move.l d0, d1   ; Move contents of register d0 into d1
   jmp Loop        ; Jump back up to 'Loop'

…and that assembles just fine:

SN 68k version 2.53
Assembly completed.
0 error(s) from 4 lines in 0.1 seconds

I won’t pretend that I just came up with that assembly snippet like it was natural, it’s been a while since I last touched some 68k assembly (on the Atari STe) and it was the result of an hour or so of trawling through documentation and example code to refresh my memory!

The Megadrive ROM header

Unfortunately, it’s not as simple as loading up the generated ROM into an emulator and hitting Debug. A Megadrive ROM needs a header, which contains some meta info about the ROM, and a block of CPU vectors used to initialise the 68000 before the code gets executed. The header takes up 512 bytes at the very top of the ROM, and looks a little something like this:

	; ******************************************************************
	; Sega Megadrive ROM header
	; ******************************************************************
	dc.l   0x00FFE000      ; Initial stack pointer value
	dc.l   EntryPoint      ; Start of program
	dc.l   Exception       ; Bus error
	dc.l   Exception       ; Address error
	dc.l   Exception       ; Illegal instruction
	dc.l   Exception       ; Division by zero
	dc.l   Exception       ; CHK exception
	dc.l   Exception       ; TRAPV exception
	dc.l   Exception       ; Privilege violation
	dc.l   Exception       ; TRACE exception
	dc.l   Exception       ; Line-A emulator
	dc.l   Exception       ; Line-F emulator
	dc.l   Exception       ; Unused (reserved)
	dc.l   Exception       ; Unused (reserved)
	dc.l   Exception       ; Unused (reserved)
	dc.l   Exception       ; Unused (reserved)
	dc.l   Exception       ; Unused (reserved)
	dc.l   Exception       ; Unused (reserved)
	dc.l   Exception       ; Unused (reserved)
	dc.l   Exception       ; Unused (reserved)
	dc.l   Exception       ; Unused (reserved)
	dc.l   Exception       ; Unused (reserved)
	dc.l   Exception       ; Unused (reserved)
	dc.l   Exception       ; Unused (reserved)
	dc.l   Exception       ; Spurious exception
	dc.l   Exception       ; IRQ level 1
	dc.l   Exception       ; IRQ level 2
	dc.l   Exception       ; IRQ level 3
	dc.l   HBlankInterrupt ; IRQ level 4 (horizontal retrace interrupt)
	dc.l   Exception       ; IRQ level 5
	dc.l   VBlankInterrupt ; IRQ level 6 (vertical retrace interrupt)
	dc.l   Exception       ; IRQ level 7
	dc.l   Exception       ; TRAP #00 exception
	dc.l   Exception       ; TRAP #01 exception
	dc.l   Exception       ; TRAP #02 exception
	dc.l   Exception       ; TRAP #03 exception
	dc.l   Exception       ; TRAP #04 exception
	dc.l   Exception       ; TRAP #05 exception
	dc.l   Exception       ; TRAP #06 exception
	dc.l   Exception       ; TRAP #07 exception
	dc.l   Exception       ; TRAP #08 exception
	dc.l   Exception       ; TRAP #09 exception
	dc.l   Exception       ; TRAP #10 exception
	dc.l   Exception       ; TRAP #11 exception
	dc.l   Exception       ; TRAP #12 exception
	dc.l   Exception       ; TRAP #13 exception
	dc.l   Exception       ; TRAP #14 exception
	dc.l   Exception       ; TRAP #15 exception
	dc.l   Exception       ; Unused (reserved)
	dc.l   Exception       ; Unused (reserved)
	dc.l   Exception       ; Unused (reserved)
	dc.l   Exception       ; Unused (reserved)
	dc.l   Exception       ; Unused (reserved)
	dc.l   Exception       ; Unused (reserved)
	dc.l   Exception       ; Unused (reserved)
	dc.l   Exception       ; Unused (reserved)
	dc.l   Exception       ; Unused (reserved)
	dc.l   Exception       ; Unused (reserved)
	dc.l   Exception       ; Unused (reserved)
	dc.l   Exception       ; Unused (reserved)
	dc.l   Exception       ; Unused (reserved)
	dc.l   Exception       ; Unused (reserved)
	dc.l   Exception       ; Unused (reserved)
	dc.l   Exception       ; Unused (reserved)

	dc.b "SEGA GENESIS    "									; Console name
	dc.b "(C)SEGA 1992.SEP"									; Copyright holder and release date
	dc.b "YOUR GAME HERE                                  "	; Domestic name
	dc.b "YOUR GAME HERE                                  "	; International name
	dc.b "GM XXXXXXXX-XX"									; Version number
	dc.w 0x0000												; Checksum
	dc.b "J               "									; I/O support
	dc.l 0x00000000											; Start address of ROM
	dc.l __end												; End address of ROM
	dc.l 0x00FF0000											; Start address of RAM
	dc.l 0x00FFFFFF											; End address of RAM
	dc.l 0x00000000											; SRAM enabled
	dc.l 0x00000000											; Unused
	dc.l 0x00000000											; Start address of SRAM
	dc.l 0x00000000											; End address of SRAM
	dc.l 0x00000000											; Unused
	dc.l 0x00000000											; Unused
	dc.b "                                        "			; Notes (unused)
	dc.b "JUE             "									; Country codes

Note that the assembler requires code and data to be tabbed one to the right; I’ll look into why this is necessary at a later date. Labels, however, seem happy with no tabs.
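To make the layout rule concrete, here is a tiny sketch of how I understand it. The label names (Start, Table) are made up for illustration, and my guess, which I have not confirmed, is that the assembler treats anything starting in the first column as a label, which would explain why instructions and data need the tab:

```asm
Start:                  ; Label: starts in the first column, no tab
	moveq  #0, d0       ; Instruction: tabbed one to the right
	bra.s  Start        ; Instruction: tabbed one to the right
Table:                  ; Label: first column again
	dc.w   0x1234       ; Data: tabbed one to the right
```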

The top section is a block of CPU vectors, read in when the system boots and used to initialise various registers and interrupt addresses. The first longword is the value of the stack pointer register when the system boots, although the rest of the registers must be initialised manually so I’m confused as to why this one must be explicitly set. The EntryPoint is the address of the first line of code that gets run, and the majority of the rest point to an exception routine to catch errors. Eventually I plan to write a proper exception handler for each type of problem, and print to screen some information which would help me diagnose the issue.
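Until a proper handler exists, one option is to make the catch-all routine park the CPU, so a fault shows up immediately in the debugger. This is only a sketch of a hypothetical stand-in for an Exception routine, not a full handler. It relies on exceptions running in supervisor mode, which stop requires. Simply returning is also unsafe for bus and address errors, because the 68000 pushes extra words onto the stack for those two:

```asm
; Hypothetical catch-all handler: halt instead of returning
Exception:
	stop   #0x2700          ; Halt the CPU with all maskable interrupts blocked
	bra.s  Exception        ; If anything wakes the CPU up, halt it again
```

With this in place, the program counter sits inside Exception after a fault, and the stack still holds the information the CPU saved when the exception was taken.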

The HBlankInterrupt and VBlankInterrupt are routines that get called when the electron beam in the TV reaches the right hand side of the screen, and when the beam hits the bottom right, just before it switches off and moves back to the top left. I guess modern LCD and plasma TVs don’t have this concept, but from the examples I’ve seen the timing for these interrupts being called is clock-accurate, so they’re perfect for implementing timers.
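As a rough idea of what a timer built on the vertical interrupt could look like, here is a minimal sketch that counts frames. FrameCounter and its address are my own assumptions (any free longword in RAM would do). It also assumes the VDP has been configured to raise vertical interrupts and that the 68000's interrupt mask lets level 6 through, neither of which has been set up yet:

```asm
; Hypothetical frame counter, incremented once per vertical retrace
FrameCounter equ 0x00FF0000     ; Assumed free longword at the start of RAM

VBlankInterrupt:
	addq.l #1, FrameCounter     ; One more frame has been displayed
	rte                         ; Return from Exception
```

The main loop could then read FrameCounter and compare it against a previously stored value to measure elapsed frames.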

The second block is some information about the cartridge, hopefully the comments are self explanatory. The ROM/SRAM start and end addresses make sense to me since a cartridge and its savegame space (if any) can be of variable size, but I’ve yet to discover why the RAM start and end addresses need explicitly defining. The checksum is not read by the boot code itself and nothing is done with it, it’s only there for the programmer to implement a check if they wish.
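For anyone who does want to implement that check, here is a sketch of how it might look. It assumes the usual convention of adding up every 16-bit word from the end of the header (0x200) to the end of the ROM, and a ROM with an even length. The routine and label names are hypothetical. The addresses 0x18E and 0x1A4 are where the checksum and ROM end address fields land in the header layout above:

```asm
; Optional checksum test (sketch). Returns with the Z flag set on a match.
CheckChecksum:
	lea     0x00000200, a0      ; First byte after the 512-byte header
	move.l  0x000001A4, d1      ; 'End address of ROM' field from the header
	moveq   #0, d0              ; Running sum (only the low word matters)
ChecksumLoop:
	add.w   (a0)+, d0           ; Add the next word of the ROM to the sum
	cmpa.l  d1, a0              ; Compare the read pointer with the end address
	bcs.s   ChecksumLoop        ; Keep going while a0 is below the end address
	cmp.w   0x0000018E, d0      ; Compare against the header's checksum field
	rts                         ; Caller can test the result with beq/bne
```

Since the header above stores 0x0000 as its checksum, this would report a mismatch until the real value is calculated and written into the ROM.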

All of the addresses can just be specified in hex, but the assembler allows for labels which makes things a great deal easier. EntryPoint, Exception, __end, HBlankInterrupt and VBlankInterrupt will need defining:

EntryPoint:
Loop:
   move.l #0xF, d0 ; Move 15 into register d0
   move.l d0, d1   ; Move contents of register d0 into d1
   jmp Loop        ; Jump back up to 'Loop'

HBlankInterrupt:
VBlankInterrupt:
   rte   ; Return from Exception

Exception:
   rte   ; Return from Exception

__end    ; Very last line, end of ROM address

EntryPoint just loops around the little snippet I used to test out the assembler above. Both H/VBlankInterrupts and the Exception handler do nothing and return for now, I’ll experiment with those later. __end contains no code, it’s just a marker for the address of the last byte of the ROM. I’ve prefixed the label with underscores, simply to indicate that it’s not a subroutine and shouldn’t be called explicitly.

Ok, it should be ready to build and run!

Debugging the ROM

The ROM assembles with the ASM68K.EXE line demonstrated earlier. My chosen emulator, MESS, needs to be configured to enable the debugging features. After running MESS once, a mess.ini file is generated alongside the .exe, which contains a debug flag that can be set to 1. Now the ROM can be run using:

mess64.exe genesis -cart test.bin

MESS fires up, loads the ROM, and displays a debugging window. Unfortunately, I ran into a problem: the disassembly window shows garbage. The opcodes are mostly ‘ori’ and ‘illegal’, and I couldn’t make head or tail of my code:

After some digging around and tearing my hair out, it was pointed out to me that the first 15 bytes of my ROM didn’t belong there (I’m assuming the assembler added some sort of metadata to the start of the binary, perhaps for the SN debugger), and would need removing before it would work. After deleting those bytes using a hex editor (or assembling with the /p option), the ROM seems to work:

Much better, the opcodes are recognisable now. Time to test it out – MESS uses the same keyboard shortcuts for debugging as Visual Studio:

  • F9 – Set/unset breakpoint
  • F10 – Step over
  • F11 – Step into
  • SHIFT + F11 – Step out

So, after a single step (F10) the program counter moves straight to the address specified as the entry point in the header and executes it, and the value 0xF is moved into register d0. After a second step the contents of d0 (still 0xF) are moved into register d1, and after a third step the program counter jumps back up to the first line again:

It’s not exactly Crysis, but it demonstrates that everything is in the right place and ready for the next part – initialising the Megadrive.