Gameboy Overview

The Gameboy hardware is made up of several parts:

The emulator is implemented as a library. It does not provide a user interface, input handling, or a main loop; these are expected to be implemented by programs that link against the library. The library is designed with no global state: this allows multiple Gameboy systems to be emulated in the same process, and makes the code easier to understand.

To this end, all of the emulator's state is contained in a single struct:

<< gameboy state >>=
struct Gameboy
{
#ifndef NDEBUG
    uint32_t (*trace_fn)(struct Gameboy* gb, struct gameboy_tp* tp);
#endif
    << state >>
};

The CPU

The CPU has eight 8-bit core registers:

<< cpu registers >>=
uint8_t A, F;
uint8_t B, C;
uint8_t D, E;
uint8_t H, L;

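A is the accumulator register, and many instructions only support operating on this register. Some instructions use pairs of registers to provide 16-bit operands; the possible pairs are BC, DE, and HL (the push and pop instructions also treat AF as a pair).

The flags register F stores additional state about the results of (primarily) arithmetic operations. It is a bitmask of four values held in its upper four bits; the lower four bits always read as zero:

- Z (0x80): set when the result is zero
- N (0x40): set when the operation was a subtraction
- H (0x20): half carry, a carry or borrow between the low and high nibbles
- C (0x10): carry, a carry or borrow out of the most significant bit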

There is a 16-bit stack pointer register used for stack operations. Few instructions allow direct access to SP.

<< cpu registers >>=
uint16_t SP;

The CPU executes instructions from the address held in the program counter register, PC. This register is not directly accessible.

<< cpu registers >>=
uint16_t PC;
<< state >>=
struct GameboyCPU {
    << cpu registers >>
} cpu;

Clocking

The clock runs at ~4.19 MHz. However, most operations are measured in machine cycles, each of which is 4 clock cycles (~1.05 MHz). The shortest CPU operation is 1 machine cycle: the time taken to execute a nop instruction or to access a single byte of memory.

Each part of the emulator needs to perform work on every machine cycle.

<< clock functions >>=
void clock_increment(struct Gameboy* gb)
{
    << per machine cycle updates >>
}

A count of elapsed clock cycles is maintained for external use only. It advances by 4 on every machine cycle.

<< state >>=
uint64_t TotalCycles;
<< reset >>=
gb->TotalCycles = 0;
<< per machine cycle updates >>=
gb->TotalCycles += 4;

Divider and Timer Registers

The Gameboy keeps a counter which increments on every clock cycle. This counter is exposed in two ways: the divider and the timer registers. The divider register increments at a fixed frequency (1 per 256 clock cycles = 1 per 64 machine cycles). The timer register increments at a configurable frequency and can provide an interrupt when it overflows.

<< state >>=
struct {
    uint16_t CycleCount;
    bool TimerOverflow;
    bool TimerLoading;
} clock;
<< reset >>=
gb->clock.CycleCount = 0;
gb->clock.TimerOverflow = false;
gb->clock.TimerLoading = false;

The divider and timer actually share a single underlying 16-bit cycle counter, which is incremented on each clock cycle. The top 8 bits of this counter are exposed directly as the divider register at 0xFF04. Writing any value to the divider resets the whole internal 16-bit counter to zero.

The timer counter register increments at a rate configured by the timer control register. When it overflows, a timer interrupt is requested and the counter is reloaded from the timer modulo register. A falling edge detector decides when to increment the timer counter. Its input is a single bit of the 16-bit cycle counter, and the speed selected in bits 1:0 of the timer control register determines which bit is used. Bit 2 of the timer control register enables the timer.
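The falling-edge scheme can be shown outside the emulator. The minimal standalone sketch below advances a 16-bit counter in machine-cycle steps of 4 and counts the falling edges of the selected bit. The names `timer_input` and `count_timer_ticks` are hypothetical, and the bit table assumes the same speed-to-bit mapping as `clock_getTimerBit`.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper: input to the falling edge detector for a given
 * timer control value (bit 2 = enable, bits 1:0 = speed select). */
static bool timer_input(uint8_t tac, uint16_t counter)
{
    static const unsigned bit_for_speed[4] = { 9, 3, 5, 7 };
    if (!(tac & 0x04)) {
        return false; /* a disabled timer feeds a constant 0 */
    }
    return (counter >> bit_for_speed[tac & 0x03]) & 1u;
}

/* Advance the counter by `cycles` clock cycles, one machine cycle
 * (4 clocks) at a time, and return how many timer increments occur. */
static unsigned count_timer_ticks(uint8_t tac, uint16_t counter, unsigned cycles)
{
    unsigned ticks = 0;
    for (unsigned i = 0; i < cycles; i += 4) {
        bool const before = timer_input(tac, counter);
        counter = (uint16_t)(counter + 4);
        if (before && !timer_input(tac, counter)) {
            ticks += 1; /* falling edge: the timer counter increments */
        }
    }
    return ticks;
}
```

For example, with timer control 0x05 (enabled, 16-cycle speed), 160 clock cycles starting from zero produce 10 increments. With 0x04 (enabled, 1024-cycle speed), 1024 cycles produce just one.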

<< clock functions >>=
static bool clock_getTimerBit(uint8_t control, uint16_t cycles)
{
    switch(control & 0x03) { /* Timer clock select */
        case 0: /* 4.096 kHz (1024 cycles) */
            return cycles & (1u << 9);
        case 1: /* 262.144 kHz (16 cycles) */
            return cycles & (1u << 3);
        case 2: /* 65.536 kHz (64 cycles) */
            return cycles & (1u << 5);
        case 3: /* 16.384 kHz (256 cycles) */
            return cycles & (1u << 7);
    }
    assert(0); /* unreachable: control & 0x03 covers every case */
    return false; /* avoids falling off the end when NDEBUG disables assert */
}

If the timer counter register overflows when it is incremented, an interrupt is requested and the counter is reloaded from the timer modulo register. Both effects happen after a delay of one machine cycle. During the delay cycle the timer counter reads zero, and a write to the counter cancels both the interrupt and the reload. During the machine cycle in which the modulo value is being loaded, explicit writes to the counter register are ignored. Writes to the modulo register are still respected, and the written value is also loaded into the counter register.
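The order of these effects can be modelled in isolation. The sketch below uses a hypothetical cut-down struct rather than `struct Gameboy`. It covers only the delayed reload, the interrupt, and cancellation by a counter write, and it omits the rule that counter writes are ignored while the modulo is being loaded.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical cut-down timer state; not the emulator's struct Gameboy. */
struct TimerModel {
    uint8_t tima;         /* timer counter */
    uint8_t tma;          /* timer modulo */
    uint8_t iflag;        /* interrupt flag register */
    bool overflowPending; /* overflow seen, effects delayed one machine cycle */
};

/* One timer increment: on overflow the counter wraps to zero and the
 * reload and interrupt are deferred. */
static void model_increment(struct TimerModel* t)
{
    if (t->tima == 0xFF) {
        t->overflowPending = true;
    }
    t->tima = (uint8_t)(t->tima + 1);
}

/* Start of the next machine cycle: apply the delayed overflow effects. */
static void model_machineCycle(struct TimerModel* t)
{
    if (t->overflowPending) {
        t->overflowPending = false;
        t->iflag |= 0x04; /* timer interrupt bit */
        t->tima = t->tma; /* reload from the modulo register */
    }
}

/* A write to the counter during the delay cycle cancels the pending effects. */
static void model_writeCounter(struct TimerModel* t, uint8_t value)
{
    t->tima = value;
    t->overflowPending = false;
}
```

Incrementing a counter at 0xFF leaves it reading zero with no interrupt yet. The reload and the interrupt appear only on the following machine cycle.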

<< clock functions >>=
static void clock_timerIncrement(struct Gameboy* gb)
{
    uint8_t timer = gb->mem.IO[IO_TimerCounter];
    if(timer == 0xFF) {
        gb->clock.TimerOverflow = true;
    }
    gb->mem.IO[IO_TimerCounter] = timer + 1;
}
<< per machine cycle updates >>=
gb->clock.TimerLoading = false;
if(gb->clock.TimerOverflow) {
    /* Delayed overflow effects */
    gb->mem.IO[IO_InterruptFlag] |= Interrupt_TIMA;
    gb->clock.TimerOverflow = false;

    /* In the next machine cycle the modulo is being loaded */
    gb->mem.IO[IO_TimerCounter] = gb->mem.IO[IO_TimerModulo];
    gb->clock.TimerLoading = true;
}
<< mmu write special cases >>=
case IO_TimerCounter:
    /* Writes to the timer counter whilst it is loading are ignored */
    if(!gb->clock.TimerLoading) {
        gb->mem.IO[IO_TimerCounter] = value;
        /* Writing to timer counter suppresses any pending overflow effects */
        gb->clock.TimerOverflow = false;
    }
    break;
case IO_TimerModulo:
    gb->mem.IO[IO_TimerModulo] = value;
    /* Whilst the modulo is being loaded any writes are effective immediately */
    if(gb->clock.TimerLoading) {
        gb->mem.IO[IO_TimerCounter] = value;
    }
    break; /* without this, control falls through into the next case */

Changing the speed of the timer, or disabling it, by writing to the timer control register can change the input to the falling edge detector that increments the timer counter. If this produces a falling edge, there will be an "extra" increment of the counter.
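Whether a timer control write causes the extra increment depends only on the detector input before and after the write. The hypothetical helper below expresses that check. It mirrors `clock_updateTimerControl` but does not touch any emulator state.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper: detector input, forced to zero when the timer is disabled. */
static bool detector_input(uint8_t tac, uint16_t counter)
{
    static const unsigned bit_for_speed[4] = { 9, 3, 5, 7 };
    return (tac & 0x04) && ((counter >> bit_for_speed[tac & 0x03]) & 1u);
}

/* True if replacing oldTac with newTac produces a falling edge,
 * i.e. an extra increment of the timer counter. */
static bool tac_write_glitches(uint8_t oldTac, uint8_t newTac, uint16_t counter)
{
    return detector_input(oldTac, counter) && !detector_input(newTac, counter);
}
```

For instance, with the counter at 0x0008 (bit 3 set), disabling a timer running at the 16-cycle speed gives an extra increment. Switching it to the 64-cycle speed does too, because bit 5 is clear.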

<< clock functions >>=
static void clock_updateTimerControl(struct Gameboy* gb, uint8_t val)
{
    uint8_t old = gb->mem.IO[IO_TimerControl];
    gb->mem.IO[IO_TimerControl] = val;

    /* When disabled the bit to the falling edge detector is zero */
    bool const oldBit = (old & 0x04) && clock_getTimerBit(old, gb->clock.CycleCount);
    bool const newBit = (val & 0x04) && clock_getTimerBit(val, gb->clock.CycleCount);

    /* Check for falling edge */
    if(oldBit && !newBit) {
        clock_timerIncrement(gb);
    }
}
<< mmu write special cases >>=
case IO_TimerControl:
    clock_updateTimerControl(gb, value);
    break;

When the underlying 16-bit cycle count changes (either when reset or when incremented by the clock) this affects both the timer counter and the divider register.

<< function declarations >>=
static void clock_countChange(struct Gameboy* gb, uint16_t new_value);
<< clock functions >>=
static void clock_countChange(struct Gameboy* gb, uint16_t new_value)
{
    uint8_t tac = gb->mem.IO[IO_TimerControl];
    if(tac & 0x04) { /* Timer enable */
        if(!clock_getTimerBit(tac, new_value) && clock_getTimerBit(tac, gb->clock.CycleCount)) {
            clock_timerIncrement(gb);
        }
    }
    gb->clock.CycleCount = new_value;
    gb->mem.IO[IO_Divider] = new_value >> 8u;
}
<< per machine cycle updates >>=
clock_countChange(gb, gb->clock.CycleCount + 4);
<< mmu write special cases >>=
case IO_Divider:
    clock_countChange(gb, 0);
    break;

Interrupt handling

There are five interrupts. Each is assigned a bit in the interrupt control registers.

<< interrupt bits enum >>=
enum {
    Interrupt_VBlank = 0x01,
    Interrupt_LCDC = 0x02,
    Interrupt_TIMA = 0x04,
    Interrupt_Serial = 0x08,
    Interrupt_Joypad = 0x10,

    Interrupt_Mask = 0x1F,
};

Interrupts are raised by setting the appropriate bit in the interrupt flag register (0xFF0F).

Typically this is done automatically by the Gameboy hardware, but software can manually set (or clear) interrupts by writing to the interrupt flag register.

Each interrupt can be individually enabled or disabled by setting or clearing its bit in the interrupt enable register (0xFFFF).

While an interrupt is disabled, its service routine is not called when it is signalled. Its bit stays set in the interrupt flag register unless it is manually cleared. If the interrupt is re-enabled while that bit is still set, its service routine will then be called.

In addition the CPU has an internal global interrupt enable flag which is modified by the ei, di, and reti instructions.

<< cpu registers >>=
bool InterruptsEnabled;

The CPU services an interrupt by calling a fixed address for that interrupt, with interrupts disabled. An interrupt routine will typically return using the reti instruction, which returns to the address on the top of the stack and re-enables interrupts. When several interrupts are pending, they are prioritised from low bit to high bit (VBlank first, Joypad last). The interrupt service routine table starts at address 0x40 and provides 8 bytes per interrupt. When an interrupt is serviced, its bit is cleared from the interrupt flag register.
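The priority rule and vector table can be summarised as a small pure function. `interrupt_vector` is a hypothetical helper, not part of the emulator. It only computes which address would be called: it does not push PC, clear the IF bit, or consume cycles as the real service routine does.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: address of the service routine for the highest
 * priority interrupt that is both enabled (IE) and requested (IF), or -1. */
static int interrupt_vector(uint8_t ie, uint8_t iflag)
{
    uint8_t const pending = ie & iflag & 0x1F;
    for (unsigned i = 0; i < 5; i += 1) {
        if (pending & (1u << i)) {
            return 0x40 + (int)(i * 8); /* 0x40, 0x48, 0x50, 0x58, 0x60 */
        }
    }
    return -1; /* nothing to service */
}
```

With LCDC and timer both pending and enabled (IF = 0x06), LCDC wins and execution goes to 0x48.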

<< service highest priority interrupt >>=
// handle interrupts in priority order
for(unsigned int i = 0; i < 5; i += 1) {
    uint8_t const bit = 1u << i;
    if(irqs & bit) {
        gb->cpu.InterruptsEnabled = false;
        clock_increment(gb);
        clock_increment(gb);
        Call(gb, 0x40 + (i * 8));
        iflag &= ~bit;
        break;
    }
}
gb->mem.IO[IO_InterruptFlag] = iflag;

Any enabled interrupt that is raised brings the CPU out of halt mode. The interrupt is then serviced only if the global interrupt enable flag is set; otherwise the CPU wakes without servicing it.

<< interrupt handler function >>=
void cpu_handleInterrupts(struct Gameboy* gb)
{
    uint8_t iflag = gb->mem.IO[IO_InterruptFlag];
    uint8_t irqs = (gb->mem.InterruptEnable & iflag & Interrupt_Mask);
    if(irqs) {
        gb->cpu.Halted = false;
        if(gb->cpu.InterruptsEnabled) {
            assert(!gb->cpu.InterruptEnablePending);
            << service highest priority interrupt >>
        }
    }
}
<< mmu write special cases >>=
case IO_InterruptFlag:
    /* The top 3 bits of IF (0xE0) always read as 1s */
    gb->mem.IO[IO_InterruptFlag] = value | 0xE0;
    break;

CPU Emulation

<< cpu functions >>=
<< cpu helpers >>
<< interrupt handler function >>
<< cpu step >>

Operand Types

Registers

Immediate values directly follow the opcode in the instruction stream. 16-bit immediates are stored in little-endian order (low byte first).

<< cpu helpers >>=
static inline uint8_t Imm8(struct Gameboy* gb)
{
    gb->cpu.PC += 1;
    return mmu_read(gb, gb->cpu.PC - 1);
}

static inline int8_t Imm8i(struct Gameboy* gb)
{
    return (int8_t)Imm8(gb);
}

static inline uint16_t Imm16(struct Gameboy* gb)
{
    uint8_t const lo = Imm8(gb);
    uint8_t const hi = Imm8(gb);
    return (hi << 8u) | lo;
}
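Here is a minimal sketch of the little-endian decoding, assuming a flat byte array in place of `mmu_read`. The names `fetch8` and `fetch16` are hypothetical. It decodes the instruction `ld $bc, imm16` encoded as the bytes `01 34 12`, whose immediate is 0x1234.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical flat memory view; the emulator reads through mmu_read instead. */
static uint8_t fetch8(const uint8_t* mem, uint16_t* pc)
{
    uint8_t const value = mem[*pc];
    *pc = (uint16_t)(*pc + 1);
    return value;
}

/* 16-bit immediates are stored low byte first. */
static uint16_t fetch16(const uint8_t* mem, uint16_t* pc)
{
    uint8_t const lo = fetch8(mem, pc);
    uint8_t const hi = fetch8(mem, pc);
    return (uint16_t)((hi << 8) | lo);
}
```

After the opcode byte has been consumed, `fetch16` returns 0x1234 and leaves PC pointing just past the instruction.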

Memory reads and writes are indicated by surrounding an operand description with parentheses, e.g. (imm16) refers to the data at memory address imm16.

Many instructions operate on a pair of registers as a single 16-bit value. There are four possible pairings of the Gameboy CPU registers: AF, BC, DE and HL.

<< cpu helpers >>=
uint16_t ReadAF(struct Gameboy* gb) { return ((gb->cpu.A) << 8u) | (gb->cpu.F); }
uint16_t ReadBC(struct Gameboy* gb) { return ((gb->cpu.B) << 8u) | (gb->cpu.C); }
uint16_t ReadDE(struct Gameboy* gb) { return ((gb->cpu.D) << 8u) | (gb->cpu.E); }
uint16_t ReadHL(struct Gameboy* gb) { return ((gb->cpu.H) << 8u) | (gb->cpu.L); }

void WriteAF(struct Gameboy* gb, uint16_t af) { gb->cpu.A = (af >> 8u); gb->cpu.F = (af & 0xF0); }
void WriteBC(struct Gameboy* gb, uint16_t bc) { gb->cpu.B = (bc >> 8u); gb->cpu.C = (bc & 0xFF); }
void WriteDE(struct Gameboy* gb, uint16_t de) { gb->cpu.D = (de >> 8u); gb->cpu.E = (de & 0xFF); }
void WriteHL(struct Gameboy* gb, uint16_t hl) { gb->cpu.H = (hl >> 8u); gb->cpu.L = (hl & 0xFF); }
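The masking in WriteAF matters because the low four bits of F do not exist in hardware. A value such as 0x12FF written to AF (for example by pop af) must therefore read back as 0x12F0. The minimal sketch below uses a hypothetical two-register struct.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical register file holding just A and F. */
struct RegsAF { uint8_t A, F; };

static uint16_t read_af(const struct RegsAF* r)
{
    return (uint16_t)((r->A << 8) | r->F);
}

/* The low nibble of F is always zero, so it is masked off on write. */
static void write_af(struct RegsAF* r, uint16_t af)
{
    r->A = (uint8_t)(af >> 8);
    r->F = (uint8_t)(af & 0xF0);
}
```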

The flags register in the CPU contains four well-defined bits, and many instructions read and/or write one or more of those bits.

<< cpu helpers >>=
void UpdateZ(struct Gameboy* gb, bool set) { if(set) { gb->cpu.F |= 0x80; } else { gb->cpu.F &= ~0x80; } }
void UpdateN(struct Gameboy* gb, bool set) { if(set) { gb->cpu.F |= 0x40; } else { gb->cpu.F &= ~0x40; } }
void UpdateH(struct Gameboy* gb, bool set) { if(set) { gb->cpu.F |= 0x20; } else { gb->cpu.F &= ~0x20; } }
void UpdateC(struct Gameboy* gb, bool set) { if(set) { gb->cpu.F |= 0x10; } else { gb->cpu.F &= ~0x10; } }

bool ReadZ(struct Gameboy* gb) { return gb->cpu.F & 0x80; }
bool ReadN(struct Gameboy* gb) { return gb->cpu.F & 0x40; }
bool ReadH(struct Gameboy* gb) { return gb->cpu.F & 0x20; }
bool ReadC(struct Gameboy* gb) { return gb->cpu.F & 0x10; }

void UpdateZNHC(struct Gameboy* gb, bool z, bool n, bool h, bool c)
{
    UpdateZ(gb, z);
    UpdateN(gb, n);
    UpdateH(gb, h);
    UpdateC(gb, c);
}

The CPU has instructions which push 16-bit values onto, and pop them from, a stack whose top is pointed to by the SP register. The stack grows downwards (towards address zero) as items are pushed onto it. The two bytes of each 16-bit value are stored in little-endian order.

<< cpu helpers >>=
void Push16(struct Gameboy* gb, uint16_t val)
{
    mmu_write(gb, gb->cpu.SP - 2, (uint8_t)(val));
    mmu_write(gb, gb->cpu.SP - 1, (uint8_t)(val >> 8u));
    gb->cpu.SP -= 2;
}

uint16_t Pop16(struct Gameboy* gb)
{
    gb->cpu.SP += 2;
    uint16_t val = mmu_read(gb, gb->cpu.SP - 2);
    val |= mmu_read(gb, gb->cpu.SP - 1) << 8;
    return val;
}
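Here is a minimal sketch of the same push/pop byte layout, assuming a hypothetical flat 64 KiB array (`struct StackModel`) in place of the emulator's memory map. Pushing 0x1234 with SP = 0xFFFE leaves 0x34 at 0xFFFC and 0x12 at 0xFFFD.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical flat address space standing in for mmu_read/mmu_write. */
struct StackModel {
    uint8_t mem[0x10000];
    uint16_t SP;
};

static void push16(struct StackModel* s, uint16_t val)
{
    s->mem[(uint16_t)(s->SP - 2)] = (uint8_t)val;        /* low byte at the lower address */
    s->mem[(uint16_t)(s->SP - 1)] = (uint8_t)(val >> 8); /* high byte just above it */
    s->SP = (uint16_t)(s->SP - 2);                       /* stack grows downwards */
}

static uint16_t pop16(struct StackModel* s)
{
    uint16_t val = s->mem[s->SP];
    val |= (uint16_t)(s->mem[(uint16_t)(s->SP + 1)] << 8);
    s->SP = (uint16_t)(s->SP + 2);
    return val;
}
```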

Each call to cpu_step executes a single instruction. Most of the implementation is a large switch statement, with one case for each opcode.

<< cpu step >>=
static uint8_t mmu_readDirect(struct Gameboy* gb, uint16_t addr);
void cpu_step(struct Gameboy* gb)
{
    << halt emulation >>

    if(gb->cpu.InterruptEnablePending) {
        gb->cpu.InterruptsEnabled = true;
        gb->cpu.InterruptEnablePending = false;
    }

    uint8_t const opcode = Imm8(gb);

    GBTRACE(gb, (&(struct gameboy_tp){ .point = GAMEBOY_TP_INSTR_START, .u = { .instr_start = { .opcode = opcode } } }));

    << halt bug emulation >>

    switch(opcode) {
    << cpu instructions >>

        default:
            raise(SIGTRAP);
    }
}

The time taken to execute an instruction is generally just the time taken by its memory accesses, at one machine cycle (4 clock cycles) each. This covers reading the bytes of the instruction plus any data it reads or writes. Taken jumps cost an additional machine cycle, and 16-bit arithmetic (e.g. adding two register pairs) appears to take an extra machine cycle as well.

Load instructions (ld)

Register-to-register

A large number of instructions simply copy data from one register to another.

<< cpu instructions >>=
// ld $b, $reg8
case 0x40: gb->cpu.B = gb->cpu.B; break;
case 0x41: gb->cpu.B = gb->cpu.C; break;
case 0x42: gb->cpu.B = gb->cpu.D; break;
case 0x43: gb->cpu.B = gb->cpu.E; break;
case 0x44: gb->cpu.B = gb->cpu.H; break;
case 0x45: gb->cpu.B = gb->cpu.L; break;
case 0x47: gb->cpu.B = gb->cpu.A; break;

// ld $c, $reg8
case 0x48: gb->cpu.C = gb->cpu.B; break;
case 0x49: gb->cpu.C = gb->cpu.C; break;
case 0x4A: gb->cpu.C = gb->cpu.D; break;
case 0x4B: gb->cpu.C = gb->cpu.E; break;
case 0x4C: gb->cpu.C = gb->cpu.H; break;
case 0x4D: gb->cpu.C = gb->cpu.L; break;
case 0x4F: gb->cpu.C = gb->cpu.A; break;

// ld $d, $reg8
case 0x50: gb->cpu.D = gb->cpu.B; break;
case 0x51: gb->cpu.D = gb->cpu.C; break;
case 0x52: gb->cpu.D = gb->cpu.D; break;
case 0x53: gb->cpu.D = gb->cpu.E; break;
case 0x54: gb->cpu.D = gb->cpu.H; break;
case 0x55: gb->cpu.D = gb->cpu.L; break;
case 0x57: gb->cpu.D = gb->cpu.A; break;

// ld $e, $reg8
case 0x58: gb->cpu.E = gb->cpu.B; break;
case 0x59: gb->cpu.E = gb->cpu.C; break;
case 0x5A: gb->cpu.E = gb->cpu.D; break;
case 0x5B: gb->cpu.E = gb->cpu.E; break;
case 0x5C: gb->cpu.E = gb->cpu.H; break;
case 0x5D: gb->cpu.E = gb->cpu.L; break;
case 0x5F: gb->cpu.E = gb->cpu.A; break;

// ld $h, $reg8
case 0x60: gb->cpu.H = gb->cpu.B; break;
case 0x61: gb->cpu.H = gb->cpu.C; break;
case 0x62: gb->cpu.H = gb->cpu.D; break;
case 0x63: gb->cpu.H = gb->cpu.E; break;
case 0x64: gb->cpu.H = gb->cpu.H; break;
case 0x65: gb->cpu.H = gb->cpu.L; break;
case 0x67: gb->cpu.H = gb->cpu.A; break;

// ld $l, $reg8
case 0x68: gb->cpu.L = gb->cpu.B; break;
case 0x69: gb->cpu.L = gb->cpu.C; break;
case 0x6A: gb->cpu.L = gb->cpu.D; break;
case 0x6B: gb->cpu.L = gb->cpu.E; break;
case 0x6C: gb->cpu.L = gb->cpu.H; break;
case 0x6D: gb->cpu.L = gb->cpu.L; break;
case 0x6F: gb->cpu.L = gb->cpu.A; break;

// ld $a, $reg8
case 0x78: gb->cpu.A = gb->cpu.B; break;
case 0x79: gb->cpu.A = gb->cpu.C; break;
case 0x7A: gb->cpu.A = gb->cpu.D; break;
case 0x7B: gb->cpu.A = gb->cpu.E; break;
case 0x7C: gb->cpu.A = gb->cpu.H; break;
case 0x7D: gb->cpu.A = gb->cpu.L; break;
case 0x7F: gb->cpu.A = gb->cpu.A; break;
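The opcodes in this block follow a regular `01 ddd sss` layout. The 3-bit fields index the operands in the order b, c, d, e, h, l, ($hl), a, and 0x76 (the slot that would be `ld ($hl), ($hl)`) is halt instead. The emulator spells every case out explicitly. Decoding the fields is an alternative approach, sketched here with a hypothetical `decode_ld` helper.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical decoder for opcodes 0x40-0x7F.
 * Operand indices: 0=b 1=c 2=d 3=e 4=h 5=l 6=($hl) 7=a.
 * Returns false outside the block and for 0x76 (halt). */
static bool decode_ld(uint8_t opcode, unsigned* dst, unsigned* src)
{
    if ((opcode & 0xC0) != 0x40 || opcode == 0x76) {
        return false;
    }
    *dst = (opcode >> 3) & 0x07; /* bits 5:3 select the destination */
    *src = opcode & 0x07;        /* bits 2:0 select the source */
    return true;
}
```

For example, 0x41 decodes to `ld $b, $c` and 0x7E to `ld $a, ($hl)`.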

Memory-to-Register and Register-to-Memory

Each register can be loaded with a single byte from memory, using the HL register pair to specify the memory address. Conversely, register values can be stored to the address specified by HL.

<< cpu instructions >>=
// ld $reg8, ($hl)
case 0x46: gb->cpu.B = mmu_read(gb, ReadHL(gb)); break;
case 0x4E: gb->cpu.C = mmu_read(gb, ReadHL(gb)); break;
case 0x56: gb->cpu.D = mmu_read(gb, ReadHL(gb)); break;
case 0x5E: gb->cpu.E = mmu_read(gb, ReadHL(gb)); break;
case 0x66: gb->cpu.H = mmu_read(gb, ReadHL(gb)); break;
case 0x6E: gb->cpu.L = mmu_read(gb, ReadHL(gb)); break;
case 0x7E: gb->cpu.A = mmu_read(gb, ReadHL(gb)); break;

// ld ($hl), $reg8
case 0x70: mmu_write(gb, ReadHL(gb), gb->cpu.B); break;
case 0x71: mmu_write(gb, ReadHL(gb), gb->cpu.C); break;
case 0x72: mmu_write(gb, ReadHL(gb), gb->cpu.D); break;
case 0x73: mmu_write(gb, ReadHL(gb), gb->cpu.E); break;
case 0x74: mmu_write(gb, ReadHL(gb), gb->cpu.H); break;
case 0x75: mmu_write(gb, ReadHL(gb), gb->cpu.L); break;
case 0x77: mmu_write(gb, ReadHL(gb), gb->cpu.A); break;

An immediate value can be stored to memory at address HL.

<< cpu instructions >>=
case 0x36: mmu_write(gb, ReadHL(gb), Imm8(gb)); break; // ld ($hl), imm8

In addition, there are further memory load/store instructions which are only available for the A register.

The register pairs BC and DE can be used to specify memory addresses:

<< cpu instructions >>=
// ld $a, ($reg16)
case 0x0A: gb->cpu.A = mmu_read(gb, ReadBC(gb)); break;
case 0x1A: gb->cpu.A = mmu_read(gb, ReadDE(gb)); break;

// ld ($reg16), $a
case 0x02: mmu_write(gb, ReadBC(gb), gb->cpu.A); break;
case 0x12: mmu_write(gb, ReadDE(gb), gb->cpu.A); break;

An immediate address can be used for load/store of A.

<< cpu instructions >>=
case 0xEA: mmu_write(gb, Imm16(gb), gb->cpu.A); break; // ld (imm16), $a
case 0xFA: gb->cpu.A = mmu_read(gb, Imm16(gb)); break; // ld $a, (imm16)

There are special variants of the indirect memory accesses ld ($hl), $a and ld $a, ($hl) which, in the same instruction, also increment or decrement HL after the access.

<< cpu instructions >>=
case 0x22: // ld ($hl+), $a
    mmu_write(gb, ReadHL(gb), gb->cpu.A);
    WriteHL(gb, ReadHL(gb) + 1);
    break;
case 0x2A: // ld $a, ($hl+)
    gb->cpu.A = mmu_read(gb, ReadHL(gb));
    WriteHL(gb, ReadHL(gb) + 1);
    break;
case 0x32: // ld ($hl-), $a
    mmu_write(gb, ReadHL(gb), gb->cpu.A);
    WriteHL(gb, ReadHL(gb) - 1);
    break;
case 0x3A: // ld $a, ($hl-)
    gb->cpu.A = mmu_read(gb, ReadHL(gb));
    WriteHL(gb, ReadHL(gb) - 1);
    break;

Special variants of the load/store instructions access an address at an unsigned 8-bit offset from 0xFF00. The offset is taken either from an immediate byte or from register C. These are usually referred to as "high memory loads", and they give compact access to the IO registers and high RAM.

<< cpu instructions >>=
case 0xE0: mmu_write(gb, 0xFF00 + Imm8(gb), gb->cpu.A); break; // ld (0xFF00 + imm8), $a
case 0xE2: mmu_write(gb, 0xFF00 + gb->cpu.C, gb->cpu.A); break; // ld (0xFF00 + $c), $a
case 0xF0: gb->cpu.A = mmu_read(gb, 0xFF00 + Imm8(gb)); break; // ld $a, (0xFF00 + imm8)
case 0xF2: gb->cpu.A = mmu_read(gb, 0xFF00 + gb->cpu.C); break; // ld $a, (0xFF00 + $c)

Immediate-to-Register

Each register can be loaded with an immediate value.

<< cpu instructions >>=
// ld $reg8, imm8
case 0x06: gb->cpu.B = Imm8(gb); break;
case 0x0E: gb->cpu.C = Imm8(gb); break;
case 0x16: gb->cpu.D = Imm8(gb); break;
case 0x1E: gb->cpu.E = Imm8(gb); break;
case 0x26: gb->cpu.H = Imm8(gb); break;
case 0x2E: gb->cpu.L = Imm8(gb); break;
case 0x3E: gb->cpu.A = Imm8(gb); break;

The 16-bit register pairs BC, DE, HL and the stack pointer SP can be loaded with immediate values.

<< cpu instructions >>=
// ld $reg16, imm16
case 0x01: WriteBC(gb, Imm16(gb)); break;
case 0x11: WriteDE(gb, Imm16(gb)); break;
case 0x21: WriteHL(gb, Imm16(gb)); break;
case 0x31: gb->cpu.SP = Imm16(gb); break;

Stack and Stack Pointer

The stack pointer (SP) can be stored to memory at an immediate address, and can be set to the value of HL.

<< cpu instructions >>=
case 0x08: { // ld (imm16), $sp
    uint16_t addr = Imm16(gb);
    mmu_write(gb, addr, (gb->cpu.SP & 0xFF));
    mmu_write(gb, addr + 1, (gb->cpu.SP >> 8u));
} break;
case 0xF9: // ld $sp, $hl
    clock_increment(gb);
    gb->cpu.SP = ReadHL(gb);
    break;

The stack pointer plus a signed immediate offset can be loaded into HL.

<< cpu instructions >>=
case 0xF8: { // ld $hl, $sp + imm8
    uint16_t ea = gb->cpu.SP + Imm8i(gb);
    clock_increment(gb);
    WriteHL(gb, ea);
    UpdateZNHC(gb, false, false, (ea & 0xF) < (gb->cpu.SP & 0xF), (ea & 0xFF) < (gb->cpu.SP & 0xFF));
} break;
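The flag computation above compares the truncated result against SP instead of adding nibbles directly. Only the low byte of the sign-extended offset affects bits 0 to 7. The comparison is therefore equivalent to testing for a carry out of bit 3 (H) and bit 7 (C) of the unsigned addition of SP's low byte and the raw offset byte. Here is a standalone sketch with hypothetical names; Z and N are always cleared and are omitted.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical result bundle for ld $hl, $sp + imm8. */
struct SpAddResult { uint16_t value; bool h, c; };

static struct SpAddResult sp_plus_offset(uint16_t sp, int8_t offset)
{
    struct SpAddResult r;
    r.value = (uint16_t)(sp + offset); /* wraps around modulo 0x10000 */
    /* A carry out of bit 3 / bit 7 shows up as the truncated result
     * being smaller than the corresponding bits of SP. */
    r.h = (r.value & 0xF) < (sp & 0xF);
    r.c = (r.value & 0xFF) < (sp & 0xFF);
    return r;
}
```

For example, SP = 0x00FF with offset +1 sets both H and C, while SP = 0x0000 with offset -1 gives 0xFFFF with neither flag set.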

Stack push/pop

Each of the register pairs BC, DE, HL and AF can be pushed to and popped from the stack. Pushing a register pair takes an additional machine cycle.

<< cpu instructions >>=
// pop $reg16
case 0xC1: WriteBC(gb, Pop16(gb)); break;
case 0xD1: WriteDE(gb, Pop16(gb)); break;
case 0xE1: WriteHL(gb, Pop16(gb)); break;
case 0xF1: WriteAF(gb, Pop16(gb)); break;

// push $reg16
case 0xC5:
    clock_increment(gb);
    Push16(gb, ReadBC(gb));
    break;
case 0xD5:
    clock_increment(gb);
    Push16(gb, ReadDE(gb));
    break;
case 0xE5:
    clock_increment(gb);
    Push16(gb, ReadHL(gb));
    break;
case 0xF5:
    clock_increment(gb);
    Push16(gb, ReadAF(gb));
    break;

8-bit Arithmetic

Addition of two 8-bit values is quite straightforward. The output flags are calculated in a logical manner:

- Z flag is set when the output is zero
- N flag is cleared
- H flag is set if there is a carry from the bottom nibble addition
- C flag is set if there is a carry from the whole byte addition

<< cpu helpers >>=
uint8_t Add8(struct Gameboy* gb, uint8_t val0, uint8_t val1, bool carry)
{
    unsigned int sum = val0 + val1 + carry;
    unsigned int halfSum = (val0 & 0xF) + (val1 & 0xF) + carry;
    UpdateZNHC(gb, (sum & 0xFF) == 0, false, halfSum > 0xF, sum > 0xFF);
    return sum & 0xFF;
}
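Below is a standalone sketch of the same flag rules. It uses a hypothetical `struct Flags` in place of the F register bits, and it also shows how add and adc chain to form a 16-bit addition.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical flag bundle; the emulator stores these as bits of F. */
struct Flags { bool z, n, h, c; };

static uint8_t add8(struct Flags* f, uint8_t a, uint8_t b, bool carry)
{
    unsigned const sum = a + b + carry;
    unsigned const halfSum = (a & 0xF) + (b & 0xF) + carry;
    f->z = (sum & 0xFF) == 0;
    f->n = false;
    f->h = halfSum > 0xF; /* carry out of bit 3 */
    f->c = sum > 0xFF;    /* carry out of bit 7 */
    return (uint8_t)sum;
}
```

Adding 0x0001 to 0x12FF one byte at a time works like an add followed by an adc. The low bytes give 0x00 with C set, and that carry makes the high bytes 0x13.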

The add instruction adds two 8-bit operands, ignoring the current value of the carry flag.

<< cpu instructions >>=
// add $a, reg8
case 0x80: gb->cpu.A = Add8(gb, gb->cpu.A, gb->cpu.B, false); break;
case 0x81: gb->cpu.A = Add8(gb, gb->cpu.A, gb->cpu.C, false); break;
case 0x82: gb->cpu.A = Add8(gb, gb->cpu.A, gb->cpu.D, false); break;
case 0x83: gb->cpu.A = Add8(gb, gb->cpu.A, gb->cpu.E, false); break;
case 0x84: gb->cpu.A = Add8(gb, gb->cpu.A, gb->cpu.H, false); break;
case 0x85: gb->cpu.A = Add8(gb, gb->cpu.A, gb->cpu.L, false); break;
case 0x87: gb->cpu.A = Add8(gb, gb->cpu.A, gb->cpu.A, false); break;
// add $a, ($hl)
case 0x86: gb->cpu.A = Add8(gb, gb->cpu.A, mmu_read(gb, ReadHL(gb)), false); break;
// add $a, imm8
case 0xC6: gb->cpu.A = Add8(gb, gb->cpu.A, Imm8(gb), false); break;

The adc (add with carry) instruction adds the two operands and the current value of the carry flag together. This allows extended-precision arithmetic to be implemented by chaining multiple add/adc instructions.

<< cpu instructions >>=
// adc $a, reg8
case 0x88: gb->cpu.A = Add8(gb, gb->cpu.A, gb->cpu.B, ReadC(gb)); break;
case 0x89: gb->cpu.A = Add8(gb, gb->cpu.A, gb->cpu.C, ReadC(gb)); break;
case 0x8A: gb->cpu.A = Add8(gb, gb->cpu.A, gb->cpu.D, ReadC(gb)); break;
case 0x8B: gb->cpu.A = Add8(gb, gb->cpu.A, gb->cpu.E, ReadC(gb)); break;
case 0x8C: gb->cpu.A = Add8(gb, gb->cpu.A, gb->cpu.H, ReadC(gb)); break;
case 0x8D: gb->cpu.A = Add8(gb, gb->cpu.A, gb->cpu.L, ReadC(gb)); break;
case 0x8F: gb->cpu.A = Add8(gb, gb->cpu.A, gb->cpu.A, ReadC(gb)); break;
// adc $a, ($hl)
case 0x8E: gb->cpu.A = Add8(gb, gb->cpu.A, mmu_read(gb, ReadHL(gb)), ReadC(gb)); break;
// adc $a, imm8
case 0xCE: gb->cpu.A = Add8(gb, gb->cpu.A, Imm8(gb), ReadC(gb)); break;

Eight-bit subtraction is very similar to addition; the carry and half-carry flags are better described as borrow and half-borrow.

<< cpu helpers >>=
uint8_t Sub8(struct Gameboy* gb, uint8_t val0, uint8_t val1, bool carry)
{
    unsigned int sum = val0 - val1 - carry;
    unsigned int halfSum = (val0 & 0xF) - (val1 & 0xF) - carry;
    UpdateZNHC(gb, (sum & 0xFF) == 0, true, halfSum > 0xF, sum > 0xFF);
    return sum;
}
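Sub8 relies on unsigned wraparound, which turns a borrow into a value greater than 0xF or 0xFF. An equivalent formulation, sketched below with hypothetical names, uses signed arithmetic so that a borrow shows up as a negative difference.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical flag bundle; the emulator stores these as bits of F. */
struct Flags { bool z, n, h, c; };

static uint8_t sub8(struct Flags* f, uint8_t a, uint8_t b, bool carry)
{
    int const diff = a - b - carry;
    int const halfDiff = (a & 0xF) - (b & 0xF) - carry;
    f->z = (uint8_t)diff == 0;
    f->n = true;
    f->h = halfDiff < 0; /* borrow from bit 4 into the low nibble */
    f->c = diff < 0;     /* borrow out of the whole byte */
    return (uint8_t)diff; /* conversion to uint8_t wraps modulo 256 */
}
```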

The sub instruction subtracts an 8-bit operand from A, ignoring the current value of the carry flag.

<< cpu instructions >>=
// sub $a, reg8
case 0x90: gb->cpu.A = Sub8(gb, gb->cpu.A, gb->cpu.B, false); break;
case 0x91: gb->cpu.A = Sub8(gb, gb->cpu.A, gb->cpu.C, false); break;
case 0x92: gb->cpu.A = Sub8(gb, gb->cpu.A, gb->cpu.D, false); break;
case 0x93: gb->cpu.A = Sub8(gb, gb->cpu.A, gb->cpu.E, false); break;
case 0x94: gb->cpu.A = Sub8(gb, gb->cpu.A, gb->cpu.H, false); break;
case 0x95: gb->cpu.A = Sub8(gb, gb->cpu.A, gb->cpu.L, false); break;
case 0x97: gb->cpu.A = Sub8(gb, gb->cpu.A, gb->cpu.A, false); break;
// sub $a, ($hl)
case 0x96: gb->cpu.A = Sub8(gb, gb->cpu.A, mmu_read(gb, ReadHL(gb)), false); break;
// sub $a, imm8
case 0xD6: gb->cpu.A = Sub8(gb, gb->cpu.A, Imm8(gb), false); break;

The sbc (subtract with carry) instruction subtracts an 8-bit operand and the current value of the carry flag from A.

<< cpu instructions >>=
// sbc $a, reg8
case 0x98: gb->cpu.A = Sub8(gb, gb->cpu.A, gb->cpu.B, ReadC(gb)); break;
case 0x99: gb->cpu.A = Sub8(gb, gb->cpu.A, gb->cpu.C, ReadC(gb)); break;
case 0x9A: gb->cpu.A = Sub8(gb, gb->cpu.A, gb->cpu.D, ReadC(gb)); break;
case 0x9B: gb->cpu.A = Sub8(gb, gb->cpu.A, gb->cpu.E, ReadC(gb)); break;
case 0x9C: gb->cpu.A = Sub8(gb, gb->cpu.A, gb->cpu.H, ReadC(gb)); break;
case 0x9D: gb->cpu.A = Sub8(gb, gb->cpu.A, gb->cpu.L, ReadC(gb)); break;
case 0x9F: gb->cpu.A = Sub8(gb, gb->cpu.A, gb->cpu.A, ReadC(gb)); break;
// sbc $a, ($hl)
case 0x9E: gb->cpu.A = Sub8(gb, gb->cpu.A, mmu_read(gb, ReadHL(gb)), ReadC(gb)); break;
// sbc $a, imm8
case 0xDE: gb->cpu.A = Sub8(gb, gb->cpu.A, Imm8(gb), ReadC(gb)); break;

Increment adds one to a register, but never considers or updates the carry flag.

<< cpu helpers >>=
uint8_t Inc8(struct Gameboy* gb, uint8_t val)
{
    UpdateZ(gb, val == 0xFF);
    UpdateN(gb, false);
    UpdateH(gb, (val & 0xF) == 0xF);
    return val + 1;
}
<< cpu instructions >>=
// inc reg8
case 0x04: gb->cpu.B = Inc8(gb, gb->cpu.B); break;
case 0x0C: gb->cpu.C = Inc8(gb, gb->cpu.C); break;
case 0x14: gb->cpu.D = Inc8(gb, gb->cpu.D); break;
case 0x1C: gb->cpu.E = Inc8(gb, gb->cpu.E); break;
case 0x24: gb->cpu.H = Inc8(gb, gb->cpu.H); break;
case 0x2C: gb->cpu.L = Inc8(gb, gb->cpu.L); break;
case 0x3C: gb->cpu.A = Inc8(gb, gb->cpu.A); break;

Conversely, decrement subtracts one from a register, again without considering or updating the carry flag.

<< cpu helpers >>=
uint8_t Dec8(struct Gameboy* gb, uint8_t val)
{
    UpdateZ(gb, val == 0x01);
    UpdateN(gb, true);
    UpdateH(gb, (val & 0xF) == 0x0);
    return val - 1;
}
<< cpu instructions >>=
// dec reg8
case 0x05: gb->cpu.B = Dec8(gb, gb->cpu.B); break;
case 0x0D: gb->cpu.C = Dec8(gb, gb->cpu.C); break;
case 0x15: gb->cpu.D = Dec8(gb, gb->cpu.D); break;
case 0x1D: gb->cpu.E = Dec8(gb, gb->cpu.E); break;
case 0x25: gb->cpu.H = Dec8(gb, gb->cpu.H); break;
case 0x2D: gb->cpu.L = Dec8(gb, gb->cpu.L); break;
case 0x3D: gb->cpu.A = Dec8(gb, gb->cpu.A); break;
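The standalone sketch below uses a hypothetical `struct Flags` to show that inc and dec leave the carry flag alone. Incrementing 0xFF sets Z and H but keeps whatever value C already had.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical flag bundle; the emulator stores these as bits of F. */
struct Flags { bool z, n, h, c; };

static uint8_t inc8(struct Flags* f, uint8_t v)
{
    f->z = v == 0xFF;         /* result wraps to zero */
    f->n = false;
    f->h = (v & 0xF) == 0xF;  /* carry out of the low nibble */
    return (uint8_t)(v + 1);  /* f->c is deliberately untouched */
}

static uint8_t dec8(struct Flags* f, uint8_t v)
{
    f->z = v == 0x01;         /* result is zero */
    f->n = true;
    f->h = (v & 0xF) == 0x0;  /* borrow into the low nibble */
    return (uint8_t)(v - 1);  /* f->c is deliberately untouched */
}
```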

The cp (compare) instruction performs the same subtraction as sub but does not store the result: its only change to the CPU state is to update the flags register. This instruction is principally used to form the predicate for a conditional jump.

<< cpu instructions >>=
// cp $a, reg8
case 0xB8: Sub8(gb, gb->cpu.A, gb->cpu.B, false); break;
case 0xB9: Sub8(gb, gb->cpu.A, gb->cpu.C, false); break;
case 0xBA: Sub8(gb, gb->cpu.A, gb->cpu.D, false); break;
case 0xBB: Sub8(gb, gb->cpu.A, gb->cpu.E, false); break;
case 0xBC: Sub8(gb, gb->cpu.A, gb->cpu.H, false); break;
case 0xBD: Sub8(gb, gb->cpu.A, gb->cpu.L, false); break;
case 0xBF: Sub8(gb, gb->cpu.A, gb->cpu.A, false); break;
// cp $a, ($hl)
case 0xBE: Sub8(gb, gb->cpu.A, mmu_read(gb, ReadHL(gb)), false); break;
// cp $a, imm8
case 0xFE: Sub8(gb, gb->cpu.A, Imm8(gb), false); break;
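Because cp is a subtraction whose result is discarded, its Z and C flags answer the two usual comparison questions about unsigned values. Z means A equals the operand, and C means A is less than it. The hypothetical helper below captures just those two flags.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper: the Z and C flags produced by cp $a, value. */
struct CompareFlags { bool z, c; };

static struct CompareFlags compare(uint8_t a, uint8_t value)
{
    struct CompareFlags f;
    f.z = a == value; /* zero result: A equals the operand */
    f.c = a < value;  /* borrow: A is below the operand (unsigned) */
    return f;
}
```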

Bitwise boolean logic operations

Bitwise and, or and xor all set the Z flag to reflect whether the result is zero. The other flags are set to fixed values: N and C are always cleared, while H is set by and and cleared by or and xor.

<< cpu helpers >>=
void BitAnd(struct Gameboy* gb, uint8_t value)
{
    gb->cpu.A &= value;
    UpdateZNHC(gb, gb->cpu.A == 0, false, true, false);
}

void BitOr(struct Gameboy* gb, uint8_t value)
{
    gb->cpu.A |= value;
    UpdateZNHC(gb, gb->cpu.A == 0, false, false, false);
}

void BitXor(struct Gameboy* gb, uint8_t value)
{
    gb->cpu.A ^= value;
    UpdateZNHC(gb, gb->cpu.A == 0, false, false, false);
}
<< cpu instructions >>=
// and/or/xor $a, $reg8
case 0xA0: BitAnd(gb, gb->cpu.B); break;
case 0xA1: BitAnd(gb, gb->cpu.C); break;
case 0xA2: BitAnd(gb, gb->cpu.D); break;
case 0xA3: BitAnd(gb, gb->cpu.E); break;
case 0xA4: BitAnd(gb, gb->cpu.H); break;
case 0xA5: BitAnd(gb, gb->cpu.L); break;
case 0xA7: BitAnd(gb, gb->cpu.A); break;
case 0xB0: BitOr(gb, gb->cpu.B); break;
case 0xB1: BitOr(gb, gb->cpu.C); break;
case 0xB2: BitOr(gb, gb->cpu.D); break;
case 0xB3: BitOr(gb, gb->cpu.E); break;
case 0xB4: BitOr(gb, gb->cpu.H); break;
case 0xB5: BitOr(gb, gb->cpu.L); break;
case 0xB7: BitOr(gb, gb->cpu.A); break;
case 0xA8: BitXor(gb, gb->cpu.B); break;
case 0xA9: BitXor(gb, gb->cpu.C); break;
case 0xAA: BitXor(gb, gb->cpu.D); break;
case 0xAB: BitXor(gb, gb->cpu.E); break;
case 0xAC: BitXor(gb, gb->cpu.H); break;
case 0xAD: BitXor(gb, gb->cpu.L); break;
case 0xAF: BitXor(gb, gb->cpu.A); break;

// and/or/xor $a, ($hl)
case 0xA6: BitAnd(gb, mmu_read(gb, ReadHL(gb))); break;
case 0xB6: BitOr(gb, mmu_read(gb, ReadHL(gb))); break;
case 0xAE: BitXor(gb, mmu_read(gb, ReadHL(gb))); break;

// and/or/xor $a, imm8
case 0xE6: BitAnd(gb, Imm8(gb)); break;
case 0xF6: BitOr(gb, Imm8(gb)); break;
case 0xEE: BitXor(gb, Imm8(gb)); break;
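The resulting F values can be tabulated with a small standalone sketch. The helper names and `FLAG_*` constants are hypothetical, and the sketch assumes the standard F layout (Z bit 7, N bit 6, H bit 5, C bit 4) rather than the emulator's UpdateZNHC. Only Z depends on the result. This is why `xor a, a` (0xAF), a common one-byte idiom for clearing A, always leaves F equal to Z alone.

```c
#include <assert.h>
#include <stdint.h>

#define FLAG_Z 0x80u
#define FLAG_N 0x40u
#define FLAG_H 0x20u
#define FLAG_C 0x10u

/* Hypothetical helpers returning the new F for each logic operation,
 * given the value left in A. N and C are always cleared. */
static uint8_t and_flags(uint8_t result)
{
    return (uint8_t)((result == 0 ? FLAG_Z : 0u) | FLAG_H);  /* and sets H */
}

static uint8_t or_flags(uint8_t result)
{
    return (uint8_t)(result == 0 ? FLAG_Z : 0u);
}

static uint8_t xor_flags(uint8_t result)
{
    return (uint8_t)(result == 0 ? FLAG_Z : 0u);  /* same rule as or */
}
```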

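Rotate A left or right by a single bit (rlca, rrca). The bit rotated out of one end of A is fed back in at the other end and is also copied into the carry flag. Z, N and H are always cleared.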

<< cpu instructions (0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38) >>=
case 0x07: { // rlca
    UpdateZNHC(gb, false, false, false, (gb->cpu.A & 0x80));
    gb->cpu.A = (gb->cpu.A << 1u) | (gb->cpu.A >> 7u);
} break;
case 0x0F: { // rrca
    UpdateZNHC(gb, false, false, false, (gb->cpu.A & 0x01));
    gb->cpu.A = (gb->cpu.A >> 1u) | (gb->cpu.A << 7u);
} break;

Rotate A left or right by a single bit through the carry flag (rla, rra). The carry acts as a ninth bit: the old carry is shifted into A, and the bit shifted out of A becomes the new carry.

<< cpu instructions (0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38) >>=
case 0x17: { // rla
    bool c = ReadC(gb);
    UpdateZNHC(gb, false, false, false, (gb->cpu.A & 0x80));
    gb->cpu.A = (gb->cpu.A << 1u) | (c? 1 : 0);
} break;
case 0x1F: { // rra
    bool c = ReadC(gb);
    UpdateZNHC(gb, false, false, false, (gb->cpu.A & 0x01));
    gb->cpu.A = (gb->cpu.A >> 1u) | (c? 0x80 : 0x00);
} break;
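To show the difference between the two rotate families, here is a standalone sketch on plain values. The function names are hypothetical, and the carry is passed by pointer instead of living in the F register.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* rlca: bit 7 goes to both the carry flag and bit 0. */
static uint8_t rlca(uint8_t a, bool *carry)
{
    *carry = (a & 0x80u) != 0;
    return (uint8_t)((a << 1) | (a >> 7));
}

/* rla: bit 7 goes to the carry flag, and the old carry goes to bit 0. */
static uint8_t rla(uint8_t a, bool *carry)
{
    bool old_carry = *carry;
    *carry = (a & 0x80u) != 0;
    return (uint8_t)((a << 1) | (old_carry ? 1u : 0u));
}
```

With A = 0x85 and the carry clear, rlca gives 0x0B while rla gives 0x0A. In both cases the new carry is 1.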

Complement (bitwise invert) all bits in A.

<< cpu instructions (0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38) >>=
case 0x2F: { // cpl
    UpdateN(gb, true);
    UpdateH(gb, true);
    gb->cpu.A ^= UINT8_MAX;
} break;

Extended Bitwise Operations (0xCB)

The CPU provides an additional set of bitwise operations through an extended instruction encoding. The 0xCB prefix byte introduces a two-byte instruction whose operation is determined by the second byte. The encoding is quite regular. Read from MSB to LSB, the bits of the extension byte (OOIIIRRR) encode a two-bit operation field (OO), a three-bit sub-operation or bit index (III), and a three-bit register index (RRR).

<< cpu instructions (0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38) >>=
case 0xCB: {
    cpu_cb_op(gb);
} break;

The register operated on is selected by the register field:

Value  Register
0      B
1      C
2      D
3      E
4      H
5      L
6      (HL) - memory at address HL
7      A
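The field layout makes a disassembler for the 0xCB page very short. The sketch below is standalone, and the name `cb_disassemble` and its lowercase mnemonics are hypothetical. It decodes the extension byte exactly as cpu_cb_op does. For example, 0xCB 0x7C decodes to `bit 7, h`, and that byte pair appears near the start of the boot ROM listed later.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

static const char *const kRegNames[8] = {"b", "c", "d", "e", "h", "l", "(hl)", "a"};
static const char *const kShiftOps[8] = {"rlc", "rrc", "rl", "rr", "sla", "sra", "swap", "srl"};

/* Hypothetical helper: write the mnemonic for extension byte cb_opcode. */
static void cb_disassemble(uint8_t cb_opcode, char *out, size_t out_size)
{
    unsigned op  = cb_opcode >> 6u;            /* OO  */
    unsigned bit = (cb_opcode >> 3u) & 0x07u;  /* III */
    unsigned reg = cb_opcode & 0x07u;          /* RRR */
    const char *r = kRegNames[reg];

    switch (op) {
    case 0:  snprintf(out, out_size, "%s %s", kShiftOps[bit], r); break;
    case 1:  snprintf(out, out_size, "bit %u, %s", bit, r); break;
    case 2:  snprintf(out, out_size, "res %u, %s", bit, r); break;
    default: snprintf(out, out_size, "set %u, %s", bit, r); break;
    }
}
```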
<< cpu helpers (0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15) >>=
uint8_t ReadRegN(struct Gameboy* gb, unsigned int regNum)
{
    switch(regNum) {
        case 0: return gb->cpu.B;
        case 1: return gb->cpu.C;
        case 2: return gb->cpu.D;
        case 3: return gb->cpu.E;
        case 4: return gb->cpu.H;
        case 5: return gb->cpu.L;
        case 6: return mmu_read(gb, ReadHL(gb));
        case 7: return gb->cpu.A;
    }
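    assert(false); /* unreachable: regNum is a 3-bit field */
    return 0;      /* avoid falling off the end when NDEBUG disables assert */
}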

void WriteRegN(struct Gameboy* gb, unsigned int regNum, uint8_t newVal)
{
    switch(regNum) {
        case 0: gb->cpu.B = newVal; break;
        case 1: gb->cpu.C = newVal; break;
        case 2: gb->cpu.D = newVal; break;
        case 3: gb->cpu.E = newVal; break;
        case 4: gb->cpu.H = newVal; break;
        case 5: gb->cpu.L = newVal; break;
        case 6: mmu_write(gb, ReadHL(gb), newVal); break;
        case 7: gb->cpu.A = newVal; break;
    }
}
<< cpu helpers (0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15) >>=
void cpu_cb_op(struct Gameboy* gb)
{
    uint8_t const cb_opcode = Imm8(gb);

    // XX XXX XXX
    // ~~ operation
    //    ~~~ bit index or sub-operation
    //        ~~~ register
    uint8_t const op = cb_opcode >> 6u;
    uint8_t const bit = (cb_opcode >> 3u) & 0x07;
    uint8_t const reg = cb_opcode & 0x07;

    switch(op) {
        case 0: { // shift/rotate & swap
            switch(bit) {
                case 0: { // rlc rN
                    uint8_t val = ReadRegN(gb, reg);
                    UpdateZNHC(gb, val == 0, false, false, (val & 0x80));
                    val = (val << 1u) | (val >> 7u);
                    WriteRegN(gb, reg, val);
                } break;
                case 1: { // rrc rN
                    uint8_t val = ReadRegN(gb, reg);
                    UpdateZNHC(gb, val == 0, false, false, (val & 0x01));
                    val = (val >> 1u) | (val << 7u);
                    WriteRegN(gb, reg, val);
                } break;
                case 2: { // rl rN
                    uint8_t val = ReadRegN(gb, reg);
                    uint8_t rotated = (val << 1u) | (ReadC(gb)? 1 : 0);
                    UpdateZNHC(gb, rotated == 0, false, false, (val & 0x80));
                    WriteRegN(gb, reg, rotated);
                } break;
                case 3: { // rr rN
                    uint8_t val = ReadRegN(gb, reg);
                    uint8_t rotated = (val >> 1u) | (ReadC(gb)? 0x80 : 0);
                    UpdateZNHC(gb, rotated == 0, false, false, (val & 0x01));
                    WriteRegN(gb, reg, rotated);
                } break;
                case 4: { // sla rN
                    uint8_t val = ReadRegN(gb, reg);
                    uint8_t shifted = val << 1u;
                    UpdateZNHC(gb, shifted == 0, false, false, (val & 0x80));
                    WriteRegN(gb, reg, shifted);
                } break;
                case 5: { // sra rN
                    uint8_t val = ReadRegN(gb, reg);
                    uint8_t shifted = (val >> 1u) | (val & 0x80);
                    UpdateZNHC(gb, (shifted == 0), false, false, (val & 0x01));
                    WriteRegN(gb, reg, shifted);
                } break;
                case 6: { // swap rN
                    uint8_t val = ReadRegN(gb, reg);
                    val = (val >> 4u) | (val << 4u);
                    UpdateZNHC(gb, val == 0, false, false, false);
                    WriteRegN(gb, reg, val);
                } break;
                case 7: { // srl rN
                    uint8_t val = ReadRegN(gb, reg);
                    uint8_t shifted = val >> 1u;
                    UpdateZNHC(gb, shifted == 0, false, false, (val & 0x01));
                    WriteRegN(gb, reg, shifted);
                } break;
            }
        }
        break;
        case 1: { // bit n, rN
            uint8_t val = ReadRegN(gb, reg);
            UpdateZ(gb, (val & (1u << bit)) == 0);
            UpdateN(gb, false);
            UpdateH(gb, true);
        }
        break;
        case 2: { // res n, rN
            uint8_t val = ReadRegN(gb, reg);
            WriteRegN(gb, reg, (val & ~(1u << bit)));
        }
        break;
        case 3: { // set n, rN
            uint8_t val = ReadRegN(gb, reg);
            WriteRegN(gb, reg, (val | (1u << bit)));
        }
        break;
    }
}

16-bit arithmetic

<< cpu helpers (0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15) >>=
uint16_t Add16(struct Gameboy* gb, uint16_t val0, uint16_t val1)
{
    unsigned int sum = val0 + val1;
    unsigned int halfSum = (val0 & 0xFFF) + (val1 & 0xFFF);
    UpdateN(gb, false);
    UpdateH(gb, halfSum > 0xFFF);
    UpdateC(gb, sum > 0xFFFF);
    return sum & 0xFFFF;
}
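In Add16, H is the carry out of bit 11 and C is the carry out of bit 15, while Z is left untouched. Here is a standalone sketch with two worked cases. The struct and function names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

struct Add16Result { uint16_t sum; bool h; bool c; };

/* Mirrors the flag logic of Add16, without touching any CPU state. */
static struct Add16Result add16(uint16_t a, uint16_t b)
{
    struct Add16Result r;
    r.sum = (uint16_t)(a + b);                        /* wraps modulo 2^16 */
    r.h = ((a & 0x0FFFu) + (b & 0x0FFFu)) > 0x0FFFu;  /* carry from bit 11 */
    r.c = ((uint32_t)a + b) > 0xFFFFu;                /* carry from bit 15 */
    return r;
}
```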
<< cpu instructions (0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38) >>=
case 0x03: { // inc $bc
    clock_increment(gb);
    WriteBC(gb, ReadBC(gb) + 1);
} break;
case 0x13: { // inc $de
    clock_increment(gb);
    WriteDE(gb, ReadDE(gb) + 1);
} break;
case 0x23: { // inc $hl
    clock_increment(gb);
    WriteHL(gb, ReadHL(gb) + 1);
} break;
case 0x33: { // inc $sp
    clock_increment(gb);
    gb->cpu.SP += 1;
} break;
<< cpu instructions (0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38) >>=
case 0x0B: { // dec $bc
    clock_increment(gb);
    WriteBC(gb, ReadBC(gb) - 1);
} break;
case 0x1B: { // dec $de
    clock_increment(gb);
    WriteDE(gb, ReadDE(gb) - 1);
} break;
case 0x2B: { // dec $hl
    clock_increment(gb);
    WriteHL(gb, ReadHL(gb) - 1);
} break;
case 0x3B: { // dec $sp
    clock_increment(gb);
    gb->cpu.SP -= 1;
} break;
<< cpu instructions (0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38) >>=
case 0x09: { // add $hl, $bc
    clock_increment(gb);
    WriteHL(gb, Add16(gb, ReadHL(gb), ReadBC(gb)));
} break;
case 0x19: { // add $hl, $de
    clock_increment(gb);
    WriteHL(gb, Add16(gb, ReadHL(gb), ReadDE(gb)));
} break;
case 0x29: { // add $hl, $hl
    clock_increment(gb);
    WriteHL(gb, Add16(gb, ReadHL(gb), ReadHL(gb)));
} break;
case 0x39: { // add $hl, $sp
    clock_increment(gb);
    WriteHL(gb, Add16(gb, ReadHL(gb), gb->cpu.SP));
} break;
case 0xE8: { // add $sp, imm8i
    uint16_t ea = gb->cpu.SP + Imm8i(gb);
    clock_increment(gb);
    clock_increment(gb);
    UpdateZNHC(gb, false, false, (ea & 0xF) < (gb->cpu.SP & 0xF), (ea & 0xFF) < (gb->cpu.SP & 0xFF));
    gb->cpu.SP = ea;
} break;
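The comparisons used for H and C in `add $sp, imm8i` may look unusual. They are equivalent to the more common description of these flags as the carries out of bits 3 and 7 when the raw (unsigned) immediate byte is added to the low byte of SP. The standalone sketch below checks this by brute force. All function names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* H and C as computed in the emulator: compare low bits of result and SP. */
static void flags_by_compare(uint16_t sp, int8_t imm, bool *h, bool *c)
{
    uint16_t ea = (uint16_t)(sp + imm);
    *h = (ea & 0x0Fu) < (sp & 0x0Fu);
    *c = (ea & 0xFFu) < (sp & 0xFFu);
}

/* H and C as carries out of bits 3 and 7 of an unsigned 8-bit addition. */
static void flags_by_carry(uint16_t sp, int8_t imm, bool *h, bool *c)
{
    uint8_t raw = (uint8_t)imm;
    *h = ((sp & 0x0Fu) + (raw & 0x0Fu)) > 0x0Fu;
    *c = ((sp & 0xFFu) + raw) > 0xFFu;
}

/* Compare both formulations for every low byte of SP and every immediate.
 * The high byte of SP cannot affect either flag. */
static bool formulations_agree(void)
{
    for (unsigned lo = 0; lo < 256; lo++) {
        for (int imm = -128; imm <= 127; imm++) {
            bool h1, c1, h2, c2;
            uint16_t sp = (uint16_t)(0xC000u | lo);
            flags_by_compare(sp, (int8_t)imm, &h1, &c1);
            flags_by_carry(sp, (int8_t)imm, &h2, &c2);
            if (h1 != h2 || c1 != c2) {
                return false;
            }
        }
    }
    return true;
}
```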
<< cpu instructions (0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38) >>=
case 0x34: { // inc ($hl)
    uint16_t addr = ReadHL(gb);
    mmu_write(gb, addr, Inc8(gb, mmu_read(gb, addr)));
} break;
case 0x35: { // dec ($hl)
    uint16_t addr = ReadHL(gb);
    mmu_write(gb, addr, Dec8(gb, mmu_read(gb, addr)));
} break;

Branch/Jump (jr, jp)

Jump instructions modify the program counter, determining which instruction will be executed next. Jumps can be conditional, in which case the modification of the program counter only happens (the jump is taken) if the attached condition is true.

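The possible conditions are:

nz  Z flag clear (last result was not zero)
z   Z flag set (last result was zero)
nc  C flag clear
c   C flag set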

When a jump is taken there is a delay of a single machine cycle in addition to the normal instruction fetch delay.

The "target" of a jump is the value which the program counter will be set to if the jump is taken. The target can be specified as either a relative or absolute address.

Absolute jump instructions (jp) simply copy the target address directly into the program counter.

<< cpu helpers (0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15) >>=
void Jump(struct Gameboy* gb, uint16_t addr)
{
    clock_increment(gb);
    gb->cpu.PC = addr;
}

void JumpCond(struct Gameboy* gb, uint16_t addr, bool cond)
{
    if(cond) {
        Jump(gb, addr);
    }
}
<< cpu instructions (0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38) >>=
case 0xE9: gb->cpu.PC = ReadHL(gb); break; // jp $hl
case 0xC3: Jump(gb, Imm16(gb)); break; // jp imm16
case 0xC2: JumpCond(gb, Imm16(gb), !ReadZ(gb)); break; // jp nz, imm16
case 0xCA: JumpCond(gb, Imm16(gb), ReadZ(gb)); break; // jp z, imm16
case 0xD2: JumpCond(gb, Imm16(gb), !ReadC(gb)); break; // jp nc, imm16
case 0xDA: JumpCond(gb, Imm16(gb), ReadC(gb)); break; // jp c, imm16

Relative jump instructions (jr) specify a signed offset which is added to the current value of the program counter. The program counter has already been advanced when this addition takes place, so the offset is relative to the address of the instruction immediately following the jr instruction.

0x00: jr +4
0x02: add a, 1
0x04: add a, 1
; execution continues here, at 0x02 + 4
0x06: add a, 1
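The target arithmetic can be checked with a standalone sketch. The name `jr_target` is hypothetical. jr is two bytes long, so the displacement is added to the address of the jr plus 2. A displacement byte of 0xFE (-2) therefore jumps back to the jr itself.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: target of a jr located at `addr`. */
static uint16_t jr_target(uint16_t addr, uint8_t displacement_byte)
{
    uint16_t next_pc = (uint16_t)(addr + 2);  /* PC after fetching both bytes */
    /* Reinterpret the byte as signed (two's complement; guaranteed from C23,
     * universal in practice) and wrap the sum to 16 bits. */
    return (uint16_t)(next_pc + (int8_t)displacement_byte);
}
```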
<< cpu helpers (0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15) >>=
void JumpRel(struct Gameboy* gb, int8_t offset)
{
    clock_increment(gb);
    gb->cpu.PC += offset;
}

void JumpRelCond(struct Gameboy* gb, int8_t offset, bool cond)
{
    if(cond) {
        JumpRel(gb, offset);
    }
}
<< cpu instructions (0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38) >>=
case 0x18: JumpRel(gb, Imm8i(gb)); break; // jr imm8i
case 0x20: JumpRelCond(gb, Imm8i(gb), !ReadZ(gb)); break; // jr nz, imm8i
case 0x28: JumpRelCond(gb, Imm8i(gb), ReadZ(gb)); break; // jr z, imm8i
case 0x30: JumpRelCond(gb, Imm8i(gb), !ReadC(gb)); break; // jr nc, imm8i
case 0x38: JumpRelCond(gb, Imm8i(gb), ReadC(gb)); break; // jr c, imm8i

Call and Return (call, ret)

Call instructions push the address of the next instruction to be executed onto the stack and then jump to the specified address. e.g.

; if nz:
;   push 0x2003
;   jp 0xC345
0x2000: call nz, 0xC345
0x2003: xor a, a
<< cpu helpers (0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15) >>=
void Call(struct Gameboy* gb, uint16_t addr)
{
    clock_increment(gb);
    Push16(gb, gb->cpu.PC);
    gb->cpu.PC = addr;
}

void CallCond(struct Gameboy* gb, uint16_t addr, bool cond)
{
    if(cond) {
        Call(gb, addr);
    }
}
<< cpu instructions (0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38) >>=
case 0xC4: CallCond(gb, Imm16(gb), !ReadZ(gb)); break; // call nz, imm16
case 0xCC: CallCond(gb, Imm16(gb), ReadZ(gb)); break; // call z, imm16
case 0xD4: CallCond(gb, Imm16(gb), !ReadC(gb)); break; // call nc, imm16
case 0xDC: CallCond(gb, Imm16(gb), ReadC(gb)); break; // call c, imm16

case 0xCD: Call(gb, Imm16(gb)); break; // call imm16

The restart (rst) instructions are single-byte instructions which unconditionally call one of eight fixed addresses.

<< cpu instructions (0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38) >>=
case 0xC7: Call(gb, 0x00); break; // rst 0x00
case 0xCF: Call(gb, 0x08); break; // rst 0x08
case 0xD7: Call(gb, 0x10); break; // rst 0x10
case 0xDF: Call(gb, 0x18); break; // rst 0x18
case 0xE7: Call(gb, 0x20); break; // rst 0x20
case 0xEF: Call(gb, 0x28); break; // rst 0x28
case 0xF7: Call(gb, 0x30); break; // rst 0x30
case 0xFF: Call(gb, 0x38); break; // rst 0x38

Return (ret) instructions pop an address from the stack and then jump to it.

; if nz:
;   addr = pop16
;   jp addr
ret nz
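Push16 and Pop16 are not shown in this section. The sketch below models a call and a return on a flat 64 KiB memory, assuming the hardware's byte order: the high byte of the return address is written to SP-1 first, then the low byte to SP-2, so the address ends up little-endian on the stack. The `Machine` struct and all function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical minimal model: flat memory plus SP and PC. */
struct Machine {
    uint8_t mem[0x10000];
    uint16_t sp;
    uint16_t pc;
};

static void push16(struct Machine *m, uint16_t value)
{
    m->sp = (uint16_t)(m->sp - 1);
    m->mem[m->sp] = (uint8_t)(value >> 8);    /* high byte first */
    m->sp = (uint16_t)(m->sp - 1);
    m->mem[m->sp] = (uint8_t)(value & 0xFF);  /* then low byte */
}

static uint16_t pop16(struct Machine *m)
{
    uint16_t lo = m->mem[m->sp];
    m->sp = (uint16_t)(m->sp + 1);
    uint16_t hi = m->mem[m->sp];
    m->sp = (uint16_t)(m->sp + 1);
    return (uint16_t)((hi << 8) | lo);
}

/* PC already points past the 3-byte call when the return address is pushed. */
static void call(struct Machine *m, uint16_t target)
{
    push16(m, m->pc);
    m->pc = target;
}

static void ret(struct Machine *m)
{
    m->pc = pop16(m);
}
```

Applied to the earlier example with SP = 0xFFFE, `call 0xC345` at 0x2000 leaves 0x20 at 0xFFFD, 0x03 at 0xFFFC and SP = 0xFFFC. A matching ret restores PC = 0x2003 and SP = 0xFFFE.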
<< cpu helpers (0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15) >>=
void Ret(struct Gameboy* gb)
{
    Jump(gb, Pop16(gb));
}

void RetCond(struct Gameboy* gb, bool cond)
{
    clock_increment(gb);
    if(cond) {
        Ret(gb);
    }
}
<< cpu instructions (0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38) >>=
case 0xC9: Ret(gb); break; // ret
case 0xC0: RetCond(gb, !ReadZ(gb)); break; // ret nz
case 0xC8: RetCond(gb, ReadZ(gb)); break; // ret z
case 0xD0: RetCond(gb, !ReadC(gb)); break; // ret nc
case 0xD8: RetCond(gb, ReadC(gb)); break; // ret c
case 0xD9: { // reti
    Ret(gb);
    gb->cpu.InterruptsEnabled = true;
} break;

Misc instructions

The simplest instruction is nop (no operation), which does nothing.

<< cpu instructions (0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38) >>=
case 0x00: break; // nop

Interrupt Enable/Disable

The di and ei instructions disable and enable interrupts respectively. Note that there is also a variant of ret (reti) which performs a ret and then enables interrupts. The ei instruction has a delay of a single instruction before taking effect, e.g.

di
...
ei
inc a
inc a

will always execute at least one inc a before an interrupt is serviced, even if an interrupt is already pending when ei is executed.
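The code that turns InterruptEnablePending into InterruptsEnabled is not part of this section. One way to model the one-instruction delay is sketched below. It assumes interrupts are serviced (if enabled) before each call to `step`. The struct, enum and function names are hypothetical, and a di executed straight after ei cancels the pending enable, as the di case here does.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of the interrupt master enable and its pending flag. */
struct IME {
    bool enabled;  /* plays the role of InterruptsEnabled */
    bool pending;  /* plays the role of InterruptEnablePending */
};

enum Op { OP_EI, OP_DI, OP_OTHER };

/* Execute one instruction. An ei that was pending before this instruction
 * takes effect once it completes, unless the instruction was di. */
static void step(struct IME *s, enum Op op)
{
    bool promote = s->pending;
    switch (op) {
    case OP_EI:    s->pending = true; break;
    case OP_DI:    s->enabled = false; s->pending = false; break;
    case OP_OTHER: break;
    }
    if (promote && s->pending) {
        s->enabled = true;
        s->pending = false;
    }
}
```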

<< cpu registers (0 1 2 3 4 5 6) >>=
bool InterruptEnablePending;
<< cpu instructions (0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38) >>=
case 0xF3: { // di
    gb->cpu.InterruptsEnabled = false;
    gb->cpu.InterruptEnablePending = false;
} break;
case 0xFB: gb->cpu.InterruptEnablePending = true; break; // ei

Halt

The halt instruction stops the CPU until an interrupt is received. While halted, the CPU enters a lower-power state and executes no instructions. However, the clock still runs, and all other parts of the system continue as normal.

<< cpu instructions (0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38) >>=
case 0x76: { // halt
    << halt handling >>
} break;

To emulate this a flag is stored in the CPU state:

<< cpu registers (0 1 2 3 4 5 6) >>=
bool Halted;

This flag is set by the halt instruction:

<< halt handling (0 1) >>=
gb->cpu.Halted = true;

When the CPU is halted the CPU emulation function only advances the system clock.

<< halt emulation >>=
if(gb->cpu.Halted) {
    clock_increment(gb);
    return;
}

There is a bug in the CPU which is triggered when a halt instruction is executed and interrupts are disabled. In this situation the halt state will still be exited when an interrupt fires, but the program counter is not advanced after reading the initial instruction byte of the subsequent instruction. So, for example, the code

0xf3 ; di
0x76 ; halt
0x3c ; inc a

actually increments the a register twice. This bug is especially dangerous for multi-byte instructions, because the opcode byte is re-read as the first operand byte, e.g.

0xf3 ; di
0x76 ; halt
0xee 0x0a ; xor a,0x0a

will execute as

0xf3 ; di
0x76 ; halt
0xee 0xee ; xor a,0xee
0x0a ; ld a, (bc)

To emulate this behaviour another flag is added to the CPU state, which is set when entering halt mode with interrupts disabled.

<< cpu registers (0 1 2 3 4 5 6) >>=
bool HaltBug;
<< halt handling (0 1) >>=
if(gb->cpu.InterruptsEnabled == 0) {
    gb->cpu.HaltBug = true;
}

Then, immediately after the first byte of the next executed instruction is read (which happens when an interrupt fires and the CPU leaves halt mode), this flag is checked and the program counter is adjusted to undo the increment.

<< halt bug emulation >>=
if(gb->cpu.HaltBug) {
    gb->cpu.HaltBug = false;
    gb->cpu.PC -= 1;
}
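The following standalone sketch replays the `xor a,0x0a` example above. The `Cpu` struct and the fetch functions are hypothetical stand-ins for the emulator's own. The halt bug adjustment is applied straight after the opcode fetch, as in the chunk above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model: fetch bytes from a small ROM image. */
struct Cpu {
    const uint8_t *mem;
    uint16_t pc;
    bool halt_bug;
};

static uint8_t fetch_opcode(struct Cpu *cpu)
{
    uint8_t op = cpu->mem[cpu->pc];
    cpu->pc = (uint16_t)(cpu->pc + 1);
    if (cpu->halt_bug) {                     /* same adjustment as above */
        cpu->halt_bug = false;
        cpu->pc = (uint16_t)(cpu->pc - 1);   /* PC fails to advance */
    }
    return op;
}

static uint8_t fetch_operand(struct Cpu *cpu)
{
    uint8_t b = cpu->mem[cpu->pc];
    cpu->pc = (uint16_t)(cpu->pc + 1);
    return b;
}
```

Starting at 0x02 (just after the halt) with the bug armed, the fetches return 0xEE as the opcode, 0xEE again as its operand, and then 0x0A as the next opcode.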

Decimal Adjust Accumulator

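This instruction is used after adding or subtracting two binary-coded decimal (BCD) numbers with the ordinary binary arithmetic instructions, to convert the result back into BCD. It uses the N flag to tell whether the previous operation was an addition or a subtraction, and the H and C flags to detect digits that carried or borrowed.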

<< cpu instructions (0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38) >>=
case 0x27: { // daa
    uint16_t a = gb->cpu.A;
    if(ReadN(gb)) {
        if(ReadH(gb)) {
            a -= 0x06;
            a &= 0xFF;
        }
        if(ReadC(gb)) {
            a -= 0x60;
        }
    }
    else {
        if((a & 0x0F) > 0x09 || ReadH(gb)) {
            a += 0x06;
        }
        if(a > 0x9F || ReadC(gb)) {
            a += 0x60;
        }
    }
    UpdateZ(gb, (a & 0xFF) == 0);
    UpdateH(gb, false);
    if(a & 0x100) { UpdateC(gb, true); }
    gb->cpu.A = a;
} break;
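Here is the same logic as a standalone function, with the flags passed explicitly, plus worked BCD examples. The name `daa` and its signature are hypothetical. As in the chunk above, the carry is only ever set, never cleared.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* n, h, c are the flags left by the preceding add/sub. The new carry flag is
 * written to *c_out. */
static uint8_t daa(uint8_t a_in, bool n, bool h, bool c, bool *c_out)
{
    uint16_t a = a_in;
    if (n) {                                  /* after a subtraction */
        if (h) { a = (uint16_t)((a - 0x06) & 0xFF); }
        if (c) { a = (uint16_t)(a - 0x60); }
    } else {                                  /* after an addition */
        if ((a & 0x0F) > 0x09 || h) { a = (uint16_t)(a + 0x06); }
        if (a > 0x9F || c)          { a = (uint16_t)(a + 0x60); }
    }
    *c_out = c || (a & 0x100) != 0;
    return (uint8_t)a;
}
```

For example, 0x15 + 0x27 gives 0x3C in binary, and daa corrects it to 0x42. Likewise 0x99 + 0x01 gives 0x9A, which becomes 0x00 with carry set (BCD 100).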

Stop

<< cpu instructions (0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38) >>=
case 0x10: { // stop 0
    // TODO STOP
    assert(false);
} break;

Set & Complement Carry Flag

The scf instruction sets the carry flag and ccf complements (toggles) it. Both clear the N and H flags and leave Z unchanged.

<< cpu instructions (0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38) >>=
case 0x37: { // scf
    UpdateN(gb, false);
    UpdateH(gb, false);
    UpdateC(gb, true);
} break;
case 0x3F: { // ccf
    UpdateN(gb, false);
    UpdateH(gb, false);
    UpdateC(gb, !ReadC(gb));
} break;

Undocumented

The opcodes 0xD3, 0xDB, 0xDD, 0xE3, 0xE4, 0xEB, 0xEC, 0xED, 0xF4, 0xFC and 0xFD do not correspond to any documented instruction (on real hardware they lock up the CPU), and are currently not emulated.

Core initialisation and main loop

<< state (0 1 2 3 4 5 6 7 8) >>=
struct {
    char Title[16];
    bool HasRTC;
    bool HasBattery;
    bool HasRumble;
} info;

ROM Loading

<< public function declarations (0 1 2 3 4) >>=
char const* gameboy_load(struct Gameboy*);
<< rom loading function >>=
char const* gameboy_load(struct Gameboy* gb)
{
    << rom loading >>

#if GAMEBOY_DEBUG
    gb->debug.Context = NULL;
    gb->debug.MemoryReadHook = NULL;
    gb->debug.MemoryWriteHook = NULL;
#endif

    return NULL;
}

When loading a ROM, first clear any existing cartridge information.

<< rom loading (0 1 2 3 4 5 6 7) >>=
/* Reset state */
memset(&gb->info.Title[0], 0, sizeof(gb->info.Title));
gb->info.HasRTC = false;
gb->info.HasBattery = false;
gb->info.HasRumble = false;
gb->mem.CartRAMSize = 0;
gb->mem.CartROMSize = 0;
gb->rtc.BaseTime = time(NULL);
for(unsigned i = 0; i < 5; i += 1) {
    gb->rtc.BaseReg[i] = 0x00;
}

The cartridge ROM contains a header at addresses 0x100 - 0x14F. Bytes 0x104 - 0x133 hold the Nintendo logo, which the boot ROM on real Gameboy hardware checks as a form of copy protection.

A one-byte checksum of the header is stored at address 0x14D.

<< rom loading (0 1 2 3 4 5 6 7) >>=
{
    uint8_t headerChecksum = 0;
    for(unsigned int i = 0x134; i < 0x14D; i += 1) {
        headerChecksum = headerChecksum - gb->mem.CartROM[i] - 1;
    }
    if(headerChecksum != gb->mem.CartROM[0x14D]) {
        return "Header checksum incorrect";
    }
}
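The header checksum loop can be isolated as a standalone function (the name `header_checksum` is hypothetical). Over the 25 bytes 0x134 - 0x14C, the stored value equals minus the sum of those bytes, minus 25, modulo 256. An all-zero header therefore has checksum 0xE7.

```c
#include <assert.h>
#include <stdint.h>

/* Same loop as the loader: x = x - byte - 1 over 0x134 - 0x14C. */
static uint8_t header_checksum(const uint8_t *rom)
{
    uint8_t x = 0;
    for (unsigned i = 0x134; i < 0x14D; i++) {
        x = (uint8_t)(x - rom[i] - 1);
    }
    return x;
}
```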

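The name of the ROM is stored (zero-padded) in the header at addresses 0x134 - 0x143. Later ROMs re-purpose the last few of these bytes: 0x13F - 0x142 may hold a manufacturer code, and 0x143 is the Game Boy Color flag (0x80 or 0xC0). Any byte with the high bit set, such as the Color flag, is not copied into the title.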

<< rom loading (0 1 2 3 4 5 6 7) >>=
/* Copy ROM title */
for(unsigned int i = 0; i < 16; i += 1) {
    uint8_t x = gb->mem.CartROM[0x134 + i];
    /* Skip bytes with the high bit set (e.g. the CGB flag at 0x143) */
    if(x <= 127) {
        gb->info.Title[i] = x;
    }
}

The byte at address 0x147 specifies the cartridge type. This includes the type of memory bank controller and what external hardware is connected (if any).

<< rom loading (0 1 2 3 4 5 6 7) >>=
switch (gb->mem.CartROM[0x147]) {
    case 0x00:
        gb->mem.MBCModel = Cart_MBC_None;
        break;
    case 0x01:
        gb->mem.MBCModel = Cart_MBC1_16_8;
        break;
    case 0x02:
        gb->mem.MBCModel = Cart_MBC1_16_8;
        break;
    case 0x03:
        gb->mem.MBCModel = Cart_MBC1_16_8;
        gb->info.HasBattery = true;
        break;
    case 0x05:
        gb->mem.MBCModel = Cart_MBC2;
        break;
    case 0x06:
        gb->mem.MBCModel = Cart_MBC2;
        gb->info.HasBattery = true;
        break;
    case 0x08:
        gb->mem.MBCModel = Cart_MBC_None;
        break;
    case 0x09:
        gb->mem.MBCModel = Cart_MBC_None;
        gb->info.HasBattery = true;
        break;
    case 0x0F:
        gb->mem.MBCModel = Cart_MBC3;
        gb->info.HasBattery = true;
        gb->info.HasRTC = true;
        break;
    case 0x10:
        gb->mem.MBCModel = Cart_MBC3;
        gb->info.HasBattery = true;
        gb->info.HasRTC = true;
        break;
    case 0x11:
        gb->mem.MBCModel = Cart_MBC3;
        break;
    case 0x12:
        gb->mem.MBCModel = Cart_MBC3;
        break;
    case 0x13:
        gb->mem.MBCModel = Cart_MBC3;
        gb->info.HasBattery = true;
        break;
    case 0x19:
        gb->mem.MBCModel = Cart_MBC5;
        break;
    case 0x1A:
        gb->mem.MBCModel = Cart_MBC5;
        break;
    case 0x1B:
        gb->mem.MBCModel = Cart_MBC5;
        gb->info.HasBattery = true;
        break;
    case 0x1C:
        gb->mem.MBCModel = Cart_MBC5;
        gb->info.HasRumble = true;
        break;
    case 0x1D:
        gb->mem.MBCModel = Cart_MBC5;
        gb->info.HasRumble = true;
        break;
    case 0x1E:
        gb->mem.MBCModel = Cart_MBC5;
        gb->info.HasBattery = true;
        gb->info.HasRumble = true;
        break;
    case 0x1F: return "Pocket Camera not supported";
    case 0xFD: return "Bandai TAMA5 not supported";
    case 0xFE: return "Hudson HuC-3 not supported";
    case 0xFF: return "Hudson HuC-1 not supported";
    case 0x0B:
    case 0x0C:
    case 0x0D:
        return "MMM01 not supported";
        break;
    default: return "Unknown ROM type";
}

The byte at address 0x148 specifies the size of the ROM in the cartridge.

<< rom loading (0 1 2 3 4 5 6 7) >>=
switch (gb->mem.CartROM[0x148]) {
    case 0x00: /* 256 Kbit */
        gb->mem.CartROMSize = 32768;
        break;
    case 0x01: /* 512 Kbit */
        gb->mem.CartROMSize = 65536;
        break;
    case 0x02: /* 1 Mbit */
        gb->mem.CartROMSize = 131072;
        break;
    case 0x03: /* 2 Mbit */
        gb->mem.CartROMSize = 262144;
        break;
    case 0x04: /* 4 Mbit */
        gb->mem.CartROMSize = 524288;
        break;
    case 0x05: /* 8 Mbit */
        gb->mem.CartROMSize = 1048576;
        break;
    case 0x06: /* 16 Mbit */
        gb->mem.CartROMSize = 2097152;
        break;
    case 0x52: /* 9 Mbit */
        gb->mem.CartROMSize = 1179648;
        break;
    case 0x53: /* 10 Mbit */
        gb->mem.CartROMSize = 1310720;
        break;
    case 0x54: /* 12 Mbit */
        gb->mem.CartROMSize = 1572864;
        break;
}
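Codes 0x00 - 0x06 in the table above follow a simple pattern: 32 KiB shifted left by the code. The remaining codes are multiples of the 16 KiB bank size. The helper below is a hypothetical compact equivalent. Like the switch above, it covers only these codes. Note that the loader leaves CartROMSize at 0 for any other code.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: ROM size in bytes from header byte 0x148,
 * or 0 for codes the loader above does not handle. */
static uint32_t rom_size_from_code(uint8_t code)
{
    if (code <= 0x06) {
        return (uint32_t)32768u << code;   /* 32 KiB << code */
    }
    switch (code) {
    case 0x52: return 1179648u;  /* 72 banks of 16 KiB */
    case 0x53: return 1310720u;  /* 80 banks */
    case 0x54: return 1572864u;  /* 96 banks */
    default:   return 0;
    }
}
```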

The byte at address 0x149 specifies the size of the RAM in the cartridge (if any).

<< rom loading (0 1 2 3 4 5 6 7) >>=
switch (gb->mem.CartROM[0x149]) {
    case 0: /* no RAM */
        gb->mem.CartRAMSize = 0;
        break;
    case 1: /* 16 kBit */
        gb->mem.CartRAMSize = 2048;
        break;
    case 2: /* 64 kBit */
        gb->mem.CartRAMSize = 8192;
        break;
    case 3: /* 256 kBit */
        gb->mem.CartRAMSize = 32768;
        break;
    case 4: /* 1 MBit */
        gb->mem.CartRAMSize = 131072;
        break;
}

There is one exception to the RAM size byte: the MBC2 memory bank controller always contains 512x4 bits of RAM, even though its header specifies no RAM.

<< rom loading (0 1 2 3 4 5 6 7) >>=
/* All MBC2 chips contain 512x4bits RAM even though ROM[0x149] == 0 */
if(gb->mem.MBCModel == Cart_MBC2) {
    gb->mem.CartRAMSize = 512;
}

A two-byte checksum of the entire ROM is stored big-endian in the final two bytes of the header (0x14E - 0x14F). It is the 16-bit sum of every ROM byte except the two checksum bytes themselves. Now that the size of the ROM is known, the checksum can be verified.

<< rom loading (0 1 2 3 4 5 6 7) >>=
{
    uint16_t romChecksum = 0;
    for(unsigned int i = 0; i < gb->mem.CartROMSize; i+= 1) {
        romChecksum += gb->mem.CartROM[i];
    }
    /* ROM Checksum does not include the checksum bytes */
    romChecksum -= gb->mem.CartROM[0x14E];
    romChecksum -= gb->mem.CartROM[0x14F];

    if((((uint16_t)gb->mem.CartROM[0x14E] << 8u) | gb->mem.CartROM[0x14F]) != romChecksum) {
        return "ROM Checksum incorrect";
    }
}
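The function below is a standalone variant of the same check (the name `global_checksum_ok` is hypothetical). It skips the two checksum bytes rather than subtracting them afterwards, which gives the same result modulo 2^16. It assumes `size` is at least 0x150.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* 16-bit sum of every byte except 0x14E/0x14F, compared against the
 * big-endian value stored at 0x14E. */
static bool global_checksum_ok(const uint8_t *rom, size_t size)
{
    uint16_t sum = 0;
    for (size_t i = 0; i < size; i++) {
        if (i == 0x14E || i == 0x14F) {
            continue;                        /* skip the checksum itself */
        }
        sum = (uint16_t)(sum + rom[i]);
    }
    uint16_t stored = (uint16_t)((rom[0x14E] << 8) | rom[0x14F]);
    return sum == stored;
}
```

The boot ROM listed later only verifies the logo and the header checksum, so real hardware runs a cartridge even when this value is wrong. Rejecting such ROMs is therefore stricter than the hardware.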

Resetting

<< public function declarations (0 1 2 3 4) >>=
int gameboy_reset(struct Gameboy*, bool enableBootROM);
<< reset function >>=
int gameboy_reset(struct Gameboy* gb, bool enableBootROM)
{
    << reset >>
    return 0;
}

Store whether the boot ROM is enabled.

<< reset (0 1 2 3 4 5 6 7 8 9 10) >>=
gb->mem.BootROMEnabled = enableBootROM;

If the boot ROM is enabled, execution begins at address zero and the boot ROM performs all required initialisation. Otherwise, execution starts at the cartridge entry point (0x100).

<< reset (0 1 2 3 4 5 6 7 8 9 10) >>=
/* Either start executing the boot ROM or the Cart code. */
if(gb->mem.BootROMEnabled == 1) {
    gb->cpu.PC = 0;
} else {
    gb->cpu.PC = 0x100;
}

Initialise all CPU registers to known values (those the DMG boot ROM leaves behind).

<< reset (0 1 2 3 4 5 6 7 8 9 10) >>=
gb->cpu.SP = 0xFFFE;

// Taken from The Cycle Accurate GB Doc
gb->cpu.A = 0x01;
gb->cpu.F = 0xB0;
gb->cpu.B = 0x00;
gb->cpu.C = 0x13;
gb->cpu.D = 0x00;
gb->cpu.E = 0xD8;
gb->cpu.H = 0x01;
gb->cpu.L = 0x4D;

gb->cpu.InterruptsEnabled = false;
gb->cpu.InterruptEnablePending = false;
gb->cpu.Halted = false;
gb->cpu.HaltBug = false;

Clear VRAM

<< reset (0 1 2 3 4 5 6 7 8 9 10) >>=
/* Clear all VRAM - the bootrom does this. */
memset(gb->mem.VideoRAM, 0, sizeof(gb->mem.VideoRAM));

Initialise IO registers to known values.

<< reset (0 1 2 3 4 5 6 7 8 9 10) >>=
/* Initialise required IO registers */
gb->mem.IO[IO_Joypad] = 0xCF;
gb->mem.IO[IO_SerialControl] = 0x7E;
gb->mem.IO[IO_TimerCounter] = 0x00;
gb->mem.IO[IO_TimerModulo] = 0x00;
gb->mem.IO[IO_TimerControl] = 0x00;
gb->mem.IO[IO_LCDControl] = 0x91;
gb->mem.IO[IO_ScrollY] = 0x00;
gb->mem.IO[IO_ScrollX] = 0x00;
gb->mem.IO[IO_LCDYCompare] = 0x00;
gb->mem.IO[IO_BackgroundPalette] = 0xFC;
gb->mem.IO[IO_ObjectPalette0] = 0xFF;
gb->mem.IO[IO_ObjectPalette1] = 0xFF;
gb->mem.IO[IO_WindowX] = 0x00;
gb->mem.IO[IO_WindowY] = 0x00;
gb->mem.InterruptEnable = 0x00;

/* Initialise sound IO registers. IO[] only covers 0xFF00 - 0xFF7F,
 * so each address is indexed relative to 0xFF00. */
gb->mem.IO[0xFF10 - 0xFF00] = 0x80;
gb->mem.IO[0xFF11 - 0xFF00] = 0xBF;
gb->mem.IO[0xFF12 - 0xFF00] = 0xF3;
gb->mem.IO[0xFF14 - 0xFF00] = 0xBF;
gb->mem.IO[0xFF16 - 0xFF00] = 0x3F;
gb->mem.IO[0xFF17 - 0xFF00] = 0x00;
gb->mem.IO[0xFF19 - 0xFF00] = 0xBF;
gb->mem.IO[0xFF1A - 0xFF00] = 0x7F;
gb->mem.IO[0xFF1B - 0xFF00] = 0xFF;
gb->mem.IO[0xFF1C - 0xFF00] = 0x9F;
gb->mem.IO[0xFF1E - 0xFF00] = 0xBF;
gb->mem.IO[0xFF20 - 0xFF00] = 0xFF;
gb->mem.IO[0xFF21 - 0xFF00] = 0x00;
gb->mem.IO[0xFF22 - 0xFF00] = 0x00;
gb->mem.IO[0xFF23 - 0xFF00] = 0xBF;
gb->mem.IO[0xFF24 - 0xFF00] = 0x77;
gb->mem.IO[0xFF25 - 0xFF00] = 0xF3;
gb->mem.IO[0xFF26 - 0xFF00] = 0xF1;

Reset MBC state

<< reset (0 1 2 3 4 5 6 7 8 9 10) >>=
gb->mem.MBCRAMBank = 0;
gb->mem.CartRAMBankEnabled = false;

/* MBC1 always starts up in 16/8 mode */
if(gb->mem.MBCModel == Cart_MBC1_4_32) {
    gb->mem.MBCModel = Cart_MBC1_16_8;
}

gb->mem.MBCROMBank = 1;

Reset internal state

<< reset (0 1 2 3 4 5 6 7 8 9 10) >>=
gb->buttons.Pressed = 0;

gb->lcd.NewFrame = false;

Main loop

A single function is provided to execute a single CPU instruction and handle all required updates.

<< public function declarations (0 1 2 3 4) >>=
int gameboy_step(struct Gameboy*);
<< step function >>=
int gameboy_step(struct Gameboy* gb)
{
    input_update(gb);
    cpu_handleInterrupts(gb);
    cpu_step(gb);

    return 0;
}
<< gameboy functions >>=
<< rom loading function >>
<< reset function >>
<< step function >>

Boot ROM

When the boot ROM is enabled, the first 256 bytes of memory are mapped to an internal ROM. Since CPU execution starts at address zero, this ROM is the first code executed on a real Gameboy.

<< boot rom >>=
uint8_t const BootROM[256] = {
    0x31,0xFE,0xFF,0xAF,0x21,0xFF,0x9F,0x32,0xCB,0x7C,0x20,0xFB,0x21,0x26,0xFF,0x0E,
    0x11,0x3E,0x80,0x32,0xE2,0x0C,0x3E,0xF3,0xE2,0x32,0x3E,0x77,0x77,0x3E,0xFC,0xE0,
    0x47,0x11,0x04,0x01,0x21,0x10,0x80,0x1A,0xCD,0x95,0x00,0xCD,0x96,0x00,0x13,0x7B,
    0xFE,0x34,0x20,0xF3,0x11,0xD8,0x00,0x06,0x08,0x1A,0x13,0x22,0x23,0x05,0x20,0xF9,
    0x3E,0x19,0xEA,0x10,0x99,0x21,0x2F,0x99,0x0E,0x0C,0x3D,0x28,0x08,0x32,0x0D,0x20,
    0xF9,0x2E,0x0F,0x18,0xF3,0x67,0x3E,0x64,0x57,0xE0,0x42,0x3E,0x91,0xE0,0x40,0x04,
    0x1E,0x02,0x0E,0x0C,0xF0,0x44,0xFE,0x90,0x20,0xFA,0x0D,0x20,0xF7,0x1D,0x20,0xF2,
    0x0E,0x13,0x24,0x7C,0x1E,0x83,0xFE,0x62,0x28,0x06,0x1E,0xC1,0xFE,0x64,0x20,0x06,
    0x7B,0xE2,0x0C,0x3E,0x87,0xE2,0xF0,0x42,0x90,0xE0,0x42,0x15,0x20,0xD2,0x05,0x20,
    0x4F,0x16,0x20,0x18,0xCB,0x4F,0x06,0x04,0xC5,0xCB,0x11,0x17,0xC1,0xCB,0x11,0x17,
    0x05,0x20,0xF5,0x22,0x23,0x22,0x23,0xC9,0xCE,0xED,0x66,0x66,0xCC,0x0D,0x00,0x0B,
    0x03,0x73,0x00,0x83,0x00,0x0C,0x00,0x0D,0x00,0x08,0x11,0x1F,0x88,0x89,0x00,0x0E,
    0xDC,0xCC,0x6E,0xE6,0xDD,0xDD,0xD9,0x99,0xBB,0xBB,0x67,0x63,0x6E,0x0E,0xEC,0xCC,
    0xDD,0xDC,0x99,0x9F,0xBB,0xB9,0x33,0x3E,0x3C,0x42,0xB9,0xA5,0xB9,0xA5,0x42,0x3C,
    0x21,0x04,0x01,0x11,0xA8,0x00,0x1A,0x13,0xBE,0x20,0xFE,0x23,0x7D,0xFE,0x34,0x20,
    0xF5,0x06,0x19,0x78,0x86,0x23,0x05,0x20,0xFB,0x86,0x20,0xFE,0x3E,0x01,0xE0,0x50
};

A register at address 0xFF50 is used to disable access to the boot ROM, and therefore allow the first 256 bytes of the cartridge ROM to be read instead. The boot ROM cannot be re-enabled once disabled.

<< mmu write special cases (0 1 2 3 4 5) >>=
case IO_BootROMDisable: /* Writing to this address disables the boot ROM */
    {
        gb->mem.BootROMEnabled = false;
    }
    break;

Memory subsystem emulation

<< state (0 1 2 3 4 5 6 7 8) >>=
struct {
    /* 0x8000 - 0x9FFF */
    uint8_t VideoRAM[8192];
    /* 0xC000 - 0xDFFF */
    uint8_t WorkRAM[8192];
    /* 0xFE00 - 0xFE9F */
    uint8_t OAM[160];
    /* 0xFF00 - 0xFF7F */
    uint8_t IO[128];
    /* 0xFF80 - 0xFFFE */
    uint8_t HighRAM[127];
    /* 0xFFFF */
    uint8_t InterruptEnable;

    /* The cartridge ROM & RAM is typically banked into the main address
     * space using a MBC chip */
    unsigned int CartROMSize;
    uint8_t CartROM[Cart_MaxROMSize];

    unsigned int CartRAMSize;
    uint8_t CartRAM[Cart_MaxRAMSize];

    /* If true then addresses 00-FF contain the boot ROM */
    bool BootROMEnabled;

    /* Cartridge RAM should be enabled before writing to it, and disabled when finished */
    bool CartRAMBankEnabled;

    /* Model and state of the MBC chip */
    int MBCModel;
    unsigned int MBCROMBank;
    unsigned int MBCRAMBank;
} mem;

All systems access memory via two functions:

<< function declarations (0 1 2 3 4) >>=
static uint8_t mmu_read(struct Gameboy*, int);
static void mmu_write(struct Gameboy*, int, uint8_t);

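The memory map of the Gameboy is quite straightforward:

Addresses Contents
0x0000 - 0x3FFF Cartridge ROM bank 0 (fixed); 0x0000 - 0x00FF is the boot ROM while it is enabled
0x4000 - 0x7FFF Cartridge ROM, switchable bank
0x8000 - 0x9FFF Video RAM
0xA000 - 0xBFFF Cartridge RAM, switchable bank
0xC000 - 0xDFFF Work RAM
0xE000 - 0xFDFF Mirror of work RAM
0xFE00 - 0xFE9F Object Attribute Memory (OAM)
0xFEA0 - 0xFEFF Unused (reads as 0x00)
0xFF00 - 0xFF7F IO registers
0xFF80 - 0xFFFE High RAM
0xFFFF Interrupt Enable register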

Physically there are two 8KiB SRAM chips (video and work RAM). The video RAM is connected to a separate address and data bus from the work RAM, which shares its buses with the cartridge slot. This allows the video processor to access the video RAM in parallel with the main CPU accessing either the cartridge or the work RAM.
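The address decoding performed by `mmu_readDirect` and `mmu_writeDirect` can be summarised as a small classifier. This is a minimal sketch rather than part of the emulator: `enum mem_region`, `mem_region_of` and `work_ram_offset` are hypothetical names introduced for illustration. It uses the same ranges as the memory map, with OAM ending at 0xFE9F inclusive and the mirror at 0xE000 - 0xFDFF folded back onto work RAM.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical region names, for illustration only */
enum mem_region {
    REGION_ROM_BANK0,   /* 0x0000 - 0x3FFF */
    REGION_ROM_BANKED,  /* 0x4000 - 0x7FFF */
    REGION_VRAM,        /* 0x8000 - 0x9FFF */
    REGION_CART_RAM,    /* 0xA000 - 0xBFFF */
    REGION_WORK_RAM,    /* 0xC000 - 0xDFFF plus its mirror 0xE000 - 0xFDFF */
    REGION_OAM,         /* 0xFE00 - 0xFE9F */
    REGION_UNUSED,      /* 0xFEA0 - 0xFEFF */
    REGION_IO,          /* 0xFF00 - 0xFF7F */
    REGION_HIGH_RAM,    /* 0xFF80 - 0xFFFE */
    REGION_IE           /* 0xFFFF */
};

/* Hypothetical helper: which region does a CPU address fall in? */
static enum mem_region mem_region_of(uint16_t addr)
{
    if (addr < 0x4000) return REGION_ROM_BANK0;
    if (addr < 0x8000) return REGION_ROM_BANKED;
    if (addr < 0xA000) return REGION_VRAM;
    if (addr < 0xC000) return REGION_CART_RAM;
    if (addr < 0xFE00) return REGION_WORK_RAM;   /* includes the mirror */
    if (addr < 0xFEA0) return REGION_OAM;        /* 0xFE9F is the last OAM byte */
    if (addr < 0xFF00) return REGION_UNUSED;
    if (addr < 0xFF80) return REGION_IO;
    if (addr < 0xFFFF) return REGION_HIGH_RAM;
    return REGION_IE;
}

/* Hypothetical helper: offset into the 8KiB work RAM array */
static unsigned work_ram_offset(uint16_t addr)
{
    assert(mem_region_of(addr) == REGION_WORK_RAM);
    return (addr >= 0xE000) ? (addr - 0xE000u) : (addr - 0xC000u);
}
```

For example, 0xE123 and 0xC123 resolve to the same work RAM byte.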

<< mmu read direct >>=
static uint8_t mmu_readDirect(struct Gameboy* gb, uint16_t addr) {
    if ((addr <= 0x00FF) && gb->mem.BootROMEnabled) {
        return BootROM[addr];
    } else if (addr < 0x4000) {
        /* 16K - ROM Bank #0 (fixed) */
        return gb->mem.CartROM[addr];
    } else if (addr < 0x8000) {
        /* 16K - Banked ROM area */
        return mmu_readBankedROM(gb, addr - 0x4000);
    } else if (addr < 0xA000) {
        /* Video RAM */
        return gb->mem.VideoRAM[addr - 0x8000];
    } else if (addr < 0xC000) {
        /* 8K - Banked RAM Area */
        return mmu_readBankedRAM(gb, addr - 0xA000);
    } else if (addr < 0xE000) {
        /* 8K - Internal RAM */
        return gb->mem.WorkRAM[addr - 0xC000];
    } else if (addr < 0xFE00) {
        /* Mirror of internal RAM */
        return gb->mem.WorkRAM[addr - 0xE000];
    } else if (addr < 0xFEA0) {
        /* OAM */
        return gb->mem.OAM[addr - 0xFE00];
    } else if (addr < 0xFF00) {
        /* Empty */
        return 0x00;
    } else if (addr < 0xFF80) {
        /* IO registers */
        return gb->mem.IO[addr - 0xFF00] | IOUnusedBits[addr - 0xFF00];
    } else if(addr < 0xFFFF) {
        return gb->mem.HighRAM[addr - 0xFF80];
    } else {
        return gb->mem.InterruptEnable;
    }
}
<< io register addresses >>=
enum IORegisters {
    /* Addresses relative to 0xFF00 */
    IO_Joypad = 0x00,
    IO_SerialData = 0x01,
    IO_SerialControl = 0x02,
    IO_Divider = 0x04,
    IO_TimerCounter = 0x05,
    IO_TimerModulo = 0x06,
    IO_TimerControl = 0x07,
    IO_InterruptFlag = 0x0F,

    IO_Sound1Sweep = 0x10,
    IO_Sound1Mode = 0x11,
    IO_Sound1Envelope = 0x12,
    IO_Sound1FreqLo = 0x13,
    IO_Sound1FreqHi = 0x14,

    IO_Sound2Mode = 0x16,
    IO_Sound2Envelope = 0x17,
    IO_Sound2FreqLo = 0x18,
    IO_Sound2FreqHi = 0x19,

    IO_Sound3Enable = 0x1A,
    IO_Sound3Length = 0x1B,
    IO_Sound3Level = 0x1C,
    IO_Sound3FreqLo = 0x1D,
    IO_Sound3FreqHi = 0x1E,

    IO_Sound4Length = 0x20,
    IO_Sound4Envelope = 0x21,
    IO_Sound4Poly = 0x22,
    IO_Sound4Counter = 0x23,

    IO_SoundChannels = 0x24,
    IO_SoundOutput = 0x25,
    IO_SoundControl = 0x26,

    /* 0x30 - 0x3F wave RAM */

    IO_LCDControl = 0x40,
    IO_LCDStat = 0x41,
    IO_ScrollY = 0x42,
    IO_ScrollX = 0x43,
    IO_LCDY = 0x44,
    IO_LCDYCompare = 0x45,
    IO_OAMDMA = 0x46,
    IO_BackgroundPalette = 0x47,
    IO_ObjectPalette0 = 0x48,
    IO_ObjectPalette1 = 0x49,
    IO_WindowY = 0x4A,
    IO_WindowX = 0x4B,

    IO_BootROMDisable = 0x50,
};

Some IO registers have unused bits which always read as 1.

<< io registers unused bits >>=
static uint8_t const IOUnusedBits[128] = {
    [IO_Joypad] = 0xC0,
    // ... 
};
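A read from the IO area combines the stored register value with these always-one bits, as `mmu_readDirect` does. The sketch below shows that step on its own. `io_read`, `io_regs` and the cut-down `unused_bits` table are hypothetical; only the joypad entry (0xC0) is taken from the table above.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical cut-down table: only the joypad entry from the text */
static uint8_t const unused_bits[128] = {
    [0x00] = 0xC0, /* IO_Joypad: bits 6-7 always read as 1 */
};

/* Hypothetical stored IO registers, indexed relative to 0xFF00 */
static uint8_t io_regs[128];

static uint8_t io_read(uint16_t addr)
{
    assert(addr >= 0xFF00 && addr < 0xFF80);
    unsigned index = addr - 0xFF00u;
    /* Unused bits are forced high regardless of what was stored */
    return io_regs[index] | unused_bits[index];
}
```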
<< memory functions >>=
<< boot rom >>
<< rtc functions >>
static uint8_t mmu_readBankedROM(struct Gameboy* gb, unsigned int relativeAddress)
{
    unsigned int cartAddr = (gb->mem.MBCROMBank * 16384) + relativeAddress;
    return gb->mem.CartROM[cartAddr % gb->mem.CartROMSize];
}

static uint8_t mmu_readBankedRAM(struct Gameboy* gb, unsigned int relativeAddress)
{
    if(gb->mem.MBCModel == Cart_MBC3 && gb->mem.MBCRAMBank >= Cart_MBC3_RTCBase) {
        return mmu_readRTC(gb, gb->mem.MBCRAMBank);
    }
    else {
        unsigned int cartAddr = (gb->mem.MBCRAMBank * 8192) + relativeAddress;
        if(gb->mem.CartRAMSize && gb->mem.CartRAMBankEnabled) {
            return gb->mem.CartRAM[cartAddr % gb->mem.CartRAMSize];
        }
        else {
            return 0xFF;
        }
    }
}
<< mmu read direct >>

static void mmu_writeBankedRAM(struct Gameboy* gb, unsigned int relativeAddress, uint8_t data)
{
    if(gb->mem.MBCModel == Cart_MBC3 && gb->mem.MBCRAMBank >= Cart_MBC3_RTCBase) {
        mmu_writeRTC(gb, gb->mem.MBCRAMBank, data);
    }
    else if(gb->mem.CartRAMBankEnabled) {
        unsigned int cartAddr = (gb->mem.MBCRAMBank * 8192) + relativeAddress;
        if(cartAddr < gb->mem.CartRAMSize) {
            if(gb->mem.MBCModel == Cart_MBC2) {
                // MBC2 internal RAM is 4bit
                data &= 0x0F;
            }
            gb->mem.CartRAM[cartAddr] = data;
        }
    }
}

static void mmu_setROMBank(struct Gameboy* gb, unsigned int addr, uint8_t data)
{
    switch(gb->mem.MBCModel) {
        << set rom bank cases >>

        default:
            break;
    }
}

static void mmu_setRAMBank(struct Gameboy* gb, unsigned int addr, uint8_t data)
{
    switch(gb->mem.MBCModel) {
        << set ram bank cases >>

        default:
            break;
    }
}

static uint8_t mmu_read(struct Gameboy* gb, int addr) {
    GBTRACE(gb, (&(struct gameboy_tp){ .point = GAMEBOY_TP_MEM_READ, .u = { .mem_read = { .addr = addr } } }));
    clock_increment(gb);

    if(gb->dma.Active && addr < 0xFF80) {
        /* When OAM DMA is in progress any memory accesses outside of
         * high RAM (0xFF80 - 0xFFFE) will return 0xFF */
        return 0xFF;
    }
    else {
        return mmu_readDirect(gb, addr);
    }
}

static void mmu_writeDirect(struct Gameboy* gb, uint16_t addr, uint8_t value)
{
    if (addr < 0x2000) {
        /* Cart RAM enable */
        gb->mem.CartRAMBankEnabled = (value & 0xF) == 0xA;
    }
    else if(addr < 0x4000) {
        /* ROM Bank select */
        mmu_setROMBank(gb, addr, value);
    }
    else if(addr < 0x6000) {
        /* RAM Bank select (or high bits of ROM Bank for MBC1 mode 16/8) */
        mmu_setRAMBank(gb, addr, value);
    }
    else if(addr < 0x8000) {
        /* MBC1 Mode selection or MBC3 RTC latching */
        if(gb->mem.MBCModel == Cart_MBC1_16_8 || gb->mem.MBCModel == Cart_MBC1_4_32) {
            //<< mbc1 model selection >>
        }
        else if(gb->mem.MBCModel == Cart_MBC3) {
            << rtc latching >>
        }
    } else if (addr < 0xA000) {
        /* Video RAM */
        // TODO: Writes to VRAM should be ignored when the LCD is being redrawn
        gb->mem.VideoRAM[addr - 0x8000] = value;
    } else if (addr < 0xC000) {
        /* Banked RAM Area */
        mmu_writeBankedRAM(gb, addr - 0xA000, value);
    } else if (addr < 0xE000) {
        /* Internal RAM */
        gb->mem.WorkRAM[addr - 0xC000] = value;
    } else if (addr < 0xFE00) {
        /* Mirror of internal RAM */
        gb->mem.WorkRAM[addr - 0xE000] = value;
    } else if (addr < 0xFEA0) {
        /* OAM */
        gb->mem.OAM[addr - 0xFE00] = value;
    } else if (addr < 0xFF00) {
        /* Empty */
    } else if (addr < 0xFF80) {
        /* IO registers */
        switch(addr - 0xFF00) {
            << mmu write special cases >>
            case IO_LCDStat:
                {
                    uint8_t cur = gb->mem.IO[IO_LCDStat];
                    gb->mem.IO[IO_LCDStat] = (cur & 0x3) | (value & ~0x3);
                }
                break;

            case IO_LCDY: /* Current scanline -> writing resets it to zero */
                {
                    gb->mem.IO[IO_LCDY] = 0;
                }
                break;

            default: gb->mem.IO[addr - 0xFF00] = value; break;
        }
    } else if (addr < 0xFFFF) {
        gb->mem.HighRAM[addr - 0xFF80] = value;
    } else {
        gb->mem.InterruptEnable = value;
    }
}

static void mmu_write(struct Gameboy* gb, int addr, uint8_t value) {
    GBTRACE(gb, (&(struct gameboy_tp){ .point = GAMEBOY_TP_MEM_WRITE, .u = { .mem_write = { .addr = addr, .data = value } } }));
    clock_increment(gb);

    /* TODO is access to IO space (0xFF00 - 0xFF7F) OK? */
    if(gb->dma.Active && addr < 0xFF00) {
        /* When OAM DMA is in progress, writes below 0xFF00 are ignored;
         * IO registers and high RAM (0xFF80 - 0xFFFE) remain writable */
    }
    else {
        mmu_writeDirect(gb, addr, value);
    }
}

uint8_t gameboy_read(struct Gameboy* gb, uint16_t addr)
{
    return mmu_readDirect(gb, addr);
}

void gameboy_write(struct Gameboy* gb, uint16_t addr, uint8_t value)
{
    mmu_writeDirect(gb, addr, value);
}
<< public function declarations (0 1 2 3 4) >>=
uint8_t gameboy_read(struct Gameboy* gb, uint16_t addr);
void gameboy_write(struct Gameboy* gb, uint16_t addr, uint8_t value);

Memory Banking

Most Gameboy cartridges contain more ROM than the 32KiB of address space available for it, and many include more RAM than the 8KiB of address space available for that. These cartridges therefore contain a chip called a Memory Bank Controller (MBC). There are several different MBCs with various features, all of which are controlled by writes to the ROM address space.

The models supported by the emulator are MBC1, MBC2, MBC3 and MBC5. All models bank the ROM in 16KiB pages and the RAM in 8KiB pages, matching the banked ROM (0x4000 - 0x7FFF) and banked RAM (0xA000 - 0xBFFF) ranges of the memory map.

<< mmu enum >>=
enum {
    Cart_MaxROMSize = 4 * 1024 * 1024,
    Cart_MaxRAMSize = 32 * 1024,

    Cart_MBC_None = 0,

    /* MBC1 can operate in two modes, switchable at runtime */
    Cart_MBC1_16_8,
    Cart_MBC1_4_32,

    Cart_MBC2,
    Cart_MBC3,
    Cart_MBC5,

    Cart_MBC3_RTCBase = 0x08,
    Cart_MBC3_RTCLast = 0x0C,
};

MBC1

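MBC1 supports two modes, switchable at runtime: 16Mbit (2MiB) banked ROM with 8KiB of unbanked RAM, or 4Mbit (512KiB) banked ROM with 32KiB of RAM in four 8KiB banks. This is implemented using two registers. Writes to 0x2000 - 0x3FFF set the bottom 5 bits of the ROM bank number. Writes to 0x4000 - 0x5FFF set a 2-bit value, which is used either as the upper ROM bank bits (16Mbit mode) or as the RAM bank number (4Mbit mode).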

<< set rom bank cases (0 1 2 3) >>=
case Cart_MBC1_16_8:
case Cart_MBC1_4_32:
    {
        /* Bottom 5 bits of ROM Bank number */
        unsigned int bankNo = data & 0x1F;
        /* Zero in this register always maps to 1 */
        if(bankNo == 0) {
            bankNo = 1;
        }
        gb->mem.MBCROMBank = (gb->mem.MBCROMBank & 0xE0) | bankNo;
    }
    break;
<< set ram bank cases (0 1 2) >>=
case Cart_MBC1_16_8:
    gb->mem.MBCROMBank = ((gb->mem.MBCROMBank & 0x1F) | ((data & 0x3) << 5u));
    break;
case Cart_MBC1_4_32:
    gb->mem.MBCRAMBank = (data & 0x3);
    break;

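The mode is selected by bit 0 of a write to 0x6000 - 0x7FFF. When switching modes the stored values are preserved: the 2-bit value written to 0x4000 - 0x5FFF is moved between the upper ROM bank bits and the RAM bank number.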

<< mbc1 mode selection >>=
if(value & 1u) {
    // RAM Banking mode - 32Kbyte RAM in 4 banks, 4MBit ROM
    gb->mem.MBCModel = Cart_MBC1_4_32;
    gb->mem.MBCRAMBank = (gb->mem.MBCROMBank >> 5u) & 0x03;
    gb->mem.MBCROMBank &= 0x1F; /* upper 2 bits have moved to the RAM bank */
}
else {
    // ROM Banking mode - 8Kbytes unbanked RAM, 16MBit ROM
    gb->mem.MBCModel = Cart_MBC1_16_8;
    gb->mem.MBCROMBank = (gb->mem.MBCROMBank & 0x1F) | ((gb->mem.MBCRAMBank & 0x03) << 5u);
    gb->mem.MBCRAMBank = 0;
}
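The MBC1 register behaviour can be exercised in isolation. This minimal sketch mirrors the document's model, where the 2-bit value moves between the upper ROM bank bits and the RAM bank number when the mode changes. The `struct mbc1` type and its helper functions are hypothetical names and are not part of the emulator.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-alone copy of the MBC1 state */
struct mbc1 {
    unsigned rom_bank;  /* bits 0-4 from 0x2000-0x3FFF, bits 5-6 in ROM mode */
    unsigned ram_bank;  /* only used in RAM banking mode */
    bool ram_mode;      /* false = 16Mbit ROM / 8KiB RAM, true = 4Mbit ROM / 32KiB RAM */
};

/* Write to 0x2000 - 0x3FFF: bottom 5 bits of the ROM bank, 0 maps to 1 */
static void mbc1_write_rom_bank(struct mbc1* m, uint8_t data)
{
    unsigned bank = data & 0x1F;
    if (bank == 0) bank = 1;
    m->rom_bank = (m->rom_bank & 0xE0) | bank;
}

/* Write to 0x4000 - 0x5FFF: 2-bit value whose meaning depends on the mode */
static void mbc1_write_upper(struct mbc1* m, uint8_t data)
{
    if (m->ram_mode) {
        m->ram_bank = data & 0x3;
    } else {
        m->rom_bank = (m->rom_bank & 0x1F) | ((data & 0x3u) << 5);
    }
}

/* Write to 0x6000 - 0x7FFF: bit 0 selects the mode */
static void mbc1_select_mode(struct mbc1* m, uint8_t data)
{
    if (data & 1u) {
        m->ram_mode = true;
        m->ram_bank = (m->rom_bank >> 5) & 0x3;
        m->rom_bank &= 0x1F;                 /* keep the low 5 bits */
    } else {
        m->ram_mode = false;
        m->rom_bank = (m->rom_bank & 0x1F) | ((m->ram_bank & 0x3u) << 5);
        m->ram_bank = 0;
    }
}
```

Writing 2 to the upper register and then switching to RAM mode and back leaves ROM bank 0x45 intact, which is the round trip the text describes.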

MBC2

All MBC2 chips contain 512 cells of 4-bit RAM, which are exposed as 512 bytes in the mapped RAM area; the top 4 bits of each byte do not store the data written to them. MBC2 supports up to 16 ROM banks (2 Megabit ROM). Writing zero to the ROM bank register actually sets it to one, which prevents bank zero from being mapped to the banked ROM area.

<< set rom bank cases (0 1 2 3) >>=
case Cart_MBC2:
    {
        unsigned int bankNo = data & 0x0F;
        gb->mem.MBCROMBank = bankNo? bankNo : 1;
    }
    break;
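The two MBC2 quirks, 4-bit RAM cells and the zero-to-one ROM bank mapping, can be shown in a short sketch. The `mbc2_*` names are hypothetical. As in `mmu_writeBankedRAM`, the upper nibble is discarded on write, so in this model it reads back as zero; on real hardware those bits are not defined.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-alone model of MBC2's built-in RAM and ROM bank register */
enum { MBC2_RAM_SIZE = 512 };

static uint8_t mbc2_ram[MBC2_RAM_SIZE];

/* Only the low 4 bits of each cell are stored, as in mmu_writeBankedRAM */
static void mbc2_ram_write(unsigned offset, uint8_t data)
{
    if (offset < MBC2_RAM_SIZE) {
        mbc2_ram[offset] = data & 0x0F;
    }
}

/* Reads wrap around the 512 cells, like the modulo in mmu_readBankedRAM */
static uint8_t mbc2_ram_read(unsigned offset)
{
    return mbc2_ram[offset % MBC2_RAM_SIZE];
}

/* ROM bank register: 4 bits, zero selects bank 1 */
static unsigned mbc2_rom_bank(uint8_t data)
{
    unsigned bank = data & 0x0F;
    return bank ? bank : 1;
}
```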

MBC3

Supports up to 128 ROM banks (16 Megabit ROM) and 4 RAM banks. Writing zero to the ROM bank register will actually set it to one. The MBC3 also has support for a real time clock.

<< set rom bank cases (0 1 2 3) >>=
case Cart_MBC3:
    {
        unsigned int bankNo = data & 0x7F;
        gb->mem.MBCROMBank = bankNo? bankNo : 1;
    }
    break;
<< set ram bank cases (0 1 2) >>=
case Cart_MBC3:
    gb->mem.MBCRAMBank = data;
    break;
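A single MBC3 register value therefore selects either a RAM bank or an RTC register. The sketch below uses hypothetical names to show how a value written to 0x4000 - 0x5FFF might be interpreted, using the same 0x08 - 0x0C range as `Cart_MBC3_RTCBase` and `Cart_MBC3_RTCLast`. The emulator itself stores the raw value and makes this decision on each access. Treating 0x04 - 0x07 and values above 0x0C as invalid is an assumption of this sketch.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical decoding of the MBC3 RAM bank / RTC select register */
enum mbc3_target { MBC3_RAM, MBC3_RTC, MBC3_INVALID };

struct mbc3_select {
    enum mbc3_target target;
    unsigned index;   /* RAM bank (0-3) or RTC register index (0-4) */
};

static struct mbc3_select mbc3_decode_ram_bank(uint8_t value)
{
    struct mbc3_select sel = { MBC3_INVALID, 0 };
    if (value <= 0x03) {
        sel.target = MBC3_RAM;
        sel.index = value;
    } else if (value >= 0x08 && value <= 0x0C) {
        sel.target = MBC3_RTC;
        sel.index = value - 0x08u;  /* 0 = seconds ... 4 = day high bit & flags */
    }
    /* 0x04 - 0x07 and values above 0x0C are treated as invalid here */
    return sel;
}

/* ROM bank register: 7 bits, zero selects bank 1 */
static unsigned mbc3_rom_bank(uint8_t data)
{
    unsigned bank = data & 0x7F;
    return bank ? bank : 1;
}
```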

MBC3 Real Time Clock (RTC)

The RTC provides 5 registers, which are selected by writing a specific RAM bank number:

RAM Bank Register
0x08 Seconds (0-59)
0x09 Minutes (0-59)
0x0A Hours (0-23)
0x0B Days (lower 8 bits, 0-255)
0x0C (bit 0) Upper bit of day counter
0x0C (bit 6) Halt flag (1 = stop timer)
0x0C (bit 7) Overflow flag, set when 9-bit day counter overflows. Remains set until explicitly cleared.

These registers cannot be read directly. Instead, when requested (by writing 0x00 and then 0x01 to 0x6000 - 0x7FFF), the underlying register values are copied (latched) into a second set of registers. This provides a consistent view of all the underlying registers, because the latched values only change together, when a new latch is explicitly requested. Writes to the RTC registers go directly to the underlying registers, so the clock should be halted before performing any writes.
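The latch protocol used by the `<< rtc latching >>` chunk can be modelled separately: writing 0x00 releases the latch, and writing 0x01 copies the live registers. The `struct rtc_latch` type and `rtc_latch_write` function are hypothetical. In this sketch the live registers are passed in as a parameter rather than recomputed from the host clock.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical model of the MBC3 RTC latch */
struct rtc_latch {
    uint8_t latched[5];  /* seconds, minutes, hours, days, day high bit & flags */
    bool is_latched;
};

/* Write to 0x6000 - 0x7FFF on an MBC3 cartridge */
static void rtc_latch_write(struct rtc_latch* l, uint8_t value, uint8_t const live[5])
{
    if (value == 0x00 && l->is_latched) {
        l->is_latched = false;                        /* release the latch */
    } else if (value == 0x01 && !l->is_latched) {
        memcpy(l->latched, live, sizeof l->latched);  /* snapshot all 5 together */
        l->is_latched = true;
    }
}
```

A second 0x01 write without an intervening 0x00 does not refresh the snapshot.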

<< state (0 1 2 3 4 5 6 7 8) >>=
struct {
    time_t BaseTime;

    /* seconds, minutes, hours, days, dayhi */
    uint8_t BaseReg[5];
    uint8_t LatchedReg[5];
    bool Latched;
} rtc;

To keep track of real time the emulator stores the time provided by the computer's clock (the base time) along with the calculated values of the RTC registers. Then to calculate a new set of RTC register values the difference between the base time and the current time is added on to the stored values of the RTC registers (unless the RTC is stopped), and the base time updated.
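The carry arithmetic in `mmu_updateRTC` can be checked with a pure function that is given the elapsed host time instead of calling `time()`. This is a sketch under that assumption. `rtc_advance` is a hypothetical name, and register 4 uses the layout described above: bit 0 is day bit 8, bit 6 is halt, and bit 7 is overflow.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical pure version of the RTC update.
 * regs: seconds, minutes, hours, days (low 8 bits), day high bit + flags */
static void rtc_advance(uint8_t regs[5], uint64_t elapsed_seconds)
{
    uint64_t t = 0;
    if ((regs[4] & 0x40) == 0) {     /* a halted clock does not advance */
        t = elapsed_seconds;
    }
    t += regs[0];
    t += (uint64_t)regs[1] * 60;
    t += (uint64_t)regs[2] * 60 * 60;
    t += (uint64_t)regs[3] * 60 * 60 * 24;
    t += (uint64_t)(regs[4] & 1u) * 60 * 60 * 24 * 256;

    regs[0] = (uint8_t)(t % 60); t /= 60;
    regs[1] = (uint8_t)(t % 60); t /= 60;
    regs[2] = (uint8_t)(t % 24); t /= 24;
    regs[3] = (uint8_t)(t % 256); t /= 256;
    regs[4] = (uint8_t)((regs[4] & 0xFE) | (t % 2)); t /= 2;
    if (t > 0) {
        regs[4] |= 0x80;             /* sticky day-counter overflow */
    }
}
```

One second after 23:59:59 on day 0 gives 00:00:00 on day 1. One day after day 511 wraps the day counter to 0 and sets the overflow flag.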

<< rtc functions >>=
static void mmu_updateRTC(struct Gameboy* gb)
{
    time_t now = time(NULL);
    time_t new_time = 0;
    if((gb->rtc.BaseReg[4] & 0x40) == 0 && now > gb->rtc.BaseTime) {
        new_time = now - gb->rtc.BaseTime;
    }
    new_time += (time_t)gb->rtc.BaseReg[0];
    new_time += (time_t)gb->rtc.BaseReg[1] * 60;
    new_time += (time_t)gb->rtc.BaseReg[2] * 60 * 60;
    new_time += (time_t)gb->rtc.BaseReg[3] * 60 * 60 * 24;
    new_time += (time_t)(gb->rtc.BaseReg[4] & 1u) * 60 * 60 * 24 * 256;

    gb->rtc.BaseReg[0] = new_time % 60;
    new_time /= 60;
    gb->rtc.BaseReg[1] = new_time % 60;
    new_time /= 60;
    gb->rtc.BaseReg[2] = new_time % 24;
    new_time /= 24;
    gb->rtc.BaseReg[3] = new_time % 256;
    new_time /= 256;
    /* Top bit of 9-bit day counter */
    gb->rtc.BaseReg[4] = (gb->rtc.BaseReg[4] & 0xFE) | (new_time % 2);
    new_time /= 2;
    /* Days overflow bit (sticky) */
    gb->rtc.BaseReg[4] |= (new_time > 0? 0x80 : 0);

    gb->rtc.BaseTime = now;
}

static uint8_t mmu_readRTC(struct Gameboy* gb, uint8_t reg)
{
    if(reg > 0x0C || !gb->info.HasRTC) {
        return 0xFF;
    }
    else {
        if(gb->rtc.Latched) {
            return gb->rtc.LatchedReg[reg - 0x08];
        }
        else {
            mmu_updateRTC(gb);
            return gb->rtc.BaseReg[reg - 0x08];
        }
    }
}

static void mmu_writeRTC(struct Gameboy* gb, uint8_t reg, uint8_t val)
{
    if(reg <= 0x0C) {
        mmu_updateRTC(gb);
        gb->rtc.BaseReg[reg - 0x08] = val;
    }
}
<< rtc latching >>=
if(value == 0x00 && gb->rtc.Latched) {
    gb->rtc.Latched = false;
}
else if(value == 0x01 && !gb->rtc.Latched) {
    mmu_updateRTC(gb);
    for(unsigned i = 0; i < 5; i += 1) {
        gb->rtc.LatchedReg[i] = gb->rtc.BaseReg[i];
    }
    gb->rtc.Latched = true;
}

MBC5

Supports up to 512 ROM banks (64 Megabit ROM) and 16 RAM pages. The address range used to write the ROM bank is split in two to provide two registers, which together form the 9-bit ROM bank number. Unlike the other MBCs it is possible to map bank 0.

<< set rom bank cases (0 1 2 3) >>=
case Cart_MBC5:
    if(addr < 0x3000) {
        gb->mem.MBCROMBank = ((gb->mem.MBCROMBank & ~0xFF) | data);
    }
    else {
        gb->mem.MBCROMBank = ((gb->mem.MBCROMBank & 0xFF) | ((data & 1u) << 8u));
    }
    break;
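The two MBC5 writes combine into a single 9-bit bank number. The value written to 0x2000 - 0x2FFF supplies bits 0-7, and bit 0 of the value written to 0x3000 - 0x3FFF supplies bit 8. This stand-alone sketch uses a hypothetical `mbc5_write_rom_bank` helper and also shows that bank 0 can be selected.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: new 9-bit MBC5 ROM bank after a write to addr
 * (addr is the CPU write address in 0x2000 - 0x3FFF) */
static unsigned mbc5_write_rom_bank(unsigned bank, uint16_t addr, uint8_t data)
{
    if (addr < 0x3000) {
        return (bank & 0x100u) | data;            /* replace bits 0-7 */
    }
    return (bank & 0xFFu) | ((data & 1u) << 8);   /* replace bit 8 */
}
```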

There is a single 4-bit register for the RAM bank number. If the cartridge has a rumble motor, then bit 3 of this register controls whether the motor is active instead of selecting RAM.

<< set ram bank cases (0 1 2) >>=
case Cart_MBC5:
    /* TODO On cartridges with a rumble motor, bit 3 controls the rumble */
    gb->mem.MBCRAMBank = (data & 0x0F);
    break;

OAM DMA (0xFF46)

The Gameboy supports simple DMA transfer of a 160-byte block of memory from a 256-byte aligned source address (0xXX00, up to 0xF100) into the OAM area. This is used to update the OAM during the LCD VBlank period, as this is the only safe time to access the OAM. Whilst the DMA engine is transferring data only the high RAM (0xFF80 - 0xFFFE) is accessible; reads from any other address return 0xFF.

DMA is initiated by writing the top 8 bits of the source address to the DMA register at 0xFF46. The transfer starts after a delay of 1 machine cycle:

ld $a, 0x80
ld (0xFF00 + 0x46), $a
nop ; DMA not started - low memory still accessible
nop ; DMA is now running - low memory reads return 0xFF

If a DMA operation is in progress then it will continue as normal for the delay cycle, and then the DMA will restart with the new source address:

; assume a DMA is running
ld $a, 0x80
ld (0xFF00 + 0x46), $a
nop ; Old DMA still running - low memory reads return 0xFF
nop ; New DMA is now running - low memory reads return 0xFF

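Each DMA cycle is executed instantaneously, before the CPU memory read or write in the same machine cycle (`mmu_read` and `mmu_write` call `clock_increment` first). A separate flag therefore records whether a DMA is active on the current cycle, which tells the memory subsystem whether low memory accesses are blocked.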
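The start-up delay can be reproduced with a small model of `dma_update`. In this sketch, memory is a flat 64KiB array and cycles are stepped by hand. The update runs at the start of each machine cycle, and `active` reports whether CPU accesses in that cycle would be blocked. All names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-alone model of the OAM DMA state machine */
struct dma_model {
    uint8_t memory[0x10000];
    uint8_t oam[160];
    uint8_t pending;      /* high byte of the requested source, 0 = none */
    bool delay_start;
    uint16_t source;      /* 0 = no transfer running */
    bool active;
};

/* CPU write to 0xFF46 */
static void dma_request(struct dma_model* d, uint8_t value)
{
    d->pending = value;
    d->delay_start = true;
}

/* Called once at the start of every machine cycle */
static void dma_cycle(struct dma_model* d)
{
    if (d->pending) {
        if (!d->delay_start) {            /* delay cycle has passed: start */
            d->source = (uint16_t)(d->pending << 8);
            d->pending = 0;
        }
        d->delay_start = false;
    }
    if (d->source && (d->source & 0xFF) < 160) {
        d->active = true;                 /* copy one byte per cycle */
        d->oam[d->source & 0xFF] = d->memory[d->source];
        d->source += 1;
    } else {
        d->active = false;
    }
}
```

After the request, the first cycle is the delay (inactive). The next 160 cycles each copy one byte, and the transfer is inactive again on the cycle after that.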

<< state (0 1 2 3 4 5 6 7 8) >>=
struct {
    bool DelayStart;
    uint8_t PendingSource;

    /* Source for currently running DMA (0 if not active) */
    uint16_t Source;

    /* If OAM DMA is active on current cycle */
    bool Active;
} dma;
<< reset (0 1 2 3 4 5 6 7 8 9 10) >>=
gb->dma.PendingSource = 0;
gb->dma.DelayStart = false;
gb->dma.Source = 0;
gb->dma.Active = false;
<< mmu write special cases (0 1 2 3 4 5) >>=
case IO_OAMDMA: /* LCD OAM DMA transfer */
    {
        if(value <= 0xF1) {
            GBTRACE(gb, (&(struct gameboy_tp){ .point = GAMEBOY_TP_DMA_INIT, .u = { .dma = { .src = value << 8 } } }));
            gb->dma.PendingSource = value;
            gb->dma.DelayStart = true;
        } else {
            assert(false && "Invalid LCD OAM transfer range");
        }
    }
    break;

The DMA runs in parallel with CPU execution, transferring a single byte each machine cycle.

<< function declarations (0 1 2 3 4) >>=
static void dma_update(struct Gameboy*);
<< per machine cycle updates (0 1 2 3 4) >>=
dma_update(gb);
<< dma update >>=
static void dma_update(struct Gameboy* gb)
{
    if(gb->dma.PendingSource) {
        if(!gb->dma.DelayStart) {
            GBTRACE(gb, (&(struct gameboy_tp){ .point = GAMEBOY_TP_DMA_START, .u = { .dma = { .src = gb->dma.PendingSource << 8 } } }));
            gb->dma.Source = gb->dma.PendingSource << 8;
            gb->dma.PendingSource = 0;
        }
        gb->dma.DelayStart = false;
    }

    if(gb->dma.Source && (gb->dma.Source & 0xFF) < 160) {
        GBTRACE(gb, (&(struct gameboy_tp){ .point = GAMEBOY_TP_DMA, .u = { .dma = { .src = gb->dma.Source, } } }));
        gb->dma.Active = true;
        gb->mem.OAM[gb->dma.Source & 0xFF] = mmu_readDirect(gb, gb->dma.Source);
        gb->dma.Source += 1;
    }
    else {
        gb->dma.Active = false;
    }
}

Video emulation

The graphical output of the Gameboy is displayed on a 160x144 pixel LCD with two-bit colour. Pixel value 3 is the darkest and 0 is the lightest. All graphics are based on 8x8 pixel tiles, which are stored in VRAM.

Each 8x8 pixel tile is made up of 16 bytes (2 bits per pixel x 8 x 8), stored in memory as 2 bytes per line. Within those two bytes the first byte contains the least-significant bit of the colour and the second byte the most significant bit. The left-most (smallest x coordinate) pixel data is stored in the most-significant bit of the bytes.
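Decoding a whole tile row follows directly from this layout. The sketch below mirrors `video_linePixel` but returns all eight pixels at once. `decode_tile_row` is a hypothetical helper name.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: decode one 2-byte tile row into 8 colour indices
 * (0-3), left to right. lo holds bit 0 of each pixel, hi holds bit 1. */
static void decode_tile_row(uint8_t lo, uint8_t hi, uint8_t out[8])
{
    for (unsigned x = 0; x < 8; x++) {
        unsigned bit = 7 - x;   /* the leftmost pixel is in the MSB */
        out[x] = (uint8_t)(((lo >> bit) & 1u) | (((hi >> bit) & 1u) << 1));
    }
}
```

For example, the row bytes 0x3C, 0x7E decode to the pixels 0, 2, 3, 3, 3, 3, 2, 0.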

<< video helpers (0 1 2 3) >>=
static uint8_t video_linePixel(uint8_t const line[2], unsigned x)
{
    return (((line[0] << x) & 0x80) >> 7) | (((line[1] << x) & 0x80) >> 6);
}

The tiles are stored at addresses 0x8000 to 0x97FF, which is space for 384 tiles. This space is split into two overlapping banks of 256 tiles: 0x8000 to 0x8FFF (the low bank) and 0x8800 to 0x97FF (the high bank).
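Because the banks overlap, a tile in 0x8800 - 0x8FFF can be reached from either bank. The sketch below uses a hypothetical `tile_data_address` helper, which returns absolute CPU addresses rather than the VRAM-relative offsets used by `video_tileLineAddress`. It converts the signed index without relying on implementation-defined casts.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper: CPU address of the first byte of a tile */
static uint16_t tile_data_address(uint8_t index, bool low_bank)
{
    if (low_bank) {
        /* Low bank: unsigned index counted from 0x8000 */
        return (uint16_t)(0x8000 + index * 16);
    }
    /* High bank: signed index (-128..127) counted from 0x9000 */
    int signed_index = (index < 128) ? (int)index : (int)index - 256;
    return (uint16_t)(0x9000 + signed_index * 16);
}
```

Index 0x80 maps to 0x8800 in both banks, which is where they overlap.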

<< video helpers (0 1 2 3) >>=
static uint16_t video_tileLineAddress(uint8_t index, unsigned y, bool lowBank)
{
    /* These addresses are relative to VRAM base address (0x8000) */
    uint16_t addr;
    if(lowBank) {
        addr = index * 16;
    }
    else {
        addr = 0x1000 + ((int8_t)index * 16);
    }
    return addr + (y * 2);
}

There are two tile maps of 32x32 tiles (256x256 pixels) in VRAM, at addresses 0x9800 (low) and 0x9C00 (high). Each byte in a map is the index of an 8x8 tile in one of the two tile banks. When addressing the low tile bank, the index is an unsigned integer in the range 0 to 255, counted from 0x8000. When addressing the high tile bank, the index is a signed integer in the range -128 to 127, counted from 0x9000.
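Finding the map entry for a pixel in the 256x256 map is a divide by 8 in each axis, as in `video_mapPixel`. The following sketch uses a hypothetical `tile_map_address` helper that returns the absolute address of the map byte.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper: address of the map entry covering map pixel (x, y) */
static uint16_t tile_map_address(bool high_map, unsigned x, unsigned y)
{
    assert(x < 256 && y < 256);
    uint16_t base = high_map ? 0x9C00 : 0x9800;
    /* 32 entries per row of tiles, one byte per entry */
    return (uint16_t)(base + (y / 8) * 32 + (x / 8));
}
```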

<< video helpers (0 1 2 3) >>=
static uint8_t video_mapPixel(struct Gameboy* gb, bool hiMap, bool loTiles, unsigned int x, unsigned int y)
{
    uint8_t tileIndex = gb->mem.VideoRAM[(hiMap? 0x1C00 : 0x1800) + ((y / 8) * 32) + (x / 8)];
    uint16_t addr = video_tileLineAddress(tileIndex, (y % 8), loTiles);
    return video_linePixel(&gb->mem.VideoRAM[addr], x % 8);
}
<< state (0 1 2 3 4 5 6 7 8) >>=
struct {
    /* Clock cycles (4 per machine cycle) through current frame */
    unsigned int FrameProgress;

    /* Bits 0-1 = colour (0 = lightest, 3 = darkest)
     * Other bits currently all zero.
     */
    uint8_t Buffer[160][144];
    /* Set to true whenever a complete new frame is available */
    /* Reset this when you read the frame */
    bool NewFrame;

    struct GameboySprite {
        uint8_t x;
        uint8_t pixels[2];
        uint8_t attrs;
    } ScanlineSprites[10];
    unsigned int NumSprites;

    unsigned int CurX;
} lcd;
<< reset (0 1 2 3 4 5 6 7 8 9 10) >>=
gb->lcd.FrameProgress = 0;

Palettes (0xFF47 - 0xFF49)

There are three palettes, which map the two-bit input colour of a tile pixel to the two-bit output colour displayed on the LCD. Each palette is stored in a single byte: bits 0-1 are the output colour for input colour 0, bits 2-3 for input colour 1, bits 4-5 for input colour 2, and bits 6-7 for input colour 3.

<< video helpers (0 1 2 3) >>=
uint8_t video_paletteLookup(uint8_t pixel, uint8_t palette)
{
    assert(pixel <= 3);
    return (palette >> (pixel * 2)) & 0x03;
}

There is a single palette for the background tiles at address 0xFF47.

There are two object palettes (at 0xFF48, 0xFF49), which can be selected per-object based on a bit in the object's attributes.

Video control registers

LCD Control (0xFF40)

Bits (LSB=0) Function
0 Background & Window enable
1 Object enable
2 Object size (0=8x8, 1=8x16)
3 Background tile map select (0=low, 1=high)
4 Background & Window tile bank select (0=high, 1=low)
5 Window enable
6 Window tile map select (0=low, 1=high)
7 LCD Enable

The LCD enable must only be set to 0 during VBlank!
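The LCD Control bits can be unpacked into named fields. This sketch uses a hypothetical `struct lcdc_bits` and `lcdc_decode`. The field meanings follow the table, including the inverted sense of bit 4, where 1 selects the low tile bank.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical decoded view of the LCD Control register (0xFF40) */
struct lcdc_bits {
    bool bg_win_enable;   /* bit 0 */
    bool obj_enable;      /* bit 1 */
    unsigned obj_height;  /* bit 2: 8 or 16 */
    bool bg_high_map;     /* bit 3 */
    bool low_tile_bank;   /* bit 4: 1 = low bank (0x8000), 0 = high bank */
    bool win_enable;      /* bit 5 */
    bool win_high_map;    /* bit 6 */
    bool lcd_enable;      /* bit 7 */
};

static struct lcdc_bits lcdc_decode(uint8_t v)
{
    struct lcdc_bits b;
    b.bg_win_enable = v & 0x01;
    b.obj_enable    = v & 0x02;
    b.obj_height    = (v & 0x04) ? 16 : 8;
    b.bg_high_map   = v & 0x08;
    b.low_tile_bank = v & 0x10;
    b.win_enable    = v & 0x20;
    b.win_high_map  = v & 0x40;
    b.lcd_enable    = v & 0x80;
    return b;
}
```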

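LCD Status (0xFF41)

Bits (LSB=0) Function
0-1 Current LCD mode (0=HBlank, 1=VBlank, 2=reading OAM, 3=reading OAM & VRAM)
2 Coincidence flag, set when LCD Y == LCD Y Compare
3 Mode 0 (HBlank) interrupt enable
4 Mode 1 (VBlank) interrupt enable
5 Mode 2 (OAM) interrupt enable
6 LCD Y coincidence interrupt enable

Writes to this register do not change the mode bits (0-1). The enabled STAT conditions raise the LCDC interrupt.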

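LCD Y & Y Compare (0xFF44 & 0xFF45)

LCD Y holds the number of the scanline currently being drawn, from 0 to 153; scanlines 144 to 153 form the VBlank period. Writing to it resets it to zero. Whenever LCD Y equals LCD Y Compare, the coincidence bit (bit 2) of the STAT register is set, and if bit 6 of STAT is set an LCDC interrupt is requested.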
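The scanline (LCD Y) and the STAT mode both follow from the position within the frame. This sketch uses a hypothetical `lcd_position_at` helper with the same constants as `video_update`: 456 clocks per scanline, VBlank from scanline 144, and 92, 160 and 204 clocks for modes 2, 3 and 0. It also assumes that the frame position wraps after 154 scanlines.

```c
#include <assert.h>

/* Hypothetical result: LCD Y and STAT mode at a given clock in the frame */
struct lcd_position {
    unsigned ly;    /* value that would appear in LCD Y (0xFF44) */
    unsigned mode;  /* value of STAT bits 0-1 */
};

static struct lcd_position lcd_position_at(unsigned frame_clock)
{
    struct lcd_position p;
    frame_clock %= 456u * 154u;            /* assumed: one frame is 154 lines */
    p.ly = frame_clock / 456;
    unsigned dot = frame_clock % 456;
    if (p.ly >= 144) {
        p.mode = 1;                        /* VBlank */
    } else if (dot < 92) {
        p.mode = 2;                        /* reading OAM */
    } else if (dot < 92 + 160) {
        p.mode = 3;                        /* reading OAM & VRAM */
    } else {
        p.mode = 0;                        /* HBlank */
    }
    return p;
}
```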

Background

The background can be scrolled with pixel precision using the Scroll Y and Scroll X registers (0xFF42 & 0xFF43). These registers define the pixel coordinate in the 256x256 pixel tile-mapped background that is displayed at the top left of the screen; the background wraps around at its edges.

Window

Unlike the background, the window cannot be scrolled. The Window Y and Window X registers (0xFF4A & 0xFF4B) define the screen pixel coordinate at which the top left of the window tile map is drawn. Window X is offset by 7, so a value of 7 places the window at the left edge of the screen.
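Using the same test as `video_drawPixel`, a screen pixel falls inside the window when x + 7 >= WX and the scanline is at or below WY. The sketch below uses a hypothetical `window_pixel` helper that returns the corresponding coordinate in the window's tile map.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical helper: true (and the window-map coordinate) if screen
 * pixel (x, y) is covered by the window for the given WX/WY values */
static bool window_pixel(unsigned x, unsigned y, unsigned wx, unsigned wy,
                         unsigned* map_x, unsigned* map_y)
{
    if (wx >= 167 || wy >= 144 || y < wy || x + 7 < wx) {
        return false;   /* window off-screen, or pixel above/left of it */
    }
    *map_x = x + 7 - wx;   /* undo the WX offset of 7 */
    *map_y = y - wy;
    return true;
}
```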

Objects (Sprites)

Objects (sprites) are made up of one or two tiles (8x8 or 8x16 pixels, selected in the LCD Control register) positioned at exact pixel coordinates. Up to 40 objects can be specified in the Object Attribute Memory (OAM), at addresses 0xFE00 - 0xFE9F. Each object definition is 4 bytes, consisting of the Y coordinate, X coordinate, tile index, and attributes (1 byte each, in that order). The X coordinate is offset by 8 pixels (screen X = X - 8, i.e. an X coordinate of 8 places the first column of the object tile in the first screen column), and the Y coordinate is offset by 16 in the same way. This allows objects to be positioned partially on screen. The tile index always refers to a tile in the low tile bank.
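Reading one OAM entry is then a matter of undoing the offsets and splitting out the attribute bits, which are listed in the table that follows. This sketch uses a hypothetical `struct oam_entry` with signed screen coordinates, so that partially off-screen objects produce negative values.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical decoded OAM entry */
struct oam_entry {
    int screen_y;      /* OAM Y minus 16 */
    int screen_x;      /* OAM X minus 8 */
    uint8_t tile;      /* index into the low tile bank */
    unsigned palette;  /* attribute bit 4: object palette 0 or 1 */
    bool x_flip;       /* attribute bit 5 */
    bool y_flip;       /* attribute bit 6 */
    bool behind_bg;    /* attribute bit 7 */
};

/* oam is the 160-byte OAM; n is the object number 0-39 */
static struct oam_entry oam_parse(uint8_t const oam[160], unsigned n)
{
    assert(n < 40);
    uint8_t const* e = &oam[n * 4];
    struct oam_entry o;
    o.screen_y = (int)e[0] - 16;
    o.screen_x = (int)e[1] - 8;
    o.tile = e[2];
    o.palette = (e[3] >> 4) & 1u;
    o.x_flip = e[3] & 0x20;
    o.y_flip = e[3] & 0x40;
    o.behind_bg = e[3] & 0x80;
    return o;
}
```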

The object attribute byte is made up of the following fields:

Bits (0=LSB) Meaning
0-3 Unused (given meaning on Gameboy Colour)
4 Palette selection (0 = Object Palette 0, 1 = Object Palette 1)
5 X Flip - object image is flipped horizontally
6 Y Flip - object image is flipped vertically
7 Priority - (0 = object always on top of background & window, 1 = object only on top of colour 0 pixels of window & background)

When in double height (8x16) object mode (selected in the LCD control register) each object is made up of two adjacent tiles in the low tile bank. The least-significant bit of the tile index is ignored, the first (even numbered) tile makes up the top half of the object and the second (odd numbered) tile makes up the bottom half. Note that Y Flip applies to the whole 8x16 pixel object, not the two tiles individually.
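For 8x16 objects, the row within the object selects both the tile and the row inside that tile, and Y Flip is applied across all 16 rows before that split. The sketch below uses hypothetical names and follows the same calculation as `video_readSprites`.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper: which tile and tile row supply row `row` (0-15)
 * of an 8x16 object */
static void tall_object_row(uint8_t tile_index, unsigned row, bool y_flip,
                            uint8_t* tile, unsigned* tile_row)
{
    assert(row < 16);
    if (y_flip) {
        row = 15 - row;                 /* flip the whole 16-row object */
    }
    uint8_t top = tile_index & 0xFE;    /* least-significant bit ignored */
    *tile = (uint8_t)(top + row / 8);   /* even tile on top, odd below */
    *tile_row = row % 8;
}
```

With Y Flip, row 3 of object tile 0x43 comes from row 4 of tile 0x43, the bottom tile.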

Objects which overlap are drawn in priority order: first by smallest X coordinate, and then (for objects with the same X coordinate) by smallest address in OAM. At most 10 objects can be drawn on a single scanline; these are the 10 objects with the highest priority (i.e. the leftmost 10, with ties broken by lowest OAM address).

Objects with a Y coordinate of 0 or >= 160 will be ignored (as they are entirely off-screen), however objects with an X coordinate of 0 will not be visible but still count towards the object count of that scanline!

<< gpu functions (0 1 2) >>=
<< video helpers >>
static void video_readSprites(struct Gameboy* gb, int scanlineNum)
{
    /* sprites can be 8x8 or 8x16 */
    unsigned int spriteHeight = (gb->mem.IO[IO_LCDControl] & 0x04)? 16 : 8;

    /* Collect all the sprites on the current scanline */
    unsigned int numSprites = 0;
    struct GameboySprite* sprites = gb->lcd.ScanlineSprites;

    for(unsigned int i = 0; i < 160; i += 4) {
        /* Position of top-left corner of sprite, offset by (8,16)
         * i.e. top left corner of display is (8,16)
         */
        uint8_t ypos = gb->mem.OAM[i];
        uint8_t xpos = gb->mem.OAM[i + 1];
        if(ypos > 0 && ypos < 160 && xpos < 168) { /* on screen */
            if(scanlineNum + 16 >= ypos && scanlineNum + 16 < ypos + spriteHeight) { /* in scanline */
                /* Insert the sprite into the list, keeping the list in priority order */
                assert(numSprites <= 10);
                unsigned int insPos = numSprites;
                while(insPos > 0 && sprites[insPos - 1].x > xpos) {
                    if(insPos < 10) {
                        sprites[insPos] = sprites[insPos - 1];
                    }
                    insPos -= 1;
                }
                if(insPos < 10) {
                    uint8_t tile = gb->mem.OAM[i + 2];
                    uint8_t attr = gb->mem.OAM[i + 3];
                    if(spriteHeight == 16) {
                        tile &= 0xFE;
                    }

                    unsigned tileY = scanlineNum + 16 - ypos;
                    if(attr & 0x40) { /* Y Flip */
                        tileY = (spriteHeight - 1) - tileY;
                    }

                    uint16_t tileAddr = video_tileLineAddress(tile, tileY, true);

                    sprites[insPos].x = xpos;
                    sprites[insPos].pixels[0] = gb->mem.VideoRAM[tileAddr];
                    sprites[insPos].pixels[1] = gb->mem.VideoRAM[tileAddr + 1];
                    sprites[insPos].attrs = attr;

                    if(numSprites < 10) {
                        numSprites += 1;
                    }
                }
            }
        }
    }

    gb->lcd.NumSprites = numSprites;
}

Pixel rendering

<< gpu functions (0 1 2) >>=
static void video_drawPixel(struct Gameboy* gb, unsigned int scanlineNum, unsigned int x)
{
    uint8_t lcdc = gb->mem.IO[IO_LCDControl];
    bool hiMapBG = (lcdc & 0x08);
    bool hiMapWin = (lcdc & 0x40);
    bool bgEnable = (lcdc & 0x01);
    bool winEnable = (lcdc & 0x20);
    bool spriteEnable = (lcdc & 0x02);
    bool loTiles = (lcdc & 0x10);

    uint8_t wy = gb->mem.IO[IO_WindowY];
    uint8_t wx = gb->mem.IO[IO_WindowX];

    winEnable = winEnable && wx < 167 && wy < 144 && wy <= scanlineNum;
    spriteEnable = spriteEnable && gb->lcd.NumSprites > 0;

    if(winEnable || bgEnable || spriteEnable) {
        uint8_t scy = gb->mem.IO[IO_ScrollY];
        uint8_t scx = gb->mem.IO[IO_ScrollX];

        uint8_t bgPixel = 0;
        if(winEnable && x + 7 >= wx) {
            bgPixel = video_mapPixel(gb, hiMapWin, loTiles, x + 7 - wx, scanlineNum - wy);
        }
        else if(bgEnable) {
            bgPixel = video_mapPixel(gb, hiMapBG, loTiles, (x + scx) % 256, (scanlineNum + scy) % 256);
        }
        uint8_t finalColour = video_paletteLookup(bgPixel, gb->mem.IO[IO_BackgroundPalette]);

        if(spriteEnable) {
            uint8_t obp[2] = { gb->mem.IO[IO_ObjectPalette0], gb->mem.IO[IO_ObjectPalette1] };
            struct GameboySprite const* sprites = gb->lcd.ScanlineSprites;

            for(unsigned int n = 0; n < gb->lcd.NumSprites; n += 1) {
                if(x + 8 >= sprites[n].x && x + 8 < sprites[n].x + 8) {
                    unsigned int tileX = x + 8 - sprites[n].x;
                    bool const mirrored = (sprites[n].attrs & 0x20);
                    uint8_t pixel = video_linePixel(sprites[n].pixels, mirrored? (7 - tileX) : tileX);

                    if(pixel) {
                        bool hasPriority = (sprites[n].attrs & 0x80) == 0;
                        /* Low priority objects only cover raw background/window colour 0 (before palette mapping) */
                        if(bgPixel == 0 || hasPriority) {
                            uint8_t palette = obp[(sprites[n].attrs & 0x10)? 1 : 0];
                            finalColour = video_paletteLookup(pixel, palette);
                        }
                        /* Only draw first non-zero sprite pixel */
                        break;
                    }
                }
            }
        }

        gb->lcd.Buffer[x][scanlineNum] = finalColour;
    }
}

Video update

<< function declarations (0 1 2 3 4) >>=
void video_update(struct Gameboy* gb);
<< per machine cycle updates (0 1 2 3 4) >>=
/* Video runs at 1 pixel per clock (4 per machine cycle) */
video_update(gb);
video_update(gb);
video_update(gb);
video_update(gb);
<< gpu functions (0 1 2) >>=
void video_update(struct Gameboy* gb) {
    /* Each scanline takes 456 clock cycles to draw */
    uint8_t scanline = gb->lcd.FrameProgress / 456;
    assert(scanline <= 154);

    bool lcdOn = (gb->mem.IO[IO_LCDControl] & 0x80);
    gb->mem.IO[IO_LCDY] = scanline;

    uint8_t stat = gb->mem.IO[IO_LCDStat];

    /* Handle LCDY compare - a bit in the STAT register is set and optionally an
     * interrupt is fired when the LCDY == LCDYCompare */
    if(lcdOn) {
        if(scanline == gb->mem.IO[IO_LCDYCompare]) {
            if((stat & 0x04) == 0) {
                /* Set coincidence bit */
                gb->mem.IO[IO_LCDStat] |= 0x04;
                /* Fire interrupt if enabled */
                if(stat & 0x40) {
                    gb->mem.IO[IO_InterruptFlag] |= Interrupt_LCDC;
                }
            }
        }
        else {
            gb->mem.IO[IO_LCDStat] &= ~0x04;
        }
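    }
    /* Re-read STAT: the block above may have changed the coincidence bit, and
     * the mode updates below write back (stat & ~0x03) */
    stat = gb->mem.IO[IO_LCDStat];

    unsigned int lcdMode = (stat & 0x03);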

    /* The last 10 scanlines are the VBlank - nothing is actually drawn */
    if(scanline >= 144) {
        if(lcdMode != 1) {
            /* Entering VBlank - trigger interrupt */
            gb->mem.IO[IO_LCDStat] = (stat & ~0x03) | 1;
            gb->mem.IO[IO_InterruptFlag] |= Interrupt_VBlank;
            if((gb->mem.IO[IO_LCDControl] & 0x80) == 0) {
                memset(gb->lcd.Buffer, 0, sizeof(gb->lcd.Buffer));
            }
            gb->lcd.NewFrame = true;
        }
    }
    else {
        /* During each scanline the LCD mode cycles through 3 states:
         * 92clks - mode 2 (reading OAM)
         * 160clks - mode 3 (reading OAM & VRAM)
         * 204clks - mode 0 (HBlank)
         * = 456clks total
         */
        uint64_t scanlineProgress = gb->lcd.FrameProgress % 456;

        if(scanlineProgress < 92) {
            if(lcdMode != 2) { /* Entering mode 2 */
                gb->mem.IO[IO_LCDStat] = (stat & ~0x03) | 2;
                if(stat & 0x20) {
                    gb->mem.IO[IO_InterruptFlag] |= Interrupt_LCDC;
                }
                video_readSprites(gb, scanline);
                gb->lcd.CurX = 0;
            }
        }
        else if(scanlineProgress < (160 + 92)) {
            gb->mem.IO[IO_LCDStat] = (stat & ~0x03) | 3;
            if(lcdOn)
            {
                for(; gb->lcd.CurX < (scanlineProgress - 92); gb->lcd.CurX += 1) {
                    video_drawPixel(gb, scanline, gb->lcd.CurX);
                }
            }
        }
        else {
            if(lcdMode != 0) { /* Entering mode 0 */
                if(lcdOn) {
                    for(; gb->lcd.CurX < 160; gb->lcd.CurX += 1) {
                        video_drawPixel(gb, scanline, gb->lcd.CurX);
                    }
                }
                gb->mem.IO[IO_LCDStat] = (stat & ~0x03);
                if(stat & 0x08) {
                    gb->mem.IO[IO_InterruptFlag] |= Interrupt_LCDC;
                }
            }
        }
    }

    gb->lcd.FrameProgress = (gb->lcd.FrameProgress + 1) % 70224;
}
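The mode sequence in video_update depends only on FrameProgress, so it can be checked in isolation. The following is a minimal sketch, not part of the emulator, that reproduces the same fixed split (92 + 160 + 204 clocks per visible line, VBlank from scanline 144) as a pure function; lcd_mode_at and the constant names are hypothetical.

```c
#include <stdint.h>

/* Timing constants used by video_update (all in clocks). */
enum {
    CLKS_PER_LINE = 456,
    LINES_PER_FRAME = 154,
    CLKS_PER_FRAME = CLKS_PER_LINE * LINES_PER_FRAME, /* 70224 */
    VISIBLE_LINES = 144,
    MODE2_CLKS = 92,
    MODE3_CLKS = 160
};

/* Hypothetical helper: the STAT mode (0-3) that video_update settles on
 * for a given FrameProgress value. */
static unsigned int lcd_mode_at(uint32_t frameProgress)
{
    uint32_t const scanline = frameProgress / CLKS_PER_LINE;
    uint32_t const lineProgress = frameProgress % CLKS_PER_LINE;

    if(scanline >= VISIBLE_LINES) {
        return 1; /* VBlank: scanlines 144-153 */
    }
    if(lineProgress < MODE2_CLKS) {
        return 2; /* Reading OAM */
    }
    if(lineProgress < MODE2_CLKS + MODE3_CLKS) {
        return 3; /* Reading OAM & VRAM, pixels drawn */
    }
    return 0; /* HBlank */
}
```

For instance, lcd_mode_at(92) is 3, lcd_mode_at(252) is 0, and lcd_mode_at(144 * 456) is 1.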

Input

Each of the eight buttons on the Gameboy is assigned one bit:

<< buttons enum >>=
enum {
    Button_Down = 0x80,
    Button_Up = 0x40,
    Button_Left = 0x20,
    Button_Right = 0x10,

    Button_Start = 0x08,
    Button_Select = 0x04,
    Button_B = 0x02,
    Button_A = 0x01
};

A bitfield in the Gameboy structure records which buttons are currently pressed, using the bit values above.

<< state (0 1 2 3 4 5 6 7 8) >>=
struct {
    uint8_t Pressed;
} buttons;

This bitfield is updated by two functions:

<< function declarations (0 1 2 3 4) >>=
void input_setUp(struct Gameboy*, int button);
void input_setDown(struct Gameboy*, int button);

Marking a button as pressed raises the joypad interrupt, but only when the button was previously released.

<< input set down >>=
void input_setDown(struct Gameboy* gb, int button)
{
    if((gb->buttons.Pressed & button) == 0) {
        gb->buttons.Pressed |= button;
        gb->mem.IO[IO_InterruptFlag] |= Interrupt_Joypad;
    }
}

Releasing a button does not raise an interrupt.

<< input set up >>=
void input_setUp(struct Gameboy* gb, int button)
{
    gb->buttons.Pressed &= ~button;
}
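The edge-triggered behaviour of input_setDown (an interrupt only when a released button becomes pressed) can be illustrated with a stripped-down model. This sketch uses a hypothetical JoypadState struct in place of struct Gameboy, and assumes the joypad interrupt is bit 4 (0x10) of the interrupt flag register, as on real hardware; the emulator's own Interrupt_Joypad value is defined elsewhere.

```c
#include <stdint.h>

/* Button bits copied from the buttons enum; Irq_Joypad is an assumed value
 * standing in for Interrupt_Joypad. */
enum { Btn_A = 0x01, Btn_Down = 0x80, Irq_Joypad = 0x10 };

/* Hypothetical stand-in for the two fields input_setDown/input_setUp touch. */
struct JoypadState {
    uint8_t pressed;       /* models gb->buttons.Pressed */
    uint8_t interruptFlag; /* models gb->mem.IO[IO_InterruptFlag] */
};

static void press(struct JoypadState* s, int button)
{
    /* Only a released -> pressed transition requests an interrupt */
    if((s->pressed & button) == 0) {
        s->pressed |= button;
        s->interruptFlag |= Irq_Joypad;
    }
}

static void release(struct JoypadState* s, int button)
{
    /* Releasing never touches the interrupt flag */
    s->pressed &= (uint8_t)~button;
}
```

Pressing A twice without releasing it requests the interrupt only once, and releasing A leaves the interrupt flag untouched.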

The joypad register at address 0xFF00 is used by the game to query which buttons are pressed.

<< input update function >>=
void input_update(struct Gameboy* gb)
{
    << input update >>
}

The button bits provided to the software on the Gameboy are active-low: a pressed button reads as 0.

<< input update (0 1 2) >>=
uint8_t invButtons = ~gb->buttons.Pressed;

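The software running on the Gameboy can only read four of the eight buttons at once. Bits 4 and 5 of the joypad register select which group is returned in the lower four bits. On the hardware this selection is active-low: a game writes 0x20 (bit 4 clear) to read the directional buttons, or 0x10 (bit 5 clear) to read the others (A, B, Select, Start). The emulator tests which bit is set, so bit 5 set selects the directional buttons and bit 4 set selects the others.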

<< input update (0 1 2) >>=
uint8_t joyReg = gb->mem.IO[IO_Joypad];
if((joyReg & 0x20) != 0) { /* Directional keys */
    gb->mem.IO[IO_Joypad] = ((joyReg & 0xF0) | ((invButtons >> 4u) & 0x0F));
}
else if((joyReg & 0x10) != 0) { /* Buttons */
    gb->mem.IO[IO_Joypad] = ((joyReg & 0xF0) | (invButtons & 0x0F));
}

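Additionally, if the joypad register holds exactly 3 (neither bit 4 nor bit 5 set), it is treated as a model check and reads back 0xFF; an upper nibble of 0xF identifies the hardware as a classic Gameboy.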

<< input update (0 1 2) >>=
else if(joyReg == 3) { /* Model check - 0xFX == classic gameboy */
    gb->mem.IO[IO_Joypad] = 0xFF;
}
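Taken together, the three input update chunks compute the byte the game reads back from the joypad register. As a sketch, here is the same logic rewritten as a pure function of the last written register value and the Pressed bitfield; joypad_read is a hypothetical name, and the final fall-through reflects that input_update leaves the register unchanged when no branch matches.

```c
#include <stdint.h>

/* Hypothetical pure version of input_update: joyReg is the value last
 * written to the joypad register, pressed is gb->buttons.Pressed. */
static uint8_t joypad_read(uint8_t joyReg, uint8_t pressed)
{
    uint8_t const invButtons = (uint8_t)~pressed; /* active-low */

    if((joyReg & 0x20) != 0) {
        /* Directional keys live in the upper nibble of Pressed */
        return (uint8_t)((joyReg & 0xF0) | ((invButtons >> 4u) & 0x0F));
    }
    if((joyReg & 0x10) != 0) {
        /* A, B, Select, Start live in the lower nibble */
        return (uint8_t)((joyReg & 0xF0) | (invButtons & 0x0F));
    }
    if(joyReg == 3) {
        return 0xFF; /* Model check: 0xFX == classic Gameboy */
    }
    return joyReg; /* No branch matched: register left unchanged */
}
```

With Down and A held (pressed = 0x81), selecting the directional keys with 0x20 reads back 0x27 (bit 3, Down, is 0), and selecting the other buttons with 0x10 reads back 0x1E (bit 0, A, is 0).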
<< input functions >>=
<< input set up >>
<< input set down >>
<< input update function >>
<< input interface function >>

An external function is provided for users of the library to easily provide user input to the Gameboy. It just calls the internal input_setUp and input_setDown functions.

<< public function declarations (0 1 2 3 4) >>=
void gameboy_setButtonState(struct Gameboy*, int button, bool down);
<< input interface function >>=
void gameboy_setButtonState(struct Gameboy* gb, int button, bool down)
{
    if(down) {
        input_setDown(gb, button);
    }
    else {
        input_setUp(gb, button);
    }
}
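A frontend linking against the library typically translates its own input events into these button masks before calling gameboy_setButtonState. The key layout below is purely hypothetical; only the mask values come from the buttons enum.

```c
/* Button masks copied from the buttons enum. */
enum {
    Pad_Down = 0x80, Pad_Up = 0x40, Pad_Left = 0x20, Pad_Right = 0x10,
    Pad_Start = 0x08, Pad_Select = 0x04, Pad_B = 0x02, Pad_A = 0x01
};

/* Hypothetical keyboard layout for a frontend; returns 0 for unmapped keys. */
static int key_to_button(char key)
{
    switch(key) {
    case 'w':  return Pad_Up;
    case 's':  return Pad_Down;
    case 'a':  return Pad_Left;
    case 'd':  return Pad_Right;
    case 'j':  return Pad_A;
    case 'k':  return Pad_B;
    case '\n': return Pad_Start;
    case ' ':  return Pad_Select;
    default:   return 0;
    }
}
```

A frontend would then call gameboy_setButtonState(gb, button, down) for each key event where key_to_button returns a non-zero mask. Skipping unmapped keys matters: passing 0 to input_setDown would pass the `(Pressed & button) == 0` test and request a joypad interrupt without any button changing state.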

IO Register Index

CPU instruction index

gameboy.c

<< gameboy.c >>=
/* Copyright (C) 2014-2018 Thomas Spurden <thomas@spurden.name>
 *
 * This program is free software: you can redistribute it and/or modify
 * it under the terms of the GNU General Public License as published by
 * the Free Software Foundation, either version 3 of the License, or
 * (at your option) any later version.
 *
 * This program is distributed in the hope that it will be useful,
 * but WITHOUT ANY WARRANTY; without even the implied warranty of
 * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
 * GNU General Public License for more details.
 *
 * You should have received a copy of the GNU General Public License
 * along with this program.  If not, see <https://www.gnu.org/licenses/>.
 */
#include "gameboy.h"

#include <assert.h>
#include <stdlib.h>
#include <signal.h>
#include <string.h>

#ifndef NDEBUG
#define GBTRACE(gb, tp) do { if((gb)->trace_fn) { (gb)->trace_fn((gb), (tp)); } } while(0)
#else
#define GBTRACE(gb, tp) do { } while(0)
#endif

<< io register addresses >>
<< io registers unused bits >>
<< function declarations >>

<< clock functions >>
<< cpu functions >>
<< memory functions >>
<< dma update >>
<< gpu functions >>
<< input functions >>
<< gameboy functions >>
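In debug builds every GBTRACE call site invokes trace_fn, if one is set, with a struct gameboy_tp describing the event. A user program could install a callback like the one sketched here, which counts writes to the joypad register at 0xFF00, assuming writes to IO registers are reported as GAMEBOY_TP_MEM_WRITE. The callback and its counter are hypothetical; the counter is a file-scope variable in the user's program because the callback signature carries no user-data pointer, and the return value is ignored by GBTRACE. The tracepoint type is copied from gameboy.h so the sketch compiles on its own.

```c
#include <stddef.h>
#include <stdint.h>

struct Gameboy; /* Opaque here; the callback never dereferences it */

/* Copy of the debug tracepoint type from gameboy.h */
struct gameboy_tp {
    enum {
        GAMEBOY_TP_MEM_READ,
        GAMEBOY_TP_MEM_WRITE,
        GAMEBOY_TP_INSTR_START,
        GAMEBOY_TP_DMA_INIT,
        GAMEBOY_TP_DMA_START,
        GAMEBOY_TP_DMA,
    } point;
    union {
        struct { uint16_t addr; } mem_read;
        struct { uint16_t addr; uint8_t data; } mem_write;
        struct { uint8_t opcode; } instr_start;
        struct { uint16_t src; } dma;
    } u;
};

/* Hypothetical counter owned by the user program, not the library */
static unsigned int joypadWrites;

/* Hypothetical tracer: counts writes to the joypad register (0xFF00) */
static uint32_t count_joypad_writes(struct Gameboy* gb, struct gameboy_tp* tp)
{
    (void)gb;
    if(tp->point == GAMEBOY_TP_MEM_WRITE && tp->u.mem_write.addr == 0xFF00) {
        joypadWrites += 1;
    }
    return 0; /* GBTRACE ignores the return value */
}
```

It would be installed with `gb->trace_fn = count_joypad_writes;`, which is only possible when NDEBUG is not defined, since trace_fn only exists in debug builds.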

gameboy.h

<< gameboy.h >>=
#pragma once
/* Copyright (C) 2014-2018 Thomas Spurden <thomas@spurden.name>
 *
 * This program is free software: you can redistribute it and/or modify
 * it under the terms of the GNU General Public License as published by
 * the Free Software Foundation, either version 3 of the License, or
 * (at your option) any later version.
 *
 * This program is distributed in the hope that it will be useful,
 * but WITHOUT ANY WARRANTY; without even the implied warranty of
 * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
 * GNU General Public License for more details.
 *
 * You should have received a copy of the GNU General Public License
 * along with this program.  If not, see <https://www.gnu.org/licenses/>.
 */

#include <stdbool.h>
#include <stdint.h>
#include <time.h>

#ifdef __cplusplus
extern "C" {
#endif

#ifndef NDEBUG
struct gameboy_tp {
    enum {
        GAMEBOY_TP_MEM_READ,
        GAMEBOY_TP_MEM_WRITE,
        GAMEBOY_TP_INSTR_START,
        GAMEBOY_TP_DMA_INIT,
        GAMEBOY_TP_DMA_START,
        GAMEBOY_TP_DMA,
    } point;
    union {
        struct {
            uint16_t addr;
        } mem_read;
        struct {
            uint16_t addr;
            uint8_t data;
        } mem_write;
        struct {
            uint8_t opcode;
        } instr_start;
        struct {
            uint16_t src;
        } dma;
    } u;
};
#endif

<< interrupt bits enum >>
<< mmu enum >>
<< buttons enum >>

<< gameboy state >>

<< public function declarations >>

#ifdef __cplusplus
}
#endif