zxian - a ZX Spectrum emulator (for Windows, written in C)
The computer above helped me learn programming, and I played many video games on it throughout my childhood. It is a Romanian ZX Spectrum clone called Cobra, built by my uncle. It features a keyboard superior to those of both the Spectrum and the Spectrum+. A small Romanian-made TV ("Sport" model) served as the monitor. Programs were loaded from cassette tape through a Russian-made tape player.

I decided to emulate it and thus zxian was born.

zxian is a ZX Spectrum emulator written in C, using SDL2 for graphics, input/output, and audio. As with my other projects, the source is available for download. I built it with Visual Studio Community 2019 (with C++ core features installed).

I hadn't coded in C in a while and it was very enjoyable. After several years of projects mostly in assembly language, C feels like a smarter assembly language, with really useful macros. I considered C++, but found no benefit from object orientation for this project, so I wrote it in pure C.

In version 18, I developed zxianui, a UI-based zxian starter that is friendlier than zxian's command-line interface.

Version summary (64bit)

  • v20 - first 64bit version

Version summary (32bit)

  • v19 - optimizations: reduced host CPU usage by 75%-90% compared to the previous version
  • v18 - added zxianui, a friendlier, UI-based zxian starter
  • v17 - added support for taking screenshots
  • v16 - added support for CRT scanlines effect. Fixed a slowdown issue when using accelerated renderer. Removed an overly eager optimization which impacted CPU-screen sync - this fixes games where graphics are updated multiple times per frame
  • v15 - added fullscreen support
  • v14 - sound improvements via variable sync. This fixes sustained tones (such as the BEEP command in BASIC)
  • v13 - optimizations: reduced host CPU usage by 85%
  • v12 - CPU microcode fix: R register behaviour; this unfreezes some games which rely on R for timing, like Defender of the Crown. CPU microcode fix: DD/FD-prefixed opcodes now fall through to their unprefixed counterparts
  • v11 - added support for saving and loading state; added a UI which allows memory modification (pokes)
  • v10 - fixed an interrupt bug which allowed reentrancy; this fixes games such as Zynaps
  • v9 - fixed an overflow bug which deteriorated sound after 20 minutes
  • v8 - tape UI improvements: current block size and progress; sound improvements: configurability and parameter tweaks
  • v7 - added support for frame skipping. Improved sound quality and configurability
  • v6 - significantly improved audio quality. Added support for TAP tape images
  • v5 - video frame duration can now be specified in milliseconds. Rewrote the "read key status" code to fix a bug, which fixes games such as Manic Miner
  • v4 - support for "floating bus", whereby data read by ULA can "leak" into hardware ports that are not wired, such as 0xFF. Some games rely on this for timing, instead of an interrupt handler. This fixes games such as Cobra and Arkanoid
  • v3 - improved game compatibility by supplying a well-known value for the LSB during IM2 handler lookup; previous behaviour can be attained through a switch. This fixes games such as Dizzy 7
  • v2 - fixed an SNA loading bug caused by incorrect IFF2 initialization; it was causing some games to soft reset, and some to have corrupted graphics
  • v1 - initial release

Downloads (current version)

zxian v20 emulator (for Windows) - unzip anywhere and run zxian.exe. Read the provided text files for more information.
zxian v20 source - load the solution file in Visual Studio to build zxian yourself.


Downloads (older versions)

zxian v19 emulator (for Windows)
zxian v19 source

zxian v18 emulator (for Windows)
zxian v18 source

zxian v17 emulator (for Windows)
zxian v17 source

zxian v16 emulator (for Windows)
zxian v16 source

zxian v15 emulator (for Windows)
zxian v15 source

zxian v14 emulator (for Windows)
zxian v14 source

zxian v13 emulator (for Windows)
zxian v13 source

zxian v12 emulator (for Windows)
zxian v12 source

zxian v11 emulator (for Windows)
zxian v11 source

zxian v10 emulator (for Windows)
zxian v10 source

zxian v9 emulator (for Windows)
zxian v9 source

zxian v8 emulator (for Windows)
zxian v8 source

zxian v7 emulator (for Windows)
zxian v7 source

zxian v6 emulator (for Windows)
zxian v6 source

zxian v5 emulator (for Windows)
zxian v5 source

zxian v4 emulator (for Windows)
zxian v4 source

zxian v3 emulator (for Windows)
zxian v3 source

zxian v2 emulator (for Windows)
zxian v2 source

zxian v1 emulator (for Windows)
zxian v1 source


zxian supports the popular Kempston joystick, which is mapped to the arrow keys, with the left control key acting as fire.
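For reference, the standard Kempston interface is read from port 0x1F (31) and returns an active-high bitmask: bit 0 right, bit 1 left, bit 2 down, bit 3 up, bit 4 fire. The sketch below shows how host key states could be turned into that byte. `JoystickKeys` and `kempston_read` are hypothetical names, not zxian's code; in an SDL2 program the booleans would be set from keyboard events for the arrow keys and left Ctrl.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical host-side joystick state (arrow keys + left Ctrl). */
typedef struct {
    bool right, left, down, up, fire;
} JoystickKeys;

/* Value returned by an IN from the Kempston port (0x1F).
   Bits are active high; bits 5-7 are left at 0. */
static uint8_t kempston_read(const JoystickKeys *k)
{
    uint8_t value = 0;
    if (k->right) value |= 0x01;
    if (k->left)  value |= 0x02;
    if (k->down)  value |= 0x04;
    if (k->up)    value |= 0x08;
    if (k->fire)  value |= 0x10;
    return value;
}
```

For example, holding the up arrow and left Ctrl together reads back as 0x18.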

As of the current version, only SNA snapshots and TAP tape images are supported. zxian is completely command-line driven. If started with no arguments, it will simply boot into Sinclair BASIC (ZX Spectrum's 48k ROM).

Other utilities include support for saving and loading state, and support for modifying memory (pokes).



Development

Development began with a 50Hz timer-invoked routine (the Spectrum was made in Britain, where TV sets refresh at 50Hz), which read memory and drew pixels following the Spectrum's questionable video memory layout. This was followed by the code for reading Z80 instruction opcodes, with support for all of the Z80's opcode prefixes.
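The "questionable" layout is easiest to see in code. This is a sketch of the well-known 48K address arithmetic (the function names are hypothetical, not taken from zxian's source): the bitmap occupies 0x4000-0x57FF with the bits of the pixel row interleaved, and the 32x24 attribute cells follow at 0x5800-0x5AFF.

```c
#include <stdint.h>

/* Address of the bitmap byte holding pixel (x, y), x in 0..255, y in 0..191.
   The y coordinate is split into three fields:
     bits 7-6: which third of the screen (2 KB each)
     bits 5-3: character row within that third
     bits 2-0: pixel row within the character cell */
static uint16_t pixel_address(uint8_t x, uint8_t y)
{
    return (uint16_t)(0x4000
        | ((y & 0xC0) << 5)   /* third              -> address bits 12-11 */
        | ((y & 0x07) << 8)   /* pixel row in cell  -> address bits 10-8  */
        | ((y & 0x38) << 2)   /* character row      -> address bits 7-5   */
        | (x >> 3));          /* byte column        -> address bits 4-0   */
}

/* Bit within that byte for pixel x (bit 7 is the leftmost pixel). */
static uint8_t pixel_mask(uint8_t x)
{
    return (uint8_t)(0x80 >> (x & 7));
}

/* Attribute byte for the 8x8 cell containing pixel (x, y). */
static uint16_t attribute_address(uint8_t x, uint8_t y)
{
    return (uint16_t)(0x5800 + (y >> 3) * 32 + (x >> 3));
}
```

Moving one pixel down within a character cell adds 0x100 to the address, while moving down to the next character row adds only 0x20, which is why a naive top-to-bottom memory dump looks scrambled.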

Then came several weeks of microcode development, during which each Z80 instruction was implemented and tested. The Z80 CPU manual was a good resource for finding details on how each instruction behaved, which flags it affected, etc.
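As an illustration of prefix handling, here is a sketch of a decoder that classifies an instruction by its prefixes and counts its M1 (opcode fetch) cycles, each of which increments the low 7 bits of the R register. All names are hypothetical, and this is one possible structure, not zxian's microcode. Chained DD/FD prefixes are resolved by letting the last one win, and an ED following DD/FD cancels the index prefix.

```c
#include <stdint.h>

typedef enum { GRP_MAIN, GRP_CB, GRP_ED, GRP_INDEX, GRP_INDEX_CB } OpGroup;

typedef struct {
    OpGroup group;
    uint8_t index_prefix;  /* 0xDD (IX), 0xFD (IY), or 0 for none */
    int8_t displacement;   /* only set for GRP_INDEX_CB (DD CB d op) */
    uint8_t opcode;
    int length;            /* bytes up to and including the opcode */
    int m1_cycles;         /* opcode fetches; each one bumps R */
} Decoded;

/* mem must be 64 KB so that 16-bit addresses wrap around correctly. */
static Decoded decode_prefixes(const uint8_t mem[65536], uint16_t pc)
{
    Decoded d = { GRP_MAIN, 0, 0, 0, 0, 0 };
    uint8_t b = mem[pc];

    /* In a run of DD/FD prefixes, each one is fetched like an opcode
       and only the last one takes effect. */
    while (b == 0xDD || b == 0xFD) {
        d.index_prefix = b;
        d.m1_cycles++;
        d.length++;
        b = mem[(uint16_t)(pc + d.length)];
    }

    if (b == 0xED) {                      /* ED cancels a pending DD/FD */
        d.index_prefix = 0;
        d.group = GRP_ED;
        d.opcode = mem[(uint16_t)(pc + d.length + 1)];
        d.m1_cycles += 2;                 /* ED and the opcode */
        d.length += 2;
    } else if (b == 0xCB && d.index_prefix) {
        /* DD CB d op: the displacement precedes the opcode, and neither
           of those two bytes is an M1 fetch. */
        d.group = GRP_INDEX_CB;
        d.displacement = (int8_t)mem[(uint16_t)(pc + d.length + 1)];
        d.opcode = mem[(uint16_t)(pc + d.length + 2)];
        d.m1_cycles += 1;                 /* CB only */
        d.length += 3;
    } else if (b == 0xCB) {
        d.group = GRP_CB;
        d.opcode = mem[(uint16_t)(pc + d.length + 1)];
        d.m1_cycles += 2;                 /* CB and the opcode */
        d.length += 2;
    } else {
        /* Unprefixed, or DD/FD + opcode. DD/FD opcodes that have no index
           form fall through to the unprefixed behaviour at execute time. */
        d.group = d.index_prefix ? GRP_INDEX : GRP_MAIN;
        d.opcode = b;
        d.m1_cycles += 1;
        d.length += 1;
    }
    return d;
}

/* R: the low 7 bits count M1 fetches; bit 7 only changes via LD R,A. */
static uint8_t bump_r(uint8_t r, int m1_cycles)
{
    return (uint8_t)((r & 0x80) | ((r + m1_cycles) & 0x7F));
}
```

For DD CB 05 46 (BIT 0,(IX+5)) this yields GRP_INDEX_CB with displacement 5, opcode 0x46 and two M1 cycles, so R advances by 2.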

There is a large number of undocumented Z80 instructions, which I also implemented.
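Flags are where much of the microcode detail lives, including two undocumented bits. Below is a sketch for the 8-bit ADD A,n instruction; the helper name `add8` is hypothetical. Bits 3 and 5 of F are not described in the official manual, but on real hardware ADD copies them from the result, and test programs such as zexall check them.

```c
#include <stdint.h>

/* Z80 flag bits in register F */
enum {
    FLAG_C  = 0x01,  /* carry */
    FLAG_N  = 0x02,  /* add/subtract */
    FLAG_PV = 0x04,  /* parity/overflow */
    FLAG_3  = 0x08,  /* undocumented: copy of result bit 3 */
    FLAG_H  = 0x10,  /* half carry */
    FLAG_5  = 0x20,  /* undocumented: copy of result bit 5 */
    FLAG_Z  = 0x40,  /* zero */
    FLAG_S  = 0x80   /* sign */
};

/* ADD A,value: returns the new A and stores the new F in *f. */
static uint8_t add8(uint8_t a, uint8_t value, uint8_t *f)
{
    unsigned sum = (unsigned)a + value;
    uint8_t result = (uint8_t)sum;
    uint8_t flags = result & (FLAG_S | FLAG_5 | FLAG_3);  /* copied from result */

    if (result == 0)                        flags |= FLAG_Z;
    if ((a ^ value ^ result) & 0x10)        flags |= FLAG_H;   /* carry out of bit 3 */
    if (~(a ^ value) & (a ^ result) & 0x80) flags |= FLAG_PV;  /* signed overflow */
    if (sum > 0xFF)                         flags |= FLAG_C;
    /* N stays clear: this is an addition. */
    *f = flags;
    return result;
}
```

For example, adding 0x01 to 0x7F returns 0x80 and sets S, H and P/V (F = 0x94), since adding two positive numbers produced a negative one.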

This was followed by support for interrupts and ZX Spectrum-specific areas such as hardware ports.
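A minimal sketch of interrupt acceptance, with hypothetical names and a deliberately reduced CPU struct. It shows two details that the changelog touches on: clearing IFF1/IFF2 when an interrupt is accepted (which prevents the reentrancy fixed in v10), and forming the IM 2 vector address from the I register plus a byte from the data bus (the "well-known value for the LSB" from v3, typically 0xFF on a 48K). The one-instruction delay after EI is not modelled.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, minimal CPU state for this sketch. */
typedef struct {
    uint16_t pc, sp;
    uint8_t i;               /* interrupt vector base register */
    bool iff1, iff2;
    int im;                  /* interrupt mode: 0, 1 or 2 */
    bool halted;             /* PC stays on the HALT opcode while halted */
    uint8_t mem[65536];
} Cpu;

static void push16(Cpu *c, uint16_t value)
{
    c->sp = (uint16_t)(c->sp - 1);
    c->mem[c->sp] = (uint8_t)(value >> 8);
    c->sp = (uint16_t)(c->sp - 1);
    c->mem[c->sp] = (uint8_t)value;
}

static uint16_t read16(const Cpu *c, uint16_t addr)
{
    return (uint16_t)(c->mem[addr] | (c->mem[(uint16_t)(addr + 1)] << 8));
}

/* Called when the ULA raises /INT at the start of a frame. bus_byte is the
   value on the data bus during the acknowledge cycle.
   Returns true if the interrupt was taken. */
static bool accept_interrupt(Cpu *c, uint8_t bus_byte)
{
    if (!c->iff1)
        return false;                 /* interrupts disabled (DI) */

    /* Clearing both flip-flops blocks re-entry until the handler runs EI. */
    c->iff1 = false;
    c->iff2 = false;

    if (c->halted) {                  /* resume after the HALT instruction */
        c->halted = false;
        c->pc = (uint16_t)(c->pc + 1);
    }
    push16(c, c->pc);

    if (c->im == 2)
        c->pc = read16(c, (uint16_t)((c->i << 8) | bus_byte));
    else
        c->pc = 0x0038;               /* IM 1, and IM 0 assuming 0xFF (RST 38h) on the bus */
    return true;
}
```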


Seeing the image above was a great milestone.

I think that one difficulty with emulator development is that you have access to low-level tests (you can manually test each instruction individually) and to high-level tests (the Sinclair ROM, or a game) - but not much in between.

This means that you keep testing at a very low level, as you progress, but can only hope that when everything has been written, the ROM (or game) boots and works.

Sound development

I enjoyed developing the sound capability of zxian, because I hadn't done anything like that before. While the SDL2 library abstracts the audio hardware of the host computer, it still requires a constant stream of data (audio samples) to function. The difficulty is that these samples have to be provided in real time.

The challenge I faced was that the emulated Z80 CPU finishes a video frame's worth (20ms) of computation in much less than 20ms of real time. Additionally, how much real time the Z80 actually needs varies from host computer to host computer, and is therefore unknown and unreliable.

However, the number of CPU clock cycles (or tstates) that the Z80 is allowed to perform during each 20ms interval is constant, irrespective of the host machine. That same number of clock cycles might be performed in 9ms on one host computer, and in 5ms on a much faster one.
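For a concrete sense of scale, here is the per-frame arithmetic, using the 48K timing of 224 tstates per scanline and 312 scanlines per frame at 3.5MHz, and assuming a 44,100Hz host output rate (the sample rate is an assumption, not taken from zxian):

```latex
\begin{aligned}
T_{\text{frame}} &= 312 \times 224 = 69\,888 \text{ tstates}, \qquad
  \frac{69\,888}{3.5 \times 10^{6}\ \text{Hz}} \approx 19.97\ \text{ms} \\
N_{\text{frame}} &= 44\,100\ \text{Hz} \times 0.02\ \text{s} = 882 \text{ samples} \\
\Delta &= \frac{T_{\text{frame}}}{N_{\text{frame}}} = \frac{69\,888}{882} \approx 79.24 \text{ tstates per sample}
\end{aligned}
```

Whether the host needs 9ms or 5ms to execute those 69,888 tstates, the frame still yields the same 882 samples.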


My solution was to sample the state of the speaker at fixed clock-cycle intervals during the Z80's active time and write the samples to a buffer, so that 20ms worth of Z80 CPU time yielded 20ms worth of real-time audio data.
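A minimal sketch of sampling at fixed tstate intervals, assuming 69,888 tstates per 48K frame and a 44,100Hz output rate (882 samples per 20ms). Since 69,888 / 882 is not an integer, the sketch uses an integer accumulator in the style of Bresenham's line algorithm so that exactly 882 samples come out of each full frame. `SpeakerSampler` and its functions are hypothetical names.

```c
#include <stdint.h>

#define TSTATES_PER_FRAME 69888   /* 48K: 312 scanlines x 224 tstates */
#define SAMPLES_PER_FRAME 882     /* assumed: 44,100 Hz x 20 ms */

typedef struct {
    int16_t buffer[SAMPLES_PER_FRAME];
    int count;          /* samples written so far this frame */
    long accumulator;   /* elapsed tstates, scaled by SAMPLES_PER_FRAME */
} SpeakerSampler;

/* Call after every emulated instruction with the tstates it took and the
   current speaker level (bit 4 of the last OUT to port 0xFE). A sample is
   emitted every TSTATES_PER_FRAME / SAMPLES_PER_FRAME (about 79.24)
   tstates; the accumulator keeps the fractional part, so no rounding
   error builds up over a frame. */
static void sampler_advance(SpeakerSampler *s, int tstates, int speaker_on)
{
    s->accumulator += (long)tstates * SAMPLES_PER_FRAME;
    while (s->accumulator >= TSTATES_PER_FRAME) {
        s->accumulator -= TSTATES_PER_FRAME;
        if (s->count < SAMPLES_PER_FRAME)
            s->buffer[s->count++] = speaker_on ? 8000 : -8000; /* arbitrary amplitude */
    }
}

/* At the end of each video frame, after the buffer has been handed over. */
static void sampler_new_frame(SpeakerSampler *s)
{
    s->count = 0;   /* the accumulator's remainder carries over */
}
```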

Meanwhile, the SDL audio layer read from a second buffer, which held the audio samples accumulated during the previous video frame (20ms).

At the end of each video frame, the two buffers were swapped: the read buffer became the write buffer and vice versa.
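The original (pre-version 6) scheme amounts to a pointer swap. A minimal sketch with hypothetical names, assuming 882-sample frames; in a real SDL2 program the swap would have to be guarded with SDL_LockAudioDevice/SDL_UnlockAudioDevice, because the audio callback runs on its own thread.

```c
#include <stdint.h>

#define SAMPLES_PER_FRAME 882   /* assumed: 44,100 Hz x 20 ms */

typedef struct {
    int16_t a[SAMPLES_PER_FRAME];
    int16_t b[SAMPLES_PER_FRAME];
    int16_t *write;   /* filled while the Z80 runs this frame */
    int16_t *read;    /* drained by the audio callback */
} DoubleBuffer;

static void double_buffer_init(DoubleBuffer *d)
{
    d->write = d->a;
    d->read = d->b;
}

/* End of video frame: last frame's samples become the ones to play. */
static void double_buffer_swap(DoubleBuffer *d)
{
    int16_t *tmp = d->read;
    d->read = d->write;
    d->write = tmp;
}
```

The weakness is visible in the structure itself: the read side gets exactly one frame's worth of samples, so any jitter in when the host finishes a frame shows up as a gap or a repeat.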
NOTE: As of version 6, the above has been replaced by a different approach, based on a circular buffer and automatic resynchronization between read and write "heads".

From version 6, here is a mini-log of changes I've made to the sound module, to ultimately make significant improvements to sound quality:
  • two buffers (read and write), swapped, rudimentary, poorly-working synchronization
  • same as above, but oversample and then average, no improvement
  • switched to stereo, with improvement, but still very annoying stutters
  • circular buffer, reset buffer read and write "heads" to sync, significant improvement
  • circular buffer, don't feed SDL audio buffer to sync, regression - it sounds worse
  • circular buffer, single-way desync detection, rewind read buffer "head" to sync, small improvement
  • circular buffer, two-way desync detection, rewind or fast-forward read buffer "head" to sync, good in most games, but still noticeable in continuous-music games like Manic Miner
  • circular buffer, two-way desync detection, rewind or fast-forward read buffer "head" to sync, with video frame tstate adjustment (that is, CPU gets fewer tstates during video frames that went over their tstate allocation) - much better
  • same as above, but with more frequent (though smaller) resynchronizations, which are much less noticeable than rarer, larger resynchronizations
  • use per-scanline tstate compensation so that CPU speed is closer to 100%, which further reduces the sound desynchronization rate
  • on computers where video frames last slightly longer than the target, frameskip becomes enabled; in these cases, automatically switching between a static and a dynamic sampling interval yields fewer resyncs
  • further advances were made by allowing configuration of many different parameters, which led to changes which decreased the number of resyncs
  • (in v14) variable sync, to address crackling sustained tones (such as the BEEP command in BASIC); see more detail in the section below
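The "video frame tstate adjustment" in the log above can be sketched as carrying each frame's overshoot into the next frame's budget. Instructions cannot be split, so a frame typically ends a few tstates past its allocation; subtracting that excess from the next frame keeps the long-run emulated speed at 100%. This is a per-frame sketch with hypothetical names (the log notes that zxian later moved to per-scanline compensation), assuming the 48K figure of 69,888 tstates per frame.

```c
#define TSTATES_PER_FRAME 69888   /* 48K: 312 scanlines x 224 tstates */

/* Executes one Z80 instruction and returns the tstates it took. */
typedef int (*StepFn)(void *ctx);

/* Runs one video frame and returns the overshoot, which the caller passes
   back in as carried_overshoot so the next frame gets that many fewer
   tstates. */
static int run_frame(StepFn step, void *ctx, int carried_overshoot)
{
    int budget = TSTATES_PER_FRAME - carried_overshoot;
    int elapsed = 0;
    while (elapsed < budget)
        elapsed += step(ctx);
    return elapsed - budget;
}

/* Stand-in CPU for the example: every instruction costs *(int *)ctx tstates. */
static int fixed_cost_step(void *ctx)
{
    return *(const int *)ctx;
}
```

With a stand-in instruction costing 23 tstates, the first frame runs 3,039 instructions (69,897 tstates), so the next frame's budget shrinks by 9. The overshoot is always smaller than one instruction, so the correction stays tiny.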

I settled on a resynchronization strategy whereby the read head:
  • Is moved forward by 1.66ms when it falls more than 50ms behind the write head
  • Is moved backward by 1.66ms when it gets closer than 3.33ms behind the write head
This strategy:
  • Minimizes the total resync amount (occurrences*length) per second
  • Keeps the resyncs small in length (resyncs become noticeable if longer than 2.5ms forward or backward)
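The strategy above can be sketched with a power-of-two ring buffer and free-running head counters. Assumptions: a 44,100Hz output rate (so 1.66ms is 73 samples, 3.33ms is 146 and 50ms is 2,205) and hypothetical names throughout. In an SDL2 program the read side runs in the audio callback on a separate thread, so real code would also need SDL_LockAudioDevice/SDL_UnlockAudioDevice or atomic heads; that is omitted here.

```c
#include <stdint.h>

#define SAMPLE_RATE 44100                         /* assumed output rate */
#define RING_SIZE   8192                          /* power of two, ~186 ms */
#define MAX_LAG     (SAMPLE_RATE * 50 / 1000)     /* 50 ms   -> 2205 samples */
#define MIN_LAG     (SAMPLE_RATE * 333 / 100000)  /* 3.33 ms -> 146 samples */
#define RESYNC_STEP (SAMPLE_RATE * 166 / 100000)  /* 1.66 ms -> 73 samples */

typedef struct {
    int16_t samples[RING_SIZE];
    uint32_t write_head;   /* free-running counters; index with & (RING_SIZE - 1) */
    uint32_t read_head;
} Ring;

/* Emulator side: one sample per fixed tstate interval. */
static void ring_write(Ring *r, int16_t sample)
{
    r->samples[r->write_head++ & (RING_SIZE - 1)] = sample;
}

/* Audio side: one sample per output slot. */
static int16_t ring_read(Ring *r)
{
    return r->samples[r->read_head++ & (RING_SIZE - 1)];
}

/* Nudges the read head back towards the [MIN_LAG, MAX_LAG] window. */
static void ring_resync(Ring *r)
{
    /* Signed difference, so a read head that overtook the write head
       shows up as a negative lag. */
    int32_t lag = (int32_t)(r->write_head - r->read_head);
    if (lag > MAX_LAG)
        r->read_head += RESYNC_STEP;   /* fell too far behind: skip 1.66 ms */
    else if (lag < MIN_LAG)
        r->read_head -= RESYNC_STEP;   /* too close: replay 1.66 ms */
}
```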

Here is a demonstration of how the sound improved from version 5 to version 6:






Sound improvements in version 14

In version 14, I solved an issue which had existed since the beginning: crackling sustained tones. Due to the resynchronization strategy described above, sustained tones (such as the lead tone when loading from tape, or the BEEP command in BASIC) crackled because of the frequent (albeit tiny) resynchronizations.

If I changed the resynchronization strategy to allow more latency (lag), compensated by larger resynchronizations, sustained tones sounded good, but resynchronizations were very noticeable when they did happen.

In version 14, the solution I implemented varies between an eager strategy (low lag, tiny resynchronizations) and a lazy strategy (high lag, large resynchronizations). The discriminant is the shape of the sound.

Upon collection, sound samples are analyzed to see whether they represent a sustained tone. In that case, rising or falling edges are expected to be equally spaced, that is, to have a fixed period and therefore a fixed frequency. When this occurs, zxian chooses the lazy strategy.

During intervals of varying frequencies and silence, zxian chooses the eager strategy.
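A sketch of that discriminant, assuming the collected samples are available as a block of 0/1 speaker levels: record the positions of rising edges and check whether they are equally spaced. The tolerance constant and all names are hypothetical, and zxian's actual analysis may differ.

```c
#include <stdbool.h>
#include <stdlib.h>

#define PERIOD_TOLERANCE 1   /* hypothetical: allowed jitter, in samples */

typedef enum { STRATEGY_EAGER, STRATEGY_LAZY } ResyncStrategy;

/* Records the sample positions where the speaker level goes from 0 to 1. */
static int find_rising_edges(const unsigned char *levels, int n,
                             long *edges, int max_edges)
{
    int count = 0;
    for (int i = 1; i < n && count < max_edges; i++)
        if (!levels[i - 1] && levels[i])
            edges[count++] = i;
    return count;
}

/* True when the edges are equally spaced, i.e. a single fixed frequency. */
static bool is_sustained_tone(const long *edges, int count)
{
    if (count < 3)
        return false;                  /* need at least two periods */
    long period = edges[1] - edges[0];
    for (int i = 2; i < count; i++)
        if (labs((edges[i] - edges[i - 1]) - period) > PERIOD_TOLERANCE)
            return false;
    return true;
}

/* Sustained tone -> lazy (high lag, rare large resyncs);
   anything else, including silence -> eager (low lag, tiny resyncs). */
static ResyncStrategy choose_strategy(const long *edges, int count)
{
    return is_sustained_tone(edges, count) ? STRATEGY_LAZY : STRATEGY_EAGER;
}
```

A square wave with a 20-sample period yields edges at 10, 30, 50 and so on, which selects the lazy strategy; moving a single edge falls back to the eager one. Silence produces no edges and therefore also selects the eager strategy.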

In practice, during gameplay I've observed a mix of eager and lazy strategies. Fortunately, the extra lag allowed by the lazy strategy is offset by eager resyncs during periods of silence, which the user does not perceive.

Thus, version 14 loses almost no sound quality during mixed-frequency scenarios (such as regular gameplay) and gains significant sound quality during sustained tones.



Optimizations - reducing host CPU usage by 85% in version 13

Graphics rendering code had remained unchanged since v1, with pixel-by-pixel rendering, which I knew was much slower than it could be.

Likewise, the microcode (the module which executes Z80 instructions) was designed to be clean from an "object-oriented" perspective, at the expense of speed.

Here are some of the changes which, together, produced an 85% reduction in host CPU usage (that is, v13 uses roughly 7 times less CPU than v12):

  • microcode relying on static memory - Each CPU instruction was previously stored on the heap, via malloc/free. While clean from a design standpoint, this was much slower than simply using a single-instruction "storage area" in static memory, and reusing it for each successive instruction
  • removal of ROM fetch/decode stage caching - Previously, I thought I'd speed up CPU code by caching the result of the fetch/decode phase, based on the assumption that ROM does not change. Experiments showed that games make so few ROM calls that the cache check overhead was not worth it, so I removed it. I also experimented with applying the same caching to RAM, but the speed increase was not enough to warrant the significant increase in complexity (since RAM can change, clever programs modify themselves, and instructions can be "jumped into" partway)
  • render only necessary video frames - I noticed CPU usage was the same when staring at the "copyright screen" as when playing a game. Thus, I came up with a way to determine whether the video frame had changed at all (border, video memory, flash), skipping rendering altogether if nothing had changed. During gameplay, I've observed that between 8% and 15% of video frames didn't change
  • when solid, block-drawn border - Rely on four SDL rectangles to draw the entire screen's border when the border colour remains unchanged throughout an entire video frame. Previously, the border was drawn pixel-by-pixel
  • when not solid, border via memset - When the border DOES change throughout a video frame, draw it via memset, which is very fast
  • scanline duplication for zoom - Instead of drawing a little square for each pixel to satisfy zoom (e.g. a 3x3 pixel square when zoom is 3), I changed it to draw a single horizontal line whose pixels were zoomed horizontally - followed by a copy down as many lines as needed (e.g. 2 copied lines for zoom 3)
  • faster pixel drawing - I sped this up by removing all multiplications used in offset and colour calculations by pre-computing them in lookup tables
  • pixel caching - I realized that the basic unit of graphics is the horizontal 8-pixel-wide line represented by a byte in the ZX Spectrum's video memory. Thus, I cached all 256 possible renderings of 8-pixel-wide lines, for all possible foreground/background colour combinations, for all inverse/flash combinations, for the current zoom. This converted a costly bit-by-bit loop into a single memcpy block operation. I greatly reduced memory usage by caching only what was needed, since games typically don't use more than 20% of all possible 8-pixel lines.
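A sketch combining pixel caching and scanline duplication, under stated assumptions: a 32-bit 0xAARRGGBB framebuffer, an approximate palette, and hypothetical names. Rather than keying the cache on inverse/flash separately, this version swaps ink and paper before the lookup, so a flashing cell simply uses another colour combination's entry. Entries are built on first use, so only combinations that actually appear cost rendering time; unlike the memory-saving approach described above, the table itself is still statically allocated here.

```c
#include <stdint.h>
#include <string.h>

#define MAX_ZOOM 4

typedef struct {
    int valid;
    uint32_t pixels[8 * MAX_ZOOM];   /* 8 pixels, each repeated `zoom` times */
} CachedByte;

/* [bright/paper/ink combination][bitmap byte], built lazily. */
static CachedByte cache[128][256];

/* Approximate 0xAARRGGBB palette; exact levels differ between emulators. */
static uint32_t spectrum_colour(int index, int bright)
{
    uint32_t level = bright ? 0xFF : 0xD7;
    uint32_t r = (index & 2) ? level : 0;
    uint32_t g = (index & 4) ? level : 0;
    uint32_t b = (index & 1) ? level : 0;
    return 0xFF000000u | (r << 16) | (g << 8) | b;
}

/* Cached runs are only valid for one zoom level (1..MAX_ZOOM). */
static void cache_reset(void)
{
    memset(cache, 0, sizeof cache);
}

static const uint32_t *render_byte(uint8_t bitmap, uint8_t attr,
                                   int flash_phase, int zoom)
{
    int ink = attr & 7, paper = (attr >> 3) & 7, bright = (attr >> 6) & 1;
    if ((attr & 0x80) && flash_phase) {   /* FLASH swaps ink and paper */
        int t = ink; ink = paper; paper = t;
    }
    CachedByte *e = &cache[(bright << 6) | (paper << 3) | ink][bitmap];
    if (!e->valid) {
        uint32_t fg = spectrum_colour(ink, bright);
        uint32_t bg = spectrum_colour(paper, bright);
        for (int bit = 0; bit < 8; bit++)
            for (int z = 0; z < zoom; z++)
                e->pixels[bit * zoom + z] = (bitmap & (0x80 >> bit)) ? fg : bg;
        e->valid = 1;
    }
    return e->pixels;
}

/* Draws one 256-pixel scanline into `zoom` framebuffer rows: the first row
   is assembled from cached 8-pixel runs with memcpy, and the remaining
   zoom - 1 rows are copies of it. pitch is measured in pixels. */
static void draw_scanline(uint32_t *row, size_t pitch,
                          const uint8_t bitmap[32], const uint8_t attrs[32],
                          int flash_phase, int zoom)
{
    size_t run = 8 * (size_t)zoom;
    for (int col = 0; col < 32; col++)
        memcpy(row + col * run,
               render_byte(bitmap[col], attrs[col], flash_phase, zoom),
               run * sizeof *row);
    for (int z = 1; z < zoom; z++)
        memcpy(row + z * pitch, row, 32 * run * sizeof *row);
}
```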


UI-based zxianui in version 18

To simplify starting zxian and loading a program, in version 18 I've developed zxianui.

This tiny executable resides in the same directory as zxian and lets the user manipulate zxian's simplest configuration parameters (e.g. tape/snapshot to load, zoom, display mode, etc.) via a UI.

I wrote zxianui in C, relying on no UI libraries. All calls are WIN32 API calls. I chose this approach for 2 reasons: first, I wanted no further dependencies (e.g. .NET Framework, GTK, etc.). Second, I wanted to learn the basics of pure WIN32 programming (windows, events, controls, etc.)



Optimizations - reducing host CPU usage by a further 75% to 90% in version 19

Version 19 contains performance optimizations around video rendering and microcode execution. The purpose was to further reduce the load on the host CPU.

Here are summaries of the changes:
  • selective rendering - 8-pixel-wide segments are now tracked individually and only rendered if they change. Similarly, parts of the border are tracked for changes. Previously, each scanline was fully rendered every frame.
  • drawing via inline ASM - Low-level rendering routines changed to rely on inline, hand-written assembly language portions. Some functions were re-written to "convince" the compiler to inline them. Certain variable and function argument usage was changed to allow the compiler to rely more on registers and less on memory.
  • microcode hot path reduction - I've combed through the fetch-execute code path and changed/removed many things such as inefficient loops, redundant operations, unnecessary data copying, lazier-than-could-be short circuits.
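The segment-level tracking can be sketched by keeping a shadow copy of video memory and redrawing only the 8-pixel segments whose bitmap byte or attribute changed, plus flashing cells whenever the flash phase flips. Names are hypothetical, border tracking is left out, and the coordinate recovery simply inverts the interleaved bitmap layout.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define BITMAP_BYTES 6144   /* 256x192 pixels, 1 bit each */
#define ATTR_BYTES    768   /* 32x24 attribute cells */

typedef struct {
    uint8_t bitmap[BITMAP_BYTES];   /* what was drawn last time */
    uint8_t attrs[ATTR_BYTES];
    bool flash_phase;
    bool primed;                    /* false until the first full render */
} Shadow;

/* Draws one 8-pixel segment; the renderer can read the flash phase via ctx. */
typedef void (*DrawSegmentFn)(int col, int y, uint8_t bitmap, uint8_t attr, void *ctx);

/* vram points at the 6912 bytes starting at 0x4000. Returns how many
   segments were redrawn. */
static int render_changed(Shadow *s, const uint8_t *vram, bool flash_phase,
                          DrawSegmentFn draw, void *ctx)
{
    const uint8_t *bitmap = vram;
    const uint8_t *attrs = vram + BITMAP_BYTES;
    bool flash_flipped = s->primed && flash_phase != s->flash_phase;
    int drawn = 0;

    for (int offset = 0; offset < BITMAP_BYTES; offset++) {
        /* Undo the interleaved layout to recover the segment's position. */
        int col = offset & 0x1F;
        int y = ((offset >> 8) & 0x07) | ((offset >> 2) & 0x38) | ((offset >> 5) & 0xC0);
        int a = (y >> 3) * 32 + col;

        if (!s->primed
            || bitmap[offset] != s->bitmap[offset]
            || attrs[a] != s->attrs[a]
            || (flash_flipped && (attrs[a] & 0x80))) {
            draw(col, y, bitmap[offset], attrs[a], ctx);
            drawn++;
        }
    }
    memcpy(s->bitmap, bitmap, BITMAP_BYTES);
    memcpy(s->attrs, attrs, ATTR_BYTES);
    s->flash_phase = flash_phase;
    s->primed = true;
    return drawn;
}

/* Stand-in renderer for the example. */
static void ignore_segment(int col, int y, uint8_t bitmap, uint8_t attr, void *ctx)
{
    (void)col; (void)y; (void)bitmap; (void)attr; (void)ctx;
}
```

Changing one bitmap byte redraws a single segment, while changing one attribute byte redraws the 8 segments of that cell.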


Above are histograms of zxian's performance over time. The same test was performed multiple times on each version of zxian, along with several unreleased (incremental) builds of v19.

One image shows performance at zoom 3 and the other at zoom 4. This means that the window's width and height were each 3 times (at zoom 3) or 4 times (at zoom 4) those of the ZX Spectrum's screen.

Here are the descriptions:
  • Dizzy 1 - Idling on the first screen of the game, immediately after the game begins. There are few moving sprites, few changing attributes.
  • Copyright Screen - Idling on the "(c)1982 Sinclair Research Ltd" BASIC bootup message. No screen activity.
  • 1943 - Leave the airplane flying through 2 games without any input in this shoot 'em up. There is a high amount of screen activity: scrolling, enemies, bullets, etc.
  • Skool Daze - Idling through the demo for just over a minute. There are many sprites, occasional scrolling.
  • Nipper 2 - Idling on Jack the Nipper 2's title screen. There are few, large sprites, music, attribute cycling on text.


Version 20 - leap to 64bit

In version 20, zxian leaves the 32bit world behind. Version 19 will be the last 32bit version.

On this occasion, I was curious to see how efficient zxian was compared to other ZX Spectrum emulators.

Above are histograms of the host CPU usage of zxian and other emulators. The same test was performed multiple times on each emulator under test.

One image shows performance at zoom 3 and the other at zoom 4. This means that the window's width and height were each 3 times (at zoom 3) or 4 times (at zoom 4) those of the ZX Spectrum's screen.

Here are the descriptions:
  • Dizzy 1 - Idling on the first screen of the game, immediately after the game begins. There are few moving sprites, few changing attributes.
  • Copyright Screen - Idling on the "(c)1982 Sinclair Research Ltd" BASIC bootup message. No screen activity.
  • 1943 - Leave the airplane flying through 2 games without any input in this shoot 'em up. There is a high amount of screen activity: scrolling, enemies, bullets, etc.
  • Skool Daze - Idling through the demo for just over a minute. There are many sprites, occasional scrolling.
  • Nipper 2 - Idling on Jack the Nipper 2's title screen. There are few, large sprites, music, attribute cycling on text.
  • Attributes - A stress test whereby an infinite loop repeatedly writes pseudorandom bytes to the screen attributes area


Development screenshots