PC Speaker Sound Format
All of the sound effects in the game were played through the PC speaker, a small device inside the computer case that was really only designed to play single beeps to provide audible feedback to a user. By rapidly varying the frequency (sometimes colloquially called “pitch”) of the beep, interesting sound effects could be produced. A total of 65 sound effects are included in the game.
Individual sound effects are packed together into sound files. Each sound file in the game contains up to 23 sound effects, and the group files contain three sound file entries in total:
|SOUNDS.MNI||PC speaker sound effects 1–23.|
|SOUNDS2.MNI||PC speaker sound effects 24–46.|
|SOUNDS3.MNI||PC speaker sound effects 47–65. Contains silence data for effects 66–69, which are loaded (but never used) by the game.|
Note: When all three sound files are loaded into the game’s memory, the sound effect numbers are 1-indexed. Whenever a sound file is analyzed directly in this document, the sound effect numbers are 0-indexed.
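The two numbering schemes can be reconciled with a little arithmetic. The following is a minimal sketch, assuming the files are loaded in the order listed above and that each contributes exactly 23 entries; the function names are hypothetical and do not come from the game.

```python
ENTRIES_PER_FILE = 23  # the game loads 23 effects from each file
SOUND_FILES = ("SOUNDS.MNI", "SOUNDS2.MNI", "SOUNDS3.MNI")


def to_game_number(file_index, entry_index):
    """Map a 0-indexed file position and table entry to the 1-indexed in-game number."""
    if not 0 <= file_index < len(SOUND_FILES):
        raise ValueError("file_index out of range")
    if not 0 <= entry_index < ENTRIES_PER_FILE:
        raise ValueError("entry_index out of range")
    return file_index * ENTRIES_PER_FILE + entry_index + 1


def from_game_number(number):
    """Map a 1-indexed in-game number back to (file name, 0-indexed table entry)."""
    if not 1 <= number <= ENTRIES_PER_FILE * len(SOUND_FILES):
        raise ValueError("sound number out of range")
    file_index, entry_index = divmod(number - 1, ENTRIES_PER_FILE)
    return SOUND_FILES[file_index], entry_index
```

For example, `from_game_number(24)` yields entry 0 of SOUNDS2.MNI, matching the table above.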
Some sources call these Inverse Frequency Sound Format files because the data words are inversely related to the frequency of the tone that is played. The format was most likely designed by id Software, first appearing in both Commander Keen in Invasion of the Vorticons for Apogee and Shadow Knights for Softdisk. This format, and variations built from it, was used in Apogee and Softdisk games throughout the early 1990s.
Sound files are noteworthy because they are among the few file formats in the game that have a formal header structure. This is required because each sound file contains many different sound effects, each of which has a different size.
The file begins with this header:
|0h||byte||The string |
|4h||word||Ostensibly the total size of the file, in bytes. In practice, all of the sound files contain some amount of data past this boundary that is not normally accessed.|
|6h||word||The number of entries in the file’s sound effect table.|
|8h||word||Unknown purpose; always 0032h.|
|Ah||byte||Six null bytes, which pad the header to a paragraph boundary.|
The header is really more of a formality than anything else. The game skips reading it entirely, and the total size value is incorrect at best and outright misleading at worst.
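Even though the game ignores it, the header is easy to read for inspection purposes. This is a sketch only: it assumes the header words are little-endian (as the sound data words are), treats the four-byte string at offset 0h as opaque bytes, and uses hypothetical names (`SoundHeader`, `parse_header`) throughout.

```python
import struct
from typing import NamedTuple

# Little-endian: 4-byte string, three words, six padding bytes.
HEADER_FORMAT = "<4sHHH6s"
HEADER_SIZE = struct.calcsize(HEADER_FORMAT)  # 16 bytes, one paragraph


class SoundHeader(NamedTuple):
    signature: bytes    # string at offset 0h, kept as raw bytes
    declared_size: int  # word at 4h; not trustworthy in the stock files
    entry_count: int    # word at 6h
    unknown: int        # word at 8h; always 0032h in the stock files
    padding: bytes      # six null bytes at Ah


def parse_header(data):
    """Unpack the 16-byte file header from the start of a sound file."""
    if len(data) < HEADER_SIZE:
        raise ValueError("file is too short to contain a header")
    return SoundHeader(*struct.unpack_from(HEADER_FORMAT, data, 0))
```

Comparing `declared_size` against `len(data)` is a quick way to see how far each file runs past its stated boundary.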
Sound Effect Table
Immediately following the file header, there is a 16-byte structure repeated once per sound effect:
|0h||word||Offset to the sound data relative to the beginning of the file, in bytes.|
|2h||byte||Priority value for this sound effect, in the range 0–255. New sounds will interrupt old sounds if the new priority is equal to or greater than the old priority.|
|3h||byte||Unknown purpose; always 08h. Not used by the game.|
|4h||byte||Name of the sound effect. Maximum 11 characters, plus a terminating null byte. Only SOUNDS.MNI contains meaningful names; the other files use |
The game makes some blind assumptions about the content and structure of the sound files. It assumes, without checking, that the 0th sound effect table entry is at offset 10h, and it reads exactly 23 table entries from every file. Since each sound file actually contains 24 table entries, the last one in each file is never used.
Note: Sound files intended for use with the game must have at least 23 valid table entries. Otherwise the sound loader will begin interpreting unintended data as offset values, which could potentially lead to playing arbitrary memory contents as sounds.
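The table layout and the game's fixed assumptions (table at offset 10h, exactly 23 entries) can be sketched as follows. The bounds checks are an addition for safety and are not something the game does; the names `SoundEntry`, `parse_table`, and `may_interrupt` are hypothetical.

```python
import struct
from typing import NamedTuple

TABLE_OFFSET = 0x10        # the game assumes the table starts here
ENTRY_FORMAT = "<HBB12s"   # offset word, priority, unknown byte, 12-byte name
ENTRY_SIZE = struct.calcsize(ENTRY_FORMAT)  # 16 bytes
ENTRIES_READ = 23          # the game reads exactly this many entries


class SoundEntry(NamedTuple):
    offset: int    # where the sound data starts, from the beginning of the file
    priority: int  # 0-255
    unknown: int   # always 08h in the stock files
    name: str      # up to 11 characters


def parse_table(data, count=ENTRIES_READ):
    """Read `count` table entries, rejecting files too short to hold them."""
    if len(data) < TABLE_OFFSET + count * ENTRY_SIZE:
        raise ValueError("file does not contain enough table entries")
    entries = []
    for i in range(count):
        offset, priority, unknown, raw_name = struct.unpack_from(
            ENTRY_FORMAT, data, TABLE_OFFSET + i * ENTRY_SIZE)
        if offset >= len(data):
            raise ValueError(f"entry {i} points past the end of the file")
        # Keep only the text before the terminating null byte.
        name = raw_name.split(b"\0", 1)[0].decode("ascii", errors="replace")
        entries.append(SoundEntry(offset, priority, unknown, name))
    return entries


def may_interrupt(new_priority, old_priority):
    """A new sound interrupts the old one at equal or greater priority."""
    return new_priority >= old_priority
```

Passing `count=24` reads the extra entry that the game never touches.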
The actual data for each sound effect is stored as a variable-length sequence of little-endian words. The sound data starts at the offset specified in a sound effect table entry and continues until word value FFFFh is read.
The sound effects service runs at a frequency of 140 hertz (Hz), or 140 times each second. Each time the service runs, it consumes one word from the sound data and writes the value to the system’s Programmable Interval Timer. This timer is connected to the speaker inside the computer case, which emits an audible tone with a pitch related to the sound data value. Typical values encountered in the stock sound files range from 150 to 8,600 decimal (in steps of 50), which translate roughly to 7,955 Hz down to 139 Hz, respectively. The rapid changes in pitch over time allow for interesting (albeit monophonic) sound effects to be generated.
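The inverse relationship between data value and pitch can be checked numerically. This sketch assumes the data word is used directly as the timer's divisor and that the timer runs at the standard PC input clock of 1,193,182 Hz; `word_to_hz` is a hypothetical helper name.

```python
PIT_CLOCK_HZ = 1_193_182  # standard PC timer input clock (assumed here)
SERVICE_HZ = 140          # sound effects service rate


def word_to_hz(value):
    """Approximate tone frequency for a nonzero sound data word."""
    if value <= 0:
        raise ValueError("0000h means silence and has no frequency")
    return PIT_CLOCK_HZ / value


print(round(word_to_hz(150)))   # 7955
print(round(word_to_hz(8600)))  # 139
print(round(1000 / SERVICE_HZ, 2), "ms per data word")  # 7.14 ms per data word
```

Larger data words therefore produce lower tones, and each word is held for roughly 7 ms before the next one replaces it.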
Since the data values in the game’s sound files all fit in the range 150–8,600 in steps of 50, there are only 170 different values the game uses. These could’ve been encoded in bytes instead of words, halving the size of the sound data at the expense of requiring a multiplication during each run of the sound effects service. Granted, it would only save about 5,000 bytes overall.
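To make the byte-encoding idea concrete, here is one way it could have worked. This is purely a hypothetical scheme, not anything the game does: each word is divided by 50, and the byte FFh is reserved (an assumption of this sketch) to stand in for the FFFFh terminator.

```python
STEP = 50
END_WORD = 0xFFFF  # terminator in the real word-based format
END_BYTE = 0xFF    # hypothetical terminator in the byte-based format


def pack_word(word):
    """Shrink a sound data word to one byte (hypothetical encoding)."""
    if word == END_WORD:
        return END_BYTE
    if word % STEP != 0 or word // STEP >= END_BYTE:
        raise ValueError(f"{word} cannot be represented in this scheme")
    return word // STEP  # 0 stays 0, so silence survives unchanged


def unpack_byte(byte):
    """Recover the original word; this is the extra multiplication per tick."""
    return END_WORD if byte == END_BYTE else byte * STEP


stock_values = range(150, 8601, STEP)
print(len(stock_values))                           # 170
print(pack_word(150), pack_word(8600))             # 3 172
```

All 170 stock values land in the range 3 to 172, leaving plenty of room in a byte for the silence and terminator markers.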
If the value 0000h is read in the sound data, the speaker is silenced for that service cycle. If the value FFFFh is read, the speaker is silenced and the sound effects service goes dormant until another sound effect is started.
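Putting the data rules together, a single sound effect can be extracted by reading words from its table offset until the terminator appears. This is a minimal sketch with hypothetical names (`read_sound`, `duration_ms`); unlike the game, it refuses to run off the end of the buffer.

```python
import struct

SERVICE_HZ = 140   # one data word is consumed per service tick
SILENCE = 0x0000   # speaker off for one tick
END = 0xFFFF       # end of the sound effect


def read_sound(data, offset):
    """Collect little-endian words from `offset` up to, but excluding, FFFFh."""
    words = []
    pos = offset
    while True:
        if pos + 2 > len(data):
            raise ValueError("ran out of data before finding FFFFh")
        (word,) = struct.unpack_from("<H", data, pos)
        if word == END:
            return words
        words.append(word)
        pos += 2


def duration_ms(words):
    """Playing time of the collected words, not counting the terminator tick."""
    return len(words) * 1000 / SERVICE_HZ
```

Words equal to `SILENCE` are kept in the returned list, since they occupy a service tick just like audible tones do.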
The simple reality is that the sound files seem to have been created with a sloppy tool.[1] The accuracy of the file headers cannot be trusted and each file contains unreachable data beyond where any such data should logically occur. This section is a brief attempt to salvage and/or identify the data where possible.
The 23rd sound table entry in SOUNDS.MNI is named JETSON. It cannot be readily determined if this is simply the phrase “jets on” or some kind of reference to The Jetsons television show. The game never reads this entry and the sound can never be heard during gameplay.
Following this, there are four slack bytes that are totally unreachable. These appear to be the first four bytes of JETSON repeated.
The 23rd sound table entry in SOUNDS2.MNI is silence, encoded validly.
Following this, there are 950(!) unreachable slack bytes. These appear to be a mixture of silence, copies of other sound fragments, and junk. Here are the file offsets and analyses:
- B6Eh: 70 ms of silence followed by 330 ms of a 2.4 kHz tone. Not rendered here because it’s uninteresting and rather unpleasant to listen to.
- BDEh: Copy of sound effect 20.
- BF4h: Copy of sound effect 21.
- DF0h: Copy of sound effect 22.
- E0Ch: Silence, perhaps from sound effect 23.
- E10h: Copy of the tail end of sound effect 21.
- ED8h: Copy of sound effect 22.
- EF4h: Silence, perhaps from sound effect 23.
- EF8h: Copy of the tail end of sound effect 22.
- F0Ch: Silence, perhaps from sound effect 23.
- F10h: Extremely short (50 ms) descending tone. Similar data does not appear at any other point in any sound file. May be a remnant of an early version of a sound effect.
- F20h: Silence.
The 19th–23rd sound table entries in SOUNDS3.MNI are all silence, encoded validly.
Following this, there are 60 unreachable slack bytes. These contain a semi-regular pattern of 0000h and FFFFh words. These all appear to be pieces of silence – the 0000h words deactivate the speaker and the FFFFh words terminate the sound effect data. No other audible information appears in this range.
[1] There’s a chance the software used to create the sound effects was Muse, an internal tool developed by id Software (possibly a solo effort by John Romero). The only evidence I have to support this theory is the mention of Muse in Tom Hall’s Doom Bible and the, uh, less than stellar opinion he had of it.