This document describes the message database file format used by GBBS Pro on Apple II computers. Instead of regular text files, GBBS Pro stores bulletin board messages and private email in a compressed, block-chained format designed for the limited storage and memory of 8-bit systems. The format was reverse engineered so that messages can be easily extracted from these files on modern machines for nostalgia and archival purposes.
Key Features:
File Types:
Typical Usage:
This format was designed for efficiency on Apple II systems with limited disk space (typically 140KB floppy disks or early hard drives). The 7-bit compression and block-chained structure allowed BBSs to store hundreds of messages while maintaining reasonable access speeds.
GBBS Pro message databases come in two formats, distinguished by byte 0 of the header:
Used for public bulletin boards (B1, B2, B3, etc.)
Characteristics:
Structure:
Used for private email/mail databases (MAIL file)
Characteristics:
Structure:
Message Format (different from bulletins):
<from_user_id>
Subj : <subject>
From : <username> (#<user_id>)
Date : <timestamp>
<message body>
Note: No “To” line in the message data - the recipient is implicit (the user whose chain this is)
Detection: Check byte 0 of file header. If 0x04, use email format; otherwise use bulletin format.
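The detection rule can be sketched in a few lines of Python. This is a minimal illustration only: the function name detect_format and the returned strings are hypothetical and are not taken from gbbsmsgtool.py.

```python
def detect_format(file_bytes: bytes) -> str:
    """Return 'email' if byte 0 of the header is 0x04, otherwise 'bulletin'."""
    if len(file_bytes) < 1:
        raise ValueError("file is empty, so there is no header byte to inspect")
    return "email" if file_bytes[0] == 0x04 else "bulletin"
```

A caller would typically read the whole database file into memory and pass the bytes straight in.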
Some of this was difficult to determine from dissecting a database file alone, so I had to look at the ACOS source code file DISK.S. What I found is below, including some references to the function labels in the 6502 source:
Each directory entry is 4 bytes:
Empty entries are marked with 00 00 00 00.
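A minimal sketch of walking the directory while skipping empty slots. It deliberately does not interpret the four bytes of an entry, and the helper name iter_directory_entries is hypothetical.

```python
from typing import Iterator, Tuple

ENTRY_SIZE = 4  # each directory entry is 4 bytes


def iter_directory_entries(directory: bytes) -> Iterator[Tuple[int, bytes]]:
    """Yield (entry_index, raw_entry) for every non-empty directory entry."""
    usable = len(directory) - len(directory) % ENTRY_SIZE
    for offset in range(0, usable, ENTRY_SIZE):
        raw = directory[offset:offset + ENTRY_SIZE]
        if raw != b"\x00\x00\x00\x00":  # 00 00 00 00 marks an empty entry
            yield offset // ENTRY_SIZE, raw
```

Keeping the entry index alongside the raw bytes matters for the email format, where the directory entry number identifies the user.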
Important:
MSGINFO: [02 04 29 00 09 00 BC 05]
Byte 0: 0x02 = 2 bitmap blocks
Byte 1: 0x04 = 4 directory blocks
Bytes 2-3: 0x0029 = 41 used blocks
Bytes 4-5: 0x0009 = 9 messages
Bytes 6-7: 0x05BC = 1468 (new message number)
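The header breakdown above maps directly onto a struct format string: two single bytes followed by three little-endian 16-bit words. The sketch below is illustrative; the MsgInfo type and its field names are my own labels, not names from ACOS or gbbsmsgtool.py.

```python
import struct
from typing import NamedTuple


class MsgInfo(NamedTuple):
    bitmap_blocks: int
    directory_blocks: int
    used_blocks: int
    message_count: int
    new_message_number: int


def parse_msginfo(header: bytes) -> MsgInfo:
    """Unpack the 8-byte MSGINFO header (2 bytes, then 3 little-endian words)."""
    if len(header) < 8:
        raise ValueError("MSGINFO needs 8 bytes")
    return MsgInfo(*struct.unpack("<BBHHH", header[:8]))


info = parse_msginfo(bytes.fromhex("02 04 29 00 09 00 BC 05"))
print(info.used_blocks, info.message_count, info.new_message_number)  # 41 9 1468
```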
File structure:
0x000-0x007: Header (8 bytes)
0x008-0x107: Bitmap (2 × 128 = 256 bytes)
0x108-0x307: Directory (4 × 128 = 512 bytes, max 128 entries)
0x308-EOF: Data blocks (128 bytes each, 7-bit compressed)
Block number translation:
Directory says "block 1" -> file offset 0x308 + ((1-1) × 128) = 0x308
Directory says "block 5" -> file offset 0x308 + ((5-1) × 128) = 0x508
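The block number translation can be written as a small helper. This is a sketch under the assumption that the caller already knows where the data area starts; data_start is passed in explicitly rather than computed, and the function names are hypothetical.

```python
BLOCK_SIZE = 128  # data blocks are 128 bytes each


def block_offset(block_number: int, data_start: int) -> int:
    """Translate a 1-based data block number into an absolute file offset."""
    if block_number < 1:
        raise ValueError("block numbers start at 1")
    return data_start + (block_number - 1) * BLOCK_SIZE


def read_block(file_bytes: bytes, block_number: int, data_start: int) -> bytes:
    """Return the 128-byte data block with the given number."""
    start = block_offset(block_number, data_start)
    block = file_bytes[start:start + BLOCK_SIZE]
    if len(block) != BLOCK_SIZE:
        raise ValueError(f"block {block_number} lies beyond the end of the file")
    return block
```

Raising on a short read keeps a truncated file from silently producing a partial block.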
7-bit compressed data organized in 128-byte blocks.
0x0000 = end of message (no continuation)
Deletion Process (from DISK.S DO_KILL):
Crunch Process (from DISK.S DO_CNCH):
Why Deleted Messages Are Recoverable:
Message Reading (from DISK.S RDMSG):
Self-Referencing Chain Pointers (block N -> block N):
Self-Reference as Sequential Continuation Pattern:
Chain Pointer Corruption Causes:
Orphaned Blocks:
The directory stores both for flexibility:
Example from B1:
The compression works by storing only 7 bits per character in each byte, using the 8th bit (bit 7, the high bit) to construct an additional character. Every 7 bytes of compressed data encode 8 characters, a space saving of 12.5%.
The first 7 characters are encoded with their high bit used to construct the 8th character ‘D’:
Char | ASCII | Encoded | Binary | Bit 7
-----|-------|---------|-------------|------
P | 0x50 | 0xD0 | 11010000 | 1
R | 0x52 | 0x52 | 01010010 | 0
E | 0x45 | 0x45 | 01000101 | 0
S | 0x53 | 0x53 | 01010011 | 0
U | 0x55 | 0xD5 | 11010101 | 1
M | 0x4D | 0x4D | 01001101 | 0
E | 0x45 | 0x45 | 01000101 | 0
High bits collected in order: 1000100 = 0x44 = ‘D’
Result: 7 bytes encode “PRESUMED” (8 characters)
(yes, this example was copied from the GBBS newsletter - see references at bottom of this document)
The ACOS code (DISK.S, RDMSG function) decodes as follows:
For each byte in the compressed data:
- ASL - Shift byte left, bit 7 goes to carry flag
- ROR CHAR8 - Rotate CHAR8 right, carry goes into bit 7 of CHAR8
- LSR - Shift accumulator right (gives original byte with bit 7 cleared)
- DEC BYTE8 - Decrement counter (initialized to 6, counts down to -1)
- LSR CHAR8 - Shift CHAR8 right once more before output
Note: The assembly code uses ROR (rotate right) which accumulates bits in reverse order, then shifts right before output to correct the order.
def decode_7bit(compressed_data, stop_at_null=True):
    """Decode 7-bit compressed data to ASCII text."""
    result = []
    i = 0
    # Process complete 7-byte groups; each group yields 8 characters.
    while i + 6 < len(compressed_data):
        bytes_7 = compressed_data[i:i+7]
        char8 = 0
        chars = []
        for b in bytes_7:
            # Mirrors ROR CHAR8: shift right, then place this byte's high bit in bit 7.
            char8 = (char8 >> 1) | (b & 0x80)
            chars.append(b & 0x7F)
        # Mirrors the final LSR CHAR8 so the eighth character fits in 7 bits.
        char8 = char8 >> 1
        chars.append(char8)
        for c in chars:
            if stop_at_null and c == 0:
                return bytes(result).decode('ascii', errors='replace').replace('\r', '\n')
            result.append(c)
        i += 7
    return bytes(result).decode('ascii', errors='replace').replace('\r', '\n')
Key points:
- stop_at_null parameter controls whether to stop at null terminator (used for bulletin format)
- 0x00: End of message
- 0x0D (13): Carriage return (convert to \n for Unix)
Message starting at block 7:
- 09 00 (next block = 9)
- 00 00 (end of message)
Note: Block numbers in chains are relative to the data block area, not absolute file offsets.
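Following a chain can be sketched as below. Two things here are assumptions rather than facts from this document: that the two-byte pointer sits in the last two bytes of each 128-byte block, and that the blocks have already been loaded into a dict keyed by block number. The function name and the stop-reason strings are hypothetical. Unlike gbbsmsgtool.py, this sketch simply stops at a self-reference instead of trying the next sequential block.

```python
from typing import Dict, List, Tuple

BLOCK_SIZE = 128
POINTER_SIZE = 2  # ASSUMPTION: the chain pointer is the last two bytes of a block


def follow_chain(blocks: Dict[int, bytes], start: int) -> Tuple[List[int], str]:
    """Walk a chain from `start`; return (visited block numbers, stop reason)."""
    visited: List[int] = []
    current = start
    while True:
        if current not in blocks:
            return visited, "next segment missing"
        visited.append(current)
        block = blocks[current]
        # Little-endian pointer: bytes 09 00 mean block 9, 00 00 mean end of message.
        nxt = int.from_bytes(block[BLOCK_SIZE - POINTER_SIZE:BLOCK_SIZE], "little")
        if nxt == 0:
            return visited, "end of message"
        if nxt == current:
            return visited, "self-referencing chain pointer"
        if nxt in visited:
            return visited, "chain loop"
        current = nxt
```

Tracking visited blocks guarantees termination even when a damaged database contains a cycle.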
Each decoded bulletin message follows this structure:
<Subject line>
<To line: user_id,username>
<From line: user_id,username (#user_id)>
Date : MM/DD/YY HH:MM:SS [AM/PM]
<Message body text>
Example:
Re: Hahahaha
0,IronKnight (#5)
6,Shortround (#6)
Date : 01/04/88 08:36:52 PM
do you have to keep on doing that?
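The bulletin layout above is regular enough to split with a few lines of Python. This is a hedged sketch, not the parser used by gbbsmsgtool.py: the function name and dictionary keys are illustrative, it assumes carriage returns have already been converted to \n, and it leaves any trailing "(#n)" inside the name fields untouched.

```python
import re
from typing import Dict, Optional

ID_NAME = re.compile(r"^(\d+),(.*)$")  # "user_id,username" on the To/From lines


def parse_bulletin(text: str) -> Optional[Dict[str, object]]:
    """Split a decoded bulletin into header fields and body, or return None."""
    lines = text.split("\n")
    if len(lines) < 4:
        return None
    to_match = ID_NAME.match(lines[1])
    from_match = ID_NAME.match(lines[2])
    if not (to_match and from_match):
        return None  # lines 2 and 3 must both start with "number,"
    return {
        "subject": lines[0],
        "to_id": int(to_match.group(1)),
        "to": to_match.group(2),
        "from_id": int(from_match.group(1)),
        "from": from_match.group(2),
        "date": lines[3].partition(":")[2].strip(),
        "body": "\n".join(lines[4:]),
    }
```

Returning None rather than raising makes the function usable as a quick test of whether a block of decoded text looks like a message at all.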
Field Descriptions:
Message Start Pattern (used by gbbsmsgtool.py to identify message starts):
- ^\d+, (number, comma, text)
- ^\d+, (number, comma, text)
Each decoded email message follows this structure:
<from_user_id>
Subj : <subject>
From : <username> (#<user_id>)
Date : MM/DD/YY HH:MM:SS [AM/PM]
<message body>
Example:
3
Subj : Test Message
From : The Wook (#3)
Date : 01/19/88 07:23:48 PM
Hey, just testing the mail system...
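A matching sketch for the email layout, assuming the standard "Subj :", "From :" and "Date :" labels shown above and text whose carriage returns were already converted to \n. The function name and dictionary keys are illustrative and are not taken from gbbsmsgtool.py.

```python
import re
from typing import Dict, Optional

FROM_LINE = re.compile(r"^From : (.*) \(#(\d+)\)$")


def parse_email(text: str) -> Optional[Dict[str, object]]:
    """Split a decoded email into header fields and body, or return None."""
    lines = text.split("\n")
    if len(lines) < 4 or not lines[0].strip().isdigit():
        return None  # line 1 must be the sender's numeric user ID
    from_match = FROM_LINE.match(lines[2].rstrip())
    if not (lines[1].startswith("Subj :") and from_match
            and lines[3].startswith("Date :")):
        return None
    return {
        "from_id": int(lines[0].strip()),
        "subject": lines[1][len("Subj :"):].strip(),
        "from": from_match.group(1),
        "date": lines[3][len("Date :"):].strip(),
        "body": "\n".join(lines[4:]),
    }
```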
Field Descriptions:
Note: Email messages do NOT have a “To” line in the stored data. The recipient is implicit - it’s the user whose chain contains the message (directory entry N = User ID N). The gbbsmsgtool.py adds a “To:” line when extracting if the USERS file is provided.
Character decoding: Some characters may decode incorrectly due to:
Bulletin format directory entries: Not all directory entries point to message starts. Some may point to:
The gbbsmsgtool.py handles this by using the message start pattern to identify valid messages.
Email format limitations:
Self-referencing chain pointers: Some databases contain blocks where the chain pointer points to itself (block N -> block N). This appears to be a bug in ACOS where the current block number was written instead of the next block number. The gbbsmsgtool.py handles this by checking if the next sequential block is a valid continuation.
Status: Core algorithm working and tested on multiple database files. Handles both bulletin and email formats with auto-detection.
The complete message extraction process:
- [Self-referencing chain pointer detected] - block points to itself, sequential continuation failed
- [Next segment missing] - chain points to allocated or non-existent block
- [Chain loop detected] - chain forms a loop (not self-reference)
Consolidated tool for analyzing and extracting messages from GBBS Pro message database files.
Key Features:
analyze - Display database statistics and block allocation map
python3 gbbsmsgtool.py analyze <filename> [--json]
Shows:
- File size and block statistics
- Header (MSGINFO) breakdown
- File layout (bitmap, directory, data offsets)
- Block usage breakdown (allocated, deleted, orphaned, never used)
- Visual block map with status indicators (bulletin format only)
Block map legend (bulletin format):
- [H] = Active header (directory entry points here, message start)
- [C] = Active chain (continuation of active message)
- [D] = Deleted header (message start pattern, not in directory)
- [d] = Deleted chain (continuation of deleted message)
- [o] = Orphaned block (has data but no header or chain)
- [ ] = Unused (never used or zeroed out)
Block breakdown (bulletin format):
extract - Extract messages from database
python3 gbbsmsgtool.py extract <filename> <type> [options]
Required - specify extraction type:
- --active - Extract active messages
- --deleted - Extract deleted messages
- --orphaned - Extract orphaned blocks
- --all - Extract all three types
Optional flags:
- --output-dir <path> - Write to directory instead of stdout
- --users <users_file> - Path to USERS file (for email recipient names and alias detection)
- --data2 <data2_file> - Path to DATA2 file (for board names)
- --pretty - Format messages with readable headers (default: raw)
- --json - Output in JSON format (ignores --pretty)
- --force - Overwrite existing files (default: abort if files exist)
File Protection: By default, the tool will abort with an error if output files already exist. Use --force to overwrite existing files.
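One way to get this abort-unless-forced behaviour in Python is exclusive-create mode. The sketch below only illustrates the idea; how gbbsmsgtool.py actually implements the check is not shown in this document, and write_output is a hypothetical name.

```python
from pathlib import Path


def write_output(path: Path, text: str, force: bool = False) -> None:
    """Write text to path, refusing to clobber an existing file unless forced."""
    mode = "w" if force else "x"  # "x" raises FileExistsError if the file exists
    with open(path, mode, encoding="ascii", errors="replace") as handle:
        handle.write(text)
```

Using mode "x" makes the existence check and the file creation a single step, so there is no window in which another process could create the file between the two.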
Examples:
# Extract active messages to stdout
python3 gbbsmsgtool.py extract B5 --active
# Extract all types to directory
python3 gbbsmsgtool.py extract B5 --all --output-dir B5_messages
# Extract only deleted messages
python3 gbbsmsgtool.py extract B5 --deleted --output-dir B5_deleted
# Extract email with user names
python3 gbbsmsgtool.py extract MAIL --active --users USERS --output-dir MAIL_messages
# Extract with board names and pretty formatting
python3 gbbsmsgtool.py extract B1 --all --data2 DATA2 --users USERS --pretty --output-dir B1_messages
# Force overwrite existing files
python3 gbbsmsgtool.py extract B5 --active --output-dir B5_messages --force
# Extract as JSON to stdout
python3 gbbsmsgtool.py extract B1 --all --json
# Extract as JSON to output directory (writes B1.json)
python3 gbbsmsgtool.py extract B1 --all --json --output-dir B1_messages
# Analyze in JSON format
python3 gbbsmsgtool.py analyze B1 --json
Optional DATA2 file can be provided to display board names for bulletin board files.
DATA2 File Format (standard GBBS Pro):
Message base record structure: - Board name (null-terminated, ends with - Filename in format F:B#e.g., F:B1, F:B2) - Additional fields (access levels, limits, etc.)
The tool extracts the mapping of filenames (B1, B2, etc.) to board
names (System News, Public Base, etc.) and displays them in message
output when using --pretty format.
Optional USERS file can be provided to display recipient names for email messages and detect alias usage in bulletin messages.
USERS File Format (standard GBBS Pro, may vary if modified by sysops):
Record structure:
The tool reads the Full_name field for:
- Email recipient identification
- Bulletin message alias detection (when poster name doesn’t match USERS file)
Email Message Output with USERS file:
To: Drone (#1)
3
Subj : Test Message
From : The Wook (#3)
Date : 01/19/88 07:23:48 PM
Message body...
Email Message Output without USERS file:
To: User ID 1 (#1)
3
Subj : Test Message
From : The Wook (#3)
Date : 01/19/88 07:23:48 PM
Message body...
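The two outputs above differ only in how the recipient name is resolved. A minimal sketch, assuming the USERS file has already been read into a mapping of user ID to full name (that mapping and the function name to_line are hypothetical):

```python
from typing import Mapping, Optional


def to_line(user_id: int, users: Optional[Mapping[int, str]] = None) -> str:
    """Build the 'To:' line for the email chain owned by directory entry user_id."""
    name = users.get(user_id) if users else None
    if name is None:
        name = f"User ID {user_id}"  # fallback used when no name is available
    return f"To: {name} (#{user_id})"


print(to_line(1, {1: "Drone"}))  # To: Drone (#1)
print(to_line(1))                # To: User ID 1 (#1)
```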
Raw format (default): Preserves original GBBS Pro message structure as stored in the database.
Pretty format (with --pretty flag):
Reformats message headers for readability:
- Adds board name header (when DATA2 file provided)
- Converts comma-separated headers to labeled format
- Detects and displays alias usage (when USERS file provided)
Stdout mode: Messages are written to stdout with separators between types when using --all.
Directory mode: Messages are written as individual files:
- Msg-0001.txt, Msg-0002.txt, etc. (numbered by directory entry order for bulletins, by date for email)
- Deleted-0001.txt, Deleted-0002.txt, etc. (numbered by timestamp order, bulletin format only)
- Orphan-0033.txt, Orphan-0034.txt, etc. (numbered by starting block number, bulletin format only)
File timestamps are set to the ‘Date:’ timestamp from the message header when available.
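Setting a file timestamp from the message header comes down to parsing the Date line and calling os.utime. The sketch below is an illustration under a few assumptions: the function names are hypothetical, a 24-hour fallback is tried when no AM/PM suffix is present, and the time is interpreted in the local time zone since the header carries no zone information.

```python
import os
from datetime import datetime
from typing import Optional


def parse_gbbs_date(date_line: str) -> Optional[datetime]:
    """Parse 'Date : MM/DD/YY HH:MM:SS [AM/PM]' into a naive datetime, or None."""
    value = date_line.partition(":")[2].strip()
    for fmt in ("%m/%d/%y %I:%M:%S %p", "%m/%d/%y %H:%M:%S"):
        try:
            return datetime.strptime(value, fmt)
        except ValueError:
            continue
    return None


def stamp_file(path: str, date_line: str) -> bool:
    """Set the file's access and modification times from the Date line."""
    when = parse_gbbs_date(date_line)
    if when is None:
        return False  # leave the file's timestamp alone when the date is unusable
    seconds = when.timestamp()  # naive datetime, treated as local time
    os.utime(path, (seconds, seconds))
    return True
```

Python's %y directive maps two-digit years 69 through 99 to 1969 through 1999, which suits message dates from the 1980s.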
Pretty format examples:
Bulletin message with board name:
Board: System News (B1)
Subject: Hello World!!
To: All
From: Drone (#1)
Date : 01/08/88 05:23:05 PM
[message body]
Bulletin message with alias detection:
Board: System News (B1)
Subject: Thats it. I'm pissed
To: All
From: DRONE: THE OWNER AND SYSOP (#2-Shortround)
Date : 01/13/88 08:34:03 AM
[message body]
The format (#2-Shortround) indicates user #2
(Shortround) posted using the alias “DRONE: THE OWNER AND SYSOP”.
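The alias notation can be produced with a simple comparison against the USERS data. This is a sketch only: it assumes the USERS file has been loaded into a mapping of user ID to full name, and it compares names exactly, whereas the real tool may normalize case or spacing. The function name format_from is hypothetical.

```python
from typing import Mapping, Optional


def format_from(poster: str, user_id: int,
                users: Optional[Mapping[int, str]] = None) -> str:
    """Render the pretty 'From:' line, flagging an alias when the names differ."""
    real_name = users.get(user_id) if users else None
    # ASSUMPTION: exact string comparison decides whether an alias was used.
    if real_name is not None and real_name != poster:
        return f"From: {poster} (#{user_id}-{real_name})"
    return f"From: {poster} (#{user_id})"


users = {1: "Drone", 2: "Shortround"}
print(format_from("DRONE: THE OWNER AND SYSOP", 2, users))
# From: DRONE: THE OWNER AND SYSOP (#2-Shortround)
```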
Customization Note: The pretty format parser is
based on standard GBBS Pro message headers. If your BBS uses customized
headers, modify the prettify_message() function in the tool
(see code comments for customization points).
JSON format (with --json flag): Outputs
structured JSON with parsed message fields. When --json is
used, --pretty is ignored since JSON always includes
structured fields.
For analyze --json, outputs database statistics:
{
  "source_file": "B1",
  "format": "bulletin",
  "file_size": 8200,
  "bitmap_blocks": 2,
  "directory_blocks": 4,
  "used_data_blocks": 41,
  "message_count": 9,
  "total_blocks": 58,
  "active_count": 9,
  "deleted_count": 3,
  "orphaned_count": 1,
  "block_breakdown": {
    "active_header": 9,
    "active_chain": 32,
    "deleted_header": 3,
    "deleted_chain": 12,
    "orphaned": 1,
    "unused": 1
  },
  "usage_percent": 70.7
}
For extract --json, outputs analysis plus message arrays:
{
  "analysis": { ... },
  "active": [
    {
      "number": 1,
      "entry": 0,
      "block": 54,
      "subject": "Hello World!!",
      "to": "All",
      "to_id": 0,
      "from": "Drone",
      "from_id": 1,
      "from_alias": null,
      "date": "01/08/88 05:23:05 PM",
      "board": "System News",
      "body": "message text here...",
      "raw": "full raw message text"
    }
  ],
  "deleted": [ ... ],
  "orphaned": [ ... ]
}
Each message object includes:
- subject, to, to_id, from, from_id: Parsed header fields
- from_alias: Real username if poster used an alias (requires USERS file), otherwise null
- date: Parsed date string
- board: Board name (requires DATA2 file), otherwise null
- body: Message body text only
- raw: Complete original message text as stored in the database
- number: Message sequence number
- block: Starting data block number
- entry: Directory entry number (active bulletin messages only)
When using --output-dir with --json, a
single JSON file is written to the output directory, named after the
input file (e.g., B1.json for input file
B1).
JSON Customization Note: The JSON field parser is
based on standard GBBS Pro message headers. If your BBS uses customized
headers (e.g., “From->” instead of “From :”), modify the
parse_message_fields() function in the tool (see
CUSTOMIZATION comments in the code).
Active Messages:
Deleted Messages: (Bulletin format only) Messages that have been removed from the directory but still have their header block (containing message start pattern) intact. These are complete messages that can be fully reconstructed by following their block chains through unused space.
Orphaned Blocks: (Bulletin format only) Data blocks that contain readable content but lack a message header. These are fragments from:
Orphaned blocks are extracted by following their chain pointers as far as possible through unused space.
Never Used: Blocks that are all nulls or contain minimal data (< 10 non-null bytes). These blocks have never been written to or were explicitly zeroed.
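The "never used" test is a simple count of non-null bytes. A minimal sketch of that rule, with a hypothetical function name and the threshold exposed as a parameter:

```python
def is_never_used(block: bytes, threshold: int = 10) -> bool:
    """True if the block holds fewer than `threshold` non-null bytes."""
    non_null = sum(1 for b in block if b != 0)
    return non_null < threshold
```

An all-null block has zero non-null bytes, so it falls under the same comparison and needs no special case.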
Bulletin Format - Active Messages:
Bulletin Format - Deleted Messages:
Bulletin Format - Orphaned Blocks:
Email Format - Active Messages:
Most of the work was reverse engineering based on assumptions I had about the file format. I had learned about the 7-bit compression back in the late 1980s from the author himself, and from doing hex dumps of the files back then I had a good idea of what was going on. However, not all of the bytes were accounted for, so I looked at a newsletter from that era to confirm the compression format, and at the source code of ACOS and GBBS Pro to see how everything else was done. I also looked at some ACOS tutorials on textfiles.com, but they don’t really explain what the msg() function is or how to use it fully.
Enjoy!
Brian J. Bernstein, February 2026