GBBS Pro Message Database File Format

Overview

This document describes the message database file format used by GBBS Pro on Apple II computers. Instead of regular text files, GBBS Pro stores bulletin board messages and private email in a compressed, block-chained format suited to the limited storage and memory of 8-bit systems. I reverse engineered the format to make it easy to extract messages from these files on modern machines for nostalgia and archival purposes.

Key Features:

File Types:

Typical Usage:

This format was designed for efficiency on Apple II systems with limited disk space (typically 140KB floppy disks or early hard drives). The 7-bit compression and block-chained structure allowed BBSs to store hundreds of messages while maintaining reasonable access speeds.

File Format Variants

GBBS Pro message databases come in two formats, distinguished by byte 0 of the header:

Bulletin Board Format (Byte 0 = 0x01 or 0x02)

Used for public bulletin boards (B1, B2, B3, etc.)

Characteristics:

Structure:

Email Format (Byte 0 = 0x04)

Used for private email/mail databases (MAIL file)

Characteristics:

Structure:

Message Format (different from bulletins):

<from_user_id>
Subj : <subject>
From : <username> (#<user_id>)
Date : <timestamp>

<message body>

Note: No “To” line in the message data - the recipient is implicit (the user whose chain this is)

Detection: Check byte 0 of file header. If 0x04, use email format; otherwise use bulletin format.
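
The detection rule can be sketched in a few lines of Python. The name detect_format is hypothetical and is not necessarily what gbbsmsgtool.py uses internally.

```python
def detect_format(header: bytes) -> str:
    """Return 'email' when byte 0 of the header is 0x04, otherwise 'bulletin'."""
    if len(header) < 8:
        raise ValueError("MSGINFO header must be at least 8 bytes")
    return "email" if header[0] == 0x04 else "bulletin"
```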

File Structure

Header (0x00 - 0x07, 8 bytes) - MSGINFO Array

This was difficult to determine from dissecting a database file alone, so I had to look at the ACOS source file DISK.S for some of it. What I found is below, including some references to function labels in the 6502 source:

Bitmap Section

Directory Section

Each directory entry is 4 bytes:

Empty entries are marked with 00 00 00 00.
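
As a rough sketch, the directory can be walked in 4-byte steps while skipping empty slots. The helper name is illustrative, the slot index is zero-based here as an assumption, and the entry is returned as raw bytes without interpreting its fields.

```python
EMPTY_ENTRY = b"\x00\x00\x00\x00"

def iter_directory_entries(directory: bytes):
    """Yield (slot_index, raw_entry) for each non-empty 4-byte directory entry."""
    for offset in range(0, len(directory) - 3, 4):
        entry = directory[offset:offset + 4]
        if entry != EMPTY_ENTRY:
            # slot_index is zero-based (an assumption made for this sketch)
            yield offset // 4, entry
```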

Important:

Data Block Section

Example File Layout (B1)

MSGINFO: [02 04 29 00 09 00 BC 05]
  Byte 0: 0x02 = 2 bitmap blocks
  Byte 1: 0x04 = 4 directory blocks
  Bytes 2-3: 0x0029 = 41 used blocks
  Bytes 4-5: 0x0009 = 9 messages
  Bytes 6-7: 0x05BC = 1468 (new message number)

File structure:
  0x000-0x007: Header (8 bytes)
  0x008-0x107: Bitmap (2 × 128 = 256 bytes)
  0x108-0x507: Directory (4 × 128 = 512 bytes, max 128 entries)
  0x508-EOF:   Data blocks (128 bytes each, 7-bit compressed)

Block number translation:
  Directory says "block 1" -> file offset 0x508 + ((1-1) × 128) = 0x508
  Directory says "block 5" -> file offset 0x508 + ((5-1) × 128) = 0x708

Message Data Area

7-bit compressed data organized in 128-byte blocks.

Structure

Block Structure (128 bytes each)

Message Deletion and Recovery

Deletion Process (from DISK.S DO_KILL):

  1. Directory entry (4 bytes) is zeroed out
  2. Each block in the chain is deallocated in the bitmap
  3. Data blocks are NOT modified - chain pointers remain intact
  4. Message content remains in blocks until overwritten

Crunch Process (from DISK.S DO_CNCH):

  1. Compacts directory by removing zero entries
  2. Moves valid entries forward to fill gaps
  3. Only touches directory - never modifies data blocks
  4. Writes compacted directory back to disk
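
The compaction in steps 1 and 2 can be modelled in Python as below. This is a sketch of the described behavior rather than a port of DO_CNCH, and crunch_directory is a hypothetical name.

```python
def crunch_directory(directory: bytes) -> bytes:
    """Move non-empty 4-byte entries to the front and zero-fill the remainder."""
    if len(directory) % 4:
        raise ValueError("directory length must be a multiple of 4")
    entries = [directory[i:i + 4] for i in range(0, len(directory), 4)]
    kept = [entry for entry in entries if any(entry)]
    # The directory keeps its size; freed slots at the end become empty entries.
    return b"".join(kept) + bytes(len(directory) - 4 * len(kept))
```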

Why Deleted Messages Are Recoverable:

Message Reading (from DISK.S RDMSG):

Self-Referencing Chain Pointers (block N -> block N):

Self-Reference as Sequential Continuation Pattern:

Chain Pointer Corruption Causes:

  1. Buffer reuse: BLKBUF2 not cleared between messages, old chain data remains
  2. Incomplete writes: Disk errors during block write operations
  3. Block reuse: Previously used blocks allocated without initialization
  4. Software bugs: Edge cases in message writing not properly handled

Orphaned Blocks:

Byte Offset vs Block Number

The directory stores both for flexibility:

Example from B1:

7-Bit Compression Algorithm

Overview

The compression works by storing only 7 bits per character in each byte, using the 8th bit (bit 7, the high bit) to construct an additional character. Every 7 bytes of compressed data encodes 8 characters, achieving ~12.5% compression.
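
The size arithmetic can be checked with a few lines of Python. The helper names are illustrative, and two assumptions are made: a partial final group is padded to a full 7 bytes, and each 128-byte block carries 126 data bytes because bytes 126-127 hold the chain pointer.

```python
import math

def compressed_size(char_count: int) -> int:
    """Bytes needed for char_count characters (assumes padding to whole groups)."""
    return math.ceil(char_count / 8) * 7

def chars_per_block(data_bytes: int = 126) -> int:
    """Characters that fit in one block's data bytes (assumed 126 per block)."""
    return (data_bytes // 7) * 8

# Fraction of space saved for a full group: 1 - 7/8 = 0.125
savings = 1 - 7 / 8
```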

Encoding Example: “PRESUMED”

The first 7 characters are encoded with their high bit used to construct the 8th character ‘D’:

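Char | ASCII | Encoded | Binary      | Bit 7
-----|-------|---------|-------------|------
 P   | 0x50  | 0x50    | 01010000    |   0
 R   | 0x52  | 0x52    | 01010010    |   0
 E   | 0x45  | 0xC5    | 11000101    |   1
 S   | 0x53  | 0x53    | 01010011    |   0
 U   | 0x55  | 0x55    | 01010101    |   0
 M   | 0x4D  | 0x4D    | 01001101    |   0
 E   | 0x45  | 0xC5    | 11000101    |   1

High bits in byte order: 0 0 1 0 0 0 1. The first byte supplies bit 0 of the 8th character and the seventh byte supplies bit 6, so read from last to first they form 1000100 = 0x44 = ‘D’. This is the bit order produced by the ROR-based decoder described in the decoding section.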

Result: 7 bytes encode “PRESUMED” (8 characters)

Encoding Process

  1. Take 8 characters to encode
  2. For the first 7 characters:
  3. The 8th character is reconstructed from the 7 high bits
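
A minimal encoder sketch, written as the inverse of the ROR-based decoding this document describes (the first byte's high bit becomes bit 0 of the eighth character). The name encode_7bit is hypothetical, and the NUL padding of the final group and the newline-to-CR conversion are assumptions rather than confirmed ACOS behavior.

```python
def encode_7bit(text: str) -> bytes:
    """Pack each group of 8 ASCII characters into 7 bytes."""
    data = text.replace("\n", "\r").encode("ascii")  # assumed: CR line endings on disk
    data += bytes(-len(data) % 8)                    # assumed: pad last group with NULs
    out = bytearray()
    for i in range(0, len(data), 8):
        group = data[i:i + 8]
        char8 = group[7]
        for k in range(7):
            # Bit k of the 8th character rides in the high bit of byte k.
            high = 0x80 if (char8 >> k) & 1 else 0x00
            out.append(group[k] | high)
    return bytes(out)
```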

Decoding Process (from ACOS assembly code)

The ACOS code (DISK.S, RDMSG function) decodes as follows:

For each byte in the compressed data:

  1. ASL - Shift byte left, bit 7 goes to carry flag
  2. ROR CHAR8 - Rotate CHAR8 right, carry goes into bit 7 of CHAR8
  3. LSR - Shift accumulator right (gives original byte with bit 7 cleared)
  4. Output the 7-bit character (bits 0-6 of original byte)
  5. DEC BYTE8 - Decrement counter (initialized to 6, counts down to -1)
  6. When counter reaches -1:

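Note: Each ROR pushes the new high bit into bit 7 of CHAR8 and moves the earlier bits toward bit 0, so the first byte's high bit ends up as the lowest bit of the 8th character. After seven bytes the collected bits occupy bits 1-7, and a final shift right aligns them to bits 0-6 before output.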

Python Implementation (in gbbsmsgtool.py)

def decode_7bit(compressed_data, stop_at_null=True):
    """Decode 7-bit compressed data to ASCII text."""
    result = []
    i = 0
    # Each complete group of 7 bytes yields 8 characters;
    # a trailing partial group (fewer than 7 bytes) is ignored.
    while i + 6 < len(compressed_data):
        bytes_7 = compressed_data[i:i+7]
        char8 = 0
        chars = []
        for b in bytes_7:
            # Mirror ROR CHAR8: shift right, then move this byte's high bit into bit 7.
            char8 = (char8 >> 1) | (b & 0x80)
            chars.append(b & 0x7F)
        # Final shift aligns the seven collected bits to bits 0-6.
        char8 = char8 >> 1
        chars.append(char8)

        for c in chars:
            if stop_at_null and c == 0:
                return bytes(result).decode('ascii', errors='replace').replace('\r', '\n')
            result.append(c)
        i += 7
    return bytes(result).decode('ascii', errors='replace').replace('\r', '\n')

Key points:

Special Characters

Statistics (from a bulletin file example I used to develop the tool with)

Block Chaining Example

Message starting at block 7:

Note: Block numbers in chains are relative to the data block area, not absolute file offsets.
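
A hedged sketch of chain following over the data block area. It assumes the chain pointer in bytes 126-127 is a little-endian 16-bit block number and that 0 (or any out-of-range value) ends the chain; neither detail is confirmed here, and follow_chain is an illustrative name.

```python
def follow_chain(data_area: bytes, start_block: int, block_size: int = 128) -> list:
    """Return the 1-based block numbers visited, starting at start_block."""
    total_blocks = len(data_area) // block_size
    chain = []
    seen = set()
    block = start_block
    # Stop on an out-of-range pointer or when a block repeats (loop guard).
    while 1 <= block <= total_blocks and block not in seen:
        seen.add(block)
        chain.append(block)
        offset = (block - 1) * block_size
        # Assumed encoding: little-endian pointer in the last two bytes.
        block = int.from_bytes(data_area[offset + 126:offset + 128], "little")
    return chain
```

The loop guard also keeps a self-referencing pointer from causing an endless loop.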

Message Formats

Bulletin Board Message Format

Each decoded bulletin message follows this structure:

<Subject line>
<To line: user_id,username>
<From line: user_id,username (#user_id)>
Date : MM/DD/YY  HH:MM:SS [AM/PM]

<Message body text>

Example:

Re: Hahahaha
0,IronKnight (#5)
6,Shortround (#6)
Date : 01/04/88  08:36:52 PM

do you have to keep on doing that?
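
A decoded bulletin message with this layout can be split into fields with a short Python sketch. The function names and the returned dictionary keys are illustrative; the To and From lines are kept as raw strings, and the fallback to a 24-hour time is an assumption based on AM/PM being shown as optional.

```python
from datetime import datetime

def parse_date_line(line: str):
    """Parse 'Date : MM/DD/YY  HH:MM:SS [AM/PM]'; return None if it does not match."""
    _, _, rest = line.partition(":")
    stamp = " ".join(rest.split())  # collapse runs of spaces
    for fmt in ("%m/%d/%y %I:%M:%S %p", "%m/%d/%y %H:%M:%S"):
        try:
            return datetime.strptime(stamp, fmt)
        except ValueError:
            continue
    return None

def parse_bulletin(text: str) -> dict:
    """Split decoded bulletin text into subject, to, from, date, and body."""
    lines = text.split("\n")
    if len(lines) < 4:
        raise ValueError("expected at least four header lines")
    return {
        "subject": lines[0],
        "to": lines[1],
        "from": lines[2],
        "date": parse_date_line(lines[3]),
        "body": "\n".join(lines[4:]).strip("\n"),
    }
```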

Field Descriptions:

Message Start Pattern (used by gbbsmsgtool.py to identify message starts):

Email Message Format

Each decoded email message follows this structure:

<from_user_id>
Subj : <subject>
From : <username> (#<user_id>)
Date : MM/DD/YY  HH:MM:SS [AM/PM]

<message body>

Example:

3
Subj : Test Message
From : The Wook (#3)
Date : 01/19/88 07:23:48 PM

Hey, just testing the mail system...
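
The email layout can be parsed along the same lines. This is a sketch that assumes the first four lines always appear in the order shown; parse_email and the dictionary keys are hypothetical names.

```python
import re

# Matches "From : <username> (#<user_id>)"
FROM_RE = re.compile(r"^From : (?P<name>.*) \(#(?P<id>\d+)\)$")

def parse_email(text: str) -> dict:
    """Split one decoded email message into its header fields and body."""
    lines = text.strip("\n").split("\n")
    if len(lines) < 4 or not lines[1].startswith("Subj : "):
        raise ValueError("not an email-format message")
    match = FROM_RE.match(lines[2])
    if match is None:
        raise ValueError("malformed From line")
    return {
        "from_user_id": int(lines[0]),
        "subject": lines[1][len("Subj : "):],
        "from_name": match["name"],
        "from_number": int(match["id"]),
        "date_line": lines[3],
        "body": "\n".join(lines[4:]).strip("\n"),
    }
```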

Field Descriptions:

Note: Email messages do NOT have a “To” line in the stored data. The recipient is implicit: it is the user whose chain contains the message (directory entry N = User ID N). gbbsmsgtool.py adds a “To:” line when extracting if the USERS file is provided.

Known Issues and Limitations

  1. Character decoding: Some characters may decode incorrectly due to:

  2. Bulletin format directory entries: Not all directory entries point to message starts. Some may point to:

    gbbsmsgtool.py handles this by using the message start pattern to identify valid messages.

  3. Email format limitations:

  4. Self-referencing chain pointers: Some databases contain blocks whose chain pointer points to itself (block N -> block N). This appears to be a bug in ACOS where the current block number was written instead of the next block number. gbbsmsgtool.py handles this by checking whether the next sequential block is a valid continuation.
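
The self-reference workaround might look roughly like this in Python. The validity test for a continuation block is not specified here, so it is passed in as a callable; resolve_next_block is a hypothetical name, and treating a pointer of 0 as end of chain is an assumption.

```python
def resolve_next_block(current: int, pointer: int, is_valid_continuation):
    """Return the next block number to read, or None when the chain ends."""
    if pointer == 0:
        return None  # assumed end-of-chain marker
    if pointer != current:
        return pointer
    # Pointer refers to its own block: try the next sequential block instead.
    candidate = current + 1
    return candidate if is_valid_continuation(candidate) else None
```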

Unknown / To Be Determined

Header Fields

Message Linking

Message Extraction Algorithm

Working Decoder (Verified 2026-02-04)

Status: Core algorithm working and tested on multiple database files. Handles both bulletin and email formats with auto-detection.

The complete message extraction process:

  1. Auto-detect Format
  2. Read Directory Entries (0x88-0x107)
  3. Follow Block Chains
  4. Decode 7-bit Compression
  5. Handle Continuation Blocks with Date Headers
  6. Output Format

Key Implementation Details

Tools

gbbsmsgtool.py - Unified Message Database Tool

Consolidated tool for analyzing and extracting messages from GBBS Pro message database files.

Key Features:

Commands

analyze - Display database statistics and block allocation map

python3 gbbsmsgtool.py analyze <filename>

Shows:

  - File size and block statistics
  - Header (MSGINFO) breakdown
  - File layout (bitmap, directory, data offsets)
  - Block usage breakdown (allocated, deleted, orphaned, never used)
  - Visual block map with status indicators (bulletin format only)

Block map legend (bulletin format):

  - [H] = Active header (directory entry points here, message start)
  - [C] = Active chain (continuation of active message)
  - [D] = Deleted header (message start pattern, not in directory)
  - [d] = Deleted chain (continuation of deleted message)
  - [o] = Orphaned block (has data but no header or chain)
  - [ ] = Unused (never used or zeroed out)

Block breakdown (bulletin format):

extract - Extract messages from database

python3 gbbsmsgtool.py extract <filename> <type> [options]

Required - specify extraction type:

Optional flags:

File Protection: By default, the tool will abort with an error if output files already exist. Use --force to overwrite existing files.
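
One simple way to get this behavior in Python is exclusive-create mode, which fails if the file already exists. This is a sketch of the idea, not necessarily how gbbsmsgtool.py implements it, and write_output is a hypothetical helper.

```python
def write_output(path, text: str, force: bool = False) -> None:
    """Write text to path; refuse to overwrite unless force is True."""
    # Mode "x" raises FileExistsError when the file is already present.
    mode = "w" if force else "x"
    with open(path, mode, encoding="ascii", errors="replace") as handle:
        handle.write(text)
```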

Examples:

# Extract active messages to stdout
python3 gbbsmsgtool.py extract B5 --active

# Extract all types to directory
python3 gbbsmsgtool.py extract B5 --all --output-dir B5_messages

# Extract only deleted messages
python3 gbbsmsgtool.py extract B5 --deleted --output-dir B5_deleted

# Extract email with user names
python3 gbbsmsgtool.py extract MAIL --active --users USERS --output-dir MAIL_messages

# Force overwrite existing files
python3 gbbsmsgtool.py extract B5 --active --output-dir B5_messages --force

USERS File Support

An optional USERS file can be provided to display recipient names for email messages.

USERS File Format (standard GBBS Pro, may vary if modified by sysops):

Record structure:

The tool only reads the Full_name field for display purposes.

Email Message Output with USERS file:

To: Drone (#1)
3
Subj : Test Message
From : The Wook (#3)
Date : 01/19/88 07:23:48 PM

Message body...

Email Message Output without USERS file:

To: User ID 1 (#1)
3
Subj : Test Message
From : The Wook (#3)
Date : 01/19/88 07:23:48 PM

Message body...
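
The two output variants differ only in how the recipient name is resolved. A minimal sketch, assuming the names have already been loaded into a dictionary keyed by user ID (the loading step and the name to_line are not taken from the tool):

```python
def to_line(user_id: int, names=None) -> str:
    """Build the 'To:' line, falling back to 'User ID N' when no name is known."""
    name = (names or {}).get(user_id) or f"User ID {user_id}"
    return f"To: {name} (#{user_id})"

def with_recipient(message: str, user_id: int, names=None) -> str:
    """Prepend the 'To:' line to a decoded email message."""
    return to_line(user_id, names) + "\n" + message
```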

Output Format

Stdout mode: Messages are written to stdout with separators between types when using --all.

Directory mode: Messages are written as individual files:

File timestamps are set to the ‘Date:’ timestamp from the message header when available.
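
Setting a file's timestamps from a parsed message date can be done with os.utime. The sketch below assumes the date has already been parsed into a datetime and treats it as local time; stamp_file is an illustrative name.

```python
import os
from datetime import datetime

def stamp_file(path, when: datetime) -> None:
    """Set both access and modification times of path to the message date."""
    seconds = when.timestamp()  # naive datetimes are interpreted as local time
    os.utime(path, (seconds, seconds))
```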

Message Type Categories

Active Messages:

Deleted Messages: (Bulletin format only) Messages that have been removed from the directory but still have their header block (containing message start pattern) intact. These are complete messages that can be fully reconstructed by following their block chains through unused space.

Orphaned Blocks: (Bulletin format only) Data blocks that contain readable content but lack a message header. These are fragments from:

Orphaned blocks are extracted by following their chain pointers as far as possible through unused space.

Never Used: Blocks that are all nulls or contain minimal data (< 10 non-null bytes). These blocks have never been written to or were explicitly zeroed.
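
The never-used test reduces to counting non-null bytes. In this sketch the count covers the whole 128-byte block, which is an assumption; the tool may count only the data bytes. The name is_never_used is illustrative.

```python
def is_never_used(block: bytes, threshold: int = 10) -> bool:
    """True when the block holds fewer than `threshold` non-null bytes."""
    non_null = sum(1 for value in block if value != 0)
    return non_null < threshold
```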

Extraction Algorithm

Bulletin Format - Active Messages:

  1. Read directory entries
  2. For each valid entry, follow block chain pointers (bytes 126-127)
  3. Decode 7-bit compressed data from each block
  4. Stop at null terminator or end of chain
  5. Number by directory entry (preserves chronological order)

Bulletin Format - Deleted Messages:

  1. Scan unused blocks for message start pattern (subject/to/from/date)
  2. Follow block chains through unused space only
  3. Stop when chain enters allocated space or hits null terminator
  4. Sort by timestamp and number sequentially

Bulletin Format - Orphaned Blocks:

  1. Find unused blocks with data but no message start pattern
  2. Skip blocks already included in deleted message chains
  3. Follow block chains through unused space
  4. Stop when chain enters allocated space or hits null terminator
  5. Number by starting block number

Email Format - Active Messages:

  1. Read directory entries (entry N = User ID N)
  2. For each non-zero entry, follow block chain pointers
  3. Decode 7-bit compressed data from complete chain
  4. Split on EOT (0x04) to extract individual messages
  5. Each message is TO that user (implicit recipient)
  6. Sort all messages by date
  7. If USERS file provided, prepend “To: Name (#ID)” to each message
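
Steps 4 through 6 can be sketched as follows, starting from text that has already been decoded for each user's chain. The function names are illustrative, and messages whose date cannot be parsed are sorted first as an arbitrary choice.

```python
from datetime import datetime

EOT = "\x04"

def split_on_eot(decoded_chain: str) -> list:
    """Split one user's decoded chain into individual messages."""
    return [part.strip("\r\n") for part in decoded_chain.split(EOT) if part.strip()]

def message_date(message: str) -> datetime:
    """Parse the 'Date :' line; fall back to datetime.min if missing or malformed."""
    for line in message.split("\n"):
        if line.startswith("Date :"):
            stamp = " ".join(line[len("Date :"):].split())
            try:
                return datetime.strptime(stamp, "%m/%d/%y %I:%M:%S %p")
            except ValueError:
                break
    return datetime.min

def collect_mail(chains: dict) -> list:
    """chains maps recipient user ID to decoded text; returns (date, user_id, message)."""
    found = []
    for user_id, decoded in chains.items():
        for message in split_on_eot(decoded):
            found.append((message_date(message), user_id, message))
    return sorted(found, key=lambda item: item[0])
```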

References

Most of this work was reverse engineering, guided by assumptions I already had about the file format. I learned about the 7-bit compression from the author himself in the late 1980s, and hex dumps of the files back then gave me a good idea of what was going on. Not all of the bytes were accounted for, though, so I consulted a newsletter from the period to confirm the compression format and read the ACOS and GBBS Pro source code to see how everything else worked. I also looked at some ACOS tutorials on textfiles.com, but they don’t really explain what the msg() function is or how to use it fully.

Enjoy!

Brian J. Bernstein, February 2026