GEOS Font Format
GEOS font files are VLIR files in which each available point size corresponds to a VLIR record. The 9 point font is stored in record 9, the 12 point font in record 12, and so on.
The data for each record starts with an 8-byte header, containing the font's metrics and offsets to a bitmap and an x-coordinate table. All multibyte values are little endian.
GEOS Font Header | ||
---|---|---|
$00 | ascent | 1 |
$01-$02 | row length | 2 |
$03 | height | 1 |
$04-$05 | offset to x-coord table | 2 |
$06-$07 | offset to bitmap | 2 |
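The header layout above maps directly onto a fixed little-endian structure. The following is a minimal sketch of decoding it; the names `FontHeader` and `parse_font_header` are made up for illustration and are not part of any existing tool.

```python
import struct
from typing import NamedTuple


class FontHeader(NamedTuple):
    """Fields of the 8-byte header, in the order they appear in the record."""
    ascent: int
    row_length: int
    height: int
    xcoord_offset: int
    bitmap_offset: int


def parse_font_header(record: bytes) -> FontHeader:
    """Decode the 8-byte header at the start of a font record."""
    if len(record) < 8:
        raise ValueError("record is shorter than the 8-byte header")
    # '<' = little endian, no padding; B = 1-byte unsigned, H = 2-byte unsigned.
    return FontHeader(*struct.unpack_from("<BHBHH", record, 0))
```

For a typical record, `xcoord_offset` would come back as `0x08` and `bitmap_offset` as `0xCA`.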
The bitmap usually starts at $CA and consists of height rows of row length bytes each. The bitmap contains all the font's glyphs concatenated horizontally in ASCII order.
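As a rough illustration of the row-by-row bitmap layout, the sketch below reads a single pixel. It assumes that the leftmost pixel of each byte is the most significant bit; that bit order is not stated above, so treat it as an assumption. The function name `pixel_at` is hypothetical.

```python
def pixel_at(record: bytes, bitmap_offset: int, row_length: int,
             x: int, y: int) -> int:
    """Return 1 if the pixel at column x, row y of the font bitmap is set."""
    # Each row occupies row_length bytes; 8 pixels are packed per byte.
    byte = record[bitmap_offset + y * row_length + (x >> 3)]
    # ASSUMPTION: bit 7 of each byte is the leftmost pixel.
    return (byte >> (7 - (x & 7))) & 1
```

A glyph's pixels are then the columns from its starting x-coordinate up to, but not including, its ending x-coordinate, for every row from 0 to height - 1.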
The x-coordinate table usually starts at $08 and contains exactly 97 entries, one for each ASCII character from $20 to $7F (inclusive) plus the total width of the bitmap. The first entry gives the x-coordinate where the space character begins, the second entry gives the x-coordinate where the space character ends and the exclamation mark begins, and so on. The second-to-last entry indicates where the tilde character ends and the DEL character begins, and the last entry indicates where the DEL character ends. The width of each character is determined by the difference between its ending and starting x-coordinates.
X-Coordinate Table | ||
---|---|---|
$08-$09 | start of space | 2 |
$0A-$0B | end of space / start of ! | 2 |
... | ... | ... |
$C6-$C7 | end of ~ / start of DEL | 2 |
$C8-$C9 | end of DEL | 2 |
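The width calculation described above can be sketched as follows. The function name `glyph_spans` and its default arguments are illustrative only; the defaults simply reflect the usual table position and the 96-character range.

```python
import struct


def glyph_spans(record: bytes, xcoord_offset: int = 0x08, count: int = 96):
    """Map each character code to (starting x-coordinate, width in pixels)."""
    # count glyphs need count + 1 little-endian 16-bit x-coordinates.
    xs = struct.unpack_from(f"<{count + 1}H", record, xcoord_offset)
    return {0x20 + i: (xs[i], xs[i + 1] - xs[i]) for i in range(count)}
```

Note that a width computed this way for DEL may come out negative or oversized in fonts whose final x-coordinate is invalid.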
Some third-party GEOS fonts have the final x-coordinate set to an invalid value (less than the previous x-coordinate, or greater than row length × 8, the maximum possible x-coordinate). This normally does not cause a problem within GEOS, as the DEL character cannot be entered by the user and will never be displayed under normal circumstances. However, it is something to watch out for if, say, you're writing a font editor.
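A font editor could guard against this with a check along the following lines. This is only a sketch of the two conditions named above; `final_xcoord_is_valid` is a hypothetical helper, and what to do with an invalid value (clamp it, warn, or treat DEL as zero-width) is left to the caller.

```python
def final_xcoord_is_valid(xcoords, row_length: int) -> bool:
    """Check the last x-coordinate against the two failure modes described above."""
    previous, final = xcoords[-2], xcoords[-1]
    # Valid means: not before the start of DEL, and not past the bitmap's right edge.
    return previous <= final <= row_length * 8
```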
The GEOS file info block for a GEOS font file contains additional information. Starting at $61 there is a list of the lengths (in bytes) of each record. At $80 there is the font ID number, and starting at $82 there is a list of values, each equal to the font ID × 64 + a point size. Both lists have a maximum of 15 entries, and are terminated with a zero entry if they contain fewer than 15 entries. Because each entry in the second list is a 16-bit value with the point size in the low 6 bits and the font ID in the remaining 10 bits, the maximum point size is 63 and the maximum font ID is 1023.
GEOS Font Info Block | ||
---|---|---|
$61-$62 | record length | 2 |
$63-$64 | record length | 2 |
... | ... | ... |
$7B-$7C | record length | 2 |
$7D-$7E | record length | 2 |
$7F | unused | 1 |
$80-$81 | font ID | 2 |
$82-$83 | font ID × 64 + point size | 2 |
$84-$85 | font ID × 64 + point size | 2 |
... | ... | ... |
$9C-$9D | font ID × 64 + point size | 2 |
$9E-$9F | font ID × 64 + point size | 2 |
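The two lists can be read with a short loop each. The sketch below assumes `info_block` is a buffer indexed so that the offsets in the table above apply directly; the function name `parse_font_info` is made up for illustration.

```python
import struct


def parse_font_info(info_block: bytes):
    """Return (font_id, record_lengths, [(font_id, point_size), ...])."""
    if len(info_block) < 0xA0:
        raise ValueError("info block is too short to hold the font fields")
    record_lengths = []
    # Up to 15 little-endian 16-bit lengths at $61-$7E, zero-terminated.
    for (length,) in struct.iter_unpack("<H", info_block[0x61:0x7F]):
        if length == 0:
            break
        record_lengths.append(length)
    (font_id,) = struct.unpack_from("<H", info_block, 0x80)
    sizes = []
    # Up to 15 entries at $82-$9F, each font ID * 64 + point size, zero-terminated.
    for (entry,) in struct.iter_unpack("<H", info_block[0x82:0xA0]):
        if entry == 0:
            break
        sizes.append((entry >> 6, entry & 0x3F))
    return font_id, record_lengths, sizes
```

Splitting each entry with `>> 6` and `& 0x3F` is just the inverse of font ID × 64 + point size.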
Mega fonts are larger fonts designed for desktop publishing that split the font bitmap between multiple VLIR records to avoid memory limitations. Only geoPublish supports these fonts natively.
Records in mega fonts use the same format as standard fonts. However, each record leaves all but 16 characters blank with a width of one pixel. Record 48 contains glyphs for characters $20-$2F, record 49 for $30-$3F, and so on. Finally, record 54 contains an x-coordinate table with proper values for all 96 characters but does not contain a bitmap.
Record | Glyphs Present |
---|---|
48 | $20-$2F |
49 | $30-$3F |
50 | $40-$4F |
51 | $50-$5F |
52 | $60-$6F |
53 | $70-$7F |
54 | all x-coords, no bitmap |
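The mapping in the table above reduces to a little arithmetic on the character code. A minimal sketch, with hypothetical names:

```python
MEGA_FIRST_RECORD = 48     # holds glyphs for $20-$2F
MEGA_METRICS_RECORD = 54   # x-coordinates for all characters, no bitmap


def mega_record_for(code: int) -> int:
    """Return the VLIR record holding the glyph bitmap for a character code."""
    if not 0x20 <= code <= 0x7F:
        raise ValueError("mega fonts only cover $20-$7F")
    # Each record carries a block of 16 consecutive characters.
    return MEGA_FIRST_RECORD + ((code - 0x20) >> 4)
```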
Bits'N'Picas detects mega fonts by the presence of a record 54 with a bitmap offset of the length of the record or greater.
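That detection rule might be expressed roughly as follows. This is a sketch of the rule as described, not the actual Bits'N'Picas code; `records` is assumed to be a dictionary mapping VLIR record numbers to their raw bytes.

```python
import struct


def looks_like_mega_font(records: dict) -> bool:
    """Apply the record-54 test described above to a set of VLIR records."""
    rec = records.get(54)
    if rec is None or len(rec) < 8:
        return False
    # The bitmap offset lives at $06-$07 of the header, little endian.
    (bitmap_offset,) = struct.unpack_from("<H", rec, 6)
    # No bitmap present: the offset points at or past the end of the record.
    return bitmap_offset >= len(rec)
```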
Bits'N'Picas can generate GEOS fonts with additional tables for kerning and Unicode support. These fonts contain additional fields in the header at the start of a record for offsets to a kerning table and a UTF-8 master table (more on that later).
Extended GEOS Font Header | ||
---|---|---|
$00 | ascent | 1 |
$01-$02 | row length | 2 |
$03 | height | 1 |
$04-$05 | offset to x-coord table | 2 |
$06-$07 | offset to bitmap | 2 |
$08-$09 | extended header flags | 2 |
$0A-$0B | offset to kerning table | 2 |
$0C-$0D | offset to UTF-8 master table | 2 |
The flag word at $08-$09 determines whether this is an extended header and which additional tables are present. The header is an extended header and the rest of the flags are valid only if the most significant bit is set. In a standard GEOS font, this location will instead contain the first x-coordinate, which of course is unlikely to have the most significant bit set.
Bit | Mask | Value |
---|---|---|
15 | $8000 | If set, the rest of the flags are valid. If cleared, the flag word is actually the first x-coordinate. |
14 | $4000 | If set, this is an abbreviated font with only 64 glyphs (more on that later). If cleared, 96 glyphs are present. |
13 | $2000 | If set, the font contains a kerning table, and $0A-$0B contains an offset to the kerning table. |
12 | $1000 | If set, the font contains UTF-8 tables, and $0C-$0D contains an offset to the UTF-8 master table. |
11 | $0800 | Reserved, should be zero. |
10 | $0400 | Reserved, should be zero. |
... | ... | Reserved, should be zero. |
1 | $0002 | Reserved, should be zero. |
0 | $0001 | Reserved, should be zero. |
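Decoding the flag word is a matter of testing the masks in the table above. A minimal sketch follows; the constant and function names are invented for illustration.

```python
import struct

FLAG_EXTENDED = 0x8000     # bit 15: the remaining flags are valid
FLAG_ABBREVIATED = 0x4000  # bit 14: 64 glyphs instead of 96
FLAG_KERNING = 0x2000      # bit 13: kerning table offset at $0A-$0B
FLAG_UTF8 = 0x1000         # bit 12: UTF-8 master table offset at $0C-$0D


def decode_flags(record: bytes):
    """Return a dict of flags, or None if this is a standard (non-extended) header."""
    (word,) = struct.unpack_from("<H", record, 0x08)
    if not word & FLAG_EXTENDED:
        # In a standard font this word is really the first x-coordinate.
        return None
    return {
        "abbreviated": bool(word & FLAG_ABBREVIATED),
        "kerning": bool(word & FLAG_KERNING),
        "utf8": bool(word & FLAG_UTF8),
        "reserved_bits": word & 0x0FFF,  # should be zero
    }
```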
If both bits 15 and 13 of the flag word are set, the font contains a kerning table, and $0A-$0B contains an offset to the kerning table. The kerning table contains exactly 96 entries, one for each ASCII character from $20 to $7F, each of which consists of a (signed) x offset and an (unsigned) advance width. (A font without a kerning table behaves as if every character had an x offset of zero and an advance width equal to the difference between its ending and starting x-coordinates.)
Kerning Table | ||
---|---|---|
+$00 | x offset for space | 1 |
+$01 | advance width for space | 1 |
+$02 | x offset for ! | 1 |
+$03 | advance width for ! | 1 |
... | ... | ... |
+$BC | x offset for ~ | 1 |
+$BD | advance width for ~ | 1 |
+$BE | x offset for DEL | 1 |
+$BF | advance width for DEL | 1 |
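Reading the kerning table amounts to unpacking alternating signed and unsigned bytes. The sketch below assumes the signed x offset is stored in two's complement, which is not spelled out above; `parse_kerning` is a hypothetical name.

```python
import struct


def parse_kerning(record: bytes, kerning_offset: int, count: int = 96):
    """Map each character code to (x_offset, advance_width)."""
    # 'b' = signed byte (x offset), 'B' = unsigned byte (advance width).
    # ASSUMPTION: negative x offsets use two's complement.
    fields = struct.unpack_from("<" + "bB" * count, record, kerning_offset)
    return {0x20 + i: (fields[2 * i], fields[2 * i + 1]) for i in range(count)}
```

For an abbreviated font, `count` would be 64 rather than 96.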
In mega fonts, the kerning table is present in all records if it is present at all. In records 48 to 53, the entries for the glyphs not in that record are set to zero. In record 54, all entries are populated with proper values.
Bits'N'Picas places the kerning table between the x-coordinate table and the bitmap.
If both bits 15 and 12 of the flag word are set, the font contains UTF-8 tables, and $0C-$0D contains an offset to the UTF-8 master table. The UTF-8 tables are considerably more complicated than the other tables encountered so far, so they require a good deal more explanation.
In mega fonts, only record 54 contains UTF-8 tables.
Bits'N'Picas places the UTF-8 tables after the bitmap.
UTF-8 is a variable-length encoding of Unicode. In UTF-8, a single character is encoded using a sequence of anywhere from 1 to 4 bytes.
A single byte of $00 to $7F encodes the characters U+0000 to U+007F, also known as ASCII. These are handled as in standard GEOS fonts.
A two-byte sequence, consisting of a lead byte of $C0 to $DF followed by a continuation byte of $80 to $BF, encodes the characters U+0080 to U+07FF. (U+0000 to U+007F can also be encoded this way; however, this is called an overlong encoding and is forbidden in valid UTF-8.) The extended GEOS font format calls these low characters.
A three-byte sequence, consisting of a lead byte of $E0 to $EF followed by two continuation bytes, encodes the characters U+0800 to U+FFFF. (Again, U+0000 to U+07FF can also be encoded this way, but it is not valid UTF-8.) The extended GEOS font format calls these high characters.
A four-byte sequence, consisting of a lead byte of $F0 to $F5 followed by three continuation bytes, encodes the characters U+010000 to U+10FFFF. (Again, U+0000 to U+FFFF can also be encoded this way, but it is not valid UTF-8. This encoding also covers U+110000 to U+17FFFF, but those are not valid Unicode values at all.) These are called astral characters.
A lead byte of $F6 to $FF is invalid.
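The lead byte ranges above can be summarised as a small classifier. This is only a sketch using the terms from this page; the function name and the returned strings are illustrative.

```python
def classify_lead_byte(b: int) -> str:
    """Classify a byte by the role it plays at the start of a sequence."""
    if not 0x00 <= b <= 0xFF:
        raise ValueError("not a byte value")
    if b <= 0x7F:
        return "ascii"    # single byte, handled as in a standard font
    if b <= 0xBF:
        return "invalid"  # continuation byte, cannot start a sequence
    if b <= 0xDF:
        return "low"      # two-byte sequence
    if b <= 0xEF:
        return "high"     # three-byte sequence
    if b <= 0xF5:
        return "astral"   # four-byte sequence
    return "invalid"      # $F6-$FF
```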
The structure of the UTF-8 tables closely mirrors this encoding scheme, hence their name.
A frequently-occurring structure in the UTF-8 tables is the abbreviated font pointer. This is a pointer to an abbreviated font in another VLIR record.
Abbreviated Font Pointer | ||
---|---|---|
+$00 | VLIR record number | 1 |
+$01 | sector offset | 1 |
+$02-$03 | length | 2 |
An abbreviated font resembles a GEOS font but only contains 64 characters (its x-coordinate table will contain only 65 entries and its kerning table, if present, will contain only 64 entries). There can be many abbreviated fonts in a single VLIR record, so the pointer includes a sector offset where the abbreviated font begins within the record.
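An abbreviated font pointer is a 4-byte structure and can be unpacked in one call. The sketch below treats an all-zero pointer as "no font", on the assumption that a zeroed entry means there are no glyphs for that range; the class and function names are hypothetical.

```python
import struct
from typing import NamedTuple, Optional


class AbbrFontPointer(NamedTuple):
    record: int         # VLIR record number
    sector_offset: int  # sector within the record where the font begins
    length: int         # length field, little endian


def parse_afp(data: bytes, offset: int) -> Optional[AbbrFontPointer]:
    """Decode the abbreviated font pointer at data[offset:offset + 4]."""
    record, sector_offset, length = struct.unpack_from("<BBH", data, offset)
    # ASSUMPTION: four zero bytes mean that no abbreviated font is present.
    if record == 0 and sector_offset == 0 and length == 0:
        return None
    return AbbrFontPointer(record, sector_offset, length)
```

How a sector offset translates into a byte offset within the record depends on how the record was read from disk, so that step is deliberately left out here.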
Bits'N'Picas places abbreviated fonts starting in record 125. Whenever the current record is about to exceed 48 sectors (just under 12K), it moves on to the next lower record number.
There is also a complete list of all abbreviated font pointers stored in record 126. None of the records described here are listed in the info block.
The extended font header points to the UTF-8 master table. The UTF-8 master table consists of abbreviated font pointers or subtable offsets for every valid UTF-8 lead byte. If there are no glyphs in the font for the character range covered by a certain lead byte, that lead byte's entry will be set to zero. (Entries for $C0 and $C1, which can only be used for overlong encodings, and for $F5, which can only encode values beyond U+10FFFF, should also be set to zero.)
UTF-8 Master Table | ||
---|---|---|
+$00-$03 | abbreviated font pointer for lead byte $C0 (U+0000-U+003F) | 4 |
+$04-$07 | abbreviated font pointer for lead byte $C1 (U+0040-U+007F) | 4 |
+$08-$0B | abbreviated font pointer for lead byte $C2 (U+0080-U+00BF) | 4 |
+$0C-$0F | abbreviated font pointer for lead byte $C3 (U+00C0-U+00FF) | 4 |
... | ... | ... |
+$78-$7B | abbreviated font pointer for lead byte $DE (U+0780-U+07BF) | 4 |
+$7C-$7F | abbreviated font pointer for lead byte $DF (U+07C0-U+07FF) | 4 |
+$80-$81 | offset to high subtable for lead byte $E0 (U+0000-U+0FFF) | 2 |
+$82-$83 | offset to high subtable for lead byte $E1 (U+1000-U+1FFF) | 2 |
... | ... | ... |
+$9C-$9D | offset to high subtable for lead byte $EE (U+E000-U+EFFF) | 2 |
+$9E-$9F | offset to high subtable for lead byte $EF (U+F000-U+FFFF) | 2 |
+$A0-$A1 | offset to astral subtable for lead byte $F0 (U+000000-U+03FFFF) | 2 |
+$A2-$A3 | offset to astral subtable for lead byte $F1 (U+040000-U+07FFFF) | 2 |
+$A4-$A5 | offset to astral subtable for lead byte $F2 (U+080000-U+0BFFFF) | 2 |
+$A6-$A7 | offset to astral subtable for lead byte $F3 (U+0C0000-U+0FFFFF) | 2 |
+$A8-$A9 | offset to astral subtable for lead byte $F4 (U+100000-U+13FFFF) | 2 |
+$AA-$AB | offset to astral subtable for lead byte $F5 (U+140000-U+17FFFF) | 2 |
Entries for lead bytes for two-byte sequences will point to an abbreviated font. The continuation byte will determine which of the 64 characters in the abbreviated font will be rendered.
Entries for lead bytes for three-byte sequences will point to a high character subtable with 64 abbreviated font pointers. See below.
Entries for lead bytes for four-byte sequences will point to an astral character subtable with 64 offsets to a high character subtable. See below.
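The position of a lead byte's entry within the master table follows from the layout above: 32 four-byte pointers, then 16 two-byte offsets, then 6 more two-byte offsets. A minimal sketch, with a hypothetical function name:

```python
def master_entry(lead: int):
    """Return (offset within the master table, entry size in bytes) for a lead byte."""
    if 0xC0 <= lead <= 0xDF:
        return (lead - 0xC0) * 4, 4         # abbreviated font pointer
    if 0xE0 <= lead <= 0xEF:
        return 0x80 + (lead - 0xE0) * 2, 2  # offset to high subtable
    if 0xF0 <= lead <= 0xF5:
        return 0xA0 + (lead - 0xF0) * 2, 2  # offset to astral subtable
    raise ValueError("not a multi-byte lead byte")
```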
Master table entries for lead bytes for three-byte sequences will point to a high character subtable with 64 abbreviated font pointers, one for each possible continuation byte. The first continuation byte in the sequence will determine which of the 64 entries in the high character subtable to follow, and the second continuation byte will determine which of the 64 characters in the abbreviated font will be rendered. If there are no glyphs in the font for the character range covered by a certain continuation byte, that continuation byte's entry will be set to zero.
UTF-8 High Character Subtable | ||
---|---|---|
+$00-$03 | abbreviated font pointer for continuation byte $80 | 4 |
+$04-$07 | abbreviated font pointer for continuation byte $81 | 4 |
... | ... | ... |
+$F8-$FB | abbreviated font pointer for continuation byte $BE | 4 |
+$FC-$FF | abbreviated font pointer for continuation byte $BF | 4 |
Master table entries for lead bytes for four-byte sequences will point to an astral character subtable with 64 offsets to a high character subtable, one for each possible continuation byte. The first continuation byte in the sequence will determine which of the 64 entries in the astral character subtable to follow, the second continuation byte will determine which of the 64 entries in the high character subtable to follow, and the final continuation byte will determine which of the 64 characters in the abbreviated font will be rendered. (Whew.) If there are no glyphs in the font for the character range covered by a certain continuation byte, that continuation byte's entry will be set to zero.
UTF-8 Astral Character Subtable | ||
---|---|---|
+$00-$01 | offset to high subtable for continuation byte $80 | 2 |
+$02-$03 | offset to high subtable for continuation byte $81 | 2 |
... | ... | ... |
+$7C-$7D | offset to high subtable for continuation byte $BE | 2 |
+$7E-$7F | offset to high subtable for continuation byte $BF | 2 |
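Both subtables are indexed by the low six bits of a continuation byte, with 4-byte entries in the high character subtable and 2-byte entries in the astral character subtable. The helper names below are made up for illustration.

```python
def _cont_index(cont: int) -> int:
    """Return the 0-63 index carried by a continuation byte."""
    if not 0x80 <= cont <= 0xBF:
        raise ValueError("not a continuation byte")
    return cont & 0x3F


def high_subtable_entry_offset(cont: int) -> int:
    """Offset of the abbreviated font pointer for this continuation byte."""
    return _cont_index(cont) * 4


def astral_subtable_entry_offset(cont: int) -> int:
    """Offset of the high subtable offset for this continuation byte."""
    return _cont_index(cont) * 2
```

The same 0-63 index, taken from the final continuation byte, presumably selects the glyph within the abbreviated font.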
Lead Byte | | Cont Byte | | Cont Byte | | Cont Byte | |
---|---|---|---|---|---|---|---|
$00-$7F | char. in standard font | | | | | | |
$80-$BF | invalid | | | | | | |
$C0-$DF | AFP in master table | $80-$BF | char. in abbr. font | | | | |
$E0-$EF | offset to HCS in master table | $80-$BF | AFP in high subtable | $80-$BF | char. in abbr. font | | |
$F0-$F5 | offset to ACS in master table | $80-$BF | offset to HCS in astral subtable | $80-$BF | AFP in high subtable | $80-$BF | char. in abbr. font |
$F6-$FF | invalid | | | | | | |
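Putting the summary table together, a lookup for one multi-byte sequence might be sketched as below. This is not the Bits'N'Picas implementation. It assumes that subtable offsets are measured from the start of the record (like the header offsets), that a zero offset or all-zero pointer means "no glyphs", and that the final continuation byte's low six bits select the glyph; `resolve` is a hypothetical name.

```python
import struct


def resolve(record: bytes, master_offset: int, seq: bytes):
    """Return ((record_no, sector_offset, length), glyph_index) or None."""
    lead, conts = seq[0], seq[1:]
    if any(not 0x80 <= c <= 0xBF for c in conts):
        raise ValueError("bad continuation byte")

    def word(at: int) -> int:
        return struct.unpack_from("<H", record, at)[0]

    if 0xC0 <= lead <= 0xDF and len(conts) == 1:
        afp_at = master_offset + (lead - 0xC0) * 4
    elif 0xE0 <= lead <= 0xEF and len(conts) == 2:
        hcs = word(master_offset + 0x80 + (lead - 0xE0) * 2)
        if hcs == 0:
            return None
        # ASSUMPTION: hcs is an offset from the start of the record.
        afp_at = hcs + (conts[0] & 0x3F) * 4
    elif 0xF0 <= lead <= 0xF5 and len(conts) == 3:
        acs = word(master_offset + 0xA0 + (lead - 0xF0) * 2)
        if acs == 0:
            return None
        hcs = word(acs + (conts[0] & 0x3F) * 2)
        if hcs == 0:
            return None
        afp_at = hcs + (conts[1] & 0x3F) * 4
    else:
        raise ValueError("not a valid multi-byte sequence")

    afp = struct.unpack_from("<BBH", record, afp_at)
    if afp == (0, 0, 0):
        return None
    return afp, conts[-1] & 0x3F
```

The caller would then load the abbreviated font named by the pointer and render the glyph at the returned index.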