imessage-exporter: Reverse Engineering Apple's typedstream Format
imessage-exporter’s goal is to provide the most comprehensive representation of iMessage data available. Message data is stored in a legacy format that appears to be a stream that represents objects.
Originally, imessage-exporter used a naive algorithm to extract text data from this blob and inferred other context from the surrounding table data. However, as Apple introduced new iMessage features[1], additional information was stored only in this blob.
Since typedstream contains critical message content, imessage-exporter must understand the format in a platform-agnostic way. This post explores the reverse engineering process, revealing the structure and logic behind this proprietary binary serialization protocol.
In the iMessage database, message body data is stored in a BLOB column called attributedBody that appears to describe an instance of NSMutableAttributedString.
If we save a blob into a file called sample and inspect it with the file program, it emits:
❯ file sample
sample: NeXT/Apple typedstream data, little endian, version 4, system 1000
The system recognizes this blob, so let’s examine its contents.
typedstream Origins
The typedstream format is a binary serialization protocol designed for C and Objective-C data structures. It is primarily used by Apple’s Foundation framework, specifically within their internal implementation of NSArchiver and NSUnarchiver. While those classes are the public APIs, typedstream is the underlying implementation detail.
The format itself is not part of the official Foundation specification, meaning other implementations use different approaches. This also means Apple’s typedstream remains largely undocumented: it was never intended to be a cross-platform standard, rather it is Apple’s internal solution for object serialization. In fact, archived documentation makes no reference to the typedstream format or its implementation.
An example of a simple iMessage attributedBody looks like:
00000000 04 0b 73 74 72 65 61 6d 74 79 70 65 64 81 e8 03 |..streamtyped...|
00000010 84 01 40 84 84 84 19 4e 53 4d 75 74 61 62 6c 65 |..@....NSMutable|
00000020 41 74 74 72 69 62 75 74 65 64 53 74 72 69 6e 67 |AttributedString|
00000030 00 84 84 12 4e 53 41 74 74 72 69 62 75 74 65 64 |....NSAttributed|
00000040 53 74 72 69 6e 67 00 84 84 08 4e 53 4f 62 6a 65 |String....NSObje|
00000050 63 74 00 85 92 84 84 84 0f 4e 53 4d 75 74 61 62 |ct.......NSMutab|
00000060 6c 65 53 74 72 69 6e 67 01 84 84 08 4e 53 53 74 |leString....NSSt|
00000070 72 69 6e 67 01 95 84 01 2b 0a 4e 6f 74 65 72 20 |ring....+.Noter |
00000080 74 65 73 74 86 84 02 69 49 01 0a 92 84 84 84 0c |test...iI.......|
00000090 4e 53 44 69 63 74 69 6f 6e 61 72 79 00 95 84 01 |NSDictionary....|
000000a0 69 01 92 84 98 98 1d 5f 5f 6b 49 4d 4d 65 73 73 |i......__kIMMess|
000000b0 61 67 65 50 61 72 74 41 74 74 72 69 62 75 74 65 |agePartAttribute|
000000c0 4e 61 6d 65 86 92 84 84 84 08 4e 53 4e 75 6d 62 |Name......NSNumb|
000000d0 65 72 00 84 84 07 4e 53 56 61 6c 75 65 00 95 84 |er....NSValue...|
000000e0 01 2a 84 9b 9b 00 86 86 86 |.*....... |
After extracting several samples, we can start to infer some patterns from this data.
The first 16 bytes are always identical and appear to be some sort of header describing the data structure:
00000000 04 0b 73 74 72 65 61 6d 74 79 70 65 64 81 e8 03 |..streamtyped...|
Breaking this down, we see 2 bytes 04 0b, 11 bytes representing the text streamtyped, and 3 bytes 81 e8 03. Interestingly, 0x0b is 11, which suggests that it describes the length of the data that follows it, leaving us with two unknowns to investigate: the first 0x04, and the final 81 e8 03.
Let’s re-examine the output of the file command:
NeXT/Apple typedstream data, little endian, version 4, system 1000
The first header byte, 0x04, matches the version emitted by file, and the last two bytes e8 03 form 1000 when read as a little-endian u16 (0x03e8), leaving 0x81 to solve for[2].
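To make the layout concrete, here is a minimal Rust sketch that parses the header under the assumptions so far. The `Header` struct and `parse_header` function are hypothetical names, not imessage-exporter's API, and the sketch deliberately skips the unexplained 0x81 byte rather than interpreting it.

```rust
/// Header fields observed so far (hypothetical names, not imessage-exporter's).
#[derive(Debug, PartialEq)]
pub struct Header {
    pub version: u8,
    pub signature: String,
    pub system: u16,
}

/// Parses the first 16 bytes of a typedstream under the assumptions above:
/// a version byte, a length-prefixed signature, one still-unexplained byte
/// (0x81 in every sample), and a little-endian u16 system version.
pub fn parse_header(bytes: &[u8]) -> Option<Header> {
    let version = *bytes.first()?;
    let len = *bytes.get(1)? as usize;
    let signature = std::str::from_utf8(bytes.get(2..2 + len)?).ok()?.to_string();
    // Skip the byte we cannot explain yet, then read two little-endian bytes.
    let rest = bytes.get(2 + len + 1..2 + len + 3)?;
    let system = u16::from_le_bytes([rest[0], rest[1]]);
    Some(Header { version, signature, system })
}
```

Fed the 16 header bytes from the sample, this yields version 4, signature "streamtyped", and system 1000, matching the output of file.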
Each legible text segment in the typedstream is preceded by a byte sequence like 84 xx, where xx varies depending on the data. Strings sometimes end in 0x00, suggesting null-termination[3]. For example:
00000010 84 01 40 84 84 84 19 4e 53 4d 75 74 61 62 6c 65 |..@....NSMutable|
00000020 41 74 74 72 69 62 75 74 65 64 53 74 72 69 6e 67 |AttributedString|
00000030 00 84 84 12 4e 53 41 74 74 72 69 62 75 74 65 64 |....NSAttributed|
00000040 53 74 72 69 6e 67 00 |String.|
Before NSMutableAttributedString, we see 84 19, where 0x19 is 25, the length of the string. Similarly, NSAttributedString is preceded by 84 12, where again 0x12 (18) matches the length of the data.
This pattern suggests that 0x84 signals the beginning of a new data block, and the subsequent byte represents how much data that block contains. Given these are class names, it appears to be some metadata block describing classes, but not the data that the classes contain.
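The length-prefix hypothesis can be expressed as a small reader. This sketch assumes a block starts exactly at a 0x84 byte and that the following byte is a one-byte length; `read_block` is a hypothetical helper, not part of any library.

```rust
/// Reads a block shaped like `84 <len> <len bytes>` starting at `pos`,
/// returning the decoded text and the position just past it.
/// Assumes, as hypothesized above, that 0x84 starts a block and the
/// next byte is the block length.
pub fn read_block(bytes: &[u8], pos: usize) -> Option<(String, usize)> {
    if *bytes.get(pos)? != 0x84 {
        return None;
    }
    let len = *bytes.get(pos + 1)? as usize;
    let start = pos + 2;
    let text = std::str::from_utf8(bytes.get(start..start + len)?).ok()?;
    Some((text.to_string(), start + len))
}
```

Pointed at the second 0x84 in 84 84 12, this returns NSAttributedString and leaves the position on the trailing 0x00, which is why that byte first looks like a null terminator.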
Objects contain more than just class names; they also have fields that contain data owned by the object. The first class name we encounter, NSMutableAttributedString, is documented by Apple and includes a field that contains the string data.
Following the pattern we saw above, let’s find the first 0x84 before the message’s text:
00000070 84 01 2b 0a 4e 6f 74 65 72 20 | ..+.Noter |
00000080 74 65 73 74 86 |test. |
0x84 denotes some type of new data, but the next byte, 0x01, suggests we only need to read one more byte of data: 0x2b, which happens to be the character +. However, the subsequent byte, 0x0a, matches the length of the message text: "Noter test"[4]. The final byte, 0x86, is also repeated at the very end of the stream, suggesting it may indicate the end of some data.
Continuing from our prior assumptions, we see this short blob:
00000080 84 02 69 49 01 0A | ..iI.. |
The 84 02 suggests we need to read two bytes, 0x69 and 0x49, which happen to be i and I. Twice now the stream has a pattern where 0x84 seems to denote some data type, so let’s take a look at the NSMutableAttributedString documentation and see if there are any clues:
The primitive setAttributes(_:range:) method sets attributes and values for a given range of characters, replacing any previous attributes and values for that range.
Under the changing attributes section, many of the methods receive a parameter like range: NSRange. Per the documentation, NSRange encodes a location integer and a length integer.
The i in integer stands out now, because the new data iI in the stream seems to indicate that we are meant to read a pair of integers. The bytes 01 0A follow, where 0A matches the message length and 01 appears to represent the starting character index, suggesting this sequence defines an NSRange spanning the complete message.
The fields for these range objects aren't documented, indicating they're likely private class members. The typedstream format’s deterministic packing ensures NSRanges consistently store their location and length values in sequence, probably as different integer types.
+ Mystery
Let’s translate the stream of bytes into plain language given our current assumptions:
00000080 84 02 69 49 01 0a |..iI..|
We can read this as “a new data type of two bytes, iI, followed by the packed field data of 1 and 10”. The data appears to be stored as u8 since it is only one byte, but that leaves an open question as to how the stream can store larger values[5].
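Under the same assumptions, an inline type tag and its field values could be read as follows. `read_tagged_u8s` is a hypothetical helper that reads exactly one byte per character in the tag, which only holds while every value fits in a single byte.

```rust
/// Decodes a run like `84 02 69 49 01 0a`: a new type tag of `len` characters,
/// followed by one single-byte value per character. Single-byte values are an
/// assumption at this point in the analysis.
pub fn read_tagged_u8s(bytes: &[u8]) -> Option<(String, Vec<u8>)> {
    if *bytes.first()? != 0x84 {
        return None;
    }
    let len = *bytes.get(1)? as usize;
    let tag = std::str::from_utf8(bytes.get(2..2 + len)?).ok()?.to_string();
    // One byte of field data per character in the type tag.
    let values = bytes.get(2 + len..2 + 2 * len)?.to_vec();
    Some((tag, values))
}
```

Applied to 84 02 69 49 01 0a, this yields the tag "iI" with values 1 and 10, the location/length pair discussed above.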
Ignoring that caveat for now, let’s apply that logic to the message text field:
00000070 84 01 2b 0a 4e 6f 74 65 72 20 |..+.Noter |
00000080 74 65 73 74 86 |test.|
We can read this as “new data type of 1 byte, +, followed by the field data for a string of 0x0a length.”
There is one other instance of this pattern early on, right before the class names appear:
00000010 84 01 40 84 84 84 19 4e 53 4d 75 74 61 62 6c 65 |..@....NSMutable|
Since the data following this initial 0x84 appears to describe the class names, the 1 byte 0x40 (@) may indicate the start of a new object instance.
- 0x84 indicates the start of some data blob
- 0x86 indicates the end of some data blob
- 0x81 indicates something that we don’t know yet
- The byte following 0x84 denotes the length of the data blob
- i and I seem to indicate an integer value, stored as a u8
- + seems to indicate a string
- @ seems to indicate a new object instance

Given these assumptions, let us mark the bytes in the stream for which we can make a reasonable guess at their meaning:
00000000 04 0b 73 74 72 65 61 6d 74 79 70 65 64 81 e8 03 |..streamtyped...|
00000010 84 01 40 84 84 84 19 4e 53 4d 75 74 61 62 6c 65 |..@....NSMutable|
00000020 41 74 74 72 69 62 75 74 65 64 53 74 72 69 6e 67 |AttributedString|
00000030 00 84 84 12 4e 53 41 74 74 72 69 62 75 74 65 64 |....NSAttributed|
00000040 53 74 72 69 6e 67 00 84 84 08 4e 53 4f 62 6a 65 |String....NSObje|
00000050 63 74 00 85 92 84 84 84 0f 4e 53 4d 75 74 61 62 |ct.......NSMutab|
00000060 6c 65 53 74 72 69 6e 67 01 84 84 08 4e 53 53 74 |leString....NSSt|
00000070 72 69 6e 67 01 95 84 01 2b 0a 4e 6f 74 65 72 20 |ring....+.Noter |
00000080 74 65 73 74 86 84 02 69 49 01 0a 92 84 84 84 0c |test...iI.......|
00000090 4e 53 44 69 63 74 69 6f 6e 61 72 79 00 95 84 01 |NSDictionary....|
000000a0 69 01 92 84 98 98 1d 5f 5f 6b 49 4d 4d 65 73 73 |i......__kIMMess|
000000b0 61 67 65 50 61 72 74 41 74 74 72 69 62 75 74 65 |agePartAttribute|
000000c0 4e 61 6d 65 86 92 84 84 84 08 4e 53 4e 75 6d 62 |Name......NSNumb|
000000d0 65 72 00 84 84 07 4e 53 56 61 6c 75 65 00 95 84 |er....NSValue...|
000000e0 01 2a 84 9b 9b 00 86 86 86 |.*.......|
That’s a lot! Let’s focus in on these unknown sections.
There are several places where multiple 0x84 bytes appear together, possibly indicating a nested structure. Let’s isolate the first major block that appears to define the NSMutableAttributedString:
00000010 84 01 40 84 84 84 19 4e 53 4d 75 74 61 62 6c 65 |..@....NSMutable|
00000020 41 74 74 72 69 62 75 74 65 64 53 74 72 69 6e 67 |AttributedString|
00000030 00 84 84 12 4e 53 41 74 74 72 69 62 75 74 65 64 |....NSAttributed|
00000040 53 74 72 69 6e 67 00 84 84 08 4e 53 4f 62 6a 65 |String....NSObje|
00000050 63 74 00 85 92 84 84 84 0f 4e 53 4d 75 74 61 62 |ct.......NSMutab|
00000060 6c 65 53 74 72 69 6e 67 01 84 84 08 4e 53 53 74 |leString....NSSt|
00000070 72 69 6e 67 01 95 84 01 2b 0a 4e 6f 74 65 72 20 |ring....+.Noter |
00000080 74 65 73 74 86 |test. |
Operating under the assumption that the first 3 bytes tell us we are looking at a new object instance, we see three 0x84 bytes preceding three strings that look like class names: NSMutableAttributedString, NSAttributedString, and NSObject. Continuing further, we see three more 0x84 bytes followed by two more class names, NSMutableString and NSString, and the interior string data.
These two blocks are separated by an 0x85. Given we assume 0x84 is a start byte and 0x86 is an end byte, it is likely that 0x85 has some special meaning. Checking the NSMutableAttributedString docs again, we see that NSMutableAttributedString inherits from NSAttributedString, and its only field contains an NSMutableString, which inherits from NSString.
We can infer that 0x85 serves as an end token, perhaps a terminator or something similar, as it seems to separate the class hierarchy from its field data. Let's isolate the possible nested ranges of this block:
00000010 84 01 40 84 84 84 19 4e 53 4d 75 74 61 62 6c 65 |..@....NSMutable|
00000020 41 74 74 72 69 62 75 74 65 64 53 74 72 69 6e 67 |AttributedString|
00000030 00 84 84 12 4e 53 41 74 74 72 69 62 75 74 65 64 |....NSAttributed|
00000040 53 74 72 69 6e 67 00 84 84 08 4e 53 4f 62 6a 65 |String....NSObje|
00000050 63 74 00 85 92 84 84 84 0f 4e 53 4d 75 74 61 62 |ct.......NSMutab|
00000060 6c 65 53 74 72 69 6e 67 01 84 84 08 4e 53 53 74 |leString....NSSt|
00000070 72 69 6e 67 01 95 84 01 2b 0a 4e 6f 74 65 72 20 |ring....+.Noter |
00000080 74 65 73 74 86 |test.|
This leaves us with a few bits of missing data. Class names end with a single byte, here 0x01 or 0x00. Previously, the assumption was that names were null terminated, but perhaps this is class data or some type of version number. Immediately after the 0x85, there is a 0x92 that we are ignoring for now. Further, there is a 0x95 that seems to take the place of what should be a third part to the second class hierarchy:
└── NSMutableAttributedString (v0)
├── Superclass Chain
│ └── NSAttributedString (v0)
│ └── NSObject (v0)
│ └── 0x85
│
└── Fields
└── NSMutableString (v1)
├── Superclass Chain
│ └── NSString (v1)
│ └── 0x95
│
└── Fields
                └── "Noter test"
We know that NSString extends NSObject, so even though no bytes explicitly indicate this relationship, it must be represented somehow. But how can we confirm that?
The 0x92 and 0x95 must mean something. They are only 3 values apart, and they appear in places where we would expect something else to occur. The first time we saw 84 84 84, it was preceded by 84 01 40 (@), which we are assuming represents the start of a new object instance. However, the second time we saw that same pattern, it was preceded by 0x92. We also know that NSString inherits from NSObject, but instead the stream contains 0x95.
There has to be a pattern here, so let’s look at the data every time we see a new 0x84 in the stream to see if we can discern anything. Beginning after the header and ending before the known encoded message content, we have the following:
| Index | Item |
|---|---|
| 1 | @ |
| 2 | NSMutableAttributedString |
| 3 | NSAttributedString |
| 4 | NSObject |
| 5 | NSMutableString |
| 6 | NSString |
Our intuition tells us that where we see 0x92, we should see 84 01 40 (the start of a new object), and where we see 0x95, we should see the NSObject bytes. Just as 0x92 and 0x95 are 3 values apart, so are @ and NSObject in the order of streamed data. Could these bytes represent an index?
If we assume that 0x92 is the first index, we get the following:
| Index | Predicted Offset | Item |
|---|---|---|
| 1 | 0x92 | @ |
| 2 | 0x93 | NSMutableAttributedString |
| 3 | 0x94 | NSAttributedString |
| 4 | 0x95 | NSObject |
| 5 | 0x96 | NSMutableString |
| 6 | 0x97 | NSString |
The indexes 0x92 and 0x95 appear to point to our missing data! Let’s reassemble our object with this in mind:
└── NSMutableAttributedString (v0)
├── Superclass Chain
│ └── NSAttributedString (v0)
│ └── NSObject (v0)
│ └── 0x85
│
└── Fields
└── NSMutableString (v1)
├── Superclass Chain
│ └── NSString (v1)
│ └── NSObject (v0)
│ └── 0x85
│
└── Fields
                └── "Noter test"
This indicates that 0x92 and larger values indicate the index in a cache of previously-seen data.
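At this point in the analysis there appears to be a single cache, so turning a reference byte into a position is a subtraction. A minimal sketch, assuming the cache is zero-indexed starting from 0x92; the constant and function names are hypothetical.

```rust
/// The smallest byte value that appears to act as a reference
/// (an assumption drawn from the samples above).
pub const REFERENCE_BASE: u8 = 0x92;

/// Converts a reference byte into a zero-based index into the cache of
/// previously-seen data, or `None` if the byte is below the reference range.
pub fn reference_index(byte: u8) -> Option<usize> {
    byte.checked_sub(REFERENCE_BASE).map(usize::from)
}
```

With the six items listed in the table above, 0x92 maps to index 0 (@) and 0x95 maps to index 3 (NSObject), while bytes such as 0x84 fall outside the reference range.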
- 0x84 indicates the start of a data blob
- The byte following 0x84 denotes the length of the data blob
- 0x85 indicates the end of a class inheritance chain
- 0x86 indicates the end of a data blob
- 0x81 indicates something that we don’t know yet, but seems to be related to integers
- References to previously-seen data begin at 0x92, and references are stored in the stream directly
- i and I seem to indicate an integer value, stored as a u8
- + seems to indicate a string
- @ seems to indicate a new object instance
- * seems to indicate some unknown data type
- Class names are followed by u8 version information

Apart from the header, these assumptions account for all remaining bytes in the stream. Let’s follow the logic, placing references where we expect them:
00000000 04 0b 73 74 72 65 61 6d 74 79 70 65 64 81 e8 03 |..streamtyped...|
00000010 84 01 40 84 84 84 19 4e 53 4d 75 74 61 62 6c 65 |..@....NSMutable|
00000020 41 74 74 72 69 62 75 74 65 64 53 74 72 69 6e 67 |AttributedString|
00000030 00 84 84 12 4e 53 41 74 74 72 69 62 75 74 65 64 |....NSAttributed|
00000040 53 74 72 69 6e 67 00 84 84 08 4e 53 4f 62 6a 65 |String....NSObje|
00000050 63 74 00 85 92 84 84 84 0f 4e 53 4d 75 74 61 62 |ct.......NSMutab|
00000060 6c 65 53 74 72 69 6e 67 01 84 84 08 4e 53 53 74 |leString....NSSt|
00000070 72 69 6e 67 01 95 84 01 2b 0a 4e 6f 74 65 72 20 |ring....+.Noter |
00000080 74 65 73 74 86 84 02 69 49 01 0a 92 84 84 84 0c |test...iI.......|
00000090 4e 53 44 69 63 74 69 6f 6e 61 72 79 00 95 84 01 |NSDictionary....|
000000a0 69 01 92 84 98 98 1d 5f 5f 6b 49 4d 4d 65 73 73 |i......__kIMMess|
000000b0 61 67 65 50 61 72 74 41 74 74 72 69 62 75 74 65 |agePartAttribute|
000000c0 4e 61 6d 65 86 92 84 84 84 08 4e 53 4e 75 6d 62 |Name......NSNumb|
000000d0 65 72 00 84 84 07 4e 53 56 61 6c 75 65 00 95 84 |er....NSValue...|
000000e0 01 2a 84 9b 9b 00 86 86 86 |.*.......|
In this structure, the referenced data is:
| Reference Index | Description | Symbol |
|---|---|---|
| 0x92 | Type of data indicating a new object instance | 0x40 / "@" |
| 0x95 | NSObject class name | "NSObject" |
| 0x98 | Type of data indicating a string | 0x2b / "+" |
| 0x9b | Type of data indicating a single integer | 0x69 / "i" |
This gets us a little further, but there are two curiosities that violate this pattern. In two places, we see 0x84 followed by two references, not just one: at 0xa4 we see 84 98 98, and at 0xe3 we see 84 9b 9b. Putting those aside, let’s try to assemble what we can from the stream[6].
Leveraging our assumptions, we can define a few concepts:
- +, @, and the like seem to define primitive types like integers and strings. Since these define data that is packed together, let’s think of them as a group like Vec<Type>. iI would be a type tag like [Int, Int].
- The type tags table, then, can be represented as Vec<Vec<Type>>.
- 0x81 and 0x84..0x86 seem to have specific meanings, indicating that we are meant to read the subsequent data in a specific way.

In order to validate that our logic works, let’s apply it to the initial sample stream up to 0xa4, where our assumptions are violated. In order of appearance, the type tags thus far are:
| Index | Type Tag |
|---|---|
| 0x92 | [@] |
| 0x93 | [String("NSMutableAttributedString")] |
| 0x94 | [String("NSAttributedString")] |
| 0x95 | [String("NSObject")] |
| 0x96 | [String("NSMutableString")] |
| 0x97 | [String("NSString")] |
| 0x98 | [+] |
| 0x99 | [i, I] |
| 0x9a | [String("NSDictionary")] |
| 0x9b | [i] |
And the archivable objects stored in the stream:
| Order | Data |
|---|---|
| 1 | Top-level object container |
| 2 | Class { name: "NSMutableAttributedString", version: 0, ... } |
| 3 | Class { name: "NSAttributedString", version: 0, ... } |
| 4 | Class { name: "NSObject", version: 0} |
| 5 | Object(Class { name: "NSMutableString", version: 1, ... }, [String("Noter test")]) |
| 6 | Class { name: "NSMutableString", version: 1, ... } |
| 7 | Class { name: "NSString", version: 1, ... } |
| 8 | Object(Class { name: "NSDictionary", version: 0, ... }, [SignedInteger(1)]) |
| 9 | Class { name: "NSDictionary", version: 0, ... } |
One thing that stands out later in the stream are these bytes:
000000a0 69 01 92 84 98 98 1d 5f 5f 6b 49 4d 4d 65 73 73 |i......__kIMMess|
Given our assumptions, we can read the first half of this slice:
- 0x92 refers to @ in the type tags table, indicating a new object
- 0x84 indicates we want to start a new blob of data
- The remaining two 0x98 bytes are pointers, but to what?

In the type tag table, 0x98 points to +, but it wouldn’t make sense to have two type tags referenced together, as a type tag can already have multiple types within it[7].
So far, we have some readable object instances and some class hierarchies defined in inheritance order. One thing that stands out is that 0x98, what we previously thought was a type tag reference, also aligns with an entry in the archivable objects table. If we instead number the output starting at 0x92, we get:
| Index | Predicted Data |
|---|---|
| 0x92 | Top-level object container |
| 0x93 | Class { name: "NSMutableAttributedString", version: 0, ... } |
| 0x94 | Class { name: "NSAttributedString", version: 0, ... } |
| 0x95 | Class { name: "NSObject", version: 0} |
| 0x96 | Object(Class { name: "NSMutableString", version: 1 }, [String("Noter test")]) |
| 0x97 | Class { name: "NSMutableString", version: 1, ... } |
| 0x98 | Class { name: "NSString", version: 1, ... } |
| 0x99 | Object(Class { name: "NSDictionary", version: 0, ... }, [SignedInteger(1)]) |
| 0x9a | Class { name: "NSDictionary", version: 0, ... } |
This makes a lot more sense: the first 0x98 isn’t referencing the type tag [+], rather it is referencing the NSString class at 0x98. The subsequent 0x98, then, denotes the data associated with that class instance. Further, our 0x95 from earlier is not simply referencing the string "NSObject", rather it is referencing the specific NSObject class that contains 0x85 as its parent.
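Because the same byte can index either table, a reader has to know which table the current position refers to. This hypothetical `Caches` sketch stores placeholder labels for the two tables above and picks one based on a caller-supplied `Context`; it illustrates the idea and is not how crabstep organizes its state.

```rust
/// Hypothetical model of the two caches described above: one for type tags
/// and one for archivable objects and classes. Both are indexed from 0x92.
pub struct Caches {
    pub type_tags: Vec<&'static str>,
    pub objects: Vec<&'static str>,
}

/// Where a reference byte appears determines which cache it indexes.
pub enum Context {
    TypeTag,
    Object,
}

impl Caches {
    /// Resolves a reference byte against the cache selected by `context`.
    pub fn resolve(&self, byte: u8, context: Context) -> Option<&'static str> {
        let index = usize::from(byte.checked_sub(0x92)?);
        match context {
            Context::TypeTag => self.type_tags.get(index).copied(),
            Context::Object => self.objects.get(index).copied(),
        }
    }
}
```

Filled with the entries from the two tables, the first 0x98 in 84 98 98 resolves (as an object reference) to the NSString class, while the second resolves (as a type tag reference) to +.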
With this new assumption, we can now read the rest of the data in the slice:
- 0x92 refers to @ in the type tags table, indicating a new object
- 0x84 indicates we want to start a new blob of data
- The first 0x98 refers to the NSString class
- The second 0x98 refers to +, indicating we should read the next data as a string

Thus, we can read this slice as “new NSString object, whose field data is encoded as +.”
This logic also follows the pattern we see with string length bytes. Just as with the text after the first +, the bytes after the referenced + start with a byte that tells us the length of the string (here, 0x1d, or 29):
000000a0 69 01 92 84 98 98 1d 5f 5f 6b 49 4d 4d 65 73 73 |i......__kIMMess|
000000b0 61 67 65 50 61 72 74 41 74 74 72 69 62 75 74 65 |agePartAttribute|
000000c0 4e 61 6d 65 86 92 84 84 84 08 4e 53 4e 75 6d 62 |Name......NSNumb|
Let’s assemble this object:
└── NSString (v1)
├── Superclass Chain
│ └── NSObject (v0)
│ └── 0x85
│
└── Fields
└── "__kIMMessagePartAttributeName"
The class comes from the data referenced by 0x98 in the archivable objects table. The string __kIMMessagePartAttributeName is the data owned by this instance of NSString, encoded in the stream as +.
Let’s isolate the last part of the stream that appears to define a NSDictionary object to see if our assumptions hold:
00000080 74 65 73 74 86 84 02 69 49 01 0a 92 84 84 84 0c |test...iI.......|
00000090 4e 53 44 69 63 74 69 6f 6e 61 72 79 00 95 84 01 |NSDictionary....|
000000a0 69 01 92 84 98 98 1d 5f 5f 6b 49 4d 4d 65 73 73 |i......__kIMMess|
000000b0 61 67 65 50 61 72 74 41 74 74 72 69 62 75 74 65 |agePartAttribute|
000000c0 4e 61 6d 65 86 92 84 84 84 08 4e 53 4e 75 6d 62 |Name......NSNumb|
000000d0 65 72 00 84 84 07 4e 53 56 61 6c 75 65 00 95 84 |er....NSValue...|
000000e0 01 2a 84 9b 9b 00 86 86 86 |.*.......|
The three 0x92 bytes indicate that there should be three objects stored here. Given the provided class names, we can intuit that this slice probably stores a dictionary that looks like:
{
"__kIMMessagePartAttributeName": NSNumber(?)
}
Let’s try and translate the stream manually.
Here is the first new object definition:
00000080 74 65 73 74 86 84 02 69 49 01 0a 92 84 84 84 0c |test...iI.......|
00000090 4e 53 44 69 63 74 69 6f 6e 61 72 79 00 95 84 01 |NSDictionary....|
000000a0 69 01 92 84 98 98 1d 5f 5f 6b 49 4d 4d 65 73 73 |i......__kIMMess|
And here is how our assumptions apply to that slice:
| Byte(s) | Component | Description |
|---|---|---|
| 0x84 | Blob Indicator | Signals the start of a new data block |
| 0x0c | Length Byte | Indicates the next 12 bytes contain relevant data |
| NSDictionary | Class Name | The 12 bytes encoding the class name |
| 0x00 | Version Tag | Previously thought to be a null terminator |
| 0x95 | Class | References NSObject in the archivable objects table |
| 0x84 | Blob Indicator | Signals the start of a new data block |
| 0x01 | Length Byte | Indicates the next byte contains relevant data |
| 0x69 | Type Tag | Represents i (likely u8) |
| 0x01 | Dictionary Size | Indicates a single key/value pair |
The final 0x01 may represent something else, but given we can look at the stream and see only a single key/value pair, for now let’s assume it represents the length field. Let’s visualize what we have so far:
└── NSDictionary (v0)
├── Superclass Chain
│ └── NSObject (v0)
│ └── 0x85
│
└── Fields
└── 0x01
This is the NSString we parsed earlier:
└── NSString (v1)
├── Superclass Chain
│ └── NSObject (v0)
│ └── 0x85
│
└── Fields
└── String("__kIMMessagePartAttributeName")
Given its location in the stream and its prefix of __k, let’s assume this is the first key in the dictionary and add it to the overall object:
└── NSDictionary (v0)
├── Superclass Chain
│ └── NSObject (v0)
│ └── 0x85
│
└── Fields
├── 0x01
└── NSString (v1)
├── Superclass Chain
│ └── NSObject (v0)
│ └── 0x85
│
└── Fields:
└── String("__kIMMessagePartAttributeName")
Finally, let’s isolate the last slice of bytes we need to translate:
000000c0 4e 61 6d 65 86 92 84 84 84 08 4e 53 4e 75 6d 62 |Name......NSNumb|
000000d0 65 72 00 84 84 07 4e 53 56 61 6c 75 65 00 95 84 |er....NSValue...|
000000e0 01 2a 84 9b 9b 00 86 86 86 |.*.......|
Most of this we have already seen: until address 0xdf, the stream encodes the inheritance hierarchy for NSNumber:
└── NSNumber (v0)
├── Superclass Chain
│ └── NSValue (v0)
│ └── NSObject (v0)
│ └── 0x85
│
└── Fields
└── ?
However, where we expect the field data, we see a byte pattern indicating a new type tag *. Checking the NSNumber documentation, we can start to chase these type tags down. The docs make an offhand mention of these tags:
- Your implementation of objCType must return one of “c”, “C”, “s”, “S”, “i”, “I”, “l”, “L”, “q”, “Q”, “f”, and “d”. This is required for the other methods of NSNumber to behave correctly.
- Your subclass must override the accessor method that corresponds to the declared type; for example, if your implementation of objCType returns “i”, you must override intValue.
This confirms our prior hypothesis that i (and probably I) tell the stream that the following data is an integer! However, the documentation for objCType doesn’t have much information about what these characters mean. Searching for site:apple.com "@encode", we find this archived documentation pointing to a book called The Objective-C Programming Language.
In that book, we land upon a table of enumerated type encodings. That table confirms that i and I do represent integers, signed and unsigned, respectively. It also tells us that this mystery * refers to “A character string (char *)”.
So far, we know that NSNumber extends NSValue, specifically dealing with numeric variants. We also know that NSValue uses objCType to represent the type it encapsulates internally. The objCType documentation states that it is:
A C string containing the Objective-C type of the data contained in the value object.
Thus, it seems like the data that follows * should be read as a type tag describing the value stored after it. Let’s isolate just that block:
000000d0 65 72 00 84 84 07 4e 53 56 61 6c 75 65 00 95 84 |er....NSValue...|
000000e0 01 2a 84 9b 9b 00 86 86 86 |.*.......|
Translating this following our assumptions, we get the following result:
| Byte(s) | Component | Description |
|---|---|---|
| 0x84 | Blob Indicator | Signals the start of a new data block |
| 0x01 | Length Byte | Indicates the next byte contains relevant data |
| 0x2a | Type Tag | * |
| 0x84 | Blob Indicator | Signals the start of a new data block |
| 0x9b | Type Tag Reference | i |
| 0x9b | Type Tag Reference | i |
| 0x00 | Signed Integer | 0 |
| 0x86 | End of Object Indicator | Signals the end of the new object |
This sequence represents an NSNumber object containing an integer value. Like other object instances in the stream, the object’s data follows its class definition. Here, the first 0x9b represents the objCType of the NSValue instance. Because it is 0x92 or larger, we look it up in the type tags table. This means our NSValue represents the objCType of i, or a signed integer.
The second 0x9b references a type tag in the type tags table, indicating that the next byte (0x00) should be read as a signed integer. Both of these values refer to the same type tag, i, but they use that type tag in different ways: the first one tells us what type of data the NSValue instance owns, and the second tells us how to read the next byte from the stream.
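Using the type tags table from earlier, the NSNumber field bytes can be decoded by resolving both references against it. This sketch assumes the table order listed above (so 0x9b resolves to i) and only handles single-byte i values; `Field`, `lookup`, and `decode_number_fields` are hypothetical names.

```rust
#[derive(Debug, PartialEq)]
pub enum Field {
    /// The objCType owned by the NSValue, resolved from the first reference.
    ObjCType(String),
    /// The value read using the type tag from the second reference.
    SignedInteger(i64),
}

/// Resolves a reference byte (0x92 and above) against the type tags table.
fn lookup<'a>(type_tags: &[&'a str], byte: u8) -> Option<&'a str> {
    let index = usize::from(byte.checked_sub(0x92)?);
    type_tags.get(index).copied()
}

/// Decodes `84 <objCType ref> <type tag ref> <value>`, the bytes that follow
/// the `84 01 2a` (`*`) type tag in the NSNumber sample. Only single-byte
/// `i` values are supported in this sketch.
pub fn decode_number_fields(bytes: &[u8], type_tags: &[&str]) -> Option<Vec<Field>> {
    if *bytes.first()? != 0x84 {
        return None;
    }
    // First reference: which type the NSValue encapsulates.
    let objc_type = lookup(type_tags, *bytes.get(1)?)?;
    // Second reference: how to read the next byte from the stream.
    let tag = lookup(type_tags, *bytes.get(2)?)?;
    let value = match tag {
        "i" => {
            let byte = *bytes.get(3)?;
            Field::SignedInteger(i64::from(byte as i8))
        }
        _ => return None,
    };
    Some(vec![Field::ObjCType(objc_type.to_string()), value])
}
```

Both references resolve to the same entry, but the sketch uses them differently: one becomes data owned by the NSValue, the other only steers how the following byte is read.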
The object structure can be represented as:
└── NSNumber (v0)
├── Superclass Chain
│ └── NSValue (v0)
│ └── NSObject (v0)
│ └── 0x85
│
└── Fields
└── SignedInteger(0x00)
Combining all of what we have translated, the resultant dictionary looks like this:
└── NSDictionary (v0)
├── Superclass Chain
│ └── NSObject (v0)
│ └── 0x85
│
└── Fields
├── SignedInteger(0x01)
├── NSString (v1)
│ ├── Superclass Chain
│ │ └── NSObject (v0)
│ │ └── 0x85
│ │
│ └── Fields:
│ └── "__kIMMessagePartAttributeName"
└── NSNumber (v0)
├── Superclass Chain
│ └── NSValue (v0)
│ └── NSObject (v0)
│ └── 0x85
│
└── Fields
└── SignedInteger(0x00)
This confirms that the stream encodes a dictionary with a single key-value pair, as expected.
Through systematic analysis and validation of our assumptions against the data samples, we can now attempt to describe the typedstream specification’s core structure and behavior.
First, let’s update our assumptions based on what we have learned:
- 0x84 indicates the start of a data blob
- The byte following 0x84 denotes the length of the data blob
- Where a type tag is expected, a byte of 0x92 or larger indicates a reference to a type tag stored in the type tags table
- 0x85 indicates the end of a class inheritance chain
- 0x86 indicates the end of a data blob
- 0x81 indicates something related to integers
- Indicator bytes appear to fall within the range 0x81..0x86
- Cached data is indexed starting at 0x92, and references are stored in the stream directly
- Class names are followed by u8 version information

By collecting a large amount of attributedBody data, we can search for specific bytes. Since we predicted that indicators lie in the range 0x81..0x86, let’s search for the missing bytes to see what we find.
0x81
Aside from the header, this byte showed up in several very long iMessages[8]:
00000070 72 69 6e 67 01 95 84 01 2b 81 37 09 53 65 64 20 |ring....+.7.Sed |
This sample contained a message with 2359 characters. The bytes following 0x81 are 0x37 0x09, which represent that value as a little-endian 16-bit integer (0x0937). This also matches our header, which we know ends with the bytes for 1000, confirming that 0x81 indicates a 16-bit integer follows.
00000000 04 0b 73 74 72 65 61 6d 74 79 70 65 64 81 e8 03 |..streamtyped...|
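The 0x81 rule can be folded into a length reader. A minimal sketch, assuming a length is either one plain byte or 0x81 followed by a little-endian u16; `read_length` is a hypothetical helper.

```rust
/// Reads a length that is either a single byte or, when prefixed with 0x81,
/// a little-endian 16-bit value. Returns (length, bytes consumed).
/// The 0x81 rule is inferred from the long-message sample above.
pub fn read_length(bytes: &[u8]) -> Option<(usize, usize)> {
    match *bytes.first()? {
        0x81 => {
            let raw = bytes.get(1..3)?;
            Some((usize::from(u16::from_le_bytes([raw[0], raw[1]])), 3))
        }
        byte => Some((usize::from(byte), 1)),
    }
}
```

For the long message, 81 37 09 yields 2359 after consuming three bytes; for the short sample, 0a yields 10 after consuming one.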
0x82 and 0x83 were not in any samples, but continuing from the pattern, we can infer that since 0x81 represents a 2-byte (u16/i16) integer, 0x82 and 0x83 also refer to different width numbers. Going back to the table defined in the Objective-C Runtime Programming Guide, we can use the type tags to infer the meaning of these indicators:
| Type Tag | Meaning | Inferred Indicator Byte |
|---|---|---|
| i | An int | None |
| s | A short | 0x81 |
| l | A long (l is treated as a 32-bit quantity on 64-bit programs) | 0x82 |
| q | A long long | 0x83 |
The type tag determines whether the integer is signed or unsigned, while the presence and value of the indicator byte determines the integer’s width in the stream. We can use this to predict the byte representation we might see:
| Byte Sequence | Inferred Meaning | Expected Result |
|---|---|---|
| 0x69 0x00 | Integer, single-byte width | 0 as i8 |
| 0x69 0x81 0x37 0x09 | Integer, two-byte width | 2359 as i16 |
| 0x49 0x81 0x37 0x09 | Unsigned integer, two-byte width | 2359 as u16 |
| 0x49 0x81 0xe8 0x03 | Unsigned integer, two-byte width | 1000 as u16 |
In order to confirm this, we would need more sample data that included values of these sizes.
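The inferred widths translate directly into a pair of readers. This sketch assumes the mapping in the tables above; because 0x82 and 0x83 never appeared in the samples, their 4- and 8-byte widths are unverified guesses. The type tag (i versus I) decides which reader to call. `read_signed` and `read_unsigned` are hypothetical names.

```rust
/// Reads a signed integer whose width is chosen by an optional indicator byte:
/// no indicator = 1 byte, 0x81 = 2 bytes, and (inferred, never observed)
/// 0x82 = 4 bytes, 0x83 = 8 bytes. Returns (value, bytes consumed).
pub fn read_signed(bytes: &[u8]) -> Option<(i64, usize)> {
    let first = *bytes.first()?;
    let width = match first {
        0x81 => 2,
        0x82 => 4,
        0x83 => 8,
        _ => return Some((i64::from(first as i8), 1)),
    };
    let raw = bytes.get(1..1 + width)?;
    // Sign-extend the little-endian bytes into an i64.
    let mut buf = if raw[width - 1] & 0x80 != 0 { [0xff; 8] } else { [0; 8] };
    buf[..width].copy_from_slice(raw);
    Some((i64::from_le_bytes(buf), 1 + width))
}

/// Unsigned counterpart: same inferred widths, but zero-extended.
pub fn read_unsigned(bytes: &[u8]) -> Option<(u64, usize)> {
    let first = *bytes.first()?;
    let width = match first {
        0x81 => 2,
        0x82 => 4,
        0x83 => 8,
        _ => return Some((u64::from(first), 1)),
    };
    let raw = bytes.get(1..1 + width)?;
    let mut buf = [0u8; 8];
    buf[..width].copy_from_slice(raw);
    Some((u64::from_le_bytes(buf), 1 + width))
}
```

Each row of the prediction table maps to one call: 0x69 0x00 is read_signed on [0x00], and 0x49 0x81 0xe8 0x03 is read_unsigned on [0x81, 0xe8, 0x03].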
Given this information, we can write a simple set of steps as a baseline for reading this format:
- 0x84 indicates we are creating a new piece of archivable data
- If the next byte is 0x92 or greater, we have a reference to a previously-seen object or class
- If the next byte is less than 0x92, it describes the length of the object or class data

typedstream Data
Now that we understand how to read a typedstream, we need to think about how to use and represent it in imessage-exporter.
Since the stream encodes Objective-C data structures, specifically instances of classes and the data they own, we can use these rules to yield objects out of the stream. imessage-exporter does this, then reads the objects as they are yielded to build a message’s components.
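The reading rules listed above, plus the end markers identified earlier, can be sketched as a one-byte classifier. This is deliberately simplistic: it looks at a single byte without context, so it cannot distinguish a length from a literal type tag character such as @ (0x40), which a real reader handles by tracking its position in the stream. `Token` and `classify` are hypothetical names.

```rust
/// Hypothetical token kinds for a single byte, following the reading rules
/// and end markers identified in this article (not the full format).
#[derive(Debug, PartialEq)]
pub enum Token {
    /// 0x84: start of a new piece of archivable data.
    Start,
    /// 0x85: end of a class inheritance chain.
    EndOfClassChain,
    /// 0x86: end of a data blob.
    End,
    /// 0x81..=0x83: a wider integer follows (0x82 and 0x83 are inferred).
    WideInteger(u8),
    /// 0x92 and above: zero-based reference into previously-seen data.
    Reference(usize),
    /// Anything else, read in a length position: how many bytes follow.
    Length(usize),
}

pub fn classify(byte: u8) -> Token {
    match byte {
        0x81..=0x83 => Token::WideInteger(byte),
        0x84 => Token::Start,
        0x85 => Token::EndOfClassChain,
        0x86 => Token::End,
        0x92..=0xff => Token::Reference(usize::from(byte - 0x92)),
        _ => Token::Length(usize::from(byte)),
    }
}
```

Matching on the byte first and deciding what to read next keeps the dispatch in one place, which suits Rust's exhaustive pattern matching.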
Since non-Apple platforms do not natively support Foundation data structures, we must define an alternative representation.
We can define an enum to represent the known type tags. Given the legacy documentation and our own observations, this definition should distill them into a single data structure[9]:
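pub enum Type {
    Utf8String,   // +
    EmbeddedData, // *
    Object,       // @
    SignedInt,    // c, i, l, q, s
    UnsignedInt,  // C, I, L, Q, S
    Float,        // f
    Double,       // d
    Unknown(u8),
}
impl Type {
    fn from_byte(byte: &u8) -> Self {
        match byte {
            0x40 => Self::Object,
            0x2B => Self::Utf8String,
            0x2A => Self::EmbeddedData,
            0x66 => Self::Float,
            0x64 => Self::Double,
            0x63 | 0x69 | 0x6c | 0x71 | 0x73 => Self::SignedInt,
            0x43 | 0x49 | 0x4c | 0x51 | 0x53 => Self::UnsignedInt,
            other => Self::Unknown(*other),
        }
    }
}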
Note that 0x2B, or +, is not mentioned in the legacy documentation, but we can infer what the type tag represents based on the context in the collected typedstream samples.
This data structure allows us to leverage Rust’s pattern matching to dispatch data as we encounter it in the stream. It also implies we need a separate data structure to represent the data stored after these type tags:
pub enum OutputData {
String(String),
SignedInteger(i64),
UnsignedInteger(u64),
Float(f32),
Double(f64),
Byte(u8),
Array(Vec<u8>),
Class(Class),
}
By combining these structures, we can implement the deserialization logic in Rust as follows:
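fn extract(data_type: Type) -> OutputData {
    // read_signed_int, read_unsigned_int, and read_float stand in for helpers
    // that consume the next value from the stream (not shown here)
    match data_type {
        Type::SignedInt => OutputData::SignedInteger(read_signed_int()),
        Type::UnsignedInt => OutputData::UnsignedInteger(read_unsigned_int()),
        Type::Float => OutputData::Float(read_float()),
        // The remaining variants are dispatched the same way
        _ => unimplemented!(),
    }
}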
We can define a simple structure that represents the class data stored in the stream[10]:
pub struct Class {
pub name: String,
pub version: u64,
}
The only data stored in the stream is the class name and the class version. We can pattern match against these names to build any arbitrary structures later, if necessary.
We also need to encapsulate the data to store in the archivable object cache. This can be a class, an object, or an object’s field data[11].
pub enum Archivable {
Object(Class, Vec<OutputData>),
Data(Vec<OutputData>),
Class(Class),
Type(Vec<Type>),
}
An object, for example, contains a specific class, followed by a vector of field data owned by that class’s instance. An example of this looks like:
Archivable::Object(
Class {
name: "NSDictionary".to_string(),
version: 0,
},
vec![OutputData::SignedInteger(2)],
)
Given what we learned about NSDictionary, the OutputData here refers to the number of key-value pairs in the dictionary, indicating the next 4 objects yielded from the stream are alternating key-value pairs belonging to this NSDictionary.
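Pairing those objects is straightforward once the count is known. A minimal sketch, assuming the count has already been read from the NSDictionary's field data and that the following objects strictly alternate key, value; `pair_entries` is a hypothetical helper, not part of crabstep.

```rust
/// Pairs up the objects that follow an NSDictionary, assuming (as above) that
/// its single field is the entry count and that keys and values alternate.
/// Returns `None` if fewer than `count * 2` objects are available.
pub fn pair_entries<T: Clone>(count: usize, objects: &[T]) -> Option<Vec<(T, T)>> {
    let needed = objects.get(..count * 2)?;
    Some(
        needed
            .chunks_exact(2)
            .map(|pair| (pair[0].clone(), pair[1].clone()))
            .collect(),
    )
}
```

For the single-entry sample, the NSString key and the NSNumber value become one pair; a count of 2, as in the example above, would consume the next four objects.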
The crabstep crate provides a deserialization struct that yields objects and their data from a typedstream.
imessage-exporter body module leverages the foregoing data models to represent attributes of the message body in a data structure called BubbleComponent1.
Reverse engineering of Apple’s typedstream format reveals a sophisticated and elegantly designed binary serialization protocol. Through careful analysis of patterns, documentation fragments, and sample data, we've uncovered a format that efficiently encodes complex object hierarchies.
The resulting implementation in imessage-exporter demonstrates how this legacy format can be deserialized, enabling platform-agnostic access to iMessage data that was previously locked within Apple’s ecosystem. This work not only enables practical applications like message export and analysis but also serves as a case study in reverse engineering binary formats through pattern recognition and hypothesis testing.
[1] Including Audio Message transcripts, text formatting, and edited message content.
[2] We discover the meaning towards the end of this article.
[3] This suggestion is disproved later in the article as we unravel the format.
[4] I misspelled “Another test” when creating this test message data, and it became canon in my testing.
[5] This caveat is resolved later.
[6] We will get back to them later.
[7] As we saw earlier with [iI].
[8] Thanks, Davis!
Examples of what this data looks like can be found in the module’s tests.