SAT_DataLib : specifications
This wiki page presents the specification of the SAT_DataLib binary data storage model. This document is still under construction!
The general purpose of SAT_DataLib is to provide a framework for encoding and storing binary data in ArduSat experiments, along with the specifications needed to decode that data once it has been retrieved on Earth. Binary data is used to reduce storage consumption: plain text such as CSV would take more space than raw binary values.
This library (and the corresponding data storage framework) has been designed with the following capabilities:
- store different kinds of data in one single file: the point is not to store a single kind of message, but several kinds in one file, decoded one after the other. For instance, depending on your experiment, you might want to store raw sensor values periodically, then some values you computed yourself, then a series of values or a text message. All of this must fit in one single file.
- consume as little storage as possible
- support extensible structures rather than fixed-size messages
The general mechanism underlying the storage framework is to structure your data as messages, called "Packets" in the following. Several packet types are available, each with a specific purpose (storing raw sensor values, data series, text messages...). Each packet starts with a header that characterizes its content, its format and its length: the length of a packet can always be computed exactly from the packet's header.
All packets share the same generic structure. They begin with a HEADER_CODE (1 byte) that defines the packet's type. Then comes a HEADER_CONTENT, which differs for each packet type and typically contains parameter values describing the structure of the BODY that follows. Finally comes the BODY of the packet, whose structure depends on the packet type and on the parameters stored in the HEADER_CONTENT.
Address | 0 | 1 to (HEADER_SIZE - 1) | HEADER_SIZE onward |
---|---|---|---|
Content | HEADER_CODE | HEADER_CONTENT | BODY |
Depending on the HEADER_CODE, the following table specifies the kind of content the packet is designed to carry, and the size of the header.
HEADER_CODE | PACKET NAME, HEADER CONTENT | HEADER_SIZE(*) |
---|---|---|
0x23 ['#'] | “CHUNK” packet - Contains only raw sensor values, each assigned to a DATATYPE. The header holds a 2-byte block aggregating all DATATYPES. The length of the packet is computed from the DATATYPES and their corresponding sizes. | 3 |
0x21 ['!'] | “SERIE” packet - Contains a series of values indexed by a key (e.g. time). | 5 |
0x55 ['U'] | “USER DEFINED” packet - Whatever values fit your needs. The header consists of 1 byte holding the total length of the packet (CODE+HEADER+BODY). | 2 |
0x53 ['S'] | “LOG” packet - Contains a string (for logging purposes, or error messages). The header consists of 1 byte holding the total length of the packet (CODE+HEADER+BODY). The body contains the string (no '\0' terminator is needed, as the length is given by the header). | 2 |
(*) in bytes, including the HEADER_CODE byte
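Because the length of every packet can be computed from its header, a file holding several packets can be walked one packet at a time. The Python sketch below is illustrative only and is not SAT_DataLib code: `packet_length` and `split_packets` are hypothetical names, the Geiger entries raise an error because their size is not specified yet, and the size of a SERIE KEY or VAL is taken as dimensionality × unit size, as implied by the SERIE example further down.

```python
import struct

# DATA SIZE (bytes) of each CHUNK DATATYPE bit, in table order:
# bits 0-7 = low byte 0x01..0x80, bits 8-15 = high byte 0x01..0x80.
# None marks Geiger 1/2, whose size is not specified.
CHUNK_SIZES = [4, 4, 4, 6, 2, 2, 2, 2,        # MS, LUM1, LUM2, MAG, TEMP1-4
               2, 6, 6, None, None, 5, 5, 2]  # INFRATHERM, ACC, GYRO, GEIGER1-2, USER1-2, CRC16


def unit_size(code):
    """Bytes per unit for a UNIT code (low nibble of KEYSTRUCT/VALSTRUCT)."""
    if code == 0x0D:                 # STR: 4 chars
        return 4
    if code in (0x0C, 0x0E):         # unused codes
        raise ValueError(f"unit code 0x{code:02X} is unused")
    return (code & 0x03) + 1


def packet_length(buf, pos):
    """Total length (HEADER_CODE + HEADER_CONTENT + BODY) of the packet at buf[pos]."""
    code = buf[pos]
    if code == 0x23:                                 # CHUNK
        datatypes = buf[pos + 1] | (buf[pos + 2] << 8)
        length = 3
        for bit, size in enumerate(CHUNK_SIZES):
            if datatypes & (1 << bit):
                if size is None:
                    raise ValueError("Geiger DATA SIZE is not specified")
                length += size
        return length
    if code == 0x21:                                 # SERIE
        keystruct, valstruct, count = struct.unpack_from("<BBH", buf, pos + 1)
        key_size = (keystruct >> 4) * unit_size(keystruct & 0x0F)
        val_size = (valstruct >> 4) * unit_size(valstruct & 0x0F)
        return 5 + count * (key_size + val_size)
    if code in (0x55, 0x53):                         # USER DEFINED, LOG
        return buf[pos + 1]                          # LENGTH byte = total length
    raise ValueError(f"unknown HEADER_CODE 0x{code:02X} at offset {pos}")


def split_packets(buf):
    """Yield (HEADER_CODE, packet bytes) for each packet, one after the other."""
    pos = 0
    while pos < len(buf):
        length = packet_length(buf, pos)
        if length < 2 or pos + length > len(buf):
            raise ValueError(f"invalid or truncated packet at offset {pos}")
        yield buf[pos], bytes(buf[pos:pos + length])
        pos += length
```

The length check in `split_packets` also guards against a zero LENGTH byte, which would otherwise make the loop spin forever on a corrupted file.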
“CHUNK” packet (0x23)
Address | 0 | 1-2 | 3 to (LENGTH - 1) |
---|---|---|---|
Content | 0x23 | DATATYPES | CONTENT |
The DATATYPES parameter is a 2-byte block that aggregates (bitwise OR) several DATATYPE values. Each DATATYPE is a unique identifier for a sensor. There are 16 possible identifiers, one per bit of the 2-byte block (address 1 holds the low byte, address 2 the high byte). The size of the corresponding data in the CONTENT and its DATA STRUCTURE are given by the following table. To compute the full LENGTH of the packet, sum the DATA SIZE of every bit that is set in the DATATYPES block, then add the 3 header bytes.
DATATYPE LOW BYTE | DATATYPE HIGH BYTE | CONTENT / SENSOR | DATA SIZE | DATA STRUCTURE |
---|---|---|---|---|
0x01 (b00000001) | 0x00 (b00000000) | MS : milliseconds | 4 | UINT32 |
0x02 (b00000010) | 0x00 (b00000000) | Luminosity sensor 1, VISIBLE + IR | 4 | 2 * INT16 |
0x04 (b00000100) | 0x00 (b00000000) | Luminosity sensor 2, VISIBLE + IR | 4 | 2 * INT16 |
0x08 (b00001000) | 0x00 (b00000000) | Magnetometer X,Y,Z | 6 | 3 * INT16 |
0x10 (b00010000) | 0x00 (b00000000) | Temperature 1 | 2 | INT16 |
0x20 (b00100000) | 0x00 (b00000000) | Temperature 2 | 2 | INT16 |
0x40 (b01000000) | 0x00 (b00000000) | Temperature 3 | 2 | INT16 |
0x80 (b10000000) | 0x00 (b00000000) | Temperature 4 | 2 | INT16 |
0x00 (b00000000) | 0x01 (b00000001) | InfraTherm | 2 | INT16 |
0x00 (b00000000) | 0x02 (b00000010) | Accelerometer X,Y,Z | 6 | 3 * INT16 |
0x00 (b00000000) | 0x04 (b00000100) | Gyroscope X,Y,Z | 6 | 3 * INT16 |
0x00 (b00000000) | 0x08 (b00001000) | Geiger 1(*) | ??? | ??? |
0x00 (b00000000) | 0x10 (b00010000) | Geiger 2(*) | ??? | ??? |
0x00 (b00000000) | 0x20 (b00100000) | User defined block 1 | 5 | depending |
0x00 (b00000000) | 0x40 (b01000000) | User defined block 2 | 5 | depending |
0x00 (b00000000) | 0x80 (b10000000) | CRC 16(*) | 2 | UINT16 |
(*) not implemented yet
Let's take an example. Say the DATATYPES block holds 0x06 (address 1) and 0x01 (address 2). In binary, this gives b00000110 b00000001, so the content is: Luminosity sensor 1, Luminosity sensor 2, InfraTherm. The packet's body will therefore contain 2 * INT16 + 2 * INT16 + INT16 = 10 bytes, and the whole packet 3 + 10 = 13 bytes.
The values in the packet's body are always ordered as in the table above (increasing DATATYPE). In our example, the packet will look like:
Address | 0 | 1 | 2 | 3-4 | 5-6 | 7-8 | 9-10 | 11-12 |
---|---|---|---|---|---|---|---|---|
Content | 0x23 | 0x06 | 0x01 | LUM1 VISIBLE | LUM1 IR | LUM2 VISIBLE | LUM2 IR | INFRATHERM |
Be aware that INT16, UINT16 and UINT32 values are stored by the Arduino in little-endian order: the least significant byte comes first (at the lowest address).
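The ordering and byte-order rules above can be sketched in Python with the standard `struct` module, where the `<` prefix selects little-endian with no padding. This is an illustrative decoder, not SAT_DataLib code: `decode_chunk` and the field names are hypothetical, the two 5-byte user-defined blocks are returned as raw bytes because their structure is "depending", and the Geiger entries raise an error since their size is not specified.

```python
import struct

# struct format of each DATATYPE bit, in table order (bit 0 = low byte 0x01,
# bit 15 = high byte 0x80). None = size not specified (Geiger).
DATATYPE_FORMATS = [
    ("MS", "I"), ("LUM1", "hh"), ("LUM2", "hh"), ("MAG", "hhh"),
    ("TEMP1", "h"), ("TEMP2", "h"), ("TEMP3", "h"), ("TEMP4", "h"),
    ("INFRATHERM", "h"), ("ACC", "hhh"), ("GYRO", "hhh"),
    ("GEIGER1", None), ("GEIGER2", None),
    ("USER1", "5s"), ("USER2", "5s"), ("CRC16", "H"),
]


def decode_chunk(packet):
    """Decode a CHUNK packet (0x23) into a dict of field name -> tuple of values."""
    if packet[0] != 0x23:
        raise ValueError("not a CHUNK packet")
    datatypes = packet[1] | (packet[2] << 8)          # address 1 = low, 2 = high
    pos, fields = 3, {}
    for bit, (name, fmt) in enumerate(DATATYPE_FORMATS):  # increasing DATATYPE
        if not datatypes & (1 << bit):
            continue
        if fmt is None:
            raise ValueError(f"{name} has no specified size")
        fmt = "<" + fmt                                # little-endian, no padding
        fields[name] = struct.unpack_from(fmt, packet, pos)
        pos += struct.calcsize(fmt)
    if pos != len(packet):
        raise ValueError(f"expected {pos} bytes, got {len(packet)}")
    return fields


# The worked example above (DATATYPES 0x06 0x01) with arbitrary sample values
pkt = bytes([0x23, 0x06, 0x01]) + struct.pack("<hhhhh", 120, 30, 118, 29, -5)
print(decode_chunk(pkt))  # {'LUM1': (120, 30), 'LUM2': (118, 29), 'INFRATHERM': (-5,)}
```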
“SERIE” packet (0x21)
Address | 0 | 1 | 2 | 3-4 | 5 - ... |
---|---|---|---|---|---|
Content | 0x21 | KEYSTRUCT | VALSTRUCT | COUNT | SERIE CONTENT |
The idea here is that a series is defined by a set of (KEY, VAL) pairs. The number of pairs is given by COUNT (a 2-byte unsigned integer, between 0 and 65535). Both KEY and VAL have a UNIT type that you are free to choose (set up whatever you need!), specified by KEYSTRUCT and VALSTRUCT respectively:
- the LOWEST 4 BITS of KEYSTRUCT (or VALSTRUCT) code the type of the data unit (see table below).
- the HIGHEST 4 BITS of KEYSTRUCT (or VALSTRUCT) code the dimensionality of the data unit (between 0 and 15), i.e. how many units make up one KEY (or VAL).
UNIT | CODE (HEX) | CODE (BIN) | type of values / output |
---|---|---|---|
HEX8 | 0x00 | b00000000 | hexadecimal 1 byte |
HEX16 | 0x01 | b00000001 | hexadecimal 2 bytes |
HEX24 | 0x02 | b00000010 | hexadecimal 3 bytes |
HEX32 | 0x03 | b00000011 | hexadecimal 4 bytes |
INT8 | 0x04 | b00000100 | 1 byte signed integer |
INT16 | 0x05 | b00000101 | 2 bytes signed integer |
INT24 | 0x06 | b00000110 | 3 bytes signed integer |
INT32 | 0x07 | b00000111 | 4 bytes signed integer |
UINT8 | 0x08 | b00001000 | 1 byte unsigned integer |
UINT16 | 0x09 | b00001001 | 2 bytes unsigned integer |
UINT24 | 0x0A | b00001010 | 3 bytes unsigned integer |
UINT32 | 0x0B | b00001011 | 4 bytes unsigned integer |
- | 0x0C | b00001100 | unused |
STR | 0x0D | b00001101 | 4 chars |
- | 0x0E | b00001110 | unused |
FLOAT | 0x0F | b00001111 | float (4 bytes) |
NOTE: except for STR, the size of a unit in bytes can be calculated as (UNIT CODE & 0x03) + 1.
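A minimal Python sketch of the nibble layout and the size rule: it splits a KEYSTRUCT/VALSTRUCT byte and computes the size of one unit, treating STR as the exception and rejecting the unused codes. The size of a whole KEY or VAL is taken as dimensionality × unit size, which is what the magnetometer example that follows implies (0x35 gives 3 × 2 = 6 bytes). The function names are hypothetical.

```python
def split_struct(b):
    """Split a KEYSTRUCT/VALSTRUCT byte into (dimensionality, unit code)."""
    return b >> 4, b & 0x0F


def unit_size(code):
    """Size in bytes of one unit: (code & 0x03) + 1, except for STR."""
    if code == 0x0D:                 # STR: 4 chars
        return 4
    if code in (0x0C, 0x0E):         # unused codes
        raise ValueError(f"unit code 0x{code:02X} is unused")
    return (code & 0x03) + 1


def element_size(b):
    """Bytes taken by one KEY or VAL: dimensionality x unit size (assumed)."""
    dim, unit = split_struct(b)
    return dim * unit_size(unit)


print(split_struct(0x35), element_size(0x35))  # (3, 5) 6
```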
For instance, consider a data series consisting of 64 magnetometer X, Y, Z values (INT16), indexed by time measured in milliseconds (UINT32):
- KEYSTRUCT = 0x1B (dimensionality 1, unit = UINT32)
- VALSTRUCT = 0x35 (dimensionality 3, unit = INT16)
- COUNT = 64 (0x0040, stored little-endian as the bytes 0x40 0x00)
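The header bytes for this example can be produced with Python's `struct` module, where `<BBBH` packs three unsigned bytes followed by a little-endian UINT16. `make_struct` and `serie_header` are hypothetical helpers for illustration, not part of SAT_DataLib.

```python
import struct


def make_struct(dimensionality, unit):
    """Pack dimensionality (high nibble) and unit code (low nibble) into one byte."""
    if not (0 <= dimensionality <= 15 and 0 <= unit <= 15):
        raise ValueError("both fields must fit in 4 bits")
    return (dimensionality << 4) | unit


def serie_header(keystruct, valstruct, count):
    """5-byte SERIE header: 0x21, KEYSTRUCT, VALSTRUCT, COUNT (UINT16, little-endian)."""
    return struct.pack("<BBBH", 0x21, keystruct, valstruct, count)


UINT32, INT16 = 0x0B, 0x05
header = serie_header(make_struct(1, UINT32), make_struct(3, INT16), 64)
print(header.hex(" "))  # 21 1b 35 40 00
```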
The content of the series is then coded as a sequence of (KEY, VAL) pairs as specified by KEYSTRUCT and VALSTRUCT. In the example above, a KEY takes 4 bytes (1 * UINT32) and a VAL takes 6 bytes (3 * INT16), which gives:
Address | 0 | 1-4 | 5-8 | 9-14 | 15-18 | 19-24 | 25 onward |
---|---|---|---|---|---|---|---|
Content | 0x21 | 0x1B 0x35 0x40 0x00 | KEY 1 | VAL 1 | KEY 2 | VAL 2 | ... |
To be sure everything's clear, in our example, that means we would have :
Address | 0 | 1 - 4 | 5 | 9 | 11 | 13 | 15 | 19 | 21 | 23 | 25 |
---|---|---|---|---|---|---|---|---|---|---|---|
Content | 0x21 | 0x1B 0x35 0x40 0x00 | MS1 | MAGX1 | MAGY1 | MAGZ1 | MS2 | MAGX2 | MAGY2 | MAGZ2 | ... |
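Decoding a SERIE packet follows directly from its header: check the total length, then read COUNT pairs, each KEY and VAL being `dimensionality` units. A Python sketch under stated assumptions (hypothetical names, not SAT_DataLib code): HEXn and UINTn units are returned as unsigned integers, INTn as signed integers (including the 3-byte types, via `int.from_bytes`), STR as a 4-character ASCII string, and FLOAT is assumed to be an IEEE 754 single-precision value. A dimensionality of 0 simply yields empty lists here, since the text does not say how to treat it in a series.

```python
import struct


def unit_size(code):
    if code == 0x0D:                 # STR: 4 chars
        return 4
    if code in (0x0C, 0x0E):         # unused codes
        raise ValueError(f"unit code 0x{code:02X} is unused")
    return (code & 0x03) + 1


def decode_unit(code, raw):
    """Turn the raw little-endian bytes of one unit into a Python value."""
    if code == 0x0F:                                 # FLOAT (assumed IEEE 754 single)
        return struct.unpack("<f", raw)[0]
    if code == 0x0D:                                 # STR
        return raw.decode("ascii")
    signed = 0x04 <= code <= 0x07                    # INT8..INT32 are signed
    return int.from_bytes(raw, "little", signed=signed)  # HEXn / UINTn unsigned


def decode_element(struct_byte, data, pos):
    """Decode one KEY or VAL (dimensionality x unit); return (values, next pos)."""
    dim, unit = struct_byte >> 4, struct_byte & 0x0F
    size = unit_size(unit)
    values = [decode_unit(unit, data[pos + i * size:pos + (i + 1) * size])
              for i in range(dim)]
    return values, pos + dim * size


def decode_serie(packet):
    """Decode a SERIE packet (0x21) into a list of (KEY values, VAL values)."""
    if packet[0] != 0x21:
        raise ValueError("not a SERIE packet")
    keystruct, valstruct, count = struct.unpack_from("<BBH", packet, 1)
    key_size = (keystruct >> 4) * unit_size(keystruct & 0x0F)
    val_size = (valstruct >> 4) * unit_size(valstruct & 0x0F)
    if len(packet) != 5 + count * (key_size + val_size):
        raise ValueError("packet length does not match its header")
    pos, pairs = 5, []
    for _ in range(count):
        key, pos = decode_element(keystruct, packet, pos)
        val, pos = decode_element(valstruct, packet, pos)
        pairs.append((key, val))
    return pairs


# Two (MS, MAG X/Y/Z) pairs with arbitrary sample values
pkt = struct.pack("<BBBH", 0x21, 0x1B, 0x35, 2)
pkt += struct.pack("<Ihhh", 1000, 12, -40, 310) + struct.pack("<Ihhh", 1100, 11, -41, 309)
print(decode_serie(pkt))  # [([1000], [12, -40, 310]), ([1100], [11, -41, 309])]
```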
“USER DEFINED” packet (0x55)
Address | 0 | 1 | 2 to (LENGTH - 1) |
---|---|---|---|
Content | 0x55 | LENGTH | VALUE BLOCKS |
LENGTH is a 1-byte unsigned integer (UINT8) giving the total length of the packet, at most 255 bytes. The length of the VALUE BLOCKS is therefore LENGTH minus 2 (the size of the header).
The BODY is made of VALUE BLOCKS. Each block consists of a first byte coding the unit of the value, followed by the value itself.
This first byte is taken from the table of UNIT codes and follows the same convention as the KEYSTRUCT/VALSTRUCT bytes used in series,
with one exception: in a user defined packet, a dimensionality of zero should be treated as 1 (otherwise, you wouldn't have put a block in!).
Let's say, for instance, that you only want to send one temperature value (INT16) and the variance of this temperature (FLOAT). The corresponding user packet reads as follows:
Address | 0 | 1 | 2 | 3-4 | 5 | 6-9 |
---|---|---|---|---|---|---|
Content | 0x55 | 0x0A (length=10) | 0x05 (INT16) | TEMP | 0x0F (FLOAT) | VARIANCE |
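A Python sketch of a USER DEFINED decoder, using hypothetical names and not SAT_DataLib code. It reuses the unit rules of series, applies the "dimensionality 0 counts as 1" exception, and is checked against the temperature/variance layout above with made-up values. FLOAT is assumed to be IEEE 754 single precision.

```python
import struct


def unit_size(code):
    if code == 0x0D:                 # STR: 4 chars
        return 4
    if code in (0x0C, 0x0E):         # unused codes
        raise ValueError(f"unit code 0x{code:02X} is unused")
    return (code & 0x03) + 1


def decode_unit(code, raw):
    if code == 0x0F:                                 # FLOAT (assumed IEEE 754 single)
        return struct.unpack("<f", raw)[0]
    if code == 0x0D:                                 # STR
        return raw.decode("ascii")
    signed = 0x04 <= code <= 0x07                    # INT8..INT32
    return int.from_bytes(raw, "little", signed=signed)


def decode_user(packet):
    """Decode a USER DEFINED packet (0x55) into a list of (unit code, values)."""
    if packet[0] != 0x55:
        raise ValueError("not a USER DEFINED packet")
    length = packet[1]                               # total length, header included
    if length < 2 or length > len(packet):
        raise ValueError("invalid LENGTH")
    pos, blocks = 2, []
    while pos < length:
        unit_byte = packet[pos]
        dim = max(unit_byte >> 4, 1)                 # dimensionality 0 counts as 1
        unit = unit_byte & 0x0F
        size = unit_size(unit)
        pos += 1
        if pos + dim * size > length:
            raise ValueError("value block overruns LENGTH")
        values = [decode_unit(unit, packet[pos + i * size:pos + (i + 1) * size])
                  for i in range(dim)]
        blocks.append((unit, values))
        pos += dim * size
    return blocks


# Layout from the table above, with made-up values: TEMP = 215, VARIANCE = 0.25
pkt = bytes([0x55, 0x0A, 0x05]) + struct.pack("<h", 215) + bytes([0x0F]) + struct.pack("<f", 0.25)
print(decode_user(pkt))  # [(5, [215]), (15, [0.25])]
```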
“LOG” packet (0x53)
The purpose of LOG packets is simply to send verbose ASCII text: a comment, an error or a debug message. You never know...
The format is pretty simple:
Address | 0 | 1 | 2 to (LENGTH - 1) |
---|---|---|---|
Content | 0x53 | LENGTH | CHAR CONTENT |
LENGTH is a 1-byte unsigned integer (UINT8) giving the total length of the packet, at most 255 bytes. The length of the CHAR CONTENT is therefore LENGTH minus 2 (the size of the header).
You don't need a terminating character at the end of your CHAR CONTENT, because its length is defined by LENGTH.
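Encoding and decoding a LOG packet amounts to prefixing the text with 0x53 and the total LENGTH, and slicing it back out. A minimal Python sketch, assuming the text is plain ASCII; `encode_log` and `decode_log` are hypothetical names.

```python
def encode_log(text):
    """Build a LOG packet (0x53): code, total LENGTH, then the chars (no NUL)."""
    body = text.encode("ascii")
    length = 2 + len(body)
    if length > 255:
        raise ValueError("a LOG packet is limited to 255 bytes (253 chars)")
    return bytes([0x53, length]) + body


def decode_log(packet):
    """Return the text of a LOG packet, using LENGTH instead of a terminator."""
    if packet[0] != 0x53:
        raise ValueError("not a LOG packet")
    length = packet[1]
    if length < 2 or length > len(packet):
        raise ValueError("invalid LENGTH")
    return packet[2:length].decode("ascii")


pkt = encode_log("hello")
print(pkt[:2].hex(" "), decode_log(pkt))  # 53 07 hello
```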