This repository was archived by the owner on Jun 7, 2023. It is now read-only.

Data Model

rstml edited this page May 26, 2012 · 8 revisions

ElasticInbox uses 4 column families:

  • Accounts
  • MessageMetadata
  • IndexLabels
  • Counters

The complete schema can be found here: https://github.com/elasticinbox/elasticinbox/blob/master/config/elasticinbox.cml

An RFC 5322 compatible email address is used as the unique account identifier in all column families (CFs).

Accounts CF

This CF contains account information such as labels.

Schema syntax:

create column family Accounts with 
	key_validation_class=UTF8Type and
	rows_cached=100000 and
	comment='Basic information about accounts';

Sample contents:

"Accounts" {
    "[email protected]" {
        "label:0"  : "all",
        "label:1"  : "inbox",
        "label:2"  : "drafts",
        ...
    }
}
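
As a rough illustration of how the label columns above could be read back, the following sketch turns a row's `label:<id>` columns into a map from label ID to label name. The row is modelled as a plain Python dict; `parse_labels` and the address `user@example.com` are hypothetical and not part of ElasticInbox.

```python
def parse_labels(row: dict[str, str]) -> dict[int, str]:
    """Collect the 'label:<id>' columns of an Accounts row into {id: name}."""
    labels = {}
    for column, value in row.items():
        prefix, _, label_id = column.partition(":")
        # Ignore any column that is not of the form 'label:<number>'.
        if prefix == "label" and label_id.isdigit():
            labels[int(label_id)] = value
    return labels


# Hypothetical row mirroring the sample contents above.
accounts = {
    "user@example.com": {"label:0": "all", "label:1": "inbox", "label:2": "drafts"},
}
labels = parse_labels(accounts["user@example.com"])
print(labels)
```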

MessageMetadata CF

MessageMetadata is a super column family. Each row contains all messages for a particular account, identified by email address. This keeps all of an account's messages on the same Cassandra node and speeds up read operations.

Each super column contains information about a particular message and is identified and ordered by the message UUID. The message UUID is generated from the message time.

Schema syntax:

create column family MessageMetadata with 
	column_type=Super and 
	key_validation_class=UTF8Type and
	comparator=TimeUUIDType and 
	subcomparator=BytesType and
	comment='Message metadata including headers, labels, markers, physical location, etc.';

Sample contents:

"MessageMetadata" {
    "[email protected]" {
        "550e8400-e29b-41d4-a716-446655440000" {
            "from"     : "[["EI Test","[email protected]"]]", # JSON encoded data
            "to"       : "[["Me","[email protected]"],[...]]",
            "subject"  : "Hello world!",
            "date"     : "12 March 2011",
            "location" : "blob://fs-local/container/[email protected]:753eef70-d5fb-14ce-abd4-040cced3bd7a",
            "l:1"      : true,   # Label ID
            "m:1"      : true,   # Marker ID
            ...
        }
    }
}
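
Because the message UUID is time-based, ordering super columns by TimeUUID orders messages chronologically. The sketch below shows one way a version 1 (time-based) UUID could be derived from a message time using only Python's standard `uuid` module. The helper `time_uuid` is hypothetical, the zero clock sequence and node values are placeholders, and the generator ElasticInbox actually uses may differ.

```python
import uuid
from datetime import datetime, timezone

# Number of 100 ns intervals between the UUID epoch (1582-10-15)
# and the Unix epoch (1970-01-01).
UUID_EPOCH_OFFSET = 0x01B21DD213814000
UNIX_EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def time_uuid(when: datetime, clock_seq: int = 0, node: int = 0) -> uuid.UUID:
    """Build a version 1 UUID whose timestamp encodes `when` (timezone-aware)."""
    delta = when - UNIX_EPOCH
    # Integer arithmetic avoids floating-point rounding of the timestamp.
    ticks = ((delta.days * 86400 + delta.seconds) * 10_000_000
             + delta.microseconds * 10 + UUID_EPOCH_OFFSET)
    time_low = ticks & 0xFFFFFFFF
    time_mid = (ticks >> 32) & 0xFFFF
    time_hi_version = ((ticks >> 48) & 0x0FFF) | (1 << 12)  # version 1
    clock_seq_hi_variant = ((clock_seq >> 8) & 0x3F) | 0x80  # RFC 4122 variant
    clock_seq_low = clock_seq & 0xFF
    return uuid.UUID(fields=(time_low, time_mid, time_hi_version,
                             clock_seq_hi_variant, clock_seq_low, node))


earlier = time_uuid(datetime(2011, 3, 12, 10, 0, tzinfo=timezone.utc))
later = time_uuid(datetime(2011, 3, 12, 10, 5, tzinfo=timezone.utc))
# Sorting by the embedded timestamp yields chronological order.
ordered = sorted([later, earlier], key=lambda u: u.time)
```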

IndexLabels CF

IndexLabels is a reverse index for labels. Each row is uniquely identified by a composite key of email address and label ID. The contents of each label index are the UUIDs of the messages that belong to the label, sorted as TimeUUID.

Schema syntax:

create column family IndexLabels with
	key_validation_class=UTF8Type and
	comparator=TimeUUIDType and 
	rows_cached=100000 and
	comment='Message ID indexes grouped by labels and ordered by time';

Sample contents:

"IndexLabels" {
    "[email protected]:1" {
        "550e8400-e29b-41d4-a716-446655440000" : null,
        "892e8300-e29b-41d4-a716-446655440000" : null,
        "a0232400-e29b-41d4-a716-446655440000" : null,
        ...
    }
}
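
A minimal in-memory sketch of this layout, assuming the row key is simply the email address and label ID joined by a colon as in the sample above. `index_key`, `list_label`, the sample address and the UUID values are hypothetical; sorting by the UUID timestamp stands in for the ordering that the TimeUUIDType comparator provides.

```python
import uuid


def index_key(account: str, label_id: int) -> str:
    """Build the composite row key '<email>:<label id>'."""
    return f"{account}:{label_id}"


def list_label(index_labels: dict, account: str, label_id: int, limit: int = 50) -> list:
    """Return up to `limit` message UUIDs for a label, oldest first."""
    row = index_labels.get(index_key(account, label_id), {})
    return sorted(row, key=lambda message_id: message_id.time)[:limit]


# Placeholder version 1 UUIDs whose embedded timestamps are 1, 2 and 3.
m1 = uuid.UUID("00000001-0000-1000-8000-000000000000")
m2 = uuid.UUID("00000002-0000-1000-8000-000000000000")
m3 = uuid.UUID("00000003-0000-1000-8000-000000000000")

# Column values are empty; only the column names (message UUIDs) carry data.
index_labels = {index_key("user@example.com", 1): {m3: None, m1: None, m2: None}}
inbox = list_label(index_labels, "user@example.com", 1)
```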

Purge Indexes

In addition to the normal label indexes, there is a special purge index in the IndexLabels CF. The purge index keeps track of deleted messages.

Each time a message is deleted, ElasticInbox removes it from all label indexes and adds an entry to the purge index. In the purge index, the column name is the timestamp of the delete event (as a TimeUUID) and the column value is the message UUID.

Sample contents:

"IndexLabels" {
    "[email protected]:purge" {
        "550e8400-e29b-41d4-a716-446655440000" : "892e8300-e29b-41d4-a716-446655440000",
        "892e8300-e29b-41d4-a716-446655440000" : "892e8300-e29b-41d4-a716-446655440000",
        "a0232400-e29b-41d4-a716-446655440000" : "892e8300-e29b-41d4-a716-446655440000",
        ...
    }
}

Note that the delete operation does not remove the message from MessageMetadata or the Blob Store. This is done in order to 1) speed up the delete operation and 2) provide a restore mechanism in case of accidental deletes. Deleted messages can be purged by issuing the relevant API command.
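
The delete flow described in this section can be sketched with plain dicts standing in for the column families. This is only an illustration under stated assumptions: `delete_message` and the sample address are hypothetical, and `uuid.uuid1()` is used as a stand-in for the TimeUUID of the delete event.

```python
import uuid


def delete_message(index_labels: dict, account: str, message_id: uuid.UUID, label_ids: list) -> uuid.UUID:
    """Remove a message from its label indexes and record it in the purge index."""
    for label_id in label_ids:
        index_labels.get(f"{account}:{label_id}", {}).pop(message_id, None)
    # Purge index: column name = delete event TimeUUID, value = message UUID.
    delete_event = uuid.uuid1()
    index_labels.setdefault(f"{account}:purge", {})[delete_event] = message_id
    return delete_event


account = "user@example.com"
message_id = uuid.uuid1()
metadata = {account: {message_id: {"subject": "Hello world!"}}}
index_labels = {
    f"{account}:0": {message_id: None},
    f"{account}:1": {message_id: None},
}

event = delete_message(index_labels, account, message_id, [0, 1])
# The metadata row is deliberately left untouched, so the message remains
# available for a later restore or purge.
```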

Counters CF

Counters is a column family that keeps track of mailbox stats. It may potentially also be used for POP3 and IMAP serial ID generation.

The following stats are currently stored for each label:

  • Size in bytes (only available for the ALL_MAILS label)
  • Total message count
  • New message count

For total mailbox stats, use the label with ID = 0 (ALL_MAILS).

Schema syntax:

create column family Counters with
	comparator='CompositeType(UTF8Type,UTF8Type,UTF8Type)' and
	key_validation_class=UTF8Type and
	default_validation_class=CounterColumnType and
	replicate_on_write=true and
	rows_cached=300000 and
	comment='All counters for an account';

Sample contents:

"Counters" {
    "[email protected]" {
        "l:0:b" : 18239090,   # bytes; composite name: type, label ID or other counter identifier, stat
        "l:0:m" : 394,        # messages
        "l:0:u" : 12          # unread
    }
}
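
To show how the composite column names in the sample could be interpreted, the sketch below models each name as a three-part tuple and collects the stats for one label. The mapping of `b`, `m` and `u` to bytes, messages and unread follows the comments in the sample; `label_stats` and the dict-based row are hypothetical.

```python
# Stat suffixes as they appear in the sample contents above.
STAT_NAMES = {"b": "bytes", "m": "messages", "u": "unread"}


def label_stats(row: dict, label_id: int) -> dict:
    """Collect the counters of one label from a Counters row."""
    stats = {}
    for (kind, ident, stat), value in row.items():
        # "l" marks a label counter; the second component is the label ID.
        if kind == "l" and ident == str(label_id) and stat in STAT_NAMES:
            stats[STAT_NAMES[stat]] = value
    return stats


# Composite column names modelled as (type, id, stat) tuples.
row = {
    ("l", "0", "b"): 18239090,
    ("l", "0", "m"): 394,
    ("l", "0", "u"): 12,
}
all_mails = label_stats(row, 0)  # label 0 is ALL_MAILS
print(all_mails)
```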