Cassandra is an open source distributed database, it combines dynamic key/value and column oriented feature of Bigtable.
Features of Cassandra are:
- Flexible schema, no need to design schema first, it's very convenient to add or delete strings
- Support range search on keys
- High usability, extensible. The single node error will not affect the cluster.
We can think Cassandra's data model as a 4 or 5 dimensional Hash.
COLUMN
Columns is the smallest data unit in Cassandra, it is a 3 dimensional data type including name,value and timestamp.
{ // This is a column name: "逖é–寒的世界", value: "gpcuster@gmali.com", timestamp: 123456789 }
To be simple, we ignore timestamp here, we can treat columns as name/value pairs. Here name and value are both byte[].
SUPERCOLUMN
We can think SuperColumn as an array of Column, it has a name and a set of columns. The JSON format of a SuperColumn is :
{ // This is SuperColumn name: "逖é–寒的世界", // A set of Columns value: { street: {name: "street", value: "1234 x street", timestamp: 123456789}, city: {name: "city", value: "san francisco", timestamp: 123456789}, zip: {name: "zip", value: "94107", timestamp: 123456789}, } }
Column and SuperColumn both contains name/value pairs. The biggest difference is value in Column is a string while the value in SuperColumn is a Map of Column.
SuperColumn itself doesn't contain timestamp
COLUMNFAMILY
ColumnFamily is a structure containing many rows, we can think it as a Table in RDBMS. Each row contains a Key and related Columns supplied by the client.
UserProfile = { // This is a ColumnFamily phatduckk: { // This is the key of ColumnFamily // Column related to the key username: "gpcuster", email: "gpcuster@gmail.com", phone: "6666" }, // End of first row ieure: { // This is another key of ColumnFamily //Column related to the key username: "pengguo", email: "pengguo@live.com", phone: "888" age: "66" }, }
The type of ColumnFaily can be standard or super.
The above example is a standard ColumnFamily. Standard ColumnFamily contains a set of columns(not SuperColumn). Super ColumnFamily contains a set of SuperColumns, but it cannot contain Standard ColumnFamily.
AddressBook = { // This is a super ColumnFamily phatduckk: { // key friend1: {street: "8th street", zip: "90210", city: "Beverley Hills", state: "CA"}, John: {street: "Howard street", zip: "94404", city: "FC", state: "CA"}, Kim: {street: "X street", zip: "87876", city: "Balls", state: "VA"}, Tod: {street: "Jerry street", zip: "54556", city: "Cartoon", state: "CO"}, Bob: {street: "Q Blvd", zip: "24252", city: "Nowhere", state: "MN"}, ... }, // row ending ieure: { // key joey: {street: "A ave", zip: "55485", city: "Hell", state: "NV"}, William: {street: "Armpit Dr", zip: "93301", city: "Bakersfield", state: "CA"}, }, }
KEYSPACE
All ColumnFamily belong to one Keyspace, usually one application will have only one Keyspace.
Simple test
cassandra> set Keyspace1.Standard1['jsmith']['first'] = 'John' Value inserted. cassandra> set Keyspace1.Standard1['jsmith']['last'] = 'Smith' Value inserted. cassandra> set Keyspace1.Standard1['jsmith']['age'] = '42' Value inserted.
After executing above commands, Cassandra now contains 3 records. The meaning of the inserted data is:
Next we do a search operation:
cassandra> get Keyspace1.Standard1['jsmith'] (column=age, value=42; timestamp=1249930062801) (column=first, value=John; timestamp=1249930053103) (column=last, value=Smith; timestamp=1249930058345) Returned 3 rows.
SORTING
When we insert data into Cassandra, data will be automatically sorted. All the Columns related to one Key is sorted by Name, we can specify the order type in storage-conf.xml.
The sort type provided by Cassandra are : BytesType, UTF8Type,LexicalUUIDType, TimeUUIDType, AsciiType,and LongType.
Now assume the raw data are:
{name: 123, value: "hello there"}, {name: 832416, value: "kjjkbcjkcbbd"}, {name: 3, value: "101010101010"}, {name: 976, value: "kjjkbcjkcbbd"}
If we specify the sort type as LongType:
<!-- Deinition of ColumnFamily in storage-conf.xml --> <ColumnFamily CompareWith="LongType" Name="CF_NAME_HERE"/>
The data after sorting will be :
{name: 3, value: "101010101010"}, {name: 123, value: "hello there"}, {name: 976, value: "kjjkbcjkcbbd"}, {name: 832416, value: "kjjkbcjkcbbd"}
For SuperColumn, there is one more sorting dimension. So we need to specify the value of CompareSubcolumnsWith.
{ // first SuperColumn from a Row name: "workAddress", // and the columns within it value: { street: {name: "street", value: "1234 x street"}, city: {name: "city", value: "san francisco"}, zip: {name: "zip", value: "94107"} } }, { // another SuperColumn from same Row name: "homeAddress", // and the columns within it value: { street: {name: "street", value: "1234 x street"}, city: {name: "city", value: "san francisco"}, zip: {name: "zip", value: "94107"} } }
Then we specify CompareSubcolumnsWith and CompareWith as UTF8Type.
{ // this one's first b/c when treated as UTF8 strings { // another SuperColumn from same Row // This Row comes first b/c "homeAddress" is before "workAddress" name: "homeAddress", // the columns within this SC are also sorted by their names too value: { // see, these are sorted by Column name too city: {name: "city", value: "san francisco"}, street: {name: "street", value: "1234 x street"}, zip: {name: "zip", value: "94107"} } }, name: "workAddress", value: { // the columns within this SC are also sorted by their names too city: {name: "city", value: "san francisco"}, street: {name: "street", value: "1234 x street"}, zip: {name: "zip", value: "94107"} } }
We can implement the sorting method by ourselves in Cassandra, we only need to extend org.apache.cassandra.db.marshal.IType.
Reference:
WTF is a SuperColumn? An Intro to the Cassandra Data Model
Source : http://www.cnblogs.com/gpcuster/archive/2010/03/12/1684072.html