CASSANDRA data model

  Peter        2013-06-08 22:07:40       3,149        0    

Cassandra is an open source distributed database, it combines dynamic key/value and column oriented feature of Bigtable.

Features of Cassandra are:

  • Flexible schema, no need to design schema first, it's very convenient to add or delete strings
  • Support range search on keys
  • High usability, extensible. The single node error will not affect the cluster.

We can think Cassandra's data model as a 4 or 5 dimensional Hash.

COLUMN

Columns is the smallest data unit in Cassandra, it is a 3 dimensional data type including name,value and timestamp.

 {  // This is a column
    name: "逖靖寒的世界",
    value: "gpcuster@gmali.com",
    timestamp: 123456789
} 

To be simple, we ignore timestamp here, we can treat columns as name/value pairs. Here name and value are both byte[].

SUPERCOLUMN

We can think SuperColumn as an array of Column, it has a name and a set of columns. The JSON format of a SuperColumn is :

{   // This is SuperColumn
     name: "逖靖寒的世界",
     // A set of Columns
    value: {
        street: {name: "street", value: "1234 x street", timestamp: 123456789},
        city: {name: "city", value: "san francisco", timestamp: 123456789},
        zip: {name: "zip", value: "94107", timestamp: 123456789},
     }
}

Column and SuperColumn both contains name/value pairs. The biggest difference is value in Column is a string while the value in SuperColumn is a Map of Column.

SuperColumn itself doesn't contain timestamp

COLUMNFAMILY

ColumnFamily is a structure containing many rows, we can think it as a Table in RDBMS. Each row contains a Key and related Columns supplied by the client.

UserProfile = { // This is a ColumnFamily
     phatduckk: {   // This is the key of ColumnFamily
         // Column related to the key
         username: "gpcuster",
         email: "gpcuster@gmail.com",
         phone: "6666"
    }, // End of first row
    ieure: {   // This is another key of ColumnFamily
         //Column related to the key
         username: "pengguo",
         email: "pengguo@live.com",
         phone: "888"
         age: "66"
     },
}

The type of ColumnFaily can be standard or super.

The above example is a standard ColumnFamily. Standard ColumnFamily contains a set of columns(not SuperColumn). Super ColumnFamily contains a set of SuperColumns, but it cannot contain Standard ColumnFamily.

AddressBook = { // This is a super ColumnFamily
     phatduckk: {    // key
         friend1: {street: "8th street", zip: "90210", city: "Beverley Hills", state: "CA"}, 
         John: {street: "Howard street", zip: "94404", city: "FC", state: "CA"},
         Kim: {street: "X street", zip: "87876", city: "Balls", state: "VA"},
         Tod: {street: "Jerry street", zip: "54556", city: "Cartoon", state: "CO"},
         Bob: {street: "Q Blvd", zip: "24252", city: "Nowhere", state: "MN"},
         ...
    }, // row ending
     ieure: {     // key
         joey: {street: "A ave", zip: "55485", city: "Hell", state: "NV"},
         William: {street: "Armpit Dr", zip: "93301", city: "Bakersfield", state: "CA"},
     },
}

KEYSPACE

All ColumnFamily belong to one Keyspace, usually one application will have only one Keyspace.

Simple test

cassandra> set Keyspace1.Standard1['jsmith']['first'] = 'John' 
Value inserted. 
cassandra> set Keyspace1.Standard1['jsmith']['last'] = 'Smith' 
Value inserted. 
cassandra> set Keyspace1.Standard1['jsmith']['age'] = '42' 
Value inserted.

After executing above commands, Cassandra now contains 3 records. The meaning of the inserted data is:

image

Next we do a search operation:

cassandra> get Keyspace1.Standard1['jsmith'] 
  (column=age, value=42; timestamp=1249930062801) 
  (column=first, value=John; timestamp=1249930053103) 
  (column=last, value=Smith; timestamp=1249930058345) 
Returned 3 rows.

SORTING

When we insert data into Cassandra, data will be automatically sorted. All the Columns related to one Key is sorted by Name, we can specify the order type in storage-conf.xml.

The sort type provided by Cassandra are : BytesType, UTF8Type,LexicalUUIDType, TimeUUIDType, AsciiType,and LongType.

Now assume the raw data are:

{name: 123, value: "hello there"}, 
{name: 832416, value: "kjjkbcjkcbbd"}, 
{name: 3, value: "101010101010"}, 
{name: 976, value: "kjjkbcjkcbbd"}

If we specify the sort type as LongType:

<!-- 
      Deinition of ColumnFamily in storage-conf.xml 
--> 
<ColumnFamily CompareWith="LongType" Name="CF_NAME_HERE"/>

The data after sorting will be :

{name: 3, value: "101010101010"},   
{name: 123, value: "hello there"}, 
{name: 976, value: "kjjkbcjkcbbd"}, 
{name: 832416, value: "kjjkbcjkcbbd"}

For SuperColumn, there is one more sorting dimension. So we need to specify the value of CompareSubcolumnsWith.

{ // first SuperColumn from a Row 
    name: "workAddress", 
    // and the columns within it 
    value: { 
        street: {name: "street", value: "1234 x street"}, 
        city: {name: "city", value: "san francisco"}, 
        zip: {name: "zip", value: "94107"} 
    } 
}, 
{ // another SuperColumn from same Row 
    name: "homeAddress", 
    // and the columns within it 
    value: { 
        street: {name: "street", value: "1234 x street"}, 
        city: {name: "city", value: "san francisco"}, 
        zip: {name: "zip", value: "94107"} 
    } 
}

Then we specify CompareSubcolumnsWith and CompareWith as UTF8Type.

{ 
    // this one's first b/c when treated as UTF8 strings 
    { // another SuperColumn from same Row 
        // This Row comes first b/c "homeAddress" is before "workAddress"            
        name: "homeAddress", 
        // the columns within this SC are also sorted by their names too 
        value: { 
            // see, these are sorted by Column name too 
            city: {name: "city", value: "san francisco"},                
            street: {name: "street", value: "1234 x street"}, 
            zip: {name: "zip", value: "94107"} 
        } 
    },        
    name: "workAddress", 
    value: { 
        // the columns within this SC are also sorted by their names too 
        city: {name: "city", value: "san francisco"},            
        street: {name: "street", value: "1234 x street"}, 
        zip: {name: "zip", value: "94107"} 
    } 
}

We can implement the sorting method by ourselves in Cassandra, we only need to extend org.apache.cassandra.db.marshal.IType.

Reference:

WTF is a SuperColumn? An Intro to the Cassandra Data Model

DataModel

Source : http://www.cnblogs.com/gpcuster/archive/2010/03/12/1684072.html

CASSANDRA  DATABASE  SORT 

       

  RELATED


  0 COMMENT


No comment for this article.



  RANDOM FUN

Concurrency Programming with Ruby