NoSQL: Cassandra & CQL
Cassandra: an extensible record (wide-column) store
General idea...
– Different types
– Scale up vs. scale out
• Key features
– Flexible data model
– High availability & Scalability
Amazon DynamoDB
– Data model, partition & sort key
– Data types (string, number, set, map, list)
– Consistent hashing
• Apache Cassandra
– Write & read path
– Upsert
– Minor & major compaction
Apache Hive
– HiveQL: SQL-like language
– Analyze data stored in HDFS
– Queries compiled into MapReduce jobs
Cassandra & DynamoDB
– Key-based workload (~ OLTP)
– Processing a small amount of data per query
Apache Hive
– Analytical workload (~ OLAP)
– A query may need to process terabytes of data
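Cassandra uses the data model of Google's BigTable. Unlike a row-oriented relational database or a plain key-value store, Cassandra is a wide-column store: each row is uniquely identified by a row key and can hold up to 2 billion columns, each column is identified by a column key, and each column key maps to one or more values. The model can be understood as a two-dimensional key-value store, with the whole data model defined as something like
map<key1, map<key2,value>>.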
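The nested-map view above can be sketched in a few lines of Python. This is only a toy model of the idea, not how Cassandra stores data; the names `table`, `put`, and `get` are made up for illustration.

```python
# Toy model of the wide-column layout: row key -> (column key -> value).
# All names and values are illustrative assumptions, not Cassandra internals.
table = {}

def put(row_key, column_key, value):
    """Set one cell; the row is created on first use."""
    table.setdefault(row_key, {})[column_key] = value

def get(row_key, column_key):
    """Look up one cell, or None if the row or column is absent."""
    return table.get(row_key, {}).get(column_key)

put("Smith", "firstname", "John")
put("Smith", "city", "LA")
put("Lee", "age", "41")   # rows need not share the same set of columns
```

Because each row is its own inner map, different rows can carry different columns, which is the "flexible data model" point from the outline.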
Newer versions of Cassandra use CQL, a SQL-like language, to define the data model and to read and write data.
desc keyspaces;
create keyspace xxx with replication = {'class':'SimpleStrategy','replication_factor':1};
drop keyspace xxx;
create table xxx (name type primary key, name type);
-- "column family" is the older name for a table
SELECT * FROM users WHERE lastname=
insert into users (lastname, age, city,
firstname) values ('Smith', 35, 'LA', 'John');
-- note: an insert does not check the existing content of the SSTables
-- so the next insert is actually an update (upsert)
insert into users (lastname, age) values
('Smith', 25);
– This insert is actually an update (of age in SSTable)
update users set city = 'SFO' where lastname =
• Both update and insert are implemented as upserts
• Update: update if the row exists; otherwise, insert (similar to an upsert in MongoDB)
• Insert: insert if the row does not exist yet; otherwise, update
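The upsert behaviour can be illustrated with a small Python sketch. It assumes a simple counter as the write timestamp and a dict as storage; `upsert` and `read` are hypothetical helpers, not driver or server APIs.

```python
import itertools

_clock = itertools.count(1)   # stand-in for write timestamps (assumption)
rows = {}                     # row key -> {column: (value, timestamp)}

def upsert(key, **columns):
    """Used for both INSERT and UPDATE: no existence check, newest value wins."""
    ts = next(_clock)
    cells = rows.setdefault(key, {})
    for name, value in columns.items():
        cells[name] = (value, ts)

def read(key):
    """Return the current column values of a row, without timestamps."""
    return {name: value for name, (value, _) in rows.get(key, {}).items()}

upsert("Smith", age=35, city="LA", firstname="John")   # insert
upsert("Smith", age=25)                                # "insert" that updates age
upsert("Smith", city="SFO")                            # update
```

Neither call looks at what is already stored, which is why an insert on an existing key silently behaves like an update.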
Delete: naming a column in the delete statement deletes only that column; without a column name, the entire row will be removed!
Secondary Index
• create index age_idx on users(age);
– drop index age_idx;
• select * from users where age = 25;
– This now works
Range or inequality queries on the partition key, and queries on non-key attributes without an index, are not supported. No joins, no foreign keys
• The 1st column of the primary key is the partition key
– Decides how rows are distributed among nodes
• The remaining columns are clustering columns (the sort key in DynamoDB)
– Decide the order in which rows with the same partition key are stored
– Default: ascending
Compound key
CREATE TABLE playlists (
id uuid,
song_order int,
song_id uuid,
title text,
album text,
artist text,
PRIMARY KEY (id, song_order)
);
Change default order: add WITH CLUSTERING ORDER BY (song_order DESC) to the CREATE TABLE statement
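How the compound key of playlists shapes storage can be sketched in Python: the partition key (id) picks the group, and the clustering column (song_order) keeps rows sorted inside it. This is an in-memory illustration only; `partitions`, `insert`, and `slice_rows` are assumed names, and the string ids stand in for uuids.

```python
import bisect
from collections import defaultdict

# partition key -> list of (clustering key, title), kept sorted by clustering key
partitions = defaultdict(list)

def insert(playlist_id, song_order, title):
    """Place a row in its partition at the position given by song_order."""
    bisect.insort(partitions[playlist_id], (song_order, title))

def slice_rows(playlist_id, low, high):
    """Range scan inside one partition: low <= song_order <= high."""
    rows = partitions[playlist_id]
    keys = [order for order, _ in rows]
    return rows[bisect.bisect_left(keys, low):bisect.bisect_right(keys, high)]

insert("p1", 3, "C")
insert("p1", 1, "A")
insert("p1", 2, "B")
insert("p2", 1, "Z")
```

Rows arrive out of order but are stored in ascending song_order, so a range query within one playlist is a contiguous slice.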
Columns are grouped into column families (tables)
Each row belongs to a column family
Rows are stored on disk in SSTables (sorted string tables)
Sorted string table (SSTable)
- Rows are stored sorted by row key
- Each row starts with a row key, followed by a list of columns sorted by column name / timestamp
- Each column contains: 1. column name, 2. column value, 3. timestamp
- Immutable: once created, no overwrites and no random writes
2 ways to create an SSTable:
1. Minor compaction: flush the in-memory data held in the memtable
- The memtable is an in-memory structure holding new data & updates
- 1 memtable per column family
- When its size exceeds a threshold, it is flushed to disk as a new SSTable, which releases buffer pages & shrinks memory usage
2. Major compaction: merge a set of SSTables of the same column family. The merge is efficient because rows are sorted by key; old data are removed & disk space is freed
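A major compaction is essentially a merge of already-sorted inputs. The sketch below assumes each SSTable is a Python list of (row_key, timestamp, value) tuples sorted by row key, with one value per row; real SSTables hold many columns per row, so this is a simplification.

```python
import heapq

def major_compaction(sstables):
    """Merge sorted SSTables into one, keeping the newest version per key.

    Each SSTable is a list of (row_key, timestamp, value) sorted by row_key,
    with at most one entry per key (a simplifying assumption).
    """
    merged = []
    # heapq.merge streams the sorted inputs in order without re-sorting them.
    for key, ts, value in heapq.merge(*sstables):
        if merged and merged[-1][0] == key:
            if ts > merged[-1][1]:
                merged[-1] = (key, ts, value)   # newer version replaces older
        else:
            merged.append((key, ts, value))
    return merged

old = [("Lee", 1, "NY"), ("Smith", 1, "LA")]
new = [("Kim", 2, "SD"), ("Smith", 3, "SFO")]
compacted = major_compaction([old, new])
```

The outdated ("Smith", 1, "LA") entry is dropped during the merge, which is where the disk space comes back.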
Each SSTable has an index
– Efficient lookup of row content from the row key
The index structure has 2 parts:
1. Bloom filter: no false negatives, but false positives are possible
If the Bloom filter says a key is not present, it is never wrong
2. B+ tree index
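The "no false negatives" property of the Bloom filter follows directly from how it is built, as this minimal Python sketch shows. The bit-array size, the number of hashes, and the use of SHA-256 are arbitrary choices for illustration, not what Cassandra uses.

```python
import hashlib

class BloomFilter:
    def __init__(self, num_bits=1024, num_hashes=3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray(num_bits)   # one byte per bit, for clarity

    def _positions(self, key):
        """Derive num_hashes bit positions for a key (salted SHA-256)."""
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{key}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, key):
        for pos in self._positions(key):
            self.bits[pos] = 1

    def might_contain(self, key):
        """False means definitely absent; True means possibly present."""
        return all(self.bits[pos] for pos in self._positions(key))

bf = BloomFilter()
for row_key in ["Smith", "Lee", "Kim"]:
    bf.add(row_key)
```

Every added key sets all of its bits, so a lookup for it can never return False; an absent key returns True only if other keys happened to set all of its positions. A "no" lets a read skip that SSTable entirely.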
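In Cassandra, the row key determines which nodes store the row. Rows are therefore placed by the hash of the row key and cannot be scanned or read sequentially. Within a row, column keys are stored in sorted order, so ordered scans and range lookups are possible (clustering columns, like the sort key in DynamoDB).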
A log entry is appended to a commit log file
Data is written to the memtable & completion is acknowledged to the client
When the memtable is full, it is flushed as a new SSTable & the corresponding entries are purged from the commit log (minor compaction)
Periodically, SSTables of the same column family are merged (major compaction)
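The write steps above can be sketched as a small Python class. It assumes a list in place of the commit log file, a tiny memtable limit, and a flush that runs inline (in a real system it would happen in the background); `ColumnFamilyStore` is an illustrative name only.

```python
class ColumnFamilyStore:
    def __init__(self, memtable_limit=2):
        self.commit_log = []      # append-only log (a list stands in for a file)
        self.memtable = {}        # row key -> value, held in memory
        self.sstables = []        # flushed, immutable, sorted tables
        self.memtable_limit = memtable_limit

    def write(self, key, value):
        self.commit_log.append((key, value))   # 1. append to the commit log
        self.memtable[key] = value              # 2. write to the memtable
        if len(self.memtable) >= self.memtable_limit:
            self._flush()                       # 3. flush when full (inline here)
        return "ack"                            # acknowledge to the client

    def _flush(self):
        """Minor compaction: persist the memtable as a new sorted SSTable."""
        self.sstables.append(tuple(sorted(self.memtable.items())))
        self.memtable.clear()
        self.commit_log.clear()   # flushed entries are no longer needed

store = ColumnFamilyStore(memtable_limit=2)
first_ack = store.write("b", 1)
second_ack = store.write("a", 2)   # reaches the limit and triggers a flush
```

A write only appends to a log and updates memory, with no read of existing data, which is why writes are cheap.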
The content of a row is distributed among the memtable & multiple SSTables
=> A read is more expensive than a write & may require:
– disk access (to locate SSTables that contain fragments of row)
– merging (row content in mem-table & SSTables)
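The merging step of a read can be sketched as follows. The sketch assumes the memtable and every SSTable are dicts mapping a row key to {column: (value, timestamp)}; `read_row` is a hypothetical helper, and locating the right SSTables (index, Bloom filter) is left out.

```python
def read_row(key, memtable, sstables):
    """Merge the fragments of one row; the newest timestamp wins per column.

    memtable and each SSTable map row_key -> {column: (value, timestamp)}.
    """
    merged = {}
    for source in [memtable, *sstables]:
        for column, (value, ts) in source.get(key, {}).items():
            if column not in merged or ts > merged[column][1]:
                merged[column] = (value, ts)
    return {column: value for column, (value, _) in merged.items()}

sstable_1 = {"Smith": {"age": (35, 1), "city": ("LA", 1)}}
sstable_2 = {"Smith": {"age": (25, 2)}}
memtable = {"Smith": {"city": ("SFO", 3)}}
row = read_row("Smith", memtable, [sstable_1, sstable_2])
```

Three sources had to be consulted to assemble one row, compared with the single append and memory update of a write.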
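Each Cassandra keyspace (the counterpart of a database in an RDBMS; a container for column families) can configure how many nodes each row is written to (call this number N). This is set by the replication strategy.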
Simple replication strategy (SimpleStrategy):
all replicas are in the same data center
1st replica on a node decided by consistent hashing
additional replicas on the next nodes clockwise in the ring
not rack-aware
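Replica placement under the simple strategy can be sketched in Python. The sketch assumes one token per node, derived by hashing the node name with SHA-256; real clusters assign tokens differently and use their own partitioner, so `token` and `replicas` are illustrative helpers only.

```python
import bisect
import hashlib

def token(value):
    """Hash a string to a position on the ring (SHA-256 is an arbitrary choice)."""
    return int.from_bytes(hashlib.sha256(value.encode()).digest()[:8], "big")

def replicas(row_key, nodes, replication_factor):
    """First replica by consistent hashing, the rest on the next nodes clockwise."""
    ring = sorted((token(node), node) for node in nodes)
    tokens = [t for t, _ in ring]
    # First node whose token is >= the key's token, wrapping around the ring.
    start = bisect.bisect_left(tokens, token(row_key)) % len(ring)
    count = min(replication_factor, len(ring))
    return [ring[(start + i) % len(ring)][1] for i in range(count)]

nodes = ["node1", "node2", "node3", "node4"]
placement = replicas("Smith", nodes, 3)
```

The function only walks the ring and never looks at racks or data centers, which matches the "not rack-aware" note above.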
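HBase started as a subproject of Apache Hadoop and is a clone of Google BigTable. Like Cassandra, it uses BigTable's column-family data model. The difference is that Cassandra has only one kind of node, whereas HBase has several roles: besides the region servers that handle read and write requests, it is built on top of a complete HDFS distributed system and needs ZooKeeper to synchronize cluster state. Cassandra is therefore simpler to deploy.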