NoSQL: Cassandra & CQL

Cassandra: an extensible record (wide column) store

General idea...
• NoSQL
– Different types
– Scale up vs. scale out 
• Key features
– Flexible data model 
– High availability & Scalability 

• Amazon DynamoDB
– Data model, partition & sort key
– Data types (string, number, set, map, list)
– Consistent hashing
• Apache Cassandra
– Write & read path
– Upsert
– Minor & major compaction

• Apache Hive
– HiveQL: SQL-like language
– Analyze data stored in HDFS
– Queries compiled into MapReduce jobs

• Cassandra & DynamoDB
– Key-based workload (~ OLTP)
– Processing a small amount of data per query
• Hive
– Analytical workload (~ OLAP)
– A query may need to process terabytes of data

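Data model:

Cassandra uses the data model of Google's BigTable. Unlike row-oriented relational databases or plain key-value stores, Cassandra is a wide column store: each row is uniquely identified by a row key and can hold up to 2 billion columns, each column is identified by a column key, and each column key maps to one or more values.

This model can be understood as a two-dimensional key-value store; the whole data model is defined as something like
map<key1, map<key2,value>>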
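To make the nested-map view concrete, here is a minimal Python sketch. It is only a mental model, not how Cassandra lays data out; the row keys, column names, and helper functions are made up for illustration.

```python
# Mental model only: a wide column store as map<row_key, map<column_key, value>>.
# All keys and values below are hypothetical.
table = {
    "Smith": {"age": 35, "city": "LA", "firstname": "John"},
    "Jones": {"age": 41},  # rows need not share the same set of columns
}


def get_cell(table, row_key, column_key):
    """Look up one value by (row key, column key); None if absent."""
    return table.get(row_key, {}).get(column_key)


def columns_in_order(table, row_key):
    """Return a row's (column key, value) pairs sorted by column key."""
    return sorted(table.get(row_key, {}).items())
```

The outer key picks the row and the inner key picks the column, which is why a lookup needs both.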

Interaction

Newer versions of Cassandra use CQL, a SQL-like language, to define the data model and to read and write data.
desc keyspaces;
create keyspace xxx with replication = {'class':'SimpleStrategy','replication_factor':1};
drop keyspace xxx;
create table xxx (name type primary key, name type);
-- a table is what older versions called a column family

SELECT * FROM users WHERE lastname= 'Smith'; 

insert into users (lastname, age, city, firstname) values ('Smith', 35, 'LA', 'John'); 
-- Note: an insert does not check the existing content of the SSTables.
-- The following insert is therefore really an update of age (an upsert):
insert into users (lastname, age) values ('Smith', 25);

update users set city = 'SFO' where lastname = 'Smith'; 

Upsert
Both update and insert are implemented as upsert:
– Update: update the row if it exists; otherwise insert it (similar to upsert in MongoDB)
– Insert: insert the row if it does not exist yet; otherwise update it
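The upsert behaviour can be sketched in a few lines of Python. This is an illustrative model under simple assumptions (a dict standing in for the users table, keyed by lastname); the function names are hypothetical, not a Cassandra API.

```python
# Hypothetical in-memory stand-in for the users table, keyed by lastname.
users = {}


def upsert(table, key, **columns):
    """Write the given columns: create the row if missing, else overwrite them."""
    table.setdefault(key, {}).update(columns)


# Under this model INSERT and UPDATE are the same operation.
insert = update = upsert

insert(users, "Smith", age=35, city="LA", firstname="John")
insert(users, "Smith", age=25)      # "insert" on an existing row: updates age
update(users, "Smith", city="SFO")  # update of an existing row
update(users, "Lee", city="NYC")    # "update" on a missing row: inserts it
```

Neither call checks whether the row already exists, which matches the note that an insert does not look at the SSTable content first.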

Delete
A delete that names a specific column removes only that column; a delete that names no column removes the entire row!


Secondary Index 
create index age_idx on users(age);
select * from users where age = 25;
-- This query now works, because age is indexed
drop index age_idx;


Range or inequality queries on the partition key, and queries on non-key attributes without a secondary index, are not supported. No joins. No foreign keys.

Compound key

A primary key that contains multiple columns
– 1st column is the partition key
  Decides how rows are distributed among nodes
– Remaining columns are clustering columns (sort key in DynamoDB)
  Decide how rows with the same partition key are ordered in storage
  Default: ascending


CREATE TABLE playlists (
  id uuid,
  song_order int,
  song_id uuid,
  title text,
  album text,
  artist text,
  PRIMARY KEY (id, song_order)
);


-- Change the default order
CREATE TABLE playlists (
  id uuid,
  song_order int,
  song_id uuid,
  title text,
  album text,
  artist text,
  PRIMARY KEY (id, song_order)
) WITH CLUSTERING ORDER BY (song_order DESC);
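As a rough illustration of what the compound key does, the Python sketch below groups rows by the partition key (id) and orders each group by the clustering column (song_order). The row values and the store function are hypothetical, and short strings stand in for uuids.

```python
from collections import defaultdict


def store(rows, descending=False):
    """Group rows by partition key (id) and sort each group by the
    clustering column (song_order), like PRIMARY KEY (id, song_order)."""
    partitions = defaultdict(list)
    for row in rows:
        partitions[row["id"]].append(row)
    for group in partitions.values():
        group.sort(key=lambda r: r["song_order"], reverse=descending)
    return dict(partitions)


# Hypothetical rows; real ids would be uuids.
rows = [
    {"id": "p1", "song_order": 2, "title": "B"},
    {"id": "p2", "song_order": 1, "title": "X"},
    {"id": "p1", "song_order": 1, "title": "A"},
]
asc = store(rows)                    # default: ascending
desc = store(rows, descending=True)  # like CLUSTERING ORDER BY (song_order DESC)
```

Because each partition is kept in clustering order, reading one playlist in song order needs no extra sort.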

Structure
Columns are grouped into column families (tables)
Each row belongs to a column family
Rows are stored on disk in SSTables (sorted string tables)

Sorted string table (SSTable)

  • Rows are stored sorted by row key
Each row starts with a row key, followed by a list of columns sorted by column name / timestamp
Each column contains: 1. column name, 2. column value, 3. timestamp


  • Immutable: once created, no overwrites and no random writes


2 ways to create an SSTable:
 1. Minor compaction: flush the in-memory data held in the memtable
  • The memtable is an in-memory structure holding new data & updates
  • 1 memtable per column family
  • When its size exceeds a threshold, the memtable is flushed to disk as a new SSTable, which releases buffer pages & shrinks memory usage

 2. Major compaction: merge a set of SSTables of the same column family. The merge is efficient because rows are sorted by key; afterwards old data is removed & disk space is reclaimed
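The merge step of a major compaction can be sketched as a streaming merge of sorted inputs. This is a simplified Python model, not Cassandra's implementation: it assumes each SSTable is a list of (row_key, column, timestamp, value) tuples sorted by (row_key, column) with at most one version of a cell per SSTable, and it ignores deletes.

```python
import heapq


def major_compaction(sstables):
    """Merge sorted SSTables into one, keeping the newest version of each cell."""
    merged = []
    # heapq.merge streams the already-sorted inputs without loading them all
    # into one big sort; the negated timestamp puts the newest version of a
    # cell first among equal (row_key, column) pairs.
    stream = heapq.merge(*sstables, key=lambda c: (c[0], c[1], -c[2]))
    last = None
    for row_key, column, ts, value in stream:
        if (row_key, column) == last:
            continue  # older version of a cell we already kept: drop it
        last = (row_key, column)
        merged.append((row_key, column, ts, value))
    return merged


old = [("Smith", "age", 1, 35), ("Smith", "city", 1, "LA")]
new = [("Smith", "age", 2, 25)]
compacted = major_compaction([old, new])
```

Since every input is already sorted by key, the merge is a single sequential pass, which is why the notes call it efficient.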

Each SSTable has an index

It gives efficient lookup of row content from a row key
NOTE:
The index structure has 2 parts:
1. Bloom filter: no false negatives, but false positives are possible
If the Bloom filter says a key is not present, it is never wrong
2. B+ tree index
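A minimal Bloom filter sketch in Python shows why a "no" answer is always right while a "yes" is only "maybe". The bit-array size, the number of hashes, and the use of salted SHA-256 are arbitrary choices for illustration, not what Cassandra uses.

```python
import hashlib


class BloomFilter:
    def __init__(self, num_bits=1024, num_hashes=3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = 0  # a Python int used as a bit array

    def _positions(self, key):
        # Derive num_hashes bit positions by salting one hash function.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{key}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, key):
        for pos in self._positions(key):
            self.bits |= 1 << pos

    def might_contain(self, key):
        # False: the key was definitely never added.
        # True: the key was possibly added (other keys may have set these bits).
        return all((self.bits >> pos) & 1 for pos in self._positions(key))
```

A read can skip any SSTable whose filter returns False for the row key, and only consult the index of the SSTables that answer "maybe".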

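A BigTable column family is called a table in Cassandra, analogous to a table in a relational database. In Cassandra/BigTable, 1. the row key and 2. the column key together are called the primary key.

Cassandra's row key decides which nodes store the row, so rows are placed by the hash of the row key and cannot be scanned or read sequentially by row key. Within a row, column keys are stored in order, so ordered scans and range lookups are possible (clustering columns, like the sort key in DynamoDB).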

Write: insert/delete/update

  1. A log entry is appended to a commit log file
  2. Write data to memtable & acknowledge completion to client
  3. When memtable is full, flush it as a new SSTable & purge corresponding entries from commit log (minor compaction)
  4. Periodically, merge SSTables of the same column family (major compaction)
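Steps 1 to 3 above can be sketched as a toy Python class. The Node class, its field names, and the tiny memtable limit are all assumptions for illustration; plain lists and dicts stand in for files on disk.

```python
class Node:
    def __init__(self, memtable_limit=2):
        self.commit_log = []   # stand-in for the append-only log file on disk
        self.memtable = {}     # (row_key, column) -> (timestamp, value)
        self.sstables = []     # immutable, sorted; newest last
        self.memtable_limit = memtable_limit
        self.clock = 0         # logical timestamp for this sketch

    def write(self, row_key, column, value):
        self.clock += 1
        # 1. Append a log entry first, so the write survives a crash.
        self.commit_log.append((row_key, column, self.clock, value))
        # 2. Apply it to the memtable; the write can now be acknowledged.
        self.memtable[(row_key, column)] = (self.clock, value)
        # 3. Flush when the memtable is full.
        if len(self.memtable) >= self.memtable_limit:
            self.flush()
        return "ack"

    def flush(self):
        # Write the memtable out in key order as a new immutable SSTable,
        sstable = tuple(sorted(self.memtable.items()))
        self.sstables.append(sstable)
        # then drop the memtable and the log entries it covered.
        self.memtable = {}
        self.commit_log.clear()
```

Note that a write never reads or modifies an existing SSTable: it only appends to the log and updates memory.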

Read:
The content of a row is distributed among the memtable and multiple SSTables
=> A read is more expensive than a write & may require:
– disk access (to locate the SSTables that contain fragments of the row)
– merging (row content in the memtable & SSTables)
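The merging step of a read can be sketched as follows. This is a simplified Python model: the memtable and each SSTable are assumed to be dicts mapping (row_key, column) to (timestamp, value), and every source is scanned in full, whereas a real read would first use each SSTable's index to skip irrelevant files.

```python
def read_row(row_key, memtable, sstables):
    """Assemble a row from the memtable and every SSTable.

    For each column, the fragment with the largest timestamp wins.
    """
    newest = {}  # column -> (timestamp, value)
    for source in [memtable, *sstables]:
        for (key, column), (ts, value) in source.items():
            if key != row_key:
                continue
            if column not in newest or ts > newest[column][0]:
                newest[column] = (ts, value)
    return {column: value for column, (ts, value) in newest.items()}


# Hypothetical fragments of the same row spread over three places.
sstables = [
    {("Smith", "age"): (1, 35), ("Smith", "city"): (1, "LA")},
    {("Smith", "age"): (2, 25)},
]
memtable = {("Smith", "city"): (3, "SFO")}
row = read_row("Smith", memtable, sstables)
```

The more SSTables a row is spread across, the more work a read does, which is one reason periodic major compaction helps reads.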

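Storage

Unlike BigTable and its imitator HBase, Cassandra does not store data in a distributed file system such as GFS or HDFS; it stores data directly on local disk.

Like BigTable, Cassandra is a log-structured database: newly written data is kept in an in-memory memtable and made durable through the on-disk commit log. When memory fills up, the data is written in key order to a read-only file, an SSTable. On each read, the data in all SSTables and in memory is looked up and merged. Such a system writes faster than it reads, because a write is appended sequentially to the commit log and requires no random disk reads or searches.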

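System architecture

Cassandra's system architecture is similar to Dynamo's: it is based on consistent hashing, and the hash of each row decides which nodes store it. The cluster has no master; all nodes play the same role, which avoids the instability caused by a single point of failure.

Every node stores data locally and accepts requests from clients.

For each request, the client picks a random node in the cluster. The node that receives the request locates the key on the consistent hashing ring to find the responsible nodes, forwards the request to them, and returns their combined responses to the client.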
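Locating a key on the ring can be sketched in Python as below. This is an illustration under stated assumptions: one token per node, a 32-bit token space, and SHA-256 as the hash; the Ring class and node names are hypothetical, and Cassandra's actual partitioner and token assignment differ.

```python
import bisect
import hashlib


def token(value):
    """Hash a string onto the ring (a hypothetical 32-bit token space)."""
    return int.from_bytes(hashlib.sha256(value.encode()).digest()[:4], "big")


class Ring:
    def __init__(self, nodes):
        # Each node sits on the ring at the token of its name.
        self.entries = sorted((token(n), n) for n in nodes)
        self.tokens = [t for t, _ in self.entries]

    def owner(self, row_key):
        """First node at or after the key's token, wrapping around the ring."""
        i = bisect.bisect_left(self.tokens, token(row_key))
        return self.entries[i % len(self.entries)][1]
```

Any node can run the same lookup, so whichever node receives a request can work out where to forward it without asking a master. Adding a node only takes over keys from its clockwise neighbour; all other keys stay where they were.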

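On consistency, availability, and partition tolerance (CAP), Cassandra is as flexible as Dynamo.
Each Cassandra keyspace (like a database in an RDBMS; a container for column families) can configure how many nodes each row is written to (call this number N) through its replication strategy.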

Simple replication strategy (SimpleStrategy)
– All replicas are in the same data center
– 1st replica is placed on the node decided by consistent hashing
– Additional replicas are placed on the next nodes clockwise in the ring
– Not rack-aware
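The placement rule can be sketched in a few lines of Python. The function name and its arguments are hypothetical; the ring is assumed to be given as a list of node names in clockwise order, with the first replica's position already found by consistent hashing.

```python
def simple_strategy_replicas(ring_nodes, primary_index, replication_factor):
    """Return the nodes that hold a row's replicas.

    ring_nodes: node names in clockwise ring order.
    primary_index: position of the node chosen by consistent hashing.
    """
    # Never place more replicas than there are nodes.
    n = min(replication_factor, len(ring_nodes))
    # Walk clockwise from the primary, wrapping around the end of the list.
    return [ring_nodes[(primary_index + i) % len(ring_nodes)] for i in range(n)]


ring_nodes = ["A", "B", "C", "D"]
replicas = simple_strategy_replicas(ring_nodes, primary_index=2, replication_factor=3)
```

The walk looks only at ring position, never at racks or data centers, which is what "not rack-aware" means here.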

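Compared with HBase

HBase started as a subproject of Apache Hadoop (it is now a top-level Apache project) and is a clone of Google BigTable. Like Cassandra, it uses BigTable's column-family data model, but:
Cassandra has only one kind of node, whereas HBase has several roles: besides the region servers that handle read and write requests, it is built on a full HDFS distributed file system and needs ZooKeeper to synchronize cluster state. Cassandra is simpler to deploy.
Cassandra's consistency policy is configurable: you can choose strong consistency or the better-performing eventual consistency. HBase is always strongly consistent.

Cassandra decides which nodes store a row by consistent hashing, and relies on the hash spreading keys evenly on average to balance load.
In HBase, each segment of data (region) is handled by exactly one node. The master dynamically decides whether a region has grown large enough to be split in two, and dynamically reassigns some regions from overheated nodes to less loaded ones, which gives dynamic load balancing. Because only one node can serve a region at a time, if that node stops responding, the data cannot be read or written until the system has moved all of its regions to other nodes. In addition, there is only one active master, and failing over to a backup master takes time, so HBase has a single point of failure to some degree, whereas Cassandra has none.
Cassandra's read and write performance is better than HBase's.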
