NoSQL: Cassandra & CQL
Cassandra: an extensible record (wide column) store
General idea...
• NoSQL
– Different types
– Scale up vs. scale out
• Key features
– Flexible data model
– High availability & Scalability
• Amazon DynamoDB
– Data model, partition & sort key
– Data types (string, number, set, map, list)
– Consistent hashing
• Apache Cassandra
– Write & read path
– Upsert
– Minor & major compaction
• Apache Hive
– HiveQL: SQL-like language
– Analyze data stored in HDFS
– Queries compiled into MapReduce jobs
• Cassandra & DynamoDB
– Key-based (~ OLTP)
– Processing a small amount of data per query
• Hive
– Analytical workload (~ OLAP)
– A query may need to process terabytes of data
Data model:
Cassandra uses the data model of Google's BigTable. Unlike a row-oriented relational database or a key-value store, Cassandra is a wide column store: each row is uniquely identified by a row key and may hold up to 2 billion columns, each identified by a column key that maps to a value. The model can be understood as a two-dimensional key-value store, roughly
map<key1, map<key2, value>>.
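The nested-map view above can be sketched in Python; this is purely illustrative (made-up row and column names), not Cassandra's storage code:

```python
# Wide-column model as a two-level map:
# map<row_key, map<column_key, value>>
table = {}

def put(row_key, column_key, value):
    # Each row is its own map of column key -> value.
    table.setdefault(row_key, {})[column_key] = value

put("user:1", "name", "Alice")
put("user:1", "city", "LA")
put("user:2", "name", "Bob")  # rows may have completely different column sets

print(table["user:1"]["city"])  # LA
```

Note that, unlike a relational table, nothing forces two rows to share the same columns; the schema is flexible per row.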
Interaction
Newer versions of Cassandra use CQL, a SQL-like language, for data model definition and for reads and writes:
desc keyspaces;
create keyspace xxx with replication = {'class': 'SimpleStrategy', 'replication_factor': 1};
drop keyspace xxx;
create table xxx ();
create columnfamily xxx (name type primary key, name type);
SELECT * FROM users WHERE lastname = 'Smith';
insert into users (lastname, age, city, firstname) values ('Smith', 35, 'LA', 'John');
-- note: the insert does not check the content of SSTables;
-- an insert is actually an update (upsert)
insert into users (lastname, age) values ('Smith', 25);
– This insert is actually an update (of age for the existing 'Smith' row)
update users set city = 'SFO' where lastname = 'Smith';
Upsert
• Both update and insert are implemented as upsert
– Update: update if the row exists; otherwise, insert (similar to MongoDB)
– Insert: insert if the row does not exist yet; otherwise, update
Delete
• Deletes specific columns if columns are listed; otherwise the entire row is removed
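The upsert semantics above (writes never check existing data; the newest timestamp wins per column) can be sketched in Python. This is an illustration of the idea, not Cassandra's implementation:

```python
# One row, modeled as: column name -> (timestamp, value)
row = {}

def upsert(column, value, ts):
    # A write never checks whether the column already exists on disk;
    # on read/compaction, the version with the newest timestamp wins.
    old = row.get(column)
    if old is None or ts >= old[0]:
        row[column] = (ts, value)

upsert("age", 35, ts=1)   # an "insert"
upsert("age", 25, ts=2)   # another "insert": actually an update
print(row["age"][1])      # 25
```

This is why insert and update are interchangeable: both simply record a new timestamped version of the column.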
Secondary Index
• create index age_idx on users(age);
– drop index age_idx;
• select * from users where age = 25;
– This now works
Range/inequality queries and queries on non-key attributes (without a secondary index) are not supported; no joins, no foreign keys
Compound key
• A primary key that contains multiple columns
• 1st column is the partition key
– Decides how rows are distributed among nodes
• Remaining columns are clustering columns (like the sort key in DynamoDB)
– Decide how rows with the same partition key are sorted
– Default: ascending
CREATE TABLE playlists (
  id uuid,
  song_order int,
  song_id uuid,
  title text,
  album text,
  artist text,
  PRIMARY KEY (id, song_order)
);
Change default order:
CREATE TABLE playlists (
  id uuid,
  song_order int,
  song_id uuid,
  title text,
  album text,
  artist text,
  PRIMARY KEY (id, song_order)
) WITH CLUSTERING ORDER BY (song_order DESC);
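How the partition key places a row and the clustering column orders rows within a partition can be sketched in Python (hypothetical data and node count; not Cassandra's code):

```python
import hashlib

def node_for(playlist_id, num_nodes=4):
    # The partition key (id) is hashed to pick the node holding the partition.
    h = int(hashlib.md5(str(playlist_id).encode()).hexdigest(), 16)
    return h % num_nodes

# Within one partition, rows are kept sorted by the clustering column.
rows = [(3, "song_c"), (1, "song_a"), (2, "song_b")]  # (song_order, song_id)
asc = sorted(rows)                 # default: ascending by song_order
desc = sorted(rows, reverse=True)  # WITH CLUSTERING ORDER BY (song_order DESC)

print([order for order, _ in asc])   # [1, 2, 3]
print([order for order, _ in desc])  # [3, 2, 1]
```

Because rows are physically stored in clustering order, range scans over song_order within a single playlist are cheap, while ordered scans across partitions are not.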
Columns are grouped into column families (≈ tables); each row belongs to a column family.
Rows are stored on disk in SSTables (sorted string tables).
SSTable (sorted string table):
- Rows are stored sorted by row key
- Each row starts with a row key, followed by a list of columns sorted by column name / timestamp
- Each column contains: 1. column name, 2. column value, 3. timestamp
- Immutable: once created, no overwrites & no random writes
2 ways to create an SSTable:
1. Flush in-memory data stored in a memtable (minor compaction):
- An in-memory structure holding new data & updates
- 1 memtable per column family
- Minor compaction: when its size exceeds a threshold, the memtable is flushed to disk as a new SSTable, releasing buffer pages & shrinking memory usage
2. Major compaction: merge a set of SSTables of the same column family; this is efficient since rows are sorted by key, and old data is removed & disk space is reclaimed
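Because each SSTable is sorted by row key, major compaction is a k-way merge; a minimal sketch (illustrative data, last-write-wins by timestamp):

```python
import heapq

def major_compact(sstables):
    """Merge SSTables of one column family. Each SSTable is a list of
    (row_key, timestamp, value) sorted by row key. For duplicate keys,
    only the version with the newest timestamp survives; dropping old
    versions is how compaction reclaims disk space."""
    latest = {}
    # heapq.merge streams the sorted inputs in order without
    # loading everything into memory at once.
    for key, ts, value in heapq.merge(*sstables):
        if key not in latest or ts > latest[key][0]:
            latest[key] = (ts, value)
    return [(k, ts, v) for k, (ts, v) in sorted(latest.items())]

old = [("alice", 1, "LA"), ("bob", 1, "NY")]
new = [("alice", 2, "SFO"), ("carol", 1, "LA")]
print(major_compact([old, new]))
# [('alice', 2, 'SFO'), ('bob', 1, 'NY'), ('carol', 1, 'LA')]
```

The sortedness of the inputs is what makes the merge a single sequential pass over each SSTable.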
Each SSTable has an index
– Efficient lookup of row content from row key
NOTE:
the index structure has 2 parts:
1. Bloom filter: no false negatives, but false positives are possible
– if the Bloom filter says no, the key is definitely not in the SSTable
2. B+ tree index
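A minimal Bloom filter sketch showing the no-false-negative property; sizes and hash construction are illustrative, not Cassandra's implementation:

```python
import hashlib

class BloomFilter:
    def __init__(self, num_bits=1024, num_hashes=3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = [False] * num_bits

    def _positions(self, key):
        # Derive k bit positions from k salted hashes of the key.
        for i in range(self.num_hashes):
            h = hashlib.sha256(f"{i}:{key}".encode()).hexdigest()
            yield int(h, 16) % self.num_bits

    def add(self, key):
        for p in self._positions(key):
            self.bits[p] = True

    def might_contain(self, key):
        # False => definitely absent (no false negatives).
        # True  => "maybe present" (false positives possible, since other
        #          keys may have set the same bits).
        return all(self.bits[p] for p in self._positions(key))

bf = BloomFilter()
bf.add("Smith")
print(bf.might_contain("Smith"))  # True: an added key is never missed
```

On the read path this lets Cassandra skip SSTables that certainly do not contain the requested row key, avoiding needless disk reads.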
Cassandra's row key decides which nodes store the row, so row keys are distributed by hash and rows cannot be scanned or read in order. Within a row, column keys are stored in sorted order, so ordered scans and range queries over columns are possible (clustering columns, like the sort key in DynamoDB).
Write (insert/delete/update):
- A log entry is appended to a commit log file
- Data is written to the memtable & completion is acknowledged to the client
- When the memtable is full, it is flushed as a new SSTable & the corresponding entries are purged from the commit log (minor compaction)
- Periodically, SSTables of the same column family are merged (major compaction)
Read:
The content of a row is distributed among the memtable & multiple SSTables
=> A read is more expensive than a write & may require:
– disk access (to locate SSTables that contain fragments of the row)
– merging (row content in the memtable & SSTables)
Storage
Unlike BigTable and its imitators such as HBase, Cassandra does not store data in a distributed file system such as GFS or HDFS; data is stored directly on local disk.
System architecture
Cassandra's architecture is similar to Dynamo's and is based on consistent hashing: each row's hash decides which nodes store it. The cluster has no notion of a master; all nodes play the same role, which avoids the instability caused by a single point of failure. Every node stores data locally and accepts requests from clients.
For each request, the client picks a random node in the cluster; the node that receives the request locates the key's owning nodes on the consistent-hash ring, forwards the request to them, and returns the responses from those nodes.
On consistency, availability, and partition tolerance (CAP), Cassandra is as flexible as Dynamo.
Each Cassandra keyspace (analogous to a database in an RDBMS; a container for column families) can be configured with how many nodes each row is written to (call this number N): the replication strategy.
SimpleStrategy (simple replication strategy):
- all replicas are in the same data center
- 1st replica on the node decided by consistent hashing
- additional replicas on the next nodes clockwise in the ring
- not rack-aware
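SimpleStrategy placement on the consistent-hash ring can be sketched as follows; token values and node names are hypothetical:

```python
import bisect
import hashlib

# Hypothetical ring: each node owns a token position on [0, 2**32).
ring = sorted([(100, "node-A"), (2_000_000, "node-B"),
               (1_500_000_000, "node-C"), (3_800_000_000, "node-D")])
tokens = [t for t, _ in ring]

def token_for(row_key):
    return int(hashlib.md5(row_key.encode()).hexdigest(), 16) % 2**32

def replicas(row_key, n=3):
    # 1st replica: owner of the first token >= hash(key), wrapping around
    # the ring; remaining n-1 replicas: next nodes clockwise (not rack-aware).
    start = bisect.bisect_left(tokens, token_for(row_key)) % len(ring)
    return [ring[(start + i) % len(ring)][1] for i in range(n)]

print(replicas("Smith"))  # 3 consecutive, distinct nodes on the ring
```

Any node can run this placement computation locally, which is what lets a randomly chosen coordinator forward a request to the right replicas without consulting a master.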
Comparison with HBase
HBase is an Apache Hadoop subproject and a clone of Google BigTable. Like Cassandra, it uses BigTable's column-family data model, but: Cassandra has only one kind of node, while HBase has several roles; besides the region servers that handle read and write requests, HBase sits on top of a full HDFS deployment and needs ZooKeeper to synchronize cluster state, so Cassandra is simpler to deploy.
Cassandra's consistency policy is configurable: strong consistency, or higher-performance eventual consistency; HBase is always strongly consistent.
Cassandra uses consistent hashing to decide which nodes store a row, relying on probabilistic evenness for load balancing.
In HBase, each data range (region) is handled by exactly one node; the master dynamically decides when a region has grown large enough to split in two, and reassigns regions from hot nodes to lightly loaded ones, achieving dynamic load balancing. Because each region is served by only one node at a time, once that node becomes unresponsive its regions cannot be read or written until the system moves them to other nodes; and since there is only one active master (failing over to a standby master also takes time), HBase has a degree of single-point-of-failure risk, while Cassandra has none.
Cassandra's read/write performance is better than HBase's.