NoSQL and AWS DynamoDB Practices

NoSQL and DynamoDB
Relational databases:
• Mature and stable
• Feature-rich, versatile query language (SQL)
• ACID transactions

NoSQL databases:
• Flexible data model
• Scale out easily
• Data replicated over multiple servers
• Weak consistency, high availability

Scale up vs. scale out
• Scale up (vertical): add more resources (e.g., CPU, memory) to a single machine
• Scale out (horizontal): add more machines to the cluster

Types of NoSQL databases
• Key-value / tuple (row) stores
– E.g., Redis (key-value), DynamoDB (row)

• Document stores
– Store JSON / XML documents, e.g., MongoDB

• Extensible record (column) stores
– E.g., Bigtable, Cassandra, HBase

• Graph stores
– E.g., Spark's graph library for distributed computation over graph data

Extensible record store: similar to a relational database, with rows & columns; columns may be grouped into column families.
Different rows may have different columns.
Also called a wide-column store.

DynamoDB
• Schema-less: no predefined schema
• A table consists of a set of items; each item contains a set of attributes

• Items correspond to rows in a relational DB
– Different items may have different sets of attributes
– Max size of an item: 400 KB
• No concept of columns in DynamoDB

Each item is uniquely identified by a primary key.
The primary key consists of:
– a partition key
– an (optional) sort key
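
As a concrete illustration (the table name "Users" and its attributes are made up, not part of the notes), inserting items with boto3 might look like the sketch below; the two items carry different attribute sets, but each supplies the primary key:

```python
# Minimal sketch with boto3 (AWS SDK for Python); assumes a table "Users"
# already exists with partition key "UserId" (string).
import boto3

users = boto3.resource("dynamodb").Table("Users")

# Schema-less: the two items below have different attributes,
# but each must include the primary key.
users.put_item(Item={"UserId": "u1", "Name": "Maria Kelly", "Age": 30})
users.put_item(Item={"UserId": "u2", "Name": "Rod Stewart", "Country": "UK"})
```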

Partition key
– Used to partition (by hashing) the data across hosts for scalability & availability
• Pick an attribute with a wide range of values & evenly distributed access patterns as the partition key, e.g., user ID
• If the partition key is, e.g., artist name, the hash function may still put "Rod Stewart" and "Maria Kelly" in the same partition

Sort key
• Allows searching within a partition, e.g., year
– So primary key = artist + year
• This allows searching for CDs by a specific artist that were produced in certain years (see the query sketch below)
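
For illustration only (a "Music" table with partition key "Artist" and sort key "Year" is assumed), a boto3 query over that composite key could look like:

```python
# Sketch of a range query against a composite primary key
# (partition key "Artist", sort key "Year"); names are assumptions.
import boto3
from boto3.dynamodb.conditions import Key

music = boto3.resource("dynamodb").Table("Music")

# CDs by one artist produced in a certain range of years
resp = music.query(
    KeyConditionExpression=Key("Artist").eq("Rod Stewart") & Key("Year").between(1990, 1999)
)
for item in resp["Items"]:
    print(item)
```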

DynamoDB is not good for...
• Ad-hoc queries
– Since it does not have a query language like SQL & does not support joins
• OLAP
– Requires joining of fact and dimension tables
• BLOB (binary large object) storage
– E.g., images, videos; these are better stored in S3

Consistent hashing 
A hash function h(x) = y 
– x: a value of arbitrary size / length, e.g., a string of characters 
– y: a fixed-size / fixed-range value, e.g., 128 bits, [0, n-1], [0, 1] 
– Example: h(s) = (sum of character codes in string s) % 11
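
The toy hash above can be written directly in Python:

```python
def h(s: str) -> int:
    # Sum of character codes, folded into the fixed range [0, 10]
    return sum(ord(c) for c in s) % 11

print(h("Rod Stewart"), h("Maria Kelly"))
```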

Partitioning by hashing
Items are stored on different servers based on the hash values of their partition keys: h(k).
Suppose there are n nodes in a cluster. h(k) is typically a very big number (e.g., 128 bits), so assign the item with key k to node h(k) % n.
Problem when scaling out (the number of servers n grows):
• Key k is now assigned to node h(k) % (n+1), which may differ from h(k) % n
• Consequence: almost all items (keys) need to be moved (reassigned) to different servers (see the sketch below)
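
A quick simulation (toy key names and server count chosen for illustration) shows how severe the reshuffling is with modulo placement:

```python
import hashlib

def h(key: str) -> int:
    # A "very big" hash value, here 128-bit MD5, just for illustration
    return int(hashlib.md5(key.encode()).hexdigest(), 16)

keys = [f"user-{i}" for i in range(100_000)]
n = 10
moved = sum(1 for k in keys if h(k) % n != h(k) % (n + 1))
print(f"{moved / len(keys):.0%} of keys change servers")  # ~90% when n=10
```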

Consistent hashing
• Hash each key to a value in a fixed range, say [0, 1]
– E.g., h'(k) = h(k) / (maximum possible value of h)
• Assign each server to a point in the same range
– E.g., by hashing the machine's serial number into [0, 1]
• Assign each key to the first machine with a larger hash value (see the sketch below)
– If no such machine exists, wrap around and use the first machine from the beginning of the range
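
A minimal sketch of such a hash ring in Python (the server names and the choice of MD5 are assumptions for illustration):

```python
import bisect
import hashlib

def h01(x: str) -> float:
    # Hash a string into the fixed range [0, 1): 128-bit MD5 / 2^128
    return int(hashlib.md5(x.encode()).hexdigest(), 16) / 2**128

class ConsistentHashRing:
    def __init__(self, servers):
        # Each server is placed at a point in [0, 1) by hashing its name/serial
        self.points = sorted((h01(s), s) for s in servers)

    def lookup(self, key: str) -> str:
        # Assign the key to the first server with a larger hash value,
        # wrapping around to the start of the range if needed
        i = bisect.bisect(self.points, (h01(key), ""))
        return self.points[i % len(self.points)][1]

ring = ConsistentHashRing(["server-A", "server-B", "server-C"])
print(ring.lookup("Rod Stewart"), ring.lookup("Maria Kelly"))
```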

How much improvement?
• m = # of keys, n = # of servers
• With consistent hashing, only about m/n keys need to be moved on average when a server is added
• Typically m/n << m for large n, so increasing n reduces the fraction of keys that move (see the sketch below)
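
A toy simulation of adding one server to a 50-server ring (re-deriving the ring lookup inline so the snippet stands alone) illustrates how little moves compared with the modulo scheme above:

```python
import bisect
import hashlib

def h01(x: str) -> float:
    return int(hashlib.md5(x.encode()).hexdigest(), 16) / 2**128

def assign(key, points):
    i = bisect.bisect(points, (h01(key), ""))
    return points[i % len(points)][1]

keys = [f"user-{i}" for i in range(100_000)]
old = sorted((h01(f"server-{j}"), f"server-{j}") for j in range(50))
new = sorted(old + [(h01("server-new"), "server-new")])

moved = sum(1 for k in keys if assign(k, old) != assign(k, new))
# Expect on the order of 1/(n+1) ≈ 2% (exact value varies with hash positions),
# versus ~90% with modulo placement
print(f"{moved / len(keys):.1%} of keys moved")
```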
