1 Map reduce problems

PROBLEMS

Join  
Matrix multiplication 2 / 1 phase 
Matrix vector multiplication 

POINTS

Map reduce input output
Memory requirement 
Reducer size 
Communication cost

P1 Join: R(A, B) S(A, C) 

1. Map
r(a, b) => (a, ('R', b));  s(a, c) => (a, ('S', c)) 

2. Reduce
(a, [('R', b), ('S', c1), ('S', c2)]) => (a, (b, c1)), (a, (b, c2)) 

Dangling tuples:
Key with values from only 1 relation 
(a, [('R', b)]) => left dangling ; (a, [('S', c)]) => right dangling 

P2 Matrix-multiplication 

CmXn= AmXpBpXn 
2 phase:
a. phase1  product 

Map: Extract key from As c, Bs r
A[i, k] => (k, ('A', i, A[i, k])) ; B[k, j] => (k, ('B', j, B[k, j])) 

Communication cost: mp + np 
Reducer size: m + n (p different keys)
Memory requirement: m+n

Reduce: Pairwise product --(join, multiply, dispatch)
Input: (k, [ ('A', i, A[i, k]), ..., ('B', j, B[k, j]), ...] 
 the change in key 
Output: => ( (i, j), A[i, k] * B[k, j] ), ... 

b. phase2  sum 
Map: pass the output of phase 1
Communication cost: each key p, m*n pairwise products, m*n*p 

Reduce: for each entry of C, p-1 multiplications
Reducer size: p products 

1 phase:
map: 
A (m, p) B(p, n) = C(m, n)
A [I, k ] map —> C I row = n element s
(I, c),  (‘A’, k, A [I, k ])
B [k, j ] map —> C j column = m elements

(R, j), (‘B’, k, B [k, j ] )

communication cost: mp*n + np*m = 2m*n*p

reduce:
((i,j), [all elements in i-th row of A+ j-th column of B])  
((i,j), [('A', 1, A[i, 1]), ..., ('A', p, A[i, p]), ('B', 1, B[1, j]), ..., ('B', p, B[p, j])] 
reducer size: p+p
memory requirement: 2p 

Communication cost :
mp+pn=p(m+n)<pmnOne-phase has higher communication cost 
Reducer size:One-phase = 2p;  2nd phase of 2-phase = p 


Matrix-vector multiplication 

M m*n V n*1 结果是一个m维向量
Map function: 
a copy of v (in memory) + a chunk of M (vector v在内存中)
– Input: Mij 
– Output: (i, Mij vj)  -key = M row index, value = product 

Reduce function:
1. Input: 
(i, [Mi1v1, Mi2v2, ..., Minvn])
i —mkey,每个key i n个元素
communication cost :  Immediate k-v pairs m*n 
Memory requirement: a copy of v 

2. Output:
(i, Mi1v1 + Mi2v2 + ... + Minvn) 
i个元素

when v is too big : striping 
Divide v into horizontal stripes
Divide M into corresponding vertical stripes 
 not require M in column-major 

s stripes => s MapReduce jobs=>Mivi
=>Handles the i-th stripe of matrix Mi Has a copy of vi in memory

Another 1 MapReduce job to add up results –> M1v1 + M2v2 + ... + Msv





评论

此博客中的热门博文

8 Link Analysis

NoSql and AWS DynamoDB practices