1 Map reduce problems
PROBLEMS
Join
Matrix multiplication 2 / 1 phase
Matrix vector multiplication
POINTS
Map reduce input output
Memory requirement
Reducer size
Communication cost
P1 Join: R(A, B) S(A, C)
1. Map
r(a, b) => (a, ('R', b)); s(a, c) => (a, ('S', c))
2. Reduce
(a, [('R', b), ('S', c1), ('S', c2)]) => (a, (b, c1)), (a, (b, c2))
Dangling tuples:
Key with values from only 1 relation
(a, [('R', b)]) => left dangling ; (a, [('S', c)]) => right dangling
P2 Matrix-multiplication
CmXn= AmXpBpXn
2 phase:
a. phase1 product
Map: Extract key from A’s c, B’s r
A[i, k] => (k, ('A', i, A[i, k])) ; B[k, j] => (k, ('B', j, B[k, j]))
Communication cost: mp + np
Reducer size: m + n (p different keys)
Memory requirement: m+n
Reduce: Pairwise product --(join, multiply, dispatch)
Input: (k, [ ('A', i, A[i, k]), ..., ('B', j, B[k, j]), ...]
the change in key
Output: => ( (i, j), A[i, k] * B[k, j] ), ...
b. phase2 sum
Map: pass the output of phase 1
Communication cost: each key p, m*n pairwise products, m*n*p
Reduce: for each entry of C, p-1 multiplications
Reducer size: p products
1 phase:
map:
A (m, p) B(p, n) = C(m, n)
A [I, k ] map —> C I row = n element s
(I, c), (‘A’, k, A [I, k ])
B [k, j ] map —> C j column = m elements
(R, j), (‘B’, k, B [k, j ] )
communication cost: mp*n + np*m = 2m*n*p
reduce:
((i,j), [all elements in i-th row of A+ j-th column of B])
((i,j), [('A', 1, A[i, 1]), ..., ('A', p, A[i, p]), ('B', 1, B[1, j]), ..., ('B', p, B[p, j])]
((i,j), [('A', 1, A[i, 1]), ..., ('A', p, A[i, p]), ('B', 1, B[1, j]), ..., ('B', p, B[p, j])]
reducer size: p+p
memory requirement: 2p
Communication cost :
mp+pn=p(m+n)<pmn⇒One-phase has higher communication cost
Reducer size:One-phase = 2p; 2nd phase of 2-phase = p
Matrix-vector multiplication
M m*n V n*1 结果是一个m维向量
Map function:
a copy of v (in memory) + a chunk of M (vector v在内存中)
– Input: Mij
– Output: (i, Mij vj) -key = M row index, value = product
Reduce function:
1. Input: (i, [Mi1v1, Mi2v2, ..., Minvn])
1. Input: (i, [Mi1v1, Mi2v2, ..., Minvn])
i —m个key,每个key i 有n个元素
communication cost : Immediate k-v pairs m*n
Memory requirement: a copy of v
2. Output: (i, Mi1v1 + Mi2v2 + ... + Minvn)
第i个元素
when v is too big : striping
• Divide v into horizontal stripes
评论
发表评论