19 Aug 2013
We can optimize a join by choosing the join method according to the relative sizes of the sources. Say we have two sources, A and B.
If A is roughly the same size as B, we use the normal join (e.g. joinWithSmaller).
If A is much smaller than B but still larger than 100M, we use the skew join (e.g. skewJoinWithSmaller).
If A is smaller than 100M, small enough to fit in memory, we use the block join (e.g. blockJoinWithSmaller).
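
The rule of thumb above can be sketched as a small decision function. This is only an illustration, not part of Scalding: `JoinChooser`, `chooseJoin`, `InMemoryLimit` and `SkewRatio` are hypothetical names, "100M" is assumed to mean 100 MB with sizes measured in bytes, and "much smaller" is assumed to mean at least 10 times smaller, since the note gives no exact ratio. The function returns only the name of the suggested method and does not run a join.

```scala
// Hypothetical helper: encodes the rule of thumb as a plain function.
// It does not call Scalding; it only returns the name of the suggested method.
object JoinChooser {
  // Assumption: "100M" is read as 100 MB, with sizes given in bytes.
  val InMemoryLimit: Long = 100L * 1024 * 1024

  // Assumption: "much smaller" means at least 10x smaller (no ratio is given in the note).
  val SkewRatio: Long = 10L

  // sizeA is the size of the smaller source A, sizeB the size of the larger source B.
  def chooseJoin(sizeA: Long, sizeB: Long): String =
    if (sizeA < InMemoryLimit) "blockJoinWithSmaller"            // A fits in memory
    else if (sizeA <= sizeB / SkewRatio) "skewJoinWithSmaller"   // A is much smaller than B
    else "joinWithSmaller"                                       // A and B are comparable
}
```

The in-memory check comes first here because that rule is stated purely in terms of A's absolute size. When A and B are both under the limit, the note's rules overlap, and this sketch resolves the overlap in favour of the block join; checking the size ratio first would be an equally reasonable reading.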