Scalding: Choosing the Right Join Type for Speed

19 Aug 2013

We can speed up a job by choosing the right join for its sources. Say we have two sources, A and B.

If A is roughly the same size as B, we use the normal join (e.g. joinWithSmaller).

If A is much smaller than B but still larger than 100M, we use the skew join (e.g. skewJoinWithSmaller).

If A is small enough to fit in memory (less than 100M), we use the block join (e.g. blockJoinWithSmaller).
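
To make the three options concrete, here is a minimal sketch of a Scalding job (fields API) that picks one of these joins from a command-line argument. The job name, the argument names (`a`, `b`, `join`, `output`), and the field names are hypothetical, and all optional parameters are left at their defaults. Since the argument to a `...WithSmaller` method is meant to be the smaller side, A is passed as the argument and B is the receiver. It may also be worth comparing against `joinWithTiny`, the map-side join Scalding provides for a side small enough to be held in memory.

```scala
import com.twitter.scalding._

// Hypothetical job: joins a large source B with a smaller source A.
class ChooseJoinJob(args: Args) extends Job(args) {
  // Field names are illustrative; the two key fields must have different
  // names, because an inner join keeps the fields from both sides.
  val a = Tsv(args("a"), ('aKey, 'aValue)).read // the smaller source
  val b = Tsv(args("b"), ('bKey, 'bValue)).read // the larger source

  // Select the join following the size rules above; defaults to the normal join.
  val joined = args.getOrElse("join", "normal") match {
    case "skew"  => b.skewJoinWithSmaller('bKey -> 'aKey, a)
    case "block" => b.blockJoinWithSmaller('bKey -> 'aKey, a)
    case _       => b.joinWithSmaller('bKey -> 'aKey, a)
  }

  joined.write(Tsv(args("output")))
}
```

Keeping the join choice behind a single argument makes it easy to run the same job with each join type on real data and compare run times, which is the safest way to confirm the size thresholds for a given cluster.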
