Hadoop job with too much input data

23 Sep 2013

I requested six months of data in one job, and I guess that was way too much. When the input is too large, the job fails during initialization with this obscure exception:

Failure Info:Job initialization failed: java.io.IOException: Split metadata size exceeded 10000000. Aborting job job_201309090848_230129
    at org.apache.hadoop.mapreduce.split.SplitMetaInfoReader.readSplitMetaInfo(SplitMetaInfoReader.java:48)
    at org.apache.hadoop.mapred.JobInProgress.createSplits(JobInProgress.java:816)
    at org.apache.hadoop.mapred.JobInProgress.initTasks(JobInProgress.java:710)
    at org.apache.hadoop.mapred.JobTracker.initJob(JobTracker.java:3728)
    at org.apache.hadoop.mapred.EagerTaskInitializationListener$InitJob.run(EagerTaskInitializationListener.java:79)
    at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
    at java.lang.Thread.run(Thread.java:662)
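Why would a longer date range trip this? When a job is submitted, Hadoop writes one split metadata entry per input split, and the JobTracker refuses to initialize the job if that metadata file is larger than the limit in the message (10,000,000 bytes). So the file grows with the number of splits, which grows with the number and size of input files. Below is a simplified Java model of that behavior. It is not Hadoop's source: the class and method names are hypothetical, the split estimate ignores Hadoop's 10% "slop" rule, and the input sizes and bytes-per-entry figure in `main` are made-up numbers for illustration.

```java
import java.io.IOException;
import java.util.Arrays;

// Simplified, hypothetical model of the split-metadata size guard.
// NOT Hadoop's code; it only mirrors the behavior in the exception message.
class SplitMetaInfoCheck {
    // The limit reported in the exception message, in bytes.
    static final long DEFAULT_MAX_SIZE = 10_000_000L;

    // Rough split count: about one split per splitSize bytes of each file,
    // and at least one split per file (ignores the 10% slop rule).
    static long estimateSplits(long[] fileSizes, long splitSize) {
        long splits = 0;
        for (long size : fileSizes) {
            splits += Math.max(1L, (size + splitSize - 1) / splitSize);
        }
        return splits;
    }

    // Throws the same message the JobTracker produced when the metadata
    // is too big. In this model, a maxSize of zero or less disables the check.
    static void check(long metaInfoSize, long maxSize, String jobId) throws IOException {
        if (maxSize > 0 && metaInfoSize > maxSize) {
            throw new IOException("Split metadata size exceeded " + maxSize
                    + ". Aborting job " + jobId);
        }
    }

    public static void main(String[] args) {
        // Hypothetical input: 180 days x 500 files of 200 MB each.
        long[] files = new long[180 * 500];
        Arrays.fill(files, 200L * 1024 * 1024);
        long splits = estimateSplits(files, 128L * 1024 * 1024); // 128 MB splits
        long bytesPerEntry = 100; // hypothetical size of one metadata entry
        try {
            check(splits * bytesPerEntry, DEFAULT_MAX_SIZE, "job_example");
            System.out.println(splits + " splits: OK");
        } catch (IOException e) {
            // Prints: 180000 splits: Split metadata size exceeded 10000000. Aborting job job_example
            System.out.println(splits + " splits: " + e.getMessage());
        }
    }
}
```

In this model the levers are fewer splits (fewer or larger input files, or a shorter date range per job) or a higher limit. As far as I know, on Hadoop 1.x the limit comes from `mapreduce.jobtracker.split.metainfo.maxsize` on the JobTracker side, with -1 meaning no limit; check that against your own Hadoop version before relying on it.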

Just don't read that much data in a single job…
