How can I tell Hadoop not to split my file in a particular MapReduce task?
Suppose I have a file to process with Hadoop, and I know that its size is smaller than the HDFS block size. Does this guarantee that the file will not be split, so that I don't need to write my own InputSplit because the default one will not split it?
Suppose a file was saved with SequenceFileOutputFormat (or some other output format) and is bigger than the block size, but consists of only 1 key-value pair. Does this imply that the file's blocks will be stored on the same node (except for replicated copies), so that the MapReduce task will not waste time fetching them? Does this mean I don't need to write my own InputSplit because the key will not be split (the key size is smaller than the block size and there is only 1 key)?
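To make the first case concrete, here is a minimal standalone sketch of the split arithmetic as I understand it from the default file-based input format: the split size is max(minSize, min(maxSize, blockSize)), and the last split may run up to 10% over that size. It does not use the Hadoop libraries. The class name `SplitSketch`, the method `splitLengths` and the `splittable` flag are my own illustrative names, not Hadoop API, and the file sizes are made-up examples.

```java
import java.util.ArrayList;
import java.util.List;

/** Standalone simulation of FileInputFormat-style split sizing (hypothetical names, no Hadoop dependency). */
final class SplitSketch {
    // Assumption: the final split is allowed to exceed the split size by up to 10%.
    static final double SPLIT_SLOP = 1.1;

    // Split size is the block size, clamped between the configured min and max split sizes.
    static long computeSplitSize(long blockSize, long minSize, long maxSize) {
        return Math.max(minSize, Math.min(maxSize, blockSize));
    }

    /** Returns the byte length of each split that would be created for a single file. */
    static List<Long> splitLengths(long fileLen, long blockSize,
                                   long minSize, long maxSize, boolean splittable) {
        List<Long> splits = new ArrayList<>();
        // An empty or non-splittable file becomes exactly one split covering the whole file.
        if (fileLen == 0 || !splittable) {
            splits.add(fileLen);
            return splits;
        }
        long splitSize = computeSplitSize(blockSize, minSize, maxSize);
        long remaining = fileLen;
        // Carve off full-size splits while more than 1.1 split sizes remain.
        while ((double) remaining / splitSize > SPLIT_SLOP) {
            splits.add(splitSize);
            remaining -= splitSize;
        }
        // Whatever is left (if anything) becomes the final split.
        if (remaining != 0) {
            splits.add(remaining);
        }
        return splits;
    }

    public static void main(String[] args) {
        long mb = 1024L * 1024L;
        long block = 128 * mb; // example block size, not taken from the question

        // File smaller than one block, default min/max split sizes.
        System.out.println(splitLengths(100 * mb, block, 1L, Long.MAX_VALUE, true));
        // File larger than one block, default settings.
        System.out.println(splitLengths(300 * mb, block, 1L, Long.MAX_VALUE, true));
        // Same large file, but treated as non-splittable.
        System.out.println(splitLengths(300 * mb, block, 1L, Long.MAX_VALUE, false));
        // Same large file, with the minimum split size raised above the file length.
        System.out.println(splitLengths(300 * mb, block, Long.MAX_VALUE, Long.MAX_VALUE, true));
    }
}
```

Under these assumptions, a file smaller than one block always ends up as a single split. A larger file stays in one piece only if it is marked non-splittable or the minimum split size exceeds its length. In real Hadoop code, the non-splittable case would correspond to subclassing the input format and overriding `isSplitable` to return false. Note that the sketch only models how splits are computed. It says nothing about where HDFS places the blocks of a multi-block file, so it does not settle the data-locality part of the second question.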