How can I tell Hadoop not to split my file in a particular MapReduce task?


  1. Suppose I have a file to process with Hadoop, and I know that its size is smaller than the HDFS block size. Does this guarantee that the file will not be split, so that I don't need to write my own InputSplit because the default one will not split it?

  2. Suppose a file saved with SequenceFileOutputFormat (or some other output format) is bigger than the block size but consists of only one key-value pair. Does this imply that the file's blocks will be stored on the same node (except for replicated copies), so that the MapReduce task will not waste time fetching them? Does it also mean that I don't need to write my own InputSplit because the key will not be split (the key size is smaller than the block size and there is only one key)?
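
To make the two cases concrete, here is a rough, Hadoop-free sketch of how the default `FileInputFormat` appears to carve a file into splits. It is a simplified model, not the library code: the class `SplitSketch` and the method `computeSplits` are hypothetical names, and the sketch assumes the split size equals the block size (that is, the minimum and maximum split size settings are left at their defaults). In a real job, the usual way to forbid splitting is to subclass an input format and override `isSplitable(JobContext, Path)` to return `false`; the `splittable` flag below stands in for that override.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical, simplified model of FileInputFormat's split computation.
// Assumes splitSize == HDFS block size (default min/max split settings).
class SplitSketch {
    // FileInputFormat lets the last split be up to 10% larger than splitSize.
    static final double SPLIT_SLOP = 1.1;

    /** Returns the byte length of each split produced for a single file. */
    static List<Long> computeSplits(long fileLen, long splitSize, boolean splittable) {
        List<Long> splits = new ArrayList<>();
        if (!splittable || fileLen == 0) {
            // Models isSplitable(...) returning false: one split covers the whole file.
            splits.add(fileLen);
            return splits;
        }
        long remaining = fileLen;
        // Cut full-size splits while clearly more than one split's worth remains.
        while ((double) remaining / splitSize > SPLIT_SLOP) {
            splits.add(splitSize);
            remaining -= splitSize;
        }
        if (remaining != 0) {
            splits.add(remaining); // the tail, possibly slightly larger than splitSize
        }
        return splits;
    }

    public static void main(String[] args) {
        long mb = 1024L * 1024L;
        long block = 128 * mb; // assumed block size, for illustration only

        // Case 1: file smaller than a block, default (splittable) behaviour.
        System.out.println(computeSplits(100 * mb, block, true).size());
        // Case 2: file larger than a block, default behaviour.
        System.out.println(computeSplits(300 * mb, block, true).size());
        // Case 2 again, with splitting disabled.
        System.out.println(computeSplits(300 * mb, block, false).size());
    }
}
```

Under these assumptions, the 100 MB file yields a single split without any custom code, while the 300 MB file is cut by byte ranges into three splits (128 MB, 128 MB, 44 MB) regardless of how many records it holds, unless splitting is disabled. Note that this models only split boundaries; it says nothing about which nodes hold the blocks of a multi-block file.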

