postgresql - How can I incrementally migrate data from PostgreSQL to HDFS? -


I have a PostgreSQL database in use on a production server. I want to set up a Hadoop/Spark cluster to run MapReduce jobs. In order to do that I need to load the data from the Postgres database into HDFS. The naive approach is to have a batch job that, once a day, dumps the contents of the database (120 GB) into HDFS. That is wasteful and costly. Since most of the data won't change from one day to the next, it would theoretically be cheaper and more efficient to send only the diffs every day. Is that possible?

I've read a little about Sqoop, and it seems to provide the functionality I want, but it appears to require making changes to the database and the application. Is there a way that doesn't require making changes to the database?

  • Apache Sqoop can connect to a PostgreSQL database.

    Sqoop provides an incremental import mode that can be used to retrieve only rows newer than some previously imported set of rows, i.e., it can capture the table updates that happened between the previous run and the current run.

  • No changes to the database are required.

Using Sqoop's PostgreSQL connector, you can point Sqoop at the database and perform incremental imports without any database changes.
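As a rough sketch, an incremental import could look like the following. The host, database, table, and column names are placeholders for illustration; `--incremental lastmodified` with a timestamp `--check-column` picks up rows updated since `--last-value`, while `--incremental append` would suit append-only tables with a monotonically increasing key.

```shell
# Sketch of a daily incremental import (hostnames, table and
# column names below are placeholder assumptions, not from the question)
sqoop import \
  --connect jdbc:postgresql://db-host:5432/mydb \
  --username myuser \
  --password-file /user/hadoop/.pgpass \
  --table orders \
  --target-dir /data/orders \
  --incremental lastmodified \
  --check-column updated_at \
  --last-value "2017-01-01 00:00:00" \
  --merge-key order_id
```

If this is defined as a saved Sqoop job (`sqoop job --create ...`), Sqoop records the last imported `--last-value` in its metastore, so each scheduled run automatically picks up where the previous one left off.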

