postgresql - How can I incrementally migrate data from PostgreSQL to HDFS?
I have a PostgreSQL database on a production server, and I want to set up a Hadoop/Spark cluster to run MapReduce jobs. In order to do that, I need to load the data from the Postgres database into HDFS. The naive approach is a batch job that dumps the entire contents of the database (120 GB) to HDFS once a day, but that is wasteful and costly. Since most of the data won't change from one day to the next, it should theoretically be cheaper and more efficient to send only the diffs every day. Is that possible?

I've read a little about Sqoop, and it seems to provide the functionality I want, but it requires making changes to the database and the application. Is there a way that doesn't require making changes to the database?
Apache Sqoop can connect to a PostgreSQL database.

Sqoop provides an incremental import mode that retrieves only the rows newer than a previously-imported set of rows, i.e., it can pick up just the table updates that happened between the previous run and the current run.

No changes to the database are required: using Sqoop's PostgreSQL connector, you can connect to the database and run incremental imports without modifying the database or the application.
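As a rough sketch, a daily incremental import could look like the following. The connection string, credentials, table name, and column names (`updated_at`, `id`) are placeholders for illustration, not values from the question; `lastmodified` mode assumes the table has a timestamp column that is updated on every change:

```shell
# Incremental import: fetch only rows whose updated_at is newer than --last-value,
# and merge updated rows into the existing HDFS data by primary key.
sqoop import \
  --connect jdbc:postgresql://db.example.com:5432/mydb \
  --username sqoop_user \
  --password-file /user/sqoop/.password \
  --table orders \
  --target-dir /data/orders \
  --incremental lastmodified \
  --check-column updated_at \
  --last-value "2024-01-01 00:00:00" \
  --merge-key id \
  -m 4
```

For append-only tables (no updates, only inserts), `--incremental append` with an auto-incrementing ID as the `--check-column` works instead, and no merge step is needed. If you create this as a saved Sqoop job (`sqoop job --create ...`), Sqoop stores the last-imported value in its metastore and automatically continues from it on each subsequent run, so you don't have to track `--last-value` yourself.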