Bzip2 compression using HDFS commands

When your Hadoop distribution lacks NFS Gateway support (which lets you run ordinary Unix commands against HDFS paths), simple tasks become complicated.

For example, compressing a file with bzip2 is a simple task if the NFS Gateway is enabled:

$ bzip2  /hdfspath/file.csv

But when the NFS Gateway is not available, it becomes a little more challenging:

$ hdfs dfs -cat /hdfspath/file.csv | bzip2 | hdfs dfs -put - /hdfspath/file.bz2 && hdfs dfs -rm /hdfspath/file.csv

The above command first streams the CSV file to standard output, pipes it through bzip2, and writes the compressed stream to the .bz2 file via the HDFS put command (the "-" tells put to read from standard input). Finally, because of the &&, the original .csv file is removed only if the put succeeds.
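
One caveat: by default, && only sees the exit status of the last command in a pipeline, so a failure in the cat or bzip2 stage could go unnoticed and the original would still be removed. A minimal sketch of a safer variant, assuming bash and using a hypothetical helper name (hdfs_bzip2 is not part of Hadoop), could look like this:

```shell
#!/usr/bin/env bash
# Sketch: compress an HDFS file with bzip2 and delete the original only
# if every stage of the pipeline succeeded.
# hdfs_bzip2 is a hypothetical helper name, not a Hadoop command.
hdfs_bzip2() {
    local src="$1"
    local dst="${2:-$1.bz2}"   # default target: source path plus .bz2
    (
        # pipefail makes the pipeline fail if cat or bzip2 fails,
        # not only when the final put fails. The subshell keeps the
        # option from leaking into the calling shell.
        set -o pipefail
        hdfs dfs -cat "$src" | bzip2 | hdfs dfs -put - "$dst"
    ) && hdfs dfs -rm "$src"
}

# Usage (same placeholder paths as the example above):
# hdfs_bzip2 /hdfspath/file.csv /hdfspath/file.bz2
```

If a stage fails, the function returns a non-zero status and leaves the original file in place. A partially written target may still need to be cleaned up by hand.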

