bzip using HDFS command


When your Hadoop distribution lacks NFS Gateway support (which lets you mount HDFS and run ordinary Unix commands against it), simple tasks become complicated.

For example, bzipping a file is a simple task when the NFS Gateway is enabled:

$ bzip2  /hdfspath/file.csv

But when NFS is not available, it becomes a little more challenging:

$ hdfs dfs -cat /hdfspath/file.csv | bzip2 | hdfs dfs -put - /hdfspath/file.bz2 && hdfs dfs -rm /hdfspath/file.csv

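The above command first streams the CSV file to standard output with -cat. That output is piped to bzip2, and the compressed stream is written back to HDFS as a .bz2 file by -put, where the "-" argument tells it to read from standard input. Finally, once the put succeeds, we manually remove the original .csv file. Note that && only checks the exit status of the last command in the pipeline (the put), so in bash a failure in -cat or bzip2 will not stop the removal unless pipefail is set.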
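If you do this often, the one-liner can be wrapped in a small bash function. The following is only a sketch: hdfs_bzip2 is a hypothetical helper name (it is not part of Hadoop), and it assumes bash plus the hdfs and bzip2 commands on the PATH. It turns on pipefail for the pipeline so the original file is removed only when every stage succeeded.

```shell
#!/usr/bin/env bash
# Sketch: compress an HDFS file with bzip2 without an NFS mount.
# hdfs_bzip2 is a hypothetical helper name, not a Hadoop command.

hdfs_bzip2() {
  local src="$1"
  local dst="${2:-$1.bz2}"   # default target: source path with .bz2 appended

  # Run the pipeline in a subshell with pipefail enabled, so a failure in
  # -cat or bzip2 (not only in -put) makes the whole pipeline fail.
  if ( set -o pipefail
       hdfs dfs -cat "$src" | bzip2 | hdfs dfs -put - "$dst" ); then
    # Remove the original only after the compressed copy was written.
    hdfs dfs -rm "$src"
  else
    echo "compression of $src failed; original left in place" >&2
    return 1
  fi
}

# Example usage with the paths from this post:
# hdfs_bzip2 /hdfspath/file.csv /hdfspath/file.bz2
```

On failure a partial target file may be left behind in HDFS, so check for it and delete it before retrying.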

 
