This script loops through the HDFS file system, reads the first line of each file, and writes it to the console. For the most part it is self-explanatory. The script uses the pipe delimiter “|”; it is optional and can be skipped. import org.apache.hadoop.fs.Path import org.apache.hadoop.conf.Configuration import org.apache.hadoop.fs.FileSystem val path = "/hdfspath/" val conf = new Configuration() val fs = FileSystem.get(conf)…
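For reference, a minimal sketch of what such a loop could look like when run from spark-shell, assuming the /hdfspath/ directory above and a pipe-delimited filename|first-line console output (the original script is truncated here):

import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.{FileSystem, Path}
import scala.io.Source

val path = "/hdfspath/"                 // directory to scan (from the excerpt above)
val conf = new Configuration()
val fs = FileSystem.get(conf)

// Walk every file directly under the directory and print "filename|first line"
fs.listStatus(new Path(path)).filter(_.isFile).foreach { status =>
  val in = fs.open(status.getPath)
  val lines = Source.fromInputStream(in).getLines()
  val firstLine = if (lines.hasNext) lines.next() else ""
  println(status.getPath.getName + "|" + firstLine)
  in.close()
}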

It is pretty straightforward to pass command-line arguments to Spark (Scala) from the shell. $ ./spark-2.0.0-bin-hadoop2.6/bin/spark-shell -i ~/scalaparam.scala --conf spark.driver.args="param1value param2value param3value" Parameter values are separated by spaces (param1value param2value param3value). Contents of scalaparam.scala: val args = sc.getConf.get("spark.driver.args").split("\\s+") val param1 = args(0) val param2 = args(1) val param3 = args(2) println("param1 passed from shell : " + param1) println("param2 passed from…
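The excerpt is cut off above; a hedged sketch of the full scalaparam.scala, where the last two println lines are assumed to mirror the first:

// scalaparam.scala, run via: spark-shell -i scalaparam.scala --conf spark.driver.args="a b c"
val args = sc.getConf.get("spark.driver.args").split("\\s+")
val param1 = args(0)
val param2 = args(1)
val param3 = args(2)
println("param1 passed from shell : " + param1)
println("param2 passed from shell : " + param2)
println("param3 passed from shell : " + param3)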

bzip using HDFS command
When your Hadoop distribution lacks NFS Gateway support (Unix command support), simple tasks become complicated. For example, bzipping a file is a simple task if the NFS gateway is enabled: $ bzip2 /hdfspath/file.csv But when NFS is not present, it becomes a little more challenging: $ hdfs dfs -cat /hdfspath/file.csv | bzip2 | hadoop fs -put…
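If you happen to be in spark-shell already, a hedged alternative to the shell pipe is Hadoop's compression codec API; this is a sketch, not the command from the post, and the input/output file names are assumptions:

import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.{FileSystem, Path}
import org.apache.hadoop.io.IOUtils
import org.apache.hadoop.io.compress.BZip2Codec

val conf = new Configuration()
val fs = FileSystem.get(conf)
val codec = new BZip2Codec()
codec.setConf(conf)

// Stream /hdfspath/file.csv through bzip2 into /hdfspath/file.csv.bz2, entirely on HDFS
val in  = fs.open(new Path("/hdfspath/file.csv"))
val out = codec.createOutputStream(fs.create(new Path("/hdfspath/file.csv.bz2")))
IOUtils.copyBytes(in, out, conf, true)   // 'true' closes both streams when the copy finishes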

In the post Hive Job Submission failed with exception I mentioned deleting the .Trash folder. If for any reason you are not able to delete that folder, you can go with Option B: change the scratchdir for Hive / Sqoop to use. Create a new folder under the Hadoop file system, for example /db/tmphive, and grant read/write…
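A hedged sketch of creating that scratch folder and opening up its permissions with the Hadoop FileSystem API from spark-shell (the /db/tmphive path comes from the post; the 777 mode is an assumption, adjust it to your security policy):

import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.{FileSystem, Path}
import org.apache.hadoop.fs.permission.FsPermission

val conf = new Configuration()
val fs = FileSystem.get(conf)

val scratch = new Path("/db/tmphive")
fs.mkdirs(scratch)                                  // create the new scratch folder
fs.setPermission(scratch, new FsPermission("777"))  // grant read/write (assumed mode)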

Hive Job Submission failed with exception
Job Submission failed with exception ‘org.apache.hadoop.hdfs.protocol.DSQuotaExceededException(The DiskSpace quota of /user/username is exceeded: quota = xx’ This is a common problem when you are working in a multi-tenant environment with a limited quota. Reasons: when a large quantity of data is processed via Hive / Pig, the temporary data gets stored in the .Trash folder, which causes /home…
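A hedged sketch of inspecting the quota and clearing .Trash from spark-shell (the /user/username path is a placeholder taken from the error message):

import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.{FileSystem, Path}

val conf = new Configuration()
val fs = FileSystem.get(conf)

val home = new Path("/user/username")     // placeholder home directory
val summary = fs.getContentSummary(home)
println("space consumed : " + summary.getSpaceConsumed)
println("space quota    : " + summary.getSpaceQuota)

// Permanently remove the trash contents (the Java API does not move deletes to trash)
fs.delete(new Path(home, ".Trash"), true)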

SQOOP : mapred.FileAlreadyExistsException : Output directory
Sometimes when you import data from an RDBMS to Hadoop via Sqoop you will see this error: org.apache.hadoop.mapred.FileAlreadyExistsException: Output directory hdfs://hadoopcluster/user/username/importtable already exists Solution: $ hdfs dfs -rm -r -skipTrash hdfs://hadoopcluster/user/username/importtable Reason: When Sqoop is used for importing data, it creates a temporary directory under the home directory and later deletes those files. Sometimes, due to some issue,…
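The same cleanup can be scripted from spark-shell; a hedged sketch that removes the leftover import directory only if it exists (the path comes from the error above):

import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.{FileSystem, Path}

val conf = new Configuration()
val fs = FileSystem.get(conf)

val leftover = new Path("hdfs://hadoopcluster/user/username/importtable")
if (fs.exists(leftover)) {
  fs.delete(leftover, true)   // recursive delete; the API skips the trash, like -skipTrash
}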