This script loops through an HDFS file system, reads the first line of each file, and writes it to the console. For the most part it is self-explanatory. The script uses the pipe delimiter "|"; it is optional and can be skipped.

import org.apache.hadoop.fs.Path
import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.FileSystem

val path = "/hdfspath/"
val conf = new Configuration()
val fs = FileSystem.get(conf)…
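Since the excerpt is cut off, here is a minimal sketch of how the rest of the loop might look, assuming the files sit directly under /hdfspath/ (the pipe between file name and first line is the optional delimiter mentioned above):

import java.io.{BufferedReader, InputStreamReader}
import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.{FileSystem, Path}

val path = "/hdfspath/"
val conf = new Configuration()
val fs = FileSystem.get(conf)

// List each file in the directory and print its first line to the console.
fs.listStatus(new Path(path)).filter(_.isFile).foreach { status =>
  val reader = new BufferedReader(new InputStreamReader(fs.open(status.getPath)))
  try {
    println(status.getPath.getName + "|" + reader.readLine())
  } finally {
    reader.close()
  }
}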

It is pretty straightforward to pass command-line arguments to a Spark (Scala) script from the shell:

$ ./spark-2.0.0-bin-hadoop2.6/bin/spark-shell -i ~/scalaparam.scala --conf spark.driver.args="param1value param2value param3value"

Parameter values are separated by spaces (param1value param2value param3value).

Contents of scalaparam.scala:

val args = sc.getConf.get("spark.driver.args").split("\\s+")
val param1 = args(0)
val param2 = args(1)
val param3 = args(2)
println("param1 passed from shell : " + param1)
println("param2 passed from…
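Since the excerpt is truncated, a complete scalaparam.scala presumably just mirrors the first println for the remaining parameters:

val args = sc.getConf.get("spark.driver.args").split("\\s+")
val param1 = args(0)
val param2 = args(1)
val param3 = args(2)
println("param1 passed from shell : " + param1)
println("param2 passed from shell : " + param2)
println("param3 passed from shell : " + param3)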

MySQL client ran out of memory
Error Message: Open Database Connectivity (ODBC) error occurred. state: 'HY000'. Native Error Code: 2008. [MySQL][ODBC 5.3(a) Driver][mysqld-5.5.5-10.1.22-MariaDB]MySQL client ran out of memory

This error is caused by a large number of rows being buffered in the client machine's memory. If you are reading data from MySQL and not performing any seek operation, then you…
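The truncated advice likely concerns result-set caching: the MySQL ODBC driver has an option not to cache the results of forward-only cursors. On the JDBC side the equivalent, shown here as a Scala-flavored sketch with illustrative connection details, is to stream rows so the client never holds the full result set:

import java.sql.{DriverManager, ResultSet}

// Hypothetical host/database/credentials, for illustration only.
val conn = DriverManager.getConnection(
  "jdbc:mysql://dbhost:3306/mydb", "user", "password")

val stmt = conn.createStatement(
  ResultSet.TYPE_FORWARD_ONLY, ResultSet.CONCUR_READ_ONLY)

// With MySQL Connector/J, a fetch size of Integer.MIN_VALUE streams
// rows one at a time instead of buffering them all in client memory.
stmt.setFetchSize(Integer.MIN_VALUE)

val rs = stmt.executeQuery("SELECT * FROM big_table")
while (rs.next()) {
  println(rs.getString(1))
}
rs.close(); stmt.close(); conn.close()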

bzip using HDFS command
When your Hadoop distribution lacks NFS Gateway support (unix command support), simple tasks become complicated. For example, bzipping a file is trivial when the NFS Gateway is enabled:

$ bzip2 /hdfspath/file.csv

but when NFS is not present it becomes a little more challenging:

$ hdfs dfs -cat /hdfspath/file.csv | bzip2 | hadoop fs -put…
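The same thing can also be done programmatically from the spark-shell; a minimal sketch using Hadoop's compression API (paths are illustrative):

import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.{FileSystem, Path}
import org.apache.hadoop.io.IOUtils
import org.apache.hadoop.io.compress.BZip2Codec
import org.apache.hadoop.util.ReflectionUtils

val conf = new Configuration()
val fs = FileSystem.get(conf)

val codec = ReflectionUtils.newInstance(classOf[BZip2Codec], conf)
val in = fs.open(new Path("/hdfspath/file.csv"))
val out = codec.createOutputStream(fs.create(new Path("/hdfspath/file.csv.bz2")))

// Stream the uncompressed bytes through the bzip2 codec; the final
// 'true' closes both streams once the copy completes.
IOUtils.copyBytes(in, out, conf, true)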

In the post Hive Job Submission failed with exception I mentioned deleting the .Trash folder. If for any reason you are not able to delete the folder, you can go with Option B: change the scratchdir that Hive / Sqoop uses. Create a new folder under the Hadoop file system, for example /db/tmphive, and grant read/write…
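Assuming the truncated steps end with pointing Hive at the new folder, the relevant property is hive.exec.scratchdir, typically set in hive-site.xml (a sketch; your distribution's config layout may differ):

<!-- hive-site.xml: use the newly created folder for temporary data -->
<property>
  <name>hive.exec.scratchdir</name>
  <value>/db/tmphive</value>
</property>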

Hive Job Submission failed with exception
Job Submission failed with exception 'org.apache.hadoop.hdfs.protocol.DSQuotaExceededException(The DiskSpace quota of /user/username is exceeded: quota = xx'

This is a common problem when you are working in a multi-tenant environment with a limited quota. Reasons: when a large quantity of data is processed via Hive / Pig, the temporary data gets stored in the .Trash folder, which causes /home…
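One way to free the quota, sticking with the FileSystem API used elsewhere on this site, is to remove the trash folder outright (the username path is illustrative, and the deletion is permanent):

import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.{FileSystem, Path}

val conf = new Configuration()
val fs = FileSystem.get(conf)

val trash = new Path("/user/username/.Trash")
if (fs.exists(trash)) {
  // 'true' means recursive delete; the trashed data is gone for good.
  fs.delete(trash, true)
}

The command-line equivalent is hdfs dfs -rm -r -skipTrash on the same path.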