This script loops through an HDFS file system, reads the first line of each file, and writes it to the console. For the most part it is self-explanatory. The script uses a pipe delimiter ("|"); it is optional and can be skipped. import org.apache.hadoop.fs.Path import org.apache.hadoop.conf.Configuration import org.apache.hadoop.fs.FileSystem   val path = "/hdfspath/" val conf = new Configuration() val fs = FileSystem.get(conf)…
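A plausible completion of the snippet above, as a hedged sketch: the directory path and pipe delimiter follow the excerpt, while the loop body and output format are assumptions. `listStatus` and `open` are standard Hadoop `FileSystem` calls, so this requires the Hadoop client libraries on the classpath.

```scala
import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.{FileSystem, Path}
import scala.io.Source

val path = "/hdfspath/"            // placeholder directory from the excerpt
val conf = new Configuration()
val fs   = FileSystem.get(conf)

// Walk every file under the directory, read its first line, and print
// the file name and line joined with the optional pipe delimiter.
fs.listStatus(new Path(path)).filter(_.isFile).foreach { status =>
  val in = fs.open(status.getPath)
  try {
    val firstLine = Source.fromInputStream(in).getLines().take(1).mkString
    println(s"${status.getPath.getName}|$firstLine")
  } finally {
    in.close()
  }
}
```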

It's pretty straightforward to pass command-line arguments to Spark (Scala) from the shell. $ ./spark-2.0.0-bin-hadoop2.6/bin/spark-shell -i ~/scalaparam.scala --conf spark.driver.args="param1value  param2value  param3value" Parameter values are separated by spaces (param1value  param2value  param3value). Contents of scalaparam.scala: val args = sc.getConf.get("spark.driver.args").split("\\s+") val param1=args(0) val param2=args(1) val param3=args(2) println("param1 passed from shell : " + param1) println("param2 passed from…
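A self-contained sketch of what the complete `scalaparam.scala` likely looks like. In spark-shell the argument string comes from `sc.getConf.get("spark.driver.args")`; here a stand-in string is used so the parsing logic runs without a SparkContext, and the third `println` is an assumed continuation of the truncated excerpt.

```scala
// Stand-in for sc.getConf.get("spark.driver.args") inside spark-shell
val driverArgs = "param1value  param2value  param3value"

val args = driverArgs.split("\\s+")   // split on one or more whitespace chars
val param1 = args(0)
val param2 = args(1)
val param3 = args(2)
println("param1 passed from shell : " + param1)
println("param2 passed from shell : " + param2)
println("param3 passed from shell : " + param3)
```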

MySQL client ran out of memory
Error Message : Open Database Connectivity (ODBC) error occurred. state: ‘HY000’. Native Error Code: 2008. [MySQL][ODBC 5.3(a) Driver][mysqld-5.5.5-10.1.22-MariaDB]MySQL client ran out of memory This error is caused by a large number of rows being buffered in the client machine's memory. If you are reading data from MySQL and not performing any seek operation, then you…
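The usual remedy is to stream rows from the server instead of buffering the whole result set on the client. The excerpt is truncated, so as a hedged illustration here is the equivalent fix in JDBC (MySQL Connector/J) rather than ODBC; the connection URL, credentials, and table name are placeholders. Connector/J documents `setFetchSize(Integer.MIN_VALUE)` on a forward-only, read-only statement as its row-streaming mode.

```scala
import java.sql.{DriverManager, ResultSet}

// Placeholder connection details — substitute your own.
val conn = DriverManager.getConnection("jdbc:mysql://host/db", "user", "pass")
val stmt = conn.createStatement(
  ResultSet.TYPE_FORWARD_ONLY,
  ResultSet.CONCUR_READ_ONLY)
stmt.setFetchSize(Integer.MIN_VALUE) // Connector/J: stream one row at a time

val rs = stmt.executeQuery("SELECT * FROM big_table")
while (rs.next()) {
  // Process each row here without holding the full result set in memory.
}
rs.close(); stmt.close(); conn.close()
```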

bzip using HDFS command
When your Hadoop distribution lacks NFS Gateway support (i.e., Unix command support on HDFS paths), simple tasks become complicated. For example, bzipping a file is trivial if the NFS gateway is enabled: $ bzip2 /hdfspath/file.csv But when NFS is not present, it becomes a little more challenging: $ hdfs dfs -cat /hdfspath/file.csv | bzip2 | hadoop fs -put…
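The same pipeline can also be expressed through the Hadoop FileSystem API in Scala, which avoids streaming the data through the local shell at all. A hedged sketch, assuming the input path from the post and an output path of `file.csv.bz2`; it uses Hadoop's `BZip2Codec` and `IOUtils.copyBytes`, so the Hadoop client libraries must be on the classpath.

```scala
import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.{FileSystem, Path}
import org.apache.hadoop.io.IOUtils
import org.apache.hadoop.io.compress.BZip2Codec

val conf = new Configuration()
val fs   = FileSystem.get(conf)

val codec = new BZip2Codec()
codec.setConf(conf)

// Read the raw file from HDFS and write it back bzip2-compressed.
// Output path is an assumption, mirroring the shell pipeline above.
val in  = fs.open(new Path("/hdfspath/file.csv"))
val out = codec.createOutputStream(fs.create(new Path("/hdfspath/file.csv.bz2")))
IOUtils.copyBytes(in, out, conf, true) // final `true` closes both streams
```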