It's pretty straightforward to pass command-line arguments to the Spark shell (Scala) from the shell:

$ ./spark-2.0.0-bin-hadoop2.6/bin/spark-shell -i ~/scalaparam.scala --conf spark.driver.args="param1value  param2value  param3value"

Parameter values are separated by spaces (param1value  param2value  param3value).

Contents of scalaparam.scala:

val args = sc.getConf.get("spark.driver.args").split("\\s+")
val param1 = args(0)
val param2 = args(1)
val param3 = args(2)
println("param1 passed from shell : " + param1)
println("param2 passed from shell : " + param2)
println("param3 passed from shell : " + param3)
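As a quick local sanity check of the splitting behavior (the values below are the same hypothetical param1value/param2value/param3value, not anything Spark-specific), plain shell word-splitting collapses runs of spaces the same way `split("\\s+")` does in scalaparam.scala:

```shell
# Stand-in demo: default shell word-splitting collapses repeated spaces,
# mirroring sc.getConf.get("spark.driver.args").split("\\s+") in scalaparam.scala.
DRIVER_ARGS="param1value  param2value  param3value"
set -- $DRIVER_ARGS            # unquoted on purpose: we want word-splitting here
param1=$1; param2=$2; param3=$3
echo "param1 passed from shell : $param1"
echo "param2 passed from shell : $param2"
echo "param3 passed from shell : $param3"
```

This prints the three values on separate lines, matching what the println calls produce on the Spark side: the double spaces do not create empty parameters.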

bzip2 using HDFS commands
When your Hadoop distribution lacks NFS Gateway support (and with it, plain Unix command support), simple tasks become complicated. For example, bzipping a file is trivial when the NFS gateway is enabled:

$ bzip2 /hdfspath/file.csv

But when NFS is not present, it becomes a little more challenging:

$ hdfs dfs -cat /hdfspath/file.csv | bzip2 | hadoop fs -put…
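The same streaming pattern can be exercised locally (the /tmp paths below are hypothetical stand-ins for the HDFS paths; assumes bzip2 is installed): the data is compressed in-flight through the pipe, never written uncompressed to an intermediate location.

```shell
# Local stand-in for: hdfs dfs -cat /hdfspath/file.csv | bzip2 | hadoop fs -put ...
printf 'id,name\n1,alice\n' > /tmp/file.csv      # sample data
cat /tmp/file.csv | bzip2 > /tmp/file.csv.bz2    # compress in the pipe
bzcat /tmp/file.csv.bz2                          # verify the round-trip
```

The bzcat at the end prints the original CSV back, confirming nothing was lost in the pipeline.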

SQOOP : mapred.FileAlreadyExistsException : Output directory
Sometimes when you import data from an RDBMS into Hadoop via Sqoop, you will see this error:

org.apache.hadoop.mapred.FileAlreadyExistsException: Output directory hdfs://hadoopcluster/user/username/importtable already exists

Solution:

$ hdfs dfs -rm -r -skipTrash hdfs://hadoopcluster/user/username/importtable

Reason: When Sqoop imports data, it creates temporary files under the home directory and later deletes them. Sometimes, due to some issue,…
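To make re-runs idempotent, the stale target directory can be deleted only when it actually exists. The sketch below simulates that guard with local commands — `[ -d ]` and `rm -r` stand in for `hdfs dfs -test -d` and `hdfs dfs -rm -r -skipTrash`, and the directory name is hypothetical:

```shell
# Simulated stale import dir (in HDFS this would be the Sqoop target directory)
TARGET=/tmp/importtable
mkdir -p "$TARGET"                 # pretend a failed import left this behind
# Guarded delete: only remove the directory when it is present
# (HDFS equivalent: hdfs dfs -test -d "$TARGET" && hdfs dfs -rm -r -skipTrash "$TARGET")
[ -d "$TARGET" ] && rm -r "$TARGET"
[ -d "$TARGET" ] && echo "directory still present" || echo "directory removed"
```

Depending on your Sqoop version, `sqoop import --delete-target-dir` may do this cleanup for you.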

Hive Error : SemanticException TOK_ALLCOLREF is not supported in current context

Reason : Select Distinct * From <tablename>

Solution : Hive doesn't support distinct *, so mention the column names explicitly:

Select Distinct col1, col2, col3… From <tablename>
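For a quick local illustration of the fix, the sqlite3 CLI can stand in for Hive (hypothetical sample rows; note that sqlite itself happily accepts distinct *, so this only demonstrates the recommended explicit-column form, not the Hive error):

```shell
# sqlite3 stand-in for Hive: SELECT DISTINCT with an explicit column list
# deduplicates rows; ORDER BY makes the output deterministic.
sqlite3 :memory: "
CREATE TABLE t(col1, col2);
INSERT INTO t VALUES (1,'a'),(1,'a'),(2,'b');
SELECT DISTINCT col1, col2 FROM t ORDER BY col1;"
```

The duplicate (1,'a') row collapses, leaving one 1|a and one 2|b line.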