Thursday, May 9, 2013

Hive-0.10.0 Setup on Pseudo-distributed Hadoop-1.1.2 on Cygwin/Windows7

The Apache projects under the Hadoop umbrella that support distributed parallel computing are designed to run in Unix-like environments, so they may need a little tweaking here and there to run properly on the Cygwin/Windows platform. These are some of my experimental observations from setting up Hive on Hadoop on the Cygwin/Windows7 platform.
 
  • Untar tarball hive-0.10.0.tar.gz
          $ tar -xzvf hive-0.10.0.tar.gz

  • Update C:/cygwin/home/{User}/.bash_profile, add the following to set HIVE_HOME and extend the Cygwin PATH:
          export HIVE_HOME=/cygdrive/c/hadoop/hive-0.10.0
          export PATH=$HIVE_HOME/bin:$PATH

  • Update $HIVE_HOME/conf/hive-env.sh, add the following:
          HADOOP_HOME=/cygdrive/c/hadoop/hadoop-1.1.2

  • Update $HADOOP_HOME/conf/hadoop-env.sh, add the following:
          export HADOOP_CLASSPATH=$HADOOP_CLASSPATH:$HIVE_HOME/lib

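As a quick sanity check of the variables set in the steps above, something like the following sketch can be run from the Cygwin shell. The helper name check_dir_var is hypothetical (it is not part of Hive or Hadoop), and the sketch only verifies that each variable is set and points to an existing directory.

```shell
# check_dir_var NAME VALUE
# Hypothetical helper: reports whether VALUE is non-empty and an existing directory.
check_dir_var() {
    name=$1
    value=$2
    if [ -z "$value" ]; then
        echo "$name is not set"
        return 1
    fi
    if [ ! -d "$value" ]; then
        echo "$name points to a missing directory: $value"
        return 1
    fi
    echo "$name OK: $value"
}

# Check the two variables used in the steps above.
check_dir_var HIVE_HOME "${HIVE_HOME:-}" || echo "re-check ~/.bash_profile"
check_dir_var HADOOP_HOME "${HADOOP_HOME:-}" || echo "re-check hive-env.sh"
```

Note that HADOOP_HOME is only visible in the interactive shell if it is also exported there; the line in hive-env.sh takes effect when the hive script runs.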

The above setup helps avoid some of the common exceptions Hive might throw during configuration:

Exception in thread "main" java.lang.NoClassDefFoundError: org/apache/hadoop/hive/conf/HiveConf
Caused by: java.lang.ClassNotFoundException: org.apache.hadoop.hive.conf.HiveConf


Exception in thread "main" java.lang.RuntimeException: Failed to load Hive builtin functions
Caused by: java.lang.ClassNotFoundException: org.apache.hive.builtins.BuiltinUtils

Exception in thread "main" java.lang.NoClassDefFoundError: org/apache/hadoop/hive/ql/session/SessionState
Caused by: java.lang.ClassNotFoundException: org.apache.hadoop.hive.ql.session.SessionState
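
All three exceptions are class-loading failures, so it may help to confirm that the Hive lib entry really made it into HADOOP_CLASSPATH. The sketch below is one way to do that, under the assumption that the setting lives in $HADOOP_HOME/conf/hadoop-env.sh as described above. The function classpath_has_entry is a hypothetical helper, and it does a simple prefix match on colon-separated entries.

```shell
# classpath_has_entry CLASSPATH PREFIX
# Hypothetical helper: succeeds if any colon-separated entry starts with PREFIX.
classpath_has_entry() {
    case ":$1:" in
        *":$2"*) return 0 ;;
        *) return 1 ;;
    esac
}

# Load the setting from hadoop-env.sh if the file is present (assumed location).
env_file="${HADOOP_HOME:-}/conf/hadoop-env.sh"
if [ -f "$env_file" ]; then
    . "$env_file"
fi

if classpath_has_entry "${HADOOP_CLASSPATH:-}" "${HIVE_HOME:-/hive-home-not-set}/lib"; then
    echo "Hive lib is on HADOOP_CLASSPATH"
else
    echo "Hive lib is missing from HADOOP_CLASSPATH"
fi
```

Because the match is a plain prefix test, an entry such as $HIVE_HOME/lib/some.jar also counts as a hit.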


2 comments:

Unknown said...

Hi,
Thanks for the hints.
I am facing a problem executing a select query in Hive.

Kindly do the needful.
hive> SELECT a.ISBN FROM BXDATASET a LIMIT 10;
Total MapReduce jobs = 1
Launching Job 1 out of 1
Number of reduce tasks is set to 0 since there's no reduce operator
Starting Job = job_201310031514_0001, Tracking URL = http://localhost:50030/jobdetails.jsp?jobid=job_201310031514_0001
Kill Command = C:\cygwin\home\PB\hadoop-0.20.2\/bin/hadoop job -Dmapred.job.tracker=localhost:9101 -kill job_201310031514_0001
2013-10-03 15:17:45,590 Stage-1 map = 0%, reduce = 0%
2013-10-03 15:18:24,706 Stage-1 map = 100%, reduce = 100%
Ended Job = job_201310031514_0001 with errors
FAILED: Execution Error, return code 2 from org.apache.hadoop.hive.ql.exec.MapRedTask
hive>

Surajit Paul said...

The errors in the console output are usually misleading, as the console doesn't have a view of the individual jobs/tasks from which to pull the real errors. You may want to check the JobTracker web dashboard to find which Hive MapReduce task failed; the corresponding log file will have the actual error message.
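
If the web dashboard is not handy, the same task logs can usually be read from disk. The sketch below assumes the task logs sit under $HADOOP_HOME/logs/userlogs, which is the usual default but may differ in your configuration. The layout below userlogs varies between Hadoop versions, so the sketch searches recursively for attempt directories. Both function names are hypothetical helpers, and the job id is the one from the console output above.

```shell
# find_attempt_logs LOGS_ROOT JOB_ID
# Hypothetical helper: lists task-attempt log directories belonging to one job.
find_attempt_logs() {
    logs_root=$1
    job_id=$2
    suffix=${job_id#job_}   # attempt directories reuse the numeric part of the job id
    [ -d "$logs_root" ] || return 0
    find "$logs_root" -type d -name "attempt_${suffix}_*" | sort
}

# show_attempt_errors LOGS_ROOT JOB_ID
# Prints the last lines of each attempt's stderr and syslog files, where non-empty.
show_attempt_errors() {
    find_attempt_logs "$1" "$2" | while IFS= read -r dir; do
        for f in "$dir/stderr" "$dir/syslog"; do
            if [ -s "$f" ]; then
                echo "== $f"
                tail -n 20 "$f"
            fi
        done
    done
}

# Assumed log location; adjust if HADOOP_LOG_DIR points elsewhere.
show_attempt_errors "${HADOOP_HOME:-.}/logs/userlogs" job_201310031514_0001
```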