Monday, December 16, 2013

oozie-4.0.0 Configuration on hadoop-2.2.0


Oozie is an extensible, scalable and reliable system to define, manage, schedule, and execute complex Hadoop workloads via web services, Java API and Oozie CLI.
· XML-based declarative framework to specify a job or a complex workflow of dependent jobs.
· Supports different types of jobs such as Hadoop Map-Reduce, Pipes, Streaming, Pig, Hive and custom Java applications.
· Workflow scheduling based on frequency and/or data availability.
· Monitoring capability, automatic retry and failure handling of jobs.
· Extensible and pluggable architecture to allow arbitrary grid programming paradigms.
· Authentication, authorization, and capacity-aware load throttling to allow multi-tenant software as a service.

Oozie is a server-based workflow engine specialized in running workflow jobs with actions that execute Hadoop Map/Reduce and Pig jobs. Oozie is a Java web application that runs in a Java servlet container (Tomcat). For the purposes of Oozie, a workflow is a collection of actions (i.e. Hadoop Map/Reduce jobs, Pig jobs, Hive jobs) arranged in a control dependency DAG (Directed Acyclic Graph); a "control dependency" from one action to another means that the second action cannot run until the first action has completed. Oozie workflow definitions are written in hPDL (an XML Process Definition Language similar to JBoss jBPM jPDL). Oozie workflow actions start jobs in remote systems (i.e. Hadoop, Pig, Hive, Sqoop etc.). Upon action completion, the remote system calls back Oozie to notify it of the completion, at which point Oozie proceeds to the next action in the workflow.
Oozie workflows contain control flow nodes and action nodes. Control flow nodes define the beginning and the end of a workflow (start, end and fail nodes) and provide a mechanism to control the workflow execution path (decision, fork and join nodes). Action nodes are the mechanism by which a workflow triggers the execution of a computation/processing task. Oozie provides support for different types of actions: Hadoop map-reduce, Hadoop file system, Pig, SSH, HTTP, eMail and Oozie sub-workflow, and can be extended to support additional types of actions. Oozie workflows can be parameterized (using variables like ${inputDir} within the workflow definition); when submitting a workflow job, values for the parameters must be provided. If properly parameterized (i.e. using different output directories), several identical workflow jobs can run concurrently.

Prerequisite Software for Oozie-4.0.0

• Java JDK-1.7
• Maven-3.1.1
• Hadoop-2.2.0
• Pig-0.11.1
• Oozie-4.0.0
(Download oozie-4.0.0 from http://www.apache.org/dyn/closer.cgi/oozie/4.0.0)

java, javac and mvn must be in the command path.
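
A quick sanity check from the shell (the reported versions should match the versions listed above):

$ java -version
$ javac -version
$ mvn -version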

Building Oozie Distribution

Update the properties in pom.xml to match the installed software versions:

<properties>
    <javaVersion>1.7</javaVersion>
    <targetJavaVersion>1.7</targetJavaVersion>
    <!-- to be able to run a single test case from the main project -->
    <failIfNoTests>false</failIfNoTests>
    <test.timeout>5400</test.timeout>
    <test.exclude>_</test.exclude>
    <test.exclude.pattern>_</test.exclude.pattern>
    <oozie.test.dir>${project.build.directory}/test-data</oozie.test.dir>
    <oozie.test.forkMode>once</oozie.test.forkMode>
    <hadoop.version>2.2.0</hadoop.version>
    <hbase.version>0.94.7</hbase.version>
    <hcatalog.version>0.5.0</hcatalog.version>
    <hadooplib.version>${hadoop.version}.oozie-${project.version}</hadooplib.version>
    <hbaselib.version>${hbase.version}.oozie-${project.version}</hbaselib.version>
    <hcataloglib.version>${hcatalog.version}.oozie-${project.version}</hcataloglib.version>
    <clover.license>/home/jenkins/tools/clover/latest/lib/clover.license</clover.license>
    <!--This is required while we support a a pre 0.23 version of Hadoop which does not have
    the hadoop-auth artifact. After we phase-out pre 0.23 we can get rid of this property.-->
    <hadoop.auth.version>2.2.0</hadoop.auth.version>
    <!-- Sharelib component versions -->
    <hive.version>0.11.0</hive.version>
    <pig.version>0.11.1</pig.version>
    <pig.classifier></pig.classifier>
    <sqoop.version>1.4.3</sqoop.version>
    <sqoop.classifier>hadoop100</sqoop.classifier>
    <streaming.version>${hadoop.version}</streaming.version>
    <distcp.version>${hadooplib.version}</distcp.version>
    <!-- Tomcat version -->
    <tomcat.version>6.0.36</tomcat.version>
    <openjpa.version>2.2.2</openjpa.version>
</properties>

Execute bin/mkdistro.sh from the command prompt. If all prerequisites are installed properly, Oozie builds successfully with all its dependencies.
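For example, from the expanded source directory (mkdistro.sh forwards extra options to Maven, so the standard -DskipTests option can be passed to skip the unit tests):

$ cd oozie-4.0.0
$ bin/mkdistro.sh -DskipTests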
System output on a successful Oozie build:
[INFO] Building tar : /home/surajit.paul/oozie/oozie-4.0.0/distro/target/oozie-4.0.0-distro.tar.gz
[INFO] ------------------------------------------------------------------------
[INFO] Reactor Summary:
[INFO]
[INFO] Apache Oozie Main ................................. SUCCESS [4.450s]
[INFO] Apache Oozie Client ............................... SUCCESS [51.036s]
[INFO] Apache Oozie Hadoop 1.1.1.oozie-4.0.0 ............. SUCCESS [2.506s]
[INFO] Apache Oozie Hadoop Distcp 1.1.1.oozie-4.0.0 ...... SUCCESS [0.232s]
[INFO] Apache Oozie Hadoop 1.1.1.oozie-4.0.0 Test ........ SUCCESS [0.447s]
[INFO] Apache Oozie Hadoop 2.2.0.oozie-4.0.0 ............. SUCCESS [7.550s]
[INFO] Apache Oozie Hadoop 2.2.0.oozie-4.0.0 Test ........ SUCCESS [0.562s]
[INFO] Apache Oozie Hadoop Distcp 2.2.0.oozie-4.0.0 ...... SUCCESS [0.289s]
[INFO] Apache Oozie Hadoop 0.23.5.oozie-4.0.0 ............ SUCCESS [6.928s]
[INFO] Apache Oozie Hadoop 0.23.5.oozie-4.0.0 Test ....... SUCCESS [0.818s]
[INFO] Apache Oozie Hadoop Distcp 0.23.5.oozie-4.0.0 ..... SUCCESS [0.322s]
[INFO] Apache Oozie Hadoop Libs .......................... SUCCESS [6.534s]
[INFO] Apache Oozie Hbase 0.94.2.oozie-4.0.0 ............. SUCCESS [0.785s]
[INFO] Apache Oozie Hbase Libs ........................... SUCCESS [1.315s]
[INFO] Apache Oozie HCatalog 0.5.0.oozie-4.0.0 ........... SUCCESS [1.802s]
[INFO] Apache Oozie HCatalog 0.6.0.oozie-4.0.0 ........... SUCCESS [1.709s]
[INFO] Apache Oozie HCatalog Libs ........................ SUCCESS [2.299s]
[INFO] Apache Oozie Share Lib Oozie ...................... SUCCESS [9.163s]
[INFO] Apache Oozie Share Lib HCatalog ................... SUCCESS [13.288s]
[INFO] Apache Oozie Core ................................. SUCCESS [1:50.476s]
[INFO] Apache Oozie Docs ................................. SUCCESS [28.831s]
[INFO] Apache Oozie Share Lib Pig ........................ SUCCESS [18.892s]
[INFO] Apache Oozie Share Lib Hive ....................... SUCCESS [23.419s]
[INFO] Apache Oozie Share Lib Sqoop ...................... SUCCESS [13.089s]
[INFO] Apache Oozie Share Lib Streaming .................. SUCCESS [7.547s]
[INFO] Apache Oozie Share Lib Distcp ..................... SUCCESS [3.632s]
[INFO] Apache Oozie WebApp ............................... SUCCESS [51.899s]
[INFO] Apache Oozie Examples ............................. SUCCESS [11.077s]
[INFO] Apache Oozie Share Lib ............................ SUCCESS [13.236s]
[INFO] Apache Oozie Tools ................................ SUCCESS [17.439s]
[INFO] Apache Oozie MiniOozie ............................ SUCCESS [2.619s]
[INFO] Apache Oozie Distro ............................... SUCCESS [6:28.591s]
[INFO] ------------------------------------------------------------------------
[INFO] BUILD SUCCESS
[INFO] ------------------------------------------------------------------------
[INFO] Total time: 13:26.335s
[INFO] Finished at: Wed Nov 27 14:57:56 UTC 2013
[INFO] Final Memory: 154M/742M
[INFO] ------------------------------------------------------------------------

Oozie distro created, DATE[2013.11.27-14:44:25GMT] VC-REV[unavailable], available at [/home/surajit.paul/oozie/oozie-4.0.0/distro/target]

Oozie Server Installation


Create a directory libext inside the Oozie home directory.
Copy all dependent jar files, including the JDBC driver and the ext-2.2.zip file, into the libext directory (an example follows).
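A minimal sketch of these two steps; the paths and jar locations are illustrative, and the exact set of jars depends on the local Hadoop installation:

$ cd /home/surajit.paul/oozie/oozie-4.0.0
$ mkdir libext
$ cp $HADOOP_HOME/share/hadoop/*/*.jar $HADOOP_HOME/share/hadoop/*/lib/*.jar libext/
$ cp mysql-connector-java-5.1.27-bin.jar libext/
$ cp ext-2.2.zip libext/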
Then prepare oozie.war by executing the oozie-setup.sh command as shown below:

./bin/oozie-setup.sh prepare-war -hadoop 2.2.0 /home/hadoop -extjs /home/surajit.paul/oozie/oozie-4.0.0/libext/ext-2.2.zip -jars /home/surajit.paul/oozie/oozie-4.0.0/libext/commons-codec-1.4.jar:/home/surajit.paul/oozie/oozie-4.0.0/libext/commons-cli-1.2.jar:/home/surajit.paul/oozie/oozie-4.0.0/libext/commons-beanutils-core-1.8.0.jar:/home/surajit.paul/oozie/oozie-4.0.0/libext/commons-beanutils-1.7.0.jar:/home/surajit.paul/oozie/oozie-4.0.0/libext/avro-1.7.4.jar:/home/surajit.paul/oozie/oozie-4.0.0/libext/commons-httpclient-3.1.jar:/home/surajit.paul/oozie/oozie-4.0.0/libext/commons-digester-1.8.jar:/home/surajit.paul/oozie/oozie-4.0.0/libext/commons-configuration-1.6.jar:/home/surajit.paul/oozie/oozie-4.0.0/libext/commons-compress-1.4.1.jar:/home/surajit.paul/oozie/oozie-4.0.0/libext/commons-collections-3.2.1.jar:/home/surajit.paul/oozie/oozie-4.0.0/libext/commons-net-3.1.jar:/home/surajit.paul/oozie/oozie-4.0.0/libext/commons-math-2.1.jar:/home/surajit.paul/oozie/oozie-4.0.0/libext/commons-logging-1.1.jar:/home/surajit.paul/oozie/oozie-4.0.0/libext/commons-lang-2.4.jar:/home/surajit.paul/oozie/oozie-4.0.0/libext/commons-io-2.1.jar:/home/surajit.paul/oozie/oozie-4.0.0/libext/hadoop-client-2.2.0.jar:/home/surajit.paul/oozie/oozie-4.0.0/libext/hadoop-auth-2.2.0.jar:/home/surajit.paul/oozie/oozie-4.0.0/libext/hadoop-annotations-2.2.0.jar:/home/surajit.paul/oozie/oozie-4.0.0/libext/guava-11.0.2.jar:/home/surajit.paul/oozie/oozie-4.0.0/libext/hadoop-common-2.2.0.jar:/home/surajit.paul/oozie/oozie-4.0.0/libext/hadoop-mapreduce-client-common-2.2.0.jar:/home/surajit.paul/oozie/oozie-4.0.0/libext/hadoop-mapreduce-client-app-2.2.0.jar:/home/surajit.paul/oozie/oozie-4.0.0/libext/hadoop-hdfs-2.2.0.jar:/home/surajit.paul/oozie/oozie-4.0.0/libext/hadoop-yarn-api-2.2.0.jar:/home/surajit.paul/oozie/oozie-4.0.0/libext/hadoop-mapreduce-client-shuffle-2.2.0.jar:/home/surajit.paul/oozie/oozie-4.0.0/libext/hadoop-mapreduce-client-jobclient-2.2.0.jar:/home/surajit.paul/oozie/oozie-4.0.0/libext/hadoop-mapreduce-client-core-2.2.0.jar:/home/surajit.paul/oozie/oozie-4.0.0/libext/jackson-core-asl-1.8.8.jar:/home/surajit.paul/oozie/oozie-4.0.0/libext/hadoop-yarn-server-common-2.2.0.jar:/home/surajit.paul/oozie/oozie-4.0.0/libext/hadoop-yarn-common-2.2.0.jar:/home/surajit.paul/oozie/oozie-4.0.0/libext/hadoop-yarn-client-2.2.0.jar:/home/surajit.paul/oozie/oozie-4.0.0/libext/slf4j-log4j12-1.6.6.jar:/home/surajit.paul/oozie/oozie-4.0.0/libext/slf4j-api-1.6.6.jar:/home/surajit.paul/oozie/oozie-4.0.0/libext/protobuf-java-2.5.0.jar:/home/surajit.paul/oozie/oozie-4.0.0/libext/paranamer-2.3.jar:/home/surajit.paul/oozie/oozie-4.0.0/libext/log4j-1.2.16.jar:/home/surajit.paul/oozie/oozie-4.0.0/libext/jsr305-1.3.9.jar:/home/surajit.paul/oozie/oozie-4.0.0/libext/jetty-util-6.1.26.jar:/home/surajit.paul/oozie/oozie-4.0.0/libext/jackson-mapper-asl-1.8.8.jar:/home/surajit.paul/oozie/oozie-4.0.0/libext/zookeeper-3.4.5.jar:/home/surajit.paul/oozie/oozie-4.0.0/libext/xz-1.0.jar:/home/surajit.paul/oozie/oozie-4.0.0/libext/xmlenc-0.52.jar:/home/surajit.paul/oozie/oozie-4.0.0/libext/snappy-java-1.0.4.1.jar:/home/surajit.paul/oozie/oozie-4.0.0/libext/mysql-connector-java-5.1.27-bin.jar sharelib create -fs hdfs://172.31.0.56:9000/user/surajit.paul/ locallib /home/surajit.paul/oozie/oozie-4.0.0/sharelib/target/oozie-sharelib-4.0.0.tar.gz

On creation of oozie.war, the following messages should be displayed on the CLI:

[root@ip-172-31-0-56 oozie-4.0.0]# ./bin/oozie-setup.sh prepare-war -d /home/surajit.paul/oozie/oozie-4.0.0/libext
  setting CATALINA_OPTS="$CATALINA_OPTS -Xmx1024m"

INFO: Adding extension: /home/surajit.paul/oozie/oozie-4.0.0/libext/avro-1.7.4.jar
INFO: Adding extension: /home/surajit.paul/oozie/oozie-4.0.0/libext/commons-beanutils-1.7.0.jar
INFO: Adding extension: /home/surajit.paul/oozie/oozie-4.0.0/libext/commons-beanutils-core-1.8.0.jar
INFO: Adding extension: /home/surajit.paul/oozie/oozie-4.0.0/libext/commons-cli-1.2.jar
INFO: Adding extension: /home/surajit.paul/oozie/oozie-4.0.0/libext/commons-codec-1.4.jar
INFO: Adding extension: /home/surajit.paul/oozie/oozie-4.0.0/libext/commons-collections-3.2.1.jar
INFO: Adding extension: /home/surajit.paul/oozie/oozie-4.0.0/libext/commons-compress-1.4.1.jar
INFO: Adding extension: /home/surajit.paul/oozie/oozie-4.0.0/libext/commons-configuration-1.6.jar
INFO: Adding extension: /home/surajit.paul/oozie/oozie-4.0.0/libext/commons-digester-1.8.jar
INFO: Adding extension: /home/surajit.paul/oozie/oozie-4.0.0/libext/commons-httpclient-3.1.jar
INFO: Adding extension: /home/surajit.paul/oozie/oozie-4.0.0/libext/commons-io-2.1.jar
INFO: Adding extension: /home/surajit.paul/oozie/oozie-4.0.0/libext/commons-lang-2.4.jar
INFO: Adding extension: /home/surajit.paul/oozie/oozie-4.0.0/libext/commons-logging-1.1.jar
INFO: Adding extension: /home/surajit.paul/oozie/oozie-4.0.0/libext/commons-math-2.1.jar
INFO: Adding extension: /home/surajit.paul/oozie/oozie-4.0.0/libext/commons-net-3.1.jar
INFO: Adding extension: /home/surajit.paul/oozie/oozie-4.0.0/libext/guava-11.0.2.jar
INFO: Adding extension: /home/surajit.paul/oozie/oozie-4.0.0/libext/hadoop-annotations-2.2.0.jar
INFO: Adding extension: /home/surajit.paul/oozie/oozie-4.0.0/libext/hadoop-auth-2.2.0.jar
INFO: Adding extension: /home/surajit.paul/oozie/oozie-4.0.0/libext/hadoop-client-2.2.0.jar
INFO: Adding extension: /home/surajit.paul/oozie/oozie-4.0.0/libext/hadoop-common-2.2.0.jar
INFO: Adding extension: /home/surajit.paul/oozie/oozie-4.0.0/libext/hadoop-hdfs-2.2.0.jar
INFO: Adding extension: /home/surajit.paul/oozie/oozie-4.0.0/libext/hadoop-mapreduce-client-app-2.2.0.jar
INFO: Adding extension: /home/surajit.paul/oozie/oozie-4.0.0/libext/hadoop-mapreduce-client-common-2.2.0.jar
INFO: Adding extension: /home/surajit.paul/oozie/oozie-4.0.0/libext/hadoop-mapreduce-client-core-2.2.0.jar
INFO: Adding extension: /home/surajit.paul/oozie/oozie-4.0.0/libext/hadoop-mapreduce-client-jobclient-2.2.0.jar
INFO: Adding extension: /home/surajit.paul/oozie/oozie-4.0.0/libext/hadoop-mapreduce-client-shuffle-2.2.0.jar
INFO: Adding extension: /home/surajit.paul/oozie/oozie-4.0.0/libext/hadoop-yarn-api-2.2.0.jar
INFO: Adding extension: /home/surajit.paul/oozie/oozie-4.0.0/libext/hadoop-yarn-client-2.2.0.jar
INFO: Adding extension: /home/surajit.paul/oozie/oozie-4.0.0/libext/hadoop-yarn-common-2.2.0.jar
INFO: Adding extension: /home/surajit.paul/oozie/oozie-4.0.0/libext/hadoop-yarn-server-common-2.2.0.jar
INFO: Adding extension: /home/surajit.paul/oozie/oozie-4.0.0/libext/jackson-core-asl-1.8.8.jar
INFO: Adding extension: /home/surajit.paul/oozie/oozie-4.0.0/libext/jackson-mapper-asl-1.8.8.jar
INFO: Adding extension: /home/surajit.paul/oozie/oozie-4.0.0/libext/jetty-util-6.1.26.jar
INFO: Adding extension: /home/surajit.paul/oozie/oozie-4.0.0/libext/jsr305-1.3.9.jar
INFO: Adding extension: /home/surajit.paul/oozie/oozie-4.0.0/libext/log4j-1.2.16.jar
INFO: Adding extension: /home/surajit.paul/oozie/oozie-4.0.0/libext/mysql-connector-java-5.1.27-bin.jar
INFO: Adding extension: /home/surajit.paul/oozie/oozie-4.0.0/libext/paranamer-2.3.jar
INFO: Adding extension: /home/surajit.paul/oozie/oozie-4.0.0/libext/protobuf-java-2.5.0.jar
INFO: Adding extension: /home/surajit.paul/oozie/oozie-4.0.0/libext/slf4j-api-1.6.6.jar
INFO: Adding extension: /home/surajit.paul/oozie/oozie-4.0.0/libext/slf4j-log4j12-1.6.6.jar
INFO: Adding extension: /home/surajit.paul/oozie/oozie-4.0.0/libext/snappy-java-1.0.4.1.jar
INFO: Adding extension: /home/surajit.paul/oozie/oozie-4.0.0/libext/xmlenc-0.52.jar
INFO: Adding extension: /home/surajit.paul/oozie/oozie-4.0.0/libext/xz-1.0.jar
INFO: Adding extension: /home/surajit.paul/oozie/oozie-4.0.0/libext/zookeeper-3.4.5.jar

New Oozie WAR file with added 'ExtJS library, JARs' at /home/surajit.paul/oozie/oozie-4.0.0/distro/target/oozie-4.0.0-distro/oozie-4.0.0/oozie-server/webapps/oozie.war


INFO: Oozie is ready to be started

 

Oozie DB Creation


oozie-site.xml must be modified to point to the MySQL database on the EMR cluster, as shown below –


<property>
    <name>oozie.db.schema.name</name>
    <value>oozie</value>
    <description>
        Oozie DataBase Name
    </description>
</property>
<property>
    <name>oozie.service.JPAService.create.db.schema</name>
    <value>true</value>
    <description>
        Creates Oozie DB.
        If set to true, it creates the DB schema if it does not exist.
        If the DB schema exists is a NOP. If set to false, it does not
        create the DB schema. If the DB schema does not exist it fails start up.
    </description>
</property>
<property>
    <name>oozie.service.JPAService.jdbc.driver</name>
    <value>com.mysql.jdbc.Driver</value>
    <description>
        JDBC driver class.
    </description>
</property>
<property>
    <name>oozie.service.JPAService.jdbc.url</name>
    <value>jdbc:mysql://ip-172-31-0-56:3306/${oozie.db.schema.name}</value>
    <description>
        JDBC URL.
    </description>
</property>
<property>
    <name>oozie.service.JPAService.jdbc.username</name>
    <value>oozie</value>
    <description>
        DB user name.
    </description>
</property>
<property>
    <name>oozie.service.JPAService.jdbc.password</name>
    <value>oozie</value>
    <description>
        DB user password.
        IMPORTANT: if the password is empty leave a 1 space string,
        the service trims the value; if empty, Configuration assumes it is NULL.
    </description>
</property>




Create a database schema named oozie in the MySQL database server.
Create a user oozie with password oozie and grant it all privileges on that schema (an example follows).
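For example, from the MySQL command line (the credentials simply mirror the oozie-site.xml values above):

$ mysql -u root -p -e "CREATE DATABASE oozie;"
$ mysql -u root -p -e "GRANT ALL PRIVILEGES ON oozie.* TO 'oozie'@'%' IDENTIFIED BY 'oozie'; FLUSH PRIVILEGES;"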
Execute the command - ./bin/ooziedb.sh create -run

System output on successful oozie table creation

[root@ip-172-31-0-56 oozie-4.0.0]# ./bin/ooziedb.sh create -run DB Connection
  setting CATALINA_OPTS="$CATALINA_OPTS -Xmx1024m"

Validate DB Connection
DONE
Check DB schema does not exist
DONE
Check OOZIE_SYS table does not exist
DONE
Create SQL schema
DONE
Create OOZIE_SYS table
DONE
Set MySQL MEDIUMTEXT flag
DONE

Oozie DB has been created for Oozie version '4.0.0'

The SQL commands have been written to: /tmp/ooziedb-5014017589567251888.sql

Log in to MySQL as the oozie user to ensure all tables were created successfully.
Execute mysql> show tables;

(Oozie-4.0.0 has 12 tables while oozie-3.3.2 has 10 tables. All tables must be displayed.)

+------------------------+
| Tables_in_oozie        |
+------------------------+
| BUNDLE_ACTIONS         |
| BUNDLE_JOBS            |
| COORD_ACTIONS          |
| COORD_JOBS             |
| OOZIE_SYS              |
| OPENJPA_SEQUENCE_TABLE |
| SLA_EVENTS             |
| SLA_REGISTRATION       |
| SLA_SUMMARY            |
| VALIDATE_CONN          |
| WF_ACTIONS             |
| WF_JOBS                |
+------------------------+
12 rows in set (0.00 sec)


At this stage, the Tomcat server containing oozie.war should start successfully, and Oozie jobs should run as scheduled for execution.
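
To start the bundled Tomcat and verify the server, the standard Oozie commands can be used (11000 is Oozie's default HTTP port; a healthy server reports System mode: NORMAL):

$ ./bin/oozied.sh start
$ ./bin/oozie admin -oozie http://localhost:11000/oozie -status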

Wednesday, July 24, 2013

Oozie MySQL Integration Configuration

Follow these steps to integrate Oozie-3.3.2 with the MySQL database:

1. Copy the MySQL driver - mysql-connector-java-5.1.22-bin.jar - into the following directories (a copy example follows the list) -

${OOZIE_HOME}/libext
${OOZIE_HOME}/distro/target/oozie-3.3.2-distro/oozie-3.3.2/libtools
${OOZIE_HOME}/distro/target/oozie-3.3.2-distro/oozie-3.3.2/lib
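
For example, assuming the driver jar has been downloaded to the current directory:

$ cp mysql-connector-java-5.1.22-bin.jar ${OOZIE_HOME}/libext/
$ cp mysql-connector-java-5.1.22-bin.jar ${OOZIE_HOME}/distro/target/oozie-3.3.2-distro/oozie-3.3.2/libtools/
$ cp mysql-connector-java-5.1.22-bin.jar ${OOZIE_HOME}/distro/target/oozie-3.3.2-distro/oozie-3.3.2/lib/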

2. Prepare oozie.war file using the command -


${OOZIE_HOME}/distro/target/oozie-3.3.2-distro/oozie-3.3.2/bin/oozie-setup.sh prepare-war -jars -extjs ${OOZIE_HOME}/webapp/src/main/webapp/ext-2.2/ext-2.2.zip


3. Copy the oozie.war file into the Tomcat server -


$cp distro/target/oozie-3.3.2-distro/oozie-3.3.2/oozie-server/webapps/oozie.war webapp/src/main/webapp/oozie.war


4. Create schema - oozie in MySQL database.


5. Update oozie-site.xml in ${OOZIE_HOME}/distro/target/oozie-3.3.2-distro/oozie-3.3.2/conf/oozie-site.xml with the same JPAService properties shown in the Oozie DB Creation section above (oozie.db.schema.name, oozie.service.JPAService.create.db.schema, the JDBC driver class, JDBC URL, username and password), adjusted for this environment.


6. Execute the following command -


${OOZIE_HOME}/distro/target/oozie-3.3.2-distro/oozie-3.3.2/bin/ooziedb.sh create -run


Output should be -
------------------------------------------------------------
setting CATALINA_OPTS="$CATALINA_OPTS -Xmx1024m"
setting OOZIE_LOG=${OOZIE_HOME}/logs
setting OOZIE_LOG4J_FILE=oozie-log4j.properties
setting OOZIE_LOG4J_RELOAD=10
setting OOZIE_HTTP_PORT=11000


Validate DB Connection
DONE
Check DB schema does not exist
DONE
Check OOZIE_SYS table does not exist
DONE
Create SQL schema
DONE
Create OOZIE_SYS table
DONE


Oozie DB has been created for Oozie version '3.3.2'
------------------------------------------------------------
The following tables get created in the oozie schema -
1. BUNDLE_ACTIONS
2. BUNDLE_JOBS
3. COORD_ACTIONS
4. COORD_JOBS
5. OOZIE_SYS
6. OPENJPA_SEQUENCE_TABLE
7. SLA_EVENTS
8. VALIDATE_CONN
9. WF_ACTIONS
10. WF_JOBS

Friday, June 28, 2013

Sqoop – The Artery of BigData Ecosystem

Apache Sqoop is designed for efficiently transferring data between the Hadoop ecosystem and structured data stores such as relational database management systems, e.g. MySQL, Oracle, MS SQL Server, PostgreSQL and DB2. As an integral component of the Hadoop ecosystem, Sqoop executes its tasks by launching MapReduce jobs, which provide fault-tolerant, distributed parallel processing. Another essential benefit of Sqoop is that it fully automates the transfer of large volumes of structured or semi-structured data.



Sqoop Data Flow

In this paper, we will explore various means of executing these tasks, using both the Sqoop command line interface (CLI) and the Java API. We will see how to import data from an RDBMS (MySQL), manipulate the data in the Hadoop environment, and export the manipulated data back to RDBMS tables. We will use 12 years of Sensex data for the BSE30 and BSE FMCG indices for this analysis.

Pre-requisites:

The experiment is executed on a pseudo-distributed Hadoop ecosystem configured on Red Hat Enterprise Linux 6 (RHEL6). A detailed discussion of the configuration of the relevant frameworks such as Hadoop, Zookeeper, HBase and Hive is beyond the scope of this article. The prerequisites are listed below –
Pseudo-distributed Hadoop-1.1.2 is installed and running.
MySQL- database is installed and running
HBase-0.94.5
Hive-0.10.0
Sqoop-1.4.3
Zookeeper-3.4.5
A shared metastore schema in MySQL 

Data Preparation in MySQL database:

SQL script to create the tables bse30 and bsefmcg in the database:

CREATE TABLE bse30 (
date DATE PRIMARY KEY NOT NULL,
index_name VARCHAR(20) NOT NULL,
open DOUBLE,
high DOUBLE,
low DOUBLE,
close DOUBLE);


CREATE TABLE bsefmcg (
date DATE PRIMARY KEY NOT NULL,
index_name VARCHAR(20) NOT NULL,
open DOUBLE,
high DOUBLE,
low DOUBLE,
close DOUBLE);

SQL script to load data in the tables:

LOAD DATA LOCAL INFILE '/home/surajit/hadoop/hivestuff/BSE30.csv'
INTO TABLE hadoop.bse30
FIELDS TERMINATED BY ','
LINES TERMINATED BY '\n'
(@datVar, `index_name`, `open`, `high`, `low`, `close`)
SET `date` = STR_TO_DATE(@datVar, '%d-%b-%y');

LOAD DATA LOCAL INFILE '/home/surajit/hadoop/hivestuff/BSEFMCG.csv'
INTO TABLE hadoop.bsefmcg
FIELDS TERMINATED BY ','
LINES TERMINATED BY '\n'
(@datVar, `index_name`, `open`, `high`, `low`, `close`)
SET `date` = STR_TO_DATE(@datVar, '%d-%b-%y');

A snapshot of bsefmcg table:
+------------+--------------+---------+---------+---------+---------+
| date       | index_name   | open    | high    | low     | close   |
+------------+--------------+---------+---------+---------+---------+
| 2012-12-03 | S&P BSE FMCG | 6030.46 | 6045.94 | 6001.85 | 6017.39 |
| 2012-12-04 | S&P BSE FMCG | 6021.03 | 6040.69 |  5992.6 | 6031.57 |
| 2012-12-05 | S&P BSE FMCG | 6059.01 |  6079.9 | 6030.04 | 6045.15 |
| 2012-12-06 | S&P BSE FMCG | 6071.92 | 6087.24 | 5994.67 |  6075.4 |
| 2012-12-07 | S&P BSE FMCG | 6081.08 |  6102.7 | 6041.48 | 6066.73 |
| 2012-12-10 | S&P BSE FMCG | 6054.81 | 6092.25 | 6046.52 | 6081.14 |
| 2012-12-11 | S&P BSE FMCG | 6090.31 |  6176.5 |  6083.6 | 6142.21 |
| 2012-12-12 | S&P BSE FMCG | 6148.52 | 6171.81 | 6113.93 | 6132.31 |
| 2012-12-13 | S&P BSE FMCG | 6124.06 | 6124.06 | 5960.12 | 5970.55 |
| 2012-12-14 | S&P BSE FMCG | 5950.67 | 5996.69 | 5913.77 |  5975.6 |
| 2012-12-17 | S&P BSE FMCG | 5973.57 |  5979.5 | 5929.84 | 5944.28 |
| 2012-12-18 | S&P BSE FMCG | 5938.92 | 5988.08 | 5897.83 | 5964.05 |
| 2012-12-19 | S&P BSE FMCG |  5962.3 | 5983.85 | 5928.04 | 5941.64 |
| 2012-12-20 | S&P BSE FMCG | 5942.52 | 5970.25 | 5914.15 |  5949.4 |
| 2012-12-21 | S&P BSE FMCG | 5929.25 | 5949.62 |  5883.6 | 5924.29 |
| 2012-12-24 | S&P BSE FMCG |  5956.1 | 5963.48 | 5910.51 | 5920.24 |
| 2012-12-26 | S&P BSE FMCG | 5923.69 | 5963.57 | 5900.83 | 5945.29 |
| 2012-12-27 | S&P BSE FMCG | 5971.29 | 5973.44 |  5903.4 | 5916.24 |
| 2012-12-28 | S&P BSE FMCG | 5907.14 | 5944.86 | 5907.14 | 5932.07 |
| 2012-12-31 | S&P BSE FMCG | 5939.92 | 5939.92 |  5902.9 | 5916.22 |
+------------+--------------+---------+---------+---------+---------+

Importing Data into HDFS:

Sqoop reads data from a database table row by row and uploads it into the HDFS file system. The import runs in parallel and writes files in various formats such as binary Avro, SequenceFile, or delimited text (e.g. comma or tab separated), as per the configured parameters. As a by-product of the import, a Java class is generated that can parse a text file and serialize or deserialize Sequence and Avro files.

  1. Using Sqoop CLI 
Sqoop command line script to import the tables into HDFS:

$ bin/sqoop import --connect jdbc:mysql://localhost:3306/hadoop --driver com.mysql.jdbc.Driver --username root --password root --table bse30

$ bin/sqoop import --connect jdbc:mysql://localhost:3306/hadoop --driver com.mysql.jdbc.Driver --username root --password root --table bsefmcg


  2. Using Java API
package hadoop.sqoop;

import org.apache.commons.logging.Log;
import org.apache.commons.logging.LogFactory;
import org.apache.sqoop.tool.ImportTool;
import com.cloudera.sqoop.SqoopOptions;

public class HDFSImportSqooper {

    /* Logger instance */
    Log log = LogFactory.getLog(HDFSImportSqooper.class);

    /* CONSTANTS */
    private static final String JOB_NAME = "Sqoop HDFS Job";
    private static final String MAPREDUCE_JOB = "HDFS Map Reduce Job";
    private static final String DBURL = "jdbc:mysql://localhost:3306/hadoop";
    private static final String DRIVER = "com.mysql.jdbc.Driver";
    private static final String USERNAME = "root";
    private static final String PASSWORD = "root";
    private static final String HADOOP_HOME = "/home/surajit/hadoop/hadoop-1.1.2";
    private static final String JAR_OUTPUT_DIR = "/home/surajit/tmp/sqoop-surajit/compile";
    private static final String TARGET_DIR = "hdfs://localhost:9000/user/surajit/";
    private static final String SUCCESS = "SUCCESS !!!";
    private static final String FAIL = "FAIL !!!";

    /**
     * @param args
     */
    public static void main(String[] args) {
        HDFSImportSqooper js = new HDFSImportSqooper();
        js.importToHadoop("bsefmcg");
    }

    /**
     * Imports data from the RDBMS (MySQL) and uploads it into the Hadoop file system
     * @param table RDBMS table name
     */
    public void importToHadoop(String table) {
        log.info("Importing data into HDFS ...");
        SqoopOptions options = new SqoopOptions(DBURL, table);
        options.setDriverClassName(DRIVER);
        options.setUsername(USERNAME);
        options.setPassword(PASSWORD);
        options.setFieldsTerminatedBy('\t');
        options.setHadoopMapRedHome(HADOOP_HOME);
        options.setJobName(JOB_NAME);
        options.setLinesTerminatedBy('\n');
        options.setMapreduceJobName(MAPREDUCE_JOB);
        options.setTableName(table);
        options.setJarOutputDir(JAR_OUTPUT_DIR);
        options.setTargetDir(TARGET_DIR + table);

        ImportTool it = new ImportTool();
        int retVal = it.run(options);
        if (retVal == 0) {
            log.info(SUCCESS);
        } else {
            log.info(FAIL);
        }
    }
}

Snapshot of imported data in HDFS file system -

-rw-r--r-- 1 surajit supergroup 0 2013-05-29 11:29 /user/surajit/bse30/_SUCCESS
-rw-r--r-- 1 surajit supergroup 46846 2013-05-29 11:29 /user/surajit/bse30/part-m-00000
-rw-r--r-- 1 surajit supergroup 47513 2013-05-29 11:29 /user/surajit/bse30/part-m-00001
-rw-r--r-- 1 surajit supergroup 49084 2013-05-29 11:29 /user/surajit/bse30/part-m-00002
-rw-r--r-- 1 surajit supergroup 49950 2013-05-29 11:29 /user/surajit/bse30/part-m-00003


Pig script to aggregate the daily rows in HDFS into monthly averages (month, index name, and the averaged, rounded open/high/low/close values) -

B30 = LOAD 'bsefmcg' using PigStorage('\t') as (histdate:chararray, index_name:chararray, open:double, high:double, low:double, close:double);
B31 = foreach B30 generate histdate, index_name, ROUND(open), ROUND(high), ROUND(low), ROUND(close);
B32 = foreach B31 generate SUBSTRING(histdate, 0, 7), index_name, $2, $3, $4, $5;
B33 = GROUP B32 BY ($0, $1);
B34 = FOREACH B33 GENERATE group, AVG(B32.$2), AVG(B32.$3), AVG(B32.$4), AVG(B32.$5);
B35 = FOREACH B34 GENERATE FLATTEN($0), ROUND($1), ROUND($2), ROUND($3), ROUND($4);
STORE B35 INTO 'bsefmcgp' USING PigStorage(',');
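
The script can be run with the Pig client in MapReduce mode; the script file name here is arbitrary:

$ bin/pig monthly_average.pig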

Exporting Data from HDFS:

  1. Using Sqoop CLI 
Sqoop command line script to export data from HDFS back to MySQL:

$ bin/sqoop export --connect jdbc:mysql://localhost:3306/hadoop --driver com.mysql.jdbc.Driver --username root --password root --table bse30 --export-dir /user/surajit/bse30

$ bin/sqoop export --connect jdbc:mysql://localhost:3306/hadoop --driver com.mysql.jdbc.Driver --username root --password root --table bsefmcg --export-dir /user/surajit/bsefmcg
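
A quick row count on the target table confirms the export (credentials as used in the Sqoop commands above):

$ mysql -u root -p -e "SELECT COUNT(*) FROM hadoop.bse30;"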


  2. Using Java API
package hadoop.sqoop;

import org.apache.commons.logging.Log;
import org.apache.commons.logging.LogFactory;
import com.cloudera.sqoop.SqoopOptions;
import com.cloudera.sqoop.tool.ExportTool;

public class HDFSExportSqooper {

    /* Logger instance */
    Log log = LogFactory.getLog(HDFSExportSqooper.class);

    /* CONSTANTS */
    private static final String JOB_NAME = "Sqoop HDFS Job";
    private static final String MAPREDUCE_JOB = "HDFS Map Reduce Job";
    private static final String DBURL = "jdbc:mysql://localhost:3306/hadoop";
    private static final String DRIVER = "com.mysql.jdbc.Driver";
    private static final String USERNAME = "root";
    private static final String PASSWORD = "root";
    private static final String HADOOP_HOME = "/home/surajit/hadoop/hadoop-1.1.2";
    private static final String HDFS_DIR = "/user/surajit/";
    private static final String JAR_OUTPUT_DIR = "/home/surajit/tmp/sqoop-surajit/compile";
    private static final String SUCCESS = "SUCCESS !!!";
    private static final String FAIL = "FAIL !!!";

    /**
     * @param args
     */
    public static void main(String[] args) {
        HDFSExportSqooper js = new HDFSExportSqooper();
        js.exportFromHadoop("bsefmcgp");
    }

    /**
     * Exports data from the HDFS file system to the RDBMS / MySQL schema
     * @param table RDBMS table name
     */
    public void exportFromHadoop(String table) {
        log.info("Exporting data from HDFS ...");
        // RDBMS table columns, in order
        String[] outCols = {"month", "index_name", "open", "high", "low", "close"};
        SqoopOptions options = new SqoopOptions(DBURL, table);
        options.setDriverClassName(DRIVER);
        options.setUsername(USERNAME);
        options.setPassword(PASSWORD);
        options.setFieldsTerminatedBy(',');
        options.setHadoopMapRedHome(HADOOP_HOME);
        options.setJobName(JOB_NAME);
        options.setLinesTerminatedBy('\n');
        options.setMapreduceJobName(MAPREDUCE_JOB);
        options.setTableName(table);
        options.setJarOutputDir(JAR_OUTPUT_DIR);
        options.setClearStagingTable(true);
        options.setExportDir(HDFS_DIR + table);
        options.setDbOutputColumns(outCols);
        options.setUpdateMode(SqoopOptions.UpdateMode.AllowInsert);

        ExportTool it = new ExportTool();
        int retVal = it.run(options);
        if (retVal == 0) {
            log.info(SUCCESS);
        } else {
            log.info(FAIL);
        }
    }
}

Importing Data into HIVE:

  1. Using Sqoop CLI 
Sqoop command line script to import the tables into Hive:

$ bin/sqoop create-hive-table --connect jdbc:mysql://localhost:3306/hadoop --driver com.mysql.jdbc.Driver --username root --P --table bse30

$ bin/sqoop create-hive-table --connect jdbc:mysql://localhost:3306/hadoop --driver com.mysql.jdbc.Driver --username root --P --table bsefmcg
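
Note that create-hive-table only creates the Hive table definition from the RDBMS metadata; to copy the rows as well in one step, the import tool's --hive-import option can be added, along these lines:

$ bin/sqoop import --connect jdbc:mysql://localhost:3306/hadoop --driver com.mysql.jdbc.Driver --username root --P --table bse30 --hive-import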


  2. Using Java API
package hadoop.sqoop;

import org.apache.commons.logging.Log;
import org.apache.commons.logging.LogFactory;
import org.apache.sqoop.tool.ImportTool;
import com.cloudera.sqoop.SqoopOptions;

public class HiveImportSqooper {

    /* Logger instance */
    Log log = LogFactory.getLog(HiveImportSqooper.class);

    /* CONSTANTS */
    private static final String JOB_NAME = "Sqoop Hive Job";
    private static final String MAPREDUCE_JOB = "Hive Map Reduce Job";
    private static final String DBURL = "jdbc:mysql://localhost:3306/hadoop";
    private static final String DRIVER = "com.mysql.jdbc.Driver";
    private static final String USERNAME = "root";
    private static final String PASSWORD = "root";
    private static final String HADOOP_HOME = "/home/surajit/hadoop/hadoop-1.1.2";
    private static final String JAR_OUTPUT_DIR = "/home/surajit/tmp/sqoop-surajit/compile";
    private static final String HIVE_HOME = "/home/surajit/hadoop/hive-0.10.0";
    private static final String WAREHOUSE_DIR = "hdfs://localhost:9000/user/hive/warehouse";
    private static final String SUCCESS = "SUCCESS !!!";
    private static final String FAIL = "FAIL !!!";

    static SqoopOptions options = new SqoopOptions();

    /**
     * @param args
     */
    public static void main(String[] args) {
        HiveImportSqooper js = new HiveImportSqooper();
        js.importToHive("bsefmcg");
    }

    /**
     * Imports data from the RDBMS (MySQL) and uploads it into the Hive warehouse
     * @param table RDBMS table name
     */
    public void importToHive(String table) {

        System.out.println("SqoopOptions loading ...");

        /* MySQL connection parameters */
        options.setConnectString(DBURL);
        options.setTableName(table);
        options.setDriverClassName(DRIVER);
        options.setUsername(USERNAME);
        options.setPassword(PASSWORD);
        options.setHadoopMapRedHome(HADOOP_HOME);

        /* Hive connection parameters */
        options.setHiveHome(HIVE_HOME);
        options.setHiveImport(true);
        options.setHiveTableName("bsefmcgh");
        options.setOverwriteHiveTable(true);
        options.setFailIfHiveTableExists(false);
        options.setFieldsTerminatedBy(',');
        options.setDirectMode(true);
        options.setNumMappers(1);   // number of mappers to be launched for the job

        options.setWarehouseDir(WAREHOUSE_DIR);
        options.setJobName(JOB_NAME);
        options.setMapreduceJobName(MAPREDUCE_JOB);
        options.setJarOutputDir(JAR_OUTPUT_DIR);

        System.out.println("Import Tool running ...");
        ImportTool it = new ImportTool();
        int retVal = it.run(options);
        if (retVal == 0) {
            log.info(SUCCESS);
        } else {
            log.info(FAIL);
        }
    }
}

Exporting Data from HIVE:

  1. Using Sqoop CLI 
Sqoop command line script to export data from the Hive warehouse:

$ bin/sqoop export --connect jdbc:mysql://localhost:3306/hadoop --driver com.mysql.jdbc.Driver --username root --password root --table bse30 --export-dir /user/hive/warehouse/bse30

$ bin/sqoop export --connect jdbc:mysql://localhost:3306/hadoop --driver com.mysql.jdbc.Driver --username root --password root --table bsefmcg --export-dir /user/hive/warehouse/bsefmcg

  2. Using Java API
package hadoop.sqoop;

import org.apache.commons.logging.Log;
import org.apache.commons.logging.LogFactory;
import com.cloudera.sqoop.SqoopOptions;
import com.cloudera.sqoop.tool.ExportTool;

public class HiveExportSqooper {

    /* Logger instance */
    Log log = LogFactory.getLog(HiveExportSqooper.class);

    /* CONSTANTS */
    private static final String JOB_NAME = "Sqoop Hive Job";
    private static final String MAPREDUCE_JOB = "Hive Map Reduce Job";
    private static final String DBURL = "jdbc:mysql://localhost:3306/hadoop";
    private static final String DRIVER = "com.mysql.jdbc.Driver";
    private static final String USERNAME = "root";
    private static final String PASSWORD = "root";
    private static final String HADOOP_HOME = "/home/surajit/hadoop/hadoop-1.1.2";
    private static final String JAR_OUTPUT_DIR = "/home/surajit/tmp/sqoop-surajit/compile";
    private static final String HIVE_DIR = "/user/hive/warehouse/";
    private static final String SUCCESS = "SUCCESS !!!";
    private static final String FAIL = "FAIL !!!";

    /**
     * @param args
     */
    public static void main(String[] args) {
        HiveExportSqooper js = new HiveExportSqooper();
        js.exportFromHive("bsefmcg");
    }

    /**
     * Exports data from the Hive warehouse to the RDBMS / MySQL schema
     * @param table RDBMS table name
     */
    public void exportFromHive(String table) {

        log.info("Exporting data from HIVE ...");

        // RDBMS table columns, in order
        String[] outCols = {"month", "index_name", "open", "high", "low", "close"};

        /* MySQL connection parameters */
        SqoopOptions options = new SqoopOptions(DBURL, table);
        options.setDriverClassName(DRIVER);
        options.setUsername(USERNAME);
        options.setPassword(PASSWORD);

        /* Export parameters */
        options.setFieldsTerminatedBy(',');
        options.setHadoopMapRedHome(HADOOP_HOME);
        options.setJobName(JOB_NAME);
        options.setLinesTerminatedBy('\n');
        options.setMapreduceJobName(MAPREDUCE_JOB);
        options.setTableName(table);
        options.setJarOutputDir(JAR_OUTPUT_DIR);
        options.setClearStagingTable(true);
        options.setExportDir(HIVE_DIR + table);
        options.setDbOutputColumns(outCols);
        options.setUpdateMode(SqoopOptions.UpdateMode.AllowInsert);

        ExportTool it = new ExportTool();
        int retVal = it.run(options);
        if (retVal == 0) {
            log.info(SUCCESS);
        } else {
            log.info(FAIL);
        }
    }
}

Importing Data into HBase:

  1. Using Sqoop CLI 
$ bin/sqoop import --connect jdbc:mysql://localhost:3306/hadoop --driver com.mysql.jdbc.Driver --username root --P --table bse30 --hbase-row-key date --hbase-create-table --column-family firstdecade --hbase-table bse30

$ bin/sqoop import --connect jdbc:mysql://localhost:3306/hadoop --driver com.mysql.jdbc.Driver --username root --P --table bsefmcg --hbase-row-key date --hbase-create-table --column-family firstdecade --hbase-table bsefmcg
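
The imported rows can be spot-checked from the HBase shell, for example by piping a scan with a row limit:

$ echo "scan 'bsefmcg', {LIMIT => 5}" | bin/hbase shell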

  2. Using Java API
package hadoop.sqoop;

import org.apache.log4j.Logger;
import org.apache.sqoop.tool.ImportTool;
import com.cloudera.sqoop.SqoopOptions;

/**
 * @author Surajit Paul
 */
public class HBaseImportSqooper {

    /* Logger instance */
    Logger log = Logger.getLogger(HBaseImportSqooper.class);

    /* CONSTANTS */
    private static final String JOB_NAME = "Sqoop HBase Job";
    private static final String MAPREDUCE_JOB = "HBase Map Reduce Job";
    private static final String DBURL = "jdbc:mysql://localhost:3306/hadoop";
    private static final String DRIVER = "com.mysql.jdbc.Driver";
    private static final String USERNAME = "root";
    private static final String PASSWORD = "root";
    private static final String HADOOP_HOME = "/home/surajit/hadoop/hadoop-1.1.2";
    private static final String JAR_OUTPUT_DIR = "/home/surajit/tmp/sqoop-surajit/compile";
    private static final String SUCCESS = "SUCCESS !!!";
    private static final String FAIL = "FAIL !!!";

    /**
     * @param args
     */
    public static void main(String[] args) {
        HBaseImportSqooper js = new HBaseImportSqooper();
        js.importToHbase("bsefmcg");
    }

    /**
     * Imports the MySQL table into an HBase table of the same name
     * @param table RDBMS table name
     */
    public void importToHbase(String table) {
        int retVal = 0;
        SqoopOptions options = new SqoopOptions(DBURL, table);
        options.setDriverClassName(DRIVER);
        options.setUsername(USERNAME);
        options.setPassword(PASSWORD);
        options.setHadoopMapRedHome(HADOOP_HOME);
        options.setHBaseTable(table);
        options.setHBaseRowKeyColumn("date");
        options.setHBaseColFamily("firstdecade");
        options.setJobName(JOB_NAME);
        options.setMapreduceJobName(MAPREDUCE_JOB);
        options.setTableName(table);
        options.setJarOutputDir(JAR_OUTPUT_DIR);
        options.setCreateHBaseTable(true);
        options.setDirectMode(true);

        ImportTool it = new ImportTool();
        retVal = it.run(options);
        if (retVal == 0) {
            log.info(SUCCESS);
        } else {
            log.info(FAIL);
        }
    }
}


Exporting Data from HBase:

When the --hbase-table parameter is specified on an import, Sqoop imports the data into the HBase table instead of an HDFS directory. Sqoop serializes all values to HBase by converting each value to its string representation and then inserting the UTF-8 bytes of that string into the target cell of the HBase table. When exporting data from HBase to HDFS, Sqoop creates SequenceFiles. One limitation of SequenceFiles is that there is no generic way to access their data: access to the Writable class that was used to write the data is required, and Sqoop code-generates this class. This introduces a serious problem: when Sqoop is upgraded to a newer version, the code-generated class can no longer access a SequenceFile created by an older version of Sqoop. Due to this limitation, it is not recommended to use HBase SequenceFiles for export operations with Sqoop. There are other means of exporting HBase tables to RDBMS tables, but they are beyond the scope of this article.


Sqoop Export - Happy Path

13/06/02 17:02:30 INFO sqoop.HDFSSqooper: Exporting data .....
13/06/02 17:02:30 WARN sqoop.ConnFactory: $SQOOP_CONF_DIR has not been set in the environment. Cannot check for additional configuration.
13/06/02 17:02:30 WARN sqoop.ConnFactory: Parameter --driver is set to an explicit driver however appropriate connection manager is not being set (via --connection-manager). Sqoop is going to fall back to org.apache.sqoop.manager.GenericJdbcManager. Please specify explicitly which connection manager should be used next time.
13/06/02 17:02:30 INFO manager.SqlManager: Using default fetchSize of 1000
13/06/02 17:02:30 INFO tool.CodeGenTool: Beginning code generation
13/06/02 17:02:30 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM bsefmcgp AS t WHERE 1=0
13/06/02 17:02:30 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM bsefmcgp AS t WHERE 1=0
13/06/02 17:02:30 INFO orm.CompilationManager: HADOOP_MAPRED_HOME is /home/surajit/hadoop/hadoop-1.1.2
Note: /home/surajit/tmp/sqoop-surajit/compile/bsefmcgp.java uses or overrides a deprecated API.
Note: Recompile with -Xlint:deprecation for details.
13/06/02 17:02:31 INFO orm.CompilationManager: Writing jar file: /home/surajit/tmp/sqoop-surajit/compile/bsefmcgp.jar
13/06/02 17:02:31 INFO mapreduce.ExportJobBase: Beginning export of bsefmcgp
13/06/02 17:02:31 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM bsefmcgp AS t WHERE 1=0
13/06/02 17:02:32 WARN mapred.JobClient: Use GenericOptionsParser for parsing the arguments. Applications should implement Tool for the same.
13/06/02 17:02:38 INFO input.FileInputFormat: Total input paths to process : 1
13/06/02 17:02:38 INFO input.FileInputFormat: Total input paths to process : 1
13/06/02 17:02:38 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
13/06/02 17:02:38 WARN snappy.LoadSnappy: Snappy native library not loaded
13/06/02 17:02:38 INFO mapred.JobClient: Running job: job_201305311424_0064
13/06/02 17:02:39 INFO mapred.JobClient: map 0% reduce 0%
13/06/02 17:02:53 INFO mapred.JobClient: map 50% reduce 0%
13/06/02 17:02:56 INFO mapred.JobClient: map 100% reduce 0%
13/06/02 17:02:56 INFO mapred.JobClient: Job complete: job_201305311424_0064
13/06/02 17:02:56 INFO mapred.JobClient: Counters: 18
13/06/02 17:02:56 INFO mapred.JobClient: Job Counters
13/06/02 17:02:56 INFO mapred.JobClient: SLOTS_MILLIS_MAPS=12684
13/06/02 17:02:56 INFO mapred.JobClient: Total time spent by all reduces waiting after reserving slots (ms)=0
13/06/02 17:02:56 INFO mapred.JobClient: Total time spent by all maps waiting after reserving slots (ms)=0
13/06/02 17:02:56 INFO mapred.JobClient: Launched map tasks=4
13/06/02 17:02:56 INFO mapred.JobClient: Data-local map tasks=4
13/06/02 17:02:56 INFO mapred.JobClient: SLOTS_MILLIS_REDUCES=0
13/06/02 17:02:56 INFO mapred.JobClient: File Output Format Counters
13/06/02 17:02:56 INFO mapred.JobClient: Bytes Written=0
13/06/02 17:02:56 INFO mapred.JobClient: FileSystemCounters
13/06/02 17:02:56 INFO mapred.JobClient: HDFS_BYTES_READ=13333
13/06/02 17:02:56 INFO mapred.JobClient: FILE_BYTES_WRITTEN=245728
13/06/02 17:02:56 INFO mapred.JobClient: File Input Format Counters
13/06/02 17:02:56 INFO mapred.JobClient: Bytes Read=0
13/06/02 17:02:56 INFO mapred.JobClient: Map-Reduce Framework
13/06/02 17:02:56 INFO mapred.JobClient: Map input records=156
13/06/02 17:02:56 INFO mapred.JobClient: Physical memory (bytes) snapshot=181866496
13/06/02 17:02:56 INFO mapred.JobClient: Spilled Records=0
13/06/02 17:02:56 INFO mapred.JobClient: CPU time spent (ms)=2500
13/06/02 17:02:56 INFO mapred.JobClient: Total committed heap usage (bytes)=26083328
13/06/02 17:02:56 INFO mapred.JobClient: Virtual memory (bytes) snapshot=3287945216
13/06/02 17:02:56 INFO mapred.JobClient: Map output records=156
13/06/02 17:02:56 INFO mapred.JobClient: SPLIT_RAW_BYTES=464
13/06/02 17:02:56 INFO mapreduce.ExportJobBase: Transferred 13.0205 KB in 24.8137 seconds (537.3249 bytes/sec)
13/06/02 17:02:56 INFO mapreduce.ExportJobBase: Exported 156 records.
13/06/02 17:02:56 INFO sqoop.HDFSSqooper: SUCCESS!!!
