Oozie is an extensible, scalable and reliable system to
define, manage, schedule, and execute complex Hadoop workloads via web
services, the Java API and the Oozie CLI. Its key features are:
· XML-based declarative framework to specify a job or a complex workflow of dependent jobs.
· Support for different types of jobs such as Hadoop Map-Reduce, Pipe, Streaming, Pig, Hive and custom Java applications.
· Workflow scheduling based on frequency and/or data availability.
· Monitoring capability, automatic retry and failure handling of jobs.
· Extensible and pluggable architecture to allow arbitrary grid programming paradigms.
· Authentication, authorization, and capacity-aware load throttling to allow multi-tenant software as a service.
Oozie is a server-based workflow engine specialized in running workflow jobs with actions that run Hadoop Map/Reduce and Pig jobs. Oozie is a Java web application that runs in a Java servlet container (Tomcat).
For the purposes of Oozie, a workflow is a collection of actions (e.g. Hadoop Map/Reduce jobs, Pig jobs, Hive jobs) arranged in a control dependency DAG (Directed Acyclic Graph). A "control dependency" from one action to another means that the second action can't run until the first action has completed.
Oozie workflow definitions are written in hPDL (an XML Process Definition Language similar to JBoss jBPM jPDL). Oozie workflow actions start jobs in remote systems (e.g. Hadoop, Pig, Hive, Sqoop). Upon action completion, the remote system calls back Oozie to notify it that the action has completed; at this point Oozie proceeds to the next action in the workflow.
Oozie workflows contain control flow nodes and action nodes.
Control flow nodes define the beginning and the end of a workflow (start, end and fail nodes) and provide a mechanism to control the workflow execution path (decision, fork and join nodes).
Action nodes are the mechanism by which a workflow triggers the execution of a computation/processing task. Oozie provides support for different types of actions: Hadoop map-reduce, Hadoop file system, Pig, SSH, HTTP, email and Oozie sub-workflow. Oozie can be extended to support additional types of actions.
Oozie workflows can be parameterized (using variables like ${inputDir} within the workflow definition). When submitting a workflow job, values for the parameters must be provided. If properly parameterized (e.g. using different output directories), several identical workflow jobs can run concurrently.
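To make the node types and the parameterization concrete, here is a minimal sketch of a workflow application written out from the shell. It is an illustration only: the names demo-wf and mr-step, the host and port values, and the directory layout are hypothetical, and the mapper and reducer settings are omitted for brevity. The heredoc delimiter is quoted so that ${inputDir} and the other variables stay literal and are resolved by Oozie rather than by the shell.

```shell
#!/bin/sh
# Sketch only: demo-wf, mr-step and all host/port values are hypothetical.
APP_DIR="${APP_DIR:-$(mktemp -d)}"

# Workflow definition: one start node, one map-reduce action, a kill node, an end node.
cat > "$APP_DIR/workflow.xml" <<'EOF'
<workflow-app xmlns="uri:oozie:workflow:0.4" name="demo-wf">
    <start to="mr-step"/>
    <action name="mr-step">
        <map-reduce>
            <job-tracker>${jobTracker}</job-tracker>
            <name-node>${nameNode}</name-node>
            <configuration>
                <property>
                    <name>mapred.input.dir</name>
                    <value>${inputDir}</value>
                </property>
                <property>
                    <name>mapred.output.dir</name>
                    <value>${outputDir}</value>
                </property>
            </configuration>
        </map-reduce>
        <ok to="end"/>
        <error to="fail"/>
    </action>
    <kill name="fail">
        <message>Action failed: [${wf:errorMessage(wf:lastErrorNode())}]</message>
    </kill>
    <end name="end"/>
</workflow-app>
EOF

# Parameter values supplied at submission time.
cat > "$APP_DIR/job.properties" <<'EOF'
nameNode=hdfs://localhost:8020
jobTracker=localhost:8032
inputDir=${nameNode}/user/${user.name}/demo/input
outputDir=${nameNode}/user/${user.name}/demo/output
oozie.wf.application.path=${nameNode}/user/${user.name}/demo-wf
EOF

echo "Wrote workflow application to $APP_DIR"
```

Once workflow.xml has been uploaded to the HDFS path named in oozie.wf.application.path, a job of this kind would typically be submitted with something like oozie job -oozie http://localhost:11000/oozie -config job.properties -run. Submitting the same workflow again with a different outputDir is what allows identical jobs to run side by side.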
Prerequisite Software for Oozie-4.0.0
· Java JDK-1.7
· Maven-3.1.1
· Hadoop-2.2.0
· Pig-0.11.1
· Oozie-4.0.0 (download oozie-4.0.0 from http://www.apache.org/dyn/closer.cgi/oozie/4.0.0)
java, javac and mvn must be in the command path.
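As a quick sanity check before building, a small shell sketch along the following lines can report which of the required tools are missing from the command path. The function name missing_tools is made up for this illustration.

```shell
#!/bin/sh
# Print the name of every given tool that cannot be found on PATH.
missing_tools() {
    for tool in "$@"; do
        command -v "$tool" >/dev/null 2>&1 || echo "$tool"
    done
}

# Check the three tools the build needs.
missing=$(missing_tools java javac mvn)
if [ -z "$missing" ]; then
    echo "All build tools found on PATH"
else
    echo "Missing from PATH:" $missing
fi
```

This only confirms that the commands are present. It does not check their versions, so java -version and mvn -version are still worth running by hand.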
Building Oozie Distribution
Update the properties in pom.xml to match the versions of the corresponding software:
<properties>
<javaVersion>1.7</javaVersion>
<targetJavaVersion>1.7</targetJavaVersion>
<!-- to be able to run a single test case from the main project -->
<failIfNoTests>false</failIfNoTests>
<test.timeout>5400</test.timeout>
<test.exclude>_</test.exclude>
<test.exclude.pattern>_</test.exclude.pattern>
<oozie.test.dir>${project.build.directory}/test-data</oozie.test.dir>
<oozie.test.forkMode>once</oozie.test.forkMode>
<hadoop.version>2.2.0</hadoop.version>
<hbase.version>0.94.7</hbase.version>
<hcatalog.version>0.5.0</hcatalog.version>
<hadooplib.version>${hadoop.version}.oozie-${project.version}</hadooplib.version>
<hbaselib.version>${hbase.version}.oozie-${project.version}</hbaselib.version>
<hcataloglib.version>${hcatalog.version}.oozie-${project.version}</hcataloglib.version>
<clover.license>/home/jenkins/tools/clover/latest/lib/clover.license</clover.license>
<!--This is required while we support a pre 0.23 version of Hadoop which does not have
the hadoop-auth artifact. After we phase-out pre 0.23 we can get rid of this property.-->
<hadoop.auth.version>2.2.0</hadoop.auth.version>
<!-- Sharelib component versions -->
<hive.version>0.11.0</hive.version>
<pig.version>0.11.1</pig.version>
<pig.classifier></pig.classifier>
<sqoop.version>1.4.3</sqoop.version>
<sqoop.classifier>hadoop100</sqoop.classifier>
<streaming.version>${hadoop.version}</streaming.version>
<distcp.version>${hadooplib.version}</distcp.version>
<!-- Tomcat version -->
<tomcat.version>6.0.36</tomcat.version>
<openjpa.version>2.2.2</openjpa.version>
</properties>
Execute
bin/mkdistro.sh
from the command prompt. If all prerequisites are installed properly, Oozie builds successfully with all its dependencies.
System output on a successful Oozie build:
[INFO] Building tar : /home/surajit.paul/oozie/oozie-4.0.0/distro/target/oozie-4.0.0-distro.tar.gz
[INFO] ------------------------------------------------------------------------
[INFO] Reactor Summary:
[INFO]
[INFO] Apache Oozie Main ................................. SUCCESS [4.450s]
[INFO] Apache Oozie Client ............................... SUCCESS [51.036s]
[INFO] Apache Oozie Hadoop 1.1.1.oozie-4.0.0 ............. SUCCESS [2.506s]
[INFO] Apache Oozie Hadoop Distcp 1.1.1.oozie-4.0.0 ...... SUCCESS [0.232s]
[INFO] Apache Oozie Hadoop 1.1.1.oozie-4.0.0 Test ........ SUCCESS [0.447s]
[INFO] Apache Oozie Hadoop 2.2.0.oozie-4.0.0 ............. SUCCESS [7.550s]
[INFO] Apache Oozie Hadoop 2.2.0.oozie-4.0.0 Test ........ SUCCESS [0.562s]
[INFO] Apache Oozie Hadoop Distcp 2.2.0.oozie-4.0.0 ...... SUCCESS [0.289s]
[INFO] Apache Oozie Hadoop 0.23.5.oozie-4.0.0 ............ SUCCESS [6.928s]
[INFO] Apache Oozie Hadoop 0.23.5.oozie-4.0.0 Test ....... SUCCESS [0.818s]
[INFO] Apache Oozie Hadoop Distcp 0.23.5.oozie-4.0.0 ..... SUCCESS [0.322s]
[INFO] Apache Oozie Hadoop Libs .......................... SUCCESS [6.534s]
[INFO] Apache Oozie Hbase 0.94.2.oozie-4.0.0 ............. SUCCESS [0.785s]
[INFO] Apache Oozie Hbase Libs ........................... SUCCESS [1.315s]
[INFO] Apache Oozie HCatalog 0.5.0.oozie-4.0.0 ........... SUCCESS [1.802s]
[INFO] Apache Oozie HCatalog 0.6.0.oozie-4.0.0 ........... SUCCESS [1.709s]
[INFO] Apache Oozie HCatalog Libs ........................ SUCCESS [2.299s]
[INFO] Apache Oozie Share Lib Oozie ...................... SUCCESS [9.163s]
[INFO] Apache Oozie Share Lib HCatalog ................... SUCCESS [13.288s]
[INFO] Apache Oozie Core ................................. SUCCESS [1:50.476s]
[INFO] Apache Oozie Docs ................................. SUCCESS [28.831s]
[INFO] Apache Oozie Share Lib Pig ........................ SUCCESS [18.892s]
[INFO] Apache Oozie Share Lib Hive ....................... SUCCESS [23.419s]
[INFO] Apache Oozie Share Lib Sqoop ...................... SUCCESS [13.089s]
[INFO] Apache Oozie Share Lib Streaming .................. SUCCESS [7.547s]
[INFO] Apache Oozie Share Lib Distcp ..................... SUCCESS [3.632s]
[INFO] Apache Oozie WebApp ............................... SUCCESS [51.899s]
[INFO] Apache Oozie Examples ............................. SUCCESS [11.077s]
[INFO] Apache Oozie Share Lib ............................ SUCCESS [13.236s]
[INFO] Apache Oozie Tools ................................ SUCCESS [17.439s]
[INFO] Apache Oozie MiniOozie ............................ SUCCESS [2.619s]
[INFO] Apache Oozie Distro ............................... SUCCESS [6:28.591s]
[INFO] ------------------------------------------------------------------------
[INFO] BUILD SUCCESS
[INFO] ------------------------------------------------------------------------
[INFO] Total time: 13:26.335s
[INFO] Finished at: Wed Nov 27 14:57:56 UTC 2013
[INFO] Final Memory: 154M/742M
[INFO] ------------------------------------------------------------------------
Oozie distro created, DATE[2013.11.27-14:44:25GMT] VC-REV[unavailable], available at [/home/surajit.paul/oozie/oozie-4.0.0/distro/target]
Oozie Server Installation
Copy all dependent JAR files, including the JDBC driver, and the ext-2.2.zip file into the libext directory.
Prepare oozie.war by executing the command shown below:
./bin/oozie-setup.sh prepare-war -hadoop 2.2.0 /home/hadoop \
  -extjs /home/surajit.paul/oozie/oozie-4.0.0/libext/ext-2.2.zip \
  -jars /home/surajit.paul/oozie/oozie-4.0.0/libext/commons-codec-1.4.jar:/home/surajit.paul/oozie/oozie-4.0.0/libext/commons-cli-1.2.jar:/home/surajit.paul/oozie/oozie-4.0.0/libext/commons-beanutils-core-1.8.0.jar:/home/surajit.paul/oozie/oozie-4.0.0/libext/commons-beanutils-1.7.0.jar:/home/surajit.paul/oozie/oozie-4.0.0/libext/avro-1.7.4.jar:/home/surajit.paul/oozie/oozie-4.0.0/libext/commons-httpclient-3.1.jar:/home/surajit.paul/oozie/oozie-4.0.0/libext/commons-digester-1.8.jar:/home/surajit.paul/oozie/oozie-4.0.0/libext/commons-configuration-1.6.jar:/home/surajit.paul/oozie/oozie-4.0.0/libext/commons-compress-1.4.1.jar:/home/surajit.paul/oozie/oozie-4.0.0/libext/commons-collections-3.2.1.jar:/home/surajit.paul/oozie/oozie-4.0.0/libext/commons-net-3.1.jar:/home/surajit.paul/oozie/oozie-4.0.0/libext/commons-math-2.1.jar:/home/surajit.paul/oozie/oozie-4.0.0/libext/commons-logging-1.1.jar:/home/surajit.paul/oozie/oozie-4.0.0/libext/commons-lang-2.4.jar:/home/surajit.paul/oozie/oozie-4.0.0/libext/commons-io-2.1.jar:/home/surajit.paul/oozie/oozie-4.0.0/libext/hadoop-client-2.2.0.jar:/home/surajit.paul/oozie/oozie-4.0.0/libext/hadoop-auth-2.2.0.jar:/home/surajit.paul/oozie/oozie-4.0.0/libext/hadoop-annotations-2.2.0.jar:/home/surajit.paul/oozie/oozie-4.0.0/libext/guava-11.0.2.jar:/home/surajit.paul/oozie/oozie-4.0.0/libext/hadoop-common-2.2.0.jar:/home/surajit.paul/oozie/oozie-4.0.0/libext/hadoop-mapreduce-client-common-2.2.0.jar:/home/surajit.paul/oozie/oozie-4.0.0/libext/hadoop-mapreduce-client-app-2.2.0.jar:/home/surajit.paul/oozie/oozie-4.0.0/libext/hadoop-hdfs-2.2.0.jar:/home/surajit.paul/oozie/oozie-4.0.0/libext/hadoop-yarn-api-2.2.0.jar:/home/surajit.paul/oozie/oozie-4.0.0/libext/hadoop-mapreduce-client-shuffle-2.2.0.jar:/home/surajit.paul/oozie/oozie-4.0.0/libext/hadoop-mapreduce-client-jobclient-2.2.0.jar:/home/surajit.paul/oozie/oozie-4.0.0/libext/hadoop-mapreduce-client-core-2.2.0.jar:/home/surajit.paul/oozie/oozie-4.0.0/libext/jackson-core-asl-1.8.8.jar:/home/surajit.paul/oozie/oozie-4.0.0/libext/hadoop-yarn-server-common-2.2.0.jar:/home/surajit.paul/oozie/oozie-4.0.0/libext/hadoop-yarn-common-2.2.0.jar:/home/surajit.paul/oozie/oozie-4.0.0/libext/hadoop-yarn-client-2.2.0.jar:/home/surajit.paul/oozie/oozie-4.0.0/libext/slf4j-log4j12-1.6.6.jar:/home/surajit.paul/oozie/oozie-4.0.0/libext/slf4j-api-1.6.6.jar:/home/surajit.paul/oozie/oozie-4.0.0/libext/protobuf-java-2.5.0.jar:/home/surajit.paul/oozie/oozie-4.0.0/libext/paranamer-2.3.jar:/home/surajit.paul/oozie/oozie-4.0.0/libext/log4j-1.2.16.jar:/home/surajit.paul/oozie/oozie-4.0.0/libext/jsr305-1.3.9.jar:/home/surajit.paul/oozie/oozie-4.0.0/libext/jetty-util-6.1.26.jar:/home/surajit.paul/oozie/oozie-4.0.0/libext/jackson-mapper-asl-1.8.8.jar:/home/surajit.paul/oozie/oozie-4.0.0/libext/zookeeper-3.4.5.jar:/home/surajit.paul/oozie/oozie-4.0.0/libext/xz-1.0.jar:/home/surajit.paul/oozie/oozie-4.0.0/libext/xmlenc-0.52.jar:/home/surajit.paul/oozie/oozie-4.0.0/libext/snappy-java-1.0.4.1.jar:/home/surajit.paul/oozie/oozie-4.0.0/libext/mysql-connector-java-5.1.27-bin.jar
Then create the Oozie sharelib in HDFS:
./bin/oozie-setup.sh sharelib create -fs hdfs://172.31.0.56:9000/user/surajit.paul/ \
  -locallib /home/surajit.paul/oozie/oozie-4.0.0/sharelib/target/oozie-sharelib-4.0.0.tar.gz
Once oozie.war has been created, the following messages should be displayed on the CLI:
[root@ip-172-31-0-56 oozie-4.0.0]# ./bin/oozie-setup.sh prepare-war -d /home/surajit.paul/oozie/oozie-4.0.0/libext
setting CATALINA_OPTS="$CATALINA_OPTS -Xmx1024m"
INFO: Adding extension: /home/surajit.paul/oozie/oozie-4.0.0/libext/avro-1.7.4.jar
INFO: Adding extension: /home/surajit.paul/oozie/oozie-4.0.0/libext/commons-beanutils-1.7.0.jar
INFO: Adding extension: /home/surajit.paul/oozie/oozie-4.0.0/libext/commons-beanutils-core-1.8.0.jar
INFO: Adding extension: /home/surajit.paul/oozie/oozie-4.0.0/libext/commons-cli-1.2.jar
INFO: Adding extension: /home/surajit.paul/oozie/oozie-4.0.0/libext/commons-codec-1.4.jar
INFO: Adding extension: /home/surajit.paul/oozie/oozie-4.0.0/libext/commons-collections-3.2.1.jar
INFO: Adding extension: /home/surajit.paul/oozie/oozie-4.0.0/libext/commons-compress-1.4.1.jar
INFO: Adding extension: /home/surajit.paul/oozie/oozie-4.0.0/libext/commons-configuration-1.6.jar
INFO: Adding extension: /home/surajit.paul/oozie/oozie-4.0.0/libext/commons-digester-1.8.jar
INFO: Adding extension: /home/surajit.paul/oozie/oozie-4.0.0/libext/commons-httpclient-3.1.jar
INFO: Adding extension: /home/surajit.paul/oozie/oozie-4.0.0/libext/commons-io-2.1.jar
INFO: Adding extension: /home/surajit.paul/oozie/oozie-4.0.0/libext/commons-lang-2.4.jar
INFO: Adding extension: /home/surajit.paul/oozie/oozie-4.0.0/libext/commons-logging-1.1.jar
INFO: Adding extension: /home/surajit.paul/oozie/oozie-4.0.0/libext/commons-math-2.1.jar
INFO: Adding extension: /home/surajit.paul/oozie/oozie-4.0.0/libext/commons-net-3.1.jar
INFO: Adding extension: /home/surajit.paul/oozie/oozie-4.0.0/libext/guava-11.0.2.jar
INFO: Adding extension: /home/surajit.paul/oozie/oozie-4.0.0/libext/hadoop-annotations-2.2.0.jar
INFO: Adding extension: /home/surajit.paul/oozie/oozie-4.0.0/libext/hadoop-auth-2.2.0.jar
INFO: Adding extension: /home/surajit.paul/oozie/oozie-4.0.0/libext/hadoop-client-2.2.0.jar
INFO: Adding extension: /home/surajit.paul/oozie/oozie-4.0.0/libext/hadoop-common-2.2.0.jar
INFO: Adding extension: /home/surajit.paul/oozie/oozie-4.0.0/libext/hadoop-hdfs-2.2.0.jar
INFO: Adding extension: /home/surajit.paul/oozie/oozie-4.0.0/libext/hadoop-mapreduce-client-app-2.2.0.jar
INFO: Adding extension: /home/surajit.paul/oozie/oozie-4.0.0/libext/hadoop-mapreduce-client-common-2.2.0.jar
INFO: Adding extension: /home/surajit.paul/oozie/oozie-4.0.0/libext/hadoop-mapreduce-client-core-2.2.0.jar
INFO: Adding extension: /home/surajit.paul/oozie/oozie-4.0.0/libext/hadoop-mapreduce-client-jobclient-2.2.0.jar
INFO: Adding extension: /home/surajit.paul/oozie/oozie-4.0.0/libext/hadoop-mapreduce-client-shuffle-2.2.0.jar
INFO: Adding extension: /home/surajit.paul/oozie/oozie-4.0.0/libext/hadoop-yarn-api-2.2.0.jar
INFO: Adding extension: /home/surajit.paul/oozie/oozie-4.0.0/libext/hadoop-yarn-client-2.2.0.jar
INFO: Adding extension: /home/surajit.paul/oozie/oozie-4.0.0/libext/hadoop-yarn-common-2.2.0.jar
INFO: Adding extension: /home/surajit.paul/oozie/oozie-4.0.0/libext/hadoop-yarn-server-common-2.2.0.jar
INFO: Adding extension: /home/surajit.paul/oozie/oozie-4.0.0/libext/jackson-core-asl-1.8.8.jar
INFO: Adding extension: /home/surajit.paul/oozie/oozie-4.0.0/libext/jackson-mapper-asl-1.8.8.jar
INFO: Adding extension: /home/surajit.paul/oozie/oozie-4.0.0/libext/jetty-util-6.1.26.jar
INFO: Adding extension: /home/surajit.paul/oozie/oozie-4.0.0/libext/jsr305-1.3.9.jar
INFO: Adding extension: /home/surajit.paul/oozie/oozie-4.0.0/libext/log4j-1.2.16.jar
INFO: Adding extension: /home/surajit.paul/oozie/oozie-4.0.0/libext/mysql-connector-java-5.1.27-bin.jar
INFO: Adding extension: /home/surajit.paul/oozie/oozie-4.0.0/libext/paranamer-2.3.jar
INFO: Adding extension: /home/surajit.paul/oozie/oozie-4.0.0/libext/protobuf-java-2.5.0.jar
INFO: Adding extension: /home/surajit.paul/oozie/oozie-4.0.0/libext/slf4j-api-1.6.6.jar
INFO: Adding extension: /home/surajit.paul/oozie/oozie-4.0.0/libext/slf4j-log4j12-1.6.6.jar
INFO: Adding extension: /home/surajit.paul/oozie/oozie-4.0.0/libext/snappy-java-1.0.4.1.jar
INFO: Adding extension: /home/surajit.paul/oozie/oozie-4.0.0/libext/xmlenc-0.52.jar
INFO: Adding extension: /home/surajit.paul/oozie/oozie-4.0.0/libext/xz-1.0.jar
INFO: Adding extension: /home/surajit.paul/oozie/oozie-4.0.0/libext/zookeeper-3.4.5.jar
New Oozie WAR file with added 'ExtJS library, JARs' at /home/surajit.paul/oozie/oozie-4.0.0/distro/target/oozie-4.0.0-distro/oozie-4.0.0/oozie-server/webapps/oozie.war
INFO: Oozie is ready to be started
Oozie DB Creation
oozie-site.xml must be modified to point to the MySQL database on the EMR cluster, as shown below:
<property>
<name>oozie.db.schema.name</name>
<value>oozie</value>
<description>
Oozie DataBase Name
</description>
</property>
<property>
<name>oozie.service.JPAService.create.db.schema</name>
<value>true</value>
<description>
Creates Oozie DB.
If set to true, it creates the DB schema if it does not exist.
If the DB schema exists is a NOP. If set to false, it does not
create the DB schema. If the DB schema does not exist it fails start up.
</description>
</property>
<property>
<name>oozie.service.JPAService.jdbc.driver</name>
<value>com.mysql.jdbc.Driver</value>
<description>
JDBC driver class.
</description>
</property>
<property>
<name>oozie.service.JPAService.jdbc.url</name>
<value>jdbc:mysql://ip-172-31-0-56:3306/${oozie.db.schema.name}</value>
<description>
JDBC URL.
</description>
</property>
<property>
<name>oozie.service.JPAService.jdbc.username</name>
<value>oozie</value>
<description>
DB user name.
</description>
</property>
<property>
<name>oozie.service.JPAService.jdbc.password</name>
<value>oozie</value>
<description>
DB user password.
IMPORTANT: if password is empty leave a 1 space string,
the service trims the value, if empty Configuration assumes it is NULL.
</description>
</property>
Create a database schema named oozie on the MySQL database server.
Create a user oozie with password oozie and grant it all privileges on that database.
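The two steps above could look roughly like the SQL below, written to a file from the shell so it can be reviewed before it is run. This is a sketch under assumptions: the '%' host pattern (connections allowed from any host) and the use of the MySQL root account are choices made for illustration, not something the setup requires. The database name, user and password match the oozie-site.xml values.

```shell
#!/bin/sh
# Sketch only: the SQL file location and the '%' host pattern are assumptions.
SQL_FILE="${SQL_FILE:-$(mktemp)}"

# Database, user and password match oozie.db.schema.name and the JPAService settings.
cat > "$SQL_FILE" <<'EOF'
CREATE DATABASE IF NOT EXISTS oozie;
CREATE USER 'oozie'@'%' IDENTIFIED BY 'oozie';
GRANT ALL PRIVILEGES ON oozie.* TO 'oozie'@'%';
FLUSH PRIVILEGES;
EOF

echo "Review $SQL_FILE, then run: mysql -u root -p < $SQL_FILE"
```

The password oozie is fine for a walkthrough, but a stronger one (changed in oozie-site.xml as well) is advisable on any shared cluster.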
Execute the command ./bin/ooziedb.sh create -run DB Connection
System output on successful creation of the Oozie tables:
[root@ip-172-31-0-56 oozie-4.0.0]# ./bin/ooziedb.sh create -run DB Connection
setting CATALINA_OPTS="$CATALINA_OPTS -Xmx1024m"
Validate DB Connection
DONE
Check DB schema does not exist
DONE
Check OOZIE_SYS table does not exist
DONE
Create SQL schema
DONE
Create OOZIE_SYS table
DONE
Set MySQL MEDIUMTEXT flag
DONE
Oozie DB has been created for Oozie version '4.0.0'
The SQL commands have been written to: /tmp/ooziedb-5014017589567251888.sql
Run show tables; at the mysql> prompt to verify the result. (Oozie-4.0.0 creates 12 tables, while oozie-3.3.2 creates 10. All tables must be listed.)
+------------------------+
| Tables_in_oozie        |
+------------------------+
| BUNDLE_ACTIONS         |
| BUNDLE_JOBS            |
| COORD_ACTIONS          |
| COORD_JOBS             |
| OOZIE_SYS              |
| OPENJPA_SEQUENCE_TABLE |
| SLA_EVENTS             |
| SLA_REGISTRATION       |
| SLA_SUMMARY            |
| VALIDATE_CONN          |
| WF_ACTIONS             |
| WF_JOBS                |
+------------------------+
12 rows in set (0.00 sec)
At this stage, the Tomcat server containing oozie.war should start successfully, and Oozie jobs should run as scheduled.
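One way to start the server and confirm that it is up is sketched below. The OOZIE_HOME default and the URL http://localhost:11000/oozie are assumptions for illustration, and the function name start_and_check is made up; adjust them to the actual installation.

```shell
#!/bin/sh
# Sketch only: OOZIE_HOME and OOZIE_URL are assumed values.
OOZIE_HOME="${OOZIE_HOME:-$PWD}"
OOZIE_URL="${OOZIE_URL:-http://localhost:11000/oozie}"

# Start the embedded Tomcat, then ask the server for its status.
start_and_check() {
    if [ ! -x "$OOZIE_HOME/bin/oozied.sh" ]; then
        echo "oozied.sh not found under $OOZIE_HOME" >&2
        return 1
    fi
    "$OOZIE_HOME/bin/oozied.sh" start &&
        "$OOZIE_HOME/bin/oozie" admin -oozie "$OOZIE_URL" -status
}

start_and_check || echo "Oozie server was not started"
```

The status call can fail if it runs before Tomcat has finished starting, so it may need to be repeated after a few seconds. A healthy server typically reports System mode: NORMAL.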