Installing the Cloud Foundry BOSH Command Line Interface (CLI) on CentOS 7

By Harikrishna Doredla

Cloud Foundry BOSH is an open source tool. The BOSH Command Line Interface (CLI) is used to interact with the Director and to bootstrap new BOSH environments. The CLI is written in Ruby and is provided by two gems:
bosh_cli contains the main operator commands.
bosh_cli_plugin_micro contains the bootstrapping commands.

Install the two gems by following the steps below:
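The detailed steps live in the full post; as a minimal sketch, installation on CentOS 7 might look like the following (the yum package list for building the gems' native extensions is an assumption; adjust it to your environment):

```bash
# Build tools and Ruby: the BOSH gems compile native extensions (assumed package set)
sudo yum install -y gcc gcc-c++ make patch ruby ruby-devel rubygems \
    libxml2-devel libxslt-devel openssl-devel

# Install the two gems; --no-ri --no-rdoc skips generating local documentation
sudo gem install bosh_cli --no-ri --no-rdoc
sudo gem install bosh_cli_plugin_micro --no-ri --no-rdoc

# Verify that both gems are present
gem list | grep bosh
```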


Analyze Big Data Using Apache Spark SQL

By Rajashekar Yedla

Apache Spark SQL is a powerful data processing engine and in-memory computing framework for fast processing and analysis of large volumes of data. We fetch the elements of an RDD into a Spark SQL table and run queries against that table. Only SELECT queries can be written against a Spark SQL table; no other SQL operations are possible. A SELECT query on a Spark SQL table returns another RDD. Spark offers rich APIs in three languages (Java, Scala, and Python).

We use Spark SQL extensively to perform ETL on Big Data, where it conveniently spares us from writing complex Spark code.

Working with Spark SQL:

As with core Spark, the first step in Spark SQL is to get the data into an RDD (Resilient Distributed Dataset). Once the RDD is available, we create a Spark SQL table with the desired RDD elements as table records; we achieve this using SQLContext. We then implement the business logic by writing appropriate SELECT queries against the Spark SQL tables. The output of a query is another RDD, whose elements can be saved as a text file or as an object file, as needed.
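As a minimal Scala sketch of this flow (written against the Spark 1.x SQLContext API; the Employee case class and the HDFS paths are hypothetical):

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.SQLContext

// Hypothetical record type for the RDD elements
case class Employee(name: String, salary: Int)

object SparkSqlEtl {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("SparkSqlEtl"))
    val sqlContext = new SQLContext(sc)
    import sqlContext.implicits._

    // 1. Get the data into an RDD
    val employees = sc.textFile("hdfs:///data/employees.txt") // hypothetical path
      .map(_.split(","))
      .map(f => Employee(f(0), f(1).trim.toInt))

    // 2. Register the RDD elements as a Spark SQL table
    employees.toDF().registerTempTable("employees")

    // 3. Business logic as a SELECT query; the result wraps another RDD
    val highPaid = sqlContext.sql(
      "SELECT name, salary FROM employees WHERE salary > 50000")

    // 4. Save the output RDD as a text file (saveAsObjectFile also works)
    highPaid.rdd.map(row => s"${row.getString(0)},${row.getInt(1)}")
      .saveAsTextFile("hdfs:///out/high_paid") // hypothetical path

    sc.stop()
  }
}
```

Note that registerTempTable makes the RDD's elements queryable only within this SQLContext; nothing is written to an external metastore.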


Executing an Oozie Workflow of Spark Jobs in a Shell Action

By Anusha Jallipalli

Oozie is a server-based workflow engine that runs in a Java servlet container to schedule and manage Hadoop jobs.

Oozie Workflow jobs are Directed Acyclic Graphs (DAGs) of actions. Oozie Coordinator jobs are recurrent Oozie Workflow jobs triggered by time (frequency) and data availability.

Oozie is integrated with the rest of the Hadoop stack and supports several types of Hadoop jobs (including Java MapReduce, Streaming MapReduce, Pig, Hive, Sqoop, and DistCp) as well as system-specific jobs such as Java programs and shell scripts.

Here we discuss a simple workflow that takes input from HDFS and performs a word count using a Spark job. Job-1 passes its output to job-2, as programmed in the shell script.
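A minimal sketch of such a shell action might look like the following workflow definition (the workflow name, HDFS paths, script, jar, and class names are all hypothetical; the full post walks through the actual setup):

```xml
<workflow-app name="spark-wordcount-wf" xmlns="uri:oozie:workflow:0.4">
    <start to="run-spark-jobs"/>
    <action name="run-spark-jobs">
        <shell xmlns="uri:oozie:shell-action:0.2">
            <job-tracker>${jobTracker}</job-tracker>
            <name-node>${nameNode}</name-node>
            <!-- The script chains the two Spark jobs -->
            <exec>wordcount.sh</exec>
            <argument>${nameNode}/user/demo/input</argument>
            <argument>${nameNode}/user/demo/output</argument>
            <!-- Ship the script to whichever node runs the action -->
            <file>wordcount.sh#wordcount.sh</file>
        </shell>
        <ok to="end"/>
        <error to="fail"/>
    </action>
    <fail name="fail">
        <message>Shell action failed: [${wf:errorMessage(wf:lastErrorNode())}]</message>
    </fail>
    <end name="end"/>
</workflow-app>
```

The script itself submits the two Spark jobs in sequence, feeding job-1's output directory to job-2 (the classes and jar are again placeholders):

```bash
#!/bin/bash
# wordcount.sh -- job-1 writes word counts, job-2 reads them (hypothetical jobs)
INPUT="$1"
OUTPUT="$2"
spark-submit --class com.example.WordCount --master yarn-cluster \
    wordcount-jobs.jar "$INPUT" "$OUTPUT/job1"
spark-submit --class com.example.TopWords --master yarn-cluster \
    wordcount-jobs.jar "$OUTPUT/job1" "$OUTPUT/job2"
```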
