Standalone Spark cluster setup in AWS cloud

By Harikrishna Doredla

Here I discuss how to set up a standalone Spark cluster in AWS using EC2.

Let’s assume we are setting up a 3-node standalone cluster on the following instance types: m4.xlarge ($0.239 per hour), m4.large ($0.12 per hour), and m4.large ($0.12 per hour).

Each node has a 100 GB EBS volume.
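To get a feel for what this cluster costs to run, the arithmetic can be sketched in a few lines of Python. This is only a rough illustration: the prices are the hourly figures quoted above (actual AWS pricing varies by region and over time), EBS storage charges are not included, and the names `NODES`, `cluster_hourly_cost` and `total_ebs_gb` are made up for this example.

```python
# Hourly instance prices as quoted in this post (USD).
# Assumption: on-demand pricing; real prices depend on region and date.
NODES = [
    ("m4.xlarge", 0.239),
    ("m4.large", 0.12),
    ("m4.large", 0.12),
]

# Each node has a 100 GB EBS volume.
EBS_GB_PER_NODE = 100


def cluster_hourly_cost(nodes):
    """Sum the hourly instance price of every node (EBS cost excluded)."""
    return round(sum(price for _, price in nodes), 3)


def total_ebs_gb(nodes, gb_per_node=EBS_GB_PER_NODE):
    """Total EBS storage attached across the cluster, in GB."""
    return len(nodes) * gb_per_node


if __name__ == "__main__":
    print("Instances per hour (USD):", cluster_hourly_cost(NODES))
    print("Total EBS storage (GB):", total_ebs_gb(NODES))
```

With the figures above, the three instances come to about $0.479 per hour in total, with 300 GB of EBS storage across the cluster.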

Servers Info



Scrapy installation on CentOS and Windows

By Harikrishna Doredla

Scrapy is a Python application framework for crawling websites and extracting structured data from them. Here I discuss the steps for installing Scrapy on both CentOS and Windows, including the installation of its dependencies.

Scrapy Installation on CentOS 6.5

Scrapy needs Python 2.7 or above to run on CentOS. CentOS 6.5 comes with Python 2.6, so we need to install Python 2.7+ to run Scrapy code. Here are the steps to install Python 2.7.11. First, install the Scrapy dependencies; then install Python 2.7.11.
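Before starting, it can help to confirm which Python version a given interpreter actually is. The following is a minimal sketch, assuming the 2.7 minimum stated above; `MIN_VERSION` and `meets_minimum` are names made up for this example and are not part of Scrapy.

```python
import sys

# Minimum Python version for Scrapy as stated in this post (an assumption
# taken from the text; check the Scrapy documentation for your release).
MIN_VERSION = (2, 7)


def meets_minimum(version_info=sys.version_info, minimum=MIN_VERSION):
    """Return True if the (major, minor) version is at least `minimum`."""
    return tuple(version_info[:2]) >= minimum


if __name__ == "__main__":
    major, minor = sys.version_info[:2]
    if meets_minimum():
        print("Python %d.%d meets the minimum for Scrapy" % (major, minor))
    else:
        print("Python %d.%d is too old; install 2.7 or later" % (major, minor))
```

Run with the stock CentOS 6.5 interpreter (Python 2.6), this check would report that the version is too old; run with a freshly installed Python 2.7.11, it would pass.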
