Hive is a data warehouse system for accessing and querying data stored in the Hadoop Distributed File System (HDFS). Hive uses a language called Hive Query Language (HQL), whose grammar and predicates closely resemble those of SQL.
Using Hive to analyze large data sets (big data) of bank-card transactions has given us the opportunity to put the best features available in Hive to date to work, generating in-depth analytical reports that offer insight into several dimensions of how customers use their cards.
Here is a brief recap of the other parts of the Hadoop framework mentioned in this write-up. Hadoop has basically two parts: 1. a distributed file system (the Hadoop Distributed File System, referred to as HDFS), and 2. MapReduce, a computing and processing framework. Hive provides a data warehouse facility on top of Hadoop.
I will share here some of the best features of Hive that came in very handy for generating analytical reports from the large data sets in HDFS, which were processed (cleaned and transformed) using Spark and Spark SQL…