Overview of Big Data

Big Data analytics project lifecycle – brief

Hadoop Overview

Environment setup

  • Setting up a single-node cluster
  • Setting up a multi-node cluster

Hadoop Distributed File System – HDFS

  • HDFS Overview
  • HDFS Architecture, Operations
  • Data Storage in Hadoop – Replication and Block storage
  • Hadoop daemons and their functionality
  • Java based approach (Access HDFS through Java)

Map Reduce

  • History of Map Reduce
  • Map Reduce architecture – Programming paradigm
  • How the Map Reduce model works
  • Mapper and Reducer concepts
  • Data types in Map Reduce
  • Input and output formats of Map Reduce

Course – Curriculum

  • Different phases of Map Reduce
  • Partitioner and Combiner
  • Data locality in Map Reduce
  • Examples of Map Reduce program – Log Analysis
  • MR Job Configurations

YARN Framework

  • Yarn Architecture
  • Advantages of YARN over MR engine V1
  • MR Job submission and execution procedure


Sqoop

  • Overview of Sqoop
  • Sqoop Installation and configuration
  • Importing data from RDBMS to HDFS
  • Incremental import of data
  • Condition-based import
  • Sqoop – Import to Hive
  • Sqoop – Export to RDBMS
  • Sqoop – Metastore
  • Sqoop – Job configuration and execution


Hive

  • Introduction to Hive
  • Hive Architecture
  • RDBMS vs Hive
  • Map Reduce Vs Apache Pig Vs Hive
  • Hive shell and HiveQL (HQL)
  • Hive server & Beeline client
  • Table management (External Vs Managed tables)
  • Schemas and data types
  • Partitions (Static & Dynamic partitioning)
  • Bucketing
  • Different file formats – RC, ORC, Parquet, JSON, text
  • Hive – Built-in functions
  • Hive – User defined functions (UDF)
  • HQL – Query execution plan & performance tuning

Introduction to NoSQL

  • What are NoSQL databases?
  • Fixed schema (RDBMS) Vs Flexible schema (NoSQL)
  • JSON (JavaScript Object Notation) format
  • Ideas Behind MongoDB / Cassandra


HBase

  • HBase Introduction
  • HBase architecture
  • HBase vs RDBMS (fixed Vs flexible schema)
  • Master and Region servers
  • HBase commands

Introduction to Apache Spark

  • Introduction to Spark
  • Apache Hadoop Vs Apache Spark
  • Need for Spark
  • Sample Programs to understand the advantages of Spark