Overview of Big Data
Big Data analytics project lifecycle – brief
Hadoop Overview
Environment setup
- Setting up Single node cluster
- Setting up Multi node cluster
Hadoop Distributed File System – HDFS
- HDFS Overview
- HDFS Architecture, Operations
- Data Storage in Hadoop – Replication and Block storage
- Hadoop Daemons and their functionality
- Java-based approach (Accessing HDFS through Java)
Map Reduce
- History of Map Reduce
- Map Reduce architecture – Programming paradigm
- How the Map Reduce model works
- Mapper and Reducer concepts
- Data types in Map Reduce
- Input and output formats of Map Reduce
- Different phases of Map Reduce
- Partitioner and Combiner
- Data Localization in Map Reduce
- Examples of Map Reduce programs – Log Analysis
- MR Job Configurations
YARN Framework
- YARN Architecture
- Advantages of YARN over MR engine V1
- MR Job submission and execution procedure
Sqoop
- Overview of Sqoop
- Sqoop Installation and configuration
- Importing data from RDBMS to HDFS
- Incremental import of data
- Condition-based import
- Sqoop – Import to Hive
- Sqoop – Export to RDBMS
- Sqoop – Metastore
- Sqoop – Job configuration and execution