Duration: 3 days
Course code: SS-BDA-006
Audience
- Application developers
- Architects
- Consultants
- Technical managers
Prerequisites
Prior experience with Java is strongly recommended.
Description
This course provides a rapid immersion into Big Data with Hadoop using IBM BigInsights, IBM's distribution of Hadoop. It starts with a clear explanation of the key concepts of the MapReduce algorithm behind Hadoop and immediately becomes hands-on, with participants using the tools right away. We begin with an introduction to the Hadoop cluster and show how to interact with the Hadoop file system and the cluster. Next, we examine how to write Java programs that perform processing on the Hadoop cluster, covering the writing of mappers and reducers, common algorithms, best practices, and testing approaches. Finally, we introduce Hive and Pig, two popular higher-level interfaces to Hadoop.
Objectives
Upon completion, attendees will be able to:
- Understand the Hadoop technology and IBM BigInsights
- Work with the Hadoop file system
- Write MapReduce programs using Hadoop APIs
- Use Hive and Pig for productive development
- Make the best use of IBM BigInsights tooling for Hadoop
Outline for Mastering Big Data with Hadoop and IBM BigInsights Training Course
Big Data and Hadoop: A quick dive
- Big Data
- Problems with conventional systems
- MapReduce algorithm
- Traditional database applications
- Hadoop
- IBM BigInsights
MapReduce
- What is MapReduce?
- Relevance of MapReduce to Big Data
- Map operation
- Reduce operation
- Survey of real-world MapReduce problems
- Execution strategies for MapReduce
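To give a feel for the map and reduce operations listed above, here is a minimal word-count sketch in plain Java. It is an illustration only, not course material and not the Hadoop API: the class `WordCountSketch` and its methods `map`, `reduce`, and `run` are hypothetical names, and the grouping step runs in memory on a single machine rather than across a cluster.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Plain-Java illustration of the map and reduce operations.
// Hypothetical names throughout; no Hadoop classes are used.
class WordCountSketch {

    // Map operation: turn one input line into (word, 1) pairs.
    static List<Map.Entry<String, Integer>> map(String line) {
        List<Map.Entry<String, Integer>> pairs = new ArrayList<>();
        for (String word : line.toLowerCase().split("\\W+")) {
            if (!word.isEmpty()) {
                pairs.add(new SimpleEntry<>(word, 1));
            }
        }
        return pairs;
    }

    // Reduce operation: collapse all values seen for one key into a single sum.
    static int reduce(String key, List<Integer> values) {
        int sum = 0;
        for (int v : values) {
            sum += v;
        }
        return sum;
    }

    // Driver: map every line, group values by key, then reduce each group.
    // The in-memory grouping stands in for the shuffle a real framework performs.
    static Map<String, Integer> run(List<String> lines) {
        Map<String, List<Integer>> grouped = new TreeMap<>();
        for (String line : lines) {
            for (Map.Entry<String, Integer> pair : map(line)) {
                grouped.computeIfAbsent(pair.getKey(), k -> new ArrayList<>())
                       .add(pair.getValue());
            }
        }
        Map<String, Integer> counts = new TreeMap<>();
        for (Map.Entry<String, List<Integer>> e : grouped.entrySet()) {
            counts.put(e.getKey(), reduce(e.getKey(), e.getValue()));
        }
        return counts;
    }

    public static void main(String[] args) {
        // Prints {be=2, do=1, not=1, or=1, to=3}
        System.out.println(run(List.of("to be or not to be", "to do")));
    }
}
```

Because each `map` call depends on a single line and each `reduce` call on a single key, both steps can in principle be distributed across many machines, which is what makes the model relevant to Big Data.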
Hadoop
- What is Hadoop?
- The Hadoop architecture
- Hadoop Distributed File System
Hadoop Distributed File System (HDFS)
- HDFS Architecture
- HDFS API
- Web interface
- Command shell
- Scalability
- Data replication
Working with Hadoop API
- Hadoop API
- Mapper
- Reducer
- Combiner
- JobConf
- JobClient
Designing Hadoop Applications
- Typical Hadoop algorithms
- Best practices for Hadoop
- Testing Hadoop programs
Working with Hive
- What is Hive?
- Hive architecture
- Data warehouse using Hive
- HiveQL
- Plugging custom mappers and reducers
Working with Pig
- What is Pig?
- Pig architecture
- Analyzing data using Pig
- Using Pig Latin to build data analysis programs