Getting Started with Big Data, Hadoop, and NoSQL Training Course

Duration: 1 day
Course code: SS-BDA-004

Audience
  • Technology executives
  • Enterprise architects
  • Software architects
  • Data architects
  • Technology managers
  • Any stakeholder who needs to understand Big Data, Hadoop, and NoSQL technologies
Prerequisites

Experience with software development.

Description

This course provides a rapid immersion into Big Data, the Hadoop ecosystem, and NoSQL technologies. We start with an introduction to Big Data and explore MapReduce, the key algorithm for processing big data on clusters. We continue with Hadoop, the leading open-source framework for Big Data processing, and examine the components of the Hadoop ecosystem. We then survey the related NoSQL technologies and advise on selecting the right data store for a project and organization. Our goal is that you leave with a thorough understanding of the concepts and technologies so you can make sound decisions when evaluating and adopting them in your organization. We believe these new technologies must be integrated with conventional, existing data systems, and we present practical ways to achieve this.

Objectives
  • Identify Big Data problems and solutions and understand their relevance to IT
  • Explain the MapReduce algorithm and assess its applicability to data problems
  • Decide which technologies within the Hadoop ecosystem to apply
  • Explain the four architectures of NoSQL stores and select the right category for an application area
  • Identify ways of integrating Big Data with conventional IT in the organization

Outline for the Big Data, Hadoop, and NoSQL Training Course

Big Data and Hadoop: A quick dive

  • Big Data
  • Problems with conventional systems
  • MapReduce algorithm
  • Traditional Database Applications
  • Hadoop

MapReduce

  • What is MapReduce?
  • Relevance of MapReduce to Big Data
  • Map operation
  • Reduce operation
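The map and reduce operations listed above can be sketched in plain Python with a word count, the canonical MapReduce example. The function names and the in-memory shuffle step here are illustrative stand-ins for what Hadoop actually distributes across a cluster:

```python
from collections import defaultdict

def map_phase(documents):
    """Map: emit a (word, 1) pair for every word in every document."""
    for doc in documents:
        for word in doc.split():
            yield (word.lower(), 1)

def shuffle(pairs):
    """Shuffle: group values by key, as the framework does between phases."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(groups):
    """Reduce: sum the counts emitted for each word."""
    return {word: sum(counts) for word, counts in groups.items()}

docs = ["the quick brown fox", "the lazy dog", "the fox"]
counts = reduce_phase(shuffle(map_phase(docs)))
# counts["the"] == 3, counts["fox"] == 2
```

Because map and reduce are independent per key, the framework can run them in parallel on thousands of machines; this data-parallel decomposition is the whole point of the model.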

Hadoop

  • What is Hadoop?
  • The Hadoop architecture
  • Hadoop Distributed File System

Hadoop Distributed File System (HDFS)

  • HDFS Architecture
  • HDFS API
  • Scalability
  • Data replication
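HDFS stores files as fixed-size blocks and replicates each block on several DataNodes. The toy sketch below illustrates that idea only; the block size, node names, and round-robin placement are simplifications, not HDFS's actual rack-aware placement policy:

```python
def split_into_blocks(data, block_size):
    """Split a byte string into fixed-size blocks (HDFS default is 128 MB)."""
    return [data[i:i + block_size] for i in range(0, len(data), block_size)]

def place_replicas(blocks, nodes, replication=3):
    """Assign each block to `replication` distinct nodes, round-robin."""
    placement = {}
    for i, _ in enumerate(blocks):
        placement[i] = [nodes[(i + r) % len(nodes)] for r in range(replication)]
    return placement

blocks = split_into_blocks(b"x" * 300, block_size=128)   # three blocks
placement = place_replicas(blocks, nodes=["n1", "n2", "n3", "n4"])
```

With three replicas per block, the cluster tolerates the loss of any single node without losing data, at the cost of 3x storage.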

Hadoop Applications

  • Typical Hadoop algorithms
  • Best practices for Hadoop

Working with Hive

  • What is Hive?
  • Hive architecture
  • Data warehouse using Hive
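Hive's core idea is running declarative SQL-style queries (HiveQL) over large datasets, which it compiles into MapReduce jobs. Purely to illustrate that declarative style, the sketch below uses Python's sqlite3 as a stand-in; the table and data are made up, and HiveQL syntax for this query would look almost identical:

```python
import sqlite3

# sqlite3 stands in for Hive here: the point is the declarative GROUP BY,
# which Hive would translate into a MapReduce job over files in HDFS.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE page_views (url TEXT, user TEXT)")
conn.executemany("INSERT INTO page_views VALUES (?, ?)",
                 [("/home", "ada"), ("/home", "bob"), ("/docs", "ada")])
rows = conn.execute(
    "SELECT url, COUNT(*) AS views FROM page_views GROUP BY url ORDER BY url"
).fetchall()
# rows == [("/docs", 1), ("/home", 2)]
```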

Working with Pig

  • What is Pig?
  • Analyzing data using Pig
  • Using Pig Latin to build data analysis programs
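Pig Latin describes analysis as a pipeline of dataflow operations (LOAD, FILTER, GROUP, FOREACH) rather than as a single SQL query. A rough Python analogue of such a pipeline, using made-up sample data:

```python
# Sample records: (name, age) tuples -- the Pig equivalent of LOAD.
records = [("ada", 34), ("bob", 17), ("eve", 28)]

# FILTER: keep only adults.
adults = [r for r in records if r[1] >= 18]

# GROUP ... BY: bucket names by decade of age.
by_decade = {}
for name, age in adults:
    by_decade.setdefault(age // 10 * 10, []).append(name)
# by_decade == {30: ["ada"], 20: ["eve"]}
```

Each step produces a new relation consumed by the next, which is exactly how Pig scripts read and how Pig plans them as MapReduce stages.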

Other Elements of the Hadoop Ecosystem

  • Flume
  • Sqoop
  • ZooKeeper
  • YARN: The new MapReduce

Stream Processing

  • Stream processing paradigm and Big Data
  • Stream processing systems
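Unlike batch jobs, stream processors keep small incremental state over an unbounded input and emit results continuously. A minimal sketch of a tumbling-window counter in Python (the window size and event names are illustrative):

```python
from collections import Counter

def windowed_counts(stream, window=3):
    """Tumbling-window count: emit per-window totals without storing the stream."""
    counter, seen = Counter(), 0
    for event in stream:
        counter[event] += 1
        seen += 1
        if seen == window:
            yield dict(counter)
            counter, seen = Counter(), 0

events = ["click", "view", "click", "view", "view", "view"]
windows = list(windowed_counts(events, window=3))
# windows == [{"click": 2, "view": 1}, {"view": 3}]
```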

NoSQL: Not Only SQL

  • Why NoSQL?
  • Relational database problems
  • Scalability: its price and its limits
  • Key-Value Stores
  • Columnar Stores
  • Document Stores
  • Graph Stores
  • Selecting the right store
  • Polyglot persistence
  • Optimal choices and reasonable compromises
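The simplest of the four NoSQL categories, a key-value store, reduces the data model to opaque values looked up by key. The few lines below sketch that interface; real stores add persistence, partitioning, and replication on top of it:

```python
class KeyValueStore:
    """Minimal in-memory key-value store: opaque values, lookup by key only."""

    def __init__(self):
        self._data = {}

    def put(self, key, value):
        self._data[key] = value

    def get(self, key, default=None):
        return self._data.get(key, default)

store = KeyValueStore()
store.put("session:42", {"user": "ada", "cart": ["book"]})
```

The restriction is also the selling point: because the store never inspects values or joins across keys, it can partition data by key and scale horizontally, which is the trade-off the course examines for each store category.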

Integrating with Conventional IT

  • Hadoop as the new Data Warehouse
  • ETL jobs and Hadoop
  • Integrating Big Data with conventional systems
  • The human factor of Big Data introduction
  • Best practices of Hadoop/NoSQL introduction
