Course: Intro to Big Data Hadoop Architecture (3-day course)


3 - 24 EUR


1 day (8 hours)

Sign up for this course or request further details






Prerequisites: knowledge of the differences between SQL and NoSQL databases, and of distributed systems and networking concepts.

– Big Data Architecture evolution

– Lambda architecture overview

– What is Hadoop, HDFS emergence and MapReduce evolution

– Hadoop use cases and Hadoop in the real world

– Hadoop framework detailed overview and applicability cases:

– HDFS & MapReduce essentials, YARN

– Exercises on MapReduce
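As a warm-up for the MapReduce exercises, the classic word-count job can be sketched in plain Python so it runs without a cluster; the function names (`mapper`, `reducer`) are illustrative, not Hadoop API calls.

```python
from itertools import groupby
from operator import itemgetter

def mapper(line):
    # Map phase: emit a (word, 1) pair for every word in the input line.
    for word in line.lower().split():
        yield (word, 1)

def reducer(word, counts):
    # Reduce phase: sum the counts for a single word.
    return (word, sum(counts))

def mapreduce(lines):
    # Shuffle/sort phase: group mapper output by key, then reduce each group.
    pairs = sorted(kv for line in lines for kv in mapper(line))
    return dict(reducer(word, (c for _, c in group))
                for word, group in groupby(pairs, key=itemgetter(0)))

print(mapreduce(["big data big hadoop", "hadoop big"]))
# {'big': 3, 'data': 1, 'hadoop': 2}
```

The same mapper/reducer pair, written as scripts reading stdin and writing stdout, is essentially what Hadoop Streaming runs on a real cluster.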

– Apache Hive and Apache Impala intro and hands-on exercises

– Understanding the role of file formats in Hadoop: Apache Avro, Parquet, ORC (hands-on exercises)
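To illustrate why columnar formats like Parquet and ORC speed up analytical scans compared with row-oriented formats like Avro, here is a toy sketch in plain Python; no real file format is involved, and the table contents are made up for the example.

```python
# Row layout: each record is stored together (the Avro-style approach).
rows = [
    {"user": "ana", "clicks": 3},
    {"user": "bob", "clicks": 5},
    {"user": "cri", "clicks": 2},
]

# Column layout: each field is stored contiguously (the Parquet/ORC approach).
columns = {
    "user": ["ana", "bob", "cri"],
    "clicks": [3, 5, 2],
}

# An aggregate over one field touches every record in the row layout...
total_row = sum(r["clicks"] for r in rows)
# ...but only one contiguous array in the columnar layout, which is why
# analytical queries read far less data from columnar files.
total_col = sum(columns["clicks"])

assert total_row == total_col == 10
```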

– Data storage:

– Architecting data in Hadoop: storage options considerations

– Apache HBase intro and hands-on exercises (using Hive)

– Data computing:

– Apache Spark intro

– MapReduce vs Apache Spark

– Start working with Spark: create an RDD from an HDFS file, transformations & actions, creating a DataFrame, operations on DataFrames;
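The key Spark idea in this section, that transformations (`map`, `filter`) are lazy and only actions (`collect`, `count`) trigger execution, can be sketched in plain Python. The class below is a toy whose method names mirror the PySpark RDD API; it is not Spark itself.

```python
class MiniRDD:
    """Toy RDD: transformations build a plan, actions execute it."""

    def __init__(self, data, plan=None):
        self._data = data
        self._plan = plan or []          # pending transformations, not yet run

    def map(self, f):                    # transformation: lazy
        return MiniRDD(self._data, self._plan + [("map", f)])

    def filter(self, f):                 # transformation: lazy
        return MiniRDD(self._data, self._plan + [("filter", f)])

    def collect(self):                   # action: runs the whole plan now
        out = self._data
        for kind, f in self._plan:
            out = [f(x) for x in out] if kind == "map" else [x for x in out if f(x)]
        return out

    def count(self):                     # action built on top of collect
        return len(self.collect())

# Nothing is computed until collect() is called on the chained plan.
rdd = MiniRDD(range(10)).map(lambda x: x * 2).filter(lambda x: x > 10)
print(rdd.collect())   # [12, 14, 16, 18]
```

In real Spark the plan additionally gets optimized and distributed across executors, but the lazy-transformation / eager-action split is the same.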

– Data analysis:

– SQL on Hadoop: Hive, Impala

– Search: Solr/Elastic

– Spark SQL intro and hands-on exercises
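The data-analysis tools above (Hive, Impala, Spark SQL) all expose the same kind of declarative SQL over distributed data. The aggregation below shows that query shape; it runs on Python's stdlib `sqlite3` so it can be tried locally, and the table and column names are invented for the example.

```python
import sqlite3

# In-memory database standing in for a table stored in HDFS.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE page_views (country TEXT, views INTEGER)")
conn.executemany("INSERT INTO page_views VALUES (?, ?)",
                 [("RO", 10), ("RO", 5), ("DE", 7)])

# The same GROUP BY shape you would submit to Hive, Impala, or Spark SQL.
result = conn.execute(
    "SELECT country, SUM(views) FROM page_views "
    "GROUP BY country ORDER BY country").fetchall()
print(result)   # [('DE', 7), ('RO', 15)]
```

The difference on Hadoop is execution, not syntax: Hive compiles such a query to MapReduce or Tez jobs, while Impala and Spark SQL execute it in memory across the cluster.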

– Data ingestion:

– Alternatives for data ingestion in Hadoop: StreamSets, Sqoop

– Messaging bus: Apache Kafka

– Other: Oozie, Zookeeper, Hue
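Kafka's core abstraction in the ingestion story is an append-only log per topic, with each consumer tracking its own read offset. The toy below sketches just that idea in memory; it is not the Kafka client API.

```python
class MiniTopic:
    """Toy Kafka topic: an append-only log read by offset."""

    def __init__(self):
        self.log = []                 # append-only message log

    def produce(self, message):
        self.log.append(message)      # producers only ever append

    def consume(self, offset):
        # Consumers read from their own offset; messages are never removed,
        # so independent consumers can each replay the full stream.
        return self.log[offset:], len(self.log)

topic = MiniTopic()
for msg in ("event-1", "event-2", "event-3"):
    topic.produce(msg)

batch, next_offset = topic.consume(0)   # a brand-new consumer starts at 0
print(batch, next_offset)               # ['event-1', 'event-2', 'event-3'] 3
```

Because consumption does not delete messages, one Kafka topic can feed both a batch path (e.g. HDFS via an ingestion tool) and a streaming path (e.g. Spark), which is exactly the fan-out the Lambda architecture relies on.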

Commercial distributions of Hadoop: Cloudera, MapR, Hortonworks (generic considerations).

– We will work with the Cloudera commercial distribution (CDH) throughout the course;


– We will run the exercises in the cloud, so an open and reliable Internet connection is mandatory throughout the course;

– Each participant needs their own computer to run the hands-on exercises; the computer's settings must allow access to Google Docs and GitHub, where the presenter's slides, documents, data and exercises are published;

– Google Chrome browser;

– On the local computers we will need an SSH client to connect to the cloud environment.