Ingesting data to Hadoop: Sqoop and Flume


Getting your Hadoop cluster up and running is only the first step on the path to a successful Big Data implementation. Once your Hadoop infrastructure is in place, getting data onto your Hadoop cluster for analysis is the next challenge. The quality of the analysis depends on the quality of the data available. Loading raw data from various sources is essential for Hadoop to become your “data lake” – a single place to store all of your raw data and make that data accessible for analysis across your organization. “Raw data” could come from relational databases (Oracle, MySQL, SQL Server) or flat files (text log files, JSON files, XML files), among many other types of data. Our Ingesting data to Hadoop: Sqoop & Flume course is the quickest and best way for you to learn about two of the most frequently used Hadoop ecosystem components for ingesting data into Hadoop:

  • Sqoop – ingesting data from relational databases into Hadoop
  • Flume – ingesting data in real time from flat files and other sources into Hadoop
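As a taste of the Sqoop material, a typical table import might look like the sketch below. This is only an illustration: the connection string, database, table, and directory names are hypothetical, and running it requires a working Sqoop installation with access to both the source database and HDFS.

```shell
# Import the (hypothetical) "orders" table from a MySQL database into HDFS.
# -P prompts for the database password instead of placing it on the command line.
sqoop import \
  --connect jdbc:mysql://dbhost:3306/sales \
  --username etl_user -P \
  --table orders \
  --target-dir /data/raw/orders \
  --num-mappers 4
```

Sqoop translates this into parallel map tasks (four here, via `--num-mappers`), each pulling a slice of the table over JDBC and writing it to the target directory in HDFS.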

Course agenda:

  • The Flume architecture – agents, sources, channels, sinks, collectors.
  • Deploying Flume in a highly available configuration with parallel data processing.
  • Using Flume to ingest log files into Hadoop in real time.
  • Flume configuration options.
  • Working with various Flume “sources”: from plain text files to reading Twitter feeds.
  • The Sqoop architecture: Sqoop CLI and Sqoop2 server.
  • Using Sqoop for ingesting data from relational databases into Hadoop, with Oracle and MySQL source examples.
  • Using Sqoop for incremental, continuous ingestion of data – concepts and execution.
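To give a flavor of the Flume topics above, a minimal agent that tails a log file into HDFS might be configured as sketched below. All agent, file, and path names are hypothetical, and a real deployment would tune the channel and sink settings; running the agent requires a Flume installation and a reachable HDFS.

```shell
# Write a minimal Flume agent configuration (hypothetical names throughout).
cat > tail-to-hdfs.conf <<'EOF'
a1.sources  = r1
a1.channels = c1
a1.sinks    = k1

# Source: follow a plain-text log file as new lines are appended
a1.sources.r1.type    = exec
a1.sources.r1.command = tail -F /var/log/app/app.log

# Channel: buffer events in memory between source and sink
a1.channels.c1.type     = memory
a1.channels.c1.capacity = 10000

# Sink: write events to HDFS, bucketed by date
a1.sinks.k1.type                   = hdfs
a1.sinks.k1.hdfs.path              = /data/flume/app/%Y-%m-%d
a1.sinks.k1.hdfs.fileType          = DataStream
a1.sinks.k1.hdfs.useLocalTimeStamp = true

# Wire the source and sink to the channel
a1.sources.r1.channels = c1
a1.sinks.k1.channel    = c1
EOF

# Start the agent named "a1" with this configuration
flume-ng agent --name a1 --conf conf --conf-file tail-to-hdfs.conf
```

The source/channel/sink wiring shown here is exactly the agent architecture covered in the course: sources produce events, channels buffer them, and sinks deliver them to Hadoop.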

Technical Requirements




The instructor provided a wealth of knowledge to provide us a good understanding of the topic

- Roland D

Very committed professional, born teacher. Interested in student success. Great time window he committed to (before and after class hours)

- Dale B

David has tremendous caliber to deliver training with the same energy throughout the day. He has excellent knowledge of each knowledge area of Hadoop, excellent troubleshooting skills, and excellent query-resolving ability

- Hemant B
Course Instructor
David Yahalom
Big Data & Database Expert

David Yahalom is the CTO of NAYA Technologies. David is both a database professional with extensive hands-on experience and a leading instructor. With over 12 years of experience in database and information systems architecture design, leading database teams, and database consulting, David has built an extensive background in Oracle, Hadoop/Cloudera, Amazon EMR, SQL Server, MySQL, and PostgreSQL. David is a certified instructor for Cloudera and Oracle University.



Upcoming Session
1 x 4-hour day
NAYA Academy
Hands-On Scope