How to Import Data from MySQL to HDFS Using Sqoop

Apache Sqoop is a tool in Hadoop ecosystem which is used to import/export data between RDBMS and HDFS. This data is in structured format and has a schema. There are multiple cases where you want to analyze some data in your RDBMS, but due to huge size of data your RDBMS is not capable enough to process that big data. Hence, you can use Sqoop in such situations and send this data on HDFS where storing and processing big data is not a problem. In this blog, I will show you how to send data from MySQL to HDFS using Sqoop Import.

MySql to HDFS Using Sqoop

To show this example, let me create a table in mysql which is on my windows machine and put some data in it.

Create a Database named linoxide, then create a Table named employee by executing the below command:

Command: create table employee ( id int, name varchar(20), dept varchar(20) );

Insert values in table employee and check the rows present in table employee by executing the below command:

Command: select * from employee;

Mysql Sqoop Example - MySQL to HDFS Using Sqoop

Now, Open Command Prompt (CMD) on Windows and check the IPv4 Address of your LAN (most cases vmnet1) connection by executing the below command:

Mysql Sqoop ipconfig - MySQL to HDFS Using Sqoop

Find out the IPv4 address of your system using the above step. In my case it is 192.168.1.33

Grant all privileges to root@your_ipv4_address by executing the below command.

Command: grant all privileges on *.* to root@192.168.1.33 IDENTIFIED BY ‘root’ WITH GRANT OPTION; 

Mysql Grant Privileges - MySQL to HDFS Using Sqoop

 

 

 

Download connector of MySQL on your machine where sqoop is installed using below links. This connector are needed to make connection between Sqoop and Mysql db.

MySQL connector : Download

Move mysql connector to the lib folder of sqoop by executing the below command:

Command: sudo mv Downloads/mysql-connector-java-5.1.26-bin.jar /home/hadoop/sqoop/lib

Now we are ready to run sqoop import command.

Required items for the command:

IPv4 Address – Your IPv4 address. In my case it is 192.168.1.33

Database Name – linoxide

Table Name – employee

Username – root

Output Directory – Could be any, I have used sqoop_out

Command: bin/sqoop import --connect jdbc:mysql://192.168.1.33/linoxide --table employee --username root -P --target-dir /sqoop_out  -m 1

Enter the Password of your mysql when asked.

Import the table employee present in MySQL database to hdfs by executing the below command.

hadoop@hadoop-VirtualBox:~/sqoop$ bin/sqoop import --connect jdbc:mysql://192.168.
1.33/linoxide --table employee --username root -P --target-dir /sqoop_out  -m 1

16/12/25 03:03:04 INFO sqoop.Sqoop: Running Sqoop version: 1.4.5

Enter password:

16/12/25 03:03:09 INFO manager.MySQLManager: Preparing to use a MySQL streaming 
resultset.

16/12/25 03:03:09 INFO tool.CodeGenTool: Beginning code generation

Sun Dec 25 03:03:10 IST 2016 WARN: Establishing SSL connection without server's
 identity verification is not recommended. According to MySQL 5.5.45+, 5.6.26+ and 
5.7.6+ requirements SSL connection must be established by default if explicit 
option isn't set. For compliance with existing applications not using SSL the 
verifyServerCertificate property is set to 'false'. You need either to explicitly 
disable SSL by setting useSSL=false, or set useSSL=true and provide truststore for 
server certificate verification.

16/12/25 03:03:11 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM
 `employee` AS t LIMIT 1

16/12/25 03:03:11 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM
 `employee` AS t LIMIT 1

16/12/25 03:03:11 INFO orm.CompilationManager: HADOOP_MAPRED_HOME is /home/hadoop/
hadoop-2.7.3

16/12/25 03:03:21 INFO orm.CompilationManager: Writing jar file: /tmp/sqoop-hadoop/
compile/33bf897949f7b40e3cc1f5b5afae7cb3/employee.jar

16/12/25 03:03:21 INFO manager.MySQLManager: Setting zero DATETIME behavior to 
convertToNull (mysql)

16/12/25 03:03:21 INFO mapreduce.ImportJobBase: Beginning import of employee

16/12/25 03:03:23 INFO Configuration.deprecation: mapred.jar is deprecated. 
Instead, use mapreduce.job.jar

16/12/25 03:03:28 INFO Configuration.deprecation: mapred.map.tasks is deprecated. 
Instead, use mapreduce.job.maps

16/12/25 03:03:28 INFO client.RMProxy: Connecting to ResourceManager at /0.0.0.0:
8032

Sun Dec 25 03:03:37 IST 2016 WARN: Establishing SSL connection without server's 
identity verification is not recommended. According to MySQL 5.5.45+, 5.6.26+ and 
5.7.6+ requirements SSL connection must be established by default if explicit 
option isn't set. For compliance with existing applications not using SSL the 
verifyServerCertificate property is set to 'false'. You need either to explicitly 
disable SSL by setting useSSL=false, or set useSSL=true and provide truststore for 
server certificate verification.

16/12/25 03:03:37 INFO db.DBInputFormat: Using read commited transaction isolation

16/12/25 03:03:38 INFO mapreduce.JobSubmitter: number of splits:1

16/12/25 03:03:39 INFO mapreduce.JobSubmitter: Submitting tokens for job: 
job_1482614949106_0001

16/12/25 03:03:42 INFO impl.YarnClientImpl: Submitted application 
application_1482614949106_0001

16/12/25 03:03:43 INFO mapreduce.Job: The url to track the job: 
http://hadoop-VirtualBox:8088/proxy/application_1482614949106_0001/

16/12/25 03:03:43 INFO mapreduce.Job: Running job: job_1482614949106_0001

16/12/25 03:04:25 INFO mapreduce.Job: Job job_1482614949106_0001 running in
 uber mode : false

16/12/25 03:04:25 INFO mapreduce.Job:  map 0% reduce 0%

16/12/25 03:04:51 INFO mapreduce.Job:  map 100% reduce 0%

16/12/25 03:04:52 INFO mapreduce.Job: Job job_1482614949106_0001 completed 
successfully

16/12/25 03:04:53 INFO mapreduce.Job: Counters: 30

File System Counters

FILE: Number of bytes read=0

FILE: Number of bytes written=127675

FILE: Number of read operations=0

FILE: Number of large read operations=0

FILE: Number of write operations=0

HDFS: Number of bytes read=87

HDFS: Number of bytes written=86

HDFS: Number of read operations=4

HDFS: Number of large read operations=0

HDFS: Number of write operations=2

Job Counters

Launched map tasks=1

Other local map tasks=1

Total time spent by all maps in occupied slots (ms)=20382

Total time spent by all reduces in occupied slots (ms)=0

Total time spent by all map tasks (ms)=20382

Total vcore-milliseconds taken by all map tasks=20382

Total megabyte-milliseconds taken by all map tasks=20871168

Map-Reduce Framework

Map input records=5

Map output records=5

Input split bytes=87

Spilled Records=0

Failed Shuffles=0

Merged Map outputs=0

GC time elapsed (ms)=210

CPU time spent (ms)=2780

Physical memory (bytes) snapshot=118214656

Virtual memory (bytes) snapshot=1904357376

Total committed heap usage (bytes)=54853632

File Input Format Counters

Bytes Read=0

File Output Format Counters

Bytes Written=86

16/12/25 03:04:53 INFO mapreduce.ImportJobBase: Transferred 86 bytes in 85.6488
seconds (1.0041 bytes/sec)

16/12/25 03:04:53 INFO mapreduce.ImportJobBase: Retrieved 5 records.

Once the job completes, check the output stored on HDFS

hadoop@hadoop-VirtualBox:~/sqoop$ hdfs dfs -cat /sqoop_out/part-m-00000

1,John,sales

2,Alex,Development

3,Jessica,Marketing

4,Luke,Operation

5,Lisa,Marketing

hadoop@hadoop-VirtualBox:~/sqoop$

Oracle to HDFS

Similarly, to send the data from Oracle database to HDFS, you will need to put oracle connector in sqoop's lib directory.

Oracle Connector : Download

Below are the required items for the command to import data from oracle to HDFS:

IPv4 Address – Your IPv4 address. In my case it is 192.168.1.33

Database Name – linoxide

Table Name – employee

Username – root

Output Directory – sqoop_out

Command - sudo bin/sqoop import –connect jdbc:oracle:thin:system/system@192.168.1.33:1521:xe --username system -P --table system.emp –columns “ID” –target-dir /sqoop_out-m 1

Conclusion

Congratulations! You have successfully transferred data from MySQL to HDFS using Apache Sqoop. Similarly, you can transfer structured data from any RDBMS to HDFS that you want to process, make sure that you put the connectors in sqoop's lib directory before running import/export command.

Have anything to say?

Your email address will not be published. Required fields are marked *

All comments are subject to moderation.