How to Install Apache Sqoop on Ubuntu 16.04

Apache Sqoop is a tool designed for efficiently transferring bulk data between Apache Hadoop and structured datastores such as relational databases (for example, MySQL, Oracle, and Microsoft SQL Server). You can import and export data between relational databases and Hadoop, and also to and from semi-structured data sources such as the NoSQL databases HBase and Cassandra. Sqoop ships as a single binary package that incorporates two separate parts - a client and a server.

  • Server - Install the server on a single node in your cluster. That node then serves as the entry point for all Sqoop clients.
  • Client - Clients can be installed on any number of machines.

Below are the steps to set up Apache Sqoop on Ubuntu 16.04. The required package is the sqoop-1.99.7-bin-hadoop200.tar.gz file.

1) Download Sqoop using wget

Download the Sqoop tarball to your filesystem using wget.
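A typical download command looks like the following. The archive.apache.org URL is an assumption; verify it against the official Apache Sqoop download page before running.

```shell
# Assumed URL: the Apache archive mirror for Sqoop 1.99.7.
SQOOP_TARBALL=sqoop-1.99.7-bin-hadoop200.tar.gz
wget -c "https://archive.apache.org/dist/sqoop/1.99.7/${SQOOP_TARBALL}"

# Check that the file arrived and has a plausible size.
ls -lh "${SQOOP_TARBALL}"
```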


Check if the file got downloaded correctly.

2) Extract Sqoop tar file

Extract the downloaded file.

tar -xvf sqoop-1.99.7-bin-hadoop200.tar.gz

Check if the file got extracted correctly.

3) Move the Sqoop Directory

Move the sqoop directory to /usr/lib/

sudo mv sqoop-1.99.7-bin-hadoop200 /usr/lib/

The Sqoop server acts as a Hadoop client; therefore the Hadoop libraries (YARN, MapReduce, and HDFS jar files) and configuration files (core-site.xml, mapred-site.xml, and so on) must be available on this node.

4) Set Hadoop and Sqoop Environment Variables

You should have Hadoop environment variables set in .bashrc file.

# Set Hadoop-related environment variables
export HADOOP_HOME=$HOME/hadoop-2.7.3
export HADOOP_CONF_DIR=$HOME/hadoop-2.7.3/etc/hadoop
export HADOOP_MAPRED_HOME=$HOME/hadoop-2.7.3 
export HADOOP_COMMON_HOME=$HOME/hadoop-2.7.3 
export HADOOP_HDFS_HOME=$HOME/hadoop-2.7.3
export HADOOP_YARN_HOME=$HOME/hadoop-2.7.3

Also, set the Sqoop environment variables in the .bashrc file. Open it in an editor (no sudo needed, since .bashrc belongs to your user).

gedit ~/.bashrc

Put the below lines in the .bashrc file.

export SQOOP_HOME=/usr/lib/sqoop-1.99.7-bin-hadoop200
export PATH=$PATH:$SQOOP_HOME/bin
export SQOOP_CONF_DIR=$SQOOP_HOME/conf
export SQOOP_CLASS_PATH=$SQOOP_CONF_DIR

Use the below command to put the changes into effect.

source ~/.bashrc

5) Copy Required Jar Files to Sqoop Server lib Directory

Copy the hadoop-common, hadoop-mapreduce, hadoop-hdfs, and hadoop-yarn jars to /usr/lib/sqoop-1.99.7-bin-hadoop200/server/lib (the Sqoop server lib directory). Below are the paths from which you need to copy all the jars:

/home/ubuntu/hadoop-2.7.3/share/hadoop/common
/home/ubuntu/hadoop-2.7.3/share/hadoop/common/lib
/home/ubuntu/hadoop-2.7.3/share/hadoop/hdfs
/home/ubuntu/hadoop-2.7.3/share/hadoop/hdfs/lib
/home/ubuntu/hadoop-2.7.3/share/hadoop/mapreduce
/home/ubuntu/hadoop-2.7.3/share/hadoop/mapreduce/lib
/home/ubuntu/hadoop-2.7.3/share/hadoop/yarn
/home/ubuntu/hadoop-2.7.3/share/hadoop/yarn/lib
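The copy can be scripted in one loop. A sketch, assuming the Hadoop 2.7.3 and Sqoop install locations used earlier in this tutorial; adjust the two paths if yours differ.

```shell
# Assumed locations from the steps above.
HADOOP_SHARE=/home/ubuntu/hadoop-2.7.3/share/hadoop
SQOOP_SERVER_LIB=/usr/lib/sqoop-1.99.7-bin-hadoop200/server/lib

# Copy the jars from all eight directories into the Sqoop server lib.
for d in common common/lib hdfs hdfs/lib mapreduce mapreduce/lib yarn yarn/lib; do
  sudo cp "${HADOOP_SHARE}/${d}"/*.jar "${SQOOP_SERVER_LIB}/"
done
```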

6) Edit core-site.xml

The Sqoop server needs to impersonate users: it accesses HDFS and other resources in or outside of the cluster as the user who started a given job, rather than the user who is running the server. To allow this, you need to add the two Hadoop proxy-user properties (hadoop.proxyuser.&lt;username&gt;.hosts and hadoop.proxyuser.&lt;username&gt;.groups) for the server's user to Hadoop's core-site.xml.
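A minimal sketch of the two properties, assuming the Sqoop server runs as the ubuntu user; replace ubuntu with the actual username, and restrict the wildcard values in production:

```xml
<property>
  <name>hadoop.proxyuser.ubuntu.hosts</name>
  <value>*</value>
</property>
<property>
  <name>hadoop.proxyuser.ubuntu.groups</name>
  <value>*</value>
</property>
```

Restart the Hadoop daemons after editing core-site.xml so the change takes effect.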


7) Initialize Metadata Repository

The metadata repository needs to be initialized before starting the Sqoop 2 server for the first time. Run the upgrade tool from the Sqoop home directory (/usr/lib/sqoop-1.99.7-bin-hadoop200).

./bin/sqoop2-tool upgrade
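Sqoop 2 also ships a verify tool that sanity-checks the server configuration; running it after the upgrade tool can catch classpath and configuration problems before the first start. The install path below is the one used in step 3.

```shell
# Assumed install path from step 3 of this tutorial.
SQOOP_HOME=/usr/lib/sqoop-1.99.7-bin-hadoop200
cd "${SQOOP_HOME}" && ./bin/sqoop2-tool verify
```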

8) Start Sqoop Server

Start the Sqoop server (from the Sqoop home directory).

 ./bin/sqoop2-server start

Check if the sqoop server service has started.
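Two common checks, sketched below; the process-name pattern and the port are assumptions based on Sqoop 2 defaults, so compare against your conf/sqoop.properties.

```shell
# The server should appear as a Jetty-based JVM process in jps output.
jps | grep -i sqoop

# Assumed default REST port 12000; adjust if conf/sqoop.properties differs.
SQOOP_PORT=12000
curl -s "http://localhost:${SQOOP_PORT}/sqoop/version"
```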


9) Start Sqoop Client

To run the client on another machine, just copy the Sqoop distribution artifact to the target machine and unzip it in the desired location. Here the same machine is used as the client as well. Start the Sqoop client.
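A sketch of starting the interactive client, assuming the install path from step 3:

```shell
# Assumed install path from step 3.
SQOOP_HOME=/usr/lib/sqoop-1.99.7-bin-hadoop200
cd "${SQOOP_HOME}" && ./bin/sqoop2-shell

# Inside the interactive shell, point the client at the server and verify:
#   sqoop:000> set server --host localhost --port 12000 --webapp sqoop
#   sqoop:000> show version --all
```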


10) Download RDBMS Connectors

Download the MySQL, Oracle, and SQL Server connectors using the links below. These connectors are needed to make the connection between Sqoop and each RDBMS.

MySQL connector : Download
Oracle Connector : Download
Microsoft SQL Server Connector : Download

Check whether all the connectors got downloaded.

ls Downloads/

11) Set an Environment Variable to use RDBMS Connectors

Move all the connectors to a common directory and point the SQOOP_SERVER_EXTRA_LIB environment variable at it. (chmod 777 is used here for convenience; any mode that lets the Sqoop server user read the jars, such as 755, also works.)

sudo mkdir -p /var/lib/sqoop2/
sudo chmod 777 /var/lib/sqoop2/
mv Downloads/*.jar /var/lib/sqoop2/
ls -l /var/lib/sqoop2/
export SQOOP_SERVER_EXTRA_LIB=/var/lib/sqoop2/
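The export above only lasts for the current shell session. A sketch of making it permanent, using the same path as this step:

```shell
# Persist the setting for future shells.
echo 'export SQOOP_SERVER_EXTRA_LIB=/var/lib/sqoop2/' >> ~/.bashrc

# New shells pick it up from .bashrc; for the current shell, export directly.
export SQOOP_SERVER_EXTRA_LIB=/var/lib/sqoop2/
echo "${SQOOP_SERVER_EXTRA_LIB}"
```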


Voila! You have successfully set up Apache Sqoop on Ubuntu 16.04 and are ready to import and export data using Sqoop. The next step is to use any of the RDBMS connectors to move data from an RDBMS to HDFS, or from HDFS back to an RDBMS.

Comments

  1. Hello,

    Thank you for the tutorial.

    However, I am stuck at the step 7.

    the output is :
    hadoop_u1@tnguyen-Inspiron-7520:/usr/local/sqoop/bin$ sudo ./sqoop2-tool upgrade
    Setting conf dir: /usr/local/sqoop/bin/../conf
    Sqoop home directory: /usr/local/sqoop
    Can't load the Hadoop related java lib, please check the setting for the following environment variables:

    Here is my .bashrc file:
    export HADOOP_HOME=/usr/local/hadoop
    export HADOOP_CONF_DIR=$HADOOP_HOME/etc/hadoop
    export JAVA_HOME=/usr/lib/jvm/java-8-oracle
    export HADOOP_CLASSPATH=$JAVA_HOME/lib/tools.jar
    export SQOOP_HOME=/usr/local/sqoop
    export SQOOP_CONF_DIR=$SQOOP_HOME/conf

    from step 1 to 6, I followed what you put.

  2. Update- There seems to be a lot of conflict with the hadoop/share folder and the sqoop/lib folder. Is this normal? I had to delete a lot of jars to get it to the next point. Now it says that that hadoop/conf is either not a proper configuration directory or a permissions issue. I cancelled the permissions by making it available to everyone, and now it still throws the error that hadoop/conf is not a proper configuration directory.

