How to Install Apache Sqoop on Ubuntu 16.04

Apache Sqoop is a tool designed for efficiently transferring bulk data between Apache Hadoop and structured datastores such as relational databases, for example MySQL, Oracle, or Microsoft SQL Server. You can import and export data between relational databases and Hadoop, and you can also import from / export to semi-structured data sources such as HBase and Cassandra (NoSQL databases). Sqoop ships as a single binary package that incorporates two separate parts - client and server.

  • Server - You need to install the server on a single node in your cluster. That node then serves as the entry point for all Sqoop clients.
  • Client - Clients can be installed on any number of machines.

Below are the steps to set up Apache Sqoop on Ubuntu 16.04. Download the required Sqoop package; it is the sqoop-1.99.7-bin-hadoop200.tar.gz archive.

1) Download Sqoop using wget

Download Sqoop onto your filesystem using the command below.
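The download command itself is missing above; a minimal sketch using the Apache release archive (mirror URLs change over time, so adjust the URL if this one is unavailable):

```shell
# Fetch the Sqoop 1.99.7 binary tarball from the Apache release archive
SQOOP_TGZ=sqoop-1.99.7-bin-hadoop200.tar.gz
wget "https://archive.apache.org/dist/sqoop/1.99.7/${SQOOP_TGZ}"
```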


Check that the file downloaded correctly (for example, with ls -lh).

2) Extract Sqoop tar file

Extract the downloaded file.

tar -xvf sqoop-1.99.7-bin-hadoop200.tar.gz

Check if the file got extracted correctly.
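A quick way to confirm the extraction; the directory name matches the release tarball:

```shell
# The tarball unpacks into a directory named after the release
ls -ld sqoop-1.99.7-bin-hadoop200
```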

3) Move the Sqoop Directory

Move the sqoop directory to /usr/lib/

sudo mv sqoop-1.99.7-bin-hadoop200 /usr/lib/

The Sqoop server acts as a Hadoop client; therefore the Hadoop libraries (YARN, MapReduce, and HDFS jar files) and configuration files (core-site.xml, mapred-site.xml, ...) must be available on this node.
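Before proceeding, a quick sanity check that the Hadoop bits are actually reachable on this node (the paths assume the Hadoop layout used later in this guide; adjust to your install):

```shell
# Verify the Hadoop configuration files are present on this node
ls "$HOME/hadoop-2.7.3/etc/hadoop/core-site.xml"

# Verify the Hadoop jars exist
ls "$HOME/hadoop-2.7.3/share/hadoop/common" | head
```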

4) Set Hadoop and Sqoop Environment Variables

You should already have the Hadoop-related environment variables set in your .bashrc file.

# Set Hadoop-related environment variables
export HADOOP_HOME=$HOME/hadoop-2.7.3
export HADOOP_CONF_DIR=$HOME/hadoop-2.7.3/etc/hadoop
export HADOOP_MAPRED_HOME=$HOME/hadoop-2.7.3 
export HADOOP_COMMON_HOME=$HOME/hadoop-2.7.3 
export HADOOP_HDFS_HOME=$HOME/hadoop-2.7.3
export HADOOP_YARN_HOME=$HOME/hadoop-2.7.3

Also, set the Sqoop environment variables in the .bashrc file.

gedit ~/.bashrc

Add the below lines to the .bashrc file.

export SQOOP_HOME=/usr/lib/sqoop-1.99.7-bin-hadoop200
export PATH=$PATH:$SQOOP_HOME/bin
export SQOOP_CONF_DIR=$SQOOP_HOME/conf
export SQOOP_CLASS_PATH=$SQOOP_CONF_DIR


Use below command to put the changes into effect.

source .bashrc
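After sourcing, the new variables should resolve in the current shell; a quick check:

```shell
# Confirm the Sqoop variables are now set
echo "$SQOOP_HOME"       # expect /usr/lib/sqoop-1.99.7-bin-hadoop200
echo "$SQOOP_CONF_DIR"   # expect $SQOOP_HOME/conf
```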

5) Copy Required Jar Files to Sqoop Server lib Directory

Copy the hadoop-common, hadoop-mapreduce, hadoop-hdfs, and hadoop-yarn jars to /usr/lib/sqoop-1.99.7-bin-hadoop200/server/lib (the Sqoop server lib directory). Below are the paths from which you need to copy all the jars into the Sqoop server lib directory.

/home/ubuntu/hadoop-2.7.3/share/hadoop/common
/home/ubuntu/hadoop-2.7.3/share/hadoop/common/lib
/home/ubuntu/hadoop-2.7.3/share/hadoop/hdfs
/home/ubuntu/hadoop-2.7.3/share/hadoop/hdfs/lib
/home/ubuntu/hadoop-2.7.3/share/hadoop/mapreduce
/home/ubuntu/hadoop-2.7.3/share/hadoop/mapreduce/lib
/home/ubuntu/hadoop-2.7.3/share/hadoop/yarn
/home/ubuntu/hadoop-2.7.3/share/hadoop/yarn/lib
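Copying each directory by hand is tedious; the eight paths above can be looped over instead (this assumes Hadoop under /home/ubuntu/hadoop-2.7.3 and the Sqoop directory from step 3; adjust both paths to your install):

```shell
# Copy all Hadoop jars into the Sqoop server lib directory
HADOOP_HOME=/home/ubuntu/hadoop-2.7.3
SQOOP_SERVER_LIB=/usr/lib/sqoop-1.99.7-bin-hadoop200/server/lib

for d in common common/lib hdfs hdfs/lib mapreduce mapreduce/lib yarn yarn/lib; do
  sudo cp "$HADOOP_HOME/share/hadoop/$d"/*.jar "$SQOOP_SERVER_LIB"/
done
```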


6) Edit core-site.xml

The Sqoop server will need to impersonate users to access HDFS and other resources in or outside of the cluster as the user who started a given job, rather than the user who is running the server. You need to configure Hadoop's core-site.xml and add the below two properties to it.
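The two properties themselves are missing above. Per the Sqoop 2 installation documentation, they are Hadoop's proxy-user (impersonation) settings; the property names contain the system user that runs the Sqoop server (shown here as sqoop2 - substitute your own user), and you can restrict hosts/groups instead of using * if you need tighter security:

```xml
<!-- in $HADOOP_CONF_DIR/core-site.xml; replace "sqoop2" with the user
     that runs the Sqoop server -->
<property>
  <name>hadoop.proxyuser.sqoop2.hosts</name>
  <value>*</value>
</property>
<property>
  <name>hadoop.proxyuser.sqoop2.groups</name>
  <value>*</value>
</property>
```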


7) Initialize Metadata Repository

The metadata repository needs to be initialized before starting Sqoop 2 server for the first time.

 ./bin/sqoop2-tool upgrade

8) Start Sqoop Server

Start the sqoop server.

 ./bin/sqoop2-server start

Check if the sqoop server service has started.
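The check command is missing above. Two common ways to verify (the Sqoop 2 server listens on port 12000 by default; adjust if you changed it):

```shell
# The server runs in a JVM, so jps should list its process
jps

# Or query the REST endpoint the Sqoop 2 server exposes
curl http://localhost:12000/sqoop/version
```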


9) Start Sqoop Client

To set up a client, just copy the Sqoop distribution artifact to the target machine and unzip it in the desired location; you can then start your client there. Here I am using the same machine as both client and server. Start the Sqoop client:
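The client command is missing above. Sqoop 2's interactive client is started with sqoop2-shell, and you point it at the server from inside the shell (the host and port shown are this guide's defaults):

```shell
cd /usr/lib/sqoop-1.99.7-bin-hadoop200
./bin/sqoop2-shell
# Inside the shell, connect to the server and verify:
#   sqoop:000> set server --host localhost --port 12000 --webapp sqoop
#   sqoop:000> show version --all
```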


10) Download RDBMS Connectors

Download the MySQL, Oracle, and Microsoft SQL Server connectors using the links below. These connectors are needed to make a connection between Sqoop and an RDBMS.

MySQL connector : Download
Oracle Connector : Download
Microsoft SQL Server Connector : Download

Check whether all the connectors got downloaded.

ls Downloads/

11) Set an Environment Variable to use RDBMS Connectors

Move all the connectors to one directory and point the SQOOP_SERVER_EXTRA_LIB environment variable at it (add the export line to your .bashrc as well if you want it to persist across sessions).

sudo mkdir -p /var/lib/sqoop2/
sudo chmod 777 /var/lib/sqoop2/
mv Downloads/*.jar /var/lib/sqoop2/
ls -l /var/lib/sqoop2/
export SQOOP_SERVER_EXTRA_LIB=/var/lib/sqoop2/


Voila! You have successfully set up Apache Sqoop on Ubuntu 16.04 and are now ready to import and export data using Sqoop. The next step is to use one of the RDBMS connectors to move data from an RDBMS to HDFS, or from HDFS to an RDBMS.
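As a taste of that next step, here is a sketch of setting up a first connection in the Sqoop 2 shell. The generic-jdbc-connector ships with Sqoop 1.99.7; your JDBC driver class, URL, and credentials will differ:

```shell
./bin/sqoop2-shell
# Inside the shell:
#   sqoop:000> show connector                              # list available connectors
#   sqoop:000> create link -connector generic-jdbc-connector
# ...then answer the prompts with your JDBC driver class, URL, and credentials.
```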

Linoxide 3:00 am




  1. Hello,

    Thank you for the tutorial.

    However, I am stuck at the step 7.

    The output is:
    hadoop_u1@tnguyen-Inspiron-7520:/usr/local/sqoop/bin$ sudo ./sqoop2-tool upgrade
    Setting conf dir: /usr/local/sqoop/bin/../conf
    Sqoop home directory: /usr/local/sqoop
    Can't load the Hadoop related java lib, please check the setting for the following environment variables:

    Here is my .bashrc file:
    export HADOOP_HOME=/usr/local/hadoop
    export HADOOP_CONF_DIR=$HADOOP_HOME/etc/hadoop
    export JAVA_HOME=/usr/lib/jvm/java-8-oracle
    export HADOOP_CLASSPATH=$JAVA_HOME/lib/tools.jar
    export SQOOP_HOME=/usr/local/sqoop
    export SQOOP_CONF_DIR=$SQOOP_HOME/conf

    from step 1 to 6, I followed what you put.

    1. Check if there is a derby jar in '/usr/lib/sqoop-1.99.7-bin-hadoop200/server/lib'; if there is, remove it. It will work.