30 Most Frequently Used Hadoop HDFS Shell Commands

November 11, 2016 | in UBUNTU HOWTO

In this tutorial, we will walk you through the Hadoop Distributed File System (HDFS) commands you will need to manage files on HDFS. The hdfs dfs command is the one you will use most of the time when working with the Hadoop file system. It provides various shell-like commands that interact directly with HDFS, as well as with the other file systems Hadoop supports. Most of the commands behave like their Unix counterparts: output is sent to stdout and error information to stderr. So, let's get started.

1) Version Check

To check the version of Hadoop.

ubuntu@ubuntu-VirtualBox:~$ hadoop version

Hadoop 2.7.3

Subversion https://git-wip-us.apache.org/repos/asf/hadoop.git -r baa91f7c6bc9cb92be5982de4719c1c8af91ccff

Compiled by root on 2016-08-18T01:41Z

Compiled with protoc 2.5.0

From source with checksum 2e4ce5f957ea4db193bce3734ff29ff4

This command was run using /home/ubuntu/hadoop-2.7.3/share/hadoop/common/hadoop-common-2.7.3.jar

2) ls Command

Lists all the files and directories under the given HDFS path.

ubuntu@ubuntu-VirtualBox:~$ hdfs dfs -ls /

Found 3 items

drwxr-xr-x   - ubuntu supergroup          0 2016-11-07 01:11 /test

drwxr-xr-x   - ubuntu supergroup          0 2016-11-07 01:09 /tmp

drwxr-xr-x   - ubuntu supergroup          0 2016-11-07 01:09 /usr
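The listing format mirrors Unix ls -l: permissions, replication factor (a dash for directories), owner, group, size in bytes, modification date and time, and path. As a small local aside (not an HDFS call), such output can be post-processed with standard Unix tools; here awk pulls out just the path column from a saved copy of the listing above (ls_output.txt is a hypothetical scratch file):

```shell
# Save a copy of the listing shown above into a scratch file.
cat > ls_output.txt <<'EOF'
drwxr-xr-x   - ubuntu supergroup          0 2016-11-07 01:11 /test
drwxr-xr-x   - ubuntu supergroup          0 2016-11-07 01:09 /tmp
drwxr-xr-x   - ubuntu supergroup          0 2016-11-07 01:09 /usr
EOF

# Field 8 of each line is the path.
awk '{print $8}' ls_output.txt
```

The same pipeline works directly on live output, e.g. hdfs dfs -ls / | awk '{print $8}'.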

3) df Command

Displays the free space at the given HDFS destination.

ubuntu@ubuntu-VirtualBox:~$ hdfs dfs -df hdfs:/

Filesystem                Size   Used  Available  Use%

hdfs://master:9000  6206062592  32768  316289024    0%

4) count Command

Counts the number of directories, files, and bytes under the paths that match the specified file pattern. The output columns are DIR_COUNT, FILE_COUNT, CONTENT_SIZE, and PATHNAME.

ubuntu@ubuntu-VirtualBox:~$ hdfs dfs -count hdfs:/

4            0                  0 hdfs:///

5) fsck Command

HDFS Command to check the health of the Hadoop file system.

ubuntu@ubuntu-VirtualBox:~$ hdfs fsck /

Connecting to namenode via http://master:50070/fsck?ugi=ubuntu&path=%2F

FSCK started by ubuntu (auth:SIMPLE) from /192.168.1.36 for path / at Mon Nov 07 01:23:54 GMT+05:30 2016

Status: HEALTHY

Total size:           0 B

Total dirs:           4

Total files:          0

Total symlinks:                 0

Total blocks (validated): 0

Minimally replicated blocks:        0

Over-replicated blocks:  0

Under-replicated blocks:              0

Mis-replicated blocks:                   0

Default replication factor:            2

Average block replication:            0.0

Corrupt blocks:                0

Missing replicas:                             0

Number of data-nodes:                1

Number of racks:                            1

FSCK ended at Mon Nov 07 01:23:54 GMT+05:30 2016 in 33 milliseconds

The filesystem under path '/' is HEALTHY

6) balancer Command

Run a cluster balancing utility.

ubuntu@ubuntu-VirtualBox:~$ hdfs balancer

16/11/07 01:26:29 INFO balancer.Balancer: namenodes  = [hdfs://master:9000]

16/11/07 01:26:29 INFO balancer.Balancer: parameters = Balancer.Parameters[BalancingPolicy.Node, threshold=10.0, max idle iteration = 5, number of nodes to be excluded = 0, number of nodes to be included = 0]

Time Stamp               Iteration#  Bytes Already Moved  Bytes Left To Move  Bytes Being Moved

16/11/07 01:26:38 INFO net.NetworkTopology: Adding a new node: /default-rack/192.168.1.36:50010

16/11/07 01:26:38 INFO balancer.Balancer: 0 over-utilized: []

16/11/07 01:26:38 INFO balancer.Balancer: 0 underutilized: []

The cluster is balanced. Exiting...

7 Nov, 2016 1:26:38 AM            0                  0 B                 0 B               -1 B

7 Nov, 2016 1:26:39 AM   Balancing took 13.153 seconds

7) mkdir Command

HDFS Command to create the directory in HDFS.

ubuntu@ubuntu-VirtualBox:~$ hdfs dfs -mkdir /hadoop

ubuntu@ubuntu-VirtualBox:~$ hdfs dfs -ls /

Found 5 items

drwxr-xr-x   - ubuntu supergroup          0 2016-11-07 01:29 /hadoop

drwxr-xr-x   - ubuntu supergroup          0 2016-11-07 01:26 /system

drwxr-xr-x   - ubuntu supergroup          0 2016-11-07 01:11 /test

drwxr-xr-x   - ubuntu supergroup          0 2016-11-07 01:09 /tmp

drwxr-xr-x   - ubuntu supergroup          0 2016-11-07 01:09 /usr

8) put Command

File

Copies a single file, or multiple source files, from the local file system to the destination file system.

ubuntu@ubuntu-VirtualBox:~$ hdfs dfs -put test /hadoop

ubuntu@ubuntu-VirtualBox:~$ hdfs dfs -ls /hadoop

Found 1 items

-rw-r--r--   2 ubuntu supergroup         16 2016-11-07 01:35 /hadoop/test

Directory

HDFS command to copy a directory (or multiple source directories) from the local file system to the destination file system.

ubuntu@ubuntu-VirtualBox:~$ hdfs dfs -put hello /hadoop/

ubuntu@ubuntu-VirtualBox:~$ hdfs dfs -ls /hadoop

Found 2 items

drwxr-xr-x   - ubuntu supergroup          0 2016-11-07 01:43 /hadoop/hello

-rw-r--r--   2 ubuntu supergroup         16 2016-11-07 01:35 /hadoop/test

9) du Command

Displays the size of the files and directories contained in the given directory, or the size of a file if it is just a file.

ubuntu@ubuntu-VirtualBox:~$ hdfs dfs -du /

59  /hadoop

0   /system

0   /test

0   /tmp

0   /usr

10) rm Command

HDFS Command to remove the file from HDFS.

ubuntu@ubuntu-VirtualBox:~$ hdfs dfs -rm /hadoop/test

16/11/07 01:53:29 INFO fs.TrashPolicyDefault: Namenode trash configuration: Deletion interval = 0 minutes, Emptier interval = 0 minutes.

Deleted /hadoop/test

11) expunge Command

HDFS command that empties the trash. Note that with a deletion interval of 0 minutes, as the log line below shows, the trash feature is effectively disabled and deleted files are removed immediately.

ubuntu@ubuntu-VirtualBox:~$ hdfs dfs -expunge

16/11/07 01:55:54 INFO fs.TrashPolicyDefault: Namenode trash configuration: Deletion interval = 0 minutes, Emptier interval = 0 minutes.

12) rm -r Command

HDFS Command to remove the entire directory and all of its content from HDFS.

ubuntu@ubuntu-VirtualBox:~$ hdfs dfs -rm -r /hadoop/hello

16/11/07 01:58:52 INFO fs.TrashPolicyDefault: Namenode trash configuration: Deletion interval = 0 minutes, Emptier interval = 0 minutes.

Deleted /hadoop/hello

13) chmod Command

Change the permissions of files.

ubuntu@ubuntu-VirtualBox:~$ hdfs dfs -chmod 777 /hadoop

ubuntu@ubuntu-VirtualBox:~$ hdfs dfs -ls /

Found 5 items

drwxrwxrwx   - ubuntu supergroup          0 2016-11-07 01:58 /hadoop

drwxr-xr-x   - ubuntu supergroup          0 2016-11-07 01:26 /system

drwxr-xr-x   - ubuntu supergroup          0 2016-11-07 01:11 /test

drwxr-xr-x   - ubuntu supergroup          0 2016-11-07 01:09 /tmp

drwxr-xr-x   - ubuntu supergroup          0 2016-11-07 01:09 /usr
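The 777 above is an octal permission specification, exactly as in Unix chmod: each digit is the sum of read (4), write (2), and execute (1) for the owner, group, and others respectively. As a quick local demonstration of the same octal notation (plain Unix chmod on a hypothetical scratch file, not an HDFS call):

```shell
# Create a scratch file and apply an octal mode.
touch scratch.txt
chmod 640 scratch.txt        # owner: rw (4+2), group: r (4), others: none

# GNU stat prints the mode back in octal.
stat -c '%a' scratch.txt
```

hdfs dfs -chmod accepts the same octal modes, as well as symbolic ones such as u+x.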

14) get Command

HDFS Command to copy files from hdfs to the local file system.

ubuntu@ubuntu-VirtualBox:~$ hdfs dfs -get /hadoop/test /home/ubuntu/Desktop/

ubuntu@ubuntu-VirtualBox:~$ ls -l /home/ubuntu/Desktop/

total 4

-rw-r--r-- 1 ubuntu ubuntu 16 Nov  8 00:47 test

15) cat Command

HDFS Command that copies source paths to stdout.

ubuntu@ubuntu-VirtualBox:~$ hdfs dfs -cat /hadoop/test

This is a test.

16) touchz Command

HDFS Command to create a file in HDFS with file size 0 bytes.

ubuntu@ubuntu-VirtualBox:~$ hdfs dfs -touchz /hadoop/sample

ubuntu@ubuntu-VirtualBox:~$ hdfs dfs -ls /hadoop

Found 2 items

-rw-r--r--   2 ubuntu supergroup          0 2016-11-08 00:57 /hadoop/sample

-rw-r--r--   2 ubuntu supergroup         16 2016-11-08 00:45 /hadoop/test

17) text Command

HDFS command that takes a source file and outputs the file in text format. Unlike cat, it can also decode compressed files and Hadoop SequenceFiles into readable text.

ubuntu@ubuntu-VirtualBox:~$ hdfs dfs -text /hadoop/test

This is a test.

18) copyFromLocal Command

HDFS Command to copy the file from Local file system to HDFS.

ubuntu@ubuntu-VirtualBox:~$ hdfs dfs -copyFromLocal /home/ubuntu/new /hadoop

ubuntu@ubuntu-VirtualBox:~$ hdfs dfs -ls /hadoop

Found 3 items

-rw-r--r--   2 ubuntu supergroup         43 2016-11-08 01:08 /hadoop/new

-rw-r--r--   2 ubuntu supergroup          0 2016-11-08 00:57 /hadoop/sample

-rw-r--r--   2 ubuntu supergroup         16 2016-11-08 00:45 /hadoop/test

19) copyToLocal Command

Similar to the get command, except that the destination is restricted to a local file reference.

ubuntu@ubuntu-VirtualBox:~$ hdfs dfs -copyToLocal /hadoop/sample /home/ubuntu/

ubuntu@ubuntu-VirtualBox:~$ ls -l s*

-rw-r--r-- 1 ubuntu ubuntu         0 Nov  8 01:12 sample

-rw-rw-r-- 1 ubuntu ubuntu 102436055 Jul 20 04:47 sqoop-1.99.7-bin-hadoop200.tar.gz

20) mv Command

HDFS Command to move files from source to destination. This command allows multiple sources as well, in which case the destination needs to be a directory.

ubuntu@ubuntu-VirtualBox:~$ hdfs dfs -mv /hadoop/sample /tmp

ubuntu@ubuntu-VirtualBox:~$ hdfs dfs -ls /tmp

Found 1 items

-rw-r--r--   2 ubuntu supergroup          0 2016-11-08 00:57 /tmp/sample

21) cp Command

HDFS Command to copy files from source to destination. This command allows multiple sources as well, in which case the destination must be a directory.

ubuntu@ubuntu-VirtualBox:~$ hdfs dfs -cp /tmp/sample /usr

ubuntu@ubuntu-VirtualBox:~$ hdfs dfs -ls /usr

Found 1 items

-rw-r--r--   2 ubuntu supergroup          0 2016-11-08 01:22 /usr/sample

22) tail Command

Displays the last kilobyte of the file "new" on stdout.

ubuntu@ubuntu-VirtualBox:~$ hdfs dfs -tail /hadoop/new

This is a new file.

Running HDFS commands.

23) chown Command

HDFS command to change the owner of files.

ubuntu@ubuntu-VirtualBox:~$ hdfs dfs -chown root:root /tmp

ubuntu@ubuntu-VirtualBox:~$ hdfs dfs -ls /

Found 5 items

drwxrwxrwx   - ubuntu supergroup          0 2016-11-08 01:17 /hadoop

drwxr-xr-x   - ubuntu supergroup          0 2016-11-07 01:26 /system

drwxr-xr-x   - ubuntu supergroup          0 2016-11-07 01:11 /test

drwxr-xr-x   - root   root                0 2016-11-08 01:17 /tmp

drwxr-xr-x   - ubuntu supergroup          0 2016-11-08 01:22 /usr

24) setrep Command

The default replication factor for a file is 3, although this cluster is configured with a default of 2, as the fsck output above shows. The HDFS command below changes the replication factor of a file; the -w flag waits until the new replication level is reached.

ubuntu@ubuntu-VirtualBox:~$ hdfs dfs -setrep -w 2 /usr/sample

Replication 2 set: /usr/sample

Waiting for /usr/sample ... done

25) distcp Command

Copies a directory in parallel (using MapReduce), either within a cluster or between clusters. Note that distcp is a standalone tool launched with the hadoop command rather than a dfs subcommand.

ubuntu@ubuntu-VirtualBox:~$ hadoop distcp hdfs://namenodeA/apache_hadoop hdfs://namenodeB/hadoop

26) stat Command

Prints statistics about the file/directory at <path> in the specified format. The format accepts the file size in bytes (%b), type (%F), group name of the owner (%g), name (%n), block size (%o), replication (%r), user name of the owner (%u), and modification date (%y, %Y). %y shows the UTC date as "yyyy-MM-dd HH:mm:ss" and %Y shows milliseconds since January 1, 1970 UTC. If no format is specified, %y is used by default.

ubuntu@ubuntu-VirtualBox:~$ hdfs dfs -stat "%F %u:%g %b %y %n" /hadoop/test

regular file ubuntu:supergroup 16 2016-11-07 19:15:22 test
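The relation between %y and %Y is simply seconds since the epoch multiplied by 1000. As a local illustration (GNU date on Ubuntu, not an HDFS call), the modification date shown above converts like this:

```shell
# Convert the UTC timestamp printed by %y into epoch seconds (GNU date).
secs=$(date -u -d '2016-11-07 19:15:22' +%s)

# %Y would report the same instant in milliseconds since the epoch.
echo $((secs * 1000))
```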

27) getfacl Command

Displays the Access Control Lists (ACLs) of files and directories. If a directory has a default ACL, then getfacl also displays the default ACL.

ubuntu@ubuntu-VirtualBox:~$ hdfs dfs -getfacl /hadoop

# file: /hadoop

# owner: ubuntu

# group: supergroup

28) du -s Command

Displays a summary of file lengths.

ubuntu@ubuntu-VirtualBox:~$ hdfs dfs -du -s /hadoop

59  /hadoop

29) checksum Command

Returns the checksum information of a file.

ubuntu@ubuntu-VirtualBox:~$ hdfs dfs -checksum /hadoop/new

/hadoop/new     MD5-of-0MD5-of-512CRC32C               000002000000000000000000639a5d8ac275be8d0c2b055d75208265

30) getmerge Command

Takes a source directory and a destination file as input, and concatenates the files in the source into the destination local file.

ubuntu@ubuntu-VirtualBox:~$ cat test

This is a test.

ubuntu@ubuntu-VirtualBox:~$ hdfs dfs -cat /hadoop/new

This is a new file.

Running HDFS commands.

ubuntu@ubuntu-VirtualBox:~$ hdfs dfs -getmerge /hadoop/new test

ubuntu@ubuntu-VirtualBox:~$ cat test

This is a new file.

Running HDFS commands.
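In the single-file case above, getmerge simply overwrites the local test file with the contents of /hadoop/new. Its multi-file behavior amounts to downloading every file in the source and concatenating them, which can be sketched with plain cat (a local simulation with hypothetical part files, not an HDFS call):

```shell
# Simulate a source "directory" with two part files, as MapReduce jobs produce.
mkdir -p parts
printf 'line from part-00000\n' > parts/part-00000
printf 'line from part-00001\n' > parts/part-00001

# getmerge concatenates the source files into one local file;
# plain cat over the file list gives the same result locally.
cat parts/part-00000 parts/part-00001 > merged.txt
cat merged.txt
```

This is why getmerge is handy for collecting a job's part-00000, part-00001, ... outputs into a single local file.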

Conclusion

This brings us to the end of this HDFS command tutorial; we hope it was informative and that you were able to execute all the commands. We learned to create, upload, and list the contents of our HDFS directories, acquired the skills to download files from HDFS to the local file system, and explored a few advanced features of HDFS file management from the command line.
