30 Most Frequently Used Hadoop HDFS Shell Commands

In this tutorial, we will walk you through the Hadoop Distributed File System (HDFS) commands you will need to manage files on HDFS. These shell commands are what you will use most of the time when working with the Hadoop file system. They directly interact with HDFS as well as other file systems that Hadoop supports, and most of them behave like their corresponding Unix commands: error information is sent to stderr and output is sent to stdout. So, let's get started.

1) Version Check

To check the version of Hadoop.

ubuntu@master:~$ hadoop version

Hadoop 2.7.3

Subversion https://git-wip-us.apache.org/repos/asf/hadoop.git -r baa91f7c6bc9cb92be5982de4719c1c8af91ccff

Compiled by root on 2016-08-18T01:41Z

Compiled with protoc 2.5.0

From source with checksum 2e4ce5f957ea4db193bce3734ff29ff4

This command was run using /home/ubuntu/hadoop-2.7.3/share/hadoop/common/hadoop-common-2.7.3.jar

2) list Command

Lists all the files and directories under the given HDFS destination path.

ubuntu@master:~$ hdfs dfs -ls /

Found 3 items

drwxr-xr-x   - ubuntu supergroup          0 2016-11-07 01:11 /test

drwxr-xr-x   - ubuntu supergroup          0 2016-11-07 01:09 /tmp

drwxr-xr-x   - ubuntu supergroup          0 2016-11-07 01:09 /usr
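
You can also list a directory tree recursively by adding the -R flag; for example, to walk everything under /test:

ubuntu@master:~$ hdfs dfs -ls -R /test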

3) df Command

Displays free space at the given HDFS destination.

ubuntu@master:~$ hdfs dfs -df hdfs:/

Filesystem                Size   Used  Available  Use%

hdfs://master:9000  6206062592  32768  316289024    0%
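
To format the sizes in a human-readable way (e.g. GB and MB instead of raw bytes), add the -h flag:

ubuntu@master:~$ hdfs dfs -df -h hdfs:/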

4) count Command

Count the number of directories, files and bytes under the paths that match the specified file pattern.

ubuntu@master:~$ hdfs dfs -count hdfs:/

4            0                  0 hdfs:///
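
The output columns are, in order, DIR_COUNT, FILE_COUNT, CONTENT_SIZE and PATHNAME. Adding the -q flag extends the report with quota information:

ubuntu@master:~$ hdfs dfs -count -q hdfs:/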

5) fsck Command

HDFS Command to check the health of the Hadoop file system.

ubuntu@master:~$ hdfs fsck /

Connecting to namenode via http://master:50070/fsck?ugi=ubuntu&path=%2F

FSCK started by ubuntu (auth:SIMPLE) from /192.168.1.36 for path / at Mon Nov 07 01:23:54 GMT+05:30 2016

Status: HEALTHY

Total size:           0 B

Total dirs:           4

Total files:          0

Total symlinks:                 0

Total blocks (validated): 0

Minimally replicated blocks:        0

Over-replicated blocks:  0

Under-replicated blocks:              0

Mis-replicated blocks:                   0

Default replication factor:            2

Average block replication:            0.0

Corrupt blocks:                0

Missing replicas:                             0

Number of data-nodes:                1

Number of racks:                            1

FSCK ended at Mon Nov 07 01:23:54 GMT+05:30 2016 in 33 milliseconds

The filesystem under path '/' is HEALTHY
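
For a more detailed report, fsck can also list individual files and the blocks that make them up; for example:

ubuntu@master:~$ hdfs fsck /hadoop -files -blocks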

6) balancer Command

Run a cluster balancing utility.

ubuntu@master:~$ hdfs balancer

16/11/07 01:26:29 INFO balancer.Balancer: namenodes  = [hdfs://master:9000]

16/11/07 01:26:29 INFO balancer.Balancer: parameters = Balancer.Parameters[BalancingPolicy.Node, threshold=10.0, max idle iteration = 5, number of nodes to be excluded = 0, number of nodes to be included = 0]

Time Stamp               Iteration#  Bytes Already Moved  Bytes Left To Move  Bytes Being Moved

16/11/07 01:26:38 INFO net.NetworkTopology: Adding a new node: /default-rack/192.168.1.36:50010

16/11/07 01:26:38 INFO balancer.Balancer: 0 over-utilized: []

16/11/07 01:26:38 INFO balancer.Balancer: 0 underutilized: []

The cluster is balanced. Exiting...

7 Nov, 2016 1:26:38 AM            0                  0 B                 0 B               -1 B

7 Nov, 2016 1:26:39 AM   Balancing took 13.153 seconds
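
By default the balancer considers a node balanced when its utilization is within 10% of the cluster average. You can tighten that with the -threshold option (5 here is an illustrative value):

ubuntu@master:~$ hdfs balancer -threshold 5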

7) mkdir Command

HDFS Command to create the directory in HDFS.

ubuntu@master:~$ hdfs dfs -mkdir /hadoop

ubuntu@master:~$ hdfs dfs -ls /

Found 5 items

drwxr-xr-x   - ubuntu supergroup          0 2016-11-07 01:29 /hadoop

drwxr-xr-x   - ubuntu supergroup          0 2016-11-07 01:26 /system

drwxr-xr-x   - ubuntu supergroup          0 2016-11-07 01:11 /test

drwxr-xr-x   - ubuntu supergroup          0 2016-11-07 01:09 /tmp

drwxr-xr-x   - ubuntu supergroup          0 2016-11-07 01:09 /usr
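
To create nested directories in a single step, add the -p flag, which creates any missing parent directories along the path (the path below is illustrative):

ubuntu@master:~$ hdfs dfs -mkdir -p /hadoop/dir1/dir2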

8) put Command

File

Copies a file from a single source, or multiple sources, from the local file system to the destination file system; a multi-source example appears at the end of this section.

ubuntu@master:~$ hdfs dfs -put test /hadoop

ubuntu@master:~$ hdfs dfs -ls /hadoop

Found 1 items

-rw-r--r--   2 ubuntu supergroup         16 2016-11-07 01:35 /hadoop/test

Directory

HDFS Command to copy a directory from the local file system to the destination file system; as with files, multiple sources are allowed.

ubuntu@master:~$ hdfs dfs -put hello /hadoop/

ubuntu@master:~$ hdfs dfs -ls /hadoop

Found 2 items

drwxr-xr-x   - ubuntu supergroup          0 2016-11-07 01:43 /hadoop/hello

-rw-r--r--   2 ubuntu supergroup         16 2016-11-07 01:35 /hadoop/test
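
Whether you are copying files or directories, put accepts multiple sources at once as long as the destination is a directory; for example, assuming local files file1 and file2 exist:

ubuntu@master:~$ hdfs dfs -put file1 file2 /hadoop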

9) du Command

Displays the size of files and directories contained in the given directory, or the size of a file if it's just a file.

ubuntu@master:~$ hdfs dfs -du /

59  /hadoop

0   /system

0   /test

0   /tmp

0   /usr
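
The sizes are reported in bytes; add the -h flag to format them in a human-readable way:

ubuntu@master:~$ hdfs dfs -du -h /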

10) rm Command

HDFS Command to remove the file from HDFS.

ubuntu@master:~$ hdfs dfs -rm /hadoop/test

16/11/07 01:53:29 INFO fs.TrashPolicyDefault: Namenode trash configuration: Deletion interval = 0 minutes, Emptier interval = 0 minutes.

Deleted /hadoop/test
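
If trash is enabled and you want the file deleted immediately instead of being moved to the trash directory, add the -skipTrash option:

ubuntu@master:~$ hdfs dfs -rm -skipTrash /hadoop/test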

11) expunge Command

HDFS Command that makes the trash empty.

ubuntu@master:~$ hdfs dfs -expunge

16/11/07 01:55:54 INFO fs.TrashPolicyDefault: Namenode trash configuration: Deletion interval = 0 minutes, Emptier interval = 0 minutes.

12) rm -r Command

HDFS Command to remove the entire directory and all of its content from HDFS.

ubuntu@master:~$ hdfs dfs -rm -r /hadoop/hello

16/11/07 01:58:52 INFO fs.TrashPolicyDefault: Namenode trash configuration: Deletion interval = 0 minutes, Emptier interval = 0 minutes.

Deleted /hadoop/hello

13) chmod Command

Change the permissions of files.

ubuntu@master:~$ hdfs dfs -chmod 777 /hadoop

ubuntu@master:~$ hdfs dfs -ls /

Found 5 items

drwxrwxrwx   - ubuntu supergroup          0 2016-11-07 01:58 /hadoop

drwxr-xr-x   - ubuntu supergroup          0 2016-11-07 01:26 /system

drwxr-xr-x   - ubuntu supergroup          0 2016-11-07 01:11 /test

drwxr-xr-x   - ubuntu supergroup          0 2016-11-07 01:09 /tmp

drwxr-xr-x   - ubuntu supergroup          0 2016-11-07 01:09 /usr
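
Like its Unix counterpart, chmod accepts a -R flag to apply the permission change recursively to a directory tree:

ubuntu@master:~$ hdfs dfs -chmod -R 777 /hadoop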

14) get Command

HDFS Command to copy files from HDFS to the local file system.

ubuntu@master:~$ hdfs dfs -get /hadoop/test /home/ubuntu/Desktop/

ubuntu@master:~$ ls -l /home/ubuntu/Desktop/

total 4

-rw-r--r-- 1 ubuntu ubuntu 16 Nov  8 00:47 test

15) cat Command

HDFS Command that copies source paths to stdout.

ubuntu@master:~$ hdfs dfs -cat /hadoop/test

This is a test.
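
cat accepts several paths and prints them one after another; for example, assuming the file /hadoop/new from later in this tutorial also exists:

ubuntu@master:~$ hdfs dfs -cat /hadoop/test /hadoop/new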

16) touchz Command

HDFS Command to create a file in HDFS with file size 0 bytes.

ubuntu@master:~$ hdfs dfs -touchz /hadoop/sample

ubuntu@master:~$ hdfs dfs -ls /hadoop

Found 2 items

-rw-r--r--   2 ubuntu supergroup          0 2016-11-08 00:57 /hadoop/sample

-rw-r--r--   2 ubuntu supergroup         16 2016-11-08 00:45 /hadoop/test

17) text Command

HDFS Command that takes a source file and outputs the file in text format.

ubuntu@master:~$ hdfs dfs -text /hadoop/test

This is a test.

18) copyFromLocal Command

HDFS Command to copy the file from Local file system to HDFS.

ubuntu@master:~$ hdfs dfs -copyFromLocal /home/ubuntu/new /hadoop

ubuntu@master:~$ hdfs dfs -ls /hadoop

Found 3 items

-rw-r--r--   2 ubuntu supergroup         43 2016-11-08 01:08 /hadoop/new

-rw-r--r--   2 ubuntu supergroup          0 2016-11-08 00:57 /hadoop/sample

-rw-r--r--   2 ubuntu supergroup         16 2016-11-08 00:45 /hadoop/test

19) copyToLocal Command

Similar to the get command, except that the destination is restricted to a local file reference.

ubuntu@master:~$ hdfs dfs -copyToLocal /hadoop/sample /home/ubuntu/

ubuntu@master:~$ ls -l s*

-rw-r--r-- 1 ubuntu ubuntu         0 Nov  8 01:12 sample

-rw-rw-r-- 1 ubuntu ubuntu 102436055 Jul 20 04:47 sqoop-1.99.7-bin-hadoop200.tar.gz

20) mv Command

HDFS Command to move files from source to destination. This command allows multiple sources as well, in which case the destination needs to be a directory.

ubuntu@master:~$ hdfs dfs -mv /hadoop/sample /tmp

ubuntu@master:~$ hdfs dfs -ls /tmp

Found 1 items

-rw-r--r--   2 ubuntu supergroup          0 2016-11-08 00:57 /tmp/sample

21) cp Command

HDFS Command to copy files from source to destination. This command allows multiple sources as well, in which case the destination must be a directory.

ubuntu@master:~$ hdfs dfs -cp /tmp/sample /usr

ubuntu@master:~$ hdfs dfs -ls /usr

Found 1 items

-rw-r--r--   2 ubuntu supergroup          0 2016-11-08 01:22 /usr/sample

22) tail Command

Displays the last kilobyte of the file "new" to stdout.

ubuntu@master:~$ hdfs dfs -tail /hadoop/new

This is a new file.

Running HDFS commands.
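
As in Unix, the -f flag makes tail follow the file and print new data as it is appended:

ubuntu@master:~$ hdfs dfs -tail -f /hadoop/new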

23) chown Command

HDFS command to change the owner of files.

ubuntu@master:~$ hdfs dfs -chown root:root /tmp

ubuntu@master:~$ hdfs dfs -ls /

Found 5 items

drwxrwxrwx   - ubuntu supergroup          0 2016-11-08 01:17 /hadoop

drwxr-xr-x   - ubuntu supergroup          0 2016-11-07 01:26 /system

drwxr-xr-x   - ubuntu supergroup          0 2016-11-07 01:11 /test

drwxr-xr-x   - root   root                0 2016-11-08 01:17 /tmp

drwxr-xr-x   - ubuntu supergroup          0 2016-11-08 01:22 /usr
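
chown also supports a -R flag to change the owner recursively for a whole directory tree:

ubuntu@master:~$ hdfs dfs -chown -R root:root /tmp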

24) setrep Command

The default replication factor for a file in HDFS is 3 (the cluster in this tutorial is configured with 2, as the fsck output above shows). The HDFS command below changes the replication factor of a file; the -w flag waits until the replication completes.

ubuntu@master:~$ hdfs dfs -setrep -w 2 /usr/sample

Replication 2 set: /usr/sample

Waiting for /usr/sample ... done
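
You can verify the new replication factor with the stat command's %r format specifier (covered below), which should print 2 for this file:

ubuntu@master:~$ hdfs dfs -stat %r /usr/sample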

25) distcp Command

Copies a directory from one cluster (or path) to another. Unlike cp, distcp runs as a MapReduce job, which makes it suitable for copying large amounts of data in parallel. Note that it is invoked through the hadoop command rather than hdfs dfs:

ubuntu@master:~$ hadoop distcp hdfs://namenodeA/apache_hadoop hdfs://namenodeB/hadoop

26) stat Command

Print statistics about the file/directory at <path> in the specified format. The format accepts the file size in bytes (%b), type (%F), group name of owner (%g), name (%n), block size (%o), replication (%r), user name of owner (%u), and modification date (%y, %Y). %y shows the UTC date as "yyyy-MM-dd HH:mm:ss" and %Y shows milliseconds since January 1, 1970 UTC. If the format is not specified, %y is used by default.

ubuntu@master:~$ hdfs dfs -stat "%F %u:%g %b %y %n" /hadoop/test

regular file ubuntu:supergroup 16 2016-11-07 19:15:22 test

27) getfacl Command

Displays the Access Control Lists (ACLs) of files and directories. If a directory has a default ACL, then getfacl also displays the default ACL.

ubuntu@master:~$ hdfs dfs -getfacl /hadoop

# file: /hadoop

# owner: ubuntu

# group: supergroup
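
getfacl also accepts a -R flag to list the ACLs of all files and directories recursively:

ubuntu@master:~$ hdfs dfs -getfacl -R /hadoop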

28) du -s Command

Displays a summary of file lengths.

ubuntu@master:~$ hdfs dfs -du -s /hadoop

59  /hadoop

29) checksum Command

Returns the checksum information of a file.

ubuntu@master:~$ hdfs dfs -checksum /hadoop/new

/hadoop/new     MD5-of-0MD5-of-512CRC32C               000002000000000000000000639a5d8ac275be8d0c2b055d75208265

30) getmerge Command

Takes a source directory and a destination file as input and concatenates files in src into the destination local file.

ubuntu@master:~$ cat test

This is a test.

ubuntu@master:~$ hdfs dfs -cat /hadoop/new

This is a new file.

Running HDFS commands.

ubuntu@master:~$ hdfs dfs -getmerge /hadoop/new test

ubuntu@master:~$ cat test

This is a new file.

Running HDFS commands.
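
getmerge is typically given a directory as the source, in which case all the files inside it are concatenated into the local destination file:

ubuntu@master:~$ hdfs dfs -getmerge /hadoop merged.txt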

Conclusion

This is the end of the HDFS command blog; we hope it was informative and that you were able to execute all the commands. We learned to create, upload and list the contents of our HDFS directories. We also acquired the skills to download files from HDFS to our local file system and explored a few advanced features of HDFS file management using the command line.
