HDFS is now an Apache Hadoop subproject. An HDFS instance consists of a large number of servers, each storing part of the file system's data. Files in HDFS are typically gigabytes to terabytes in size, so applications are expected to work with large data sets. Once a file is created, it does not need to be changed; that is, HDFS follows a write-once-read-many access model.
An HDFS cluster consists of a master server (the namenode) that manages the file system namespace and controls access to files. The other nodes in the cluster serve as datanodes, which manage the storage attached to them and are responsible for block creation, deletion, and replication as instructed by the namenode. HDFS is written in Java, so any machine that supports Java can run the namenode or datanode software.
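To see the namenode and datanode roles on a running cluster, a few standard commands can help. The following is only a sketch: the `run` helper and the path `/user/alice/data.csv` are hypothetical, added for illustration, and `hdfs dfsadmin -report` may need administrator privileges depending on how your cluster is set up.

```shell
#!/usr/bin/env bash
# Hypothetical dry-run helper (not part of Hadoop): print the command,
# and execute it only if its binary is found on the PATH.
run() {
  echo "+ $*"
  if command -v "$1" >/dev/null 2>&1; then "$@"; fi
}

# Hypothetical file path, used for illustration only.
TARGET="/user/alice/data.csv"

cluster_overview() {
  run hdfs getconf -namenodes                          # namenode host(s) from the configuration
  run hdfs dfsadmin -report                            # capacity and status of the datanodes
  run hdfs fsck "$TARGET" -files -blocks -locations    # blocks of one file and the datanodes holding them
}

cluster_overview
```

On a machine without an HDFS client, the script only prints the commands, so it is safe to try before pointing it at a real cluster.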
This post gives you a Hadoop HDFS command cheatsheet. It will come in very handy when you are working with commands on the Hadoop Distributed File System. Earlier, hadoop dfs was used in these commands; it is now deprecated, so we use hdfs dfs. (hadoop fs is still supported as the generic form that works with any file system Hadoop can access, not only HDFS.) Hadoop commands are invoked through the bin/hadoop script, and HDFS commands such as hdfs dfs through the bin/hdfs script. This cheatsheet covers almost all the commands that Hadoop developers and administrators use regularly, along with the options available for each command. If you get an error while running a command, do not panic: first check the syntax of your command, then check that the source and destination you specified are correct.
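When a command fails, the file system shell can print the expected syntax for you. A minimal sketch, using `ls` as the example command; the `run` helper is hypothetical and only keeps the script harmless on a machine without Hadoop installed:

```shell
#!/usr/bin/env bash
# Hypothetical dry-run helper (not part of Hadoop): print the command,
# and execute it only if its binary is found on the PATH.
run() {
  echo "+ $*"
  if command -v "$1" >/dev/null 2>&1; then "$@"; fi
}

check_syntax() {
  local cmd="$1"
  run hdfs dfs -usage "$cmd"   # short syntax summary for the command
  run hdfs dfs -help "$cmd"    # full description of the command and its options
}

check_syntax ls
```

Replace `ls` with whichever command gave you the error.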
We have grouped the commands into the following categories:
1) List Files
2) Read/Write Files
3) Upload/Download Files
4) File Management
5) Ownership and Validation
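As a quick preview, here is one representative command for each category. This is a sketch under assumptions: the local file `./report.txt`, the HDFS directory `/user/alice/demo`, the owner `alice:hadoop`, and the `run` helper are all hypothetical, and on a real cluster these commands create and modify files.

```shell
#!/usr/bin/env bash
# Hypothetical dry-run helper (not part of Hadoop): print the command,
# and execute it only if its binary is found on the PATH.
run() {
  echo "+ $*"
  if command -v "$1" >/dev/null 2>&1; then "$@"; fi
}

# Hypothetical names, used for illustration only.
SRC_LOCAL="./report.txt"
HDFS_DIR="/user/alice/demo"

cheatsheet_tour() {
  run hdfs dfs -mkdir -p "$HDFS_DIR"                        # 4) File Management: create a directory
  run hdfs dfs -put "$SRC_LOCAL" "$HDFS_DIR/"               # 3) Upload a local file
  run hdfs dfs -ls "$HDFS_DIR"                              # 1) List files
  run hdfs dfs -cat "$HDFS_DIR/report.txt"                  # 2) Read a file
  run hdfs dfs -get "$HDFS_DIR/report.txt" ./copy.txt       # 3) Download a file
  run hdfs dfs -chown alice:hadoop "$HDFS_DIR/report.txt"   # 5) Ownership: change owner and group
  run hdfs dfs -checksum "$HDFS_DIR/report.txt"             # 5) Validation: print the file checksum
}

cheatsheet_tour
```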
Print this A4-size cheatsheet and keep it on your desk; I am sure you will learn the commands quickly and become a Hadoop expert very soon. Please let us know if you would like us to add more commands. The commands are categorized into sections according to their usage.