Frequently used HDFS shell commands

Apache Hadoop offers highly reliable, scalable, distributed processing of large data sets using simple programming models.

All HDFS commands are invoked by the bin/hdfs script.

Open a terminal window to the current working directory.


1. Print the Hadoop version
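A minimal sketch of the command:

```shell
# Print the version of the installed Hadoop distribution
hadoop version
```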

2. List the contents of the root directory in HDFS
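Using the standard fs shell:

```shell
# List the contents of the HDFS root directory
hadoop fs -ls /
```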

3. Report the amount of space used and available on the currently mounted filesystem
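A sketch of the command:

```shell
# Show capacity, used, and available space of the HDFS filesystem
hadoop fs -df hdfs:/
```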

4. Count the number of directories, files, and bytes under the paths that match the specified file pattern
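For example, counting everything under the root:

```shell
# Print: directory count, file count, content size, path
hadoop fs -count hdfs:/
```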

5. Run a DFS filesystem checking utility
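A sketch, checking the whole filesystem from the root (older releases use the equivalent `hadoop fsck /`):

```shell
# Check the health of HDFS blocks and replication starting at /
hdfs fsck /
```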

6. Run a cluster balancing utility
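A sketch of the command (older installations also ship a `start-balancer.sh` wrapper):

```shell
# Rebalance block placement across DataNodes
hdfs balancer
```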

7. Create a new directory named “hadoop” below the /user/training directory in HDFS. Since you’re currently logged in with the “training” user ID, /user/training is your home directory in HDFS.
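A sketch of the command:

```shell
# Create the directory; relative paths resolve under /user/training
hadoop fs -mkdir /user/training/hadoop
```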

8. Add a sample text file from the local directory named “data” to the new directory you created in HDFS during the previous step
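A sketch, assuming the local file is named data/sample.txt (the filename is illustrative):

```shell
# Copy a local file into the new HDFS directory
hadoop fs -put data/sample.txt /user/training/hadoop
```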

9. List the contents of this new directory in HDFS
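Using a path relative to the HDFS home directory:

```shell
# List the contents of /user/training/hadoop
hadoop fs -ls hadoop
```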

10. Add the entire local directory called “retail” to the /user/training directory in HDFS.
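A sketch of the command:

```shell
# -put copies local directories into HDFS recursively
hadoop fs -put retail /user/training/
```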

11. Since /user/training is your home directory in HDFS, any command that does not have an absolute path is interpreted as relative to that directory.  The next command will therefore list your home directory, and should show the items you’ve just added there.
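A sketch of the command:

```shell
# With no path argument, -ls lists your HDFS home directory
hadoop fs -ls
```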

12. See how much space this directory occupies in HDFS.
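A sketch of the command (the -h human-readable flag is available on recent Hadoop releases):

```shell
# -s prints a summarized total for the directory
hadoop fs -du -s -h hadoop/retail
```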

13. Delete a file ‘customers’ from the “retail” directory.
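A sketch of the command:

```shell
# Remove a single file from HDFS (it moves to trash if trash is enabled)
hadoop fs -rm hadoop/retail/customers
```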

14. Ensure this file is no longer in HDFS.
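A sketch of the check:

```shell
# Listing the deleted path should now report that it does not exist
hadoop fs -ls hadoop/retail/customers
```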

15. Delete all files from the “retail” directory using a wildcard.
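A sketch of the command:

```shell
# The wildcard is expanded against HDFS paths
hadoop fs -rm hadoop/retail/*
```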

16. To empty the trash
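A sketch of the command:

```shell
# Permanently delete files held in the HDFS trash
hadoop fs -expunge
```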

17. Finally, remove the entire retail directory and all of its contents in HDFS.
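A sketch of the command (older releases use the deprecated `-rmr` form):

```shell
# Recursively remove the directory and everything beneath it
hadoop fs -rm -r hadoop/retail
```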

18. List the hadoop directory again
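A sketch of the command:

```shell
# Confirm what remains in the hadoop directory
hadoop fs -ls hadoop
```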

19. Add the purchases.txt file from the local directory named “/home/training/” to the hadoop directory you created in HDFS
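A sketch of the command:

```shell
# Copy purchases.txt from the local filesystem into HDFS
hadoop fs -copyFromLocal /home/training/purchases.txt hadoop/
```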

20. View the contents of the text file purchases.txt in your hadoop directory.
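A sketch of the command:

```shell
# Stream the file's contents to stdout
hadoop fs -cat hadoop/purchases.txt
```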

21. Copy the purchases.txt file from the “hadoop” directory in HDFS to the directory “data” on your local filesystem
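A sketch of the command:

```shell
# Copy the file from HDFS back to the local "data" directory
hadoop fs -copyToLocal hadoop/purchases.txt /home/training/data
```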

22. Use ‘-cp’ to copy files between directories within HDFS
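A sketch, with an illustrative destination path:

```shell
# Both source and destination are HDFS paths
hadoop fs -cp hadoop/purchases.txt hadoop_backup/
```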

23. The ‘-get’ command can be used as an alternative to ‘-copyToLocal’
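A sketch of the command:

```shell
# -get behaves like -copyToLocal
hadoop fs -get hadoop/purchases.txt /home/training/data
```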

24. Display the last kilobyte of the file “purchases.txt” to stdout.
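A sketch of the command:

```shell
# Print the last kilobyte of the file
hadoop fs -tail hadoop/purchases.txt
```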

25. Default file permissions are 666 in HDFS. Use the ‘-chmod’ command to change the permissions of a file, then verify with ‘hadoop fs -ls hadoop/purchases.txt’.
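A sketch, with an illustrative mode:

```shell
# Restrict the file to owner read/write, then verify
hadoop fs -chmod 600 hadoop/purchases.txt
hadoop fs -ls hadoop/purchases.txt
```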

26. The default owner and group names are both “training”. Use ‘-chown’ to change the owner name and group name simultaneously, then verify with ‘hadoop fs -ls hadoop/purchases.txt’.
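A sketch, with illustrative names (changing ownership usually requires superuser privileges):

```shell
# Change owner and group in one step ("root:root" is illustrative)
hadoop fs -chown root:root hadoop/purchases.txt
hadoop fs -ls hadoop/purchases.txt
```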

27. The default group name is “training”. Use the ‘-chgrp’ command to change the group name, then verify with ‘hadoop fs -ls hadoop/purchases.txt’.
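A sketch, with an illustrative group name:

```shell
# Change only the group (the group "analysts" is illustrative)
hadoop fs -chgrp analysts hadoop/purchases.txt
hadoop fs -ls hadoop/purchases.txt
```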

28. Move a directory from one location to another
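A sketch, with an illustrative destination name:

```shell
# Rename/move the directory within HDFS
hadoop fs -mv hadoop apache_hadoop
```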

29. The default replication factor for a file is 3. Use the ‘-setrep’ command to change the replication factor of a file
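A sketch of the command:

```shell
# Set the replication factor to 2; -w waits until replication completes
hadoop fs -setrep -w 2 hadoop/purchases.txt
```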

30. Copy a directory from one node in the cluster to another. Use the ‘distcp’ command to copy, the -overwrite option to overwrite existing files, and the -update option to synchronize both directories.
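A sketch of the three forms (the NameNode URIs are illustrative):

```shell
# Distributed copy between clusters
hadoop distcp hdfs://namenodeA/apache_hadoop hdfs://namenodeB/hadoop

# Overwrite files that already exist at the destination
hadoop distcp -overwrite hdfs://namenodeA/apache_hadoop hdfs://namenodeB/hadoop

# Copy only files that are missing or out of date at the destination
hadoop distcp -update hdfs://namenodeA/apache_hadoop hdfs://namenodeB/hadoop
```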

31. Make the NameNode leave safe mode
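Safe mode is controlled through the dfsadmin tool, not the fs shell:

```shell
# Force the NameNode out of safe mode
hdfs dfsadmin -safemode leave
```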

32. List all the hadoop file system shell commands
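A sketch of the command:

```shell
# Running the fs shell with no arguments prints the full command list
hadoop fs
```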

33. Last but not least, always ask for help!
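A sketch of the command:

```shell
# Show usage for all commands, or for one in particular
hadoop fs -help
hadoop fs -help ls
```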

