Hi all, today we're going to learn how to find and remove duplicate files on your Linux PC or server. Here are several tools; you can use any one of them according to your needs and comfort.
Whether you’re using Linux on your desktop or a server, there are good tools that will scan your system for duplicate files and help you remove them to free up space. Solid graphical and command-line interfaces are both available. Duplicate files are an unnecessary waste of disk space. After all, if you really need the same file in two different locations you could always set up a symbolic link or hard link, storing the data in only one location on disk.
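If you really do need the same file in two places, a minimal sketch of the link approach looks like the commands below; the paths are just placeholders for this example:

$ ln /data/report.pdf /backup/report.pdf        # hard link: both names point to the same data on disk (same filesystem only)
$ ln -s /data/report.pdf ~/Desktop/report.pdf   # symbolic link: a small pointer to the original path

Either way, the actual data is stored only once, so nothing is wasted.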
1) FSlint
FSlint is available in the binary repositories of various Linux distributions, including Ubuntu, Debian, Fedora, and Red Hat. Just fire up your package manager and install the "fslint" package. This utility provides a convenient graphical interface by default, and it also includes command-line versions of its various functions.
The command-line versions take a little more setup (more on that later), but don't let that scare you away from using FSlint's convenient graphical interface.
Installation
To install FSlint on Ubuntu (which I am running), use the default command:
# apt-get install fslint
Here are the installation commands for other Linux distributions:
Debian:
# svn checkout http://fslint.googlecode.com/svn/trunk/ fslint-2.45
# cd fslint-2.45
# dpkg-buildpackage -I.svn -rfakeroot -tc
# dpkg -i ../fslint_2.45-1_all.deb
Fedora:
# yum install fslint
openSUSE:
# [ -f /etc/mandrake-release ] && pkg=rpm
# [ -f /etc/SuSE-release ] && pkg=packages
# wget http://www.pixelbeat.org/fslint/fslint-2.42.tar.gz
# rpmbuild -ta fslint-2.42.tar.gz
# rpm -Uvh /usr/src/$pkg/RPMS/noarch/fslint-2.42-1.*.noarch.rpm
Other distributions:
# wget http://www.pixelbeat.org/fslint/fslint-2.44.tar.gz
# tar -xzf fslint-2.44.tar.gz
# cd fslint-2.44
# (cd po && make)
# ./fslint-gui
Run fslint
To run the FSlint GUI on Ubuntu, launch fslint-gui from the run dialog (Alt+F2) or from a terminal:
$ fslint-gui
By default, it opens with the Duplicates pane selected and your home directory as the default search path. All you have to do is click the Find button and FSlint will find a list of duplicate files in directories under your home folder.
Use the buttons to delete any files you want to remove, and double-click them to preview them.
That's it: the duplicate files have been removed from your system.
Note that the command-line utilities aren’t in your path by default, so you can’t run them like typical commands. On Ubuntu, you’ll find them under /usr/share/fslint/fslint. So, if you wanted to run the entire fslint scan on a single directory, here are the commands you’d run on Ubuntu:
cd /usr/share/fslint/fslint
./fslint /path/to/directory
This command won’t actually delete anything. It will just print a list of duplicate files — you’re on your own for the rest.
$ /usr/share/fslint/fslint/findup --help
find dUPlicate files.
Usage: findup [[[-t [-m|-d]] | [--summary]] [-r] [-f] paths(s) ...]

If no path(s) specified then the current directory is assumed.

When -m is specified any found duplicates will be merged (using hardlinks).
When -d is specified any found duplicates will be deleted (leaving just 1).
When -t is specified, only report what -m or -d would do.

When --summary is specified change output format to include file sizes.
You can also pipe this summary format to
/usr/share/fslint/fslint/fstool/dupwaste
to get a total of the wastage due to duplicates.
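Based on the options shown in that help text, here is a minimal sketch of how you might use findup directly; the /home/user/Downloads path is only a placeholder for this example:

cd /usr/share/fslint/fslint
./findup -r -t -d /home/user/Downloads    # dry run: only report what -d would delete
./findup -r -m /home/user/Downloads       # merge duplicates into hardlinks instead of deleting

The -t flag is a safe first step, since it shows what would happen without touching any files.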
2) Fdupes
FDUPES is a program written by Adrian Lopez for identifying or deleting duplicate files residing within specified directories. You can take a look at the project on GitHub.
Install fdupes
To install fdupes, do as below:
On CentOS 7:
# yum install fdupes
Loaded plugins: fastestmirror
base                                                  | 3.6 kB  00:00:00
epel/x86_64/metalink                                  |  12 kB  00:00:00
epel                                                  | 4.3 kB  00:00:00
extras                                                | 3.4 kB  00:00:00
updates                                               | 3.4 kB  00:00:00
(1/2): epel/x86_64/updateinfo                         | 817 kB  00:00:00
(2/2): epel/x86_64/primary_db                         | 4.8 MB  00:00:00
Loading mirror speeds from cached hostfile
 * base: mirrors.linode.com
 * epel: fedora-epel.mirrors.tds.net
 * extras: mirrors.linode.com
 * updates: mirrors.linode.com
Resolving Dependencies
--> Running transaction check
---> Package fdupes.x86_64 1:1.6.1-1.el7 will be installed
--> Finished Dependency Resolution

Dependencies Resolved

======================================================================================================================================================
 Package                      Arch                        Version                               Repository                 Size
======================================================================================================================================================
Installing:
 fdupes                       x86_64                      1:1.6.1-1.el7                         epel                       28 k
On Ubuntu 16.04:
# apt install fdupes
Reading package lists... Done
Building dependency tree
Reading state information... Done
The following packages were automatically installed and are no longer required:
  libdvdnav4 libdvdread4 libenca0 libguess1 librubberband2v5 libsdl2-2.0-0
  libsndio6.1 libva-wayland1 libva-x11-1 mpv rtmpdump
Use 'sudo apt autoremove' to remove them.
Search for duplicate files
The fdupes command searches for duplicates in the specified folder. The syntax is as follows:
fdupes [ options ] DIRECTORY
Let us create some duplicate files. We will create a folder containing 10 files with the same content, plus a subfolder containing 10 more:
# mkdir labor && for i in {1..10}; do echo "Hello, let us try fdupes command" > labor/drago${i} ; done
# mkdir labor/package && for i in {1..10}; do echo "Hello, let us try fdupes recursively" > labor/package/pack${i} ; done
Let's check the result:
# ls -lR labor/
labor/:
total 44
-rw-r--r-- 1 root root   33 Sep  9 23:51 drago10
-rw-r--r-- 1 root root   33 Sep  9 23:51 drago1
-rw-r--r-- 1 root root   33 Sep  9 23:51 drago2
-rw-r--r-- 1 root root   33 Sep  9 23:51 drago3
-rw-r--r-- 1 root root   33 Sep  9 23:51 drago4
-rw-r--r-- 1 root root   33 Sep  9 23:51 drago5
-rw-r--r-- 1 root root   33 Sep  9 23:51 drago6
-rw-r--r-- 1 root root   33 Sep  9 23:51 drago7
-rw-r--r-- 1 root root   33 Sep  9 23:51 drago8
-rw-r--r-- 1 root root   33 Sep  9 23:51 drago9
drwxr-xr-x 2 root root 4096 Sep  9 23:51 package

labor/package:
total 40
-rw-r--r-- 1 root root   37 Sep  9 23:51 pack10
-rw-r--r-- 1 root root   37 Sep  9 23:51 pack1
-rw-r--r-- 1 root root   37 Sep  9 23:51 pack2
-rw-r--r-- 1 root root   37 Sep  9 23:51 pack3
-rw-r--r-- 1 root root   37 Sep  9 23:51 pack4
-rw-r--r-- 1 root root   37 Sep  9 23:51 pack5
-rw-r--r-- 1 root root   37 Sep  9 23:51 pack6
-rw-r--r-- 1 root root   37 Sep  9 23:51 pack7
-rw-r--r-- 1 root root   37 Sep  9 23:51 pack8
-rw-r--r-- 1 root root   37 Sep  9 23:51 pack9
We can see that all our files exist. Now we can search for duplicate files as below:
# fdupes labor/
labor/drago8
labor/drago2
labor/drago7
labor/drago5
labor/drago1
labor/drago3
labor/drago4
labor/drago10
labor/drago6
labor/drago9
You can see that all 10 duplicate files are listed above.
Search for duplicate files recursively and display their size
You may have noticed that the result above doesn't show the duplicate files created earlier in the labor/package directory. To search for duplicate files in a directory and its sub-directories, we use the -r option, and you can display the size of each set of duplicates with the -S parameter, as below:
# fdupes -rS labor
33 bytes each:
labor/drago8
labor/drago2
labor/drago7
labor/drago5
labor/drago1
labor/drago3
labor/drago4
labor/drago10
labor/drago6
labor/drago9

37 bytes each:
labor/package/pack10
labor/package/pack6
labor/package/pack4
labor/package/pack7
labor/package/pack1
labor/package/pack3
labor/package/pack5
labor/package/pack2
labor/package/pack8
labor/package/pack9
From the result you can see that the duplicate files in each set have the same size, and therefore the same content.
It is also possible to omit the first file of each set when searching for duplicate files, as shown in the sketch below.
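If your build of fdupes supports the -f (--omitfirst) option, a minimal sketch looks like this; it prints every duplicate except the first file of each set, which makes the output convenient to feed into a cleanup command:

# fdupes -rf labor
# fdupes -rf labor | xargs -r rm -v    # remove the listed duplicates; fine for the simple filenames in this example, but be careful with filenames containing spaces

Treat the second command as an illustration rather than a recommended one-liner, and review the list before deleting anything.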
Delete the duplicated files
To delete duplicate files, we use the -d parameter. fdupes will ask which files to preserve:
# fdupes -rd labor/
[1] labor/drago8
[2] labor/drago2
[3] labor/drago7
[4] labor/drago5
[5] labor/drago1
[6] labor/drago3
[7] labor/drago4
[8] labor/drago10
[9] labor/drago6
[10] labor/drago9

Set 1 of 2, preserve files [1 - 10, all]: 1

   [+] labor/drago8
   [-] labor/drago2
   [-] labor/drago7
   [-] labor/drago5
   [-] labor/drago1
   [-] labor/drago3
   [-] labor/drago4
   [-] labor/drago10
   [-] labor/drago6
   [-] labor/drago9

[1] labor/package/pack10
[2] labor/package/pack6
[3] labor/package/pack4
[4] labor/package/pack7
[5] labor/package/pack1
[6] labor/package/pack3
[7] labor/package/pack5
[8] labor/package/pack2
[9] labor/package/pack8
[10] labor/package/pack9

Set 2 of 2, preserve files [1 - 10, all]: 8

   [-] labor/package/pack10
   [-] labor/package/pack6
   [-] labor/package/pack4
   [-] labor/package/pack7
   [-] labor/package/pack1
   [-] labor/package/pack3
   [-] labor/package/pack5
   [+] labor/package/pack2
   [-] labor/package/pack8
   [-] labor/package/pack9
We can check the result as below.
# ls -lR labor/
labor/:
total 8
-rw-r--r-- 1 root root   33 Sep  9 23:51 drago8
drwxr-xr-x 2 root root 4096 Sep 10 00:07 package

labor/package:
total 4
-rw-r--r-- 1 root root   37 Sep  9 23:51 pack2
You can see that we have preserved the drago8 and pack2 files.
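If you would rather skip the interactive prompt entirely, most fdupes builds also accept -N (--noprompt) together with -d, which keeps the first file of each set and deletes the rest. This is only a sketch; test it on a scratch directory first, since it deletes files without asking:

# fdupes -rdN labor/    # delete all duplicates except the first file of each set, no prompting

This is handy for scripted cleanups, but it gives you no chance to choose which copy survives.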
Conclusion
We have seen how to delete duplicate files on Linux both graphically and from the command line. You can use either of these tools depending on your needs. Checking for duplicate files regularly is a simple way to save space on your server.
I recommend "Duplicate Files Deleter", it finds duplicate files and delete them. Its amazingly fast and user friendly
Does the programs actually compare the contents of the files that have matching attributes (usually name, size and modification time.)? In any case, is catching corruption before deletion a possible setting?
Which App you looking for, as each using a different algorithm to action.