Find and Remove Duplicate Files on Linux

Whether you run Linux on a desktop or a server, several good utilities can find and remove duplicate files for you.
They help you reclaim disk space by locating redundant copies of the same data.
The Problem with Duplicate Files
Duplicate files waste storage: every extra copy of the same data takes up disk space without adding anything.
If you need the same file in several locations, consider using symbolic links or hard links instead of copies.
Links let you reach the same data from multiple paths without duplicating the underlying file content.
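As a minimal sketch of the two link types mentioned above, the following commands create one real file plus a hard link and a symbolic link to it. The scratch directory and the file names (report.txt and so on) are made up for illustration; stat -c assumes GNU coreutils, which is standard on Linux.

```shell
#!/bin/sh
# Work in a scratch directory so no real files are touched.
demo_dir=$(mktemp -d)
cd "$demo_dir" || exit 1

# One real file holding the data.
printf 'quarterly report\n' > report.txt

# Hard link: a second name for the same inode (same filesystem only).
ln report.txt report-hardlink.txt

# Symbolic link: a small pointer file that stores the target's path.
ln -s report.txt report-symlink.txt

# The first column is the inode number. The original and the hard link
# share one inode; the symbolic link has its own.
ls -li report.txt report-hardlink.txt report-symlink.txt

# Number of names (hard links) that refer to the file's inode.
stat -c 'link count: %h' report.txt
```

The data is stored once no matter how many links point at it. Deleting report.txt would leave the hard link usable, but the symbolic link would dangle, which is the main practical difference between the two.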
Available Tools
Duplicate finders are available with both graphical user interfaces (GUIs) and command-line interfaces (CLIs).
Either kind can scan your system and help you manage the duplicates it finds, so you can pick whichever suits your setup and preferences.
FSlint
FSlint is available in the software repositories of many Linux distributions, including Ubuntu, Debian, Fedora, and Red Hat. To install it, use your package manager to install the "fslint" package. By default the utility opens a graphical interface, but it also includes command-line versions of its functions.
Like many Linux applications, the FSlint graphical interface is a front end that runs the underlying FSlint commands.
The graphical interface is the more convenient way to work. It opens on the Duplicates pane with your home directory selected as the search path, so starting a scan only takes a click on the Find button.
FSlint then lists the duplicate files it found in the directories under your home folder. Use the buttons to delete the files you want to remove, and double-click a file to preview it.

Note that the command-line tools are not on your PATH by default, so you cannot run them by name like ordinary commands.
On Ubuntu, they are located in the /usr/share/fslint/fslint directory. To run a full FSlint scan on a directory from the command line on Ubuntu, use the following commands:
cd /usr/share/fslint/fslint
./fslint /path/to/directory
These commands only print a list of duplicate files. Nothing is deleted; removing the files is left to you.
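If you use the command-line tools often, one option is to append their directory to PATH for the current shell session instead of changing into it each time. This is only a sketch: the directory is the Ubuntu location given above and may differ on other distributions, and the fslint_dir variable name is made up for illustration.

```shell
#!/bin/sh
# Ubuntu location of the FSlint command-line tools (may differ elsewhere).
fslint_dir=/usr/share/fslint/fslint

# Append the directory to PATH for this shell session only.
# Appending a directory that does not exist is harmless.
PATH="$PATH:$fslint_dir"
export PATH

# From now on the scan can be started from any working directory:
#   fslint /path/to/directory
```

The change lasts only until the shell exits. Appending rather than prepending keeps the system's own commands first in the lookup order.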

fdupes
fdupes is a simple command-line tool. It is usually not installed by default, but it is available in the software repositories of many Linux distributions.
If you work on a Linux system without a graphical interface, fdupes is probably the quickest and easiest way to find duplicate files.
It is simple to use. Run the fdupes command followed by the path of the directory you want to examine. For instance, fdupes /home/chris lists all duplicate files in the /home/chris directory itself, but it does not look into subdirectories.
To search recursively, including every subdirectory under the given path, run fdupes -r /home/chris instead. This lists duplicate files throughout the entire directory tree under /home/chris.
By default fdupes only reports; it does not delete anything. It prints the sets of duplicates it found, and you can then delete files by hand if you want.
Alternatively, run the command with the -d option to have fdupes help with deletion. It then prompts you to choose which files to keep in each set of duplicates and removes the rest.
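To make the idea of a recursive, report-only duplicate scan concrete, here is a rough sketch built from standard GNU tools (find, sha256sum, sort, uniq). It is not how fdupes is implemented and is no substitute for it; the function name list_duplicates and the demo files are made up for illustration. It treats files as duplicates when their SHA-256 checksums match.

```shell
#!/bin/sh
# list_duplicates DIR
# Print groups of files under DIR (recursively) whose contents hash to the
# same SHA-256 value. Groups are separated by a blank line.
# Report only: nothing is deleted. Empty files all count as duplicates
# of each other.
list_duplicates() {
    find "$1" -type f -exec sha256sum {} + |
        sort |
        uniq -w 64 --all-repeated=separate |  # compare the 64-char hash only
        cut -c 67-                            # drop the hash and two spaces
}

# Small demonstration in a scratch directory.
demo_dir=$(mktemp -d)
mkdir "$demo_dir/sub"
printf 'same\n' > "$demo_dir/a.txt"
printf 'same\n' > "$demo_dir/sub/b.txt"
printf 'different\n' > "$demo_dir/c.txt"

# Lists a.txt and sub/b.txt as one group; c.txt is not reported.
list_duplicates "$demo_dir"
```

Hashing every file is the simplest approach but reads all data on disk, so on large trees a dedicated tool such as fdupes is the better choice. File names containing newlines or backslashes are not handled cleanly by this sketch.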

dupeGuru, dupeGuru Music Edition, and dupeGuru Pictures Edition
We have recommended dupeGuru before, for finding duplicate files on Windows and for cleaning up duplicates on macOS, and we recommend it again here. It is an open-source, cross-platform tool that is useful enough to earn a place on all three lists.
The drawback is that dupeGuru is not in the software repositories of most Linux distributions, although it is in Arch Linux's repositories. The dupeGuru website provides a Personal Package Archive (PPA) that makes it easy to install the packages on Ubuntu and Ubuntu-based distributions.
On other Linux distributions, you can compile the software from its source code.
Available Editions
As on Windows and Mac, dupeGuru comes in three editions: a standard edition for general duplicate file scanning, a music edition that finds duplicate songs even when they were ripped or encoded with different settings, and a pictures edition that finds similar photos that have been rotated, resized, or otherwise modified.
All three editions can be downloaded from the dupeGuru website and are also provided through the Ubuntu PPA.
The application works the same way on every platform. Launch it, add the folders you want to scan, and start the scan.
dupeGuru then lists the duplicate files it found, and you can delete them or move them to another location. Double-click a file to open it for inspection.
After installation, the Ubuntu package has to be launched from the command line. For instance, the standard edition is started with the dupeguru_se command.
No desktop shortcut is installed by default. This weak system integration is the main reason we do not recommend dupeGuru more strongly, since the utility works well once it is installed and running.

This is not an exhaustive list. Many other duplicate file finders are available through your Linux distribution's package manager, most of them command-line tools without a graphical interface.
For most users and most needs, the tools covered above are the ones we prefer and recommend:
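- FSlint: A graphical tool, with command-line equivalents, available in many distributions' repositories.
- fdupes: A simple command-line tool, well suited to systems without a graphical interface.
- dupeGuru: A versatile solution for general duplicate file removal.
- dupeGuru Music Edition: Specifically tailored for music file duplicates.
- dupeGuru Pictures Edition: Ideal for identifying visually similar images.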