rdfind: a small tool for finding duplicate files

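To find duplicate files on a Linux system, you can use the small tool "rdfind". It is compact, lightweight and simple to use; from download to installation it takes up less than 1 MB. It can find duplicate files by md5 or sha1 checksum, and options let you decide how the duplicates are handled: replace them with symbolic links, replace them with hard links, or delete them.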


sudo apt-get install rdfind


Search for duplicate files in home directory and a backup directory:

rdfind ~ /mnt/backup

Delete duplicates in a backup directory:

rdfind -deleteduplicates true /mnt/backup

Search for duplicate files in directories called foo:

find . -type d -name foo -print0 |xargs -0 rdfind

For more detailed usage information, see "man rdfind".

rdfind(1)                                           rdfind                                          rdfind(1)

       rdfind - finds duplicate files

       rdfind [ options ] directory1 | file1 [ directory2 |file2 ] ...

       rdfind finds duplicate files across and/or within several directories. It calculates checksums only
       if necessary. rdfind runs in O(N log(N)) time, with N being the number of files.
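To make "calculates checksums only if necessary" concrete, here is a minimal Python sketch of the general idea, not rdfind's actual implementation: files are sorted and grouped by size (the O(N log N) step), and only files that share their size with another file are ever read and hashed. The helper names `file_digest` and `find_duplicates` are hypothetical.

```python
import hashlib
import os
from itertools import groupby


# Hypothetical helpers: a sketch of size-first duplicate detection, not rdfind's code.
def file_digest(path, algorithm="md5", chunk_size=1 << 16):
    """Hash a file in chunks so large files need not fit in memory."""
    h = hashlib.new(algorithm)  # "md5" or "sha1", like rdfind's -checksum
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def find_duplicates(paths, algorithm="md5"):
    """Return groups (lists) of paths whose contents are identical.

    Sorting by size costs O(N log N); a file whose size is unique cannot
    have a duplicate, so it is discarded without ever being read.
    """
    sized = sorted((os.path.getsize(p), p) for p in paths)
    groups = []
    for _size, same_size in groupby(sized, key=lambda item: item[0]):
        candidates = [p for _, p in same_size]
        if len(candidates) < 2:
            continue  # unique size: skip hashing entirely
        by_digest = {}
        for p in candidates:
            by_digest.setdefault(file_digest(p, algorithm), []).append(p)
        groups.extend(g for g in by_digest.values() if len(g) > 1)
    return groups
```

Grouping by size first is what keeps the expensive step (reading whole files) limited to real candidates.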

       If two (or more) equal files are found, the program decides which of them is the original and the rest
       are  considered duplicates. This is done by ranking the files to each other and deciding which has the
       highest rank. See section RANKING for details.

       If you need better control over the ranking than this provides, you can use a preprocessor that sorts
       the file names in the desired order and then run the program using xargs. See the examples below for
       how to use find and xargs in conjunction with rdfind.
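Because rdfind ranks files found under an earlier input argument higher, the order of the arguments is what such a preprocessor controls. Below is a minimal Python sketch, assuming a hypothetical policy where the most recently modified directory should hold the originals; the function name `rdfind_argv` is also hypothetical. It only builds the command line, so it can be inspected before anything is run.

```python
import os


def rdfind_argv(directories, extra_options=()):
    """Build an rdfind command line with the newest directory first.

    Hypothetical preprocessor: rdfind ranks files found under an earlier
    argument higher, so the first directory listed keeps the originals.
    """
    ordered = sorted(directories, key=os.path.getmtime, reverse=True)
    return ["rdfind", *extra_options, *ordered]


# Usage (requires rdfind on PATH):
#   import subprocess
#   subprocess.run(rdfind_argv(["/mnt/backup", os.path.expanduser("~")]), check=True)
```

Any other policy (largest directory first, a fixed priority list) only needs a different sort key.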

       To include files or directories that have names starting with -, prefix them with ./ (for example
       rdfind ./-somedir) so they are not confused with options.

       Given two or more equal files, the one with the highest rank is selected to be the original and the
       rest are duplicates. The ranking rules are given below; they are applied in order until an original
       has been found. Given two files A and B which have equal content, the ranking is as follows:

       If A was found while scanning an input argument earlier than B, A is higher ranked.

       If A was found at a depth lower than B, A is higher ranked (A is closer to the root).

       If A was found earlier than B, A is higher ranked.

       The last rule is needed when two files are found in the same directory (obviously not given in
       separate arguments, otherwise the first rule applies) and gives the same order between the files as
       the operating system delivers the files while listing the directory. This is operating-system-specific
       behavior.
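The three rules amount to a lexicographic comparison, which a tuple sort key expresses directly. This is a minimal Python sketch of the rules as described above, not rdfind's internals: the `Found` record, its field names and the convention that depth 0 means "directly inside the argument" are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Found:
    """Hypothetical record of one scanned file."""
    path: str
    arg_index: int  # which input argument the file was found under (0 = first)
    depth: int      # depth below that argument (assumed: 0 = top level)
    order: int      # position in the overall scan order


def rank_key(f):
    # Lower tuple = higher rank. Tuple comparison applies the rules in order,
    # moving on to the next rule only when the previous one is a tie.
    return (f.arg_index, f.depth, f.order)


def split_original(equal_files):
    """Return (original, duplicates) for a group of files with equal content."""
    ordered = sorted(equal_files, key=rank_key)
    return ordered[0], ordered[1:]
```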

       Search options:

       -ignoreempty true|false
              Ignore empty files. Default is true.

       -followsymlinks true|false
              Follow symlinks. Default is false.

       -removeidentinode true|false
              Remove items found which have identical inode and device ID. Default is true.

       -checksum md5|sha1
              What type of checksum to use: md5 or sha1. Default is md5.
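Two paths with the same device ID and inode are the same file on disk (for example hard links, or one file reached through two arguments), so comparing them is pointless. The following minimal Python sketch shows the idea behind -removeidentinode; it is not rdfind's code, and the function name `remove_identical_inodes` is hypothetical.

```python
import os


def remove_identical_inodes(paths):
    """Keep only the first path seen for each (device ID, inode) pair.

    Hypothetical helper illustrating -removeidentinode. Note that os.stat
    follows symbolic links, so a symlink counts as its target here.
    """
    seen = set()
    kept = []
    for p in paths:
        st = os.stat(p)
        key = (st.st_dev, st.st_ino)
        if key not in seen:
            seen.add(key)
            kept.append(p)
    return kept
```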

       Action options:

       -makesymlinks true|false
              Replace duplicate files with symbolic links

       -makehardlinks true|false
              Replace duplicate files with hard links.

       -makeresultsfile true|false
              Make a results file results.txt (default) in the current directory.

       -outputname name
              Make the results file name to be "name" instead of the default results.txt.

       -deleteduplicates true|false
              Delete (unlink) duplicate files.
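Replacing a duplicate with a link is safest when the duplicate's path never disappears. This minimal Python sketch shows one way to do that, not rdfind's own code: the link is created under a fresh random name in the same directory and then renamed over the duplicate with an atomic rename. The function name `replace_with_link` and the `.rdlink-` prefix are hypothetical. Symbolic links are made absolute, matching what the man page says about -makesymlinks.

```python
import os
import secrets


def replace_with_link(original, duplicate, symbolic=False):
    """Replace `duplicate` with a hard or symbolic link to `original`.

    Hypothetical helper. If anything fails, the duplicate stays in place.
    """
    directory = os.path.dirname(os.path.abspath(duplicate))
    while True:
        tmp = os.path.join(directory, ".rdlink-" + secrets.token_hex(8))
        try:
            if symbolic:
                os.symlink(os.path.abspath(original), tmp)  # absolute target
            else:
                os.link(original, tmp)  # hard links need the same filesystem
            break
        except FileExistsError:
            continue  # random name already taken: try another one
    try:
        os.replace(tmp, duplicate)  # atomic rename over the duplicate on POSIX
    except OSError:
        os.unlink(tmp)  # clean up the temporary link, keep the duplicate
        raise
```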

       General options:

       -sleep Xms
              sleeps  X milliseconds between reading each file, to reduce load. Default is 0 (no sleep). Note
              that only a few values are supported at present: 0,1-5,10,25,50,100 milliseconds.

       -n -dryrun
              Display what would have been done; don't actually delete or link anything.

       -h, -help, --help
              displays brief help message.

       -v, -version, --version
              displays version number.

       Search for duplicate files in home directory and a backup directory:
              rdfind ~ /mnt/backup

       Delete duplicates in a backup directory:
              rdfind -deleteduplicates true /mnt/backup

       Search for duplicate files in directories called foo:
              find . -type d -name foo -print0 |xargs -0 rdfind

       results.txt (the default name; it can be changed with the -outputname option, see above). The
       results file will contain one row per duplicate file found, along with a header row explaining the
       columns. A text label describes why the file is considered a duplicate:

       DUPTYPE_UNKNOWN some internal error

       DUPTYPE_FIRST_OCCURENCE the file that is considered to be the original.

       DUPTYPE_WITHIN_SAME_TREE files in the same tree (found when processing the directory in the same input
       argument as the original)

       DUPTYPE_OUTSIDE_TREE the file is found during processing another input argument than the original.

       Exit status: 0 on success, nonzero otherwise.

       When specifying the same directory twice, it keeps the first encountered as the most important
       (original), and the rest as duplicates. This might not be what you want. Symbolic links created by
       -makesymlinks are absolute. There are lots of enhancements left to do. Please contribute!
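One way to avoid the "same directory twice" surprise is to clean up the argument list before calling rdfind. This minimal Python sketch, with the hypothetical name `unique_arguments`, compares arguments after resolving them with `os.path.realpath` and keeps only the first occurrence, preserving order because rdfind's ranking depends on it. It does not handle a directory given together with one of its own subdirectories.

```python
import os


def unique_arguments(paths):
    """Drop repeated input arguments, keeping the first occurrence.

    Hypothetical helper: "dir", "dir/" and a symlink to dir all resolve
    to the same real path and therefore count as one argument.
    """
    seen = set()
    result = []
    for p in paths:
        real = os.path.realpath(p)
        if real not in seen:
            seen.add(real)
            result.append(p)
    return result
```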

       Avoid  manipulating  the  directories  while rdfind is reading.  rdfind is quite brittle in that case.
       Especially, when deleting or making links, rdfind can be subject to a symlink attack.  Use with care!

       Paul   Sundvall   2006,   reachable   at   rdfind@paulsundvall.net   Rdfind   can    be    found    at

       Do you find rdfind useful? Drop me a line! It is always fun to hear from people who actually use it.

       Several  persons have helped with suggestions and improvements: Niels Möller, Carl Payne and Salvatore
       Ansani. Thanks also to you who tested the program and sent me feedback.

       1.2.4 (release date 20090121) svn id: $Id: rdfind.1 567 2009-01-21 17:27:30Z pauls $

       This program is distributed under GPLv2.

       md5sum(1), find(1), symlinks(2)

January 2009                                        1.2.4                                           rdfind(1)


