167

I run

ln /a/A /b/B

In folder /a, I would like to use ls to see which other names point to the same file as A.

Oliver Salzburg

12 Answers

262

You can find the inode number for your file with

ls -i

and

ls -l

shows the reference count (the number of hard links to a particular inode).

Once you have the inode number, you can search for all files sharing that inode:

find . -inum NUM

will show the filenames for inode NUM in the current directory (.)
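As a quick sketch of the whole procedure (hypothetical file names; GNU coreutils assumed):

```shell
# Create a scratch file and a hard link to it (hypothetical names).
cd "$(mktemp -d)"
echo 'hello' > A
ln A B                      # B is a second name for A's inode

ls -i A                     # prints A's inode number
ls -l A                     # the second column is the link count (2)

inode=$(ls -i A | awk '{print $1}')
find . -inum "$inode"       # lists ./A and ./B
```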

dotsbyname
84

There isn't really a well-defined answer to your question. Unlike symlinks, hardlinks are indistinguishable from the "original file".

Directory entries consist of a filename and a pointer to an inode. The inode in turn contains the file metadata and (pointers to) the actual file contents. Creating a hard link creates another directory entry that references the same inode. These references are unidirectional (in typical filesystems, at least): the inode only keeps a reference count. There is no intrinsic way to find out which is the "original" filename.

By the way, this is why the system call to "delete" a file is called unlink. It just removes a hardlink. The inode and attached data are deleted only if the inode's reference count drops to 0.

The only way to find the other references to a given inode is to exhaustively search the file system, checking which files refer to the inode in question. You can use 'test A -ef B' from the shell to check whether two particular names refer to the same file.
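A minimal sketch of that shell check (hypothetical file names):

```shell
cd "$(mktemp -d)"
echo 'data' > A
ln A B                      # hard link: same inode as A
touch C                     # unrelated file

# test A -ef B is true when both names resolve to the same device and inode.
[ A -ef B ] && echo 'A and B are the same file'
[ A -ef C ] || echo 'A and C are not'
```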

38

UNIX has hard links and symbolic links (made with "ln" and "ln -s" respectively). A symbolic link is simply a file that contains the actual path to another file, and it can cross filesystems.

Hard links have been around since the earliest days of UNIX (that I can remember, anyway, and that's going back quite a while). They are two directory entries that reference the exact same underlying data. The data in a file is identified by its inode. Each directory entry on a file system points to an inode, but there's no requirement that each entry point to a unique inode - that's where hard links come from.

Since inodes are unique only for a given filesystem, there's a limitation that hard links must be on the same filesystem (unlike symbolic links). Note that, unlike symbolic links, there is no privileged file - they are all equal. The data area will only be released when all the files using that inode are deleted (and all processes close it as well, but that's a different issue).

You can use the "ls -i" command to get the inode of a particular file. You can then use the "find <filesystemroot> -inum <inode>" command to find all files on the filesystem with that given inode.

Here's a script which does exactly that. You invoke it with:

findhardlinks ~/jquery.js

and it will find all files on that filesystem which are hard links for that file:

pax@daemonspawn:~# ./findhardlinks /home/pax/jquery.js
Processing '/home/pax/jquery.js'
   '/home/pax/jquery.js' has inode 5211995 on mount point '/'
       /home/common/jquery-1.2.6.min.js
       /home/pax/jquery.js

Here's the script.

#!/bin/bash
if [[ $# -lt 1 ]] ; then
    echo "Usage: findhardlinks <fileOrDirToFindFor> ..."
    exit 1
fi

while [[ $# -ge 1 ]] ; do
    echo "Processing '$1'"
    if [[ ! -r "$1" ]] ; then
        echo "   '$1' is not accessible"
    else
        numlinks=$(ls -ld "$1" | awk '{print $2}')
        inode=$(ls -id "$1" | awk '{print $1}')
        device=$(df "$1" | tail -1 | awk '{print $6}')
        echo "   '$1' has inode ${inode} on mount point '${device}'"
        find "${device}" -inum "${inode}" 2>/dev/null | sed 's/^/        /'
    fi
    shift
done
paxdiablo
34
ls -l

The first column shows the permissions. The second column is the link count: for a directory, the number of subdirectories it contains (plus 2, for . and its entry in the parent); for a file, the number of names (hard links, including this one) that point to the same data. E.g.:

-rw-r--r--@    2    [username]    [group]    [timestamp]     HardLink
-rw-r--r--@    2    [username]    [group]    [timestamp]     Original
               ^ Number of hard links to the data
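A throwaway demonstration of that column (hypothetical file names; stat -c is GNU coreutils):

```shell
cd "$(mktemp -d)"
touch Original
ln Original HardLink          # second name for the same inode

ls -l                         # both rows show a link count of 2 in column 2
stat -c '%h %n' Original      # prints the same count without parsing ls
```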
18

How about the following, simpler approach? (It might even replace the long scripts above!)

If you have a specific file <THEFILENAME> and want to find all of its hard links spread over the directory <TARGETDIR> (which can even be the entire filesystem, denoted by /):

find <TARGETDIR> -type f -samefile <THEFILENAME>

Extending the logic, if you want to know all the files in <SOURCEDIR> that have multiple hard links spread over <TARGETDIR>:

find <SOURCEDIR> -type f -links +1   \
  -printf "\n\n %n HardLinks of file : %H/%f  \n"   \
  -exec find <TARGETDIR> -type f -samefile {} \; 
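A small sketch of both commands (hypothetical directory and file names):

```shell
cd "$(mktemp -d)"
mkdir src target
echo 'x' > src/file
ln src/file target/copy1
ln src/file target/copy2

# All names under target/ that share src/file's inode:
find target -type f -samefile src/file

# Every multiply-linked file under src/, with its links under target/:
find src -type f -links +1 \
  -printf '\n%n hard links of file : %p\n' \
  -exec find target -type f -samefile {} \;
```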
17

There are a lot of answers with scripts to find all hardlinks in a filesystem. Most of them do silly things like running find to scan the whole filesystem for -samefile for EACH multiply-linked file. This is crazy; all you need is to sort on inode number and print duplicates.

With only one pass over the filesystem, find and group all sets of hardlinked files:

find dirs   -xdev \! -type d -links +1 -printf '%20D %20i %p\n' |
    sort -n | uniq -w 42 --all-repeated=separate

This is much faster than the other answers for finding multiple sets of hardlinked files.
find /foo -samefile /bar is excellent for just one file.

  • -xdev : limit to one filesystem. Not strictly needed here, since we also print the FS-id for uniq to group on.
  • ! -type d : reject directories. Because of the . and .. entries, directories always have a link count above 1.
  • -links +1 : link count strictly greater than 1.
  • -printf ... : print the FS-id, inode number, and path, padded to fixed column widths that we can tell uniq about.
  • sort -n | uniq ... : sort numerically and group on the first 42 columns, separating groups with a blank line.

Using ! -type d -links +1 means that sort's input is only as big as the final output of uniq so we aren't doing a huge amount of string sorting. Unless you run it on a subdirectory that only contains one of a set of hardlinks. Anyway, this will use a LOT less CPU time re-traversing the filesystem than any other posted solution.

sample output:

...
            2429             76732484 /home/peter/weird-filenames/test/.hiddendir/foo bar
            2429             76732484 /home/peter/weird-filenames/test.orig/.hiddendir/foo bar

            2430             17961006 /usr/bin/pkg-config.real
            2430             17961006 /usr/bin/x86_64-pc-linux-gnu-pkg-config

            2430             36646920 /usr/lib/i386-linux-gnu/dri/i915_dri.so
            2430             36646920 /usr/lib/i386-linux-gnu/dri/i965_dri.so
            2430             36646920 /usr/lib/i386-linux-gnu/dri/nouveau_vieux_dri.so
            2430             36646920 /usr/lib/i386-linux-gnu/dri/r200_dri.so
            2430             36646920 /usr/lib/i386-linux-gnu/dri/radeon_dri.so

...

TODO?: un-pad the output with awk or cut. uniq has very limited field-selection support, so I pad the find output and use fixed width. 20 chars is wide enough for the maximum possible inode or device number (2^64 - 1 = 18446744073709551615). XFS chooses inode numbers based on where on disk they're allocated, not contiguously from 0, so large XFS filesystems can have >32-bit inode numbers even if they don't have billions of files. Other filesystems might have 20-digit inode numbers even if they aren't gigantic.

TODO: sort groups of duplicates by path. Having them sorted by mount point then inode number mixes things together, if you have a couple different subdirs that have lots of hardlinks. (i.e. groups of dup-groups go together, but the output mixes them up).

A final sort -k 3 would sort lines separately, not groups of lines as a single record. Preprocessing with something to transform a pair of newlines into a NUL byte, and using GNU sort --zero-terminated -k 3 might do the trick. tr only operates on single characters, not 2->1 or 1->2 patterns, though. perl would do it (or just parse and sort within perl or awk). sed might also work.


What are hardlinks, really?

Symlinks are when one name points to another name. Hard links are when multiple names exist for the same file (inode); they all just refer to the inode, not each other.

A directory entry is a string (name) and an inode number. So a filename effectively "points to" an inode. We call it "hard linking" when you have multiple names pointing to the same inode.

When you make a hard link with ln, you specify two file names, but the newly created hard link just references the inode directly, not the filename or path. After a link(2) system call, there's no sense in which one is the original and one is the link.

This is why, as the answers point out, the only way to find all the links is find / -samefile /foo/bar to search the FS for one with the same inode number. One directory entry for an inode doesn't "know about" other directory entries for the same inode, and the inode itself doesn't have references back to dirents that point at it.

The inode only has a reference count. This lets the kernel delete it and free the disk space (for it and the file contents it points to) when the last name for it is unlink(2)ed (and the last open file descriptor is closed, and mmap is munmapped).

This is the "link count" in ls output.
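A short demonstration of that reference-count behaviour (hypothetical file names; stat -c %h is GNU coreutils):

```shell
cd "$(mktemp -d)"
echo 'payload' > name1
ln name1 name2
stat -c %h name1          # 2 -- two names for one inode

rm name1                  # unlink(2) one name; the data survives
cat name2                 # still prints "payload"
stat -c %h name2          # 1 -- the last name keeps the inode alive
```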

Peter Cordes
4

This is somewhat of a comment to Torocoro-Macho's own answer and script, but it obviously won't fit in the comment box.


I rewrote your script with more straightforward ways to find the info, and thus far fewer process invocations.

#!/bin/sh
xPATH=$(readlink -f -- "${1}")
for xFILE in "${xPATH}"/*; do
    [ -d "${xFILE}" ] && continue
    [ ! -r "${xFILE}" ] && printf '"%s" is not readable.\n' "${xFILE}" 1>&2 && continue
    nLINKS=$(stat -c%h "${xFILE}")
    if [ "${nLINKS}" -gt 1 ]; then
        iNODE=$(stat -c%i "${xFILE}")
        xDEVICE=$(stat -c%m "${xFILE}")
        printf '\nItem: %s[%d] = %s\n' "${xDEVICE}" "${iNODE}" "${xFILE}"
        find "${xDEVICE}" -inum "${iNODE}" -not -path "${xFILE}" -printf '     -> %p\n' 2>/dev/null
    fi
    fi
done

I tried to keep it as similar to yours as possible for easy comparison.

Comments on this script and yours

  • One should always avoid the $IFS magic if a glob suffices, since it is unnecessarily convoluted, and file names actually can contain newlines (but in practice mostly the first reason).

  • You should avoid manually parsing ls and such output as much as possible, since it will sooner or later bite you. For example: in your first awk line, you fail on all file names containing spaces.

  • printf will often save you trouble in the end since it is so robust with the %s syntax. It also gives you full control over the output, and is consistent across all systems, unlike echo.

  • stat can save you a lot of logic in this case.

  • GNU find is powerful.

  • Your head and tail invocations could have been handled directly in awk with, e.g., the exit command and/or selecting on the NR variable. This saves process invocations, which almost always improves performance considerably in hard-working scripts.

  • Your egrep calls could just as well be plain grep.
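As a sketch of the head/tail point: a single awk program can select both the line and the field in one process. For example, the script's df ... | tail -1 | awk pipeline could become (POSIX df -P assumed):

```shell
# Instead of:  df -P / | tail -1 | awk '{print $6}'
# let awk keep only the last line itself:
df -P / | awk 'END {print $6}'     # prints the mount point, "/"
```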

3

You can configure ls to highlight hardlinks using an 'alias', but as stated before there is no way to show the 'source' of a hardlink, which is why I append .hardlink to my file names to help with that.

highlight hardlinks

Add the following somewhere in your .bashrc

alias ll='LC_COLLATE=C LS_COLORS="$LS_COLORS:mh=1;37" ls -lA --si --group-directories-first'
2

Based on the findhardlinks script (renamed here to hard-links), this is my refactored, working version.

Output:

# ./hard-links /root

Item: /[10145] = /root/.profile
    -> /proc/907/sched
    -> /<some-where>/.profile

Item: /[10144] = /root/.tested
    -> /proc/907/limits
    -> /<some-where else>/.bashrc
    -> /root/.testlnk

Item: /[10144] = /root/.testlnk
    -> /proc/907/limits
    -> /<another-place else>/.bashrc
    -> /root/.tested

 

# cat ./hard-links
#!/bin/bash
oIFS="${IFS}"; IFS=$'\n';
xPATH="${1}";
xFILES="`ls -al ${xPATH}|egrep "^-"|awk '{print $9}'`";
for xFILE in ${xFILES[@]}; do
  xITEM="${xPATH}/${xFILE}";
  if [[ ! -r "${xITEM}" ]] ; then
    echo "Path: '${xITEM}' is not accessible! ";
  else
    nLINKS=$(ls -ld "${xITEM}" | awk '{print $2}')
    if [ ${nLINKS} -gt 1 ]; then
      iNODE=$(ls -id "${xITEM}" | awk '{print $1}' | head -1l)
      xDEVICE=$(df "${xITEM}" | tail -1l | awk '{print $6}')
      echo -e "\nItem: ${xDEVICE}[$iNODE] = ${xITEM}";
      find ${xDEVICE} -inum ${iNODE} 2>/dev/null|egrep -v "${xITEM}"|sed 's/^/   -> /';
    fi
  fi
done
IFS="${oIFS}"; echo "";
1

This thread has been chewed over thoroughly, but I think I can still contribute.

You can find all of the hard links of a file by using this command:

sudo find / -inum `ls -i | grep <file-name> | cut -d' ' -f1`

  • replace <file-name> with the name of your file.

Basically, it looks up the file name (grep) in the current directory listing (ls), extracts the inode number (cut), and then passes that inode number (via the backticks) to find, which looks for all the files in the filesystem with that exact inode.

Note: this command uses sudo because it searches all files under the root (/); the path can be changed to any directory.

The advantage over the other answers is the use of simple and clear commands; it also reduces @zzr's answer to a single line.

umlal
1

A GUI solution gets really close to your question:

You cannot list the actual hardlinked files from "ls" because, as previous commenters have pointed out, the file "names" are mere aliases to the same data. However, there is a GUI tool that gets really close to what you want: it displays a listing of the paths and file names that point to the same data (as hard links) under Linux. It is called FSLint. The option you want is under "Name clashes" -> deselect the "$PATH" checkbox in Search (XX) -> and select "Aliases" from the drop-down box after "for..." towards the top middle.

FSLint is very poorly documented, but I found that with a limited directory tree under "Search path", the "Recurse?" checkbox selected, and the aforementioned options, the program produces a listing of hardlinked data with the paths and names that point to the same data.

Charles
-2

I just needed to use something like this...

alias ...='__ () { ls -ai "${@}" | while read _i_ _u_ ; do find ../.. -inum "${_i_}" -printf "${_u_}\t%p" ; echo ; done ; } ; __'

This gives a tab-separated list of all names connected with the inodes that ls discovers for whatever arguments are given, searched from the relative location '../..' in this case. It is not totally bulletproof, just a sketch that fitted my need to find the hard links in some directories I was untangling.

Most of the time there will be just two columns in the output: the name of the entry from 'ls', and then the relative path from 'find' to the same inode. If 'find' finds other entries for the same inode, they appear in the third, fourth, or nth columns.

sol