Check if two paths are the same, even if their strings don't match exactly

Question

I have a script that depends on some environment variables, some of which are paths.

The script checks whether it's being run in the correct directory by comparing $(pwd)/expected-subdir against $VARIABLE_PATH/expected-subdir.

Of course, if $VARIABLE_PATH == $(pwd)/ (I naturally appended a slash to the end of the variable when typing it), the string match fails even though the directories are technically the same.

Is there a way to uniquely identify a directory, given a path to it, and see if another path will route you to the same spot? It's possible the path given is not sane, so this check should still be done, but it seems unnecessary to force the path to exactly match, when a stray slash or even a relative directory (I know these can be bad sometimes) would route you to the same spot.

score 2 · Answer 1 · answered Jan 21 '19 at 14:54

You can compare their inode numbers, which can be obtained from ls -i

ls -di path/to/dir

(-d prevents ls from showing the inode ids for the contents of the directory);

or from stat

stat path/to/dir | grep -o 'Inode:\ [0-9]*' | cut -d' ' -f2

You can also use readlink -f to get the full path to the directory.

Kamil Maciorowski · Accepted Answer · 2019-01-22T09:45:17.753

This other answer mentions stat. Let's improve the approach:

a="$(stat --format=%d:%i -- "$(pwd)/expected-subdir/")"         || printf "Error A\n" >&2
b="$(stat --format=%d:%i -- "$VARIABLE_PATH/expected-subdir/")" || printf "Error B\n" >&2
[ "$a" = "$b" ] || printf "Not the same path.\n" >&2

Notes:

We're using stat --format=, so no further parsing is required.
%d is device number, %i is inode number. The latter alone is not enough because two objects on different devices (filesystems) may have the same inode number.
The shell is smart enough to separately handle quotes inside and outside $( ).
printf-s are just stubs for error handling. In a real script you'd probably want something like:
```
[ "$a" = "$b" ] || { printf "Not the same path.\n" >&2; exit 2; }
```
A nonexistent path makes stat fail with an error.
Trailing slashes are important. If foo is a symlink to a directory, stat foo will examine the symlink while stat foo/ will examine the directory. Alternatively, stat -L follows symlinks, so the trailing slash is not needed then.
-- is in case $(pwd) or $VARIABLE_PATH begins with - (see guideline 10 of the POSIX Utility Syntax Guidelines).
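The notes above can be folded into a small helper. This is only a sketch: same_object is a made-up name, it assumes GNU stat (the --format and -L options), and the scratch directory exists purely for the demo.

```shell
# Hypothetical helper: returns 0 if both paths lead to the same object,
# 1 if they differ, 2 if either path cannot be examined.
# Assumes GNU stat; -L follows symlinks, so no trailing slash is needed.
same_object() {
    a="$(stat -L --format=%d:%i -- "$1")" || return 2
    b="$(stat -L --format=%d:%i -- "$2")" || return 2
    [ "$a" = "$b" ]
}

# Demo in a scratch directory (names are arbitrary).
tmp="$(mktemp -d)" || exit 1
mkdir "$tmp/real"
ln -s real "$tmp/link"

same_object "$tmp/real" "$tmp/link"    && echo "real and link: same"
same_object "$tmp/real/" "$tmp/real/." && echo "real/ and real/.: same"
same_object "$tmp/real" "$tmp"         || echo "real and its parent: different"

rm -rf -- "$tmp"
```

Comparing device:inode pairs rather than strings is what makes the stray slash and the symlink irrelevant here.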

There are readlink -f (also mentioned in the other answer) and realpath utilities. One of them may be a better approach than stat. It depends on which aspects of paths are important to you. Main differences:

If /foo/bar/ is a bind mount to /moo/baz/ then stat --format=%d:%i will tell you they are the same, but readlink or realpath will consider them different.
The situation gets complicated with symlinks combined with .. (parent directory) components. See man 1 realpath, especially the -L and -P options.
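To see how symlinks and .. interact, here is a small experiment. It assumes GNU realpath (the -L and -P options); the directory names a/b and link are invented for the demo.

```shell
# Scratch layout:
#   $tmp/a/b    a real directory
#   $tmp/link   a symlink pointing to a/b
# $tmp itself is canonicalized first so the comparison is not blurred
# by symlinks in the temporary directory's own path.
tmp="$(cd -P -- "$(mktemp -d)" && pwd -P)" || exit 1
mkdir -p "$tmp/a/b"
ln -s a/b "$tmp/link"

# -P (the default): resolve the symlink first, then apply "..",
# which gives the parent of the real a/b.
physical="$(realpath -P -- "$tmp/link/..")"

# -L: cancel "link/.." textually first, then resolve what is left.
logical="$(realpath -L -- "$tmp/link/..")"

printf 'physical: %s\nlogical:  %s\n' "$physical" "$logical"
rm -rf -- "$tmp"
```

The physical result ends in /a while the logical one is the scratch directory itself, so the same input string names two different places depending on the mode.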

With symlinks involved, if your script changes the directory upwards (towards /, e.g. cd ..), you may get different results depending on where you start, even if stat or readlink -f says the two starting points are the same (compare Why does ls .. show real parent content when I'm inside a symbolic link directory?). Consider this approach:

# early in the script
set -P
cd .   # seems like no-op but updates the PWD variable to physical path
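The cd .. effect can be reproduced without set -P by passing -L or -P to cd directly. A sketch with an invented layout (a/b and link are arbitrary names):

```shell
# $tmp/link is a symlink to the real directory $tmp/a/b.
tmp="$(cd -P -- "$(mktemp -d)" && pwd -P)" || exit 1
mkdir -p "$tmp/a/b"
ln -s a/b "$tmp/link"

# Logical: ".." strips the last component of $PWD, so we land in $tmp.
up_logical="$(cd -L -- "$tmp/link" && cd -L .. && pwd -L)"

# Physical: ".." is the real parent of a/b, so we land in $tmp/a.
up_physical="$(cd -L -- "$tmp/link" && cd -P .. && pwd -P)"

printf 'logical:  %s\nphysical: %s\n' "$up_logical" "$up_physical"
rm -rf -- "$tmp"
```

Both subshells start in what stat would call the same directory, yet they end up in different places after going up one level.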

With bind mounts involved, if your script changes the directory, you may get different results depending on where you start, even if stat says the two starting points are the same. Consider the example with /foo/bar/ and /moo/baz/ (already introduced above). It's obvious /foo/ may be completely different from /moo/. But also /foo/bar/abc may be different from /moo/baz/abc because any abc may be an independent bind mount to something else. So not only cd .. but also cd abc may place you in a different spot.

Well, abc may be a file. What if you bind another file to /moo/baz/abc but not to /foo/bar/abc? The perceived files will be different even if stat says you're in the same place!

Because of these problems you may indeed prefer readlink or realpath over stat.

stat, readlink and realpath are not required by POSIX. In your case a portable solution may look like this:

a="$(cd -P -- "expected-subdir" && pwd -P)"                || exit 1
b="$(cd -P -- "$VARIABLE_PATH/expected-subdir" && pwd -P)" || exit 1
[ "$a" = "$b" ] || { printf "Not the same path.\n" >&2; exit 2; }

It works by cd-ing to either path and retrieving it with pwd. These operations are explicitly forced to act "physically" (-P).
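If you need the check in more than one place, the same idea fits in a function. This is a sketch, not a drop-in: same_dir is a made-up name, and the CDPATH= prefix is my addition to keep cd from searching CDPATH (and printing the directory it found) when a relative name is given.

```shell
# Hypothetical helper: returns 0 if both paths are the same directory,
# 1 if they differ, 2 if either one cannot be entered.
same_dir() {
    # Each cd happens in a subshell, so the caller's directory is untouched.
    a="$(CDPATH= cd -P -- "$1" 2>/dev/null && pwd -P)" || return 2
    b="$(CDPATH= cd -P -- "$2" 2>/dev/null && pwd -P)" || return 2
    [ "$a" = "$b" ]
}

# Demo in a scratch directory (names are arbitrary).
tmp="$(mktemp -d)" || exit 1
mkdir "$tmp/sub"
ln -s sub "$tmp/alias"

same_dir "$tmp/sub" "$tmp/sub/"          && echo "trailing slash: same"
same_dir "$tmp/sub" "$tmp/alias"         && echo "symlink: same"
same_dir "$tmp/sub" "$tmp/alias/../sub"  && echo "detour through ..: same"
same_dir "$tmp/sub" "$tmp"               || echo "parent: different"

rm -rf -- "$tmp"
```

Only cd, pwd and command substitution are used, so this should run in any POSIX shell; note that it can only compare directories, not regular files.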

set -P is not POSIX either. If you want POSIX shell to "emulate" set -P, replace cd and pwd:

cd()  { command cd  -P "$@"; }
pwd() { command pwd -P "$@"; }

POSIX requires the last -P or -L option to take effect, so pwd -L still retrieves the logical path even though the function "injects" -P; the same for cd -L.
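A quick way to convince yourself that an explicit -L still wins over the injected -P (a sketch; real and link are invented names):

```shell
# The wrappers from above: default to physical behaviour.
cd()  { command cd  -P "$@"; }
pwd() { command pwd -P "$@"; }

# $tmp/link is a symlink to the real directory $tmp/real.
tmp="$(command cd -P -- "$(mktemp -d)" && command pwd -P)" || exit 1
mkdir "$tmp/real"
ln -s real "$tmp/link"

# Plain calls act physically: the symlink is resolved.
default_view="$(cd "$tmp/link" && pwd)"

# Explicit -L comes after the injected -P, so the logical path survives.
logical_view="$(cd -L "$tmp/link" && pwd -L)"

printf 'default: %s\nwith -L: %s\n' "$default_view" "$logical_view"
rm -rf -- "$tmp"
```

The default view ends in /real and the -L view ends in /link, so scripts that really want the logical path can still ask for it.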
