4

I am trying to find a solution, similar to the one used below, to find the Top N oldest files (by modification time) on my AIX system, starting from a given directory and digging through all sub-directories under it. Unfortunately, -printf is not supported by the find command on AIX (my version is 7.1). Is there an alternative way to accomplish the same task on AIX?

$ find /home/sk/ostechnix/ -type f -printf '%T+ %p\n' | sort | head -n 5

Source: https://ostechnix.com/find-oldest-file-directory-tree-linux/

AIX man page for find command

phuclv
pchegoor

6 Answers

3

This is a POSIX solution.

stat could help, but it is not required by POSIX, and in general POSIX tools cannot fully replace it. Parsing ls -l to get the mtime is a non-trivial task.

The only(?) approach that is relatively straightforward is find -newer:

# parameters (adjust them)
set -- /home/sk/ostechnix/ /another/starting/point "/and another/"
N=5

# fixed code (nothing to adjust)
find "$@" -type f -exec sh -c '
    f="$1"
    shift
    c="$(find "$@" -type f ! -newer "$f" | wc -l)"
    printf "%s\t%s\n" "$c" "$f"
' find-sh {} "$@" \; | sort -k 1n,1 | head -n "$N" | cut -f 2-

For each file the code finds and counts files that are not newer (in terms of mtime i.e. modification time). The rest is sort … | head … | cut ….
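As a sandbox illustration (the temporary directory and file names are my invention), three files with staggered mtimes show how the count orders them, oldest first:

```shell
# Sandbox demo (file names made up): three files with staggered mtimes
dir=$(mktemp -d)
touch -t 202001010000 "$dir/old"     # 2020: should rank first
touch -t 202101010000 "$dir/middle"  # 2021
touch -t 202201010000 "$dir/new"     # 2022: should rank last

# For each file, count the files that are not newer than it; a lower
# count means an older file, so the numeric sort puts the oldest first
find "$dir" -type f -exec sh -c '
    f="$1"
    shift
    c="$(find "$@" -type f ! -newer "$f" | wc -l)"
    printf "%s\t%s\n" "$c" "$f"
' find-sh {} "$dir" \; | sort -k 1n,1 | cut -f 2-
```

The counts here are 1, 2 and 3, so the listing comes out old, middle, new.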

Notes:

  • This will fail for pathnames containing newline characters.

  • Results cannot be trusted if files are added, removed or modified when the code runs.

  • The solution does not scale well with the number of files; I think it's O(n²). Start by testing on a directory containing at most a few hundred files. I can think of one or two approaches that could scale better, but they are quite complicated, and you'd be better off just compiling GNU find, which supports -printf.

  • It may seem we could interrupt the inner find once it finds a big enough number of files older than the current file, because at some count we can be sure the file cannot be among the N oldest files, right? But ! -newer means "older or equally old", and there may be arbitrarily many files that are equally old. I tested this. An "optimization":

    c="$(find … | head … | wc -l)"
    

    can significantly speed things up, but when there are equally old files the results can be wrong. I won't elaborate. I think if find provided something like -older (strictly older, not the same as ! -newer), then we could optimize this way.

  • find-sh is explained here: What is the second sh in sh -c 'some shell code' sh?.

  • The code supports multiple starting paths. Keep in mind that if the same file appears under two (or more) starting paths, then it will be treated by find as two (or more) files (e.g. cd /dev && find ./ /dev /dev/null /dev//null | grep /null prints four lines referring to the same file). In such a case it may happen that two or more of the N oldest files are actually the same file. The command you found behaves similarly in this regard. Specifying paths that don't overlap is the right thing to do.

  • If you want to specify just one starting path then you can speed things up a little by using this modified code instead:

    # parameters (adjust them)
    start=/home/sk/ostechnix/
    N=5
    

    # fixed code (nothing to adjust)
    find "$start" -type f -exec sh -c '
        start="$1"
        shift
        for f
        do
            c="$(find "$start" -type f ! -newer "$f" | wc -l)"
            printf "%s\t%s\n" "$c" "$f"
        done
    ' find-sh "$start" {} + | sort -k 1n,1 | head -n "$N" | cut -f 2-

    Here we spawn fewer shells by passing multiple results from the outer find to a shell. We couldn't easily use this trick in the previous code because we needed to pass an arbitrary number of starting paths.

  • Run the code in a subshell, so it doesn't affect anything ($N, $f, $c, positional parameters or $start) in your main shell.

  • In general find runs many, many times; it tests the same set of files (not with exactly the same tests though). If there are some problems like permission denied then they will appear many times. Consider redirecting stderr to /dev/null, at least for the inner finds (c="$(find … 2>/dev/null | …)").

2

When faced with such limitations, I usually turn to perl, which is typically installed on AIX systems. The File::Find module is helpful for doing the heavy lifting. The script below uses that module to discover files, just as find would, and captures the modification timestamps as it goes, using perl's stat() function. Once it has collected the files, it trims the results down to the "N" given, sorted by the oldest modification date. I've split out the line of code that strips the timestamp, in case you want to mimic the find ... -printf '%T+ %p\n' behavior of printing the timestamp next to the filename.

Using perl has advantages over shell-code workarounds because:

  • it doesn't choke on filenames with whitespace (particularly newlines) or other escape characters
  • it doesn't require the support of a GNU date program
  • it doesn't rely on parsing the output of IBM's istat program

There is potential confusion when outputting the filenames; here, I've delimited them with newlines, but be aware that a filename containing a newline will be visually mistaken for an additional file.

I'm by no means a perl expert, so the code is a little bit "brute-force" and simple, but I appreciate obvious simplicity in the face of future maintenance concerns, unless & until performance or memory restrictions are a concern. Note that the current script requires enough memory to store all the filenames and timestamps as an array of strings, plus as many sorted results as was requested.

#!/usr/bin/perl -w
# prints N oldest files

use strict;
use File::Find ();

# expect at least 2 arguments: N and 1 or more starting directories;
# $#ARGV is "number of arguments minus one", counting from [0]
if ($#ARGV < 1) { die "Usage: $0 N dir1 ..." }

my $n = shift;
unless ($n =~ /^\d+$/ && $n > 0) { die "$0: N must be a positive integer" }

my @results = ();

sub wanted {
    return unless -f $_;
    push(@results, (stat($_))[9] . " " . $_);
}

# using "no_chdir=1" so that stat() and $_ have the full filepath
File::Find::find({wanted => \&wanted, no_chdir => 1}, @ARGV);

# array is zero-based, so subtract one...
--$n;

# keep N within the range of the results
$n = $#results if $n > $#results;

# a plain (string) sort works with the fixed-width age-in-seconds as the
# leading data; files with the same timestamp will then sort on filename
my @oldest = (sort @results)[0..$n];

# strip the timestamp out
@oldest = map { $_ =~ s/^\d+ //; $_; } @oldest;

print join("\n", @oldest) . "\n";

1

This formulation might work on AIX, provided your date supports the GNU -r and %s extensions:

find /home/sk/ostechnix/ -type f | while read -r line; do echo "$(date +%s -r "$line") $line"; done | sort -n -k1 | cut -d' ' -f2-
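If your date lacks GNU's -r and %s, one hedged workaround (the sandbox paths below are my invention) is to let a perl one-liner fetch the epoch mtime instead, keeping the rest of the pipeline the same:

```shell
# Sandbox demo (file names made up): perl's stat() replaces GNU "date -r"
dir=$(mktemp -d)
touch -t 202001010000 "$dir/a"   # 2020: oldest
touch -t 202301010000 "$dir/b"   # 2023: newest

find "$dir" -type f | while read -r line; do
    # field 9 of stat() is st_mtime, printed as epoch seconds
    echo "$(perl -e 'print +(stat $ARGV[0])[9]' "$line") $line"
done | sort -n -k1 | cut -d' ' -f2- | head -n 5
```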

source

harrymc
1

After searching the web, I came up with the below one-line solution based mostly on the answer provided here: Unix/Linux find and sort by date modified and Perl one-liner, printing filename as part of output

The solution below first uses the find command to locate files (starting from the current directory), pipes the output to a perl command that does the sort, and then pipes the resulting list of files into another perl command that prints the timestamp of each file in the desired format. The result shows the top 5 oldest files.

I am not a Perl expert, so I am guessing the below can be simplified further; please let me know if that is the case. The solution seems to work fine as of now on my AIX system.

find . -type f -print 2>/dev/null |
perl -l -ne '
$_{$_} = -M;
END {
    $,="\n";
    @sorted = sort {$_{$b} <=> $_{$a}} keys %_;
    print @sorted[0..4];
}' | xargs -I {} perl -MPOSIX -e 'print "\n$ARGV[0] ---> " . strftime("%A %Y-%m-%d %H:%M:%S", localtime((stat "$ARGV[0]")[9]))' {}

This gives an output as follows:

./file1.txt ---> Sunday 2018-03-04 15:20:32
./sample/file2.sh ---> Sunday 2019-01-27 08:30:45
./test/file3.txt ---> Tuesday 2019-05-21 18:45:32
./sample/temp/file4.sh ---> Friday 2019-12-27 12:30:45
./file5.txt ---> Tuesday 2020-06-13 15:20:32
phuclv
pchegoor
1

The simplest solution without any 3rd-party tools is to enable globstar in ksh and let ls do the sorting itself

set -o globstar
ls -dlrt /home/sk/ostechnix/** | head -5

-dlrt is POSIX-compliant, so this works in all POSIX environments, not only AIX, as long as your shell supports globstar

** in this snippet will expand to all files and folders under the specified path. -d prevents folder contents from being expanded, and -rt sorts all the entries in the argument list by modification time, oldest first. But this includes directories in the list, so if you want to get the oldest files only, use this

ls -dlrt /home/sk/ostechnix/** | egrep -v '^d' | head -5

Also, it won't work for hidden files. It's much easier in zsh, where you can control the globbing at the call site, but in ksh you'll need to expand them explicitly

dirpath=/home/sk/ostechnix
# Get all files and folders
ls -dlrt "$dirpath"/** "$dirpath"/**/.* | egrep -v '/\.?\.$' | head -5
# Get only files
ls -dlrt "$dirpath"/** "$dirpath"/**/.* | egrep -v '^d' | head -5

The former command has a tiny downside: the . and .. entries of $dirpath will be shown if you run it without a path before **, as in ls -dlrt ** **/.* | egrep -v '/\.?\.$'. They can be filtered out with grep

ls -dlrt ** **/.* | egrep -v '/\.?\.$| \.?\.$'
# Or
ls -dlrt ** **/.* | egrep -v -e '/\.?\.$' -e ' \.?\.$'
# Or
ls -dlrt ** **/.* | egrep -v -e '/\.$' -e '/\.\.$' -e ' \.\.$' -e ' \.$'

But as always, parsing ls -l output is not reliable, so there will be some files with special names that are matched unexpectedly. For example, your find command and the above alternatives won't work correctly with newlines in filenames


You can remove -l if you don't want the long format, but then, to filter out folders and get files only, you'll need a different way

ls -dprt /home/sk/ostechnix/** | egrep -v '/$' | head -5

Or ls -dprt ** **/.* | egrep -v '/$' | head -5 to get hidden files
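If parsing ls output still feels too fragile, a hedged alternative (the sandbox paths are my invention; set -o globstar is the ksh93 spelling, bash uses shopt) is to pre-filter regular files into the argument list with a [ -f ] test, so ls -drt only ever sees plain files and no long-format grepping is needed:

```shell
# Sandbox (paths made up); enable globstar in ksh93 or bash
dir=$(mktemp -d)
touch -t 202001010000 "$dir/old"
mkdir "$dir/sub"
touch -t 202301010000 "$dir/sub/new"
set -o globstar 2>/dev/null || shopt -s globstar 2>/dev/null || true

# Collect only regular files into the positional parameters, then let
# ls sort them by mtime, oldest first; no parsing of ls -l required
set --
for f in "$dir"/**; do
    if [ -f "$f" ]; then set -- "$@" "$f"; fi
done
ls -drt "$@" | head -n 5
```

This still breaks on filenames containing newlines, like every ls-based approach here.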

AIX also includes csh, which seems to support globstar as well. If you have bash you can enable the same option with shopt -s globstar, and in zsh you can use GLOB_STAR_SHORT with setopt globstarshort or set per-glob globbing options. Similarly, in tcsh use set globstar

For more details about globstar in each shell please read The result of ls *, ls ** and ls ***

phuclv
1

AIX has the syscall(1) command, so technically you can do pretty much anything you want on the command line without resorting to ls or any 3rd-party solutions, including Perl

syscall [ -n ] Name [ Argument1 ... ArgumentN ] [ ; Name [ Argument1 ... ArgumentN ] ] ... 

You can use it to call the stat/statx/stat64/stat64x... syscall to get the st_mtime field in fullstat.h, or st_mtime/st_mtime_n in stat.h. Something like this

find $StartPath -type f | \
    xargs -I{} -n 1 syscall stat {} \&0 | \
    parse-stat-output-and-sort

Since I don't have AIX, I can't verify this or write a parser for the output. You'll need to check the offset of st_mtime yourself and read it from the output

There seems to be a port of this syscall(1) to Linux and possibly other POSIX systems though: oliwer/syscall

syscall [-<n>] name [args...] [, name [args...]]...

It uses , instead of ;, but someone not on AIX might be able to work out a workable solution from it


Off-topic, but Plan9 also has a similar command with somewhat different syntax:

syscall [ -o ] entry [ arg ...  ]

and there's also a Linux port of this: mauri870/syscall

phuclv