
I have a huge archive of files that I screwed up while downloading over SFTP, and the folder structure is incorrect.

The result is many cases where folders are one level deeper than they should be (think screwing up an rsync command).

For example:

/foo/bar/bar
/foo/bar/quux/quux
/foo/bar/baz/quux/quux

That is, the extra folder isn't always at the same depth below the root. It's most likely always a leaf folder whose immediate parent has the same name.

Is there a nice scriptable way (bash, PowerShell, or even cmd) to process a folder recursively, something like this pseudocode:

let leafFolders = findLeafFoldersSomehow();  // an array of fully qualified paths

for (let folder of leafFolders) {
  if (getParentFolderName(folder) == getName(folder)) {
    // move all files and folders in folder into parentFolder
    // delete folder if empty
  }
}

I'm currently processing these with a combination of Windows batch files and robocopy, but this really only works for duplicates at the same level, and I have to run it manually each time.

I'd really prefer a safe, automatic way to "collapse" the duplicate folder names down. I'm pretty confident there are no legitimate cases where the files should have a folder and subfolder of the same name. I'm also pretty sure this only affects leaf folders and their parents, and there's nothing like, e.g., /foo/bar/bar/baz/quux.

Please note I cannot just redownload the archive with the correct rsync/lftp arguments again; this is over 500GB of files and I no longer have access to the server.

Is there any relatively simple way to accomplish this with scripting, or something like rsync? I could go so far as to write something in Node.js or C#, but I'd rather avoid that in favor of using bash, PowerShell, or even cmd.

I'm on Windows 10, and while I'm not a Linux maven, I can use WSL with bash if need be.

Dan Novak

1 Answer


You can create a bash script and pass the working directory as an argument:

sudo -H bash /path/to/myscript.sh /foo/bar

The script uses GNU find with -print0 for NUL-separated directory names:

#!/bin/bash

# default workdir
[ -z "$1" ] && set -- "$PWD"

find -L "$@" -depth -mindepth 1 -type d -links 2 -print0 |
while IFS= read -r -d $'\0' dir
do
  dir=$(realpath "$dir")
  par="${dir%/*}"                      # parent directory
  [ -d "$dir" ] || continue            # skip if already removed
  if [ "${par##*/}" = "${dir##*/}" ]   # leaf name equals parent name
  then
    cp -aflPx --backup=t "$dir" "${par%/*}"   # hard-link into grandparent; merges with the same-named parent
    find "$dir" -delete                       # then remove the leaf
  fi
done
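If you want to preview which directories would be collapsed before running the script, a read-only check along the same lines should work (this is just a suggestion, assuming none of the paths contain newlines):

# List leaf directories whose name equals the name of their parent,
# without modifying anything. Replace /path/to/workdir with your folder.
find -L /path/to/workdir -depth -mindepth 1 -type d -links 2 |
awk -F/ '$(NF-1) == $NF'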

Each leaf is processed in two passes. The first pass hard-links the files from the leaf into its parent:

  • cp -l is used as a replacement for mv
  • cp --backup=t will rename existing files instead of overwriting them (see the illustration below)
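A rough illustration of the numbered-backup behaviour, using made-up paths under /tmp/demo:

# GNU cp with --backup=t renames a clashing destination file with a
# numbered suffix instead of overwriting it.
mkdir -p /tmp/demo/bar/bar
echo old > /tmp/demo/bar/readme.txt
echo new > /tmp/demo/bar/bar/readme.txt
cp -aflPx --backup=t /tmp/demo/bar/bar /tmp/demo
ls /tmp/demo/bar   # bar/ (not yet cleaned up), readme.txt (hard link to "new"), readme.txt.~1~ ("old")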

The second pass cleans up the leaf. Because the files are hard-linked, only the old file paths are deleted, while the inodes and file contents (including all metadata) are preserved (see the check below).

  • find -delete is used as a replacement for rm -rf
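If you want to double-check that before the cleanup pass, compare the link count and inode number of a file under both names (the paths here are just examples):

# After the first pass both paths point at the same inode and the hard-link
# count (%h) is at least 2, so deleting one of the names leaves the data intact.
stat -c '%h %i %n' /foo/bar/readme.txt /foo/bar/bar/readme.txt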

Note that the -depth flag makes find start from the leaves; both passes are processed for each leaf.

[ -d "$dir" ] || continue skips directories that no longer exist. This is just a double check in case find's output no longer matches the directory tree.


Edit: only leaf directories are moved thanks to the -links 2 trick (which does not work on Btrfs), thanks @Gohu
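For background (not specific to this script): -links 2 matches leaf directories because a directory normally has two links (its own "." plus its entry in the parent) and gains one more for every subdirectory's "..". On Btrfs the count is always reported as 1, which is why the trick does not apply there. A quick check on a scratch directory:

# Directory link counts on a typical Linux filesystem such as ext4
# (Btrfs always shows 1 for directories).
mkdir -p /tmp/links-demo/leaf /tmp/links-demo/parent/child
stat -c '%h %n' /tmp/links-demo/leaf /tmp/links-demo/parent
# expected output:
# 2 /tmp/links-demo/leaf
# 3 /tmp/links-demo/parent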

alecxs