
I have 16 subdirectories, each containing somewhere between 1m and 1.5m files (roughly 18m files in total), but I need all of the files to be in a single directory. Each file is tiny (35-100 bytes). The total combined size of the files is relatively small - around 600 MB - but it appears to be the sheer number of them that's causing the issues.

So far I've tried the following (the rough commands are sketched after this list):

Windows move: Didn't even get started. It said it would take 'about a day' just to calculate the move; I gave up after 2 hours of calculating.

DOS move: This works great for the first 500-600k files (moving around 10k files per second), but it slows down noticeably as it drags towards the million mark, dropping to about 100 files every 2 seconds.

7-Zip: I've read suggestions that zipping up the entire folder and then extracting it in the destination would be WAY quicker; however, using the GUI just crashed Explorer after a few minutes, and using the command line was incredibly slow (100 files every few seconds).

DOS robocopy: Having already moved ~1m files yesterday, I ran robocopy src_folder dest_folder *.log just to shift the last of what was in the first directory. It took 27 minutes to move ~12k files.
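For reference, the command-line attempts above were along these lines (folder names are placeholders, and the exact switches may have differed slightly):

rem plain command-prompt move of everything in one source folder
move src_folder\* dest_folder

rem 7-Zip: archive the folder, then extract flat into the destination
7z a files.7z src_folder\*
7z e files.7z -odest_folder

rem the robocopy run mentioned above
robocopy src_folder dest_folder *.log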

No matter what method I choose, it seems that the number of files in the destination folder is what causes the issue. If there are more than a million files in the destination, the move/copy slows to an absolute crawl regardless of the method.

Any ideas on how to achieve this that won't take days/weeks? For reference, it's all on a single SSD on a single machine: 64-bit, 16 GB RAM, 8 threads.

indextwo

2 Answers


This PowerShell script, which has been tested with many positive responses, invokes Robocopy in parallel and is much faster. Just change a few parameters (the log path, the maximum number of jobs, etc.) and you're good to go - it prompts for the source and destination paths when it runs:

$max_jobs = 10
$tstart = get-date
$log = "C:\Robo\Logs"

$src = Read-Host -Prompt 'Source path'
if (!($src.EndsWith("\"))) { $src = $src + "\" }

$dest = Read-Host -Prompt 'Destination path'
if (!($dest.EndsWith("\"))) { $dest = $dest + "\" }

if ((Test-Path -Path $src)) {
 if (!(Test-Path -Path $log)) { New-Item -ItemType directory -Path $log }
 if ((Test-Path -Path $dest)) {
 # Copy the files in the top level of the source, then list its contents for the per-directory jobs
 robocopy $src $dest
 $files = ls $src

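# Queue one background robocopy job per item in the source, at most $max_jobs at a time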
$files | %{
  $ScriptBlock = {
    param($name, $src, $dest, $log)
    $log += "\$name-$(get-date -f yyyy-MM-dd-mm-ss).log"
    robocopy $src$name $dest$name /E /nfl /np /mt:16 /ndl > $log
    Write-Host $src$name " completed"
  }

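  # Throttle: wait until fewer than $max_jobs jobs are running before starting the next one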
  $j = Get-Job -State "Running"
  while ($j.count -ge $max_jobs) 
  {
   Start-Sleep -Milliseconds 500
   $j = Get-Job -State "Running"
  }
  Get-job -State "Completed" | Receive-job
  Remove-job -State "Completed"
  Start-Job $ScriptBlock -ArgumentList $_,$src,$dest,$log
}

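# Wait for the remaining jobs to finish, then clean up and report the elapsed time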
While (Get-Job -State "Running") { Start-Sleep 2 }
Remove-Job -State "Completed" 
Get-Job | Write-host

$tend = get-date

Cls
Echo 'Completed copy'
Echo "From: $src"
Echo "To: $dest"
new-timespan -start $tstart -end $tend

 } else { echo 'Invalid Destination' }
} else { echo 'Invalid Source' }
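To run it, save the script as, for example, parallel-robocopy.ps1 (the file name is just an example) and launch it from a console; it prompts for the source and destination paths and writes a per-directory log under the $log folder:

powershell.exe -NoProfile -ExecutionPolicy Bypass -File .\parallel-robocopy.ps1

As written, it starts one background robocopy job per item in the source (at most $max_jobs at a time), so each subdirectory of the source is copied into a matching subdirectory under the destination.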

JW0914

Use DSynchronize, as it's free! The reason it's a good option for you is that it doesn't do what Windows Explorer does, counting the number and size of every file in the queue before copying - it just starts copying straight away.

However, you can tick a checkbox if you want it to count the disk space first. You can also choose to keep a backup of every file that gets deleted or overwritten, and you can use the preview mode to test how the synchronization will run before you do it for real.

Also keep in mind that it doesn't always copy files in alphanumeric order, so if the copy or synchronization suddenly stops halfway through, you might have to start again from the beginning.

I find that the old version 2.30.1 is easier to use and faster than the newer version (2.41.1 at the time of writing).

DSynchronize 2.30.1

DSynchronize 2.41.1

desbest