We currently have an application that generates a PDF document and automatically names it using the pattern {UniqueID-DocCode-StartDate-StartTime}. All of this data comes from a database via our application. We have one major problem:

  • PDF contents and filenames are getting mixed up, e.g.:

Filename: 123456-Doc001-28042017-1415.pdf

Contents: 987654-Doc002-28042017-1312

My problem is identifying the PDFs that have failed (contents != filename) and re-triggering them.

In a correct PDF, every value from the filename appears somewhere in the contents. However, the contents are structured as a letter, so a direct comparison won't work. The letters also vary dramatically in length depending on how complicated the contents are.

So, my wish list would be:

  1. Ideally, check for each parameter from the filename. However, just being able to check the UniqueID would be sufficient.
  2. A way of moving failed files, renaming them, or reporting them back in a list.
  3. Run as a scheduled job, or continuously against a directory.

Let me know if there is any particular info you need and I should be able to get it to you.

Taz

1 Answer


The PowerShell script below converts each PDF to text, stores that text in temp.txt, and searches it for a value taken from the filename. The filename is split on a delimiter, and an index selects which part of the split to compare. This runs for every file in the directory whose name ends with .pdf, and the files that did not match are listed in error.log.

We had to use a third-party .exe (pdftotext.exe) to convert the PDFs to text.
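As a small illustration of the split step described above, here is a minimal sketch using the example filename from the question. It assumes the parts are separated by "-" as in that example; the variable names are only for illustration.

```powershell
# Example filename taken from the question; "-" as the delimiter is an assumption.
$filename = "123456-Doc001-28042017-1415.pdf"

# Drop the extension first so the last part is "1415" rather than "1415.pdf".
$baseName = [System.IO.Path]::GetFileNameWithoutExtension($filename)

# Split into the four parts named in the question.
$uniqueId, $docCode, $startDate, $startTime = $baseName.Split("-")

Write-Output "UniqueID=$uniqueId DocCode=$docCode StartDate=$startDate StartTime=$startTime"
# prints: UniqueID=123456 DocCode=Doc001 StartDate=28042017 StartTime=1415
```

Index 0 of the split is the UniqueID, which is the only part the script needs if checking the UniqueID alone is sufficient.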

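# Folder that holds the PDFs and pdftotext.exe
$path = "C:\brokenPDFs"

$output    = Join-Path $path "output.log"
$errorpath = Join-Path $path "error.log"
$temp      = Join-Path $path "temp.txt"
$exe       = Join-Path $path "pdftotext.exe"

# Start every run with empty log files
New-Item -ItemType File -Path $output -Force | Out-Null
New-Item -ItemType File -Path $errorpath -Force | Out-Null

# Delimiter between the filename parts, and the index of the part to check (0 = UniqueID).
# "-" matches the filename format shown in the question; change it if your files differ.
$delimiter = "-"
$partIndex = 0

$errorcount = 0

$files = Get-ChildItem -Path $path -Filter *.pdf

foreach ($currentfile in $files)
{
    # BaseName excludes ".pdf", so the extension never ends up in a part
    $splitname = $currentfile.BaseName.Split($delimiter)
    $currentUR = $splitname[$partIndex]

    # Convert the PDF to text, overwriting temp.txt each time
    & $exe $currentfile.FullName $temp

    # -SimpleMatch treats $currentUR as literal text rather than a regex
    $result = Select-String -Path $temp -Pattern $currentUR -SimpleMatch -Quiet

    $match = $currentfile.FullName
    if ($result)
    {
        "Match on string :  $currentUR  in file :  $match" | Out-File $output -Append
    }
    else
    {
        # Reached whenever the search did not return $true, so a miss is always logged
        "String not found:  $currentUR  missing from file :  $match" | Out-File $errorpath -Append
        Write-Host "ERROR: $($currentfile.Name) missing $currentUR"
        $errorcount++
    }
}

Write-Host "Total Errors: $errorcount"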
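The script above only checks one part of the filename and only reports failures. A possible extension covering the other wish list items (checking every part, and moving failed files) could look like the sketch below. The function names `Get-MissingNamePart` and `Move-FailedPdf`, the "-" delimiter, and the sample letter text are all assumptions made for illustration; the text is expected to have been extracted from the PDF already, for example with pdftotext.exe as above.

```powershell
# Hypothetical helper: returns the filename parts that do NOT appear in the extracted text.
function Get-MissingNamePart {
    param(
        [string]$PdfName,          # e.g. "123456-Doc001-28042017-1415.pdf"
        [string]$Text,             # text already extracted from the PDF
        [string]$Delimiter = "-"   # assumption based on the question's example
    )
    $baseName = [System.IO.Path]::GetFileNameWithoutExtension($PdfName)
    # Keep only the parts that cannot be found as a literal substring of the text.
    ($baseName -split [regex]::Escape($Delimiter)) |
        Where-Object { -not $Text.Contains($_) }
}

# Hypothetical helper: moves a failed PDF into a separate folder, creating it if needed.
function Move-FailedPdf {
    param([string]$PdfPath, [string]$FailedDir)
    if (-not (Test-Path -LiteralPath $FailedDir)) {
        New-Item -ItemType Directory -Path $FailedDir | Out-Null
    }
    Move-Item -LiteralPath $PdfPath -Destination $FailedDir
}

# Usage with the mismatched example from the question (made-up letter text).
$text    = "Ref 987654 Doc002 issued 28042017 at 1312"
$missing = @(Get-MissingNamePart -PdfName "123456-Doc001-28042017-1415.pdf" -Text $text)
Write-Output "Missing parts: $($missing -join ', ')"
# prints: Missing parts: 123456, Doc001, 1415
```

Inside the main loop, a non-empty `$missing` would mark the file as failed, at which point `Move-FailedPdf` could be called with the file's full path. Note that this is a literal substring check, so it only works if the letter prints each value in the same form as the filename; a date printed as 28/04/2017 would not match 28042017.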
Taz