I teach in a college, and a wee while ago I had some fun spotting students who copied other students' work by simply taking their Word file and "paraphrasing" the sentences. So student A would innocently lend their file to student B, who would maliciously copy their work. The plagiarism was easy to spot, but I also discovered that when they did this, the "author" of the Word file submitted by student B was listed as student A (student B was only an editor).

My students have caught on. This time, two of them submitted PDFs.

Again, the plagiarism is easy to spot. However, it would be nice to have the same hard evidence as before.

Is it possible to find out the author of the file from which a PDF was created?

I tried using ExifTool, which gives the metadata for a PDF, but this doesn't go far enough back. So I am expecting the answer to my question to be "no". But it would be nice if this were confirmed for me :-)

StarGeek
user1729

2 Answers

There is no definitive way for you to know whether a person is actually the author of the document he/she submitted to you, because removing metadata from a Word document is a trivial task.

Personally identifiable information can easily be removed using the Inspect Document feature of Microsoft Word (2007 and later).

However, if your students didn't edit or remove it before converting the document to PDF, you could find the author simply by opening the PDF in Microsoft Reader, Adobe Reader, Foxit Reader, etc. and looking at its metadata (File → Properties in Adobe Reader).
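If you have many submissions to check, the same fields can sometimes be pulled out with a short script. The following is only a minimal sketch using the Python standard library. It assumes the PDF stores its Info dictionary uncompressed and with plain `/Key (value)` entries, which is not true of every PDF, and `info_fields` and `decode_pdf_string` are hypothetical helper names of my own. A proper PDF library or ExifTool will handle more cases.

```python
import re

# Info-dictionary keys that commonly hold authorship hints.
KEYS = (b"Author", b"Creator", b"Producer", b"Title")

def decode_pdf_string(raw: bytes) -> str:
    """Decode a literal string body: UTF-16BE if it starts with a BOM, else Latin-1."""
    if raw.startswith(b"\xfe\xff"):
        return raw[2:].decode("utf-16-be", errors="replace")
    return raw.decode("latin-1")

def info_fields(data: bytes) -> dict:
    """Return {key: value} for simple '/Key (value)' entries in the raw bytes.

    Assumption: values contain no nested or escaped parentheses and the
    dictionary is not inside a compressed object stream.
    """
    found = {}
    for key in KEYS:
        m = re.search(rb"/" + key + rb"\s*\(([^)]*)\)", data)
        if m:
            found[key.decode()] = decode_pdf_string(m.group(1))
    return found

# Illustrative input only; for a real file, pass open(path, "rb").read().
sample = b"<< /Author (Student A) /Creator (Microsoft Word) >>"
print(info_fields(sample))
```

An empty result does not prove the metadata was stripped; it may simply be stored in a form this sketch does not read.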

To check for plagiarism, however, you could convert the document to HTML or plain text (simply copy and paste the content into Notepad and save it as .TXT), upload it to a web server you control (public files on Dropbox work as well), and then give Copyscape the URL of the document.

Vinayak

Check out pdf-parser by Didier Stevens, or another tool mentioned on his site, and you might have better luck.

From my understanding, the "author" is either contained within the metadata or it is not. There is no slack space or anything like that within a PDF for you to carve, but I guess you could try searching the strings contained within the file and looking for mentions of a name that is not in the document itself.
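The string search could be sketched roughly like this in Python. It mimics the Unix `strings` tool by collecting runs of printable ASCII and then checks them against a list of names you supply, for example your class roster. The function names are hypothetical, and the sketch assumes the name sits in an uncompressed, ASCII part of the file; anything inside a compressed stream or stored as UTF-16 would be missed.

```python
import re

def printable_strings(data: bytes, min_len: int = 4) -> list:
    """Return runs of printable ASCII at least min_len bytes long."""
    pattern = rb"[\x20-\x7e]{%d,}" % min_len
    return [m.decode("ascii") for m in re.findall(pattern, data)]

def names_found(data: bytes, names: list) -> dict:
    """Map each candidate name to the strings it appears in (case-insensitive)."""
    strings = printable_strings(data)
    hits = {}
    for name in names:
        matches = [s for s in strings if name.lower() in s.lower()]
        if matches:
            hits[name] = matches
    return hits

# Illustrative bytes only; for a real file, pass open(path, "rb").read().
sample = b"\x00\x01<dc:creator>Student A</dc:creator>\x00%PDF"
print(names_found(sample, ["Student A", "Student C"]))
```

A hit is only useful if the name does not also appear in the visible text of the document, so compare against what the student actually wrote on the page.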

jredd