Do Microsoft Office documents - MS Word, Excel, PowerPoint, Access, etc. - contain information about the license of the computer used for creating or modifying them?
Thanks in advance.
There are too many versions and subversions of the Microsoft Office products to give you a single, exhaustive answer.
In earlier versions such information was included, and then it was removed. Nowadays, if present, it should be traceable in the metadata section (see below), but it could still be encoded in some hidden way somewhere else.
There is an additional issue: privacy laws in different countries. If the contract (and its successive amendments) does not clearly state exactly which kinds of "private" information are shared, even Microsoft can incur sanctions. So you may want to check, carefully, the licence agreement of your OS/program version [e.g. w10, msa]. It's a long task, I know, but if you go to aka.ms/privacy and click "Learn More" below "Personal Data We Collect", you will see, as I noted some time ago [answer], that you have already agreed to share it:
It includes data about the operating systems and other software installed on your device, including product key
I will write down some hints and directions for searching in the following TL;DR section.
The Office pages [1,2] "Remove hidden data and personal information..." and "Inspect documents for hidden data and personal information" report that many kinds of personal data can be included in an Office document (and describe a way to remove them):
Comments, revision marks from tracked changes, versions, and ink annotations: If you collaborated with other people to create your document, your document might contain items such as revision marks from tracked changes, comments, ink annotations, or versions. This information can enable other people to see the names of people who worked on your document, comments from reviewers, and changes that were made to your document.
Document properties and personal information: Document properties, also known as metadata, include details about your document such as author, subject, and title. Document properties also include information that is automatically maintained by Office programs, such as the name of the person who most recently saved a document and the date when a document was created. If you used specific features, your document might also contain additional kinds of personally identifiable information (PII), such as e-mail headers, send-for-review information, routing slips, and template names.
Headers, footers, and watermarks: Word documents can contain information in headers and footers. Additionally, you might have added a watermark to your Word document.
Hidden text: Word documents can contain text that is formatted as hidden text. If you do not know whether your document contains hidden text, you can use the Document Inspector to search for it.
Document server properties: If your document was saved to a location on a document management server, such as a Document Workspace site or a library based on Microsoft Windows SharePoint Services, the document might contain additional document properties or information related to this server location.
Custom XML data: Documents can contain custom XML data that is not visible in the document itself. The Document Inspector can find and remove this XML data.
There is no direct mention of the license of the computer or of Office.
(Nor is it said that it is not included.)
If you are using Document IDs in document or record management [6], it may be possible to trace back to the original licence.
The paper "Disclosing Private Information from Metadata, hidden info and lost data" [3], for example, states:
in an environment where social networks make the sharing of resources such an important issue, it is necessary to store information about documents authors, the computers used to edit the documents, software versions, printers where they were printed, and so on.
Later in the same document you can read (before the output of the proof)
These metadata are used by Microsoft Office in order to perform its own tasks. And they may contain compromising information such as software versions, authors, revision history, the last person who edited it and when he did it, the last time the document was printed, which printer was used, total editing time for the document, information about e-mail messages including e-mail addresses, and even, in some earlier versions of Office, a Global Unique ID that identifies the computer on which the document was edited.
Note: even if this conference [3b] article is a little dated, you may find other references online on similar sites.
So it is stated that in old versions it was stored and in newer ones it no longer is, or at least not in a plain way, and it cannot be excluded that it may be reintroduced later.
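As a rough way to look for the kind of identifier the paper mentions, you could scan the raw bytes of a file for GUID-shaped strings. This is only a heuristic sketch: the function name `find_guid_like` is made up for illustration, and how (or whether) a given Office version stores such an ID is an assumption, so both plain ASCII and UTF-16LE text are tried. A hit is not proof that the string identifies a computer or a license, since Office files contain GUIDs for many other purposes.

```python
import re

# GUID written as plain ASCII text: 8-4-4-4-12 hexadecimal digits.
GUID_ASCII = re.compile(
    rb"[0-9A-Fa-f]{8}(?:-[0-9A-Fa-f]{4}){3}-[0-9A-Fa-f]{12}"
)

# The same shape as UTF-16LE text: every character is followed by a zero byte.
GUID_UTF16 = re.compile(
    rb"(?:[0-9A-Fa-f]\x00){8}"
    rb"(?:-\x00(?:[0-9A-Fa-f]\x00){4}){3}"
    rb"-\x00(?:[0-9A-Fa-f]\x00){12}"
)


def find_guid_like(data):
    """Return the sorted, upper-cased GUID-shaped strings found in `data` (bytes)."""
    hits = set()
    for match in GUID_ASCII.finditer(data):
        hits.add(match.group().decode("ascii").upper())
    for match in GUID_UTF16.finditer(data):
        hits.add(match.group().decode("utf-16-le").upper())
    return sorted(hits)
```

For an old binary file you could pass the whole file content, e.g. `find_guid_like(open("old.doc", "rb").read())` (the file name is a placeholder). For a zipped format such as docx the parts are compressed, so the scan would have to be applied to each extracted part instead.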
If the document is edited in a corporate environment and/or published/edited online, there are other issues related to the information added by the server and/or to the matching of metadata from different sources. Moreover, if you have been sharing your document or using Track Changes [6], you may need to check the "Privacy Options" too.
You can check by changing format, for instance to rtf or pdf. Note that even if it is not present in the exported format, the license version reference may still be present in the original file. On the other hand, if you can find it in the exported format, the probability is high that you can find it in the original document too.
The pdf format contains metadata too, but the format is not owned by Microsoft, so the probability is higher that some tool can detect additional hidden information. exiftool [5] may be enough.
The rtf format is to some extent human readable, and you can inspect it yourself.
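For the rtf case, the document properties normally sit in an `{\info ...}` group near the top of the file. Here is a minimal sketch that cuts that group out so you can read it; `extract_info_group` is a hypothetical helper name, and the brace matching is deliberately simple (it only skips backslash escapes), so treat it as a starting point rather than a real RTF parser.

```python
def extract_info_group(rtf_text):
    """Return the text of the first {\\info ...} group, or None if there is none."""
    start = rtf_text.find("{\\info")
    if start == -1:
        return None
    depth = 0
    i = start
    while i < len(rtf_text):
        ch = rtf_text[i]
        if ch == "\\":
            # Skip the backslash and the next character, so that the
            # escaped braces \{ and \} are not counted as group delimiters.
            i += 2
            continue
        if ch == "{":
            depth += 1
        elif ch == "}":
            depth -= 1
            if depth == 0:
                return rtf_text[start:i + 1]
        i += 1
    return None  # unbalanced braces: the group never closes
```

Anything unexpected inside that group (or a group you do not recognise elsewhere in the file) would be the place to look more closely.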
The modern formats (docx...) used to save files from the Microsoft Office suite are in fact zip files that you can unzip like normal archives. You can then search the separate xml files, which are again to some extent human readable. This will not rule out, BTW, that the licence is included (hidden) somewhere else.
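A small sketch of that inspection, using only the Python standard library: it opens the file as a zip archive and lists the non-empty values found in the `docProps/*.xml` parts, which is where the standard document properties usually live. The name `list_properties` is made up for this example, and the sketch only covers that one folder; other parts of the package would need the same treatment.

```python
import zipfile
import xml.etree.ElementTree as ET


def list_properties(path):
    """Return {part_name: [(tag, text), ...]} for the docProps/*.xml parts."""
    found = {}
    with zipfile.ZipFile(path) as archive:
        for name in archive.namelist():
            if name.startswith("docProps/") and name.endswith(".xml"):
                root = ET.fromstring(archive.read(name))
                pairs = []
                for element in root.iter():
                    text = (element.text or "").strip()
                    if text:
                        # Drop the "{namespace}" prefix ElementTree puts on tags.
                        pairs.append((element.tag.split("}")[-1], text))
                found[name] = pairs
    return found
```

Calling `list_properties("report.docx")` (placeholder file name) gives you a dictionary you can print and read through, looking for any value you do not expect.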
A differences (diff) test. You can create exactly the same document (e.g. docx) on two differently licensed Word installations/computers (but with the same version of Office), taking care to delete the metadata from each document before saving it. They should be the same file (in fact they will not be, because of information such as the creation time...).
Now you can save (and, if necessary, rename) and unzip the two documents, so as to obtain two structures (trees) of directories and files.
Comparing each of these pairs of files, you will see whether they are the same or where they differ. The information about the license, if present, has to be in the parts that differ. If no parts differ, it cannot be present.
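The comparison can be sketched in a few lines of Python without unzipping to disk. The function names are made up for this example. It hashes the uncompressed content of every part, so differences in the zip container itself (compression, archive timestamps) are ignored and only the parts whose content really differs are reported.

```python
import hashlib
import zipfile


def member_digests(path):
    """Map each part name in the archive to the SHA-256 of its uncompressed content."""
    with zipfile.ZipFile(path) as archive:
        return {
            name: hashlib.sha256(archive.read(name)).hexdigest()
            for name in archive.namelist()
        }


def compare_packages(path_a, path_b):
    """Return (parts only in A, parts only in B, parts present in both but different)."""
    a = member_digests(path_a)
    b = member_digests(path_b)
    only_a = sorted(set(a) - set(b))
    only_b = sorted(set(b) - set(a))
    changed = sorted(name for name in set(a) & set(b) if a[name] != b[name])
    return only_a, only_b, changed
```

The parts listed as changed are the ones to open and read side by side. Expect at least the creation/modification times to show up there, so a non-empty result is not by itself evidence of license data.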