I came across a question by user flytnx on Forensic Focus about removed email attachments. The question was how one would go about proving that an MSG file received in the course of an eDiscovery production had been tampered with. Specifically, can the modification be detected if a user opened an MSG file with attachments, removed the attachments by using the “Remove Attachment” option in Outlook, saved the message, and produced it? The scenario assumes that the rest of the message carries no obvious indication, such as references to the attachments in the body, that attachments were ever present.
I wanted to look into the possibility of recovering the attachments as well. After all, the Outlook Item (.msg) File Format is based on the Compound File Binary File Format, which is a fairly complex, file-system-like structure within a file. This can be a treasure trove of forensic artifacts.
I started by preparing a test message with two PDF attachments. The resulting MSG file was 888,320 bytes in size, mostly because of the two PDF attachments, which were 810,654 bytes in total. The message looked as follows:
Figure 1 — Original Email Message with Attachments
I proceeded to delete both attachments. In Outlook 2016, this can be achieved by clicking on the small triangular menu icon next to each attachment and clicking “Remove Attachment”.
Figure 2 — Removing Attachments
I then saved the message by using the save button, which can be seen on the top left corner of Figure 1. The resulting message looked as follows:
Figure 3 — Email with Removed Email Attachments
Detecting Removed Email Attachments
The first thing I checked was whether and how the internal MAPI properties of the MSG file were affected. I looked at the PR_CREATION_TIME and PR_LAST_MODIFICATION_TIME properties and found their values to be identical to those in the original message. Even if they had changed, they would not have supported a conclusive determination, because other operations, such as exporting MSG files out of a mailbox, also update the internal timestamps.
When attachments were removed using Outlook, the PR_MESSAGE_FLAGS property was updated and the HasAttach flag was removed. This was consistent with the appearance of the modified message, making it more challenging to detect the modification.
The original message had its X-MS-Has-Attach field in the transport header populated as “yes”. As expected, this remained the same after the modification. This was a good indicator that something was amiss, but it only indicates the presence of attachments, not their count. So, in a scenario where the original message contains multiple email attachments, but not all of them are removed, this field would not be very useful.
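This check can be turned into a quick triage step. The Python sketch below assumes the transport headers (the PR_TRANSPORT_MESSAGE_HEADERS property) have already been extracted from the MSG file as text, and that the number of attachments currently visible in the message is known. The function names are illustrative and not part of any library.

```python
from email.parser import HeaderParser


def header_claims_attachments(transport_headers: str) -> bool:
    """Return True if the X-MS-Has-Attach header is present and set to 'yes'."""
    msg = HeaderParser().parsestr(transport_headers)
    value = msg.get("X-MS-Has-Attach", "")
    return value.strip().lower() == "yes"


def is_suspicious(transport_headers: str, visible_attachment_count: int) -> bool:
    """Flag a message whose header says it had attachments but which shows none.

    As noted above, this cannot catch a partial removal, because the
    header does not record how many attachments there were.
    """
    return header_claims_attachments(transport_headers) and visible_attachment_count == 0
```

A message that is flagged here is only a candidate for closer inspection; a message that is not flagged may still have had some of its attachments removed.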
I then looked at the file size and noted that the size of the MSG file, after removal of the attachments, increased from 888,320 bytes to 922,112 bytes. This is clearly a red flag, as this message without any attachments should typically be in the 50 KB – 60 KB size range. So, we are looking at an unusually large message without any attachments and with very little content in the message body.
Since the modified MSG file was close to the original in terms of size, I did a binary comparison to see what had changed:
Figure 4 — Differences Between Files
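A comparison like this can also be scripted when a hex editor with a compare feature is not at hand. The Python sketch below reports the byte ranges that differ within the length the two files share, plus how many bytes the second file has beyond the first. The function names are illustrative, and both files are assumed to fit comfortably in memory.

```python
def diff_ranges(a: bytes, b: bytes) -> list[tuple[int, int]]:
    """Return half-open (start, end) ranges where a and b differ,
    looking only at the length the two inputs have in common."""
    ranges = []
    start = None
    common = min(len(a), len(b))
    for i in range(common):
        if a[i] != b[i]:
            if start is None:
                start = i          # a differing run begins here
        elif start is not None:
            ranges.append((start, i))  # the run ended at the previous byte
            start = None
    if start is not None:
        ranges.append((start, common))
    return ranges


def summarize(original: bytes, modified: bytes) -> dict:
    """Summarize in-place changes and the size difference between two files."""
    return {
        "changed_ranges": diff_ranges(original, modified),
        "size_difference": len(modified) - len(original),
    }
```

For example, `summarize(open("original.msg", "rb").read(), open("modified.msg", "rb").read())` would list the changed ranges and the number of appended bytes; the two file names here are placeholders.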
Looking at the two files side by side in a hex editor, I noticed that only a few bytes in the header of the file had changed. The rest of the original content was left intact, and a large block of content was appended to the end of the modified file. The added content looked as follows:
Figure 5 — New Root Entry in Modified File
As you can see in Figure 5 above, this is a new root entry—a special directory entry found in every compound file that serves as the parent of all other directory entries. The 8-byte section I highlighted in Figure 5 is a timestamp associated with the root entry in FILETIME format. The decoded value is Wed, 30 August 2017 20:47:50 UTC, which reflects the time when I saved the message after removing its attachments.
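A FILETIME value counts 100-nanosecond intervals since 1 January 1601 UTC and is stored as a 64-bit little-endian integer. The following is a minimal Python sketch for decoding such a value, assuming the 8 bytes have been copied out of the hex editor as raw bytes. The function name is illustrative.

```python
from datetime import datetime, timedelta, timezone

# FILETIME counts 100-nanosecond ticks from this epoch.
FILETIME_EPOCH = datetime(1601, 1, 1, tzinfo=timezone.utc)


def filetime_to_datetime(raw: bytes) -> datetime:
    """Decode an 8-byte little-endian FILETIME into a UTC datetime."""
    if len(raw) != 8:
        raise ValueError("a FILETIME is exactly 8 bytes")
    ticks = int.from_bytes(raw, "little")
    # datetime has microsecond resolution, so sub-microsecond ticks are dropped.
    return FILETIME_EPOCH + timedelta(microseconds=ticks // 10)
```

Passing the 8 bytes highlighted in Figure 5 to this function should reproduce the timestamp quoted above.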
Recovering Removed Email Attachments
I looked around inside the MSG file with the removed email attachments, and was able to locate the PDF XMP metadata streams for both “removed” PDF attachments. So, based on the file size, and parts of the PDFs still being clearly visible, it looked like the attachments might still be in there.
Figure 6 — PDF XMP Metadata Inside Email with Removed Attachments
I then tried extracting the PDF attachments inside the modified MSG file using 7-Zip and olefile, but neither of them worked. They were able to extract the attachments of the original MSG file, but not the one I had modified. This was not surprising as these tools parse the Compound File Binary File Format by following the structure of the file rather than by carving for orphaned content.
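To illustrate what "following the structure" means, a structure-aware parser starts at the compound file header, which tells it the sector size and where the directory begins, and it only ever visits content reachable from there. The Python sketch below reads those header fields using the offsets defined in the Compound File Binary File Format specification. It is a simplified illustration and not how 7-Zip or olefile are implemented; the function name and the returned dictionary keys are my own.

```python
import struct

# Every compound file starts with this 8-byte signature.
CFB_SIGNATURE = bytes.fromhex("D0CF11E0A1B11AE1")


def read_cfb_header(data: bytes) -> dict:
    """Read a few fields from a compound file header."""
    if len(data) < 76 or data[:8] != CFB_SIGNATURE:
        raise ValueError("not a compound file")
    (major_version,) = struct.unpack_from("<H", data, 26)
    (sector_shift,) = struct.unpack_from("<H", data, 30)
    (first_dir_sector,) = struct.unpack_from("<I", data, 48)
    sector_size = 1 << sector_shift
    return {
        "major_version": major_version,
        "sector_size": sector_size,
        "first_dir_sector": first_dir_sector,
        # Sector 0 starts right after the header, which occupies one sector.
        "first_dir_offset": (first_dir_sector + 1) * sector_size,
    }
```

Anything in the file that is not reachable from the directory located this way is invisible to a structure-following tool, which is consistent with the extraction failures described above.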
In an effort to carve the PDF attachments manually, I searched for the “%PDF” (25 50 44 46) and “%%EOF” (25 25 45 4F 46) byte patterns to find the beginning and end of each PDF. I then extracted those ranges using the “Select Range” and “Save Selection…” commands in 010 Editor—there are many ways to go about doing this.
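The same search can be scripted. The Python sketch below is a rough equivalent of the manual steps, under the assumption that each PDF is stored contiguously and that the string "%PDF" does not also occur inside a PDF's own content. For each "%PDF" signature it takes the last "%%EOF" that appears before the next signature, so that PDFs with several "%%EOF" markers are not cut short, and it includes one trailing line ending if present. The function name is illustrative.

```python
PDF_HEADER = b"%PDF"
PDF_TRAILER = b"%%EOF"


def carve_pdfs(data: bytes) -> list[bytes]:
    """Carve byte ranges that start at %PDF and end at the last %%EOF
    found before the next %PDF signature (or the end of the data)."""
    carved = []
    pos = 0
    while True:
        start = data.find(PDF_HEADER, pos)
        if start == -1:
            break
        next_start = data.find(PDF_HEADER, start + len(PDF_HEADER))
        limit = next_start if next_start != -1 else len(data)
        end = data.rfind(PDF_TRAILER, start, limit)
        if end == -1:
            # No trailer for this header; move on to the next candidate.
            pos = start + len(PDF_HEADER)
            continue
        end += len(PDF_TRAILER)
        # Include a single line ending after %%EOF, if there is one.
        if data[end:end + 2] == b"\r\n":
            end += 2
        elif data[end:end + 1] in (b"\n", b"\r"):
            end += 1
        carved.append(data[start:end])
        pos = end
    return carved
```

Whether the trailing line ending belongs to the original file matters when comparing hashes, so if a carved file does not match, it is worth trying the range both with and without it.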
I compared the hashes of the two PDF files I extracted from the MSG file with removed email attachments, and found that they matched those of the original PDF attachments. So, the attachments were in there completely, and stored contiguously.
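The comparison itself is simple to script. The sketch below assumes SHA-256, although any cryptographic hash would serve the same purpose, and it assumes the original attachments are available as reference copies. The function names are illustrative.

```python
import hashlib


def sha256_hex(data: bytes) -> str:
    """Return the SHA-256 digest of data as a hex string."""
    return hashlib.sha256(data).hexdigest()


def match_carved_to_originals(carved: list[bytes], originals: dict[str, bytes]) -> dict[int, str]:
    """Map the index of each carved file to the name of the original it matches.

    Carved files with no matching original are left out of the result.
    """
    by_hash = {sha256_hex(content): name for name, content in originals.items()}
    matches = {}
    for index, blob in enumerate(carved):
        name = by_hash.get(sha256_hex(blob))
        if name is not None:
            matches[index] = name
    return matches
```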
Carving attachments out of the Compound File Binary File Format manually may not be feasible in every email forensics scenario. I wanted to see if this could be automated, at least to get a general sense of whether a modified MSG file contains any orphaned attachments.
I loaded the modified MSG file into X-Ways as a raw forensic image—essentially as raw data without any file system or partition information. I then performed file carving using the file header signature search in X-Ways. As expected, X-Ways was able to recover the two PDF files (see Figure 7).
Figure 7 — X-Ways Screenshot of Carved PDF Attachments
I concluded after performing a few experiments that the size of the MSG file increases after attachments are removed from it using Outlook. So, a message with an unusually large size compared to its contents warrants deeper analysis.
I also found that the removed attachments were still inside the modified file, and could be carved manually or by using traditional file carving methods. This could even be automated by combining a large number of MSG files that show no attachments and carving across all of them at once.
The original question was about MSG files, and that’s what I focused on in this post. However, the Compound File Binary File Format is used quite often in the Microsoft universe. It may be possible to perform similar analysis and recovery on other file types that use this format.
- Compound File Binary File Format — https://msdn.microsoft.com/en-us/library/dd942138.aspx
- Outlook Item (.msg) File Format — https://msdn.microsoft.com/en-us/library/cc463912(v=EXCHG.80).aspx
- Exploring the Compound File Binary Format — https://blogs.msdn.microsoft.com/openspecification/2009/07/24/exploring-the-compound-file-binary-format/