
Email evidence plays a major role in digital forensics and eDiscovery. Because many later steps, such as forensic email authentication, search and culling, processing, review, and production, all depend on the collection, forensically collecting emails is something you have to get right from the start.

Here are 5 things you should know before forensically collecting emails:

1. Identify Email Sources

Your plan might be to collect emails from a custodian’s Gmail account. That’s great, but do not forget that forensic email preservation is often a multi-pronged process.

In addition to the emails hosted by Gmail, potentially relevant email evidence may exist in multiple locations such as local computers (e.g., locally cached data by Mozilla Thunderbird, Mac Mail, etc.), mobile devices such as phones and tablets, backup media and cloud storage systems.

These ESI sources usually complement each other. For instance, the email server may contain messages that were deleted from the local workstation, or an archive saved on backup media may contain emails that are no longer on the server.

It is important to make a thorough inventory and identify all sources of potentially relevant email evidence to start your forensic email collection project off on the right foot.

2. Consider Security Issues and Two-Factor Authentication

When tasked with forensically collecting emails from an online service, your instinct might be to ask for the custodian’s username and password. This is often not a good idea, and may not even be enough to get you authenticated in some cases.

Many hosted email service providers are moving away from authenticating third-party applications and services with a password. Instead, application-specific passwords or OAuth are often used. Considering how someone’s Gmail password is often also their password for all things Google, handing somebody your password to collect your emails is far from ideal.

Additionally, if the custodian has two-factor authentication enabled, or if the email service detects that you are logging in from an unfamiliar device or IP address, additional information such as a security code sent to a mobile device may be required.

Instead of asking the custodian for their password, you can ask them to authorize the forensic email collection software to access their account for a limited time. This way, the custodian does not have to share their password and can see clearly which data the software is requesting access to. For instance, the forensic email acquisition software can read the custodian’s emails, but not their files on Google Drive. Once the forensic email preservation is complete, the custodian can visit their account security settings and revoke the software’s access.
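To make the idea of limited, revocable access more concrete, here is a minimal sketch of how a collection tool might build the consent link it sends to the custodian. It assumes Google's OAuth 2.0 authorization endpoint and the read-only Gmail scope; the function name, client ID, redirect URI and state value are placeholders for illustration, not taken from any particular product.

```python
from urllib.parse import urlencode, urlparse, parse_qs

# Google's OAuth 2.0 authorization endpoint and the read-only Gmail scope.
GOOGLE_AUTH_ENDPOINT = "https://accounts.google.com/o/oauth2/v2/auth"
GMAIL_READONLY_SCOPE = "https://www.googleapis.com/auth/gmail.readonly"


def build_consent_url(client_id: str, redirect_uri: str, state: str) -> str:
    """Return the URL the custodian opens to review and approve access.

    Only the read-only Gmail scope is requested, so the consent screen
    does not ask for access to other services such as Google Drive.
    """
    params = {
        "client_id": client_id,        # placeholder: issued when the tool is registered
        "redirect_uri": redirect_uri,  # placeholder: where the authorization code is returned
        "response_type": "code",       # authorization code flow
        "scope": GMAIL_READONLY_SCOPE,
        "state": state,                # hypothetical case reference, echoed back on redirect
    }
    return f"{GOOGLE_AUTH_ENDPOINT}?{urlencode(params)}"
```

The custodian never types a password into the collection tool. They sign in with the provider directly, review the requested scope, and can revoke the grant afterwards.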

3. Choose the Right Format

In my experience, Personal Storage Table (PST) format seems to be the most popular among eDiscovery practitioners. While PST might be a good choice for some eDiscovery processing software, it is not necessarily the best format in which to forensically preserve emails.

For example, if you download emails over the internet via the Internet Message Access Protocol (IMAP), the raw messages would be sent by the server in MIME format*. Even if your workflow requires a conversion to a different format such as MSG or PST, you should consider maintaining a copy of each message in its native format for forensic preservation purposes.
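As a rough illustration of keeping a native copy, the sketch below writes the raw bytes received from the server to an .eml file without any transformation, and reads a few headers from an in-memory copy for logging. The function names and the output layout are assumptions made for this example.

```python
from email import message_from_bytes, policy
from pathlib import Path


def preserve_native(raw: bytes, out_dir: Path, name: str) -> Path:
    """Write the message exactly as received; binary mode avoids newline translation."""
    out_dir.mkdir(parents=True, exist_ok=True)
    path = out_dir / f"{name}.eml"
    path.write_bytes(raw)
    return path


def summarize(raw: bytes) -> dict:
    """Parse a copy of the message for logging; the stored bytes are not touched."""
    msg = message_from_bytes(raw, policy=policy.default)
    return {"subject": msg["Subject"], "from": msg["From"], "date": msg["Date"]}
```

Any conversion to MSG or PST can then work from these files, while the native copies stay unchanged as the reference set.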

4. Use the Right Tools

Using the right tools plays an important role in making sure you collect the emails in a forensically sound manner. One important consideration is ensuring that no changes are being made to the source mailbox due to your collection efforts.

For example, if you use a general-purpose email client such as Outlook to download emails via IMAP, IMAP folders would typically be opened read-write using the “SELECT” IMAP command, and message bodies would typically be downloaded using the “FETCH” IMAP command with a data item such as BODY[]. On most servers, this changes message flags: fetching BODY[] implicitly sets \Seen, and a read-write session can cause \Recent to be cleared. On the other hand, a forensic email collection tool can open folders read-only using the “EXAMINE” IMAP command, and download messages using the “PEEK” variant of FETCH (i.e., BODY.PEEK[]) so that message flags are not disturbed.
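The read-only approach can be sketched with Python's standard imaplib module, which sends EXAMINE instead of SELECT when a mailbox is opened with readonly=True. This is a simplified illustration rather than a complete acquisition tool: the function name is made up, and it assumes conn is an already authenticated imaplib.IMAP4 or IMAP4_SSL connection.

```python
import imaplib


def fetch_without_side_effects(conn: imaplib.IMAP4, mailbox: str = "INBOX") -> dict:
    """Download every message in a mailbox without changing its flags."""
    # readonly=True makes imaplib issue EXAMINE rather than SELECT.
    status, _ = conn.select(mailbox, readonly=True)
    if status != "OK":
        raise RuntimeError(f"could not open {mailbox!r} read-only")

    status, data = conn.uid("SEARCH", "ALL")
    if status != "OK":
        raise RuntimeError("UID SEARCH failed")

    messages = {}
    for uid in data[0].split():
        uid_str = uid.decode()
        # BODY.PEEK[] returns the full raw message and does not set \Seen.
        status, parts = conn.uid("FETCH", uid_str, "(BODY.PEEK[])")
        if status != "OK":
            raise RuntimeError(f"FETCH failed for UID {uid_str}")
        # The response mixes tuples (header line, raw bytes) with closing markers.
        messages[uid_str] = next(p[1] for p in parts if isinstance(p, tuple))
    return messages
```

A real tool would also log the server responses and handle very large mailboxes in batches, but the two key choices are the same: EXAMINE to open the folder and BODY.PEEK[] to read the messages.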

Similarly, conventional email clients can create mail folders on the target mailbox such as “Junk”, which can easily be avoided if the mailbox is accessed in a read-only manner using a forensic email collection tool.

It is also important to use the right connection protocol to get the best results. For instance, if you forensically collect a Gmail mailbox using IMAP, you would typically download the same message multiple times if it has multiple Gmail labels assigned to it. On the other hand, the same mailbox could be collected much more efficiently using a forensic email collection tool that utilizes the Gmail API.
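If IMAP is the only option, one way to limit the duplication is to key each downloaded message on a stable per-message identifier and record every folder in which it was seen. The sketch below assumes such an identifier is available (Gmail's IMAP extensions expose one as X-GM-MSGID); the function name and the input tuples are hypothetical.

```python
def dedupe_by_message_id(fetched):
    """Collapse copies of the same message seen under several labels.

    `fetched` is an iterable of (folder, message_id, raw_bytes) tuples.
    Returns a dict keyed by message_id with one raw copy and all folders.
    """
    unique = {}
    for folder, message_id, raw in fetched:
        entry = unique.setdefault(message_id, {"raw": raw, "folders": []})
        entry["folders"].append(folder)
    return unique
```

Keeping the folder list alongside the single stored copy preserves the label information that would otherwise be lost when duplicates are dropped.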

5. Hash Collected Emails and Document the Process

A crucial part of forensic work is documentation. When forensically collecting emails, there is a long list of things you can document such as target email address, custodian and case information, software used, target server address, port, the protocol used, message counts by folder, logs of communications between the server and acquisition software, any issues encountered, etc.

Additionally, you should calculate and record cryptographic hashes of downloaded messages (I recommend the SHA-256 algorithm). This would help establish that the acquired data did not change during subsequent eDiscovery or digital forensics steps.
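Hashing and documentation can be combined by writing one manifest record per downloaded message. Here is a minimal sketch using Python's hashlib; the field names in the record are only an example of what you might choose to log.

```python
import hashlib
from datetime import datetime, timezone


def sha256_hex(raw: bytes) -> str:
    """Return the SHA-256 digest of the raw message as a hex string."""
    return hashlib.sha256(raw).hexdigest()


def manifest_entry(uid: str, folder: str, raw: bytes) -> dict:
    """Build one log record for a downloaded message (field names are illustrative)."""
    return {
        "uid": uid,
        "folder": folder,
        "size_bytes": len(raw),
        "sha256": sha256_hex(raw),
        "acquired_utc": datetime.now(timezone.utc).isoformat(),
    }
```

Recomputing the hash of a stored message at any later stage and comparing it with the manifest value shows whether that copy is still identical to what was acquired.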

Forensically Collecting Emails — Wrap Up

Although email evidence is frequently used in litigation, I have found that many digital forensics and eDiscovery professionals do not have a very good workflow for forensically collecting emails. Conventional email clients are often used instead of forensic tools, the process is rarely documented, and some key sources of email evidence are often overlooked.

I strongly believe that a few thoughtful changes to your email preservation workflow can make a big difference in the efficiency and accuracy of your efforts.


* That is, messages in Internet Message Format (RFC 5322), with non-ASCII data and multimedia content encoded as defined by RFC 2045 through RFC 2049 (i.e., MIME).