When mailboxes are forensically preserved for eDiscovery or digital forensic investigations, their contents are almost always searched and filtered. Filtering emails helps overcome time, scope and cost constraints and alleviates privacy concerns.
There are two main ways of filtering emails—before and after the forensic acquisition. Each method has its pros and cons, which we will discuss here.
Filtering Emails after Forensic Collection
This method involves forensically collecting each mailbox in its entirety. Once the collection is complete, each mailbox is ingested into eDiscovery or digital forensic investigation tools and searched before subsequent steps such as processing, analysis and review.
Pros
- Flexibility: Case requirements, keywords, and date ranges are all subject to change. It is not uncommon for a legal team to discover more search terms after document review has started. When you have access to the entire mailbox, you can go back and re-run your searches without having to perform another acquisition.
- More powerful search options: Digital forensic investigation and eDiscovery tools can extract attachments and embedded objects recursively, perform optical character recognition (OCR) on pages that do not have extractable text, and create a searchable index of the entire mailbox. A well-designed tool also reports in detail which documents could not be searched (e.g., encrypted, corrupt or unrecognized files).
- Familiar workflow: You are likely very familiar with the capabilities and search syntax of your eDiscovery and digital forensic investigation tools. Being able to use a workflow you are already comfortable with is an advantage.
Cons
- Collecting entire mailboxes can take a long time: Depending on the size of the mailbox and the capabilities of the server, you might have to allocate several hours to collect a mailbox in its entirety.
- Privacy and scope concerns: The owner of the mailbox may object to having their entire mailbox collected. The mailbox may contain confidential information that is outside the scope of the engagement. This issue is often exacerbated when collecting an opposing party’s mailbox.
- Increased processing time & cost: Ingesting and searching mailboxes takes time and often has associated costs, usually proportional to the data size. Collecting the entire mailbox increases processing time and cost because you start off with a larger amount of data.
Filtering Emails before Forensic Collection
I like to refer to this method as pre-acquisition searching and filtering. In this scenario, searches are run directly on the email server and the forensic email collection is limited to only the responsive emails.
Most email providers, such as Gmail and Office 365, as well as on-premises Exchange and IMAP servers, support server-side searching. Forensic email collection tools can run searches directly on these servers and display the results quickly, often within seconds.
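To make the idea concrete, here is a minimal sketch of a server-side search followed by a limited collection over IMAP, using Python's standard-library `imaplib` module. The function name `collect_responsive`, the default mailbox name, and the example criteria are illustrative assumptions rather than the behavior of any particular forensic tool. A real collection tool would also record hashes, logs, and server metadata.

```python
import imaplib  # the connection object passed in is an imaplib.IMAP4 / IMAP4_SSL


def collect_responsive(conn, criteria, mailbox="INBOX"):
    """Run an IMAP SEARCH on the server and download only the matching messages.

    `conn` is an already authenticated imaplib connection (assumed).
    `criteria` is a list of IMAP SEARCH keys, e.g. ['FROM', '"alice@example.com"'].
    Returns a dict mapping message sequence numbers to raw message bytes.
    """
    # Open the mailbox read-only so that the collection does not alter flags.
    status, _ = conn.select(mailbox, readonly=True)
    if status != "OK":
        raise RuntimeError(f"could not select {mailbox!r}")

    # The search is evaluated by the server; only message numbers come back.
    status, data = conn.search(None, *criteria)
    if status != "OK":
        raise RuntimeError("server rejected the search")

    collected = {}
    for num in (data[0] or b"").split():
        # BODY.PEEK[] fetches the full message without marking it as read.
        status, parts = conn.fetch(num.decode(), "(BODY.PEEK[])")
        if status != "OK":
            continue
        for part in parts:
            if isinstance(part, tuple):  # (response header, raw message bytes)
                collected[num.decode()] = part[1]
    return collected
```

With a live connection, a call might look like `collect_responsive(conn, ['FROM', '"alice@example.com"', 'SINCE', '1-Jan-2020'])`. Only the messages the server reports as matching are downloaded.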
Pros
- Time savings: Collecting a large mailbox (e.g., 200k+ messages) from a slow email provider such as Yahoo can take a long time. What if only a very small percentage of those messages were responsive? You could run a search on the server side within a couple of minutes and then forensically collect only the few hundred responsive items.
- Helps with privacy and scope concerns: I have run into many cases where I was simply not allowed to collect an entire mailbox due to privacy and scope concerns. I was instructed to limit the collection to messages between certain individuals and within certain date ranges. Server-side email searching is the only way to accommodate such requests.
- Reduced processing time & cost down the line: When the data universe is limited from the very beginning, a smaller amount of data runs through subsequent steps such as ingestion, processing, analysis, and review. This often results in significant time and cost savings.
Cons
- Changes in scope & instructions: If the scope of the project changes, you might have to go back and perform a supplemental forensic collection using the revised search parameters.
- Limited search capabilities: Server-side searching is much more limited in functionality, especially when it comes to blanket keyword searches. Depending on the case, you could use server-side searches to filter emails by recipients, subjects, dates, etc. The ability to search attachments depends on the capabilities of the server, and file types that the email server does not recognize would not be indexed for searching.
- Search syntax learning curve: The syntax you would use to search emails on the server side changes from service to service, or from server to server in on-premises scenarios. For example, the search syntax of the Gmail application programming interface (API) is quite different from the Advanced Query Syntax (AQS) used in Exchange Web Services. The IMAP SEARCH command is a completely different ball game. The good news is that you do not need to deal with the APIs directly: a well-designed forensic email collection tool provides an intuitive user interface and guidance on the search syntax.
A Hybrid Approach
Some email archival tools have the option to filter emails during acquisition. They do not execute the search on the server side. Instead, they download each message, evaluate it against the search criteria, and then save it or discard it based on responsiveness.
This method does not have the advantages of pre-acquisition searching on the server side, because you still have to download all the emails. And because attachments are not extracted, OCRed (when necessary) and indexed on the fly, you do not get the benefits of searching with a powerful eDiscovery or digital forensic investigation tool, either.
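The download-then-filter behavior described above can be sketched as follows. This is a simplified illustration built on Python's standard-library `email` package, not the implementation of any specific archival tool. The function names and the matching rules (participant in From/To/Cc, keyword in the subject or plain-text body) are assumptions made for the example.

```python
from email import message_from_bytes
from email.utils import getaddresses


def is_responsive(raw_message, participants, keyword):
    """Client-side check applied to one message that was already downloaded.

    Matches when any of `participants` appears in From/To/Cc and `keyword`
    occurs in the subject or a text/plain body part. Attachments are not opened.
    """
    msg = message_from_bytes(raw_message)
    headers = msg.get_all("From", []) + msg.get_all("To", []) + msg.get_all("Cc", [])
    addresses = {addr.lower() for _, addr in getaddresses(headers)}
    if not addresses & {p.lower() for p in participants}:
        return False

    text = [str(msg.get("Subject", ""))]
    for part in msg.walk():
        # Only inline plain-text parts are examined; attachments are skipped.
        if part.get_content_type() == "text/plain" and part.get_filename() is None:
            payload = part.get_payload(decode=True) or b""
            try:
                text.append(payload.decode(part.get_content_charset() or "utf-8",
                                           errors="replace"))
            except LookupError:  # unknown charset label
                text.append(payload.decode("utf-8", errors="replace"))
    return keyword.lower() in "\n".join(text).lower()


def filter_during_download(raw_messages, participants, keyword):
    """Every message is downloaded first; non-responsive ones are then discarded."""
    return [raw for raw in raw_messages if is_responsive(raw, participants, keyword)]
```

Note that `filter_during_download` receives every message in the mailbox, so the download cost is the same as for a full collection, and a keyword that appears only inside an attachment or a scanned page would be missed.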
Filtering emails before and after forensic collection both have their pros and cons. In some cases, you may find that only one of these methods is a viable option. For instance, if you are restricted from forensically preserving the entire mailbox of an opposing party due to privacy concerns, you might have to perform your search on the server side. On the other hand, if you have a long list of complex queries with proximity and Boolean operators, which need to be executed on all documents—including attachments and documents without extractable text—ingesting the emails into your eDiscovery or digital forensic investigation tools and performing the searching and filtering there might be the only option.
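As a rough illustration of what a proximity query involves, the sketch below checks whether two terms occur within a given number of words of each other in text that has already been extracted. The function name `within` and its word-distance semantics are assumptions for this example. eDiscovery tools implement proximity operators against a positional index, and their exact semantics vary from tool to tool.

```python
import re


def within(text, term_a, term_b, max_distance):
    """Return True if term_a and term_b occur within max_distance words of each other."""
    words = re.findall(r"\w+", text.lower())
    pos_a = [i for i, w in enumerate(words) if w == term_a.lower()]
    pos_b = [i for i, w in enumerate(words) if w == term_b.lower()]
    return any(abs(a - b) <= max_distance for a in pos_a for b in pos_b)


def matches(text):
    """Example of combining a proximity test with Boolean logic:
    "wire" within 5 words of "transfer", AND NOT "newsletter"."""
    return within(text, "wire", "transfer", 5) and "newsletter" not in text.lower()
```

A check like this needs the full extracted text of every message and attachment, which is why such queries are usually run after the data has been ingested and indexed rather than on the email server.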
In some cases, you might have the flexibility to choose which method to use. It is important to be familiar with both options and understand the trade-offs so that you can make informed decisions.