New technique identifies email authors

Posted by Kate Taylor

Researchers from Concordia University say they've developed a way of identifying the author of anonymous emails - and that the technique is admissible in a court of law.

While police can often use an IP address to discover where an email originated, they frequently find that several people share that address. They need a reliable, effective way to determine which of those people wrote the emails under investigation.

"In the past few years, we've seen an alarming increase in the number of cybercrimes involving anonymous emails," says Benjamin Fung, a professor of information systems engineering. "These emails can transmit threats or child pornography, facilitate communications between criminals or carry viruses."

Fung and his colleagues applied techniques from speech recognition and data mining to identify frequent patterns: unique combinations of features that recur in a suspect's emails.

They first identify the patterns found in emails written by one suspect, then filter out any that are also found in the emails of other suspects. The remaining frequent patterns are unique to the author of the emails being analyzed.
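The two steps described above can be illustrated with a short Python sketch. This is not the researchers' implementation. It assumes each email has already been reduced to a set of named stylistic features, and the function names, the feature labels, the `min_support` threshold and the simple counting rule in `attribute` are all hypothetical choices made for illustration.

```python
from itertools import combinations


def frequent_patterns(emails, min_support=0.6, max_size=2):
    """Return feature combinations found in at least min_support of the emails.

    `emails` is a list of sets of feature names (an assumed representation).
    """
    if not emails:
        return set()
    features = sorted(set().union(*emails))
    patterns = set()
    for size in range(1, max_size + 1):
        for combo in combinations(features, size):
            hits = sum(1 for email in emails if set(combo) <= email)
            if hits / len(emails) >= min_support:
                patterns.add(frozenset(combo))
    return patterns


def write_prints(emails_by_suspect, min_support=0.6, max_size=2):
    """Keep only the frequent patterns that no other suspect shares."""
    frequent = {
        suspect: frequent_patterns(emails, min_support, max_size)
        for suspect, emails in emails_by_suspect.items()
    }
    prints = {}
    for suspect, patterns in frequent.items():
        shared = set().union(
            *(p for other, p in frequent.items() if other != suspect)
        )
        prints[suspect] = patterns - shared
    return prints


def attribute(email_features, prints):
    """Pick the suspect whose write-print matches the most patterns (assumed rule)."""
    return max(
        prints,
        key=lambda s: sum(1 for p in prints[s] if p <= email_features),
    )


# Hypothetical feature sets for two suspects, three emails each.
emails_by_suspect = {
    "A": [
        {"all_lowercase", "no_greeting"},
        {"all_lowercase", "no_greeting", "typo"},
        {"all_lowercase", "typo"},
    ],
    "B": [
        {"typo", "exclaim"},
        {"typo", "exclaim"},
        {"typo"},
    ],
}

prints = write_prints(emails_by_suspect)
suspect = attribute({"all_lowercase", "no_greeting"}, prints)
```

In this toy data both suspects make typos, so the pattern "typo" on its own is filtered out of both write-prints, while combinations such as lowercase writing with no greeting remain distinctive for suspect A.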

"Let's say the anonymous email contains typos or grammatical mistakes, or is written entirely in lowercase letters," says Fung. "We use those special characteristics to create a write-print. Using this method, we can even determine with a high degree of accuracy who wrote a given email, and infer the gender, nationality and education level of the author."
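As a rough illustration of the kind of characteristics Fung mentions, the Python sketch below turns the text of an email into a set of stylistic features. The specific checks and feature names are assumptions chosen for this example, not the feature set used by the researchers, and detecting typos or grammatical mistakes would need a dictionary or grammar checker that is left out here.

```python
import re


def extract_features(text):
    """Return a set of simple stylistic features found in an email body.

    The features below are hypothetical examples of writing habits.
    """
    features = set()

    # Written entirely in lowercase letters.
    letters = [c for c in text if c.isalpha()]
    if letters and all(c.islower() for c in letters):
        features.add("all_lowercase")

    # Runs of two or more exclamation or question marks.
    if re.search(r"[!?]{2,}", text):
        features.add("repeated_punctuation")

    # No conventional greeting at the start of the message.
    if not re.match(r"\s*(hi|hello|dear)\b", text, re.IGNORECASE):
        features.add("no_greeting")

    return features


threat = extract_features("send the money by friday or else!!")
formal = extract_features("Dear Ms. Lee, please find the report attached.")
```

Feature sets like these, collected over many emails from each suspect, are the sort of input from which recurring combinations could then be counted.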

To test the accuracy of their technique, the team examined the Enron Email Dataset, a collection of more than 200,000 real-life emails from 158 employees of the Enron Corporation.

Using a sample of 10 emails written by each of 10 subjects, they were able to identify authorship with an accuracy of 80 to 90 percent.

"Our technique was designed to provide credible evidence that can be presented in a court of law," says Fung. "For evidence to be admissible, investigators need to explain how they have reached their conclusions. Our method allows them to do this."