Learning from the Ones that Got Away: Detecting New Forms of Phishing Attacks

Christopher N. Gutierrez, Taegyu Kim, Raffaele Della Corte, Jeffrey Avery, Dan Goldwasser, Marcello Cinque, Saurabh Bagchi
IEEE Transactions on Dependable and Secure Computing (TDSC), 2018

Abstract

Phishing attacks continue to pose a major threat to computer system defenders, often forming the first step in a multi-stage attack. Great strides have been made in phishing detection; however, some phishing emails appear to slip past filters through simple structural and semantic changes to the messages. We tackle this problem with a machine learning classifier operating on a large corpus of phishing and legitimate emails. We design SAFE-PC (Semi-Automated Feature generation for Phish Classification), a system that extracts features, elevating some to higher-level features, designed to catch phishing emails that defeat common detection strategies. To evaluate SAFE-PC, we collect a large corpus of phishing emails from the central IT organization at a tier-1 university. Running SAFE-PC on this dataset reveals previously unknown insights into phishing campaigns directed at university users. SAFE-PC detects more than 70 percent of the emails that had eluded our production deployment of Sophos, a state-of-the-art email filtering tool. It also outperforms SpamAssassin, a commonly used email filtering tool. We also develop an online version of SAFE-PC that can be incrementally retrained with new samples. Its detection performance improves over time as new samples are collected, while the time to retrain the classifier stays constant.
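The abstract notes that the online variant of SAFE-PC can be incrementally retrained while retraining time stays constant. The abstract does not say which online learner the paper uses. The sketch below therefore substitutes a generic mistake-driven perceptron over bag-of-words tokens. It is not SAFE-PC's method. It only illustrates why an incremental update can keep the per-sample cost independent of how many emails have already been seen. The names `tokenize` and `OnlinePerceptron` and the toy emails are hypothetical.

```python
import re


def tokenize(text):
    # Lowercased alphanumeric tokens; a simple stand-in (assumption) for
    # SAFE-PC's much richer feature extraction.
    return re.findall(r"[a-z0-9]+", text.lower())


class OnlinePerceptron:
    """Hypothetical online linear classifier: label 1 = phish, 0 = legitimate."""

    def __init__(self):
        self.weights = {}  # token -> weight
        self.bias = 0.0

    def score(self, text):
        return self.bias + sum(self.weights.get(t, 0.0) for t in tokenize(text))

    def predict(self, text):
        return 1 if self.score(text) > 0 else 0

    def update(self, text, label):
        # One incremental step. It touches only this email's tokens, so its
        # cost does not grow with the number of emails seen so far.
        if self.predict(text) != label:
            sign = 1.0 if label == 1 else -1.0
            for t in tokenize(text):
                self.weights[t] = self.weights.get(t, 0.0) + sign
            self.bias += sign


# Toy stream of newly labeled samples (made-up examples, not the paper's data).
stream = [
    ("verify your account password now", 1),
    ("meeting notes attached for tuesday", 0),
    ("urgent: verify your password", 1),
    ("lunch on tuesday?", 0),
]

clf = OnlinePerceptron()
for _ in range(3):  # a few passes over the stream
    for text, label in stream:
        clf.update(text, label)

print(clf.predict("please verify your password"))  # 1
print(clf.predict("notes for tuesday"))  # 0
```

Each call to `update` costs time proportional to the length of a single email. This is one way an incrementally retrained classifier can keep retraining time flat as the corpus grows. A real deployment would also need the richer features and evaluation the paper describes.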


Bib Entry

  @article{GKCAGCB_tdsc_2018,
    author = "Christopher N. Gutierrez and Taegyu Kim and Raffaele Della Corte and Jeffrey Avery and Dan Goldwasser and Marcello Cinque and Saurabh Bagchi",
    title = "Learning from the Ones that Got Away: Detecting New Forms of Phishing Attacks",
    journal = "IEEE Transactions on Dependable and Secure Computing (TDSC)",
    year = "2018"
  }