What Machine Learning Can and Can't Do for Security
2021-08-07, 11:30–12:00, Main Track

What can machine learning do for security? A number of things. One major challenge is determining what’s normal and what’s malicious. Machine learning can help with this. For example, ML techniques are used in spam filtering to scan email. Machine learning is also being applied to other areas like network traffic monitoring and malware analysis, and has the potential to detect zero-day exploits.
However, machine learning isn't magic. We discuss some of the limitations of machine learning, and how problems like false positives can be mitigated.


Most of us have heard vendors promoting products that use "machine learning." But what does that mean? This is a general introduction to machine learning concepts and a discussion of applications to security. We begin by talking about commonly used terminology – what are artificial intelligence, neural networks, machine learning, and deep learning? How do they work?

What can machine learning do for security? A number of things. One major challenge is determining what’s normal and what’s malicious. Machine learning can help with this. For example, ML techniques are used in spam filtering to scan email. Large email providers, e.g., Google and Yahoo, have intelligent systems that can create new spam filtering rules based on automated learning.
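At its core, this kind of spam filtering is a classification problem: learn word statistics from labeled messages, then score new messages against each class. Here is a minimal naive-Bayes-style sketch of the idea using only the Python standard library; the training messages and word counts are toy data invented for illustration, not taken from any real filter.

```python
import math
from collections import Counter

# Toy labeled training data (hypothetical messages, for illustration only)
spam = ["win free money now", "free prize claim now", "money money free"]
ham = ["meeting notes attached", "lunch tomorrow at noon", "project status update"]

def word_counts(msgs):
    c = Counter()
    for m in msgs:
        c.update(m.split())
    return c

spam_counts, ham_counts = word_counts(spam), word_counts(ham)
spam_total, ham_total = sum(spam_counts.values()), sum(ham_counts.values())
vocab = set(spam_counts) | set(ham_counts)

def log_likelihood(msg, counts, total):
    # Laplace (add-one) smoothing so unseen words don't zero out the score
    return sum(math.log((counts[w] + 1) / (total + len(vocab)))
               for w in msg.split())

def is_spam(msg):
    # Assume equal class priors; compare per-class log-likelihoods
    return log_likelihood(msg, spam_counts, spam_total) > \
           log_likelihood(msg, ham_counts, ham_total)

print(is_spam("claim your free money"))          # True on this toy data
print(is_spam("status update for the meeting"))  # False on this toy data
```

Production filters use far richer features and much larger corpora, but the structure — learn per-class statistics, then classify by comparing scores — is the same.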

Machine learning is also being applied to other areas like network traffic monitoring and malware analysis. Traditional network intrusion detection (NIDS) and malware identification involve rules and signatures, where behavior associated with known threats is identified. But what about new threats, such as zero-day exploits? Anomaly-based detection compares traffic to normal behavior, and has the potential to detect previously unknown attacks with no established signature. We present some examples of freely available machine learning software and walk through some simple use cases.
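The simplest form of anomaly-based detection is exactly this comparison to a learned baseline: model what "normal" looks like, then flag traffic that deviates too far from it. A minimal sketch, assuming per-minute byte counts as the feature and a z-score threshold (the numbers are invented for illustration):

```python
import statistics

# Hypothetical per-minute byte counts observed during "normal" operation
baseline = [1200, 1350, 1100, 1280, 1400, 1250, 1320, 1180, 1290, 1360]

mean = statistics.mean(baseline)
stdev = statistics.stdev(baseline)

def is_anomalous(observed, threshold=3.0):
    # Flag observations more than `threshold` standard deviations
    # from the baseline mean
    return abs(observed - mean) / stdev > threshold

print(is_anomalous(1300))  # False: within the normal range
print(is_anomalous(9500))  # True: e.g. an exfiltration-sized burst
```

Real ADNIDS models use many features (ports, destinations, payload structure, timing) and more robust statistics, but the principle is the same: no signature is needed, so a previously unseen attack can still stand out.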

However, machine learning isn't magic, and it has its limitations. The quality of the training data significantly affects the quality of the results, and training data needs to be updated to reflect changes in relationships and new data points. False positives can consume a lot of analysts' time and lead to alert fatigue. We discuss some techniques, e.g. cross-domain correlation, to reduce the number of false positives.
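One way to picture cross-domain correlation: only escalate when independent detectors agree, since a false positive in one domain is unlikely to be echoed in another. A minimal sketch with hypothetical alert sets keyed by host IP (all names and addresses are illustrative):

```python
# Hypothetical alerts from two independent detection domains
nids_alerts = {"10.0.0.5", "10.0.0.9", "10.0.0.12"}      # network IDS
endpoint_alerts = {"10.0.0.9", "10.0.0.31"}              # host/endpoint agent

# Escalate only hosts flagged in both domains; single-source
# hits stay low priority instead of paging an analyst
escalate = nids_alerts & endpoint_alerts
low_priority = (nids_alerts | endpoint_alerts) - escalate

print(sorted(escalate))      # ['10.0.0.9']
print(sorted(low_priority))
```

In practice correlation engines also weigh time windows and alert severity, but even this set intersection shows why requiring corroboration across domains cuts the volume of alerts an analyst must triage.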

What is "machine learning?"
* Definition
* How does it work?
* What is a neural network?
* Common machine learning terminology explained
* Supervised vs unsupervised learning
* Different kinds of machine learning
* Examples of machine learning and security

Classification problem
* What’s normal? What’s malicious?
* Example: spam filtering
* Example: network traffic analysis
  * Traditional NIDS involves rules/signatures
  * Anomaly detection NIDS (ADNIDS) compares traffic to normal patterns
* Example: behavior-based malware analysis
  * Common AV malware detection involves signatures (patterns related to known behavior)
  * What about zero-day exploits or malware that can morph?
  * Attack behaviors are different from normal behaviors
    * Unusual system calls
    * Writing stolen data to files, registry manipulation, etc.
    * Unusual network traffic (e.g. command and control)
      * Destinations (lots of unexplained traffic to a particular destination)
      * Payloads (C&C traffic likely has similar structure)
* Software currently using machine learning for security
  * Examples: spam filters, Splunk

Limitations of machine learning
* Training data
* False positives / alert fatigue
* Mitigating false positives

Future directions in machine learning and security

Wendy is a software developer interested in the intersection of cybersecurity and data science. She’s involved in the NASA Datanauts program and participated in the SANS Women’s Academy, earning GIAC GSEC, GCIH, and GCIA certifications. She has master’s degrees in computer science and library and information science from the University of Illinois.