Natural Computing Lecture, March 15th, 2022
Artificial Immune Systems
Immunology's "negative selection" as anomaly detector
Inge Wortel (& Johannes Textor)
inge.wortel@ru.nl
Data Science, RU
Two basic questions at the heart of this course:
Previous two lectures: mostly [2]. Now: immuno-inspired computing — back to [1] (with a bit of [2]).
The immune system has a number of attractive features worth replicating:
Stephanie Forrest, Steven A. Hofmeyr, and Anil Somayaji (1997). Communcations of the ACM, 40(10):88-96.
An artificial immune system is any algorithm/engineered system inspired by the real immune system.
The real immune system is incredibly complex and diverse — and so are AISs! But examples include:
We will focus on one particular AIS: the negative selection algorithm.
Let's first examine how the immune system accomplishes these features:
Question | |
---|---|
What do you think is the most important organ of our immune system? |
Multiple layers of protection:
![]() |
![]() |
![]() |
![]() |
The adaptive immune system has two major subsystems: B cells and T cells
Question | |
---|---|
B cells make antibodies to attack pathogens in blood, whereas T cells attack pathogens inside our own cells. What do you think is harder — and why? |
Human cells constantly make proteins and degrade them into fragments.
These fragments, peptides, are essentially strings of the amino acid "alphabet".
These fragments, peptides, are essentially strings of the amino acid "alphabet".
All cells (healthy or infected) "present" these peptides on their outer membrane, like little flags, using so-called MHC molecules.
T cells constantly scan these MHC complexes for anomalous peptides:
Their task becomes:
T-cell immunity:
Question | |
---|---|
Can you see a reason why this might be difficult? |
The problem:
The problem:
The solution: randomize!
Randomly generated T cells are diverse:
Question | |
---|---|
Can you see both a benefit and a problem of this solution? |
The (new) problem:
Solution — negative selection:
The immune system has a one-class classification problem
Negative selection is an algorithm that solves this problem.
Negative selection — IS |
---|
![]() |
Negative selection — algorithm | ||||
---|---|---|---|---|
Let X be the learning domain containing both
for each m $\in$ M do (Forrest, 1997) |
Discussion | |
---|---|
Just like with other population-based algorithms (evolutionary algorithms, swarm intelligence, ...) we have a population of individuals (detectors) working together to solve a given task. But how is our AIS different in terms of:
|
To make use of this generic scheme, we still need to define the detectors and the matching rule determining what they do and do not detect.
Example 1: Real-valued negative selection | |
---|---|
Learning domain: $\mathbb{R}^d$ (d-dim vectors) Detectors: hyperspheres, hypercubes, ... |
Example 2: String-based negative selection | |
---|---|
Learning domain: $\Sigma^\ell$ (strings of length $\ell$) Detectors: patterns |
Learning domain X: $\mathbb{R}^2$
Detectors d: circles with fixed radius r
Generalization occurs, with a strength depending on r.
Negative selection | ||||
---|---|---|---|---|
|
Matching rule |
---|
Detector d matches any point s contained within its circle of radius r. |
Learning domain X: $\{0,1\}^\ell$
Detectors d: patterns of length $\ell$ with parameter r
Generalization occurs, with a strength depending on r.
Negative selection | ||||
---|---|---|---|---|
|
Matching rule: e.g. r-contiguous |
---|
Detector d of length $\ell$ matches string s of length $\ell$ if d and s contain the same symbol at r subsequent positions. |
The (new) problem:
Solution — negative selection:
Did we actually solve the problem?
Idea: trace unix system calls of a daemon process like sendmail, coding each system call as an alphabet symbol. Generate a database of system call sequences in normal circumstances.
Apply negative selection to these sequences and generate detectors, which may be capable to detect anomalous operations.
Example | |
---|---|
iKEKEEFEEgKEgKEEFEggFKvafvaegggggj 6LgggggngngnggLLnnPPggnnggLggiKEKE |
normal exploit |
So our strings don't have to be binary! In your assignment, you will apply negative selection to such data.
S. Forrest, S. A. Hofmeyr, A. Somayaji, T. A. Longstaff (1996). A sense of self for unix processes. IEEE Symposium on Security and Privacy.
Back to our desirable features:
Our negative selection algorithm fulfils at least the last three. So why are they not more broadly used?
The immune system's T cells are negatively selected to match anything but self.
A negative selection algorithm is any one-class classifier based on this principle, and can be applied to many data domains (but mostly strings).
Negative selection algorithms can learn from examples; i.e. they can generalize.
Negative selection requires a population ("repertoire") of detectors. But training does not yield a "fittest individual", but a fittest combination of detectors. This is important because every detector specializes in only a part of our learning domain X.
Implementing such an algorithm efficiently requires substantial work, but that is out of the scope of this lecture.
After the break, Gijs Schröder will talk about learning classifier systems, which have much in common with our AIS.