Fast, Memory-efficient Regular Expression Matching with NFA-OBDDs

Liu Yang, Rezwana Karim, Vinod Ganapathy, Randy Smith.

Computer Networks (COMNET); Volume 55, Number 15, pages 3376--3393; October 2011.

Modern network intrusion detection systems (NIDS) employ regular expressions as attack signatures. Internally, NIDS represent and operate these regular expressions as finite automata. However, finite automata present a well-known time/space tradeoff. Deterministic automata (DFAs) provide fast matching, but DFAs for large signature sets often consume gigabytes of memory because DFA combination results in a multiplicative increase in the number of states. Non-deterministic automata (NFAs) address this problem and are space-efficient, but are several orders of magnitude slower than DFAs. This time/space tradeoff has motivated much recent research, primarily with a focus on improving the space-efficiency of DFAs, often at the cost of reducing their performance.

We consider an alternative approach that focuses instead on improving the time-efficiency of NFA-based signature matching. NFAs are inefficient because they maintain a frontier of multiple states at any instant during their operation, each of which must be processed for every input symbol. We introduce NFA-OBDDs, which use ordered binary decision diagrams (OBDDs) to efficiently process sets of NFA frontier states. Experiments using HTTP and FTP signature sets from Snort show that NFA-OBDDs can outperform traditional NFAs by up to three orders of magnitude, thereby making them competitive with a variant of DFAs, while still retaining the space-efficiency of NFAs.

Paper: [ PDF ] (© Elsevier)
DOI: [ 10.1016/j.comnet.2011.07.002 ]
Code and Data: [ Here ]

Conference version: This article is a revised and expanded version of Improving NFA-based Signature Matching using Ordered Binary Decision Diagrams, which appeared in Proceedings of RAID 2010, the 13th International Symposium on Recent Advances in Intrusion Detection, September 2010.

Papers page