Recently, I helped several clients evaluate security tools—and one recurring topic was WAFs (Web Application Firewalls).
WAFs are essential for blocking attacks like SQL injection, RCE, and XSS. But how do you know if a WAF actually works in real-world scenarios?
To answer that, I ran a comparative test of several open-source or free WAFs, using a consistent methodology and transparent test data.
How We Measured WAF Performance
The effectiveness of a WAF should be measured scientifically. We used four key metrics:
- Detection Rate – Measures how many attacks are caught (True Positive Rate).
- False Positive Rate – How often blocking goes wrong; here, the share of all blocked requests that were actually legitimate (measured over blocked traffic, not over all legitimate traffic).
- Accuracy Rate – The share of all requests, malicious and legitimate, that were classified correctly.
- Detection Latency – The average time a WAF takes to process and respond to a request.
These were calculated using classic classification metrics:
Term | Meaning |
---|---|
TP | Malicious requests correctly blocked |
TN | Legitimate requests correctly allowed |
FP | Legitimate requests incorrectly blocked |
FN | Malicious requests incorrectly allowed |
Formulas used:
Detection Rate = TP / (TP + FN)
False Positive Rate = FP / (TP + FP)
Accuracy Rate = (TP + TN) / (TP + TN + FP + FN)
For latency, we report the 90th- and 99th-percentile (P90/P99) response times rather than averages.
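For reference, here's a minimal Python sketch of these calculations. The function names and the example call are mine for illustration; the actual test script is released separately.

```python
import statistics

def waf_metrics(tp: int, tn: int, fp: int, fn: int) -> dict:
    """Classification metrics as defined above."""
    return {
        "detection_rate": tp / (tp + fn),           # share of attacks blocked
        "false_positive_rate": fp / (tp + fp),      # share of blocks that were wrong
        "accuracy": (tp + tn) / (tp + tn + fp + fn),
    }

def latency_percentiles(latencies_ms: list[float]) -> dict:
    """P90/P99 from per-request latencies (needs at least two samples)."""
    cuts = statistics.quantiles(latencies_ms, n=100)  # 99 cut points
    return {"p90_ms": cuts[89], "p99_ms": cuts[98]}

# SafeLine's confusion-matrix counts from the results below:
print(waf_metrics(tp=426, tn=33056, fp=38, fn=149))
# {'detection_rate': 0.7409..., 'false_positive_rate': 0.0819..., 'accuracy': 0.9944...}
```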
Sample Data
All tests were run using open tools and publicly available data:
- Normal traffic (white samples): 60,707 HTTP requests from real forum browsing (~2.7GB)
- Attack traffic (black samples): 600 curated payloads gathered over 5 hours, using:
- DVWA + common attack scenarios
- Payloads from PortSwigger
- VulHub targets + classic PoCs
- DVWA with increasing protection levels (med/high)
The ratio of normal to attack traffic was roughly 100:1, approximating the traffic mix seen on the real internet.
Test Setup
- Web Server: Nginx, configured to return 200 OK to every request:

```nginx
location / {
    default_type text/plain;
    return 200 'hello WAF!';
}
```
- WAF Config: All WAFs were tested with their default configurations, with no custom tuning.
- Testing Tool: A custom script (sketched after this list) that:
- Parses Burp Suite exports
- Strips cookies and rewrites the Host header
- Mixes normal and malicious traffic
- Classifies each request by checking for the backend's HTTP 200 (any other response counts as a block)
- Outputs all metric calculations automatically
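The full script will be open-sourced along with the dataset, but the core replay-and-classify loop looks roughly like this. The request-list format, the target address, and the `label` field are illustrative assumptions, not the script's real interface:

```python
import requests

TARGET = "http://127.0.0.1:8080"  # WAF sitting in front of the Nginx backend (assumed address)

def replay(samples: list[dict]) -> dict:
    """Replay labeled requests through the WAF and tally the confusion matrix.

    Each sample is assumed to look like:
    {"method": "GET", "path": "/?id=1", "headers": {...}, "body": b"", "label": "black"}
    """
    counts = {"TP": 0, "TN": 0, "FP": 0, "FN": 0}
    for s in samples:
        # Strip cookies and pin the Host header, as described above
        headers = {k: v for k, v in s["headers"].items() if k.lower() != "cookie"}
        headers["Host"] = "waf-test.local"  # hypothetical test hostname
        resp = requests.request(s["method"], TARGET + s["path"],
                                headers=headers, data=s.get("body"),
                                allow_redirects=False, timeout=5)
        blocked = resp.status_code != 200  # the backend always answers 200
        if s["label"] == "black":          # malicious sample
            counts["TP" if blocked else "FN"] += 1
        else:                              # white (legitimate) sample
            counts["FP" if blocked else "TN"] += 1
    return counts
```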
Test Results
SafeLine WAF
TP: 426 TN: 33056 FP: 38 FN: 149
Detection Rate: 74.09%
False Positive Rate: 8.19%
Accuracy Rate: 99.44%
90% Latency: 0.73 ms
99% Latency: 0.89 ms
Coraza
TP: 404 TN: 27912 FP: 5182 FN: 171
Detection Rate: 70.26%
False Positive Rate: 92.77%
Accuracy Rate: 84.10%
90% Latency: 3.09 ms
99% Latency: 5.10 ms
ModSecurity
TP: 400 TN: 25713 FP: 7381 FN: 175
Detection Rate: 69.57%
False Positive Rate: 94.86%
Accuracy Rate: 77.56%
90% Latency: 1.36 ms
99% Latency: 1.71 ms
Baota WAF
TP: 224 TN: 32998 FP: 96 FN: 351
Detection Rate: 38.96%
False Positive Rate: 30.00%
Accuracy Rate: 98.67%
90% Latency: 0.53 ms
99% Latency: 0.66 ms
nginx-lua-waf
TP: 213 TN: 32619 FP: 475 FN: 362
Detection Rate: 37.04%
False Positive Rate: 69.04%
Accuracy Rate: 97.51%
90% Latency: 0.41 ms
99% Latency: 0.49 ms
SuperWAF
TP: 138 TN: 33048 FP: 46 FN: 437
Detection Rate: 24.00%
False Positive Rate: 25.00%
Accuracy Rate: 98.57%
90% Latency: 0.34 ms
99% Latency: 0.41 ms
Summary Table
WAF | False Negatives | False Positives | Accuracy Rate | P90 Latency |
---|---|---|---|---|
SafeLine | 149 | 38 | 99.44% | 0.73 ms |
Coraza | 171 | 5182 | 84.10% | 3.09 ms |
ModSecurity | 175 | 7381 | 77.56% | 1.36 ms |
Baota | 351 | 96 | 98.67% | 0.53 ms |
nginx-lua-waf | 362 | 475 | 97.51% | 0.41 ms |
SuperWAF | 437 | 46 | 98.57% | 0.34 ms |
Final Thoughts
- SafeLine WAF delivered the best balance of high detection accuracy and low false positives.
- ModSecurity and Coraza detected a decent share of attacks, but their excessive false positives under default rules make them hard to use in production without tuning.
- Simpler WAFs like Baota, nginx-lua-waf, and SuperWAF were fast and light, but missed a large portion of attacks.
Reminder: These tests reflect only one set of samples, tools, and environments. Real-world performance can vary significantly. Always test in your own environment before deploying.
Want to test these yourself? I’ll be open-sourcing the full dataset and testing scripts soon. Follow me to stay updated.