Detection Engine
The detection engine is a four-stage pipeline that analyzes request content for attack patterns. It combines regex matching, content normalization, semantic analysis, and performance monitoring.
Architecture
Section titled “Architecture”Content → ContentPreprocessor → PatternCompiler → SemanticAnalyzer → PerformanceMonitor → DetectionResultPatternCompiler
Section titled “PatternCompiler”Compiles and caches regex patterns with ReDoS protection.
Primary engine: re2-wasm — a WebAssembly port of Google’s RE2 that guarantees linear-time matching. Works in Node.js and edge runtimes (Cloudflare Workers, Deno).
Fallback chain:
re2-wasm(linear time, ReDoS-safe)worker_threadstimeout wrapper (native RegExp in a worker with configurable timeout)- Native
RegExp(last resort, no timeout protection)
import { PatternCompiler } from '@guardcore/core';
const compiler = new PatternCompiler(2000, 1000);
const result = await compiler.safeMatch( String.raw`<script[^>]*>[^<]*<\/script\s*>`, userInput,);
if (result) { console.log('XSS pattern matched:', result[0]);}Pattern safety validation:
const [isSafe, message] = compiler.validatePatternSafety( '(a+)+$',);// isSafe: false// message: "Pattern contains dangerous construct: (.*)+""The validator checks for catastrophic backtracking patterns like (.*)+, (.+)+, nested quantifiers, and measures actual execution time against test strings.
Cache: LRU cache of compiled patterns, configurable max size (default 1000, max 5000).
ContentPreprocessor
Section titled “ContentPreprocessor”Normalizes input before pattern matching to defeat encoding-based bypasses.
Processing pipeline:
- Unicode normalization (NFKC) + lookalike character replacement (e.g.,
\u2044->/,\uff1c-><) - Encoding decoding — up to 3 iterations of URL decoding + HTML entity decoding
- Null byte removal + control character stripping
- Whitespace normalization
- Safe truncation — preserves attack-containing regions when truncating long content
import { ContentPreprocessor } from '@guardcore/core';
const preprocessor = new ContentPreprocessor(10000, true);
const normalized = await preprocessor.preprocess( '%3Cscript%3Ealert%281%29%3C%2Fscript%3E',);// "<script>alert(1)</script>"Attack-preserving truncation: When content exceeds maxContentLength, the preprocessor extracts regions containing attack indicators (e.g., <script, ../, eval() and prioritizes keeping those in the truncated output.
SemanticAnalyzer
Section titled “SemanticAnalyzer”Analyzes content semantically to catch attacks that bypass regex patterns.
Analysis produces:
| Metric | Description |
|---|---|
attackProbabilities | Per-category scores (xss, sql, command, path, template) from 0 to 1 |
entropy | Shannon entropy of the content (high entropy suggests obfuscation) |
encodingLayers | Count of detected encoding types (URL, base64, hex, unicode, HTML entities) |
isObfuscated | Boolean based on entropy > 4.5, encoding layers > 2, special char ratio > 0.4, or 100+ consecutive non-space characters |
codeInjectionRisk | Score from 0 to 1 based on code structure patterns and AST parsing (via acorn) |
suspiciousPatterns | Array of detected structural patterns (tags, function calls, command chains, path traversal, URLs) |
tokenCount | Number of extracted tokens |
import { SemanticAnalyzer } from '@guardcore/core';
const analyzer = new SemanticAnalyzer();const analysis = analyzer.analyze(userInput);const threatScore = analyzer.getThreatScore(analysis);
if (threatScore >= 0.7) { console.log('Semantic threat detected:', analysis.attackProbabilities);}Threat score formula:
- Max attack probability * 0.3
- Obfuscation detected: +0.2
- Encoding layers: +0.1 per layer (max 0.2)
- Code injection risk * 0.2
- Suspicious patterns: +0.05 per pattern (max 0.1)
- Total capped at 1.0
PerformanceMonitor
Section titled “PerformanceMonitor”Tracks regex execution times and detects anomalies using z-score analysis.
What it tracks per pattern:
- Total executions, matches, and timeouts
- Average, min, and max execution times
- Recent execution time history (sliding window)
Anomaly detection:
- Timeout: pattern exceeded the configured timeout
- Slow execution: execution time >
slowPatternThreshold - Statistical anomaly: execution time z-score >
anomalyThreshold(default 3.0)
import { PerformanceMonitor } from '@guardcore/core';
const monitor = new PerformanceMonitor(3.0, 0.1, 1000, 1000);
monitor.registerAnomalyCallback((anomaly) => { console.log('Pattern anomaly:', anomaly.type, anomaly.pattern);});75 Pattern Categories
Section titled “75 Pattern Categories”The engine ships with 75 built-in patterns across 15 attack categories:
| Category | Patterns | Contexts |
|---|---|---|
| XSS | 8 | query_param, header, request_body |
| SQL Injection | 9 | query_param, request_body |
| Directory Traversal | 5 | url_path, query_param, request_body |
| Command Injection | 5 | query_param, request_body |
| File Inclusion | 2 | url_path, query_param, request_body |
| LDAP Injection | 3 | query_param, request_body |
| XML/XXE | 3 | header, request_body |
| SSRF | 2 | query_param, request_body |
| NoSQL Injection | 2 | query_param, request_body |
| File Upload | 1 | header, request_body |
| Path Traversal (encoded) | 1 | url_path, query_param, request_body |
| Template Injection | 2 | query_param, request_body |
| HTTP Splitting | 1 | header, query_param, request_body |
| Sensitive Files | 5 | url_path, request_body |
| CMS Probing | 4 | url_path, request_body |
| Reconnaissance | 22 | url_path |
Each pattern is only checked against relevant contexts — a SQL injection pattern is not tested against URL paths, and reconnaissance patterns are not tested against request bodies.
Tuning Knobs
Section titled “Tuning Knobs”| Config Field | Default | Effect |
|---|---|---|
detectionCompilerTimeout | 2.0 | Seconds before regex compilation times out |
detectionMaxContentLength | 10000 | Max characters to scan per request |
detectionPreserveAttackPatterns | true | Keep attack regions when truncating |
detectionSemanticThreshold | 0.7 | Minimum semantic score to flag as threat |
detectionAnomalyThreshold | 3.0 | Z-score standard deviations for anomaly |
detectionSlowPatternThreshold | 0.1 | Seconds before a pattern is “slow” |
detectionMonitorHistorySize | 1000 | Metrics history buffer size |
detectionMaxTrackedPatterns | 1000 | Max patterns tracked by monitor |
Lowering detectionSemanticThreshold catches more attacks but increases false positives.
Raising detectionMaxContentLength scans more of large request bodies but increases CPU time.
Lowering detectionCompilerTimeout fails faster on complex patterns but may miss legitimate matches.