Velocity Rules vs. ML Scoring: When to Use Each

July 31, 2025 By Daniel Park, Head of Engineering 11 min read

[Figure: Comparison of rule-based versus machine learning scoring for transaction monitoring]

The tension between velocity rules and ML scoring is one of the more practically contentious questions in digital bank compliance right now. Vendors position ML as the answer to the false positive problem. Regulators and examiners position explainability as a non-negotiable requirement. Both positions are correct, and resolving the tension requires being clear about what each approach is actually good for — and where each creates new risks if deployed without the other.

What Velocity Rules Do Well

Velocity rules are deterministic. A rule that fires when an account sends more than 12 transactions in a 24-hour rolling window will fire every time that condition is met, for every account, without variation. That determinism is an asset in BSA/AML compliance for several reasons.
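To make the determinism concrete, here is a minimal sketch of the rule described above: more than 12 transactions in a rolling 24-hour window. The function name, the constants, and the sample timestamps are illustrative assumptions, not a reference implementation of any particular monitoring system.

```python
from datetime import datetime, timedelta

# Values from the example in the text; the names are hypothetical.
VELOCITY_THRESHOLD = 12
WINDOW = timedelta(hours=24)


def velocity_rule_fires(timestamps, threshold=VELOCITY_THRESHOLD, window=WINDOW):
    """Return True if any rolling window holds more than `threshold` transactions."""
    ordered = sorted(timestamps)
    start = 0
    for end, ts in enumerate(ordered):
        # Move the left edge forward until the span is shorter than `window`.
        while ts - ordered[start] >= window:
            start += 1
        if end - start + 1 > threshold:
            return True
    return False


# 14 outbound transactions spaced 55 minutes apart (just under 12 hours in total).
base = datetime(2025, 7, 30, 14, 0)
burst = [base + timedelta(minutes=55 * i) for i in range(14)]

print(velocity_rule_fires(burst))       # True: 14 transactions exceed the threshold
print(velocity_rule_fires(burst[:12]))  # False: exactly 12 does not exceed it
```

The same inputs always produce the same answer, which is the property the rest of this section relies on.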

First, it's auditable. When an examiner asks why a particular account was flagged, the answer is unambiguous: the account sent 14 transactions between 2pm and 2am, which exceeded the 12-transaction velocity threshold. The rule ID, the threshold value, and the triggering transaction data can all be reconstructed cleanly. There's no probabilistic ambiguity to explain.

Second, rules are straightforward to calibrate. If the false positive rate on a given rule is running at 94%, you can raise the threshold from 12 transactions to 18, document the rationale, and observe the effect on alert volume and SAR conversion rate. The feedback loop is direct and interpretable.
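One way to picture that feedback loop is to replay historical alerts against candidate thresholds before changing anything in production. The sketch below assumes each historical record is a pair of (transaction count, whether the case ended in a SAR). The data is invented purely for illustration, and `calibration_report` is a hypothetical helper name.

```python
def calibration_report(history, thresholds):
    """Summarize alert volume and outcomes for each candidate threshold.

    history: list of (transaction_count, led_to_sar) tuples.
    """
    total_sars = sum(1 for _, led_to_sar in history if led_to_sar)
    rows = []
    for threshold in thresholds:
        # Outcomes of the records that would have alerted at this threshold.
        fired = [led_to_sar for count, led_to_sar in history if count > threshold]
        alerts = len(fired)
        sars = sum(fired)
        rows.append({
            "threshold": threshold,
            "alerts": alerts,
            "sars": sars,
            "false_positive_rate": (alerts - sars) / alerts if alerts else None,
            "missed_sars": total_sars - sars,
        })
    return rows


# Invented history: 100 records that alerted at the 12-transaction threshold.
history = (
    [(13, False)] * 40 + [(15, False)] * 30 + [(17, False)] * 20
    + [(19, False)] * 4 + [(20, True)] * 5 + [(14, True)] * 1
)

report = calibration_report(history, thresholds=[12, 18])
for row in report:
    print(row)
```

In this made-up history, moving from 12 to 18 cuts alerts from 100 to 9, but one SAR-worthy case at 14 transactions would no longer alert. That trade-off is exactly what the documented rationale needs to address.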

Third, specific typologies map cleanly to rules. Structuring detection — looking for multiple sub-$10,000 cash deposits within a defined window — is a rule-amenable problem. The behavior has a well-defined pattern with a known threshold ($10,000 CTR trigger) and a known timeframe (same business day, or rolling multi-day aggregation). You can write a rule that captures this pattern with high precision.
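As a rough illustration of how rule-amenable this typology is, the sketch below groups sub-$10,000 cash deposits by account and business day and flags any group of two or more whose total exceeds $10,000. It covers only the same-business-day case, not rolling multi-day aggregation. The record layout and the function name are assumptions for the example.

```python
from collections import defaultdict
from datetime import date
from decimal import Decimal

CTR_THRESHOLD = Decimal("10000")


def structuring_candidates(deposits):
    """Flag same-day groups of sub-threshold cash deposits that total over $10,000.

    deposits: iterable of (account_id, business_date, amount) tuples.
    """
    grouped = defaultdict(list)
    for account_id, business_date, amount in deposits:
        if amount < CTR_THRESHOLD:  # only deposits that individually stay below the line
            grouped[(account_id, business_date)].append(amount)

    flagged = []
    for (account_id, business_date), amounts in sorted(grouped.items()):
        total = sum(amounts)
        if len(amounts) >= 2 and total > CTR_THRESHOLD:
            flagged.append((account_id, business_date, total))
    return flagged


day = date(2025, 7, 30)
deposits = [
    ("acct-A", day, Decimal("4900")),
    ("acct-A", day, Decimal("4800")),
    ("acct-A", day, Decimal("4500")),
    ("acct-B", day, Decimal("9000")),  # single deposit, not flagged
    ("acct-C", day, Decimal("3000")),
    ("acct-C", day, Decimal("2000")),  # total stays under the threshold
]

print(structuring_candidates(deposits))
```

Only acct-A is returned here, with an aggregated total of 14200.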

Where Rules Fall Short

Rules are threshold-based, which means they are blind to context. A rule that fires above 12 transactions per 24 hours will fire on a gig-economy freelancer whose activity crosses that line while handling payments for several separate jobs in one day, and it will fire on a money mule who crosses the same line moving funds for a fraud network in rapid-fire transfers. From the rule's perspective, these look identical.

This is the structural driver of elevated false positive rates in digital bank environments. The rule can't distinguish between expected and unexpected behavior for a given account. All it sees is whether the transaction count crossed a line. Adding more rules to compensate — layering in dollar-amount rules, geographic rules, counterparty count rules — increases detection coverage but multiplies alert volume non-linearly. Three overlapping rules, each with a 90% false positive rate individually, can generate an alert queue that's almost entirely noise.

Rules also don't adapt. As your customer base grows and behavioral patterns evolve, static thresholds drift out of calibration. A threshold set when your average customer was a 28-year-old urban professional may be systematically off when your customer base has expanded to include small business owners, older customers, and customers in rural markets with different transaction patterns.

What ML Scoring Actually Contributes

ML scoring's contribution in a compliant AML program is not replacement of rules — it's anomaly contextualization. A behavioral baseline model trained on a customer's transaction history over a rolling 60-to-90-day window creates a probabilistic profile of what normal looks like for that account. When a rule fires, the ML component asks: is this transaction anomalous relative to this customer's baseline, or is it consistent with their established pattern?
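The article does not specify a model, so the sketch below uses a deliberately simple stand-in: a z-score of today's transaction count against the account's own trailing daily counts. A production baseline model would use far richer features. The function name, the 60-day minimum, and the sample histories are assumptions for illustration.

```python
from statistics import mean, pstdev


def anomaly_score(daily_counts, today_count, min_history_days=60):
    """Z-score of today's count against the account's own trailing baseline.

    Returns None when there is too little history to form a baseline.
    """
    if len(daily_counts) < min_history_days:
        return None
    mu = mean(daily_counts)
    sigma = pstdev(daily_counts)
    if sigma == 0:
        # Perfectly flat history: any deviation at all is maximally unusual.
        return 0.0 if today_count == mu else float("inf")
    return (today_count - mu) / sigma


# Two invented 90-day histories that both trip the same 12-transaction rule today.
busy_freelancer = [10, 12, 14, 11, 13] * 18
quiet_account = [0, 1, 0, 2, 1] * 18

print(anomaly_score(busy_freelancer, 14))  # modest: 14 is near this account's norm
print(anomaly_score(quiet_account, 14))    # large: 14 is far outside this account's norm
print(anomaly_score([3, 4, 5], 14))        # None: not enough history yet
```

Both accounts cross the same rule threshold, but only one of them is behaving unusually relative to its own history.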

That additional signal changes how the alert queue gets prioritized, not whether the alert gets generated. The rule still fires. The case is still opened. What changes is the order in which analysts review cases. High ML anomaly score + rule trigger gets reviewed first. Low ML anomaly score + rule trigger gets reviewed after.
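In code terms, the ML score acts as a sort key and never as a filter. A minimal sketch, assuming a hypothetical `Alert` record that carries the score alongside the rule that fired:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Alert:
    alert_id: str
    rule_id: str          # the rule that generated the alert
    anomaly_score: float  # hypothetical ML score; higher means more unusual


def prioritize(alerts):
    """Order the review queue by anomaly score, highest first. Nothing is dropped."""
    return sorted(alerts, key=lambda alert: alert.anomaly_score, reverse=True)


queue = [
    Alert("A-1", "VEL-24H-12", 0.12),
    Alert("A-2", "VEL-24H-12", 0.87),
    Alert("A-3", "VEL-24H-12", 0.45),
]

for alert in prioritize(queue):
    print(alert.alert_id, alert.anomaly_score)
```

Every alert that went in comes out. Only the order changes, so the rule-based audit trail stays intact.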

In practice, teams that deploy this hybrid approach see meaningful reductions in effective false positive rates — cases that reach a human reviewer tend to be more substantive, because the low-anomaly noise has been deprioritized. SAR conversion rates (the percentage of reviewed alerts that result in SAR filings) often improve because analyst attention is better concentrated on the cases that warrant it.

The Explainability Requirement: Why ML Alone Is Not Defensible

The FFIEC BSA/AML Examination Manual requires institutions to document the rationale for SAR decisions — why a particular activity was deemed suspicious, and what information supported that determination. A pure ML score does not produce this documentation on its own. "The model scored this account at 0.87" is not an adequate SAR narrative or case rationale.

We're not saying ML scores can't be part of a documented rationale. They can. An analyst writing a case narrative can reference that the transaction pattern scored significantly above the account's behavioral baseline, combined with the specific rule violation that generated the alert. That is a defensible, explainable case record.

What is not defensible is an automated workflow that routes alerts to SAR filing based on ML score alone, without analyst review and documented reasoning. Regulators have been explicit about this — ML-assisted decisions are acceptable; fully automated SAR filing without human review and case documentation is not.
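One way to enforce that boundary in a workflow is to make analyst review a precondition in the case record itself. The sketch below is an assumption about how such a record might look. The field names and the `ready_for_sar_filing` check are hypothetical and do not describe a regulatory schema.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class CaseRecord:
    case_id: str
    rule_id: str                           # which rule fired
    rule_detail: str                       # threshold and triggering facts
    anomaly_score: Optional[float] = None  # supporting signal only
    analyst_id: Optional[str] = None       # set when a human reviews the case
    narrative: str = ""                    # the analyst's documented reasoning


def ready_for_sar_filing(case):
    """A case can proceed only with a named reviewer and a written rationale.

    The anomaly score is deliberately not part of this check.
    """
    return bool(case.analyst_id) and bool(case.narrative.strip())


case = CaseRecord(
    case_id="C-1001",
    rule_id="VEL-24H-12",
    rule_detail="14 transactions in a 24-hour window; threshold is 12",
    anomaly_score=0.87,
)
print(ready_for_sar_filing(case))  # False: a high score alone is not enough

case.analyst_id = "analyst-42"
case.narrative = (
    "Velocity rule breached; pattern scored well above the account's "
    "behavioral baseline; counterparties not previously seen."
)
print(ready_for_sar_filing(case))  # True: reviewed and documented
```

The score still appears in the record, so the analyst can cite it in the narrative, but it cannot move a case to filing by itself.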

When to Use Each Approach

Scenario	Best Approach	Reason
Structuring / CTR aggregation	Rules	Known threshold ($10K), deterministic, examiner-expected
High-velocity P2P transfer bursts	Hybrid	Rules flag threshold breach; ML scores anomaly vs. baseline
New account rapid transaction ramp	Rules (with time-window)	No behavioral history for ML; rules capture account-age patterns
Round-tripping / mule network detection	Network analysis + ML	Rules can't model multi-hop fund flows; ML graph features do
Geographic anomaly on established account	Hybrid	Rules flag geo-deviation; ML scores vs. travel history baseline

The practical conclusion is that rules and ML scoring are not competing architectures — they're complementary layers with different strengths. The program design question is how to combine them in a way that produces an auditable, explainable case record while reducing the noise that consumes analyst capacity.
