AML risk scoring quantifies the financial-crime risk of each customer (and where appropriate, each transaction). It is the operational expression of the risk-based approach (RBA) at the heart of FATF Recommendation 1. A well-built risk scoring model focuses the team's effort on high-risk relationships, applies lighter monitoring to low-risk volume, and produces a defensible answer to a supervisor's "why was this level of due diligence applied?" question. This HowTo walks through the seven steps of building one from scratch.
Step 1: Identify Risk Factors
The first job is naming the factors that drive customer risk. Industry practice covers three dimensions:
Customer factors:
- Customer type (individual, SME, corporate, financial institution, NPO)
- Industry (high risk: cash-intensive businesses, crypto, gambling, arms trade, precious metals, real estate)
- PEP status (foreign PEP, domestic PEP, former PEP, relative or close associate (RCA))
- Ownership transparency (UBO identifiable, layered ownership present)
- Relationship age and history (new customer vs long-established)
Geographic factors:
- Customer nationality / country of residence
- Trading countries
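- FATF grey/black list status (both lists are revised at FATF plenaries, so load them from the current FATF publication instead of hard-coding them; at points during 2025 the grey list has included, among others, Burkina Faso, Cameroon, Democratic Republic of Congo, Haiti, Mali, Mozambique, Nigeria, Philippines, South Africa, South Sudan, Syria, Vietnam and Yemen; black list: DPRK, Iran, Myanmar)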
- High-risk jurisdiction exposure
Product/transaction factors:
- Product or service used (cash-intensive, SWIFT cross-border, crypto, mobile wallet, correspondent banking, prepaid card)
- Expected transaction volume
- Behavioural history (anomaly count, prior STR-eligible activity, previously closed accounts)
Jurisdiction-Specific Factors
Annex III of the EU's Fourth AML Directive (Directive (EU) 2015/849, as amended by AMLD5) lists factors indicating potentially higher risk that any EU institution's model must take into account: certain customer types, business relationships conducted in unusual circumstances, customers in high-risk jurisdictions, etc. UK MLR 2017 mirrors this. JMLSG Part I, Chapter 4 provides further detail. Your model should map each of these regulatory anchors to a specific factor.
Step 2: Design the Factor Scoring Scale
Define a scale for each factor. The common approach is 1-5:
| Score | Meaning | Example (Geography) |
|---|---|---|
| 1 | Very low risk | UK, Germany, France, Sweden |
| 2 | Low risk | Norway, Switzerland, Australia |
| 3 | Medium risk | Latin America, parts of Eastern Europe |
| 4 | High risk | FATF grey-list countries |
| 5 | Very high risk | FATF black-list countries, sanctioned regimes |
Some models use 1-10 or 1-100; the choice is stylistic. What matters is consistent use and documentation.
Practical recommendation: tabulate the scoring matrix and submit it for compliance review. Every factor value must have a documented rationale for the score it receives.
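One way to hold the scoring matrix so that the rationale requirement is enforced by the data structure is sketched below. This is a minimal illustration, not a prescribed design: the `ScoredValue` type, the `GEOGRAPHY_MATRIX` excerpt, the rationale wording and the medium default for unlisted countries are all assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ScoredValue:
    score: int      # 1 (very low) to 5 (very high)
    rationale: str  # documented reason for the score


# Hypothetical excerpt of a geography matrix; rationale wording is illustrative.
GEOGRAPHY_MATRIX = {
    "GB": ScoredValue(1, "Very low risk per the geography scale"),
    "CH": ScoredValue(2, "Low risk per the geography scale"),
    "KP": ScoredValue(5, "FATF black list"),
}


def validate_matrix(matrix):
    """Return a list of problems: out-of-range scores or missing rationales."""
    problems = []
    for value, entry in matrix.items():
        if not 1 <= entry.score <= 5:
            problems.append(f"{value}: score {entry.score} outside 1-5")
        if not entry.rationale.strip():
            problems.append(f"{value}: missing rationale")
    return problems


def score_geography(country_code, default=3):
    """Look up a score; unlisted countries get an assumed medium default."""
    entry = GEOGRAPHY_MATRIX.get(country_code)
    return entry.score if entry else default
```

Running `validate_matrix` before every release of the matrix gives compliance review a mechanical check that no value has slipped in without a reason attached.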
Step 3: Set Factor Weights
Factors are not equal. PEP status carries more weight than relationship age. Geography carries more than customer type.
Typical distribution (totalling 100%):
- Customer type: 15%
- Industry: 15%
- PEP/sanctions status: 20%
- UBO transparency: 10%
- Geographic risk: 20%
- Product/service risk: 15%
- Account history/behaviour: 5%
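A small guard like the following can catch a weight set that no longer totals 100% after someone edits it. It is a sketch only: the factor keys are hypothetical names for the seven factors listed above, and the weights are stored as integer percentages to avoid floating-point rounding.

```python
# Weights as integer percentages, copied from the list above.
# The dictionary keys are hypothetical identifiers.
WEIGHTS_PCT = {
    "customer_type": 15,
    "industry": 15,
    "pep_sanctions": 20,
    "ubo_transparency": 10,
    "geography": 20,
    "product_service": 15,
    "account_history": 5,
}


def check_weights(weights_pct):
    """Raise ValueError unless every weight is positive and the total is 100."""
    if any(w <= 0 for w in weights_pct.values()):
        raise ValueError("every factor needs a positive weight")
    total = sum(weights_pct.values())
    if total != 100:
        raise ValueError(f"weights sum to {total}, expected 100")
    return True
```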
Two approaches to setting these:
Expert-driven. The compliance team agrees weights through structured discussion. Fast, comprehensible, but not empirically validated.
Data-driven. Statistical analysis on historical SAR/STR cases — which factors actually predicted real risk? Logistic regression or gradient boosting. Stronger but requires data (at least several hundred SAR cases).
A new institution starts expert-driven; 12-18 months in, data-driven recalibration becomes feasible.
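Before fitting a regression, a simple first diagnostic for the data-driven route is to compare SAR rates across the score values of a single factor. The sketch below assumes a hypothetical record shape (one dict per customer with factor scores and a boolean `sar` flag); it is a sanity check on whether a factor separates SAR customers at all, not a substitute for a fitted model.

```python
from collections import defaultdict


def sar_rate_by_score(records, factor):
    """SAR rate for each score value of one factor.

    records: iterable of dicts such as {"pep": 3, "sar": True}
    (a hypothetical shape chosen for this example).
    """
    totals = defaultdict(int)
    sars = defaultdict(int)
    for record in records:
        totals[record[factor]] += 1
        sars[record[factor]] += int(record["sar"])
    return {score: sars[score] / totals[score] for score in sorted(totals)}


# Made-up records purely to show the call.
sample = [
    {"pep": 1, "sar": False},
    {"pep": 1, "sar": False},
    {"pep": 3, "sar": True},
    {"pep": 3, "sar": False},
]
rates = sar_rate_by_score(sample, "pep")
```

A factor whose SAR rate is flat across its score values is a candidate for a lower weight; one whose rate climbs steeply is a candidate for a higher one.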
Step 4: Combine Scores into a Customer Risk Level
Each customer's total weighted score:
Total Risk Score = Σ (Factor_i × Weight_i)
Map to risk levels:
| Total Score | Risk Level | Applied Treatment |
|---|---|---|
| 1.0 - 2.0 | Low | Simplified Due Diligence (where eligible); standard monitoring |
| 2.1 - 3.5 | Medium | Standard Customer Due Diligence; normal monitoring |
| 3.6 - 4.5 | High | Enhanced Due Diligence (EDD); tighter monitoring |
| 4.6 - 5.0 | Very high | EDD + senior management approval; close monitoring; more frequent review |
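The formula and the band table can be sketched in a few lines. The function names are hypothetical, weights are taken as integer percentages, and one assumption is made explicit: the table leaves small gaps between rows (for example between 2.0 and 2.1), and here a score falling in a gap is assigned to the higher band.

```python
def total_risk_score(scores, weights_pct):
    """Weighted sum of factor scores; weights are integer percentages summing to 100."""
    if set(scores) != set(weights_pct):
        raise ValueError("scores and weights must cover the same factors")
    return sum(scores[f] * weights_pct[f] for f in scores) / 100


# Upper bound of each band, taken from the table above.
# Assumption: a score in a gap between rows (e.g. 2.05) goes to the higher band.
BANDS = [(2.0, "low"), (3.5, "medium"), (4.5, "high"), (5.0, "very high")]


def risk_level(score):
    """Map a total score in the range 1.0-5.0 to its risk level."""
    if not 1.0 <= score <= 5.0:
        raise ValueError(f"score {score} outside 1.0-5.0")
    for upper, level in BANDS:
        if score <= upper:
            return level
```

Treating each boundary as inclusive on the lower band means a total of exactly 2.0 is rated low, which matches the table.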
The band boundaries (2.0, 3.5, 4.5) are calibration choices, not fixed constants. Set them by examining the distribution they produce, that is, what proportion of customers ends up in each level. Typical target: 70-80% low, 15-20% medium, 3-8% high, 0.5-2% very high.
If the distribution falls outside that envelope (e.g. 30% of customers flagged as high risk), either the thresholds or the weights are mis-calibrated.
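The envelope check described above is easy to automate. The sketch below takes a list of assigned risk levels and reports, per level, the observed share and whether it sits inside the target range quoted in the text; the function name and report shape are assumptions for the example.

```python
from collections import Counter

# Target envelope from the text, as (min, max) share of the portfolio.
TARGET = {
    "low": (0.70, 0.80),
    "medium": (0.15, 0.20),
    "high": (0.03, 0.08),
    "very high": (0.005, 0.02),
}


def distribution_report(levels):
    """Return {level: (share, within_target)} for a list of assigned risk levels."""
    if not levels:
        raise ValueError("no customers to analyse")
    counts = Counter(levels)
    report = {}
    for level, (low_bound, high_bound) in TARGET.items():
        share = counts[level] / len(levels)
        report[level] = (share, low_bound <= share <= high_bound)
    return report
```

Any level reported as outside its range is a prompt to revisit thresholds or weights, not an automatic failure: a specialist institution may legitimately have a different mix.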
Step 5: Apply Segmentation
A single model is not optimal across customer types. Risk factors for an individual customer differ from those for a corporate. Typical segments:
- Individual — retail: PEP, geography, income/account inconsistency weighted higher
- Individual — affluent/HNW: + source of wealth, multiple accounts, multi-country footprint
- SME: Industry, ownership transparency, expected volume
- Corporate: UBO structure, financial statement traceability, multi-jurisdiction operations
- Financial institution (correspondent): Regulator quality, supervisory rating, AML programme quality
- NPO: Donation sources, operating countries, beneficiary populations
Running a fully separate scoring matrix per segment is operationally heavy; the practical compromise is the same factor set with segment-specific weights.
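The "same factors, segment-specific weights" compromise could be represented as a base weight set plus per-segment adjustments, as sketched below. The adjustment values are invented for illustration and the factor keys are hypothetical; the only rule the code enforces is that each segment's weights remain positive and still total 100.

```python
BASE_WEIGHTS_PCT = {
    "customer_type": 15,
    "industry": 15,
    "pep_sanctions": 20,
    "ubo_transparency": 10,
    "geography": 20,
    "product_service": 15,
    "account_history": 5,
}

# Hypothetical adjustments in percentage points; each set nets to zero.
SEGMENT_ADJUSTMENTS = {
    "retail": {},
    "sme": {"industry": 5, "ubo_transparency": 5, "customer_type": -10},
    "corporate": {"ubo_transparency": 10, "customer_type": -5, "product_service": -5},
}


def weights_for(segment):
    """Return the base weights with the segment's adjustments applied."""
    adjusted = dict(BASE_WEIGHTS_PCT)
    for factor, delta in SEGMENT_ADJUSTMENTS[segment].items():
        adjusted[factor] += delta
    if sum(adjusted.values()) != 100 or min(adjusted.values()) <= 0:
        raise ValueError(f"invalid weight set for segment {segment!r}")
    return adjusted
```

Keeping adjustments as deltas makes the documentation easier to review: each segment's deviation from the base model is visible at a glance.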
Step 6: Validate the Model
Before going live, the model must be tested on retrospective data:
- Historical backtesting. Run the model over 12-24 months of customer portfolio. Confirm that those flagged high risk are in fact the population that produced SARs.
- False positive analysis. How many customers flagged high risk did not produce a real case? A 0% false positive rate is unattainable; 70-90% is normal (lower than sanctions screening FP because risk scoring is broader).
- False negative analysis. Of the SAR cases you actually filed, how many had been flagged low risk? These are the model's misses. Should be close to zero.
- Sensitivity analysis. How much does the result move when a single factor changes? Over-sensitive models (a 1-point change in any factor flips the level) are unstable; under-sensitive models (a heavily weighted factor moves from its minimum to its maximum and the level stays the same) lack discrimination.
- Compliance review. Model documentation (factor definitions, weights, thresholds, validation results) approved by the MLRO and, depending on the institution, internal audit.
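Two of the checks in the list above, false positive/negative analysis and sensitivity analysis, can be sketched as follows. The input shapes and function names are assumptions: `backtest` takes historical (risk level, SAR filed) pairs, and `one_point_flips` takes any classifier function that maps a total score to a level.

```python
def backtest(customers):
    """customers: list of (risk_level, sar_filed) pairs from a historical window.

    Returns (fp_rate, miss_rate): the share of high/very-high flags with no SAR,
    and the share of SAR customers that the model had rated low.
    """
    flagged = [sar for level, sar in customers if level in ("high", "very high")]
    sar_levels = [level for level, sar in customers if sar]
    fp_rate = flagged.count(False) / len(flagged) if flagged else None
    miss_rate = sar_levels.count("low") / len(sar_levels) if sar_levels else None
    return fp_rate, miss_rate


def one_point_flips(scores, weights_pct, classify):
    """One-at-a-time sensitivity: factors where a +1 score change alters the level."""
    def total(s):
        return sum(s[f] * weights_pct[f] for f in s) / 100

    base_level = classify(total(scores))
    flips = []
    for factor, value in scores.items():
        if value >= 5:
            continue  # already at the top of the 1-5 scale
        bumped = dict(scores)
        bumped[factor] = value + 1
        if classify(total(bumped)) != base_level:
            flips.append(factor)
    return flips
```

Running `one_point_flips` across a sample of real customers shows how many sit close enough to a boundary that a single-factor change would move them, which is a practical measure of model stability.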
Article 8 of Directive (EU) 2015/849 (AMLD4, as amended by AMLD5) requires obliged entities to document their risk assessments and keep them up to date. The FCA's SYSC chapters expect the same. The documentation is not optional: it is what a supervisor reviews.
Step 7: Operational Integration and Continuous Monitoring
Once live:
Onboarding integration. Score calculated at application time; risk level branches the KYC flow (standard vs EDD). High risk triggers additional documentation, senior management approval.
Periodic re-rating. Customer risk is not static. The score is recalculated annually (six-monthly for high risk), because factor values may have changed (the customer has moved country, gained PEP status, or increased their transaction profile).
Event-driven re-rating. Specific events trigger immediate recalculation — SAR filed on the customer, large anomaly, adverse media hit, new sanctions match.
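Periodic and event-driven re-rating can be combined into a single "when is the next recalculation due?" function, sketched below. The event identifiers are hypothetical names for the triggers listed above, and the quarterly cycle for very-high-risk customers is an assumption taken from the review-cycle convention mentioned later in this article.

```python
import calendar
from datetime import date

# Review cycle in months. Annual/six-monthly follow the text; the quarterly
# cycle for "very high" is an assumption.
REVIEW_MONTHS = {"low": 12, "medium": 12, "high": 6, "very high": 3}

# Hypothetical identifiers for the trigger events named in the text.
TRIGGER_EVENTS = {"sar_filed", "large_anomaly", "adverse_media_hit", "sanctions_match"}


def add_months(d, months):
    """Add calendar months, clamping the day for shorter months."""
    month_index = d.month - 1 + months
    year = d.year + month_index // 12
    month = month_index % 12 + 1
    day = min(d.day, calendar.monthrange(year, month)[1])
    return date(year, month, day)


def rerating_due(last_rated, level, today, events=()):
    """Return the date by which the score must be recalculated."""
    if TRIGGER_EVENTS & set(events):
        return today  # event-driven: recalculate immediately
    return add_months(last_rated, REVIEW_MONTHS[level])
```

Passing `today` in as a parameter, instead of reading the system clock inside the function, keeps the schedule reproducible in tests and in audit reruns.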
Monitoring threshold binding. Risk level drives transaction monitoring thresholds. A high-risk customer's £100K single transfer triggers an alert; for a low-risk customer the threshold may be £500K. This is one of the strongest false positive reduction techniques.
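In code, threshold binding amounts to a lookup keyed on risk level. In the sketch below the £500K and £100K figures come from the paragraph above; the medium and very-high values are assumptions added so the table is complete.

```python
# Single-transfer alert thresholds in GBP. The low and high figures come from
# the text; the medium and very-high values are illustrative assumptions.
ALERT_THRESHOLD_GBP = {
    "low": 500_000,
    "medium": 250_000,
    "high": 100_000,
    "very high": 50_000,
}


def should_alert(level, amount_gbp):
    """True when a single transfer meets or exceeds the level's threshold."""
    return amount_gbp >= ALERT_THRESHOLD_GBP[level]
```

The same £100K transfer therefore alerts for a high-risk customer and passes silently for a low-risk one, which is where the false positive reduction comes from.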
Case management. Higher risk levels mean closer transaction monitoring; an assigned analyst maintains a portfolio of high-risk customers; periodic review calendars auto-generate.
Model performance tracking. Monthly dashboard: risk-level distribution, SAR-to-score correlation, false positive trend, model drift. Reported quarterly to the AML governance committee.
Worked Example: Scoring a Customer
To make the method concrete, work through a typical customer:
Profile. James K., 47, UK national, resident in London, owner of a construction-materials import SME, expected monthly transaction volume £150K-400K, five-year relationship with the bank, no SAR history, business partners in the UAE and Pakistan.
Factor scores:
| Factor | Value | Score | Weight | Contribution |
|---|---|---|---|---|
| Customer type | SME-owning individual | 2 | 15% | 0.30 |
| Industry | Construction materials import | 3 | 15% | 0.45 |
| PEP status | None | 1 | 20% | 0.20 |
| UBO transparency | Clear (self-owned) | 1 | 10% | 0.10 |
| Geographic risk | UK resident + UAE/Pakistan ties | 3 | 20% | 0.60 |
| Product/service | Standard current account + SWIFT | 2 | 15% | 0.30 |
| Account history | 5 years clean | 1 | 5% | 0.05 |
Total score: 2.00 → the top of the low band (1.0 - 2.0) → low risk segment.
Treatment: standard CDD. Annual review. High-value transactions (£500K+) trigger additional scrutiny because of the geographic partnership profile.
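Now vary the scenario: the UAE partner is receiving payment via a Pakistan-incorporated intermediary, and the institution's geography matrix rates that exposure as high risk (Pakistan left the FATF grey list in October 2022, so this rating is an institutional judgement, not a list lookup). Geographic risk score 3 → 4. New total: 2.20 → crosses from low into medium. Treatment shifts: standard CDD, with closer scrutiny of payment routing at the annual review.
Second scenario (cumulative with the first): James K. is appointed to a local council seat, which this institution's policy treats as domestic PEP status (FATF's definition targets prominent public functions, so many institutions would not). PEP score 1 → 3 (domestic PEP, risk-based per FATF Recommendation 12 guidance). New total: 2.60 → firmly medium. Review cycle moves from annual to six-monthly; transaction monitoring sensitivity raised.
Third scenario (again cumulative): James K. signs an export contract with a Syrian counterparty. Geographic risk jumps to 5 (sanctions-affected jurisdiction). New total: 2.80 → still medium on the weighted sum alone, because a single factor weighted at 20% cannot lift the total past 3.6 by itself. This is where a pure weighted sum needs an override rule: a sanctions-affected counterparty should trigger EDD and senior management approval for the continued relationship regardless of the aggregate score.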
This worked example shows how the model responds to real customer behaviour — it is dynamic, not a static snapshot.
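The arithmetic in the factor table and the first two scenario changes can be reproduced with a short script. The factor keys are hypothetical identifiers for the table rows, and the scenarios are applied cumulatively, as in the narrative.

```python
# Weights (integer percentages) and baseline scores from the factor table above.
WEIGHTS_PCT = {
    "customer_type": 15, "industry": 15, "pep_status": 20, "ubo_transparency": 10,
    "geographic_risk": 20, "product_service": 15, "account_history": 5,
}
BASE_SCORES = {
    "customer_type": 2, "industry": 3, "pep_status": 1, "ubo_transparency": 1,
    "geographic_risk": 3, "product_service": 2, "account_history": 1,
}


def total(scores):
    """Weighted sum of factor scores on the 1-5 scale."""
    return sum(scores[f] * WEIGHTS_PCT[f] for f in scores) / 100


baseline = total(BASE_SCORES)
scenario_1 = dict(BASE_SCORES, geographic_risk=4)  # intermediary exposure
scenario_2 = dict(scenario_1, pep_status=3)        # domestic PEP, cumulative

print(baseline, total(scenario_1), total(scenario_2))  # prints: 2.0 2.2 2.6
```

Further changes can be checked the same way by chaining another `dict(...)` update onto the previous scenario.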
Model Governance: Who Owns Which Decision?
The risk scoring model is not a technical tool; it is a regulator-exposed decision structure. Governance lines:
MLRO. Owner of the model documentation; every material change (new factor, weight change, threshold shift) requires MLRO sign-off. In a supervisory review the MLRO is accountable for explaining the model.
Compliance. Tracks model application day-to-day; monthly reporting; pattern detection (model drift, anomalous analyst output, segment performance).
Risk function. Engages with the model from the institution-wide risk perspective, particularly the proportion of customers landing in high-risk segments. Risk committee (typically monthly) reviews model performance.
Internal audit. Annual model validation; independent test results reported to the board.
IT / data engineering. Operates the technical implementation; data-quality monitoring, model integration, dashboard production. Not a decision-maker on model parameters — operational maintenance only.
Board / governance committee. Annual report on model changes; approval threshold defined. Board adjusts risk appetite which feeds back into the model.
Decision-rights matrix (RACI-style):
| Decision | MLRO | Compliance | Risk | IT | Board |
|---|---|---|---|---|---|
| New factor added | Approve | Recommend | Consult | Implement | Inform |
| Factor weight change | Approve | Recommend | Consult | Implement | Inform |
| Threshold change | Approve | Recommend | Consult | Implement | (if material) Inform |
| Annual model validation | Approve | Participate | Participate | Data supply | Outcome inform |
| Model retirement / redesign | Approve | Participate | Participate | Implement | Approve |
This structure produces a clean answer to "who owns the model and who took the decisions" at supervisory review.
How Risk Scoring Connects to Other AML Processes
Risk scoring is not isolated — it wires into:
- Onboarding flow: the risk score branches the KYC path (standard vs EDD)
- Screening intensity: higher risk → lower name-match threshold (more potential matches surfaced); lower risk → higher threshold
- Transaction monitoring thresholds: risk level sets the trigger amounts for monitoring rules
- Review cycles: annual by default; six-monthly for high risk; quarterly for very high risk
- SAR assessment: when a transaction looks suspicious, a high-risk rating lowers the bar for escalation and speeds up the SAR decision
- EDD depth: source of funds documentation depth, monitoring cadence calibrated to risk level
For the combined approach, see the segment-based threshold table in our article on how to reduce AML false positives, which maps risk level + list type combinations to threshold matrices.
Common Mistakes
Geography + PEP only. Industry and account behaviour are critical; models without them underdiscriminate.
Too few segments. A single model across all customers is either over-sensitive (alert flood) or under-sensitive (missed risk).
Frozen weights. Without annual recalibration the model goes stale — it does not adapt to changing risk patterns.
"Nothing" for low risk. Low risk means standard monitoring, not absence of monitoring. In a supervisory review, "why was no monitoring applied?" has no defensible answer.
Documentation gap. If how the model works is not written down, decisions look unreasoned at inspection.
Frequently Asked Questions
Is risk scoring automated or manual?
Hybrid. Calculation is automated. Some factor assessments (e.g. UBO transparency, behaviour anomaly) are algorithm + analyst judgement. High risk levels require senior management approval — that step is manual. Hybrid is what scales with human accountability.
Is the risk score updated daily?
No. Daily recalculation is expensive and unnecessary. Standard practice: annual regular review (six-monthly for high risk), plus event-driven recalculation (new PEP status, sanctions match, anomaly, SAR). Some institutions run a nightly batch to refresh changing factors (geography, expected volume), but this is optional.
Are AI/ML models required?
No. Rule-based weighted-sum models are entirely accepted and most institutions use them. ML models (gradient boosting, neural networks) can offer better predictive power but must be explainable — at supervisory review you must articulate why this customer scored this way. Black-box ML models are problematic for compliance use.
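Part of the appeal of a weighted-sum model is that the explanation falls straight out of the arithmetic. The sketch below, with a hypothetical `explain` function, lists each factor's contribution to the total, largest first, which is the kind of breakdown an analyst can put in front of a supervisor.

```python
def explain(scores, weights_pct):
    """Per-factor contributions to the weighted sum, largest first.

    scores: factor -> score on the 1-5 scale
    weights_pct: factor -> integer percentage weight
    """
    contributions = {f: scores[f] * weights_pct[f] / 100 for f in scores}
    return sorted(contributions.items(), key=lambda item: item[1], reverse=True)


# Two-factor toy input purely to show the output shape.
breakdown = explain({"geography": 5, "pep": 1}, {"geography": 50, "pep": 50})
```

An ML model needs a separate attribution technique to produce a comparable breakdown; the weighted sum gives it for free.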
What if a low-risk customer triggers a SAR requirement?
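File the SAR. Risk level does not change the SAR obligation. Risk level determines proactive monitoring intensity; once a specific transaction is suspicious it is reported regardless of risk level. The reporting obligation (in the UK under the Proceeds of Crime Act 2002 and the Terrorism Act 2000; in the EU under national laws implementing the AML Directive; and equivalents elsewhere) does not depend on the customer's risk rating.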
Should we share the risk score with the customer?
No. The risk score is an internal assessment; not shared. When additional documentation is requested for a high-risk customer, "as part of our internal AML assessment" is a sufficient general rationale. Sharing the specific score or formula is both operationally risky and beyond regulator expectation.
How Legichain Helps
Legichain's AML platform includes a configurable risk scoring engine. Factor templates for customer type, geography (with auto-updating FATF grey/black list status), PEP/sanctions integration, product and behaviour are built in; weights and thresholds are configurable from the admin panel. Segment templates align with EU AMLD5 Annex III and UK JMLSG Chapter 4 high-risk indicators.
Periodic recalculation runs on cron; event-driven triggers (sanctions match, anomaly score) launch re-rating. Model documentation auto-generates as PDF — the inspection-ready model report is a single click.
