Even before the rise of AI, fraud model validation was an incredibly difficult task. But with the explosion of generative AI (gen AI)—and the more recent ascension of agentic AI—testing, evaluating, and verifying fraud models has become a monumental challenge.
Cybercriminals are exploiting technological advancements to execute AI-driven scams, like account takeovers, phishing, and synthetic identity fraud, that look and sound authentic. Meanwhile, modelers are now using gen AI and agentic AI to develop fraud prevention systems. Those AI-based models, however, must be thoroughly vetted to ensure accuracy, reliability, and compliance with regulations.
Chandrakant Maheshwari, the lead model validator at Flagstar Bank and one of the key authors of ACAMS’ CAM7 course work on anti-money laundering (AML) compliance, has decades of experience in financial crime prevention and fraud modeling. He is the author of a forthcoming book on financial crime risk modeling, “Validating Financial Crime Risk Models: A Guide to Managing Data Quality, Transaction Monitoring, and Compliance,” scheduled to be published by Springer later this year.
Recently, ProSight spoke with Maheshwari about fraud trends, validation best practices, the limitations of traditional back-testing for financial crime, the pros and cons of buying versus building models, and the complexities that AI brings to fraud model validation.
ProSight: What types of fraud are the most difficult for banks to manage today?
Maheshwari: The hardest fraud to manage is the kind that looks like legitimate behavior until it is too late. Account takeover, authorized push payment fraud, and first-party fraud all exploit the same data signals that banks use to approve genuine transactions. Synthetic identity fraud compounds this because the fraudulent customer never existed in any prior data set. The model has no baseline to compare against. What makes these categories especially difficult is not their technical complexity. It is the speed. Fraud risk manifests in minutes to days. By the time a pattern is confirmed, the loss has already occurred.
ProSight: What are the structural differences between anti-fraud and AML model architectures?
Maheshwari: The most important structural difference is rarely discussed: money launderers are internal customers. They hold accounts, maintain relationships with the institution, and use its products to move illicit funds. The financial system is their laundering mechanism. Fraudsters, on the other hand, are typically external actors targeting the bank or its customers. The financial system is their victim. That distinction drives everything about how models are built and validated.
AML monitoring must be holistic, tracking behavioral patterns across a customer’s full history over months. Fraud monitoring, in contrast, is transactional and time-critical, designed to stop a loss within minutes. The validation challenge is also structurally different. Fraud models can be back-tested against confirmed loss events. AML models cannot.
A suspicious activity report (SAR) is not a confirmed case of money laundering. The AML validator must assess whether the model is well-designed to detect illicit behavior, not whether it actually caught it. That epistemological gap is what makes AML model validation a discipline in its own right.
ProSight: How are financial institutions using AI, including gen AI and agentic AI models, to combat fraud?
Maheshwari: Machine learning has been part of fraud detection for over a decade. What is new is the application of generative AI to the investigative layer. Banks are beginning to deploy large language model-based systems to draft SAR narratives, summarize alert context, and support analyst decision-making. Agentic AI takes this further by orchestrating multiple tasks autonomously—retrieving transaction history, cross-referencing typology libraries, and generating a recommended disposition. The productivity gains are real—but so are the governance risks. These systems must be validated like any other model under SR 11-7. The output is a compliance decision, and the accountability remains with the institution.
ProSight: What specific steps do banks need to take to validate their fraud models? Can you recommend any best practices?
Maheshwari: Validation must begin with the fraud risk assessment. Every model assumption should be traceable back to the institution’s documented risk profile. From there, validators should assess conceptual soundness, data quality, and ongoing performance. For fraud models specifically, the labeled data set is critical. Confirmed fraud cases are sparse and may not represent the full range of attack vectors. Validators therefore must challenge whether the training data reflects current typologies or whether it is already outdated.
Threshold validation, segment-level performance analysis, and fairness audits across customer demographics are not optional. They are the baseline. Documentation must support a full audit trail of every tuning decision.
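To ground the threshold and segment-level checks Maheshwari describes, here is a minimal sketch of how a validation team might compare one score cutoff across customer segments. The column names ("score", "confirmed_fraud", "segment"), the threshold value, and the pandas-based approach are illustrative assumptions, not a prescribed procedure.

```python
# Illustrative segment-level threshold validation (hypothetical column names).
import pandas as pd

def segment_performance(alerts: pd.DataFrame, threshold: float) -> pd.DataFrame:
    """Alert-level precision, recall, and false-positive rate per segment."""
    work = alerts.assign(
        flagged=alerts["score"] >= threshold,
        fraud=alerts["confirmed_fraud"].astype(bool),
    )

    def metrics(g: pd.DataFrame) -> pd.Series:
        tp = (g["flagged"] & g["fraud"]).sum()
        fp = (g["flagged"] & ~g["fraud"]).sum()
        fn = (~g["flagged"] & g["fraud"]).sum()
        tn = (~g["flagged"] & ~g["fraud"]).sum()
        return pd.Series({
            "precision": tp / max(tp + fp, 1),
            "recall": tp / max(tp + fn, 1),
            "false_positive_rate": fp / max(fp + tn, 1),
            "alert_volume": int(tp + fp),
        })

    return work.groupby("segment").apply(metrics)

# Hypothetical usage: surface disparities before signing off on a cutoff.
# scored = pd.read_parquet("scored_alerts.parquet")
# print(segment_performance(scored, threshold=0.85))
```

Large gaps in recall or false-positive rate between segments would feed directly into the fairness audit and the documented rationale for each tuning decision.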
ProSight: How do the standards for validation shift when firms incorporate agentic AI into their fraud models?
Maheshwari: The entire pipeline becomes the model. Under SR 11-7, a model is defined by its inputs, processing logic, and outputs. An agentic system that retrieves documents, applies reasoning, and generates a recommendation meets that definition at every stage.
Validators must assess each component—the retrieval architecture, the prompt design, the inference layer, and the output formatting. The challenge is that agentic systems are not deterministic—i.e., the same input can produce different outputs depending on retrieval results and model version. Validation must therefore establish reproducible testing protocols, define acceptable variance bands, and confirm that human oversight is embedded before any consequential decision is finalized.
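Maheshwari's call for reproducible testing protocols and acceptable variance bands can be sketched as a simple replay check. Here `dispose_alert` is a hypothetical stand-in for an institution's retrieval-plus-LLM disposition step, and the 90 percent agreement band is an assumed example rather than any regulatory figure.

```python
# Minimal reproducibility check for a non-deterministic agentic pipeline.
from collections import Counter

def reproducibility_check(dispose_alert, case: dict, runs: int = 20,
                          min_agreement: float = 0.9) -> dict:
    """Replay one alert through the pipeline and measure output stability."""
    outcomes = [dispose_alert(case) for _ in range(runs)]
    counts = Counter(outcomes)
    modal_disposition, modal_count = counts.most_common(1)[0]
    agreement = modal_count / runs
    return {
        "modal_disposition": modal_disposition,
        "agreement": agreement,
        "within_variance_band": agreement >= min_agreement,
        "distribution": dict(counts),
    }

# Hypothetical usage: fail the test if outputs drift beyond the agreed band.
# report = reproducibility_check(dispose_alert, case={"alert_id": "A-123"})
# assert report["within_variance_band"], report["distribution"]
```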
ProSight: Are AI models being increasingly adopted because of the limitations of traditional back-testing for financial crime?
Maheshwari: Partly, yes. But the more precise answer is that AI adoption is driven by the limitations of rule-based systems at scale, not by back-testing failures specifically. Rules degrade. Criminals adapt. A threshold set 12 months ago may be systematically exploited today. Machine-learning models can detect patterns that no rule anticipated.
The back-testing problem is a separate issue and a serious one. In AML, there is no ground truth. You cannot back-test against confirmed money laundering the way you back-test a credit model against actual defaults. AI does not solve that problem, but it does change the shape of it.
ProSight: How should banks determine whether to buy or build their fraud models, and what are the pros and cons of employing external models?
Maheshwari: The most defensible answer is that banks should neither buy nor build exclusively. Rather, they should use a hybrid approach. Vendor models bring breadth. They are trained on data from thousands of institutions across geographies and fraud typologies. Emerging attack patterns, synthetic identity networks, and cross-institutional mule activity are threats a single bank cannot see in its own transaction history alone.
Relying entirely on internally built models means the institution is always one step behind threats it has not yet encountered. At the same time, vendor outputs are a starting point, not a final answer. The vendor model does not know your customer base, your product mix, or your institution-specific risk profile. Banks that layer their own models on top of vendor outputs—refining scores, adjusting thresholds, and adding institution-specific features—get the best of both worlds. The vendor provides coverage, while the internal model provides precision.
The validation obligation applies to both layers equally, and the interaction between internal and external models must itself be validated. A well-tuned internal refinement layer on top of a poorly understood vendor model only compounds the opacity, rather than resolving it.
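The layered approach described above, with an internal model refining vendor scores, might look something like the sketch below. The feature names, the use of gradient boosting, and the AUC comparison against the raw vendor score are assumptions for illustration only, not Maheshwari's prescribed method.

```python
# Illustrative internal refinement layer trained on top of a vendor score.
import pandas as pd
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

def fit_refinement_layer(df: pd.DataFrame):
    """Train an internal model on the vendor score plus institution-specific features."""
    features = ["vendor_score", "customer_tenure_days",
                "product_mix_index", "recent_device_changes"]
    X_train, X_test, y_train, y_test = train_test_split(
        df[features], df["confirmed_fraud"], test_size=0.3,
        stratify=df["confirmed_fraud"], random_state=42)

    model = GradientBoostingClassifier().fit(X_train, y_train)

    # Compare the layered model against the vendor score alone on held-out data,
    # one way to evidence that the combination adds precision rather than opacity.
    layered_auc = roc_auc_score(y_test, model.predict_proba(X_test)[:, 1])
    vendor_auc = roc_auc_score(y_test, X_test["vendor_score"])
    return model, {"layered_auc": layered_auc, "vendor_only_auc": vendor_auc}
```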
ProSight: Are there any fraud model validation trends that you will be following closely throughout the remainder of 2026?
Maheshwari: Two areas stand out. First, fairness and bias auditing is becoming a front-line validation requirement, not an afterthought. Segment-level performance disparities are a regulatory and reputational risk. Second, the integration of the fraud risk assessment into model validation and governance is gaining attention.
*The opinions expressed by Chandrakant Maheshwari in this article are his own and do not reflect those of his employer.
By: Robert Sales