Unsupervised Learning Part 2: The AML Connection

Blog: Enterprise Decision Management Blog

Putting innovation into production is a big theme at FICO, as we commercialize analytic breakthroughs from FICO Labs into the products our customers rely on, worldwide. Recently, this has included applying advanced unsupervised learning to money laundering, one of the many domains in which FICO technology fights financial crime.

In my first blog about unsupervised learning, I took a deep dive into this machine learning technique, which draws inferences in the absence of outcomes. In a nutshell: “Good unsupervised learning requires more care, judgement and experience than supervised, because there is no clear, mathematically representable goal for the computer to blindly optimize to without understanding the underlying domain.”

Springboarding from that blog, today’s post covers three categories of unsupervised learning that FICO has investigated, refined and put into our anti-money laundering (AML) solutions.

The State of the Art in Unsupervised Analytics

Category 1: Finding distance-based outliers relative to training points

This category of unsupervised learning quantifies “outlierness” under the principle that if a query point is close to many training points, it is considered ordinary, but if it is farther from them, it should score higher (i.e., denote greater outlierness). These are the best-known and most intuitive approaches, and what a data scientist would typically respond with if given a pop quiz on finding outliers in multi-dimensional data. Here are two classic techniques:

These techniques hinge on the choice of metric, and excess feature cross-correlation can be a big problem. A Mahalanobis distance may help somewhat with correlation, but it performs poorly with the frequently encountered non-Gaussian distributions and categorical features. Defining a proper metric therefore becomes a matter of art and science: cross-correlation and improper variable scaling both distort the distances, emphasizing some outliers while leaving the score less sensitive to others.
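To make the scaling problem concrete, here is a minimal sketch of one distance-based score: the mean Euclidean distance from a query point to its k nearest training points. The function names, the two features and all the numbers are hypothetical and chosen only for illustration; this is not FICO's implementation. The same two query points are scored twice, once in raw units and once after dividing each feature by its standard deviation.

```python
import math
import statistics

def knn_outlier_score(query, training, k=3):
    """Mean Euclidean distance from `query` to its k nearest training points."""
    distances = sorted(math.dist(query, point) for point in training)
    return sum(distances[:k]) / k

def standardize(points, reference):
    """Divide each feature by its standard deviation in `reference`."""
    scales = [statistics.pstdev(column) for column in zip(*reference)]
    return [tuple(v / s for v, s in zip(p, scales)) for p in points]

# Hypothetical customers: (monthly transaction count, monthly volume in dollars).
training = [(10, 1000), (12, 1100), (11, 900), (9, 1050), (13, 950)]

queries = {
    "many_transactions": (40, 1000),  # unusual count, ordinary volume
    "large_volume": (11, 1600),       # ordinary count, unusual volume
}

# Raw units: the dollar feature dominates the Euclidean metric.
raw_scores = {name: knn_outlier_score(q, training) for name, q in queries.items()}

# Standardized units: both features contribute on a comparable scale.
scaled_training = standardize(training, training)
scaled_queries = standardize(list(queries.values()), training)
scaled_scores = {
    name: knn_outlier_score(q, scaled_training)
    for name, q in zip(queries, scaled_queries)
}

print(raw_scores)
print(scaled_scores)
```

In raw units the large-volume customer looks far more unusual, simply because dollars span a wider numeric range than counts. After standardizing, the ranking flips and the high-count customer becomes the stronger outlier. Neither answer is "right"; the point is that the metric decides which outliers the score emphasizes.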

Category 2: Machine learning of underlying data

Unsupervised learning methods that adapt to underlying data present a more sophisticated approach with explicit machine learning. They are less commonly used and understood than the Category 1 methods, but have some support in published scholarly and scientific literature. Because they are based on ML concepts, these methods adapt better to the underlying data, even when the data have complex distributions and correlations.

  • An explicit dot product or metric is necessary in the kernel function, and the choice has major effects on results
  • A significant amount of training data must be stored for scoring
  • The raw score has no quantitative interpretation and often has a very ugly empirical distribution

Furthermore, the training procedure for the one-class support vector machine (OCSVM), like that of most support vector models, does not scale sufficiently to the sizes of data sets now commonly encountered.
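The first two bullets above can be illustrated with a deliberately simplified stand-in. The sketch below is not an OCSVM: it scores a query by its mean Gaussian (RBF) kernel similarity to every stored training point, with uniform weights, whereas an OCSVM learns the weights and an offset and keeps only its support vectors. The data, the `gamma` values and the function names are all hypothetical.

```python
import math

def rbf_kernel(a, b, gamma):
    """Gaussian (RBF) kernel; squared Euclidean distance is the explicit metric."""
    return math.exp(-gamma * math.dist(a, b) ** 2)

def kernel_score(query, stored_points, gamma):
    """Mean kernel similarity to the stored points (higher means more ordinary)."""
    total = sum(rbf_kernel(query, p, gamma) for p in stored_points)
    return total / len(stored_points)

# Hypothetical 2-D training data: one tight cluster and one looser cluster.
# Scoring needs these points at hand, so they must be stored with the model.
stored_points = [
    (0.0, 0.0), (0.1, 0.0), (0.0, 0.1),
    (3.0, 3.0), (3.5, 3.0), (3.0, 3.6),
]

queries = {
    "between_clusters": (1.5, 1.5),
    "edge_of_loose_cluster": (4.0, 3.0),
}

# Score the same queries under a wide kernel and a narrow kernel.
scores = {
    gamma: {name: kernel_score(q, stored_points, gamma) for name, q in queries.items()}
    for gamma in (0.1, 2.0)
}

for gamma, by_name in scores.items():
    print(gamma, by_name)
```

With the wide kernel (`gamma=0.1`) the point between the clusters looks more ordinary than the point on the edge of the loose cluster; with the narrow kernel (`gamma=2.0`) the order reverses. The raw values also live on very different scales under the two settings, which hints at why such scores are hard to interpret quantitatively.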

Category 3: Probabilistic and topological detection

Probabilistic and topological methods showcase FICO’s latest machine learning innovations; our data science team is unaware of any other technique that has all the associated advantages. FICO has developed and implemented these in our labs from scratch, mindful of our experience with the previously discussed varieties of unsupervised learning.

[Figure: chart showing feature differences]

Advanced Analytics for AML

FICO’s advanced analytics for anti-money laundering incorporates our most sophisticated machine learning-based outlier detection technologies. One key feature of our product is a quantification, from transaction and other data, of the degree of unusual and risky behavior that a few customers exhibit relative to the bulk of low-risk customers. Our fundamental innovations in unsupervised modeling and outlier scoring have improved the sophistication, palatability and success of efforts to tackle one of the world’s most elusive and disturbing criminal channels: money laundering and associated crimes against humanity.

Follow me on Twitter @ScottZoldi.

The post Unsupervised Learning Part 2: The AML Connection appeared first on FICO.
