Today’s Deep Dive: Innovative Unsupervised Learning in AI

Blog: Enterprise Decision Management Blog

Categorically, artificial intelligence (AI) can appear to be an odd juxtaposition of order and disorder: we direct the AI with algorithms, yet the system produces new insights seemingly by magic. This two-part blog unpacks the mysteries of two very different AI techniques: supervised and unsupervised learning.

Supervised Learning: The Workhorse of AI

Most of the well-known applications of machine learning and computational AI involve supervised learning. The modeler amasses a vast set of existing data (e.g., financial transactions, internet photographs, or the texts of tweets) along with a base-level “ground truth” outcome for each example that is already known, perhaps in retrospect or through expensive human investigation.

Equipped with any number of computational algorithms, the scientist becomes the “supervisor” whose code trains the model to reproduce, in the lab, the known outcomes with a low probability of error. The models are then deployed to live a happy life scoring credit risk and fraud likelihood, finding pictures of Chihuahuas and muffins, or flagging insulting tweets. Technically, each model computes a probabilistically weighted predicted outcome that we expect to resemble the outcomes of the training examples. The state of the art for supervised learning is now well established; you can choose from dozens of comprehensive predictive analytics and neural network packages.

Unsupervised Learning: Inferences in the Absence of Outcomes

But what if there is no set of “true outcomes” known, or the ones at hand are restricted in quality or quantity? What can machine learning do for us then? This is the domain of the far trickier unsupervised learning, which draws inferences in the absence of outcomes.

Good unsupervised learning requires more care, judgement and experience than supervised learning, because there is no clear, mathematically representable goal that the computer can blindly optimize without understanding the underlying domain.

The Challenge of Outlier Detection

A central task within unsupervised modeling is outlier detection: Which examples are most unlike most of their peers? Outlier detection and transaction fraud scoring provide an easy illustration of how unsupervised modeling works.

The solution pattern for these tasks begins with a problem- and domain-specific transformation of the raw data into a quantitative vector space of features; up to this point, it is exactly in line with supervised predictive modeling. This is followed by a more generic mathematical construction that yields a numerical score of the “degree of outlier-ness” in the absence of ground-truth training outcomes.
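
The two-stage pattern can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not FICO's method: the transaction fields (amount, hour, merchants_today), the three derived features and the use of a root-mean-square z-score as the generic outlier score are all hypothetical choices. Stage 1 is the domain-specific feature transform; stage 2 is a generic score computed without any labels.

```python
import math
import statistics

# Hypothetical raw transactions; field names and values are illustrative only.
TRANSACTIONS = [
    {"amount": 25.0, "hour": 12, "merchants_today": 1},
    {"amount": 40.0, "hour": 18, "merchants_today": 2},
    {"amount": 18.5, "hour": 9, "merchants_today": 1},
    {"amount": 60.0, "hour": 20, "merchants_today": 2},
    {"amount": 33.0, "hour": 14, "merchants_today": 1},
    {"amount": 2500.0, "hour": 3, "merchants_today": 7},
]


def to_features(txn):
    """Stage 1 (domain-specific, assumed): map a raw record to a numeric vector."""
    return [
        math.log1p(txn["amount"]),          # compress the long tail of amounts
        1.0 if txn["hour"] < 6 else 0.0,    # night-time flag
        float(txn["merchants_today"]),      # simple velocity count
    ]


def fit_standardizer(vectors):
    """Learn per-feature mean and standard deviation from unlabeled data."""
    columns = list(zip(*vectors))
    means = [statistics.fmean(c) for c in columns]
    # Guard against constant features, whose standard deviation is zero.
    stdevs = [statistics.pstdev(c) or 1.0 for c in columns]
    return means, stdevs


def outlier_score(vector, means, stdevs):
    """Stage 2 (generic): root-mean-square z-score, a scaled distance from the
    centre of the data. No ground-truth outcomes are used anywhere."""
    z = [(v - m) / s for v, m, s in zip(vector, means, stdevs)]
    return math.sqrt(sum(x * x for x in z) / len(z))


vectors = [to_features(t) for t in TRANSACTIONS]
means, stdevs = fit_standardizer(vectors)
scores = [outlier_score(v, means, stdevs) for v in vectors]
for txn, score in zip(TRANSACTIONS, scores):
    print(f"amount={txn['amount']:>8.2f}  score={score:.2f}")
```

Only to_features knows anything about transactions; swapping in a different generic score (a nearest-neighbor distance or a density estimate, say) changes stage 2 without touching stage 1.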

Compared with classic supervised modeling, unsupervised modeling has far fewer established principles, less didactic instruction and less widely available software, so there are even more analytic “gotchas” that require deep experience and judgement from the analytic scientist. Difficulties and considerations in outlier detection include:

  1. The need to define a metric or distance. Many techniques require defining a “metric” or “distance” function between pairs of observations. One problem is that the individual components of the feature vector have qualitatively different meanings: how can one sensibly add or subtract apples and oranges, or kumquats and kangaroos? Often this is done ad hoc or, unfortunately, unintentionally, because the underlying algorithm implicitly assumes a metric. And what should be done in the real-life scenario of a combination of quantitative and categorical features? Supervised modeling can often be blissfully ignorant of this problem, since quantitative optimization against known targets tends to scale and transform each feature automatically, to the degree that it contributes predictive value.

    In an unsupervised context, an explicit metric, imposed by the analytic scientist, has a major influence on the scoring of outliers. Additionally, in a high-dimensional space, the intuitions about neighborhoods and neighborliness that we derive from our three-dimensional physical experience are very misleading: a randomly selected point in the training dataset is often not much farther from a given point than that point’s nearest neighbors are. At FICO, we believe outlier statistics derived under these intuitions ought to be approached with caution.

  2. Computational burdens on scoring. How expensive is it, in computation and memory, to score new observations with the outlier model? Do any complex data structures need to be built for scoring? Do we need to retain a significant fraction (or all) of the training data set to score a new observation in production?
  3. Calibration and interpretation of score. If we have a number representing “degree of outlierness,” what does it mean? Does it have a well-behaved, approximately continuous score distribution on the natural data set, or is the distribution irregular, with significant delta functions (spikes) or gaps? What happens when the dimension of the training set changes; that is, are there major systematic trends in the scores?
  4. Feature cross-correlation. This is a subtle yet critical problem that gets little attention in the field. The underlying features are designed to address particular aspects of the problem domain, but often a significant number of related, and therefore correlated, features cover some conceptual axes of the problem, while other aspects of behavior are represented by only a few features each. The effect on outlier scores may be severe, because the over-represented axes are effectively counted several times in any distance that weights all features equally. Can one balance this automatically, in a principled manner?
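
To make the metric problem concrete, the following sketch (hypothetical data, with plain Euclidean distance assumed as the metric) shows that merely changing the unit of one feature can change which observation is a point's nearest neighbor, and hence how outlying that point looks. The query is a $100 purchase at 2 a.m.; expressing the amount in dollars or in thousands of dollars is an arbitrary modeling choice, yet it flips the answer.

```python
import math

# Hypothetical observations: (purchase amount, hour of day).
QUERY = (100.0, 2)  # $100 at 2 a.m.
CANDIDATES = {
    "same_amount_afternoon": (100.5, 14),
    "larger_amount_night": (130.0, 3),
}


def nearest(query, candidates, amount_unit):
    """Name of the candidate closest to query under Euclidean distance,
    after expressing the amount in units of amount_unit dollars."""
    def rescale(point):
        return (point[0] / amount_unit, point[1])

    q = rescale(query)
    return min(candidates, key=lambda name: math.dist(q, rescale(candidates[name])))


print(nearest(QUERY, CANDIDATES, amount_unit=1.0))     # same_amount_afternoon
print(nearest(QUERY, CANDIDATES, amount_unit=1000.0))  # larger_amount_night
```

Standardizing each feature by its spread is a common default, but it is still a choice imposed by the modeler, and it says nothing about how to measure distance between categorical values.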
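
The warning about high-dimensional neighborhoods can be checked numerically. This sketch assumes uniformly random points in a unit hypercube and Euclidean distance (real feature spaces are not uniform, so treat it only as an illustration). For one random query it reports the ratio of the nearest-neighbor distance to the mean distance; as the dimension grows, the ratio climbs toward 1, meaning the nearest neighbor is barely closer than a randomly chosen point.

```python
import math
import random


def nn_to_mean_ratio(dim, n_points=200, seed=0):
    """Nearest-neighbor distance divided by mean distance, for one random query
    against n_points uniform random points in [0, 1]^dim."""
    rng = random.Random(seed)
    points = [[rng.random() for _ in range(dim)] for _ in range(n_points)]
    query = [rng.random() for _ in range(dim)]
    distances = [math.dist(query, p) for p in points]
    return min(distances) / (sum(distances) / len(distances))


for dim in (2, 10, 100, 1000):
    print(f"dim={dim:>4}  nearest/mean distance = {nn_to_mean_ratio(dim):.2f}")
```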
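
The computational questions can be made concrete by comparing two simple scorers, both hypothetical illustrations rather than recommended designs. A k-nearest-neighbor scorer must keep the entire training set and compare each new observation against all of it, while a per-feature z-score scorer keeps only a mean and a standard deviation per feature. The random training data is purely illustrative.

```python
import heapq
import math
import random
import statistics


class KnnScorer:
    """Score = distance to the k-th nearest training vector.
    Retains all n training vectors: O(n*d) memory and O(n*d) work per score."""

    def __init__(self, training, k=3):
        self.training = [list(v) for v in training]
        self.k = k

    def score(self, x):
        distances = (math.dist(x, t) for t in self.training)
        return heapq.nsmallest(self.k, distances)[-1]

    def stored_floats(self):
        return sum(len(v) for v in self.training)


class ZScoreScorer:
    """Score = largest absolute per-feature z-score.
    Retains 2*d summary numbers: O(d) memory and O(d) work per score."""

    def __init__(self, training):
        columns = list(zip(*training))
        self.means = [statistics.fmean(c) for c in columns]
        self.stdevs = [statistics.pstdev(c) or 1.0 for c in columns]

    def score(self, x):
        return max(abs(v - m) / s for v, m, s in zip(x, self.means, self.stdevs))

    def stored_floats(self):
        return len(self.means) + len(self.stdevs)


rng = random.Random(42)
training = [[rng.random() for _ in range(5)] for _ in range(1000)]
knn, zs = KnnScorer(training), ZScoreScorer(training)
print("kNN keeps", knn.stored_floats(), "numbers; z-score keeps", zs.stored_floats())
```

The cheap scorer pays for its small footprint by ignoring interactions between features; a practical design may sit between these extremes, for example by summarizing the training data rather than storing all of it.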
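
One common way to give a raw “degree of outlierness” a stable meaning is to calibrate it against a reference sample of scores, reporting the percentage of reference scores at or below a new value. The sketch below does that and also measures the largest point mass in the reference scores, a quick check for the delta-function spikes mentioned above. The reference scores are invented for illustration, and nothing here reflects how FICO calibrates its own scores.

```python
import bisect
from collections import Counter


class PercentileCalibrator:
    """Map a raw outlier score to the percentage of reference scores at or below it."""

    def __init__(self, reference_scores):
        self.sorted_scores = sorted(reference_scores)

    def __call__(self, raw_score):
        rank = bisect.bisect_right(self.sorted_scores, raw_score)
        return 100.0 * rank / len(self.sorted_scores)


def largest_point_mass(scores):
    """Fraction of scores sharing the single most common value; a large value
    signals a spike (delta function) in the score distribution."""
    _, count = Counter(scores).most_common(1)[0]
    return count / len(scores)


# Hypothetical raw scores from a reference population (note the spike at 0.2).
reference = [0.1, 0.2, 0.2, 0.2, 0.2, 0.5, 0.9, 1.4, 3.0, 12.0]
calibrate = PercentileCalibrator(reference)
print(calibrate(0.2), calibrate(2.0), calibrate(12.0))  # 50.0 80.0 100.0
print(largest_point_mass(reference))                    # 0.4
```

Percentiles make scores comparable across models, but they inherit any spike in the reference data: here every raw score of 0.2 maps to the same 50th percentile.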
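
The cross-correlation problem shows up even with two features. In the sketch below (hypothetical, randomly generated data), the two features are near-copies of each other, so together they describe a single conceptual axis. Plain Euclidean distance treats a point that moves along that shared axis the same as one that breaks the correlation, although only the second is genuinely unusual. The Mahalanobis distance, which rescales by the sample covariance, is one classical, principled way to account for correlated features; it is shown here as a standard textbook technique, not as FICO's approach, and it assumes roughly linear correlation and a non-singular covariance matrix.

```python
import math
import random
import statistics


def mean_and_covariance(points):
    """Sample mean and (population) covariance of 2-D points."""
    xs, ys = zip(*points)
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    n = len(points)
    sxx = sum((x - mx) ** 2 for x in xs) / n
    syy = sum((y - my) ** 2 for y in ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in points) / n
    return (mx, my), (sxx, sxy, syy)


def mahalanobis_2d(p, mean, cov):
    """Mahalanobis distance using the closed-form inverse of a 2x2 covariance."""
    sxx, sxy, syy = cov
    det = sxx * syy - sxy * sxy  # must be > 0 (non-singular covariance)
    dx, dy = p[0] - mean[0], p[1] - mean[1]
    # Quadratic form [dx, dy] @ inv([[sxx, sxy], [sxy, syy]]) @ [dx, dy]^T
    return math.sqrt((syy * dx * dx - 2 * sxy * dx * dy + sxx * dy * dy) / det)


rng = random.Random(1)
data = []
for _ in range(500):
    t = rng.gauss(0.0, 1.0)
    data.append((t, t + rng.gauss(0.0, 0.1)))  # second feature is a near-copy

mean, cov = mean_and_covariance(data)
along, across = (2.0, 2.0), (2.0, -2.0)  # same Euclidean distance from the origin
for name, p in (("along", along), ("across", across)):
    print(f"{name:>6}: euclidean={math.dist(p, mean):.2f}  "
          f"mahalanobis={mahalanobis_2d(p, mean, cov):.2f}")
```

With many correlated features the same idea generalizes to a full covariance inverse (equivalently, decorrelating the features first), though estimating that inverse reliably in high dimensions is itself a challenge.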

Requirements for Commercial-Grade Outlier Detection

Beyond clear technical issues, there are some higher-level properties that FICO scientists believe a state-of-the-art, commercial solution must address.

Check back for Part 2 of this blog, where I’ll tell you more about FICO’s latest innovations in outlier detection, and how we are applying these unsupervised learning breakthroughs in our fraud and compliance solutions.

Follow me on Twitter @ScottZoldi.

The post Today’s Deep Dive: Innovative Unsupervised Learning in AI appeared first on FICO.
