This article provides an overview of how to evaluate a trained version of your model before you deploy it. Use the following sections as a guide to viewing the model status and the potential strength of its predictions.
To view the details and status of a model, click View Model Details on the Model Dashboard. The Model Dashboard also shows the status of all your models. Before you deploy, your model should have a status of Trained.
The following table describes possible model statuses:
| Status | Description |
|---|---|
| Requires Publish | Model must be saved and published. |
| Waiting to be Trained | Model waiting to be trained. Save and publish to start training. |
| In Training | Model currently in training but not yet trained. |
| Training Failed | Model must be revised and retrained. |
| Data Preparation Failed | Data modification required to make model viable. |
| Trained | Model trained and ready to deploy. |
| Deployed | Model currently deployed and can be retrained or Undeployed. |
Tealium Predict offers two types of strength ratings: static ratings for each trained version, and dynamic ratings for each deployed model. The static rating assigned to each trained version of a model provides an easy way to understand the quality (strength) of the training and of the model that resulted from it. For more information about strength scores, see Model Scores and Ratings.
Models not yet deployed are not assigned a strength score. Training is a one-time event and each retraining results in a new version number. For example, when you retrain Version 1, the resulting version is Version 2. Each retraining is a new and separate event, which makes this type of strength rating static and specific to each version. The rating for a version does not change over time.
The quality of any model is a relative judgment, not an absolute fact. Different teams have different needs and goals for their models, as well as different levels of sophistication in their modeling and testing abilities and varying quality in their input datasets. For these reasons, model strength ratings are not regarded as absolute. The intention is to use the ratings as a general guideline for quality.
A Confusion Matrix is a key tool for evaluating a trained model. During the training and testing process that runs automatically when you create or retrain a model, Tealium Predict attempts to "classify" the visitors from the Training Date Range into two groups: true and false. These two groups reflect whether a visitor actually performed the behavior signalled by the Target Attribute of your model, such as making a purchase or signing up for your email list.
Access the Confusion Matrix for your trained model by navigating to Model Dashboard > View Model Details > Training > View Training Details. The Confusion Matrix shows the accuracy of these predictions by comparing each predicted value (true or false) with the actual value (true or false). There are four possible scenarios, one for each quadrant of the matrix.
This comparison is possible because the model trains on historical data (the Training Date Range), for which the actual outcomes are already known. Once your model is deployed, the situation changes. If your deployed model makes a prediction for a particular visitor today and the prediction timeframe is "in the next 10 days", the actual outcome (true or false) is not known for up to 10 days.
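As a small illustration of this delay, the sketch below computes the earliest date on which the actual outcome of a prediction can be known. The function name and the sample date are hypothetical, and the sketch assumes the prediction window simply ends the given number of days after the prediction is made; how Tealium Predict counts day boundaries is not stated in this article.

```python
from datetime import date, timedelta


def outcome_known_by(prediction_date: date, window_days: int) -> date:
    """Return the date on which a prediction's actual outcome can be checked.

    Assumption: a prediction of "in the next N days" can only be verified
    once the full N-day window after the prediction date has elapsed.
    """
    return prediction_date + timedelta(days=window_days)


made_on = date(2024, 3, 1)  # hypothetical prediction date
print(outcome_known_by(made_on, 10))  # 2024-03-11
```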
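The following list describes the four quadrants of the Confusion Matrix:
- True Positive (TP): the model predicted true and the actual value is true.
- False Positive (FP): the model predicted true but the actual value is false.
- True Negative (TN): the model predicted false and the actual value is false.
- False Negative (FN): the model predicted false but the actual value is true.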
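You can use the values of the quadrants to calculate the two components of the F1 Score: Recall and Precision.
The following list describes how the values are calculated:
- Recall = TP / (TP + FN): the share of actual true visitors that the model correctly predicted as true.
- Precision = TP / (TP + FP): the share of visitors predicted as true who actually are true.
- F1 Score = 2 × (Precision × Recall) / (Precision + Recall): the harmonic mean of Precision and Recall.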
The Confusion Matrix uses a threshold value of 0.5 to differentiate between predicted positive and predicted negative values.
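The quadrant counts and the Precision, Recall, and F1 calculations can be sketched in Python as follows. This is an illustration, not Tealium Predict's implementation: the function names and sample data are hypothetical, a probability exactly equal to the 0.5 threshold is assumed to count as a positive prediction, and a metric whose denominator is zero is assumed to be 0.0.

```python
THRESHOLD = 0.5  # threshold the Confusion Matrix uses to separate predicted true/false


def confusion_counts(actuals, probabilities, threshold=THRESHOLD):
    """Count TP, FP, TN, and FN from actual outcomes and predicted probabilities.

    Assumption: a probability equal to the threshold is a positive prediction.
    """
    counts = {"TP": 0, "FP": 0, "TN": 0, "FN": 0}
    for actual, probability in zip(actuals, probabilities):
        predicted = probability >= threshold
        if predicted and actual:
            counts["TP"] += 1
        elif predicted:  # predicted true, actually false
            counts["FP"] += 1
        elif actual:  # predicted false, actually true
            counts["FN"] += 1
        else:
            counts["TN"] += 1
    return counts


def precision_recall_f1(counts):
    """Compute Precision, Recall, and F1; a zero denominator yields 0.0 (assumption)."""
    tp, fp, fn = counts["TP"], counts["FP"], counts["FN"]
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


actuals = [True, True, False, False, True, False]  # hypothetical test visitors
probabilities = [0.9, 0.4, 0.6, 0.1, 0.7, 0.3]
counts = confusion_counts(actuals, probabilities)
print(counts)  # {'TP': 2, 'FP': 1, 'TN': 2, 'FN': 1}
print(precision_recall_f1(counts))
```

In this sample, Precision and Recall are both 2/3, so the F1 Score is also 2/3.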
In Tealium Predict, ROC/AUC (the area under the ROC curve) is a performance measurement reported for a trained model in the Model Explorer. The ROC curve plots the true positive rate against the false positive rate. The true positive rate is calculated as the number of true positives divided by the sum of the number of true positives and the number of false negatives; it describes how well a model predicts the positive class when the actual outcome is positive. The true positive rate is also referred to as Sensitivity. Access the ROC/AUC curve for your trained model by navigating to Model Dashboard > View Model Details > Training > View Training Details.
The Receiver Operating Characteristics (ROC) curve and the area under this curve (referred to as AUC, for area under curve) are common tools in the machine learning community for evaluating the performance of a classification model.
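The ROC curve shows the trade-off between true positives and false positives across different classification thresholds. It plots the True Positive Rate (y-axis) against the False Positive Rate (x-axis), where:
- True Positive Rate = TP / (TP + FN)
- False Positive Rate = FP / (FP + TN)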
Ideally, the model's predictions fully separate the True and False classes, so the model always predicts the correct answer. Such a model has an AUC of 1.0.
The following example depicts a "perfect model" for Probability Distribution and the ROC curve:
For an extreme contrast, the following example depicts the Probability Distribution and ROC curve in a scenario where your model always predicts the wrong answer: it labels every True as False, and vice versa. Such a model has an AUC of 0.
The following example depicts a poor scenario: a model that is incapable of distinguishing between the True and False classes. In this scenario, the Probability Distribution displays two large curves directly on top of each other, and the ROC curve follows the diagonal, for an AUC of about 0.5.
A realistic model has an AUC between 0.5 and 1.0. On its ROC curve, smaller values on the x-axis indicate fewer false positives (and more true negatives), and larger values on the y-axis indicate more true positives (and fewer false negatives).
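The sketch below shows one common way to build an ROC curve and its AUC: sweep the threshold over every distinct predicted probability, record the false positive rate and true positive rate at each step, and integrate the resulting points with the trapezoidal rule. It is a generic illustration, not how Tealium Predict computes its reported AUC; the function names and sample data are hypothetical.

```python
def roc_points(actuals, probabilities):
    """Return (false positive rate, true positive rate) points, one per distinct threshold.

    Thresholds run from the highest predicted probability down to the lowest.
    Requires at least one actual True and one actual False visitor.
    """
    positives = sum(1 for a in actuals if a)
    negatives = len(actuals) - positives
    points = [(0.0, 0.0)]  # threshold above every score: nothing predicted positive
    for threshold in sorted(set(probabilities), reverse=True):
        tp = sum(1 for a, p in zip(actuals, probabilities) if a and p >= threshold)
        fp = sum(1 for a, p in zip(actuals, probabilities) if not a and p >= threshold)
        points.append((fp / negatives, tp / positives))
    return points


def auc(points):
    """Area under the ROC curve, using the trapezoidal rule between consecutive points."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2
    return area


actuals = [True, True, False, False, True, False]  # hypothetical test visitors
probabilities = [0.9, 0.4, 0.6, 0.1, 0.7, 0.3]
print(auc(roc_points(actuals, probabilities)))  # about 0.889 (8/9)
```

Applied to the perfect, always-wrong, and indistinguishable scenarios described above, this sketch returns 1.0, 0.0, and 0.5 respectively; for example, `auc(roc_points([True, False], [0.5, 0.5]))` gives 0.5 because tied scores produce a diagonal segment.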
When predicting a binary outcome, each prediction is either correct (a true positive or a true negative) or incorrect (a false positive or a false negative).
The Training Details screen for any version of any trained model shows a probability distribution of the predictions made by the model during training.
The two colored curves of this chart represent the distributions of predicted probabilities for visitors in the True and False classes. Because the model training process uses historical data, you know whether each visitor actually performed the target behavior, so the model can be tested by comparing its predictions for historical visitors against the actual outcomes. To make this comparison possible, a portion of the training dataset is set aside as the test subset.
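A common way to create a test subset is a random holdout, sketched below. The 20% test fraction, the fixed seed, and the function name are hypothetical; this article does not state how large a portion Tealium Predict sets aside or how it selects those visitors.

```python
import random


def split_train_test(visitors, test_fraction=0.2, seed=42):
    """Randomly hold out a fraction of visitors as a test subset.

    test_fraction and seed are illustrative assumptions, not Tealium Predict settings.
    Returns (train, test) as two disjoint lists covering every visitor once.
    """
    shuffled = list(visitors)
    random.Random(seed).shuffle(shuffled)  # seeded so the split is reproducible
    n_test = round(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]


train, test = split_train_test(range(100))
print(len(train), len(test))  # 80 20
```

The model would learn only from `train`, and its predictions for the visitors in `test` would then be compared with their known outcomes.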
The probability distribution compares predictions against actual values for the visitors in the test subset. Visitors who were part of the True class (did perform the behavior) are displayed as part of the teal-colored curve and visitors who were part of the False class are part of the orange-colored curve.
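The two curves can be approximated with a simple histogram: bin each test visitor's predicted probability, counting the True class and the False class separately. In the sketch below, `true_counts` corresponds to the teal curve and `false_counts` to the orange curve. The function name, bin count, and sample data are hypothetical, and Tealium Predict's own binning or smoothing is not described in this article.

```python
def probability_histogram(actuals, probabilities, bins=10):
    """Count predicted probabilities per bin, separately for the True and False classes.

    Bin i covers [i / bins, (i + 1) / bins); a probability of exactly 1.0
    is placed in the last bin. Returns (true_counts, false_counts).
    """
    true_counts = [0] * bins
    false_counts = [0] * bins
    for actual, probability in zip(actuals, probabilities):
        index = min(int(probability * bins), bins - 1)
        if actual:
            true_counts[index] += 1
        else:
            false_counts[index] += 1
    return true_counts, false_counts


actuals = [True, True, True, False, False, False]  # hypothetical test visitors
probabilities = [0.95, 0.85, 0.45, 0.15, 0.05, 0.55]
true_counts, false_counts = probability_histogram(actuals, probabilities, bins=5)
print(true_counts)   # [0, 0, 1, 0, 2]
print(false_counts)  # [2, 0, 1, 0, 0]
```

Here most True-class visitors sit in the highest bin and most False-class visitors in the lowest, with some overlap in the middle bin.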
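The following list describes characteristics of an ideal probability distribution:
- The teal curve (True class) is concentrated at high predicted probabilities, close to 1.0.
- The orange curve (False class) is concentrated at low predicted probabilities, close to 0.0.
- The two curves overlap as little as possible.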
The next step is to retrain or deploy your model. To retrain your model in an attempt to increase its strength score before deploying, see Retraining a Model. To skip retraining and deploy, see Deploying a Model.