This article provides an overview of how to evaluate trained versions of your model before you deploy it.


Evaluate Model Status and Strength

Use the following sections as a guide to viewing your model's status and the potential strength of its predictions before you deploy.

View Your Model Status

Click a model to view its details and status, or view the status of all models from the summary page. The goal prior to deploying is to have a model with a status of Trained.

The following table describes the possible model statuses (a short code sketch follows the table):

Model Status            | Description
Requires Publish        | Model must be saved and published.
Waiting to be Trained   | Model is waiting to be trained. Save and publish to start training.
In Training             | Model is currently in training, but not yet trained.
Training Failed         | Model must be revised and retrained.
Data Preparation Failed | Data modification is required to make the model viable.
Trained                 | Model is trained and ready to deploy.
Currently Deployed      | Model is currently deployed and can be retrained or undeployed.
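
To make the lifecycle concrete, the following sketch models these statuses as a Python enum and gates deployment on the Trained status. The class and function names are hypothetical illustrations, not part of the Tealium Predict API.

    from enum import Enum

    class ModelStatus(Enum):
        """Possible model statuses, as listed in the table above."""
        REQUIRES_PUBLISH = "Requires Publish"
        WAITING_TO_BE_TRAINED = "Waiting to be Trained"
        IN_TRAINING = "In Training"
        TRAINING_FAILED = "Training Failed"
        DATA_PREPARATION_FAILED = "Data Preparation Failed"
        TRAINED = "Trained"
        CURRENTLY_DEPLOYED = "Currently Deployed"

    def is_ready_to_deploy(status: ModelStatus) -> bool:
        """A model is ready to deploy only once its status is Trained."""
        return status is ModelStatus.TRAINED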

Evaluating the Strength Details of Trained Versions

Tealium Predict offers two types of strength ratings: a static rating for each trained version and a dynamic rating for each deployed model. The static rating assigned to each trained version of a model provides an easy-to-understand measure of the quality (strength) of the training and of the model that resulted from it.

Models that have not yet been trained are not assigned a strength score. Training is a one-time event, and each retraining results in a new version number. For example, when you retrain "Version 1", the resulting version is "Version 2". Because each retraining is a new and separate event, this type of strength rating is static and specific to each version: the rating for a version does not change over time.
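
One way to picture this versioning behavior is the sketch below, which treats each trained version as an immutable record whose rating is fixed at training time. The record type and field names are hypothetical, not part of the product.

    from dataclasses import dataclass

    @dataclass(frozen=True)  # frozen: a version's rating never changes once assigned
    class TrainedVersion:
        version: int     # e.g., 1 for "Version 1"
        f1_score: float  # fixed at training time
        strength: str    # "Poor", "Fair", "Good", or "Excellent"

    # Retraining does not mutate an existing version; it produces a new one.
    v1 = TrainedVersion(version=1, f1_score=0.58, strength="Fair")
    v2 = TrainedVersion(version=2, f1_score=0.72, strength="Good")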

The quality of any model is a relative judgment, not an absolute fact. Different teams have different needs and goals for their models, different levels of sophistication in their modeling and testing abilities, and input datasets of varying quality. For these reasons, model strength ratings should not be regarded as absolute; use them as a general guideline for quality.

Model Strength Scoring

Model strength ratings for Predict provide an easy-to-understand rating of the quality (strength) of each version of each model. The rating system comprises four categorical labels: Poor, Fair, Good, and Excellent. The labels are based on the F1 Score, the typical metric used to evaluate the quality of a propensity model.

The strength rating displays next to each version on the Model Explorer page, in the Training Details panel, and in the Tiles view on the Overview page (for the latest trained version).

The goal in scoring is to achieve the highest strength score you can before deploying, to ensure the most accurate predictions. F1 Score values are categorized using the following scale, and each trained model is assigned one of the following strengths (a short code sketch follows the list):

  • Excellent - F1 score is greater than 0.80
  • Good - F1 score is greater than 0.60 and less than or equal to 0.80
  • Fair - F1 score is greater than 0.50 and less than or equal to 0.60
  • Poor - F1 score is less than or equal to 0.50
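
Here is a minimal sketch of that scale in Python, assuming the boundary conventions above; the function name is hypothetical:

    def strength_label(f1: float) -> str:
        """Map an F1 score to its categorical strength label."""
        if f1 > 0.80:
            return "Excellent"
        if f1 > 0.60:
            return "Good"
        if f1 > 0.50:
            return "Fair"
        return "Poor"

    print(strength_label(0.72))  # "Good"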

The strength details include a total score as well as the F1 Score, Recall, Precision, and Accuracy. For a detailed explanation of these scoring elements, see Model Scores and Ratings.
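
For reference, the F1 Score is the harmonic mean of Precision and Recall. The short sketch below computes it from those two values; the function name and metric values are illustrative, not part of the product.

    def f1_score(precision: float, recall: float) -> float:
        """F1 is the harmonic mean of precision and recall."""
        if precision + recall == 0:
            return 0.0
        return 2 * precision * recall / (precision + recall)

    # Example: precision 0.75 and recall 0.70 yield F1 of about 0.724 ("Good").
    print(round(f1_score(0.75, 0.70), 3))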

Next Steps

The next step is to retrain or deploy your model. To retrain your model and attempt to increase its strength score before deploying, see Retraining a Model. To skip retraining and deploy, see Deploying a Model.