This article defines general statistical modeling terminology, terms specific to Tealium products, and terms used in the Tealium Predict ML interface.
An audience is a group of visitor profiles that share a set of attribute conditions and is used to trigger vendor actions (connectors) in real time. In Tealium Predict, the output attributes from your models are used to create one or more audiences at which to target your marketing efforts.
In Tealium Predict, the confusion matrix, also known as an error matrix, is a performance measurement reported for a trained model in the Model Explorer that compares actual and predicted values. In industry terms, a confusion matrix takes a set of test data for which the true values are known and displays the actual and predicted values in a table, so that you can visualize the performance of a given algorithm.
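As a minimal sketch of the industry definition, the following Python counts the four cells of a confusion matrix for a binary prediction. The function name `confusion_matrix` and the sample data are hypothetical and only illustrate the idea; they are not taken from Tealium Predict.

```python
from collections import Counter


def confusion_matrix(actual, predicted):
    """Count the four possible outcomes of a binary prediction."""
    counts = Counter(zip(actual, predicted))
    return {
        "true_positive": counts[(True, True)],    # predicted the action, and it happened
        "false_negative": counts[(True, False)],  # missed an action that happened
        "false_positive": counts[(False, True)],  # predicted an action that did not happen
        "true_negative": counts[(False, False)],  # correctly predicted no action
    }


# Hypothetical test data: whether each visitor actually performed the
# action, and what the model predicted for that visitor.
actual = [True, True, True, False, False, False, False, True]
predicted = [True, False, True, False, True, False, False, True]

matrix = confusion_matrix(actual, predicted)
for cell, count in matrix.items():
    print(f"{cell}: {count}")
```

The two "true" cells are correct predictions and the two "false" cells are errors, so a stronger model concentrates its counts in the true positive and true negative cells.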
A Data Scientist is an analytical expert who uses skills in technology and social science to look for trends and manage data, applying industry knowledge, contextual understanding, and skepticism of existing assumptions to reveal solutions to business challenges.
A deployed model refers to a model that has been "trained" and then deployed to populate prediction values into your customer profiles in Tealium AudienceStream.
Machine learning refers to a subfield of artificial intelligence that focuses on enabling computers to learn without human guidance by recognizing patterns. Machine learning uses a set of predetermined rules to remember the patterns, analyze output, and create a model that explains the patterns and guides future behavior. In cases where you know what data you want, machine learning accelerates the path to acquiring it. In cases where you do not know exactly what you want or which pattern to identify, machine learning can find a pattern and reveal results that you can use to move forward with acquiring the data you need.
In Tealium Predict, a model represents the behavior you are predicting within a specific timeframe, such as a purchase, conversion, or any customer behavior tracked in AudienceStream. Models are created using an algorithm, and the results are used to explain patterns and predict future outcomes.
The Model Explorer refers to an interactive section of the product interface where you can view performance measurements for each model in each stage and fine-tune the model with actionable items from the interface, such as retrain or deploy.
The model strength refers to a combination of metrics and acceptable thresholds used to assign a score to a model. These scores are used to determine model quality and performance and the ability of the model to perform in the real world. The model score displays in the Model Explorer as "Excellent", "Good", "Fair", or "Poor".
In Tealium Predict, the output attribute is the attribute created as a result of model training.
The timeframe, in days, weeks, or months, for which you want to predict whether the action for your target attribute occurs. For example, a user's "likelihood to return" in the next "x" days, weeks, or months.
The probability distribution refers to a performance graph reported for a trained model. This graph shows how well the model separates the cases in which a visitor did return and perform the action of interest from the cases in which the visitor did not. In industry terms, a probability distribution is a mathematical function that gives the probabilities of the different possible outcomes of an experiment, and thus the probability that a given event occurs.
The ROC/AUC (receiver operating characteristic / area under the curve) refers to a performance measurement reported for a trained model. In industry terms, the ROC curve plots the true positive rate against the false positive rate across classification thresholds, and the AUC is the area under that curve. The true positive rate is calculated as the number of true positives divided by the sum of the number of true positives and the number of false negatives. It describes how well a model predicts the positive class when the actual outcome is positive, and it is also referred to as sensitivity.
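To make the relationship between the true positive rate, the ROC curve, and the AUC concrete, here is a minimal Python sketch of the standard textbook calculation. It sweeps a decision threshold over the predicted scores, records a (false positive rate, true positive rate) point at each threshold, and integrates the resulting curve with the trapezoidal rule. The function names `roc_points` and `auc` and the sample data are hypothetical; this is not how Tealium Predict is stated to compute the value.

```python
def roc_points(labels, scores):
    """Return (false positive rate, true positive rate) pairs, one per threshold."""
    positives = sum(labels)
    negatives = len(labels) - positives
    points = [(0.0, 0.0)]  # threshold above every score: nothing predicted positive
    for threshold in sorted(set(scores), reverse=True):
        tp = sum(1 for y, s in zip(labels, scores) if y and s >= threshold)
        fp = sum(1 for y, s in zip(labels, scores) if not y and s >= threshold)
        # TPR = TP / (TP + FN); FPR = FP / (FP + TN)
        points.append((fp / negatives, tp / positives))
    return points


def auc(points):
    """Area under the ROC curve, using the trapezoidal rule."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2
    return area


# Hypothetical data: 1 means the visitor performed the action, and each
# score is the model's predicted probability for that visitor.
labels = [1, 1, 0, 1, 0, 0]
scores = [0.9, 0.8, 0.7, 0.6, 0.3, 0.1]

curve = roc_points(labels, scores)
print(f"AUC = {auc(curve):.3f}")
```

An AUC of 1.0 means every positive case scores higher than every negative case, while 0.5 is what random guessing would produce.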
The prediction accuracy of a model degrades over time. When you retrain a Tealium Predict model with new data, its prediction accuracy improves and the model stays accurate over a longer period of time.
The model strength score provides ratings that grade the quality (strength) of each version of each model. The rating system consists of four categorical labels: Poor, Fair, Good, and Excellent. The strength score is the total score for a model, combining the F1 Score, Recall, Precision, and Accuracy. For a detailed explanation of these scoring elements, see Model Scores and Ratings.
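The four scoring elements named above have standard definitions in terms of confusion matrix counts, sketched below in Python. The function name `classification_metrics` and the sample counts are hypothetical. How Tealium Predict weights these metrics and which thresholds map to each rating label is not covered here, so the sketch stops at the individual metrics.

```python
def classification_metrics(tp, fp, fn, tn):
    """Compute standard binary classification metrics from confusion matrix counts."""
    total = tp + fp + fn + tn
    # Precision: of all predicted positives, how many were correct?
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    # Recall: of all actual positives, how many did the model find?
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    # F1: harmonic mean of precision and recall.
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    # Accuracy: share of all predictions that were correct.
    accuracy = (tp + tn) / total if total else 0.0
    return {"precision": precision, "recall": recall, "f1": f1, "accuracy": accuracy}


# Hypothetical counts for a model evaluated on 100 visitors.
metrics = classification_metrics(tp=30, fp=10, fn=20, tn=40)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

Looking at the metrics together matters because accuracy alone can look strong when the predicted action is rare, even if recall is poor.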
The target attribute is the AudienceStream attribute selected to define your model and represent the user action being predicted. These attributes are visit-level or visitor-level booleans selected to signal that an action has been performed. For example, a boolean visit attribute named "Has Purchased" signals that a purchase event has occurred during a visit.
In Tealium Predict, training refers to the stage in which a model consumes and analyzes data from a predetermined period of time so that it can make predictions. The size and quality of the data used during this stage are important factors in the accuracy of the results when you deploy the model.
In Tealium Predict, the trained version of a model refers to a single instance of training a model. Every machine learning model has a version, and each version is trained with the data it uses to make accurate predictions.