Understanding how your models are performing is crucial to having faith in an AI system. Continual has many built-in tools to expose model insights and gain trust in the system.
Whenever a model version is created, Continual runs a series of experiments across different model algorithms and optimizes performance on a specified performance metric. Users can review the results of these experiments for any model version.
This view shows the performance of each experiment in the set across the major performance metrics. Experiments are viewable with every model version in the Web UI, where you can additionally see performance across the training, validation, and test data sets.
Continual selects the experiment that performs best on the performance metric over the validation data set as the winning experiment.
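The selection of a winning experiment can be sketched as picking the best validation score across candidates. This is a simplified illustration, not Continual's internal API; the algorithm names and `val_auc` metric are hypothetical:

```python
# Hypothetical experiment results; Continual tracks these per model version.
experiments = [
    {"algorithm": "xgboost", "val_auc": 0.91},
    {"algorithm": "logistic_regression", "val_auc": 0.87},
    {"algorithm": "random_forest", "val_auc": 0.89},
]

# Winner = best value of the performance metric on the validation set.
winner = max(experiments, key=lambda e: e["val_auc"])
print(winner["algorithm"])  # → xgboost
```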
For classification problems, the "Model Insights" tab of the model version will display a confusion matrix of the model results across training, validation, and test datasets.
Confusion matrices are great for identifying problems such as a model failing to predict a certain class frequently enough, often due to class imbalance. This can often be remedied by optimizing on a different metric (such as F1 or ROC AUC) or by rebalancing your training data.
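To make the structure of a confusion matrix concrete, here is a minimal sketch using scikit-learn with made-up labels (Continual computes this for you automatically):

```python
from sklearn.metrics import confusion_matrix

# Hypothetical churn-prediction labels.
y_true = ["churn", "stay", "stay", "churn", "stay", "stay"]
y_pred = ["churn", "stay", "churn", "stay", "stay", "stay"]

# Rows are actual classes, columns are predicted classes.
cm = confusion_matrix(y_true, y_pred, labels=["churn", "stay"])
print(cm)
# [[1 1]
#  [1 3]]
```

The off-diagonal cells are the misclassifications; a row dominated by off-diagonal counts is the signature of a class the model rarely predicts.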
Actual vs. predicted¶
Regression problems will contain Actual vs. Predicted and Residuals plots to help users better assess the performance of the model. For a perfect model, the Actual vs. Predicted plot would lie on the identity line (y = x).
With this plot it's easy to start determining whether there is a discernible pattern in the predictions. In the case above, for example, we notice that for large actual values our model is under-predicting. The Residuals chart shows the same information, but plots the error between predicted and actual values instead of the predicted values themselves (i.e., the line y = 0 would be a perfect model).
Regression problems will additionally have a plot of Cook's Distance. Cook's Distance is an outlier detection method that can help determine whether a problem is well-suited to a linear model. The plot shows each data point's influence on a linear model. If there are a large number of highly influential points, it is likely that a linear model is not optimal, and you may wish to whitelist tree-based models.
Feature importance is one of the most crucial tools for understanding the impact of features on a model's predictions. Continual performs permutation-based feature importance on the winning experiment, and the results are viewable with every model version. The values represent the average change in the performance metric as the feature's values are permuted; larger values indicate more important features. Note that negative values indicate features that are detrimental to the model and should be removed.
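Permutation importance can be sketched with scikit-learn on synthetic data (the model and dataset here are illustrative, not Continual's internals):

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.inspection import permutation_importance

# Synthetic data: 3 features, only 2 of which carry signal.
X, y = make_regression(n_samples=200, n_features=3, n_informative=2,
                       random_state=0)
model = RandomForestRegressor(random_state=0).fit(X, y)

# Mean drop in score when each feature's values are shuffled;
# larger drop = more important feature.
result = permutation_importance(model, X, y, n_repeats=5, random_state=0)
print(result.importances_mean)
```

The uninformative feature lands near zero: shuffling it barely moves the score, which is exactly the behavior the importance chart surfaces.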
Shapley values (global)¶
In addition to permutation-based feature importance, users can enable Shapley values for the model. These are disabled by default, but can be enabled via the YAML's training configuration. Shapley values help explain how a feature shifts a prediction relative to its expected value. They can be calculated for every individual prediction (local), or aggregated to provide general insight into the model (global). If enabled, the Web UI will display a chart of the Shapley values in your "Model Insights" tab.
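The idea behind Shapley values can be made concrete with a small exact computation over a toy linear model (everything here is a hand-rolled illustration of the definition, not Continual's implementation): each feature's value is its average marginal contribution across all coalitions of the other features, with "absent" features filled in from a background dataset.

```python
from itertools import combinations
from math import factorial
import numpy as np

# Toy model f(x) = 2*x0 + 3*x1 and a small background dataset.
background = np.array([[0.0, 0.0], [1.0, 1.0], [2.0, 0.0], [1.0, 2.0]])
f = lambda X: 2 * X[:, 0] + 3 * X[:, 1]

def shapley_values(x, background, f):
    """Exact Shapley values for one instance x."""
    n = x.shape[0]
    phi = np.zeros(n)
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(n):
            for S in combinations(others, size):
                # Classic Shapley weight for a coalition of this size.
                w = factorial(size) * factorial(n - size - 1) / factorial(n)

                def value(coal):
                    # Expected prediction with coalition features fixed to x,
                    # the rest drawn from the background dataset.
                    Xb = background.copy()
                    Xb[:, list(coal)] = x[list(coal)]
                    return f(Xb).mean()

                phi[i] += w * (value(S + (i,)) - value(S))
    return phi

x = np.array([2.0, 1.0])
phi = shapley_values(x, background, f)
print(phi)  # → [2.   0.75]
```

By construction the values sum to the gap between this prediction and the expected prediction over the background data, which is exactly the "relative to its expected value" framing above; global charts aggregate these per-prediction values across the dataset.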
Partial dependency plots¶
Continual also produces partial dependence plots to help understand the relationship between features and predictions. A partial dependence plot charts the values of a feature against the model's predicted values. For example, in the chart below we find that a high monthly insurance premium leads to a larger predicted customer lifetime value. This is hopefully not surprising!