Glossary

Accuracy

Accuracy is a metric that describes how the model performs across all classes. It is most useful when all classes are of equal importance. It is calculated as the ratio of the number of correct predictions to the total number of predictions.

$$ \text{Accuracy} = \frac{\text{Number of correct predictions}}{\text{Total number of predictions}} $$

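For binary classification, accuracy can also be calculated in terms of positives and negatives as follows:

$$ \text{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN} $$

Here TP, TN, FP, and FN are the numbers of true positives, true negatives, false positives, and false negatives.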
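As a minimal sketch of the ratio described above, accuracy can be computed directly from two lists of labels. The function name `accuracy` and the sample labels are illustrative assumptions, not part of the Newron API.

```python
def accuracy(y_true, y_pred):
    """Return the fraction of predictions that match the true labels."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if not y_true:
        raise ValueError("at least one prediction is required")
    # Count positions where the predicted label equals the true label.
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return correct / len(y_true)


# Hypothetical labels, for illustration only.
y_true = [1, 0, 1, 1, 0, 1, 0, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]
print(accuracy(y_true, y_pred))  # 6 of the 8 predictions are correct
```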

Artifact

Output files in any format. For example, you can record images (for example, PNGs), models (for example, a pickled scikit-learn model), and data files (for example, a Parquet file) as artifacts.

Autolog

Automatic logging allows you to log metrics, parameters, and models without the need for explicit log statements. There are two ways to use autologging:

  1. Call newron.autolog() before your training code. This will enable autologging for each supported library you have installed as soon as you import it.
  2. Use library-specific autolog calls for each library you use in your code. See below for examples.

The following libraries support autologging:

  • Scikit-learn
  • TensorFlow and Keras
  • Gluon
  • XGBoost
  • LightGBM
  • Statsmodels
  • Spark
  • Fastai
  • PyTorch

For flavors that automatically save models as an artifact, additional files for dependency management are logged.

Concept Drift

Concept drift in machine learning is a situation where the statistical properties of the target variable (what the model is trying to predict) change over time. In other words, the meaning of the input data that the model was trained on has changed significantly, but the model in production does not know about the change and can therefore no longer make accurate predictions.

Confusion Matrix

A confusion matrix is a summary of prediction results on a classification problem. The numbers of correct and incorrect predictions are counted and broken down by class. This per-class breakdown is the key to the confusion matrix.

The confusion matrix shows the ways in which your classification model is confused when it makes predictions. It gives you insight not only into how many errors your classifier makes but, more importantly, into the types of errors it makes.
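The per-class breakdown can be sketched with the standard library alone. In this illustrative example (the helper name `confusion_counts` and the animal labels are assumptions), rows are actual classes and columns are predicted classes.

```python
from collections import Counter


def confusion_counts(y_true, y_pred):
    """Count how often each (actual, predicted) label pair occurs."""
    return Counter(zip(y_true, y_pred))


# Hypothetical labels, for illustration only.
y_true = ["cat", "dog", "cat", "bird", "dog", "cat"]
y_pred = ["cat", "cat", "cat", "bird", "dog", "dog"]

counts = confusion_counts(y_true, y_pred)
labels = sorted(set(y_true) | set(y_pred))

# Rows are actual classes, columns are predicted classes.
matrix = [[counts[(actual, predicted)] for predicted in labels] for actual in labels]

for label, row in zip(labels, matrix):
    print(f"{label:>5}", row)
```

Entries on the diagonal are correct predictions. Off-diagonal entries show which class was mistaken for which, for example one actual "dog" predicted as "cat".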

Data Drift

Data drift is one of the top reasons model accuracy degrades over time. For machine learning models, data drift is the change in model input data that leads to model performance degradation. Monitoring data drift helps detect these model performance issues.

Causes of data drift include:

  • Upstream process changes, such as a sensor being replaced that changes the units of measurement from inches to centimeters.
  • Data quality issues, such as a broken sensor always reading 0.
  • Natural drift in the data, such as mean temperature changing with the seasons.
  • A change in the relationship between features, or covariate shift.
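One simple way to monitor a numeric feature for drift, among many, is to measure how far its current mean has moved from the training-time mean, in units of the training-time standard deviation. The sketch below is an illustration only: the function name `mean_shift_score`, the threshold of 3.0, and the sensor readings are assumptions, and it does not reflect how any particular monitoring tool detects drift.

```python
from statistics import mean, stdev


def mean_shift_score(reference, current):
    """Shift of the current mean from the reference mean, in reference standard deviations."""
    ref_std = stdev(reference)
    if ref_std == 0:
        raise ValueError("reference data has zero variance")
    return abs(mean(current) - mean(reference)) / ref_std


# Hypothetical sensor readings: training data in inches, new data reported in centimeters.
reference_inches = [10.0, 10.5, 9.8, 10.2, 10.1, 9.9]
current_cm = [x * 2.54 for x in reference_inches]

DRIFT_THRESHOLD = 3.0  # assumed cutoff; would need tuning per feature

score = mean_shift_score(reference_inches, current_cm)
if score > DRIFT_THRESHOLD:
    print("possible data drift detected")
```

A mean-based check like this catches unit changes and stuck sensors but can miss changes in the shape of a distribution, so it is best treated as a first alarm rather than a complete test.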

Data Versioning

Experiment

Experiment Tracking

The Newron Experiment Tracking component is an API and UI for logging parameters, code versions, metrics, and output files when running your machine learning code and for later visualizing the results.

Hyperparameters

Hyperparameters are parameters whose values control the learning process and determine the values of the model parameters that a learning algorithm ends up learning. The prefix ‘hyper’ suggests that they are ‘top-level’ parameters that control the learning process and the model parameters that result from it.

Metric

A metric is a measure of the performance of a model during training and testing.

Model

A Newron Model is created from an experiment or run that is logged with one of the model flavor’s newron.<model_flavor>.log_model() methods. Once logged, this model can then be registered with the Model Registry.

Model Registry

The Newron Model Registry component is a centralized model store, set of APIs, and UI, to collaboratively manage the full lifecycle of a model. It provides model lineage (which Newron experiment and run produced the model), model versioning, stage transitions (for example from staging to production), and annotations.

Model Version

Each registered model can have one or many versions. When a new model is added to the Model Registry, it is added as version 1. Each new model registered to the same model name increments the version number.
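The numbering rule can be illustrated with a toy in-memory stand-in. `ToyRegistry` below is a hypothetical class written only for this example. It is not the Newron Model Registry API and stores nothing persistently.

```python
class ToyRegistry:
    """Hypothetical in-memory stand-in that mimics per-name version numbering."""

    def __init__(self):
        self._versions = {}  # model name -> list of registered model objects

    def register(self, name, model):
        """Store the model under the given name and return its version number."""
        versions = self._versions.setdefault(name, [])
        versions.append(model)
        # The first model under a name is version 1, the next is 2, and so on.
        return len(versions)


registry = ToyRegistry()
first = registry.register("churn-classifier", object())
second = registry.register("churn-classifier", object())
other = registry.register("price-regressor", object())
print(first, second, other)
```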

Parameter

A model parameter is a configuration variable that is internal to the model and whose value can be estimated from the given data.

  • They are required by the model when making predictions.
  • Their values define the skill of the model on your problem.
  • They are estimated or learned from data.
  • They are often not set manually by the practitioner.
  • They are often saved as part of the learned model.
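The difference between a model parameter and a hyperparameter can be seen in a small gradient-descent sketch. Everything here is illustrative: the function `fit_slope`, the data, and the default values are assumptions. The slope `w` is a model parameter estimated from the data, while `learning_rate` and `epochs` are hyperparameters chosen by the practitioner before training.

```python
def fit_slope(xs, ys, learning_rate=0.01, epochs=200):
    """Fit y = w * x by gradient descent on the mean squared error."""
    w = 0.0  # model parameter: learned from the data
    n = len(xs)
    for _ in range(epochs):  # epochs: hyperparameter, set before training
        # Gradient of the mean squared error with respect to w.
        grad = sum(2 * (w * x - y) * x for x, y in zip(xs, ys)) / n
        w -= learning_rate * grad  # learning_rate: hyperparameter
    return w


# Hypothetical data generated from y = 2x.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.0, 4.0, 6.0, 8.0]

w = fit_slope(xs, ys)
print(round(w, 4))
```

The learned value of `w` is what would be saved with the model and used at prediction time. The hyperparameters only shape how that value was found.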

Precision

Precision attempts to answer the following question: What proportion of positive identifications was actually correct? Precision is defined as follows:

$$ \text{Precision} = \frac{TP}{TP + FP} $$
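A minimal sketch of this calculation for binary labels follows. The function name `precision`, the `positive` argument, and the sample labels are illustrative assumptions.

```python
def precision(y_true, y_pred, positive=1):
    """Fraction of positive predictions that are actually positive."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if p == positive and t == positive)  # true positives
    fp = sum(1 for t, p in pairs if p == positive and t != positive)  # false positives
    if tp + fp == 0:
        raise ValueError("no positive predictions; precision is undefined")
    return tp / (tp + fp)


# Hypothetical labels: 4 positive predictions, 3 of them correct.
y_true = [1, 0, 1, 1, 0, 1, 0, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]
print(precision(y_true, y_pred))
```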

Project

Recall

Recall attempts to answer the following question: What proportion of actual positives was identified correctly? Mathematically, recall is defined as follows:

$$ \text{Recall} = \frac{TP}{TP + FN} $$
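A minimal sketch of this calculation for binary labels follows. The function name `recall`, the `positive` argument, and the sample labels are illustrative assumptions.

```python
def recall(y_true, y_pred, positive=1):
    """Fraction of actual positives that were predicted as positive."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)  # true positives
    fn = sum(1 for t, p in pairs if t == positive and p != positive)  # false negatives
    if tp + fn == 0:
        raise ValueError("no actual positives; recall is undefined")
    return tp / (tp + fn)


# Hypothetical labels: 4 actual positives, 2 of them identified.
y_true = [1, 1, 1, 1, 0, 0]
y_pred = [1, 1, 0, 0, 0, 1]
print(recall(y_true, y_pred))
```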

Run