Seldonian | Tutorial F: Creating a new Seldonian supervised learning model

Implementing a binary logistic model with the toolkit

Logistic regression is used for classification, a type of supervised learning task. The word "binary" in binary logistic regression refers to the fact that there are two possible output classes, typically labeled 0 and 1. Now let's implement this model using the toolkit.

First, make sure you have the latest version of the engine installed.

$ pip install --upgrade seldonian-engine

Models are implemented as Python classes in the toolkit. There are three basic requirements for creating a new Seldonian model class:

1. The class must inherit from the appropriate model base class: seldonian.models.RegressionModel for regression-based models or seldonian.models.ClassificationModel for classification-based models. The class must also call the parent class's __init__() method in its own __init__() method.
2. The class must have a predict() method that takes as input a weight vector, theta, and a feature matrix, X, and outputs, for each sample (row) in X, the predicted continuous-valued label (for regression) or the predicted class probabilities (for classification). In the special case of binary classification, the model should output only the probability of the positive class for each sample. For a neural network, this method is what is often called the "forward pass".
3. The predict() method must be differentiable by autograd. Effectively, this means that predict() must be implemented in pure Python or with autograd's wrapped version of NumPy. There is a way to bypass this requirement to support other Python libraries, which we briefly describe below.
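To make the three requirements concrete, here is a minimal sketch of a regression model that follows them. It rests on stated assumptions: RegressionModel below is a hypothetical stand-in for seldonian.models.RegressionModel so the example runs without the engine installed, MyLinearRegressionModel is an illustrative name, and plain NumPy replaces autograd's wrapped NumPy for portability.

```python
import numpy as np


class RegressionModel:
    """Hypothetical stand-in for seldonian.models.RegressionModel,
    defined here only so this sketch runs without the engine."""

    def __init__(self):
        pass


class MyLinearRegressionModel(RegressionModel):
    # Requirement 1: inherit from the base class and call its __init__()
    def __init__(self):
        super().__init__()
        self.has_intercept = True  # theta[0] is the intercept

    # Requirement 2: return one continuous prediction per row of X.
    # Requirement 3: a real model would use autograd.numpy here so that
    # autograd can differentiate predict(); plain NumPy is an assumption.
    def predict(self, theta, X):
        return theta[0] + X @ theta[1:]


model = MyLinearRegressionModel()
theta = np.array([1.0, 2.0, -1.0])
X = np.array([[1.0, 0.0],
              [0.0, 1.0],
              [2.0, 3.0]])
print(model.predict(theta, X).tolist())  # [3.0, 0.0, 2.0]
```

The classification case differs only in which base class is used and in what predict() returns (probabilities rather than continuous labels).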

The third requirement may seem overly restrictive. Autograd is the automatic differentiation engine used by the toolkit, and it is what allows us to support custom-defined behavioral constraints. However, it has limited out-of-the-box support for external Python libraries. As stated above, the requirement can be bypassed, but this must be done for each external library separately. We have added support for PyTorch models (see Tutorial G: Creating your first Seldonian PyTorch model), and we are in the process of adding support for scikit-learn and TensorFlow models. If you would like to request support for other external model libraries, please do so on the Engine GitHub Issues page.

Our implementation uses autograd's wrapped version of NumPy, so requirement 3 is met without any additional work. Given these three requirements, the bulk of the work in creating a new Seldonian model class is typically in defining the predict() method. For logistic regression, there is a straightforward equation for predicting the probability of the positive class: $$\hat{Y}(\theta,X) = \sigma\left(\theta^{T}X + b\right),$$ where $\hat{Y}$ are the predicted probabilities of the positive class, $\sigma(x) = \frac{1}{1+e^{-x}}$ is the sigmoid function, $\theta$ are the model weights, $X$ are the features, and $b$ is the intercept term (also called the bias term). Because $b$ is added before the sigmoid is applied, every prediction is a valid probability in $(0, 1)$. We now have everything we need to code up our new model class, which we will name MyBinaryLogisticRegressionModel.
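As a small worked example of the formula for a single sample $x$ (the numbers are arbitrary and chosen only for illustration), the intercept is added to $\theta^{T}x$ first, and only then is the sum passed through the sigmoid:

```python
import math


def sigmoid(x):
    """Logistic function sigma(x) = 1 / (1 + e^{-x})."""
    return 1.0 / (1.0 + math.exp(-x))


# One sample with two features and illustrative parameter values
x = [2.0, -1.0]
theta = [0.5, 1.0]  # feature weights, excluding the intercept
b = -1.0            # intercept (bias) term

# theta^T x = 0.5*2 + 1*(-1) = 0, so z = 0 + b = -1
z = sum(t * xi for t, xi in zip(theta, x)) + b
p = sigmoid(z)      # probability of the positive class

print(z)            # -1.0
print(round(p, 4))  # 0.2689
```

A score of $z = 0$ maps to a probability of exactly 0.5, positive scores map above 0.5, and negative scores map below it.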

import autograd.numpy as np  # autograd's wrapped NumPy keeps predict() differentiable (requirement 3)
from seldonian.models.models import ClassificationModel

class MyBinaryLogisticRegressionModel(ClassificationModel):
    def __init__(self):
        """ Implements binary logistic model """
        super().__init__()
        # Tells the toolkit that theta[0] is an intercept (bias) term
        self.has_intercept = True

    def predict(self,theta,X):
        """ Predict the probability of 
        having positive class label for each data point
        in X. 
            i = number of datapoints
            j = number of features (excluding the bias term)

        :param theta: The parameter weights, with the bias term first
        :type theta: array of length j+1
        :param X: The features 
        :type X: array of shape (i,j)
        :return: predicted probability of the positive class for each observation
        :rtype: array of length i
        """
        # Linear score for each row: bias + weighted sum of the features
        Z = theta[0] + (X @ theta[1:]) 
        # The sigmoid squashes each score into a probability in (0, 1)
        Y_pred = 1/(1+np.exp(-Z))
        return Y_pred

First, notice that we meet requirement 1 by inheriting from ClassificationModel and calling the parent class's __init__() method from within our own __init__() method. We set self.has_intercept = True, which tells the toolkit that the model has a bias term. This flag is used only when finding an initial solution for the optimization process if the user does not provide one. You could optionally define a method of this class that returns an initial solution to use. Requirement 2 is met by the predict() method. Notice that the bias term theta[0] is the first element of the parameter weight array. X @ theta[1:] is one way to express the matrix multiplication $\theta^{T}X$ in Python: it computes $\theta^{T}x$ for every row $x$ of X in a single operation.
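Because the class above needs the engine installed to import ClassificationModel, a quick sanity check can be run on a standalone copy of its predict() body. This is a minimal sketch, assuming plain NumPy in place of autograd.numpy so that it runs anywhere. It also confirms that X @ theta[1:] gives the same per-row scores as an explicit loop, and that prepending a column of ones to X is an equivalent way to handle the intercept.

```python
import numpy as np


def predict(theta, X):
    """Standalone copy of MyBinaryLogisticRegressionModel.predict.
    theta[0] is the intercept b; theta[1:] holds one weight per feature."""
    Z = theta[0] + (X @ theta[1:])
    return 1 / (1 + np.exp(-Z))


theta = np.array([0.5, 2.0, -1.0])
X = np.array([[1.0, 3.0],
              [4.0, 0.5]])

# Vectorized score used in predict(): one matrix-vector product for all rows
Z_vectorized = theta[0] + X @ theta[1:]

# The same scores computed one sample at a time: b + theta^T x for each row x
Z_loop = np.array([theta[0] + np.dot(theta[1:], x) for x in X])

# Equivalent "design matrix" form: prepend a column of ones so the intercept
# is multiplied by 1 and the full theta vector enters a single product
X_with_ones = np.hstack([np.ones((X.shape[0], 1)), X])
Z_design = X_with_ones @ theta

probs = predict(theta, X)
print(Z_vectorized.tolist())  # [-0.5, 8.0]
print(probs.shape)            # (2,)
```

Storing the intercept as theta[0] and keeping X free of a ones column means the feature matrix can be passed in unchanged; the design-matrix form is shown only to make the equivalence visible.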

At this point, this model is ready to use in the toolkit. To use this model when running the Seldonian Engine, one would do:

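from my_models import MyBinaryLogisticRegressionModel  # my_models is a hypothetical module holding the class above
from seldonian.spec import SupervisedSpec
from seldonian.seldonian_algorithm import SeldonianAlgorithm

model = MyBinaryLogisticRegressionModel()
spec = SupervisedSpec(model=model,...)  # "..." stands for the other required arguments
SA = SeldonianAlgorithm(spec)
passed_safety, solution = SA.run()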

In the example code above, the model object is passed to the spec object, which is then used to run the Seldonian algorithm. The ... represents the other input parameters that the spec object requires; see the Fair loans tutorial for an example of a fully specified spec object. This model is deliberately minimal: it is designed to show only the required aspects of a Seldonian model. One optional addition is the gradient of the predict() method. By providing it to the spec object via the custom_primary_gradient_fn parameter, you may be able to speed up candidate selection. If you do not provide one, the engine computes the gradient automatically with autograd, which can be slow depending on how your predict() method is implemented.
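For this model the gradient of predict() has a closed form, because $\sigma'(z) = \sigma(z)\left(1 - \sigma(z)\right)$ and the derivative of the score with respect to the parameters is $[1, x]$ for each sample $x$. The sketch below computes that Jacobian and checks it against central finite differences. predict_jacobian is a hypothetical name, plain NumPy is assumed, and this is not necessarily the function signature that custom_primary_gradient_fn expects; it only illustrates how a hand-written gradient can be derived and validated.

```python
import numpy as np


def predict(theta, X):
    """Same forward pass as MyBinaryLogisticRegressionModel.predict."""
    Z = theta[0] + X @ theta[1:]
    return 1 / (1 + np.exp(-Z))


def predict_jacobian(theta, X):
    """Hypothetical helper: Jacobian of predict() with respect to theta.
    Row n holds dY_n/dtheta = sigma(z_n) * (1 - sigma(z_n)) * [1, x_n]."""
    Y = predict(theta, X)
    dY_dZ = Y * (1 - Y)                                     # shape (i,)
    X_with_ones = np.hstack([np.ones((X.shape[0], 1)), X])  # dZ/dtheta, shape (i, j+1)
    return dY_dZ[:, None] * X_with_ones                     # shape (i, j+1)


# Compare the analytic Jacobian with central finite differences
rng = np.random.default_rng(0)
theta = rng.normal(size=3)
X = rng.normal(size=(5, 2))
J = predict_jacobian(theta, X)

eps = 1e-6
J_fd = np.zeros_like(J)
for k in range(theta.size):
    step = np.zeros_like(theta)
    step[k] = eps
    J_fd[:, k] = (predict(theta + step, X) - predict(theta - step, X)) / (2 * eps)

print(J.shape)                         # (5, 3)
print(np.allclose(J, J_fd, atol=1e-8))  # True
```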
