## Understand Logistic Regression and sharpen your PyTorch skills

To better understand what we’re going to do next, you can read my previous article about logistic regression.

So, what’s our plan for implementing Logistic Regression with PyTorch?

Let’s first think of the underlying math that we want to use.

There are many ways to define a loss function and then find its optimal parameters. Among them, we will implement the following 3 ways of learning the parameters in our `LogisticRegression` class:

- We rewrite the logistic regression equation so that it turns into a least-squares linear regression problem with different labels, and then we use the closed-form formula to find the weights $w$: $w = (X^T X)^{-1} X^T \ln\frac{y}{1-y}$.

- Like above, we turn logistic regression into least-squares linear regression, but instead of the closed-form formula, we use stochastic gradient descent (SGD) to minimize the loss function $f(w) = \left\lVert Xw - \ln\frac{y}{1-y} \right\rVert^2$, which is obtained by substituting $y$ in the sum of squared errors loss $\lVert Xw - y \rVert^2$ with the right-hand side of the rewritten logistic equation $Xw = \ln\frac{y}{1-y}$.

- We use the maximum likelihood estimation (MLE) method: we write the likelihood function, play around with it, restate it as a minimization problem, and apply SGD with the loss function $h(w) = -\sum_i \left[ y_i \ln \sigma(x_i w) + (1 - y_i) \ln\left(1 - \sigma(x_i w)\right) \right]$, where $\sigma(z) = \frac{1}{1 + e^{-z}}$ is the logistic function and $x_i$ is the $i$-th row of $X$.

In the above equations, X is the input matrix that contains observations on the row axis and features on the column axis; y is a column vector that contains the classification labels (0 or 1); f is the sum of squared errors loss function; h is the loss function for the MLE method.
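The reason the logit labels $\ln\frac{y}{1-y}$ turn the problem into linear regression is that the logit is the inverse of the logistic function. A quick numeric check, using arbitrary hypothetical scores `z` in place of $Xw$:

```python
import torch

# Arbitrary linear scores z = X w (hypothetical values, float64 for precision)
z = torch.linspace(-5.0, 5.0, steps=11, dtype=torch.float64)

y = torch.sigmoid(z)                  # logistic output 1 / (1 + e^(-z))
recovered = torch.log(y / (1 - y))    # logit labels ln(y / (1 - y))

print(torch.allclose(recovered, z))       # True: the logit undoes the logistic function
print(torch.allclose(torch.logit(y), z))  # True: torch.logit computes the same transform
```

So if $y = \sigma(Xw)$ held exactly, the transformed labels would be exactly linear in $X$.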

If you want to find out more about how we obtained the above equations, please check out the above-linked article.

So now, this is our goal: translate the above equations into code. And we’ll use PyTorch for that.

We plan to use an object-oriented approach for implementation. We’ll create a `LogisticRegression` class with 3 public methods: `fit()`, `predict()`, and `accuracy()`.

Among `fit()`’s parameters, one determines how our model learns. This parameter is named `method` (not to be confused with a method as a function of a class), and it can take the following strings as values: `'ols_solve'` (OLS stands for Ordinary Least Squares), `'ols_sgd'`, and `'mle_sgd'`.

To keep the `fit()` method from getting too long, we would like to split the code into 3 different private methods, each one responsible for one way of finding the parameters.

We will have the `__ols_solve()` private method for applying the closed-form formula.

In this method and in the other methods that use the OLS approach, we will use the constant EPS to make sure the labels are not exactly 0 or 1, but something in between. That’s to avoid getting plus or minus infinity for the logarithm in the equations above.
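To see why the clipping is needed, here is a small sketch assuming `EPS = 1e-5` (the article doesn't state its actual value) and a few toy labels:

```python
import torch

EPS = 1e-5  # assumed value; the article does not give the constant it uses

y = torch.tensor([0.0, 1.0, 1.0, 0.0])  # toy 0/1 labels

# Without clipping, ln(y / (1 - y)) is -inf for label 0 and +inf for label 1
raw_logit = torch.log(y / (1 - y))

# Force the labels into [EPS, 1 - EPS] before taking the logarithm
y_clamped = torch.clamp(y, EPS, 1 - EPS)
ols_y = torch.log(y_clamped / (1 - y_clamped))  # large but finite values

print(raw_logit)
print(ols_y)
```

Label 0 maps to a large negative number and label 1 to a large positive one, so the sign still encodes the class.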

In `__ols_solve()` we first check whether X has full column rank, which is required to apply this method (you can read more about this technique and what happens if X doesn’t have full column rank in this article). Then we force y to be between `EPS` and `1-EPS`. The `ols_y` variable holds the labels of the ordinary least-squares linear regression problem that’s equivalent to our logistic regression problem. Basically, we transform the labels that we have for logistic regression so that they are compliant with the linear regression equations. After that, we apply the closed-form formula using PyTorch functions.
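A minimal sketch of what `__ols_solve()` could look like, written here as a free function `ols_solve(X, y)` (a hypothetical name) and assuming `EPS = 1e-5` and labels stored as a column vector. It uses `torch.linalg.matrix_rank` for the rank check and `torch.linalg.solve` on the normal equations rather than an explicit matrix inverse; the original code may use different PyTorch functions.

```python
import torch

EPS = 1e-5  # assumed value for the label-clipping constant

def ols_solve(X: torch.Tensor, y: torch.Tensor) -> torch.Tensor:
    """Closed-form weights for the logit-transformed labels.

    X: (n_samples, n_features); y: (n_samples, 1) with values in [0, 1].
    """
    # The normal equations need X^T X to be invertible, i.e. full column rank
    if torch.linalg.matrix_rank(X) < X.shape[1]:
        raise ValueError("X does not have full column rank")
    # Keep labels strictly inside (0, 1) so the logarithm stays finite
    y = torch.clamp(y, EPS, 1 - EPS)
    # Labels of the equivalent linear regression problem: ln(y / (1 - y))
    ols_y = torch.log(y / (1 - y))
    # w = (X^T X)^{-1} X^T ols_y, computed by solving the linear system
    return torch.linalg.solve(X.T @ X, X.T @ ols_y)
```

Solving the system instead of forming $(X^T X)^{-1}$ gives the same weights and is usually more numerically stable.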

For the 2 SGD-based algorithms, it would be redundant to have 2 separate methods, since their code would be almost identical except for the part where we compute the loss value, as each of them uses a different loss function.

What we’ll do is create a generic `__sgd()` method that does not rely on a particular loss function. Instead, it expects as a parameter a function responsible for computing the loss value, which `__sgd()` then uses.

In this method, we first initialize the weights to a random column vector with values drawn from a normal distribution with mean 0 and a standard deviation of 1/(# of features). The intuition for this std dev is that if we have more features, then we need smaller weights to be able to converge (and not blow up our gradients).

We will create a `DataLoader` object with `shuffle=True` so that it takes care of shuffling and simply returns a batch of data `(xb, yb)` when we iterate over it with `for step, (xb, yb) in enumerate(loader):`.
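A small sketch of the batching behaviour, using hypothetical toy tensors and a hypothetical `batch_size=4` (the article doesn’t say which batch size it uses). `TensorDataset` pairs row i of `X` with row i of `y`, and the `DataLoader` reshuffles the rows on every pass:

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

X = torch.arange(20, dtype=torch.float32).reshape(10, 2)  # 10 samples, 2 features
y = torch.tensor([[0.0], [1.0]] * 5)                       # 10 labels as a column vector

loader = DataLoader(TensorDataset(X, y), batch_size=4, shuffle=True)

for step, (xb, yb) in enumerate(loader):
    # xb: (batch, 2), yb: (batch, 1); batches hold 4, 4 and then the remaining 2 rows
    print(step, xb.shape, yb.shape)
```

Because the last batch is not dropped by default, every sample is visited exactly once per pass.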

Then we go through the whole dataset `iterations` times, and for each batch of data we compute the loss value using the `loss_fn` function taken as a parameter; then we call the `backward()` method on the loss value.

By calling `backward()`, the gradient is computed and stored in the `grad` attribute of `self.weights`. So next we can use `self.weights -= learning_rate * self.weights.grad` to update the weights, and we do so inside a `with torch.no_grad():` block because we don’t want this weight-update operation to be taken into account the next time we call `backward()`.

By default, PyTorch keeps track of every operation that involves a tensor with `requires_grad=True` (in our case, `self.weights`), and when `backward()` is called, the gradient of the whole chain of composed functions is computed. The operations that we don’t want included in the gradient computation can be put inside a `with torch.no_grad():` block.
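A standalone sketch of this behaviour, with a toy leaf tensor `w` standing in for `self.weights` (all values hypothetical). It also shows that PyTorch refuses an in-place update of a leaf tensor that requires grad when the update is done outside `torch.no_grad()`:

```python
import torch

w = torch.ones(3, 1, requires_grad=True)  # leaf tensor tracked by autograd
x = torch.tensor([[1.0, 2.0, 3.0]])

loss = (x @ w).sum()       # recorded in the graph: loss.grad_fn is set
loss.backward()            # w.grad = x^T = [[1], [2], [3]]
learning_rate = 0.1

# Outside no_grad, the in-place update of a leaf that requires grad is rejected
rejected = False
try:
    w -= learning_rate * w.grad
except RuntimeError:
    rejected = True

# Inside no_grad the update runs and is not recorded in the graph
with torch.no_grad():
    w -= learning_rate * w.grad

print(rejected)                  # True
print(w.requires_grad, w.grad_fn)  # True None: w is still a trainable leaf
```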

An important thing to note about automatic differentiation in PyTorch is that when `backward()` is called, **the gradient is NOT overwritten**; instead, **the new gradient is added to the existing value in the `grad` attribute**.

That’s why we have to set `self.weights.grad` to zero after each update, which we do with `self.weights.grad.zero_()`.
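The accumulation is easy to observe on a one-element toy tensor (a hypothetical example, not the article’s code): calling `backward()` twice without resetting doubles the stored gradient.

```python
import torch

w = torch.tensor([2.0], requires_grad=True)

(w ** 2).sum().backward()
first = w.grad.clone()    # 4.0, since d(w^2)/dw = 2w

(w ** 2).sum().backward()
second = w.grad.clone()   # 8.0: the new 4.0 was ADDED to the stored 4.0

w.grad.zero_()            # reset in place before the next backward pass
(w ** 2).sum().backward()
third = w.grad.clone()    # 4.0 again

print(first.item(), second.item(), third.item())  # 4.0 8.0 4.0
```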

Then, after the training is done, we use `self.weights = self.weights.detach()` to detach `self.weights` from the computational graph, which means that it will not require grad anymore.
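Putting the steps together, here is a minimal sketch of a generic SGD routine in the spirit of `__sgd()`, written as a free function. The name `sgd`, the default hyperparameters, and the `loss_fn(xb, yb, weights)` signature are assumptions for illustration; in the class, the loss method would read `self.weights` instead of receiving it.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

def sgd(X, y, loss_fn, learning_rate=0.01, iterations=100, batch_size=32):
    """Mini-batch SGD that works with any loss_fn(xb, yb, weights) -> scalar tensor."""
    n_features = X.shape[1]
    # Random init: mean 0, std 1 / n_features (smaller weights when there are more features)
    weights = (torch.randn(n_features, 1, dtype=X.dtype) / n_features).requires_grad_()
    loader = DataLoader(TensorDataset(X, y), batch_size=batch_size, shuffle=True)
    for _ in range(iterations):              # one pass over the whole dataset each time
        for step, (xb, yb) in enumerate(loader):
            loss = loss_fn(xb, yb, weights)
            loss.backward()                  # fills weights.grad
            with torch.no_grad():            # keep the update out of the graph
                weights -= learning_rate * weights.grad
                weights.grad.zero_()         # gradients accumulate, so reset them
    return weights.detach()                  # plain tensor that no longer requires grad
```

Usage, with a hypothetical mean-squared-error loss on noise-free linear data:

```python
X = torch.randn(200, 2, dtype=torch.float64)
y = X @ torch.tensor([[1.0], [-2.0]], dtype=torch.float64)
w = sgd(X, y, lambda xb, yb, w: ((xb @ w - yb) ** 2).mean(), learning_rate=0.1, iterations=50)
```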

For `'ols_sgd'` and `'mle_sgd'` we’ll create 2 private methods, `__sse_loss()` and `__mle_loss()`, that compute and return the loss value for these 2 techniques. In these 2 methods, we simply apply the formulas for **f** and **h** using PyTorch’s math functions.

So, when `fit()` is called with `method='ols_solve'` we call `__ols_solve()`, when `method='ols_sgd'` we call `__sgd()` with `loss_fn=self.__sse_loss`, and when `method='mle_sgd'` we call `__sgd()` with `loss_fn=self.__mle_loss`.

In `predict()` we first check whether `fit()` was called previously by looking for the `weights` attribute (`fit()` is the only method that creates it). Then we check whether the shapes of the input matrix `x` and the weights vector allow multiplication. If either check fails, we return an error message. If everything is OK, we do the multiplication and pass the result through the logistic function.
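A minimal sketch of these checks in a hypothetical stand-in class `PredictSketch` that models only the `weights` attribute and `predict()`. It raises exceptions where the article returns error messages; that choice is an assumption.

```python
import torch

class PredictSketch:
    """Hypothetical stand-in: only the weights attribute and predict() are modelled."""

    def predict(self, x: torch.Tensor) -> torch.Tensor:
        # fit() is the only method that creates self.weights
        if not hasattr(self, "weights"):
            raise AttributeError("the model is not fitted yet: call fit() first")
        # (n_samples, n_features) @ (n_features, 1) needs matching inner dimensions
        if x.shape[1] != self.weights.shape[0]:
            raise ValueError(
                f"shape mismatch: x is {tuple(x.shape)}, weights is {tuple(self.weights.shape)}"
            )
        # Logistic function applied element-wise to the linear scores
        return torch.sigmoid(x @ self.weights)
```

For example, after setting `model.weights = torch.zeros(2, 1)`, `model.predict(torch.ones(3, 2))` returns a (3, 1) tensor filled with 0.5.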

In `accuracy()` we make predictions using the above method. Then we check whether the shape of the predictions matches that of the true labels; otherwise, we show an error message. After that, we make sure that both the predictions and the true labels have values of either 0 or 1 by a simple rule: if the value is >= 0.5 consider it a 1, otherwise a 0. We apply this rule using the `torch.where()` function.

To compute the accuracy, we check for equality between `y` and `y_hat`. This returns a vector of Boolean values, which we then cast to float (False becomes 0.0, and True becomes 1.0). The accuracy is simply the mean of these values.
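A sketch of the thresholding and averaging steps as a free function `accuracy(y_hat, y)` (a hypothetical name and signature: it takes predictions directly instead of calling `predict()`, and raises an exception instead of showing a message):

```python
import torch

def accuracy(y_hat: torch.Tensor, y: torch.Tensor) -> float:
    if y_hat.shape != y.shape:
        raise ValueError(f"shape mismatch: {tuple(y_hat.shape)} vs {tuple(y.shape)}")
    # Threshold both tensors at 0.5 so they contain only 0s and 1s
    y_hat = torch.where(y_hat >= 0.5, torch.ones_like(y_hat), torch.zeros_like(y_hat))
    y = torch.where(y >= 0.5, torch.ones_like(y), torch.zeros_like(y))
    # Booleans -> float (True = 1.0, False = 0.0); their mean is the accuracy
    return (y_hat == y).float().mean().item()
```

With predictions `[[0.9], [0.2], [0.6], [0.4]]` and labels `[[1.], [0.], [0.], [0.]]`, 3 of the 4 thresholded predictions match, so the accuracy is 0.75.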

Here is the full code of the `LogisticRegression` class:

Now, we would like to test our `LogisticRegression` class with some real-world data. For that, we will use this heart disease dataset from Kaggle. You can read more about this dataset on Kaggle, but the main idea is to predict the “target” column (0 if the patient is healthy, 1 if they have heart disease) based on the others.

Below is the code that shows our `LogisticRegression` class in action (cells 1 and 2 are not shown below to avoid repetition, since they were shown in the snippet above).

As you can see, we were able to obtain a decent **80%+** accuracy both in training and testing with our from-scratch implementation.


You can see the full notebook on Kaggle and GitHub.

*I hope you found this information useful and thanks for reading!*


**This article is also posted on Medium here. You can have a look!**
