Learn PyTorch basics by implementing linear regression
Probably, implementing linear regression with PyTorch is an overkill. This library was made for more complicated stuff like neural networks, complex deep learning architectures, etc. Nevertheless, I think that using it for implementing a simpler machine learning method, like linear regression, is a good exercise for those who want to start learning PyTorch.
At its core, PyTorch is just a math library similar to NumPy, but with 2 important improvements:
- It can use GPU to make its operations a lot faster. If you have a compatible GPU properly configured, you can make the code run on GPU with just a few changes.
- It is capable of automatic differentiation; this means that for gradient-based methods you don’t need to manually compute the gradient, PyTorch will do it for you.
You can think of PyTorch as NumPy on steroids.
While these 2 features may not seem like big improvements for what we want to do here (linear regression), since this is not very computationally-expensive and the gradient is quite simple to compute manually, they make a big difference in deep learning where we need a lot of computing power and the gradient is quite nasty to calculate by hand.
Before working on the implementation, let’s first briefly recall what linear regression is:
Linear regression is estimating an unknown variable in a linear fashion by some other known variables. Visually, we fit a line (or a hyperplane in higher dimensions) through our data points.
If you’re not comfortable with this concept or want to understand better the math behind it, you can read my previous article about linear regression:
Now, let’s jump to the coding part.
Firstly, we need to, obviously, import some libraries. We import
torch as it is the main thing we use for the implementation,
matplotlib for visualizing our results,
make_regression function, from
sklearn, which we will be using to generate a regression dataset for using as an example, and the python’s built-in
import torch import matplotlib.pyplot as plt from sklearn.datasets import make_regression import math
Then we will create a
LinearRegression class with the following methods:
.fit()— this method will do the actual learning of our linear regression model; here we will find the optimal weights
.predict()— this one will be used for prediction; it will return the output of our linear model
.rmse()— computes the root mean squared error of our model with the given data; this metric is kind of “the average distance from our model’s estimate to the true y value”
The first thing we do inside
.fit() is to concatenate an extra column of 1’s to our input matrix X. This is to simplify our math and treat the bias as the weight of an extra variable that’s always 1.
.fit() method will be able to learn the parameters by using either closed-form formula or stochastic gradient descent. And to choose which to use, we will have a parameter called method that will expect a string of either ‘solve’ or ‘sgd’.
method is set to ‘solve’ we will get the weights of our model by the following formula:
which requires the matrix X to have full column rank; so, we will check for this and otherwise we show an error message.
The first part of our
.fit() method is:
Note that the other parameters after
method are optional and are used only in the case we use SGD.
The second part of this method handles the case of
method = ‘sgd’, which doesn’t require that X has full column rank.
The SGD algorithm for our least squares linear regression is sketched below:
We will start this algorithm by initializing the weights class attribute to a tensor which is a column vector with values drawn from a normal distribution with mean 0 and standard deviation 1/(number of columns). We divide the standard deviation by the number of columns to make sure we don’t get too big values as output in the initial stages of the algorithm. This is to help us converge faster.
At the beginning of each iteration, we randomly shuffle our rows of data. Then, for each batch, we compute the gradient and subtract it (multiplied by the learning rate) from the current weights vector to obtain the new weights.
In the SGD algorithm sketched above, we had shown the manually computed gradient; it’s that expression multiplied by alpha (the learning rate). But in the code below we won’t compute that expression explicitly; instead, we compute the loss value:
then we let PyTorch compute the gradient for us.
Below is the second half of our
To compute the gradient of the loss with respect to the weights, we need to call the
.requires_grad_(True) method on the
self.weights tensor, then we compute the loss according to the formula given above. After the loss is computed, we call
.backward() method on the loss tensor which will compute the gradient and store it in the
.grad attribute of
self.weights. After we do the update, we call
.detach() to get a new tensor without any operations recorded on it, so that the next time we compute the gradient we will do so based only on operations in that single iteration.
self from this method to be able to concatenate the calls of the constructor and
.fit() like this:
lr = LinearRegression().fit(X, y, ‘solve’).
.predict() method is quite straight-forward. We first check if
.fit() was called before, then concatenate a column of 1’s to X and verify that the shape of X allows multiplication with the weights vector. If everything is OK, we simply return the result of the multiplication between X and the weights vector as the predictions.
.rmse() we first get the outputs of the model using
.predict(), then if there were no errors during predict, we compute and return the root mean squared error which can be thought of as “the average distance from our model’s estimate to the true y value”.
Below is the full code of the
LinearRegression class in an example
To show our implementation of linear regression in action, we will generate a regression dataset with the
make_regression() function from
X, y = make_regression(n_features=1, n_informative=1, bias=1, noise=35)
Let’s plot this dataset to see how it looks like:
The y returned by
make_regression() is a flat vector. We will reshape it to a column vector to use with our
y = y.reshape((-1, 1))
Firstly, we will use
method = ‘solve’ to fit the regression line:
lr_solve = LinearRegression().fit(X, y, method='solve') plt.scatter(X, y) plt.plot(X, lr_solve.predict(X), color='orange')
The root mean squared error of the above regression model is:
lr_solve.rmse(X, y) # tensor(31.8709, dtype=torch.float64)
Then, we also use
method = ‘sgd’ and we will let the other parameters have their default values:
lr_sgd = LinearRegression().fit(X, y, method='sgd') plt.scatter(X, y) plt.plot(X, lr_sgd.predict(X), color='orange')
As you can see, the regression lines in the 2 images above for methods ‘solve’ and ‘sgd’ are almost identical.
The root mean squared error we got when using ‘sgd’ is:
lr_sgd.rmse(X, y) # tensor(31.9000, dtype=torch.float64)
Here is the Jupyter Notebook with all the code:
I hope you found this information useful and thanks for reading!
This article is also posted on Medium here. Feel free to have a look!