Learn TensorFlow basics by implementing linear regression
Let’s first briefly recall what linear regression is:
Linear regression is estimating an unknown variable in a linear fashion by some other known variables. Visually, we fit a line (or a hyperplane in higher dimensions) through our data points.
If you’re not comfortable with this concept or want to better understand the math behind it, you can read my previous article about linear regression:
Implementing linear regression with TensorFlow is probably overkill. The library was made for more complicated things, such as neural networks and complex deep learning architectures. Nevertheless, I think that using it to implement a simpler machine learning method, like linear regression, is a good exercise for those who want to know how to build custom things with TensorFlow.
TensorFlow has many APIs, and most introductory courses and tutorials only explain a higher-level one, like Keras. That may not be sufficient if, for example, you want to use custom loss or activation functions that are not yet implemented in Keras.
At its core, TensorFlow is just a math library similar to NumPy, but with 2 important improvements:
- It can use a GPU to make its operations a lot faster. If you have a compatible GPU properly configured, TF 2 will use it automatically; no code changes are required.
- It is capable of automatic differentiation: for gradient-based methods you don’t need to compute the gradient manually, because TensorFlow will do it for you.
You can think of TensorFlow as NumPy on steroids.
These 2 features may not seem like big improvements for what we want to do here (linear regression), since it is not very computationally expensive and the gradient is quite simple to compute manually. However, they make a big difference in deep learning, where we need a lot of computing power and the gradient is quite nasty to calculate by hand.
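To make the two points above concrete, here is a minimal sketch of NumPy-style math and automatic differentiation in TF 2. The tensors and variable names are made up for illustration and are not part of the implementation that follows.

```python
import tensorflow as tf

# NumPy-like math: a matrix product between two constant tensors.
a = tf.constant([[1.0, 2.0], [3.0, 4.0]])
b = tf.constant([[1.0], [1.0]])
product = tf.matmul(a, b)  # shape (2, 1)

# Automatic differentiation: operations applied to a tf.Variable inside
# the tape are recorded, so TensorFlow can differentiate through them.
x = tf.Variable(3.0)
with tf.GradientTape() as tape:
    y = x ** 2  # y = x^2, so dy/dx = 2x
dy_dx = tape.gradient(y, x)  # gradient of y with respect to x, at x = 3
```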
Now, let’s jump to the implementation.
First, we need to import some libraries. We import tensorflow, as it is the main thing we use for the implementation; matplotlib, for visualizing our results; the make_regression function from sklearn, which we will use to generate an example regression dataset; and Python’s built-in math module.
import tensorflow as tf
import matplotlib.pyplot as plt
from sklearn.datasets import make_regression
import math
Then we will create a LinearRegression class with the following methods:
- .fit(): this method does the actual learning of our linear regression model; here we find the optimal weights
- .predict(): this one is used for prediction; it returns the output of our linear model
- .rmse(): computes the root mean squared error of our model on the given data; this metric can be thought of as “the average distance from our model’s estimate to the true y value”
The first thing we do inside .fit() is to concatenate an extra column of 1’s to our input matrix X. This is to simplify our math and treat the bias as the weight of an extra variable that’s always 1.
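As a rough illustration of this step, the sketch below adds a column of 1’s to a small made-up matrix. Placing the 1’s as the last column is an assumption; the same idea works if they go first, as long as the weights follow the same order.

```python
import tensorflow as tf

# Hypothetical input: 3 samples, 2 features.
X = tf.constant([[2.0, 5.0], [1.0, 4.0], [0.0, 3.0]])

# Build a column of 1's with one row per sample and append it to X
# (assumption: the bias column goes last).
ones = tf.ones((X.shape[0], 1), dtype=X.dtype)
X_with_bias = tf.concat([X, ones], axis=1)  # shape (3, 3)
```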
The .fit() method can learn the parameters using either the closed-form formula or stochastic gradient descent (SGD). To choose between them, it takes a parameter called method that expects a string, either ‘solve’ or ‘sgd’.
If method is set to ‘solve’, we get the weights of our model from the following formula:
$$w = (X^\top X)^{-1} X^\top y$$
which requires the matrix X to have full column rank; so we check for this and show an error message otherwise.
The first part of our .fit() method is:
Note that the other parameters after method are optional and are used only when we use SGD.
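A minimal sketch of the closed-form branch could look like the function below. It is written as a standalone helper rather than the actual method; the name solve_weights, the choice to raise a ValueError, and the use of tf.linalg.matrix_rank for the rank check are assumptions made for illustration.

```python
import tensorflow as tf

def solve_weights(X, y):
    """Closed-form least squares; X must already include the column of 1's."""
    X = tf.convert_to_tensor(X, dtype=tf.float32)
    y = tf.convert_to_tensor(y, dtype=tf.float32)
    # Full column rank means rank(X) equals the number of columns of X.
    rank = int(tf.linalg.matrix_rank(X))
    if rank != X.shape[1]:
        raise ValueError("X must have full column rank to use method='solve'.")
    Xt = tf.transpose(X)
    # Normal equation: w = (X^T X)^{-1} X^T y
    return tf.linalg.inv(Xt @ X) @ Xt @ y

# Made-up data lying exactly on y = 2x + 1 (last column of X is the 1's).
X = tf.constant([[0.0, 1.0], [1.0, 1.0], [2.0, 1.0]])
y = tf.constant([[1.0], [3.0], [5.0]])
w = solve_weights(X, y)  # slope first, then bias
```

For larger or ill-conditioned problems, tf.linalg.lstsq is usually a safer choice than inverting the matrix explicitly.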
The second part of this method handles the case of method = ‘sgd’, which doesn’t require X to have full column rank.
The SGD algorithm for our least squares linear regression is sketched below:
We start this algorithm by initializing the weights class attribute to a TensorFlow Variable: a column vector with values drawn from a normal distribution with mean 0 and standard deviation 1/(number of columns). We divide the standard deviation by the number of columns so that we don’t get overly large outputs in the initial stages of the algorithm, which helps us converge faster.
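The initialization described above might be written as follows. The number of columns is a made-up value, and the variable is created at module level here instead of being stored as a class attribute.

```python
import tensorflow as tf

n_columns = 3  # hypothetical: 2 features plus the column of 1's

# Column vector drawn from a normal distribution with mean 0 and
# standard deviation 1 / n_columns, wrapped in a tf.Variable so that it
# can be updated in place and tracked by tf.GradientTape.
weights = tf.Variable(
    tf.random.normal(shape=(n_columns, 1), mean=0.0, stddev=1.0 / n_columns)
)
```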
At the beginning of each iteration, we randomly shuffle our rows of data. Then, for each batch, we compute the gradient and subtract it (multiplied by the learning rate) from the current weights vector to obtain the new weights.
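One possible way to shuffle the rows and split them into batches is sketched below. The data and batch_size are made up, and shuffling a vector of row indices (rather than the data itself) is an assumption chosen so that X and y stay aligned.

```python
import tensorflow as tf

X = tf.reshape(tf.range(10, dtype=tf.float32), (5, 2))  # 5 rows, 2 columns
y = tf.reshape(tf.range(5, dtype=tf.float32), (5, 1))
batch_size = 2  # hypothetical value

# Shuffle the row indices once, then reorder X and y with the same
# permutation so that each row of X stays paired with its target.
indices = tf.random.shuffle(tf.range(X.shape[0]))
X_shuffled = tf.gather(X, indices)
y_shuffled = tf.gather(y, indices)

# Slice consecutive batches; the last one may be smaller than batch_size.
batches = [
    (X_shuffled[i:i + batch_size], y_shuffled[i:i + batch_size])
    for i in range(0, X.shape[0], batch_size)
]
```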
In the SGD algorithm sketched above, we showed the manually computed gradient: it is the expression multiplied by alpha (the learning rate). In the code below we won’t compute that expression explicitly; instead, we compute the loss value:
then we let TensorFlow compute the gradient for us.
Below is the second half of our .fit() method:
We need to compute the loss value inside the with tf.GradientTape() as tape block, then call tape.gradient(loss_value, self.weights) to get the gradient. For this to work, the quantity with respect to which the gradient is taken (self.weights) must be a tf.Variable object. Also, we should use the .assign_sub() method instead of -= when changing the weights.
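A single update step following this pattern could be sketched as below. The mean squared error loss, the learning rate, the zero initialization, and the helper name sgd_step are assumptions for illustration; for simplicity, the whole made-up dataset is used as one batch.

```python
import tensorflow as tf

X = tf.constant([[0.0, 1.0], [1.0, 1.0], [2.0, 1.0]])  # last column is the 1's
y = tf.constant([[1.0], [3.0], [5.0]])                  # y = 2x + 1
weights = tf.Variable(tf.zeros((2, 1)))
learning_rate = 0.1  # hypothetical value

def sgd_step(X_batch, y_batch):
    # Operations on `weights` inside the tape are recorded for differentiation.
    with tf.GradientTape() as tape:
        predictions = X_batch @ weights
        loss_value = tf.reduce_mean((predictions - y_batch) ** 2)  # assumed loss
    gradient = tape.gradient(loss_value, weights)
    # assign_sub updates the variable in place; `-=` would produce a new
    # tensor instead of modifying the tf.Variable.
    weights.assign_sub(learning_rate * gradient)
    return loss_value

first_loss = sgd_step(X, y)
for _ in range(500):
    last_loss = sgd_step(X, y)
```

After these steps the weights should be close to a slope of 2 and a bias of 1, the values used to generate the data.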
We return self from this method to be able to chain the calls of the constructor and .fit() like this: lr = LinearRegression().fit(X, y, 'solve').
The .predict() method is quite straightforward. We first check whether .fit() was called before, then concatenate a column of 1’s to X and verify that the shape of X allows multiplication with the weights vector. If everything is OK, we simply return the result of the multiplication between X and the weights vector as the predictions.
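The checks described above might look like this when written as a standalone function. Passing the weights explicitly, the exception types, and appending the 1’s as the last column are assumptions; the actual method works on the attributes of the class.

```python
import tensorflow as tf

def predict(X, weights):
    """Return X @ weights after appending the column of 1's (assumed last)."""
    if weights is None:
        raise RuntimeError("Call fit() before predict().")
    X = tf.convert_to_tensor(X, dtype=tf.float32)
    X = tf.concat([X, tf.ones((X.shape[0], 1), dtype=X.dtype)], axis=1)
    # The number of columns of X must match the number of weights.
    if X.shape[1] != weights.shape[0]:
        raise ValueError("X has a number of columns incompatible with the weights.")
    return X @ weights

weights = tf.constant([[2.0], [1.0]])  # hypothetical model: slope 2, bias 1
predictions = predict([[0.0], [1.0], [2.0]], weights)
```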
In .rmse(), we first get the outputs of the model using .predict(); then, if there were no errors during prediction, we compute and return the root mean squared error, which can be thought of as “the average distance from our model’s estimate to the true y value”.
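The metric itself can be sketched as a small function. Here it takes the predictions directly instead of calling .predict(), which is a simplification for illustration.

```python
import tensorflow as tf

def rmse(y_true, y_pred):
    """Root mean squared error between true values and predictions."""
    y_true = tf.convert_to_tensor(y_true, dtype=tf.float32)
    y_pred = tf.convert_to_tensor(y_pred, dtype=tf.float32)
    # Square the errors, average them, then take the square root.
    return tf.sqrt(tf.reduce_mean((y_true - y_pred) ** 2))

# Made-up values: the squared errors are 0, 0 and 9, so the mean is 3.
error = rmse([[1.0], [2.0], [3.0]], [[1.0], [2.0], [6.0]])
```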
Below is the full code of the LinearRegression class:
Using our LinearRegression class in an example
To show our implementation of linear regression in action, we will generate a regression dataset with the make_regression() function from sklearn:
X, y = make_regression(n_features=1, n_informative=1, bias=1, noise=35)
Let’s plot this dataset to see what it looks like:
The y returned by make_regression() is a flat vector. We will reshape it to a column vector to use with our LinearRegression class:
y = y.reshape((-1, 1))
First, we will use method = ‘solve’ to fit the regression line:
lr_solve = LinearRegression().fit(X, y, method='solve')
plt.scatter(X, y)
plt.plot(X, lr_solve.predict(X), color='orange')
The root mean squared error of the above regression model is:
lr_solve.rmse(X, y) # <tf.Tensor: shape=(), dtype=float32, numpy=37.436085>
Then, we also use method = ‘sgd’, letting the other parameters keep their default values:
lr_sgd = LinearRegression().fit(X, y, method='sgd')
plt.scatter(X, y)
plt.plot(X, lr_sgd.predict(X), color='orange')
As you can see, the regression lines in the 2 images above for methods ‘solve’ and ‘sgd’ are almost identical.
The root mean squared error we got when using ‘sgd’ is:
lr_sgd.rmse(X, y) # <tf.Tensor: shape=(), dtype=float32, numpy=37.86531>
Here is the Jupyter Notebook with all the code:
I hope you found this information useful and thanks for reading!
This article is also posted on Medium here. Feel free to have a look!