Machine Learning for Beginners, Part 11: Ridge Regression


Today I’m going back to the conversation we were having about a month ago about the basic principles of machine learning, this time covering the ridge regression algorithm. Part 10 of the Machine Learning for Beginners blog talked about using random forest algorithms to determine whether an English Premier League football team would win a match. Today I want to expand on the ideas presented in my “Data Science in 90 Seconds” YouTube video and continue the discussion in plain language.

In regression data science problems, we’re usually trying to estimate the parameters of a model, or which variables influence the model’s outcome. Consider a simple linear regression problem: say we’re trying to predict the rainfall amount at our location next Monday, and in particular whether it will exceed 5 mm. The rainfall amount is the target variable, y. We might have three predictor variables, such as time of day, season and cloud cover percentage, represented as x1, x2 and x3. In addition to these predictor variables, the regression model uses parameters (coefficients that weight each predictor) to predict the target variable.
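To make that setup concrete, here is a tiny sketch of such a linear model in Python. The coefficient values are made up purely for illustration, not fitted from any real rainfall data:

```python
# Hypothetical coefficients (b0 = intercept, b1..b3 = predictor weights).
# These numbers are invented for illustration only.
b0, b1, b2, b3 = 1.0, 0.2, -0.5, 4.0

def predict_rainfall(x1, x2, x3):
    """Linear model: predicted rainfall (mm) from time of day (x1),
    season (x2, encoded as a number) and cloud cover fraction (x3)."""
    return b0 + b1 * x1 + b2 * x2 + b3 * x3

# Predicted rainfall for hour 14, season code 2, 90% cloud cover.
print(predict_rainfall(14, 2, 0.9))
```

Fitting a regression model simply means finding the coefficient values that make predictions like this as close as possible to the observed data.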

Ridge regression estimates how strongly each predictor variable influences the target variable and is one of the most commonly used linear regression techniques in data science. It is especially useful when there are more predictor variables than observations, or when predictors are strongly correlated with each other. Perhaps the data set we can access only has fifty observations of time of day, but there are hundreds of other predictor variables that could affect rainfall amounts.
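A brief sketch of that situation, using NumPy's closed-form solution rather than any particular library's ridge implementation. The synthetic data and penalty strength here are arbitrary; the point is that with more predictors (20) than observations (5), plain least squares has no unique answer, but the ridge penalty still yields one coefficient per predictor:

```python
import numpy as np

rng = np.random.default_rng(0)

# More predictor columns (20) than observations (5): ordinary least
# squares is underdetermined here, but ridge still has a unique solution.
n_obs, n_pred = 5, 20
X = rng.standard_normal((n_obs, n_pred))
y = rng.standard_normal(n_obs)

alpha = 1.0  # penalty strength, an assumed value for illustration

# Closed-form ridge solution: (X'X + alpha*I) w = X'y.
# Adding alpha*I makes the matrix invertible even though X'X is rank 5.
w = np.linalg.solve(X.T @ X + alpha * np.eye(n_pred), X.T @ y)
print(w.shape)  # one coefficient for each of the 20 predictors
```

The `alpha` knob controls how strongly the coefficients are penalized; choosing it well (for example by cross-validation) is part of using ridge in practice.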

In regression, the line that best fits these data points is used to predict the target variable, the rainfall amount. A line that fits the data is drawn through the space closest to our data points, shown in blue in the image below. On this plot, the residual is the distance between the line we’ve drawn and each data point, shown as the gray lines. The residual is also called the “error”. Ordinary least squares minimizes the sum of the squared residuals and is unbiased, meaning its estimates are based only on the actual data set. Ridge regression adds to that sum a penalty on the size of the coefficients, which shrinks them toward zero. This introduces a small amount of bias, but in exchange the estimates vary much less from one data set to the next, which often makes predictions more reliable.


(Image from HackerEarth)
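The shrinkage effect can be sketched in a few lines. This uses synthetic data and an arbitrary penalty strength: the same closed-form fit with the penalty set to zero gives plain least squares, and turning the penalty on pulls the coefficients toward zero:

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.standard_normal((30, 3))
# Synthetic target built from known weights plus a little noise.
y = X @ np.array([2.0, -1.0, 0.5]) + 0.1 * rng.standard_normal(30)

def ridge_fit(X, y, alpha):
    """Minimize sum of squared residuals + alpha * sum of squared weights."""
    p = X.shape[1]
    return np.linalg.solve(X.T @ X + alpha * np.eye(p), X.T @ y)

w_ols   = ridge_fit(X, y, 0.0)   # no penalty: plain least squares
w_ridge = ridge_fit(X, y, 10.0)  # penalty shrinks the coefficients

# The penalized coefficients are smaller overall than the unpenalized ones.
print(np.linalg.norm(w_ridge) < np.linalg.norm(w_ols))
```

That smaller coefficient vector is exactly where ridge's bias comes from, and why its estimates are steadier across different samples of data.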

In my next blog, I plan to talk about the LASSO method, which is closely related to ridge regression.
