What is a Kernel Support Vector Machine (SVM)?
How does a Kernel SVM work? Why is a kernelized SVM much slower than a linear SVM? How do we use an RBF-kernel SVM from Python's sklearn library?
Before diving deep into the Kernel SVM, let's first understand what a Support Vector Machine (SVM) is.
In an SVM, the kernel helps to create a hyperplane that separates the data so that we can easily categorize it, whether the data is linearly separable or non-linearly separable.
For linearly separable data, the kernel function (a linear kernel function) helps to make a hyperplane that does the separation easily.
But what about a dataset that we cannot separate on a 2D plane? We cannot separate such data linearly, so what strategy do we use to separate it?
We need to do some analysis here, and for that we need to understand mapping to a higher dimension: when it is difficult to separate the data linearly, we map it to a higher dimension in order to do the separation.
Let's try to understand this with an example.
Now we need to separate green and red points, but we cannot create a line or linear hyperplane to separate them.
Step 1. Shift the points back by 5 units. For that, we subtract 5 from the x variable: f = x − 5.
Step 2. We know that squaring a function's value creates a U-shaped (parabolic) curve, so we apply this to our function: f = (x − 5)².
Step 3. Project each point onto this U-shaped curve.
Step 4. Now a straight line separates the dataset, as sketched in the code below.
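To make this concrete, here is a minimal sketch of the mapping in Python; the sample points and labels are made-up illustrations, not data from the article:

```python
# A sketch of the 1D -> 2D mapping described in the steps above.
# The sample points and labels below are made up for illustration.
import numpy as np

x = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9])                # 1D points
labels = np.array(['r', 'r', 'g', 'g', 'g', 'g', 'g', 'r', 'r'])

f = (x - 5) ** 2     # Step 1 and Step 2: shift by 5, then square

# Each 1D point x becomes the 2D point (x, f). The green points now have
# f <= 4 and the red points f >= 9, so a horizontal line such as f = 6
# separates them linearly.
points_2d = np.column_stack((x, f))
print(points_2d)
```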
This concept is known as mapping to a higher dimension; in the above example, we mapped 1D data to 2D.
Now, our question is: how do we separate this dataset? Similarly, we have to map it to a higher dimension.
For this, we use the Gaussian RBF kernel function:

K(x, l) = e^(−‖x − l‖² / (2σ²))

In this equation:
‘K’ stands for the kernel function
‘x’ stands for a data point
‘l’ stands for the landmark
‘σ’ stands for the size (circumference) of the region of separation
The picture above illustrates what the kernel looks like for the separation.
So, now, where is the landmark? The landmark is placed in the middle of the separation.
Now, if a point is far away from this landmark, it lies on the lower plane and the value of the kernel function is close to zero (0): the farther the point is from the landmark, the more negative the exponent of ‘e’ becomes, and so the value approaches zero.
If a point is near the landmark, the exponent of ‘e’ is near zero, so the value of the kernel function is close to one (1).
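Here is a small sketch of this behavior in Python, computing the kernel value for one point near the landmark and one far from it (the coordinates and the σ value are illustrative assumptions):

```python
import numpy as np

def rbf_kernel(x, l, sigma):
    """Gaussian RBF kernel: K(x, l) = exp(-||x - l||^2 / (2 * sigma^2))."""
    return np.exp(-np.sum((x - l) ** 2) / (2 * sigma ** 2))

landmark = np.array([0.0, 0.0])   # landmark in the middle of the separation
near = np.array([0.1, 0.1])       # point close to the landmark
far = np.array([5.0, 5.0])        # point far from the landmark

print(rbf_kernel(near, landmark, sigma=1.0))   # ~0.99, close to 1
print(rbf_kernel(far, landmark, sigma=1.0))    # ~1e-11, close to 0
```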
Now, with this concept, we can solve the problem of the 2D dataset: put the landmark in place and do the hyperplane separation.
If the value of ‘σ’ is big, then the circumference of separation is bigger. If the value of ‘σ’ is small, then the circumference is smaller.
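A quick sketch of the effect of σ: evaluating the kernel at the same point with a big and a small σ (the numbers are illustrative):

```python
import numpy as np

def rbf_kernel(x, l, sigma):
    return np.exp(-np.sum((x - l) ** 2) / (2 * sigma ** 2))

point = np.array([2.0, 0.0])
landmark = np.array([0.0, 0.0])

# With a big sigma the point still gets a kernel value near 1 (inside the
# circle); with a small sigma the value collapses toward 0 (outside it).
print(rbf_kernel(point, landmark, sigma=2.0))  # ~0.61
print(rbf_kernel(point, landmark, sigma=0.5))  # ~0.0003
```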
Similarly, we can now solve any such problem: put the landmark(s) in place, and you can do the separation for complex problems as well.
Now, we will do the implementation part. First, we import our dataset of people who want to buy a specific product.
We follow the steps below to build the SVM model.
Step 1. Importing the libraries.
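Assuming the usual scientific Python stack, the imports might look like this:

```python
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd
```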
Step 2. Importing the dataset.
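A sketch of loading the data with pandas; the file name Social_Network_Ads.csv is an assumption, so substitute the path to your own dataset:

```python
# Hypothetical file name; replace with the actual path to your data.
dataset = pd.read_csv('Social_Network_Ads.csv')
```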
Step 3. Splitting the data into the matrix of features (X), taking ‘Age’ and ‘Salary’ into consideration for the prediction, and the dependent variable (y).
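A sketch of this split; the column positions assume ‘Age’ and ‘Salary’ are the third and fourth columns and the purchase decision is the last column:

```python
X = dataset.iloc[:, [2, 3]].values   # 'Age' and 'Salary' (assumed positions)
y = dataset.iloc[:, -1].values       # purchased or not
```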
Step 4. Splitting the matrix of features (X) and the dependent variable (y) into training and test sets.
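With sklearn, this might look as follows; the 75/25 split and the fixed random_state are common but arbitrary choices:

```python
from sklearn.model_selection import train_test_split

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0)
```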
Step 5. Now we do feature scaling for the ‘Age’ and ‘Salary’ columns.
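Standardizing both columns so that ‘Salary’, with its much larger range, does not dominate the distance computations inside the kernel:

```python
from sklearn.preprocessing import StandardScaler

sc = StandardScaler()
X_train = sc.fit_transform(X_train)
X_test = sc.transform(X_test)   # scale with the training-set statistics only
```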
Step 6. Fitting the classifier to the training set. Here we take the kernel as ‘linear’, which does linear separation and creates a linear hyperplane.
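A sketch of this step with sklearn's SVC; as the article's topic suggests, you can swap kernel='linear' for kernel='rbf' to use the Gaussian RBF kernel discussed earlier:

```python
from sklearn.svm import SVC

# kernel='rbf' would apply the Gaussian RBF kernel instead.
classifier = SVC(kernel='linear', random_state=0)
classifier.fit(X_train, y_train)   # fit on the training set only
```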
Step 7. Predicting the test set results.
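Predicting on the scaled test set:

```python
y_pred = classifier.predict(X_test)
```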
Step 8. Making the confusion matrix to evaluate the predictions.
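Comparing the predictions against the true test labels:

```python
from sklearn.metrics import confusion_matrix

cm = confusion_matrix(y_test, y_pred)
print(cm)   # rows: actual classes, columns: predicted classes
```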
Step 9. Visualizing the results.
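A sketch of a standard decision-boundary plot for a 2D classifier; the colors and the grid step are arbitrary choices:

```python
from matplotlib.colors import ListedColormap

X_set, y_set = X_test, y_test
X1, X2 = np.meshgrid(
    np.arange(X_set[:, 0].min() - 1, X_set[:, 0].max() + 1, 0.01),
    np.arange(X_set[:, 1].min() - 1, X_set[:, 1].max() + 1, 0.01))

# Color every grid point by the class the classifier predicts for it.
plt.contourf(X1, X2,
             classifier.predict(np.c_[X1.ravel(), X2.ravel()]).reshape(X1.shape),
             alpha=0.75, cmap=ListedColormap(('salmon', 'lightgreen')))

# Overlay the actual test points.
for i, j in enumerate(np.unique(y_set)):
    plt.scatter(X_set[y_set == j, 0], X_set[y_set == j, 1],
                color=('red', 'green')[i], label=j)
plt.title('SVM (Test set)')
plt.xlabel('Age (scaled)')
plt.ylabel('Salary (scaled)')
plt.legend()
plt.show()
```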