As we work through the algorithm, a look at our dataset shows some cases that may affect our predictions: the same SAT score can appear with different GPAs, or vice versa.
For example:
GPA SAT SCORE
4.0 1220
2.8 1220
etc.
There are a few cases like this, so what happens now, and how does the algorithm deal with them? That is the question.
To answer that, let's go through the actual regression algorithm step by step.
Step 1: Observe our goal
As you can see, we have a cost function that depends on some parameters, and our goal is to find the parameter values that make its error as small as possible. We do this with the gradient descent algorithm, because a lower cost means fewer prediction errors.
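To make the goal concrete, here is a minimal sketch of a cost function. It assumes the hypothesis has the common form h(x) = Theta_1 + Theta_2 * x and that the cost is the mean squared error divided by two. Neither is confirmed by the text here, and the data points are made up for illustration.

```python
def hypothesis(theta_1, theta_2, x):
    # Assumed form of the hypothesis: a straight line.
    return theta_1 + theta_2 * x


def cost(theta_1, theta_2, xs, ys):
    # Half of the mean squared error over all training examples.
    m = len(xs)
    total = 0.0
    for x, y in zip(xs, ys):
        error = hypothesis(theta_1, theta_2, x) - y
        total += error ** 2
    return total / (2 * m)


# Made-up toy data that lies exactly on the line y = 2x.
xs = [1.0, 2.0, 3.0]
ys = [2.0, 4.0, 6.0]

print(cost(0.0, 0.0, xs, ys))  # poor parameters give a large cost
print(cost(0.0, 2.0, xs, ys))  # perfect parameters give a cost of 0.0
```

The closer the line is to the data, the smaller the cost, which is why minimizing it gives better predictions.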
Step 2: How do we minimize, and why?
The first choice we make in the minimization problem is the learning rate, alpha. The larger the learning rate, the faster the algorithm learns, but its steps are less precise, and if alpha is too large it can overshoot the minimum and fail to converge. If the learning rate is very small, the algorithm learns slowly but approaches the minimum more steadily, which gives fewer prediction errors.
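The effect of alpha is easy to see on a toy problem. The sketch below is not the regression itself: it runs gradient descent on the simple function f(w) = w^2, whose minimum is at w = 0, and the function name `minimize_square` and the three alpha values are chosen only for illustration.

```python
def minimize_square(alpha, steps=20, start=1.0):
    # Gradient descent on f(w) = w**2, whose gradient is 2 * w.
    w = start
    for _ in range(steps):
        gradient = 2 * w
        w = w - alpha * gradient
    return w


# Compare a very small, a moderate, and a too-large learning rate.
for alpha in (0.01, 0.1, 1.1):
    print(alpha, minimize_square(alpha))
```

With the same number of steps, the very small alpha is still far from 0, the moderate one gets close, and the too-large one moves further away from the minimum on every step.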
Gradient descent: it is the actual math behind the algorithm.
What it actually does is repeatedly update the hypothesis parameters, Theta_1 and Theta_2, so that the cost function decreases. Both parameters play a vital role in predicting the results because they are the main components of the hypothesis above, so it is very important to train them accurately.
Once training has finished, the parameters are fixed and do not change any further.
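The update loop can be sketched as follows. This is a simplified illustration, not my actual implementation: it assumes the hypothesis h(x) = Theta_1 + Theta_2 * x, a squared-error cost, and batch updates, and the data, alpha, and iteration count are made up.

```python
def gradient_descent(xs, ys, alpha=0.05, iterations=5000):
    # Start both parameters at zero and improve them step by step.
    theta_1, theta_2 = 0.0, 0.0
    m = len(xs)
    for _ in range(iterations):
        # Prediction error for every training example.
        errors = [theta_1 + theta_2 * x - y for x, y in zip(xs, ys)]
        # Partial derivatives of the cost with respect to each parameter.
        grad_1 = sum(errors) / m
        grad_2 = sum(e * x for e, x in zip(errors, xs)) / m
        # Update both parameters simultaneously.
        theta_1 -= alpha * grad_1
        theta_2 -= alpha * grad_2
    return theta_1, theta_2


# Made-up toy data that lies exactly on the line y = 1 + 2x.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [3.0, 5.0, 7.0, 9.0]

theta_1, theta_2 = gradient_descent(xs, ys)
print(theta_1, theta_2)  # should end up close to 1 and 2
```

After the loop ends, the two returned values are the fixed parameters that every later prediction uses.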
To help you understand better, let me show you the results from my implementation, which you can check out from here
As you can see in the image, the results improve over the iterations, ending with the final values of Theta_1 and Theta_2.
Now that our hypothesis is ready, we can predict based on the parameters that we pass (in this case Theta_1 and Theta_2).
If we want to predict something, we just need to pass the arguments to get the predicted result.
It is important to understand that the hypothesis parameters are fixed once training is done. It does not matter how many times you enter the same input: the prediction will be the same, because we are using the same Theta_1 and Theta_2 over and over again.
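A short sketch shows this behaviour. The parameter values below are hypothetical, not the ones from my results, and the hypothesis is again assumed to be h(x) = Theta_1 + Theta_2 * x.

```python
def predict(theta_1, theta_2, x):
    # Apply the trained hypothesis to a single input value.
    return theta_1 + theta_2 * x


# Hypothetical trained parameters, fixed after training.
THETA_1, THETA_2 = 1.0, 2.0

first = predict(THETA_1, THETA_2, 3.0)
second = predict(THETA_1, THETA_2, 3.0)
print(first == second)  # True
```

So even when the dataset holds two different labels for the same input, the trained hypothesis returns a single value for that input.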
If you look at our dataset, there is only one feature and one label. To overcome our challenge, it is recommended to give the algorithm more features, which usually gives better results because it can then learn from the additional features to tell different circumstances apart.
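With more features, the hypothesis simply gets one parameter per feature plus an intercept. The sketch below is only an illustration of that idea: the function name `predict_multi`, the parameter values, and the feature values are all hypothetical.

```python
def predict_multi(thetas, features):
    # thetas[0] is the intercept; thetas[1:] pair up with the features.
    if len(thetas) != len(features) + 1:
        raise ValueError("need one theta per feature plus an intercept")
    return thetas[0] + sum(t * f for t, f in zip(thetas[1:], features))


# Hypothetical parameters and two hypothetical feature values.
thetas = [1.0, 2.0, 3.0]
features = [1.0, 1.0]
print(predict_multi(thetas, features))  # 6.0
```

Two examples that share one feature value can now still get different predictions, as long as their other features differ.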