Say you have a bunch of data points $(x,y)$. Plotted, they form a cloud of points, which may be vaguely circular or may be elongated into an ellipse. What you are trying to do in regression is find what might be called the 'line of best fit'. The regression line attempts to define the predicted value of $y$ (the dependent variable) for a given value of $x$ (the independent variable).

However, while this seems straightforward, we need to figure out what we mean by 'best', and that means we must define what it would be for a line to be good, or for one line to be better than another. Specifically, we must stipulate a loss function. A loss function gives us a way to say how 'bad' something is; when we minimize it, we make our line as 'good' as possible, that is, we find the 'best' line. Traditionally, when we conduct a regression analysis, we find estimates of the slope and intercept so as to minimize the sum of squared errors, where each error is the vertical distance between a data point and the line.

This question can also be answered from a linear algebra perspective. We want to find the line $y=mx+b$ that's closest to all our points (the regression line). As an example, say we have the points $(1,2),(2,4.5),(3,6),(4,7)$.
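To make the example concrete, here is a minimal sketch of how the slope and intercept could be computed for those four points, assuming the loss is the usual sum of squared vertical errors. It solves the 2x2 normal equations by hand in plain Python; the function names `fit_line` and `sum_squared_errors` are made up for this illustration and are not from any library.

```python
# The four example points (x, y).
points = [(1, 2), (2, 4.5), (3, 6), (4, 7)]


def fit_line(points):
    """Return (m, b) for the line y = m*x + b that minimizes the
    sum of squared vertical errors, using the normal equations."""
    n = len(points)
    sx = sum(x for x, _ in points)        # sum of x
    sy = sum(y for _, y in points)        # sum of y
    sxx = sum(x * x for x, _ in points)   # sum of x^2
    sxy = sum(x * y for x, y in points)   # sum of x*y

    # Determinant of the 2x2 system; zero means all x values are equal,
    # so no unique line of the form y = m*x + b exists.
    det = n * sxx - sx * sx
    if det == 0:
        raise ValueError("all x values are identical; slope is undefined")

    m = (n * sxy - sx * sy) / det
    b = (sxx * sy - sx * sxy) / det
    return m, b


def sum_squared_errors(points, m, b):
    """Loss function: sum of squared vertical distances to the line."""
    return sum((y - (m * x + b)) ** 2 for x, y in points)


m, b = fit_line(points)
print(f"m = {m:.2f}, b = {b:.2f}")
print(f"loss = {sum_squared_errors(points, m, b):.3f}")
```

For these points the fit comes out to $m = 1.65$ and $b = 0.75$, so the regression line is $y = 1.65x + 0.75$. Nudging either value and recomputing `sum_squared_errors` gives a larger loss, which is what 'best' means under this choice of loss function.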