Least Squares

• Basic Idea
• Collect data points
(x1,y1),
(x2,y2),
...
(xn,yn)
for known xi and measured yi
• Suppose the variance of yi is σi^2
• Specify a model y = f(x)
• Compute deviations
d1 = y1 - f(x1)
d2 = y2 - f(x2)
...
dn = yn - f(xn)
• Try to minimize χ^2 =
d1^2/σ1^2 +
d2^2/σ2^2 +
...
dn^2/σn^2
• The "R-value" is a measure of how well the function fits; smaller values mean a better fit, and R = 0 is an exact fit
R = (Σ|yi - f(xi)|)/(Σ|yi|)
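The quantities above can be sketched in a few lines of Python. This is only an illustration: the function names, the sample data, and the trial model are made up here, and the R-value is assumed to be the sum of absolute deviations divided by the sum of absolute measured values.

```python
def chi_squared(xs, ys, sigmas, f):
    """Sum of squared deviations d_i = y_i - f(x_i), each divided by sigma_i^2."""
    return sum(((y - f(x)) / s) ** 2 for x, y, s in zip(xs, ys, sigmas))


def r_value(xs, ys, f):
    """Assumed form: sum|y_i - f(x_i)| / sum|y_i|; 0 means an exact fit."""
    num = sum(abs(y - f(x)) for x, y in zip(xs, ys))
    den = sum(abs(y) for y in ys)
    return num / den


# Hypothetical measurements: known x_i, measured y_i, standard deviations sigma_i
xs = [0.0, 1.0, 2.0, 3.0]
ys = [1.0, 3.0, 5.0, 8.0]
sigmas = [1.0, 1.0, 1.0, 2.0]


def model(x):
    """Hypothetical trial model y = f(x)."""
    return 2.0 * x + 1.0


print("chi^2 =", chi_squared(xs, ys, sigmas, model))
print("R     =", r_value(xs, ys, model))
```

Only the last point deviates from this trial model, and its larger σi reduces its contribution to χ^2; the R-value, as assumed here, ignores the σi.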
• Linear Least Squares
• Assume a linear model
f(x) = a*x + b
• Take derivatives of χ^2 with respect to a and b and set to zero to find the minimum
• 0 = -2*Σ(xi*yi/σi^2) + 2*a*Σ(xi^2/σi^2) + 2*b*Σ(xi/σi^2)
• 0 = 2*Σ(yi/σi^2) - 2*a*Σ(xi/σi^2) - 2*b*Σ(1/σi^2)
• Which we solve for a and b
• e.g. for unit weights
0 = -2*Σ(xi*yi) + 2*a*Σ(xi^2) + 2*b*Σ(xi)
0 = 2*Σ(yi) - 2*a*Σ(xi) - 2*b*n
a = (Σ(xi*yi) - Σ(xi)*Σ(yi)/n) / (Σ(xi^2) - (Σ(xi))^2/n)
b = -(Σ(xi*yi)*Σ(xi) - Σ(xi^2)*Σ(yi)) / (n*Σ(xi^2) - (Σ(xi))^2)
• Note that the fitted line passes through the point (mean of x, mean of y), so it is common practice to shift the data to deviations from the mean.
• See http://facstaff.pepperdine.edu/lrogers/cs105/cs105s4pgm8.html
• See here for definitions of correlation coefficient and regression coefficients.
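As a rough sketch of the linear case, the two equations for a and b can be solved in closed form with weights 1/σi^2. The function name `linear_fit` and the sample data are hypothetical; when no σi are given, unit weights are assumed and the result reduces to the unit-weight formulas for a and b.

```python
def linear_fit(xs, ys, sigmas=None):
    """Fit y = a*x + b by minimizing chi^2 with weights w_i = 1/sigma_i^2.

    With sigmas=None, unit weights are assumed.
    """
    if sigmas is None:
        sigmas = [1.0] * len(xs)
    w = [1.0 / s ** 2 for s in sigmas]

    # Weighted sums that appear in the two zero-derivative equations
    S = sum(w)
    Sx = sum(wi * x for wi, x in zip(w, xs))
    Sy = sum(wi * y for wi, y in zip(w, ys))
    Sxx = sum(wi * x * x for wi, x in zip(w, xs))
    Sxy = sum(wi * x * y for wi, x, y in zip(w, xs, ys))

    denom = S * Sxx - Sx ** 2
    if denom == 0:
        raise ValueError("all x values are identical; slope is undefined")

    a = (S * Sxy - Sx * Sy) / denom
    b = (Sxx * Sy - Sx * Sxy) / denom
    return a, b


# Hypothetical data scattered around y = 2*x + 1
xs = [1.0, 2.0, 3.0, 4.0]
ys = [3.1, 4.9, 7.2, 8.8]

a, b = linear_fit(xs, ys)
x_mean = sum(xs) / len(xs)
y_mean = sum(ys) / len(ys)

print("a =", a, "b =", b)
# For unit weights the fitted line passes through (x_mean, y_mean)
print("f(x_mean) - y_mean =", a * x_mean + b - y_mean)
```

The last line prints a value that is zero up to rounding, which is the property that motivates shifting the data to deviations from the mean. With non-unit weights the same property holds for the weighted means.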