Supervised Learning: Linear Regression
DAY 2
Linear Regression
- linear regression analysis
Linear Model
- graph analysis
- the blue line is the best model because it has the smallest error with respect to the real data (the points)
- the yellow line predicts a negative value for the first point -> impossible
- \(H\): Hypothesis
- \(W\): weight vectors
- \(b\): bias value (the full hypothesis is written out below)
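Putting these together (this matches the multi-variable model later in these notes and the `H(x) = Wx + b` comment in the code), the single-variable hypothesis is:

\[H(x) = Wx + b\]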
Cost Function
- aka Loss Function
- \(m\): number of training examples (the summed error is divided by \(m\), i.e., averaged)
- the larger \(m\) is, the larger the total error tends to be (the more data, the more chances to be wrong), which is why the sum is averaged over \(m\)
- \(H(x^i)\): prediction (predicted value)
- \(y^i\): actual value
- because an actual data point can lie above or below the line, \(H(x^i) - y^i\) can be negative, so errors could cancel each other out in the sum
- → square each term instead (with an absolute value \(|\cdot|\) the expression cannot be differentiated everywhere); the resulting cost function is written out below
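Written out, the single-variable cost function takes the same form as the multi-variable cost given later in these notes:

\[Cost(W, b) = \frac{1}{m} \sum_{i=1}^m \left(H(x^i) - y^i\right)^2\]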
Cost Function Minimization (GDM)
- \(W\) and \(b\) need to be found (\(x\) and \(y\) are constants given by the training data)
- \(b\) is ignored for convenience
Gradient Descent Method
- \[W \leftarrow W - \alpha \frac {\partial} {\partial W} Cost(W)\]
- start from a random point and compute the slope there from the derivative of the cost
- jump to a point where the slope is less steep (the jumping distance depends on how steep the slope is)
- repeat until the slope converges to 0, i.e., \(W\) has converged to the minimum (see the sketch below)
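A minimal sketch of this procedure for a single weight, ignoring \(b\) as above; the toy data is hypothetical (it follows the same \(y = 10x\) pattern as the PyTorch example later):

```python
# Gradient descent for Cost(W) = (1/m) * sum((W*x_i - y_i)^2), bias ignored.
x = [1.0, 2.0, 3.0]        # hypothetical inputs
y = [10.0, 20.0, 30.0]     # hypothetical targets (y = 10x)
m = len(x)

W = 0.0                    # arbitrary starting point
alpha = 0.01               # learning rate

for step in range(2000):
    # derivative of the cost: dCost/dW = (2/m) * sum((W*x_i - y_i) * x_i)
    grad = (2 / m) * sum((W * xi - yi) * xi for xi, yi in zip(x, y))
    W = W - alpha * grad   # W <- W - alpha * dCost/dW

print(W)                   # converges toward 10
```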
Learning Rate α
- \(\alpha\) : learning rate
- too large: overshooting
- too small: takes a very long time to converge
- (both cases illustrated in the picture)
- determining the learning rate:
- try several learning rates and observe how the cost function behaves (sketched below)
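A hypothetical sketch of that process, reusing the toy data from the sketch above: run a short training loop for each candidate learning rate and compare the resulting cost (a huge cost means overshooting; a cost that is still large means the rate was too small):

```python
# Try several learning rates and observe the cost (toy data, bias ignored).
x = [1.0, 2.0, 3.0]
y = [10.0, 20.0, 30.0]
m = len(x)

for alpha in [1.0, 0.1, 0.01, 0.001]:
    W = 0.0
    for step in range(100):
        grad = (2 / m) * sum((W * xi - yi) * xi for xi, yi in zip(x, y))
        W = W - alpha * grad
    cost = (1 / m) * sum((W * xi - yi) ** 2 for xi, yi in zip(x, y))
    print(alpha, cost)   # 1.0 blows up (overshooting); 0.001 is still far from 0
```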
Multi-Variable Linear Regression
Model \[H(x_1,x_2, ..., x_n) = w_1x_1 + w_2x_2 + \cdots + w_nx_n + b\]
Cost \[Cost(W, b) = \frac {1}{m} \sum_{i=1}^m \left(H(x_1^i, x_2^i, ..., x_n^i)-y^i\right)^2\]
- each input variable \(x_j\) requires its own weight \(w_j\)
- more input variables generally lead to better accuracy (more information is provided)
Programming
- \[H(x_1,x_2, ..., x_n) = w_1x_1 + w_2x_2 + \cdots + w_nx_n + b\]
- → \(H(X) = XW + b\)
- \[\begin{pmatrix} x_1 & x_2 & ... & x_n \end{pmatrix} \times \begin{pmatrix} w_1\\ w_2\\ ...\\ w_n \end{pmatrix} = w_1x_1 + w_2x_2 + \cdots + w_nx_n\]
- MATRIX MULTIPLICATION used for faster calculation (if \(W\) is stored as a row vector it must be transposed first); a small sketch follows below
- \[H(X) = XW^T + b\]
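A minimal sketch of the matrix form with PyTorch, using hypothetical weights already stored as an \(n \times 1\) column vector (so no transpose is needed, matching the implementation below):

```python
import torch

X = torch.FloatTensor([[1, 1], [2, 2], [3, 3]])   # (m, n): m = 3 samples, n = 2 features
W = torch.FloatTensor([[5], [5]])                  # (n, 1): hypothetical weight column vector
b = torch.FloatTensor([0])                         # bias

H = torch.matmul(X, W) + b   # (m, 1): all m predictions in one matrix multiplication
print(H)                     # tensor([[10.], [20.], [30.]])
```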
Linear Regression Implementation
Python Libraries
- in order of release:
- TensorFlow
- by Google
- PyTorch
- by Meta
- easier and more intuitive
- Keras
- by Google
- only for deep neural network architectures
- easiest
PyTorch implementation of a Linear Regression Model
import torch

# Training data
x_train = torch.FloatTensor([[1, 1], [2, 2], [3, 3]])
y_train = torch.FloatTensor([[10], [20], [30]])

# Parameters shaped to match the dimensions of the training data;
# requires_grad=True so gradients can be computed for gradient descent
W = torch.randn([2, 1], requires_grad=True)
b = torch.randn([1], requires_grad=True)

# Minimize the cost (optimize parameters W, b) using Stochastic Gradient Descent
optimizer = torch.optim.SGD([W, b], lr=0.01)  # learning rate (alpha) = 0.01

def model_LinearRegression(x):
    return torch.matmul(x, W) + b  # H(X) = XW + b

for step in range(2000):  # training loop: take 2000 gradient descent steps ("jumps")
    prediction = model_LinearRegression(x_train)
    cost = torch.mean((prediction - y_train) ** 2)  # square the errors and average over m
    optimizer.zero_grad()  # reset gradients to zero before backpropagation
    cost.backward()        # compute the gradients of the cost
    optimizer.step()       # update W and b (the optimizer does this automatically)

x_test = torch.FloatTensor([[4, 4]])  # for the added input (4, 4) the expected answer is 40
model_test = model_LinearRegression(x_test)
print(model_test.detach().item())
- more training steps (a larger loop count) lead to better accuracy (the prediction gets closer to 40); a quick check is sketched below
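A hypothetical way to check that, assuming the same toy data as above: retrain the model from scratch for several step counts and compare the prediction for `[4, 4]` against the target of 40:

```python
import torch

def train(num_steps, lr=0.01):
    torch.manual_seed(0)  # fixed seed so only the step count varies
    x_train = torch.FloatTensor([[1, 1], [2, 2], [3, 3]])
    y_train = torch.FloatTensor([[10], [20], [30]])
    W = torch.randn([2, 1], requires_grad=True)
    b = torch.randn([1], requires_grad=True)
    optimizer = torch.optim.SGD([W, b], lr=lr)
    for _ in range(num_steps):
        cost = torch.mean((torch.matmul(x_train, W) + b - y_train) ** 2)
        optimizer.zero_grad()
        cost.backward()
        optimizer.step()
    return (torch.matmul(torch.FloatTensor([[4, 4]]), W) + b).item()

for steps in [100, 500, 2000]:
    print(steps, train(steps))  # predictions approach 40 as the step count grows
```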