07 Learning Problem

Introduction

  • The big picture of Machine Learning (Learning from Data) lifecycle

Problem Setup

  • Formalization:

    • \(X = ℝ^d\) : input space
      • \(ℝ^d\): d-dimensional Euclidean space
      • input vector \(x ∈ X : x = (x_1, x_2, ..., x_d)\)
    • let \(Y = \{+1, -1\}\) : output space (\(\rightarrow\) binary decision)
  • Example: credit approval (P/F) of applicants of bank loan

    | component | symbol | credit approval metaphor |
    | :-- | :--: | :-- |
    | input | \(x\) | customer application |
    | output | \(y\) | approve / deny |
    | target function | \(f : X \rightarrow Y\) | ideal credit approval formula |
    | data | \((x_1, y_1), ..., (x_N, y_N)\) | historical records |
    | hypothesis | \(g: X \rightarrow Y\) | formula to be used |
    • \(f\): unknown target function
    • \(X\): input space (set of all possible inputs \(x\))
    • \(Y\): output space (set of all possible outputs)
    • \(N\): the number of input-output examples (training examples)
    • \(D ≜ \{(x_1, y_1), ..., (x_N, y_N)\}\): data set where \(y_n = f(x_n)\)

    • find \(g(x) \approx y=f(x)\)
    • Example:
      • \(x = \begin{bmatrix} x_1\\x_2\end{bmatrix}\) where \(x_1\): age and \(x_2\): annual salary (in USD)
      • \(N=\) 11, \(d=\) 2, \(X= ℝ^2\), and \(Y=\) { approve, deny}
      • data set \(D\): (figure: the \(N = 11\) examples plotted in the plane)
      • \(\Rightarrow\) linear model (easy, but might not be accurate) \(\rightarrow\) find \(g\) close to \(f\)
  • Learning Algorithms \(A\):

    • use \(D\) to pick a formula \(g : X \rightarrow Y\) that approximates \(f\)
    • choose \(g\) from the set of candidate formulas under consideration, which we call the hypothesis set \(H\) (a code sketch of this whole setup follows this section)
      • ex) \(H\) = set of all linear formulas \(\rightarrow\) choose best linear fit to \(D\) from this
    • Scenario Example: new customer applies for credit \(\rightarrow\) bank makes decision
      • based on learned \(g\), not on \(f\) (unknown)
      • if \(g\) faithfully replicates \(f\) \(\rightarrow\) decision is reasonable (good)
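The components above map directly onto code. Below is a minimal, hedged sketch of the setup; `f_target`, `g`, and the synthetic data are illustrative stand-ins (not from the notes), since the real \(f\) is unknown by definition.

```python
# Minimal sketch of the learning-problem notation; all names are illustrative.
import numpy as np

rng = np.random.default_rng(0)
d, N = 2, 11                            # input dimension and number of examples

def f_target(x):
    """Stand-in for the unknown target f : X -> Y (never available in practice)."""
    return 1 if x[0] + x[1] > 1.0 else -1

# Data set D = {(x_1, y_1), ..., (x_N, y_N)} with y_n = f(x_n)
X = rng.uniform(0.0, 1.0, size=(N, d))
y = np.array([f_target(x) for x in X])

def g(x, w=np.array([1.0, 1.0]), b=-1.0):
    """A hypothesis g : X -> Y; a learning algorithm would pick w and b using D."""
    return 1 if w @ x + b > 0 else -1

# The goal: g(x) ≈ f(x), measured here only on D
print(np.mean([g(x_n) == y_n for x_n, y_n in zip(X, y)]))
```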

A simple Learning Model

  • Solution Components
    • given a specific learning problem, we have:
      • target function \(f\) (unknown)
      • training examples
      • learning algorithm, hypothesis set
    • Learning Model = Learning Algorithm + Hypothesis Set
  • Hypothesis Set \(H\):
    • \(H\) specified through \(h(x)\) (\(h ∈ H\))
    • functional form \(h(x)\):
      • gives different weights to different coordinates of \(x\)
      • reflects their relative importance in the credit decision
    • in this example: \(h(x)\) is a linear model (see the sketch below)
      • \(H\): set of lines
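To make \(H\) concrete: each choice of weights and bias picks out one hypothesis, i.e. one line in the plane. A hedged sketch, with arbitrary made-up weight values:

```python
# H as a parameterized family of linear rules; each (w, b) is one h ∈ H.
import numpy as np

def make_h(w, b):
    """Return one hypothesis h(x) = sign(w·x + b) from the set H."""
    return lambda x: 1 if np.dot(w, x) + b > 0 else -1

# Three members of H (three different lines); the weights are arbitrary examples.
candidates = [make_h(np.array([1.0, 0.5]), -3.0),
              make_h(np.array([0.2, 2.0]), -8.0),
              make_h(np.array([1.0, 1.0]), -4.5)]
x = np.array([2.0, 3.0])
print([h(x) for h in candidates])       # [1, -1, 1]: candidates disagree on the same x
```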

Perceptron

  • Making a Decision:

    • weighted coordinates combined to form a ‘credit score’
    • resulting score compared to threshold
  • The ‘perceptron’ (using linear model)
    • \(h(x)\) = \(sign((\sum_{i=1}^dw_ix_i)-threshold)\)
    • = \(sign((\sum_{i=1}^dw_ix_i)+b)\) // \(b = -threshold\) is the bias
      • \(sign(s)= \begin{cases} +1, & s>0\\ -1, & s<0\end{cases}\) (\(s=0\) ignored for now)
      • must find decision boundary
    • \(h(x)\) = \(sign(w^Tx)\) simplified version of linear model
      • different values for parameters \(w_1, w_2, b\)
      • correspond to different lines \(w_1x_1 + w_2x_2+b=0\)
      • for simplification: treat bias \(b\) as weight \(w_0 \equiv b\)
      • introduce artificial coordinate \(x_0=1\)
      • \(w_0x_0 + w_1x_1 + w_2x_2 \Rightarrow w^Tx = \sum_{i=0}^d w_ix_i\) (see the sketch after this list)
  • Roles of Learning Algorithm
    • search \(H\) by looking for weights and bias that perform well on dataset
    • produce final hypothesis \(g \in H\)
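A minimal sketch of the perceptron hypothesis with the bias absorbed into \(w_0\); the example input and weight values below are made up:

```python
# Perceptron hypothesis h(x) = sign(w^T x) with the artificial coordinate x_0 = 1.
import numpy as np

def add_bias_coordinate(x):
    """Prepend x_0 = 1 so the bias b becomes the weight w_0."""
    return np.concatenate(([1.0], x))

def perceptron_h(x, w):
    """h(x) = sign(w^T x); s = 0 is ignored for now, as in the notes."""
    return 1 if w @ x > 0 else -1

x = np.array([2.0, 3.0])                 # e.g. (x_1, x_2)
w = np.array([-4.5, 1.0, 1.0])           # [w_0, w_1, w_2], with w_0 playing the role of b
print(perceptron_h(add_bias_coordinate(x), w))   # sign(-4.5 + 2 + 3) = +1
```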

PLA

Perceptron Learning Algorithm

  • iterative; guaranteed to converge for linearly separable data

  • GOAL: determine weights \(w\) based on the data to produce the final hypothesis \(g\)
  • ASSUME: data set is linearly separable

  • Steps (continued from Perceptron)

    1. perceptron implements \(h(x) \equiv sign(w^Tx)\), given training set \((x_1, y_1), ..., (x_N, y_N)\)
    2. PLA picks misclassified point
      • \(sign(w^Tx_n)\) \(\neq y_n\)
      • \(\rightarrow\) updates the weight vector: \(w \leftarrow w + y_nx_n\)
      • \(\Rightarrow\) rule moves boundary in the direction of classifying \(x_n\) correctly
    3. iterate until no misclassified data
      • at iteration \(t = 1, 2, ...\), pick a misclassified point from \((x_1, y_1), ..., (x_N, y_N)\) and run a PLA iteration on it
      • \(w(t+1) \leftarrow w(t) + y_nx_n\) (implemented in the sketch below)
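Putting the steps together, a hedged sketch of PLA; it assumes the data are linearly separable, since otherwise the loop would never terminate:

```python
# Perceptron Learning Algorithm: repeat the update w <- w + y_n x_n on a
# misclassified point until no misclassified data remain.
import numpy as np

def pla(X, y, max_iters=10_000):
    """X: (N, d+1) inputs with x_0 = 1 already prepended; y: (N,) labels in {+1, -1}."""
    w = np.zeros(X.shape[1])
    for _ in range(max_iters):
        misclassified = np.where(np.sign(X @ w) != y)[0]   # sign(w^T x_n) != y_n
        if misclassified.size == 0:
            return w                     # converged: no misclassified points remain
        n = misclassified[0]             # any misclassified point works
        w = w + y[n] * X[n]              # w(t+1) <- w(t) + y_n x_n
    raise RuntimeError("no convergence; data may not be linearly separable")
```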

Example

  You want to predict if movies will be profitable based on their screenplays.
  You hire two critics A and B to read a script you have
  and rate it on a scale 1 to 4.
  The critics are not perfect. Here is the data:
  ---------------------------------------
   #  Movie Name    A     B     Profit?
  ---------------------------------------
   1  Pellet Power  1     1       -
   2  Ghosts!       3     2       +
   3  Pac is Bac    2     4       +
   4  Not a Pizza   3     4       +
   5  Endless Maze  2     3       -
  ---------------------------------------
  1. First, you want to check the linear separability of the data. Plot the data on the 2D plane, label movies with \(+\) and \(-\), and determine whether it is linearly separable.
    Answer
       B
     4 |     +  +
     3 |     -
     2 |        +
     1 |  -
       +------------
           1  2  3  4   A
    
    • \(\Rightarrow\) Linearly Separable
  2. Now you decide to use a perceptron to classify your data. Suppose you directly use the scores given above as features, together with a bias feature. That is, \(f_0 = 1\), \(f_1 =\) score given by A, \(f_2 =\) score given by B.
  • Run one pass through data with perceptron algorithm, filling out the table below. Go through the data points in order (using data point #1 at step 1)
| step |   Weights    |                       Score                       | Correct? |
| :--: | :----------: | :-----------------------------------------------: | :------: |
|  1   | [ -1, 0, 0 ] | \((-1 \cdot 1) + (0 \cdot 1) + (0 \cdot 1) = -1\) |   yes    |
|  2   |              |                                                   |          |
|  3   |              |                                                   |          |
|  4   |              |                                                   |          |
|  5   |              |                                                   |          |
Answer
- the first coordinate ($$x_0$$) is always 1 (the artificial coordinate introduced earlier)

| step |   Weights    |                       Score                       | Correct? |
| :--: | :----------: | :-----------------------------------------------: | :------: |
|  1   | [ -1, 0, 0 ] | $$(-1 \cdot 1) + (0 \cdot 1) + (0 \cdot 1) = -1$$ |   yes    |
|  2   | [ -1, 0, 0 ] | $$(-1 \cdot 1) + (0 \cdot 3) + (0 \cdot 2) = -1$$ |  ***no***  |
|  3   | [  0, 3, 2 ] | $$( 0 \cdot 1) + (3 \cdot 2) + (2 \cdot 4) = 14$$ |   yes    |
|  4   | [  0, 3, 2 ] | $$( 0 \cdot 1) + (3 \cdot 3) + (2 \cdot 4) = 17$$ |   yes    |
|  5   | [  0, 3, 2 ] | $$( 0 \cdot 1) + (3 \cdot 2) + (2 \cdot 3) = 12$$ |  ***no***  |
  • change weights when incorrect (replayed in the code sketch below)
    • after step 2: \(w \leftarrow [-1, 0, 0] + [1, 3, 2] = [0, 3, 2]\) (vector addition, since \(y_2 = +1\))
    • after step 5: \(w \leftarrow [0, 3, 2] - [1, 2, 3] = [-1, 1, -1]\) (vector subtraction, since \(y_5 = -1\))
  • \(\Rightarrow\) final weights = [-1, 1, -1]
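The pass above can be replayed in code. A sketch, following the notes' conventions: weights are shown before each update, and the mistake rule is \(w \leftarrow w + y_nx_n\).

```python
# Replay of the one-pass table: five movies with features [f0, A, B], f0 = 1.
import numpy as np

X = np.array([[1, 1, 1],   # 1 Pellet Power
              [1, 3, 2],   # 2 Ghosts!
              [1, 2, 4],   # 3 Pac is Bac
              [1, 3, 4],   # 4 Not a Pizza
              [1, 2, 3]])  # 5 Endless Maze
y = np.array([-1, +1, +1, +1, -1])

w = np.array([-1, 0, 0])
for step, (x_n, y_n) in enumerate(zip(X, y), start=1):
    score = w @ x_n
    correct = (score > 0) == (y_n > 0)   # no zero scores occur during this pass
    print(f"step {step}: w={w.tolist()}, score={score}, correct={correct}")
    if not correct:
        w = w + y_n * x_n                # mistake: w <- w + y_n x_n
print("final weights:", w.tolist())      # [-1, 1, -1]
```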
  3. Have weights been learned that separate the data?
Answer
- No (verified in the sketch below)
- with weights `[-1, 1, -1]`, data `point 3` (−1 · 1 + 1 · 2 + −1 · 4 = −3 < 0) and `point 4` (−1 · 1 + 1 · 3 + −1 · 4 = −2 < 0) are misclassified
  - `point 2` scores −1 · 1 + 1 · 3 + −1 · 2 = 0, which is counted as positive here, so it happens to be classified correctly
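A quick sketch checking the final weights against the whole data set:

```python
# Verify that w = [-1, 1, -1] fails to separate the data.
import numpy as np

X = np.array([[1, 1, 1], [1, 3, 2], [1, 2, 4], [1, 3, 4], [1, 2, 3]])
y = np.array([-1, +1, +1, +1, -1])
w = np.array([-1, 1, -1])

scores = X @ w                               # [-1, 0, -3, -2, -2]
preds = np.where(scores >= 0, 1, -1)         # a score of 0 counted as positive
print((preds == y).tolist())                 # [True, True, False, False, True]
```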
  4. Determine for which of the following scenarios a perceptron using the features above can perfectly classify the data.
    • (a) Your reviewers are awesome: if the total of their scores is more than 8, then the movie will definitely be profitable, and otherwise it won’t be.
    • (b) Your reviewers are art critics. Your movie will be profitable if and only if each reviewer gives either a score of 2 or a score of 3
    • (c) Your reviewers are art critics. Your movie will be profitable if and only if each reviewer gives either a score of 2 or a score of 3
Answer
- (a): can classify (consider weights `[-8, 1, 1]`; checked in the sketch below)
- (b), (c): can't classify (the profitable region, both scores in {2, 3}, is a box surrounded by unprofitable points, so it is not linearly separable)
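For scenario (a), the claim can be checked exhaustively over the 1-to-4 score range; the weights `[-8, 1, 1]` come from the answer above:

```python
# Scenario (a): sign(-8 + A + B) fires exactly when the total A + B exceeds 8.
w0, w1, w2 = -8, 1, 1
for a in range(1, 5):                        # scores run from 1 to 4
    for b in range(1, 5):
        score = w0 * 1 + w1 * a + w2 * b     # features [1, A, B]
        assert (score > 0) == (a + b > 8)
print("weights [-8, 1, 1] implement the 'total > 8' rule for all score pairs")
```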