ANN Theory

DAY 5 - 6

ANN: Artificial Neural Network

Introduction

  • Human Brain (Neuron) to Deep Learning Model via mathematical modeling (of the information transmission process)


[Figure: NeuronToBC]

  • inputs can be modified by weights
    • amplified, decreased, or eliminated (multiplied by 0)
  • the result is activated if it passes the threshold (BINARY CLASSIFICATION); a sketch follows below
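A minimal sketch of this single-neuron view (the weights, bias, and threshold below are made up for illustration):

import numpy as np

def neuron(x, w, b, threshold=0.0):
    # weighted sum of inputs plus bias; weights amplify, shrink, or (with weight 0) eliminate inputs
    z = np.dot(x, w) + b
    return 1 if z > threshold else 0   # binary classification: fire only if the sum passes the threshold

print(neuron([1.0, 2.0, 3.0], w=[0.5, -1.0, 0.0], b=0.2))  # 0.5 - 2.0 + 0.0 + 0.2 = -1.3 -> 0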

Multilayer Perceptron (MLP)

Proposed and mathematically analyzed by Prof. Marvin Minsky at MIT (1969), often called a “father of AI”

  • NEURAL NETWORK ARCHITECTURE
  • NN: performs linear classification many times
    [Figure: NeuronToBC]

  • size-3 input [x, y, z] => size-1 output
    • size-1 output: linear regression or binary classification
    • size 2 or more: softmax classification
  • Hidden Layers do additional Linear Classifications
  • 3 linear classifications stacked together (making the model nonlinear) => complex computations become possible (see the sketch below)
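A minimal sketch of this idea in PyTorch (the layer sizes are illustrative, not taken from the figure): each nn.Linear layer performs one linear classification, and stacking them with nonlinear activations yields a nonlinear model.

import torch
import torch.nn as nn

mlp = nn.Sequential(
    nn.Linear(3, 4), nn.Sigmoid(),   # linear classification 1: size-3 input [x, y, z] -> hidden layer
    nn.Linear(4, 4), nn.Sigmoid(),   # linear classification 2: hidden layer -> hidden layer
    nn.Linear(4, 1), nn.Sigmoid(),   # linear classification 3: hidden layer -> size-1 output
)
print(mlp(torch.randn(5, 3)).shape)  # torch.Size([5, 1])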

Application to Logic Gate Design

  • AND, OR Gates
    • Binary Classification is possible
    • [Figure: NeuronToBC]
    • the points can be divided into 2 groups (red, blue) depending on y ∈ {0, 1}
  • XOR Gate

    XOR Gate = Same ? 0 : 1

    • the reason the hidden layer was first introduced
    • [Figure: NeuronToBC]
    • requires 2 linear classifications (see the sketch after this list)
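A minimal sketch (the weights and biases are assumed for illustration, not taken from the lecture figure) showing that one linear classification realizes AND and OR, while no single one realizes XOR:

import numpy as np

def unit(x, w, b):
    # one linear classification: weighted sum plus bias, then a 0/1 threshold
    return int(np.dot(x, w) + b > 0)

X = [(0, 0), (0, 1), (1, 0), (1, 1)]
print([unit(x, w=[1, 1], b=-1.5) for x in X])  # AND -> [0, 0, 0, 1]
print([unit(x, w=[1, 1], b=-0.5) for x in X])  # OR  -> [0, 1, 1, 1]
# No single (w, b) produces XOR = [0, 1, 1, 0]; it needs 2 linear
# classifications (a hidden layer), as worked out in the next section.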

Solving XOR with MLP

  • \(\bar y\) = XOR
| \(x_1\) | \(x_2\) | \(y_1\) | \(y_2\) | \(\bar y\) | XOR |
| --- | --- | --- | --- | --- | --- |
| 0 | 0 | 0 | 1 | 0 | 0 |
| 0 | 1 | 0 | 0 | 1 | 1 |
| 1 | 0 | 0 | 0 | 1 | 1 |
| 1 | 1 | 1 | 0 | 0 | 0 |
  • shows that hidden layers make otherwise unsolvable problems solvable

[Figure: NeuronToBC]

(x1 x2) = (0 0)
  • \(y_1 = (0\quad 0) \begin{pmatrix} 5 \\ 5 \end{pmatrix} + (-8) = -8\) ➩ \(Sigmoid(-8) \approx 0\);
  • \(y_2 = (0\quad 0) \begin{pmatrix} -7 \\ -7 \end{pmatrix} + (3) = 3\) ➩ \(Sigmoid(3) \approx 1\);
  • \((y_1\quad y_2) \begin{pmatrix} -11 \\ -11 \end{pmatrix} + (6) = -11 + 6 = -5\) ➩ \(Sigmoid(-5) \approx 0\)
(x1 x2) = (0 1)
  • \(y_1 = (0\quad 1) \begin{pmatrix} 5 \\ 5 \end{pmatrix} + (-8) = -3\) ➩ \(Sigmoid(-3) \approx 0\); \(y_1 = 0\)
  • \(y_2 = (0\quad 1) \begin{pmatrix} -7 \\ -7 \end{pmatrix} + (3) = -4\) ➩ \(Sigmoid(-4) \approx 0\); \(y_2 = 0\)
  • \((y_1\quad y_2) \begin{pmatrix} -11 \\ -11 \end{pmatrix} + (6) = 0 + 6 = 6\) ➩ \(Sigmoid(6) \approx 1\)
(x1 x2) = (1 0)
  • \(y_1 = (1\quad 0) \begin{pmatrix} 5 \\ 5 \end{pmatrix} + (-8) = -3\) ➩ \(Sigmoid(-3) \approx 0\); \(y_1 = 0\)
  • \(y_2 = (1\quad 0) \begin{pmatrix} -7 \\ -7 \end{pmatrix} + (3) = -4\) ➩ \(Sigmoid(-4) \approx 0\); \(y_2 = 0\)
  • \((y_1\quad y_2) \begin{pmatrix} -11 \\ -11 \end{pmatrix} + (6) = 0 + 6 = 6\) ➩ \(Sigmoid(6) \approx 1\)
(x1 x2) = (1 1)
  • \(y_1 = (1\quad 1) \begin{pmatrix} 5 \\ 5 \end{pmatrix} + (-8) = 2\) ➩ \(Sigmoid(2) \approx 1\); \(y_1 = 1\)
  • \(y_2 = (1\quad 1) \begin{pmatrix} -7 \\ -7 \end{pmatrix} + (3) = -11\) ➩ \(Sigmoid(-11) \approx 0\); \(y_2 = 0\)
  • \((y_1\quad y_2) \begin{pmatrix} -11 \\ -11 \end{pmatrix} + (6) = -11 + 6 = -5\) ➩ \(Sigmoid(-5) \approx 0\)
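The hand computation above can be checked with a short script using the same weights and biases (outputs shown before rounding):

import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

W_h = np.array([[5, -7],
                [5, -7]])           # columns: weights producing y1 and y2
b_h = np.array([-8, 3])
W_o = np.array([[-11], [-11]])
b_o = np.array([6])

X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])
hidden = sigmoid(X @ W_h + b_h)     # (y1, y2) for every input pair
out = sigmoid(hidden @ W_o + b_o)   # final output, approximately [0, 1, 1, 0] = XOR
print(np.round(out, 3))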

Forward Propagation

  • can we add another hidden layer?
    [Figure: NeuronToBC]

  • Then a new weight vector \(\begin{pmatrix} ? \\ ? \end{pmatrix}\) and bias \(b\) are required for the new unit \(S\), and the existing weight vector \(W=\begin{pmatrix} -11 \\ -11 \end{pmatrix}\) (red box) must be extended ➩ \(W = \begin{pmatrix} -11 \\ -11 \\ ? \end{pmatrix}\)
    [Figure: NeuronToBC]

Toy Model

[Figure: ToyModel]

  • although hidden layer 1 and hidden layer 2 look alike, their weight matrices have different sizes (because their inputs have different sizes; see the sketch below)
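A minimal sketch of this point (the layer sizes below are hypothetical, since they only appear in the figure): the two hidden layers play the same role, but their weight matrices have different shapes because their inputs differ in size.

import torch

W_h1 = torch.randn([2, 3])   # hidden layer 1: maps 2 inputs -> 3 units
W_h2 = torch.randn([3, 4])   # hidden layer 2: maps 3 units  -> 4 units (a different shape)
W_o  = torch.randn([4, 1])   # output layer  : maps 4 units  -> 1 output
print(W_h1.shape, W_h2.shape, W_o.shape)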

PyTorch implementation for ANN (XOR)🔥

import torch
import numpy as np

# Training Data
x_train = torch.FloatTensor([[0, 0], [0, 1], [1, 0], [1, 1]]) #XOR DATA
y_train = torch.FloatTensor([[0], [1], [1], [0]])

nHL = 3

W_h = torch.randn([2, nHL], requires_grad=True) # hidden layer weights
b_h = torch.randn([nHL], requires_grad=True)    # hidden layer bias
W_o = torch.randn([nHL, 1], requires_grad=True) # output layer weights
b_o = torch.randn([1], requires_grad=True)      # output layer bias

optimizer = torch.optim.SGD([W_h, W_o, b_h, b_o], lr = 0.01) 

def model_ANN(x):
  HL1 = torch.sigmoid(torch.matmul(x, W_h) + b_h) # hidden layer with nHL (= 3) units (created first)
  Out = torch.sigmoid(torch.matmul(HL1, W_o) + b_o)
  return Out

for step in range(200000):
  prediction = model_ANN(x_train)
  cost = torch.mean( (-1) * ((y_train*torch.log(prediction) + (1-y_train)*torch.log(1-prediction))))
  optimizer.zero_grad() # reset accumulated gradients to zero
  cost.backward()       
  optimizer.step()

model_test = model_ANN(x_train)
print(model_test.detach().numpy())

Code Explanation

nHL = 3

W_h = torch.randn([2, nHL], requires_grad=True) # hidden layer weights
b_h = torch.randn([nHL], requires_grad=True)    # hidden layer bias

[Figure: CodeExplanation]

def model_ANN(x):
  HL1 = torch.sigmoid(torch.matmul(x, W_h) + b_h) # hidden layer with nHL (= 3) units (created first)
  Out = torch.sigmoid(torch.matmul(HL1, W_o) + b_o)
  return Out
  • input = [ [0, 0], [0, 1], [1, 0], [1, 1] ]

  • Output when the hidden layer is not added:
    • [0.5, 0.5, 0.5, 0.5] => none is > 0.5
      • => [0, 0, 0, 0]
      • ACTUAL: [0, 1, 1, 0] (50% accuracy) => HIDDEN LAYER REQUIRED
  • Hidden Layer output :
    • [ 0.001413, 0.9953.., 0.993166..., 0.0079 ] => [ 0, 1, 1, 0 ] => CORRECT
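As a small follow-up (using model_test and y_train from the code above), the sigmoid outputs can be thresholded at 0.5 to get the hard 0/1 predictions and an accuracy figure:

pred_labels = (model_test > 0.5).int()                  # probabilities -> 0/1 labels
accuracy = (pred_labels == y_train.int()).float().mean()
print(pred_labels.flatten().tolist(), accuracy.item())  # expected: [0, 1, 1, 0] and 1.0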

Further ANNs

Wide ANN for XOR

(Reference: TensorFlow)
[Figure: CodeExplanation]

Deep ANN for XOR

(Reference: TensorFlow)
[Figure: CodeExplanation]

  • The more layers, the better (a PyTorch sketch of the idea follows below)
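The referenced code is in TensorFlow; a rough PyTorch sketch of the same idea (the widths and depths below are illustrative only): a wide ANN uses more units per hidden layer, while a deep ANN stacks more hidden layers.

import torch
import torch.nn as nn

wide_xor = nn.Sequential(             # wide: one hidden layer with many units
    nn.Linear(2, 10), nn.Sigmoid(),
    nn.Linear(10, 1), nn.Sigmoid(),
)
deep_xor = nn.Sequential(             # deep: several stacked hidden layers
    nn.Linear(2, 10), nn.Sigmoid(),
    nn.Linear(10, 10), nn.Sigmoid(),
    nn.Linear(10, 10), nn.Sigmoid(),
    nn.Linear(10, 1), nn.Sigmoid(),
)
x_train = torch.FloatTensor([[0, 0], [0, 1], [1, 0], [1, 1]])
print(wide_xor(x_train).shape, deep_xor(x_train).shape)  # torch.Size([4, 1]) twice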

Gradient Vanishing Problem

[Figure: CodeExplanation]

  • No matter how many layers there are, accuracy can still be low
  • ex) an input of 100000 is squashed into 0 ~ 1 by the sigmoid -> repeated layer after layer -> … -> the signal (and its gradient) disappears (converges to 0)

[Figure: CodeExplanation]
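A minimal, purely illustrative sketch of the effect (the depth and layer sizes are made up): stacking many sigmoid layers makes the gradient at the first layer far smaller than at the last one.

import torch

x = torch.randn(1, 2)
W = [torch.randn(2, 2, requires_grad=True) for _ in range(20)]  # 20 stacked sigmoid layers

h = x
for Wi in W:
    h = torch.sigmoid(h @ Wi)          # each sigmoid squashes values into (0, 1)

h.sum().backward()
print(W[0].grad.abs().max().item())    # first layer : gradient is typically vanishingly small
print(W[-1].grad.abs().max().item())   # last layer  : gradient is much larger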

ReLU (Rectified Linear Unit)

  • solves the Gradient Vanishing Problem

[Figure: CodeExplanation]

  • when activated (input > 0), the actual value is passed through unchanged, e.g., input 3 returns 3 (see the sketch below)
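A small sketch: ReLU(z) = max(0, z), so positive activations pass through unchanged and the repeated squashing of the sigmoid is avoided. Swapping it into the hidden layer of the XOR model above (W_h, b_h, W_o, b_o as defined in the earlier code) could look like this:

import torch

print(torch.relu(torch.tensor([-2.0, 0.0, 3.0])))       # tensor([0., 0., 3.]) -> input 3 returns 3

def model_ANN_relu(x):
    HL1 = torch.relu(torch.matmul(x, W_h) + b_h)         # ReLU hidden layer instead of sigmoid
    Out = torch.sigmoid(torch.matmul(HL1, W_o) + b_o)    # sigmoid kept at the output for a 0/1 label
    return Out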

Deep Learning Revolution

| 50 Years Ago | Now |
| --- | --- |
| labeled datasets too small | Big Data |
| computers too slow | GPUs |
| only considered 1-D vector input | Convolutional layers for n-D inputs (hidden layers) |
| wrong type of non-linearity (activation function) | ReLU for the gradient vanishing problem |

Deep Learning Review

Deep Learning Computation Procedure

  1. Deep Learning Model Setup
    • decide which to use: MLP, CNN, RNN, GAN, or a customized model
    • Number of Hidden Layers, Units, Input/Outputs…
    • Cost Function / Optimizer Selection
  2. Training (with Large-Scale Dataset)
    • Input Data, Output: Labels
    • Learning -> Weights Updates (\(W\) and \(b\)) for Cost Function Minimization
  3. Inference / Testing (Real-World Execution)
    • Use \(W\) and \(b\) (optimized in step #2) to compute the output for each input
    • Input : Real-World Input Data
    • Output: Inference Results based on Updated Weights in Deep NN
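A minimal end-to-end sketch of these three steps for the XOR example, using torch.nn modules (the architecture, cost function, and hyperparameters are illustrative choices, and convergence depends on the random initialization):

import torch
import torch.nn as nn

# 1. Model setup: architecture (MLP), cost function, optimizer
model = nn.Sequential(nn.Linear(2, 3), nn.Sigmoid(), nn.Linear(3, 1), nn.Sigmoid())
cost_fn = nn.BCELoss()
optimizer = torch.optim.SGD(model.parameters(), lr=1.0)

# 2. Training: input data + labels, update W and b to minimize the cost
x_train = torch.FloatTensor([[0, 0], [0, 1], [1, 0], [1, 1]])
y_train = torch.FloatTensor([[0], [1], [1], [0]])
for step in range(10000):
    cost = cost_fn(model(x_train), y_train)
    optimizer.zero_grad()
    cost.backward()
    optimizer.step()

# 3. Inference / testing: run (real-world) input through the optimized weights
with torch.no_grad():
    print(model(torch.FloatTensor([[1, 0]])))  # expected to be close to 1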