Dev/Machine Learning

[Coursera] Machine Learning Week 1

NillK 2016. 7. 18. 22:05

Introduction

Welcome

수많은 인공 지능 연구자들의 목표는 인간의 뇌를 흉내내서 만들어 인간의 뇌와 비슷하게 학습하게 만드는 것
알고리즘을 사용할 것임 -> 어떻게 유도되는지는 알 필요 없음

Introduction

Welcome

Machine Learning

Grew out of work in AI
New capability for computers

Machine Learning Examples

Database mining(데이터베이스 수집)
- Large datasets from growth of automation/web.
- E.g. Web click data, medical records, biology, engineering
Applications can't program by hand
- E.g. Autonomous helicopter, handwriting recognition, most of Natural Language Processing(NLP), Computer Vision
Self-customizing programs
- E.g., Amazon, Netflix product recommendations
Understanding human learning (brain, real AI)

What is Machine Learning?

Machine Learning에 대한 두 개의 정의가 있음

The field of study that gives computers the ability to learn without being explicitly programmed - Arthur Samuel

A computer program is said to learn from experience E with respect to some class of tasks T and performance measure P, if its performance at tasks in T, as measured by P, improves with experience E - Tom Mitchell

앞으로 기계 학습 문제의 종류들과 알고리즘에 대해 이야기해볼것임. 일반 적으로 Machine Learning은 크게 두 개로 나뉜다.

Supervised Learning
Unsupervised Learning

뭐 그 외 Reinforcement Learning이나 Recommender systems 등이 있는데 나중에 얘기할래.
주요 기계학습기술들을 선택하고, 여러 알고리즘 중 하나를 어떻게 선택해야 하는지, 언제 그 알고리즘을 적절하게 사용할 것인지에 대해서 배울 것이다.

Supervised Learning

right answers given

데이터에 직선을 그릴지 이차 곡선을 그릴지, 어떻게 선택하고 결정하느냐
Supervised Learning이라는 건 결국 우리가 알고리즘에게 정확한 답을 알고 있는 데이터(data set)를 준다는 데서 유래

Regression Problem

Regression: Predict continuous valued output(price)
연속적인 값 예측

Classification

Discrete valued output (0 or 1)
이산적인 결과값 예측. 가능한 값이 2개 이상일 수 있다.

Machine Learning에서는 하나 이상의 feature가 사용될 수 있다. 무한히 많은 feature도 가능해 Age, Tumor Size, Clump thickness, Uniformity of Cell Size, Uniformity of Cell Shape..

우리는 어떻게 수많은 feature들을 다룰 수 있을까?

Unsupervised Learning

Supervised Learning은 label이 있었고 right answers 가 주어졌었지만, Unsupervised Learning은 그와 다르게 Label이 없음
라벨이 없기 때문에 답도 없고, 뭣도 없고 그냥 알고리즘한테 여기 데이터셋이 있어. 여기서 어떤 구조를 찾아낼 수 있겠니? 라고 한다.
Clustering

Linear Regression with One Variable

Model and Cost Function

Model Representation

Housing Prices

Supervised Learning : Given the right answer for each example in the data
- data set == training set
Regression Problem : Predict real-valued output
Notation:
- m = Number of training examples
- $x's$ = "input" variables / features
- $y's$ = "output" variable / "target" variable
- $(x, y)$ = single training example (single row in data set)
- $(x^{(i)}$, $y^{(i)})$ = $i^{th}$ training example ($i$ is not exponentiation, just index)

Supervised Learning Algorithm Work

Training Set -> Learning Algorithm -> h (hypothesis) : h maps from Size of house(input) to Estimated price(output)

How do we represent h?

$h_\theta(x) = \theta_0 + \theta_1x$ (x에 대한 1차 함수 형태)
Linear regression with one variable
Univariate linear regression (단일 변량 linear regression)

Cost Function

Cost function을 사용하면 주어진 데이터에 가장 가까운 일차함수 그래프를 알아낼 수 있음

Hypothesis: $h_\theta(x) = \theta_0 + \theta_1x$
$\theta_i$'s : Parameters

How to choose $\theta_i$'s?
Idea: Choose $\theta_0, \theta_1$ so that $h_\theta(x)$ is close to $y$ for our training examples $(x, y)$

I want the difference between h(x) and y to be small

$$\sum_{i = 1}^{m}(h_\theta(x^{(i)}) - y^{(i)})^2$$

$$\frac{1}{2m}\sum_{i = 1}^{m}(h_\theta(x^{(i)}) - y^{(i)})^2$$

$$J(\theta_0, \theta_1) = \frac{1}{2m}\sum_{i = 1}^{m}(h(x^{(i)}) - y^{(i)})^2$$ = cost function = squared error function = mean squared error

Our goal : minimize $J(\theta_0, \theta_1)$

Cost Function - Intuition 1

Simplify h
$h(x)$ : 1차 함수
$J(\theta)$ : 2차 함수

	Origin	Simplified
Hypothesis	$h_\theta(x) = \theta_0 + \theta_1x$	$h_\theta(x) = \theta_1x$
Parameters	$\theta_0, \theta_1$	$\theta_1$
Cost Function	$J(\theta_0, \theta_1) = \frac{1}{2m}\sum_{i = 1}^{m}(h(x^{(i)}) - y^{(i)})^2$	$J(\theta_1) = \frac{1}{2m}\sum_{i = 1}^{m}(h(x^{(i)}) - y^{(i)})^2$
Goal	minimize $J(\theta_0, \theta_1)$	minimize $J(\theta_1)$

$\theta_0$를 없앴기 때문에 여기서 그리는 $h_\theta(x)$는 무조건 원점을 지나는 직선을 그리게 되는데, 여기서 최소값을 구하면 데이터와 형태는 일치한다. => 그럼 데이터에 맞춰 $\theta_1$ 값을 정하면 되겠지?

Cost Function - Intuition 2

use parameter $\theta_1$ and $\theta_2$
두 파라미터를 모두 사용하면 등고선 함수와 같은 형태가 나옴. 같은 등고선 지점에 있는 점은 같은 J값을 얻게 되겠지? 그리고 J를 최소화할 수 있는 지점은 등고선의 가장 낮은 지점일거야!

Parameter Learning

Gradient Descent

Gradient Descent = algorithm to minimize cost

그리고 얜 굉장히 일반적인 애라서 linear regression을 제외한 많은 ML 알고리즘에도 쓰임

Have some function $J(\theta_0, \theta_1)$
Want min $J(\theta_0, \theta_1)$

$\theta_0, \theta_1$를 계속 바꿔가면서 J를 최소화시켜보자!

우리가 $\theta_0, \theta_1, J(\theta_0, \theta_1)$ 으로 이루어진 삼차원 면 위에 있다고 했을 때, Gradient Descent는 한 지점에서 주위를 차분히 둘러본다. 우리가 어느 방향으로 가야 아래로 내려갈 수 있을지를 생각하면서. 그리고 대략적인 방향을 찾으면 또 다시 방향을 찾아서 한발자욱 간다.
처음 지점에 따라 마지막에 도착하는 지점이 다를 수 있다. (local optimum)

Gradient descent algorithm

Definition of Gradient descent algorithm

\begin{eqnarray} \text{repeat until convergence \{ } \\ \theta_j & := & \theta_j - \alpha\frac{\partial}{\partial\theta_j}J(\theta_0, \theta_1) \quad \text{(for j = 0 and j = 1)}\\ \text{ \}} \end{eqnarray}

$\alpha$(learning rate) : 기본적으로 Gradient descent에서 아래로 내려가는 보폭을 얼만큼의 크기로 할 것인지를 조절

$\theta_0$ 랑 $\theta_1$ 는 동시에 계산해서 다음 단계로 넘어간다. 다시 말해 미리 $\theta_0$를 계산해서 그 값을 $\theta_1$ 계산식에 대입하는 게 아니라, 이전 $\theta_0, \theta_1$을 이용해 일단 다음 값을 계산한 다음에 대입할 것

Gradient Descent Intuition

이전에 있던 $\frac{\partial}{\partial\theta_j}$ derivative term 은 $\theta_j$ 지점에서의 미분값. 따라서 $\theta_0$를 0으로 해서 단순화 했던 그래프 $J(\theta_1)$(2차 함수 그래프)에서 이 식을 계산해보면 $\theta$는 점점 가운데 꼭지점을 향해 감

If $\alpha$(learning rate) is too small, gradient descent can be slow.
If $\alpha$ is too large, gradient descent can overshoot the minimum. It may fail to converge, or even diverge (minimum에서 갈수록 멀어질 수도 있음 핑퐁처럼)
If your parameters are already at a local minimum one step with gradient descent does absolutely nothing it doesn't change your parameter
Gradient descent can converge to a local minimum, even with the learning rate $\alpha$ fixed. As we approach a local minimum, gradient descent will automatically take smaller steps. (기울기가 감소하니까!) So, no need to decrease $\alpha$ over time.

Gradient Descent For Linear Regression

$$\frac{\partial}{\partial\theta_j}J(\theta_0, \theta_1) = \frac{\partial}{\partial\theta_j}\frac{1}{2m}\sum_{i = 1}^{m}(h_\theta(x^{(i)}) - y^{(i)})^2 = \frac{\partial}{\partial\theta_j}\frac{1}{2m}\sum_{i = 1}^{m}(\theta_0 + \theta_1 x^{(i)} - y^{(i)})^2$$ \begin{eqnarray} j &=& 0 : \frac{\partial}{\partial\theta_0}J(\theta_0, \theta_1) = \frac{1}{m}\sum_{i = 1}^{m}(h_\theta(x^{(i)}) - y^{(i)}) \\ j &=& 1 : \frac{\partial}{\partial\theta_1}J(\theta_0, \theta_1) = \frac{1}{m}\sum_{i = 1}^{m}(h_\theta(x^{(i)}) - y^{(i)})x^{(i)} \end{eqnarray} apply gradient descent algorithm to minimize our squared error cost function

cost function for linear regression is always going to be a bow shaped function (convex function)
"Batch" Gradient Descent
- "Batch" : Each step of gradient descent uses all the training examples.
- 전체 셋을 보는 대신 subset을 보는 형태의 gradient descent도 있음

Linear Algebra Review

Matrices and Vectors

Matrix

Rectangular array of numbers

Dimension of matrix: number of rows X number of columns (ex R(4 X 2), R(2 X 3))
Matrix Elements (entires of matrix)
- $A_{ij}$ = "$i, j$ entry" in the $i^{th}$row, $j^{th}$ column.
- The matrix gets you a way of letting you quickly organize, index and access lots of data.

Vector

An n X 1 matrix

Dimension of matrix : number of rows X 1 (ex R4, R7)
Vector Elements
- $y^i$ = $i^{th}$ element (1-indexed vs 0-indexed)

대문자는 matrix, 소문자는 일반적인 숫자, 자료값, 스칼라값, 혹은 vector

Addition and Scala Multiplication

Matrix Addition

각 행렬 내의 같은 자리에 있는 원소들을 더함 => 같은 차원인 행렬만 더할 수 있음

Scala Multiplication

Scala means real number
숫자를 행렬의 각 자리와 곱해줌 => 원래의 행렬과 같은 차원의 행렬이 답으로 나옴
나누기도 가능 (곱하기 1/n과 같으니까)

Combination of Operands

여러가지 연산자를 조합해서 사용도 가능

Matrix Vector Multiplication

Example

3 X 2 * 2 X 1 = 3 X 1 matrix

Details

m X n matrix(m rows, n columns) * n X 1 matrix(n-dimensional vector) = m-dimensional vector To get $y_i$, multiply A's $i^{th}$ row with elements of vector x, and add them up.

House sizes:

Sizes: 2104, 1416, 1534, 852
h(x) = -40 + 0.25x

$$ \begin{bmatrix} 1 & 2104 \\ 1 & 1416 \\ 1 & 1534 \\ 1 & 852 \\ \end{bmatrix} \times \begin{bmatrix} -40 \\ 0.25 \\ \end{bmatrix} \text{= }h(x)\text{의 결과} $$

software로 작성했을 때, 우리는 4개의 집의 예상 가격을 계산하는 식을 한줄로 작성할 수 있어!
prediction = Data Matrix X parameters
이건 표현식이 간결할 뿐 아니라 계산적으로도 효율적임

Matrix Matrix Multiplication

Example

2 X 3 * 3 X 2 = 2 X 2

Details

A X B = C m X n matrix(m rows, n columns) * n X o matrix(n rows, o columns) = m X o matrix B를 n X 1 vector로 간주하고 각각을 계산하여 합침 The $i^{th}$ column of the matrix C is obtained by multiplying A with the $i^{th}$ column of B. (for i = 1, 2, ... , o)

House sizes:

Sizes: 2104, 1416, 1534, 852
h(x) = -40 + 0.25x
h(x) = 200 + 0.1x
h(x) = -150 + 0.4x

$$ \begin{bmatrix} 1 & 2104 \\ 1 & 1416 \\ 1 & 1534 \\ 1 & 852 \\ \end{bmatrix} \times \begin{bmatrix} -40 & 200 & -150 \\ 0.25 & 0.1 & 0.4 \\ \end{bmatrix} \text{= 각각의 }h(x)\text{ 결과} $$

Matrix Multiplication Properties

Commutative
- Scala(real number) a x b = b x a (Commutative)
- But, Let A and B be matrices. The in general, A x B != B x A (not Commutative)
Associative
- Scala(real number) a x (b x c) = (a x b) x c (Associative)
- A x B x C Let D = B x C. Compute A x D Let E = A x B. Compute E x C (Associative)

Identity Matrix (단위 행렬)

Denoted I (or I n x n) For any matrix A, A x I = I x A = A

Inverse and Transpose

Matrix Inverse

1 = "identity" in real number
each number have an inverse
- ex) $3 \times 3^{-1} = 1$ but $0 \times 0^{-1} = ? \; \Rightarrow \; 0^{-1} = \text{undefined}$ Not all numbers have an inverse.
Matrix inverse: If A is an m x m matrix(square matrix), and if it has an inverse($A^{-1}$), $A \times A^{-1} = A^{-1} \times A = I$

Matrix Transpose

$A \rightarrow A^T$

Let A be an m x n matrix, and let $B = A^T$ The B is an n x m matrix, and $B_{ij} = A_{ji}$

저작자표시 비영리 동일조건 (새창열림)

현재글[Coursera] Machine Learning Week 1

Nillk dev blog

개발자

Tutorial, mac, clean code, Gatsby, Coursera, Scalatest, Programming for Everybody, Git, Environment Variables, translation, homebrew, System-rules, Eclipse, spring, Python, tomcat, machine learning, Java, Yosemite, linear regression,

Today :
Yesterday :

일	월	화	수	목	금	토
		1	2	3	4	5
6	7	8	9	10	11	12
13	14	15	16	17	18	19
20	21	22	23	24	25	26
27	28	29	30	31

[Coursera] Machine Learning Week 1

Introduction

Welcome

Introduction

Welcome

Machine Learning

Machine Learning Examples

What is Machine Learning?

Supervised Learning

Regression Problem

Classification

Unsupervised Learning

Linear Regression with One Variable

Model and Cost Function

Model Representation

Housing Prices

Supervised Learning Algorithm Work

Cost Function

Cost Function - Intuition 1

Cost Function - Intuition 2

Parameter Learning

Gradient Descent

Gradient descent algorithm

Gradient Descent Intuition

Gradient Descent For Linear Regression

Linear Algebra Review

Linear Algebra Review

Matrices and Vectors

Matrix

Vector

Addition and Scala Multiplication

Matrix Addition

Scala Multiplication

Combination of Operands

Matrix Vector Multiplication

Example

Details

House sizes:

Matrix Matrix Multiplication

Example

Details

House sizes:

Matrix Multiplication Properties

Identity Matrix (단위 행렬)

Inverse and Transpose

Matrix Inverse

Matrix Transpose

'Dev/Machine Learning'의 다른글

티스토리툴바