Deep learning: The ReLU function

z2soo 2020. 1. 13. 17:32

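Sigmoid function

Sigmoid function graph

Sigmoid function characteristics

The function value is bounded to the open interval (0, 1).
The midpoint value is 1/2, reached at x = 0.
For very large inputs the function value is close to 1, and for very negative inputs it is close to 0.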
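
The three characteristics above can be checked with a minimal sketch in plain Python. The helper name `sigmoid` and the sample inputs are chosen here for illustration only; the MNIST code in this post uses `tf.sigmoid` instead.

```python
import math


def sigmoid(x):
    """Logistic sigmoid 1 / (1 + exp(-x)), split by sign to avoid overflow."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


# Sample inputs (illustrative): very negative, zero, very positive.
for x in (-10.0, 0.0, 10.0):
    print(f'sigmoid({x}) = {sigmoid(x)}')
```

At x = 0 the value is exactly 0.5, and the values at -10 and 10 sit very close to 0 and 1 respectively.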

MNIST example using sigmoid

In the previous post, we ran deep learning on the MNIST data provided by TensorFlow.

Deep learning here means adding layers, and adding many perceptrons to each layer, so that the model learns better. The code below is deep in that sense, but it still uses the same sigmoid function we have used so far.

# import modules
import tensorflow as tf
from tensorflow.examples.tutorials.mnist import input_data
import warnings
warnings.filterwarnings(action='ignore')

# data loading
mnist = input_data.read_data_sets('./data/mnist', one_hot=True)

# placeholder
X = tf.placeholder(shape=[None, 784], dtype=tf.float32)
Y = tf.placeholder(shape=[None, 10], dtype=tf.float32)

# W,b
W1 = tf.Variable(tf.random_normal(shape=[784,256]), name='weight1')
b1 = tf.Variable(tf.random_normal(shape=[256]), name='bias1')
layer1 = tf.sigmoid(tf.matmul(X,W1) + b1)

W2 = tf.Variable(tf.random_normal(shape=[256, 256]), name='weight2')
b2 = tf.Variable(tf.random_normal(shape=[256]), name='bias2')
layer2 = tf.sigmoid(tf.matmul(layer1,W2) + b2)

W3 = tf.Variable(tf.random_normal(shape=[256, 10]), name='weight3')
b3 = tf.Variable(tf.random_normal(shape=[10]), name='bias3')

# Hypothesis
logit = tf.matmul(layer2, W3) + b3
H = tf.nn.softmax(logit)

# cost
cost = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits_v2(logits=logit, labels=Y))

# train
train = tf.train.GradientDescentOptimizer(learning_rate=0.1).minimize(cost)

# session and variable initialization
sess = tf.Session()
sess.run(tf.global_variables_initializer())

# training
num_of_epoch = 30
batch_size = 100
# number of mini-batches per epoch; defined before the inner loop that uses it
num_of_iter = int(mnist.train.num_examples / batch_size)

for step in range(num_of_epoch):
    cost_val = 0

    for i in range(num_of_iter):
        # fetch a fresh mini-batch on every iteration
        batch_x, batch_y = mnist.train.next_batch(batch_size)
        _, cost_val = sess.run([train, cost], feed_dict={X: batch_x,
                                                    Y: batch_y})
    if step % 3 == 0:
        print(f'cost: {cost_val}')

# accuracy
predict = tf.argmax(H,1)
correct = tf.equal(predict, tf.argmax(Y,1))
accuracy = tf.reduce_mean(tf.cast(correct, dtype=tf.float32))
test_acc = sess.run(accuracy, feed_dict={X: mnist.test.images, Y: mnist.test.labels})
print(f'accuracy: {test_acc}')

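Even compared with multinomial logistic regression using only a single layer, the accuracy of the multi-layer deep learning above does not improve much.

Limitations of the sigmoid function

The sigmoid function was adopted because a linear model is limited in representing targets that are 0 or 1. Its output lies between 0 and 1, so the value can be read as a probability. Moving on to multinomial logistic regression produced several such probabilities, and the softmax function was used so that they sum to 1. Softmax thus gives a probability for each class. However, as layers are added, values pass through the sigmoid function more times (sigmoid is applied in a nested fashion). During backpropagation the gradient is multiplied by the sigmoid derivative at every layer, and that derivative is at most 1/4, so the gradient shrinks toward 0 (vanishing gradient: the gradient fades away). To address this limitation the ReLU function was introduced, and multi-layer deep learning began to use ReLU instead of sigmoid.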
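
As a rough illustration of the vanishing gradient, the sketch below multiplies the largest possible sigmoid derivative across several layers. It deliberately ignores the weight matrices, which also scale the gradient in a real network, so it shows only the activation's contribution. The function names are hypothetical and not part of the MNIST code.

```python
import math


def sigmoid(x):
    """Logistic sigmoid, split by sign to avoid overflow."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def sigmoid_grad(x):
    """Derivative of the sigmoid: s(x) * (1 - s(x)); its maximum is 0.25 at x = 0."""
    s = sigmoid(x)
    return s * (1.0 - s)


def max_gradient_factor(num_layers):
    """Best-case product of sigmoid derivatives over num_layers layers (weights ignored)."""
    return sigmoid_grad(0.0) ** num_layers


for n in (1, 2, 5, 10):
    print(f'{n} layers: factor <= {max_gradient_factor(n)}')
```

Even in this best case the factor drops below one millionth after ten sigmoid layers, which is why the early layers of a deep sigmoid network receive very little gradient.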

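ReLU function

ReLU function graph

ReLU function characteristics

For x > 0 it is a straight line with slope 1, and for x < 0 the function value is 0.
Training is faster than with the sigmoid function.
The computational cost is low and the implementation is very simple.
Because the gradient is 0 for x < 0, a neuron can die (stop updating), which is a drawback.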
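
The characteristics above can be sketched in a few lines of plain Python. The names `relu` and `relu_grad` are illustrative, and setting the derivative at exactly x = 0 to 0 is a convention assumed here, not something the post specifies.

```python
def relu(x):
    """ReLU: returns x for positive inputs and 0 otherwise."""
    return max(0.0, x)


def relu_grad(x):
    """Derivative of ReLU; the value at x == 0 is set to 0 by convention (assumption)."""
    return 1.0 if x > 0 else 0.0


for x in (-3.0, 0.0, 2.5):
    print(f'x={x}: relu={relu(x)}, grad={relu_grad(x)}')

# On the positive side the gradient stays 1 no matter how many layers it passes through.
print(relu_grad(2.5) ** 10)
```

A neuron whose input stays negative always has gradient 0, so its weights stop being updated. This is the dying-neuron drawback listed above.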

MNIST example using ReLU

Let's reimplement the sigmoid example above using the ReLU function.

# import modules
import tensorflow as tf
from tensorflow.examples.tutorials.mnist import input_data
import warnings
warnings.filterwarnings(action='ignore')

# data loading
mnist = input_data.read_data_sets('./data/mnist', one_hot=True)

# placeholder
X = tf.placeholder(shape=[None, 784], dtype=tf.float32)
Y = tf.placeholder(shape=[None, 10], dtype=tf.float32)

When ReLU is used, only the hidden-layer expressions in the W, b step change. The rest of the process is the same.

tf.nn.relu( tf.matmul( X, W ) + b ) : use relu instead of sigmoid in the hidden-layer expressions
tf.nn.softmax( logit ) : keep softmax for the output H. Applying relu to the logits would clip every negative logit to 0, so H would no longer be a probability and argmax could tie.

# W,b
W1 = tf.Variable(tf.random_normal(shape=[784,256]), name='weight1')
b1 = tf.Variable(tf.random_normal(shape=[256]), name='bias1')

# use relu here
# the formula is just max(0, x), but the built-in tf.nn.relu is used
layer1 = tf.nn.relu(tf.matmul(X,W1) + b1)

W2 = tf.Variable(tf.random_normal(shape=[256, 256]), name='weight2')
b2 = tf.Variable(tf.random_normal(shape=[256]), name='bias2')
layer2 = tf.nn.relu(tf.matmul(layer1,W2) + b2)

W3 = tf.Variable(tf.random_normal(shape=[256, 10]), name='weight3')
b3 = tf.Variable(tf.random_normal(shape=[10]), name='bias3')


# Hypothesis
logit = tf.matmul(layer2, W3) + b3
H = tf.nn.softmax(logit)
# relu replaces sigmoid in the hidden layers only; the output layer keeps softmax

The rest of the process is the same, except that the learning rate is lowered from 0.1 to 0.01.

# cost 
cost = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits_v2(logits=logit, labels=Y)) 

# train 
train = tf.train.GradientDescentOptimizer(learning_rate=0.01).minimize(cost) 

# session and variable initialization
sess = tf.Session() 
sess.run(tf.global_variables_initializer()) 

# training
num_of_epoch = 30
batch_size = 100
# number of mini-batches per epoch; defined before the inner loop that uses it
num_of_iter = int(mnist.train.num_examples / batch_size)

for step in range(num_of_epoch):
    cost_val = 0

    for i in range(num_of_iter):
        # fetch a fresh mini-batch on every iteration
        batch_x, batch_y = mnist.train.next_batch(batch_size)
        _, cost_val = sess.run([train, cost], feed_dict={X: batch_x,
                                                    Y: batch_y})
    if step % 3 == 0:
        print(f'cost: {cost_val}')

# accuracy
predict = tf.argmax(H,1) 
correct = tf.equal(predict, tf.argmax(Y,1)) 
accuracy = tf.reduce_mean(tf.cast(correct, dtype=tf.float32)) 

test_acc = sess.run(accuracy, feed_dict={X: mnist.test.images, Y: mnist.test.labels})
print(f'accuracy: {test_acc}')

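Even with the ReLU function, the accuracy still does not improve much. Recall from the earlier explanation the two limitations that Hinton pointed out:

1. The limitation of single-layer learning

2. The limitation of arbitrarily chosen initial values

When the initial values are random, training goes well if the W values happen to be favorable and goes poorly otherwise. In 2010 a method called Xavier initialization was published, and in 2015 a method called He initialization was published. Research on this is still ongoing. Initial value settings will be covered in the next post.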
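
To make the sensitivity to initial values concrete, here is a minimal sketch in plain Python. It is an assumption-laden illustration, not the method of the next post. It estimates the spread of a single pre-activation z = sum(w_i * x_i) for a 784-input layer, with every input fixed to 1 as a worst case for pixels in [0, 1]. It compares unit-variance weights, which is the default `stddev` of `tf.random_normal`, with weights scaled by sqrt(2 / fan_in), the scaling associated with He initialization. The helper name and the trial count are hypothetical.

```python
import math
import random


def preactivation_std(fan_in, weight_std, trials=500, seed=0):
    """Empirical std of z = sum_i w_i * x_i with x_i = 1 and w_i ~ N(0, weight_std^2)."""
    rng = random.Random(seed)
    zs = [sum(rng.gauss(0.0, weight_std) for _ in range(fan_in))
          for _ in range(trials)]
    mean = sum(zs) / trials
    return math.sqrt(sum((z - mean) ** 2 for z in zs) / trials)


FAN_IN = 784  # number of MNIST input pixels, as in the code above

std_unit = preactivation_std(FAN_IN, 1.0)
std_scaled = preactivation_std(FAN_IN, math.sqrt(2.0 / FAN_IN))

print(f'weights ~ N(0, 1):        std of z ~ {std_unit:.2f}')
print(f'weights ~ N(0, 2/fan_in): std of z ~ {std_scaled:.2f}')
```

With unit-variance weights the pre-activation spread grows like sqrt(fan_in), so the first layer already produces very large values. Scaling the weights by the layer width keeps the spread near 1.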
