Comparative Analysis of Optimizers in Deep Neural Networks

Chitra Desai

Publication Date: 2020/11/05

Abstract: The choice of optimizer in a deep neural network model impacts the accuracy of the model. Deep learning falls under the umbrella of parametric approaches; however, it tries to relax as many assumptions as possible. The process of obtaining parameters from data is gradient descent, which is the optimizer of choice in neural networks and many other machine learning algorithms. The classical stochastic gradient descent (SGD) and SGD with momentum used in deep neural networks face several challenges, which adaptive learning optimizers attempt to resolve. Adaptive learning algorithms such as RMSprop, Adagrad, and Adam, in which a learning rate is computed for each parameter, were further developments toward a better optimizer. The Adam optimizer, a combination of RMSprop and momentum, has recently been observed to be a frequent default choice in deep neural networks. Although Adam has gained popularity since its introduction, there are claims reporting convergence problems with it, and it has been advocated that SGD with momentum gives better performance than Adam. This paper presents a comparative analysis of the SGD, SGD with momentum, RMSprop, Adagrad, and Adam optimizers on the Seattle weather dataset. The dataset was processed with the expectation that Adam, being the default choice preferred by many, would prove to be the better optimizer; however, SGD with momentum proved to be the unsurpassed optimizer for this particular dataset.
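
For context, the update rules behind the optimizers named in the abstract can be sketched compactly. The following Python/NumPy snippet is not taken from the paper: it applies each rule to a toy quadratic loss rather than to the Seattle weather dataset, and the hyperparameter values (learning rate, momentum 0.9, Adam's 0.9/0.999 decay rates) are common defaults used here purely as illustrative assumptions.

```python
# Minimal sketch of the update rules behind SGD, SGD with momentum,
# Adagrad, RMSprop and Adam, applied to the toy loss f(w) = 0.5*||w||^2
# (whose gradient is simply w). Hyperparameters are illustrative defaults.
import numpy as np

def gradient(w):
    # Gradient of the toy loss 0.5 * ||w||^2
    return w.copy()

def run(optimizer, steps=100, lr=0.1):
    w = np.array([1.0, -2.0])   # arbitrary starting point
    m = np.zeros_like(w)        # momentum / first-moment buffer
    v = np.zeros_like(w)        # squared-gradient / second-moment buffer
    eps = 1e-8
    for t in range(1, steps + 1):
        g = gradient(w)
        if optimizer == "sgd":
            w -= lr * g
        elif optimizer == "momentum":      # SGD with momentum (beta = 0.9)
            m = 0.9 * m + g
            w -= lr * m
        elif optimizer == "adagrad":       # accumulate squared gradients
            v += g ** 2
            w -= lr * g / (np.sqrt(v) + eps)
        elif optimizer == "rmsprop":       # exponential average of squared gradients
            v = 0.9 * v + 0.1 * g ** 2
            w -= lr * g / (np.sqrt(v) + eps)
        elif optimizer == "adam":          # momentum + RMSprop with bias correction
            m = 0.9 * m + 0.1 * g
            v = 0.999 * v + 0.001 * g ** 2
            m_hat = m / (1 - 0.9 ** t)
            v_hat = v / (1 - 0.999 ** t)
            w -= lr * m_hat / (np.sqrt(v_hat) + eps)
    return w

for name in ["sgd", "momentum", "adagrad", "rmsprop", "adam"]:
    print(name, run(name))
```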

Keywords: Gradient Descent, SGD with momentum, RMSprop, Adagrad, Adam.

DOI: No DOI Available

PDF: https://ijirst.demo4.arinfotech.co/assets/upload/files/IJISRT20OCT608.pdf
