WASET
	%0 Journal Article
	%A Zhifeng Kong
	%D 2020
	%J International Journal of Computer and Information Engineering
	%B World Academy of Science, Engineering and Technology
	%I Open Science Index 162, 2020
	%T Convergence Analysis of Training Two-Hidden-Layer Partially Over-Parameterized ReLU Networks via Gradient Descent
	%U https://publications.waset.org/pdf/10011232
	%V 162
	%X Over-parameterized neural networks have attracted a
great deal of attention in recent deep learning theory research,
as they challenge the classical view that models with excessive
parameters over-fit, and they have achieved empirical success in
various settings. While a number of theoretical works have been
presented to demystify the properties of such models, their
convergence properties are still far from thoroughly understood.
In this work, we study the convergence properties of training
two-hidden-layer, partially over-parameterized, fully connected
networks with the Rectified Linear Unit (ReLU) activation via
gradient descent. To our knowledge, this is the first theoretical
work to analyze the convergence properties of deep
over-parameterized networks without the equally-wide-hidden-layer
assumption or other unrealistic assumptions. We provide a
probabilistic lower bound on the widths of the hidden layers and
prove a linear convergence rate for gradient descent. We also
conduct experiments on synthetic and real-world datasets to
validate our theory.
	%P 166-177