WASET
	%0 Journal Article
	%A Alexander Goncharenko and  Andrey Denisov and  Sergey Alyamkin and  Evgeny Terentev
	%D 2019
	%J International Journal of Computer and Information Engineering
	%B World Academy of Science, Engineering and Technology
	%I Open Science Index 153, 2019
	%T Fast Adjustable Threshold for Uniform Neural Network Quantization
	%U https://publications.waset.org/pdf/10010747
	%V 153
	%X Neural network quantization is a highly desirable procedure to perform
before running neural networks on mobile devices. Quantization without
fine-tuning leads to an accuracy drop, whereas the commonly used training
with quantization is performed on the full labeled dataset and is therefore
both time- and resource-consuming. Real-life applications require a simplified
and accelerated quantization procedure that maintains the accuracy of the
full-precision neural network, especially for modern mobile architectures such
as MobileNet-v1, MobileNet-v2 and MNAS. Here we present a method that
significantly optimizes the training-with-quantization procedure by
introducing trained scale factors for the discretization thresholds, with a
separate factor for each filter. Using the proposed technique, we quantize
modern mobile network architectures with a training set of only ∼10% of the
total ImageNet 2012 sample. This reduction of the training set size and the
small number of trainable parameters allow the network to be fine-tuned within
several hours while maintaining the high accuracy of the quantized model (the
accuracy drop was less than 0.5%). Ready-to-use models and code are available
in the GitHub repository.
	%P 491 - 495
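
As a rough illustration of the idea summarized in the abstract (not the
authors' implementation; their ready-to-use models and code are in the linked
GitHub repository), the sketch below shows per-filter uniform quantization in
which an initial max-abs threshold for each filter is multiplied by a
trainable scale factor. The function and parameter names and the NumPy
formulation are assumptions made for this example only.

    import numpy as np

    def quantize_per_filter(weights, threshold_scales, bits=8):
        # weights: array of shape (num_filters, ...); one threshold per filter.
        # threshold_scales: per-filter multipliers standing in for the trained
        # threshold scale factors described in the abstract (hypothetical names).
        qmax = 2 ** (bits - 1) - 1                    # 127 for 8-bit signed values
        flat = weights.reshape(weights.shape[0], -1)
        base = np.abs(flat).max(axis=1)               # initial per-filter thresholds
        thresholds = base * threshold_scales          # adjusted by the trained factors
        step = thresholds / qmax                      # uniform quantization step per filter
        q = np.clip(np.round(flat / step[:, None]), -qmax, qmax)
        dequantized = (q * step[:, None]).reshape(weights.shape)
        return q.astype(np.int8), dequantized

In such a scheme, fine-tuning would update only the small set of per-filter
scale factors rather than all network weights, which is consistent with the
abstract's claim of a small number of trainable parameters.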