Personal Information Classification Based on Deep Learning in Automatic Form Filling System

Authors: Shunzuo Wu, Xudong Luo, Yuanxiu Liao


Recently, the rapid development of deep learning has allowed artificial intelligence (AI) to penetrate many fields, replacing manual work there. In particular, AI systems have become a research focus in office automation. To meet real needs in office automation, in this paper we develop an automatic form filling system. Specifically, it uses two classical neural network models and several word embedding models to classify various kinds of relevant personal information collected from the Internet. When training the neural network models, we use balanced data with less noise. We conduct a series of experiments to test our system, and the results show that it achieves better classification results.

Keywords: Personal information, deep learning, auto fill, NLP, document analysis.
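The abstract states that the models are trained on balanced, less noisy data but does not say how that data is prepared. The following is a minimal sketch of one common approach, under stated assumptions: whitespace-only or very short strings and exact duplicates stand in for "noise", and random undersampling to the size of the smallest class stands in for "balancing". The function names `clean` and `balance`, the `min_chars` threshold, and the example labels (`name`, `phone`, `address`) are hypothetical and are not taken from the paper.

```python
import random
from collections import Counter, defaultdict

# Hypothetical preprocessing sketch: the paper does not describe its exact
# noise-reduction or balancing procedure, so both steps here are assumptions.


def clean(samples, min_chars=2):
    """Normalise whitespace, then drop near-empty texts and exact duplicates."""
    seen = set()
    cleaned = []
    for text, label in samples:
        text = " ".join(text.split())  # collapse runs of whitespace
        if len(text) < min_chars:
            continue  # too short to be useful (assumed noise)
        if (text, label) in seen:
            continue  # exact duplicate of an earlier sample
        seen.add((text, label))
        cleaned.append((text, label))
    return cleaned


def balance(samples, seed=0):
    """Randomly undersample each class to the size of the smallest class."""
    by_label = defaultdict(list)
    for text, label in samples:
        by_label[label].append((text, label))
    if not by_label:
        return []
    n = min(len(group) for group in by_label.values())
    rng = random.Random(seed)  # fixed seed for reproducible sampling
    balanced = []
    for label in sorted(by_label):
        balanced.extend(rng.sample(by_label[label], n))
    rng.shuffle(balanced)  # avoid class-ordered batches during training
    return balanced


# Toy, hypothetical personal-information samples: (text, label).
raw = [
    ("Zhang San", "name"),
    ("Zhang San", "name"),  # duplicate
    ("Li Si", "name"),
    ("Wang Wu", "name"),
    ("13800000000", "phone"),
    ("   ", "phone"),  # whitespace only
    ("021-5555-0100", "phone"),
    ("12 Example Road", "address"),
    ("34 Sample Street", "address"),
]

data = balance(clean(raw))
print(Counter(label for _, label in data))
```

Undersampling keeps the classes equal in size at the cost of discarding examples from the larger classes; oversampling the smaller classes or weighting the loss per class are common alternatives when labelled data is scarce.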


