Improved Hybrid Model for Classification of Text Documents
Universities around the world have senates whose members deliberate in senate meetings on matters that affect the smooth running of the institution, including personnel, management, and student matters. Reports on these matters are generated at the end of each senate meeting, but they are printed on paper or stored in the system without proper grouping because no efficient classification model is available. This paper proposes a hybrid of machine learning and deep learning for building an efficient classification model for text documents, tested on reports of senate deliberations from the University of Port Harcourt. A dataset covering more than ten years was collected and pre-processed, with tokenization used to remove noise and non-alphanumeric values. Principal component analysis (PCA), a machine learning technique, was used extensively for feature selection, and long short-term memory (LSTM), a deep learning architecture, was used to build the model. LSTM can retain content in its memory for a long time, which addresses the memory-retention problem of other models. The resulting model achieved a classification accuracy of 99%, and the classification application was able to sort senate decisions into different categories, which will help eliminate conflicting decisions on the floor of any university senate.
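The pre-processing and PCA steps described above can be illustrated with a minimal sketch. It is not the authors' implementation: the function names (`tokenize`, `term_count_matrix`, `pca_reduce`), the term-count representation, the three sample sentences, and the choice of two components are all assumptions made for illustration. Here PCA is computed with NumPy's singular value decomposition and projects each document onto the leading components. The LSTM classifier is left out because the abstract does not specify its configuration.

```python
import re
import numpy as np


def tokenize(text):
    """Lowercase the text and keep only alphanumeric runs (drops punctuation and symbols)."""
    return re.findall(r"[a-z0-9]+", text.lower())


def term_count_matrix(docs):
    """Build a documents-by-terms count matrix and return it with the sorted vocabulary."""
    tokenized = [tokenize(d) for d in docs]
    vocab = sorted({tok for toks in tokenized for tok in toks})
    index = {tok: i for i, tok in enumerate(vocab)}
    X = np.zeros((len(docs), len(vocab)))
    for row, toks in enumerate(tokenized):
        for tok in toks:
            X[row, index[tok]] += 1
    return X, vocab


def pca_reduce(X, k):
    """Center the columns of X and project each row onto the top-k principal components."""
    Xc = X - X.mean(axis=0)
    # Rows of Vt are the principal directions, ordered by decreasing singular value.
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ Vt[:k].T


# Hypothetical sample sentences, not taken from the paper's dataset.
docs = [
    "Senate approved the promotion of two senior lecturers.",
    "Senate approved the revised student examination timetable!",
    "Management presented the budget; Senate deferred the decision.",
]
X, vocab = term_count_matrix(docs)
Z = pca_reduce(X, k=2)  # one 2-dimensional feature vector per document
```

In a full pipeline the reduced vectors in `Z` (or a sequence-shaped variant of them) would be passed to the classifier. How the paper shapes its PCA output for the LSTM is not stated in the abstract.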