Improved Hybrid Model for Classification of Text Documents

May T. Stow; Chidiebere Ugwu; Laeticia N. Onyejegbu

doi:10.24018/ejai.2023.2.2.22

Research Article

May T. Stow

University of Port Harcourt, Nigeria

* Corresponding author

Chidiebere Ugwu

University of Port Harcourt, Nigeria

Laeticia N. Onyejegbu

University of Port Harcourt, Nigeria

Submitted 2023-02-28
Published 2023-04-11

Abstract

Universities around the world have senates whose members deliberate in senate meetings on matters that affect the smooth running of the institution, including personnel, management, and student matters. Reports on these matters are generated at the end of each meeting and are either printed on paper or stored electronically without proper grouping, because no efficient classification model is available. This paper proposes a hybrid of machine learning and deep learning for building an efficient classification model for text documents, tested on reports of senate deliberations from the University of Port Harcourt. A dataset spanning more than ten years was collected and pre-processed, with noise and other non-alphanumeric values removed during tokenization. Principal component analysis (PCA), a machine learning technique, was used for feature selection, and long short-term memory (LSTM), a deep learning architecture, was used to build the model; because an LSTM can retain content in its memory over long sequences, it addresses the memory-retention limitations of other models. The resulting model achieved a classification accuracy of 99%, and the classification application was able to sort decisions made by the senate into different categories, which should help eliminate conflicting decisions on the floor of any university senate.
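
To make the pre-processing and PCA stages named in the abstract concrete, the following is a minimal sketch using only the Python standard library. It is not the authors' implementation: the example sentences, the function names, and the choice of a bag-of-words representation are all assumptions made for illustration, and only the first principal component is estimated (by power iteration). The LSTM classifier that would consume the reduced features is not shown.

```python
import math
import random
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and keep only runs of letters and digits."""
    return re.findall(r"[a-z0-9]+", text.lower())


def bag_of_words(docs):
    """Build a sorted vocabulary and one term-count row per document."""
    tokenized = [tokenize(doc) for doc in docs]
    vocab = sorted({tok for toks in tokenized for tok in toks})
    rows = []
    for toks in tokenized:
        counts = Counter(toks)
        rows.append([float(counts[word]) for word in vocab])
    return vocab, rows


def center(rows):
    """Subtract each column mean so that every feature has zero mean."""
    n = len(rows)
    means = [sum(col) / n for col in zip(*rows)]
    return [[x - m for x, m in zip(row, means)] for row in rows]


def top_component(centered, iterations=500, seed=0):
    """Approximate the first principal axis by power iteration on X^T X."""
    rng = random.Random(seed)
    dim = len(centered[0])
    axis = [rng.uniform(-1.0, 1.0) for _ in range(dim)]
    for _ in range(iterations):
        # scores = X @ axis
        scores = [sum(x * w for x, w in zip(row, axis)) for row in centered]
        # axis = X^T @ scores
        axis = [sum(s * row[j] for s, row in zip(scores, centered))
                for j in range(dim)]
        norm = math.sqrt(sum(w * w for w in axis))
        if norm == 0.0:
            raise ValueError("data has no variance to project")
        axis = [w / norm for w in axis]
    return axis


def project(centered, axis):
    """Project each centered document vector onto the given axis."""
    return [sum(x * w for x, w in zip(row, axis)) for row in centered]


# Hypothetical senate-style sentences, invented for this illustration.
docs = [
    "Senate approved the promotion of academic staff.",
    "Senate approved staff promotion and staff leave.",
    "Students' examination results were approved.",
    "Examination results of students were released!",
]
vocab, rows = bag_of_words(docs)
centered = center(rows)
axis = top_component(centered)
scores = project(centered, axis)
for doc, score in zip(docs, scores):
    print(f"{score:+.3f}  {doc}")
```

In this toy example the two staff-related sentences fall on one side of the first principal axis and the two student-related sentences on the other. In a fuller pipeline, a library PCA retaining several components would typically replace the hand-written power iteration, and the reduced features would be passed to the LSTM model.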

Keywords: senate matters, long short-term memory, principal component analysis, Python Flask

How to Cite

Improved Hybrid Model for Classification of Text Documents. (2023). European Journal of Artificial Intelligence and Machine Learning, 2(2), 17-23. https://doi.org/10.24018/ejai.2023.2.2.22

Issue

Vol. 2 No. 2 (2023)

License

This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.
