Enhancing an Arabic Handwritten Recognition System Based on CNN-BLSTM Using Generative Adversarial Networks
Arabic Handwritten Recognition (AHR) presents unique challenges due to the complexity of Arabic script and the limited availability of training data. This paper proposes an approach that integrates generative adversarial networks (GANs) for data augmentation within a robust CNN-BLSTM architecture, aiming to significantly improve AHR performance. We employ a CNN-BLSTM network coupled with connectionist temporal classification (CTC) for accurate sequence modeling and recognition. To address data limitations, we incorporate a GAN-based data augmentation module trained on the IFN-ENIT Arabic handwriting dataset to generate realistic and diverse synthetic samples, effectively augmenting the original training corpus. Extensive evaluations on the IFN-ENIT benchmark demonstrate the efficacy of the adopted approach. We achieve a recognition rate of 95.23%, surpassing the baseline model by 3.54%. This research presents a promising approach to data augmentation in AHR and demonstrates a significant improvement in word recognition accuracy, paving the way for more robust and accurate AHR systems.
Introduction
Pattern recognition is a broad field dedicated to extracting and interpreting meaningful information from intricate data patterns. Among its various subdomains, text recognition is pivotal in numerous applications, such as data storage, mobile and desktop applications, and historical manuscript processing and preservation [1]–[5]. Arabic Handwritten Recognition (AHR) is particularly challenging due to the inherent complexities of Arabic script, including diverse writing styles, ligatures, and diacritics: the intricate character connections and vowel marks make recognition a difficult task. Despite significant advancements, achieving high accuracy in AHR remains an ongoing pursuit [6]–[9].
A primary obstacle in AHR research is the limited availability of large-scale, diverse datasets. Training deep learning models, especially those involving Convolutional Neural Networks (CNNs) and Recurrent Neural Networks (RNNs) [10]–[13], requires ample data to learn robust representations and generalize effectively. The scarcity of labeled Arabic handwritten datasets hinders model performance and limits practical applicability.
Generative Adversarial Networks (GANs) have revolutionized pattern recognition and computer vision with their ability to generate realistic synthetic data. They have found applications in diverse areas such as neuroimaging and clinical neuroscience, remote sensing, and audio-visual speech recognition [14]–[17]. These models consist of two competing neural networks: a generator that creates synthetic data resembling the real distribution and a discriminator that attempts to distinguish real from generated data. This adversarial training process incentivizes the generator to produce increasingly realistic samples, potentially alleviating data scarcity issues.
This paper proposes an approach to AHR that leverages GANs for data augmentation. We employ a CNN-BLSTM model, a well-established architecture for sequence modeling in AHR, and use a GAN to generate a large amount of synthetic Arabic handwriting data. This improves the robustness of the recognition model and mitigates the data scarcity that commonly affects handwriting recognition tasks, giving GANs great potential to advance the field of Arabic handwriting recognition. Our key contributions include:
- A GAN-based data augmentation technique for AHR: a GAN architecture that captures the intricacies of Arabic handwriting while preserving the essential characteristics of real data.
- Improving recognition accuracy using the augmented dataset: We demonstrate that training the CNN-BLSTM model on augmented data effectively enhances its performance, surpassing the baseline model trained on real data alone.
- Paving the way for more robust and accurate AHR systems: This research presents a promising avenue for addressing data limitations in AHR and contributes to the development of more robust and accurate recognition systems.
The structure of this paper is organized as follows: Section 2 reviews related work on Arabic handwriting recognition and the application of Generative Adversarial Networks as a data augmentation technique. Section 3 delineates the GAN architecture implemented in this study and describes our baseline CNN-BLSTM model. Section 4 details the experimental setup and presents the obtained results. The paper concludes with Section 5, which summarizes the key findings, discusses potential avenues for future research, and underscores the broader implications of this study for the AHR domain.
Literature Review
The field of Arabic Handwritten Text Recognition has seen significant progress in recent years. Before the deep learning revolution, offline handwritten text recognition relied heavily on Hidden Markov Models (HMMs) [18]–[21]. These statistical models examined sequences of features but struggled to consider the broader context of surrounding characters. To overcome this limitation, researchers experimented with hybrid approaches, combining HMMs with neural network models [22]–[24]. The paradigm shift came with the rise of deep learning, specifically Convolutional Neural Networks and Recurrent Neural Networks. CNNs automatically extract robust features directly from images, while RNNs excel at capturing the sequential nature of characters and words. Combining their strengths, Convolutional Recurrent Neural Networks (CRNNs) became the dominant approach to offline handwriting recognition: a CNN first generates a sequence of features from the image, and an RNN then decodes those features into the final text.
Despite considerable advancements in AHR, it continues to face several challenges. One of the primary obstacles is the limited availability of large-scale, diverse datasets. This scarcity of data impedes the training of robust deep learning models, especially those with complex architectures that necessitate substantial data for effective generalization [25]–[27]. Data scarcity has long been a major obstacle in deep learning; this challenge necessitates the use of data augmentation techniques, which artificially expand the training data by modifying existing samples or generating new ones [28]–[31].
In handwriting recognition, acquiring large and diverse datasets can be extremely difficult. Traditional data augmentation techniques involve geometric transformations and image manipulations such as rotation, scaling, skewing, and shifting of images, introducing variability in orientation, size and position, thereby mimicking the natural variations observed in handwritten text [32]–[38]. Elastic deformations, achieved through random warping or jittering of the image, simulate distortions and imperfections typically found in real-world documents. Furthermore, changes in intensity and the injection of noise, which involve modifying the brightness and contrast and adding noise, respectively, can simulate varying lighting conditions and imperfections encountered during the scanning or capturing of handwritten text. Despite their straightforward implementation and computational efficiency, these methods have limited effectiveness in AHR. They often fall short in capturing the specific complexities of the Arabic script, such as complex ligatures, diacritics, and diverse writing styles.
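To make these traditional transformations concrete, the following is a minimal sketch of such a pipeline using Pillow and NumPy; the parameter ranges are illustrative assumptions, not the settings used in this study.

```python
import numpy as np
from PIL import Image

def augment_word_image(img: Image.Image, rng: np.random.Generator) -> Image.Image:
    """Apply small geometric and photometric perturbations to a grayscale word image."""
    # Geometric jitter: slight rotation and rescaling mimic natural variation
    # in the orientation and size of handwritten words.
    angle = rng.uniform(-5, 5)
    scale = rng.uniform(0.9, 1.1)
    w, h = img.size
    img = img.rotate(angle, fillcolor=255)                               # white background
    img = img.resize((max(1, int(w * scale)), max(1, int(h * scale))))

    # Photometric jitter: brightness/contrast changes plus Gaussian noise
    # mimic scanning and capture imperfections.
    arr = np.asarray(img, dtype=np.float32)
    arr = arr * rng.uniform(0.9, 1.1) + rng.uniform(-10, 10)
    arr = arr + rng.normal(0.0, 5.0, size=arr.shape)
    return Image.fromarray(np.clip(arr, 0, 255).astype(np.uint8))

# Example: rng = np.random.default_rng(0); augmented = augment_word_image(word_img, rng)
```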
The limitations of traditional data augmentation techniques in capturing the specific complexities of Arabic handwriting call for more expressive, learning-based approaches. Generative adversarial networks have demonstrated significant potential in this regard, offering the ability to generate realistic and diverse synthetic data, and have been widely applied across fields. The Data Augmentation GAN has been used to enhance the performance of basic Convolutional Neural Network classifiers on the EMNIST, VGG-Face, and Omniglot datasets [39]. In medicine, deep learning techniques have been used to analyze neuroimaging data for diagnosing neurodegenerative diseases; due to the limited availability of neuroimaging data, Deep Convolutional Generative Adversarial Networks (DCGANs) have been used to generate synthetic images, thereby increasing the size and variety of the dataset [40]. Similarly, in a study on functional near-infrared spectroscopy (fNIRS), Wasserstein GANs were used to generate artificial fNIRS data, which improved the accuracy of classifying different task types [41].
Recent studies on handwritten recognition have focused on addressing the challenges of data scarcity and class imbalance through the use of GANs. Alwaqfi et al. [42] proposed a model that uses a GAN and a CNN to recognize Arabic handwritten characters. The GAN consists of a generator and a discriminator, both of which are CNNs with 10 and 9 layers, respectively. The model was trained on the Arabic Handwritten Characters Dataset (AHCD) and achieved an accuracy of 99.78% when the GAN was used and 96.28% when it was not. Eltay et al. [43] developed a GAN model based on the ScrabbleGAN model to address the imbalance in the frequency of generated handwritten Arabic characters. The model achieved better results when evaluated using generated images in addition to the originals. Mustapha et al. [44] proposed a deep convolutional generative adversarial network (CDCGAN) model to generate isolated handwritten Arabic characters; the model was trained and tested using a CNN model (LeNet5), showing a gap of 10% between the accuracy obtained while using the generated set and the real set. Jemni et al. [45] presented a GAN-based model to enhance the process of the binarization of degraded handwritten documents; the model was tested on the KHATT and IAM datasets. The results showed that the model achieved the best results among all the baseline models for Arabic and English script.
The works in AHR, as presented in this section, demonstrate the significant strides that have been made in this field. The use of Generative Adversarial Networks as a data augmentation technique has notably enhanced the performance and robustness of various systems. However, despite these advancements, there remain challenges and opportunities for further research and development.
Materials and Methods
In this study, we propose a framework aimed at enhancing the accuracy of Arabic handwritten recognition. By integrating deep learning with GAN-based data augmentation, our approach is grounded in the robust combination of Convolutional Neural Networks (CNNs), Bidirectional Long Short-Term Memory (BLSTM) networks, and Connectionist Temporal Classification (CTC) for feature extraction and sequence decoding. To tackle the challenges of data scarcity and diversity, we employ GANs to generate synthetic data that closely resembles real Arabic handwriting. An overview of the developed model is provided in Fig. 1.
GANs
Generative Adversarial Networks are an innovative type of neural network architecture first proposed by Goodfellow et al. [46]. The basic GAN framework consists of two competing neural networks: a generator model and a discriminator model. The generator's role is to synthesize new data instances that mimic real data, while the discriminator aims to differentiate between real data examples and those created by the generator. The two networks are pitted against each other in an adversarial game in which the generator tries to fool the discriminator ever more effectively. Through this competitive process, both models improve: the generator gets better at creating realistic synthetic data, and the discriminator gets better at detecting generated data.
The generator output is connected directly to the discriminator input. Through backpropagation, the discriminator’s classification provides a signal that the generator uses to update its weights. The generator model is trained using the loss signals received from the discriminator’s classifications to iteratively enhance its outputs (Fig. 2).
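The following is a minimal PyTorch sketch of one such adversarial update, assuming `generator` and `discriminator` are models defined elsewhere and that the discriminator returns one logit per image; it illustrates the generic GAN objective rather than the exact training procedure used in this study.

```python
import torch
import torch.nn.functional as F

def gan_training_step(generator, discriminator, g_opt, d_opt, real_images, z_dim=128):
    batch = real_images.size(0)

    # Discriminator update: push real images toward label 1, generated toward 0.
    z = torch.randn(batch, z_dim)
    fake_images = generator(z).detach()            # detach: no generator gradients here
    d_loss = F.binary_cross_entropy_with_logits(discriminator(real_images),
                                                torch.ones(batch, 1)) + \
             F.binary_cross_entropy_with_logits(discriminator(fake_images),
                                                torch.zeros(batch, 1))
    d_opt.zero_grad(); d_loss.backward(); d_opt.step()

    # Generator update: the discriminator's classification of the generated batch
    # provides the loss signal backpropagated into the generator's weights.
    z = torch.randn(batch, z_dim)
    g_loss = F.binary_cross_entropy_with_logits(discriminator(generator(z)),
                                                torch.ones(batch, 1))
    g_opt.zero_grad(); g_loss.backward(); g_opt.step()
    return d_loss.item(), g_loss.item()
```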
GANs have rapidly gained popularity in recent years due to their ability to produce high-quality synthetic data across various problem domains. They have proven especially adept at image-to-image translation tasks, such as turning semantic layouts into photo-realistic scenes or sketches into rendered faces [46]. Alonso et al. [47] incorporated a recurrent neural network recognizer into a GAN to control the generated output. Jha et al. [48] introduced a GAN structure grounded in a Convolutional Neural Network, with the sigmoid function at the generator's output layer and binary cross-entropy as the loss function. Fogel et al. [49] presented ScrabbleGAN, which generates each character with its own CNN filters whose receptive fields overlap, thereby accounting for the influence of neighboring letters; these enhancements produce more realistic images than previous GAN models. Our GAN module follows the ScrabbleGAN paradigm [49], an approach to synthesizing handwritten text images that is versatile in both style and lexicon and relies on a generative model able to produce word images of arbitrary length.
Baseline Model
Our baseline model is a fusion of three critical elements. First, it employs a Convolutional Neural Network as a feature extractor, pulling out significant features from a text image. Second, it uses recurrent layers based on Bidirectional Long Short-Term Memory to produce per-frame predictions over the input sequence. Last, it incorporates a transcription layer that integrates Connectionist Temporal Classification (CTC). This combination of components enables the model to interpret and analyze the input data effectively (Fig. 3) [50]–[52].
CNN
The Convolutional Neural Network is typically employed for image classification tasks. It comprises several layers that learn high-level features from labeled training data. The fundamental concept of the CNN is the use of convolutional and pooling layers to progressively extract more abstract patterns through locally applied operations. The convolution layer applies a set of convolution filters to the input images, each filter extracting a specific feature from the image. The pooling layer simplifies the output through nonlinear down-sampling, thereby reducing the number of parameters the network needs to learn. Together, these operations enable the identification of different characteristics at each layer.
As depicted in Fig. 4, the architecture embodies a Convolutional Neural Network designed for handwritten recognition, drawing inspiration from LeNet-5 [53]. It comprises several layers. Initially, the input is processed by a convolutional layer that convolves it with a set of learnable filters or weights, each generating one feature map. Following this, a pooling layer (sub-sampling) is employed to gradually reduce the spatial size of the feature map by averaging the features in the neighborhood or pooling for a maximum value, thereby reducing the number of parameters and computation in the network. Each convolution layer is followed by a sub-sampling layer. The alternating sequence of convolutional and pooling layers forms the feature extractor, which extracts distinguishing features from the raw images. Fully connected layers are utilized at the end of the network for high-level reasoning after the feature extraction and consolidation have been carried out by the convolutional and pooling layers. Our CNN architecture, derived from LeNet-5 with modifications, is further detailed in [52]–[54].
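For illustration, a minimal LeNet-5-style stack in PyTorch with alternating convolution and sub-sampling layers followed by fully connected layers; the layer sizes, and the 32 × 32 grayscale input they imply, are assumptions for demonstration rather than the exact configuration detailed in [52]–[54].

```python
import torch.nn as nn

# Alternating convolution and average-pooling (sub-sampling) layers form the
# feature extractor; fully connected layers follow for high-level reasoning.
lenet_style = nn.Sequential(
    nn.Conv2d(1, 6, kernel_size=5), nn.Tanh(), nn.AvgPool2d(2),
    nn.Conv2d(6, 16, kernel_size=5), nn.Tanh(), nn.AvgPool2d(2),
    nn.Flatten(),
    nn.Linear(16 * 5 * 5, 120), nn.Tanh(),   # 16 x 5 x 5 assumes a 32 x 32 input image
    nn.Linear(120, 84), nn.Tanh(),
)
```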
BLSTM
The Bidirectional Long Short-Term Memory architecture, introduced by Graves et al. [55], was designed to address a key limitation of the LSTM, namely its reliance on previous context only, a significant drawback of both RNNs and LSTMs.
BLSTM has found favor in numerous applications, including speech recognition, due to its ability to be trained using all available input information from both the past and future within a specific time frame. The structure of BLSTM includes two separate hidden layers. One layer processes the input sequence in a forward direction and the second in a backward direction. These two hidden layers are connected to the same output layer. This connection provides the output layer with access to the past and future context for every point in the sequence. As a result, BLSTM outperforms unidirectional LSTMs and standard RNNs in terms of speed and accuracy. The architecture of a BLSTM network is illustrated in Fig. 5.
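The structure just described maps directly onto a bidirectional recurrent layer; below is a minimal PyTorch sketch in which one direction reads the feature sequence forward, the other backward, and both feed the same output layer. Dimensions are illustrative assumptions.

```python
import torch.nn as nn

class BLSTMLayer(nn.Module):
    def __init__(self, in_dim=256, hidden=128, n_classes=100):
        super().__init__()
        self.blstm = nn.LSTM(in_dim, hidden, bidirectional=True, batch_first=True)
        # The output layer sees both directions (2 * hidden) at every time step,
        # i.e., past and future context for each point in the sequence.
        self.out = nn.Linear(2 * hidden, n_classes)

    def forward(self, x):            # x: (batch, time, in_dim) feature sequence
        h, _ = self.blstm(x)         # h: (batch, time, 2 * hidden)
        return self.out(h)           # per-frame class scores
```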
CTC-WBS
Connectionist Temporal Classification (CTC) plays a pivotal role in enabling recognition without the need for prior segmentation. CTC was initially conceived for speech recognition and later extended to handwriting recognition. The CTC layer transcribes data and predicts transcriptions for test images during the recognition phase, which are then compared with the ground truth of the line image using the Levenshtein edit distance. CTC interprets the network output as a distributed probability over all potential label sequences for a given input sequence, facilitating a direct alignment between input variables and the target label. The final layer of the network, with N outputs where N represents the total number of labels, indicates the probability of each label being observed at a given time. This approach allows for the generation of a set of probabilities that potentially match outputs to a given input.
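As an illustration of how this layer attaches to the network's per-frame outputs, a minimal PyTorch sketch using the standard CTC loss; the tensor shapes and the blank index are assumptions for demonstration.

```python
import torch.nn as nn

ctc = nn.CTCLoss(blank=0, zero_infinity=True)

def compute_ctc_loss(frame_scores, targets, input_lengths, target_lengths):
    # frame_scores: (batch, time, n_labels) raw scores from the BLSTM head.
    # CTCLoss expects (time, batch, n_labels) log-probabilities; it then sums
    # over all alignments that collapse to the target transcription.
    log_probs = frame_scores.log_softmax(dim=2).permute(1, 0, 2)
    return ctc(log_probs, targets, input_lengths, target_lengths)
```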
The Word Beam Search (WBS) decoding algorithm, positioned immediately after the CTC layer, is used to decode the CTC output matrix. The WBS decoder offers several advantages over the token-passing decoder: faster processing, the ability to allow an arbitrary number of non-word characters between words (such as numbers and punctuation marks), and the constraint of words by a dictionary.
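The WBS decoder itself is more involved; purely for illustration of how the CTC output matrix is turned into text, the sketch below shows the simpler best-path (greedy) decode, which takes the most likely label per frame, merges repeats, and drops blanks. It is not the dictionary-constrained decoder used in this work.

```python
def greedy_ctc_decode(log_probs, blank=0):
    # log_probs: (time, n_labels) tensor for a single word image.
    best = log_probs.argmax(dim=1).tolist()      # most likely label per frame
    decoded, prev = [], blank
    for label in best:
        if label != blank and label != prev:     # drop blanks, merge repeats
            decoded.append(label)
        prev = label
    return decoded   # sequence of label indices; mapped to characters externally
```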
Results and Discussion
The following subsections explore the influence of Generative Adversarial Networks (GANs) as a data augmentation method for Arabic handwriting recognition, specifically on the IFN-ENIT dataset. We conduct a thorough comparison of our system’s performance under diverse conditions, thereby shedding light on the nuanced impact of GAN-generated data on accuracy and robustness.
Dataset
In the field of handwritten Arabic text recognition research, the IFN/ENIT dataset, published by [56], is the most commonly utilized and recognized dataset. It comprises 32,492 handwritten Arabic names of Tunisian cities, with a lexicon of 937 names. For each word in the dataset, a Ground Truth (GT) file has been assembled, providing information about the word, including its baseline position and the specific characters used.
The training process commences with the generator receiving a set of random seeds and transcriptions from a randomly selected word list. We selected random words from the dataset along with their transcriptions to generate a set of synthetic images. By altering the noise vector input to the network, we were able to generate various handwriting styles. Fig. 6 presents examples of synthesized words from the IFN/ENIT dataset, showcasing the diversity of generated handwriting styles. By producing multiple style variants for the words in the lexicon, we generated a total of 65,590 samples to train our baseline deep learning recognition network.
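A schematic sketch of this sampling loop is shown below; `generator` and `encode_word` stand in for the trained ScrabbleGAN-style generator and its transcription encoder and are assumptions here. As a point of arithmetic, 70 samples for each of the 937 lexicon words would yield the 65,590 images reported above.

```python
import torch

def synthesize_samples(generator, encode_word, lexicon, per_word=70, z_dim=128):
    """generator(word_ids, z) -> image tensor; encode_word(str) -> label tensor.
    Both are assumed interfaces of the trained GAN, used here for illustration."""
    synthetic = []
    for word in lexicon:
        word_ids = encode_word(word)
        for _ in range(per_word):
            z = torch.randn(1, z_dim)          # fresh noise vector -> new handwriting style
            synthetic.append((generator(word_ids, z), word))
    return synthetic
```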
Results
To assess the outcomes, our model underwent three primary phases on the IFN-ENIT database: initial training on the original data, application of traditional data augmentation methods in the second phase, and utilization of the dataset expanded via GANs in the final phase. This process allowed us to compare performance and draw conclusions confirming the effectiveness of our approach. Table I presents the results.
| Model | Data augmentation | RR (%) |
|---|---|---|
| CNN-BLSTM/CTC | None [52] | 91.69 |
| CNN-BLSTM/CTC | Traditional [52] | 94.58 |
| CNN-BLSTM/CTC | GANs | 95.23 |
The comparative results reveal the impact of data augmentation techniques on the performance of the Arabic handwriting recognition system. The system's performance without data augmentation serves as a baseline for comparison. The results are measured in terms of recognition rate (RR), the percentage of words that were correctly recognized. The GAN-augmented model achieved the highest RR of 95.23%, an improvement of 0.65% over the traditional augmentation model and 3.54% over the baseline model. The GAN was able to generate more realistic training data, which led to better generalization on the test data.
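For reference, the recognition rate used throughout is simply the percentage of test words transcribed exactly; a minimal sketch:

```python
def recognition_rate(predictions, ground_truths):
    # Word-level recognition rate: fraction of test words recognized exactly, in percent.
    correct = sum(p == g for p, g in zip(predictions, ground_truths))
    return 100.0 * correct / len(ground_truths)
```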
To ascertain the impact of data augmentation using the previously described method, we executed a series of classification evaluation experiments. The effectiveness of GAN-based data augmentation and its fundamental principle were evaluated by integrating the generated data with the original training data. The classification process encompassed five variants, each exploring a different combination of training and testing data. Table II summarizes the comparative results.
| Variant | Training set | Test set | RR (%) |
|---|---|---|---|
| – | Original | Original | 91.69 |
| 1 | Original | Generated | 89.12 |
| 2 | Generated | Generated | 92.54 |
| 3 | Generated | Original | 87.17 |
| 4 | Original + Generated | Original | 95.23 |
| 5 | Original + Generated × 2 | Original | 94.55 |
The synthesis of the results offers a holistic understanding of the advantages and constraints of employing Generative Adversarial Networks for data augmentation in the context of the Arabic Handwriting Recognition System. It also underscores potential areas for enhancement in the system’s approach and modeling.
The significant improvement in the model's accuracy on real IFN-ENIT images when combining original and GAN-generated data (variant 4) validates the efficacy of GANs in mitigating data scarcity and overfitting, and implies enhanced recognition capabilities in real-world scenarios. Training on real data and testing on GAN-generated images (variant 1) demonstrates that the GAN can generate images that are accurately classified by the model; however, this similarity also introduces the possibility of mistaking GAN-generated images for real ones in certain contexts. Conversely, the performance dip observed when testing a GAN-trained model on real data (variant 3) exposes the inevitable differences between the generated and real distributions, implying that the prototype images produced by the GAN may not, on their own, be adequate for training a robust classifier that must handle the variability present in the real test images. Training and testing solely on GAN-generated images (variant 2) showcases the model's capacity to learn the underlying patterns captured by the GAN and shows how GAN images can be used to train a classifier. Adding GAN-generated data to the original training set improved the classifier's performance; however, adding still more GAN data (variant 5) decreased accuracy slightly, indicating diminishing returns beyond a certain saturation point. The optimal ratio of GAN-generated to real data can be ascertained through cross-validation.
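As a minimal sketch of such a search, the loop below sweeps candidate real-to-synthetic mixing ratios and keeps the one with the best validation score; `train_and_evaluate` stands in for the full CNN-BLSTM/CTC training pipeline (which could itself cross-validate over folds) and is an assumption here.

```python
import random

def best_mix_ratio(original, generated, train_and_evaluate, ratios=(0.5, 1.0, 1.5, 2.0)):
    scores = {}
    for r in ratios:
        k = int(r * len(original))                               # synthetic samples to add
        mixed = original + random.sample(generated, min(k, len(generated)))
        scores[r] = train_and_evaluate(mixed)                    # e.g., validation RR in percent
    return max(scores, key=scores.get), scores
```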
This comprehensive analysis provides valuable insights into the potential of GAN-based data augmentation in the field of Arabic handwriting recognition and suggests areas for future research and development. GANs offer a powerful tool for Arabic handwritten recognition, expanding training data and boosting model performance. Future research should focus on refining this approach to unlock the full potential of GAN-based data augmentation.
Conclusion and Perspectives
The research presented in this paper demonstrates a significant advancement in the field of Arabic handwritten recognition by integrating Generative Adversarial Networks for data augmentation within a robust CNN-BLSTM architecture. The proposed approach addresses the challenges of AHR, including the complexity of Arabic script and the limited availability of training data. The use of GANs for data augmentation has proven effective in generating realistic and diverse synthetic samples, thereby augmenting the original training corpus, and the CNN-BLSTM network, coupled with connectionist temporal classification for accurate sequence modeling and recognition, has demonstrated its efficacy in improving AHR performance. The results of this research pave the way for more robust and accurate AHR systems.

While the results of this study are promising, there are several potential avenues for future research. The GAN-based data augmentation technique could be further refined to generate even more realistic and diverse synthetic samples, and the CNN-BLSTM architecture could be optimized to further improve sequence modeling and recognition. The research also opens up the possibility of applying similar techniques to other languages and scripts that present similar challenges in terms of complexity and limited training data; the integration of GANs for data augmentation within robust deep learning architectures could potentially revolutionize the field of handwritten recognition. Finally, by addressing data limitations in AHR and contributing to the development of more robust and accurate recognition systems, this research presents a promising approach to data augmentation for the AHR domain.
Looking ahead, the application of Natural Language Processing (NLP) techniques presents a promising direction for further improving AHR systems. NLP can contribute to the refinement of AHR in several ways:
- Semantic Understanding: NLP can help in understanding the context and meaning behind the handwritten text, which can be particularly useful for recognizing words that are visually similar but have different meanings.
- Language Modeling: Incorporating NLP-based language models can improve the prediction of word sequences, thereby enhancing the overall recognition process.
- Error Correction: Post-recognition, NLP can be employed to correct errors that are common in handwritten texts, such as misspellings or grammatical mistakes, by using context-aware algorithms.
By exploring these NLP techniques, future research can lead to the development of more sophisticated AHR systems that not only recognize the text with high accuracy but also understand and process the content effectively.
References
1. Ghosh T, Sen S, Obaidullah SM, Santosh KC, Roy K, Pal U. Advances in online handwritten recognition in the last decades. Comput Sci Rev. 2022;46. ISSN 1574-0137. doi: 10.1016/j.cosrev.2022.100515.
2. Alqahtani AS, Madheswari AN, Mubarakali A, Parthasarathy P. Secure communication and implementation of handwritten digit recognition using deep neural network. Opt Quant Electron. 2023;55:27. doi: 10.1007/s11082-022-04290-7.
3. Faizullah S, Ayub MS, Hussain S, Khan MA. A survey of OCR in Arabic language: applications, techniques, and challenges. Appl Sci. 2023;13:4584. doi: 10.3390/app13074584.
4. Wahdan A, Al-Emran M, Shaalan K. A systematic review of Arabic text classification: areas, applications, and future directions. Soft Comput. 2024;28:1545–66. doi: 10.1007/s00500-023-08384-6.
5. Omar IO, Haboubi S, Benzarti F. New architectural optical character recognition approach for cursive fonts: the historical Maghrebian font as an example. Int J Innov Comput Appl. 2023;14(1–2):91–103. doi: 10.1504/IJICA.2023.129361.
6. Al-Barhamtoshy HM, Jambi KM, Rashwan MA, Abdou SM. An Arabic manuscript regions detection, recognition and its applications for OCRing. ACM Trans Asian Low-Resour Lang Inf Process. January 2023;22(1):28. doi: 10.1145/3532609.
7. Alheraki M, Al-Matham R, Al-Khalifa H. Handwritten Arabic character recognition for children writing using convolutional neural network and stroke identification. Hum-Cent Intell Syst. 2023;3:147–59. doi: 10.1007/s44230-023-00024-4.
8. Nahar KMO, Alsmadi I, Al Mamlook RE, Nasayreh A, Gharaibeh H, Almuflih AS, et al. Recognition of Arabic air-written letters: machine learning, convolutional neural networks, and optical character recognition (OCR) techniques. Sens. 2023;23:9475. doi: 10.3390/s23239475.
9. Najam R, Faizullah S. Analysis of recent deep learning techniques for Arabic handwritten-text OCR and post-OCR correction. Appl Sci. 2023;13:7568. doi: 10.3390/app13137568.
10. Kizilirmak F, Yanıkoğlu B. CNN-BiLSTM model for English handwriting recognition: comprehensive evaluation on the IAM dataset. 2022. doi: 10.21203/rs.3.rs-2274499/v1.
11. Gader T, Chibani I, Echi A. Arabic handwriting off-line recognition using convLSTM-CTC. Proceedings of the 12th International Conference on Pattern Recognition Applications and Methods (ICPRAM), pp. 529–33, SciTePress; 2023. ISBN 978-989-758-626-2; ISSN 2184-4313. doi: 10.5220/0011794700003411.
12. Geetha M, Suganthe RC, Nivetha SK, Hariprasath S, Gowtham S, Deepak CS. A hybrid deep learning based character identification model using CNN, LSTM, and CTC to recognize handwritten English characters and numerals. 2022 International Conference on Computer Communication and Informatics (ICCCI), pp. 1–6, Coimbatore, India, 2022. doi: 10.1109/ICCCI54379.2022.9740746.
13. Bisht M, Gupta R. Offline handwritten Devanagari word recognition using CNN-RNN-CTC. SN Comput Sci. 2023;4:88. doi: 10.1007/s42979-022-01461-x.
14. Dash A, Ye J, Wang G. A review of generative adversarial networks (GANs) and its applications in a wide variety of disciplines: from medical to remote sensing. IEEE Access. doi: 10.1109/ACCESS.2023.3346273.
15. Iglesias G, Talavera E, Díaz-Álvarez A. A survey on GANs for computer vision: recent research, analysis and taxonomy. Comput Sci Rev. 2023;48. ISSN 1574-0137. doi: 10.1016/j.cosrev.2023.100553.
16. Wang R, Bashyam V, Yang Z, Yu F, Tassopoulou V, Chintapalli SS, et al. Applications of generative adversarial networks in neuroimaging and clinical neuroscience. NeuroImage. 2023;269:119898. ISSN 1053-8119. doi: 10.1016/j.neuroimage.2023.119898.
17. He Y, Seng KP, Ang LM. Generative adversarial networks (GANs) for audio-visual speech recognition in artificial intelligence IoT. Inf. 2023;14:575. doi: 10.3390/info14100575.
18. Gilloux M. Hidden Markov models in handwriting recognition. In Fundamentals in Handwriting Recognition. NATO ASI Series, vol. 124. Impedovo S, Ed. Berlin, Heidelberg: Springer, 1994. doi: 10.1007/978-3-642-78646-4_15.
19. Bertolami R, Bunke H. Hidden Markov model-based ensemble methods for offline handwritten text line recognition. Pattern Recognit. 2008;41(11):3452–460. ISSN 0031-3203. doi: 10.1016/j.patcog.2008.04.003.
20. Plötz T, Fink GA. Markov models for offline handwriting recognition: a survey. IJDAR. 2009;12:269–98. doi: 10.1007/s10032-009-0098-4.
21. Rabi M, Amrouch M, Mahani Z. Recognition of cursive Arabic handwritten text using embedded training based on HMMs. J Electr Syst Inf Technol. 2018;5(2):245–51. ISSN 2314-7172. doi: 10.1016/j.jesit.2017.02.001.
22. Bengio Y, LeCun Y, Nohl C, Burges C. LeRec: a NN/HMM hybrid for online handwriting recognition. Neural Comput. Nov. 1995;7(6):1289–303. doi: 10.1162/neco.1995.7.6.1289.
23. Rabi M, Amrouch M, Mahani Z. Hybrid HMM/MLP models for recognizing unconstrained cursive Arabic handwritten text. In Advanced Information Technology, Services and Systems. AIT2S 2017. Lecture Notes in Networks and Systems, vol. 25. Ezziyyani M, Bahaj M, Khoukhi F, Eds. Cham: Springer, 2018. doi: 10.1007/978-3-319-69137-4_39.
24. Wang ZR, Du J, Wang WC, Zhai JF, Hu JS. A comprehensive study of hybrid neural network hidden Markov model for offline handwritten Chinese text recognition. IJDAR. 2018;21:241–51. doi: 10.1007/s10032-018-0307-0.
25. Alzubaidi L, Bai J, Al-Sabaawi A, Santamaría J, Albahri AS, Al-dabbagh BSN, et al. A survey on deep learning tools dealing with data scarcity: definitions, challenges, solutions, tips, and applications. J Big Data. 2023;10:46. doi: 10.1186/s40537-023-00727-2.
26. Bansal A, Sharma R, Kathuria M. A systematic review on data scarcity problem in deep learning: solution and applications. ACM Comput Surv. 2022;54(10):1–29. doi: 10.1145/3502287.
27. Maroñas J, Paredes R, Ramos D. Generative models for deep learning with very scarce data. In Progress in Pattern Recognition, Image Analysis, Computer Vision, and Applications. CIARP 2018. Lecture Notes in Computer Science, vol. 11401. Vera-Rodriguez R, Fierrez J, Morales A, Eds. Cham: Springer, 2019. doi: 10.1007/978-3-030-13469-3_3.
28. Shorten C, Khoshgoftaar TM, Furht B. Text data augmentation for deep learning. J Big Data. 2021;8:101. doi: 10.1186/s40537-021-00492-0.
29. Li B, Hou Y, Che W. Data augmentation approaches in natural language processing: a survey. AI Open. 2022;3:71–90. ISSN 2666-6510. doi: 10.1016/j.aiopen.2022.03.001.
30. Kumar T, Mileo A, Brennan R, Bendechache M. Image data augmentation approaches: a comprehensive survey and future directions. Comput Vis Pattern Recognit. 2023. doi: 10.48550/arXiv.2301.02830.
31. Xu M, Yoon S, Fuentes A, Park DS. A comprehensive survey of image augmentation techniques for deep learning. Comm Com Inf Sci. 2023;137:109347. ISSN 0031-3203. doi: 10.1016/j.patcog.2023.109347.
32. Hayashi T, Gyohten K, Ohki H, Takami T. A study of data augmentation for handwritten character recognition using deep learning. 16th International Conference on Frontiers in Handwriting Recognition (ICFHR), pp. 552–57, Niagara Falls, NY, USA, 2018. doi: 10.1109/ICFHR-2018.2018.00102.
33. Mitani Y, Fujita Y, Hamamoto Y. Augmentation on CNNs for handwritten digit classification in a small training sample size situation. Journal of Physics: Conference Series, vol. 1922, 5th International Conference on Robotics and Machine Vision (ICRMV 2021), Seoul, South Korea, 26–28 February 2021. doi: 10.1088/1742-6596/1922/1/012007.
34. Wigington C, Stewart S, Davis B, Barrett B, Price B, Cohen S. Data augmentation for recognition of handwritten words and lines using a CNN-LSTM network. 14th IAPR International Conference on Document Analysis and Recognition (ICDAR), pp. 639–45, Kyoto, Japan, 2017. doi: 10.1109/ICDAR.2017.110.
35. Brown D, Lidzhade I. Handwriting recognition using deep learning with effective data augmentation techniques. International Conference on Artificial Intelligence, Big Data, Computing and Data Communication Systems (icABCD), pp. 1–9, Durban, South Africa, 2021. doi: 10.1109/icABCD51485.2021.9519359.
36. Eltay M, Zidouri A, Ahmad I, Elarian Y. Improving handwritten Arabic text recognition using an adaptive data-augmentation algorithm. In Document Analysis and Recognition - ICDAR 2021 Workshops. Lecture Notes in Computer Science, vol. 12916. Barney Smith EH, Pal U, Eds. Cham: Springer, 2021. doi: 10.1007/978-3-030-86198-8_23.
37. Hamdi Y, Boubaker H, Alimi AM. Data augmentation using geometric, frequency, and beta modeling approaches for improving multi-lingual online handwriting recognition. IJDAR. 2021;24:283–98. doi: 10.1007/s10032-021-00376-2.
38. Alaasam R, Barakat BK, El-Sana J. Synthesizing versus augmentation for Arabic word recognition with convolutional neural networks. 2018 IEEE 2nd International Workshop on Arabic and Derived Script Analysis and Recognition (ASAR), pp. 114–18, London, UK, 2018. doi: 10.1109/ASAR.2018.8480189.
39. Antoniou A, Storkey AJ, Edwards H. Data augmentation generative adversarial networks. ArXiv, abs/1711.04340. 2017.
40. Deshpande T, Chavan K, Gandhi P, Mangrulkar R. Neurodegenerative disease detection using deep convolutional GANs and CNN. 2023 IEEE 8th International Conference for Convergence in Technology (I2CT), pp. 1–7, 2023.
41. Nagasawa T, Sato T, Nambu I, Wada Y. fNIRS-GANs: data augmentation using generative adversarial networks for classifying motor tasks from functional near-infrared spectroscopy. J Neural Eng. 2020 Feb;17(1):016068. doi: 10.1088/1741-2552/ab6cb9.
42. Alwaqfi YM, Mohamad M, Al-Taani AT. Generative adversarial network for an improved Arabic handwritten characters recognition. Int J Adv Soft Comput Appl. March 2022;14(1). Print ISSN: 2710-1274, Online ISSN: 2074-8523. doi: 10.15849/IJASCA.220328.12.
43. Eltay M, Zidouri A, Ahmad I, Elarian Y. Generative adversarial network based adaptive data augmentation for handwritten Arabic text recognition. PeerJ Comput Sci. 2022;8:e861. doi: 10.7717/peerj-cs.861.
44. Mustapha IB, Hasan S, Nabus H, Shamsuddin SM. Conditional deep convolutional generative adversarial networks for isolated handwritten Arabic character generation. Arab J Sci Eng. 2022;47:1309–20. doi: 10.1007/s13369-021-05796-0.
45. Jemni SK, Souibgui MA, Kessentini Y, Fornés A. Enhance to read better: a multi-task adversarial network for handwritten document image enhancement. Comm Com Inf Sc. 2022;123:108370. ISSN 0031-3203. doi: 10.1016/j.patcog.2021.108370.
46. Goodfellow I, Pouget-Abadie J, Mirza M, Xu B, Warde-Farley D, Ozair S, et al. Generative adversarial networks. Commun ACM. 2020;63(11):139–44. doi: 10.1145/3422622.
47. Alonso E, Moysset B, Messina RO. Adversarial generation of handwritten text images conditioned on sequences. 2019 International Conference on Document Analysis and Recognition (ICDAR), pp. 481–86, 2019.
48. Jha G, Cecotti H. Data augmentation for handwritten digit recognition using generative adversarial networks. Multimedia Tools Appl. Dec 2020;79:47–8. doi: 10.1007/s11042-020-08883-w.
49. Fogel S, Averbuch-Elor H, Cohen S, Mazor S, Litman R. ScrabbleGAN: semi-supervised varying length handwritten text generation. 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp. 4323–332, 2020.
50. Jemni SK, Ammar S, Kessentini Y. Domain and writer adaptation of offline Arabic handwriting recognition using deep neural networks. Neural Comput Applic. 2022;34:2055–71. doi: 10.1007/s00521-021-06520-7.
51. Maalej R, Kherallah M. Convolutional neural network and BLSTM for offline Arabic handwriting recognition. 2018 International Arab Conference on Information Technology (ACIT), pp. 1–6, Werdanye, Lebanon, 2018. doi: 10.1109/ACIT.2018.8672667.
52. Rabi M, Amrouch M. Convolutional Arabic handwriting recognition system based BLSTM-CTC using WBS decoder. Int J Intell Syst Appl Eng (IJISAE). 2024. ISSN: 2147-6799.
53. LeCun Y, Kavukcuoglu K, Farabet C. Convolutional networks and applications in vision. International Symposium on Circuits and Systems (ISCAS), pp. 253–56, May 2010.
54. Amrouch M, Rabi M, Es-Saady Y. Convolutional feature learning and CNN based HMM for Arabic handwriting recognition. Image and Signal Processing: 8th International Conference, ICISP 2018, Proceedings, pp. 265–74, Cherbourg, France, July 2–4, 2018. doi: 10.1007/978-3-319-94211-7_29.
55. Graves A. Generating sequences with recurrent neural networks. ArXiv abs/1308.0850. 2013.
56. Pechwitz M, Snoussi Maddouri S, Märgner V, Ellouze N, Amiri N. IFN/ENIT-database of handwritten Arabic words. The 7th Colloque International Francophone sur l'Ecrit et le Document (CIFED 2002), Hammamet, Tunisia, Oct. 21–23, 2002.