Hybrid Model for Motor Imagery Biometric Identification

Abstract: Biometric systems are a continuously evolving and promising technological domain that enables automatic systems to identify and authenticate individuals uniquely and efficiently, without requiring users to carry or remember physical tokens or passwords, in contrast to traditional methods such as password-based IDs. Biometrics are biological measurements or physical characteristics that can be used to ascertain and validate the identity of individuals. Recently, considerable interest has emerged in exploiting brain activity as a biometric identifier in automatic recognition systems, particularly data acquired through electroencephalography (EEG). Multiple research efforts have confirmed the presence of discriminative characteristics in brain signals recorded while performing specific cognitive tasks. However, EEG signals are inherently complex due to their nonstationary and high-dimensional properties, demanding careful consideration during both feature extraction and classification. This study applied a hybridization technique that integrates a pre-trained convolutional neural network (CNN) with a classical classifier, operating on the short-time Fourier transform (STFT) spectrum. The hybrid model decodes two-class motor imagery (MI) signals for mobile biometric authentication tasks, namely subject identification and lock and unlock classification. To this end, nine candidate classifiers (widely used classification algorithms) were employed to build nine distinct hybrid models, with the goal of selecting the most effective one. Six experiments were conducted in the experimental part of this study. The first experiment developed the nine hybrid models and compared their performance for the biometric authentication task.
The RF-VGG model achieved better performance than the other models and was therefore chosen for mobile biometric authentication. The fourth experiment applied the RF-VGG model to the lock and unlock classification process, yielding a mean accuracy of 97.50%. Consequently, the fifth experiment was conducted to validate the RF-VGG model on the lock and unlock task, with a mean accuracy of 97.40%. Finally, the sixth experiment verified the RF-VGG model on the lock and unlock task over another dataset (unseen data), with an accuracy of 94.4%. These results demonstrate the capability of the hybrid model to decode left- and right-hand MI signals. The RF-VGG model can therefore contribute to the BCI-MI community by facilitating the deployment of mobile biometric authentication for subject identification and lock and unlock classification.


INTRODUCTION
In the realm of information systems, identity authentication is a fundamental pillar of system security. However, traditional biometric identification technologies exhibit varying degrees of security challenges, with biometric data being susceptible to theft and replication [1]. Research into brain biometrics has shown that electroencephalogram (EEG) signals present a more secure avenue for authentication. The uniqueness, persistence, universality, and resilience of EEG signals against fraudulent activities offer the potential to establish highly secure biometric systems [2]. EEG emerges as a potentially superior biometric modality because it possesses attributes not shared by other modalities such as fingerprints, retina scans, and face recognition [3], notably resistance against forgery, compliance with privacy regulations, aliveness detection, and its multifaceted use as a cognitive biomarker [4]. EEG signals are recorded from the scalp and measure postsynaptic brain activity through lightweight, non-invasive devices [5,6]. Owing to their exceptionally high temporal resolution and rich dynamic characteristics [5], EEG signals are considered one of the most promising biological signals for various biometric applications, surpassing other modalities such as electromyography (EMG) and electrocardiography (ECG) [7]. Recently, there has been a surge of interest in EEG and motor imagery (MI) signals, as these signals encode an individual's intent to perform an action [8]. Researchers have used MI signals to help individuals with disabilities control devices such as wheelchairs [9] and even autonomous vehicles [10], and they have been applied in further assistive settings [11]. Decoding MI-EEG signals poses a formidable challenge due to their complexity, dynamic nature, and low signal-to-noise ratio [12]. Therefore, MI pattern recognition systems require three essential processes: pre-processing of the EEG signal, feature extraction, and classification [13]. Among these, feature extraction stands out as the pivotal process in the MI-EEG pattern recognition model.
The time-frequency representation (TFR) of motor imagery (MI) features is a widely employed technique in brain-computer interface (BCI) applications, primarily for classification purposes. This representation describes the distribution and power of signal energy across distinct time intervals and frequencies by formulating a combined function of time and frequency [10]. The MI signal is inherently one-dimensional, necessitating its transformation into two-dimensional images, a task effectively achieved through the Continuous Wavelet Transform (CWT) and Short-Time Fourier Transform (STFT). These approaches handle the signal characteristics efficiently in both the time and frequency domains [11]. Over short time intervals, however, electroencephalograms (EEGs) are often considered non-stationary signals. In such cases, the STFT is a viable approach for extracting and computing the spectrum of the brain signal in the time-frequency domain [12]. Furthermore, the STFT offers the advantage of providing simultaneous time-frequency information while maintaining a relatively low processing cost [13]. Similarly, the convolutional neural network (CNN) has exhibited the ability to extract spatial and temporal features from MI data. Previous research has established that CNNs can extract highly effective features using both shallow and deep models [14], implying that valuable features can be obtained at various levels of the network architecture [8]. Furthermore, deep transfer learning facilitates the seamless incorporation of novel datasets into a pre-trained machine learning model. This capability proves particularly advantageous in BCI systems, where the quantity of available data often falls short of guaranteeing adequate model training [14]. Findings from BCI research revealed that subject-transfer techniques based on CNNs outperformed alternative methods. These subject-transfer methods are based on the observation that the neural patterns exhibited by the target subject and by other subjects can be similar when engaging in identical activities [15].
Since the classification method plays a major role and has a direct impact on distinguishing between two MI-EEG mental commands, selecting an appropriate classifier is of paramount importance. Classical machine learning methods require handcrafted features to perform classification, whereas deep convolutional neural networks (DCNNs) perform classification by extracting features directly from raw data [15].
Previous studies, such as [16,17], have combined pretrained CNNs with a classical machine learning algorithm for computer vision challenges [18]. Additionally, studies such as [19] have applied the same technique to detecting epileptic seizures. Therefore, this study employs a hybrid approach [20], merging a pretrained CNN with a classical machine learning algorithm, to decode two-class MI signals for EEG biometric recognition. For this purpose, the STFT is used to generate 2D images (spectrograms) from a 4-second-long trial, with the aim of capturing both ERS and ERD motor activity. This process yields six images related to the alpha and beta bands, extracted from the three EEG channels. The VGG-16 model is then applied to extract features from the motor imagery spectrograms, and the classification step is completed by feeding these features to the classifier. The paper is structured as follows: Section 2 outlines the methodology; Section 3 elaborates on the results and discussion; and Section 4 presents the findings and conclusions of this research.
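The hybrid idea above can be sketched in a few lines of Python. This is a minimal illustration, not the study's exact pipeline: the feature matrix below is a synthetic stand-in for the VGG-16 activations that would be computed from the STFT spectrograms, and the trial count, feature dimension, and injected class pattern are all assumptions made for the demonstration.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score, f1_score

# Stand-in features: in the real pipeline, each row would be the flattened
# activation of a late VGG-16 layer computed on one MI spectrogram.
rng = np.random.default_rng(0)
n_trials, n_features = 300, 512
X = rng.normal(size=(n_trials, n_features))
y = rng.integers(0, 2, size=n_trials)      # two MI classes: left / right hand
X[y == 1, :10] += 2.5                      # inject a separable class pattern

X_tr, X_te, y_tr, y_te = train_test_split(
    X, y, test_size=0.3, random_state=0, stratify=y)

# Classical classifier on top of the deep features (the "RF-VGG" pattern).
clf = RandomForestClassifier(n_estimators=200, random_state=0)
clf.fit(X_tr, y_tr)
pred = clf.predict(X_te)
print(accuracy_score(y_te, pred), f1_score(y_te, pred))
```

The same structure applies with any of the nine candidate classifiers swapped in place of the random forest.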

Methodology
The methodological framework of the hybrid model for decoding the MI signal is presented in Figure 1. This framework describes the whole process of detecting the MI pattern; the details of the model are elaborated in the following subsections.

MI EEG Datasets
Developers typically choose to utilize a minimal number of channels when creating brain-computer interface (BCI) systems. This approach allows for easier implementation and reduces the costs associated with real-time applications [14]. Hence, this study selects two MI EEG datasets that were recorded using three channels. The two datasets were sourced from the BCI competition datasets, specifically recorded at Graz University; additional information regarding each dataset is provided in the subsequent subsections. The datasets comprise two separate parts, namely the training part and the evaluation part. Consequently, the hybrid model was employed to assess inter- and intra-subject variances [21]. Due to the limited availability of a substantial dataset, it was necessary to merge the data of all nine participants into a comprehensive dataset encompassing all trials. This was done to construct a resilient model capable of effectively addressing the intricate challenges associated with brain complexity.

Dataset-I (BCI IV 2b dataset)
The dataset includes three electroencephalogram (EEG) channels, specifically C3, Cz, and C4, recording signals for two separate motor imagery tasks: movement of the left hand and movement of the right hand. The data were acquired from nine persons at a sampling frequency of 250 Hz. A series of 160 trials was performed to collect EEG data from each participant, who was seated in an armchair and instructed to focus on a flat screen. Two types of recording sessions were conducted: training sessions without feedback and evaluation sessions with smiley feedback. During the first two sessions, the participants were presented with a brief auditory cue in the form of a warning tone. This cue triggered the participants to engage in a motor imagery exercise lasting four seconds: a cognitive simulation of a specific movement as directed by a pointing arrow presented on a featureless display. In the ensuing three sessions, participants were given explicit instructions to move a grey smiley, placed at the center of the monitor, either to the right or to the left after a short auditory cue was presented. The smiley feedback is presented within a duration of four seconds; it turns red when it deviates from the correct direction and green when it moves in the correct direction [15].

Dataset-II (BCI II dataset)
The dataset was acquired from one subject, a healthy 25-year-old female. The electroencephalography (EEG) device comprised three EEG channels, namely C3, Cz, and C4, operating at a sampling frequency of 128 Hz. Each trial has a total duration of 9 seconds. The dataset was collected using the Graz paradigm, in which participants remained still for the first 2 seconds. At t = 2 s, a visible stimulus was presented on the screen, indicating the initiation of the experimental session, accompanied by a cross sign '+' for a period of 1 second. At t = 3 s, a visual stimulus was shown in the form of an arrow pointing either to the right or to the left. A total of 280 trials was analyzed to evaluate motor imagery of right- and left-hand movements. The entire signal of the dataset was band-pass filtered in the frequency range of 0.5 to 30 Hz [16].

Pre-processing
The EEG-MI signal is exposed to contamination from various sources, including body movements, eye blinking, facial muscle activity, and artifacts from the surrounding environment, such as electromagnetic fields produced by electrical devices [17]. Minimal preprocessing is employed because deep learning is utilized within the framework. Frequency filtering is applied to improve the signal-to-noise ratio of the raw EEG data and to enhance the relevant information present within the signals. Specifically, a fourth-order Butterworth filter is employed within the frequency range of 8-30 Hz, since motor imagery (MI) EEG data depend on the alpha (8-13 Hz) and beta (14-30 Hz) rhythms.
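The band-pass step described above can be sketched with SciPy. The 4th-order Butterworth design and the 8-30 Hz band follow the text; the 250 Hz sampling rate matches Dataset-I, while the toy two-tone signal is an illustrative assumption used to show the filter attenuating out-of-band content.

```python
import numpy as np
from scipy.signal import butter, filtfilt

def bandpass_mi(eeg, fs=250.0, low=8.0, high=30.0, order=4):
    """Zero-phase Butterworth band-pass (8-30 Hz) along the last axis."""
    nyq = fs / 2.0
    b, a = butter(order, [low / nyq, high / nyq], btype="band")
    return filtfilt(b, a, eeg, axis=-1)  # filtfilt avoids phase distortion

# Toy signal: 10 Hz (inside the MI band) plus 50 Hz line noise (outside it).
fs = 250.0
t = np.arange(0, 4.0, 1.0 / fs)                      # one 4-s trial: 1000 samples
x = np.sin(2 * np.pi * 10 * t) + 0.5 * np.sin(2 * np.pi * 50 * t)
y = bandpass_mi(x, fs)

# The 50 Hz component should be strongly attenuated relative to 10 Hz.
spec = np.abs(np.fft.rfft(y))
freqs = np.fft.rfftfreq(len(y), 1.0 / fs)
print(spec[np.argmin(np.abs(freqs - 50))] < 0.1 * spec[np.argmin(np.abs(freqs - 10))])
```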

VGG-16
The classification of EEG signals requires high-dimensional features to represent the latent characteristics of the brain signal. A CNN relies on convolution to extract dominant features by adopting a number of kernels (also known as filters) [15,22]. A convolutional neural network is a type of artificial neural network that analyzes image inputs using learnable weights and biases for the various parts of an image, allowing those parts to be differentiated from one another [23,24]. Transfer learning is the method of transferring the knowledge (the weights and layers) of an existing trained model to a new untrained model, which speeds up the learning of the new model [25,26]. A CNN with transfer learning can identify various affective states very accurately, thereby improving the interaction between humans and computers [27], and the same technique may also help address problems such as occlusion [28]. VGG stands for Visual Geometry Group; it is a standard deep convolutional neural network (CNN) architecture with multiple layers, where "deep" refers to the number of layers, VGG-16 consisting of 16 weight layers (13 convolutional and 3 fully connected) [23]. The VGG network architecture was initially proposed by Simonyan and Zisserman. The 16-layer VGG models (VGG16) were the basis of their ImageNet Challenge 2014 submission, where the Visual Geometry Group (VGG) team secured first and second place in the localization and classification tracks, respectively [29]. VGG16 is one of the most common deep learning architectures, introduced at Oxford University [30]. In this work, the pretrained VGG-16 model is employed to process the spectrogram images [31]; VGG16 was chosen because it has a comparatively simple network architecture and is easy to implement [32]. The first and second convolutional layers comprise 64 kernel filters of size 3×3. As the input image (an RGB image with depth 3) passes through the first and second convolutional layers, its dimensions change to 224×224×64. The resulting output is then passed to a max pooling layer with a stride of 2.
The third and fourth convolutional layers have 128 kernel filters of size 3×3. These two layers are followed by a max pooling layer with stride 2, and the resulting output is reduced to 56×56×128.
The fifth, sixth, and seventh layers are convolutional layers with kernel size 3×3, all using 256 feature maps. These layers are followed by a max pooling layer with stride 2.
The eighth through thirteenth layers form two sets of three convolutional layers with kernel size 3×3, each with 512 kernel filters. Each set is followed by a max pooling layer with a stride of 2.
The fourteenth and fifteenth layers are fully connected hidden layers of 4096 units, followed by a SoftMax output layer (the sixteenth layer) of 1000 units [31].
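The layer-by-layer dimensions above can be verified with a short, dependency-free sketch of the standard VGG-16 configuration: 3×3 convolutions with padding 1 and stride 1 preserve spatial size, while each 2×2 max pool with stride 2 halves it.

```python
def conv3x3(size):
    """Output size of a 3x3 convolution with padding 1, stride 1."""
    return (size - 3 + 2 * 1) // 1 + 1   # equals size: spatial dims preserved

def pool2x2(size):
    """Output size of a 2x2 max pool with stride 2."""
    return size // 2

# VGG-16 convolutional blocks: (number of conv layers, output channels).
blocks = [(2, 64), (2, 128), (3, 256), (3, 512), (3, 512)]

size, shapes = 224, []                   # 224x224x3 RGB input
for n_convs, channels in blocks:
    for _ in range(n_convs):
        size = conv3x3(size)
    size = pool2x2(size)
    shapes.append((size, size, channels))

print(shapes)            # feature-map shape after each block
print(7 * 7 * 512)       # final volume flattened for the first 4096-unit FC layer
```

Walking the blocks reproduces the 224 → 112 → 56 → 28 → 14 → 7 progression implied by the text, ending in the 7×7×512 volume (25088 features) fed to the fully connected layers.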

Short Time Fourier Transform for EEG Image Formulation
The short-time Fourier transform (STFT), developed by Gabor in 1946, is one of the most extensively utilized signal processing algorithms for analyzing non-linear and non-stationary signals. It can specify the phase and magnitude of a raw signal as they change with time and frequency [33]. It separates a lengthy signal into segments with the same window size and applies the Fourier transform to each segment [34]. It is a form of Fourier analysis in which a signal is represented so that it can be examined in both domains. The STFT employs a window function to cut out a section of the time-domain signal, then applies the Fourier transform to the cut-out portion to identify various aspects of the signal [35]. For the STFT, the processed EEG signal x(t) is multiplied by a short time window that slides along the time axis, producing a set of windowed signal segments. Finally, the Fourier transform is applied to each windowed segment, yielding a two-dimensional time-frequency spectrum of the raw signal. The STFT is defined mathematically as follows [36]:

X(τ, f) = ∫ x(t) w(t − τ) e^(−j2πft) dt    (1)

In equation (1), w(t) is a fixed window with a limited number of non-zero values on the time axis, and τ is the position of the window along that axis. The STFT helps in understanding the embedded EEG signal features by considering the signal in the time domain and the frequency domain concurrently. The raw MI signals are defined as E = {(Xi, yi) | i = 1, 2, ..., N}, where Xi ∈ R^(C×K) is a two-dimensional matrix that represents the i-th MI trial in the dataset for C channels and K samples, N denotes the total number of trials in the dataset, and yi is the label of trial Xi. The labels take their values from a set L that comprises M classes of MI tasks; in this study there are two classes, with the label set denoted as L = {l1 = "left", l2 = "right"}. Studies such as [37] reported the efficiency of the STFT for creating 2D images (spectrograms) of 4 s length, which are then fed to the CNN as input images. Because of this, a length of four seconds was chosen, corresponding to one thousand samples for each of the MI signals in the Xi trial. A window size of 64 samples with an overlap of 50 samples was then selected. The output of this process is an image capturing the power spectral density (PSD) of a given MI signal, with frequency values measured in Hertz. Three images are therefore produced for data collected using three electrodes. However, since this study aims to capture the alpha and beta frequency bands corresponding to the ERS and ERD motor activity, the output of this process is six images for each MI trial.
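This spectrogram construction can be sketched with SciPy, using the 64-sample window and 50-sample overlap stated above. The toy 10 Hz signal and single channel are illustrative assumptions; note that a 64-sample window at 250 Hz gives a coarse frequency resolution of about 3.9 Hz per bin.

```python
import numpy as np
from scipy.signal import stft

fs = 250.0
t = np.arange(0, 4.0, 1.0 / fs)          # one 4-s MI trial: 1000 samples
x = np.sin(2 * np.pi * 10 * t)           # toy alpha-band oscillation, one channel

# STFT with a 64-sample window and 50 samples of overlap, as in the text.
f, tau, Z = stft(x, fs=fs, nperseg=64, noverlap=50)
psd = np.abs(Z) ** 2                     # power spectral density image

# Crop the alpha (8-13 Hz) and beta (14-30 Hz) bands: two images per channel,
# hence six images per trial for the three electrodes C3, Cz, and C4.
alpha = psd[(f >= 8) & (f <= 13), :]
beta = psd[(f >= 14) & (f <= 30), :]
print(psd.shape, alpha.shape, beta.shape)
```

Each cropped band is then rendered as a 2D image and resized to the 224×224 input expected by VGG-16.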

Results and Discussion
This section presents and discusses the results of developing the hybrid model for motor imagery biometric identification. This model is used for the mobile biometric authentication task (subject identification, and lock and unlock classification). The results of six experiments are described. The primary objective of the initial experiment is to construct a hybrid model that may be utilized for the task of biometric authentication. To accomplish this, a total of nine candidate classifiers, predominantly widely employed classification methods, were used to construct nine hybrid models; their performance metrics are displayed in Table 1. Based on these metrics, the hybrid models utilizing the RF classifier and the AdaBoost classifier exhibit the most favorable outcomes: the RF-based model achieves a classification accuracy of 0.966, an F1 score of 0.966, a precision of 0.966, and a recall of 0.966. The fourth experiment applied the RF-VGG model to the lock and unlock classification, and the average accuracy for both lock and unlock tasks across the nine participants is 97.50% (Table 2). As a result, the fifth experiment was carried out in order to validate the RF-VGG model for the lock and unlock task.
The experiment was conducted on dataset I, using the evaluation part. This also aids in assessing the model's ability to mitigate complex brain signal variations within subjects. The outcome is shown in Table 3: the average accuracy for both lock and unlock tasks across the nine participants stands at 97.40%. The primary objective of the sixth experiment was to assess the efficacy of the RF-VGG model in performing the lock and unlock task using a distinct dataset that had not been observed before. This dataset comprises data for a single participant, divided into two sections, a training portion and a testing portion, representing data gathered in two different sessions. The RF-VGG model achieves classification accuracies of 94.4% and 92.8% on the training and testing parts, respectively, as provided in Table 4 together with the other performance metrics. When examining the accuracy of the RF-VGG model across dataset-I and dataset-II, the model demonstrates a notable level of effectiveness, surpassing the accuracy reported in the existing literature, as indicated in Table 5 and Table 6. This study thus evaluates the effectiveness of the proposed model in interpreting the brain signals associated with motor imagery for both the left and right hand. The findings can potentially be applied in the domain of mobile biometric authentication for subject identification and for the classification of lock and unlock actions. This hybrid model will make a valuable contribution to the BCI-MI community by facilitating the implementation of the suggested model within an MI-based biometric authentication system.

Conclusion
This study applied a hybridization technique integrating VGG-16 with a classical classifier. This technique, identified as a hybrid model, was used for decoding two-class MI signals for the mobile biometric authentication task (subject identification, and lock and unlock classification). To do this, nine candidate classifiers (widely used classification algorithms) were utilized to build nine hybrid models, from which the best one was chosen. Six experiments were conducted in the experimental part of this study. The purpose of the first experiment was to develop a hybrid model for the biometric authentication task by building and comparing the nine hybrid models. The RF-VGG model achieved better performance than the other models and was therefore chosen for mobile biometric authentication. The fourth experiment applied the RF-VGG model to the lock and unlock classification process, with a mean accuracy of 97.50%. Consequently, the fifth experiment was conducted to validate the RF-VGG model for the lock and unlock task, with a mean accuracy of 97.40%. Finally, the sixth experiment verified the RF-VGG model for the lock and unlock task over another dataset (unseen data), with an accuracy of 94.4%. It can be concluded that the RF-VGG model can contribute to the BCI-MI community by facilitating the deployment of the mobile biometric authentication task (subject identification, and lock and unlock classification).

FIGURE 2. - VGG-16 model architecture: 13 convolutional layers, 2 fully connected layers, and 1 SoftMax classifier. The precise structure of the network is described in the VGG-16 subsection.

FIGURE 3. - Confusion matrix for subject identification over Dataset-I.
The outcome of the aforementioned experiment, namely the evaluation part of dataset I, is depicted in the confusion matrix of Figure 3. The classification accuracy, F1 score, precision, recall, and area under the curve (AUC) for the model are 0.922, 0.922, 0.922, 0.922, and 0.986, respectively. The log loss is 0.748, and the specificity is 0.990. The training time for the model is 22.949 s, and the testing time is 8.036 s. To further assess the efficacy of the RF-VGG model in accurately identifying subjects, the third experiment was carried out. The confusion matrix for the evaluation over dataset I and dataset II in this experiment is provided in Figure 4. The classification accuracy, F1 score, precision, recall, and AUC for this model are 0.932, 0.932, 0.932, 0.932, and 0.988, respectively. The log loss is 0.726, the specificity is 0.992, and the training and testing times are 22.011 s and 7.409 s, respectively. Comparing the outcomes of the three subject-identification experiments, it can be inferred that the RF-VGG model possesses the capability to accurately identify individuals based on their brain signals.
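As an illustration of how metrics such as accuracy, F1 score, and specificity follow from a confusion matrix, the sketch below uses made-up binary labels (the subject-identification task itself is multi-class, so the real values are aggregated across classes).

```python
import numpy as np
from sklearn.metrics import accuracy_score, f1_score, confusion_matrix

# Hypothetical labels and predictions standing in for one class-vs-rest
# slice of the subject-identification confusion matrix.
y_true = np.array([0, 0, 0, 0, 1, 1, 1, 1, 1, 0])
y_pred = np.array([0, 0, 1, 0, 1, 1, 1, 0, 1, 0])

acc = accuracy_score(y_true, y_pred)
f1 = f1_score(y_true, y_pred)
tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
specificity = tn / (tn + fp)     # reported alongside accuracy in the text
print(acc, f1, specificity)      # 0.8 0.8 0.8 for this toy example
```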

FIGURE 4. - Confusion matrix for subject identification over Dataset-I + Dataset-II.
The fourth experiment involves the utilization of the RF-VGG model to perform the classification procedure for the lock and unlock tasks. The experiment was conducted on dataset I, namely on the training part, which included a total of nine people. This aids in assessing the ability of the model to cope with intra-subject brain signal complexity, since the complexity of brain signals varies among subjects. The outcome of this experiment is displayed in Table 2, indicating that the average accuracy for both lock and unlock tasks across the nine participants is 97.50%.

Table 1. - Identification on the training part of Dataset-I.
The RF-VGG model demonstrated superior performance in terms of training time and testing time when compared to the hybrid model utilizing the AdaBoost classifier. Hence, the RF-VGG model was selected for use in mobile biometric authentication. To assess the efficacy of the RF-VGG model in identifying subjects, a second experiment was carried out.