Driver Drowsy and Yawn Alert System Using a Deep Cascade Convolutional Neural Network (DCCNN)

Abstract: Driver drowsiness and fatigue are widely recognized as prevailing factors contributing to motor vehicle collisions. The number of deaths and injuries rises significantly every year due to a multitude of factors. A driver warning system is therefore imperative to mitigate traffic accidents, ultimately preserving human lives and public infrastructure. In this article, a Driver Drowsiness Detection System (DDDS) using eye and head measures was developed and implemented. The DDDS was designed around a high-resolution camera and employs a Deep Cascaded Convolutional Neural Network (DCCNN) to accurately identify instances of driver drowsiness. The DCCNN assesses the driver's behavior through the analysis of visual cues, specifically eye movements, to determine whether the eyes are closed or open. The facial landmarks of the driver's frontal view were extracted using the Landmarks module from the Dlib toolkit, and a parameter referred to as the Eye Aspect Ratio was computed from the landmarks associated with the eyes. The output of the DCCNN was then used to trigger a notification in the driver's enclosure. The experiment used images with a resolution of 450 × 320 pixels and video at sixty frames per second (f/s). In terms of sleepiness-detection accuracy, the present research surpasses various previous studies.


INTRODUCTION
The prevalence of automobiles on roadways has exhibited a concerning upward trend, paralleling the growing significance of automobile transportation within contemporary society. The advent of novel technologies has opened diverse prospects for detecting and assessing the driver's physical well-being. Drowsiness can arise from various factors, such as alcohol consumption, fatigue, psychological strain, sleep deprivation, and habitual patterns. Deep learning, a methodology that learns from historical patterns, is used here for the specific task of detecting ocular motion and distinguishing between closed and open eye states. To this end, descriptors are employed to analyze the ocular behavior of occupants of moving vehicles, with the aim of extracting their physiological characteristics [1]. In the pre-processing phase, visual attributes are extracted from video footage of an AI-driven vehicle in operation and subsequently used to identify noteworthy resemblances [2]. Deep learning techniques [3][4][5][6] are used to segment the ocular image into smaller regions, which are then organized into a cohesive feature vector for the extraction of relevant features.
This investigation used deep learning methods [7][8][9][10] to examine crash-scene footage in an effort to positively identify the driver based on their likeness. A machine learning methodology is then employed to accurately locate the irises, after which these areas are divided into individual pixels [11][12][13][14]. Examining a driver's ocular indicators is a reliable way to gauge the level of alertness or sleepiness the driver displays. If the measured value surpasses the predetermined threshold and the driver's eyelids consistently remain closed, an audible alert is produced. Pupils are searched for via frame-by-frame analysis; the monitoring process yields no alert if the eyes are kept open throughout [15]. Thanks to technical progress, a cheap device with the potential to instantly alert drivers is now feasible, greatly improving their safety.
However, when discussing neural networks specifically designed for pattern and image recognition, the term Deep Cascaded Convolutional Neural Network (DCCNN) denotes a distinct network architecture. This approach enhances the precision of object recognition and spatial perception of visual input by integrating the fundamental concepts of convolutional neural networks (CNNs) and capsule networks (CapsNets). DCCNNs use capsules as an encoding mechanism for the separate parts of an image. By employing this approach, the network's ability to comprehend and evaluate the spatial interactions among these elements is enhanced. The described architecture has many uses, including pose estimation, object recognition, and image segmentation, all of which benefit greatly from its implementation; accurate results in these areas are attained through its incorporation of hierarchical structures and spatial information.

LITERATURE REVIEW
The availability of video cameras and digital media storage has expanded greatly due to the quickening pace of technological development. There has been a dramatic increase in the amount of recorded video being archived around the world. For effective processing of video data, autonomous analysis and comprehension are growing in importance. Visual object tracking is essential for making sense of video content [16]. The method involves recognizing and keeping track of the spatial position and movement of one or more moving components within each frame of the video.
One way to increase trust in the results is to examine the impact of track characteristics with a larger sample of drivers, as suggested in [17]. According to [18], we ought to create unique detection models, use cutting-edge technologies, and look into the possibility of autonomous vehicles. To better understand how fatigue detection systems work, field trials that focus on real-world applications are recommended [19].
In [20], a plan is presented for enhancing the reliability and productivity of detection systems. To optimize system performance, further research into the effect of hand position on detection would be beneficial, as suggested in [21] and [22]. Work on larger datasets and real-world performance proposes testing and measuring the effectiveness of systems in more realistic, real-world settings with larger datasets.
Multiple data sources, as well as larger samples, are recommended for research projects in order to improve detection accuracy [23,24]. A new approach to information detection [25] is proposed, which not only improves upon existing data-collection methods but also introduces some novel ones.
To keep up with the rising demand for Brain-Computer Interface (BCI) systems, it will be necessary to implement strategies that reduce user anxiety and make the necessary adjustments to BCI systems [26]. The current concept emphasizes using data collected in real-world settings to verify the efficiency of detection mechanisms. In this context, work on improving BCIs [27] is primarily interested in reducing user discomfort and accurately assessing detection systems using real-world data.
In [28], a variety of techniques are presented for accelerating detection and inference by means of hardware. The literature [29] strongly recommends that extensive testing and assessment be performed on hardware platforms to determine the efficacy and dependability of detection methods. The generalizability of models can be enhanced in two ways, as suggested by [30]: by increasing the size of the dataset and by addressing generalization challenges. The primary objectives of [31] are to enhance model accuracy, integrate new methods, expedite the development of prototype detection systems, and explore new technological possibilities.
According to another study [32], more extensive studies, experiments with simulated vehicles, and better detection methods need to be implemented. In addition, it is suggested to use more complex algorithms, add more measures and variables, and increase the sample size [33].
The study's primary goals are to accurately detect fatigue, develop standardized measures, and evaluate systems under realistic driving conditions [34]. Study [35] proposes a wide range of solutions, such as the creation of comprehensive datasets, the improvement of data-aggregation methods, the development of real-time detection capabilities, and the design of low-cost detection environments. Taken together, these suggestions provide a guide for improving traffic safety and efficiency via the study of driver fatigue and drowsiness.

METHODOLOGY
This methodology offers techniques for extracting data from images, which facilitates the assembly of a useful dataset. A common technique is used to separate the continuous video stream into discrete frames. A combination of deep learning and OpenCV techniques, with a focus on locating the eyes' regions of interest (ROIs), is used to distinguish the individual eyes. The alarm does not activate if a thorough frame-by-frame analysis shows that the subject's eyes remain open. Conversely, if a driver closes their eyes for an extended period and the resulting measurement exceeds a predetermined limit, this is taken as strong evidence of driver fatigue, and the simulator issues a system-wide alert. Passengers and bystanders can use these techniques to determine whether the driver is distracted.
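The alert logic described above, in which closed eyes persisting beyond a frame limit trigger an alarm, can be sketched as a small counter. This is a hypothetical illustration: the frame limit of 48 (about 0.8 s at the 60 f/s used here) is an assumed tuning value, not one specified in the paper.

```python
class DrowsinessCounter:
    """Count consecutive closed-eye frames and flag an alert once
    the count reaches a frame limit."""

    def __init__(self, limit=48):  # assumed limit: ~0.8 s at 60 f/s
        self.limit = limit
        self.closed_frames = 0

    def update(self, eyes_closed):
        """Feed one frame's eye state; return True when the alert fires."""
        if eyes_closed:
            self.closed_frames += 1
        else:
            self.closed_frames = 0  # any open-eye frame resets the count
        return self.closed_frames >= self.limit
```

Resetting the counter on every open-eye frame means only a sustained closure, rather than normal blinking, accumulates toward the alarm.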

Face Recognition
Many people believe that the face is the most important factor in recognizing individuals because it is the most prominent and easily accessible feature of a person's physical appearance [36]. However, understanding this phenomenon is challenging because the face is dynamic and subject to change due to a number of factors. Despite these characteristics, the face is often the only available means of verification. The human face is a unique anatomical system due to its complexity and its capacity to quickly adapt to novel situations. Its rough and uneven surface makes it difficult to draw firm conclusions from appearance alone. Facial pigmentation has been shown to change rapidly in response to emotional states such as embarrassment and to temperature changes; varying sweat levels alter the skin's reflectivity, which accounts for these color changes. The complexity of this process is increased by individual variation in hair growth and removal, the appearance of wrinkles and sagging skin with normal aging, and changes in skin color from sun exposure [37]. It becomes harder to recognize someone after a drastic change in physical appearance: bandages and dressings are examples of temporary changes, while makeup, jewelry, and piercings are examples of longer-lasting ones.
The ability to operate independently, without the need for expensive or specialized hardware, is just one of the many benefits of using facial recognition technology in access-control systems. A highly reliable facial recognition system can be built using only a camera and a regular computer. Due to its low entry barrier and broad applicability, the technology has the potential to transform the security industry, particularly in areas such as criminal identification and vehicle monitoring. Because of its flexibility, facial recognition excels in novel settings where more established identification methods fall short.

EAR and MAR Measurement for DDDS
A new technique has been developed to detect signs of fatigue in real-world videos. The Eye Aspect Ratio (EAR) has been identified as a key feature for determining drowsiness in a video frame. In this method, the eye is located using six individual anatomical landmarks (p1 through p6). The horizontal line represents the distance between points p1 and p4, while the vertical lines represent the distances between p2 and p6 and between p3 and p5. The vertical lines in the diagram, representing the eye opening, are drawn to scale [38]. This procedure is illustrated in Figure 2. The length of the vertical lines changes depending on whether the eyes are open or closed, and diagnosing lethargy requires determining the ratio of these lines. While the eyes remain open, this ratio stays relatively stable; when fatigue sets in and the eyes close, it rapidly approaches zero. Using the OpenCV libraries, we first pinpoint the face, the eye region, and the individual eyes. The framework accepts photos as input and produces an image that labels both the recognized face and the boolean region associated with it. Before this outcome is applied to the actual system, it is first tested virtually. Our top priority is to find cases of drowsy driving; several methods, including calculating the EAR and monitoring the driver's expressions, help us achieve this goal. Furthermore, the Mouth Aspect Ratio (MAR) is taken into consideration, which bears conceptual resemblance to the EAR since it measures the proportion between the height and width of the mouth [39]. Our hypothesis postulates that tired people have poorer mouth control, leading to more frequent yawning and an abnormally large MAR [40].
The term "complete monitoring" refers to an all-encompassing accumulation of camera-based observations focused on the eyes or mouth, while "complete detection" refers to the precise quantification of such occurrences. The distributions of eye and mouth features are very different. Landmarks can be located by contrasting the open and closed configurations, as indicated by formula (1):

EAR = ((‖p2 − p6‖ + ‖p3 − p5‖) / (2‖p1 − p4‖)) × 100%  (1)

Analogously, the facial landmarks of the mouth (m1 through m8, with the corners at m1 and m5) can be used to estimate the size of the mouth opening, as in formula (2):

MAR = (‖m2 − m8‖ + ‖m3 − m7‖ + ‖m4 − m6‖) / (2‖m1 − m5‖)  (2)
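Formulas (1) and (2) reduce to plain coordinate geometry on the landmark points. The sketch below assumes landmarks are given as (x, y) tuples and omits the ×100% scaling; the eight-point mouth indexing is an assumption for illustration, not the paper's exact layout.

```python
import math

def _dist(a, b):
    """Euclidean distance between two (x, y) landmarks."""
    return math.hypot(a[0] - b[0], a[1] - b[1])

def eye_aspect_ratio(p):
    """Formula (1): (|p2-p6| + |p3-p5|) / (2 |p1-p4|),
    with p[0]..p[5] standing for landmarks p1..p6."""
    return (_dist(p[1], p[5]) + _dist(p[2], p[4])) / (2.0 * _dist(p[0], p[3]))

def mouth_aspect_ratio(m):
    """Formula (2) analogue for eight mouth landmarks m1..m8,
    with the mouth corners at m[0] and m[4]."""
    return (_dist(m[1], m[7]) + _dist(m[2], m[6]) + _dist(m[3], m[5])) / (
        2.0 * _dist(m[0], m[4]))
```

For a fully open eye the vertical distances are large relative to the horizontal one, so the ratio is well above zero; as the lid closes, the vertical distances shrink and the ratio approaches zero, matching the behavior described above.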

FIGURE 2. The eye aspect ratio and mouth aspect ratio [41].
In the first stage, a loss function is used to quantify the impact of the various constraints within the framework of the Deep Cascaded Convolutional Neural Network (DCCNN). Loss functions are statistics that measure the dissimilarity between a predicted output and a labeled target, and their form varies depending on the specifics of a given task. Face classification is handled with the cross-entropy loss. For any sample x_i, the cross-entropy loss function (3) is

L_i = −(y_i1 log(p_i) + (1 − y_i1) log(1 − p_i))  (3)

where p_i denotes the anticipated output of the network and y_i1 is the factual label for x_i, indicating a facial or non-facial sample.

Estimating the coordinates of a bounding box containing a face is another target. This objective is a regression problem, and the Euclidean loss function is used during training. The loss function is represented by equation (4):

L_i = ‖p_i − y_i2‖²  (4)

where p_i is the face region predicted by the network and y_i2 is the true face region in the image. A driver's likeness can be captured with high precision by a trained DCCNN; therefore, providing a high-quality face photo for the appropriate algorithm to analyze is much less of a hassle.
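The two training losses can be illustrated directly. This is a minimal sketch of equations (3) and (4) as described above; the epsilon clamp is an added numerical guard, not part of the equations.

```python
import math

def cross_entropy_loss(p, y):
    """Equation (3): -(y*log(p) + (1-y)*log(1-p)) for a face (y=1)
    or non-face (y=0) label; eps guards against log(0)."""
    eps = 1e-12
    p = min(max(p, eps), 1.0 - eps)
    return -(y * math.log(p) + (1 - y) * math.log(1 - p))

def euclidean_loss(pred_box, true_box):
    """Equation (4): squared L2 distance between predicted and
    true bounding-box coordinates."""
    return sum((a - b) ** 2 for a, b in zip(pred_box, true_box))
```

A confident correct prediction yields a loss near zero, while a confident wrong one is heavily penalized, which is what drives the face/non-face classifier during training.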

Algorithm of Driver Drowsiness Detection
Fatigue can be assessed with the help of OpenCV's Eye Aspect Ratio (EAR) and Mouth Aspect Ratio (MAR) monitoring, together with Dlib's neural-network-based face detection and landmark prediction. The eye aspect ratio (EAR) and the mouth aspect ratio (MAR) are determined from the eye coordinates and mouth coordinates, respectively, obtained via OpenCV. The procedure is depicted in the flowchart in Figure 3.
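The per-frame decision step of the flowchart can be sketched as a threshold check on the measured EAR and MAR, using the threshold values (0.3 and 20) reported in the Results section; the cue labels are hypothetical names chosen for illustration.

```python
EAR_THRESH = 0.3   # eye aspect ratio below this => eyes considered closed
MAR_THRESH = 20.0  # mouth aspect ratio above this => yawning suspected

def classify_frame(ear, mar):
    """Map one frame's EAR and MAR measurements to a drowsiness cue."""
    if ear < EAR_THRESH:
        return "eyes_closed"
    if mar > MAR_THRESH:
        return "yawning"
    return "alert"
```

In the full system these cues would be accumulated over consecutive frames before the audible alarm is triggered, rather than firing on a single frame.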

RESULTS AND DISCUSSIONS
The results of testing our camera-based driver fatigue detection system in different lighting conditions are discussed here.
In this case, the camera is the primary sensor for detecting fatigue and triggering the system. An 8-megapixel camera was used to ensure high-quality images. Our study utilized the OpenCV libraries and Dlib to concentrate on the driver's face, ocular region, and eyes. The method relies on real-time video transmission to produce an image. An audible signal is played from a speaker outside the vehicle when the driver's fatigue level reaches a predetermined point. Our framework achieves over 94% accuracy in detecting driver drowsiness when the distance between the driver's face and the camera is carefully controlled. The evaluations presented in the tables that follow are applicable to many different contexts. The Eye Aspect Ratio (EAR) is a useful measurement of fatigue; to quantitatively evaluate the test results, the EAR is measured against a threshold. We also assess the Mouth Aspect Ratio (MAR) and its associated threshold for detecting sleepiness.
Setting the Eye Aspect Ratio (EAR) threshold at 0.3 and the Mouth Aspect Ratio (MAR) threshold at 20 provides very specific parameters within which to operate. Note that we do not take the wearing of eyeglasses into account. The driver's eyes and mouth are assumed to open and close normally, and the detection process is assumed to be functioning correctly. Based on a sample of 10 measurements taken from a population of 100, the mean EAR and threshold (THRESH) values were calculated. These ten samples were hand-picked to adequately represent the typical detection scenario, as shown in Table 1. By carefully selecting our samples, we can examine their specific characteristics and acquire a more comprehensive understanding of them. The average EAR and THRESH values for these samples are shown in Figure 4.
Our monitoring found that the EAR values were consistently higher than the THRESH threshold. Figure 5 illustrates a comparative representation of the mean values of the Mouth Aspect Ratio (MAR) and the threshold (THRESH). From the data at hand, we can see that the MAR consistently displays values below the predefined threshold. Evidenced by the high EAR and low MAR, the system clearly demonstrates its ability to distinguish the prevalent scenarios. These results provide crucial context for understanding the driver's level of focus, and are also shown in Figures 6, 7, and 8. The median value of the EAR is depicted in Figure 6; the threshold-based analysis primarily aimed to identify instances of eye closure and fatigue. The average MAR is shown in Figure 7; in evaluating the subject's behavior against the threshold, yawning and eye-opening times were given greater weight than other behavioral indicators. The average EAR over a given time period is depicted in the bar chart of Figure 8.
Using the threshold-based method, we examined the phenomenon, paying close attention to yawning and other signs of fatigue. In this analysis, we present the insights gained from our observations and assess how they compare with those of previous studies. The presented scenario shows how the outcome changes depending on the EAR and MAR thresholds used, assuming the driver is not wearing corrective lenses. The effectiveness of the suggested model was improved by tuning it. The goal of this technology is to detect when a driver is becoming drowsy and, if necessary, to sound an alarm. The proposed methodology also outperforms previous efforts in its ability to identify signs of fatigue. As stated previously, the proposed method achieves a detection accuracy of up to 99.99 percent for the Eye Aspect Ratio (EAR) and the Mouth Aspect Ratio (MAR). The results of a performance analysis of the proposed method in two different settings are shown in Table 2.

CONCLUSION
The dangers of driving while tired inspired the authors of this paper to create a new standard for detecting nod-offs behind the wheel. It is critical for drivers to adopt safer strategies, such as not driving at night or after taking drugs or alcohol, to reduce the risks associated with driver drowsiness while operating a vehicle. Researching algorithms developed to identify instances of drowsy driving is an important step toward reducing traffic accidents. Incorporating a large number of parameters and their fluctuations, this study presents a novel method for determining driver fatigue. The existing empirical evidence makes clear that the Eye Aspect Ratio (EAR) and Mouth Aspect Ratio (MAR) are widely used measures in drowsiness-detection research. The EAR metric, implemented with the Dlib toolkit, has been proposed to evaluate the state of a driver's eyes, and empirical evidence shows that it is significantly correlated with the driver's eye openness. The Dlib toolkit is likewise used to calculate the Mouth Aspect Ratio (MAR), which evaluates the state of the driver's mouth. The study's empirical evidence demonstrates a significant correlation between the MAR and drivers' yawning behavior. The rationale behind this concept is supported by the muscular contractions that occur as the mouth opens and closes.

FIGURE 1. The camera and screen that were used.

Figure 1 depicts the system architecture that enables Python, Dlib, and OpenCV to be used on individual computers. The analysis was conducted on a desktop computer outfitted with 8 processing cores, 32 gigabytes of RAM, and an NVIDIA GTX 1660 Super 6 GB graphics processing unit. The research also used a camera with a frame rate of 60 f/s and a resolution of either 1280 × 720 or 1920 × 1080.

FIGURE 3. Flow of the main system.

TABLE 2. A comparison of the proposed method's accuracy in two cases.

Case         Acc. of EAR                                 Acc. of MAR
No glasses   open eye = 99.99%, close eye = 99.98%       open mouth = 99.99%, close mouth = 99.99%
However, the findings derived from the proposed methodology in the first case demonstrated a superior level of quality in comparison to the results presented in the other academic papers evaluated in Table 3.