Using Electrooculography and Electrodermal Activity During a Cold Pressor Test to Identify Physiological Biomarkers of State Anxiety: Feature-Based Algorithm Development and Validation Study

doi:10.2196/69472

¹Computer Science, California Institute of Technology, Pasadena, CA, United States

²Medical Engineering, California Institute of Technology, 1200 E California Blvd, Pasadena, CA, United States

³Adult Psychiatry, Dartmouth College, Hanover, NH, United States

*these authors contributed equally

Corresponding Author:

Samuel Aaron Solomon, BSc, MEng, PhD

Related ArticlesPreprint (arXiv): https://arxiv.org/abs/2411.17935v1
Peer-Review Report by Daniela Saderi, Shailee Rasania, Toba Olatoye, Simon Muhindi Savai, Randa Salah Gomaa Mahmoud, Vasco Medeiros, and Mitchell Collier: https://med.jmirx.org/2025/1/e72093
Authors' Response to Peer-Review Reports: https://med.jmirx.org/2025/1/e77440

Background: Anxiety has become a significant health concern affecting mental and physical well-being, with state anxiety (s-anxiety)—a transient emotional response—linked to adverse cardiovascular and long-term health outcomes. Traditional physiological monitoring lacks the contextual sensitivity needed to assess anxiety in real time. Electrooculography (EOG) and electrodermal activity (EDA), 2 biosignals measurable by wearables, offer promising avenues for identifying biomarkers of s-anxiety in naturalistic environments.

Objective: This study aims to identify novel biomarkers of s-anxiety using EOG and EDA signals collected in real-world conditions. We further explore how noninvasive wearable technology can enable real-time monitoring of physiological responses during induced stress, focusing on distinguishing true anxiety-related signals from artifacts in noisy environments.

Methods: Our study presents two datasets: (1) the EOG signal blink identification dataset Blink Identification Electrooculography Dataset (BLINKEO), containing both true blink events and motion artifacts, and (2) the EOG and EDA signals dataset Emotion, Electrooculography, and Electrodermal Activity Monitoring in Cold Pressor Conditions Dataset (EMOCOLD), capturing physiological responses from a cold pressor test (CPT). From analyzing blink rate variability, skin conductance peaks, and associated arousal metrics, we identified multiple new anxiety-specific biomarkers. Shapley additive explanations (SHAP) were used to interpret and refine our model, enabling a robust understanding of the biomarkers that correlate strongly with s-anxiety.

Results: BLINKEO feature analysis achieved a classification accuracy of 98.17% and F₁-score of 0.87 in distinguishing blinks from noise. In the EMOCOLD, survey results confirmed elevated anxiety and affectivity during the CPT, which normalized during recovery. SHAP analysis revealed that specific EDA features (eg, Hjorth activity and spectral entropy) and EOG features (eg, opening phase energy and signal height) consistently contributed to accurate predictions of s-anxiety and affectivity. Contextual combinations of features outperformed single-feature analyses, revealing relationships critical for robust biomarker identification.

Conclusions: These results suggest that a combined analysis of EOG and EDA data offers significant improvements in detecting real-time anxiety markers, underscoring the potential of wearables in personalized health monitoring and mental health intervention strategies. This work contributes to the development of context-sensitive models for anxiety assessment, promoting more effective applications of wearable technology in health care.

JMIRx Med 2025;6:e69472

doi:10.2196/69472

Keywords

stress; biomarker discovery; EOG; electrooculography; medical informatics; EDA; electrodermal activity

Background

Despite being a short-term response, state anxiety (s-anxiety) has emerged as a significant factor impacting long-term health outcomes. Researchers have linked sustained s-anxiety with adverse cardiovascular effects [1], underscoring its profound effects on mental and physical health. Approximately 23.1% of US adults experience some form of diagnosable mental disorder [2], and 74% of US adults reported experiencing stress-related health issues within a given month [3], illustrating the widespread impact of anxiety-induced stress. Reliable biomarkers are essential for capturing the complexities of s-anxiety, enabling more precise and effective models.

Noninvasive wearable technology has the potential to transform health monitoring by continuously capturing physiological data through real-time sensor measurements [4,5]. These devices collect a broad array of metrics, yielding critical insights into the body’s responses to anxiety. The ability to seamlessly collect large amounts of health-related data opens new ways to study and build an understanding of the onset and progression of anxiety, enabling more effective interventions and advancing our knowledge of human health. Identifying reliable biomarkers of s-anxiety offers a promising pathway to real-time health monitoring using wearable biosensors that can detect subtle physiological changes not immediately obvious in raw signal data.

The cold pressor test (CPT) is a widely used experimental method for studying anxiety responses in controlled settings. Participants immerse their hand in ice-cold water (0‐4 °C), eliciting a sympathetic nervous system response. This test reliably induces physiological markers of anxiety [6-8], such as increased heart rate and sweat production. Other techniques, such as public speaking simulations and mental arithmetic tasks [9], also provoke anxiety and can be used to identify reliable biomarkers.

Physiological responses to s-anxiety and arousal have been extensively documented, revealing clear links between emotional states and indicators such as blink rate variability [10] and stress-induced sweating [11]. The 2-factor model of emotion, developed by Schachter and Singer [12], suggests that emotions arise from physiological arousal and subsequent cognitive interpretation. This model underscores that physiological responses are interpreted within a contextual framework, which are further hidden in indirect biomarkers for specific emotional experiences. For instance, fatigue, which affects the blink conditions, can intensify physiological arousal, directly impacting how the brain interprets anxious states. Such contextual cues are crucial for understanding s-anxiety in real-world settings, but they are often filtered out or controlled for in existing studies. Electrodermal activity (EDA) is a common measure of physiological arousal, but its reliability in depression research remains debated. Some studies report reduced EDA responses in individuals with major depressive disorder, suggesting impaired autonomic reactivity [13] and emotional hypo-responsiveness [14]. However, conflicting findings point to variability due to factors like medication use and methodological differences [15], emphasizing the need for further research on the relationship between physiological signals and emotional states.

Wearable devices offer a way to contextualize these arousal states dynamically. Through advanced human-machine interfaces, wearables can monitor how individuals respond to their environments, integrating data on physical responses to build a richer understanding of s-anxiety. There is growing interest in using noninvasive wearables to collect richer biomarker data for mental health study [16,17], interpreting physiological responses in respect to real-time contextual cues and providing a more comprehensive view of emotional states.

Research shows that blink rates tend to increase under difficult mental tasks or anxiety-provoking situations [18,19], reflecting activation of the autonomic nervous system. Electrooculography (EOG) captures electrical signals produced by eye movements, allowing for the detection of blink-related patterns. But EOG signals are often filtered out in stress studies to improve clarity of other signals [20], potentially overlooking valuable information related to emotional arousal. Studies suggest that specific components of EOG signals can be analyzed to extract physiological markers of s-anxiety, highlighting the need for further research into EOG biomarkers. Furthermore, fatigue—closely associated with emotional arousal—provides an additional avenue for understanding anxiety through EOG features [21,22]. Studies examining EOG signals in the context of drowsiness reveal correlations between blink frequency, blink duration, and stages of fatigue [19], highlighting a noninvasive method for tracking emotional arousal over time. Given the interplay between fatigue and anxiety, this relationship prompted our investigation into how fatigue-related features within EOG signals may serve as indirect indicators of anxiety, offering new opportunities for nuanced and comprehensive stress monitoring.

Similarly, stress has a pronounced effect on sweat production. Emotional sweating, triggered by the sympathetic nervous system, occurs in response to psychological stressors rather than temperature changes [15,16,23]. EDA is a method that measures changes in skin conductance. Under emotional arousal and stress, body sweats and skin conductance increases. Previous studies often rely on basic features like median values [24] or the phasic component of the EDA signal, focusing on nonspecific skin conductance responses (SCRs) to correlate with self-reported s-anxiety [25] scores. In such studies, peaks in the phasic signal exceeding 0.01 µS were counted as responses, and the frequency of these nonspecific SCRs per minute served as the primary measure for physiological s-anxiety. EDA primarily reflects the magnitude of emotional arousal without distinguishing between positive and negative affective states [26]. In other words, a high SCR could result from excitement or stress, making it challenging to interpret EDA data as a standalone indicator of anxiety. This underscores the importance of using EDA in combination with other physiological markers [27], such as heart rate variability or blink rate, to gain a more comprehensive picture of an individual’s emotional and physiological state. A more methodical exploration of signal characteristics found in EDA and EOG signals reveal nuanced physiological markers that strongly correlate with s-anxiety.

Currently, no widely accepted biomarkers reliably assess anxiety across diverse contexts, highlighting the need for continued exploration. Researchers have tested markers like heart rate variability, skin conductance, and blink rate, but results often vary due to individual differences and contextual influences. While many studies report that depressed patients exhibit reduced EDA responses, indicating diminished autonomic nervous system activity, some research presents conflicting findings. These discrepancies are attributed to variations in study designs, methodologies, and the influence of factors such as antidepressant treatment on EDA measurements [13].

While machine learning models have shown promise in detecting anxiety, their black-box nature limits interpretability, making it difficult to validate findings across diverse populations [28]. By introducing additional context-sensitive biomarkers, we aim to enhance the reliability and transparency of anxiety assessments, making models more applicable to real-world scenarios.

Objective

In our research, we leverage EOG and EDA data to develop a comprehensive, real-time model of s-anxiety. We have compiled 2 distinct datasets for this purpose. The first dataset, Blink Identification Electrooculography Dataset (BLINKEO), consists of EOG signal features from samples characterized by peak-like patterns, annotated to differentiate natural blink events from extraneous noise and wire movement artifacts. The second dataset, Emotion, Electrooculography, and Electrodermal Activity Monitoring in Cold Pressor Conditions Dataset (EMOCOLD), contains time-series EOG and EDA signals along with demographic data and stress responses elicited by the CPT. Using interpretability techniques such as SHAP (Shapley additive explanations), we identify and quantify specific biomarkers within the EOG and EDA data, with a focus on blink rate variability and sweat-related stress indicators. Our approach goes beyond simple anomaly detection by uncovering nuanced, anxiety-specific physiological markers informed by the 2-factor model of emotion. This research contributes to a more detailed understanding of stress mechanisms, with the potential to improve mental health interventions and enable personalized, context-specific stress management strategies with wearable technology.

Description of Question

This research aims to identify reliable, interpretable biomarkers of s-anxiety through EOG and EDA data for real-time stress monitoring.

Blink Identification EOG (BLINKEO) Data Collection

To create the BLINKEO dataset, EOG data were collected and analyzed to differentiate natural blinks from noise or wire movements. Our setup integrated the AD8232 (analog devices), a biopotential amplifier designed to capture physiological signals, which we optimized for measuring EOG activity. To detect vertical eye movements using EOG, one electrode was positioned above the eye and another below it, aligning on the vertical axis. This configuration captures the corneo-retinal potential changes associated with upward and downward eye movements. All trials were conducted on the same two individuals for consistency in signal characteristics. A total of 65 trials involving repeated blinking under controlled conditions where no extraneous movement occurred. In addition, 19 trials lasting between 30 seconds and 2 minutes were conducted under conditions with no blinking, but with deliberate wire movements introduced by manually adjusting or lightly tugging the electrode leads. These trials provided a baseline for accurately distinguishing noise artifacts from genuine blink events. Table 1 shows the characteristics of these trials, including session count, total recording time, and peak detection results before and after filtering.

To preprocess the EOG data, motion artifacts were identified and removed, to make the data suitable for downstream features. A fifth-order low-pass Butterworth filter using the Scipy Signal butter function was applied to isolate low-frequency components indicative of meaningful physiological signals. This was followed by a Savitzky-Golay filter using the Scipy Signal savgol_filter function for additional smoothing, which preserved essential features while reducing minor signal fluctuations [29].

Table 1. Characteristics of blink and wire movement trials in the blink identification dataset. This table summarizes the number of independent sessions, cumulative recording time, and peak detection results before and after literature-supported blink peaks filtering for both blink and wire movement events.

Trial label	Sessions, n (%)	Total time (s)	Peaks detected, n (%)	Peaks after filtering, n (%)
Blink	65 (77)	12,103.14	6792 (54)	4734 (96)
Wire movement	19 (23)	2007.75	5704 (46)	203 (4)

Peak detection was performed using the Scipy Signal find_peaks function, identifying peaks with a prominence exceeding 0.1 with a peak width greater than 0.04 seconds [30] (blinks typically last between 0.1 and 0.4 seconds [31], averaging around 0.25 s). To focus on blink-like events, we additionally applied criteria based on established blink characteristics: a maximum peak width of 0.5 seconds and a minimum peak height of 0.05 volts [30]. We compared the signal quality after this initial peak detection with that obtained using conventional blink filtering methods. Traditional filtering techniques frequently overlook subtle blink patterns or introduce artifacts during data cleaning, potentially compromising accuracy. In contrast, a learned-feature approach refines this process by reducing noise and enhancing the precision of true blink identification within the dataset. Figure 1A shows examples of detected blink peaks from the BLINKEO dataset, with red dotted lines marking the center of each peak. This figure demonstrates the effectiveness of the peak detection method described in this section, highlighting its ability to accurately locate and extract the central point of each blink event during blink trials.

**Figure 1.** A. Blink peak examples from the Blink Identification Electrooculography Dataset (BLINKEO). The grey dotted lines indicate the center of the peak, extracted by the peak detection method outlined in this section. B. Blink examples (blue) plotted against wire examples (green), as filtered EOG voltage signals, normalized per peak between 0 and 1. Peaks are time-aligned by time, in seconds, from the center of peak. Wire signals typically have higher variability. C. A singular blink peak. The purple dot marks the peak of the blink event, while the outer edges of the red and grey shaded sections represent the boundaries used for feature extraction. These boundaries are determined by identifying the nearest minimum on each side of the peak, providing a precise range for analyzing blink characteristics. D. Another example of a blink peak, demonstrating the variability in blink peak shapes observed across recordings. The feature extraction process remains consistent, with boundaries determined by identifying the nearest minima on either side of the peak. EOG: electrooculography.

However, wire movements can also produce peak-like shapes, which poses challenges for this filtering method. While effective in controlled or low-noise environments, the filter is easily triggered by noisy conditions, where artifacts such as wire movements may mimic blink patterns. Figure 1B presents time series segments of both blink and wire movement examples that have been classified as blinks under the current filtering approach, overlaid for comparison. The figure shows that wire movements exhibit greater variability in the regions surrounding the peak, as well as in the overall shape of the peak itself. Current approaches are unable to distinguish between true blinks and wire artifacts, underscoring the limitations of the method in noisier environments.

For each detected peak, baseline values were calculated to provide a reference point for the signal’s amplitude. This involved locating the nearest minimum values on either side of the peak by performing binary search with a window size of up to 0.5 seconds in the left and right direction from the peak observed (see algorithm pseudocode in Multimedia Appendix 1). It recursively narrows down the search range to locate a local minimum, while avoiding minor fluctuations.

After establishing the baseline points, we extracted a comprehensive set of amplitude-independent features for each peak. These features include blink duration and various acceleration and velocity metrics, as used in previous EOG feature extraction and peak signal analysis studies [32,33]. A total of 32 peak-related features and label are stored as examples in the dataset, with labels distinguishing natural blinks from noise artifacts.

Figure 1C shows examples of EOG signals from 2 independent singular blink events, with distinct sections of the peak highlighted for clarity. The purple dot at the peak center represents the highest voltage point, detected by the peak detection algorithm. Red dots indicate local maxima in velocity, while blue dots show local acceleration points. Shaded regions in different colors represent key sections of the blink, such as the rising and falling phases, as well as acceleration and deceleration phases. This segmentation captures various aspects of the blink shape, this detailed segmentation provides valuable insights into the blink dynamics, enabling the extraction of relevant blink-related features.

We establish bounds for each feature by discretizing its range into 50 intervals. This discretization splits the feature’s values into small, equally spaced segments, enabling a systematic exploration of possible lower and upper bounds that optimize model accuracy.

The process begins by identifying the minimum and maximum values of each feature. The range between these values is divided by the bin count (50), yielding an incremental “step size,” or delta value, for testing. This delta value determines how much the threshold will shift at each iteration when exploring the bounds. To identify the best lower bound, the algorithm starts from the minimum value and iteratively adds the delta value (eg, 0.2) to the threshold, testing each increment by culling data points below it and evaluating the model’s accuracy with the adjusted dataset. The lower bound with the highest accuracy is selected as the optimal starting point for that feature.

The search then proceeds to find an optimal upper bound, beginning with the maximum value and reducing it by increments of the delta value until reaching the previously identified lower bound. This decremental approach ensures the upper bound remains above the lower bound. Each new threshold is applied to the dataset, and the accuracy is recorded. The upper bound yielding the best accuracy becomes the final threshold for that feature.

The individually optimized lower and upper bounds for each feature are compiled into a list, representing the complete culling thresholds that maximize model performance across the dataset. By discretizing each feature’s range into 50 intervals, the individual search method ensures a thorough yet efficient exploration of potential thresholds.

Emotion, EOG, and EDA Monitoring in Cold Pressor Conditions (EMOCOLD) Data Collection

The data collection process employed wearable sensors to record EDA and EOG signals from participants during controlled stress trials. EOG recording used the same setup as the BLINKEO data collection. Electrodes were positioned above and below one eye to detect vertical eye movements by capturing corneo-retinal potential shifts. EDA signals were recorded using a galvanic skin response sensor with MCP606 (microchip technology) operational amplifiers, operating at an excitation voltage of 0.5 V to measure skin conductance. Electrodes were placed on the forehead, chosen for its sensitivity to stress-induced sweat gland activity. The recorded signals were digitized and processed in real time using an ESP32-S3 WROOM-1 (Espressif Systems) microcontroller, which managed data acquisition, signal processing, and wireless transmission.

A total of 16 participants, between ages 26 and 31 years took part in the study, and demographic information, including race and sex, was collected and is summarized in Table 2. Data were taken from each subject only once. Each trial lasted about 10‐15 minutes and was divided into 3 phases: baseline, CPT, and recovery. The length of the trial and the data used for feature analysis is as detailed in Table 3.

Table 2. Characteristics of trials in the Emotion, Electrooculography, and Electrodermal Activity Monitoring in Cold Pressor Conditions Dataset (EMOCOLD) dataset. Demographic details of the study participants, including race and assigned sex.

Characteristic	Count, n (%)
Assigned sex
Male	11 (69)
Female	5 (31)
Race
Asian	11 (69)
Hispanic or Latino	2 (13)
White	1 (6)
Middle Eastern or North African	1 (6)
Black or African American	1 (6)
Total participants	16 (100)

Table 3. Summary of trial durations across different experimental phases. Summary of the duration of time electrodermal activity and electrooculography features are collected across different experimental phases. For each phase—baseline (before hand submersion), cold pressor test (cold water immersion), and recovery (after hand removal)—the table lists the minimum, 25th percentile, median, 75th percentile, and maximum duration (in seconds).

Experiment	Length (seconds)
	Minimum	Median (IQR)	Maximum
Trial
Baseline	245.6	281.7 (274.0-310.0)	414.8
CPT^a	261.9	290.4 (278.4-306.4)	358.0
Recovery	238.6	261.3 (252.8-278.1)	311.2
Feature collection
Baseline	167.5	177.0 (172.1-182.3)	194.0
CPT	160.6	177.2 (165.0-184.1)	188.2
Recovery	157.1	172.1 (168.4-180.3)	191.9

^aCPT: cold pressure test.

EOG signals were recorded using a 3-electrode configuration designed to capture vertical eye movements, particularly blink activity. Electrodes were positioned as follows: 1 above the eye, 1 below the eye, and a reference electrode in the middle of the forehead. This setup effectively captured vertical eye movement signals, with the reference electrode providing signal stability and reducing noise.

For EDA, a single electrode was placed on the forehead to measure changes in skin conductance associated with sympathetic nervous system activation. The forehead was chosen for its accessibility and stable conductance properties, making it suitable for detecting stress-related physiological changes in skin conductance.

Participants wore the device throughout the CPT trials, which were conducted to simulate acute stress events. The trials included both physical and environmental stressors. In the cold-water trials, participants immersed their hand in a circulating water bath set to a constant temperature of 0‐6 °C. Participants maintained immersion for approximately 5 minutes or until voluntary withdrawal. This provided a controlled means of eliciting stress responses.

The design of these trials facilitated the collection of time-series data, capturing participants’ physiological reactions to both physical exertion and environmental stressors, thereby providing a comprehensive view of their autonomic responses under varying stress conditions. Features were extracted from partitions of this sensor data, including statistical measures (mean [SD] and variance), signal entropy, peak detection metrics, and frequency-domain characteristics relevant to stress-induced physiological changes. Figure 2 shows a graphical depiction of the trial methodology.

**Figure 2.** This figure presents a visual representation of the experiment timeline and the signals recorded during the experiment, detailing the baseline, cold pressor test (CPT), and recovery phases. The raw electrooculography (EOG) and electrodermal activity (EDA) signals across these phases show no immediately clear trend distinguishing the baseline and recovery from the CPT stressor. However, when specific features such as blink duration from EOG and Hjorth activity from EDA are extracted and overlaid, more distinct patterns emerge, and can be used to quantify physiological responses to stress induction and subsequent recovery.

At each stage of the experiment—baseline, CPT, and recovery—participants completed an excerpt of the Positive and Negative Affect Schedule (PANAS) and the State-Trait Anxiety Inventory (STAI-State) to assess their emotional responses. The PANAS measures both positive emotions (eg, inspired and attentive) and negative emotions (eg, upset and nervous) on a 5-point scale, capturing general mood states. The STAI-State survey, consisting of items such as “I feel tense” and “I feel worried,” assesses immediate anxiety levels on a 4-point scale, making it particularly useful for tracking s-anxiety in response to acute stress. The survey recorded at each stage is detailed in Multimedia Appendix 2. Administering these surveys at each stage allowed us to correlate physiological data from EOG and EDA signals with subjective emotional responses, providing a comprehensive view of how participants’ mood and anxiety levels evolved across stress phases.

EOG Signal Segmentation

In analyzing EOG signals, we segmented the data to isolate individual blink peaks, which are essential for understanding blink dynamics in response to stress. From these peaks, we extracted 35 features, including blink duration, amplitude, frequency, and various acceleration and velocity metrics. A comprehensive list of these features and their definitions is provided in Multimedia Appendix 3.

EDA Signal Segmentation

The tonic and phasic components of skin conductance reveal different aspects of autonomic arousal, with the tonic level representing a stable baseline and the phasic response capturing transient, stimulus-driven changes. Tonic signals vary significantly across individuals due to factors like skin type and hydration, making them challenging to analyze consistently in relation to specific stress events. Phasic responses, however, reflect rapid fluctuations in skin conductance directly tied to acute stress or anxiety-inducing stimuli, characterized by quick rises and gradual declines.

Phasic signals were divided into rise and fall phases to capture the dynamics of the SCR, which is indicative of sympathetic nervous system activation. Specifically, peaks were detected by identifying rapid increases in skin conductance (rise phases) followed by gradual decreases (fall phases). To preprocess the EDA data and extract the phasic signal, motion artifacts were identified and removed, to make the data suitable for downstream features. A first-order low-pass Butterworth filter was applied to isolate low-frequency components indicative of meaningful physiological signals.

This signal was divided into windows of 1 second in length. Each section was analyzed to determine key features, such as mean value, signal range, and standard deviation. 15 features were extracted from these windows, and the full list of features and their definitions can be found in Multimedia Appendix 4. These features are critical for quantifying the intensity and duration of autonomic arousal events, providing valuable insights into stress response dynamics. The segmentation process allowed for the extraction of detailed temporal characteristics of each skin conductance event, facilitating a comprehensive analysis of physiological arousal under stress.

Ethical Considerations

This study was conducted in accordance with ethical guidelines for research involving human participants. A total of 16 participants were recruited, following established ethical guidelines as delineated in protocols approved by the institutional review board at the California Institute of Technology (Caltech; protocol IR22-1280 and IR21-1102). Participants were not compensated. Participants were screened based on specific exclusion criteria, including non-English speakers unable to understand survey requirements, inability to provide informed consent, medication use affecting psychiatric states, pregnancy, irregular eye conditions (eg, ocular dysmetria), and pre-existing psychiatric or physical illnesses (eg, depression, anxiety, hypertension, hyperlipidemia, or chronic cardiovascular disease). All participants’ data were fully anonymized, with identifying information removed and data transmission secured using byte-splicing encryption methods. The study adhered to data privacy and security protocols to ensure the confidentiality and protection of participants.

Blink Identification EOG (BLINKEO) Analysis

Building upon the nonintentional blink signal processing outlined by previous research [34,35], a feature bounding analysis aligned closely with the study’s approach of differentiating blink events based on slope and derivative features. By using blink duration alone as a feature, we achieved a classification accuracy of 87.46% and an F₁-score of 0.80 in distinguishing blinks from wire movements (see Multimedia Appendix 5). This suggests that feature extraction can yield strong performance metrics. Even without deep learning techniques, finding the right markers of blink peaks can reach the same efficacy of the study’s outlined slope-based signal differentiation.

In our approach, we systematically evaluate all possible combinations of 5 selected features to optimize classification performance for distinguishing blink events from wire movements. For each feature combination, we apply a breadth-first search (BFS) traversal to explore and fine-tune the upper and lower bounds of each feature, seeking the configuration that maximizes classification accuracy.

The BFS traversal begins with initializing the bounds for each feature to cover its entire observed range, ensuring that no data points are culled at the outset. Each feature range is discretized into 15 bins, allowing for incremental adjustments to the bounds with a step size (delta) calculated as the range divided by the number of bins. These initial bounds are stored as a “node” in the BFS queue, representing a unique culling configuration.

During each iteration of the BFS traversal, we dequeue a culling configuration and calculate its classification accuracy and F₁-score using a performance function. If the configuration achieves a higher accuracy than previously recorded, it becomes the current optimal configuration. The BFS traversal then generates neighboring configurations by slightly tightening the bounds for each feature—either increasing the lower bound or decreasing the upper bound by the computed delta. Each of these neighboring configurations, if unvisited, is added to the queue for further exploration.

This BFS traversal continues until all relevant bound configurations for the current feature combination are evaluated. The outcome is an empirically derived set of feature bounds that maximizes classification performance for each combination of features. By applying this process across all combinations of the selected 5 features, we ensure a comprehensive search of the parameter space, yielding an optimal culling pipeline tailored for precise blink detection. This method demonstrates the robustness of combining BFS with multifeature analysis to achieve a high-performing, data-driven classification model.

In our approach, we select combinations of 5 high-quality features and use a BFS traversal to optimize their combined bounds for maximal classification performance. For each combination, BFS systematically explores adjustments to the upper and lower bounds of each feature, identifying the optimal configuration that yields the highest accuracy and F₁-score.

The optimal feature combination achieved an accuracy of 98.17% and an F₁-score of 0.8734, using 5 key features that capture distinctive characteristics of blink dynamics. These features include velocity entropy, the entropy of the first derivative of the signal, which measures the variability and complexity of the blink motion; signal entropy, the entropy of the signal itself, providing a broader assessment of the overall blink pattern; slope at closing tent, maximum acceleration, the maximum acceleration during the closing phase of a blink, which isolates the rapid deceleration typical of blink closure; blink duration, representing the total time span of the blink event; and maximum acceleration velocity ratio, the ratio between the maximum acceleration and maximum velocity during the closing phase, which captures the relationship between these peak dynamics, indicative of voluntary eye closure. Figure 3 shows the results of each feature bounding step, against the BLINKEO labeled examples.

**Figure 3.** Optimal culling steps for differentiating blink events from wire movement artifacts in electrooculography (EOG) data. This figure presents the sequential culling steps optimized to achieve the highest accuracy and F₁-score in distinguishing blink events (green) from wire artifacts (blue) in EOG data. Each subplot demonstrates a unique culling step, applying specific feature thresholds to progressively refine the data. The final subplot, “Peaks preserved over culling pipeline,” illustrates the proportion of retained peaks at each stage for both blink and wire signals, showcasing the efficacy of each step in isolating genuine blink events.

These features together form a comprehensive representation of blink characteristics, enabling differentiation of blinks from other signal types in the culling pipeline. This highlights how strategically selected bounds on multiple features, when combined, result in high classification performance without relying on complex algorithms.

Emotion, EOG, and EDA Monitoring in Cold Pressor Conditions (EMOCOLD) Analysis

Emotion Analysis

The EMOCOLD dataset analysis highlights significant physiological and emotional responses to acute stress induced by the CPT. Figure 4 shows participants’ aggregated self-reported survey scores for positive affectivity, negative affectivity, and s-anxiety across the 3 trial stages: baseline, CPT, and recovery. Figure 4 shows that for each stage, survey responses were summarized and visualized using box plots, which display the distribution of scores. Positive affectivity and negative affectivity are scored on a scale of 5‐25, and s-anxiety is scored on a scale of 20‐80.

**Figure 4.** User-reported survey responses during each stage of the trial, displaying both box-and-whisker plots and column graphs for positive affectivity, negative affectivity, and state anxiety (s-anxiety) across the baseline, cold pressor test (CPT), and recovery stages. During the CPT, participants showed higher levels of positive affectivity, negative affectivity, and stage anxiety. Elevated levels recovered to baseline responses when participants took their hand out of the cold-water bath during the recovery phase. STAI: State-Trait Anxiety Inventory.

Participants reported increased positive and negative affectivity, as well as elevated s-anxiety during the CPT, which returned to baseline during recovery. This dual affective response suggests heightened arousal may include both alertness and discomfort. The recovery phase indicates effective autonomic regulation, as emotional states normalized once the stressor was removed. These findings validate the CPT as a method for inducing short-term anxiety.

SHAP Analysis

Overview

SHAP analysis is a method used to explain the output of machine learning models by breaking down the prediction into contributions from each feature. SHAP values are based on Shapley values from cooperative game theory, which attribute the impact of each feature on the model’s output by treating each feature as a “player” in a game and calculating its contribution to the final prediction.

In this study, SHAP analysis was performed on combinations of 5 features, selected from the total feature set of 15 EDA and 35 EOG features, highlighting the significance of how certain biomarkers, used together, reveal more prominent interactions and effects on model predictions. This approach underscores that certain biomarkers, while potentially less impactful individually, can demonstrate substantial importance when analyzed as part of a group. By evaluating these interactions, we understand how combinations of features can provide insights into the model’s behavior that single-feature analyses might overlook.

The quality of a set of features is determined by considering their collective contribution to the model’s predictions, measured through the mean absolute SHAP values across the dataset. A high-quality set of features is one where the combination of features demonstrates substantial importance, as indicated by a higher mean absolute SHAP values. This benchmark reflects not only the magnitude of individual contributions but also the degree to which the features, as a group, interact to enhance the predictive power of the model.

The SHAP value maps provide insights into how various EOG features used in combination, and EDA features used in combination, contribute to predictions for positive affectivity negative affectivity, and s-anxiety. Each SHAP sub-plot illustrates the impact of individual features on model outputs, with higher SHAP values (toward the right) signifying a positive contribution to the prediction, and lower SHAP values (toward the left) indicating a negative contribution. Figure 5A highlights the SHAP analysis identifying the combination of features that best polarize model predictions across the affective states.

**Figure 5.** 5A. Shapley additive explanations (SHAP) analyses for optimal combinations of 5 electrooculography (EOG) features (top row) and 5 electrodermal activity (EDA) features (bottom row) for positive affectivity (left column), negative affectivity (middle column), and state anxiety (right column). 5B. SHAP analysis of feature combinations. This analysis explores the quality of distinguishing different affectivity levels using different sets of features. This is an example of 5 EOG features and their impact on the negative affectivity score. Substituting one key feature with another can reveal new interdependencies among remaining features, thereby enhancing the model’s interpretability.

EOG Feature Analysis

Among the EOG features analyzed, the opening phase energy, the integral of the opening phase of the peak signal, and opening signal range, the amplitude of the opening phase of the peak signal, consistently appeared in optimal feature combinations across all three outputs, suggesting their robustness as predictors. In addition, the signal height feature exhibited a particularly strong influence on predictions for negative affectivity and s-anxiety, underscoring its significance in these contexts.

EDA Feature Analysis

Among the EDA features analyzed, Hjorth parameters and the signal SD emerged as important predictors across the different affective states. These findings highlight the importance of analyzing feature interactions to reveal critical combinations that drive model performance, offering deeper insights into the physiological signals underpinning emotional and stress-related states.

The SHAP analyses in Figure 5B illustrate the importance of considering features in combination when identifying the most relevant biomarkers. By selecting sets of 5 features, we aim to identify a group of biomarkers that not only are individually relevant but also work effectively together. In Figure 5B, the inclusion of the feature opening phase energy contributes significantly to the model’s performance, yielding a well-defined distinction in SHAP values. When opening phase energy is removed from the features considered, model performance decreases, and features such as blink full-close duration appear to show more distinction.

Principal Findings

The main findings of this study show the potential of EOG and EDA as powerful tools for identifying nuanced physiological biomarkers associated with s-anxiety. Through the development and analysis of the BLINKEO and EMOCOLD datasets, we have introduced novel datasets and used advanced feature extraction techniques with interpretability methods such as SHAP analysis to uncover anxiety-specific markers. Our results emphasize the importance of understanding biomarkers in their context-dependent interactions and collective contributions to predictive models.

By systematically evaluating combinations of features, we mitigated challenges often faced in the literature, where biomarkers show inconsistent or nonsignificant correlations with anxiety due to situational variability. For instance, while blink rate and skin conductance metrics have been previously explored, our analysis reveals that their predictive use depends heavily on contextual factors, such as the type and intensity of the stressor. For example, biomarkers like blink duration and skin conductance peaks performed well under controlled CPT conditions but may not generalize to other stress-inducing scenarios like public speaking. This underscores the need for adaptive, context-sensitive models that account for the situational variability of physiological responses.

A key contribution of this work is the identification of feature combinations that consistently provide reliable predictions. For EOG data, features like blink duration, peak height, and the opening integral were shown to be robust predictors across various emotional states. Similarly, for EDA data, features such as the mean signal, permutation entropy, and Hjorth activity emerged as significant contributors. By leveraging SHAP analysis, we identified not only which features are most relevant but also when and how they interact to enhance model performance. This approach offers a more comprehensive understanding of physiological responses compared to studies focusing solely on single-feature analyses.

Our findings bridge a critical gap in the literature by offering a systematic approach to addressing the variability and context-dependence of physiological biomarkers. This research advances the field by providing a framework for building more robust, interpretable, and context-sensitive models for anxiety assessment. The ability to dynamically adapt to different stress scenarios makes these biomarkers more applicable to real-world settings, paving the way for more personalized and effective mental health interventions.

Limitations

This study advances s-anxiety biomarker detection using EOG and EDA, but several limitations should be noted. The participant pool (N=16) was demographically skewed, with a predominance of male and Asian participants, limiting generalizability. Data were collected only once per subject, preventing analysis of intraindividual variability over time. Future studies should incorporate larger and more diverse populations with longitudinal data.

The CPT was conducted in a controlled lab environment, which may not fully reflect real-world anxiety triggers. In addition, motion artifacts in EOG recordings, despite filtering efforts, could impact signal clarity. EDA signals were recorded using a single forehead electrode, though different placements (eg, fingertips) may improve accuracy. Improved artifact detection and additional motion-tracking sensors could enhance data quality.

Feature selection for SHAP analysis focused on optimizing interpretability, but alternative selections may yield different insights. Models and analyses constructed using this dataset may not generalize well to other stress-inducing scenarios. External validation using independent datasets is necessary to confirm these findings.

Future Work

Future work should focus on validating these findings across diverse populations and stress-inducing contexts to further enhance the generalizability of these biomarkers. An important next step is to investigate potential gender-based and race-based differences in physiological responses to acute stress and our current methods of inducing stress, as this study was not explicitly designed for such analysis but acknowledges its relevance. In addition, integrating these models into wearable technology has the potential to revolutionize mental health monitoring, providing real time, personalized insights that could transform how we understand and manage anxiety. By addressing the challenges of situational variability and leveraging the strengths of combined biomarker analyses, this study contributes significantly to the growing field of wearable health technology and its applications in mental health.

Acknowledgments

This work was funded by the Translational Research Institute for Space Health through NASA NNX16AO69A, Office of Naval Research grants N00014-21-1-2483 and N00014-21-1-2845, Army Research Office grant W911NF-23-1-0041, National Institutes of Health grants R01HL155815 and R21DK13266, National Science Foundation grant 2145802, National Academy of Medicine Catalyst Award, and High Impact Pilot Research Award T31IP1666 from the Tobacco-Related Disease Research Program.

Generative artificial intelligence (AI; ChatGPT, version GPT-4o, OpenAI, 2025) was only used to assist with grammar correction and formatting of the authors' original text for this manuscript. These tools were used to improve clarity while preserving the authors' original ideas and content. All AI-assisted outputs were thoroughly reviewed and edited by the authors for accuracy and consistency before submission.

Data Availability

The datasets generated or analyzed during this study are available in the "stress-biomarkers-public-dataset" repository. The Blink Identification Electrooculography Dataset (BLINKEO) and Emotion, Electrooculography, and Electrodermal Activity Monitoring in Cold Pressor Conditions Dataset (EMOCOLD) can be accessed on GitHub [36,37].

Authors' Contributions

SAS and JD conceived the project. JD and SAS led the main study, collected the overall data, and contributed to the data analysis. RL and SS contributed to the platform development, characterization, human studies, and data processing. JD and SAS cowrote the paper. All authors provided feedback on the paper.

Conflicts of Interest

None declared.

Multimedia Appendix 1

The bounds of each peak were determined by performing 2 binary searches within the position domain of the signal—one to the left of the peak and one to the right.

DOCX File, 123 KB

Multimedia Appendix 2

The survey items from the Positive and Negative Affect Schedule (PANAS) and the State-Trait Anxiety Inventory (STAI-State) were used to assess participants’ emotional and anxiety responses during the experiment. The PANAS scale consists of 10 items measuring positive affectivity and negative affectivity, each rated on a 1-5 Likert scale, where higher scores indicate stronger affective states. The STAI-State consists of 20 items assessing state anxiety, measured on a 1-4 Likert scale, where responses indicate varying degrees of agreement with statements reflecting anxiety levels. Higher scores in negative affectivity and anxiety-related items indicate greater distress, while higher scores in positive affectivity items indicate greater emotional well-being. The table below details each item, its corresponding scale, and the affectivity or anxiety dimension it evaluates.

DOCX File, 16 KB

Multimedia Appendix 3

Feature names and definitions extracted from windowed segments of electrodermal activity (EDA) signals.

DOCX File, 15 KB

Multimedia Appendix 4

Feature names and definitions extracted from windowed segments of electrooculography (EOG) signals.

DOCX File, 17 KB

Multimedia Appendix 5

A breadth-first search was performed to find the optimal range for distinguishing blinks from noise artifacts using the blink duration feature, which was extracted from electrooculography (EOG) signal peaks using our method. The identified bounds of 0.1227 to 0.3990 seconds align with values reported in the literature.

DOCX File, 42 KB

Steptoe A, Kivimäki M. Stress and cardiovascular disease. Nat Rev Cardiol. Apr 3, 2012;9(6):360-370. [CrossRef] [Medline]
Key substance use and mental health indicators in the United States: results from the 2022 National Survey on Drug Use and Health, center for behavioral health statistics and quality, HHS publication no PEP23-07-01-006, NSDUH series h-58. Substance Abuse and Mental Health Services Administration. 2023. URL: https://www.samhsa.gov/data/sites/default/files/reports/rpt42731/2022-nsduh-nnr.pdf [Accessed 2024-11-03]
APA 2023 stress in America topline data. American Psychological Association; 2023. URL: https://www.apa.org/news/press/releases/stress/2023/november-2023-topline-data.pdf [Accessed 2024-11-12]
Kazanskiy NL, Khonina SN, Butt MA. A review on flexible wearables – recent developments in non-invasive continuous health monitoring. Sensors and Actuators A: Physical. Feb 2024;366:114993. [CrossRef]
Xu C, Song Y, Sempionatto JR, et al. A physicochemical-sensing electronic skin for stress response monitoring. Nat Electron. Feb 2024;7(2):168-179. [CrossRef] [Medline]
Freeman R, Chapleau MW. Chapter 7 - testing the autonomic nervous system. In: Said G, Krarup C, editors. Handbook of Clinical Neurology, Peripheral Nerve Disorders. Vol 115. Elsevier; 2013:115-136. [CrossRef]
Bullock T, MacLean MH, Santander T, et al. Habituation of the stress response multiplex to repeated cold pressor exposure. Front Physiol. 2022;13:752900. [CrossRef] [Medline]
Mitchell LA, MacDonald RAR, Brodie EE. Temperature and the cold pressor test. J Pain. May 2004;5(4):233-237. [CrossRef] [Medline]
Allen AP, Kennedy PJ, Dockray S, Cryan JF, Dinan TG, Clarke G. The trier social stress test: principles and practice. Neurobiol Stress. Feb 2017;6:113-126. [CrossRef] [Medline]
Bentivoglio AR, Bressman SB, Cassetta E, Carretta D, Tonali P, Albanese A. Analysis of blink rate patterns in normal subjects. Mov Disord. Nov 1997;12(6):1028-1034. [CrossRef] [Medline]
Boucsein W. Principles of electrodermal phenomena. In: Electrodermal Activity. Springer US; 2012:1-86. [CrossRef]
Schachter S, Singer JE. Cognitive, social, and physiological determinants of emotional state. Psychol Rev. Sep 1962;69(5):379-399. [CrossRef] [Medline]
Sarchiapone M, Gramaglia C, Iosue M, et al. The association between electrodermal activity (EDA), depression and suicidal behaviour: a systematic review and narrative synthesis. BMC Psychiatry. Jan 25, 2018;18(1):22. [CrossRef] [Medline]
Dawson ME, Schell AM, Filion DL. The electrodermal system. In: Handbook of Psychophysiology. 3rd ed. Cambridge University Press; 2007:159-181. [CrossRef]
Carli V, Hadlaczky G, Petros NG, et al. A naturalistic, European multi-center clinical study of electrodermal reactivity and suicide risk among patients with depression. Front Psychiatry. 2021;12:765128. [CrossRef] [Medline]
Castro Ribeiro T, García Pagès E, Ballester L, et al. Design of a remote multiparametric tool to assess mental well-being and distress in young people (mhealth methods in mental health research project): protocol for an observational study. JMIR Res Protoc. Mar 29, 2024;13:e51298. [CrossRef] [Medline]
Anmella G, Corponi F, Li BM, et al. Exploring digital biomarkers of illness activity in mood episodes: hypotheses generating and model development study. JMIR Mhealth Uhealth. May 4, 2023;11:e45405. [CrossRef] [Medline]
Korda AI, Giannakakis G, Ventouras E, et al. Recognition of blinks activity patterns during stress conditions using CNN and Markovian analysis. Signals. 2021;2(1):55-71. [CrossRef]
Thorslund B. Electrooculogram analysis and development of a system for defining stages of drowsiness [Master’s thesis]. Dept. Biomedical Engineering, Linköping University, LiU-IMT-EX-351; 2003. URL: https://www.diva-portal.org/smash/get/diva2:673960/FULLTEXT01.pdf [Accessed 2025-11-12]
Mannan MMN, Kamran MA, Kang S, Jeong MY. Effect of EOG signal filtering on the removal of ocular artifacts and EEG‐based brain‐computer interface: a comprehensive study. Complexity. Jan 2018;2018(1). [CrossRef]
Zhu X, Zheng WL, Lu BL, Chen X, Chen S, Wang C. EOG-based drowsiness detection using convolutional neural networks. Presented at: 2014 International Joint Conference on Neural Networks (IJCNN); Jul 6-11, 2014:128-134; Beijing, China. [CrossRef]
Ma JX, Shi LC, Lu BL. An EOG-based vigilance estimation method applied for driver fatigue detection. Neurosci Biomed Eng. Jun 1, 2014;2:41-51. [CrossRef]
Harker M. Psychological sweating: a systematic review focused on aetiology and cutaneous response. Skin Pharmacol Physiol. 2013;26(2):92-100. [CrossRef] [Medline]
Sebastião R. Classification of anxiety based on EDA and HR. In: Goleva R, Garcia NDC, Pires IM, editors. IoT Technologies for HealthCare. Springer International Publishing; 2021:112-123. [CrossRef]
Strohmaier AR, Schiepe-Tiska A, Reiss KM. A comparison of self-reports and electrodermal activity as indicators of mathematics state anxiety. Frontline Learn Res. Feb 2020;8(1):16-32. [CrossRef]
Horvers A, Tombeng N, Bosse T, Lazonder AW, Molenaar I. Detecting emotions through electrodermal activity in learning contexts: a systematic review. Sensors (Basel). Nov 26, 2021;21(23):7869. [CrossRef] [Medline]
Gazi AH, Lis P, Mohseni A, et al. Respiratory markers significantly enhance anxiety detection using multimodal physiological sensing. Presented at: 2021 IEEE EMBS International Conference on Biomedical and Health Informatics (BHI); Jul 27-30, 2021:1-4; Athens, Greece. [CrossRef]
Shatte ABR, Hutchinson DM, Teague SJ. Machine learning in mental health: a scoping review of methods and applications. Psychol Med. Jul 2019;49(9):1426-1448. [CrossRef] [Medline]
Virtanen P, Gommers R, Oliphant TE, et al. SciPy 1.0: fundamental algorithms for scientific computing in Python. Nat Methods. Mar 2020;17(3):261-272. [CrossRef] [Medline]
Kleifges K, Bigdely-Shamlo N, Kerick SE, Robbins KA. BLINKER: automated extraction of ocular indices from EEG enabling large-scale analysis. Front Neurosci. 2017;11:12. [CrossRef] [Medline]
Bartoshuk LM, Schiffman HR. Sensation and perception: an integrated approach. Am J Psychol. Dec 1977;90(4):718. [CrossRef]
Reda R, Tantawi M, Shedeed H, Tolba MF. Analyzing electrooculography (EOG) for eye movement detection. In: Hassanien E, Azar AT, Gaber T, Bhatnagar R, Tolba MF, editors. The International Conference on Advanced Machine Learning Technologies and Applications (AMLTA2019). Springer International Publishing; 2020:179-189. [CrossRef]
López A, Ferrero F. Biomedical signal processing and artificial intelligence in EOG signals. In: Qaisar SM, Nisar H, Subasi A, editors. Advances in Non-Invasive Biomedical Signal Sensing and Processing with Machine Learning. 2023:185-206. [CrossRef]
Zheng WL, Lu BL. A multimodal approach to estimating vigilance using EEG and forehead EOG. J Neural Eng. Apr 2017;14(2):026017. [CrossRef] [Medline]
Hassanein A, Mohamed A, Abdullah M. Classifying blinking and winking EOG signals using statistical analysis and LSTM algorithm. J Electr Syst Inf Technol. 2023;10(1):44. [CrossRef]
Stress-biomarkers-public-dataset/blinkeo. GitHub. URL: https://github.com/jadee-dao/stress-biomarkers-public-dataset/tree/main/blinkeo [Accessed 2025-06-04]
Stress-biomarkers-public-dataset/emocold. GitHub. URL: https://github.com/jadee-dao/stress-biomarkers-public-dataset/tree/main/emocold [Accessed 2025-06-04]

‎

BFS: breadth-first search

BLINKEO: Blink Identification Electrooculography Dataset

CPT: cold pressor test

EDA: electrodermal activity

EMOCOLD: Emotion, Electrooculography, and Electrodermal Activity Monitoring in Cold Pressor Conditions Dataset

EOG: electrooculography

PANAS: Positive and Negative Affect Schedule

s-anxiety: state anxiety

SCR: skin conductance response

SHAP: Shapley additive explanations

STAI-State: State-Trait Anxiety Inventory

Edited by Amy Schwartz; submitted 30.11.24; peer-reviewed by Daniela Saderi, Shailee Rasania, Toba Olatoye, Simon Muhindi Savai, Randa Salah Gomaa Mahmoud, Vasco Medeiros, Mitchell Collier; final revised version received 10.05.25; accepted 12.05.25; published 10.07.25.

This is an open-access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work, first published in JMIRx Med, is properly cited. The complete bibliographic information, a link to the original publication on https://med.jmirx.org/, as well as this copyright and license information must be included.

This paper is in the following e-collection/theme issue:

Using Electrooculography and Electrodermal Activity During a Cold Pressor Test to Identify Physiological Biomarkers of State Anxiety: Feature-Based Algorithm Development and Validation Study