BioHCI – Physiological Signal Processing

BioHCI is a comprehensive Python framework that streamlines the pre-processing and exploration of physiological time-series data collected across multiple user studies. It supports experimenting with different machine learning and neural network-based approaches, including model cross-validation and evaluation, using PyTorch and scikit-learn.
This framework assists researchers in the data loading, pre-processing, splitting, and balancing phases, and makes it quick to explore deep learning models, with built-in logging and result visualization. BioHCI allows quick parameter adjustments, supports comparing results from multiple learning algorithms, and provides a consistent way of representing data.
Because the same dataset can be easily re-labeled, running different experiments on one set of sensor measurements becomes much faster. BioHCI also includes support for handling unbalanced datasets. This framework is a first step toward a more generalizable solution to physiological signal analysis and interpretation.
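One common way to compensate for an unbalanced dataset is to weight each class inversely to its frequency, so that rare classes contribute as much to the training loss as frequent ones. The sketch below illustrates that general idea in plain Python. It is not taken from BioHCI: the function name `balanced_class_weights` and the "rest" and "task" labels are hypothetical, and BioHCI's own balancing strategy may differ.

```python
from collections import Counter


def balanced_class_weights(labels):
    """Return {label: weight}, where weight = n_samples / (n_classes * count).

    Hypothetical helper for illustration; not part of BioHCI's API.
    """
    counts = Counter(labels)
    n_samples = len(labels)
    n_classes = len(counts)
    return {
        label: n_samples / (n_classes * count)
        for label, count in counts.items()
    }


# Toy label set (assumed): 8 "rest" segments and 2 "task" segments.
labels = ["rest"] * 8 + ["task"] * 2
weights = balanced_class_weights(labels)
print(weights)  # the rarer "task" class receives the larger weight
```

Weights of this kind can typically be passed to a weighted loss function or used as sampling probabilities when drawing training batches.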
Challenges of Classifying Physiological User Data
- Data is prone to noise: Physiological signals are susceptible to noise, especially when the data is collected in real-world settings. Variability among subjects adds to the problem, making the representation we aim to extract harder to distinguish. These factors can lead to misleading results, such as a model that classifies a confounding variable rather than the signal itself. Physiological data is not straightforward to process, and often requires specialized data processing, analysis, and storage tools. BioHCI allows for multiple computational components in the data pre-processing phase (such as signal pre-processing and feature construction) that might be necessary before the data is used as input to a machine learning algorithm.
- Signals cannot be directly used with existing machine learning frameworks: For sensors to work intuitively in the real world, signal interpretation should be user-independent and should not rely on individual calibration. Data from the same user therefore typically needs to stay within a single set (training, validation, or evaluation), a constraint that a plain random split of samples does not respect.
- Small datasets: Unlike image or text data, which are widely available online, user physiological data is typically difficult and time-consuming to obtain, often requiring specialized lab equipment and settings. This data can be collected to investigate a subject's response to different phenomena, or to evaluate novel devices designed to enhance user experience. These constraints usually result in relatively small datasets, which can pose a problem for several machine learning algorithms, especially neural networks. While BioHCI does not directly address this problem, it facilitates exploring classification results with different learning algorithms, some of which may be more appropriate for the size of the dataset.
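
The requirement that all data from one user stay in a single set can be illustrated with a subject-wise split. The following is a minimal sketch in plain Python, not BioHCI's actual implementation: the function `split_by_subject`, its parameters, and the subject identifiers are assumptions made for illustration. scikit-learn's `GroupKFold` offers a comparable group-aware split for cross-validation.

```python
import random


def split_by_subject(subject_ids, train_frac=0.6, val_frac=0.2, seed=0):
    """Assign sample indices to train/val/test so no subject spans two sets.

    Hypothetical helper for illustration; not part of BioHCI's API.
    subject_ids[i] is the subject who produced sample i.
    """
    subjects = sorted(set(subject_ids))
    random.Random(seed).shuffle(subjects)

    n_train = round(train_frac * len(subjects))
    n_val = round(val_frac * len(subjects))
    train_subjects = set(subjects[:n_train])
    val_subjects = set(subjects[n_train:n_train + n_val])

    split = {"train": [], "val": [], "test": []}
    for index, subject in enumerate(subject_ids):
        if subject in train_subjects:
            split["train"].append(index)
        elif subject in val_subjects:
            split["val"].append(index)
        else:
            split["test"].append(index)
    return split


# Toy example (assumed): 5 subjects with 3 samples each.
subject_ids = [s for s in ["s1", "s2", "s3", "s4", "s5"] for _ in range(3)]
split = split_by_subject(subject_ids)
for name, indices in split.items():
    print(name, sorted({subject_ids[i] for i in indices}))
```

Splitting on subjects rather than on individual samples means that evaluation measures how well a model generalizes to users it has never seen, which is what user-independent interpretation requires.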

Building Interactive Systems
The BioHCI system can be used to train and evaluate machine learning models that interpret signals to recognize different states. Once models reach high accuracy, they can be deployed to lightweight hardware for use in real-time systems. Two example applications are adaptive systems based on brain sensors and gesture recognition on knitted sensors.
Using brain sensors as implicit inputs would enable a system to continuously collect brain data and make decisions while accounting for users’ cognitive states. The first figure shows the building blocks of an adaptive system incorporating such sensors, in addition to a trained machine learning model capable of recognizing different cognitive states. BioHCI can be used to train the model.
The second figure illustrates the processes and component interactions involved in creating the model and operating an interactive gesture recognition system that uses knitted sensors as an explicit input. Data collection, training, and evaluation happen offline and typically require more time and computing power. Once a model is trained to high accuracy using BioHCI, it can be deployed on lightweight hardware to recognize gestures in real time, supporting different interactive applications.
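The online half of such a system can be pictured as a loop that buffers incoming sensor samples and periodically hands the latest window to the trained model. The sketch below is an assumption about how this might look, not a description of BioHCI or of the deployed system: `StreamingClassifier`, the window and step sizes, and `toy_predict` (a stand-in for a trained model) are all hypothetical.

```python
from collections import deque


class StreamingClassifier:
    """Buffer a sample stream and classify the latest window every `step` samples.

    Hypothetical illustration; `predict_fn` stands in for a trained model.
    """

    def __init__(self, predict_fn, window_size, step):
        self.predict_fn = predict_fn
        self.window = deque(maxlen=window_size)  # oldest samples drop off automatically
        self.step = step
        self._since_last = 0

    def push(self, sample):
        """Add one sample; return a prediction when one is due, else None."""
        self.window.append(sample)
        self._since_last += 1
        window_full = len(self.window) == self.window.maxlen
        if window_full and self._since_last >= self.step:
            self._since_last = 0
            return self.predict_fn(list(self.window))
        return None


def toy_predict(window):
    # Placeholder rule: a high mean reading counts as a "tap" gesture.
    return "tap" if sum(window) / len(window) > 0.5 else "idle"


# Simulated sensor stream (assumed): four low readings, then four high ones.
stream = [0.0] * 4 + [1.0] * 4
classifier = StreamingClassifier(toy_predict, window_size=4, step=2)
predictions = [p for p in (classifier.push(s) for s in stream) if p is not None]
print(predictions)
```

Keeping only a fixed-size window in memory and predicting every few samples keeps the per-sample cost small, which suits the lightweight hardware mentioned above.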