International Journal of Innovative Research in Computer and Communication Engineering
ISSN Approved Journal | Impact factor: 8.771 | ESTD: 2013 | Follows UGC CARE Journal Norms and Guidelines
| Monthly, Peer-Reviewed, Refereed, Scholarly, Multidisciplinary and Open Access Journal |
| TITLE | A Comprehensive Survey of Webcam-Based Real-Time Human State Monitoring: Methods, Datasets, and Intelligent Feedback Systems |
|---|---|
| ABSTRACT | The rapid growth of remote learning, hybrid workplaces, and screen-centric lifestyles has increased the need for non-intrusive monitoring of user well-being during prolonged device usage. This paper presents a comprehensive survey of webcam-based human state monitoring systems that infer affective and physical cues in real time. We focus on four complementary dimensions: (i) Facial Emotion Recognition (FER), (ii) posture monitoring, (iii) fatigue detection, and (iv) attentiveness/proximity analysis. Recent progress in deep learning and computer vision has enabled these capabilities using only a standard RGB webcam, leveraging CNN/ResNet architectures for FER, pose estimation frameworks for skeletal keypoints, and geometric landmark ratios such as Eye Aspect Ratio (EAR) and Mouth Aspect Ratio (MAR) for fatigue cues [8, 3, 1]. Unlike single-purpose solutions (e.g., only emotion or only drowsiness detection), an integrated monitoring pipeline can provide context-aware feedback that is more actionable for users in e-learning, work-from-home, and attention-critical settings. We consolidate core methodologies, datasets (FER2013, FER+, landmark and posture resources), evaluation metrics (accuracy, precision, recall, F1-score, latency/FPS), and deployment considerations (CPU feasibility, illumination robustness, privacy). We then outline a unified system architecture that fuses module outputs through a decision layer: S_t = f(ê_t, p̂_t, EAR_t, MAR_t, d̂_t), where ê_t is the emotion estimate, p̂_t the posture estimate, EAR_t/MAR_t the fatigue indicators, and d̂_t the proximity/attention state. Finally, we discuss challenges and research directions in fairness, ethics, on-device deployment, and multimodal fusion for HCI-oriented intelligent feedback. |
| AUTHOR | ATHARVA S. VYAS (M.Tech Student) and PROF. DR. DEIPALI V. GORE, Department of Computer Engineering, P. E. S.'s Modern College of Engineering, Pune, India |
| VOLUME | 183 |
| DOI | 10.15680/IJIRCCE.2026.1404008 |
| KEYWORDS | |
| References | [1] I. Goodfellow, D. Erhan, P.-L. Carrier, A. Courville, and Y. Bengio, "Challenges in Representation Learning: A Report on Three Machine Learning Contests," NeurIPS Workshop, 2013. (FER2013) [2] E. Barsoum, C. Zhang, C. C. Ferrer, and Z. Zhang, "Training Deep Networks for Facial Expression Recognition with Crowd-Sourced Label Distribution," ACM ICMI, 2016. (FER+) [3] K. He, X. Zhang, S. Ren, and J. Sun, "Deep Residual Learning for Image Recognition," IEEE CVPR, 2016. [4] A. G. Howard et al., "MobileNets: Efficient Convolutional Neural Networks for Mobile Vision Applications," arXiv:1704.04861, 2017. [5] Z. Cao, T. Simon, S.-E. Wei, and Y. Sheikh, "Realtime Multi-Person 2D Pose Estimation Using Part Affinity Fields," IEEE CVPR, 2017. (OpenPose) [6] C. Lugaresi et al., "MediaPipe: A Framework for Building Perception Pipelines," arXiv:1906.08172, 2019. [7] D. E. King, "Dlib-ml: A Machine Learning Toolkit," Journal of Machine Learning Research, 10, pp. 1755–1758, 2009. [8] T. Soukupová and J. Čech, "Real-Time Eye Blink Detection Using Facial Landmarks," Computer Vision Winter Workshop (CVWW), 2016. (EAR) |
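The Eye Aspect Ratio fatigue cue referenced in the abstract [8] can be sketched as follows. This is a minimal illustration using only NumPy; the six-landmark ordering (p1..p6, with p1/p4 as the horizontal eye corners) follows Soukupová and Čech, while the landmark coordinates and the 0.2 drowsiness threshold here are illustrative assumptions that would need tuning per camera and per user:

```python
import numpy as np

def eye_aspect_ratio(eye: np.ndarray) -> float:
    """EAR from six (x, y) eye landmarks ordered p1..p6, where p1 and p4
    are the horizontal eye corners (Soukupova & Cech, 2016)."""
    # vertical distances between the two pairs of eyelid landmarks
    v1 = np.linalg.norm(eye[1] - eye[5])  # |p2 - p6|
    v2 = np.linalg.norm(eye[2] - eye[4])  # |p3 - p5|
    # horizontal distance between the eye corners
    h = np.linalg.norm(eye[0] - eye[3])   # |p1 - p4|
    return (v1 + v2) / (2.0 * h)

# Synthetic landmarks: an "open" eye vs. a nearly "closed" eye.
open_eye = np.array([[0, 0], [1, 1], [2, 1], [3, 0], [2, -1], [1, -1]], dtype=float)
closed_eye = np.array([[0, 0], [1, 0.1], [2, 0.1], [3, 0], [2, -0.1], [1, -0.1]], dtype=float)

print(eye_aspect_ratio(open_eye))    # high EAR: eye open
print(eye_aspect_ratio(closed_eye))  # low EAR: eye (nearly) closed

# Illustrative rule: flag drowsiness when EAR stays below a threshold
# for several consecutive frames (threshold/frame count are assumptions).
EAR_THRESHOLD = 0.2
```

In a full pipeline the landmarks would come from a detector such as Dlib's 68-point model or MediaPipe Face Mesh, and the per-frame EAR (together with MAR) would feed the decision layer described in the abstract.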