International Journal of Innovative Research in Computer and Communication Engineering
ISSN Approved | ESTD 2013 | Monthly, Peer-Reviewed, Refereed, Multidisciplinary, Open Access Journal | Impact Factor 8.771 | Follows UGC CARE Journal Norms and Guidelines
| TITLE | AI-Based Mental Health Monitoring System Using Multimodal Input |
|---|---|
| ABSTRACT | Mental health conditions such as stress, anxiety, and emotional imbalance are common among students and working professionals, yet they are rarely monitored continuously. Emotional assessment typically relies on self-reporting or occasional consultation with experts, which may miss day-to-day emotional variation. In this work, a multimodal mental health monitoring system is developed to analyze emotional state from facial expressions, written text, and spoken audio. Facial emotions are identified with a ResNet18-based convolutional neural network, while sentiment in text and speech is analyzed with a transformer-based RoBERTa model after speech-to-text conversion. The outputs of the three modalities are integrated through a weighted fusion scheme to produce a final emotional interpretation. The system is implemented as a web-based application in Streamlit and supports real-time webcam input, text entry, and audio recording. Testing under varied input conditions showed that combining multiple modalities yielded more reliable emotional estimates than any single input source. The system is intended for emotional awareness and self-monitoring, not for medical diagnosis. |
| AUTHOR | CHAITHRA B M, Assistant Professor, Dept. of CSE, Jain Institute of Technology, Davangere, Karnataka, India; LAVANYA P C, KEERTHANA J L, LAXMI NAVI, ANANYA H M, UG Students, Dept. of CSE, Jain Institute of Technology, Davangere, Karnataka, India |
| VOLUME | Volume 13, Issue 12, 2025 |
| DOI | 10.15680/IJIRCCE.2025.1312104 |
| PDF | pdf/104_AI-Based Mental Health Monitoring System Using Multimodal Input.pdf |
| KEYWORDS | |
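The abstract outlines three modality-specific models plus a fusion step; the hedged sketches below reconstruct each component from that description alone. First, the facial branch: the paper names a ResNet18-based CNN for facial emotion recognition. A minimal sketch, assuming a seven-class emotion label set and torchvision's ImageNet-pretrained weights (neither detail is given in the abstract):

```python
# Hypothetical facial-emotion branch: torchvision ResNet18 with its
# ImageNet head swapped for an emotion classifier. The label set and
# pretrained initialization are assumptions, not the paper's choices.
import torch
import torch.nn as nn
from torchvision import models, transforms

EMOTIONS = ["angry", "disgust", "fear", "happy", "neutral", "sad", "surprise"]  # assumed labels

def build_face_model() -> nn.Module:
    model = models.resnet18(weights=models.ResNet18_Weights.IMAGENET1K_V1)
    model.fc = nn.Linear(model.fc.in_features, len(EMOTIONS))  # new classification head
    return model

# Standard ImageNet-style preprocessing for a single face crop.
preprocess = transforms.Compose([
    transforms.Resize((224, 224)),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),
])

def face_probabilities(model: nn.Module, face_image) -> torch.Tensor:
    """Return a probability vector over EMOTIONS for one PIL face image."""
    model.eval()
    with torch.no_grad():
        logits = model(preprocess(face_image).unsqueeze(0))
    return torch.softmax(logits, dim=1).squeeze(0)
```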
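Second, the text and speech branch: typed input goes straight to a RoBERTa sentiment model, while recorded audio is transcribed first. The abstract names neither the speech-to-text engine nor the RoBERTa checkpoint; the sketch below stands in the open-source openai-whisper package and a public cardiffnlp sentiment checkpoint as plausible choices.

```python
# Hypothetical text/speech branch: speech-to-text followed by RoBERTa
# sentiment. Both model choices are stand-ins; the abstract does not
# specify either.
import whisper  # pip install openai-whisper
from transformers import pipeline

stt_model = whisper.load_model("base")  # assumed model size
sentiment = pipeline(
    "sentiment-analysis",
    model="cardiffnlp/twitter-roberta-base-sentiment-latest",  # assumed checkpoint
)

def text_sentiment(text: str) -> dict:
    """Sentiment label and score for typed input."""
    return sentiment(text)[0]  # e.g. {'label': 'negative', 'score': 0.97}

def audio_sentiment(wav_path: str) -> dict:
    """Transcribe recorded audio, then score the transcript."""
    transcript = stt_model.transcribe(wav_path)["text"]
    return sentiment(transcript)[0]
```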
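Third, the fusion step: the abstract describes a weighted combination of the three per-modality outputs. A minimal late-fusion sketch, assuming each branch has first been mapped onto a shared label space; the weights are illustrative placeholders, not the paper's tuned values.

```python
# Hypothetical weighted late fusion: a convex combination of three
# per-modality probability vectors over a shared emotion set.
import numpy as np

def fuse_modalities(face_p: np.ndarray,
                    text_p: np.ndarray,
                    audio_p: np.ndarray,
                    weights: tuple = (0.4, 0.3, 0.3)) -> np.ndarray:
    """Return the fused probability vector; weights should sum to 1."""
    stacked = np.stack([face_p, text_p, audio_p])
    fused = np.average(stacked, axis=0, weights=weights)
    return fused / fused.sum()  # guard against rounding drift

# Usage, assuming a shared 3-way space (negative / neutral / positive):
face_p = np.array([0.2, 0.5, 0.3])
text_p = np.array([0.1, 0.3, 0.6])
audio_p = np.array([0.3, 0.4, 0.3])
print(fuse_modalities(face_p, text_p, audio_p))  # fused distribution
```

One reason late fusion suits this design: each branch stays independently trainable, and a missing modality can be handled by renormalizing the remaining weights.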
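Finally, the interface: the abstract says the system runs as a Streamlit web app with real-time webcam input, text entry, and audio recording. A minimal sketch of such a front end; st.audio_input requires a recent Streamlit release, and the commented-out calls refer to the hypothetical helpers from the sketches above.

```python
# Hypothetical Streamlit front end wiring the three input modes to the
# branch functions sketched above.
import streamlit as st

st.title("AI-Based Mental Health Monitoring (demo sketch)")
st.caption("For emotional awareness and self-monitoring only; not medical diagnosis.")

photo = st.camera_input("Capture a webcam frame")
text = st.text_area("How are you feeling today?")
audio = st.audio_input("Record a short voice note")  # needs Streamlit >= 1.39

if st.button("Analyze"):
    if photo is None or not text or audio is None:
        st.warning("Provide all three inputs to get the fused result.")
    else:
        # The branch models would run here, e.g.:
        # fused = fuse_modalities(face_p, text_p, audio_p)
        st.info("Branch inference would run here; see the sketches above.")
```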