International Journal of Innovative Research in Computer and Communication Engineering
ISSN Approved Journal | Impact factor: 8.771 | ESTD: 2013 | Follows UGC CARE Journal Norms and Guidelines
| TITLE | Agentic AI Desktop Voice Assistant: An Offline and Online Multi-Step Task Automation Framework Using ReAct and On-Device LLMs |
|---|---|
| ABSTRACT | The proliferation of personal computing devices has created demand for intelligent, hands-free interfaces enabling natural spoken-language interaction. This paper presents an Agentic AI Desktop Voice Assistant, an offline-capable, multi-agent pipeline bridging conversational AI with desktop automation. The system integrates Porcupine wake-word detection, OpenAI Whisper ASR, a locally hosted LLaMA-3 LLM via Ollama, Coqui TTS synthesis, and a modular tool-execution layer exposing file management, calendar, web search, email, and shell APIs. Multi-step behaviour is governed by a ReAct (Reasoning + Acting) loop with a ChromaDB-backed long-term memory module. Evaluated on a 150-task benchmark, the system achieves a Task Completion Rate of 91.3%, end-to-end latency of 1.8 s, and a WER of 4.2%, running entirely on-device with full data privacy. |
| AUTHOR | ALEX JOSHWA X, ANTONY JESU AJAY J, ASWATH R R, KAMALRAJ T, S. N. SARANYA; Department of Artificial Intelligence and Data Science, Christ The King Engineering College, Coimbatore, Tamil Nadu, India |
| VOLUME | 184 |
| DOI | 10.15680/IJIRCCE.2026.1405042 |
| PDF | pdf/42_Agentic AI Desktop Voice Assistant An Offline and Online Multi-Step Task Automation Framework Using ReAct and On-Device LLMs.pdf |
| KEYWORDS | |
| References | [1] S. Yao et al., "ReAct: Synergizing Reasoning and Acting in Language Models," ICLR 2023. arXiv:2210.03629. [2] A. Radford et al., "Robust Speech Recognition via Large-Scale Weak Supervision," arXiv:2212.04356, 2022. [3] T. Schick et al., "Toolformer: Language Models Can Teach Themselves to Use Tools," NeurIPS 2023. arXiv:2302.04761. [4] C. Packer et al., "MemGPT: Towards LLMs as Operating Systems," arXiv:2310.08560, 2023. [5] Picovoice Inc., "Porcupine Wake Word Engine," github.com/Picovoice/porcupine, 2022. [6] E. Gölge et al., "Coqui TTS: A Deep Learning Toolkit for Text-to-Speech," github.com/coqui-ai/TTS, 2021. [7] Meta AI, "LLaMA 3 Model Card," ai.meta.com/blog/meta-llama-3/, 2024. [8] Chroma, "ChromaDB: The AI-Native Open-Source Embedding Database," trychroma.com, 2023. [9] J. Wei et al., "Chain-of-Thought Prompting Elicits Reasoning in Large Language Models," NeurIPS 2022. [10] T. B. Brown et al., "Language Models are Few-Shot Learners (GPT-3)," NeurIPS 2020. [11] SYSTRAN, "Faster-Whisper: CTranslate2-based Whisper Implementation," github.com/SYSTRAN/faster-whisper, 2023. [12] J. Kaplan et al., "Scaling Laws for Neural Language Models," arXiv:2001.08361, 2020. |
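The ReAct (Reasoning + Acting) loop named in the abstract alternates model-generated "Thought"/"Action" steps with tool execution, feeding each tool result back as an "Observation" until the model emits a final answer. The following is a minimal sketch of that control flow, not the paper's actual implementation: the real system would call the on-device LLaMA-3 model via Ollama, whereas here a scripted stand-in `fake_llm` produces deterministic replies, and the tool names and the `Action: tool[argument]` format are illustrative assumptions.

```python
# Hedged sketch of a ReAct control loop with a tool-execution layer.
# `TOOLS`, `fake_llm`, and the Action syntax are assumptions for
# illustration; they are not taken from the paper.

TOOLS = {
    "web_search": lambda q: f"top result for '{q}'",
    "read_file": lambda path: f"contents of {path}",
}

def fake_llm(transcript):
    """Stand-in for the local LLM: act on the first turn, answer after
    an observation has been fed back into the transcript."""
    if "Observation:" not in transcript:
        return "Thought: I need to search.\nAction: web_search[weather today]"
    return "Thought: I have what I need.\nFinal Answer: It is sunny."

def react_loop(task, llm=fake_llm, max_steps=5):
    transcript = f"Task: {task}"
    for _ in range(max_steps):
        reply = llm(transcript)
        transcript += "\n" + reply
        if "Final Answer:" in reply:
            return reply.split("Final Answer:", 1)[1].strip()
        # Parse an action of the form: Action: tool[argument]
        for line in reply.splitlines():
            if line.startswith("Action:"):
                call = line[len("Action:"):].strip()
                name, arg = call.split("[", 1)
                result = TOOLS[name.strip()](arg.rstrip("]"))
                # Feed the tool result back as the next observation.
                transcript += f"\nObservation: {result}"
    return None  # step budget exhausted without a final answer

answer = react_loop("What is the weather today?")
```

The `max_steps` cap bounds the reason-act cycle so a confused model cannot loop indefinitely; a production system would also validate tool names and arguments before dispatch, particularly for the shell and email tools the abstract mentions.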