International Journal of Innovative Research in Computer and Communication Engineering

ISSN Approved Journal | Impact factor: 8.771 | ESTD: 2013 | Follows UGC CARE Journal Norms and Guidelines



TITLE Agentic AI Desktop Voice Assistant: An Offline and Online Multi-Step Task Automation Framework Using ReAct and On-Device LLMs
ABSTRACT The proliferation of personal computing devices has created demand for intelligent, hands-free interfaces that enable natural spoken-language interaction. This paper presents an Agentic AI Desktop Voice Assistant, a fully offline, multi-agent pipeline that bridges conversational AI and desktop automation. The system integrates Porcupine wake-word detection, OpenAI Whisper automatic speech recognition, a locally hosted LLaMA-3 LLM served via Ollama, Coqui TTS speech synthesis, and a modular tool-execution layer exposing file-management, calendar, web-search, email, and shell APIs. Multi-step behaviour is governed by a ReAct (Reasoning + Acting) loop backed by a ChromaDB long-term memory module. Evaluated on a 150-task benchmark, the system achieves a task completion rate of 91.3%, an end-to-end latency of 1.8 s, and a word error rate (WER) of 4.2%, running entirely on-device with full data privacy.
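The ReAct control loop summarised in the abstract alternates model reasoning with tool calls until the task is resolved. The sketch below illustrates that Thought/Action/Observation cycle in minimal Python; the tool names, the "Action: tool[arg]" turn format, and the scripted model replies are hypothetical stand-ins for illustration only (the paper's system would instead query a local LLaMA-3 model via Ollama and dispatch to its file, calendar, web, email, and shell tools).

```python
import re

# Hypothetical tool registry standing in for the paper's tool-execution layer.
TOOLS = {
    "file_search": lambda q: f"found 2 files matching '{q}'",
    "calendar_add": lambda q: f"event '{q}' added",
}

# Scripted stand-in for LLM output: one string per ReAct step.
SCRIPTED_TURNS = iter([
    "Thought: I need to locate the report first.\nAction: file_search[quarterly report]",
    "Thought: File found; now schedule the review.\nAction: calendar_add[review quarterly report]",
    "Final: The report was located and a review event was scheduled.",
])

def llm(prompt: str) -> str:
    """Placeholder for a call to the on-device LLM."""
    return next(SCRIPTED_TURNS)

def react_loop(task: str, max_steps: int = 5) -> str:
    """Alternate Thought/Action turns with tool Observations until a Final answer."""
    transcript = f"Task: {task}"
    for _ in range(max_steps):
        turn = llm(transcript)
        if turn.startswith("Final:"):
            return turn[len("Final:"):].strip()
        match = re.search(r"Action:\s*(\w+)\[(.*?)\]", turn)
        if not match:
            continue  # no parsable action; ask the model again
        tool, arg = match.group(1), match.group(2)
        observation = TOOLS[tool](arg)
        # Feed the observation back so the next reasoning step can use it.
        transcript += f"\n{turn}\nObservation: {observation}"
    return "step budget exhausted"

result = react_loop("find my quarterly report and schedule a review")
```

The key design point the loop illustrates is that each tool result is appended to the transcript before the next model call, so later reasoning steps are grounded in earlier observations rather than in the model's guesses.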
AUTHOR ALEX JOSHWA X, ANTONY JESU AJAY J, ASWATH R R, KAMALRAJ T, S. N. SARANYA Department of Artificial Intelligence and Data Science, Christ The King Engineering College, Coimbatore, Tamil Nadu, India
VOLUME 184
DOI 10.15680/IJIRCCE.2026.1405042
PDF pdf/42_Agentic AI Desktop Voice Assistant An Offline and Online Multi-Step Task Automation Framework Using ReAct and On-Device LLMs.pdf
KEYWORDS
References [1] S. Yao et al., "ReAct: Synergizing Reasoning and Acting in Language Models," ICLR 2023. arXiv:2210.03629.
[2] A. Radford et al. (OpenAI), "Robust Speech Recognition via Large-Scale Weak Supervision," arXiv:2212.04356, 2022.
[3] T. Schick et al. (Meta AI), "Toolformer: Language Models Can Teach Themselves to Use Tools," NeurIPS 2023. arXiv:2302.04761.
[4] C. Packer et al. (UC Berkeley), "MemGPT: Towards LLMs as Operating Systems," arXiv:2310.08560, 2023.
[5] Picovoice Inc., "Porcupine Wake Word Engine," github.com/Picovoice/porcupine, 2022.
[6] E. Gölge et al., "Coqui TTS: A Deep Learning Toolkit for Text-to-Speech," github.com/coqui-ai/TTS, 2021.
[7] Meta AI, "LLaMA 3 Model Card," ai.meta.com/blog/meta-llama-3/, 2024.
[8] Chroma, "ChromaDB: The AI-Native Open-Source Embedding Database," trychroma.com, 2023.
[9] J. Wei et al., "Chain-of-Thought Prompting Elicits Reasoning in Large Language Models," NeurIPS 2022.
[10] T. B. Brown et al., "Language Models are Few-Shot Learners (GPT-3)," NeurIPS 2020.
[11] SYSTRAN, "Faster-Whisper: CTranslate2-based Whisper Implementation," github.com/SYSTRAN/faster-whisper, 2023.
[12] J. Kaplan et al., "Scaling Laws for Neural Language Models," arXiv:2001.08361, 2020.
Copyright © IJIRCCE 2020. All rights reserved.