International Journal of Innovative Research in Computer and Communication Engineering
ISSN Approved Journal | Impact factor: 8.771 | ESTD: 2013 | Follows UGC CARE Journal Norms and Guidelines
| TITLE | Agentic AI Desktop Voice Assistant: An Offline and Online Multi-Step Task Automation Framework Using ReAct and On-Device LLMs |
|---|---|
| ABSTRACT | The proliferation of personal computing devices has created demand for intelligent, hands-free interfaces enabling natural spoken-language interaction. This paper presents an Agentic AI Desktop Voice Assistant, an offline-capable, multi-agent pipeline bridging conversational AI with desktop automation. The system integrates Porcupine wake-word detection, OpenAI Whisper ASR, a locally hosted LLaMA-3 LLM via Ollama, Coqui TTS synthesis, and a modular tool-execution layer exposing file management, calendar, web search, email, and shell APIs. Multi-step behaviour is governed by a ReAct (Reasoning + Acting) loop with a ChromaDB-backed long-term memory module. Evaluated on a 150-task benchmark, the system achieves a Task Completion Rate of 91.3%, end-to-end latency of 1.8 s, and a WER of 4.2%, running entirely on-device with full data privacy. |
| AUTHOR | ALEX JOSHWA X, ANTONY JESU AJAY J, ASWATH R R, KAMALRAJ T, S. N. SARANYA; Department of Artificial Intelligence and Data Science, Christ The King Engineering College, Coimbatore, Tamil Nadu, India |
| VOLUME | 184 |
| DOI | 10.15680/IJIRCCE.2026.1405042 |
| PDF | pdf/42_Agentic AI Desktop Voice Assistant An Offline and Online Multi-Step Task Automation Framework Using ReAct and On-Device LLMs.pdf |
| KEYWORDS | |
| References | [1] S. Yao et al., "ReAct: Synergizing Reasoning and Acting in Language Models," ICLR 2023. arXiv:2210.03629. [2] A. Radford et al., "Robust Speech Recognition via Large-Scale Weak Supervision," arXiv:2212.04356, 2022. [3] T. Schick et al., "Toolformer: Language Models Can Teach Themselves to Use Tools," NeurIPS 2023. arXiv:2302.04761. [4] C. Packer et al., "MemGPT: Towards LLMs as Operating Systems," arXiv:2310.08560, 2023. [5] Picovoice Inc., "Porcupine Wake Word Engine," github.com/Picovoice/porcupine, 2022. [6] E. Gölge et al., "Coqui TTS: A Deep Learning Toolkit for Text-to-Speech," github.com/coqui-ai/TTS, 2021. [7] Meta AI, "LLaMA 3 Model Card," ai.meta.com/blog/meta-llama-3/, 2024. [8] Chroma, "ChromaDB: The AI-Native Open-Source Embedding Database," trychroma.com, 2023. [9] J. Wei et al., "Chain-of-Thought Prompting Elicits Reasoning in Large Language Models," NeurIPS 2022. [10] T. B. Brown et al., "Language Models are Few-Shot Learners (GPT-3)," NeurIPS 2020. [11] SYSTRAN, "Faster-Whisper: CTranslate2-based Whisper Implementation," github.com/SYSTRAN/faster-whisper, 2023. [12] J. Kaplan et al., "Scaling Laws for Neural Language Models," arXiv:2001.08361, 2020. |
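The ReAct (Reasoning + Acting) loop named in the abstract alternates model-generated "Thought"/"Action" steps with tool execution, feeding each tool result back as an "Observation" until the model emits a final answer. The following is a minimal sketch of that control flow, not the paper's actual implementation: the real system would call the on-device LLaMA-3 model via Ollama, whereas here a scripted stand-in `fake_llm` produces deterministic replies, and the tool names and the `Action: tool[argument]` format are illustrative assumptions.

```python
# Hedged sketch of a ReAct control loop with a tool-execution layer.
# `TOOLS`, `fake_llm`, and the Action syntax are assumptions for
# illustration; they are not taken from the paper.

TOOLS = {
    "web_search": lambda q: f"top result for '{q}'",
    "read_file": lambda path: f"contents of {path}",
}

def fake_llm(transcript):
    """Stand-in for the local LLM: act on the first turn, answer after
    an observation has been fed back into the transcript."""
    if "Observation:" not in transcript:
        return "Thought: I need to search.\nAction: web_search[weather today]"
    return "Thought: I have what I need.\nFinal Answer: It is sunny."

def react_loop(task, llm=fake_llm, max_steps=5):
    transcript = f"Task: {task}"
    for _ in range(max_steps):
        reply = llm(transcript)
        transcript += "\n" + reply
        if "Final Answer:" in reply:
            return reply.split("Final Answer:", 1)[1].strip()
        # Parse an action of the form: Action: tool[argument]
        for line in reply.splitlines():
            if line.startswith("Action:"):
                call = line[len("Action:"):].strip()
                name, arg = call.split("[", 1)
                result = TOOLS[name.strip()](arg.rstrip("]"))
                # Feed the tool result back as the next observation.
                transcript += f"\nObservation: {result}"
    return None  # step budget exhausted without a final answer

answer = react_loop("What is the weather today?")
```

The `max_steps` cap bounds the reason-act cycle so a confused model cannot loop indefinitely; a production system would also validate tool names and arguments before dispatch, particularly for the shell and email tools the abstract mentions.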