Development of a Toolkit for Voice Interface Creation in Aerospace VR and AR Simulators
Authors
Kazan Federal University, 18, Kremlyovskaya St., 420008, Russia
e-mail: ksenya_vasilieva@mail.ru
Abstract
This paper presents the architecture and implementation of a software toolkit for the Unity environment designed for creating voice interfaces and integrating them into virtual reality (VR) and augmented reality (AR) applications for the aerospace industry. It is shown that the voice interaction modules currently used in VR and AR applications are not universal, are tightly bound to specific use cases, and offer limited customization. A comparative analysis of existing solutions identified their key shortcomings: the absence of user interfaces for linking voice commands to application logic, the inability to choose between local and cloud-based automatic speech recognition (ASR) services, dependence on network connectivity, and poor robustness to variation in command wording. The latency between the onset of a spoken phrase and the receipt of the recognized text was also measured experimentally for the iFlyTek, Vosk, and Whisper ASR services, using a set of phrases at three complexity levels. The experiment was conducted on a standalone Pico 4 VR headset with 8 GB of RAM running Android 10. Among cloud-based solutions, the iFlyTek service had the lowest recognition latency (2934 ms), while among local implementations the Vosk library performed best (3589 ms). Whisper showed higher latencies: 4732 ms for the tiny model, 6314 ms for the base model, and 4913 ms for the small model. The paper describes the architecture and functional diagram of the developed toolkit, which supports both local and cloud-based speech recognition, provides tools for configuring voice commands and ASR parameters, and includes a mechanism for automatically generating alternative command wordings with the GLM-4.7-Flash model.
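The abstract does not specify how the toolkit switches between local and cloud-based recognizers or how latency is logged. The following Python sketch illustrates one possible arrangement, assuming a simple preference-plus-fallback rule: cloud backends are eligible only when the network is reachable, and latency is computed, as in the experiment, as the interval from phrase onset to arrival of the text result. The `AsrBackend` and `AsrSelector` names, the recognizer signature, and the backend labels are hypothetical and are not the toolkit's Unity API.

```python
import time
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional

# Hypothetical recognizer signature: raw audio bytes in, recognized text out.
Recognizer = Callable[[bytes], str]


@dataclass
class AsrBackend:
    name: str              # illustrative label, e.g. "vosk" or "iflytek"
    is_local: bool         # local backends do not require network access
    recognize: Recognizer


class AsrSelector:
    """Picks an ASR backend and records phrase-onset-to-text latency per backend."""

    def __init__(self, backends: List[AsrBackend], prefer_cloud: bool = True):
        self.backends = list(backends)
        self.prefer_cloud = prefer_cloud
        self.latencies_ms: Dict[str, List[float]] = {}

    def choose(self, network_available: bool) -> AsrBackend:
        # Cloud backends are eligible only when the network is reachable.
        candidates = [b for b in self.backends if b.is_local or network_available]
        if not candidates:
            raise RuntimeError("no usable ASR backend")
        # Stable sort: preferred kind first, configuration order kept within each kind.
        if self.prefer_cloud:
            candidates.sort(key=lambda b: b.is_local)
        else:
            candidates.sort(key=lambda b: not b.is_local)
        return candidates[0]

    def transcribe(self, audio: bytes, onset: float, network_available: bool) -> str:
        """`onset` is a time.monotonic() timestamp taken when the phrase began."""
        backend = self.choose(network_available)
        text = backend.recognize(audio)
        # Latency = arrival of the text result minus the onset of the phrase.
        latency_ms = (time.monotonic() - onset) * 1000.0
        self.latencies_ms.setdefault(backend.name, []).append(latency_ms)
        return text

    def mean_latency_ms(self, name: str) -> Optional[float]:
        values = self.latencies_ms.get(name)
        return sum(values) / len(values) if values else None
```

In an actual headset build the onset timestamp would plausibly come from a voice activity detector; in this sketch the caller simply passes it in, which keeps the latency definition explicit and independent of the recognizer.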
The proposed solution reduces the effort required to develop voice interfaces and, by automatically generating semantically similar phrases, makes the system more robust to variation in how users word their voice commands. A prototype voice interface was implemented for interacting with a virtual instrument panel in a Unity scene; its operation confirms that the developed toolkit is applicable to building VR and AR systems for aerospace applications.
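The matching step that makes generated paraphrases useful can be sketched as follows. The abstract does not name the matching algorithm; the reference list includes FuzzySharp, a fuzzy string matching library, so this Python sketch substitutes the standard library's `difflib.SequenceMatcher` similarity ratio. Each command is registered with its canonical wording plus variants (for example, phrases returned by GLM-4.7-Flash, whose output is assumed here to be one phrase per line, optionally numbered or bulleted), and recognized text is dispatched to the best match above a threshold. `VoiceCommandRegistry`, `parse_generated_variants`, and the 0.8 threshold are hypothetical.

```python
import re
from difflib import SequenceMatcher
from typing import Callable, Dict, List, Optional, Tuple


def _normalize(text: str) -> str:
    # Lower-case and collapse whitespace so formatting differences in ASR output do not matter.
    return " ".join(text.lower().split())


def parse_generated_variants(llm_output: str) -> List[str]:
    """Split model output (assumed: one phrase per line, optionally numbered or bulleted)."""
    variants: List[str] = []
    seen = set()
    for line in llm_output.splitlines():
        # Drop leading list markers such as "1.", "2)", "-" or "*".
        phrase = re.sub(r"^\s*(?:[-*]|\d+[.)])\s*", "", line).strip()
        key = _normalize(phrase)
        if key and key not in seen:   # skip blanks and case/spacing duplicates
            seen.add(key)
            variants.append(phrase)
    return variants


class VoiceCommandRegistry:
    """Maps recognized text to actions via canonical phrases plus generated variants."""

    def __init__(self, threshold: float = 0.8):
        self.threshold = threshold
        self._phrases: List[Tuple[str, str]] = []      # (normalized phrase, command id)
        self._actions: Dict[str, Callable[[], None]] = {}

    def register(self, command_id: str, phrases: List[str], action: Callable[[], None]) -> None:
        self._actions[command_id] = action
        for p in phrases:
            self._phrases.append((_normalize(p), command_id))

    def match(self, recognized: str) -> Optional[str]:
        text = _normalize(recognized)
        best_id, best_score = None, 0.0
        for phrase, command_id in self._phrases:
            score = SequenceMatcher(None, text, phrase).ratio()   # similarity in [0, 1]
            if score > best_score:
                best_id, best_score = command_id, score
        return best_id if best_score >= self.threshold else None

    def dispatch(self, recognized: str) -> bool:
        """Invoke the matched command's action; return False if nothing matched."""
        command_id = self.match(recognized)
        if command_id is None:
            return False
        self._actions[command_id]()
        return True
```

The threshold trades off two failure modes: set too low, unrelated utterances trigger commands; set too high, the generated variants stop helping. Any real value would need tuning against actual ASR output.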
Keywords:
voice interface; automatic speech recognition; Unity; VR; AR; Vosk; iFlyTek; GLM-4.7-Flash
References
- M. E. McCullins, S. Hampton, S. G. Fussell, K. Kiernan, and J. Thropp, “The effectiveness of using virtual reality training environments for procedural training in fourth-generation airliners,” The Aeronautical Journal, vol. 129, no. 1342, pp. 3327–3346, 2025, doi: 10.1017/aer.2025.10086.
- A. A. Kabanov and M. V. Amosov, “VR/AR v izuchenii, sozdanii i ekspluatatsii aerokosmicheskoi tekhniki: iz makromira v mikromir, ot nablyudeniya k deistviyam” [VR/AR in the study, creation, and operation of aerospace technology: from the macroworld to the microworld, from observation to action], Trudy MAI, no. 128, 2023, doi: 10.34759/trd-2023-128-21.
- A. A. Polyakov and S. A. Zashchirinskii, “Ispol'zovanie virtual'nogo prostranstva dlya provedeniya maketno-konstruktorskikh ispytanii po elektronnomu maketu kosmicheskogo apparata” [Using virtual space to conduct mock-up and design tests on the electronic mock-up of a spacecraft], Trudy MAI, no. 107, 2019. URL: https://trudymai.ru/published.php?ID=107877
- A. Siyaev and G.-S. Jo, “Neuro-Symbolic Speech Understanding in Aircraft Maintenance Metaverse,” IEEE Access, vol. 9, pp. 154484–154499, 2021, doi: 10.1109/access.2021.3128616.
- V. Krishnamurthy, B. Jafrin Rosary, G. Oliver Joel, S. Balasubramanian, and S. Kumari, “Voice command-integrated AR-based E-commerce Application for Automobiles,” May 2023, doi: 10.1109/iconscept57958.2023.10170152.
- Platforma Unity dlya razrabotki v real'nom vremeni | Dvizhok dlya 3D, 2D, VR i AR [Unity real-time development platform | 3D, 2D, VR and AR engine] // Unity URL: https://unity.com/ru (accessed: 30.12.2025).
- Game Voice Control [Offline speech recognition] | Audio | Unity Asset Store // Unity Asset Store URL: https://assetstore.unity.com/packages/tools/audio/game-voice-control-offline-speech-recognition-1780... (accessed: 30.12.2025).
- cmusphinx/pocketsphinx: A small speech recognizer // GitHub URL: https://github.com/cmusphinx/pocketsphinx (accessed: 30.12.2025).
- Meta - Voice SDK - Immersive Voice Commands | Integration | Unity Asset Store // Unity Asset Store URL: https://assetstore.unity.com/packages/tools/integration/meta-voice-sdk-immersive-voice-commands-2645... (accessed: 30.12.2025).
- yasirkula/UnitySpeechToText: A native Unity plugin to convert speech to text on Android & iOS // GitHub URL: https://github.com/yasirkula/UnitySpeechToText (accessed: 30.12.2025).
- Speech Control Plugin for VR | Audio | Unity Asset Store // Unity Asset Store URL: https://assetstore.unity.com/packages/tools/audio/speech-control-plugin-for-vr-76855 (accessed: 30.12.2025).
- Speech-to-Text: AI voice typing & transcription // Google Cloud URL: https://cloud.google.com/speech-to-text (accessed: 30.12.2025).
- N. Ashtari and P. K. Chilana, “How New Developers Approach Augmented Reality Development Using Simplified Creation Tools: An Observational Study,” Multimodal Technologies and Interaction, vol. 8, no. 4, p. 35, Apr. 2024, doi: https://doi.org/10.3390/mti8040035.
- zai-org/GLM-4.7-Flash // Hugging Face URL: https://huggingface.co/zai-org/GLM-4.7-Flash (accessed: 30.12.2025).
- Hugging Face – The AI community building the future. URL: https://huggingface.co/ (accessed: 30.12.2025).
- XR Interaction Toolkit // Unity Documentation URL: https://docs.unity3d.com/Packages/com.unity.xr.interaction.toolkit%403.0/manual/index.html (accessed: 30.12.2025).
- PICO Unity Integration SDK // PICO Developer URL: https://developer.picoxr.com/document/unity/ (accessed: 30.12.2025).
- Short Form ASR WebAPI Document (Automatic Speech Recognition) // iFLYTEK Open Platform Documents URL: https://global.xfyun.cn/doc/asr/voicedictation/API.html (accessed: 30.12.2025).
- Macoron/whisper.unity: Running speech to text model (whisper.cpp) in Unity3d on your local machine. // GitHub URL: https://github.com/Macoron/whisper.unity (accessed: 30.12.2025).
- alphacep/vosk-unity-asr: Automatic Speech Recognition in Unity using Vosk library // GitHub URL: https://github.com/alphacep/vosk-unity-asr (accessed: 30.12.2025).
- JakeBayer/FuzzySharp: C# .NET fuzzy string matching implementation of Seat Geek's well known python FuzzyWuzzy algorithm. // GitHub URL: https://github.com/JakeBayer/FuzzySharp (accessed: 30.12.2025).

