
Speech recognition tools can help decode human speech to deliver better user experiences

The virtual reality (VR) market has grown enormously over the last two years and will continue to do so in the near future. Valuates Research suggests that by 2027, VR will be a $26.8 billion USD market globally, more than a threefold increase from $7.7 billion in 2020.

As the VR market expands, users may also expect improved experiences and more flexibility. Currently, VR experiences are mostly controlled with handheld trackers, or in some cases, remote control-like devices. A handful of headsets feature eye-tracking while other companies are reportedly working on haptic glove-based controls.

One area that calls for more VR research and development (R&D) is voice and speech-based navigation, where VR users can engage in more natural interactions and navigate VR spaces like the Metaverse more intuitively.

What Is Speech Recognition in XR?

Speech recognition is a type of human-computer interaction (HCI) technology that leverages artificial intelligence (AI) algorithms and natural language processing (NLP) to enable computers to understand human languages such as English.

Voice recognition is a similar technology that is used to identify the speaker by looking for voice identifiers and tonality markers rather than analyzing the syntactical arrangement of the language.

Today, speech recognition technologies are all around us. Virtual assistants like Siri, Cortana, and Alexa use it to understand user commands and execute them, and in contact centers, speech recognition aids in automated transcriptions, sentiment analysis, and much more.

Speech recognition could also play a major role in the context of virtual reality by helping to:

  • Simulate conversations with AI entities or ‘digital humans’ – When the user speaks in a natural language, speech recognition technology can convert it into a machine-readable format for AI-powered virtual 3D beings to process, understand, and interact with.
  • Navigate the VR world without handheld controls – VR recreates the dynamics of the physical world that we navigate using both voice and hand commands, and speech recognition would enable a similar capability in VR. For instance, during game-play, users could still execute commands, even if their hands were occupied.
  • Simplify the hand controller UX – Handheld controls that come with VR headsets present a formidable design challenge, as they must pack a large number of buttons, wheels, joysticks, and other navigation aids into a limited space. The addition of speech recognition would enable voice commands, so that some of these functions could be executed via speech instead (a minimal command-dispatch sketch follows this list).
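
As a minimal illustration of voice-driven commands, the Python sketch below routes a recognized utterance to an in-game action. The action names and the dispatch table are hypothetical placeholders, not tied to any particular VR SDK; they only show the general pattern of binding transcribed phrases to functions.

```python
# Minimal sketch of routing recognized speech to VR actions.
# The actions (teleport, open_menu, grab) and the phrase table are
# hypothetical placeholders, not part of any specific VR SDK.

from typing import Callable, Dict


def teleport_forward() -> None:
    print("Teleporting player forward")


def open_menu() -> None:
    print("Opening in-game menu")


def grab_nearest_object() -> None:
    print("Grabbing nearest object")


# Map spoken phrases (as returned by a speech-to-text engine) to actions.
COMMANDS: Dict[str, Callable[[], None]] = {
    "move forward": teleport_forward,
    "open menu": open_menu,
    "grab that": grab_nearest_object,
}


def handle_transcript(transcript: str) -> None:
    """Dispatch a recognized utterance to the matching VR action, if any."""
    action = COMMANDS.get(transcript.strip().lower())
    if action is not None:
        action()
    else:
        print(f"No command bound to: {transcript!r}")


if __name__ == "__main__":
    handle_transcript("Open menu")   # -> Opening in-game menu
    handle_transcript("jump twice")  # -> No command bound
```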

How Does Speech Recognition in VR Work?

Speech recognition in VR works just like it does in any other digital application: when the app is designed, developers must incorporate speech recognition capabilities via application programming interfaces (APIs) or software development kits (SDKs). Meta, formerly Facebook, offers such an SDK as part of its Presence Platform for Oculus developers.

At the backend, speech recognition works by breaking down human speech into distinct sounds, or phonemes, which are then passed through an acoustic modeling algorithm to predict what users intended to say. Based on these predictions, a built-in language model assembles the sounds into words and sentences that the device can understand.
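
For a concrete, minimal view of that pipeline from the developer's side, the sketch below hands a WAV file to a hosted recognizer via the third-party SpeechRecognition package; the phoneme extraction, acoustic modeling, and language modeling described above all happen inside the engine the library calls. The file name is a placeholder, and this is only one of many ways to invoke a recognizer.

```python
# Minimal transcription sketch using the third-party SpeechRecognition
# package (pip install SpeechRecognition). "commands.wav" is a
# hypothetical input file.

import speech_recognition as sr


def transcribe(path: str) -> str:
    recognizer = sr.Recognizer()
    with sr.AudioFile(path) as source:
        audio = recognizer.record(source)  # read the whole file into memory
    # Hand the raw audio to a hosted recognizer; phoneme extraction,
    # acoustic modeling, and the language model all run on the backend.
    return recognizer.recognize_google(audio, language="en-US")


if __name__ == "__main__":
    print(transcribe("commands.wav"))
```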

To unleash the real potential of speech recognition in VR, developers can leverage semantic analysis, using models to decode the real, intended meaning of what is spoken as well as its syntactical construction. This would allow digital humans in VR to understand what a user has said, even if it is phrased differently.
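
As a rough illustration of that idea, the sketch below maps several different phrasings onto the same intent using hand-written rules. A production system would rely on a trained natural language understanding model (the kind of capability Wit.ai provides) rather than regular expressions; the intent names and patterns here are hypothetical and only show the concept.

```python
# Illustrative intent matching: different phrasings map to the same intent.
# The rules and intent names below are hypothetical; real systems would use
# a trained NLU model instead of regular expressions.

import re
from typing import Optional

INTENT_PATTERNS = {
    "OPEN_INVENTORY": [r"\bopen (my )?(inventory|bag|backpack)\b",
                       r"\bshow me what i('m| am) carrying\b"],
    "TELEPORT_HOME":  [r"\b(take|bring) me home\b",
                       r"\bgo back to (the )?start\b"],
}


def classify(utterance: str) -> Optional[str]:
    """Return the intent name for an utterance, or None if nothing matches."""
    text = utterance.lower()
    for intent, patterns in INTENT_PATTERNS.items():
        if any(re.search(p, text) for p in patterns):
            return intent
    return None


if __name__ == "__main__":
    print(classify("Open my backpack"))            # OPEN_INVENTORY
    print(classify("show me what I am carrying"))  # OPEN_INVENTORY
    print(classify("take me home"))                # TELEPORT_HOME
```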

Speech recognition engines are typically developed by AI companies. For instance, Meta’s Voice SDK relies on technologies obtained through its acquisition of the voice-to-text company Wit.ai.

How Do You Add Speech Recognition to VR Applications?

The easiest way to utilize VR speech recognition technology is by adopting the relevant APIs and tools provided by the VR company.

Meta has made this a relatively straightforward process by launching the Presence Platform at last year’s Connect 2021, which includes a host of SDKs to power different types of HCI. Apart from speech, this also included a tracked virtual keyboard.

The speech capability is part of Meta’s Voice SDK Experimental scheduled for launch this year, which lets developers design hands-free navigation systems, voice-driven gameplay, and voice-command searches to reduce reliance on Oculus controllers.

The Steam VR store also hosts a pre-built utility called VoiceAttack, a voice-control system for VR games and applications that facilitates in-app commands and voice chat conversations.

The other option is to use speech-to-text AI packages like Google Cloud Speech, Windows Dictation Recognition, or IBM Watson, and then build the VR integration manually.

Fortunately, most VR platform companies simplify the task. For example, developers will find ready-to-use packages for most VR environments such as Unity, including Windows Speech and Watson for Unity.
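
As an illustration of the do-it-yourself route mentioned above, the sketch below sends a short audio clip to Google Cloud Speech-to-Text and returns the transcript. The commented-out handle_transcript bridge into the VR runtime is hypothetical and would depend on the engine being used; the audio file name and format settings are assumptions for the example.

```python
# Sketch of the "build it yourself" route with Google Cloud Speech-to-Text
# (pip install google-cloud-speech, with credentials configured).
# "player_voice.wav" and the handle_transcript() hook are hypothetical.

from google.cloud import speech


def recognize_file(path: str) -> str:
    """Send a short 16 kHz mono WAV file to Cloud Speech-to-Text."""
    client = speech.SpeechClient()

    with open(path, "rb") as f:
        audio = speech.RecognitionAudio(content=f.read())

    config = speech.RecognitionConfig(
        encoding=speech.RecognitionConfig.AudioEncoding.LINEAR16,
        sample_rate_hertz=16000,
        language_code="en-US",
    )

    response = client.recognize(config=config, audio=audio)
    # Take the top alternative of the first result, if any.
    for result in response.results:
        return result.alternatives[0].transcript
    return ""


if __name__ == "__main__":
    transcript = recognize_file("player_voice.wav")
    print("Recognized:", transcript)
    # handle_transcript(transcript)  # hypothetical bridge into the VR runtime
```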

What Are the Challenges with Speech Recognition for VR?

There is a reason why speech recognition in VR is not as mainstream as its benefits may merit. The algorithms are highly complex, which means that without support from a VR platform provider, it would be very difficult for developers to get started.

More importantly, speech recognition is not 100 percent accurate. In ideal conditions, it is around 90 percent accurate, meaning it would fail to understand 10 out of every 100 words. After factoring in individual accents and ambient noise, this decreases to 70 to 80 percent.
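
A rough back-of-envelope sketch, assuming (simplistically) that word errors are independent, shows how quickly those per-word figures erode for multi-word commands; the numbers below are illustrative only.

```python
# Back-of-envelope sketch: if each word is recognized correctly with
# probability p and word errors were independent (a simplifying assumption),
# the chance that an n-word command comes through perfectly is p ** n.

def command_success_rate(per_word_accuracy: float, words: int) -> float:
    return per_word_accuracy ** words


if __name__ == "__main__":
    for p in (0.90, 0.80, 0.70):
        print(f"{p:.0%} per word -> "
              f"{command_success_rate(p, 5):.0%} for a 5-word command")
    # 90% per word -> ~59%; 80% -> ~33%; 70% -> ~17%
```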

As a result, there is a risk that VR users may be unable to explain themselves clearly to an AI digital human inside a VR space on the first attempt. The need to repeat oneself would also impact the UX, and users may want to fall back on more reliable hand controller methods.

As speech recognition algorithms become more sophisticated, the industry is closer to overcoming this challenge. For instance, a company called Kanda is currently developing speech recognition software for VR in Danish, a challenging language to understand and learn, but for the industry as a whole, universalizing speech recognition in VR is truly the next frontier.

Source:

https://www.xrtoday.com/virtual-reality/what-is-speech-recognition-technology-in-vr/