An Introduction to Virtual Reality, Psychoacoustics and the Demand on Audio
The push for compelling VR experiences is a current trend in the video game industry, and audio is crucial to making those experiences persuasive, because audio cues play a key role in our sense of being present in an actual, physical space. In the real world, humans rely on psychoacoustics to localise sounds in three dimensions.
‘For a long time, game audio has sought to accurately represent real world sound to support the truth or reality of the experience. To do this, games developers need some understanding of the physics of sound in the real world’ (Stevens and Raybould 2011)
Localisation vs Spatialisation
The key components of localisation are direction and distance from the sound source, but many other factors exist, such as timing, phase, level, impression of loudness, echo density, spaciousness, depth and size, motion and mobility. Spatialisation refers to a sound source’s relation to its sonic environment and to the physical and psychoacoustical characterisation of the space (i.e. whether the surfaces are “dry”, “hard”, “soft” etc.). The term has two possible meanings: a) the shaping of how a sound source appears in a given space; b) the creation of the sonic environment in which sound sources reside.
None of these aspects is trivial or one-dimensional, so replicating these sounds in a three-dimensional space needs high-level conceptualisation or controls in order to create a convincing impression of ‘space’. Virtual acoustics is a research field that aims to simulate sound propagation by borrowing notions from physics and graphics. Indeed, to gain more realistic simulations that can carry spatial information, we need to take another look at how sound propagates in an enclosed space and how we can simulate it.
Sound emanates from a source and travels through matter in waves. Your ears receive the sound directly (dry) or indirectly (wet) after it has passed through, or bounced off, the various materials it comes into contact with. Typically you receive both of these types of sound, and this gives you important information about the environment you are in (Stevens and Raybould 2011).
Figure 01 below shows how a sound changes over time. The direction of sound reflections can be calculated from the positions of the various walls. If the positions of a sound source and a listener are known, ray-based methods allow us to interactively position a sound, both in space and in time. To simulate the distance travelled by a sound, a delay is introduced based on the speed of sound in air.
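The delay calculation can be sketched in a few lines of Python. This is a minimal illustration rather than any particular engine's method: it assumes a speed of sound of 343 m/s (air at roughly 20 °C), positions given in metres, and a single flat wall on the plane x = wall_x. The function names are hypothetical. The reflected path is found with the common image-source trick, which mirrors the source in the wall and measures the straight line from the mirrored source to the listener.

```python
import math

SPEED_OF_SOUND = 343.0  # metres per second, assumed value for air at about 20 °C


def propagation_delay(source, listener):
    """Seconds taken by the direct sound to travel from source to listener."""
    return math.dist(source, listener) / SPEED_OF_SOUND


def first_order_reflection_delay(source, listener, wall_x):
    """Seconds taken by a single bounce off a wall on the plane x = wall_x.

    Image-source method: mirror the source in the wall; the reflected path
    has the same length as the straight line from that image to the listener.
    """
    image = (2.0 * wall_x - source[0], source[1], source[2])
    return propagation_delay(image, listener)


if __name__ == "__main__":
    source = (0.0, 0.0, 0.0)
    listener = (3.43, 0.0, 0.0)
    direct = propagation_delay(source, listener)
    reflected = first_order_reflection_delay(source, listener, wall_x=-1.715)
    print(f"direct: {direct * 1000:.1f} ms, reflected: {reflected * 1000:.1f} ms")
```

Every time the listener moves, both delays have to be recomputed, which is why a freely moving VR player makes the problem so much more demanding than a fixed listening position.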
The above illustration shows just how complex these reflections can get, and it only shows paths for one fixed position of the sound source and the listener. Now imagine that this were a game scenario in VR. Because of player autonomy, the player is going to want to move around the room, so the reflections are heard along different paths depending on where the player is in the room. With this in mind, the role of positional and ‘3D’ audio is much bigger in VR. In most 3D games, environmental soundscapes tend to consist of positional 3D sounds that are placed at specific points in the world.
Now that we have established how humans place sounds in the world and, more importantly, how we can fool people into thinking that a sound is coming from a particular point in space, we as sound designers need to examine how we must change our approach to sound design to support spatialisation in VR applications. That approach must also allow for head movement, to ensure that sound sources move realistically relative to the player.
Ambisonics is a method of creating three-dimensional audio playback via a matrix of loudspeakers. It works by reproducing or synthesising a sound field in the way it would be experienced by a listener, with multiple sounds travelling in different directions. The basic approach of Ambisonics is to treat an audio scene as a full 360-degree sphere of sound coming from different directions around a centre point. The centre point is where the microphone is placed while recording, or where the listener’s ‘sweet spot’ is located while playing back.
The most popular Ambisonics format today, widely used in VR and 360 video, is Ambisonics B-format, which uses as few as four channels to reproduce a complete sphere of sound.
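To make the four channels concrete, here is a small Python sketch that encodes one mono sample into B-format and then rotates the whole sound field about the vertical axis, which is how head turns can be compensated. It assumes the traditional B-format convention (W scaled by 1/√2, X pointing forward, Y to the left, Z up, azimuth measured anticlockwise from straight ahead). Other channel orderings and normalisations exist, and the function names here are hypothetical.

```python
import math


def encode_b_format(sample, azimuth_deg, elevation_deg=0.0):
    """Encode one mono sample arriving from the given direction into (W, X, Y, Z)."""
    az = math.radians(azimuth_deg)
    el = math.radians(elevation_deg)
    w = sample / math.sqrt(2.0)               # omnidirectional component
    x = sample * math.cos(az) * math.cos(el)  # front/back component
    y = sample * math.sin(az) * math.cos(el)  # left/right component
    z = sample * math.sin(el)                 # up/down component
    return (w, x, y, z)


def rotate_yaw(b_format, yaw_deg):
    """Rotate the whole sound field anticlockwise about the vertical axis.

    To compensate for a head turn of +30 degrees (to the left), rotate the
    field by -30 degrees so that sources stay fixed in the world.
    """
    w, x, y, z = b_format
    a = math.radians(yaw_deg)
    return (w,
            x * math.cos(a) - y * math.sin(a),
            x * math.sin(a) + y * math.cos(a),
            z)


if __name__ == "__main__":
    to_the_left = encode_b_format(1.0, azimuth_deg=90.0)
    # The listener turns 90 degrees to the left, so the source is now in front.
    print(rotate_yaw(to_the_left, -90.0))
```

Because the rotation is a handful of multiplications per sample and does not depend on how many sources were mixed into the field, it is cheap to apply per frame from head-tracking data.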
In order to replicate sound in 3D, a direction-selection filter can be encoded as a head-related transfer function (HRTF). The HRTF is the cornerstone of most modern 3D sound spatialisation techniques. HRTFs by themselves may not be enough to localise a sound precisely, so we often rely on head motion to assist with localisation. Simply turning our heads changes difficult front/back ambiguity problems into lateral localisation problems that we are better equipped to solve.
Images courtesy of Oculus.com
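The front/back ambiguity mentioned above, and the way a head turn resolves it, can be illustrated with a deliberately simplified model of the arrival-time difference between the two ears. The sketch below is not an HRTF: it treats the ears as two points 0.18 m apart (an assumed figure) with no head in between, and it measures azimuth clockwise from straight ahead. The function name and constants are hypothetical.

```python
import math

SPEED_OF_SOUND = 343.0  # metres per second, assumed
EAR_SPACING = 0.18      # metres between the ears, assumed


def interaural_time_difference(source_azimuth_deg, head_yaw_deg=0.0):
    """Arrival-time difference between the ears, in seconds.

    Positive values mean the sound reaches the right ear first. Both angles
    are measured clockwise from the listener's initial forward direction.
    """
    relative = math.radians(source_azimuth_deg - head_yaw_deg)
    return EAR_SPACING * math.sin(relative) / SPEED_OF_SOUND


if __name__ == "__main__":
    front, back = 30.0, 150.0  # mirror-image directions, front right and back right
    for yaw in (0.0, 20.0):
        itd_front = interaural_time_difference(front, yaw)
        itd_back = interaural_time_difference(back, yaw)
        print(f"yaw {yaw:4.1f}: front {itd_front * 1e6:6.1f} us, back {itd_back * 1e6:6.1f} us")
```

With the head still, the two directions produce the same time difference in this model. After a 20 degree turn to the right, the difference shrinks for the source in front and grows for the source behind, so the direction of the change reveals which one it is.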
HRTFs help us determine the direction to a sound source, but they give relatively sparse cues for determining the distance to a sound. We use a combination of the following factors to determine distance:
- Initial time delay
- Ratio of direct sound to reverberant sound
- Motion parallax
- High-frequency attenuation
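As an illustration of the second cue, the ratio of direct to reverberant sound can be estimated under a textbook idealisation: the direct sound falls by about 6 dB for each doubling of distance, while the reverberant level stays roughly constant throughout the room. The two are equal at the room's so-called critical distance, which is passed in below as an assumed parameter. The function name is hypothetical, and real rooms will deviate from this model.

```python
import math


def direct_to_reverberant_db(distance_m, critical_distance_m):
    """Direct-to-reverberant ratio in dB for an idealised diffuse room.

    0 dB at the critical distance, positive when the listener is closer to
    the source (direct sound dominates) and negative when further away.
    """
    if distance_m <= 0 or critical_distance_m <= 0:
        raise ValueError("distances must be positive")
    return 20.0 * math.log10(critical_distance_m / distance_m)


if __name__ == "__main__":
    for d in (0.5, 1.0, 2.0, 4.0, 8.0):
        ratio = direct_to_reverberant_db(d, critical_distance_m=2.0)
        print(f"{d:4.1f} m: {ratio:+.1f} dB")
```

A spatialiser could use a curve like this to set the wet/dry mix of a reverb send so that distant sources sound more reverberant than near ones.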
One of the key functions and effects of sound in games is to immerse us in the virtual world through a sense of sonic envelopment. The term envelopment, in relation to spatial sound, has been used to describe sometimes overlapping and contradictory sounds. Collins (2013) defines envelopment as the sensation of being inside a physical space (enveloped by that sound); most commonly, this feeling is accomplished through the use of the subwoofer and bass frequencies, which create a physical, tangible presence for sound in a space.
A further problem in 3D sound is occlusion, which occurs when an object or surface comes between the listener and the sound source.
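The simplest form of occlusion detection is a line-of-sight test. The sketch below works in 2D (a top-down floor plan) and treats a wall as a line segment; it reports whether the straight path from the source to the listener crosses that wall. It is only a geometric check under those assumptions, with hypothetical function names. What to do once a source is occluded (attenuate it, filter it, and so on) is a separate design decision that is not shown here.

```python
def _orientation(a, b, c):
    """Signed area test: > 0 if c is left of the line a->b, < 0 if right, 0 if on it."""
    return (b[0] - a[0]) * (c[1] - a[1]) - (b[1] - a[1]) * (c[0] - a[0])


def is_occluded(source, listener, wall_start, wall_end):
    """True if the wall segment strictly crosses the source-to-listener path."""
    d1 = _orientation(source, listener, wall_start)
    d2 = _orientation(source, listener, wall_end)
    d3 = _orientation(wall_start, wall_end, source)
    d4 = _orientation(wall_start, wall_end, listener)
    # The segments cross when each one's endpoints lie on opposite sides of the other.
    return d1 * d2 < 0 and d3 * d4 < 0


if __name__ == "__main__":
    source, listener = (0.0, 0.0), (4.0, 0.0)
    print(is_occluded(source, listener, (2.0, -1.0), (2.0, 1.0)))  # wall across the path
    print(is_occluded(source, listener, (2.0, 1.0), (2.0, 3.0)))   # wall off to one side
```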
In an interview about the future of game sound, Ben Minto, senior audio director and sound designer at EA DICE, also discussed the issues of physics and sound replication in VR games. Minto asks whether we want to be more “correct” in our replication of sound or more decodable (if a conflict exists). Do we always want real-world behaviour? Does real always sound right?
For example: ‘Working in a built-up city I’m still surprised by how often physics gets it “wrong” when a helicopter flies overhead or an ambulance approaches from a distance. All the “conflicting” reflections from the buildings make it really hard for my brain to pinpoint where the sound is coming from, its path and also its direction of travel. Is this something we want to replicate in our games or do we want to bend the rules to make the scenarios more readable?’ (Minto 2017)
Without doubt, as the virtual worlds of VR expand, the appropriate recreation of sound propagation in video games is in high demand. Despite often being overlooked in the face of more attention-grabbing visuals, audio is an essential component of creating presence in VR. In the quest for increasingly lifelike audio in VR environments, companies such as Oculus have recently added systems to their audio SDKs that let developers create more realistic audio by generating real-time reverb and occlusion based on the app’s geometry. Now in beta, the so-called ‘Audio Propagation’ tool arrives in the Oculus Audio SDK 1.34 update and produces more accurate audio propagation with “minimal set up,” the company stated in a recent blog post.
Using HRTFs is an obvious trend, but its disadvantage is that not everybody has the same shaped head and ears, meaning that one set of HRTFs will not give accurate sound reproduction for all players.
Alery, Benoit, 2017: Auralizing Soundscapes through Virtual Acoustics, Audiokinetic Blog https://blog.audiokinetic.com/auralizing-soundscapes-through-virtual-acoustics/
Collins, Karen, 2013: Playing With Sound, A Theory of Interacting with Sound and Music in Video Games, The MIT Press, England
Hartung, Klaus, 1999: Comparisons of Different Methods for the Interpolation of Head-Related Transfer Functions, AES 16th International Conference, Finland
Minto, Ben, 2017: The Future of Game Audio, Interview, Asoundeffect Blog https://www.asoundeffect.com/game-audio-future-ben-minto/
Oculus Developer Resources, accessed 9/4/19 https://developer.oculus.com/documentation/audiosdk/latest/concepts/audio-intro-localization/#audio-intro-localization
Oculus Developer Blog, update February 2019 https://developer.oculus.com/blog/february-monthly-tech-updates/
Stevens and Raybould, 2011: The Game Audio Tutorial, Focal Press, England