Many of us have seen extended reality devices used in video games, but as enterprise applications multiply, firms like Allied Market Research forecast that the extended reality (XR) market, including augmented reality (AR), virtual reality (VR), and other forms of mixed reality, will exceed $571 billion by 2025. For that to happen, user experience (UX) must become a focus.
The tricky part here is that spatial UX in virtual reality experiences is nothing like the 2D UX we are accustomed to in our mobile devices and PCs. Here’s what makes spatial UX different from the common on-screen user interface (UI):
- It involves movement of at least half of the body
- Human factors such as fatigue come into play
- Interaction techniques must be chosen deliberately
- Tradeoffs between speed and precision shift as tracking technologies evolve
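The speed-versus-precision tradeoff shows up concretely in input filtering: heavy smoothing steadies a jittery hand-tracking signal but adds lag, while light smoothing is responsive but noisy. A minimal sketch of an adaptive smoother, in the spirit of speed-dependent filters used for tracking data (the class name and constants here are illustrative, not from any particular SDK):

```python
class AdaptiveSmoother:
    """Exponential smoothing whose blend factor rises with speed:
    slow movement gets heavy smoothing (precision for fine work),
    fast movement gets light smoothing (low perceived latency)."""

    def __init__(self, min_alpha: float = 0.1, speed_gain: float = 0.05):
        self.min_alpha = min_alpha    # smoothing floor when the hand is still
        self.speed_gain = speed_gain  # how quickly responsiveness ramps with speed
        self.prev = None

    def filter(self, x: float) -> float:
        if self.prev is None:
            self.prev = x
            return x
        speed = abs(x - self.prev)
        # Blend factor grows with speed, capped at 1.0 (no smoothing at all).
        alpha = min(1.0, self.min_alpha + self.speed_gain * speed)
        self.prev = self.prev + alpha * (x - self.prev)
        return self.prev
```

A slow drift is averaged heavily, so precision wins; a fast sweep passes through almost unfiltered, so speed wins. Real tracking stacks apply this idea per axis at high sample rates.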
The Evolution of Extended Reality Technologies
As this set of technologies has grown in recent years, the devices that enable virtual experiences have evolved with it. Back in 2015, the ODG R6 and R7 were only capable of 3DoF (three degrees of freedom) experiences or the visual tracking of images and QR codes.
Later, in 2016, 6DoF devices like the HoloLens reached the market, but support for hand gestures was still limited and the only controller was a small clicker. Then the HoloLens 2 and the Oculus Quest 2 brought robust support for both hand tracking and controllers.
Beyond the extended reality headset space, both devices and software have evolved. Apple’s ARKit and the LiDAR sensor included in its devices allow for a better understanding of the physical space, which results in better augmented reality experiences.
What Makes a Good Immersive Experience? Context is King
So, considering what we currently have available with immersive technology, what makes a good UX in an immersive experience such as AR or VR? The only rule that applies to every technology in this space is context is king. Apart from that, the answer varies.
To better answer this question, we need to consider the basics of human-computer interaction, as every device has a different form factor and array of sensors available. We can use the McMahan user-system loop to describe the way the user interacts with the experience, as well as Norman’s action cycle, introduced in his book The Design of Everyday Things.
In the user-system loop, the focus is on the inputs and outputs of the system and how the user perceives and interacts with them, while in the action cycle, the focus is on the user perspective of the accomplishment of the task at hand. With these frameworks in mind, you need to focus on two things when designing an immersive experience:
- The goal your user needs to achieve
- The device your user will be using or wearing and the environment in which the user will be using your experience
This is what we call the context. For example, hand tracking on a mobile device is a bad fit: you would need one hand to hold the device and one to perform the gestures, and the reach of the free hand would force the device uncomfortably close to the user’s face just to keep that hand in front of the camera. Another bad fit would be asking the user to navigate a large area in a VR headset without a passthrough view of the real world, as moving blind through real space is hazardous.
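The bad-fit examples above can be captured as a simple compatibility table checked early in design. A toy sketch (the technique and device names are hypothetical labels, not from any framework):

```python
# Known-bad pairings of input technique and device context, taken from
# the examples in the text. A real checklist would also encode the
# environment (seated, standing, room-scale, outdoors).
BAD_FITS = {
    ("hand_tracking", "handheld_mobile"),        # one hand must hold the device
    ("room_scale_navigation", "vr_no_passthrough"),  # hazardous: user moves blind
}


def is_reasonable_fit(technique: str, device: str) -> bool:
    """Return False for combinations the context rules out up front."""
    return (technique, device) not in BAD_FITS
```

The point is not the lookup itself but the habit: enumerate the goal, the device, and the environment as data, and reject mismatched combinations before prototyping.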
Considering the Basic Interactions of an Immersive Experience
To get started, we need to describe the basic interactions in an immersive experience. At the lowest level, we can divide them into canonical, navigation, and system control interactions:
- Canonical interactions are related to how we interact with 3D objects in the virtual world — how we select, manipulate and release them
- Navigation interactions are related to the direction in which the user wants to move, the velocity and acceleration of that movement, the input conditions defining where the user can and cannot navigate, and the actual movement of the user through the space
- System control interactions are related to how we access menus and configurations of the system, what gestures are available to the user, and which tools the user has at hand
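This three-way taxonomy is easy to make explicit in code, which helps when auditing which interactions a prototype actually supports. A minimal sketch, with an illustrative verb-to-category mapping of my own (the verbs are examples, not an exhaustive vocabulary):

```python
from enum import Enum


class InteractionKind(Enum):
    CANONICAL = "canonical"       # select / manipulate / release 3D objects
    NAVIGATION = "navigation"     # direction, velocity, allowed areas
    SYSTEM_CONTROL = "system"     # menus, gestures, tools at hand


def classify(action: str) -> InteractionKind:
    """Toy classifier mapping interaction verbs to the taxonomy."""
    canonical = {"select", "grab", "rotate", "scale", "release"}
    navigation = {"walk", "teleport", "turn", "fly"}
    if action in canonical:
        return InteractionKind.CANONICAL
    if action in navigation:
        return InteractionKind.NAVIGATION
    return InteractionKind.SYSTEM_CONTROL
```

Tagging every planned interaction this way makes gaps obvious, for example an experience with rich canonical interactions but no way to open a menu.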
After considering these interactions, we need to decide whether the experience will be for a single user or for a collaborative system. If it is collaborative, we need to consider whether it will be co-located or remote, whether precision is required or optional, and whether the collaboration will be synchronous or asynchronous.
Following the Diegesis Theory to Design the Actual Immersive Experience
Once we have considered all of these basic interactions, we can move on to the design of the actual AR or VR experience. Here we can lean on the video game industry, as it is more mature in the design of 3D interactions.
What has helped me over years of prototyping these experiences is following the video game diegesis theory of UI. Diegesis theory is about how to integrate the UI into the narrative of the virtual world; in the enterprise, the narrative is the context of our user and the task the user is trying to accomplish.
Diegesis theory divides the UI into four types: diegetic, spatial, meta, and non-diegetic. In video games, diegetic interfaces are the ones that exist within the virtual world and its narrative. These interfaces are relevant to both the system and the user; for example, a life bar on the back of a spacesuit or the number of remaining shots on top of a gun. In an enterprise context, this could mean an interface overlaid on a physical machine giving insights into how it is working, or an array of virtual buttons on the machine to control it, as commonly seen from Vuforia, a major AR company.
Diegetic UI lives within the XR world, anchored to real or virtual geometry, so the user and the simulation can interact with it through visual, audible, or haptic means. Well-executed diegetic UI elements enhance the contextual experience, providing a more immersive and integrated experience for the user.
Then come the spatial interfaces, which rely on the virtual space but can break the context due to occlusion. Spatial UI elements are used when there is a need to step outside the narrative in order to give the user more information than the virtual world itself would contain.
Spatial UI interfaces still help immerse the user and keep them from having to break the experience by jumping to menu screens. For example, take Google’s approach to AR maps, which places the route in the real world, or Microsoft’s approach to guiding users through maintenance tasks with the HoloLens.
The meta interfaces are simpler: they attach to the borders of the display rather than to the virtual or real-world geometry. In its VR game The Climb, Crytek makes great use of this kind of UI, signaling that the climber’s hands are tired by displaying a red blink at the corner of the screen when a hand is out of the field of view.
Finally, we have the non-diegetic interfaces, which are the most common: the basic menu buttons floating in front of us, the menu lists and carousels. The downside of these interfaces is that they block the user’s view in order to display options, and as such they break the immersion.
The good thing is that since users are most accustomed to this kind of UI, it has the lowest learning curve and is the most usable in terms of interaction flow. So the best practice on the UI side is to use diegetic UI to increase immersion and non-diegetic UI to lower the learning curve and improve usability.
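The four types reduce to two yes/no questions: is the element anchored in the 3D space, and does it belong to the narrative (in the enterprise, the user’s task context)? A minimal sketch of that decision, with function and enum names of my own choosing:

```python
from enum import Enum


class UIType(Enum):
    DIEGETIC = "diegetic"          # in the world, part of the narrative
    SPATIAL = "spatial"            # in the world, outside the narrative
    META = "meta"                  # on the display borders, tied to the narrative
    NON_DIEGETIC = "non_diegetic"  # floating menus, outside both


def choose_ui_type(in_world_space: bool, fits_narrative: bool) -> UIType:
    """Map the two diegesis questions to one of the four UI types."""
    if in_world_space:
        return UIType.DIEGETIC if fits_narrative else UIType.SPATIAL
    return UIType.META if fits_narrative else UIType.NON_DIEGETIC
```

For example, virtual buttons anchored to a machine (in the world, part of the task) come out diegetic, while a floating settings menu (neither) comes out non-diegetic.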
Examining the User Type & Differing Degrees of Freedom
One funny thing about spatial UX is that not only do the devices have degrees of freedom, but so do the users. While prototyping these kinds of experiences, I learned that there are three types of users:
- The 0DoF user, who doesn’t move their head or device, expecting all of the content to appear in front of them just like on a normal screen
- The 3DoF user, who moves the device around but doesn’t take a step from their initial position, even when 3D content disappears behind an obstacle due to occlusion
- The 6DoF user, an advanced user who has most likely already used an XR device and moves around the room interacting with the 3D content
Considering the user type means adding visual or sound cues to your application to let users know when and where to move, or what to look at.
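If the device reports head pose, an application can estimate which of the three user types it is dealing with and adapt its cues. A toy classifier over session-level movement totals; the thresholds are purely illustrative, not drawn from any study:

```python
def classify_user_dof(head_rotation_deg: float, translation_m: float,
                      rot_threshold: float = 10.0,
                      trans_threshold: float = 0.5) -> str:
    """Classify observed behavior from accumulated head rotation (degrees)
    and positional translation (meters) over a session window.
    Thresholds are illustrative assumptions."""
    if translation_m >= trans_threshold:
        return "6DoF"   # user physically walks around the content
    if head_rotation_deg >= rot_threshold:
        return "3DoF"   # user looks around but stays in place
    return "0DoF"       # user treats the headset like a fixed screen
```

A 0DoF verdict might trigger stronger cues (arrows, spatial audio) nudging the user to look and move around.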
Using Affordances in 3D
One final insight: use affordances. For decades, we have used icons and other on-screen elements as references to real-world objects, like the Recycle Bin in Windows or the floppy disk icon used to save files in almost every app. With XR technologies we are going back to 3D space, and as such we need to use objects the user is already familiar with.
For example, in a 2D painting app we would use a crosshair to show the user where the paint will be applied, while in XR it is better to use a 3D model of a brush or pencil to show where content will be created. Again, context is king! So use affordances that are relevant to the context of your user.
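Even this crosshair-versus-brush choice can be made a deliberate, testable branch rather than a hardcoded asset. A trivial sketch; the medium names and asset labels are hypothetical:

```python
# Map each presentation medium to the cursor affordance from the example:
# a flat crosshair reads well on 2D screens, while a 3D brush model
# communicates depth and intent in XR space.
CURSOR_AFFORDANCES = {
    "screen_2d": "crosshair",
    "xr_3d": "brush_model",
}


def paint_cursor_for(medium: str) -> str:
    """Pick the paint cursor for a medium, defaulting to the 2D crosshair."""
    return CURSOR_AFFORDANCES.get(medium, "crosshair")
```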
Summary: Best Practices for Designing an Immersive XR Experience
With all of the above in mind, my conclusions and best practices for designing an immersive XR experience are:
- Consider the form factor of the device you are targeting and design with it in mind
- Use UI to manage attention
- Consider diegesis theory to manage immersion and usability
- Display only information that is relevant to the context; avoid clutter or occlusion
- Do not display information just for the sake of it
- Consider your user’s DoF and human factors
And finally, remember that the context is ALWAYS the main priority for you to consider in designing your solution.