Stalking the Mixed Reality tech stack



Every decade since the dawn of computing, we’ve seen a shift in how humans and computers interact, all the way from government mainframes to the smartwatch on your wrist. The next shift will give us mixed reality, and it’s nearly here.

Stack by Erslll on Flickr.

Each of these shifts has had its own stack: mobile had iOS and Android; the web had LAMP, AJAX, TCP/IP, and the browser wars; and client-server computing had Banyan VINES, Novell, Ethernet, and Token Ring.

The coming decade will be shaped by two big stacks: one around the Internet of Things and smart devices, and one around smart, immersive environments and mixed-reality computing. The former is about machine-to-machine interaction; the latter is about human-computer interaction.

Right now, I want to focus on the stack that enables the latter: a set of technologies that, combined, let us interact with, and immerse ourselves in, computers.

This Mixed Reality Stack will change how we engage with information, how we do our jobs, and how we make decisions as a society. Like the shifts shaped by mobile, the web, and the personal computer before it, this one will be subtle, and inevitable. And while predicting the future is hard, it’s worthwhile to make some guesses.

The near-deaths of past shifts

Shifts in the dominant consumer technology stack have fundamentally altered the shape of technology and competition. Hundreds of billions of dollars have changed hands because of such shifts, and the way humans live, learn, interact, and play has changed forever with each of them. So understanding where this is headed matters, both economically and societally.


  • Microsoft was almost squashed by the move from the PC to the Internet; only by sheer force of will and desktop incumbency was it able to hold on through the shift to the web and make Internet Explorer a dominant browser. The company later missed the mobile shift: so far, its Nokia-based phones have failed to gain market acceptance (though the Surface seems to be doing better).
  • Facebook was on the verge of losing mobile users before it embarked on a full-court press that gave it a strong mobile platform. It’s rightly nervous about Tencent’s WeChat, which is one of the reasons it bought WhatsApp.
  • Sony missed the move from the Walkman to the MP3 player, ceding the market to Rio before Apple turned the portable player into a game device and assistant, and then a phone. Because Sony was also a content producer, its heavy-handed DRM meant it missed a market shift.
  • Google’s dominance in web advertising was under siege by mobile, so they launched Android, regaining visibility into consumers’ online interactions.
  • Amazon and Facebook tried to gain a foothold in the mobile device world, but failed.

What’s needed next?

Markets aren’t obvious until the technology becomes cheap and abundant, which makes prediction hard. For example, mobile phones were the domain of the wealthy and important, until they became so cheap that teens messaging one another in malls became a more important use case.

So what does this new stack look like, and who’s lining up to build it? Where are the gaps that acquisitions might fill? Who’ll make the transition, and who will be left behind?

Tomorrow’s tech stack consists of:

  • A smart agent that can understand context and provide emergent experiences in real time, based largely on your past history, your social graph, and the mass of personal data to which it has access. There will still be plenty of traditional user interfaces, but the future of design is graceful interruption, and the future of content is that which a bot has sifted for you.
  • An interruptive interface. This is simply a way for a machine to interrupt, or augment, one of your senses. With that in mind, it’s worth breaking things down further:
    • Augmented Reality, which is basically a heads-up display that overlays information on what we see. Any Virtual Reality headset can be interruptive, too, because it can show a view of the world around the user inside the display. When you use an AR app on your phone, it takes video from the camera and displays it with additional information overlaid.
    • Touch is another sense that can easily be augmented, largely through wearables. Early reports suggest that tactile interruption feels natural almost immediately.
    • Audio isn’t huge yet, but it will be. It’s growing fast in cars, where hands-free laws make audio the safest way to communicate, and in some smart home applications.
  • A set of sensors. The stack needs to know what’s going on, whether through motion capture, voice detection, facial recognition, or gesture tracking.
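
To make the three building blocks concrete, here is a minimal Python sketch of how they might fit together: the sensor layer supplies context, a smart agent decides whether an item merits an interruption, and an interruptive interface delivers it through one sense. Everything in it is hypothetical and chosen for illustration (the `Context` fields, the 0 to 10 priority scale, the threshold of 5, and the rule ordering); it is not how any vendor’s agent is known to work.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Context:
    """What the sensor layer has inferred about the user (hypothetical fields)."""
    driving: bool
    in_meeting: bool
    wearing_headset: bool


@dataclass
class Item:
    """Something the agent might surface, with an assumed 0-10 priority."""
    text: str
    priority: int


class SmartAgent:
    """Decides whether an item deserves an interruption, and through which sense."""

    def __init__(self, threshold: int = 5) -> None:
        self.threshold = threshold  # assumed cut-off for "worth interrupting"

    def choose_channel(self, item: Item, ctx: Context) -> Optional[str]:
        if item.priority < self.threshold:
            return None              # graceful interruption: most things can wait
        if ctx.driving:
            return "audio"           # hands-free: speak rather than show
        if ctx.in_meeting:
            return "touch"           # a discreet tap from a wearable
        if ctx.wearing_headset:
            return "visual-overlay"  # AR/VR heads-up display
        return "visual"              # an ordinary screen notification


def interrupt(agent: SmartAgent, item: Item, ctx: Context) -> str:
    """Stand-in for the interruptive interface: render the agent's decision."""
    channel = agent.choose_channel(item, ctx)
    if channel is None:
        return f"queued: {item.text}"
    return f"[{channel}] {item.text}"


agent = SmartAgent()
commute = Context(driving=True, in_meeting=False, wearing_headset=False)
meeting = Context(driving=False, in_meeting=True, wearing_headset=False)

print(interrupt(agent, Item("Turn left in 200 m", 8), commute))  # [audio] Turn left in 200 m
print(interrupt(agent, Item("Flight delayed", 7), meeting))      # [touch] Flight delayed
print(interrupt(agent, Item("New follower", 2), meeting))        # queued: New follower
```

The order of the `if` checks is a design choice: driving outranks everything else because hands-free laws make audio the safest channel in a car.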

Who might build it?

Let’s dig into these three building blocks in more detail.

When it comes to smart agents, Apple has Siri, Google has Google Now, Microsoft has Cortana, and Amazon has Alexa, the assistant inside its Echo. While not on that list today, Facebook is well positioned, because one of the most important things in making machines smart is a corpus of content on which to train them, and Facebook has perhaps the best corpus in the world. It’s also making serious investments in artificial intelligence.

As for interruptive interfaces, Google-backed Magic Leap and Microsoft’s HoloLens are clear front-runners in AR, but Meta is apparently launching something worth watching in a couple of weeks (HT to Mack for that). Apple acquired an AR company (from Volkswagen, of all places) and shut it down, though its technology looked cool. In VR, Facebook has the Oculus Rift and, with Samsung, the Gear VR; Sony has PlayStation VR; and HTC, in conjunction with Valve, has the Vive. Apple and Google have invested in smart watches, as have many wearable companies. Amazon’s Echo uses audio to make a smart home.

For sensing, Microsoft has Kinect; Google has Project Tango; and Apple (though most haven’t noticed) bought PrimeSense for its Capri 3D sensing technology, which it quickly removed from the market. The tech seems to have reappeared as Structure (it isn’t clear if that’s Occipital; I’m trying to find out). Both Tango and PrimeSense are more about sensing environments than gestures, but can be adapted to new functions. Swiss government spinout MindMaze bears mentioning simply because of the amount of funding it’s received, and other standouts like Leap Motion are making big progress. But because locating a headset in 3D space is difficult, most VR today is confined to seated experiences driven by controllers; the HTC Vive is far ahead of other platforms in letting users walk around a defined space.

Other than the aforementioned Apple, Google, Facebook, Microsoft, and Amazon, there are large hardware makers (Samsung, HTC) who could introduce their own systems. There are also large-volume players like Huawei, who might join the fight. There are some notable gaps here: Amazon and Apple don’t have immersive visual displays, for example, and several companies lack AI or sensor initiatives.

Sony and other consumer electronics makers are longer shots for the full stack, since they lack the installed base, the data, or the smart agent components needed. Still, consoles (Xbox, Wii, and PlayStation) will be how many people first experience immersive environments, and their makers have access to content distribution, as do Apple, Amazon, Google, and Valve.

Other startups in the smart agent, interruptive interface, and sensor industries, such as Myo, Meta, Fitbit, Structure, MindMaze, Leap Motion, and Univrses, could be hot acquisition targets in the coming years. For example, UploadVR recently reported that Univrses has a new version of its Playground SDK that might let users of smartphone-powered mobile VR move around in space.

How do they stack up?

The big players in this next tech stack have most of the pieces needed. Where they have gaps, they will likely either acquire the missing pieces or build them. Apple, in particular, has shown it prefers to build new interfaces (such as the watch and the stylus) to properly control the user experience, so while it’s not giving any clues about its AR/VR plans, it certainly has some.

Here’s a quick table of companies with serious investment in more than one piece of the stack, and of other companies that offer only part of the stack and are ripe for merger or acquisition.

Did I miss something? You can go comment on the document directly if you have something to add to the list.

Who will win?

Other than figuring out what shares to buy, or which startups might get acquired, the emergence of a new tech stack around smart agents and immersive interfaces has some pretty serious consequences.

As more and more consumers move to the new stack, we’ll see consolidation around dominant vendors, the same way we’ve seen it around Windows, or the PC architecture, or iOS, or Android—and for largely the same reasons:

Distribution and attention

Valve, Facebook, Apple, Amazon, Sony, and Microsoft all have our attention. Early VR systems will be expensive, and will probably be subsidized by licensed content the way game sales pay for consoles, or phone airtime pays for handsets.

Tech stack makers know this, and are investing in serious infrastructure: Google’s Play store and YouTube have VR apps; Facebook plays immersive videos and is funding Oculus Story Studio.

Developer lock-in

The makers of each stack will introduce enhanced features that abstract away the lower-level functions, tempting developers with shortcuts that ultimately lock them in.

Consider, for example, the positioning technology in a mobile phone. For an application that reminds you to do something when you leave your house, a developer could write code that constantly checks the phone’s location, or they could use a higher-level function (in iOS, Core Location’s region monitoring, which a sample app might wrap in a view controller such as AddGeotificationViewController.swift). The latter is easier, consumes less power, and benefits from any enhancements Apple might make to the software.

It’s the difference between driving yourself, which means always remembering where you left your keys, and asking someone who owns a car and keeps track of road conditions to drive you. Humans are lazy, and will generally take the path of least resistance.
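
The difference between the two approaches is easier to see in code. Below is a minimal Python sketch, assuming a simulated platform: `Platform`, `monitor_exit`, and `location_changed` are hypothetical stand-ins for an OS geofencing service (on iOS, the real counterpart is Core Location region monitoring, for example `CLLocationManager.startMonitoring(for:)` with a `CLCircularRegion`). It does not model power use; it shows how, in the second option, the app stops owning the loop and the geometry, which is what makes it easier and also what ties it to the platform.

```python
import math
from typing import Callable, Iterable, List, Tuple

LatLon = Tuple[float, float]
EARTH_RADIUS_M = 6_371_000.0


def distance_m(a: LatLon, b: LatLon) -> float:
    """Great-circle (haversine) distance between two (lat, lon) points, in metres."""
    lat1, lon1, lat2, lon2 = map(math.radians, (*a, *b))
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(h))


# Option 1: the app owns the loop and the geometry, reading every fix itself.
def poll_for_exit(fixes: Iterable[LatLon], home: LatLon, radius_m: float) -> int:
    """Return the index of the first fix outside the radius, or -1 if none is."""
    for i, fix in enumerate(fixes):
        if distance_m(fix, home) > radius_m:
            return i
    return -1


# Option 2: the app registers a region and a callback; the platform does the rest.
class Platform:
    """Hypothetical stand-in for an OS geofencing service."""

    def __init__(self) -> None:
        self._watches: List[Tuple[LatLon, float, Callable[[], None]]] = []

    def monitor_exit(self, center: LatLon, radius_m: float,
                     on_exit: Callable[[], None]) -> None:
        self._watches.append((center, radius_m, on_exit))

    def location_changed(self, fix: LatLon) -> None:
        # Called by the (simulated) OS, which decides when and how to sample
        # location; the app's code only runs once a boundary is crossed.
        still_watching = []
        for center, radius_m, on_exit in self._watches:
            if distance_m(fix, center) > radius_m:
                on_exit()  # fire once, then stop watching this region
            else:
                still_watching.append((center, radius_m, on_exit))
        self._watches = still_watching


HOME = (37.7749, -122.4194)
fixes = [HOME, (37.7752, -122.4194), (37.7800, -122.4194)]  # ~0 m, ~33 m, ~567 m north

print(poll_for_exit(fixes, HOME, radius_m=100))  # 2

platform = Platform()
reminders: List[str] = []
platform.monitor_exit(HOME, 100, lambda: reminders.append("Take your keys"))
for fix in fixes:
    platform.location_changed(fix)
print(reminders)  # ['Take your keys']
```

Porting option 1 to another platform means swapping out one location call; porting option 2 means rewriting against a different callback API. That asymmetry is the lock-in described above.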

User behaviors and patterns

You pull down on an iPhone to refresh, but you can’t on Android. It turns out that most of those gestures may be covered by a patent issued to a company Twitter acquired. And even when patents aren’t involved, the media may have convinced the world they are, simply because the media got a bit breathless.

Content and DRM

I have hundreds of songs, movies, and TV shows linked to my iTunes account. If I moved to an Android-based device, I’d lose them. The same would be true if I tried to move my desktop from macOS to Windows. Services like Steam and Amazon have content (games and books) that works across platforms, but even then, coverage is limited and constrained by dynamics like app store fees.

Technical hurdles

I’m spending a bunch of time in VR environments this year. There are big challenges left: nausea, intuitive controls, multi-user interaction, freedom of motion, and more. While commercial, content, and rights-management issues will chart much of the course of Mixed Reality, the technical limitations will also affect who wins.

Oculus, for example, won’t ship its Rift controllers with its headsets, and while games like I Expect You To Die make clever use of the mouse as an interface, they’re no substitute for the dedicated controllers that ship with the Valve/HTC Vive.

Next steps

I’m spending a lot of time trying to understand this technology stack. O’Reilly’s Strata+Hadoop World is a great place to chart what’s happening in big data and machine learning, and the recently announced SXSW VR/AR track will be a good glimpse into immersion.

What will be most interesting are the dynamics that make this technology a must-have, the way VisiCalc made the Apple ][, PostScript drove the Mac, and Mario made a generation want a console. Maybe it’ll be a Spielberg movie, or something we use at work. Whatever that tipping point is, the combination of smart agents pushing interruptive content that we interact with is the next big stack.