Josh Marinacci (@joshmarinacci) recently joined Mozilla as a Senior Developer Evangelist working with the Mixed Reality Team. Josh will give an overview of Virtual, Augmented, and Mixed Reality, including a brief history of the field, what you need to know, and what you can ignore for now. Here's the full transcript:
Manel: - Hello, people. Please join me in welcoming Josh Marinacci, from the United States. Josh recently joined the Mozilla team as a developer evangelist. And got right to work on virtual reality. You can find some videos and demos in the etherpad. So we are all here today to attend his talk. He will give an overview of VR, AR, and MR. For people in a hurry. Like you. So if you have any questions for Josh, please find the etherpad link in the calendar invite, and I already dropped that in the Telegram channel an hour ago. And please, if you feel ready, Josh, just start.
Josh: - OK, thank you very much. I'm really excited to be here. So today, I'm going to give you an overview of MR, VR, AR, XR, like, all the R's, and explain what all this means. And my goal is not to do a deep technical dive, but to give you an overview of what's out there because everything is changing so quickly that it's, you know, all out of date immediately. I had to update my slides after the first time I gave this talk because Apple and Google announced a whole bunch of new stuff. So this is to give you an overview of where everything is and where it's going. And we hopefully will supplement this with future slides and vlogs and videos. So, yeah, there's lots of stuff. There's Sony's got the VR headset. HTC with the Vive. The HoloLens. Just this week, Microsoft announced their new mixed reality for Windows desktop initiative. So there's a bunch of headsets coming there. And lots of acronyms.
So this is what we're going to cover. What is VR, AR, and MR? And answer the question, why is all this happening right now? It seems like in just the last year or so, just the last four or five years, this has really been heating up, even though the technology, at least the concepts of it, are about 50 years old. And then, most importantly, what do you really need to know now and what can you ignore, either because it's already out of date, or because it is not quite ready, but you know, keep it on your radar.
So, what is VR, AR, MR? VR is virtual reality, AR is augmented reality, MR we call mixed reality, and occasionally you hear us use XR, which is, we don't really know what word's going to fit yet, that's just kind of a placeholder. Now, when you hear the term virtual reality, VR, you probably think of something like this. This is a completely immersive experience where you can't see the real world. Ideally, you have something that covers your eyes completely, blocking out all the extra light, gives you a very nice wide field of view, built-in headphones, so you're covering not just the eyes, but the ears. Very fast refresh rate, at least 90 frames per second. For both the screen and the sensor data, which is crucial, because when you are completely wrapping somebody in a new environment, if it's not very high, not just average frames per second, but also low latency, then it's very easy for your ears to get confused, and make you sick. So, pretty much any VR set these days is going to be at least 90 frames per second, and a lot of the technology people are inventing is about how to get, you know, how to work around the limitations of older PC and mobile technology to make this happen.
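As a back-of-the-envelope illustration of the 90 frames per second figure (this is not from the talk, and the function name `frame_budget_ms` is made up for the example), the time available to read the sensors and draw one frame can be worked out like this:

```python
def frame_budget_ms(frames_per_second: float) -> float:
    """Return the time available to produce one frame, in milliseconds."""
    if frames_per_second <= 0:
        raise ValueError("frames_per_second must be positive")
    return 1000.0 / frames_per_second


if __name__ == "__main__":
    # Compare a typical desktop refresh rate with the VR target from the talk.
    for fps in (60, 90):
        print(f"{fps} fps leaves about {frame_budget_ms(fps):.1f} ms per frame")
```

At 90 frames per second that is roughly 11 milliseconds per frame, which gives a sense of why so much engineering effort goes into working around slower hardware.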
And when you hear the term AR, you probably think of something like this. This is a still from a promo video that Microsoft did. And you see the guy's got these like holo glasses on and he's looking at his real world, his kitchen, but there's computer graphics overlaid on top of it. And I do think it's interesting that in Microsoft's vision of the future that nobody has anything on their countertops, they're like these perfectly clean countertops. I have a six year old child, so our counters would be covered with Legos and dishes and banana peels. But anyway.
Augmented reality is where whatever the computer's generating is completely integrated into the real world. Which means not only is it projected on top of the world, but it actually is aware of what's in the real world. So for example, this little model of the beach is sitting on top of his countertop, and once it projects, it figures out what the countertop is, it projects it there and it anchors it to the countertop. So as the user moves their head around, it will still stay anchored there, and that's a very crucial part of making it integrated with reality. So we call it augmented reality because it's overlaid on top of the real world. It is augmenting our reality.
Now, in truth, in reality, this is really a spectrum. So, we use the term now mixed reality, it's not a great term, but it's the best we've come up with, that encompasses the spectrum of things from on one side, being the real environment with an augmentation on top, versus being completely enclosed in a completely virtual reality, and there's many degrees in-between them, depending on the device and the application. Everything in the middle here is kind of a blend. So we started to use MR as just kind of a catch all term for all of this stuff. Originally, they were somewhat separate environments, separate technology stacks, but they're really kind of bleeding together. So, some of the stuff in the middle would be things like Google Glass. Google Glass gives you an overlay, but what you see isn't anchored to reality. It's more like notifications. The one here in the middle is an automobile heads-up display. And it is mostly information about the car, so it's overlaid on top of your view, but it's not anchored to reality. But it probably has a few things like the direction you're driving, maybe the arrow indicating where the road is, or there's a curve coming up. So it's a little bit of augmentation of reality, but mostly it's not anchored to it.
On the right is what we call a magic window. This is where you're looking at the real world through a camera on your phone. So it's not 3D stereo. It's not anchored to your… it's not connected to your eyes, it's just something you look through, like a magic window. Things can be overlaid on reality, and with the right software, will be anchored, but it's not completely integrated the way, say, glasses would be. Typically, they will overlay, but don't anchor to reality, except maybe compass directions for the path of the road. But over time, this is also improving. This picture, of course, is from Pokemon Go, which is still probably the most popular AR application ever, though I expect by this Christmas, we're going to see all sorts of exciting new stuff. Now, all of these things are going to keep getting better and cheaper, and the leaders in each area are going to change, the technology will change, so don't focus on the technology itself, focus on the experience. Decide what experience you want to build, and then figure out the tech you're going to need to support that.
Now, so what's our end game? Well, on the augmented reality side, it will probably be something like this. This is like stock, you know, stock photography with future people wearing something that's completely integrated into your sunglasses. It gives you information, both completely virtual information, just projected on your world, and things that are augmenting, like Microsoft shared on their video. And all built into something that you're going to wear all the time. This doesn't exist today, but we're actually a lot closer than you might think. This is a couple of years away, not 10 years. Now, the other extreme, of course, is completely immersive reality, which gives us the holodeck from Star Trek. And this also isn't possible yet, but again, we are probably closer than we think. This is an example of a shared immersive virtual reality space. I believe this is built with Vives, I'm not sure. And you know, it looks kind of clunky, but you know, this was science fiction only a few years ago, and it's reality now. It will continue getting better.
So, why now? Why is all this happening now? In the last couple years, there's been this explosion of VR and AR products shipped, and even more announced. So, to answer why now, we need to dive a little bit into history. This is the Sword of Damocles. It is considered to be the first virtual reality head-mounted display. It was created by a computer scientist named Ivan Sutherland in 1968. Same guy who created Sketchpad a few years earlier, which was the first GUI, with a display, a light pen, object-oriented computing. So once he had, you know, invented the next 50 years of future, he was trying to figure out what was going to happen after that, and he thought, it's going to be something that directly integrates and augments our view of reality and ultimately makes people smarter. Now this was called the Sword of Damocles because it was using physical analog technology and it was this giant thing that actually stuck on top of his head and it was attached to computers. So it looked like the Sword of Damocles from the myth. So, this is a video. This actually says 1965, I'm not sure that date is correct, but this is a short video of what the Sword of Damocles looked like back in the '60s. You can see, he's got the headset on him, he's turning around, he's got to hold it with the handles, it's so heavy, and it's projecting a cube that is actually rotating and adjusting as he alters his view. Which, for the '60s, is pretty incredible, actually. It was incredibly prescient.
So, a quote from this essay he was writing around the same time: “The ultimate display would, of course, be a room within which the computer can control the existence of matter. A chair displayed in such a room would be good enough to sit in. Handcuffs would be confining and a bullet displayed in such a room would be fatal.” Which is pretty much exactly the holodeck, and I imagine several Star Trek episodes were inspired by this.
Now, after the 1960s, not much happened outside of some military research and a little bit of NASA work. We simply didn't have the supporting technology yet. So VR/AR kind of died for a while. Now, by the time of the mid '80s, we had powerful bitmap displays that could handle this stuff. A guy named Jaron Lanier was a virtual reality pioneer. He started VPL Research, building the data glove. And the data suit. And they did some really cool things. The idea was you'd wear these suits and it would measure information about you and create this 3D world around you. So not only could you see a 3D world, but you could begin to interact with it. But it was still too far ahead of its time and they went bankrupt in 1990. But they did license their technology to Mattel, who used it to make the Power Glove, which was the first mass market AR VR device. Now, I remember having one of these as a kid and it did not work very well, certainly not as well as it looked like on TV. But it did kind of work. A few games supported it. And it's important for its place in history, more than the actual games, and it was certainly the direct ancestor to the Wiimote, which is probably the first motion controller that most people remember. So that was the very end of the '80s, early '90s. Now by mid '90s, there was a company called Virtuality. Virtuality. Which created a set of VR games for arcades. These were kind of standing contraptions or pods that you could sit in that were set up in malls around the country, and they let you play 3D games. And they promised you this completely immersive environment. So, this is from some of their marketing material. You know, here's a woman wearing what looks like a fairly lightweight headset, she's got a hand controller, and she's like walking around in a reef underwater.
But this is what it actually looked like. So, this is a game called Dactyl Nightmare. And this was the highest end technology available in the mid '90s. Very low polygon. They did a decent refresh rate. I seem to recall it did get up to about 60 frames a second. But the tracking wasn't great. The resolution was horrible, the lighting was horrible. You know, it looks like a crummy mid '90s video game. So Virtuality folded by the late '90s. Again, the processing power just wasn't there.
Now, in the mid '90s, the web was also blossoming. Some people created a way to embed VR content into the web, and it was called VRML for Virtual Reality Markup Language. And it was scene-based. You didn't have to deal directly with polygons, you would just describe the objects that you wanted, and then the system would render it for you. Unfortunately, VRML pages were way too slow to download and too slow to render. Even with special plug-ins, unless you had a high-end graphics workstation, like people would use for doing movie graphics, then it just was a miserable experience. So it's there, and we learned a lot of lessons, and then the dot com bust happened in 2001, and that was the end of all the VR startups. Now, there were other attempts in the '90s. The thing on the left here is Sega VR, which they showed at CES but never actually shipped. The Virtual Boy was Nintendo's effort to create a virtual reality experience and they did ship it and it was so bad that they actually canceled it after three months. I remember seeing it in a mall and thinking, "How is it virtual reality if it sits on the table in front of you? You can't actually move around."
So in the end, none of these things worked because the technology wasn't ready yet, even though we had great ideas. Turns out for VR and AR to work, we need really powerful graphics, and that means fast CPUs that can be embedded in a device that you pick up. Graphics acceleration pushes all the pixels. And we also need cheap and really good cameras and sensors, and all of these things have improved in quality and fallen in cost dramatically over the last 23 years. And pretty much we can thank the smartphone revolution for all of this. Sensors and cameras are cheap because they're made by the billions. Moore's Law means CPUs and GPUs are really cheap and powerful, and crucially, they are very battery friendly, so they can be in mobile devices.
Even a simple accelerometer, which in, say, a car airbag system would have cost $10 or $20, had dropped to less than a dollar by the late 2000s. And now they're pretty much integrated, you know, it's free. So it's now been 10 years since the original iPhone. Sensors and processors and cameras are simply amazing. Apple just announced their iPhone X, with the built-in depth-sensing camera, which is basically what was a Microsoft Kinect bar that was about this big. In the span of less than 10 years, they shrunk it down to that. It's got dual 12 megapixel cameras in the back. We finally have the technology to make this stuff real, and it's going to keep getting better and better. VR and AR will kind of ride the coat-tails of the smartphone revolution, so as smartphones get better, with people investing billions of dollars, so, too, will our VR AR stuff.
So, what do you need to know now? This is the general tech stack. No matter what you're building, you're going to need one of each of these things. You need some sort of input, which generally means sensors and cameras. You're going to need processing. So this is the actual algorithms which take the image data and the sensor data and fuse them together to create a scene, create the information about the position of the headset and, if it's augmented reality, the real world. And then drawing it really fast, of course. And then all this stuff is very low level, so we need an application framework, so we can actually build things.
Now, input. Probably, these are the ones you're going to most work with. Anything in the MR spectrum requires input sensors. Generally they'll be internal sensors like an accelerometer, which measures the direction of gravity. Or a magnetometer, which is like a compass. It measures the direction of the Earth's magnetic field. And then we also have GPS, which lets you measure your location on Earth. And these things have gotten cheaper and cheaper. You can now get a nine DOF sensor board, nine DOF refers to nine degrees of freedom. You know, we have three for accelerometer, three for magnetometer, and three for gyroscope. All of that built into a single chip that you can buy in units of one for $15. Which, you know, means that by the time they get made by the billions to put in phones, it probably costs about a dollar.
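To make the sensor descriptions a little more concrete, here is a minimal sketch (not from the talk) of how raw accelerometer and magnetometer readings can be turned into angles. The axis convention is an assumption: the device reads roughly (0, 0, +g) when lying flat and face up, and the heading is simply the angle of the horizontal magnetic field relative to the device's x axis. Real devices differ in axis layout and need calibration.

```python
import math
from typing import Tuple


def tilt_from_gravity(ax: float, ay: float, az: float) -> Tuple[float, float]:
    """Estimate (pitch, roll) in degrees from an accelerometer reading at rest.

    Assumes the only acceleration being measured is gravity.
    """
    pitch = math.degrees(math.atan2(-ax, math.hypot(ay, az)))
    roll = math.degrees(math.atan2(ay, az))
    return pitch, roll


def heading_from_magnetometer(mx: float, my: float) -> float:
    """Angle in degrees, in [0, 360), of the horizontal field vector.

    Only meaningful while the device is held flat; a tilted device
    would need the pitch and roll above to compensate first.
    """
    return math.degrees(math.atan2(my, mx)) % 360.0


if __name__ == "__main__":
    print(tilt_from_gravity(0.0, 0.0, 9.81))    # device lying flat
    print(tilt_from_gravity(0.0, 9.81, 0.0))    # device rolled onto its side
    print(heading_from_magnetometer(0.0, 1.0))  # field along the y axis
```

The gyroscope, the third sensor in a nine DOF board, reports rotation rates rather than absolute angles, so it has to be integrated over time.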
The next most important input is the optical camera. This is a standard phone camera. Dual cameras with stereo can provide more information, but with the right software, we can do a lot in monoscopic view. And then finally a depth-sensing camera. Which is like what's in Microsoft Kinect and has been added to the new iPhone for their facial recognition. And you don't need all of these things. But adding more data overall makes the experience better. So once you have this raw data, we've got to process it. Sensor and camera data have to be merged into a single set of data that describes the environment in 3D. And there's basically two ways to do this. There is marker-based AR, which is when the system looks for a specific thing in the image, usually a special symbol. It could be sort of like a QR code, but one where it can actually detect its orientation and the angle away from it. But it doesn't have to be a QR code, it could be just an image that it's been trained to recognize. This can also be done with beacons. Like an IR beacon or a Bluetooth LED beacon. Think of the Nintendo Wii, where there was a sensor bar. That was essentially a beacon for the Wiimotes.
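As a rough illustration of why a marker of known size is so useful (a sketch under simplifying assumptions, not how any particular library does it): with a simple pinhole camera model, the apparent size of the marker in pixels gives its distance, and the direction of its top edge gives its in-plane rotation. The function names, the corner ordering (top-left, top-right, bottom-right, bottom-left), and the focal length value are all assumptions for the example, and the marker is assumed to face the camera more or less head-on.

```python
import math
from typing import Sequence, Tuple

Point = Tuple[float, float]  # pixel coordinates (x, y)


def marker_distance_m(marker_size_m: float, focal_length_px: float,
                      corners: Sequence[Point]) -> float:
    """Estimate camera-to-marker distance with a pinhole camera model.

    distance = focal_length * real_size / apparent_size
    """
    # Average the four side lengths to get the apparent size in pixels.
    sides = [math.dist(corners[i], corners[(i + 1) % 4]) for i in range(4)]
    side_px = sum(sides) / 4.0
    return focal_length_px * marker_size_m / side_px


def marker_rotation_deg(corners: Sequence[Point]) -> float:
    """In-plane rotation of the marker, from the direction of its top edge."""
    (x0, y0), (x1, y1) = corners[0], corners[1]
    return math.degrees(math.atan2(y1 - y0, x1 - x0))


if __name__ == "__main__":
    # A 10 cm marker that appears as a 100 px square in the image.
    square = [(200.0, 100.0), (300.0, 100.0), (300.0, 200.0), (200.0, 200.0)]
    print(marker_distance_m(0.10, 800.0, square))
    print(marker_rotation_deg(square))
```

A full marker tracker would recover the complete 3D pose from the four corners, including the tilt away from the camera; this sketch only shows the simplest two quantities.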
SLAM is the other way of doing this. SLAM stands for Simultaneous Localization and Mapping, which means that the system can figure out the environment around it and navigate through it the first time it sees something. You don't have to go through beforehand and measure where everything is and build a 3D model. You don't need to put beacons or special symbols to indicate where stuff is. It's walking through the environment for the first time and detecting everything in 3D the way our eyes actually do. Now, SLAM can be done with stereo cameras or depth-sensing cameras. Google's Tango system uses depth sensing. But now it's become possible to do it with just a single camera plus the phone sensors, which is called monoSLAM. And this is really where a tremendous amount of research is going. We finally have the processing power and the algorithms to do this on a phone in real time. So it may be that we don't end up having depth-sensing cameras or they become just an extra layer on top, but any phone with a CPU can do monoSLAM with just a camera and the internal sensors.
Now, there's been a lot of research on this stuff for the past 20 years. Apple and Google and Facebook have been buying up all the little companies that do this stuff. SLAM is the magic that gives us, you know, the ability to finally figure out what, where the camera is relative to the walls and floor. So, it can figure out if there's a flat surface like a table or an object sitting on a table. Ideally, it would be able to do a complete 3D mapping of the entire room. You could pan around and now it's created a scan of the whole thing. And this is possible with depth-sensing cameras to some degree or with LIDAR, which is what is going to be on top of most autonomous cars. It can't quite happen in consumer-level hardware yet. But it's going to happen. And we don't really need to worry about how this happens because there's an API which does this for us.
APIs. Now, interestingly, even pure VR often needs a camera and some external beacon in addition to the internal sensors. You'll see internal sensors are very sensitive and they have low latency and are super cheap, which is great, but they also drift. Over time, they get less and less accurate. Meaning if I start at point A and I move around a bit, and go back to point A, the view will be off by just a tiny bit because the sensors have drifted. I need a way of getting it back to the correct value. So we need something external, which could be a beacon, like, you know, one of these things you see here in the picture. Or it could be a camera, which is looking at something in the room. And by combining these two systems, we can get long-term spatial stability. So I can look at something, place an object in 3D, move around, come back five minutes later and it will still be there exactly the same position.
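The drift problem and its fix can be sketched in a few lines. This is a minimal illustration, not the method any particular headset uses: a complementary filter, one common way to blend a fast but drifting gyroscope with a slower external reference such as a beacon or a camera. The bias, sample rate, and blend factor `alpha` are invented numbers, and angle wraparound is ignored to keep the example short.

```python
from typing import List, Sequence


def integrate_gyro_only(rates_deg_s: Sequence[float], dt: float) -> List[float]:
    """Track heading by integrating gyro rates alone (accumulates drift)."""
    heading, history = 0.0, []
    for rate in rates_deg_s:
        heading += rate * dt
        history.append(heading)
    return history


def fuse_with_reference(rates_deg_s: Sequence[float],
                        reference_deg: Sequence[float],
                        dt: float, alpha: float = 0.98) -> List[float]:
    """Complementary filter: trust the gyro short-term, the reference long-term."""
    heading, history = 0.0, []
    for rate, reference in zip(rates_deg_s, reference_deg):
        predicted = heading + rate * dt                      # fast, but drifts
        heading = alpha * predicted + (1.0 - alpha) * reference  # pull back
        history.append(heading)
    return history


def simulate(seconds: float = 60.0, hz: int = 90, bias_deg_s: float = 0.5):
    """The user holds still (true heading 0) while the gyro has a small bias."""
    n = int(seconds * hz)
    dt = 1.0 / hz
    rates = [bias_deg_s] * n   # the gyro wrongly reports a slow rotation
    reference = [0.0] * n      # the external reference sees the true heading
    return integrate_gyro_only(rates, dt), fuse_with_reference(rates, reference, dt)


if __name__ == "__main__":
    gyro_only, fused = simulate()
    print(f"gyro only after 60 s: {gyro_only[-1]:.2f} degrees off")
    print(f"fused after 60 s:     {fused[-1]:.2f} degrees off")
```

In this simulation the gyro-only estimate ends up tens of degrees off after a minute, while the fused estimate stays within a fraction of a degree, which is the long-term stability described above.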
So how does this processing happen? Well, a lot of tricky math and massive CPU power, something that was impossible, essentially, in a mobile device, until the last few years. Now, a lot of APIs have been designed to handle this for us. So let's start with Vuforia. Now, Vuforia is a commercial software library that you can license to use in your apps. It's the oldest of what we're going to talk about today. It's mainly targeted at mobile use. It will not only use the sensors in your phone, but also use the camera to determine the orientation of the phone, and it can be trained to recognize and track certain objects. So it will do more than just find flat surfaces or angles, but it can actually recognize what you trained it for. Like if you wanted to only find red apples, it could find red apples. If you want it to find, as we show in this example, a particular photo of some gravel on the ground, then it can recognize that and essentially turn that into a plane in space, which you can then put stuff on top of. It's available for iOS and Android. However, with all the newer stuff I'm about to talk about, you're probably never going to use Vuforia, so forget about that.
There's ARToolKit, an open source toolkit that works on mobile and in browsers. And it is built around markers. So, in this example, you can see there's a marker called hero on the ground, with the Lego thing next to it. And once the system recognizes that marker, it can attach stuff to that.
There is WebVR. Now, WebVR is an actual W3C spec, probably the most relevant to us here, for exposing VR capabilities to applications in the web browser. Essentially, it exposes the sensor data from whatever VR device you've connected as events in the browser. So if you have an HTC Vive, you'll get events as the head, …as the user actually moves their head around. And also, as they walk around in a 3D space. If the device isn't a true VR headset, but does have basic orientation, then you'll just get the orientation events, but not the position. So that means that I can tell how the user has tilted their head, but I can't tell if they move forward or back in space, which is position. But this is how WebVR can work on a standard mobile phone, without having anything special. And we have polyfills for this. So on mobile, this is what's used by a lot of web apps for Google Cardboard, and it works on any phone. On desktop, it works with the major HMDs, which is short for head-mounted displays. Currently Firefox on Windows ships with this in final form. You can get it in beta on Chrome and in Firefox for other platforms.
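The difference between orientation-only and full position tracking can be modelled in a few lines. This is a plain Python sketch of the idea, not the browser API: the `Pose` class, the `camera_state` function, and the 1.6 meter default eye height are all assumptions made for illustration. The orientation is a unit quaternion in (x, y, z, w) order, and a missing position is represented as `None`.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

Vec3 = Tuple[float, float, float]
Quat = Tuple[float, float, float, float]  # unit quaternion (x, y, z, w)


@dataclass
class Pose:
    orientation: Quat
    position: Optional[Vec3] = None  # None on orientation-only devices


def rotate(q: Quat, v: Vec3) -> Vec3:
    """Rotate vector v by unit quaternion q."""
    x, y, z, w = q
    vx, vy, vz = v
    # t = 2 * cross(q.xyz, v)
    tx, ty, tz = 2 * (y * vz - z * vy), 2 * (z * vx - x * vz), 2 * (x * vy - y * vx)
    # v' = v + w * t + cross(q.xyz, t)
    return (vx + w * tx + (y * tz - z * ty),
            vy + w * ty + (z * tx - x * tz),
            vz + w * tz + (x * ty - y * tx))


def camera_state(pose: Pose, default_position: Vec3 = (0.0, 1.6, 0.0)):
    """Return (position, forward direction, has_position_tracking)."""
    tracked = pose.position is not None
    position = pose.position if tracked else default_position
    forward = rotate(pose.orientation, (0.0, 0.0, -1.0))  # camera looks down -z
    return position, forward, tracked


if __name__ == "__main__":
    s = 2 ** -0.5  # sin and cos of 45 degrees: a 90 degree turn about the y axis
    phone = Pose(orientation=(0.0, s, 0.0, s))                  # Cardboard-style
    headset = Pose(orientation=(0.0, s, 0.0, s), position=(1.0, 1.7, -2.0))
    print(camera_state(phone))
    print(camera_state(headset))
```

With the phone-style pose, the view direction still follows the head, but the camera stays pinned at the default position no matter how the user walks around.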
In particular, I know we just started turning it on for Firefox for Mac because as of June, Apple announced new VR support coming in the operating system. I believe currently you still have to have an external GPU, but I assume that will improve as Apple improves their desktop platform. So, the good thing is WebVR is here and continually expanding in web browsers today. Now, in June, Apple also announced ARKit. Now this is a software library specifically for iOS. So iPhones and iPads. That uses the sensors plus the phone's camera for true augmented reality. So this is doing monoSLAM. It can track both the position and orientation just using the internal sensors and then the camera. And it works on objects, as well as surfaces around the world. This is a video of a tape measure that - a little application somebody created - that I found on Twitter. Let's take a look. So those little dots are the surface features it's found, this is an imitation tape measure next to the real one, and you can see how accurately it is lining up. You know, good within a couple of percent. Which I find really incredible.
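Once the tracking has anchored two points in the real world, the tape-measure part itself is simple arithmetic. The sketch below is an illustration, not the code of the app in the video; it assumes the AR framework reports anchor positions as (x, y, z) coordinates in meters, and the sample coordinates are made up.

```python
import math
from typing import Tuple

Vec3 = Tuple[float, float, float]


def distance_m(a: Vec3, b: Vec3) -> float:
    """Straight-line distance between two anchored points, in meters."""
    return math.dist(a, b)  # requires Python 3.8 or newer


def percent_error(measured: float, actual: float) -> float:
    """How far off a measurement is, as a percentage of the true value."""
    return abs(measured - actual) / actual * 100.0


if __name__ == "__main__":
    start = (0.0, 0.0, 0.0)   # hypothetical anchor where the user first tapped
    end = (0.3, 0.0, 0.4)     # hypothetical anchor at the second tap
    measured = distance_m(start, end)
    print(f"measured {measured:.3f} m")
    print(f"error vs. a real 0.49 m reading: {percent_error(measured, 0.49):.1f}%")
```

The hard part, which the framework handles, is keeping those two anchors stable in space while the phone moves.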
Now, it's using it to measure how big something is in the real world. That is just astounding to me. That we can do this on consumer hardware. It's really incredible. Now, if you search on Twitter for #madewithARKit, hashtag, then you're going to find tons of really cool demos like this. So, Apple did it, of course. Google recently announced their own equivalent to ARKit called ARCore. And it's available for both iOS and Android. Though it only works on certain Android devices.
Now, Google previously had something called Tango. And Tango didn't just use a regular camera, it had a depth-sensing camera like the Kinect, and it was built into a phone, and it let you do full tracking in real time. ARCore does not require this special camera, just a regular camera. Google hasn't really said what's going to happen with Tango, but since only two phones ever shipped with Tango hardware, in the more than a year that the program's been out, I'm pretty sure that ARCore is the future for Google. And there's still the question of how well ARCore is going to work, though.
ARKit, Apple's version, works with both, works with Apple dual cameras or without, and without depth sensing, because Apple has very tight control over their camera and sensors. So they have a small number of devices, they can test and calibrate for every specific combination of sensors and camera. And you know, have algorithms to worry about the noise. Google doesn't have control over the hardware and there's so many different phones out there, OEMs are going to have to work really closely with Google to make sure ARCore works really well. I'm assuming that at least the flagship devices of the new phones Google announced yesterday, and the new Samsungs, will support it. Lower end phones, we'll see. But this is a short video of what Google announced with ARCore, they imagine this environment of objects that people create in 3D, just kind of appearing in their environment and interacting with them. So, again, you see dots as they're recognizing stuff and now the person can drop a bunch of virtual objects, position them, build a little virtual house, stick your Android guy there. These are objects that are anchored to the real world, and as you'll see in a second, the lighting is dynamic. So it tries to estimate where the light actually is in the real world, and then shade the 3D object the same way. So if you turn off the real light, the 3D object gets darker. And, you know, just lots of fun silly things you can do with it.
Now, at Mozilla, we've been working on a new spec called WebXR. Now this is very much a work in progress. It's, I hesitate to call it a proposed future web standard. It's not even that far along. But the idea is to extend WebVR, which is already being extended in WebVR 2, add some more features to support augmented reality, as well, and work with APIs exposed from the native layer through ARKit and ARCore. And we also have a polyfill, which will work in older browsers, as long as they have web and some sensors. And then over time, you know, the polyfill will go away, as we get all this stuff built into real browsers. That's our hope, anyway. It's still very early. The API's changing all the time. I got a PR to review today that's going to change some of the API, but go check it out.
In particular, we want to hear from you guys about what features it needs to have, the sorts of applications you want to build. Can't create an API in absence of real world use cases. So, next thing you got to do, once you've got all your data merged together, is actually draw it. Well, there's a bunch of different APIs, and what you're going to use depends on your goals, your abilities, and what platform you're targeting. In theory, you could write applications in a 2D-like canvas, but it wouldn't be very useful. The whole point of collecting 3D data is to do something with it. Now, if you've done 3D graphics before, you're probably familiar with the first item on this list, OpenGL, DirectX, WebGL. And WebGL is essentially the web version of OpenGL ES, which is on essentially every phone. Here, we call them medium level, but they're actually pretty low level. You need to know a lot about how 3D graphics works to use them. Next up is the newer lower level APIs. So, if you hear the term Metal, Metal is a new API from Apple that only works on their stuff. DirectX 12 is the new API from Microsoft. Of course only works on their platforms. And Vulkan is the new API from the Khronos Group, which is who makes OpenGL, so you can think of it as the new low level for OpenGL.
In all cases, the idea is that they're super low level, so they provide a lot of power to experienced developers. And they have as little overhead as possible between the app and the hardware. However, they are so low level that no normal programmer would ever use them. Instead you use a graphics engine like Unity to support them. You can think of it almost like a compiler. Where you're going to have some high level thing, which then gets compiled down to low level code that's executed by Metal and Vulkan, DirectX. Which is why you need someone to write a compiler for you, like the Unity guys. So, that's the high level. You do something with Unity, Unreal, Three.js is my personal favorite, I found it pretty easy to work with, and there's a couple of other great 3D in the browser projects like Babylon. But chances are you're probably not going to work at that level, you're going to work at something even higher, maybe just dip down to the graphics level when you need to. Use something like SceneKit. SceneKit is a simplified 3D API, and a set of visual tools from Apple that targets all of their devices.
Cardboard and Daydream are APIs from Google that help you make mobile compatible 3D content. The HoloLens Academy, there is some code there, but it's actually more a set of tutorials for using Unity with their plug-ins, and other tools, to build VR and AR. For the new mixed reality for Windows. And Microsoft is really getting serious about this stuff, it's really exciting. So even if you're not targeting Windows, it's worth watching the videos in the HoloLens Academy. They do a lot of really good lessons in there.
React VR is a new framework from Facebook that uses the same app model as regular React apps, where you have essentially a scene of objects, DOM objects, and then state gets propagated down. The difference is that instead of manipulating the DOM, it's manipulating 3D components.
ArgonJS is an augmented reality tool set that is a precursor to the work that we're doing in WebXR. We have a lot of the same people working on it. Argon is far more mature, so if you want to work on something right now, then that's what you're going to want to start with. And of course, there's A-Frame.
A-Frame is a very high level framework with a goal of making VR development as easy as making a web page. So you write simple elements, and underneath, the framework turns them into three.js objects. And we are in the process of adding some augmented reality components to A-Frame as well. This is what an A-Frame example would look like. It's so easy, you can do it in a codepen. You just have a scene. Here's a sphere, a cube, a cylinder. Add a plane below it, and then set the color of the sky, and then that's turned into the 3D graphics you see here at the bottom. Along with the hooks to do stereo if the person is using a Cardboard or using a real head-mounted display. All that stuff is abstracted away so you can get down to actually building the content you want.
Now, assuming you actually want to use any of this stuff, you've got to have something to show it on. And again, there's tons of stuff out there. Start on the Google Cardboard side. Generally under $20, but it has no input, essentially, other than gaze, which is where you look at stuff and then you kind of pause for a second, and that triggers it. It also has a button for touch events, but essentially, it's just pushing a fake finger on the screen, so you just listen for a touch event on the whole screen.
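The "look and pause" gaze input can be sketched as a small dwell timer. This is a minimal illustration, not how Cardboard or any particular framework implements it; the class name `GazeDwell`, the object ids, and the one-second threshold are all assumptions.

```typescript
// Gaze ("dwell") selection sketch: an object is triggered once the user
// has looked at it continuously for dwellMs milliseconds.
class GazeDwell {
  private dwellMs: number;
  private target: string | null = null;
  private gazeStart = 0;
  private fired = false;

  constructor(dwellMs: number) {
    this.dwellMs = dwellMs;
  }

  // Call once per frame with the id of the object under the gaze cursor
  // (or null) and the current time in milliseconds.
  // Returns the id to trigger on this frame, or null.
  update(lookedAt: string | null, nowMs: number): string | null {
    if (lookedAt !== this.target) {
      // Gaze moved to a different object (or to nothing): restart the timer.
      this.target = lookedAt;
      this.gazeStart = nowMs;
      this.fired = false;
      return null;
    }
    if (this.target !== null && !this.fired && nowMs - this.gazeStart >= this.dwellMs) {
      this.fired = true; // fire only once per continuous gaze
      return this.target;
    }
    return null;
  }
}

const gaze = new GazeDwell(1000);
const events = [
  gaze.update("door", 0),
  gaze.update("door", 500),
  gaze.update("door", 1000),
  gaze.update("door", 1500),
  gaze.update(null, 1600),
];
console.log(events);
```

Only the third call returns `"door"`: the gaze has been held for the full second, and later frames on the same object do not fire again until the user looks away and back.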
Google Daydream is kind of a nicer version of Cardboard that you can stand to have for a long time, and it has wireless input with this little, it's almost like a tiny Wiimote. So this is a newer VR spec from Google. I believe they announced Daydream 2.0 and there should be some new things coming out. But again, this is, you have an existing phone that you stick inside the box. There is Gear VR. Which only works with a few particular Samsung phones, but it does have a controller, really nice padding. And Samsung is pretty serious about this. They've been working with Oculus. They made customizations to the operating system to make the experience much smoother than you would get in a typical Cardboard setup. So if you have one of their supported phones, that's worth getting.
Now, when we get to the PC level, it's a lot more expensive. So there's HTC Vive. Which costs $600 plus, though they might have dropped the price recently. But that gives you a full headset with the beacons, with two hand controllers. Like, it is a full, true, VR experience. Oculus Rift is similar. It's a headset with, I don't think, I think they added the headphones here, I don't think those come with it. And has controllers.
HoloLens. If you want to buy an actual HoloLens from Microsoft, it's $4,000, but nobody's going to pay that, that's an SDK version. What's coming for consumers, well, we've got the Meta 2, which is similar to the HoloLens, but less than half the price. Which you can buy today. And I did a blog about that recently. There's PlayStation VR, which is about $500. And I think they've got version two of that coming out, and that comes with the headset, and it has a sensor bar and interactive controllers. And of course you need to have a PlayStation.
What's more exciting is what was recently announced: HP, Lenovo, Dell, Acer, ASUS, a bunch of PC manufacturers creating these head-mounted displays that will provide an experience similar to what you get with HoloLens, but for a fraction of the price. It remains to be seen what the quality's going to be, but the point is, the prices are rapidly being driven down. So this is going to become commodity stuff really soon, which is why it's such an exciting time to get into this stuff.
Now, there's also this company called Magic Leap. Mainly known for being super secretive, a bunch of ex-Apple people raised a bajillion dollars. They claim to be shipping something. You know, I've heard from people who've seen it and they say the tech is real, it's legitimate. But just nobody knows when they're actually going to ship something, but it will be some sort of augmented reality system and we'll see if it lives up to their hype.
So what do you really need to know? This stuff isn't mainstream yet, but it's coming quickly. If you want to get started, start with the web stack. Start with A-Frame, then move down to three.js. Start with Google Cardboard. Seriously, you can get a nice plastic Google Cardboard viewer with padding for $10 at the Target checkout. And the fact that it's at the checkout section means that this stuff is becoming mainstream very quickly. Follow what Mozilla and Microsoft are doing. Play with A-Frame. The important thing is that you should experiment. We are still in the Model T days. In the early days of the car, we didn't know if they should have handles or if they should have road controls, or lots of levers. It was, you know, wild west. People tried different things before we figured out that the steering wheel and the two pedals was the way to go. It's going to be the same thing here. Nobody knows what the killer app is going to be. Nobody knows what the good UI concepts are. This is our first truly new medium in decades. So, it's a chance to go out and play. Nobody knows what's going to stick yet, and what's going to fall on the floor. I can tell you that measuring tape apps are going to be like the new fart app. But we don't know what's going to be useful long term, so it's a really exciting time to get involved. So, that is all of my presentation. Now, let me take a look at the etherpad. You guys, wow, there are a lot of questions here. Awesome.
- Thank you so much, Josh. Would you be cool with just doing a little bit of question and answer format for the Q and A portion?
- Yes, I would love to.
- Ask and answer, so to speak. Awesome, I'll go ahead and start off at the top, if that's all right with you. Do you personally do any game development?
- I've never been a professional game developer. I've written a few really bad indie games that mostly haven't shipped. I tend to be a fan of what I grew up with as a child, which is 2D, you know, NES and Super NES, Final Fantasy. But as I see my six year old son engage with this stuff, I'm kind of getting a view of where this is going through his mind. Now, he doesn't really play video games, per se. It's more educational content. But I can see video games are going to be a huge part. And in fact, video game professionals, because they're already working with 3D generally, already have all of the skills required to be good VR developers.
- That's great, thank you for that. The next question is, what do you think the future of game development with VR and AR will look like? Or will be?
- I think the web is great for low commitment experiences where you don't want to install something. Like, I finally got my Vive set up, and it has Steam, and even a basic tool is a 500 gigabyte download. But on the web, I can go see a new VR experience immediately. And so that's what's great about WebVR. The other magic on the web is that we have links, so I think links are going to be a huge part, if any of you have ever read Snow Crash or Neuromancer. You know, jumping from one virtual environment to another is going to be huge. So that's going to really affect the way that we build games. Augmented reality games are interesting. Pokemon Go showed that there was clearly a lot of interest in it. But I don't know what direction they're going to go. I haven't seen another thing like Pokemon Go. Well, I've seen lots of things like Pokemon Go, but none of them have been successful the way Pokemon Go has been. So, I don't know if it's a one-off fluke success, or we're going to find that games which mix with the real world are going to be a new popular thing. But I'm going to keep trying stuff out, and you know, see what's fun.
- Absolutely. The next question. Do you think that mobile phones will continue to be the most popular devices used for VR and AR?
- By numbers, yes, just because we are moving towards a future where everyone will have a mobile phone. They're going to have it anyway, even if they're not doing VR and AR, so it's very easy to get somebody to try out a new thing if they already have all the hardware required. However, there will be experiences that are richer which require better hardware. So I think we're going to have both. Just like mobile phones didn't kill off the PC, they're not going to be the only thing we use for VR and AR. Certainly, in AR, they're going to be the biggest early adopters just because they're already mobile, and being mobile is a key part of AR. But the, you know, the AR headsets you can buy that attach to your computer, those are also going to get cheaper and lighter, so it's going to be interesting.
- It certainly will. Our next question is, how can augmented reality help in healthcare, aviation, and construction industries?
- So augmented reality helps in two ways. It can tease more information out of the environment you're in, by, say, giving you infrared vision, or magnifying your vision, or highlighting things that computer vision systems can see that you can't. It can also bring information that's already in the computer into your view when you need it. So, which is ultimately why I think AR is more exciting than VR. So for healthcare, we're going to see things like a surgeon not only has the information about the patient, both, you know, their stats and the current real time information, like their heart beat, but they can also have the full MRI scan projected on top of the patient, so they have an idea of exactly where they're going. The same concept can apply to aviation here. You have a pilot who needs certain information, real time, always available, not affecting their hands. And if it can be overlaid on top of the real world, then it's easier to see that mountain. Same with aviation repairmen. If you want to fix an airplane, like, the manual for an airplane, the repair manual is literally so big that it wouldn't fit in a room. Like it's the equivalent of millions of pages. Full 3D models. Having that information available without having to carry stuff is also huge, especially for airplanes where every hour an airplane is in the shop is thousands of dollars the airplane is not earning being in the air. And construction will be the same way. Construction's interesting because I see a lot of crossover between AR and drones. Construction is adopting drones very quickly because they constantly need to survey the current state. And just having one picture from the top of a tower isn't enough, they've got to do it every day, and multiple times a day. So, mixing the two I think is going to be exciting.
- Thank you so much. The next question asks for your thoughts on security risks posed by augmented reality, and how a person might combat those.
- So. That is interesting. It's not so much security as privacy risks. If somebody has access to, you know, all the information in Google, or all the information in Facebook, while they're on the go, and secretly, because it's just projecting inside their glasses, or in their phones, it's like they have a superpower, you know, like infinite memory, perfect recall. Which can be used in a lot of malicious ways, as well as beneficial ways. I think the key is going to be defining what privacy is in a way that has some sort of legal backing. Certainly, Europe has stronger privacy laws than the United States does. And then giving people the tools to protect their own privacy. I don't really know yet what that's going to look like, but we have a good team here at Mozilla working on those issues.
- Thank you for that, that was an excellent answer. Earlier you talked about Vulkan. That is not yet supported in most of the devices out there. Do you think it'll take off? And when should we concentrate on it?
- I do think it will take off, but I don't think we need to worry about it. It's just going to be a part of the device. People are going to work at a higher level, like Unity, which is going to support Vulkan on the right platforms, and it will support DirectX. You know. Ultimately, it's an implementation detail. Like, I don't care about the difference between the x86 instruction set versus the ARM instruction set, because I don't have to write assembly. You know, there's a guy whose job it is, a compiler writer who writes assembly. It's going to be the same thing. Ultimately, you know, at least at the application, the framework level, we just target whatever's available.
- Thank you, Josh, for answering that, and thank you Rabimba for providing the question. Our next question comes from Mario. He's asking about the common pitfalls that new developers would face with VR and AR.
- Common pitfalls, that's good. I've started making up a list here. The first thing I'd say is people building without testing. And this goes for all software, really, but it's especially true here. Things that work in even a 3D game, on your PC, don't always work in VR. It's a different medium, even though it's still 3D. It is different. You experience it differently. So whenever you're building something, start small and test constantly. You know, when I'm developing, yes, I have the HTC Vive behind me, but I also have a Google Cardboard, or one of these guys. So I can just, you know, pop it up and check, and then all of a sudden, because the field of view is different, I realize that what looked great on my screen looks horrible, or it's hidden, in VR. So constant testing. Also start small. You know, the tools aren't very mature yet. It's going to take a lot longer to build anything in VR than you think, even if you've done 3D development before. A-Frame certainly makes it easier. But this is a different medium. You know, it's going to take some time to build these skills.
- Are you aware of any useful applications for integrating AI in AR technologies?
- I don't know. I don't think of AI as a separate thing. It's just, ultimately, AI is just extra algorithms in your toolbox as an application developer. So I wouldn't go around thinking, you know, "Hey, I got speech recognition, what can I do with it?" But think of the problem you're trying to solve, and then see if speech recognition or image recognition or you know, something with a neural net, will help you solve that problem. In some cases it will, some cases it won't.
I think what Google announced yesterday with their earbuds, you know, their earbuds with a microphone in 'em, but it's attached to, you know, the power of Google's cloud system. And it can do real-time translation. The value isn't that they have a bunch of neural nets and giant CPUs in the cloud. The value is, here's a problem that real people have, you know, I'm going to another country where I don't speak the language. And I need to read a sign or I need to talk to somebody. Well, I have an app on my phone that can read the sign. Now I have these earbuds that will automatically translate for me. And it's like magic. So focus on the use case first. I think, yes, we're definitely going to see a lot of interesting things. I tend not to think of AI as artificial intelligence, but as human augmentation. You know, if I'm a doctor looking at cancer scans, the AI isn't there to replace me, it's to help me as a doctor make better judgments. Just like, you know, a hammer isn't to replace me, a hammer is so I can put more nails in per day than I would with my fist. So think of it as augmentation and focus on the applications you want to solve.
- Thank you, Josh. Our next question is what has been your favorite product, if any, that addresses how to physically travel longer distances in virtual reality when in a confined environment like a living room.
- So that's really a question of application design. We want to have experiences that are as large as possible. It works pretty well in the HTC Vive where you sort of hold a button and you get the sort of fishing line projection to choose the exact spot you want to jump to. But it still feels unnatural. It works. And, you know, the fishing rod mechanism feels good, but it doesn't fit into the space. It's like, if I'm playing a zombie game, that's how I'm going to navigate it. If I'm playing a flying game, that's how I'm going to navigate. I prefer things that work on real world analogs, like a car, or an airplane. Where, you know, if I want to go a long distance, I'm going to get into a vehicle. So make a virtual vehicle that I can get into.
- That's great. What moniker do you want us to use WebXR or WebVR?
- For now, we just say MR. Mixed reality. Because honestly, we don't know what this is going to end up being. There is a WebVR 2.0. The AR bits may be part of that. It may be part of 3.0. We might change the name, who knows. In general, mixed reality is the best word we've come up with for this stuff.
- And that question comes from Rabimba, and there is a second part to the question. Will WebXR bits be merged into WebVR 2.0?
- I don't know yet. We're, you know. We have working meetings to figure all this stuff out. It turns out it's trickier than just, you know, can we physically build the API? Because there's a bunch of different devices. Some have backwards compatibility issues. There's a lot of privacy issues. For example, WebRTC used to let you scan and get a list of cameras. But getting a list of cameras on a device can actually be used as a form of tracking, more accurate than a cookie. So that was rewritten to just let you request, this is what I want, and the device gives you the best thing back. So we're looking at doing a similar thing with VR and AR capabilities. So designing for privacy is a lot harder than just designing to functionally make it work. So I can't say yet. But we're working on it.
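The "request what you want, get the best thing back" pattern can be sketched in a few lines. This is only an illustration of the idea, not the WebRTC or WebXR API: the `Camera` and `CameraRequest` types, the `requestCamera` function, and the rule that "best" means highest resolution are all hypothetical.

```typescript
// Hypothetical capability description; not a real WebRTC or WebXR type.
interface Camera {
  id: string;
  facing: "user" | "environment";
  width: number;
}

interface CameraRequest {
  facing?: "user" | "environment";
  minWidth?: number;
}

// The page states what it wants and receives at most one match.
// It never sees the full device list, so the list cannot be used as a fingerprint.
function requestCamera(installed: Camera[], wanted: CameraRequest): Camera | null {
  const matches = installed.filter(
    (c) =>
      (wanted.facing === undefined || c.facing === wanted.facing) &&
      (wanted.minWidth === undefined || c.width >= wanted.minWidth)
  );
  if (matches.length === 0) return null;
  // "Best" here simply means the highest resolution among the matches.
  return matches.reduce((best, c) => (c.width > best.width ? c : best));
}

// The installed list lives on the device side; the page only supplies the request.
const installed: Camera[] = [
  { id: "front", facing: "user", width: 1280 },
  { id: "rear-wide", facing: "environment", width: 1920 },
  { id: "rear-tele", facing: "environment", width: 3840 },
];

const picked = requestCamera(installed, { facing: "environment", minWidth: 1280 });
console.log(picked ? picked.id : "no match");
```

The design choice is that a request which cannot be satisfied returns `null` rather than a list of alternatives, so the page has to be ready to run without the capability.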
- You'll have to let us know.
- And would you like us to start evangelizing, using the newer or experimental capabilities that MR proposal brings?
- I wouldn't, don't evangelize yet. I mean, you can say, you know, this is the direction everything's going. What we really want to hear is what are the use cases? So that we can build the API around what people actually want to do. You know, we can say AR on the web, but without having specific examples, we don't really know what that means.
- Absolutely. Going back to the Virtual Boy, this question is related, I heard the Virtual Boy caused damage to people's vision. Do you know if there's any truth to this?
- I never got to try one because I was 13, I think, when they came out and couldn't afford it. But I read magazine reports at the time that people got headaches. Essentially, so doing 3D with a 2D screen is actually really hard, it's more than just having two pictures. You're essentially forcing your eyes, you're tricking your eyes. And eventually your eyes figure that out and start causing pain. So, the fact that it didn't move helped. But they had focal plane issues where something would appear to be further away, but it was actually, you know, a local screen, and so your eyes are trying to focus on something further away. The fact that we can make the screens higher resolution and we have better optics helps that. But it's still kind of a problem. I don't recommend anybody wear these for more than an hour at a time. You know, this is not, this is not like optometrist optics yet. Now, given the rumors I've heard about Magic Leap, you know, having multiple focal points, we'll see. But yes, there's truth to that, and as application developers, we still need to be aware of physiological problems with 3D that we still have.
- OK, tell us a little bit about how SLAM works in the background. This question comes from Vigneshwer.
- It's magic pixie dust. There's a lot of crazy math involved. Essentially, and this is where I've forgotten all my high end math from college, my basic understanding is that from the camera, even though it's not stereo, you're constantly moving around, and so it's getting little bits of parallax information to get a rough estimate of what's out there. But it's just rough. Then the internal sensors give it a sense of relative change. Like, have you rotated relative to gravity or relative to the magnetic field. And it's by combining the two together that it can get rid of the error from the drift. But as you can imagine, all of these things are noisy. You know, there's always going to be some level of error. So calibrating it to the specific hardware makes that more accurate. However, even with all of that, it's still kind of lossy. And it takes a little while. So often, when you start an application, it can take five or 10 seconds before it has found the specific anchors you're looking for. So we still need to plan on that. This is where our web philosophy of gracefully degrading the experience comes in handy. Whatever you're making, try to make it work both with and without that information.
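The idea of combining a drifting relative sensor with a rough absolute reference can be illustrated with a one-dimensional complementary filter. This is far simpler than real SLAM and is not a description of what ARKit or ARCore do internally; the function name `fuse`, the 0.98 blend factor, and the simulated gyro bias are assumptions chosen for the example.

```typescript
// One-dimensional complementary filter (a simplification, not real SLAM).
// gyroRate: measured rotation rate in degrees per second (smooth, but it drifts)
// absoluteAngle: absolute reading, e.g. derived from gravity or the camera
// alpha: how much to trust the integrated gyro (assumed value, tuned per device)
function fuse(
  previousAngle: number,
  gyroRate: number,
  absoluteAngle: number,
  dtSeconds: number,
  alpha: number
): number {
  const integrated = previousAngle + gyroRate * dtSeconds;
  return alpha * integrated + (1 - alpha) * absoluteAngle;
}

// Simulate a device held still at 10 degrees whose gyro wrongly reports
// a constant rotation of 2 degrees per second.
const trueAngle = 10;
const bias = 2;
const dt = 0.01;
let gyroOnly = trueAngle;
let fused = trueAngle;
for (let i = 0; i < 1000; i++) {
  gyroOnly += bias * dt; // integrating the biased gyro alone drifts away
  fused = fuse(fused, bias, trueAngle, dt, 0.98);
}
console.log(gyroOnly.toFixed(2), fused.toFixed(2));
```

After ten simulated seconds the gyro-only estimate has drifted by about 20 degrees, while the fused estimate settles within about one degree of the true angle: the error is bounded instead of growing.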
- And it looks like we have two more questions left in the etherpad. Our next question, are these frameworks, say, for example, ARCore, cross-platform compatible by default?
- That is our goal, yes. So ARKit, from Apple, of course, only supports Apple's devices. Google had to support both. For the work we're doing, we want to make this, you know, eventually a part of a web standard, so it will work everywhere. But rather than try and implement everything on top, it exposes the native capabilities, whatever is available. So, look at how WebRTC works, that's kind of the direction we're taking with this.
- Great. One final question for you, Josh, if you don't mind. Do you think that the rise of this industry will cause a boom in the demand on and for the GPUs?
- Absolutely. GPUs are becoming general purpose, to the point where, if you can phrase your computational problem in a way that's parallel, then you can get huge amounts of bang for the buck. But we still have the problem that these things are mobile, and there's batteries. And right now, even, you know, I have a brand new iPhone 8 Plus. And if I try out one of the ARKit demos for more than, you know, 30 minutes, it has drained a significant amount of my battery. When I use my HTC Vive, I have the beefiest gaming laptop available and the fans are running full blast when I have VR on. This stuff is very intensive. And will continue to grow. So, you know, it's great news for Nvidia. It's like with Intel: no matter how fast they made a computer, we had software to use up those cycles. No matter how fast these GPUs get, we will have richer scenes, richer interactions that take up all available GPU resources. So, yeah. I would certainly invest in some Nvidia stock. Note I am not a stock broker.
- All disclaimers.
- Well, thank you so much, Josh. I believe that is the last of the questions we have in the etherpad, let me take a quick look here, just confirm--
- And perfect timing.
- Great. All right, I think that's--
- Thank you all.
- --circle back, yeah.
- Thank you all. If you have any other questions, you can e-mail me. My Mozilla e-mail is JMarinacci@Mozilla.com. But no one can spell my last name, so email@example.com will help you get to me. Thank you very much. Have a good day.
- Bye Josh, thank you so much. It was a pleasure having you. Cheers.