
AR in the Browser!
With Google's MediaPipe and Three.js
Big Picture/Introduction
The biggest drawback of VR (for me, at least) has been the need for a clunky headset, closely followed by the proprietary and expensive nature of the current efforts. So, I thought I'd look at AR in the humble browser as a way to dip my toes in, so to speak. The good news is that the technology is available, if a little on the experimental side. The bad news is that coding for VR/AR is hard, but then again, so are all new technologies. So, come along and check out some of the basics.
Your AR/VR development experience will probably be different from mine as
newer and, hopefully, better technologies appear. And then there's
choice: you could build for a platform like Meta's, use frameworks like
Unity, or opt for a combination of tooling and platforms.
I chose AR in the browser and JavaScript because they are very
accessible. However, this is not cutting edge, so keep your
expectations low, and you'll be okay.
I recommend you read the whole article first, then try the code samples,
and finally, dig into the code. But you do you.
Some Architecture
The backbone of an AR/VR app consists of capturing reality and somehow digitizing it for later processing. In this case, we are going to use something called MediaPipe from Google, which bills itself as "Self-serve ML solutions with simple-to-use abstractions." In plain English, it is a collection of high-grade, cross-platform code for capturing reality. It is not perfect (it is an early release right now), and the developer experience felt disjointed, but the "capturing reality" part works as advertised (and in JS!). How it works is best explained with a simple workflow (this will be important later). While there are a few things you can capture ("solutions" in Google-speak), I'll focus on human faces...

So how does one recognize faces, landmarks (the 3D geometry of your face), and blend shapes (eye direction, mouth position, head tilt, and so on)?
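To make that question concrete, it may help to see what landmarks and blend shapes look like as plain data. The sketch below is only an illustration: the result shape (`faceLandmarks`, `faceBlendshapes`, `categories`, `categoryName`, `score`) is my understanding of what MediaPipe's face landmark solution returns, so treat it as an assumption and check it against the version you use. The helper names `blendshapeScores` and `landmarkToPixels` and the sample values are hypothetical, made up for this example.

```javascript
// Assumed result shape, modeled on MediaPipe's face landmark output:
// {
//   faceLandmarks:   [ [ {x, y, z}, ... ] ],  // one array of points per face
//   faceBlendshapes: [ { categories: [ {categoryName, score}, ... ] } ]
// }

// Hypothetical helper: turn the blend shape list of one face into a lookup table.
function blendshapeScores(result, faceIndex = 0) {
  const shapes = result.faceBlendshapes?.[faceIndex];
  if (!shapes) return {}; // no face found in this frame
  const scores = {};
  for (const { categoryName, score } of shapes.categories) {
    scores[categoryName] = score;
  }
  return scores;
}

// Hypothetical helper: landmark x and y are assumed to be normalized (0..1),
// so scale them by the frame size to get pixel coordinates.
function landmarkToPixels(landmark, width, height) {
  return { x: landmark.x * width, y: landmark.y * height };
}

// Hand-written sample data standing in for one processed video frame.
const sampleResult = {
  faceLandmarks: [[{ x: 0.5, y: 0.25, z: -0.02 }]],
  faceBlendshapes: [
    {
      categories: [
        { categoryName: "jawOpen", score: 0.8 },
        { categoryName: "eyeBlinkLeft", score: 0.1 },
      ],
    },
  ],
};

const scores = blendshapeScores(sampleResult);
const point = landmarkToPixels(sampleResult.faceLandmarks[0][0], 640, 480);

console.log(scores.jawOpen > 0.5 ? "mouth open" : "mouth closed");
console.log(point);
```

Nothing here touches the camera or the ML model; it only shows that once a frame has been processed, you are left with arrays of points and a list of named scores that ordinary JavaScript can work with.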