Making Augmented Reality applications with JSARToolKit5

Introduction

This article is a short introduction to building augmented reality (AR) web apps with JSARToolKit5. We're going to learn what JSARToolKit5 is, what kinds of AR apps you can use it for, and how you can use it with the Three.js 3D engine to build 3D AR objects. I'll also briefly touch on what AR is and why I think it's going to be awesome.

Briefly, JSARToolKit5 is a JavaScript port of the latest and shiniest version of the ARToolKit augmented reality library. You can use JSARToolKit5 to add objects into images and videos and have them look like they’re a part of the real-world scene. And not only that, you can also make the virtual objects interactive and animated.

To demonstrate the kinds of applications you can make with JSARToolKit5, we’re going to write a small AR app that augments the video from your device’s camera. The app is going to place a small box into the video feed and you can tap on the box to open it and have a look at what’s inside.

Let’s get started!

Basics of JSARToolKit

JSARToolKit works by tracking special images in your video. These special images are called AR markers, and JSARToolKit can figure out where they are in the video and which way they're pointing. By getting the positions and orientations of the AR markers from JSARToolKit, you can draw 3D objects on top of the video in the right places, making it look like the objects are part of the video.

To load JSARToolKit, include the minified script in your webpage:

<script src="build/artoolkit.min.js"></script>

As you may have guessed, we're going to need three things to build our AR app: an AR marker, a video source, and a way to draw 3D graphics on top of the video. For the marker, we'll use a special built-in marker type called a barcode marker. The video we'll get from your device camera using the getUserMedia API, and we'll use Three.js for the 3D graphics.

The first thing we'll do is create the video source. For that, we'll use the getUserMedia API to get a URL for the device camera and then use that URL as the source for a video element. With this, we have a video element that shows the device camera's video feed.
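
If you'd rather wire this step up by hand, here's a minimal sketch using the plain getUserMedia API. (The URL-based approach described above dates from the time of writing; current browsers attach the stream directly to the video element via srcObject.)

var video = document.createElement('video');
navigator.mediaDevices.getUserMedia({ video: { facingMode: 'environment' } })
	.then(function(stream) {
		// Modern replacement for creating an object URL from the stream.
		video.srcObject = stream;
		video.play();
	})
	.catch(function(err) {
		console.error('Camera access failed', err);
	});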

The simple way to do all this is to use a helper function in JSARToolKit5: ARController.getUserMedia(options). The onSuccess callback in the options object gets called with a ready-to-use video element. Note that on Chrome for Android, mobile video only plays after interacting with the page. ARController.getUserMedia adds a window-level touch event listener that plays the video.

var video = ARController.getUserMedia({
	maxARVideoSize: 320, // do AR processing on scaled down video of this size
	facing: "environment",
	onSuccess: function(video) {
		console.log('got video', video);
	}
});

Next, we need to create an ARController. The ARController keeps track of the markers we have registered and reads out pixels from the video source. Let's make one using our newly created video element.

...

var arController = new ARController(video, 'Data/camera_para.dat');
arController.onload = function() {
	console.log('ARController ready for use', arController);
};

...

You could also write this by explicitly loading the ARCameraParam (maybe you want to reuse it):

...

var camera = new ARCameraParam('Data/camera_para.dat');
camera.onload = function() {
	var arController = new ARController(video.videoWidth, video.videoHeight, camera);
	console.log('ARController ready for use', arController);
};

...

Getting marker positions from JSARToolKit

Now that we have the video hooked up to the ARController, we're ready to start tracking AR markers in the video. Once we see an AR marker, we want to get its details and find out where it is and which way it's pointing.

Luckily, ARToolKit takes care of all the math and gives us a readily usable transformation matrix. Here's how to get the markers in the video:

// Set the ARController pattern detection mode to detect barcode markers.
arController.setPatternDetectionMode( artoolkit.AR_MATRIX_CODE_DETECTION );

// Add an event listener to listen to getMarker events on the ARController.
// Whenever ARController#process detects a marker, it fires a getMarker event
// with the marker details.
//
var detectedBarcodeMarkers = {};
arController.addEventListener('getMarker', function(ev) {
	var barcodeId = ev.data.marker.idMatrix;
	if (barcodeId !== -1) {
		console.log("saw a barcode marker with id", barcodeId);

		// Note that you need to copy the values of the transformation matrix,
		// as the event transformation matrix is reused for each marker event
		// sent by an ARController.
		//
		var transform = ev.data.matrix;
		if (!detectedBarcodeMarkers[barcodeId]) {
			detectedBarcodeMarkers[barcodeId] = {
				visible: true,
				matrix: new Float32Array(16)
			};
		}
		detectedBarcodeMarkers[barcodeId].visible = true;
		detectedBarcodeMarkers[barcodeId].matrix.set(transform);

	}
});

// getCameraMatrix returns the projection matrix computed from the
// camera parameters. You'll use it as your 3D camera's projection matrix.
var cameraMatrix = arController.getCameraMatrix();

// Before processing a new video frame, mark all previously detected
// markers as not visible, so markers that left the frame stay hidden.
for (var i in detectedBarcodeMarkers) {
	detectedBarcodeMarkers[i].visible = false;
}

// Process a video frame to find markers in it.
// Each detected marker fires a getMarker event.
//
arController.process(video);
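
To see how these pieces fit together, here's a sketch of a per-frame update loop: hide all previously detected markers, then let process() mark the ones found in the current frame visible again.

var tick = function() {
	requestAnimationFrame(tick);

	// Markers not detected in this frame should stay hidden.
	for (var i in detectedBarcodeMarkers) {
		detectedBarcodeMarkers[i].visible = false;
	}

	// Fires a getMarker event for each detected marker, which updates
	// detectedBarcodeMarkers through the listener above.
	arController.process(video);

	// ... draw your 3D objects using the matrices in detectedBarcodeMarkers.
};
tick();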

Using JSARToolKit with Three.js

Now that we have the marker positions from JSARToolKit, we can copy them to the Three.js objects that should appear on top of the markers. Note that if you move the marker out of the camera's view, tracking is lost and the Three.js object won't get updated. You can make the object stay where it is, have it abruptly disappear, or fade it out. I prefer fading out objects that have lost tracking, but it can be a bit fiddly to control all the material opacities in the object.
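
If you want to try the fade-out approach, here's a minimal sketch. It uses a hypothetical fadeTowards helper and assumes you track the marker state in a separate markerTracked flag rather than toggling visible (an invisible object wouldn't render at all while fading out).

// Ease all material opacities in an object toward a target value.
function fadeTowards(object3d, targetOpacity) {
	object3d.traverse(function(child) {
		if (child.material) {
			child.material.transparent = true;
			child.material.opacity += (targetOpacity - child.material.opacity) * 0.1;
		}
	});
}

// In the render loop: fade in while tracked, fade out when tracking is lost.
// fadeTowards(markerRoot, markerTracked ? 1.0 : 0.0);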

The cool thing is that the Three.js object tracking the marker is a normal Three.js object, so we can use all the usual tricks at our disposal. For this demo, we'll add a touch event handler to detect when you tap on the object and animate the object when you do. The code below is the usual Three.js mouse raycasting code: it casts a ray from the mouse coordinates into the 3D scene and checks it against the objects in the scene to see if any of them intersect it. If we find an intersection, the intersected object is under the mouse cursor. Add the raycasting code to your tap event listener and presto! You've got a tappable Three.js scene.
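
Here's a minimal sketch of that raycasting code, assuming the renderer canvas fills the window (otherwise you'd offset the event coordinates by the canvas position):

var raycaster = new THREE.Raycaster();
var pointer = new THREE.Vector2();

function objectUnderPointer(clientX, clientY, camera, scene) {
	// Convert pixel coordinates to normalized device coordinates (-1..1).
	pointer.x = (clientX / window.innerWidth) * 2 - 1;
	pointer.y = -(clientY / window.innerHeight) * 2 + 1;

	// Cast a ray from the camera through the pointer into the scene.
	raycaster.setFromCamera(pointer, camera);

	// Return the closest object under the pointer, if any.
	var intersections = raycaster.intersectObjects(scene.children, true);
	return intersections.length > 0 ? intersections[0].object : null;
}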

I added a small animation to the tap listener as well, so that when you click on the box object, it opens and stuff flies out. To check it out, go visit the demo page and allow access to your camera. Now point your camera at a flat piece of paper with the marker. Hohoo, you can now see the box. Tap on it to open it!

Code-wise, I'm going to cheat a bit and use the Three.js integration in JSARToolKit5. It does all the stuff that we set up above, and packages it up into a couple of easy-to-use methods. Just load the js/artoolkit.three.js support library and you're good to go.

<script src="build/artoolkit.min.js"></script>
<script src="js/artoolkit.three.js"></script>
<script>
ARController.getUserMediaThreeScene({
	cameraParam: 'Data/camera_para.dat', // camera parameters file, as above
	facing: 'environment',
	onSuccess: function(arScene, arController, arCameraParam) {

		arController.setPatternDetectionMode(artoolkit.AR_MATRIX_CODE_DETECTION);

		// Track the barcode marker with id 20.
		// markerRoot is a THREE.Object3D that tracks the marker position.
		//
		var markerRoot = arController.createThreeBarcodeMarker(20);

		// Add the openable box to the marker root.
		//
		var box = createOpenableBox();
		markerRoot.add(box);

		// Add the marker root to the AR scene.
		//
		arScene.scene.add(markerRoot);


		// Add event handlers to make the box open/close on tap.
		//
		window.addEventListener('touchend', function(ev) {
			// On touchend, the lifted finger is in changedTouches, not touches.
			if (box.hit( ev.changedTouches[0], arScene.camera )) {
				box.toggleOpen();
			}
		}, false);

		window.addEventListener('mouseup', function(ev) {
			if (box.hit( ev, arScene.camera )) {
				box.toggleOpen();
			}
		}, false);


		// Create renderer and deal with mobile orientation.
		//
		var renderer = new THREE.WebGLRenderer({antialias: true});
		var f = Math.min(
			window.innerWidth / arScene.video.videoWidth,
			window.innerHeight / arScene.video.videoHeight
		);
		var w = f * arScene.video.videoWidth;
		var h = f * arScene.video.videoHeight;

		if (arController.orientation === 'portrait') {
			renderer.setSize(h,w);
			renderer.domElement.style.transformOrigin = '0 0';
			renderer.domElement.style.transform = 'rotate(-90deg) translateX(-100%)';
		} else {
			renderer.setSize(w,h);
		}
		document.body.appendChild(renderer.domElement);


		// Call arScene.renderOn on each frame,
		// it does marker detection, updates the Three.js scene and draws a new frame.
		//
		var tick = function() {
			requestAnimationFrame(tick);
			arScene.renderOn(renderer);
		};
		tick();

	}
});
</script>

See the demo in action. Best viewed on Firefox Android or Chrome Android. Also works on the desktop browsers, though it's a bit more hassle to interact with webcam video. Sadly, iOS doesn't support live camera feeds on websites, so you can't view this demo on iOS.

JSARToolKit and Three.js without the helpers

Okay, so that's how you'd do a Three.js scene with the helpers. Maybe you don't want to use the helpers for some reason. Or you want to use JSARToolKit with some other drawing library. Fair enough, let me walk you through how you'd use raw JSARToolKit with Three.js.

Doing the Three.js integration manually is not all that much more work; you just need a bunch of boilerplate. First off, we need to track the markers using a getMarker event listener. In the following, I'm going to keep track of the position and visibility of the barcode marker with id 20.

// Create a marker root object to keep track of the marker.
//
var markerRoot = new THREE.Object3D();

// Make the marker root matrix manually managed.
//
markerRoot.matrixAutoUpdate = false;

// Add a getMarker event listener that keeps track of barcode marker with id 20.
// 
arController.addEventListener('getMarker', function(ev) {
	if (ev.data.marker.idMatrix === 20) {

		// The marker was found in this video frame, make it visible.
		markerRoot.visible = true;

		// Copy the marker transformation matrix to the markerRoot matrix.
		markerRoot.matrix.elements.set(ev.data.matrix);

	}
});

// Add a cube to the marker root.
//
markerRoot.add( new THREE.Mesh(new THREE.BoxGeometry(1,1,1), new THREE.MeshNormalMaterial()) );

// Create renderer with a size that matches the video.
//
var renderer = new THREE.WebGLRenderer({ antialias: true });
renderer.setSize(video.videoWidth, video.videoHeight);
document.body.appendChild(renderer.domElement);

// Set up the scene and camera.
//
var scene = new THREE.Scene();
var camera = new THREE.Camera();
scene.add(camera);
scene.add(markerRoot);

// Make the camera matrix manually managed.
//
camera.matrixAutoUpdate = false;

// Set the camera projection matrix to the projection matrix
// computed from the AR camera parameters.
//
camera.projectionMatrix.elements.set(arController.getCameraMatrix());

// On each frame, detect markers, update their positions and
// render the frame on the renderer.
//
var tick = function() {
	requestAnimationFrame(tick);

	// Hide the marker, we don't know if it's visible in this frame.
	markerRoot.visible = false;

	// Process detects markers in the video frame and sends
	// getMarker events to the event listeners.
	arController.process(video);

	// Render the updated scene.
	renderer.render(scene, camera);
};
tick();

Don't let the above scare you; it's just a lot of comments for a few lines of code. That takes care of rendering the AR scene on the renderer and tracking a barcode marker. You probably also want to overlay the AR scene on top of the video feed. Let's add that. Nothing too complicated, just drawing a plane textured with the video before anything else.

// To display the video, first create a texture from it.
var videoTex = new THREE.Texture(video);

// Use linear downscaling for videoTex
// (otherwise it needs to be power-of-two sized and you
// need to generate mipmaps, which are kinda useless here)
videoTex.minFilter = THREE.LinearFilter;

// And unflip the video Y-axis.
videoTex.flipY = false;

// Then create a plane textured with the video.
var plane = new THREE.Mesh(
  new THREE.PlaneBufferGeometry(2, 2),
  new THREE.MeshBasicMaterial({map: videoTex, side: THREE.DoubleSide})
);

// The video plane shouldn't care about the z-buffer.
plane.material.depthTest = false;
plane.material.depthWrite = false;

// Create a scene and a camera to draw the video. Note the inverted
// top/bottom on the camera: together with videoTex.flipY = false,
// it draws the video right side up.
var videoScene = new THREE.Scene();
var videoCamera = new THREE.OrthographicCamera(-1, 1, -1, 1, -1, 1);
videoScene.add(videoCamera);
videoScene.add(plane);

// Set the renderer autoClear to false, otherwise it
// clears the canvas before each render call.
renderer.autoClear = false;

// Draw the videoScene before the AR scene.
var tick = function() {
	requestAnimationFrame(tick);

	markerRoot.visible = false;
	arController.process(video);

	// Clear the renderer before drawing the videoScene,
	// followed by the AR scene.
	renderer.clear();
	renderer.render(videoScene, videoCamera);
	renderer.render(scene, camera);
};
tick();

Not too bad! Just a lot of boilerplate. Oh, one more thing. The JSARToolKit ARController has built-in portrait-mode correction. So if you pass it a portrait-mode video, it rotates it 90 degrees so that the landscape camera parameters work properly. The result is that your renderer canvas ends up on its side, which is not what you want. Here's how to correct for it:

// Rotate the video plane and the renderer if the arController is in portrait mode.
if (arController.orientation === 'portrait') {
	plane.rotation.z = Math.PI/2;

	renderer.setSize(video.videoHeight, video.videoWidth);
	renderer.domElement.style.transformOrigin = '0 0';
	renderer.domElement.style.transform = 'rotate(-90deg) translateX(-100%)';
} else {
	renderer.setSize(video.videoWidth, video.videoHeight);	
}

How about doing things the ARToolKit way?

If you feel like this whole event loop thing is not really your thing, I hear you. It's not for everyone. So: JSARToolKit5 also exposes the C-style ARToolKit API. Because, hey, it's pretty neat! Here's how to do the marker processing loop with it:

// Detect barcode markers
arController.setPatternDetectionMode(artoolkit.AR_MATRIX_CODE_DETECTION);

// Process the video frame.
arController.detectMarker(video);

var markerMatrix = new Float32Array(12);
var glMatrix = new Float32Array(16);

// Get the number of detected markers from the ARController and iterate over them.
var markerNum = arController.getMarkerNum();
for (var i = 0; i < markerNum; i++) {

	var marker = arController.getMarker(i);

	// If we found the number 5 marker, let's get its transform matrix.
	if (marker.idMatrix === 5) {
		arController.getTransMatSquare(i, 1 /* marker width */, markerMatrix);

		// And convert it to a WebGL matrix.
		arController.transMatToGLMat(markerMatrix, glMatrix);

		// Do something with the glMatrix ...

	}
}
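
What you do with the glMatrix depends on your renderer. As a minimal sketch, you could copy it onto a manually-managed Three.js object, just like in the getMarker examples above:

// Assuming markerRoot is a THREE.Object3D with matrixAutoUpdate = false.
markerRoot.matrix.elements.set(glMatrix);
markerRoot.visible = true;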

If we saw the marker in the previous frame, we can pass the previous marker matrix as an input to the matrix calculation. Using the previous matrix as the basis for the current one gives more stable tracking.

// ... skip to the marker detection loop
if (marker.idMatrix === 5) {
	if (markerWasInPreviousFrame) {
		// Get the marker matrix using continuous tracking
		arController.getTransMatSquareCont(i, 1, markerMatrix, markerMatrix);
	} else {
		arController.getTransMatSquare(i, 1, markerMatrix);	
	}

	// ... The rest is as usual
}
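
Note that markerWasInPreviousFrame is a flag you maintain yourself. Here's a minimal sketch of tracking it across frames, using the same calls as above:

var markerWasInPreviousFrame = false;

var processFrame = function() {
	arController.detectMarker(video);

	var markerNum = arController.getMarkerNum();
	var foundMarker = false;

	for (var i = 0; i < markerNum; i++) {
		var marker = arController.getMarker(i);
		if (marker.idMatrix === 5) {
			if (markerWasInPreviousFrame) {
				// Use the previous matrix as the basis for more stable tracking.
				arController.getTransMatSquareCont(i, 1, markerMatrix, markerMatrix);
			} else {
				arController.getTransMatSquare(i, 1, markerMatrix);
			}
			arController.transMatToGLMat(markerMatrix, glMatrix);
			foundMarker = true;
		}
	}

	// Remember whether we saw the marker, for the next frame.
	markerWasInPreviousFrame = foundMarker;
};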

Emscripten? How do I debug that?

Oh yes, the JSARToolKit5 port of the ARToolKit library is powered by Emscripten, the amazing C/C++ to asm.js compiler. As a result, JSARToolKit5 runs at near-native speeds on Firefox, and not too shabbily on other browsers. And, better still, porting new features and improvements from the upstream library is just a compile away (well, you do need to add a few lines of bindings.)

Because of this, JSARToolKit comes in two variants. One is a debug build, with all the debug symbols intact, which makes life easier when debugging the C++ side of the codebase. The other is the minified, optimized build that loads and runs faster.

To use the debug build, load build/artoolkit.debug.js and js/artoolkit.api.js into your page. The artoolkit.api.js file contains the C++ <-> JS API bindings alongside the implementations of ARController and ARCameraParam.

<script src="build/artoolkit.debug.js"></script>
<script src="js/artoolkit.api.js"></script>

Conclusion

Now you know that JSARToolKit is a library that tracks AR markers in video and images. You also know that AR markers are special images that JSARToolKit can recognize, like the barcode markers we used here. And you know that you can use Three.js to draw 3D graphics on top of the AR markers in the video (and you certainly know how to make that happen, because you're awesome and read the code examples!)

I hope you take these few tidbits and use them to build a dazzlingly radiant future of AR applications that make everyone's life so much better that you can't even imagine how much better it will be (A LOT BETTER). Or at least have fun making some AR mobile web apps. You can't overestimate the importance of having fun when developing.

Sally forth! Augment reality!