More than 5 years have passed since last update.

Making Augmented Reality app easily with Scenekit + Vuforia (in English)

Last updated at 2016-02-27Posted at 2016-02-27

Japanese version of the article is also available

TL;DR

Making Augmented Reality app easily with Scenekit + Vuforia

（See here for full movie）

Augmented Reality (AR) app are usualy made with OpenGL or Unity. However it's difficult to create complex 3D scene such as scene with physics simulation and so on. With SceneKit, you can create your own AR app with complex scene easily and it can be embedded in usual app using UIKit.

SceneKit

SceneKit is a high-level 3D graphics framework that helps you create 3D animated scenes and effects in your apps. It incorporates a physics engine, a particle generator, and easy ways to script the actions of 3D objects so you can describe your scene in terms of its content — geometry, materials, lights, and cameras — then animate it by describing changes to those objects.

SceneKit - Apple Developer

SceneKit is a 3D framework which allows you to create something like 3D video games. It was only for Mac but since iOS8 you can use it in iOS. Add SCNView as subview to your view and add models, lights, camera to SCNView's SCNScene property, The 3D scene is automatically drawn by the framework. It's really easy. And also it provides a way to combine the scene with OpenGL or Metal. So let's take advangtage of that and add SceneKit object over the video camera input from Vuforia.

Vuforia

Vuforiais a mobile vision platform for AR developed by Qualcomm and now it was acquired by PTC. Vuforia provides AR library for iOS, Android and Unity and you can use all fundamental features without a fee (with watermark shown only once a day) Details are here。

The SDK is provided as a static library (.a) so you can't use Swift. How sad.

Bulid samples and run.

Visit Vuforia Developer Portal, then register and login.
Download SDK and Sample projects.
Move the sample project to samples directory in SDK. (official really short description are here）.
Issue Licence Key which will be used in the app at LicenceManager.
Open SampleApplicationSession.mm in the sample and replace with QCAR::setInitParameters(mQCARInitFalgs, *Your License Key*). (Thanks to the detailed explanation in [Log] Xcode + vuforiaでARアプリを動かしてみる - しめ鯖日記 In Japanese).
Build the target with real device (If build a target with iOS Simulator, it fails).
Voilà! Now you can try the sample app at your own iPhone. There are many samples in the app:

Image Targets - AR with pictures already registered in the Vuforia cloud. It enables highly precise tracking by image processing in the cloud.
Cylinder Targets, Multi Targets - Use cylinder or cuboid as a marker. Even solid objects can be used as a marker.
User Defined Targets - Users can take a picture includes planar object and Use that as a marker. It's awesome! From now on, let's integrate SceneKit into User Defined Targets sample.

AR principles

Theory

The AR in this context means reconstructing the marker's relative 3D position towards a camera from the 2D image which taken by the camera. The process is:

Convert Marker Coordinate (World Coordinate; its origin is a maker's position. 3D.) to Camera Coordinate(its origin is camera's position. 3D), which means get the marker's relative position seen by the camera.
Convert Camera Coordinate (3D) to final 2D coordinate.

This process can be represented as a formula (using homogeneous coordinates):

Intrinsic parameter matrix (intrinsic matrix)

Intrinsic parameter matrix represents the camera's information such as focal length and center coordinate of the image. It's peculiar to the camera. So your iPhone 6s and my iPhone6s has same intrinsic parameter matrix. Vuforia has a bunch of cameras' intrinsic matrix.

If intrinsic matrix isn't right, drawn virtual objects and image from camera's perspective does not match and result in weird looking.

If you interested in calculate intrinsic matrix of unknown camera, look up Zhang's method, which is implemented in OpenCV, too.

Extrinsic parameter matrix (extrinsic matrix)

Extrinsic parameter matrix represents 3-dimentional rotation and translation. With extrinsic matrix, we can convert a coordinate of the object in Marker Coordinate to Camera Coordinate.

Extrinsic matrix differs by how user hold his/her camera or where marker located so we have to calculate every frame. Vuforia is really good at that. very precise.

Theoretically, extrinsic parameter matrix can be obtained by solving PnP(Perspective n-Point Problem), which is implemented in OpenCV, too.

Implementation

In reality, you don't have to do the matrix calculation every frame and leave that to the 3D library. For example, Using OpenGL, set perspective projection matrix using intrinsic matrix and model view matrix using extrins matrix, OpenGL reproduces the relation between the marker and the camera. Samples in Vuforia uses OpenGL and be implemented in this way.

This process is located at renderFrameQCAR method in UserDefinedTargetsEAGLView.mm

The process obtaining model view matrix from extrinsic parameter which caluclated by Vuforia is:

QCAR::Matrix44F modelViewMatrix = QCAR::Tool::convertPose2GLMatrix(result->getPose());`

And obtaining perspective projection matrix from intrinsic matrix is:

&vapp.projectionMatrix.data[0]

vapp is a instance of ApplicationSession. And rest of the codes are OpenGL mumbo jumbo.

AR with SceneKit!

Finally, let's draw 3D objects in renderFrameQCAR with SceneKit instead of OpenGL.

Add properties

First, add Scenekit.framework to the project.

To draw Scenekit's objects into Vuforia's OpenGL context which draws camera input, We can use SCNRenderer. So add it as property to UserDefinedTargetsEAGLView.h.


# import <SceneKit/SceneKit.h> // Import SceneKit

@interface UserDefinedTargetsEAGLView : UIView <UIGLViewProtocol, GLResourceHandler>

// so much codes...

@property (nonatomic, strong) SCNRenderer *renderer; // renderer
@property (nonatomic, strong) SCNNode *cameraNode; // node which holds camera
@property (nonatomic, assign) CFAbsoluteTime startTime; // elapsed time in 3D scene

And add datasouce which holds SCNScene (An object which represents scene drawn in SceneKit).

@protocol UserDefinedTargetsEAGLViewSceneSource <NSObject>

- (SCNScene *)sceneForEAGLView:(UIView *)view;

@end

@property (weak, nonatomic) id<UserDefinedTargetsEAGLViewSceneSource> sceneSource;

About SCNScene, there are a bunch of the resources on the Internet. So play around with it.
One thing, Scenekit's objects size should be 10~100 so it fits Vuforia's extrinsic parameter scale.

Initiate SCNRenderer

- (void)setupRenderer {
    self.renderer = [SCNRenderer rendererWithContext:context options:nil];
    self.renderer.autoenablesDefaultLighting = YES;
    self.renderer.playing = YES;
    
    if (self.sceneSource != nil) {
        self.renderer.scene = [self.sceneSource sceneForEAGLView:self];
        
        SCNCamera *camera = [SCNCamera camera];
        self.cameraNode = [SCNNode node];
        self.cameraNode.camera = camera;
        [self.renderer.scene.rootNode addChildNode:self.cameraNode];
        self.renderer.pointOfView = self.cameraNode;
    }
    
}

Call this method after UserDefinedTargetsEAGLView's - (id)initWithFrame:appSession:. The method is for initiate SCNRenderer and set OpenGL context which is created in the sample and add camera to the scene.

The rest is setting intrinsic matrix and extrinsic matrix to the camera. But. Here's a problem. The extrinsic matrix given by Vuforia converts object's coordinate to camera coordinate. In OpenGL, you can use it directly as a model view matrix. But SceneKit does not expose such matrix. So we have to caluclate the camera's position and rotation with respect to the marker. I really tried hard to work it out... and it turns out really simple solution. Just caluclate inverse matrix of the extrinsic matrix and set it to SCNCamera's transform property. That's it.

Caution: OpenGL and Scenekit matrix's are column-major. You have to be careful about that when you work with element of the matrix.


// Converts Vuforia matrix to SceneKit matrix
- (SCNMatrix4)SCNMatrix4FromQCARMatrix44:(QCAR::Matrix44F)matrix {
    GLKMatrix4 glkMatrix;
    
    for(int i=0; i<16; i++) {
        glkMatrix.m[i] = matrix.data[i];
    }
    
    return SCNMatrix4FromGLKMatrix4(glkMatrix);
    
}

// Calculate inverse matrix and assign it to cameraNode
- (void)setCameraMatrix:(QCAR::Matrix44F)matrix {
    SCNMatrix4 extrinsic = [self SCNMatrix4FromQCARMatrix44:matrix];
    SCNMatrix4 inverted = SCNMatrix4Invert(extrinsic); // inverse matrix!
    self.cameraNode.transform = inverted; // assign it to the camera node's transform property.
}

Then set up perspective projection matrix from the intrinsic matrix.

- (void)setProjectionMatrix:(QCAR::Matrix44F)matrix {
    self.cameraNode.camera.projectionTransform = [self SCNMatrix4FromQCARMatrix44:matrix];
}

Finally, lets call these methods in rendeFrameQCAR and render the scene. lets replace renderFrameQCAR like below:

// *** QCAR will call this method periodically on a background thread ***
- (void)renderFrameQCAR
{
    [self setFramebuffer];
    
    // Clear colour and depth buffers
    glClear(GL_COLOR_BUFFER_BIT | GL_DEPTH_BUFFER_BIT);
    
    // Render video background and retrieve tracking state
    QCAR::State state = QCAR::Renderer::getInstance().begin();
    QCAR::Renderer::getInstance().drawVideoBackground();
    
    glEnable(GL_DEPTH_TEST);
    // We must detect if background reflection is active and adjust the culling direction.
    // If the reflection is active, this means the pose matrix has been reflected as well,
    // therefore standard counter clockwise face culling will result in "inside out" models.
    glEnable(GL_CULL_FACE);
    glCullFace(GL_BACK);
    if(QCAR::Renderer::getInstance().getVideoBackgroundConfig().mReflection == QCAR::VIDEO_BACKGROUND_REFLECTION_ON)
        glFrontFace(GL_CW);  //Front camera
    else
        glFrontFace(GL_CCW);   //Back camera
    
    // Render the RefFree UI elements depending on the current state
    refFreeFrame->render();
    
    [self setProjectionMatrix:vapp.projectionMatrix]; // Actually you don't have to call that every frame. Because projection matrix does not change.
    
    
    for (int i = 0; i < state.getNumTrackableResults(); ++i) {
        // Get the trackable
        const QCAR::TrackableResult* result = state.getTrackableResult(i);
        //const QCAR::Trackable& trackable = result->getTrackable();
        QCAR::Matrix44F modelViewMatrix = QCAR::Tool::convertPose2GLMatrix(result->getPose()); // Obtain model view matrix
        
        ApplicationUtils::translatePoseMatrix(0.0f, 0.0f, kObjectScale, &modelViewMatrix.data[0]);
        ApplicationUtils::scalePoseMatrix(kObjectScale, kObjectScale, kObjectScale, &modelViewMatrix.data[0]);
        
        [self setCameraMatrix:modelViewMatrix]; // SCNCameraにセット
        [self.renderer renderAtTime:CFAbsoluteTimeGetCurrent() - self.startTime]; // Render objects into OpenGL context
        
        ApplicationUtils::checkGlError("EAGLView renderFrameQCAR");
    }
    
    glDisable(GL_DEPTH_TEST);
    glDisable(GL_CULL_FACE);
    
    QCAR::Renderer::getInstance().end();
    [self presentFramebuffer];

}

Voilà! With SceneKit, you cand do so much things, lighting, materials, physics simulation, user interaction...

Response to user tap

SCNRenderer conforms to SCNSceneRenderer protocol so to get user touch, you can use hitTest:options method. Give touch position as CGPoint to it, it returns SCNNode which corresponds to the point. But be careful, origin of coordinate system in OpenGL is at bottom left and in Retina display, the size of Viewports is doubled. so when you use UITapGestureRecognizer or something like that, you have to double x and y and subtract y from Viewports height, and then pass the point to hitTest:options.

References

藤本雄一郎, 青砥隆仁, 浦西友樹, 大倉史生, 小枝正直, 中島悠太, 山本豪志朗, OpenCV3プログラミングブック

You get articles that match your needs
You can efficiently read back useful information
You can use dark theme

What you can do with signing up