Kinetic Interfaces – Intel RealSense for Web w/ Node.JS, Midterm (Kevin Li)

Realsense for Web and p5.js

 

Motivations

I wanted to explore depth-sensing cameras and technology. I knew that regular cameras output RGB images, and that we can use OpenCV or machine learning to process those images and learn basic features of a scene. For example, we can do blob or contour detection, which allows us to do hand or body tracking. We can also do color tracking, optical flow, and background subtraction in OpenCV, and we can apply machine learning models to do face detection, tracking, and recognition. Recently, pose or skeletal estimation has also become possible (PoseNet, OpenPose).

However, even with OpenCV and ML, grasping dimensions (or depth) is a very difficult problem. This is where depth sensors come in.

 

Pre-Work Resources

I gathered a variety of sources and material to read before I began experimenting. The links below constitute some of my research and are some of the readings I found most interesting and relevant.

Articles on pose estimation or detection using ML:

https://medium.com/@samim/human-pose-detection-51268e95ddc2

https://github.com/CMU-Perceptual-Computing-Lab/openpose

https://medium.com/tensorflow/real-time-human-pose-estimation-in-the-browser-with-tensorflow-js-7dd0bc881cd5

On Kinect:

http://pages.cs.wisc.edu/~ahmad/kinect.pdf (great in-depth on how Kinect works)

On Depth Sensing vs Machine Learning:

https://blog.cometlabs.io/depth-sensors-are-the-key-to-unlocking-next-level-computer-vision-applications-3499533d3246 (great article!)

On Stereo Vision Depth Sensing:

Building and calibrating a stereo camera with OpenCV (<50€)

https://github.com/IntelRealSense/librealsense/blob/master/doc/depth-from-stereo.md

On Using Intel Realsense API and more CV resources:

https://github.com/IntelRealSense/librealsense/wiki/API-How-To

https://github.com/jbhuang0604/awesome-computer-vision

 

Research Summary

I learned that while cutting-edge machine learning models can provide near real-time pose estimation, models are typically only trained to do the one thing they are good at, such as detecting a pose. Furthermore, a big problem is energy consumption and the requirement for fast graphics processing units.

Quality depth information, however, is much more raw in nature, and can make background removal, blob detection, point cloud visualization, scene measurement and reconstruction, as well as many other tasks, easier and more fun.

Depth sensors do this by adding a new channel of information, depth (D), for every pixel, which together makes up a depth map.

There are many different types of depth sensors, such as structured light, infrared stereo vision, and time-of-flight, and this article gives a really well-written overview of all of them:

https://blog.cometlabs.io/depth-sensors-are-the-key-to-unlocking-next-level-computer-vision-applications-3499533d3246

All in all, each has specific advantages and drawbacks. From this article, I knew the Kinect used structured light (http://pages.cs.wisc.edu/~ahmad/kinect.pdf), and I generally knew how the Kinect worked, as well as its depth quality, from previous experiments with it. I wanted to explore a new, much smaller depth sensor (it runs off USB) that uses a method known as infrared stereo vision (inspired by the human vision system) to derive a depth map. It relies on two cameras and calculates depth by estimating disparities between matching key-points in the left and right images.
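To make the disparity idea concrete, here is a tiny sketch of the textbook stereo-depth relation (this is the general formula the depth-from-stereo doc above walks through, not RealSense-specific code; the numbers in the example are made up):

// Depth from disparity: the farther apart the two views see a point
// (large disparity), the closer it is. baselineMeters is the distance
// between the cameras, focalLengthPx the focal length in pixels.
function depthFromDisparity(baselineMeters, focalLengthPx, disparityPx) {
  if (disparityPx === 0) return Infinity; // zero disparity = point "at infinity"
  return (baselineMeters * focalLengthPx) / disparityPx; // depth in meters
}

// e.g. a 5cm baseline, ~640px focal length, 20px disparity -> about 1.6m away
console.log(depthFromDisparity(0.05, 640, 20)); // 1.6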

I knew the RealSense library had an open source SDK (https://github.com/IntelRealSense/librealsense); however, it is written in C++, which means it's not the easiest to get started with, to compile, and to document. But recently they've released a Node.js wrapper, which I hoped would make things easier for me. One of my goals was to figure out how to use the library, but also to see if I could make it easier to get started with through a more familiar drawing library that we already know and use.

 

Process

Hour 1 – 4: Getting Intel RealSense Set Up, Downloading and Installing RealSense Viewer, Installing Node-LibRealSense library bindings and C++ LibRealSense SDK, Playing Around With Different Configuration Settings in Viewer

Hour 5: Opening and running the Node example code. I see a somewhat complicated example of sending RealSense data through WebSockets; it seems promising and I want to try to build my own.

Hour 6: Looking at different frontend libraries or frameworks (React, Electron) before deciding to just plunge in and write some code.

https://gist.github.com/polarizing/4463aacc88c58a9878cb180ad838c777

I'm able to open a context, look through the available devices and sensors, and get a specific sensor by name, either "Stereo Module" or "RGB Camera". Then I can get all the stream profiles for that sensor (there are a lot of profiles, depending on fps, resolution, and type, infrared or depth), but the most basic one I want is a depth stream at 1280×720 resolution and 30 fps.

I can open a connection to the sensor with .open(), which opens the subdevice for exclusive access.
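For reference, the enumeration flow looks roughly like the sketch below. It is based on my gist above, and I'm assuming the node-librealsense Context / sensor / stream-profile method and property names shown here; exact names may differ slightly between wrapper versions.

const rs2 = require('node-librealsense');

const ctx = new rs2.Context();
const sensors = ctx.querySensors();        // "Stereo Module", "RGB Camera", ...
const depthSensor = sensors[0];            // assuming the stereo module comes first

// Find the 1280x720 depth profile at 30 fps among the many available profiles.
// (streamType / width / height / fps as profile properties is an assumption.)
const profiles = depthSensor.getStreamProfiles();
const depthProfile = profiles.find((p) =>
  p.streamType === rs2.stream.STREAM_DEPTH &&
  p.width === 1280 && p.height === 720 && p.fps === 30
);

// Open the subdevice for exclusive access on that profile.
depthSensor.open(depthProfile);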

Hour 7: Lots of progress! I can start capturing frames by calling .start() and providing a callback function. A DepthFrame object is passed to this callback every frame, consisting of the depth data, a timestamp, and a frame count number. I can then use the Colorizer class that comes with the Node RealSense library to visualize the depth data by transforming it into RGB8 format.

This has a problem, though: the depth frame is 1280 × 720 = 921,600 pixels, and stored as RGB8 that becomes 921,600 × 3 = 2,764,800 bytes, or about 2.76 MB per frame. At 30 frames per second, this would be nearly 83 MB of data per second, probably way too much for streaming between applications. We can compress this using a fast image compression library called Sharp, and we get quite good results with it. Setting the image quality to 10, we get 23 KB per frame, or about 690 KB/s. Setting the image quality to 25 gets us 49 KB per frame, or about 1.5 MB/s (which is quite reasonable). Even at image quality 50, which is 76 KB per frame, we average about 2.2 MB/s. From this, I estimate it is quite reasonable to stream the depth data between local applications, with potential to even stream it over the Internet. I might try that next.
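Roughly, the capture-and-compress step looks like the sketch below. Again a sketch, not the exact gist code: I'm assuming the Colorizer.colorize() call and the frame's .data accessor, while the sharp usage is the standard raw-to-JPEG pipeline.

const sharp = require('sharp');
const colorizer = new rs2.Colorizer();

depthSensor.start((depthFrame) => {
  // Colorize the 16-bit depth frame into an RGB8 image (1280 * 720 * 3 bytes).
  const colorized = colorizer.colorize(depthFrame);

  sharp(Buffer.from(colorized.data), {           // .data assumed to be the raw RGB8 buffer
    raw: { width: 1280, height: 720, channels: 3 },
  })
    .jpeg({ quality: 25 })                       // ~49 KB per frame, ~1.5 MB/s at 30 fps
    .toBuffer()
    .then((jpegBuffer) => {
      // hand the compressed frame to the WebSocket layer (next step)
    });
});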

https://gist.github.com/polarizing/12aa720526c91596883da541e584e018

Current Backend Diagram

[Insert Image]

Hour 8 – 9: More progress. I got stuck in a few sticky spots but ended up getting through them, and now we have a quick and dirty working implementation of RealSense over WebSockets. I connected it using the ws WebSocket library for Node.
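The WebSocket side is the simplest part. A minimal sketch of the bridge using the ws package (the port number here is arbitrary):

const WebSocket = require('ws');
const wss = new WebSocket.Server({ port: 8080 });

// Broadcast each compressed depth frame to every connected client
// (p5.js sketches in the browser, other laptops on the network, etc.).
function broadcastFrame(jpegBuffer) {
  wss.clients.forEach((client) => {
    if (client.readyState === WebSocket.OPEN) {
      client.send(jpegBuffer); // sent as a binary message
    }
  });
}

// called from the RealSense frame callback above:
// .then((jpegBuffer) => broadcastFrame(jpegBuffer));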

Challenges Here + Blob

Hour 8 (backend w/ websockets): https://gist.github.com/polarizing/be9873a9a07d5df2155e7df436dc282d

Hour 9 (connecting w/ p5.js): https://editor.p5js.org/polarizing/sketches/Hyoci5V37

The videos below show the color depth image being processed directly in p5.js. The first uses a very simple color-matching algorithm (keeping blue pixels), which gives us a rudimentary depth-thresholding technique (this would be much easier to do directly in Node, since there we have the exact depth values; we will try that later). The second video shows another rudimentary technique: averaging the positions of the blue pixels to get the center of mass of the person in the video. Of course, this is all made much easier because we have the color depth image from RealSense. It would hardly be possible in a frontend library like p5.js without sending depth camera data across the network, since most computer vision libraries still exist only for the backend. This opens up some new possibilities for creative projects combining a depth camera with frontend creative-coding libraries, especially since JavaScript is so adept at real-time networking and interfacing.
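The core of both p5.js techniques is only a few lines. A sketch of the idea, assuming depthImg is a p5.Image decoded from the incoming WebSocket frames (see the linked sketch for the actual networking code):

function draw() {
  background(0);
  if (!depthImg) return;
  image(depthImg, 0, 0);
  depthImg.loadPixels();

  let sumX = 0, sumY = 0, count = 0;
  for (let y = 0; y < depthImg.height; y += 4) {      // sample every 4th pixel for speed
    for (let x = 0; x < depthImg.width; x += 4) {
      const i = 4 * (y * depthImg.width + x);
      const r = depthImg.pixels[i];
      const g = depthImg.pixels[i + 1];
      const b = depthImg.pixels[i + 2];
      // treat "blue enough" pixels as the depth band of interest
      // (which band blue maps to depends on the colorizer's color scheme)
      if (b > 150 && b > r && b > g) {
        sumX += x;
        sumY += y;
        count++;
      }
    }
  }

  if (count > 0) {
    noStroke();
    fill(255, 0, 0);
    ellipse(sumX / count, sumY / count, 20, 20);      // rough center of mass
  }
}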

Before moving on to play with more examples of depth (specifically point clouds, depth thresholding, and integrating OpenCV to do more fun stuff) and figuring out the best way to interface with all of this, I want to see if I can get the depth data sent through to Processing as well.

Hour 10 (Processing): I spent this hour researching ways to send blobs (binary large objects) over the network, settling on WebSockets or OSC, and looking into whether Java can actually decode blob objects. I decided to move on instead of continuing to work on this part.

Hour 11 and Hour 12

I was met with a few challenges. One of the main ones was struggling with asynchronous vs. synchronous frame polling. I did not know that the Node.js wrapper had two different types of calls to poll for frames: a synchronous, thread-blocking version, pipeline.waitForFrames(), and an asynchronous one, pipeline.pollForFrames(). The second, asynchronous version is what we want, but it means we need an event-loop timer (setInterval, or preferably something better like a draw() function) that calls pollForFrames roughly 30 times per second.
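A minimal sketch of the non-blocking version, using the pipeline calls named above (the ~33 ms interval approximates 30 polls per second; that pollForFrames returns nothing when no new frames are ready is an assumption here):

const pipeline = new rs2.Pipeline();
pipeline.start();

setInterval(() => {
  const frames = pipeline.pollForFrames(); // returns immediately instead of blocking
  if (frames && frames.depthFrame) {
    handleDepthFrame(frames.depthFrame);   // hypothetical handler for this sketch
  }
}, 33);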

Hour 13 and Hour 14

Streaming raw depth data, receiving it in p5, and processing the image texture as points to render a point cloud; lots of problems here.

I wanted to stream the raw depth data because, as of now, I have color depth data, and having to do post-processing on it is a pain. I wanted to send the depth data as a grayscale image with depth values encoded between 0 and 255, or as raw values that we can convert with a depth scale. This would be similar to how the Kinect operates.

I thought it would be pretty simple: just get the depth scale of the sensor, multiply each raw depth value at each pixel by it, and write the result back to the data buffer.
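In principle, the conversion I was aiming for is just this (with an assumed working range of 0 to 4 meters):

// Convert raw 16-bit depth units to meters via the sensor's depth scale,
// then map an assumed 0-4m working range onto 0-255 grayscale values.
function rawDepthToGray(rawDepth /* Uint16Array */, depthScale, maxMeters = 4) {
  const gray = new Uint8Array(rawDepth.length);
  for (let i = 0; i < rawDepth.length; i++) {
    const meters = rawDepth[i] * depthScale;   // depthScale is ~0.001 on many units
    const clamped = Math.min(meters, maxMeters);
    gray[i] = Math.round((clamped / maxMeters) * 255);
  }
  return gray;
}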

However, I was stuck for quite a long time because I was sending nonsensical values over WebSockets, which resulted in a very weird visualization in p5. I wish I had taken a video of this. Anyway, I believe I was doing something wrong with the depth scale and my understanding of how the raw values worked. I decided to go to sleep and think about it the next day.

Hour 15 and 16

When I woke up, I realized something simple that I had overlooked: I did not need to convert the raw values at all. I remembered that the RealSense Viewer had an option to view the depth data with a white-to-black color scheme, and I realized I could toggle that same color-scheme configuration on the Colorizer I was already calling to convert the depth map to an RGB image.

If I did: this.colorizer.setOption(rs.option.OPTION_COLOR_SCHEME, 2)

I could set the color scheme to an already-mapped 0–255 grayscale image. Then it would be the same process as sending the color image over. The simplicity of this approach was unbelievable: right after this realization I implemented it, tried it, and the results were immediate. I was able to send the depth image over to p5 and have p5 sample the depth image to render a point cloud (see the video below).
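The point-cloud sampling itself is short. A sketch of the p5.js side, assuming a WEBGL canvas and a grayscale depthImg received over the socket (the sign of the z offset may need flipping depending on the color scheme):

function draw() {
  background(0);
  if (!depthImg) return;
  depthImg.loadPixels();

  stroke(255);
  strokeWeight(2);
  const step = 8;                                   // sample every 8th pixel
  for (let y = 0; y < depthImg.height; y += step) {
    for (let x = 0; x < depthImg.width; x += step) {
      const i = 4 * (y * depthImg.width + x);
      const d = depthImg.pixels[i];                 // grayscale, so R == G == B
      if (d === 0) continue;                        // skip pixels with no depth
      push();
      translate(x - depthImg.width / 2, y - depthImg.height / 2, -d);
      point(0, 0);
      pop();
    }
  }
}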

Hour 17 and 18

I was able to link multiple computers to receive the depth information being broadcast from the central server (my laptop). I also took a demo video in a taxi, since I realized I was not limited to the desk: the RealSense is portable and powered off USB. I simply ran the p5 sketch and the local server, and it worked!

Taxi Test Video:

[Video: IMG_9288-1]

Hour 19 and 20

I used the remaining time to work on a quick presentation on my findings and work.

https://docs.google.com/presentation/d/1sAfK0ugYRg4xGDMlughxa2bBDK94lgAUkBgh6LAyLNI/edit?usp=sharing

Conclusions and Future Work

After doing this exploration, I think there is potential in further developing this into a more fully-fledged “Web Realsense” library similar to how the Kinectron (https://github.com/kinectron/kinectron) works for the Kinect.
