Kinetic Interfaces – Intel RealSense for Web w/ Node.JS, Midterm (Kevin Li)

Realsense for Web and p5.js

 

Motivations

I wanted to explore depth-sensing cameras and technology. I knew that regular cameras output RGB images, and that we can use OpenCV or machine learning to process those images and extract basic features of a scene. For example, we can do blob or contour detection, which allows us to do hand or body tracking. We can also do color tracking, optical flow, and background subtraction in OpenCV. We can apply machine learning models to do face detection, tracking, and recognition. Recently, pose or skeletal estimation has also become possible (PoseNet, OpenPose).

However, even with OpenCV and ML, grasping dimensions (or depth) is a very difficult problem. This is where depth sensors come in.

 

Pre-Work Resources

I gathered a variety of sources and material to read before I began experimenting. The links below constitute some of my research and include the readings I found most interesting and relevant.

Articles on pose estimation or detection using ML:

https://medium.com/@samim/human-pose-detection-51268e95ddc2

https://github.com/CMU-Perceptual-Computing-Lab/openpose

https://medium.com/tensorflow/real-time-human-pose-estimation-in-the-browser-with-tensorflow-js-7dd0bc881cd5

On Kinect:

http://pages.cs.wisc.edu/~ahmad/kinect.pdf (great in-depth on how Kinect works)

On Depth Sensing vs Machine Learning:

https://blog.cometlabs.io/depth-sensors-are-the-key-to-unlocking-next-level-computer-vision-applications-3499533d3246 (great article!)

On Stereo Vision Depth Sensing:

Building and calibrating a stereo camera with OpenCV (<50€)

https://github.com/IntelRealSense/librealsense/blob/master/doc/depth-from-stereo.md

On Using Intel Realsense API and more CV resources:

https://github.com/IntelRealSense/librealsense/wiki/API-How-To

https://github.com/jbhuang0604/awesome-computer-vision

 

Research Summary

I learned that while cutting-edge machine learning models can provide near real-time pose estimation, models are typically only trained to do the one thing they are good at, such as detecting a pose. Furthermore, big problems remain around energy consumption and the need for fast graphics processing units.

Quality depth information, however, is much more raw in nature, and can make background removal, blob detection, point cloud visualization, scene measurement and reconstruction, and many other tasks easier and more fun.

Depth sensors do this by adding a new channel of information, a depth value (D), for every pixel, which together make up a depth map.

There are many different types of depth sensors, such as structured light, infrared stereo vision, and time-of-flight, and this article gives a well-written overview of all of them:

https://blog.cometlabs.io/depth-sensors-are-the-key-to-unlocking-next-level-computer-vision-applications-3499533d3246

All in all, each has specific advantages and drawbacks. Through this article, I knew the Kinect used structured light (http://pages.cs.wisc.edu/~ahmad/kinect.pdf), and I generally knew how the Kinect worked, as well as its depth quality, from previous experiments with it. I wanted to explore a new, much smaller depth sensor (it runs over USB) that uses a method known as infrared stereo vision (inspired by the human visual system) to derive a depth map. It relies on two cameras and calculates depth by estimating disparities between matching key points in the left and right images.

I knew the RealSense library had an open source SDK (https://github.com/IntelRealSense/librealsense); however, it is written in C++, which means it is not the easiest to get started with, to compile, or to document. Recently, though, they released a Node.js wrapper, which I hoped would make things easier. One of my goals is to figure out how to use the library, but also to see if I can make it easier to get started with through a more familiar drawing library that we already know and use.

 

Process

Hour 1 – 4: Getting Intel RealSense Set Up, Downloading and Installing RealSense Viewer, Installing Node-LibRealSense library bindings and C++ LibRealSense SDK, Playing Around With Different Configuration Settings in Viewer

Hour 5: Opening and running the Node example code. I see a somewhat complicated example of sending RealSense data through WebSockets; it seems promising, and I want to try to build my own.

Hour 6: Looking at different frontend libraries or frameworks (React, Electron) before deciding to just plunge in and write some code.

https://gist.github.com/polarizing/4463aacc88c58a9878cb180ad838c777

I’m able to open a context, look through the available devices and sensors, and get a specific sensor by name, either “Stereo Module” or “RGB Camera”. Then I can get all the stream profiles for that sensor (there are a lot of profiles, depending on fps, resolution, and type, infrared or depth), but the most basic one I want is a depth stream at 1280x720 and 30 fps.

I can open a connection to the sensor with .open(), which opens the subdevice for exclusive access.

Hour 7: Lots of progress! I can start capturing frames by calling .start() and providing a callback function. A DepthFrame object is passed to this callback every frame, which consists of the depth data, a timestamp, and a frame count. I can then use the Colorizer class that comes with the Node RealSense library to visualize the depth data by transforming it into RGB8 format. This has a problem, though: the depth frame is 1280 * 720 = 921,600 pixels, and stored as RGB8 that becomes 921,600 * 3 = 2,764,800 bytes, or about 2.76 MB per frame. At 30 frames per second, this would be nearly 83 MB of data per second, probably way too much for streaming anything between applications. We can compress this using a fast image compression library called Sharp, and we get quite good results with it. Setting the image quality to 10, we get 23 KB per frame, or 690 KB/s. Setting the image quality to 25 gets us 49 KB per frame, or 1.5 MB/s (which is quite reasonable). Even at image quality 50, which is 76 KB per frame, we average 2.2 MB/s. From this, I estimate it is quite reasonable to stream the depth data between local applications, with the potential to even stream it over the Internet. I might try that next.

https://gist.github.com/polarizing/12aa720526c91596883da541e584e018
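The compression step, roughly, looks like the sketch below (not the exact code from the gist above; it assumes the colorized frame's RGB8 data is available as a Node Buffer called rgb8Buffer):

const sharp = require('sharp');

// 1280 * 720 pixels * 3 bytes (RGB8) = 2,764,800 bytes, ~2.76 MB per raw frame
async function compressDepthFrame(rgb8Buffer) {
  return sharp(rgb8Buffer, {
    raw: { width: 1280, height: 720, channels: 3 },
  })
    .jpeg({ quality: 25 }) // ~49 KB per frame, ~1.5 MB/s at 30 fps
    .toBuffer();
}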

Current Backend Diagram

[Insert Image]

Hour 8 – 9: More progress. I got stuck in a few sticky spots but worked through them, and now I have a quick and dirty working implementation of RealSense over WebSockets. I connected it using the ws WebSocket library for Node.

Challenges Here + Blob

Hour 8 (backend w/ websockets): https://gist.github.com/polarizing/be9873a9a07d5df2155e7df436dc282d
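The WebSocket side boils down to something like the sketch below (port 8080 and the broadcastFrame name are placeholders, not the exact code in the gist; it assumes compressDepthFrame from the earlier snippet):

const WebSocket = require('ws');
const wss = new WebSocket.Server({ port: 8080 });

// broadcast each compressed depth frame to every connected p5.js client
function broadcastFrame(jpegBuffer) {
  for (const client of wss.clients) {
    if (client.readyState === WebSocket.OPEN) {
      client.send(jpegBuffer);
    }
  }
}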

Hour 9 (connecting w/ p5.js): https://editor.p5js.org/polarizing/sketches/Hyoci5V37

The video below shows the color depth image being processed directly in p5.js using a very simple color-matching algorithm (showing blue pixels), which gives us a rudimentary depth-thresholding technique (this would be much easier to do directly in Node, since there we have the exact depth values; we will try that later). The second video shows another rudimentary technique: averaging the blue pixels to get the center of mass of the person in the video. Of course, this is all made much easier because we have the depth color image from RealSense. It would not really be possible, at least in a frontend library like p5.js, without sending depth camera information across the network, since most computer vision libraries still only exist for the backend. This opens up some new possibilities for creative projects combining a depth camera with frontend creative coding libraries, especially since JavaScript is so adept at real-time networking and interfacing.
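Both techniques boil down to a few lines of pixel looping in p5.js. The sketch below is an approximation of what the videos show, not the exact sketch linked above (depthImg stands in for the frame received over the WebSocket, already loaded as a p5.Image):

function draw() {
  image(depthImg, 0, 0);
  depthImg.loadPixels();

  let sumX = 0, sumY = 0, count = 0;
  for (let y = 0; y < depthImg.height; y += 4) {      // sample every 4th pixel
    for (let x = 0; x < depthImg.width; x += 4) {
      const i = 4 * (x + y * depthImg.width);
      const r = depthImg.pixels[i];
      const g = depthImg.pixels[i + 1];
      const b = depthImg.pixels[i + 2];
      // crude "is this pixel blue?" test = crude depth threshold
      if (b > 150 && b > r + 50 && b > g + 50) {
        sumX += x;
        sumY += y;
        count++;
      }
    }
  }

  if (count > 0) {
    // the average position of the blue pixels approximates the person's center of mass
    fill(255, 0, 0);
    noStroke();
    ellipse(sumX / count, sumY / count, 20, 20);
  }
}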

Before moving on to more depth examples (point clouds, depth thresholding, integrating OpenCV for more fun stuff) and figuring out the best way to interface with all of this, I want to see if I can get the depth data sent through to Processing as well.

Hour 10 (Processing): I spent this hour researching ways to send blobs (binary large objects) over the network, settling on WebSockets or OSC, and looking into whether Java can actually decode blob objects. I decided to move on instead of continuing to work on this part.

Hour 11 and Hour 12

I was met with a few challenges. One of the main ones was struggling with asynchronous versus synchronous frame polling. I did not know that the Node.js wrapper has two different calls for polling frames: a synchronous, thread-blocking version, pipeline.waitForFrames(), and an asynchronous one, pipeline.pollForFrames(). The async version is what we want, but it requires an event-loop timer (setInterval, or preferably something better like a draw() function) that calls pollForFrames roughly 30 times per second.
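In sketch form, the non-blocking approach looks something like this (pipeline and handleFrames are assumed to be set up elsewhere, and the exact return value of pollForFrames when no frame is ready is an assumption):

setInterval(() => {
  const frames = pipeline.pollForFrames(); // returns immediately instead of blocking
  if (!frames) return;                     // no new frameset yet, try again next tick
  handleFrames(frames);                    // colorize, compress, broadcast, etc.
}, 1000 / 30);                             // ~30 polls per second to match the 30 fps stream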

Hour 13 and Hour 14

Streaming raw depth data, receiving it in p5, and processing the image texture as points to render a point cloud. Lots of problems here.

I wanted to stream the raw depth data because, as of now, I have the colorized depth data, and having to do post-processing on it is a pain. I wanted to send the depth data either as an image with depth values encoded from 0 to 255 (a grayscale image) or as raw values that can be converted with a depth scale. This would be similar to how the Kinect operates.

I thought it would be pretty simple: get the depth scale of the sensor, multiply each raw depth value at each pixel by it, and write the result back to the data buffer.

However, I was stuck for quite a long time because I was sending nonsensical values over WebSockets, which resulted in a very weird visualization in p5. I wish I had taken a video of it. Anyway, I believe I was doing something wrong with the depth scale and my understanding of how the raw values worked. I decided to go to sleep and think about it the next day.

Hour 15 and 16

When I woke up, I realized something simple that I had overlooked: I did not need to convert the raw values at all, because the RealSense Viewer has an option to view the depth data as a white-to-black color scheme. I could toggle that color-scheme configuration when calling the colorizer() function to convert the depth map to an RGB image.

If I did: this.colorizer.setOption(rs.option.OPTION_COLOR_SCHEME, 2)

I could set the color scheme to an already-mapped 0-255 grayscale image. Then it would be the same process as sending the color image over. The simplicity of this approach was unbelievable: right after this realization I implemented it, tried it, and the results were immediate. I was able to send the depth image over to p5 and have p5 sample the depth image to render a point cloud (see the video below).
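Sampling the grayscale depth image into a point cloud is only a few lines in p5.js. A rough sketch, assuming WEBGL mode and that depthImg is the grayscale frame received over the WebSocket:

function draw() {
  background(0);
  depthImg.loadPixels();
  stroke(255);
  strokeWeight(2);
  const skip = 8;                          // sample every 8th pixel to keep it fast
  for (let y = 0; y < depthImg.height; y += skip) {
    for (let x = 0; x < depthImg.width; x += skip) {
      const i = 4 * (x + y * depthImg.width);
      const d = depthImg.pixels[i];        // 0-255 grayscale depth value
      if (d === 0) continue;               // 0 usually means "no depth data"
      // offset each sample in z proportionally to its depth value
      point(x - depthImg.width / 2, y - depthImg.height / 2, map(d, 0, 255, -400, 0));
    }
  }
}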

Hour 17 and 18

I was able to link multiple computers to receive the depth information being broadcast from the central server (my laptop). I also took a demo video in a taxi once I realized I was not limited in where I could use the RealSense, since it is portable and powered over USB. I simply ran the p5 sketch and the local server, and it worked!

Taxi Test Video:


Hour 19 and 20

I used the remaining time to work on a quick presentation on my findings and work.

https://docs.google.com/presentation/d/1sAfK0ugYRg4xGDMlughxa2bBDK94lgAUkBgh6LAyLNI/edit?usp=sharing

Conclusions and Future Work

After doing this exploration, I think there is potential in further developing this into a more fully-fledged “Web Realsense” library similar to how the Kinectron (https://github.com/kinectron/kinectron) works for the Kinect.

Midterm Project – A Galaxy of Code; De Angelis (Moon)

A Galaxy of Code 

For our midterm project, Olesia Ermilova and I tried to recreate a galaxy in Processing. Our project was mainly inspired by one of Olesia's animation projects, in which she designed a space adventure made up of black and white planets and stars. Below you can see a video of her original project.

We then decided to make a similar version of this project, but with a high level of interaction. Our initial idea for interactivity involved face recognition: we wanted our code to detect the user's face and, according to the user's position on the screen, rotate the galaxy so that it followed the user's movement.

The interaction was inspired by a commercial game called Loner. The main idea of the game is to use your body to navigate through the game’s interface. Here is a video, for a more thorough understanding.

In short, our main idea was to combine the interaction of Loner with Olesia's galaxy project. In this way, we would improve the visuals of an already successful game by adding a more complex and aesthetic setting to the world through which the user navigates. Our game would bring tranquility to the user through its easy interaction and galaxy setting. It was also partly inspired by our childhoods, when our mothers would stick glow-in-the-dark stars on our ceilings.

Unfortunately, our interaction did not go as planned. As I mentioned before, in order to create a feeling of depth in our visuals, we needed to change our renderer from 2D to 3D. This would allow us to play not only with the x and y scale of our code but with the z location of our objects as well. The problem lay in the fact that 3D graphics are not compatible with Processing's camera library, so interaction through face recognition was not an option.

Here is a video of our final project, in which you can see the final visuals and how we used the z-scale to create a feeling of depth in our galaxy.

To create our final project, we played with several of the Processing skills we have learned so far this semester. For starters, we created a class for each object, which helped us organize our code.

In these classes, we defined the functions we wanted each object to be subject to. For instance, we had a restart() function in all of them, which allowed every object to be recreated once it went off screen: as soon as an object reached its maximum z-value, it would reappear at the back of our galaxy, constantly recreating itself.

We played around with the planets', stars', and moons' speeds in order to move the items around instead of having them stay fixed in one place.

Our Planet constructor, for instance, was as follows:

Planet(float _x, float _y, float _z) {
  x = _x;
  y = _y;
  z = _z;
  rad1 = random(15, 25);    // radius of the planet body
  rad2 = random(40, 50);    // horizontal radius of the ring
  rad3 = random(10, 15);    // vertical radius of the ring
  velX = 0;
  velY = 0;
  velZ = random(0.5, 1.0);  // each planet drifts forward at its own speed
}

and the characteristics of each planet were the following:

void display(float offsetX, float offsetY) {
  pushStyle();
  pushMatrix();

  translate(x + offsetX, y + offsetY, z);

  // planet body
  fill(255);
  stroke(255);
  strokeWeight(5);
  ellipse(0, 0, rad1*2.5, rad1*2.5);

  // ring
  noFill();
  ellipse(0, 0, rad2*2, rad3*2);

  popMatrix();
  popStyle();
}

In order to move the planets around the screen, we created our void fast() function:

void fast() {
  // velocities are scaled up to push everything past the viewer quickly
  x += velX * 10;
  y += velY * 10;
  z += velZ * 10;
}

void restart() {
  // once an object flies past the camera, send it to the back of the galaxy
  if (z > 1000) {
    z = -1000;
  }
}

and finally, our restart() function, which was explained earlier.

The moon and star classes had similar attributes; you can find our complete code here.

Now on to interaction. For the interactive aspect, we decided to switch from face recognition to Leap Motion. We used Processing's Leap Motion library to get callbacks with finger positions and mapped each planet's x and y location to a finger's position. Therefore, a planet would be created at each finger's position on the screen.

According to the feedback we received from the panel, this interaction could be significantly improved. They did not like that the user always had exactly five planets, since it took away from the galaxy's realism. Five was an awkward number of planets to have, and the fact that they all stayed clustered together reduced the realistic feeling of our setting.

We also included a hand-grab gesture in our code. Whenever the user closed their fist, the velocity of all our objects was multiplied by 10, giving the user the feeling that they were navigating through space and could alter the navigation to their preference.
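A rough sketch of how this Leap Motion input can be wired up is below, assuming the LeapMotionForProcessing (de.voidplus.leapmotion) library; the variable names are illustrative rather than our exact code:

import de.voidplus.leapmotion.*;

LeapMotion leap;
float speedMultiplier = 1;

void setup() {
  size(800, 600, P3D);
  leap = new LeapMotion(this);
}

void draw() {
  background(0);
  speedMultiplier = 1;
  for (Hand hand : leap.getHands()) {
    // a closed fist makes everything rush past ten times faster
    if (hand.getGrabStrength() > 0.9) {
      speedMultiplier = 10;
    }
    for (Finger finger : hand.getFingers()) {
      PVector pos = finger.getPosition();  // mapped to sketch coordinates by the library
      ellipse(pos.x, pos.y, 40, 40);       // the real sketch draws a planet here
    }
  }
}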

In the future, we would like to modify our project and turn it into a more immersive experience. By this we mean discarding the Leap Motion functions and using software that receives input from the user's entire body, such as the Microsoft Kinect. Perhaps we will develop this idea further for our final project.

Kinetic Interfaces – Midterm Project (Francie)

Among the assignments in the first half of the semester, I was most interested in exploring the webcam and playing with pixel manipulation. Inspired by the built-in widget on my MacBook, I came up with the idea of combining pixel manipulation with the tile game. Basically, a flat image is evenly divided into 16 tiles, and one of them is replaced by an empty position. At each step, a tile next to the empty space can be moved into it, and the disordered tiles can be reorganized this way. In my project, I want to replace the still image with real-time video shot by the webcam.

 

I started by capturing the video images from the webcam and trying to divide the pixels into tiles. It turned out that this step required a large amount of mathematical calculation. Let's use the top-left tile as an example. First of all, I need to give a range for both the x and y values; here the coordinates are limited to 0 < x < width/4 and 0 < y < height/4. Then I take out the pixels from the original webcam image using the formula index = x + y * width. Now I have picked out the indices of the pixels that belong to the first of the sixteen tiles. Next, I display these pixels on a new canvas with 1/4 the width and 1/4 the height. By transforming the formula into _index = x + y * width/4, I can cut out this 1/16 of the video image and put its pixels into a container called img11.

After figuring out the first tile, I derived the formulas for the other parts from it. The x and y values of the remaining tiles need to be restored by subtracting their displacements from the top left. Here is how to get the coordinates of the second horizontal row: the y values in the formula become y - height/4, while the x values are individually adjusted to x - width/4, x - width/2, and x - width*3/4 according to their positions.
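Putting the two formulas together, the whole extraction can be written as one helper that copies any tile (col, row) into a quarter-size image. This is a simplified sketch of the idea rather than my exact code (tileImg and grabTile are illustrative names):

import processing.video.*;

Capture cam;
PImage tileImg;

void setup() {
  size(640, 480);
  cam = new Capture(this, width, height);
  cam.start();
  tileImg = createImage(width/4, height/4, RGB);
}

// copy the pixels of tile (col, row), with col and row from 0 to 3, into tileImg
void grabTile(int col, int row) {
  cam.loadPixels();
  tileImg.loadPixels();
  for (int y = 0; y < height/4; y++) {
    for (int x = 0; x < width/4; x++) {
      int srcX = x + col * width/4;           // position inside the full webcam frame
      int srcY = y + row * height/4;
      int index  = srcX + srcY * width;       // index in the full frame
      int _index = x + y * (width/4);         // index in the small tile image
      tileImg.pixels[_index] = cam.pixels[index];
    }
  }
  tileImg.updatePixels();
}

void draw() {
  if (cam.available()) cam.read();
  grabTile(0, 0);          // the top-left tile from the example above
  image(tileImg, 0, 0);
}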

Another challenge in this project was switching the positions of the tiles. I fill the bottom-right tile with white instead of updating its pixels from the video, so it looks like an empty space. To simplify the interaction, I use the keyboard to control the movement; a "switch case" statement lets me give separate instructions for the different arrow keys. However, I was stuck on this step for a long time because the printed coordinates were correct but the images did not change at all. Here I must credit my friend Joe, who helped me figure out the solution. On the one hand, it is important to pay attention to the order of the code: the values assigned to the variables kept changing, and I had to be clear about what each variable meant on each line.
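The swap logic itself can be boiled down to the sketch below (not my exact code): the 16 tiles sit in an array, one slot is the empty space, and each arrow key slides a neighboring tile into it.

PImage[] tiles = new PImage[16];
int emptyIndex = 15;                // the bottom-right tile starts as the empty space

void keyPressed() {
  int target = -1;                  // index of the tile that will slide into the empty slot
  switch (keyCode) {
    case LEFT:  if (emptyIndex % 4 < 3) target = emptyIndex + 1; break;  // tile to the right slides left
    case RIGHT: if (emptyIndex % 4 > 0) target = emptyIndex - 1; break;  // tile to the left slides right
    case UP:    if (emptyIndex / 4 < 3) target = emptyIndex + 4; break;  // tile below slides up
    case DOWN:  if (emptyIndex / 4 > 0) target = emptyIndex - 4; break;  // tile above slides down
  }
  if (target != -1) {
    PImage temp = tiles[emptyIndex];
    tiles[emptyIndex] = tiles[target];
    tiles[target] = temp;
    emptyIndex = target;
  }
}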

On the other hand, the declaration of the array also confused me a lot. My friend used the method below to finally make the tiles movable, but I still do not understand why the extra assignment is necessary.

And there is one more mysterious issue. In the beginning I created the images line by line, which looked a little wordy, so I used a for loop to generate them and make the code cleaner. But somehow it no longer worked. I feel that the two approaches are the same, yet only the verbose one is valid. Why is the for loop problematic here?

A demo of the tile game is below:

I really love this project, and I want to add more creativity to this classic game. Currently the tiles do not appear randomly, and I have to scramble them manually before I start to play, so it would be better if I could randomize the tiles. I also plan to improve the interaction by using a Leap Motion or Kinect rather than the keyboard. Moreover, I would like to add 3D perspective and projection mapping for a more attractive user experience.

Kinetic Interfaces (Midterm): A Band – Sherry

Partner: Peter

Ideation: After a brief discussion, Peter and I agreed on doing an instrument simulation using Leap Motion, but later I found this fun project in Chrome Music Lab and came up with the idea of a choir conductor with Leap Motion. When we met with Moon, he suggested that we combine the two ideas with the help of the oscP5 library, so we ended up creating a band in which a conductor controls several musical instruments, with each conductor/instrument running in its own Processing sketch with its own Leap Motion.

Implementation: Due to time constraints, we only had one kind of instrument, a guitar, communicating with the conductor. Peter was in charge of the implementation of the conductor, and I worked on the guitar. We both did some research and got a basic understanding of how oscP5 works.

I got my inspiration from GarageBand and tried to draw a horizontal guitar interface, as shown in the picture above, but during testing I found that the strings were too close together and the accuracy was very low; it was difficult to pluck a particular string. Therefore I decided to switch to a vertical interface:

Since the Leap Motion has a wider detection range (it is more sensitive) along the x-axis, the accuracy is higher and the user experience is better. The strings have different stroke weights to imitate a real guitar.

Above was my first version of the string-triggering code. Under this approach, however, messages are continuously printed out (and the sound file is played again and again) if my finger stays in range, which wouldn't happen on a real guitar. To solve this problem, I changed the algorithm for determining string plucking.

The main idea is to figure out which strings lie between the previous and current finger positions and play the corresponding sounds in order. I denote the six strings with indices 0 to 5; the floats "start" and "end" are then the indices corresponding to the previous and current finger positions. If start is greater than end, the user swiped to the left, and the strings in between are triggered from right to left, and vice versa. With this algorithm, holding the finger on a string doesn't trigger a sound, making the experience more realistic.
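A simplified sketch of that logic (not my exact code; stringIndexAt() and playString() stand in for the real mapping and Minim playback):

// six strings evenly spaced across the sketch width
int stringIndexAt(float x) {
  return constrain(int(x / (width / 6.0)), 0, 5);
}

void playString(int s) {
  println("pluck string " + s);     // the real sketch plays a Minim sound here
}

void pluck(float prevX, float currX) {
  int start = stringIndexAt(prevX);
  int end   = stringIndexAt(currX);
  if (start == end) return;                       // holding still on a string plays nothing
  if (start < end) {                              // swiped to the right
    for (int s = start + 1; s <= end; s++) playString(s);
  } else {                                        // swiped to the left
    for (int s = start - 1; s >= end; s--) playString(s);
  }
}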

I introduced the Leap Motion's z-axis to simulate the finger being on or above the strings. When z is less than the threshold (the user's finger moving closer to his or her body), the dot that shows the position of the user's index finger is white, indicating it's above the strings and won't trigger any sound as it moves. When z is greater than the threshold, the dot becomes red and sounds will be triggered.

String vibration effect and sine waves were added to enhance the visual experience.

Video demo of the leap motion guitar:

 

Then I started to study the oscP5 library. Thanks to the "oscP5sendreceive" example by Andreas Schlegel and Noguchi's example, I created a very basic data communication program: mouseX and mouseY in osc1 (the left sketch) were sent to osc2 (the right sketch) and used to draw an ellipse on the canvas.
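The core of that test looks roughly like this (a reconstruction, not the exact sketches; the port numbers and the /mouse address are arbitrary choices):

// ----- osc1 (sender) -----
import oscP5.*;
import netP5.*;

OscP5 oscP5;
NetAddress osc2Address;

void setup() {
  size(400, 400);
  oscP5 = new OscP5(this, 12000);                    // osc1 listens on port 12000
  osc2Address = new NetAddress("127.0.0.1", 12001);  // osc2 listens on port 12001
}

void draw() {
  OscMessage msg = new OscMessage("/mouse");
  msg.add(mouseX);
  msg.add(mouseY);
  oscP5.send(msg, osc2Address);
}

// ----- osc2 (receiver): the matching oscEvent() -----
void oscEvent(OscMessage msg) {
  if (msg.checkAddrPattern("/mouse")) {
    ellipse(msg.get(0).intValue(), msg.get(1).intValue(), 20, 20);
  }
}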

Later I met with Peter; he had used the oscP5tcp example for the conductor, so we decided to use TCP for both. Initially we planned to pass three parameters: a volume modifier, a frequency modifier, and a mute boolean, but we ran into two problems. Because of the limitations of the Minim library, we couldn't change the volume and frequency of a sound file directly. After several trials we managed to modify the volume using setGain() instead of setVolume(), but unfortunately we could do nothing about the frequency.
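For reference, the volume control ended up looking something like the sketch below; setGain() works in decibels, and the file name and mapping range here are assumptions rather than our exact values:

import ddf.minim.*;

Minim minim;
AudioPlayer guitarString;

void setup() {
  minim = new Minim(this);
  guitarString = minim.loadFile("string1.mp3");   // placeholder file name
}

// volumeModifier comes from the conductor, e.g. 0.0 (silent) to 1.0 (full)
void applyVolume(float volumeModifier) {
  float gain = map(volumeModifier, 0, 1, -40, 0); // -40 dB is close to silent
  guitarString.setGain(gain);
}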

Final demo:

Guitar:
Index finger: swipes horizontally above the Leap Motion to pluck the strings; moves back towards the body to move away from the strings.
Dot on the screen: red means available, and sounds will trigger when it moves across strings; white means unavailable, either muted by the conductor or too close to the body.

Conductor:
Hand: moves up and down in a given instrument's section of the screen to increase or decrease its volume; grab to mute that instrument.

Feedback: Professor Chen brought up the "why" question, which I think is quite important and deserves further reflection. I agree with her that the idea of the conductor actually having control over the other users is great, but I can't really answer why we need a Leap Motion simulating a real instrument when the experience of playing a physical instrument is already good. I'm thinking of keeping the technical part but wrapping it in a different idea that is more interesting or more meaningful (though I have no concrete idea for now).

Kinetic Interfaces: Midterm – Collaborative Musical System (Peter)

Project Idea

The idea of this project is to create a collaborative music system with kinetic interfaces. The system consists of a conductor, several instruments, and Leap Motions as inputs, all connected over the network. To be more specific, the conductor monitors the music data from the instruments and controls their musical features, while each instrument is played by a user on a digital instrument interface with a Leap Motion. In general, the project aims to create a system with which users can make music cooperatively, and it explores what collaborative computing can do for musical or even artistic creation.

Network

The conductor and the virtual instruments are connected over the network. To achieve this, we used a Processing library called oscP5. After experimenting with the multiple example connection methods provided by the library, we decided that oscP5's TCP connection is the most suitable for this project. Using the TCP connection, we are able to make the server and the clients respond to each other in real time. More specifically, the clients repeatedly send pings to the server (every 0.5 seconds), and the server acknowledges the pings and responds with the musical-feature data for the clients. This style of connection also allows the system to track which clients exist: if a node stops pinging the server for long enough, the server deletes it from its list and modifies the interface accordingly.
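On the instrument side, the ping loop is only a few lines. The sketch below follows the structure of oscP5's TCP examples; the port, the /ping address, and the instrument name are placeholders rather than our exact values:

import oscP5.*;
import netP5.*;

OscP5 client;
int lastPingTime = 0;

void setup() {
  size(200, 200);
  // connect to the conductor, which runs: new OscP5(this, 11000, OscP5.TCP)
  client = new OscP5(this, "127.0.0.1", 11000, OscP5.TCP);
}

void draw() {
  // ping the conductor every 500 ms with this instrument's name
  if (millis() - lastPingTime > 500) {
    OscMessage ping = new OscMessage("/ping");
    ping.add("guitar");
    client.send(ping);
    lastPingTime = millis();
  }
}

void oscEvent(OscMessage msg) {
  // the conductor's musical-feature data (e.g. volume) arrives here
}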

Instrument (Client)

The instrument has two tasks. First, it provides the player with a digital interface through which they can play the instrument with a Leap Motion. Second, it sends pings to the server. The pings include the name of the instrument, so that the server can identify whether this is a new instrument that just came online or an existing one. This means that as long as a server is online, the clients are free to come and go, which enhances the scalability of the system.

Conductor (Server)

For now, the conductor has two main functions. First, it controls the instruments connected to it by sending them musical-feature data. Currently, the musical features only include the volume, but we are going to explore additional features that make sense to play with. Second, the conductor is aware of the size and the identity of the whole network. This has two advantages: we can design each instrument's features separately and send them to the correct instrument, and the conductor's interface can be dynamic. For instance, when two instruments are connected, the interface is split into two panels, and into more panels as more instruments connect; if one or more instruments leave the system, the interface changes accordingly.
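A rough sketch of the conductor-side bookkeeping (names and the timeout value are assumptions, not our exact code): each ping refreshes a timestamp for that instrument, stale instruments are dropped, and the interface is split into one panel per remaining instrument.

import oscP5.*;
import java.util.*;

OscP5 server;
HashMap<String, Integer> lastPing = new HashMap<String, Integer>();
int TIMEOUT = 2000;   // ms without a ping before an instrument is considered gone

void setup() {
  size(600, 400);
  server = new OscP5(this, 11000, OscP5.TCP);
}

void oscEvent(OscMessage msg) {
  if (msg.checkAddrPattern("/ping")) {
    // remember when we last heard from this instrument
    lastPing.put(msg.get(0).stringValue(), millis());
  }
}

void draw() {
  background(0);
  // drop instruments that have stopped pinging
  Iterator<Map.Entry<String, Integer>> it = lastPing.entrySet().iterator();
  while (it.hasNext()) {
    if (millis() - it.next().getValue() > TIMEOUT) {
      it.remove();
    }
  }
  // one panel per connected instrument
  int n = max(1, lastPing.size());
  stroke(255);
  noFill();
  for (int i = 0; i < n; i++) {
    rect(i * width / n, 0, width / n, height);
  }
}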

Improvements

This project is meant to be developed over the whole semester, and this midterm project is its first phase. For the rest of the semester, we expect to make the following improvements:

  1. Create additional interfaces for additional musical instruments.
  2. Explore in more detail the musical features that make sense to alter.
  3. Consider using a Microsoft Kinect as the user input device for the conductor.
  4. Improve and beautify the interfaces for both the instruments and the conductor.
  5. Possibly extend the system beyond the LAN to achieve better flexibility and scalability.

Video Demo

Week 8: Midterm

Proposal: For my midterm project, I wanted to explore several of the themes in Sophocles' tragedy Philoctetes. Philoctetes is a play that tells the story of Philoctetes, a famed Greek archer en route to fight in the Trojan War. On the way to Troy, Philoctetes is bitten by a cursed snake. The bite can never heal, and Philoctetes' leg wound puts him in unbearable agony. He is in constant pain, screaming and crying out; Odysseus and the other soldiers can't take the sound of his pain, and so they abandon Philoctetes on the island of Lemnos. Philoctetes' exile lasts ten years, during which he is cast out from society and no one comes to his aid. When he is eventually "rescued", Philoctetes realizes that many of his friends have passed away. Central to this legend is the relationship between society and individual pain; society only has tolerance for a certain level of expressed emotional pain, and does not accommodate anything more than that.

I have done Philoctetes-related projects in several of my other classes; for Kinetic Interfaces, I wanted to build off of my interactive chair project from Exhibition: Next. In Exhibition: Next, I placed a chair in a dark room with a pair of headphones and had everyone who sat in the chair put on the headphones. They would then hear an audio recording of someone sobbing. It was really interesting to watch people's reactions to the project: some sat and listened for several minutes, others ripped the headphones off immediately. I wanted to explore the discomfort of witnessing someone else's pain. The idea I had in mind involved using facial recognition to trigger audio of someone crying and to change an image onscreen.

Documentation: While I originally tried to use FaceOSC for this project, I switched to OpenCV and used the Minim library to play the audio clip whenever a face was detected on the webcam and pause it whenever someone looked away. I also sketched the following two images in Krita; somewhat counter-intuitively, it was the first image that was displayed when the audio played, and the second that displayed when it didn't (one would expect someone who is sobbing to be crying into their arms and not making eye contact).
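The core of the mechanism is small; a hedged sketch is below (OpenCV for Processing plus Minim, with placeholder file names, rather than my exact code):

import gab.opencv.*;
import processing.video.*;
import ddf.minim.*;
import java.awt.Rectangle;

Capture cam;
OpenCV opencv;
Minim minim;
AudioPlayer crying;
PImage facePresentImg, faceAbsentImg;

void setup() {
  size(640, 480);
  cam = new Capture(this, 640, 480);
  opencv = new OpenCV(this, 640, 480);
  opencv.loadCascade(OpenCV.CASCADE_FRONTALFACE);
  cam.start();

  minim = new Minim(this);
  crying = minim.loadFile("crying.mp3");        // placeholder file names
  facePresentImg = loadImage("facePresent.png");
  faceAbsentImg  = loadImage("faceAbsent.png");
}

void draw() {
  if (cam.available()) cam.read();
  opencv.loadImage(cam);
  Rectangle[] faces = opencv.detect();

  if (faces.length > 0) {
    if (!crying.isPlaying()) crying.play();     // someone is looking: play the clip
    image(facePresentImg, 0, 0, width, height);
  } else {
    crying.pause();                             // nobody is watching: pause it
    image(faceAbsentImg, 0, 0, width, height);
  }
}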

I wanted to try several different things for this project; for example, I wanted to experiment with soundGain so that when multiple faces were detected, the volume of the sobbing would seem louder. But I did not have time to implement this piece of interactivity. I did receive a lot of useful feedback during presentations, however: people suggested that I film a video of someone crying instead, that I change the size of the display, and that I consider how I would install this prototype as a larger, more complete project.

Installation: I think the key for an installation like this would be putting it in a place where people do not expect it, so that they feel a jump of discomfort and uncertainty when they trigger the interaction; when you see someone cry openly on the subway, for example, you are not sure where to look.

While I am not sure how feasible it is in this class, I thought the overlap between facial recognition and projection mapping would be a really interesting way to simulate a human interaction and emphasize a sense of discomfort. Could I projection-map onto some sort of vaguely human-like mannequin? I could set up the installation in a variety of ways so that someone can stumble across this human-like mannequin and, when they look at it, the mannequin will start to cry; when they look away, it stops. I think this would be a really interesting scenario for playing with discomfort, but it might also shift the meaning/emphasis of my piece in another direction.

Another idea would be projection mapping a silhouette similar to the midterm one on the walls of a small room. Once someone walks into the room, they will have the option of looking at several other pieces; the moment they notice the silhouette, though, and their face is recognized, the silhouette will start crying and trying to make eye contact. If the person looks away and tries to ignore the silhouette, it will move closer to them, so that they have fewer options for avoiding looking at someone in pain.

Kinetic Interfaces – Midterm (Moon) Andrew Huang

Author: Andrew Huang

Project name: Underfire! A bullet hell game

Professor: Moon

Date: Nov 3 2018

Goal: To create an upper-body immersive bullet-hell / dodging game.

Description:

After playing some Undertale and other bullet-hell games over the last couple of weeks, I was inspired to create a more kinetic and interactive version of the genre. I used OpenCV to track the player's face and used its x and y position to control the main character. The player has to survive by dodging the homing fireballs that come toward them. A camera feed at the bottom of the screen lets the user know their webcam is working and where their face is relative to the game. When the user runs out of HP (they also periodically get an HP bump), the game displays GAME OVER along with the user's score (the total number of balls that have hit them). The main character is a Frisk sprite from Undertale, and the fireball is a fireball sprite from Google.

 

Problems:

In the beginning, the fireballs tracked very poorly because I was adjusting their x/y positions with linear velocity adjustments rather than easing. By adding a damping function, I was easily able to make the fireballs slow down as they approached the player.

Even though some people complained that the camera picture at the bottom of my game ruined the immersion, I reasoned that it was better for the user to know where their face is relative to the game screen than for them to not know anything about the current state of the game.

Another problem is that the refresh rate of OpenCV's face detection is very poor and the tracking isn't good no matter what kind of program I run on my machine. This could be a limitation of my machine or an inherent problem with OpenCV; further research would be needed to address this.

Feedback & Future Development

Perhaps I can add better tracking or cameras. Some types of portable cameras can be integrated with OpenCV, and a higher-resolution experience using full-body and limb tracking would result in a better playing experience. The user could use their hands to activate some sort of power-up that would either disperse all of the balls or convert them into HP (similar to Pac-Man). I was also thinking about adding a bezier track for a type of fireball movement that would be more realistic and fun to dodge than simple easing. I could also add a menu and better assets to polish the gameplay experience. Overall, I think this project has promise, and I would consider working on it more for the final.

// ANDREW
// KINETIC INTERFACES MIDTERM
// make tracking easier and add endgame
import gab.opencv.*;
import processing.video.*;
import java.awt.*;

Capture video;
OpenCV opencv;

ArrayList<Ball> balls = new ArrayList<Ball>();
PImage img, flameimg;
Health hp;
void setup() {
  size(500, 500);
  background(255);
  smooth();
  noStroke();
  hp = new Health();
  fill(0);
  rect(mouseX, mouseY, 50, 7);
  img = loadImage("frisk.png");
  flameimg = loadImage("flame.png");

  video = new Capture(this, 640/4, 480/4);
  opencv = new OpenCV(this, 640/4, 480/4);
  opencv.loadCascade(OpenCV.CASCADE_FRONTALFACE);  

  video.start();
}
float inputx = 0, inputy = 0, facex = 0, facey = 0;
void draw() {
  opencv.loadImage(video);
  Rectangle[] faces = opencv.detect();
  background(#354c7c);
  text(frameRate, 20, 20);
  text("Balls added: " + balls.size(), 20, 40); 
  rectMode(CENTER);
  if (faces.length > 0) {
    inputx = map(faces[0].x, 0, 640/4, 500, -120);
    inputy = map(faces[0].y, 0, 480/4, 50, 400);
    facex = faces[0].width;
    facey = faces[0].height;
  }
  pushMatrix();
  image(img, inputx - img.width/20, inputy - img.height/20, img.width/10, img.height/10);
  stroke(100);
  noFill();
 // println(inputx, inputy);
  rect( inputx, inputy, img.width/10, img.height/10);
  popMatrix();
  hp.display();
  for (int i=0; i<balls.size(); i++) {
    balls.get(i).track(inputx, inputy);
    balls.get(i).bounce();
    balls.get(i).display();
    if (balls.get(i).removeBall(i)) {
      hp.hit();
    }
    //balls.get(i).age++;
  }
  if (frameCount % 100 == 0) {
    float r = random(0, 1);
    if (r > 0.5) {
      balls.add(new Ball(random(0, width), 10));
    } else {
      balls.add(new Ball(random(0, width), height));
    }
  }
  if (frameCount % 500 == 0 && hp.health < 100) {
    hp.health = ( hp.health + 20 ) > 100 ? 100 : hp.health + 20;
    for (int i = 0; i < balls.size(); i++) {
      balls.get(i).removeBall(i);
    }
  }
  if (hp.getHealth() <= 0) {
    textSize(40);
    fill(255, 0, 0);
    text("GAME OVER", width/3, height/2);
    text("SCORE: "  + hp.getHits(), width/3, height/2 + 50);
    noLoop();
    //delay(9999999);
  }
  pushMatrix();
  image(video, width/2 - 640/8, height-125);
  popMatrix();
}
void captureEvent(Capture c) {
  c.read();
}

void mouseClicked() {
  float r = random(0, 1);
  if (r > 0.5) {
    balls.add(new Ball(random(0, width), 10));
  } else {
    balls.add(new Ball(random(0, width), height));
  }
}

class Ball {
  float x, y, size;
  color clr;
  float xspeed, yspeed;
  float age = 0;

  Ball(float tempX, float tempY) {
    x = tempX;
    y = tempY;
    size = 20;
    clr = color(random(255), random(255), random(255));

    xspeed = random(-5, 5);
    yspeed = random(3, 5);
  }

  void display() {
    //fill(clr);
    tint(clr);
    image(flameimg, x,y,size*2,size*2);
    noTint();
    //ellipse(x, y, size, size);
  }

  void move() {
    x += xspeed;
    y += yspeed;
  }
  
  void track(float xx, float yy){
    float easing = 0.01;
     x += (xx - x) * easing;
     y += (yy - y) * easing;
  }

  void bounce() {

    if ((x>mouseX-25) && (x<mouseX+25) && (y > mouseY-3.5)) {
      yspeed = -yspeed;
    } 


    if (y < 0) {
      yspeed = -yspeed;
    } 
    if (x < 0) {
      xspeed = -xspeed;
    } else if (x > width) {
      xspeed = -xspeed;
    }
  }

  boolean removeBall(int i) { // hitdetection
    if (dist(inputx, inputy, x, y) < 50 || age > 400) {
      balls.remove(i);
      return true;
    }
    return false;
  }
}
class Health {
  float health = 100;
  float MAX_HEALTH = 100;
  float rectWidth = 200;
  float hits;
  
  Health(){
    return;
  }
  void display(){
    if (health < 25) {
      fill(255, 0, 0);      // low health: red
    } else if (health < 50) {
      fill(255, 200, 0);    // medium health: orange
    } else {
      fill(0, 255, 0);      // healthy: green
    }
  
    noStroke();
    // Get fraction 0->1 and multiply it by width of bar
    float drawWidth = (health / MAX_HEALTH) * rectWidth;
    rectMode(CORNER);
    rect(200, 10, drawWidth, 10); 
    
    // Outline
    stroke(255);
    noFill();
    rect(200, 10, rectWidth, 10);
    stroke(0);
  }
  float getHealth(){
    return health;
  }
  float getHits(){
    return hits;
  }
  void hit(){
    health-= 5;
    hits++;
  }
  
}

Week 6: Interaction

For this assignment, I wanted to test out the mechanism for my midterm project using FaceOSC. I wanted the presence of a face in the webcam to trigger a sound effect, in this case crying. Originally, I tried to use the sample FaceOSC / sound example that Moon provided, but he warned me that the sound library would be pretty buggy. Instead, I ended up working with the Minim sample that he added to our class resources.

I used this assignment to learn the process of adding audio with the Minim library. I had trouble with the sound.trigger() option because it sounded very abrupt and jarring; I ended up playing with several different calls until I was finally able to make the sound sample play when someone was looking at the screen and pause when they looked away.

Week 2: Transformation

In this assignment, I wanted to make a galaxy scene; I started off wondering how I could use the pushMatrix() function to create moving planets. Since I was only just figuring things out, I decided to make each planet out of two ellipses: one, a stationary circle, would be the planet's body, and the other, a long rotating ellipse, would be its ring. That way, even though only the ring moves, it looks like the whole planet is rotating.
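The basic trick is a translate/rotate pair inside pushMatrix()/popMatrix(); a tiny sketch of the idea (not my exact assignment code):

void setup() {
  size(400, 400);
}

void draw() {
  background(0);
  drawPlanet(width/2, height/2, frameCount * 0.02);
}

void drawPlanet(float x, float y, float angle) {
  pushMatrix();
  translate(x, y);
  noStroke();
  fill(255);
  ellipse(0, 0, 60, 60);       // stationary body
  rotate(angle);               // spin only the ring
  noFill();
  stroke(255);
  ellipse(0, 0, 120, 30);      // long thin ring
  popMatrix();
}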

I wasn't sure what to do for the interaction, so I decided to try to make a star using the mousePressed function and the example on the Processing website. I used the middle example and messed with the numbers slightly; to be honest, I have no idea how I made the shape I made, I only played around with different numbers. I really enjoy the way it turned out (it looks like the rays of a sun), but when you move the mouse to the top right-hand corner the sun freezes and the sketch stops working.

Week 4: Pixels

For this assignment, I was inspired by both the webcam pixel characters sample code we used in class and the pain scale used in hospitals.

Initially I wanted the webcam image to generate these emoticons instead of the characters in the sample code (" ", ",", "&", etc.). I tried introducing PImage into the sample code to see if I could load any images. I realized this wasn't going to be very effective, so I wondered if I could do something similar by working with the webcam pixel-grid sample code and matching certain camera color values to the six colors above (green, yellow-green, yellow, orange, red-orange, red). I finally figured out how to do it! I think I still need to tweak some of the color values to see different effects, but I am excited that I figured out how to adjust individual pixel values instead of applying a filter.
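A condensed sketch of the idea (not my exact code; the six colors and the grid size are rough guesses): each block of webcam pixels is replaced by one of the pain-scale colors based on its brightness.

import processing.video.*;

Capture cam;
color[] painColors = new color[6];

void setup() {
  size(640, 480);
  cam = new Capture(this, width, height);
  cam.start();
  painColors[0] = color(0, 200, 0);     // green
  painColors[1] = color(150, 210, 0);   // yellow-green
  painColors[2] = color(255, 230, 0);   // yellow
  painColors[3] = color(255, 150, 0);   // orange
  painColors[4] = color(255, 80, 0);    // red-orange
  painColors[5] = color(255, 0, 0);     // red
}

void draw() {
  if (cam.available()) cam.read();
  cam.loadPixels();
  int grid = 10;                         // size of each pixel block
  noStroke();
  for (int y = 0; y < height; y += grid) {
    for (int x = 0; x < width; x += grid) {
      color c = cam.pixels[x + y * width];
      // pick one of the six colors based on this block's brightness
      int idx = int(map(brightness(c), 0, 255, 0, painColors.length - 1));
      fill(painColors[idx]);
      rect(x, y, grid, grid);
    }
  }
}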