Milestone 2 is complete. We have a playable game of Simon Says and a main character that interacts reasonably well. The user can perform a few actions: head nodding, spinning, and even barking! Many other technical tasks were completed as well. None of the work done in this milestone was visual, so this update will be a video of kids playing the game plus text explanations of what was created and what I experienced during development.
Ensure you turn on your speakers or plug in headphones. All interaction is done through voice and head movement. BTW, you are her dog; that is why you are so low to the ground and why she asks you to “bark” and calls you “doggie”.
Kids Hanging out with Ella Bella
I had a few kids try it at this stage of development. They had fun but there is still much room for improvement.
The game is “Simon Says”. Ella tells you what to do, and you do it if she says, “Ella says”. She will try to trick you by leaving out “Ella says” every so often. Oh, and “you” are not you. You are actually her loving dog.
The major sections of this blog update are:
- Playful Ella – the main character
- Doggie Actions – what a player can do
- Technical Topics – details about systems and libraries
Ella still looks the same but her personality and her awareness have changed quite a bit. She is still far from a Watson but hanging out with her is enjoyable. She will try to help you out and give you good feedback but she’s not afraid to tease you either.
At first I recorded my voice as a placeholder, but when I tested the game with other people they just couldn’t get past hearing my voice instead of a little girl’s. As a creator you may think this is not so important, but when you hear “she doesn’t sound like a little girl” over and over, it suddenly becomes really important. So I recorded my daughter’s voice for Ella Bella, and it felt a lot better and more approachable than before. The audio is still placeholder, though, and the quality is pretty rough. I was able to clean it up a bit with noise reduction in Audacity, but it’s still not close to production quality. I even used different microphones, which is very noticeable. Yet it still gets the job done for testing.
Ella is very talky and playful. She has to be a bit talky because she has no animation right now, but beyond that I still wanted her to be a little vocal. She does a lot of explaining and gives a lot of feedback during the game. She tries to encourage you and says things like “oh no” and “fantastic”. Kids seemed to like the part where Ella says “I tricked you!” after you do an action when she didn’t say “Ella says”. Her personality is still basic, but it is at a good early stage and is starting to have the right feel.
Ella’s awareness is also quite basic but good for this stage of the game. Ella knows when you are there (headset on) or are gone (headset off). She also can see where you are looking, notice head nods of yes or no and can “hear” you when you bark at her. I would like to add a lot more for her to respond to, and for these to lead into side conversations and activities with the user. These will be things to come in future milestones.
You are Ella’s dog. This means you don’t speak a human language and can’t do a lot of things that humans do. You can, however, chase your tail, bark and maybe even shake. That is the design. For implementation thus far we have two major modules, as described below.
We have a head motion detection module with some simple buffering. Right now the user can do three major movement actions.
Users can nod yes or no and Ella will notice. If you do something strange, like nodding yes and no at once, she notices that as well, tells you it’s weird, and asks you to do it again.
Originally Ella would ask you to “look right” or “look left” in the Simon Says game. This turned out to be not that fun so I yanked it from her commands but the functionality in this module still exists.
I replaced the “look” commands with “spin” commands and the game became much more fun. This required that I start to buffer the motions and come up with a simple way to detect spinning left and right. It works well and is not inertia based.
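The buffered, non-inertia spin detection could be sketched roughly like this: accumulate signed per-frame yaw deltas, unwrap them across the ±180° seam, and fire once a full signed turn builds up. This is an illustrative sketch under my own assumptions (the class name, the 360° trigger, degrees as the unit), not the actual game code.

```cpp
// Hypothetical sketch: detect left/right spins by accumulating yaw deltas.
class SpinDetector {
public:
    void Reset() { accum_ = 0.0f; hasLast_ = false; }

    // Feed the current head yaw in degrees once per frame.
    // Returns +1 on a completed right spin, -1 on a left spin, 0 otherwise.
    int Update(float yawDegrees) {
        if (!hasLast_) { last_ = yawDegrees; hasLast_ = true; return 0; }
        float delta = yawDegrees - last_;
        // Unwrap across the ±180° seam so a tiny turn never reads as ±359°.
        while (delta > 180.0f)  delta -= 360.0f;
        while (delta < -180.0f) delta += 360.0f;
        accum_ += delta;
        last_ = yawDegrees;
        if (accum_ >= 360.0f)  { accum_ = 0.0f; return +1; }
        if (accum_ <= -360.0f) { accum_ = 0.0f; return -1; }
        return 0;
    }

private:
    float accum_ = 0.0f;
    float last_ = 0.0f;
    bool hasLast_ = false;
};
```

Because only accumulated deltas matter, there is no velocity or inertia involved; a slow deliberate spin and a fast one both trigger at the same point.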
My early tests with just detecting head motion (looking left and right) were not fun. I decided to pull “barking” in from two milestones out and just try to do it right now. And it worked! I’ll describe some basics below and note some of the challenges of using headphones vs. being unplugged.
Under the hood
Reading the microphone is done using FMOD. We poll the microphone every frame and adjust our “playback” of that recorded audio to keep recording and playback running at almost the same speed. Note that FMOD actually runs on another thread, so we’re just pulling data that the other thread is reading for us. There are small variations, so we are constantly adjusting the “playback” speed.

Why do we “play back”, and why the quotes? Because we are playing the recorded sound inaudibly so that we can run a DSP (digital signal processing unit) on that audio. The DSP is simply an FFT (fast Fourier transform) that converts the signal into the frequency spectrum. Right now it is set to only 128 bins. This works fine; all voices tested fall into 2 bins at this level of fidelity. During testing I increased it to 2048 bins and could tell the difference between my voice and each individual child’s voice just by looking at the frequencies. That wasn’t very useful to me yet, so I cut it back down to 128 to save on processing.

A “bin” here is just a frequency range. More bins means each bin covers a smaller range, which gives better precision. We don’t need better precision right now, so we save the processing. We do this every frame; that may change later.
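As a quick back-of-envelope on bin sizes: assuming a 48 kHz sample rate (my assumption, not stated above) and treating 128 as the number of usable spectrum bins, each bin spans 187.5 Hz, which is why whole voices collapse into a bin or two.

```cpp
// Illustrative helpers only: how wide is each bin, and which bin does a
// given frequency land in? Assumes the usable spectrum covers 0 to the
// Nyquist frequency (sampleRate / 2) spread evenly across numBins bins.
float BinWidthHz(int sampleRate, int numBins) {
    return (sampleRate * 0.5f) / numBins;
}

int BinForFrequency(float freqHz, int sampleRate, int numBins) {
    return static_cast<int>(freqHz / BinWidthHz(sampleRate, numBins));
}
```

With these assumed numbers, a 120 Hz adult fundamental lands in bin 0 and a 300 Hz child fundamental in bin 1, while at 2048 bins (about 11.7 Hz each) those same voices spread across clearly distinct bins.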
The bark filtering is pretty simple but works pretty well on the four tested individuals (myself, a 9-year-old boy, a 6-year-old girl, and a 3-year-old boy). Bark filtering is just a way to distinguish the user’s individual barks. The implementation is just a double threshold and works like this:
When the high (volume) threshold is passed, a “bark” is detected and no further barks can be detected. Once the volume drops below the low threshold, we conclude that the bark is complete. The next bark must again pass the high threshold, like before. This simple system handles the ramp up and down of barking pretty well.
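The double-threshold scheme above is a classic hysteresis pattern, and a minimal sketch looks like this. The threshold values and names are made up for illustration; the real ones would be tuned against recorded test audio.

```cpp
// Sketch of the double-threshold bark detector described above.
class BarkDetector {
public:
    BarkDetector(float highThreshold, float lowThreshold)
        : high_(highThreshold), low_(lowThreshold) {}

    // Feed one volume sample per frame; returns true on the frame a new
    // bark starts.
    bool Update(float volume) {
        if (!inBark_ && volume >= high_) {
            inBark_ = true;   // bark started; ignore until it finishes
            return true;
        }
        if (inBark_ && volume <= low_) {
            inBark_ = false;  // bark finished; armed for the next one
        }
        return false;
    }

private:
    float high_;
    float low_;
    bool inBark_ = false;
};
```

The gap between the two thresholds is what keeps one noisy, wavering bark from registering as several: the volume has to fall all the way below the low threshold before a new bark can be counted.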
Headphones vs Unplugged
When users wear headphones with a microphone, the bark detection works great. Testing with simple Apple iPhone 6s headphones, we could even have a lot of chatter in the room with kids playing, and it would still only detect the user’s voice, not the background. This worked better than expected! I will be testing with a set of Turtle Beach headphones pretty soon, hopefully today.
When the headphones are unplugged, I ran into a few issues, some expected, some not. I expected it to become harder to distinguish user barks from background noise, and that assumption was correct. It actually wasn’t bad when testing in this room, but I imagine in a more active place, like a show floor, this won’t work well. Good thing there is a simple solution: use headphones.
Unexpectedly, I ran into another issue when unplugged. The playback of audio is slower than expected. This results in the microphone picking up audio from the speakers that should have already finished. As far as our program understands, we get callbacks when sounds are “done” playing. It turns out that after we get that “done” callback, there is still about 0.2 seconds of audio playing; actually a little more than that. So I had to add a delay to the bark detection. This was driving me crazy for a while because I thought I had messed up the FMOD callback system or something. It took a bit of exploration before I figured out the truth: Android was lying to me!
A lot of work went into consolidating variables and functions in the main messaging system. The consolidation removed a lot of redundant code and helped keep me from recreating bugs through the cut-and-paste of moving too quickly. Handling flow and callbacks became a lot easier too.
Action Tracking System
To perform long running activities with callbacks and the ability to cancel, I had to create an action tracking system. This is done at a shared level below my other modules and is generic enough to handle a lot of the callback/canceling type of stuff that is needed. Also, we made the decision to “outsource” the actual work to existing systems. All this means is that we reduce duplication of work by allowing the existing systems to batch process work, however they wish to do so. For example, we have a few long running commands we issue to the head tracking module: “detect nod yes” and “detect nod no”. The action system keeps track of status but does not actually do the detecting. It lets the head module do all its normal detecting and at the end a couple of flags are checked by the action system to then do callbacks. We get individual action tracking and we get batching efficiency. User spins, chained audio clip playback and user bark detection all use the same method.
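The “track status, outsource the work” pattern could be sketched roughly as below: the tracker never detects anything itself. Each frame it just checks a done-flag that the owning module (head tracking, audio, etc.) maintains during its normal batched update, then fires the callback or drops a cancelled action. All names here are illustrative, and I am using std::function for brevity where the real code may well use plain function pointers.

```cpp
#include <functional>
#include <vector>

// Hypothetical sketch of a generic long-running-action tracker.
struct Action {
    int id;
    std::function<bool()> isDone;      // polls a flag the real module sets
    std::function<void()> onComplete;  // game-side callback
    bool cancelled = false;
};

class ActionTracker {
public:
    int Start(std::function<bool()> isDone, std::function<void()> onComplete) {
        actions_.push_back({nextId_, std::move(isDone), std::move(onComplete), false});
        return nextId_++;
    }

    void Cancel(int id) {
        for (Action& a : actions_)
            if (a.id == id) a.cancelled = true;
    }

    // Call once per frame, after the real modules have updated their flags.
    void Update() {
        for (std::size_t i = 0; i < actions_.size(); ) {
            Action& a = actions_[i];
            if (a.cancelled) { actions_.erase(actions_.begin() + i); continue; }
            if (a.isDone()) {
                a.onComplete();
                actions_.erase(actions_.begin() + i);
                continue;
            }
            ++i;
        }
    }

private:
    std::vector<Action> actions_;
    int nextId_ = 0;
};
```

A “detect nod yes” command would then be a Start() call whose isDone closure reads the head module’s existing nod flag, so the module keeps its batching and the tracker keeps per-action bookkeeping.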
The audio manager was a pain because of C function pointer limitations. Specifically, the FMOD callbacks have no way to associate our objects with them, so we had to come up with ways for the callback to check globals and find the associated info it needs. This is just normal C-style library integration stuff. In fact, FMOD has a way to tie user data to sound clips; I just didn’t see a way to do that with the channel used for the callbacks. With my changes I was able to open the floodgates and let the system play lots of sounds, some chained together, with lots of callbacks into our game systems. It works well now and I am happy.
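The “check globals from the callback” workaround is a common C-integration pattern, and a generic version looks something like this. The handle type, names, and map are all illustrative; this is not the actual FMOD callback signature.

```cpp
#include <map>

// Game-side info we want to recover inside a plain C callback.
struct SoundInfo {
    int clipId;
    bool finished = false;
};

// Keyed by whatever opaque handle the library's callback hands back.
static std::map<void*, SoundInfo*> g_channelInfo;

// Shaped like a C-style library callback: no user-data parameter, so the
// only way back to our object is through the global map.
extern "C" void OnChannelEnd(void* channelHandle) {
    auto it = g_channelInfo.find(channelHandle);
    if (it != g_channelInfo.end()) {
        it->second->finished = true;  // notify our game systems
        g_channelInfo.erase(it);      // one-shot: drop the association
    }
}
```

The game registers the handle-to-object pair when it starts a sound, and the free-function callback does the reverse lookup; clumsy, but it works anywhere a library only accepts bare function pointers.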
I hooked up the controller to drive actions through buttons and to rotate the player’s head when not plugged into the GearVR. Oculus made this easy and available as “VrFrame.Input” for every frame update.
Oculus Library Mods
I debated whether I should touch the Oculus libraries or not. In the end I decided yes, I would. It would be nice to just drop in new Oculus libs when they become available without doing much else, but to do that I would have to create a bunch of wrappers around commonly used structures. After looking through the Oculus libraries, I decided they already offer most of the common structures I need, and I don’t want to unnecessarily wrap everything. Plus, we have good source control, so it shouldn’t be too bad to reapply the changes after getting new libraries, which rarely happens anyway.
What did we add?
I wanted to do C++11 style for loops, so I added “begin” and “end” to Hash and Array. This works well until… until you delete something. Then the iterators break. Even brand new iterators will be broken. Maybe I am doing something wrong. Anyway, I resort to old-school for loops for any Array or Hash that may have deletions. This iterator bug cost me a lot of time. I was not able to come up with a fix, only the workaround.
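The old-school workaround in code: an index loop that stays valid across deletions because it only advances when nothing was removed. Shown here on a std::vector as a stand-in for the custom Array.

```cpp
#include <vector>

// Remove negative entries while iterating, safely, with an index loop.
void RemoveNegatives(std::vector<int>& items) {
    for (std::size_t i = 0; i < items.size(); /* no increment here */) {
        if (items[i] < 0)
            items.erase(items.begin() + i);  // next element slides into slot i
        else
            ++i;
    }
}
```

Nothing clever, but unlike range-based iteration there is no iterator to invalidate, so it keeps working no matter what the container does internally on erase.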
String Tokenizer. There is some token support in the libraries, but it wasn’t quite usable in its existing form, so I extended String to allow for easier tokenizing. This was useful for chaining audio clips together using plain strings.
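An illustrative tokenizer in that spirit, done here with stdlib strings rather than the actual String extension: split a space-separated list of clip names so they can be queued for chained playback.

```cpp
#include <string>
#include <vector>

// Split text on a delimiter, skipping empty tokens from doubled delimiters.
std::vector<std::string> Tokenize(const std::string& text, char delim) {
    std::vector<std::string> tokens;
    std::string::size_type start = 0;
    while (start <= text.size()) {
        std::string::size_type end = text.find(delim, start);
        if (end == std::string::npos) end = text.size();
        if (end > start)
            tokens.push_back(text.substr(start, end - start));
        start = end + 1;
    }
    return tokens;
}
```

Something like Tokenize("intro ella_says spin_left", ' ') could then feed the audio queue one clip name at a time (the clip names here are hypothetical).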
RemoveIf(cond, stuff). I added a “RemoveIf” which takes a function pointer and some “stuff”. I think I’ve just been spoiled by the functional programming style of thinking. Anyway, this works well and I like it. It should be just as efficient as a normal for loop, but I have not benchmarked it yet. I am not using std::function here, just a function pointer.
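A sketch of that shape: a plain function pointer for the condition plus a context pointer for the “stuff”, compacting the container in a single pass. Shown on std::vector; the real version presumably lives on the custom Array.

```cpp
#include <vector>

// Remove every element for which cond(element, stuff) returns true.
// Single O(n) pass, similar in spirit to the erase-remove idiom.
template <typename T>
void RemoveIf(std::vector<T>& items, bool (*cond)(const T&, void*), void* stuff) {
    std::size_t write = 0;
    for (std::size_t read = 0; read < items.size(); ++read) {
        if (!cond(items[read], stuff))
            items[write++] = items[read];  // keep: compact in place
    }
    items.resize(write);
}
```

A non-capturing lambda converts to the function pointer, so call sites stay tidy while the extra void* carries whatever state the condition needs.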
The game works. Kids were happy during testing. You can do stuff and you can play with Ella Bella. We added a lot of audio, a lot of interactivity, and the game has flow.
Next sprint we will be focused on facial animation. The plan is to focus on collecting and authoring facial animation, but we may shift into applying it to a model’s face if needed. Either way, we will have a lot more pretty pictures to share next time.
Let me know if you would like to know more about any subject listed above. The journey is a little rough without a game engine, but if your C/C++ skills are good and you understand basic gaming architecture, then figuring out the quirks of Android and mobile is all that stands in the way, and I can help you through that.
Subscribe below to receive an email when new blogs are posted.