Jarvis Part 2

20 Dec

I’m linking this back to a post I made a few months ago: Jarvis from Iron Man. It was a popular post, apparently. So far it has 38 comments from people wanting a similar system. Unfortunately I haven’t been able to do anything with it, but the people in the comments section seem to have a pretty good grasp on things. “PhysicistJedi” posted a link to a DeviantArt screencap, which reminded me of some AppleScripts that could be used to get the data using XML and RSS feeds (I think… will have to dig up those links).

The thing I was most interested in with the whole Jarvis idea was the speech recognition. Now, I’ve made AppleScripts to run speech recognition things before. Macs have speech recognition built in, and after making a script for some of my favorite iTunes playlists, I can say out loud “Computer play five star” and my computer will start playing my five star music. This is problematic though, because once the computer has started playing the music it can’t hear me. Herein lies the speech recognition problem. For one of my many side projects (I have quite a few, unfortunately) I want to try and fix that.

It may be a hardware limitation, but (theoretically at least) it should be possible to subtract the computer’s own audio from the input. Going back to basics… sound is transmitted through the air in waves. And just like waves in water, they are susceptible to constructive and destructive interference. So if you have two waves with the same frequency, say 5 Hz, each travelling in opposite directions, and both of the same intensity… depending on how they are lined up (in phase or out of phase) where they meet, you get interference. If the peaks of both waves overlap you get constructive interference, which doubles the amplitude. BUT if they are perfectly out of phase (the peaks of one wave line up with the troughs of the other) you get DESTRUCTIVE interference, and in this particular case the waves cancel out completely. If you were standing at a spot where the two arrive out of phase (and could physically hear 5 Hz), you wouldn’t hear anything.
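
To make the interference idea concrete, here is a minimal sketch that adds two sampled 5 Hz sine waves, once in phase and once half a cycle apart. The helper names (`tone`, `peak`, `add`) and the 1000 Hz sample rate are just assumptions for illustration; it only models two signals summed at a single point, not real sound in a room.

```python
import math

def tone(freq_hz, phase_rad, sample_rate=1000, n_samples=1000, amplitude=1.0):
    """Return n_samples of a sine wave with the given frequency and phase."""
    return [amplitude * math.sin(2 * math.pi * freq_hz * n / sample_rate + phase_rad)
            for n in range(n_samples)]

def add(a, b):
    """Superpose two signals sample by sample."""
    return [x + y for x, y in zip(a, b)]

def peak(samples):
    """Largest absolute sample value (the amplitude for a pure tone)."""
    return max(abs(s) for s in samples)

wave = tone(5, 0.0)
in_phase = add(wave, tone(5, 0.0))          # peaks line up with peaks
out_of_phase = add(wave, tone(5, math.pi))  # peaks line up with troughs

print("single wave peak:  ", round(peak(wave), 3))
print("in phase peak:     ", round(peak(in_phase), 3))
print("out of phase peak: ", round(peak(out_of_phase), 3))
```

The in-phase sum should peak at about twice the single wave, and the out-of-phase sum should be zero apart from floating-point rounding.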

Now, how does this apply to speech recognition? The computer knows what it is outputting to the speakers, and most computers (mine at least) have a microphone. If you could take the input (mic) audio and simultaneously subtract the output audio from it, given some calibration and applying the correct offset… anything that the mic picks up that the computer is not outputting should be heard clearly by the computer. So… if (in the application I write) I tell my computer to play the five star playlist (it starts), but then I decide I want the volume turned down, or want it to stop… I should be able to say just that, and it should happen.
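
Here is a rough sketch of that subtraction on made-up data, assuming the idealized case where the mic hears the playback as a single delayed, scaled copy plus my voice. Everything here is hypothetical: the function names, the random "music", the sine-wave "voice", and the true delay and gain of 7 samples and 0.6. The offset is found by cross-correlation and the gain by least squares, which is one common way to do this calibration, not necessarily the best one.

```python
import math
import random

def estimate_delay(playback, mic, max_lag):
    """Lag (in samples) at which the mic signal best matches the playback."""
    def score(lag):
        return sum(playback[n - lag] * mic[n] for n in range(lag, len(mic)))
    return max(range(max_lag + 1), key=score)

def estimate_gain(playback, mic, lag):
    """Least-squares scale factor between the delayed playback and the mic."""
    num = sum(playback[n - lag] * mic[n] for n in range(lag, len(mic)))
    den = sum(playback[n - lag] ** 2 for n in range(lag, len(mic)))
    return num / den

def subtract_playback(playback, mic, lag, gain):
    """Remove the delayed, scaled playback from the mic signal."""
    return [mic[n] - (gain * playback[n - lag] if n >= lag else 0.0)
            for n in range(len(mic))]

# Synthetic stand-ins for real audio (assumptions, not recorded data).
rng = random.Random(0)
N = 2000
playback = [rng.uniform(-1, 1) for _ in range(N)]                    # "music"
voice = [0.2 * math.sin(2 * math.pi * 3 * n / N) for n in range(N)]  # "speech"
TRUE_LAG, TRUE_GAIN = 7, 0.6
mic = [voice[n] + (TRUE_GAIN * playback[n - TRUE_LAG] if n >= TRUE_LAG else 0.0)
       for n in range(N)]

lag = estimate_delay(playback, mic, max_lag=50)
gain = estimate_gain(playback, mic, lag)
cleaned = subtract_playback(playback, mic, lag, gain)

print("estimated lag:", lag, "estimated gain:", round(gain, 3))
print("worst leftover error:", round(max(abs(c - v) for c, v in zip(cleaned, voice)), 4))
```

A real room adds echoes and frequency-dependent changes that a single delay and gain cannot capture, so treat this as the best-case version of the idea.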

For sound cards I think the feature this needs is called full-duplex: the card can record and output audio simultaneously. So mine should be capable.

To-do:

  • Access the output audio (by frequency, unless there’s an easier way I don’t know about…yet… as an example display the waveform or frequency distribution)
  • Access the input audio (mic, same as above)
  • Make an audio input filter that takes these two and subtracts the output from the input
  • Apply different calibrations to make this work correctly (phase offset, amplification, etc)
  • Figure out that pesky speech recognition thing (hopefully plug into the pre-existing system).
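
For the "frequency distribution" part of the first two items, a minimal sketch might look like the following. It runs a naive discrete Fourier transform over a synthetic signal, since actually capturing the output and mic audio depends on the platform and is not shown. The function name `magnitude_spectrum`, the 400 Hz sample rate, and the 50 Hz and 120 Hz test tones are all assumptions for the example.

```python
import cmath
import math

def magnitude_spectrum(samples):
    """Naive DFT: magnitude of each frequency bin below half the sample rate."""
    n = len(samples)
    return [abs(sum(samples[t] * cmath.exp(-2j * math.pi * k * t / n)
                    for t in range(n))) / n
            for k in range(n // 2)]

SAMPLE_RATE = 400  # samples per second (hypothetical)
# One second of audio: a 50 Hz tone plus a quieter 120 Hz tone.
samples = [math.sin(2 * math.pi * 50 * t / SAMPLE_RATE)
           + 0.5 * math.sin(2 * math.pi * 120 * t / SAMPLE_RATE)
           for t in range(SAMPLE_RATE)]

spectrum = magnitude_spectrum(samples)

# Bin k corresponds to k * SAMPLE_RATE / len(samples) Hz, which is k Hz here.
loudest = sorted(range(len(spectrum)), key=lambda k: spectrum[k], reverse=True)[:2]
print("two strongest frequencies (Hz):", sorted(loudest))
```

This version is O(n²) and only meant to show the idea; anything running on live audio would want a proper FFT library.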

Some of the other projects (related to this) that I’d eventually like to work on:

  • Home automation stuff using my Macs
  • Multi-touch display (http://nuigroup.com/)
  • Artificial Intelligence to tie the above projects together
  • Making my own version of this:  NanoBrewMaster Home Brew Station (should allow me to track what ingredients, timings, temperatures, etc I use… tie this into the home automation stuff)

And a neat idea I thought up and might toy around with is doing something with LEDs… the idea being you could hang it on a wall in your bedroom somewhere… it’s computer controlled, and it would simulate a sunrise (might make getting up in the morning easier).  If I don’t… I could just put a light on a timer… haha.  The sunrise effect would be cool though.

4 Responses

  1. physicistjedi says:

    Thanks for the link. I can share the code, if you email me. Use my nick at gmail.

  2. Can Özbay says:

    Hey there, I just read your post. Theoretically you can subtract the audio, but practically you just can’t, because the main problem you’re going to encounter will simply be your voice itself. =D Let me clarify.

    Basically, your speakers convert the audio signal into moving air particles (sound waves), which travel through the air. Depending on their frequencies, these waves complete a full cycle over different distances in your room.

    For example, a 20 Hz wave needs ~17.3 meters to complete one cycle at 25 °C, and a 20 kHz wave needs 1.73 centimeters (also at 25 °C). Which means: you can actually hear 20 kHz of audio 1.73 cm away from your speakers, but you will need to go 17 meters away from your speakers to actually hear the 20 Hz.
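
The wavelength figures can be checked with a quick calculation, wavelength = speed of sound / frequency. This sketch assumes the common linear approximation for the speed of sound in dry air, about 331.3 + 0.606 × T m/s with T in °C; the function names are made up for the example.

```python
def speed_of_sound(temp_c):
    """Approximate speed of sound in dry air, in m/s (linear approximation)."""
    return 331.3 + 0.606 * temp_c

def wavelength_m(freq_hz, temp_c=25.0):
    """Length of one full cycle in meters: speed divided by frequency."""
    return speed_of_sound(temp_c) / freq_hz

print("20 Hz: ", round(wavelength_m(20), 2), "m")
print("20 kHz:", round(wavelength_m(20_000) * 100, 2), "cm")
```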

    Why you can’t subtract the internal audio from the external audio:
    The microphone will be standing somewhere in the room, and it can’t simply pick up all the frequencies, for the reason I just described. In a song there are a hell of a lot of frequencies, mixed and finalized in one audio file. Because of the different wavelengths and phase differences, the microphone will not hear exactly the same output that’s theoretically coming out of your computer. (There are many more reasons for this, like objects reflecting sounds, souring frequencies, reflecting selective frequencies, the room resonance, etc.) The microphone will get the music! Yes… but the audio it’s getting, and the final frequencies of the music, will be completely different from what your computer should be outputting.

    Now, yes, you can make the computer calculate the room’s acoustic parameters. We use software named ODEON Acoustics to design acoustical environments, to predict the acoustical result of the environment, and for the auralisation of the acoustics. { http://www.odeon.dk/ } With this software we completely design the room / conference hall / stage / concert hall etc., and we place every single detail into the environment. Ex: if this environment is a cinema, we place the comfort chairs and speakers (transmitters), and we input the technical details of the speakers, even what kind of cloth the seats use… (the materials of every object in that acoustical environment). Then you set a receiver point: a coordinate in the room, or even outside the room (a microphone or a listener {you can add multiple receivers too}). You give it a *.wav file of the audio that’s going to be heard inside, and the program predicts how it is going to be heard. Ex: you select seat 5A in the cinema and give it a *.wav file of the audio. Then you put on your headphones and listen from the ears of seat 5A ;)

    This is a gigantic program, and it takes a little bit of time to calculate all the parameters. The most important thing is that you have to design the environment, or load a CAD file (from AutoCAD or something).

    Why you can’t subtract the audio from the voice: because you can’t do the calculation without the room parameters ;)
    These parameters are different for every single room on this planet.
    So you can’t make a universal calculator. You HAVE to input the room parameters.
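
A toy illustration of this point, under the simplifying assumption that the room acts like a fixed filter (an impulse response) applied to the music before it reaches the microphone. The impulse response values, the random "music", and the helper names are invented for the example and are not measurements of any real room.

```python
import random

def convolve(signal, impulse_response):
    """Apply a room impulse response to a signal (output has the signal's length)."""
    out = [0.0] * len(signal)
    for n in range(len(signal)):
        for k, h in enumerate(impulse_response):
            if n - k >= 0:
                out[n] += h * signal[n - k]
    return out

def energy(samples):
    """Sum of squared samples, used as a rough loudness measure."""
    return sum(s * s for s in samples)

rng = random.Random(1)
music = [rng.uniform(-1, 1) for _ in range(1000)]

# Hypothetical room: a delayed direct path plus two weaker reflections.
room = [0.0, 0.0, 0.7, 0.0, 0.0, 0.3, 0.0, -0.2]
heard = convolve(music, room)  # what the microphone picks up (no voice here)

# Subtracting the raw file ignores the room; subtracting the room-filtered
# file uses the room parameters.
naive = [h - m for h, m in zip(heard, music)]
informed = [h - m for h, m in zip(heard, convolve(music, room))]

print("energy at mic:            ", round(energy(heard), 1))
print("after naive subtraction:  ", round(energy(naive), 1))
print("after informed subtraction:", round(energy(informed), 1))
```

In this toy case the naive subtraction leaves more energy than it started with, while the subtraction that knows the room response cancels the music. Real rooms are far messier than an eight-tap filter.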

    You can make something that subtracts your voice out of the music, but your voice will probably get lost, or literally “blurred”…

    And voice recognition systems calculate the wave and frequency differences of words and phrases, so they can only resolve your voice if the frequencies are sharp and understandably “unblurred”…

    The only way to do what you want is to use a simple Bluetooth headset and an iPhone.

    You’ll use the iPhone remote application to lower the volume of the music, and then give your commands with your Bluetooth headset microphone ;)

    Hope I could clarify some of the main problems for you.
    If you need any help, feel free to e-mail me.

    cheater{dot}boss{at}gmail{dot}com ;)

  3. 5.1 speakers says:

    Good job guys. Keep the comments going.

  4. Andrew Pelt says:

    Great. Now I can say thank you
