Hearing for robots and AI

Teaching computers to hear like humans do.

Along with vision, hearing is one of the most cherished and useful senses we have, but what would it take to grant this sense to a computer or AI program? While we already have plenty of ways to interact with sound on computers, I am more interested in the natural way we perceive sound and how that could be written as a software program. It's just a fun topic.

Sound & Hearing

Sound, in its simplest form, is what happens when energy travels through a physical medium such as air or water, creating acoustic vibrations that we perceive as sound.

Sound is usually described in wave notation, which makes it easier to understand and communicate. Let's examine a simple beep sound and how it is represented in notation:

Going back to how we perceive sound: after entering your ear, sound waves are amplified and eventually picked up by hair cells inside the cochlea (at the organ of Corti, along the basilar membrane). These sound receptors are organized by frequency in an orderly arrangement, a tonotopic map, which we will eventually try to emulate.

What the hertz?

All this is a bit vague without some real examples we can hear and code we can play with, so let's start with a simple tone:

A 200 Hz, 1-second sine wave.

And here’s the python code to make it:
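A minimal version, using NumPy and Python's built-in wave module (the exact details of the original snippet may differ):

```python
import wave
import numpy as np

SAMPLE_RATE = 44100   # samples per second (CD quality)
DURATION = 1.0        # seconds
FREQUENCY = 200.0     # Hz, the tone from above

# Sample times, evenly spaced across the duration
t = np.linspace(0, DURATION, int(SAMPLE_RATE * DURATION), endpoint=False)

# The sine wave itself: amplitude * sin(2 * pi * frequency * time)
tone = 0.5 * np.sin(2 * np.pi * FREQUENCY * t)

# Scale to 16-bit integers and save as a .wav file any player can open
samples = (tone * 32767).astype(np.int16)
with wave.open("tone_200hz.wav", "wb") as wav:
    wav.setnchannels(1)       # mono
    wav.setsampwidth(2)       # 2 bytes = 16 bits per sample
    wav.setframerate(SAMPLE_RATE)
    wav.writeframes(samples.tobytes())
```

Change `FREQUENCY` and `DURATION` to experiment with other tones.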

As mentioned, we can only hear sound within a certain range. For instance, here we have a series of tones whose frequencies double each time:

The first and last frequencies are really hard to hear, even if you have good hearing.
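The doubling series is easy to generate. Here is a sketch; the 25 Hz starting point and ten doublings are my assumption, chosen to span roughly the edges of human hearing:

```python
import numpy as np

SAMPLE_RATE = 44100
BASE_FREQ = 25.0   # near the lower edge of human hearing (assumed start)
N_DOUBLINGS = 10   # 25 Hz doubled ten times reaches 25600 Hz, past the upper edge

# Each doubling of frequency is one octave up
freqs = [BASE_FREQ * 2 ** i for i in range(N_DOUBLINGS + 1)]

# A half-second sine tone for each frequency
t = np.linspace(0, 0.5, int(SAMPLE_RATE * 0.5), endpoint=False)
tones = [0.5 * np.sin(2 * np.pi * f * t) for f in freqs]
```

The usual figure for human hearing is roughly 20 Hz to 20,000 Hz, so the first and last tones in this list fall at or beyond those edges.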

The following example plays a common musical scale, still using our simple sine wave. We'll get to complex sounds later; for now, just notice that these notes sit within the preferred zone of our hair cells:
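One way to build such a scale from sine tones, using the standard equal-temperament C-major frequencies (the post's exact notes may differ):

```python
import numpy as np

SAMPLE_RATE = 44100
NOTE_LEN = 0.4  # seconds per note

# C major scale, fourth octave, equal temperament (A4 = 440 Hz)
SCALE = {"C4": 261.63, "D4": 293.66, "E4": 329.63, "F4": 349.23,
         "G4": 392.00, "A4": 440.00, "B4": 493.88, "C5": 523.25}

t = np.linspace(0, NOTE_LEN, int(SAMPLE_RATE * NOTE_LEN), endpoint=False)

# One short sine tone per note, concatenated into a single melody
melody = np.concatenate([0.5 * np.sin(2 * np.pi * f * t) for f in SCALE.values()])
```

All eight notes fall between roughly 260 Hz and 525 Hz, comfortably inside the range where human hearing is most sensitive.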

What about volume?

Hair cells are sensitive to specific frequencies; hopefully you can now see how they map along the cochlea/basilar membrane. When active, they respond with action potentials, which can be thought of as 0's and 1's… volume is believed to be encoded predominantly in the firing rate of these cells:

Simplified hair cell response.
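The rate-coding idea can be sketched with a toy model: a hypothetical hair cell tuned to one frequency, whose spike count grows with volume. All the numbers and the Gaussian tuning curve here are illustrative choices, not physiology:

```python
import numpy as np

def hair_cell_response(freq, volume, preferred=200.0, bandwidth=50.0,
                       max_rate=100, duration=1.0):
    """Toy rate code: the cell fires more (0/1 spikes in 1 ms bins) the louder
    the sound, but only near its preferred frequency."""
    # Gaussian tuning curve: response falls off away from the preferred frequency
    tuning = np.exp(-((freq - preferred) ** 2) / (2 * bandwidth ** 2))
    n_spikes = int(max_rate * volume * tuning * duration)

    spike_train = np.zeros(int(duration * 1000), dtype=int)  # 1 ms bins
    if n_spikes > 0:
        # Spread spikes evenly across the window (real cells are much noisier)
        spike_train[np.linspace(0, len(spike_train) - 1, n_spikes).astype(int)] = 1
    return spike_train

loud = hair_cell_response(200.0, volume=1.0)    # preferred frequency, full volume
quiet = hair_cell_response(200.0, volume=0.2)   # same frequency, lower volume
off = hair_cell_response(1000.0, volume=1.0)    # far from the preferred frequency
```

The loud input produces many more spikes than the quiet one, and a frequency far from the cell's preferred one produces almost none, mirroring the tonotopic selectivity described above.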

Complex sounds

Very few sounds we encounter are as simple as a sine-wave tone. A complex sound can be understood as a combination of individual frequencies; think about the word hello:

You might be used to seeing these sounds represented as waveforms:

Generated with Audacity

A waveform does not tell the whole story: it only shows how loud a sound is over time, not which frequencies are in it. For that we need something called a spectrogram:
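Both points can be sketched in a few lines of NumPy: build a sound whose pitch jumps halfway through, then compute a simple spectrogram with a short-time Fourier transform. The waveform's loudness never changes, but the spectrogram reveals the frequency jump. The window and hop sizes are arbitrary choices:

```python
import numpy as np

SAMPLE_RATE = 8000
t = np.arange(SAMPLE_RATE) / SAMPLE_RATE  # one second of sample times

# A sound whose frequency jumps from 200 Hz to 800 Hz at the half-second mark;
# its loudness (all a waveform shows) stays constant throughout
half = SAMPLE_RATE // 2
sound = np.concatenate([np.sin(2 * np.pi * 200 * t[:half]),
                        np.sin(2 * np.pi * 800 * t[:half])])

# Short-time Fourier transform: slide a window along the sound and take the
# spectrum of each slice
WINDOW, HOP = 256, 128
win = np.hanning(WINDOW)
frames = [sound[i:i + WINDOW] * win for i in range(0, len(sound) - WINDOW, HOP)]
spec = np.abs(np.array([np.fft.rfft(f) for f in frames])).T
freqs = np.fft.rfftfreq(WINDOW, 1 / SAMPLE_RATE)
# spec[i, j] = strength of frequency freqs[i] in the j-th time window

peak_early = freqs[np.argmax(spec[:, 0])]    # dominant frequency at the start
peak_late = freqs[np.argmax(spec[:, -1])]    # dominant frequency at the end
```

`peak_early` lands near 200 Hz and `peak_late` near 800 Hz (within the resolution of the 256-sample window), which is exactly the information a plain waveform plot hides.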

Perception of complex sounds in the brain

We now have all the elements necessary to tell the story of how we perceive complex sounds:

Where do we perceive sound?

Have you ever had a catchy tune "playing" in your mind, or conversations with yourself (aka inner or covert speech)? Both are perfectly normal, and it is also surprisingly common not to have them.

Well, in essence the question here is what, and where, is producing these sounds. A related question is why we hear anything at all; this last one touches the even thornier concept of "qualia," but it can also be demystified somewhat by a course in the cognitive neuroscience of language.

Unfortunately we don't have all the answers, but we do know that without these early stages of hearing, both are severely impaired or absent altogether.

Another clue comes in the form of the tonotopic map we have been studying: after leaving the ear, sound information is processed in different parts of the brain while preserving the same organization, so perhaps perception happens there.

And the mixed clues go on: information flows from the brain and other cortical areas back toward the ear and intermediate stages just as much as it flows from the ear to them, so perhaps we replay sounds by re-activating hair cells in the ear, or some related network somewhere else.

For now we will leave aside the biology, the unknowns, and higher-order functions like speech perception and language, so we can focus on the basics and how to translate them into digital counterparts, which I hope to cover in a future post.

I hope this has helped you with some sound basics, and I hope to see you next time.

Thanks for reading!

References & further reading:

Kandel, Eric R., and Sarah Mack. Principles of Neural Science. McGraw-Hill Medical, 2014.

Purves, Dale, et al. Neuroscience. Sinauer Associates, 2018.

Kemmerer, David L. Cognitive Neuroscience of Language. Psychology Press, 2015.

AI, Software Developer, Designer: www.k3no.com
