People of ACM - Raymond Kurzweil

May 3, 2016

The Kurzweil Machine was introduced to an enthusiastic public reception in 1976. What was your inspiration for developing a machine that could read text aloud to the blind? What was the greatest technical challenge on that project and how did you overcome it?

The most significant challenge was developing OCR that could recognize any type style and also deal with the vagaries of print, such as proportional spacing and touching or broken letters. The state of the art at the time was OCR that could recognize a single type font printed with fixed spacing. My team and I developed software that could recognize the abstract invariant features of letters. It was an example of pattern recognition, which has always been my primary interest; my thesis has been that human intelligence is primarily based on our ability to recognize patterns. This technology was a solution in search of a problem: we did not have a clear idea of what it would be good for. Around 1974, I happened to sit next to a blind gentleman on a plane flight who explained to me that his blindness was not a limitation except for one thing, namely his inability to read ordinary print without a sighted assistant. This encounter inspired me to apply my Omni-font OCR to the blind reading problem. We needed two other technologies that did not yet exist: text-to-speech synthesis (going from recognized text to spoken words) and flatbed scanning using the new CCD chips that had just been introduced. We developed these two technologies and introduced the Kurzweil Reading Machine on January 13, 1976. We worked extensively with a team of eight blind engineers and scientists from the National Federation of the Blind to perfect the product.
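
The idea of recognizing the "abstract invariant features" of letters, rather than matching fixed pixel templates, can be illustrated with a toy sketch. The feature used here (the number of enclosed holes in a glyph) and the prototype table are hypothetical stand-ins chosen for brevity, not the actual Kurzweil OCR design:

```python
# Toy sketch of feature-based character recognition: classify a binary
# glyph (1 = ink, 0 = background) by a font-invariant feature -- the
# number of enclosed holes -- instead of matching pixel templates.
# Hypothetical illustration only, not the actual Kurzweil omni-font OCR.

def count_holes(glyph):
    """Count enclosed background regions in a binary glyph."""
    rows, cols = len(glyph), len(glyph[0])
    seen = [[False] * cols for _ in range(rows)]

    def flood(r, c):
        stack = [(r, c)]
        while stack:
            y, x = stack.pop()
            if 0 <= y < rows and 0 <= x < cols and not seen[y][x] and glyph[y][x] == 0:
                seen[y][x] = True
                stack += [(y + 1, x), (y - 1, x), (y, x + 1), (y, x - 1)]

    for y in range(rows):            # mark background reachable from the border
        flood(y, 0); flood(y, cols - 1)
    for x in range(cols):
        flood(0, x); flood(rows - 1, x)

    holes = 0                        # each remaining background region is a hole
    for y in range(rows):
        for x in range(cols):
            if glyph[y][x] == 0 and not seen[y][x]:
                flood(y, x)
                holes += 1
    return holes

PROTOTYPES = {0: "L", 1: "O", 2: "B"}   # hypothetical hole-count prototypes

o_glyph = [[0, 1, 1, 0],
           [1, 0, 0, 1],
           [1, 0, 0, 1],
           [0, 1, 1, 0]]

print(PROTOTYPES[count_holes(o_glyph)])  # -> "O"
```

Because the hole count survives changes of typeface, size, and spacing, the same classifier works across fonts; a real omni-font system would combine many such invariant features.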

What role did computer science play in your development of the first Kurzweil music synthesizer in 1983? You recently returned to Kurzweil Music Systems as Chief Strategy Officer. How are recent technological advances improving the quality of the sound that these music systems produce?

Although samplers existed in the early 1980s (musical instruments that could play back recorded sounds), they were not adequate to recreate the response of a piano. It was not possible to store a recorded sample for every loudness level of every note for the entire duration of a note, so samplers would loop the last waveform as the note decayed. If you loop one waveform, all the overtones become perfect multiples of the fundamental frequency of the note. In a piano, however, the overtones are not perfect multiples of the fundamental frequency; they are slightly sharp of those multiples, a property called inharmonicity, and that gives the piano its unique sound. Moreover, if you hit a key harder, it is not just louder; the entire time-varying frequency contour changes. As a result, we modeled what a piano does to sound using signal processing and pattern recognition algorithms. The Kurzweil 250, which we introduced in 1984, was recognized as the first instrument that could realistically recreate the sound and feel of a grand piano. The latest challenge we are dealing with at Kurzweil Music is to realistically model the resonances of all the strings when the pedal is down. All several hundred strings interact with each other in complex ways, but we feel that advanced signal processing can realistically recreate this effect.
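
The contrast he describes can be made concrete. A standard model of a stiff piano string (Fletcher's inharmonicity formula) places the nth partial at f_n = n · f0 · sqrt(1 + B·n²), where B is a small inharmonicity coefficient, whereas a looped sample pins every overtone at exactly n · f0. The sketch below, with assumed values for B and the decay rates, is a minimal illustration of that difference, not Kurzweil Music's production modeling code:

```python
import numpy as np

# Minimal sketch contrasting harmonic (looped-sample) overtones with the
# inharmonic overtones of a stiff piano string, using the standard model
# f_n = n * f0 * sqrt(1 + B * n^2). B and the decay rates are assumed
# illustrative values, not Kurzweil Music's production parameters.

SR = 44100                              # sample rate (Hz)
f0 = 220.0                              # fundamental frequency (A3)
B = 0.0004                              # assumed inharmonicity coefficient
t = np.arange(0, 2.0, 1.0 / SR)         # two seconds of audio

def note(inharmonic, partials=12):
    out = np.zeros_like(t)
    for n in range(1, partials + 1):
        fn = n * f0 * np.sqrt(1 + B * n * n) if inharmonic else n * f0
        amp = 1.0 / n                         # simple spectral rolloff
        decay = np.exp(-t * (0.5 + 0.3 * n))  # higher partials die out faster
        out += amp * decay * np.sin(2 * np.pi * fn * t)
    return out / np.max(np.abs(out))

looped = note(inharmonic=False)  # every overtone an exact multiple of f0
piano  = note(inharmonic=True)   # overtones stretched sharp, as in a real piano

# With these values the 12th partial lands ~2.8% sharp of 12*f0 --
# the "stretch" that a looped sample cannot reproduce.
print(12 * f0, 12 * f0 * np.sqrt(1 + B * 144))
```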

In your current role as a Director of Engineering at Google, you are working to create a system that can search and process information in a fundamentally deeper way than is now the case. You have said that language is the vehicle for developing machines that can understand the meaning of ideas and concepts. Why is language so central to this effort, and how might a computer process concepts such as “meaning” and “context”?

The world is inherently hierarchical in its structure. Evolution created the neocortex as a unique brain structure that could understand the hierarchical nature of reality. The neocortex emerged 200 million years ago with mammals. Two million years ago we expanded the neocortex when we became hominids. Our large foreheads contain the frontal cortex, which added to the neocortex. We used this additional neocortex to create more levels of hierarchy so that we could master more abstract knowledge. This was the enabling factor for our species to invent language (our first invention) and everything that followed, including art, science, and technology. Language allows us to communicate a hierarchical idea from one brain to another, so language is itself inherently hierarchical. In the hundreds of thousands of years that followed, we have created a vast knowledge base, expressed in language, that carries our accumulated thoughts and insights.

A recent mathematical breakthrough now allows us to develop deep neural nets with dozens of layers that can understand the hierarchical structures found in images and, in particular, in language. We are moving from search based just on keywords to search based on the actual meaning of language.
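
The shift from keywords to meaning can be sketched with word embeddings: words are mapped to vectors so that semantically related words lie close together, and relevance becomes vector similarity rather than exact string match. The tiny hand-made vectors below stand in for embeddings a deep network would learn from text; the values are illustrative only:

```python
import numpy as np

# Minimal sketch of keyword search vs. meaning-based search. These 3-d
# "embeddings" are hand-made stand-ins for vectors a deep neural net
# would learn from large amounts of text; the values are illustrative.

EMB = {
    "car":        np.array([0.90, 0.10, 0.00]),
    "automobile": np.array([0.85, 0.15, 0.05]),
    "banana":     np.array([0.00, 0.20, 0.95]),
}

def keyword_match(query, doc_words):
    # Classic keyword search: count exact string matches only.
    return sum(w == query for w in doc_words)

def semantic_match(query, doc_words):
    # Meaning-based search: best cosine similarity between embeddings.
    def cos(a, b):
        return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))
    return max(cos(EMB[query], EMB[w]) for w in doc_words)

doc = ["automobile", "banana"]
print(keyword_match("car", doc))              # 0   -- keywords miss the synonym
print(round(semantic_match("car", doc), 2))   # ~1.0 -- similar meaning is found
```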

Your predictions on how technology will impact the future have been considered prescient by many and controversial by others. Recently, you predicted that, by the 2030s, shared virtual environments will be immersive and will actually feel “real.” Can you give us an example of what these environments might look like and what enabling technologies will make them possible?

There is a virtual and augmented reality revolution now getting started using external devices. By the 2030s we will be able to do this from within the nervous system. Nanorobots the size of blood cells will enter the brain through the capillaries and provide the brain with signals as if they were coming from our real senses (eyes, ears, tactile sense, and so on). These nanorobots will communicate wirelessly with each other and with the cloud, and will provide an interactive virtual environment. In these virtual environments we will have a body, which could be just like our body in real reality or could be different; a couple could, for example, exchange bodies in a virtual environment. Ultimately these virtual environments will be as realistic as real reality. Some will be recreations of earthly environments and some will be new, fantastic ones limited only by our imagination. We will also have augmented reality, so that you can be in a real living room but have a friend sitting there on the couch with you even if she is hundreds of miles away. In the 2030s, she may in fact be an artificial intelligence with no physical counterpart.

Raymond “Ray” Kurzweil is a computer scientist and inventor. He is also an author who has gained wide attention for his predictions about how advances in technology will shape the future. Kurzweil received ACM’s Grace Murray Hopper Award in 1978 for his invention of a computer-based machine that read pages aloud to the blind. The Kurzweil Machine, as it was called at the time, relied on Kurzweil’s invention of two component technologies: the charge-coupled device (CCD) flatbed scanner and a new text-to-speech synthesizer. The machine also relied heavily on Kurzweil’s major advances in optical character recognition (OCR). Having greatly enhanced the technology behind Omni-font OCR, Kurzweil developed a computer program capable of recognizing text printed in any normal font.

In the 1980s, he became a household name with his invention of an electronic music synthesizer that closely imitates the sounds of traditional musical instruments. His many honors include being named an ACM Fellow and receiving the National Medal of Technology.

In 2012, Larry Page, now CEO of Alphabet Inc., Google’s parent company, hired Kurzweil to serve as a Director of Engineering at Google to develop a system that understands language as well as humans do. It is expected that this project will result in groundbreaking new ways to manage information and search the web.