Digital Assistants – Interview
Toward Computers that Can Analyze the Content of Text
Interview with Tom M. Mitchell
Dr. Tom M. Mitchell, 56, is Professor and Head of the Machine Learning Department at Carnegie Mellon University in Pittsburgh, Pennsylvania. His research interests lie in machine learning, artificial intelligence, and cognitive neuroscience. Mitchell is the author of the widely used textbook "Machine Learning," a past president of the American Association for Artificial Intelligence (AAAI), and a recent member of the U.S. National Research Council’s Computer Science and Telecommunications Board.
What will be the main areas in which computers will play a major role as digital assistants in support of human efforts?
Mitchell: When we look at driving, we already have computers assisting us through current GPS systems. But that area is going to grow. The more the technology spreads, the more we’ll know where all the cars are and the better we’ll be able to predict the roads on which traffic will suddenly stop. Machines used this way will be able to learn your individual preferences over time, and when establishing a route they can either minimize or maximize the use of freeways based on those preferences.
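To make the routing idea concrete, here is a minimal sketch, not anything Mitchell describes, of how a learned preference might be folded into route planning: a single freeway bias, fitted from a driver's past choices, scales the cost of freeway segments before a standard shortest-path search. The road graph, names, and numbers are all invented for illustration.

```python
import heapq

# Hypothetical road graph: each edge is (destination, minutes, is_freeway).
ROADS = {
    "home":    [("ramp", 5, True), ("main_st", 8, False)],
    "ramp":    [("exit", 12, True)],
    "main_st": [("exit", 15, False)],
    "exit":    [("office", 3, False)],
    "office":  [],
}

def best_route(start, goal, freeway_bias=1.0):
    """Dijkstra over ROADS; freeway_bias < 1 prefers freeways, > 1 avoids them."""
    frontier = [(0.0, start, [start])]
    visited = set()
    while frontier:
        cost, node, path = heapq.heappop(frontier)
        if node == goal:
            return cost, path
        if node in visited:
            continue
        visited.add(node)
        for nxt, minutes, is_freeway in ROADS[node]:
            weight = minutes * (freeway_bias if is_freeway else 1.0)
            heapq.heappush(frontier, (cost + weight, nxt, path + [nxt]))
    return None

print(best_route("home", "office", freeway_bias=0.8))  # freeway-friendly driver
print(best_route("home", "office", freeway_bias=2.0))  # freeway-averse driver
```

With a bias of 0.8 the planner takes the freeway (home, ramp, exit, office); with a bias of 2.0 the same query routes around it. In a real system the bias would be estimated from observed route choices rather than set by hand.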
What will digital clerical assistants—in other words, tomorrow’s computers—be able to achieve?
Mitchell: Well, if we could afford to give everyone a personal assistant, we’d all be more productive. Computerized assistants could offload some of the clerical work. For example, we could have used such a clerical assistant to handle the email exchange we used to set up this call. What makes it hard is when we start sending text back and forth. Computers still can’t really interpret that; it’s hard for them to read. A computer could probably be trained to understand 80% of the things we say, but for the other 20% you’d need to understand the nuances. There is a lot of interesting work being done in the area of computer reading and interpretation of text. Right now there are some good systems available for interpreting arbitrary web pages and documents. But while they are good at spotting certain names in documents, say of companies, people, or dates, they are less robust when it comes to a more layered understanding of the information, such as the relationships between those names. This line of work is also accelerating right now.
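The name-spotting Mitchell describes is what the field calls named entity recognition. As a hedged illustration, using the open-source spaCy library (one of many possible tools, not something Mitchell mentions, and with an example sentence invented for the sketch), a few lines suffice to tag people, organizations, and dates, while the relationships between them remain untouched by this step:

```python
import spacy

# Requires: pip install spacy && python -m spacy download en_core_web_sm
nlp = spacy.load("en_core_web_sm")  # small pretrained English pipeline

doc = nlp("Apple hired Jane Smith in January 2020.")

# Named entity recognition: each detected span gets a type label.
for ent in doc.ents:
    print(ent.text, "->", ent.label_)
# Likely output (model-dependent):
#   Apple -> ORG
#   Jane Smith -> PERSON
#   January 2020 -> DATE
```

Note what is missing: the relation itself, who hired whom and when, is not extracted here. That layered understanding is the harder problem Mitchell points to.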
What would happen if computers could read?
Mitchell: You and I would profit. Let’s say I want to attend the next conference on artificial intelligence and make travel arrangements. If computers could read, they could do that for you. They could find your flight, make your hotel reservation, and even find out whom you might want to interview while there, all without a human intermediary. You would also be able to access a lot of information on the web much more quickly. Companies like Microsoft, Google, and Yahoo have all developed search engines to pull information from the web. The next generation of computer systems will be able to analyze the content of text, not just retrieve it. Just imagine: instead of typing in a keyword, you might be able to type in a question and get an answer.
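Question-in, answer-out has since become a standard task in its own right. Here is a minimal sketch using the Hugging Face transformers library, which postdates this interview; the model choice and the passage are assumptions for illustration only:

```python
from transformers import pipeline

# Extractive question answering: the model pulls the answer span out of
# a given passage instead of returning pages that match keywords.
qa = pipeline("question-answering",
              model="distilbert-base-cased-distilled-squad")

passage = ("The next conference on artificial intelligence will be held in "
           "July in Chicago, and early registration closes on May 1.")

result = qa(question="Where will the conference be held?", context=passage)
print(result["answer"])  # e.g. "Chicago"
```

This is still reading one supplied passage rather than the whole web, but it captures the shift Mitchell describes: from matching words to extracting answers.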
How long do you think it will take until this will become a reality?
Mitchell: It’s going to happen in less than a decade. This is going to be one of those quantum leaps I think we’ve all been expecting. Companies are already putting a lot of resources into this area. I have made a bet for a lobster dinner that by 2015 we will have computer programs that can read 80% of the facts on the web. Once computers can read, we will be able to collect vast amounts of data.
What makes you so sure it will take less than a decade for this to happen? A breakthrough for intelligent assistants has been projected for at least 25 years!
Mitchell: A tremendous amount of money has been invested in this area, with companies like Google, Microsoft, and Yahoo really pushing it ahead. Also, I can see a path of technical results that gets you there: advances in machine learning algorithms have been coming along nicely. And until the emergence of the World Wide Web a decade ago, computers never had access to this much text to train on. The web offers a huge amount of data, with a lot of redundancy in the text, for training these algorithms, and a lot of data is a good thing for training algorithms.
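A toy sketch, entirely illustrative and not Mitchell's method, shows why that redundancy matters: when the same fact appears in many phrasings across the web, even a crude extraction pattern can vote its way past noise. The sentences and pattern below are invented.

```python
import re
from collections import Counter

# Redundant, noisy "web" text: the true fact appears in several phrasings.
sentences = [
    "Pittsburgh is located in Pennsylvania.",
    "Everyone knows Pittsburgh is located in Pennsylvania.",
    "Pittsburgh is located in Ohio, claims one badly wrong page.",
    "Pittsburgh is located in Pennsylvania, at three rivers.",
]

# A deliberately crude extraction pattern.
pattern = re.compile(r"(\w+) is located in (\w+)")

# Tally every (subject, place) pair the pattern finds.
votes = Counter(m.groups() for s in sentences
                for m in [pattern.search(s)] if m)

(city, state), count = votes.most_common(1)[0]
print(f"{city} is located in {state} "
      f"(supported by {count}/{sum(votes.values())} matches)")
```

With only one sentence, the extractor could just as easily have latched onto the wrong page; with many redundant sources, the majority carries the correct fact.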
Could this play a role in developing systems in healthcare that assist in establishing a medical diagnosis, for example? Are there concerns about using computers to access such data?
Mitchell: I think privacy is going to be a big issue. This is something that will have to be addressed to get this sort of technology out there. A study I was involved with that addresses this concern is expected to be published sometime this year by the National Academy of Sciences. Some of the issues to be resolved revolve around people’s concerns about having their health information shared, for example. While the potential benefits of access to a large pool of data are substantial, it’s a very complex field that comes with certain trade-offs.
Interview conducted by Karen M. Dente