automatically generated captions
00:00:00 Yoshua Bengio: So I think I showed this already to Frederick. Oh, come on.
00:00:18 Computer, wake up. All right, so we've worked for a few years on using deep learning for machine translation.
00:00:46 We've been able to reach the state of the art, and improve on the state of the art, using deep learning, and now,
00:00:52 other groups in the world are working on this, and it's pretty exciting.
00:00:56 But maybe not as exciting as seeing what happened
00:01:00 when we take the same technology that we use for our machine translation that goes, say, from French to English,
00:01:08 and apply it to translating from images to English. And so it does things like this.
00:01:16 You show the picture like this,
00:01:21 and the computer generates an actual language sentence that's kind of a description like,
00:01:27 "A woman is throwing a Frisbee in a park."
00:01:29 Interviewer: The computer generates that?
00:01:32 Yoshua Bengio: Yes, and it would do a different sentence next time if we try, all right?
00:01:39 And it's kind of generating somehow with some kind of randomness.
00:01:47 There are many possible sentences you could say about this image,
00:01:50 and it really learns about all the kinds of sequences of words that are appropriate.
00:01:56 Interviewer: What kind of other sentences could be [inaudible 00:01:59]?
00:01:59 Yoshua Bengio: Well, so here are some more, like here.
00:02:01 Interviewer: I mean for this same image.
00:02:03 Yoshua Bengio: Oh, I imagine it could say, "Two people in a park with a green lawn." Simple, short sentences.
00:02:16 The computer was trained by imitating humans that have typed these kinds of short sentences. About 80,000 images
00:02:28 and 5 sentences for each image were labeled by humans, so that's supervised learning.
00:02:34 The human is telling the computer, "So for this image,
00:02:37 here is what I would say." And, "Here's another sentence I could say." Different people produce different sentences for
00:02:42 the same image. That's how it was trained. It knows there is some diversity in what could be produced.
00:02:50 For each of these images,
00:02:51 what we also see is that the particular system we developed uses what's called an "attention mechanism." It's paying
00:03:02 attention to particular places in the image at different points in the sequence as it's generating each word. It
00:03:09 generates one word after the other, in order,
00:03:13 and as it's going to be generating the word "Frisbee," it's looking around the image
00:03:19 and focusing on the place where there's the Frisbee. So that part is interesting because it's not so supervised.
00:03:26 We didn't tell the computer, "When you're going to say that word,
00:03:29 you should be looking here in the image." It figured out by itself that it was useful to look at the right places as
00:03:39 it's producing each of the words in the sentence.
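The attention mechanism described here can be sketched in a few lines of Python. This is an illustrative toy, not the actual model: the region features, dimensions, and the plain dot-product scoring are invented, where the real system uses learned networks.

```python
import math
import random

random.seed(0)

def softmax(xs):
    # Turn arbitrary scores into non-negative weights that sum to 1.
    m = max(xs)
    es = [math.exp(x - m) for x in xs]
    s = sum(es)
    return [e / s for e in es]

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

# Nine image regions (a 3x3 grid), each summarized by a 4-dimensional feature
# vector, plus the decoder's state just before it emits the next word.
regions = [[random.gauss(0, 1) for _ in range(4)] for _ in range(9)]
state = [random.gauss(0, 1) for _ in range(4)]

# One attention weight per region: score each region against the current
# state, then normalize with a softmax.
alpha = softmax([dot(r, state) for r in regions])

# The context vector, a weighted average of the region features, is what the
# model is "looking at" while choosing the next word (e.g. the Frisbee region).
context = [sum(a * r[j] for a, r in zip(alpha, regions)) for j in range(4)]

print(max(range(len(alpha)), key=lambda i: alpha[i]))  # most-attended region
```

Because the weights are produced by the model itself rather than given as labels, this is the "not so supervised" part Bengio describes: nothing tells the system where to look.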
00:03:41 Interviewer: So it has a lot of words in the system?
00:03:46 Yoshua Bengio: Yeah, I don't know, something like 100,000 words.
00:03:51 Interviewer: And then it finds a connection by itself with the image?
00:03:57 Yoshua Bengio: Yes, by itself...but it's been trained...
00:04:00 Interviewer: Yeah, it's been trained...
00:04:01 Yoshua Bengio: ...on a lot of the...
00:04:03 Interviewer: ...but it can also generate different sentences for the same image.
00:04:06 Yoshua Bengio: I don't have that example, but I can tell you that it does.
00:04:13 It's also useful to see sort of what's going on inside the machine, this attention, right?
00:04:18 So here, where it says, "A dog is standing on a hardwood floor," it's looking at the dog, and so on.
00:04:25 Like here, it's saying, "A group of people sitting on a boat in the water," and it's looking at the people,
00:04:30 so the underlined word here is the word at which we are going to display the attention picture, where it's focusing.
00:04:39 This attention is also useful to debug, to understand the mistakes. So here are some mistakes that it makes.
00:04:48 For example, in this picture, it says, "A woman holding a clock in her hand." It's getting confused because you see
00:04:57 when it says "clock," which is obviously wrong, it's looking at the round pattern on her shirt.
00:05:06 It thinks that the letters here maybe are the numbers in the clock, so it's making this mistake.
00:05:12 Maybe it doesn't have enough resolution to make out those details.
00:05:15 That's quite possible here. Also,
00:05:19 this system was trained on a fairly small data set compared to what's needed to really do that job well.
00:05:28 For example, our friends at Facebook, they trained a system on a lot more data.
00:05:35 Of course, this is sort of using their own internal data. Let's see if I still have this.
00:05:47 When the system was trained with more data, and also, to do something a bit hard, which is to answer questions,
00:05:53 so this comes from my colleagues at Facebook, [inaudible 00:05:57].
00:05:57 Computer Ask: Is there a baby?
00:06:02 Computer Answer: Yes.
00:06:04 Computer Ask: What is the man doing?
00:06:07 Computer Answer: Typing.
00:06:08 Computer Ask: Is the baby sitting on his lap?
00:06:12 Computer Answer: Yes.
00:06:14 Computer Ask: Are they smiling?
00:06:16 Computer Answer: Yes.
00:06:17 Yoshua Bengio: I'm not so sure that they are smiling, but I guess the man may be sort of smiling. This is...
00:06:28 Computer Ask: Is there a baby in the photo?
00:06:31 Yoshua Bengio: ...more impressive, this one.
00:06:32 Computer Answer: Yes.
00:06:33 Computer Ask: Where is the baby standing?
00:06:36 Computer Answer: Bathroom.
00:06:38 Computer Ask: What is the baby doing?
00:06:41 Computer Answer: Brushing teeth.
00:06:43 Yoshua Bengio: Yeah, that's not obvious.
00:06:45 Computer Ask: What game is being played?
00:06:48 Computer Answer: Soccer.
00:06:50 Computer Ask: Is someone kicking the ball?
00:06:53 Computer Answer: Yes.
00:06:54 Computer Ask: What color is the ball?
00:06:58 Computer Answer: Yellow.
00:06:59 Yoshua Bengio: Or green maybe?
00:07:01 Computer Ask: What game is the dog playing?
00:07:03 Computer Answer: Frisbee.
00:07:06 Yoshua Bengio: Yeah, so you see it's the same theme, right?
00:07:10 We have a neural network that analyzes the image,
00:07:13 computes a representation of it. I can go back here so you can get a sense of what I mean.
00:07:20 Interviewer: Is this your own presentation?
00:07:28 Yoshua Bengio: That presentation? Yeah, these are my slides, yeah. So basically, that's actually...
00:07:38 Interviewer: Oh, that's interesting.
00:07:40 Yoshua Bengio: ...It's showing how the image is going to be transformed in different stages,
00:07:49 and so what we mean by deep is basically just that there are many such stages.
00:07:54 So in the classical way of doing neural networks, there were only two stages, because we didn't know how to train them.
00:08:03 But we have found ways to train much deeper ones since then. That basically started in 2006, this...
00:08:11 Interviewer: Global?
00:08:12 Yoshua Bengio: Being able...
00:08:17 Interviewer: Canadian.
00:08:17 Yoshua Bengio: Yeah, yeah.
00:08:18 That's the Canadian conspiracy, that's right. Let me show you some more things that are fun. So another interesting...oh,
00:08:38 actually, we're going to meet my brother, Sammy. Here's his picture.
00:08:42 And some of the work he's done in that field is to use these approaches to learn to transform both images
00:08:53 and text to the same space.
00:08:57 An image is transformed into a point in some space, and if you type something like,
00:09:04 "dolphin" is also transformed into the same space.
00:09:09 And if both the word and image go nearly near to each other,
00:09:15 and it's a good cue that they mean something related. So if you are in Google image search,
00:09:23 and you're searching for images of dolphin, what's going to happen is,
00:09:29 it will compute this representation for the word "dolphin," and then look for images that have a representation nearby,
00:09:36 and then show you those images. So this notion of intermediate representation is very very important in deep learning.
00:09:43 That's really what makes the whole thing work. We're not trying to go directly from inputs to decisions.
00:09:49 We have these intermediate stages, intermediate computations, or intermediate representations, as we call them,
00:09:56 that can be more abstract, as we consider deeper systems.
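The image-search idea described above can be sketched with a toy shared space. All the embeddings below are made up by hand for illustration; in the real systems, learned networks map the text and the pixels to these points.

```python
import math

# Hand-made embeddings in a shared 3-dimensional space (invented numbers).
word_embedding = {
    "dolphin": [0.9, 0.1, 0.0],
    "car":     [0.0, 0.9, 0.4],
}
image_embeddings = {
    "img_dolphin_jumping": [0.8, 0.2, 0.1],
    "img_ocean_waves":     [0.7, 0.1, 0.3],
    "img_city_traffic":    [0.1, 0.8, 0.5],
}

def distance(u, v):
    # Euclidean distance between two points in the shared space.
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

def search(query_word):
    """Rank images by how close they sit to the query word in the shared space."""
    q = word_embedding[query_word]
    return sorted(image_embeddings, key=lambda name: distance(image_embeddings[name], q))

print(search("dolphin"))  # dolphin-like images come first
```

The point of the sketch is only the geometry: once both modalities live in one space, "search" reduces to finding nearby points.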
00:10:00 Interviewer: Does it mean that you are on the right track? You haven't reached the goal yet,
00:10:06 but you're on the right track, in the right direction?
00:10:08 Yoshua Bengio: Well, yeah, that's what we believe.
00:10:10 Future researchers will prove us wrong, as usual,
00:10:13 but currently the evidence is strongly suggesting that this idea of having multiple levels of representation is working
00:10:23 really well.
00:10:24 So we have both these kinds of experiments, as you've seen,
00:10:29 that show the system doing things thanks to these deep architectures. Also,
00:10:35 we're starting to understand more of the theory of why it's happening like that, although
00:10:41 much more theory probably needs to be done.
00:10:46 But I think there are lots of good reasons to believe that this concept of having multiple levels of representation is
00:10:56 quite important.
00:10:57 And we have also the evidence from our brain that it's actually doing something similar. So if you look at it kind of
00:11:08 like a map of what's happening in the visual cortex, which is the part of the brain that processes images,
00:11:17 and that's also the part that we understand the best, the information goes from your eye,
00:11:23 through different regions in your brain that have little names like LGN, V1, V2, V4, and so on,
00:11:29 and each of those stages corresponds to a layer in our simple models.
00:11:38 And the information becomes represented in a more abstract way. So at the lower level,
00:11:45 the brain recognizes very local things like edges, and as you go higher up, it starts recognizing little pieces
00:11:53 and shapes and eventually, around here, it's actually recognizing actual objects, faces, and things like that,
00:12:01 pretty high-level things. Eventually, that information goes to places where you're taking actions
00:12:09 and making decisions, and so on. We understand some of the high-level structure of the brain.
00:12:15 Like I said, it has helped us as a source of inspiration.
00:12:19 We still don't understand how the brain actually learns to do what it's doing.
00:12:24 We see the connections, we see information flowing, so that's kind of a fascinating topic.
00:12:30 Interviewer: But if you don't know how our brain learns, how can you possibly find out how a machine learns?
00:12:36 Yoshua Bengio: Oh, but for machines, we can design them based on mathematical concepts.
00:12:47 It's like if you're trying to build flying machines without seeing birds.
00:12:53 You could go just based on the laws of physics that we know, and you could try to design something.
00:12:57 But of course, the way it happened is that the people who build those planes were highly inspired by birds,
00:13:05 maybe too much sometimes. I have a nice slide about this,
00:13:09 which illustrates that sometimes it's not so good to imitate nature too closely. Let me think, where is it?
00:13:18 Maybe not, not in this presentation.
00:13:24 Interviewer: Like a person who attaches wings to his back?
00:13:32 Yoshua Bengio: Yeah. One of the first planes that was built in France by...
00:13:49 Interviewer: Is it possible for me to have a copy of this presentation?
00:13:55 Yoshua Bengio: Yeah, of course...
00:13:56 Interviewer: Because I...
00:13:57 Yoshua Bengio: Any of these things. So he built this tentative plane by trying to imitate that,
00:14:06 and he called it Avion III. "Avion" means plane in French. Actually that's how the name, Avion, came up. It didn't fly.
00:14:17 He was trying to just copy, and he hadn't captured the required principles of flight.
00:14:25 So sometimes it's not enough to copy the details. There are a lot of things we know about the brain in terms of details,
00:14:32 but we don't have enough of the big picture in terms of the principles.
00:14:36 Interviewer: Where does this all end?
00:14:41 Yoshua Bengio: Where does this land?
00:14:43 Interviewer: End.
00:14:43 Yoshua Bengio: End. Oh, you know, research never ends. We always...
00:14:48 Interviewer: Suppose you're right...
00:14:49 Yoshua Bengio: ...have more to learn.
00:14:51 Interviewer: Suppose you're right with your research, where does it end?
00:14:54 Yoshua Bengio: Why should it end?
00:14:56 Interviewer: Do you have a first goal? Something that you'd like to reach?
00:15:00 Yoshua Bengio: Well, it may be...it's hard for me to think of a day
00:15:05 when we will actually have reached this goal, because it seems so far away right now.
00:15:13 But yeah, it's possible to imagine one day that we fully understand the principles that make us intelligent,
00:15:21 and maybe we go onto other questions.
00:15:24 Interviewer: You know which questions?
00:15:27 Yoshua Bengio: Our descendants will have questions I'm sure. Humans always have questions.
00:15:35 Interviewer: Is it a question like, "What is love?" One like that?
00:15:38 Yoshua Bengio: [laughs] Well, one thing that's true is that we're focusing on some aspect of who we are here,
00:15:48 like sort of pure understanding, pure intelligence learning.
00:15:54 Of course, we are much more than that, talking about love, emotions in general,
00:15:59 and people are starting to try to connect the dots for these aspects as well.
00:16:07 This is a kind of beginning, and also connecting to other related disciplines like sociology or anthropology,
00:16:20 and how humans build societies and interact.
00:16:26 Interviewer: What do you think...
00:16:26 Yoshua Bengio: All of these aspects don't yet take into consideration our understanding of how we are able to make
00:16:38 decisions, because we don't understand that well enough.
00:16:42 Of course, in psychology, there are lots of theories, and I won't diminish that at all.
00:16:46 Clearly we're missing a lot of that understanding, and so when we make progress with that,
00:16:51 I think the connections with other sciences that have to do with humans will be important.
00:16:58 Interviewer: Do you think it's ever possible for a computer to love?
00:17:02 Yoshua Bengio: If we program them to, yes.
00:17:06 Interviewer: What yes?
00:17:09 Yoshua Bengio: Why?
00:17:10 Interviewer: What, yes? I try to...forget my question.
00:17:14 Yoshua Bengio: As far as I'm concerned, emotions are like anything else.
00:17:18 They can be understood, and we can have programs
00:17:23 or equations that model the underlying mechanisms. We have a tendency to treat emotions as
00:17:35 something magical that isn't accessible to science. I don't think so.
00:17:41 I think we're understanding more and more about them, and we will even more in the future.
00:17:50 Once we do that, it's good for us. We can be less stupid with our emotions, as [laughs] I think we are right now.
00:17:58 Interviewer: [laughs] That's part of the charm of life.
00:18:02 Yoshua Bengio: Famous scientists like Einstein were saying things like,
00:18:10 "It's not incompatible to have understanding of the universe and how things work and feeling the charm of life
00:18:20 and the awe in front of the universe and its beauty," and so on. These are not incompatible.
00:18:25 In fact, it's the opposite.
00:18:27 The more we understand something, the more we can love something, and we can feel impressed or motivated,
00:18:38 or whatever the emotion.
00:18:40 Interviewer: You showed this magazine, can you take it?
00:18:47 Yoshua Bengio: Yeah, sure.
00:18:49 Interviewer: What does it say?
00:18:50 Yoshua Bengio: So they chose to highlight some progress in science,
00:18:57 and one thing that is true is that artificial intelligence research in the last year
00:19:05 or two has been making incredible progress.
00:19:09 And of course, that's one of the reasons why we hear so much about it in the media, and also companies are investing,
00:19:19 and so on. They chose an advance...
00:19:22 Interviewer: What does it say on the cover?
00:19:24 Yoshua Bengio: Oh,
00:19:24 so it says this is a special issue that they do every year about ten discoveries of the year that they have selected,
00:19:35 and they chose to talk about some of the things we've done in my lab as well as in some of my colleagues' lab.
00:19:47 [inaudible 00:19:49] is actually at NYU.
00:19:50 Interviewer: Can you show it?
00:19:51 Yoshua Bengio: Yeah, yeah.
00:19:53 There are no pictures except for me here, but what it's talking about is, in French here, it says,
00:20:00 "It's the end of a belief about neural networks." One of the reasons why the whole approach to neural networks was kind
00:20:11 of abandoned for more than a decade is that people had this false belief that it would be very difficult to train,
00:20:21 because learning would have to get stuck in what we call local [inaudible 00:20:28]
00:20:28 from which there's no small improvement that the system can make to improve itself. We showed some evidence that
00:20:41 this is not the case. In fact, the bigger the neural net, the less likely this is going to be the case.
00:20:49 And we showed some empirical, some experimental evidence that in fact these local [inaudible 00:20:59]
00:21:00 people thought existed.
00:21:01 They do exist, but they correspond to very good configurations,
00:21:04 very good settings. I think it helps to understand why these neural nets that we have now, especially the bigger ones,
00:21:16 are working so well, in spite of this belief that people thought it was hopeless just a few years ago.
00:21:23 Interviewer: For me to understand, in a few words, what is a neural net?
00:21:28 Yoshua Bengio: So, a neural net is a...
00:21:31 Interviewer: Just a moment.
00:21:32 Yoshua Bengio: A neural net is a computer program, or a computer simulation,
00:21:43 that's inspired by biological neural nets, that is able to learn from examples.
00:21:50 It has a structure; I can maybe find some slides that illustrate it. It's composed of little units,
00:22:01 which compute. So we use a very cartoonish neural net with some inputs... There's another one.
00:22:15 Where is it? ...some inputs.
00:22:18 These are supposed to be standing for artificial neurons, or we call them units,
00:22:25 and they're connected to other neurons that are connected to other neurons.
00:22:28 And then some of these are actually the outputs that produce the actions,
00:22:35 and some of those would be the inputs that correspond to perception. The information flows in your brain through many
00:22:42 layers like this.
00:22:46 We have designed little equations that define what computation each of these artificial neurons is doing.
00:22:55 It does very very simple things. The connections between neurons are things that can be changed while the system is
00:23:02 learning.
00:23:03 And we have algorithms, that is, computer recipes,
00:23:06 that tell us how that change should be done so that next time the system sees another example,
00:23:15 it will be more likely to produce the right answer, or at least the answer that we wanted.
00:23:20 Interviewer: Another one, just for me to understand, can you explain to me what is a neuron?
00:23:26 Yoshua Bengio: Yes. So you can think...these are graphical depictions of neurons.
00:23:37 They're connected to other neurons through little filaments, and they connect at some places called synapses.
00:23:46 There are signals, electrical spikes, that travel
00:23:50 and allow these neurons to communicate with each other. Each of these neurons is doing something very simple.
00:23:57 They're receiving those signals coming from other neurons, and they're just adding them up basically.
00:24:03 But they can give different importance to some of the signals that they're receiving.
00:24:10 These are called synaptic weights.
00:24:12 It's just a sum of products essentially, adding and multiplying, very simple things.
00:24:19 Of course, this is a simplification of what is going on in real neurons,
00:24:23 but it's a very useful simplification. There are also some more complicated operations that are happening inside the
00:24:31 neuron that allow it, at the end of the day, to do a little piece of computation, such that if we have many such neurons,
00:24:42 so we have a neural network, and a neural network is just a bunch of these neurons connected to each other,
00:24:47 they can actually compute very complicated things.
00:24:49 In fact, we can prove that if you have enough neurons, you can compute almost anything that a [inaudible 00:24:55]
00:24:55 computer could compute. The question then is,
00:24:59 how do we make sure that that neural network computes the things that we want, that are useful for us,
00:25:05 like recognizing objects, answering questions, understanding speech? That's where the learning comes.
00:25:15 The learning is about gradually changing those weights that control the importance that each neuron gives to the
00:25:23 neighbors to which it is connected.
00:25:24 Interviewer: But when you learn, then you also remember.
00:25:27 Yoshua Bengio: Yes, exactly.
00:25:29 Interviewer: How does it remember?
00:25:31 Yoshua Bengio: Well, it remembers because each time the network sees an example and makes some small changes,
00:25:41 it's going to be more likely to reproduce that kind of example, or produce the right answer for that example.
00:25:48 So it remembers what it should have done for this example, or at least, it remembers little bits.
00:25:54 If you repeat that example, then it basically learns it by heart. But it doesn't have to learn things by heart.
00:26:01 It may be okay to teach it with some mistakes,
00:26:05 and it will kind of listen to the majority of what it sees to come up with a good general answer.
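The learning procedure just described, small weight changes after each example so the network ends up listening to the majority even when some labels are wrong, can be sketched with a single artificial neuron. The task, learning rate, and data sizes here are invented for illustration.

```python
import random

random.seed(1)

# Toy supervised task: learn the rule "answer 1 when x > 0" from labeled
# examples, a few of which are deliberately mislabeled.
xs = [random.uniform(-1.0, 1.0) for _ in range(200)]
data = [(x, 1.0 if x > 0 else 0.0) for x in xs]
for i in range(0, len(data), 20):      # corrupt every 20th label (10 mistakes)
    x, y = data[i]
    data[i] = (x, 1.0 - y)

w, b, lr = 0.0, 0.0, 0.1               # one weight, one bias, a small step size

for _ in range(50):                    # repeated passes over the same examples
    for x, y in data:
        pred = 1.0 if w * x + b > 0 else 0.0
        # Small change after each example, so that next time the network
        # is more likely to produce the answer we wanted for it.
        w += lr * (y - pred) * x
        b += lr * (y - pred)

# It "listens to the majority": accuracy against the true rule stays high
# even though some of the training labels were wrong.
correct = sum((w * x + b > 0) == (x > 0) for x, _ in data)
print(correct / len(data))
```

Each update is tiny, so no single mislabeled example can drag the weights far; the repeated, mostly correct examples dominate.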
00:26:14 Interviewer: Okay. I think I understand. So what is the thing you are most excited about, yourself?
00:26:22 Yoshua Bengio: So what I'm most excited about is what we call unsupervised learning.
00:26:28 That's the area of research,
00:26:31 which tries to make computers learn without having to tell them in detail what they should be doing.
00:26:41 We'd show the computer lots of text, or lots of videos, and it would learn about language
00:26:50 or about how images are formed just by these observations. That's something that we know is really important for really
00:27:00 progressing towards AI because, one, we know that humans do that a lot; and two,
00:27:10 we can't afford to continue as we are doing now by providing the computer with very detailed prescriptions of what it
00:27:17 should be doing or how to interpret each answer or each sentence, and so on.
00:27:21 We can do some of it, but the computer would have access to a lot more information, a lot more data,
00:27:27 a lot more examples, if it was better at unsupervised learning.
00:27:31 Interviewer: Do you think in the near future, this computer, which is able to learn by itself,
00:27:41 can you say then that the computer comes alive?
00:27:44 Yoshua Bengio: Can you say that the computer comes alive?
00:27:49 Yeah, that's a good question. There's some different definitions of life, of course,
00:27:56 and many of these definitions have been produced to fit life as we know it on Earth.
00:28:09 We would have to extend our notion of what life means.
00:28:12 But it's not inconceivable that if we're able to build machines that can learn more by themselves
00:28:21 and can take their own decisions, basically,
00:28:30 that it will start looking like a living being. It will be hard to say that they're not alive,
00:28:37 except that if it's like today's machines, we can just turn them off whenever we want.
00:28:45 In fact, we only turn them on for the time of an experiment.
00:28:49 It's not like these things are sitting there running all the time, right?
00:28:54 We just run experiments where, it's like, we have the machine start a new life,
00:29:00 and it goes through some experience and we test it, and then that's it.
00:29:08 There's no consciousness in these things and no lasting state.
00:29:13 Interviewer: But does it need consciousness?
00:29:15 Yoshua Bengio: Not for most of the things we need those machines to do. We could have...
00:29:21 Interviewer: But for a machine to be alive, does it need consciousness? Because when it's alive
00:29:25 and without consciousness, then we have a problem.
00:29:29 Yoshua Bengio: It depends, first of all, by what you mean by "alive,"
00:29:33 but I think we could have computers that are very very intelligent without being conscious,
00:29:39 without having a self-consciousness. They could be very very useful to us.
00:29:45 They could help us solve a lot of problems that we have around us and help us make our world better,
00:29:55 and that doesn't require consciousness. I don't think so.
00:30:00 But humans being what they are, I suppose one day people will try to put in some kind of consciousness in computers,
00:30:08 in intelligent machines.
00:30:10 Interviewer: It's just the question of time, I guess.
00:30:12 Yoshua Bengio: Yeah. I guess, but it's pretty far away.