Jeremy Howard is an artificial intelligence researcher and the co-founder of fast.ai, a platform for non-experts to learn artificial intelligence and machine learning. Prior to starting fast.ai, he founded multiple companies — including FastMail and Enlitic, a pioneer in applying deep learning to the medical field — and was president and chief scientist of machine-learning competition platform Kaggle.
In this interview, Howard discusses what it means for different industries and even global regions now that people without PhDs from specialized research labs can build and work with deep learning models. Among other topics under this broad umbrella, he shares his thoughts on how to best keep up with state-of-the-art techniques, prompt engineering as a new skill set, and the pros and cons of code-generation systems like Codex.
FUTURE: After running fast.ai for the past several years, what are the effects you’re seeing of having so many more people familiar with the basic concepts of deep learning — versus several years ago when people with the knowledge were unicorns?
JEREMY HOWARD: When we started fast.ai, there were, basically, five significant university research labs working on deep learning — and the only people who knew how to do nearly anything with deep learning were people who were at, or had been at, those five labs. On the whole, code wasn’t being published, let alone data. Even the papers weren’t publishing the details of how to make things work in practice, partly because academic venues didn’t much care about practical implementation; the field was very focused on theory.
So when we started, it was a very speculative question: “Is it possible to do world-class deep learning without a PhD?” We now know the answer is yes; we showed that in our very first course. Our very first alumni went on to create patents using deep learning, to build companies using deep learning, and to publish in top venues using deep learning.
I think your question is exactly the right one: What happens when domain experts become effective deep learning practitioners? That’s where we’ve seen the most interesting things going on. Generally, the best startups are the ones built by people who personally have an itch to scratch. They used to be recruiters, so they’re doing a recruiting startup, or they used to be a paralegal, so they’re doing a legal startup, or whatever. And they’re, like, “Oh, I hate this thing about the job I had. And now that I know about deep learning, I know I could almost automate that whole thing.”
A lot of our students also are doing or have done their PhDs, but not in math or computer science; instead, they’re doing them in chemoinformatics, proteomics, data journalism, or whatever. And we very often find that they’re able to take their research to a whole other level. For example, we’re starting to see, for the first time, some big databases and corpuses of public library materials appearing on the internet. And people in that field — library science — are now doing stuff at a scale it never even occurred to anybody they could work at before. But suddenly, it’s like, “Oh, my god, look at what happens when you analyze a library as a thing.”
I gave a talk at an animal husbandry conference where everybody was talking about deep learning. To me, that’s a really non-obvious usage, but to them it’s by far the most obvious usage. People are using it to solve real-world problems using real-world data within real-world constraints.
It seems from my experience, over the last few years, that deep learning can be applied to pretty much every industry — not every part of every industry, but some parts of pretty much every industry.
It seems like that inversion of knowledge bases — deep learning now being supplementary to domain expertise — could shift the balance between theory and application.
Right, and you can see that happening. One of the big things early in the deep learning era was the work Google Brain did, where they analyzed lots of YouTube videos and discovered that cats were a latent factor in many videos. Their model learned to recognize cats because it saw so many of them. And that’s very interesting work, but nobody went away and built a company on that.
The things that people were building — again, useful, but within certain areas — were things like Google and Apple photo search, which got pretty good pretty quickly because you could actually search for the things that were in the photos. That’s really helpful. And that’s the kind of stuff everybody was working on — either really abstract stuff or real first-world-problem stuff. There’s nothing wrong with that, but there are a lot of other things that need to be worked on, as well.
So I was thrilled when, after a couple of years, I looked at the demographics of the people who had done our course and discovered that one of the biggest cities represented outside the U.S. was Lagos [the largest city in Nigeria]. I thought it was really great because this is a community that wasn’t previously doing deep learning. I literally asked people in the first course: “Anybody here from Africa?” And I think there was one guy from the Ivory Coast who was having to get things burned to CD-ROM at his library because he didn’t have a good enough internet connection. So it really grew pretty quickly.
And then it was nice because we started getting groups of folks from Uganda, Kenya, and Nigeria flying into San Francisco to do the course in person and getting to know each other. We got to know one guy, for example, who had been doing lots of interesting stuff with malaria diagnostics, which, as you can imagine, is not the top problem that people in San Francisco were trying to solve.
What does the average career path look like for someone who’s coming out of a deep learning program like yours?
It’s so diverse. It’s really changed a lot from the early days, when it was just this super early-adopter mindset — people who were largely either entrepreneurs or PhDs and early postdocs, and who just loved cutting-edge research and trying new things. It’s not just early adopters anymore; it’s also folks who are trying to catch up or keep up with the way their industry is moving.
Nowadays, a lot of it is people who are like, “Oh, my god, I feel like deep learning is starting to destroy expertise in my industry. People are doing stuff with a bit of deep learning that I can’t even conceive of, and I don’t want to miss out.” Some people are looking a bit further ahead, and they’re more, like, “Well, nobody is really using deep learning in my industry, but I can’t imagine it’s the one industry that’s not going to be affected, so I want to be the first.”
Some people definitely have an idea for a company that they want to build.
The other thing we get a lot of is companies sending a bunch of their research or engineering teams to do the course just because they feel like this is a corporate capability that they ought to have. And it’s particularly helpful with the online APIs that are out there now that people can play around with — Codex or DALL-E or whatever — and get a sense of, “Oh, this is a bit like something I do in my job, but it’s a bit different if I could tweak it in these ways.”
However, these models also have the unfortunate side effect, maybe, of increasing the tendency of people to feel like AI innovation is only for big companies, and that it’s outside of their capabilities. They might choose to be passive consumers of the technology because they don’t believe they have any ability to personally build something that would be any better than what Google or OpenAI might be building.
Even if that’s the case — if you can’t outbuild OpenAI or Google — surely there’s a way to take advantage of what they’ve done, of API access to incredibly powerful models, right?
The first thing to say is that it’s not true, at least not in any general sense. There’s a certain bifurcation of AI training going on now: There’s the Google and OpenAI side, which is all about creating models that are as general as possible, and, nearly always, those researchers specifically have the goal in their head of getting to AGI. I’m not commenting on whether that’s good or bad; it’s definitely resulting in useful artifacts for us normal folks, so that’s fine.
However, there’s a totally different path, which is the one that nearly all of our students take, which is: “How can I solve the real-world problems of people in my community in as pragmatic a way as possible?” And there’s much less overlap than you might think between the two methods, the two datasets, the two techniques.
In my world, we never train a model from scratch, basically. It’s always fine-tuning. So we definitely leverage the work of the big guys, but it’s always freely available, downloadable models. Stuff like the open-source large language models through BigScience is very helpful for that.
However, they’re probably going to trail 6 to 12 months behind the big guys until, maybe, we find some more democratic way of doing this. It feels to me that having 16 different large language models trained on 5% of the internet is like having 16 water pipes come into your house and 16 sets of electricity cables come into your house. It feels like it should be more of a public utility. It’s great to have competition, but it would also be nice if there was some better cooperation going on, so we didn’t all have to waste our time doing the same thing.
So, yeah, we end up fine-tuning, for our particular purposes, models that other people have built. And it’s kind of like how the human genome and the monkey genome are nearly entirely the same, except for a few percent here and there, which actually turn out to make a big difference. It’s the same with neural nets: A model that decides whether or not you seem to like a movie and a model that can generate haikus are going to be 98% the same because most of that is about understanding the world, and understanding language and stuff. It’s very, very rare that we actually need to train a huge model from scratch on a vast swath of the internet.
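For readers who want to see what that fine-tuning workflow can look like, here is a minimal sketch using the fastai library’s quickstart-style text API: it takes a language model pretrained on Wikipedia and fine-tunes it into an IMDB movie-review sentiment classifier. The API assumes a recent fastai version, and the dataset and hyperparameters are purely illustrative.

```python
from fastai.text.all import *

# Download the IMDB reviews dataset and build dataloaders from its folder layout.
dls = TextDataLoaders.from_folder(untar_data(URLs.IMDB), valid='test')

# AWD_LSTM comes pretrained on Wikipedia; we only fine-tune it for sentiment.
learn = text_classifier_learner(dls, AWD_LSTM, drop_mult=0.5, metrics=accuracy)
learn.fine_tune(4, 1e-2)

# Predict whether a new review is positive or negative.
learn.predict("I really liked that movie!")
```

The point of the sketch is that almost all of the model’s “understanding of language” arrives pretrained; the fine-tuning step only adapts the last few percent to the task at hand.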
And that’s why you absolutely can compete with Google and OpenAI — because they’re probably not even going to be in your space. If you’re trying to create something to automate the work of paralegals, or help with disaster resilience planning, or generate a better understanding of gendered language over the last 100 years or whatever, you aren’t competing with Google, you’re competing with that niche that’s in your domain.
How important is it to keep up with all the advances in the AI space, especially if you’re working with it on a smaller scale?
No one can keep up with all the advances. You’ve got to keep up with some advances, but the actual techniques we’re working with change, nowadays, very slowly. The amount of difference between the 2017 fast.ai course and the 2018 fast.ai course was vast, and between the 2018 and 2019 courses it was vast-ish. Nowadays, very little changes over a couple-of-year period.
Take something that we think of as being really significant, like the rise of the transformer architecture: it’s actually some years old now, and it’s mainly just a bunch of sandwiched, plain feed-forward neural network layers and some dot products. It’s great, but for somebody wanting to understand it, who already understands convnets, recurrent nets, and basic multilayer perceptrons, it’s like a few hours of work.
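To make that concrete, here is a minimal, single-head sketch of a transformer block in PyTorch. The dimensions and details are illustrative rather than those of any particular model: the attention scores are just dot products, and everything else is ordinary feed-forward layers with residual connections.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class MiniTransformerBlock(nn.Module):
    """One single-head transformer block: dot-product attention plus a
    plain feed-forward network, each wrapped in a residual connection."""
    def __init__(self, d=64):
        super().__init__()
        self.q, self.k, self.v = nn.Linear(d, d), nn.Linear(d, d), nn.Linear(d, d)
        self.ff = nn.Sequential(nn.Linear(d, 4 * d), nn.GELU(), nn.Linear(4 * d, d))
        self.norm1, self.norm2 = nn.LayerNorm(d), nn.LayerNorm(d)
        self.scale = d ** -0.5

    def forward(self, x):                                # x: (batch, seq, d)
        q, k, v = self.q(x), self.k(x), self.v(x)
        scores = q @ k.transpose(-2, -1) * self.scale    # the dot products
        attn = F.softmax(scores, dim=-1) @ v             # weighted sum of values
        x = self.norm1(x + attn)                         # residual + layer norm
        return self.norm2(x + self.ff(x))                # feed-forward sandwich

x = torch.randn(2, 10, 64)                               # (batch, sequence, features)
print(MiniTransformerBlock()(x).shape)                   # torch.Size([2, 10, 64])
```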
One of the big things that happened in the last couple of years is that more people are starting to understand the practical aspects of how to train a model effectively. For example, DeepMind recently released a paper that essentially showed all the language models out there were dramatically less efficient than they should be, literally because they weren’t doing some basic stuff. Facebook — and, specifically, a Facebook intern who was the lead author on the paper — built a thing called ConvNeXt, which is basically saying, “Here’s what happens if we take a normal convolutional neural network and just put in the obvious tweaks that everybody knows about.” And it’s basically the state-of-the-art image model now.
So, yeah, staying up to date with the foundational basics of how to build good deep learning models is way less hard than it seems. And you certainly don’t have to read every paper in the field, particularly at this point, now that things are moving so much less quickly.
But I do think it’s useful to have a broad understanding, not just of your own particular special area. Let’s say you’re a computer-vision person: it helps a lot to be good at NLP, collaborative filtering, and tabular analysis, as well — and vice versa — because there’s not nearly enough cross-pollination between these groups. And from time to time, somebody takes a peek at another area, steals some of its ideas, and comes away with a breakthrough result.
This is exactly what I did with ULMFiT four or five years ago. I said, “Let’s apply all the basic computer-vision transfer learning techniques to NLP,” and got a state-of-the-art result by miles. Researchers at OpenAI did something similar, but replaced my RNN with a transformer and scaled it up, and that became GPT. We all know how that went.
You’ve mentioned that we’ve seen a step-function shift in AI in the past three to six months. Can you elaborate on that?
I’d actually call it a hook rather than a step function. I think we’re on an exponential curve, and from time to time, you notice that things really seem to have sped up in a noticeable way. Where we’ve got to is that pre-trained models trained on very large corpuses of text and images can now do very impressive one-shot or few-shot things in fairly general ways, partly because in the last few months people have gotten better at understanding prompt engineering — essentially, knowing how to ask the right question, with the “explain your reasoning,” step-by-step kinds of prompts.
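A minimal sketch of that kind of prompt is shown below. The wording and the worked example are hypothetical, and `complete` is a stand-in for whichever large-language-model API you happen to have access to.

```python
# Build an "explain your reasoning" (step-by-step) prompt: one worked example
# that reasons out loud, then the real question in the same format.
question = (
    "A library has 3 branches. Each branch lends out 120 books a day. "
    "How many books are lent out across all branches in a 5-day week?"
)

prompt = f"""Answer the question. Explain your reasoning step by step before giving the final answer.

Q: A farm has 4 fields. Each field produces 50 bales of hay per week. How many bales in 2 weeks?
A: Each week the farm produces 4 * 50 = 200 bales. Over 2 weeks, that is 200 * 2 = 400 bales. The answer is 400.

Q: {question}
A:"""

# `complete` is a placeholder for your model's completion call.
# print(complete(prompt))
```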
And we’re discovering that these models are actually able to do things that a lot of academics have been telling us aren’t possible in terms of a compositional understanding of the world and being able to show step-by-step reasoning. A lot of people had been saying, “Oh, you have to use symbolic techniques; neural nets and deep learning will never get there.” Well, it turns out that they do. I think when we can all see that it can do these things that people were claiming it could never do, it makes us a bit more bold about trying to do more with them.
It reminds me of the first time I saw a video on the internet, which I remember showing to my mum because it was a physiotherapy video, and she’s a physiotherapist. It was a video of a joint mobility exercise in your shoulder, and I think it was 128 by 128 pixels. It was black and white, highly compressed, and maybe about 3 or 4 seconds long. I was very excited, and I said to my mum, “Wow, look at this: a video on the internet!” And, of course, she was not excited at all. She was like, “What’s the use of that? This is the most pointless thing I’ve ever seen.”
Of course, I was thinking that one day this was going to be a thousand-by-a-thousand pixels, 60 frames a second, full-color, beautiful video. The proof was there; it was just waiting for the rest to catch up.
So I think when people saw the really low-quality images from deep learning in the early days, there wasn’t a lot of excitement because most people don’t realize that technology scales like this. Now that we can actually produce high-quality, full-color images that look way better than nearly any of us could picture or photograph, people don’t need any imagination. They can just see that what’s being done right now is very impressive. I think that makes a big difference.
The idea of prompt engineering — if not as a whole new career, but at least as a new skill set — is really interesting, actually.
It is, and I’m terrible at it. For example, DALL-E doesn’t really know how to write text properly, which wouldn’t be a problem except that it loves to put text in all of its bloody images. So there’s always these random symbols and I can’t, for the life of me, figure out how to come up with a prompt that doesn’t have text in it. And then sometimes, I’ll just randomly change a word here or there and, suddenly, none of them have text anymore. There’s some trick to this, and I haven’t quite figured it out yet.
Also, for example, there’s a significant coding skill right now in knowing how to go faster — particularly, if you’re not a particularly good coder — by being really good at coming up with the right Codex comments to have it generate things for you. And knowing what kinds of errors it tends to make, what kinds of things it’s good at and bad at, and knowing how to get it to create a test for the thing that it just built for you.
For a lot of people, that’s probably a more valuable, immediate thing to learn than getting really good at coding.
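Here is a small, hypothetical example of that comment-driven style: the comments act as the prompt, the function body is the kind of thing a code-generation model might produce, and the final comment asks it to write a test you can actually run to check its work.

```python
import re

# Prompt-style comment: write a function `normalize_whitespace(text)` that
# collapses runs of spaces, tabs, and newlines into a single space and strips
# leading and trailing whitespace.
def normalize_whitespace(text: str) -> str:
    return re.sub(r"\s+", " ", text).strip()

# Follow-up prompt: now write a pytest-style test for `normalize_whitespace`
# covering empty input, mixed whitespace, and already-clean input.
def test_normalize_whitespace():
    assert normalize_whitespace("") == ""
    assert normalize_whitespace("  hello\t\n world \n") == "hello world"
    assert normalize_whitespace("already clean") == "already clean"
```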
Specifically on Codex, what are your thoughts on the idea of machine-generated code?
I wrote a blog post on it when GitHub Copilot came out, actually. At the time, I was like, “Wow, this is really cool and impressive, but I’m not quite sure how useful it is.” And I’m still not sure.
One major reason is that, as I think we all know, deep learning models have no understanding of whether they’re right or wrong. Codex has improved a lot since I reviewed its first version, but it still writes a lot of wrong code. It also writes verbose code because it’s generating average code. For me, taking average code and making it into code that I like and know to be correct is much slower than just writing it from scratch — at least in languages I know well.
But I feel like there’s a whole human-computer interface (HCI) question here, and I feel like HCI is the biggest missing piece in nearly every deep learning project I have seen: almost never do these things fully replace humans. Therefore, we’re working together with these algorithms. If I was in HCI, I’d be wanting my whole field to be focused on the question of how we interact with deep learning algorithms. Because we’ve had decades of learning how to interact with graphical user interfaces, command-line interfaces, and web interfaces, but this is a totally different thing.
And I don’t know how I as a programmer best interact with something like Codex. I bet there are really powerful ways to do it for every area — creating interfaces and binding data, building algorithms, and so forth — but I have no idea what those things are.