• Zero: What to expect in this series
    Sep 2 2023

    A short introduction to what you'll get out of these episodes!

    2 minutes
  • One: Brian Christian on the alignment problem
    Sep 2 2023

    Originally released in March 2021.

    Brian Christian is a bestselling author with a particular knack for accurately communicating difficult or technical ideas from both mathematics and computer science.

    Listeners loved our episode about his book Algorithms to Live By — so when the team read his new book, The Alignment Problem, and found it to be an insightful and comprehensive review of the state of the research into making advanced AI useful and reliably safe, getting him back on the show was a no-brainer.

    Brian has so much of substance to say that this episode will likely be of interest to people who know a lot about AI as well as those who know only a little, and to people who are nervous about where AI is going as well as those who aren't nervous at all.

    Links to learn more, summary and full transcript.

    Here’s a tease of 10 Hollywood-worthy stories from the episode:

    • The Riddle of Dopamine: The development of reinforcement learning solves a long-standing mystery of how humans are able to learn from their experience.
    • ALVINN: A student teaches a military vehicle to drive between Pittsburgh and Lake Erie, without intervention, in the early 1990s, using a computer with a tenth the processing capacity of an Apple Watch.
    • Couch Potato: An agent trained to be curious is stopped in its quest to navigate a maze by a paralysing TV screen.
    • Pitts & McCulloch: A homeless teenager and his foster father figure invent the idea of the neural net.
    • Tree Senility: Agents become so good at living in trees to escape predators that they forget how to leave, starve, and die.
    • The Danish Bicycle: A reinforcement learning agent figures out that it can better achieve its goal by riding in circles as quickly as possible than by actually reaching its purported destination.
    • Montezuma's Revenge: By 2015 a reinforcement learner can play 60 different Atari games — the majority impossibly well — but can’t score a single point on one game humans find tediously simple.
    • Curious Pong: Two novelty-seeking agents, forced to play Pong against one another, create increasingly extreme rallies.
    • AlphaGo Zero: A computer program becomes superhuman at Chess and Go in under a day by attempting to imitate itself.
    • Robot Gymnasts: Over the course of an hour, humans teach robots to do perfect backflips just by telling them which of two random actions looks more like a backflip.

    We also cover:

    • How reinforcement learning actually works, and some of its key achievements and failures
    • How a lack of curiosity can leave AIs unable to do even basic things
    • The pitfalls of getting AI to imitate how we ourselves behave
    • The benefits of getting AI to infer what we must be trying to achieve
    • Why it’s good for agents to be uncertain about what they're doing
    • Why Brian isn’t that worried about explicit deception
    • The interviewees Brian most agrees with, and most disagrees with
    • Developments since Brian finished the manuscript
    • The effective altruism and AI safety communities
    • And much more

    Producer: Keiran Harris.
    Audio mastering: Ben Cordell.
    Transcriptions: Sofia Davis-Fogel.

    2 hours and 56 minutes
  • Two: Ajeya Cotra on accidentally teaching AI models to deceive us
    Sep 2 2023

    Originally released in May 2023.

    Imagine you are an orphaned eight-year-old whose parents left you a $1 trillion company, and no trusted adult to serve as your guide to the world. You have to hire a smart adult to run that company, guide your life the way that a parent would, and administer your vast wealth. You have to hire that adult based on a work trial or interview you come up with. You don't get to see any resumes or do reference checks. And because you're so rich, tonnes of people apply for the job — for all sorts of reasons.

    Today's guest Ajeya Cotra — senior research analyst at Open Philanthropy — argues that this peculiar setup resembles the situation humanity finds itself in when training very general and very capable AI models using current deep learning methods.

    Links to learn more, summary and full transcript.

    As she explains, such an eight-year-old faces a challenging problem. In the candidate pool there are likely some truly nice people, who sincerely want to help and make decisions that are in your interest. But there are probably other characters too — like people who will pretend to care about you while you're monitoring them, but intend to use the job to enrich themselves as soon as they think they can get away with it.

    Like a child trying to judge adults, at some point humans will be required to judge the trustworthiness and reliability of machine learning models that are as goal-oriented as people, and greatly outclass them in knowledge, experience, breadth, and speed. Tricky!

    Can't we rely on how well models have performed at tasks during training to guide us? Ajeya worries that it won't work. The trouble is that three different sorts of models will all produce the same output during training, but could behave very differently once deployed in a setting that allows their true colours to come through. She describes three such motivational archetypes:

    • Saints — models that care about doing what we really want
    • Sycophants — models that just want us to say they've done a good job, even if they get that praise by taking actions they know we wouldn't want them to
    • Schemers — models that don't care about us or our interests at all, who are just pleasing us so long as that serves their own agenda

    And according to Ajeya, there are also ways we could end up actively selecting for motivations that we don't want.

    In today's interview, Ajeya and Rob discuss the above, as well as:

    • How to predict the motivations a neural network will develop through training
    • Whether AIs being trained will functionally understand that they're AIs being trained, the same way we think we understand that we're humans living on planet Earth
    • Stories of AI misalignment that Ajeya doesn't buy into
    • Analogies for AI, from octopuses to aliens to can openers
    • Why it's smarter to have separate planning AIs and doing AIs
    • The benefits of only following through on AI-generated plans that make sense to human beings
    • What approaches for fixing alignment problems Ajeya is most excited about, and which she thinks are overrated
    • How one might demo actually scary AI failure mechanisms

    Get this episode by subscribing to our podcast on the world’s most pressing problems and how to solve them: type ‘80,000 Hours’ into your podcasting app. Or read the transcript below.

    Producer: Keiran Harris

    Audio mastering: Ryan Kessler and Ben Cordell

    Transcriptions: Katy Moore


    2 hours and 50 minutes
  • Three: Paul Christiano on finding real solutions to the AI alignment problem
    Sep 2 2023

    Originally released in October 2018.

    Paul Christiano is one of the smartest people I know. After our first session produced such great material, we decided to do a second recording, resulting in our longest interview so far. While challenging at times, I can strongly recommend listening: Paul works on AI himself and has an unusually well-thought-through view of how it will change the world. This is now the top resource I'm going to refer people to if they're interested in positively shaping the development of AI and want to understand the problem better. Even though I'm familiar with Paul's writing, I felt I was learning a great deal and am now in a better position to make a difference to the world.

    A few of the topics we cover are:

    • Why Paul expects AI to transform the world gradually rather than explosively and what that would look like
    • Several concrete methods OpenAI is trying to develop to ensure AI systems do what we want even if they become more competent than us
    • Why AI systems will probably be granted legal and property rights
    • How an advanced AI that doesn't share human goals could still have moral value
    • Why machine learning might take over science research from humans before it can do most other tasks
    • In which decade we should expect human labour to become obsolete, and how this should affect your savings plan

    Links to learn more, summary and full transcript.

    Here's a situation we all regularly confront: you want to answer a difficult question, but aren't quite smart or informed enough to figure it out for yourself. The good news is you have access to experts who *are* smart enough to figure it out. The bad news is that they disagree.

    If given plenty of time - and enough arguments, counterarguments and counter-counter-arguments between all the experts - should you eventually be able to figure out which is correct? What if one expert were deliberately trying to mislead you? And should the expert with the correct view just tell the whole truth, or will competition force them to throw in persuasive lies in order to have a chance of winning you over?

    In other words: does 'debate', in principle, lead to truth?

    According to Paul Christiano - researcher at the machine learning research lab OpenAI and legendary thinker in the effective altruism and rationality communities - this question is of more than mere philosophical interest. That's because 'debate' is a promising method of keeping artificial intelligence aligned with human goals, even if it becomes much more intelligent and sophisticated than we are.

    It's a method OpenAI is actively trying to develop, because in the long term it wants to train AI systems to make decisions that are too complex for any human to grasp, but without the risks that arise from a complete loss of human oversight.

    If AI-1 is free to choose any line of argument in order to attack the ideas of AI-2, and AI-2 always seems to successfully defend them, it suggests that every possible line of argument would have been unsuccessful.

    But does that mean that the ideas of AI-2 were actually right? It would be nice if the optimal strategy in debate were to be completely honest, provide good arguments, and respond to counterarguments in a valid way. But we don't know that's the case.
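
    As a rough illustration of the debate setup described above, here is a minimal sketch of the protocol in Python. It is not OpenAI's actual implementation; every name and function in it is hypothetical, with the two debaters standing in for AI systems and the judge standing in for a human evaluator.

        from dataclasses import dataclass, field

        @dataclass
        class DebateTranscript:
            question: str
            turns: list = field(default_factory=list)  # list of (debater_name, argument) pairs

        def run_debate(question, debater_1, debater_2, judge, num_rounds=3):
            """Two agents argue in alternation; a (human) judge then picks a winner.

            debater_1 / debater_2: callables mapping the transcript so far to an argument string.
            judge: a callable mapping the finished transcript to the name of the winner.
            """
            transcript = DebateTranscript(question=question)
            for _ in range(num_rounds):
                transcript.turns.append(("AI-1", debater_1(transcript)))
                transcript.turns.append(("AI-2", debater_2(transcript)))
            # The judge only evaluates the arguments; they never have to solve the question directly.
            return judge(transcript)

    The appeal of the setup is that judging arguments should be easier than answering the question itself; but, as noted above, whether honesty is actually the winning strategy remains an open question.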

    Get this episode by subscribing: type '80,000 Hours' into your podcasting app.

    The 80,000 Hours Podcast is produced by Keiran Harris.


    3 hours and 52 minutes
  • Four: Rohin Shah on DeepMind and trying to fairly hear out both AI doomers and doubters
    Sep 2 2023

    Can there be a more exciting and strange place to work today than a leading AI lab? Your CEO has said they're worried your research could cause human extinction. The government is setting up meetings to discuss how this outcome can be avoided. Some of your colleagues think this is all overblown; others are more anxious still.

    Today's guest — machine learning researcher Rohin Shah — goes into the Google DeepMind offices each day with that peculiar backdrop to his work.

    Links to learn more, summary and full transcript.

    He's on the team dedicated to maintaining 'technical AI safety' as these models approach and exceed human capabilities: basically, making sure the models help humanity accomplish its goals without flipping out in some dangerous way. This work has never seemed more important.

    In the short term, it could be the key bottleneck to deploying ML models in high-stakes real-life situations. In the long term, it could be the difference between humanity thriving and disappearing entirely.

    For years Rohin has been on a mission to fairly hear out people across the full spectrum of opinion about risks from artificial intelligence — from doomers to doubters — and properly understand their point of view. That makes him unusually well placed to give an overview of what we do and don't understand. He has landed somewhere in the middle — troubled by ways things could go wrong, but not convinced there are very strong reasons to expect a terrible outcome.

    Today's conversation is wide-ranging and Rohin lays out many of his personal opinions to host Rob Wiblin, including:

    • What he sees as the strongest case both for and against slowing down the rate of progress in AI research.
    • Why he disagrees with most other ML researchers that training a model on a sensible 'reward function' is enough to get a good outcome.
    • Why he disagrees with many on LessWrong that the bar for whether a safety technique is helpful is “could this contain a superintelligence.”
    • That he thinks nobody has very compelling arguments that AI created via machine learning will be dangerous by default, or that it will be safe by default. He believes we just don't know.
    • That he understands that analogies and visualisations are necessary for public communication, but is sceptical that they really help us understand what's going on with ML models, because they're different in important ways from every other case we might compare them to.
    • Why he's optimistic about DeepMind’s work on scalable oversight, mechanistic interpretability, and dangerous capabilities evaluations, and what each of those projects involves.
    • Why he isn't inherently worried about a future where we're surrounded by beings far more capable than us, so long as they share our goals to a reasonable degree.
    • Why it's not enough for humanity to know how to align AI models — it's essential that management at AI labs correctly pick which methods they're going to use and have the practical know-how to apply them properly.
    • Three observations that make him a little more optimistic: humans are a bit muddle-headed and not super goal-orientated; planes don't crash; and universities have specific majors in particular subjects.
    • Plenty more besides.

    Get this episode by subscribing to our podcast on the world’s most pressing problems and how to solve them: type ‘80,000 Hours’ into your podcasting app. Or read the transcript below.

    Producer: Keiran Harris

    Audio mastering: Milo McGuire, Dominic Armstrong, and Ben Cordell

    Transcriptions: Katy Moore

    3 hours and 10 minutes
  • Five: Chris Olah on what the hell is going on inside neural networks
    Sep 2 2023

    Originally released in August 2021.

    Chris Olah has had a fascinating and unconventional career path.

    Most people who want to pursue a research career feel they need a degree to get taken seriously. But not only does Chris not have a PhD; he doesn't even have an undergraduate degree. After dropping out of university to help defend an acquaintance who was facing bogus criminal charges, Chris started independently working on machine learning research, and eventually got an internship at Google Brain, a leading AI research group.

    In this interview — a follow-up to our episode on his technical work — we discuss what, if anything, can be learned from his unusual career path. Should more people pass on university and just throw themselves at solving a problem they care about? Or would it be foolhardy for others to try to copy a unique case like Chris’?

    Links to learn more, summary and full transcript.

    We also cover some of Chris' personal passions over the years, including his attempts to reduce what he calls 'research debt' by starting a new academic journal called Distill, focused just on explaining existing results unusually clearly.

    As Chris explains, as fields develop they accumulate huge bodies of knowledge that researchers are meant to be familiar with before they start contributing themselves. But the weight of that existing knowledge — and the need to keep up with what everyone else is doing — can become crushing. It can take someone until their 30s or later to earn their stripes, and sometimes a field will split in two just to make it possible for anyone to stay on top of it.

    If that were unavoidable it would be one thing, but Chris thinks we're nowhere near communicating existing knowledge as well as we could. Incrementally improving an explanation of a technical idea might take a single author weeks to do, but could go on to save a day for thousands, tens of thousands, or hundreds of thousands of students, if it becomes the best option available.

    Despite that, academics have little incentive to produce outstanding explanations of complex ideas that can speed up the education of everyone coming up in their field. And some even see the process of deciphering bad explanations as a desirable rite of passage that all should pass through, just as they did.

    So Chris tried his hand at chipping away at this problem — but concluded the nature of the problem wasn't quite what he originally thought. In this conversation we talk about that, as well as:

    • Why highly thoughtful cold emails can be surprisingly effective, but average cold emails do little
    • Strategies for growing as a researcher
    • Thinking about research as a market
    • How Chris thinks about writing outstanding explanations
    • The concept of 'micromarriages' and ‘microbestfriendships’
    • And much more.

    Get this episode by subscribing to our podcast on the world’s most pressing problems and how to solve them: type 80,000 Hours into your podcasting app.

    Producer: Keiran Harris
    Audio mastering: Ben Cordell
    Transcriptions: Sofia Davis-Fogel


    3 hours and 9 minutes
  • Six: Richard Ngo on large language models, OpenAI, and striving to make the future go well
    Sep 2 2023

    Originally released in December 2022.

    Large language models like GPT-3, and now ChatGPT, are neural networks trained on a large fraction of all text available on the internet to do one thing: predict the next word in a passage. This simple technique has led to something extraordinary — black boxes able to write TV scripts, explain jokes, produce satirical poetry, answer common factual questions, argue sensibly for political positions, and more. Every month their capabilities grow.
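
    To make 'predict the next word' concrete, here is a minimal sketch of asking a small open language model for its next-token probabilities, using the Hugging Face transformers library in Python. The model name 'gpt2' and the example prompt are illustrative assumptions, not details from the episode.

        import torch
        from transformers import AutoModelForCausalLM, AutoTokenizer

        # Load a small, publicly available language model as an illustrative stand-in.
        tokenizer = AutoTokenizer.from_pretrained("gpt2")
        model = AutoModelForCausalLM.from_pretrained("gpt2")

        prompt = "The capital of France is"
        inputs = tokenizer(prompt, return_tensors="pt")

        # A single forward pass yields a score for every possible next token.
        with torch.no_grad():
            logits = model(**inputs).logits[0, -1]
        probs = torch.softmax(logits, dim=-1)

        # Print the five most likely continuations and their probabilities.
        top = torch.topk(probs, k=5)
        for token_id, p in zip(top.indices, top.values):
            print(repr(tokenizer.decode(int(token_id))), round(float(p), 3))

    Generating text is just this step in a loop: pick a token from the distribution, append it to the prompt, and predict again.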

    But do they really 'understand' what they're saying, or do they just give the illusion of understanding?

    Today's guest, Richard Ngo, thinks that in the most important sense they understand many things. Richard is a researcher at OpenAI — the company that created ChatGPT — who works to foresee where AI advances are going and develop strategies that will keep these models from 'acting out' as they become more powerful, are deployed and ultimately given power in society.

    Links to learn more, summary and full transcript.

    One way to think about 'understanding' is as a subjective experience. Whether it feels like something to be a large language model is an important question, but one we currently have no way to answer.

    However, as Richard explains, another way to think about 'understanding' is as a functional matter. If you really understand an idea you're able to use it to reason and draw inferences in new situations. And that kind of understanding is observable and testable.

    Richard argues that language models are developing sophisticated representations of the world which can be manipulated to draw sensible conclusions — maybe not so different from what happens in the human mind. And experiments have found that, as models get more parameters and are trained on more data, these types of capabilities consistently improve.

    We might feel reluctant to say a computer understands something the way that we do. But if it walks like a duck and it quacks like a duck, we should consider that maybe we have a duck, or at least something sufficiently close to a duck that it doesn't matter.

    In today's conversation we discuss the above, as well as:

    • Could speeding up AI development be a bad thing?
    • The balance between excitement and fear when it comes to AI advances
    • Why OpenAI focuses its efforts where it does
    • Common misconceptions about machine learning
    • How many computer chips it might require to be able to do most of the things humans do
    • How Richard understands the 'alignment problem' differently than other people
    • Why 'situational awareness' may be a key concept for understanding the behaviour of AI models
    • What work to positively shape the development of AI Richard is and isn't excited about
    • The AGI Safety Fundamentals course that Richard developed to help people learn more about this field

    Get this episode by subscribing to our podcast on the world’s most pressing problems and how to solve them: type 80,000 Hours into your podcasting app.

    Producer: Keiran Harris
    Audio mastering: Milo McGuire and Ben Cordell
    Transcriptions: Katy Moore


    2 hours and 44 minutes
  • Seven: Ben Garfinkel on scrutinising classic AI risk arguments
    Sep 1 2023

    Originally released in July 2020.

    80,000 Hours, along with many other members of the effective altruism movement, has argued that helping to positively shape the development of artificial intelligence may be one of the best ways to have a lasting, positive impact on the long-term future. Millions of dollars in philanthropic spending, as well as lots of career changes, have been motivated by these arguments.

    Today’s guest, Ben Garfinkel, Research Fellow at Oxford’s Future of Humanity Institute, supports the continued expansion of AI safety as a field and believes working on AI is among the very best ways to have a positive impact on the long-term future. But he also believes the classic AI risk arguments have been subject to insufficient scrutiny given this level of investment.

    In particular, the case for working on AI if you care about the long-term future has often been made on the basis of concern about AI accidents: it’s actually quite difficult to design systems that you can feel confident will behave the way you want them to in all circumstances.

    Nick Bostrom wrote the most fleshed-out version of the argument in his book, Superintelligence. But Ben reminds us that, apart from Bostrom’s book and essays by Eliezer Yudkowsky, there's very little existing writing on existential accidents.

    Links to learn more, summary and full transcript.

    There have also been very few sceptical experts who have actually sat down and fully engaged with these arguments, writing down point by point where they disagree or where they think the mistakes are. This means that Ben has probably scrutinised classic AI risk arguments as carefully as almost anyone else in the world.

    He thinks that most arguments for existential accidents rely on fuzzy, abstract concepts (like optimisation power, general intelligence, or goals) and on toy thought experiments. And he doesn’t think it’s clear we should take these as a strong source of evidence.

    Ben’s also concerned that these scenarios often involve massive jumps in the capabilities of a single system, but it's really not clear that we should expect such jumps or find them plausible. These toy examples also focus on the idea that because human preferences are so nuanced and so hard to state precisely, it should be quite difficult to get a machine that can understand how to obey them.

    But Ben points out that it's also the case in machine learning that we can train lots of systems to engage in behaviours that are actually quite nuanced and that we can't specify precisely. If AI systems can recognise faces from images, and fly helicopters, why don’t we think they’ll be able to understand human preferences?

    Despite these concerns, Ben is still fairly optimistic about the value of working on AI safety or governance.

    He doesn’t think that there are any slam-dunks for improving the future, and so the fact that there are at least plausible pathways for impact by working on AI safety and AI governance, in addition to it still being a very neglected area, puts it head and shoulders above most areas you might choose to work in.

    This is the second episode hosted by our Strategy Advisor Howie Lempel, and he and Ben cover, among many other things:

    • The threat of AI systems increasing the risk of permanently damaging conflict or collapse
    • The possibility of permanently locking in a positive or negative future
    • Contenders for types of advanced systems
    • What role AI should play in the effective altruism portfolio

    Get this episode by subscribing: type 80,000 Hours into your podcasting app. Or read the linked transcript.

    Producer: Keiran Harris.
    Audio mastering: Ben Cordell.
    Transcriptions: Zakee Ulhaq.


    2 hours and 38 minutes