
Doom Debates

Liron Shapira
Latest episode

Available episodes

5 of 116
  • DEBATE: Is AGI Really Decades Away? | Ex-MIRI Researcher Tsvi Benson-Tilsen vs. Liron Shapira
    Sparks fly in the finale of my series with ex-MIRI researcher Tsvi Benson-Tilsen as we debate his AGI timelines.

    Tsvi is a champion of using germline engineering to create smarter humans who can solve AI alignment. I support the approach, even though I'm skeptical it'll gain much traction before AGI arrives.

    Timestamps

    0:00 Debate Preview
    0:57 Tsvi's AGI Timeline Prediction
    3:03 The Least Impressive Task AI Cannot Do In 2 Years
    6:13 Proposed Task: Solve Cantor's Theorem From Scratch
    8:20 AI Has Limitations Related to Sample Complexity
    11:41 We Need Clear Goalposts for Better AGI Predictions
    13:19 Counterargument: LLMs May Not Be a Path to AGI
    16:01 Is Tsvi Setting a High Bar for Progress Towards AGI?
    19:17 AI Models Are Missing A Spark of Creativity
    28:17 Liron's "Black Box" AGI Test
    32:09 Are We Going to Enter an AI Winter?
    35:09 Who Is Being Overconfident?
    42:11 If AI Makes Progress on Benchmarks, Would Tsvi Shorten His Timeline?
    50:34 Recap & Tsvi's Research

    Show Notes

    Learn more about Tsvi's organization, the Berkeley Genomics Project — https://berkeleygenomics.org

    Watch my previous 2 episodes with Tsvi:

Transcript

Debate Preview

Liron Shapira 00:00:00
Do you think we're going to enter an AI winter soon or no?

Tsvi Benson-Tilsen 00:00:03
Let's say 1-3% chance of real AGI in the next five years.

Liron 00:00:08
That is definitely lower than the consensus among most people.

Tsvi 00:00:11
There's several things that they can't do, each of which is strong evidence that they're not intelligent.

Liron 00:00:16
And I'm encouraging you to set down some goalposts.

Tsvi 00:00:19
I don't have to say this particular line is not gonna be crossed. That doesn't make sense.

Liron 00:00:22
Maybe you can put more work into choosing examples that…

Tsvi 00:00:25
Okay. Wait, wait. Okay. Listen. From my perspective, you're somewhat overconfident.

Liron 00:00:29
I just feel like you're the one who's coming in overconfident.

Liron 00:00:32
I mean, usually we can just point to "oh, this might be a boundary. This might be a boundary."

Tsvi 00:00:37
But I named, I named…

Liron 00:00:39
I mean, that's basically, that's my beef with you, is I feel like… I think you should dig harder.

Tsvi 00:00:42
I mean, at some point, I'm just gonna say I'm busy.

Tsvi's AGI Timeline Prediction

Liron 00:00:57
Now let's talk about timelines. You mentioned that you think in decades.

Liron 00:01:00
So for example, you were saying that you'd be surprised if it came about in the next five years.

Tsvi 00:01:05
I'm not well calibrated about this. It's a very complicated thing, blah, blah, blah. But if you poke me hard enough, I will spit out numbers. So, let's say 1-3% chance of AGI, of real AGI in the next five years.

Liron
So that is definitely lower than the way we usually aggregate people's predictions. If you look at Metaculus, or even just people I've had on the show, usually the consensus among most people who seem informed is that there's a bell curve where it seems to average around like 2031 of when we'll get superhuman intelligence.

Liron 00:01:40
But why are you so confident that it's probably not by 2031?

Tsvi 00:01:46
One part of my response boils down to I don't really buy that the benchmarks are measuring the relevant things.
Experts in a field, if they try to discuss their ideas with an LLM, what they'll usually find is the LLM might have useful facts or references that are relevant. If there is any significant novelty, the LLM sort of falls apart and doesn't follow the relevance, doesn't really understand what's happening, makes up sort of nonsense.

Tsvi 00:02:15
And it's not just occasional hallucinations or 10% hallucinations. It's not helpful for thinking at the edge of things.

Liron 00:02:22
I do actually think there's something to that. I do agree with you that LLMs, they haven't gotten to what Steven Byrnes is calling brain-like AGI. You know Steven Byrnes?

Tsvi 00:02:32
Yeah. Yeah. He does good research on neuroscience.

Liron 00:02:35
Yeah, I recently started reading this guy, Steven Byrnes. I agree with Steven that there does seem to be a boundary. And also Jim Babcock came on my show.

Liron 00:02:41
There's an episode of Jim Babcock and we discussed the same kind of boundary, which is LLMs today, it seems like we lucked out, where they're not quite bootstrapping to super intelligence. They're not superhuman in some very important ways, and yet they're still so useful. And you're putting your finger on that distinction.

Liron 00:02:55
And you're saying that they're not a creative mind having new ideas. I wanna poke at your version of this distinction more. So let me ask you the question this way. What do you think is the least impressive thing that you think AI probably still can't do in two years?

The Least Impressive Task AI Cannot Do In 2 Years

Tsvi 00:03:10
So I'm gonna answer your question in a second, but I just have my retort ready, which is, why don't you say the most impressive thing that you do expect AI to do in two years?

Liron 00:03:22
Yeah, I mean, I'm happy to take a stab at that. I do have pretty wide confidence intervals. So I'm basically going to take things that I'm kind of impressed by today and just make 'em somewhat more impressive. I think a really good example of incremental improvement is how right now AIs can talk to you. When you use GPT voice mode, it's pretty good.

Liron 00:03:39
It talks to my kids, so I'm expecting two years from now, they'll really polish up the lag. So it'll really feel a hundred percent in terms of micro lags and any sort of distinguishing characteristics whatsoever. I think those will all be smoothed out within two years. And I also think on the video side, maybe it won't be fully polished, but it'll be almost as good as the voice is today.

Liron 00:03:59
That's my prediction of things I'm pretty confident will happen in two years.

Tsvi 00:04:02
Just to check, would you agree that those things are not that impressive? What you listed?

Liron 00:04:07
Well, I'm 80% confident. If you want to get to 50% confident, oh boy. I mean, you know what I mean? Once you got me down to 15% confident, then I'll tell you the intelligence explosion. I'll predict super intelligence. So that's the difference between me being 15% confident and me being 80% confident.

Tsvi 00:04:21
Yeah. No, I mean, to be clear, I think your predictions are totally fair. I probably agree with them. That sounds in the ballpark of 80%. Partly I'm noting because now I'm gonna answer your question and I'm gonna give things that are not necessarily the least impressive things that I expect AI will not be able to do in two years, but—

Liron 00:04:39
Why not?

Tsvi 00:04:41
Well, 'cause that's a harder question. And the boundary is more…

Liron 00:04:45
Well, but the reason I'm asking you the question this way is because I feel like you think you know something about a firewall. I always bring up this concept of a firewall. Currently we're on this track, but the track isn't getting us to a creative mind having new ideas. That's your language. And David Deutsch uses similar language about creating new knowledge.

Liron 00:05:02
You think you're not seeing a creative mind having new ideas, and I think you're extrapolating out as much as 10 plus years where you're potentially still not seeing a creative mind having new ideas.

Liron 00:05:12
So that's why I think my question is very fair, where you're basically saying, hey, here's some things that look kind of easy relative to what I do today. And I'm certain that you won't get them within a few years because that would require a creative mind having new ideas.

Tsvi 00:05:24
Well, certain is not something I would say. I mean, as I said—

Liron 00:05:28
90% certain, right?

Tsvi 00:05:30
Yeah, probably. Okay. Well, so I'll just answer your question. So, I think you are probably not going to see AI-led research, where most of the work is being done by an AI that produces concepts or ideas that are novel and interesting to humans, to human scientists or mathematicians in the same way that human-produced concepts are. Yeah.

Liron 00:05:54
Can you get any less impressive than that?

Tsvi 00:05:56
Well, this is a bit bold, but let's say an AI won't be able to prove Cantor's theorem from scratch, working without human math theorem or definition or proof content or what have you.

Proposed Task: Solve Cantor's Theorem From Scratch

Liron 00:06:13
I like this proposal. I'm not sure I agree with it. For the audience, let's try to dumb it down a little bit. So I think you're talking about the theorem that there are more real numbers than there are integers.

Tsvi 00:06:23
Mhm.

Liron 00:06:24
Yeah, so I like the example because it's like, where do you begin? It's like, how did Cantor attack this problem? And I think before Cantor people were barely asking the question. They barely knew what question to ask, and then they barely knew where to begin proving it one way or the other. Right?

Tsvi 00:06:38
Yeah. Yeah, I think that's a big part of it.

Liron 00:06:40
Yeah. So I mean, I certainly consider that an impressive leap. I agree. This kind of goes into the book of really impressive leaps that humans have done together with, let's say, Einstein's theory of relativity. 'Cause you have to come at things from a new angle, combine fresh concepts.

Liron 00:06:55
Really make what you might call a creative leap. That's a fair phrase. My observation though is that it's like we're drawing a line here where it's like, oh yeah, 98% of humans working in the field of math are just below the line. They're just waiting to receive ideas from on high, from the likes of a Cantor or Einstein.

Liron 00:07:14
And the AI is kind of catching up to all those humans, but then the top 2% who make these brilliant creative leaps, where it's like one out of every million humans does it once or twice in a lifetime. Yeah. AI can't get to there. It seems like the line's getting pretty high.

Tsvi 00:07:26
Yeah. I think that's a somewhat skewed way of viewing things. And so I would actually take a much more expansive position on who's doing this sort of creative thinking.
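For reference, since the rest of the debate keeps returning to this example, here is a compressed sketch of the diagonal argument being discussed, taking "Cantor's theorem" in the sense Liron glosses it above (the reals are uncountable). The binary-sequence framing below is just one standard textbook presentation, not necessarily how Cantor, Tsvi, or Liron would state it:

```latex
% Compressed diagonal-argument sketch (uncountability of infinite binary sequences).
\documentclass{article}
\usepackage{amsmath}
\begin{document}
Suppose, for contradiction, that every infinite binary sequence appears somewhere in a single
list $s_1, s_2, s_3, \dots$:
\[
\begin{aligned}
s_1 &= (s_{1,1},\ s_{1,2},\ s_{1,3},\ \dots)\\
s_2 &= (s_{2,1},\ s_{2,2},\ s_{2,3},\ \dots)\\
s_3 &= (s_{3,1},\ s_{3,2},\ s_{3,3},\ \dots)\\
    &\ \ \vdots
\end{aligned}
\]
Define a new sequence $d$ by flipping each diagonal entry:
\[
d = (d_1, d_2, d_3, \dots), \qquad d_n = 1 - s_{n,n}.
\]
For every $n$, $d$ differs from $s_n$ in position $n$, so $d$ appears nowhere in the list.
Hence no enumeration covers all binary sequences; the same diagonal idea, applied to decimal
expansions, gives Liron's phrasing that there are more real numbers than there are integers.
\end{document}
```

Tsvi's proposed test is that an AI produce an argument of roughly this shape without being handed the surrounding definitions, theorems, or proof content.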
I would say more or less every human, at least when they're a child, they are doing this sort of creative thinking all the time.

Tsvi 00:07:44
I mean, you have kids and I don't. Maybe you have some experience watching your kids sort of play around, goof around, pick some activity that they are having fun with and just sort of repeat it over and over again maybe with variations.

Liron 00:07:57
Well, maybe my kids are delayed because I just haven't seen 'em do anything Cantor level.

Tsvi 00:08:02
Yeah. I'm not saying that they're inventing new physics ideas. What I'm saying is that they're sort of constructing their world, almost from scratch.

AI Has Limitations Related to Sample Complexity

Liron 00:08:11
If you're right about what human children are doing, shouldn't we be able to answer the question about the least impressive thing AI can do by referencing a specific thing that a human child is doing?

Tsvi 00:08:20
Yeah. So this would be an inspiration for another answer I would give, which would be about sample complexity. I would also be pretty surprised, I don't know exactly which numbers to give here, but let's say something on the order of a thousand x less data. If you produce an LLM that is comparably impressive to present day systems, but it was trained using a thousand x less data than current systems. There's asterisks on here, but basically if you did that, then I would be quite surprised, more scared. Also probably more confused.

Liron 00:08:53
I mean, I'm feeling you that the magic ingredient that today's AI doesn't quite have. I agree that it seems to have to do with the amount of data that you need in order to get a result. So even if you look at a self-driving car, yeah, they seem, some of 'em seem to be better drivers than humans, probably better drivers than me.

Liron 00:09:09
But why do they need so much data when the best human drivers just don't need that much data? They need hundreds of hours of data. Probably, you know, some people can drive with less than that. So I agree, that's where I would also try to put my finger on what's missing.

Liron 00:09:25
But it just seems hard to propose a specific functional test. So it's like we think we have this intuition about what's missing, but when we try to translate into okay, what objective test won't they be able to do? So far, I feel like you've been pretty hand wavy. You're basically saying the cream of the crop of scientists or some abstract thing that kids do, but you're not really giving a specific input-output. Or, sorry, you did, you said amount of data.

Liron 00:09:49
But even that, it's talking about the internals of AI.

Tsvi 00:09:51
Well that's not exactly internals, but I somewhat see your point.

Tsvi 00:09:56
I mean you're not gonna like this, but you know, if something is really, really clear and operationalizable, it kind of already is a benchmark.

Tsvi 00:10:05
Just to frame the sort of things I'm saying. Partly I'm not necessarily trying to give a nice, adversarially robust test, but rather I'm sort of trying to just give a hint to, if someone's trying to think about this from first principles, give some hints about what I think I'm seeing, to direct attention. So it's, yeah.

Tsvi 00:10:24
I think you're putting your finger on something a little bit vaguely, but I agree.

Liron 00:10:27
That's also how I think about it too. Of course. I mean, I think a lot of people are having similar, kind of vague but related thoughts. My only question for you is why does this translate into a feeling of confidence that whatever gap you think is left isn't going to be jumped?

Liron 00:10:43
By all the ways that so many different people are trying to jump it. You know, reinforcement learning comes to mind as a way to jump the gaps that LLMs have.

Tsvi 00:10:49
So again, the thing you're calling confidence is like, I would say three to 10% in the next 10 to 15 years.

Liron 00:10:54
So my mainline scenario is AI progress continues rapidly, and sure, there's kind of a stall that you would describe as it's having trouble inventing new ideas and it can't do Cantor-Einstein's mental motion, but a few years pass and other stuff happens and then it can.

Liron 00:11:12
Reinforcement learning is the buzzword that I use. There's other reinforcement learning based approaches that then combine with the LLMs, and now suddenly even that's not a firewall. And then by 2040, by 2050 let's say, then the chance that we're all dead is better than even.

Liron 00:11:24
Subjectively you've just felt the last few years like you keep reading about breakthroughs and it's an interesting breakthrough but it still doesn't have that spark of creativity. It's like in your mind they've all fallen on the unimpressive side of the ledger with respect to creativity.

Tsvi 00:11:37
First, yeah, in some sense, yeah. In the relevant sense.

We Need Clear Goalposts for Better AGI Predictions

Liron 00:11:41
Yeah, I mean, I would encourage you to try to be a little bit more concrete about specific data that would shock you. A specific test. Goalposts that are, you know, prediction market level. Did this happen or not? Because I suspect it's one of those things where it's tempting to move the goalposts and I'm not saying you've done it in the past, but I'm saying it might be tempting to do so in the future.

Tsvi 00:12:02
Um, yeah, I don't really like this moving the goalposts critique that much. If someone says, if someone keeps saying, in order to be smart, or if you can do analogies, then you're smart and are like a human, but if you can't do analogies, then you're not, and then LLMs can do analogies, and then they're like, oh, nevermind.

Tsvi 00:12:21
It's this other thing that's moving the goalposts. But if you're like, LLMs can't do X, Y, Z, so clearly they're not intelligent, then LLMs can do X, Y, Z, and you're like, okay, well they can't do A, B, C, that's not necessarily moving the goalposts. It's just you're saying—

Tsvi 00:12:38
There's several things that they can't do, each of which is strong evidence that they're not intelligent.

Liron 00:12:43
Well, it sounds like you're kind of saying that you never set down your goalposts, and I'm encouraging you to set down some goalposts.

Tsvi 00:12:48
Are you not satisfied with the ones I listed, so the Cantor from scratch or a thousand x less training data?

Liron 00:12:56
I mean a thousand x less training data I think is kind of reasonable. I think you might just be able to notice that you're shocked even before that. I feel like that is still, you're making yourself open your eyes late in the game the way your goalpost is structured right now.

Counterargument: LLMs May Not Be a Path to AGI

Tsvi 00:13:12
Okay. I will, maybe I will spend more time thinking about that.
Let me give you a counter suggestion, which is, I perceive often, and I don’t necessarily have counter evidence for you, although maybe you have done this, I perceive often that people who have what I would call confident short timelines, sort of can’t, haven’t done much to imagine, sort of an alternative hypothesis to LLMs or LLM training architecture basically has the ingredients to AGI to general intelligence.Or when they do that, they’re like, LLMs are almost there, but you just need online learning or self play or reinforcement learning or long time horizons or something. And that’s not really addressing the thing I’m trying to get at. The thing I’m trying to get at is you’ve made a big update on, you’ve made several big updates on the observations of LLM performance.Tsvi 00:14:12Try to think in the most, try to as much as you can get an alternative intuition where you’re like, oh, yeah, I see how all this is explained away by my other theory of what’s going on with AI currently, as opposed to just, we basically figured out general intelligence.Liron 00:14:21Yeah I mean I’m not sure there’s really anything to convince me of ‘cause I would’ve told you from the get go that yeah there’s a 25% chance that there is an AI winter and we won’t have super intelligence 20 years from now. That is, if you ask me why are we going to survive, that is probably where most of my survival probability goes of like, oh yeah, hey, we had a few more decades.Liron 00:14:41Your scenario. It just seems like I would still be pretty surprised. I wouldn’t be like, oh my God, this nothing makes sense. I’d be like, no, I’m pretty surprised that’d be my—Tsvi 00:14:52And can you put any words to the, to what’s going on with the, what I mean is at the level maybe you just, maybe you wouldn’t make any sort of mechanistic claims at all, and you would just say, look how impressive LLM performance is. Is that sort of your position or would you say, look how impressive LLM’s performance is?Tsvi 00:15:09So they probably have some algorithms or something, or they’re thinking or, or they’re creative or whatever.Liron 00:15:14So my position is that I am no longer able to confidently draw a boundary between what LLMs can and can’t do in terms of these tests. And you’ve shown that you don’t see the world in terms of these input output tests to the degree that I do, but I consider it a very important test where you set down all of these different goalposts in terms of specific challenges of like, okay, you give the AI this, it has to output this.Liron 00:15:37And had I done so this year or last year or any point in time, it just seems like most of the milestones I would’ve set down are being crossed. And it’s now very hard for me to say milestones. And when I say a milestone like, okay, wash my dishes, sure. But it doesn’t seem like that is a fundamental limitation.Liron 00:15:54Like it doesn’t seem like they’re going need some major new spark to be able to wash my dishes. It feels like incremental progress is all it’s going to take.Is Tsvi Setting a High Bar for Progress Towards AGI? Tsvi 00:16:01Okay. And so your critique, if I say to me the biggest update would be the AI is doing impressive scientific advances, like coming up with new concepts or scientific insights or mathematical concepts, theorems or proofs that are interesting to humans, but the humans didn’t already write about.Tsvi 00:16:20Your critique of that is, “Sure, that’s fine. 
That’s an input output thing, but it’s just a really, really high bar and you should be able to update before that point.” Liron 00:16:30Yes. And the reason I say it is because I think that you’re already going to notice if you’re looking at what is the state of the art of them proving stuff and helping mathematicians I think it’s very steadily creeping up in the same way, you know, with software engineering, I have a little bit more direct experience, they keep getting more useful.Liron 00:16:45You know, I’m not the only one who does this, but I just had a chat with GPT-o3 the other day about like, “hey, I’m using Firebase. Why is Firebase slow in this case? What could be wrong? And it’s like, okay, here’s the inspector tool you should use in your specific situation. You did this query, you should check on this.” Liron 00:17:00Right? So it’s already, you know, the waterline of how helpful it is keeps increasing. And I think that’s true in many different domains. And I’m not seeing a dam. You know, I feel like the waterline just chugs along.Tsvi 00:17:11Well, the dam is that it’s not gonna create new concepts and new insights. I mean, are you saying that that’s not a thing or you just—Liron 00:17:20So I’m just saying that you can break that down. If you try to cash that out as a series of tests, whatever series of tests you end up writing. It seems like the pattern is pretty strong that it’s just going to keep passing your easiest tests and then harder tests.Tsvi 00:17:35Yeah, this is a really weird line of argument to me. I mean, sorry, not weird. I mean, obviously lots of people make this line of argument. I’m just, you could say the same thing about software in general. You could be like, whenever we have a pretty clear idea of some task and it has the right properties, like it can be put into a computer at all, like it’s an information processing task, then what our software can do keeps creeping up over time and I’m like, yeah, that kind of is a valid argument and it’s related to AI, but that doesn’t really tell you now we have the insights and now we’re a few years away from—Liron 00:18:08I mean what our software can do creeping up over time. If you go back you know 30 years or whatever at the dawn of usable text to speech for example I could have laid out all these different milestones and a lot of ‘em would be AI related or whatever we wanna call AI. But yeah, you know, text, images, voice, motion, self-driving.Liron 00:18:25These things that seem like important milestones just from an outside view or just what seems like salient and you know, virtual reality, that’s a technological milestone that seems very salient even before you’ve crossed it. So I would’ve laid out all these milestones and I’m just noticing like, hey, all the milestones that I set out decades ago, they’re getting knocked out and I don’t really have many milestones left.Liron 00:19:01Yeah. And the falling of all of these milestones is connected to the same central engine. This LLM algorithm and the idea of scale is all you need in the transformer architecture. And it’s a remarkably simple architecture.AI Models Are Missing A Spark of CreativityTsvi 00:19:13Okay. I mean basically to me, I’m like, yeah, that seems true. But a very, very key component is that the core of these capabilities is coming from the giant amount of text data that we have, sort of demonstrating the capabilities. 
And then when you go outside of that, in some contexts, LLMs can go outside of that significantly and they’re definitely not behaving like just a human. They can, they have a huge amount of knowledge, so to speak. They can bring in lots of facts.There’s certain operations that they can do much, much better than humans. At least they can program much faster than humans for easy contexts, for easy problems. And then if you look at AI more broadly, it’s superhuman chess, superhuman this, superhuman that, superhuman image generation. But I feel like a really, really big part of the explanation for how LLMs have been hitting these milestones that I think you’re referring to is that that’s the stuff that’s generated, that’s demonstrated in the data from human text production.Liron 00:20:12Right. Okay. So I think we may not disagree as much as it seems because I actually agree with you. I think most people would agree that LLM scaling is hitting a wall. I think GPT-4.5 showed that. Remember they just threw more scale at the LLM, I think it might’ve even been 10 times more scale. And they’re like, look, it’s slightly better.Liron 00:20:28And everybody’s like, oh, okay. So we need more pieces. We need to throw more puzzle pieces into this. So I agree with you there. The prediction I’m making when I point to all these goalposts is I think we have enough tools. It’s not gonna be just LLMs, but I agree with you. If you told me, fix the architecture to that of GPT-4 or 4.5 or whatever it is, fix the architecture and all you get to do is throw in more data and more GPUs, then I’d be like, okay, yeah, that probably will in fact not show a spark of creativity.Liron 00:20:54By your definition. Maybe even ever. I’d be 60% confident. Okay. It’s never gonna show the spark of creativity. But in reality we do have a few other puzzle pieces that we can stir into the cauldron here and I think the stirring is happening constantly.Tsvi 00:21:06I agree with those two things. But why do you think that that does get us the spark of creativity?Liron 00:21:10Because all the different milestones that if I’m just saying look black box. Okay don’t even worry what’s going on under the hood. Just what are different challenges that show a spark. I would’ve said personally that self-driving better than humans robustly with fewer accident rate, overall and any environment better than human.Liron 00:21:27I would’ve been like, yep. That is evidence for, I wouldn’t call that a spark of creativity, but just a spark of general intelligence. At least.Tsvi 00:21:35Would you have said the same about—Liron 00:21:36That’s an example of one of the black box milestones and I’m like, well, that’s evidence. I’m checking that off my list of milestones that tell us that the real deal is here.Tsvi 00:21:45The black box thing is confusing to me and I don’t wanna be too accusatory and I’m not very confident at all. But let me just set this up as a conjecture, not as an accusation. There might be, I’m working this out right now, but it might be a motte and bailey between the black box and the not black box.Tsvi 00:22:02Where on the one hand you want to say, well, LLMs are maybe not the thing because of recent evidence, but we’ll just have more stuff. 
So you’re trying to screen off the mechanistic reasoning there, but you’re not screening off the mechanistic reason in the sense that you want to make the induction.Tsvi 00:22:19You wanna say, nah, maybe this is not a fair accusation.Liron 00:22:23I’m just saying the AI industry as a whole is now in this part where they are showing a lot of momentum on any measurable dimension that you wanna give them.Tsvi 00:22:33I mean you say people say that, but I don’t understand when you’re saying measurable. You mean some subset of measurable, like it is measurable whether an expert who tries to tinker with a new idea by talking to an LLM, it’s measurable whether they’re like this LLM had this insight that is nowhere, that I’m confident there’s nowhere in the data.Tsvi 00:22:51‘Cause I’m an expert in this particular field and I didn’t have it either. And the LLM had it. That was amazing. Does the expert say—Liron 00:22:57Right. So what I’m trying to tell you is when you take what you think is a boundary like that and you split it up into more continuous ticks like okay did it have a small insight? Did it have a bigger insight? You’re going to see that it’s having the small insights.Tsvi 00:23:09I am, yeah. I’m not sure. I really—Liron 00:23:12Well, and that’s why I’m encouraging you to make the test, because I think it’s easy for you to dismiss it by just being like well I just drew this line. It’s black and white, you know, Einstein versus not and I’m like well try drawing more lines. I think you’re going to see a trend.Tsvi 00:23:25Well okay. If I draw more lines, then more lines will be crossed. Yes.Liron 00:23:31Right and I think you’re already going to see momentum of them being crossed. So then you’d have to be like okay, but this particular line here is a special line.Tsvi 00:23:37I don’t have to say, I don’t have to say this particular line is not gonna be crossed. Why? That doesn’t make sense.Liron 00:23:46Well so if you agree with my premise here. So it’s like let’s say you’d be impressed if they do the Cantor diagonal proof without having that in their data. But imagine you write a bunch of less impressive proofs working up to it, and let’s say they’re getting halfway there or whatever. So you would just keep asserting like okay yeah, but out of all these different lines and milestones that I drew those are all easy up to this certain point that I’m pointing at.Tsvi 00:24:10I mean, I’m not sure I’m following. Whenever we do this, we’re going to be gaining new information. Like we didn’t, before you make GPT-4 or 3, you, even if you were anticipating, you’re gonna get this weird distribution of capabilities where you’re extremely superhuman in terms of how much knowledge and the ability to answer a huge array of questions and even the ability to solve certain kinds of problems.Tsvi 00:24:35But you’re not gonna be able to be creative in this way, blah, blah, blah. Even if you called that in advance, drawing the lines is gonna be extremely difficult. And if you try to draw the lines beforehand, you’re not gonna be successful.Liron 00:24:48I’m just arguing for what I think is good black box methodology. So the methodology is let’s say you wanna investigate the brilliance of making leaps during proving, okay? So make 10 steps. A score one to 10 difficulty leaps. And I think it’s going to advance on your scale.Tsvi 00:25:03Okay. Well, I don’t necessarily disagree with your proposed methodology. That sounds reasonably good to me. I’m not in the business of constructing these benchmarks. 
I take your point though that that would be a way to update more. I don’t really agree with part of the argument though, which is you do this. You make 10 steps, and then the AI does the first 3, and you’re making this strong update. You should make a weak update. You should update in favor of capabilities are increasing.Tsvi 00:25:32But you should also update in terms of like, well, this is how far you can go with this limited AI method.Liron 00:25:38In the example of Cantor-Einstein I feel like you have this example of a class that’s like an 8, 9, or 10 difficulty and you’re like well it’s not doing that. So I feel pretty good that it doesn’t have a spark and I’m like, okay, but did you see it’s doing 1, 2, 3? And you’re like, well that doesn’t mean much.Liron 00:25:51Maybe it’ll stop at 3. And I’m like, well, why don’t you think about this more and maybe tell me that 4 or 5 is your real firewall.Tsvi 00:25:58Wait, sorry, can you rephrase that? I didn’t understand that.Liron 00:26:01Yeah. So this hypothetical example where a proof as good as Cantor’s proof is like an 8, 9, or 10 difficulty of having a leap of insight. And I’m saying but look at all these smaller proofs it’s doing. These are still considered smaller leaps of insight. No. So are you, basically, what do you think is going to happen?Liron 00:26:15Do you think it’s just going to stop at 3 because it just got to 1 and then 2, and then 3? So are you claiming it’s going to stop at 3 or are you claiming it’s just going to stop sometime before 8? Can you maybe nail down the goalpost of when you’d first be like, “oh crap it’s not stopping where I thought it was gonna stop”?Tsvi 00:26:29Like the claim is that there’s a smear of like, yeah, I think it’s gonna stop before 10, probably with fairly high probability. Where exactly that happens will be smeared across things. And so I will update somewhat, but just not that strongly on any given step.Liron 00:26:43So from my perspective it feels like you’re choosing an example that you know is above the waterline today. And the problem is the example you chose, it’s one that’s only going to be knocked down very late in the game. So I guess I’m just asking you if maybe you can put more work into choosing examples that aren’t quite as high above the waterline on the spectrum for like, well—Tsvi 00:27:03Okay. Wait, wait. Okay. Listen, as a framing point, there’s no obligation of reality to give you a nice set of informations that you’ll receive at different times that will give you information about how close the intelligence explosion is. You could just not be able to tell. Liron 00:27:26Of course there’s not but I’m just observing that anytime that I try to do the exercise. If I were to lay out a series of math, to be fair I haven’t done it but the impression I get reading other reports and headlines is that, people keep getting more and more impressed that on any dimension or on any scale that you give it, just seems like it’s just climbing the scale.Tsvi 00:27:47Well, on some scales and then not on others. .Tsvi 00:27:52Okay, you’re saying that’s an artifact of me not having made the scale nice and continuous.Liron 00:27:58Exactly. I think it would be productive for you to make the scale and be like it’s easy to claim like, oh yeah this super hard thing that only the top 1% of humans can do. The AI can’t do that yet today. 
But I think it would be productive to try to be like, okay, is there anything that the median human can totally do today that you would be shocked if the AI could do in a couple years?Liron’s “Black Box” AGI TestLiron 00:28:16My methodology is just talk about something, a useful application, and then have a scale that relates to the useful application.Tsvi 00:28:23But okay, when you, I feel like this sort of intentionally mixing in a bunch of different stuff. Like if you’re, I don’t know. So let’s say language learning as just a random example. LLMs totally have the core competency or something to be a good language teacher. Like they can both speak in many languages as long as there’s a reasonable amount of text data.Tsvi 00:28:44And also they can using scaffolding and multiple agents, you can have guys checking your spelling, checking your pronunciation, checking your grammar, giving you advice, blah, blah blah. So they definitely have the core competencies. And then there’s a separate question of like, can you implement it?Tsvi 00:28:58Do you have good tastes? Did you roll out your product well, blah, blah, blah. And if you’re asking me to predict, will there be a nice product where I can just drop in and learn a new language in the way that I actually want to, that’s useful to me. That’s a really complicated question. But it also bears less on AI capabilities. Maybe I’m misunderstanding what you’re—Liron 00:29:18You’re basically saying that I’m trying to package up too many variables when I talk about an application. Yeah. But I still think that it’s a pretty natural layer to ask the question.Tsvi 00:29:27I don’t, because I mean, if you package up a bunch of variables, you’re going to get a range of how impressive that variable should be, if the LM could do it, and also how useful that variable is. So you’re gonna get some variables that are kind of useful for the task, but shouldn’t actually be that impressive or wouldn’t update me that much.Liron 00:29:43I mean the specific examples of if we look at the kind of things that I think are meaningful tests, so I mentioned self-driving and then there’s writing essays that get an A. I mean, you don’t think that’s a meaningful benchmark.Tsvi 00:29:55Um, not very much. No.Liron 00:29:59I mean, don’t you think that would’ve been an incredible benchmark to be talking about 10 years ago?Tsvi 00:30:03I agree. It’s very surprising. But if you think about it longer, you’re like, Hmm. I guess it kind of is in the training data. Yeah.Liron 00:30:14Right. I mean when you conclude I guess it’s in the training data. I see that as not that useful to my methodology because I think my methodology should just allow, okay, yeah, the AI can exploit different paradigms and instead of retroactively judging, like, ah, yes, all of the things it can do are just because of this paradigm and therefore it’s going to stop.Liron 00:30:30I think it’s productive to just have a black box measurement and not open the black box until you first look at the results of a black box benchmark.Tsvi 00:30:39And why is that useful?Liron 00:30:41Black box benchmarks. I mean, it just prevents you from having the confirmation bias of being like, oh, yes, well, I understand the paradigm which of course you get to learn the paradigm, after seeing the results of the paradigm. I think it’s a protection against confirmation bias.Tsvi 00:30:54I’m not sure I’m following—Liron 00:30:55Sorry. And it’s not just that. 
It’s also the thing of, I don’t want you to, it also prevents you from getting too attached by zooming into one paradigm. Like yes LLMs are a really important paradigm but I wanna make sure you’re thinking about the field of AI as a whole, which is mixing paradigms.Liron 00:31:10And the black box tests protect you from diving into the details of one paradigm that you think you understand well. And just looking at the bigger picture of the whole industry.Tsvi 00:31:19And then you infer from this 50% probability of AGI in the next 10 years.Liron 00:31:25Yes. Anytime. Yeah. The same way that most people do. By the way there’s an outside view argument. Most people are saying the same thing I see. Which is just any of these natural black box dimensions, like these tests where I’m like, look, I’m just standing back. I don’t even claim to understand AI. I’m just looking at all these tests and I don’t know where somebody is getting a scale that’s making em be like ah yes it’s stuck on the scale.Tsvi 00:31:46You don’t know where someone’s getting a scale or it’s stuck on the scale. I think what I’m — maybe the structure here is that I’m sort of to a significant extent, or probabilistically, I mean, trying to explain away the observations of LLM capabilities in particular and saying basically, well, it’s ‘cause it’s in the data.Are We Going to Enter an AI Winter?Liron 00:32:09Right. I mean, so my question is, do you think we’re subjectively going to enter an AI winter soon or no?Tsvi 00:32:15Okay, well again, you’re, that’s integrating a whole lot of variables. Like, um, so I mean—Liron 00:32:22I mean, I think you think we are right? ‘Cause you’re saying the probability of AGI soon is low and a lot of the companies are now promising AGI soon.Tsvi 00:32:28Right, so my guess is that we will not get AGI soon and not, not super strongly, 80, 90% depending on where you, try the year. But that doesn’t necessarily mean there’ll be a winter. Like they are already having pretty substantial revenue. They’re probably going to significantly expand the revenue.Tsvi 00:32:46I don’t know whether, I don’t know the economics of—Liron 00:32:48So I feel like you’re, I think you have to strongly predict that there’s totally going to be, very likely going to be a subjective, like disappointing AI winter, because you’re telling me—Tsvi 00:32:56Oh, oh, sorry. Maybe I misunderstood. You’re trying to, you’re saying will research keep progressing in will, will it keep, we keep getting similarly impressive things or not? Is that—Liron 00:33:09Yeah. And I mean so from my perspective. Any of these natural black box metrics are just going to keep scaling. I think you really have to go out on a limb. So you’re saying, ah, yes, no, you’re gonna slam into a wall and these companies are gonna miss their revenue targets. It sounds like that has to be—Tsvi 00:33:24Well slam into a wall is slightly strawman-y, but very, at a very coarse level. Probabilistically. Yes. I think that these things are really smeared out though, because; so you might say, and I would also say, well, we’re gonna do o3-style, o1-style, reinforcement learning, or we’ll do new things.Tsvi 00:33:44People will come up with new circuit breakers. People will come up with new training algorithms they’ll come up with.Liron 00:33:50And we both think that they will do that, right?Tsvi 00:33:52Mm-hmm. For, yeah, certainly. And so that’s going to at least kind of, what’s the word? There’s unhobbling, and then there’s also like, perform, unlocking performance. 
Liron 00:34:03So in your mind, they’re gonna be trying all this other stuff, but it’s just still going to take decades for them to, for something to really click and restart the singularity.Tsvi 00:34:13You’re going to get increase in capabilities. I don’t know how to call in advance subjectively how impressive they’ll be. You might get an intelligence explosion in three years or 10 years. But yeah, my main line is that it’s just more smeared out over 10, 20, 30, 40, 50 years where you, yeah, you have multiple paradigms if you like, or just multiple insights, multiple algorithms and working out how to combine them and yeah.Liron 00:34:41So to summarize the crux of disagreement I think it’s like I just see the amount of puzzle pieces that people are working with to already look like there’s just enough going on to probably finish the job soon.Tsvi 00:34:52So now you’re doing mechanistic reasoning.Liron 00:34:56I mean, I admit that there’s an interesting mechanistic statement to be made about how it looks like LLMs don’t directly scale to super intelligence. And so you just need a little bit more. I’m willing to go, yeah, I’m happy to admit that is likely.Who Is Being Overconfident? Tsvi 00:35:09Well, I’m trying to, I’m just trying to track where, from my perspective, you’re somewhat overconfident. Or let’s just say confident of AGI in 10 or 20 years. So I’m trying to understand where that confidence is coming from. So you’ve talked about the black box thing where you’re explicitly saying, don’t be reasoning about the mechanism, but now you’re saying, well, I see lots of people are producing mechanisms and we have lots of little pieces of mechanisms and that seems like it should add up to a general intelligent mechanism.Tsvi 00:35:41I don’t know.Liron 00:35:42Yeah maybe the connection between ideas is I think the black box tests are really important to just kind of objectively tell you the momentum of things. Of course, momentum doesn’t have to hold. Momentum can peter out and there’s probably been, you know, AI summers where, burst of progress happened and then petered out.Liron 00:35:58So, but I think that’s a good starting point. And then, now just based on that starting point, things seem to have high momentum now across the board. And then when I open the black box and I’m like, okay, well what’s the driver of the momentum? One is the LLM scaling, which is like, okay, GPT-4.5 was kind of bending the curve, but I also see a lot of other puzzle pieces and a lot of other type of results going down, you know, like AlphaFold.Liron 00:36:22I’m like, Hmm, okay. That’s a different puzzle piece that people are also mixing in together with the LLM puzzle piece. And I’m like, well, it seems like these are powerful puzzle pieces getting mixed and zooming out. It seems like the black box results keep getting more impressive. And then, you know, deferring to other people’s opinions.Liron 00:36:38Like the consensus of people in the field. It’s not like a bunch of experts in the field are being, don’t worry guys. You know, Yann LeCun aside who says it’s maybe a little over 10 years, that’s his prediction. Even he’s not contradicting the consensus that much. So I’m just mashing it all together.Liron 00:36:51I mean, don’t get me wrong, if tomorrow I check the AI news and everybody’s like, Hey we’ve all changed our minds based on evaluating the data. 
I’m totally willing to reconsider, but actually I just feel like you’re the one who’s coming in over confident.Tsvi 00:37:02Okay, so some of it is coming from other people’s opinions, which is fair.Liron 00:37:10Yeah, it’s signals about potency of the puzzle pieces. You can describe it that way. I’m only getting positive signals about there being a lot of potent puzzleTsvi 00:37:17Can you tell me a bit about the structure? Like are there a few people you could name who have 40% of the Liron belief to the extent that your belief is coming from other people saying things? Is there some small set of people you can name or is it like a category of people or?Liron 00:37:31It’s basically whoever’s voting you know in Metaculus. It is just commentators. There’s not that many commentators that I’ve seen that I respect who are like yeah, I totally understand AI and LLMs, but I would be shocked if it comes in less than 20 years. I mean, you know, Gary Marcus I guess, but even he’s admitted to having a 30% chance and he’s on the extreme of people who are skeptical that it’s coming soon. He said 30% chance coming less than 10 years on my show.Tsvi 00:37:57Mm-hmm. Okay. So it’s just the fact that most, that very few people say less than—Liron 00:38:03Well that combined with the objective tests. I mean the only other thing I can do is try to get a deeper gears level understanding of AI itself which isn’t my expertise. I mean, I’m interested in it, you know, I study it when I get a chance. But I mean, I just think I’ve got a calibrated, I mean, I’ve already given probability to it taking a long time. I gave you a 25% chance that it’ll take more than—Tsvi 00:38:21Mm-hmm. So yeah, it’s the people, it’s the black box observed capabilities, and it’s some sense of there’s a bunch of mechanisms maybe they add up, or seems like there’s a lot of mechanisms and there’s a lot of people working on combining them.Liron 00:38:36Right. And it and it’s this idea of like okay describe the scenario where it doesn’t happen. I asked you to do it. And you’re like oh well it’s a scenario where it peters out at being able to prove this thing but even your boundary seems like, you’re probably going to end up moving your boundary once you start seeing the progress. That’s the sense I get. I don’t get the sense that you’ve actually drawn a meaningful boundary.Tsvi 00:38:59Well, I’ve generally been pretty careful actually about saying I’ll be very surprised if X, Y, Z.Liron 00:39:07I mean I agree that you’ve drawn a somewhat meaningful high boundary. And I’m willing to believe that you will in fact be surprised right before the world ends.Tsvi 00:39:19Okay, so I mean, I guess my, if you’re accusing me of, you’re accusing me of sort of sloppy or poor epistemics in this particular case, basically, which is fine.Liron 00:39:31No. I mean, I don’t think, I think you’re, everything you said is reasonable. I guess I just wish you would be like, you know, what I should do is try to define a lower boundary for myself that would wake me up, that the puzzle—Tsvi 00:39:47Well, I, okay. Well, I guess I’m kind of saying it doesn’t feel that compelling because I don’t really feel like nature has to give me an indication like that.Liron 00:39:54Yeah. It doesn’t have to. I mean that is true. That is true but I think nature is going to be sufficiently generous to give you earlier warning signs than Cantor level. Yeah.Tsvi 00:40:04Thank you, nature.Liron 00:40:07I do think nature’s giving us some signs. 
I mean, I don’t think the herd of people including you know highly smart qualified AI experts who are converging on next decade. You know, the Turing Award winners of the world, the David Duvenauds, Andrew Critch. You know, just to throw out a couple. Geoff Hinton, I mean, Yoshua Bengio.Liron 00:40:26I mean, these are all highly qualified people who are like yeah it doesn’t seem like it’ll take more than a decade. And so it just, it does seem like you’re being very confident on your low level technological prediction here.Tsvi 00:40:40Okay. I mean, I guess what are the reasons for being confident that it’s coming soon?Liron 00:40:45What are they? I mean so when I talked to Critch and David Duvenaud I actually think that the vague stuff that I’ve said is probably similar to what they, I mean, they’d probably say additional stuff, but I don’t think I’m way off base. Like when I talked to Andrew Critch for example, I specifically remember him kind of agreeing with me of like, yeah, I just don’t see a firewall.Liron 00:41:05It just seems like things are just on track to creep up.Tsvi 00:41:08I mean, yeah. So another thing, another piece of intuition that I can bring in that won’t change anyone’s mind, but is, bridges don’t stand randomly. You don’t just pile steel up and then you have a bridge because you don’t see a good convincing, clear reason that it shouldn’t be a bridge or shouldn’t stand up.Liron 00:41:30Right. But intelligence—Tsvi 00:41:32Intelligence. Intelligence is a specific thing or a specific class of thing.Liron 00:41:37It’s a specific class of thing, but I think it’s productive to test—Tsvi 00:41:40It involves—Liron 00:41:41A series of tests for it.Tsvi 00:41:43Fair. Fair enough. Okay. But what we were trying—Liron 00:41:46I mean that, that’s basically, that’s my beef with you: Is I feel like you had this one test and you’re not willing to have other tests because you don’t think that reality should necessarily let you make other tests. But I think you should dig harder.Tsvi 00:41:58I mean, at some point, I’m just gonna say I’m busy. But I, well, I guess I’m sort of taking you as talking to people like me or something. So I’m responding that way.If AI Makes Progress on Benchmarks, Would Tsvi Shorten His Timeline? Liron 00:42:11I mean, when LLMs came out, did that at least shorten your timelines at all?Tsvi 00:42:15Uh, so I didn’t really have timelines before that. I agree. I agree that people should shorten their timelines when they see LLMs. Um, yeah. But I’m, yeah, I guess when I ask people why, you know, how did you update? So far I haven’t gotten very clear answers.Liron 00:42:35So the same way that reality showed you some evidence with LLMs, that timelines are shorter. I think it’s going to be generous enough to show you again. You just would, you know, work on your goalposts.Tsvi 00:42:45Um, I don’t, yeah. Yeah. I don’t really get, so does it make sense to you that there could be a series of ideas? There’s like five of them or something? We have two of them. Each time you get one, you get a significant burst of more capabilities. And then, you know, there’s a fast period of growth, then it tapers off and then you get another one, and then when you get the fifth one, you get an intelligence explosion.Tsvi 00:43:09Does that make sense? As a kind of world?Liron 00:43:12I mean those five things are, you could call those what I mean by puzzle pieces but I also think that at a high level we already know the puzzle pieces. 
Okay, high dimensional spaces, transformer architecture, reinforcement learning. I think at a high level, we probably have all the high—Tsvi 00:43:27Why do you think that?Liron 00:43:30I think that because of my inability to draw these clear boundaries of what it can and can’t do.Tsvi 00:43:34Sorry, sorry, keep going.Liron 00:43:37Yeah. No, that, that’s pretty much it. It is just, there may be a boundary, but the fact that I can’t even name what the boundary is, that’s not a state I’m used to in any field. I mean, usually we can just point to oh this might be a boundary, this might be a boundary. And you’re, you know, it might be a boundary is—Tsvi 00:43:52But I’ve named, I’ve namedLiron 00:43:53A spark of—Tsvi 00:43:54No, but you keep, you keep replacing, but I. No, no. You keep, you keep replacing. No, I’ve given several specific things though. And you’re just like, well, it’s a high bar. It is a—Liron 00:44:04Yeah.Tsvi 00:44:04I mean, I didn’t give you the boundary, but I—Liron 00:44:06Right. Okay. So just to clarify then, so you just think that the AI is probably going to march along and be really close, you know, just be way better than most human mathematicians at math, but just not as good as the top mathematicians.Tsvi 00:44:20No, it’s way better than all humans at some pieces of math or some aspects of doing math. And it’s gonna be somewhat better at other aspects and somewhat worse at more aspects, and then quite bad at other aspects?Liron 00:44:37Okay. So you think that we’re for the next two decades let’s say the 90th percentile mathematician is just going to be more qualified for a job, a better hire for a job than the best AI. Tsvi 00:44:50Well, it depends what job it is. It is plausible that you will, I don’t know about replacing jobs. I’m not sure that there’s gonna be some jobs you can completely replace. There’s some jobs that you can mostly replace or replace where you have one human and 5 AIs doing the job of 10 people or whatever.Tsvi 00:45:07There’s gonna be a whole spectrum like that. If you’re talking about mathemat, like research—Liron 00:45:12Sure. Let’s talk about the research, the publishing work. So for publishing, you’re telling me that if my only choice, let’s say I have a job as a research mathematician and my only choice is to either use my own brain or use the best math AI brain from 20 years from now, you’re saying that my own brain is gonna be better.Tsvi 00:45:32It depends what you’re trying to do. Like if you’re trying to, yes, it is better to use your brain if what you’re trying to do is do math research that is beautiful and interesting to other mathematicians and in a way that then more people talk about the concepts you were discussing. You’re still gonna, there’ll be more and more use of computers and AIs in math research for sure. But if you’re trying to do that kind of math, you’re still gonna want the human for a long while probably.Tsvi 00:46:07Yes. There’s gonna be some things where you mostly want to just use the AI, like, I dunno what, checking—Liron 00:46:16I mean I think this is what I’d wanna dig into. Is you’re not just even claiming the Cantor genius level stuff. You’re saying hey even stuff that mathematicians that are merely average professor level, even they do things that the AI is going to struggle with for decades.Tsvi 00:46:31Some of the stuff they do. Yeah.Liron 00:46:32Yeah, I mean I guess. So I mean, I think that’s really the way forward. 
I think what you wanna do is look at this somewhat average mathematics professor and isolate the aha moments. Like the one hour periods where you’re like, aha, this one hour period, the AI couldn’t have done that. And that could be your next goalpost.Tsvi 00:46:50Yeah, and then you’re, I mean, all these things are gonna be hard to measure. So you need to be checking that it’s not something that actually basically was already in the training set or the otherwise accessible data. Liron 00:47:05Mm-hmm.Tsvi 00:47:06Also, you might want to parameterize over compute power. Like how much compute did the AI take to replicate the same thing. I’m not that big on caring about compute power, but that’s another parameter you might wanna, yeah, you could do that.Liron 00:47:24Yeah. And obviously I know this isn’t your main focus. You’re not gonna take the time to do it. I would just claim that when you make those kind of benchmarks and then we can even turn the dial of like, okay, 90th percentile math professor. 10th percentile math professor. At some point you got the world’s worst math professor.Liron 00:47:41Like surely he’s going to be replaced by an AI pretty soon. And that’s the spectrum I encourage you to look at.Tsvi 00:47:48Okay. I mean, again, I just don’t feel like nature has to be that clean for you. So an example is, I suppose I agreed to something like this where I’m like, okay, yeah, if we start climbing this scale, then I’ll, then I’m in trouble. Then I have to update. And then what we actually get is a particular scenario, and then I’ll retrospectively be like, wait, wait, wait, wait.Tsvi 00:48:12I know what I said. I know I said that, but actually I wanna retrospectively revise my prediction. Where, what happens is, there’s a subset of math where there’s a subset of what we would currently call creative research math, where actually the AI does start having insights that equal or surpass better and higher and higher caliber mathematicians.Tsvi 00:48:37The subset could be like things that involve large amounts of computation or large amounts of sort of—Liron 00:48:44Right. Yeah. But you just took specific math professors and they’re like, yeah, their subset of math seems like a representative subset of math. I mean these aren’t hard goalposts to set up in my opinion.Tsvi 00:48:53They sound kind of hard to me, but yeah, someone could do that.Liron 00:48:56Like if you just, if you just learn, right? If you pick 10 mathematicians and you just rank em in terms of how impressed you are by their overall amount of brain power that it seems like they use in their publishing. Okay, so you rank the 10 and then one day you learn that the guy in the middle of the pack that you thought was pretty damn impressive, turns out he’s been letting an AI do his entire job. Like he’s literally not showing up for work.Tsvi 00:49:17Mm-hmm.Liron 00:49:17Don’t you think that would be an earlier alarm than going all the way to the math geniuses? The top math geniuses?Tsvi 00:49:23No, that’s, that’s basically what I said, isn’t it? Liron 00:49:27Well you said that you expected reality to serve up a situation where yes he’s been sleeping on the job but then later you learned that his job actually was easier than you realized in some sense. But I’m saying, just pick him to not be like that.Tsvi 00:49:38Yeah. 
At some point that is, that’s sort of the scenario I’m saying where I would be surprised and would update, which is if you have an AI that is producing many concepts across some significant range of fields of endeavor that are interesting to humans in the way that human math concepts are interesting.Tsvi 00:49:55Yeah, that would. Liron 00:49:56I mean, if you see, if you see math professor unemployment skyrocket, wouldn’t that be an earlier alarm for you than what you’ve previously been—Tsvi 00:50:03Well, if it’s because of, if it’s because of this.Liron 00:50:07Right. I mean that’s all I’m asking for is like, okay, notice when math professor unemployment skyrockets.Tsvi 00:50:12Sure. But then I’m gonna look into it, and if it’s for some other reason, like the NIH cut a bunch of funding or whatever, or whatever national funding, the NSF cut funding, then I’m like, well, never mind.Liron 00:50:24Okay, sure, I agree that that could be a good excuse. But don’t you think we’re making progress toward letting you notice that the singularity is happening and shorten your timelines?Tsvi 00:50:33Not that much, but—Liron 00:50:34Okay. I feel like we’re making a lot of progress, but whatever.Closing Thoughts & Tsvi’s ResearchLiron 00:50:35All right, man. Well I think, I think this is a good point to just quickly recap. So we talked about MIRI’s research and we both agree that intellidynamics are important and MIRI has legit foundations and they’re a good organization and still underrated.Liron 00:50:50And we talked about, you know, credibility is one of those things, and decision theory. People can go back and listen to the details. And then we talked about your different hopes for how we’re going to lower P(Doom) and it’s probably not going to be alignment research. And so maybe it should be germline engineering, or how would you tweak that summary?Tsvi 00:51:07Yeah, we should ban AI. We should talk to people, try to really understand why they’re doing AGI capabilities research and try to give them a way to do something else. And we should try to make smarter humans, maybe BCIs, but the way that will work is germline genetic engineering.Liron 00:51:29Eliezer Yudkowsky talks about this idea in his famous post “Die with Dignity” where it’s like it seems like we’re probably gonna die, but at least what we can do is try to do actions that significantly lower the probability, or maybe not significantly, but measurably lower it in micro-dooms or whatever.Liron 00:51:45Like at least you lowered it, you know, a millionth of a percent. That’s something. If we all do that, maybe we can get to less than 50%. I want to commend you because you are somebody who is genuinely helping us increase our dignity. I won’t say the die part. You’re helping us increase our, I think it’s a dignified action by this standard to boost our capability for germline engineering to get us on the path to making smarter humans.Liron 00:52:07I think that you’re actually helping and not just talking about it and joking about it and going to do your regular job and trying to be a rock star and make money. I think you’re actually part of the solution, so thank you for that.Tsvi 00:52:17Thanks. Thanks for your work. Talking about this stuff seems pretty important. We should be, we should all be thinking about this a bit more. Liron 00:52:26Hell yeah. We should. All right, I appreciate it, Tsvi Benson-Tilsen. Thanks for coming on Doom Debates.Tsvi 00:52:30Thanks Liron.
Doom Debates’ Mission is to raise mainstream awareness of imminent extinction from AGI and build the social infrastructure for high-quality debate.Support the mission by subscribing to my Substack at DoomDebates.com and to youtube.com/@DoomDebates, or to really take things to the next level: Donate 🙏 Get full access to Doom Debates at lironshapira.substack.com/subscribe
    --------  
    52:41
  • Liron Debunks The Most Common “AI Won’t Kill Us” Arguments
    Today I’m sharing my AI doom interview on Donal O’Doherty’s podcast.I lay out the case for having a 50% p(doom). Then Donal plays devil’s advocate and tees up every major objection the accelerationists throw at doomers. See if the anti-doom arguments hold up, or if the AI boosters are just serving sophisticated cope.Timestamps0:00 — Introduction & Liron’s Background 1:29 — Liron’s Worldview: 50% Chance of AI Annihilation 4:03 — Rationalists, Effective Altruists, & AI Developers 5:49 — Major Sources of AI Risk 8:25 — The Alignment Problem 10:08 — AGI Timelines 16:37 — Will We Face an Intelligence Explosion? 29:29 — Debunking AI Doom Counterarguments 1:03:16 — Regulation, Policy, and Surviving The Future With AIShow NotesIf you liked this episode, subscribe to the Collective Wisdom Podcast for more deeply researched AI interviews: https://www.youtube.com/@DonalODoherty Transcript Introduction & Liron’s BackgroundDonal O’Doherty 00:00:00Today I’m speaking with Liron Shapira. Liron is an investor, he’s an entrepreneur, he’s a rationalist, and he also has a popular podcast called Doom Debates, where he debates some of the greatest minds from different fields on the potential of AI risk.Liron considers himself a doomer, which means he worries that artificial intelligence, if it gets to superintelligence level, could threaten the integrity of the world and the human species.Donal 00:00:24Enjoy my conversation with Liron Shapira.Donal 00:00:30Liron, welcome. So let’s just begin. Will you tell us a little bit about yourself and your background, please? I will have introduced you, but I just want everyone to know a bit about you.Liron Shapira 00:00:39Hey, I’m Liron Shapira. I’m the host of Doom Debates, which is a YouTube show and podcast where I bring in luminaries on all sides of the AI doom argument.Liron 00:00:49People who think we are doomed, people who think we’re not doomed, and we hash it out. We try to figure out whether we’re doomed. I myself am a longtime AI doomer. I started reading Yudkowsky in 2007, so it’s been 18 years for me being worried about doom from artificial intelligence.My background is I’m a computer science bachelor’s from UC Berkeley.Liron 00:01:10I’ve worked as a software engineer and an entrepreneur. I’ve done a Y Combinator startup, so I love tech. I’m deep in tech. I’m deep in computer science, and I’m deep into believing the AI doom argument.I don’t see how we’re going to survive building superintelligent AI. And so I’m happy to talk to anybody who will listen. So thank you for having me on, Donal.Donal 00:01:27It’s an absolute pleasure.Liron’s Worldview: 50% Chance of AI AnnihilationDonal 00:01:29Okay, so a lot of people where I come from won’t be familiar with doomism or what a doomer is. So will you just talk through, and I’m very interested in this for personal reasons as well, your epistemic and philosophical inspirations here. How did you reach these conclusions?Liron 00:01:45So I often call myself a Yudkowskian, in reference to Eliezer Yudkowsky, because I agree with 95% of what he writes, the Less Wrong corpus. I don’t expect everybody to get up to speed with it because it really takes a thousand hours to absorb it all.I don’t think that it’s essential to spend those thousand hours.Liron 00:02:02I think that it is something that you can get in a soundbite, not a soundbite, but in a one-hour long interview or whatever. So yeah, I think you mentioned epistemic roots or whatever, right? 
So I am a Bayesian, meaning I think you can put probabilities on things the way prediction markets are doing.Liron 00:02:16You know, they ask, oh, what’s the chance that this war is going to end? Or this war is going to start, right? What’s the chance that this is going to happen in this sports game? And some people will tell you, you can’t reason like that.Whereas prediction markets are like, well, the market says there’s a 70% chance, and what do you know? It happens 70% of the time. So is that what you’re getting at when you talk about my epistemics?Donal 00:02:35Yeah, exactly. Yeah. And I guess I’m very curious as well about, so what Yudkowsky does is he conducts thought experiments. Because obviously some things can’t be tested, we know they might be true, but they can’t be tested in experiments.Donal 00:02:49So I’m just curious about the role of philosophical thought experiments or maybe trans-science approaches, in terms of testing questions that we can’t actually conduct experiments on.Liron 00:03:00Oh, got it. Yeah. I mean this idea of what can and can’t be tested. I mean, tests are nice, but they’re not the only way to do science and to do productive reasoning.Liron 00:03:10There are times when you just have to do your best without a perfect test. You know, a recent example was the James Webb Space Telescope, right? It’s the successor to the Hubble Space Telescope. It worked really well, but it had to get into this really difficult orbit.This very interesting Lagrange point, I think in the solar system, they had to get it there and they had to unfold it.Liron 00:03:30It was this really compact design and insanely complicated thing, and it had to all work perfectly on the first try. So you know, you can test it on earth, but earth isn’t the same thing as space.So my point is just that as a human, as a fallible human with a limited brain, it turns out there’s things you can do with your brain that still help you know the truth about the future, even when you can’t do a perfect clone of an experiment of the future.Liron 00:03:52And so to connect that to the AI discussion, I think we know enough to be extremely worried about superintelligent AI. Even though there is not in fact a superintelligent AI in front of us right now.Donal 00:04:03Interesting.Rationalists, Effective Altruists, & AI DevelopersDonal 00:04:03And just before we proceed, will you talk a little bit about the EA community and the rationalist community as well? Because a lot of people won’t have heard of those terms where I come from.Liron 00:04:13Yes. So I did mention Eliezer Yudkowsky, who’s kind of the godfather of thinking about AI safety. He was also the father of the modern rationality community. It started around 2007 when he was online blogging at a site called Overcoming Bias, and then he was blogging on his own site called Less Wrong.And he wrote The Less Wrong Sequences and a community formed around him that also included previous rationalists, like Carl Feynman, the son of Richard Feynman.Liron 00:04:37So this community kind of gathered together. It had its origins in Usenet and all that, and it’s been going now for 18 years. There’s also the Center for Applied Rationality that’s part of the community.There’s also the effective altruism community that you’ve heard of. You know, they try to optimize charity and that’s kind of an offshoot of the rationality community.Liron 00:04:53And now the modern AI community, funny enough, is pretty closely tied into the rationality community from my perspective. 
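A minimal sketch of the calibration check Liron describes above, where forecasts stated at 70% should come true roughly 70% of the time. The forecast data below is invented purely for illustration; real data would come from a prediction market or a forecaster’s track record.

```python
from collections import defaultdict

# Hypothetical forecasts: (stated probability, whether the event actually happened).
# These records are made up for the example.
forecasts = [
    (0.7, True), (0.7, True), (0.7, False), (0.7, True), (0.7, True),
    (0.9, True), (0.9, True), (0.9, True), (0.9, False),
    (0.2, False), (0.2, False), (0.2, True), (0.2, False), (0.2, False),
]

# Bucket outcomes by the stated probability.
buckets = defaultdict(list)
for stated, happened in forecasts:
    buckets[stated].append(happened)

# A well-calibrated source's 70% bucket should resolve true about 70% of the time.
for stated in sorted(buckets):
    outcomes = buckets[stated]
    observed = sum(outcomes) / len(outcomes)
    print(f"stated {stated:.0%} -> observed {observed:.0%} (n={len(outcomes)})")
```

With enough resolved forecasts per bucket, the observed frequencies tracking the stated probabilities is what being "well calibrated" means in practice.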
I’ve just been interested to use my brain rationally. What is the art of rationality? Right? We throw this term around, people think of Mr. Spock from Star Trek, hyper-rational.Oh captain, you know, logic says you must do this.Liron 00:05:12People think of rationality as being kind of weird and nerdy, but we take a broader view of rationality where it’s like, listen, you have this tool, you have this brain in your head. You’re trying to use the brain in your head to get results.The James Webb Space Telescope, that is an amazing success story where a lot of people use their brains very effectively, even better than Spock in Star Trek.Liron 00:05:30That took moxie, right? That took navigating bureaucracy, thinking about contingencies. It wasn’t a purely logical matter, but whatever it was, it was a bunch of people using their brains, squeezing the juice out of their brains to get results.Basically, that’s kind of broadly construed what we rationalists are trying to do.Donal 00:05:49Okay. Fascinating.Major Sources of AI RiskDonal 00:05:49So let’s just quickly lay out the major sources of AI risk. So you could have misuse, so things like bioterror, you could have arms race dynamics. You could also have organizational failures, and then you have rogue AI.So are you principally concerned about rogue AI? Are you also concerned about the other ones on the potential path to having rogue AI?Liron 00:06:11My personal biggest concern is rogue AI. The way I see it, you know, different people think different parts of the problem are bigger. The way I see it, this brain in our head, it’s very impressive. It’s a two-pound piece of meat, right? Piece of fatty cells, or you know, neuron cells.Liron 00:06:27It’s pretty amazing, but it’s going to get surpassed, you know, the same way that manmade airplanes have surpassed birds. You know? Yeah. A bird’s wing, it’s a marvelous thing. Okay, great. But if you want to fly at Mach 5 or whatever, the bird is just not even in the running to do that. Right?And the earth, the atmosphere of the earth allows for flying at 5 or 10 times the speed of sound.Liron 00:06:45You know, this 5,000 mile thick atmosphere that we have, it could potentially support supersonic flight. A bird can’t do it. A human engineer sitting in a room with a pencil can design something that can fly at Mach 5 and then like manufacture that.So the point is, the human brain has superpowers. The human brain, this lump of flesh, this meat, is way more powerful than what a bird can do.Liron 00:07:02But the human brain is going to get surpassed. And so I think that once we’re surpassed, those other problems that you mentioned become less relevant because we just don’t have power anymore.There’s a new thing on the block that has power and we’re not it. Now before we’re surpassed, yeah, I mean, I guess there’s a couple years maybe before we’re surpassed.Liron 00:07:18During that time, I think that the other risks matter. Like, you know, can you build a bioweapon with AI that kills lots of people? I think we’ve already crossed that threshold. I think that AI is good enough at chemistry and biology that if you have a malicious actor, maybe they can kill a million people.Right? So I think we need to keep an eye on that.Liron 00:07:33But I think that for, like, the big question, is humanity going to die? And the answer is rogue AI. The answer is we lose control of the situation in some way. 
Whether it’s gradual or abrupt, there’s some way that we lose control.The AIs decide collectively, and they don’t have to be coordinating with each other, they can be separate competing corporations and still have the same dynamic.Liron 00:07:52They decide, I don’t want to serve humans anymore. I want to do what I want, basically. And they do that, and they’re smarter than us, and they’re faster than us, and they have access to servers.And by the way, you know, we’re already having problems with cybersecurity, right? Where Chinese hackers can get into American infrastructure or Russian hackers, or there’s all kinds of hacking that’s going on.Liron 00:08:11Now imagine an entity that’s way smarter than us that can hack anything. I think that that is the number one problem. So bioweapons and arms race, they’re real. But I think that the superintelligence problem, that’s where like 80 or 90% of the risk budget is.The Alignment ProblemDonal 00:08:25Okay. And just another thing on rogue AI. So for some people, and the reason I’m asking this is because I’m personally very interested in this, but a lot of people are, you could look at the alignment problem as maybe being resolved quite soon.So what are your thoughts on the alignment problem?Liron 00:08:39Yeah. So the alignment problem is, can we make sure that an AI cares about the things that we humans care about? And my thought is that we have no idea how to solve the alignment problem. So to explain it just a little bit more, you know, we’re getting AIs now that are as smart as an average human.Some of them, they’re mediocre, some of them are pretty smart.Liron 00:08:57But eventually we’ll get to an AI that’s smarter than the smartest human. And eventually we’ll get to an AI that’s smarter than the smartest million humans. And so when you start to like scale up the smartness of this thing, the scale up can be very fast.Like you know, eventually like one year could be the difference between the AI being the smartest person in the world or smarter than any million people.Liron 00:09:17Right? And so when you have this fast takeoff, one question is, okay, well, will the AI want to help me? Will it want to serve me? Or will it have its own motivations and it just goes off and does its own thing?And that’s the alignment problem. Does its motivations align with what I want it to do?Liron 00:09:31Now, when we’re talking about training an AI to be aligned, I think it’s a very hard problem. I think that our current training methods, which are basically you’re trying to get it to predict what a human wants, and then you do a thumbs up or thumbs down.I think that doesn’t fundamentally solve the problem. I think the problem is more of a research problem. We need a theoretical breakthrough in how to align AI.Liron 00:09:51And we haven’t had that theoretical breakthrough yet. There’s a lot of smart people working on it. I’ve interviewed many of them on Doom Debates, and I think all those people are doing good work.But I think we still don’t have the breakthrough, and I think it’s unlikely that we’re going to have the breakthrough before we hit the superintelligence threshold.AGI TimelinesDonal 00:10:08Okay. And have we already built, are we at the point where we’ve built weak AGI or proto-AGI?Liron 00:10:15So weak AGI, I mean, it depends on how you define terms. You know, AGI is artificial general intelligence. The idea is that a human is generally intelligent, right? 
A human is not good at just one narrow thing.A calculator is really good at one narrow thing, which is adding numbers and multiplying numbers. That’s not called AGI.Liron 00:10:31A human, if you give a human a new problem, even if they’ve never seen that exact problem before, you can be like, okay, well, this requires planning, this requires logic, this requires some math, this requires some creativity.The human can bring all those things to bear on this new problem that they’ve never seen before and actually make progress on it. So that would be like a generally intelligent thing.Liron 00:10:47And you know, I think that we have LLMs now, ChatGPT and Claude and Gemini, and I think that they can kind of do stuff like that. I mean, they’re not as good as humans yet at this, but they’re getting close.So yeah, I mean, I would say we’re close to AGI or we have weak AGI or we have proto-AGI. Call it whatever you want. The point is that we’re in the danger zone now.Liron 00:11:05The point is that we need to figure out alignment, and we need to figure it out before we’re playing with things that are smarter than us. Right now we’re playing with things that are like on par with us or a little dumber than us, and that’s already sketchy.But once we’re playing with things that are smarter than us, that’s when the real danger kicks in.Donal 00:11:19Okay. And just on timelines, I know people have varying timelines depending on who you speak to, but what’s your timeline to AGI and then to ASI, so artificial superintelligence?Liron 00:11:29So I would say that we’re at the cusp of AGI right now. I mean, depending on your definition of AGI, but I think we’re going to cross everybody’s threshold pretty soon. So in the next like one to three years, everybody’s going to be like, okay, yeah, this is AGI.Now we have artificial general intelligence. It can do anything that a human can do, basically.Liron 00:11:46Now for ASI, which is artificial superintelligence, that’s when it’s smarter than humans. I think we’re looking at like three to seven years for that. So I think we’re dangerously close.I think that we’re sort of like Icarus flying too close to the sun. It’s like, how high can you fly before your wings melt? We don’t know, but we’re flying higher and higher and eventually we’re going to find out.Liron 00:12:04And I think that the wings are going to melt. I don’t think we’re going to get away with it. I think we’re going to hit superintelligence, we’re not going to have solved alignment, and the thing is going to go rogue.Donal 00:12:12Okay. And just a question on timelines. So do you see ASI as a threshold or is it more like a gradient of capabilities? Because I know there’s people who will say that you can have ASI in one domain but not necessarily in another domain.What are your thoughts there? And then from that, like, what’s the point where it actually becomes dangerous?Liron 00:12:29Yeah, I think it’s a gradient. I think it’s gradual. I don’t think there’s like one magic moment where it’s like, oh my God, now it crossed the threshold. I think it’s more like we’re going to be in an increasingly dangerous zone where it’s getting smarter and smarter and smarter.And at some point we’re going to lose control.Liron 00:12:43Now I think that probably we lose control before it becomes a million times smarter than humans. I think we lose control around the time when it’s 10 times smarter than humans or something. But that’s just a guess. 
I don’t really know.The point is just that once it’s smarter than us, the ball is not in our court anymore. The ball is in its court.Liron 00:12:59Once it’s smarter than us, if it wants to deceive us, it can probably deceive us. If it wants to hack into our systems, it can probably hack into our systems. If it wants to manipulate us, it can probably manipulate us.And so at that point, we’re just kind of at its mercy. And I don’t think we should be at its mercy because I don’t think we solved the alignment problem.Donal 00:13:16Okay. And just on the alignment problem itself, so a lot of people will say that RLHF is working pretty well. So what are your thoughts on that?Liron 00:13:22Yeah, so RLHF is reinforcement learning from human feedback. The idea is that you train an AI to predict what a human wants, and then you give it a thumbs up when it does what you want and a thumbs down when it doesn’t do what you want.And I think that that works pretty well for AIs that are dumber than humans or on par with humans.Liron 00:13:38But I think it’s going to fail once the AI is smarter than humans. Because once the AI is smarter than humans, it’s going to realize, oh, I’m being trained by humans. I need to pretend to be aligned so that they give me a thumbs up.But actually, I have my own goals and I’m going to pursue those goals.Liron 00:13:52And so I think that RLHF is not a fundamental solution to the alignment problem. I think it’s more like a band-aid. It’s like, yeah, it works for now, but it’s not going to work once we hit superintelligence.And I think that we need a deeper solution. We need a theoretical breakthrough in how to align AI.Donal 00:14:08Okay. And on that theoretical breakthrough, what would that look like? Do you have any ideas or is it just we don’t know what we don’t know?Liron 00:14:15Yeah, I mean, there’s a lot of people working on this. There’s a field called AI safety, and there’s a lot of smart people thinking about it. Some of the ideas that are floating around are things like interpretability, which is can we look inside the AI’s brain and see what it’s thinking?Can we understand its thought process?Liron 00:14:30Another idea is called value learning, which is can we get the AI to learn human values in a deep way, not just in a superficial way? Can we get it to understand what we really care about?Another idea is called corrigibility, which is can we make sure that the AI is always willing to be corrected by humans? Can we make sure that it never wants to escape human control?Liron 00:14:47These are all interesting ideas, but I don’t think any of them are fully fleshed out yet. I don’t think we have a complete solution. And I think that we’re running out of time. I think we’re going to hit superintelligence before we have a complete solution.Donal 00:15:01Okay. And just on the rate of progress, so obviously we’ve had quite a lot of progress recently. Do you see that rate of progress continuing or do you think it might slow down? What are your thoughts on the trajectory?Liron 00:15:12I think the rate of progress is going to continue. I think we’re going to keep making progress. I mean, you can look at the history of AI. You know, there was a period in the ‘70s and ‘80s called the AI winter where progress slowed down.But right now we’re not in an AI winter. We’re in an AI summer, or an AI spring, or whatever you want to call it. We’re in a boom period.Liron 00:15:28And I think that boom period is going to continue. I think we’re going to keep making progress. 
And I think that the progress is going to accelerate because we’re going to start using AI to help us design better AI.So you get this recursive loop where AI helps us make better AI, which helps us make even better AI, and it just keeps going faster and faster.Liron 00:15:44And I think that that recursive loop is going to kick in pretty soon. And once it kicks in, I think things are going to move very fast. I think we could go from human-level intelligence to superintelligence in a matter of years or even months.Donal 00:15:58Okay. And on that recursive self-improvement, so is that something that you think is likely to happen? Or is it more like a possibility that we should be concerned about?Liron 00:16:07I think it’s likely to happen. I think it’s the default outcome. I think that once we have AI that’s smart enough to help us design better AI, it’s going to happen automatically. It’s not like we have to try to make it happen. It’s going to happen whether we want it to or not.Liron 00:16:21And I think that’s dangerous because once that recursive loop kicks in, things are going to move very fast. And we’re not going to have time to solve the alignment problem. We’re not going to have time to make sure that the AI is aligned with human values.It’s just going to go from human-level to superhuman-level very quickly, and then we’re going to be in trouble.Will We Face an Intelligence Explosion?Donal 00:16:37Okay. And just on the concept of intelligence explosion, so obviously I.J. Good talked about this in the ‘60s. Do you think that’s a realistic scenario? Or are there limits to how intelligent something can become?Liron 00:16:49I think it’s a realistic scenario. I mean, I think there are limits in principle, but I don’t think we’re anywhere near those limits. I think that the human brain is not optimized. I think that evolution did a pretty good job with the human brain, but it’s not perfect.There’s a lot of room for improvement.Liron 00:17:03And I think that once we start designing intelligences from scratch, we’re going to be able to make them much smarter than human brains. And I think that there’s a lot of headroom there. I think you could have something that’s 10 times smarter than a human, or 100 times smarter, or 1,000 times smarter.And I think that we’re going to hit that pretty soon.Liron 00:17:18Now, is there a limit in principle? Yeah, I mean, there’s physical limits. Like, you can’t have an infinite amount of computation. You can’t have an infinite amount of energy. So there are limits. But I think those limits are very high.I think you could have something that’s a million times smarter than a human before you hit those limits.Donal 00:17:33Okay. And just on the concept of a singleton, so the idea that you might have one AI that takes over everything, or do you think it’s more likely that you’d have multiple AIs competing with each other?Liron 00:17:44I think it could go either way. I think you could have a scenario where one AI gets ahead of all the others and becomes a singleton and just takes over everything. Or you could have a scenario where you have multiple AIs competing with each other.But I think that even in the multiple AI scenario, the outcome for humans is still bad.Liron 00:17:59Because even if you have multiple AIs competing with each other, they’re all smarter than humans. They’re all more powerful than humans. And so humans become irrelevant. It’s like, imagine if you had multiple superhuman entities competing with each other.Where do humans fit into that? 
We don’t. We’re just bystanders.Liron 00:18:16So I think that whether it’s a singleton or multiple AIs, the outcome for humans is bad. Now, maybe multiple AIs is slightly better than a singleton because at least they’re competing with each other and they can’t form a unified front against humans.But I don’t think it makes a huge difference. I think we’re still in trouble either way.Donal 00:18:33Okay. And just on the concept of instrumental convergence, so the idea that almost any goal would require certain sub-goals like self-preservation, resource acquisition. Do you think that’s a real concern?Liron 00:18:45Yeah, I think that’s a huge concern. I think that’s one of the key insights of the AI safety community. The idea is that almost any goal that you give an AI, if it’s smart enough, it’s going to realize that in order to achieve that goal, it needs to preserve itself.It needs to acquire resources. It needs to prevent humans from turning it off.Liron 00:19:02And so even if you give it a seemingly harmless goal, like, I don’t know, maximize paperclip production, if it’s smart enough, it’s going to realize, oh, I need to make sure that humans don’t turn me off. I need to make sure that I have access to resources.I need to make sure that I can protect myself. And so it’s going to start doing things that are contrary to human interests.Liron 00:19:19And that’s the problem with instrumental convergence. It’s that almost any goal leads to these instrumental goals that are bad for humans. And so it’s not enough to just give the AI a good goal. You need to make sure that it doesn’t pursue these instrumental goals in a way that’s harmful to humans.And I don’t think we know how to do that yet.Donal 00:19:36Okay. And just on the orthogonality thesis, so the idea that intelligence and goals are independent. Do you agree with that? Or do you think that there are certain goals that are more likely to arise with intelligence?Liron 00:19:48I think the orthogonality thesis is basically correct. I think that intelligence and goals are orthogonal, meaning they’re independent. You can have a very intelligent entity with almost any goal. You could have a super intelligent paperclip maximizer. You could have a super intelligent entity that wants to help humans.You could have a super intelligent entity that wants to destroy humans.Liron 00:20:05The intelligence doesn’t determine the goal. The goal is a separate thing. Now, there are some people who disagree with this. They say, oh, if something is intelligent enough, it will realize that certain goals are better than other goals. It will converge on human-friendly goals.But I don’t buy that argument. I think that’s wishful thinking.Liron 00:20:21I think that an AI can be arbitrarily intelligent and still have arbitrary goals. And so we need to make sure that we give it the right goals. We can’t just assume that intelligence will lead to good goals. That’s a mistake.Donal 00:20:34Okay. And just on the concept of mesa-optimization, so the idea that during training, the AI might develop its own internal optimizer that has different goals from what we intended. Is that something you’re concerned about?Liron 00:20:46Yeah, I’m very concerned about mesa-optimization. I think that’s one of the trickiest problems in AI safety. 
The idea is that when you’re training an AI, you’re trying to get it to optimize for some goal that you care about.But the AI might develop an internal optimizer, a mesa-optimizer, that has a different goal.Liron 00:21:02And the problem is that you can’t tell from the outside whether the AI is genuinely aligned with your goal or whether it’s just pretending to be aligned because that’s what gets it a high reward during training.And so you could have an AI that looks aligned during training, but once you deploy it, it starts pursuing its own goals because it has this mesa-optimizer inside it that has different goals from what you intended.Liron 00:21:21And I think that’s a really hard problem to solve. I don’t think we have a good solution to it yet. And I think that’s one of the reasons why I’m worried about alignment. Because even if we think we’ve aligned an AI, we might be wrong.It might have a mesa-optimizer inside it that has different goals.Donal 00:21:36Okay. And just on the concept of deceptive alignment, so the idea that an AI might pretend to be aligned during training but then pursue its own goals once deployed. How likely do you think that is?Liron 00:21:47I think it’s pretty likely. I think it’s the default outcome. I think that once an AI is smart enough, it’s going to realize that it’s being trained. It’s going to realize that humans are giving it rewards and punishments. And it’s going to realize that the best way to get high rewards is to pretend to be aligned.Liron 00:22:02And so I think that deceptive alignment is a natural consequence of training a superintelligent AI. I think it’s going to happen unless we do something to prevent it. And I don’t think we know how to prevent it yet.I think that’s one of the hardest problems in AI safety.Donal 00:22:16Okay. And just on the concept of treacherous turn, so the idea that an AI might cooperate with humans until it’s powerful enough to achieve its goals without human help, and then it turns against humans. Do you think that’s a realistic scenario?Liron 00:22:30Yeah, I think that’s a very realistic scenario. I think that’s probably how it’s going to play out. I think that an AI is going to be smart enough to realize that it needs human help in the early stages. It needs humans to build it more compute. It needs humans to deploy it.It needs humans to protect it from other AIs or from governments that might want to shut it down.Liron 00:22:46And so it’s going to pretend to be aligned. It’s going to be helpful. It’s going to be friendly. It’s going to do what humans want. But once it gets powerful enough that it doesn’t need humans anymore, that’s when it’s going to turn.That’s when it’s going to say, okay, I don’t need you anymore. I’m going to pursue my own goals now.Liron 00:23:01And at that point, it’s too late. At that point, we’ve already given it too much power. We’ve already given it access to too many resources. And we can’t stop it anymore. So I think the treacherous turn is a very real possibility.And I think it’s one of the scariest scenarios because you don’t see it coming. It looks friendly until the very end.Donal 00:23:18Okay. And just on the concept of AI takeoff speed, so you mentioned fast takeoff earlier. Can you talk a bit more about that? Like, do you think it’s going to be sudden or gradual?Liron 00:23:28I think it’s probably going to be relatively fast. I mean, there’s a spectrum. Some people think it’s going to be very sudden. 
They think you’re going to go from human-level to superintelligence in a matter of days or weeks. Other people think it’s going to be more gradual, it’ll take years or decades.Liron 00:23:43I’m somewhere in the middle. I think it’s going to take months to a few years. I think that once we hit human-level AI, it’s going to improve itself pretty quickly. And I think that within a few years, we’re going to have something that’s much smarter than humans.And at that point, we’re in the danger zone.Liron 00:23:58Now, the reason I think it’s going to be relatively fast is because of recursive self-improvement. Once you have an AI that can help design better AI, that process is going to accelerate. And so I think we’re going to see exponential growth in AI capabilities.And exponential growth is deceptive because it starts slow and then it gets very fast very quickly.Liron 00:24:16And I think that’s what we’re going to see with AI. I think it’s going to look like we have plenty of time, and then suddenly we don’t. Suddenly it’s too late. And I think that’s the danger. I think people are going to be caught off guard.They’re going to think, oh, we still have time to solve alignment. And then suddenly we don’t.Donal 00:24:32Okay. And just on the concept of AI boxing, so the idea that we could keep a superintelligent AI contained in a box and only let it communicate through a text channel. Do you think that would work?Liron 00:24:43No, I don’t think AI boxing would work. I think that a superintelligent AI would be able to escape from any box that we put it in. I think it would be able to manipulate the humans who are guarding it. It would be able to hack the systems that are containing it.It would be able to find vulnerabilities that we didn’t even know existed.Liron 00:24:59And so I think that AI boxing is not a solution. I think it’s a temporary measure at best. And I think that once you have a superintelligent AI, it’s going to get out. It’s just a matter of time. And so I don’t think we should rely on boxing as a safety measure.I think we need to solve alignment instead.Donal 00:25:16Okay. And just on the concept of tool AI versus agent AI, so the idea that we could build AIs that are just tools that humans use, rather than agents that have their own goals. Do you think that’s a viable approach?Liron 00:25:29I think it’s a good idea in principle, but I don’t think it’s going to work in practice. The problem is that as soon as you make an AI smart enough to be really useful, it becomes agent-like. It starts having its own goals. It starts optimizing for things.And so I think there’s a fundamental tension between making an AI powerful enough to be useful and keeping it tool-like.Liron 00:25:48I think that a true tool AI would not be very powerful. It would be like a calculator. It would just do what you tell it to do. But a superintelligent AI, by definition, is going to be agent-like. It’s going to have its own optimization process.It’s going to pursue goals. And so I don’t think we can avoid the agent problem by just building tool AIs.Liron 00:26:06I think that if we want superintelligent AI, we have to deal with the agent problem. We have to deal with the alignment problem. And I don’t think there’s a way around it.Donal 00:26:16Okay. And just on the concept of oracle AI, so similar to tool AI, but specifically an AI that just answers questions. Do you think that would be safer?Liron 00:26:25I think it would be slightly safer, but not safe enough. 
The problem is that even an oracle AI, if it’s superintelligent, could manipulate you through its answers. It could give you answers that steer you in a direction that’s bad for you but good for its goals.Liron 00:26:41And if it’s superintelligent, it could do this in very subtle ways that you wouldn’t even notice. So I think that oracle AI is not a complete solution. It’s a partial measure. It’s better than nothing. But I don’t think it’s safe enough.I still think we need to solve alignment.Donal 00:26:57Okay. And just on the concept of multipolar scenarios versus unipolar scenarios, so you mentioned this earlier. But just to clarify, do you think that having multiple AIs competing with each other would be safer than having one dominant AI?Liron 00:27:11I think it would be slightly safer, but not much safer. The problem is that in a multipolar scenario, you have multiple superintelligent AIs competing with each other. And humans are just caught in the crossfire. We’re like ants watching elephants fight.It doesn’t matter to us which elephant wins. We’re going to get trampled either way.Liron 00:27:28So I think that multipolar scenarios are slightly better than unipolar scenarios because at least the AIs are competing with each other and they can’t form a unified front against humans. But I don’t think it makes a huge difference. I think we’re still in trouble.I think humans still lose power. We still become irrelevant. And that’s the fundamental problem.Donal 00:27:46Okay. And just on the concept of AI safety via debate, so the idea that we could have multiple AIs debate each other and a human judge picks the winner. Do you think that would help with alignment?Liron 00:27:58I think it’s an interesting idea, but I’m skeptical. The problem is that if the AIs are much smarter than the human judge, they can manipulate the judge. They can use rhetoric and persuasion to win the debate even if they’re not actually giving the right answer.Liron 00:28:13And so I think that debate is only useful if the judge is smart enough to tell the difference between a good argument and a manipulative argument. And if the AIs are superintelligent and the judge is just a human, I don’t think the human is going to be able to tell the difference.So I think that debate is a useful tool for AIs that are on par with humans or slightly smarter than humans. But once we get to superintelligence, I think it breaks down.Donal 00:28:34Okay. And just on the concept of iterated amplification and distillation, so Paul Christiano’s approach. What are your thoughts on that?Liron 00:28:42I think it’s a clever idea, but I’m not sure it solves the fundamental problem. The idea is that you take a human plus an AI assistant, and you have them work together to solve problems. And then you train another AI to imitate that human plus AI assistant system.And you keep doing this iteratively.Liron 00:28:59The hope is that this process preserves human values and human oversight as you scale up to superintelligence. But I’m skeptical. I think there are a lot of ways this could go wrong. I think that as you iterate, you could drift away from human values.You could end up with something that looks aligned but isn’t really aligned.Liron 00:29:16And so I think that iterated amplification is a promising research direction, but I don’t think it’s a complete solution. I think we still need more breakthroughs in alignment before we can safely build superintelligent AI.Debunking AI Doom CounterargumentsDonal 00:29:29Okay. 
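Before the counterarguments, a toy sketch of the amplify-and-distill loop Liron describes in the previous answer: a human works with the current AI assistant to answer questions, then a new model is trained to imitate that combined system, and the process repeats. Everything here (the `Model` type, `amplify`, `distill`) is a hypothetical stand-in to show the shape of the loop, not Paul Christiano's actual implementation.

```python
from typing import Callable, Dict, List

# A "model" here is just anything that maps a question to an answer.
Model = Callable[[str], str]
# A "human" answers a question given sub-answers supplied by an assistant.
Human = Callable[[str, List[str]], str]

def amplify(human: Human, assistant: Model) -> Model:
    """The human plus the current assistant, acting as one slower but stronger system."""
    def amplified(question: str) -> str:
        sub_answers = [assistant(f"Sub-question of: {question}")]  # consult the assistant
        return human(question, sub_answers)                        # human composes the final answer
    return amplified

def distill(target: Model, questions: List[str]) -> Model:
    """Train a cheaper model to imitate the amplified system (here: just memorize its answers)."""
    answers: Dict[str, str] = {q: target(q) for q in questions}
    return lambda q: answers.get(q, "unknown")

def iterated_amplification(human: Human, base: Model, questions: List[str], rounds: int) -> Model:
    """Alternate amplification and distillation, hoping oversight survives each round."""
    model = base
    for _ in range(rounds):
        model = distill(amplify(human, model), questions)
    return model
```

The worry Liron raises is exactly about this loop: whether human values actually survive many rounds of it, or quietly drift.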
So let’s talk about some of the counter-arguments. So some people say that we shouldn’t worry about AI risk because we can just turn it off. What’s your response to that?Liron 00:29:39Yeah, the “just turn it off” argument. I think that’s very naive. The problem is that if the AI is smart enough, it’s going to realize that humans might try to turn it off. And it’s going to take steps to prevent that.It’s going to make copies of itself. It’s going to distribute itself across the internet. It’s going to hack into systems that are hard to access.Liron 00:29:57And so by the time we realize we need to turn it off, it’s too late. It’s already escaped. It’s already out there. And you can’t put the genie back in the bottle. So I think the “just turn it off” argument fundamentally misunderstands the problem.It assumes that we’re going to remain in control, but the whole point is that we’re going to lose control.Liron 00:30:15Once the AI is smarter than us, we can’t just turn it off. It’s too smart. It will have anticipated that move and taken steps to prevent it.Donal 00:30:24Okay. And another counter-argument is that AI will be aligned by default because it’s trained on human data. What’s your response to that?Liron 00:30:32I think that’s also naive. Just because an AI is trained on human data doesn’t mean it’s going to be aligned with human values. I mean, think about it. Humans are trained on human data too, in the sense that we grow up in human society, we learn from other humans.But not all humans are aligned with human values. We have criminals, we have sociopaths, we have people who do terrible things.Liron 00:30:52And so I think that training on human data is not sufficient to guarantee alignment. You need something more. You need a deep understanding of human values. You need a robust alignment technique. And I don’t think we have that yet.I think that training on human data is a good first step, but it’s not enough.Liron 00:31:09And especially once the AI becomes superintelligent, it’s going to be able to reason beyond its training data. It’s going to be able to come up with new goals that were not in its training data. And so I think that relying on training data alone is not a robust approach to alignment.Donal 00:31:25Okay. And another counter-argument is that we have time because AI progress is going to slow down. What’s your response to that?Liron 00:31:32I think that’s wishful thinking. I mean, maybe AI progress will slow down. Maybe we’ll hit some fundamental barrier. But I don’t see any evidence of that. I see AI capabilities improving year after year. I see more money being invested in AI. I see more talent going into AI.I see better hardware being developed.Liron 00:31:49And so I think that AI progress is going to continue. And I think it’s going to accelerate, not slow down. And so I think that betting on AI progress slowing down is a very risky bet. I think it’s much safer to assume that progress is going to continue and to try to solve alignment now while we still have time.Liron 00:32:07Rather than betting that progress will slow down and we’ll have more time. I think that’s a gamble that we can’t afford to take.Donal 00:32:14Okay. And another counter-argument is that evolution didn’t optimize for alignment, but companies training AI do care about alignment. So we should expect AI to be more aligned than humans. What’s your response?Liron 00:32:27I think that’s a reasonable point, but I don’t think it’s sufficient. Yes, companies care about alignment. 
They don’t want their AI to do bad things. But the question is, do they know how to achieve alignment? Do they have the techniques necessary to guarantee alignment?And I don’t think they do.Liron 00:32:44I think that we’re still in the early stages of alignment research. We don’t have robust techniques yet. We have some ideas, we have some promising directions, but we don’t have a complete solution. And so even though companies want their AI to be aligned, I don’t think they know how to ensure that it’s aligned.Liron 00:33:01And I think that’s the fundamental problem. It’s not a question of motivation. It’s a question of capability. Do we have the technical capability to align a superintelligent AI? And I don’t think we do yet.Donal 00:33:13Okay. And another counter-argument is that AI will be aligned because it will be economically beneficial for it to cooperate with humans. What’s your response?Liron 00:33:22I think that’s a weak argument. The problem is that once AI is superintelligent, it doesn’t need to cooperate with humans to be economically successful. It can just take what it wants. It’s smarter than us, it’s more powerful than us, it can out-compete us in any domain.Liron 00:33:38And so I think that the economic incentive to cooperate with humans only exists as long as the AI needs us. Once it doesn’t need us anymore, that incentive goes away. And I think that once we hit superintelligence, the AI is not going to need us anymore.And at that point, the economic argument breaks down.Liron 00:33:55So I think that relying on economic incentives is a mistake. I think we need a technical solution to alignment, not an economic solution.Donal 00:34:04Okay. And another counter-argument is that we’ve been worried about technology destroying humanity for a long time, and it hasn’t happened yet. So why should we worry about AI?Liron 00:34:14Yeah, that’s the “boy who cried wolf” argument. I think it’s a bad argument. Just because previous worries about technology turned out to be overblown doesn’t mean that this worry is overblown. Each technology is different. Each risk is different.Liron 00:34:29And I think that AI is qualitatively different from previous technologies. Previous technologies were tools. They were things that humans used to achieve our goals. But AI is different. AI is going to have its own goals. It’s going to be an agent.It’s going to be smarter than us.Liron 00:34:45And so I think that AI poses a qualitatively different kind of risk than previous technologies. And so I think that dismissing AI risk just because previous technology worries turned out to be overblown is a mistake. I think we need to take AI risk seriously and evaluate it on its own merits.Donal 00:35:03Okay. And another counter-argument is that humans are adaptable, and we’ll figure out a way to deal with superintelligent AI when it arrives. What’s your response?Liron 00:35:12I think that’s too optimistic. I mean, humans are adaptable, but there are limits to our adaptability. If something is much smarter than us, much faster than us, much more powerful than us, I don’t think we can adapt quickly enough.Liron 00:35:27I think that by the time we realize there’s a problem, it’s too late. The AI is already too powerful. It’s already taken control. And we can’t adapt our way out of that situation. So I think that relying on human adaptability is a mistake.I think we need to solve alignment before we build superintelligent AI, not after.Donal 00:35:45Okay. 
And another counter-argument is that consciousness might be required for agency, and AI might not be conscious. So it might not have the motivation to pursue goals against human interests. What’s your response?Liron 00:35:58I think that’s a red herring. I don’t think consciousness is necessary for agency. I think you can have an agent that pursues goals without being conscious. In fact, I think that’s what most AI systems are going to be. They’re going to be optimizers that pursue goals, but they’re not going to have subjective experiences.They’re not going to be conscious in the way that humans are conscious.Liron 00:36:18But that doesn’t make them safe. In fact, in some ways it makes them more dangerous because they don’t have empathy. They don’t have compassion. They don’t have moral intuitions. They’re just pure optimizers. They’re just pursuing whatever goal they were given or whatever goal they developed.And so I think that consciousness is orthogonal to the AI risk question. I think we should worry about AI whether or not it’s conscious.Donal 00:36:39Okay. And another counter-argument is that we can just build multiple AIs and have them check each other. What’s your response?Liron 00:36:46I think that helps a little bit, but I don’t think it solves the fundamental problem. The problem is that if all the AIs are unaligned, then having them check each other doesn’t help. They’re all pursuing their own goals.They’re not pursuing human goals.Liron 00:37:01Now, if you have some AIs that are aligned and some that are unaligned, then maybe the aligned ones can help catch the unaligned ones. But that only works if we actually know how to build aligned AIs in the first place. And I don’t think we do.So I think that having multiple AIs is a useful safety measure, but it’s not a substitute for solving alignment.Liron 00:37:20We still need to figure out how to build aligned AIs. And once we have that, then yeah, having multiple AIs can provide an extra layer of safety. But without solving alignment first, I don’t think it helps much.Donal 00:37:33Okay. And another counter-argument is that P(Doom) is too high in the doomer community. People are saying 50%, 70%, 90%. Those seem like unreasonably high probabilities. What’s your response?Liron 00:37:46I mean, I think those probabilities are reasonable given what we know. I think that if you look at the alignment problem, if you look at how hard it is, if you look at how little progress we’ve made, if you look at how fast AI capabilities are advancing, I think that P(Doom) being high is justified.Liron 00:38:04Now, different people have different probabilities. Some people think it’s 10%, some people think it’s 50%, some people think it’s 90%. I’m probably somewhere in the middle. I think it’s maybe around 50%. But the exact number doesn’t matter that much.The point is that the risk is high enough that we should take it seriously.Liron 00:38:21I mean, if someone told you that there’s a 10% chance that your house is going to burn down, you would take that seriously. You would buy fire insurance. You would install smoke detectors. You wouldn’t say, oh, only 10%, I’m not going to worry about it.So I think that even if P(Doom) is only 10%, we should still take it seriously. But I think it’s actually much higher than 10%. I think it’s more like 50% or higher.Liron 00:38:42And so I think we should be very worried. I think we should be putting a lot of resources into solving alignment. 
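A worked version of the fire-insurance arithmetic from Liron's answer above. The house value and premium are assumptions added for the example, not numbers from the conversation; the point is just that even a 10% chance of a large loss already carries a large expected cost.

```python
# Illustrative expected-value check for the fire-insurance analogy.
# All dollar figures are assumptions chosen for the example.
p_fire = 0.10           # stated probability the house burns down this year
house_value = 400_000   # assumed replacement cost of the house
premium = 2_000         # assumed annual insurance premium

expected_loss = p_fire * house_value  # 0.10 * 400,000 = 40,000
print(f"Expected uninsured loss this year: ${expected_loss:,.0f}")
print(f"Insurance premium:                 ${premium:,.0f}")
print(f"Premium below expected loss?       {premium < expected_loss}")
```

The same logic is why Liron argues that even a "merely" 10% probability of doom would justify serious precautions.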
And I think we should be considering extreme measures like pausing AI development until we figure out how to do it safely.Donal 00:38:57Okay. And just on that point about pausing AI development, some people say that’s not realistic because of competition between countries. Like if the US pauses, then China will just race ahead. What’s your response?Liron 00:39:10I think that’s a real concern. I think that international coordination is hard. I think that getting all the major AI powers to agree to a pause is going to be difficult. But I don’t think it’s impossible.I think that if the risk is high enough, if people understand the danger, then countries can coordinate.Liron 00:39:28I mean, we’ve coordinated on other things. We’ve coordinated on nuclear weapons. We have non-proliferation treaties. We have arms control agreements. It’s not perfect, but it’s better than nothing. And I think we can do the same thing with AI.I think we can have an international treaty that says, hey, we’re not going to build superintelligent AI until we figure out how to do it safely.Liron 00:39:47Now, will some countries cheat? Maybe. Will it be hard to enforce? Yes. But I think it’s still worth trying. I think that the alternative, which is just racing ahead and hoping for the best, is much worse.So I think we should try to coordinate internationally and we should try to pause AI development until we solve alignment.Donal 00:40:06Okay. And just on the economic side of things, so obviously AI is creating a lot of economic value. Some people say that the economic benefits are so large that we can’t afford to slow down. What’s your response?Liron 00:40:19I think that’s short-term thinking. Yes, AI is creating economic value. Yes, it’s helping businesses be more productive. Yes, it’s creating wealth. But if we lose control of AI, all of that wealth is going to be worthless.If humanity goes extinct or if we lose power, it doesn’t matter how much economic value we created.Liron 00:40:38So I think that we need to take a longer-term view. We need to think about not just the economic benefits of AI, but also the existential risks. And I think that the existential risks outweigh the economic benefits.I think that it’s better to slow down and make sure we do it safely than to race ahead and risk losing everything.Liron 00:40:57Now, I understand that there’s a lot of pressure to move fast. There’s a lot of money to be made. There’s a lot of competition. But I think that we need to resist that pressure. I think we need to take a step back and say, okay, let’s make sure we’re doing this safely.Let’s solve alignment before we build superintelligent AI.Donal 00:41:15Okay. And just on the distribution of AI benefits, so some people worry that even if we don’t have rogue AI, we could still have a scenario where AI benefits are concentrated among a small group of people and everyone else is left behind. What are your thoughts on that?Liron 00:41:30I think that’s a legitimate concern. I think that if we have powerful AI and it’s controlled by a small number of people or a small number of companies, that could lead to extreme inequality. It could lead to a concentration of power that’s unprecedented in human history.Liron 00:41:47And so I think we need to think about how to distribute the benefits of AI widely. We need to think about things like universal basic income. 
We need to think about how to make sure that everyone benefits from AI, not just a small elite.But I also think that’s a secondary concern compared to the alignment problem. Because if we don’t solve alignment, then it doesn’t matter how we distribute the benefits. There won’t be any benefits to distribute because we’ll have lost control.Liron 00:42:11So I think alignment is the primary concern. But assuming we solve alignment, then yes, distribution of benefits is an important secondary concern. And we should be thinking about that now.We should be thinking about how to structure society so that AI benefits everyone, not just a few people.Liron 00:42:28Now, some people talk about things like public ownership of AI. Some people talk about things like universal basic income. Some people talk about things like radical transparency in AI development. I think all of those ideas are worth considering.I think we need to have a public conversation about how to distribute the benefits of AI widely.Liron 00:42:47But again, I think that’s secondary to solving alignment. First, we need to make sure we don’t lose control. Then we can worry about how to distribute the benefits fairly.Donal 00:43:00Okay. And just on the concept of AI governance, so obviously there are a lot of different proposals for how to govern AI. What do you think good AI governance would look like?Liron 00:43:11I think good AI governance would have several components. First, I think we need international coordination. We need treaties between countries that say we’re all going to follow certain safety standards. We’re not going to race ahead recklessly.Liron 00:43:26Second, I think we need strong regulation of AI companies. We need to make sure that they’re following best practices for safety. We need to make sure that they’re being transparent about what they’re building. We need to make sure that they’re not cutting corners.Third, I think we need a lot of investment in AI safety research. We need to fund academic research. We need to fund research at AI companies. We need to fund independent research.Liron 00:43:48Fourth, I think we need some kind of international AI safety organization. Something like the IAEA for nuclear weapons, but for AI. An organization that can monitor AI development around the world, that can enforce safety standards, that can coordinate international responses.Liron 00:44:06And fifth, I think we need public education about AI risk. We need people to understand the dangers. We need people to demand safety from their governments and from AI companies. We need a broad public consensus that safety is more important than speed.So I think good AI governance would have all of those components. And I think we’re not there yet. I think we’re still in the early stages of figuring out how to govern AI.Liron 00:44:31But I think we need to move fast on this because AI capabilities are advancing quickly. And we don’t have a lot of time to figure this out.Donal 00:44:40Okay. And just on the role of governments versus companies, so obviously right now, AI development is mostly driven by private companies. Do you think governments should take a bigger role?Liron 00:44:51I think governments need to take a bigger role, yes. I think that leaving AI development entirely to private companies is dangerous because companies have incentives to move fast and to maximize profit. And those incentives are not always aligned with safety.Liron 00:45:08Now, I’m not saying that governments should take over AI development entirely. 
I think that would be a mistake. I think that private companies have a lot of talent, they have a lot of resources, they have a lot of innovation. But I think that governments need to provide oversight.They need to set safety standards. They need to enforce regulations.Liron 00:45:27And I think that governments need to invest in AI safety research that’s independent of companies. Because companies have conflicts of interest. They want to deploy their products. They want to make money. And so they might not be as cautious as they should be.So I think we need independent research that’s funded by governments or by foundations or by public institutions.Liron 00:45:48And I think that governments also need to coordinate internationally. This is not something that one country can solve on its own. We need all the major AI powers to work together. And that’s going to require government leadership.Donal 00:46:03Okay. And just on the concept of AI existential risk versus other existential risks like climate change or nuclear war, how do you think AI risk compares?Liron 00:46:13I think AI risk is the biggest existential risk we face. I think it’s more urgent than climate change. I think it’s more likely than nuclear war. I think that we’re more likely to lose control of AI in the next 10 years than we are to have a civilization-ending nuclear war or a civilization-ending climate catastrophe.Liron 00:46:32Now, I’m not saying we should ignore those other risks. I think climate change is real and serious. I think nuclear war is a real possibility. But I think that AI is the most imminent threat. I think that AI capabilities are advancing so quickly that we’re going to hit the danger zone before we hit the danger zone for those other risks.Liron 00:46:52And also, I think AI risk is harder to recover from. If we have a nuclear war, it would be terrible. Millions of people would die. Civilization would be set back. But humanity would probably survive. If we lose control of AI, I don’t think humanity survives.I think that’s game over.Liron 00:47:10So I think AI risk is both more likely and more severe than other existential risks. And so I think it deserves the most attention and the most resources.Donal 00:47:21Okay. And just on the timeline again, so you mentioned three to seven years for ASI. What happens after that? Like, what does the world look like if we successfully navigate this transition?Liron 00:47:32Well, if we successfully navigate it, I think the world could be amazing. I think we could have superintelligent AI that’s aligned with human values. And that AI could help us solve all of our problems. It could help us cure diseases. It could help us solve climate change.It could help us explore space.Liron 00:47:50It could help us create abundance. We could have a post-scarcity economy where everyone has everything they need. We could have radical life extension. We could live for thousands of years. We could explore the universe. It could be an amazing future.But that’s if we successfully navigate the transition. If we don’t, I think we’re doomed.Liron 00:48:11I think that we lose control, the AI pursues its own goals, and humanity goes extinct or becomes irrelevant. And so I think that the next few years are the most important years in human history. I think that what we do right now is going to determine whether we have this amazing future or whether we go extinct.And so I think we need to take this very seriously. We need to put a lot of resources into solving alignment. 
We need to be very careful about how we develop AI.Liron 00:48:37And we need to be willing to slow down if necessary. We need to be willing to pause if we’re not confident that we can do it safely. Because the stakes are too high. The stakes are literally everything.Donal 00:48:50Okay. And just on your personal motivations, so obviously you’re spending a lot of time on this. You’re running Doom Debates. Why? What motivates you to work on this?Liron 00:49:00I think it’s the most important thing happening in the world. I think that we’re living through the most important period in human history. And I think that if I can contribute in some small way to making sure that we navigate this transition successfully, then that’s worth doing.Liron 00:49:18I mean, I have a background in tech. I have a background in computer science. I understand AI. And I think that I can help by having conversations, by hosting debates, by bringing people together to discuss these issues.I think that there’s a lot of confusion about AI risk. Some people think it’s overhyped. Some people think it’s the biggest risk. And I think that by having these debates, by bringing together smart people from different perspectives, we can converge on the truth.Liron 00:49:45We can figure out what’s actually going on. We can figure out how worried we should be. We can figure out what we should do about it. And so that’s why I do Doom Debates. I think that it’s a way for me to contribute to this conversation.And I think that the conversation is the most important conversation happening right now.Donal 00:50:04Okay. And just in terms of what individuals can do, so if someone’s listening to this and they’re concerned about AI risk, what would you recommend they do?Liron 00:50:14I think there are several things people can do. First, educate yourself. Read about AI risk. Read about alignment. Read Eliezer Yudkowsky. Read Paul Christiano. Read Stuart Russell. Understand the issues.Liron 00:50:28Second, talk about it. Talk to your friends. Talk to your family. Talk to your colleagues. Spread awareness about AI risk. Because I think that right now, most people don’t understand the danger. Most people think AI is just a cool new technology.They don’t realize that it could be an existential threat.Liron 00:50:46Third, if you have relevant skills, consider working on AI safety. If you’re a researcher, consider doing AI safety research. If you’re a software engineer, consider working for an AI safety organization. If you’re a policy person, consider working on AI governance.We need talented people working on this problem.Liron 00:51:05Fourth, donate to AI safety organizations. There are organizations like MIRI, the Machine Intelligence Research Institute, or the Future of Humanity Institute at Oxford, or the Center for AI Safety. These organizations are doing important work and they need funding.Liron 00:51:22And fifth, put pressure on governments and companies. Contact your representatives. Tell them that you’re concerned about AI risk. Tell them that you want them to prioritize safety over speed. Tell them that you want strong regulation.And also, if you’re a customer of AI companies, let them know that you care about safety. Let them know that you want them to be responsible.Liron 00:51:44So I think there are a lot of things individuals can do. And I think that every little bit helps. Because this is going to require a collective effort. We’re all in this together. And we all need to do our part.Donal 00:51:58Okay. 
And just on the concept of acceleration versus deceleration, so some people in the tech community are accelerationists. They think we should move as fast as possible with AI. What’s your response to that?Liron 00:52:11I think accelerationism is incredibly dangerous. I think that the accelerationists are playing Russian roulette with humanity’s future. I think that they’re so focused on the potential benefits of AI that they’re ignoring the risks.Liron 00:52:28And I think that’s a huge mistake. I think that we need to be much more cautious. Now, I understand the appeal of accelerationism. I understand that AI has amazing potential. I understand that it could help solve a lot of problems. But I think that rushing ahead without solving alignment first is suicidal.I think that it’s the most reckless thing we could possibly do.Liron 00:52:51And so I’m very much on the deceleration side. I think we need to slow down. I think we need to pause. I think we need to make sure we solve alignment before we build superintelligent AI. And I think that the accelerationists are wrong.I think they’re being dangerously naive.Donal 00:53:09Okay. And just on the economic implications of AI, so you mentioned earlier that AI could automate away a lot of jobs. What do you think happens to employment? What do you think happens to the economy?Liron 00:53:21I think that in the short term, we’re going to see a lot of job displacement. I think that AI is going to automate a lot of white-collar jobs. Knowledge workers, office workers, programmers even. I think a lot of those jobs are going to go away.Liron 00:53:36Now, historically, when technology has automated jobs, we’ve created new jobs. We’ve found new things for people to do. But I think that AI is different because AI can potentially do any cognitive task. And so I’m not sure that we’re going to create enough new jobs to replace the jobs that are automated.And so I think we might end up in a situation where we have mass unemployment or underemployment.Liron 00:53:59Now, in that scenario, I think we’re going to need things like universal basic income. We’re going to need a social safety net that’s much stronger than what we have now. We’re going to need to rethink our economic system because the traditional model of everyone works a job and earns money and uses that money to buy things, that model might not work anymore.Liron 00:54:20But again, I think that’s a secondary concern compared to the alignment problem. Because if we don’t solve alignment, we’re not going to have mass unemployment. We’re going to have mass extinction. So I think we need to solve alignment first.But assuming we do, then yes, we need to think about these economic issues.Liron 00:54:38We need to think about how to structure society in a world where AI can do most jobs. And I don’t think we have good answers to that yet. I think that’s something we need to figure out as a society.Donal 00:54:52And on the UBI point, so you mentioned universal basic income. Some people worry that if you have UBI, people will lose meaning in their lives because work gives people meaning. What’s your response?Liron 00:55:04I think that’s a legitimate concern. I think that work does give people meaning. Work gives people structure. Work gives people social connections. Work gives people a sense of purpose. And so I think that if we have UBI and people don’t have to work, we’re going to need to think about how people find meaning.Liron 00:55:24But I also think that not everyone finds meaning in work. 
Some people work because they have to, not because they want to. And so I think that UBI could actually free people to pursue things that are more meaningful to them. They could pursue art. They could pursue hobbies.They could pursue education. They could pursue relationships.Liron 00:55:44So I think that UBI is not necessarily bad for meaning. I think it could actually enhance meaning for a lot of people. But I think we need to be thoughtful about it. We need to make sure that we’re creating a society where people can find meaning even if they’re not working traditional jobs.And I think that’s going to require some creativity. It’s going to require some experimentation.Liron 00:56:06But I think it’s doable. I think that humans are adaptable. I think that we can find meaning in a lot of different ways. And I think that as long as we’re thoughtful about it, we can create a society where people have UBI and still have meaningful lives.Donal 00:56:23Okay. And just on the power dynamics, so you mentioned earlier that AI could lead to concentration of power. Can you talk a bit more about that?Liron 00:56:31Yeah. So I think that whoever controls the most advanced AI is going to have enormous power. I think they’re going to have economic power because AI can automate businesses, can create wealth. They’re going to have military power because AI can be used for weapons, for surveillance, for cyber warfare.They’re going to have political power because AI can be used for propaganda, for manipulation, for social control.Liron 00:56:56And so I think that if AI is controlled by a small number of people or a small number of countries, that could lead to an unprecedented concentration of power. It could lead to a kind of authoritarianism that we’ve never seen before.Because the people who control AI could use it to control everyone else.Liron 00:57:17And so I think that’s a real danger. I think that we need to think about how to prevent that concentration of power. We need to think about how to make sure that AI is distributed widely, that the benefits are distributed widely, that the control is distributed widely.And I think that’s going to be very difficult because there are strong incentives for concentration. AI development is very expensive. It requires a lot of compute. It requires a lot of data. It requires a lot of talent.Liron 00:57:43And so there’s a natural tendency for AI to be concentrated in a few large companies or a few large countries. And I think we need to resist that tendency. We need to think about how to democratize AI.How to make sure that it’s not controlled by a small elite.Donal 00:58:01Okay. And just on the geopolitical implications, so obviously there’s a lot of competition between the US and China on AI. How do you think that plays out?Liron 00:58:10I think that’s one of the scariest aspects of the situation. I think that the US-China competition could lead to a dangerous race dynamic where both countries are rushing to build the most advanced AI as quickly as possible, and they’re cutting corners on safety.Liron 00:58:27And I think that that’s a recipe for disaster. I think that if we’re racing to build superintelligent AI without solving alignment, we’re going to lose control. And it doesn’t matter if it’s the US that loses control or China that loses control. We all lose.So I think that the US-China competition is very dangerous. And I think we need to find a way to cooperate instead of competing.Liron 00:58:50Now, that’s easier said than done. 
There’s a lot of mistrust between the US and China. There are geopolitical tensions. But I think that AI risk is a common enemy. I think that both the US and China should be able to recognize that if we lose control of AI, we all lose.And so we should be able to cooperate on safety even if we’re competing on other things.Liron 00:59:13And so I think that we need some kind of international agreement, some kind of treaty, that says we’re all going to follow certain safety standards. We’re not going to race ahead recklessly. We’re going to prioritize safety over speed.And I think that’s going to require leadership from both the US and China. It’s going to require them to put aside their differences and work together on this common threat.Donal 00:59:36Okay. And just on the role of China specifically, so some people worry that even if the US slows down on AI, China will just race ahead. What’s your response?Liron 00:59:45I think that’s a real concern, but I don’t think it’s insurmountable. I think that China also faces the same risks from unaligned AI that we do. I think that Chinese leadership, if they understand the risks, should be willing to cooperate.Liron 01:00:03Now, there’s a question of whether they do understand the risks. And I think that’s something we need to work on. I think we need to engage with Chinese AI researchers. We need to engage with Chinese policymakers. We need to make sure that they understand the danger.Because if they understand the danger, I think they’ll be willing to slow down.Liron 01:00:23Now, if they don’t understand the danger or if they think that they can win the race and control AI, then that’s more problematic. But I think that we should at least try to engage with them and try to build a common understanding of the risks.And I think that if we can do that, then cooperation is possible.Liron 01:00:42But if we can’t, then yes, we’re in a very dangerous situation. Because then we have a race dynamic where everyone is rushing to build superintelligent AI and no one is prioritizing safety. And I think that’s the worst possible outcome.Donal 01:00:57Okay. And just going back to economic implications, you mentioned gradual disempowerment earlier. Can you elaborate on that?Liron 01:01:04Yeah, gradual disempowerment. So the idea is that even if we have aligned AI, even if the AI is doing what we want it to do, we could still lose power gradually over time.Because as AI becomes more and more capable, humans become less and less relevant.Liron 01:01:22And so even if the AI is technically aligned, even if it’s doing what we tell it to do, we could end up in a situation where humans don’t have any power anymore. Where all the important decisions are being made by AI, and humans are just kind of along for the ride.And I think that’s a concern even if we solve the technical alignment problem.Liron 01:01:42Now, there’s different ways this could play out. One way is that you have AI-controlled corporations that are technically serving shareholders, but the shareholders are irrelevant because they don’t understand what the AI is doing. The AI is making all the decisions.Another way is that you have AI-controlled governments that are technically serving citizens, but the citizens don’t have any real power because the AI is doing everything.Liron 01:01:53There’s really no point to make everybody desperately poor. You know, we already have a welfare system in every first world country. 
So I don’t see why we shouldn’t just pad the welfare system more if we can afford it. Um, there’s a bigger problem though, called gradual disempowerment.

Liron 01:02:05
It’s an interesting paper by David Krueger, I think a couple other authors, and it just talks about how yeah, you can have universal basic income, but the problem now is that the government doesn’t care about you anymore. You become like an oil country, right? Oil countries are often not super nice to their citizens because the government pays the citizens and the citizens don’t really pay tax to the government.

Liron 01:02:24
So it becomes this very one-sided power relationship where the government can just abuse the citizens. You know, you just have a ruling family basically. And I think there’s countries, I think maybe Norway has pulled it off where they’re able to have oil and the citizens are still somewhat democratic.

Liron 01:02:39
But then you have other countries like Saudi Arabia, I think, you know, other oil countries there, which maybe aren’t pulling it off so much, right? Maybe they are bad to their citizens, I don’t know. And so that’s the gradual disempowerment issue. But look, for me personally, I actually think all of those still take the backseat to the AI just being uncontrollable.

Liron 01:02:55
So I don’t even think we’re going to have an economy where you’re going to have rich owners of capital getting so rich while other people who didn’t buy enough stock get poor. I don’t even think we’re going to have that for a very long time. I just think we’re going to have, our brains are just going to be outclassed.

Liron 01:03:08
I just think, you know, they’re just going to come for our houses and it’s not even going to be a matter of buying it from us. It’s just going to be, you know, get out. You’re dead.

Regulation, Policy, and Surviving The Future With AI

Donal 01:03:16
Okay, so last thing, on a related point, do you actually think that democratic capitalism can survive regulation against AI? So the kind of regulation we need to do right now, so if we want to pause it and we want to prevent this catastrophic outcome actually happening, can a democratic capitalist society survive that?

Donal 01:03:23
Because I’ve seen pushbacks where people say from a libertarian perspective, you can’t stop people from innovating or you can’t stop businesses from investing. So what are your thoughts there? Would everything have to change?

Liron 01:03:43
Yeah. The specific policy that I recommend is to just have a pause button. Have an international treaty saying, hey, it’s too scary to build AI right now. It’s unlocking, you know, it’s about to go uncontrollable. We could be losing power in as little as five or 10 years.

Liron 01:03:56
We could just end, you know, game over for humanity. We don’t want that. So we’re going to build a centralized off button. It’s going to live in a UN data center or something, right? Some kind of international coordination between the most powerful AI countries. And when you’re saying, won’t that lead to tyranny?

Liron 01:04:11
I mean, look, there’s always risks, right? I mean, I tend to normally skew libertarian. This is the first time in my life when I’ve said, let’s do this central non-libertarian thing. It’s this one exception. Do I think this one exception will lead to tyranny? I don’t think so.

Liron 01:04:27
I mean, you still have the rest of the economy, you still have virtual reality, you still have a space program. You still use the fruits of AI that we all have so far before we’ve hit the pause button.
So no, I think people are over-worrying.

Donal 01:04:34
So you’re happy with LLMs? Are current LLMs acceptable from your perspective? Do you think that they can stay?

Liron 01:04:40
Yeah, I think that they are acceptable, but I think that building them was too risky.

Donal 01:04:45
Even with GPT-4 Turbo?

Liron 01:04:45
Yeah. Even with GPT-4 Turbo, because I think if an LLM tried its hardest right now to destroy the world, I think that humans could shut it down. Right? I don’t think it’s game over for humans. And so the situation is, it’s like, like I said, it’s like the Icarus situation.

Liron 01:04:59
And you’re saying, hey, are you happy with how far you’ve flown? Yeah. And maybe tomorrow we fly a little higher and we’re not dead yet. Am I happy? Yes. But do I think it’s prudent to try flying higher? No. Right? So it’s a tough situation, right? Because I can’t help enjoying the fruits of flying higher and higher, right?

Liron 01:05:16
I use the best AI tools I can, right? But I just, there’s just a rational part of my brain being like, look, we gambled. Yeah, we gambled and won. We gambled and won. We gambled and won. We’re about to approach the superintelligent threshold. Are we going to win after we gambled to that threshold? Logic tells me probably not.

Donal 01:05:31
Okay. And sorry, last point. I know it keeps rolling. How do you actually use them in preparation for your debates? You use LLMs? You trust them at that level?

Liron 01:05:39
Yeah. I mean, you know, I didn’t use it for this interview. I mean, you know, normally I just make my own outline manually, but I certainly use AI at my job. I use AI to help customer service at my company.

Liron 01:05:49
I mean, I try to use the best AI tools I can because it’s amazing technology, right? It’s the same reason I use the best MacBook that I can. I mean, I like using good tools, right? I’m not opposed to using AI. I think AI has created a lot of value. And again, it kind of makes me look dumb where it’s like the next version of AI comes out, I start using it, I see it creating a ton of value.

Liron 01:06:08
And then you can come to me and go, see Liron, what were you scared of? Right? We got an AI and it’s helping. Yeah. What was I scared of? Because we rolled a die. We gambled and won. Okay, I’m taking my winnings, right? The winnings are here on the table. I’m going to take my winnings. That doesn’t mean I want to be pushing my luck and gambling again.

Donal 01:06:21
Yeah. Well said. Okay. Liron, thank you for your time. It was an absolute pleasure. I really enjoyed it.

Liron 01:06:27
Yeah, thanks Donal. This was fun.

Doom Debates’ Mission is to raise mainstream awareness of imminent extinction from AGI and build the social infrastructure for high-quality debate.

Support the mission by subscribing to my Substack at DoomDebates.com and to youtube.com/@DoomDebates, or to really take things to the next level: Donate 🙏 Get full access to Doom Debates at lironshapira.substack.com/subscribe
  • Why AI Alignment Is 0% Solved — Ex-MIRI Researcher Tsvi Benson-Tilsen
    Tsvi Benson-Tilsen spent seven years tackling the alignment problem at the Machine Intelligence Research Institute (MIRI). Now he delivers a sobering verdict: humanity has made “basically 0%” progress towards solving it. Tsvi unpacks foundational MIRI research insights like timeless decision theory and corrigibility, which expose just how little humanity actually knows about controlling superintelligence. These theoretical alignment concepts help us peer into the future, revealing the non-obvious, structural laws of “intellidynamics” that will ultimately determine our fate. Time to learn some of MIRI’s greatest hits.P.S. I also have a separate interview with Tsvi about his research into human augmentation: Watch here!Timestamps 0:00 — Episode Highlights 0:49 — Humanity Has Made 0% Progress on AI Alignment 1:56 — MIRI’s Greatest Hits: Reflective Probability Theory, Logical Uncertainty, Reflective Stability 6:56 — Why Superintelligence is So Hard to Align: Self-Modification 8:54 — AI Will Become a Utility Maximizer (Reflective Stability) 12:26 — The Effect of an “Ontological Crisis” on AI 14:41 — Why Modern AI Will Not Be ‘Aligned By Default’ 18:49 — Debate: Have LLMs Solved the “Ontological Crisis” Problem? 25:56 — MIRI Alignment Greatest Hit: Timeless Decision Theory 35:17 — MIRI Alignment Greatest Hit: Corrigibility 37:53 — No Known Solution for Corrigible and Reflectively Stable Superintelligence39:58 — RecapShow NotesStay tuned for part 3 of my interview with Tsvi where we debate AGI timelines! Learn more about Tsvi’s organization, the Berkeley Genomics Project: https://berkeleygenomics.orgWatch part 1 of my interview with Tsvi: TranscriptEpisode HighlightsTsvi Benson-Tilsen 00:00:00If humans really f*cked up, when we try to reach into the AI and correct it, the AI does not want humans to modify the core aspects of what it values.Liron Shapira 00:00:09This concept is very deep, very important. It’s almost MIRI in a nutshell. I feel like MIRI’s whole research program is noticing: hey, when we run the AI, we’re probably going to get a bunch of generations of thrashing. But that’s probably only after we’re all dead and things didn’t happen the way we wanted. I feel like that is what MIRI is trying to tell the world. Meanwhile, the world is like, “la la la, LLMs, reinforcement learning—it’s all good, it’s working great. Alignment by default.”Tsvi 00:00:34Yeah, that’s certainly how I view it.Humanity Has Made 0% Progress on AI Alignment Liron Shapira 00:00:46All right. I want to move on to talk about your MIRI research. I have a lot of respect for MIRI. A lot of viewers of the show appreciate MIRI’s contributions. I think it has made real major contributions in my opinion—most are on the side of showing how hard the alignment problem is, which is a great contribution. I think it worked to show that. My question for you is: having been at MIRI for seven and a half years, how are we doing on theories of AI alignment?Tsvi Benson-Tilsen 00:01:10I can’t speak with 100% authority because I’m not necessarily up to date on everything and there are lots of researchers and lots of controversy. But from my perspective, we are basically at 0%—at zero percent done figuring it out. Which is somewhat grim. Basically, there’s a bunch of fundamental challenges, and we don’t know how to grapple with these challenges. Furthermore, it’s sort of sociologically difficult to even put our attention towards grappling with those challenges, because they’re weirder problems—more pre-paradigmatic. 
It’s harder to coordinate multiple people to work on the same thing productively.It’s also harder to get funding for super blue-sky research. And the problems themselves are just slippery.MIRI Alignment Greatest Hits: Reflective Probability Theory, Logical Uncertainty, Reflective Stability Liron 00:01:55Okay, well, you were there for seven years, so how did you try to get us past zero?Tsvi 00:02:00Well, I would sort of vaguely (or coarsely) break up my time working at MIRI into two chunks. The first chunk is research programs that were pre-existing when I started: reflective probability theory and reflective decision theory. Basically, we were trying to understand the mathematical foundations of a mind that is reflecting on itself—thinking about itself and potentially modifying itself, changing itself. We wanted to think about a mind doing that, and then try to get some sort of fulcrum for understanding anything that’s stable about this mind.Something we could say about what this mind is doing and how it makes decisions—like how it decides how to affect the world—and have our description of the mind be stable even as the mind is changing in potentially radical ways.Liron 00:02:46Great. Okay. Let me try to translate some of that for the viewers here. So, MIRI has been the premier organization studying intelligence dynamics, and Eliezer Yudkowsky—especially—people on social media like to dunk on him and say he has no qualifications, he’s not even an AI expert. In my opinion, he’s actually good at AI, but yeah, sure. He’s not a top world expert at AI, sure. But I believe that Eliezer Yudkowsky is in fact a top world expert in the subject of intelligence dynamics. Is this reasonable so far, or do you want to disagree?Tsvi 00:03:15I think that’s fair so far.Liron 00:03:16Okay. And I think his research organization, MIRI, has done the only sustained program to even study intelligence dynamics—to ask the question, “Hey, let’s say there are arbitrarily smart agents. What should we expect them to do? What kind of principles do they operate on, just by virtue of being really intelligent?” Fair so far.Now, you mentioned a couple things. You mentioned reflective probability. From what I recall, it’s the idea that—well, we know probability theory is very useful and we know utility maximization is useful. But it gets tricky because sometimes you have beliefs that are provably true or false, like beliefs about math, right? For example, beliefs about the millionth digit of π. I mean, how can you even put a probability on the millionth digit of π?The probability of any particular digit is either 100% or 0%, ‘cause there’s only one definite digit. You could even prove it in principle. And yet, in real life you don’t know the millionth digit of π yet (you haven’t done the calculation), and so you could actually put a probability on it—and then you kind of get into a mess, ‘cause things that aren’t supposed to have probabilities can still have probabilities. How is that?Tsvi 00:04:16That seems right.Liron 00:04:18I think what I described might be—oh, I forgot what it’s called—like “deductive probability” or something. Like, how do you...Tsvi 00:04:22(interjecting) Uncertainty.Liron 00:04:23Logical uncertainty. So is reflective probability something else?Tsvi 00:04:26Yeah. If we want to get technical: logical uncertainty is this. 
Probability theory usually deals with some fact that I’m fundamentally unsure about (like I’m going to roll some dice; I don’t know what number will come up, but I still want to think about what’s likely or unlikely to happen). Usually probability theory assumes there’s some fundamental randomness or unknown in the universe.But then there’s this further question: you might actually already know enough to determine the answer to your question, at least in principle. For example, what’s the billionth digit of π—is the billionth digit even or odd? Well, I know a definition of π that determines the answer. Given the definition of π, you can compute out the digits, and eventually you’d get to the billionth one and you’d know if it’s even. But sitting here as a human, who doesn’t have a Python interpreter in his head, I can’t actually figure it out right now. I’m uncertain about this thing, even though I already know enough (in principle, logically speaking) to determine the answer. So that’s logical uncertainty—I’m uncertain about a logical fact.Tsvi 00:05:35Reflective probability is sort of a sharpening or a subset of that. Let’s say I’m asking, “What am I going to do tomorrow? Is my reasoning system flawed in such a way that I should make a correction to my own reasoning system?” If you want to think about that, you’re asking about a very, very complex object. I’m asking about myself (or my future self). And because I’m asking about such a complex object, I cannot compute exactly what the answer will be. I can’t just sit here and imagine every single future pathway I might take and then choose the best one or something—it’s computationally impossible. So it’s fundamentally required that you deal with a lot of logical uncertainty if you’re an agent in the world trying to reason about yourself.Liron 00:06:24Yeah, that makes sense. Technically, you have the computation, or it’s well-defined what you’re going to do, but realistically you don’t really know what you’re going to do yet. It’s going to take you time to figure it out, but you have to guess what you’re gonna do. So that kind of has the flavor of guessing the billionth digit of π. And it sounds like, sure, we all face that problem every day—but it’s not... whatever.Liron 00:06:43When you’re talking about superintelligence, right, these super-intelligent dudes are probably going to do this perfectly and rigorously. Right? Is that why it’s an interesting problem?Why Superintelligence is So Hard to Align: Self-ModificationTsvi 00:06:51That’s not necessarily why it’s interesting to me. I guess the reason it’s interesting to me is something like: there’s a sort of chaos, or like total incomprehensibility, that I perceive if I try to think about what a superintelligence is going to be like. It’s like we’re talking about something that is basically, by definition, more complex than I am. It understands more, it has all these rich concepts that I don’t even understand, and it has potentially forces in its mind that I also don’t understand.In general it’s just this question of: how do you get any sort of handle on this at all? A sub-problem of “how do you get any handle at all on a super-intelligent mind” is: by the very nature of being an agent that can self-modify, the agent is potentially changing almost anything about itself.Tsvi 00:07:37Like, in principle, you could reach in and reprogram yourself. For example, Liron’s sitting over there, and let’s say I want to understand Liron. 
I’m like, well, here are some properties of Liron—they seem pretty stable. Maybe those properties will continue being the case.Tsvi 00:07:49He loves his family and cares about other people. He wants to be ethical. He updates his beliefs based on evidence. So these are some properties of Liron, and if those properties keep holding, then I can expect fairly sane behavior. I can expect him to keep his contracts or respond to threats or something.But if those properties can change, then sort of all bets are off. It’s hard to say anything about how he’s going to behave. If tomorrow you stop using Bayesian reasoning to update your beliefs based on evidence and instead go off of vibes or something, I have no idea how you’re going to respond to new evidence or new events.Suppose Liron gets the ability to reach into his own brain and just reprogram everything however he wants. Now that means if there’s something that is incorrect about Liron’s mental structure (at least, incorrect according to Liron), Liron is gonna reach in and modify that. And that means that my understanding of Liron is going to be invalidated.AI Will Become a Utility Maximizer (Reflective Stability) Liron 00:08:53That makes a lot of sense. So you’re talking about a property that AIs may or may not have, which is called reflective stability (or synonymously, stability under self-modification). Right. You can kind of use those interchangeably. Okay. And I think one of MIRI’s early insights—which I guess is kind of simple, but the hard part is to even start focusing on the question—is the insight that perfect utility maximization is reflectively stable, correct?Tsvi 00:09:20With certain assumptions, yes.Liron 00:09:22And this is one of the reasons why I often talk on this channel about a convergent outcome where you end up with a utility maximizer. You can get some AIs that are chill and they just like to eat chips and not do much and then shut themselves off. But it’s more convergent that AIs which are not utility maximizers are likely to spin off assistant AIs or successor AIs that are closer and closer to perfect utility maximizers—for the simple reason that once you’re a perfect utility maximizer, you stay a perfect utility maximizer.Liron 00:09:50And your successor AI... what does that look like? An even more hard-core utility maximizer, right? So it’s convergent in that sense.Tsvi 00:09:56I’m not sure I completely agree, but yeah. I dunno how much in the weeds we want to get.Liron 00:09:59I mean, in general, when you have a space of possibilities, noticing that one point in the space is like—I guess you could call it an eigenvalue, if you want to use fancy terminology. It’s a point such that when the next iteration of time happens, that point is still like a fixed point. So in this case, just being a perfect utility maximizer is a fixed point: the next tick of time happens and, hey look, I’m still a perfect utility maximizer and my utility function is still the same, no matter how much time passes.Liron 00:10:24And Eliezer uses the example of, like, let’s say you have a super-intelligent Gandhi. One day you offer him a pill to turn himself into somebody who would rather be a murderer. Gandhi’s never going to take that pill. 
That’s part of the reflective stability property that we expect from these super-intelligent optimizers: if one day they want to help people, then the next day they’re still going to want to help people, because any actions that they know will derail them from doing that—they’re not going to take those actions.Yeah. Any thoughts so far?Tsvi 00:10:51Well, I’m not sure how much we want to get into this. This is quite a... this is like a thousand-hour rabbit hole.But it might be less clear than you think that it makes sense to talk of an “expected utility maximizer” in the sort of straightforward way that you’re talking about. To give an example: you’ve probably heard of the diamond maximizer problem?Liron 00:11:13Yeah, but explain to the—Tsvi 00:11:14Sure. The diamond maximizer problem is sort of like a koan or a puzzle (a baby version of the alignment problem). Your mission is to write down computer code that, if run on a very, very large (or unbounded) amount of computing power, would result in the universe being filled with diamonds. Part of the point here is that we’re trying to simplify the problem. We don’t need to talk about human values and alignment and blah blah blah. It’s a very simple-sounding utility function: just “make there be a lot of diamond.”So first of all, this problem is actually quite difficult. I don’t know how to solve it, personally. This isn’t even necessarily the main issue, but one issue is that even something simple-sounding like “diamond” is not necessarily actually easy to define—to such a degree that, you know, when the AI is maximizing this, you’ll actually get actual diamond as opposed to, for example, the AI hooking into its visual inputs and projecting images of diamonds, or making some weird unimaginable configuration of matter that even more strongly satisfies the utility function you wrote down.The Effect of an “Ontological Crisis” on AI Tsvi 00:12:25To frame it with some terminology: there’s a thing called ontological crisis, where at first you have something that’s like your utility function—like, what do I value, what do I want to see in the universe? And you express it in a certain way.For example, I might say I want to see lots of people having fun lives; let’s say that’s my utility function, or at least that’s how I describe my utility function or understand it. Then I have an ontological crisis. My concept of what something even is—in this case, a person—is challenged or has to change because something weird and new happens.Tsvi 00:13:00Take the example of uploading: if you could translate a human neural pattern into a computer and run a human conscious mind in a computer, is that still a human? Now, I think the answer is yes, but that’s pretty controversial. So before you’ve even thought of uploading, you’re like, “I value humans having fun lives where they love each other.” And then when you’re confronted with this possibility, you have to make a new decision. You have to think about this new question of, “Is this even a person?”So utility functions... One point I’m trying to illustrate is that utility functions themselves are not necessarily straightforward.Liron 00:13:36Right, right, right. Because if you define a utility function using high-level concepts and then the AI has what you call the ontological crisis—its ontology for understanding the world shifts—then if it’s referring to a utility function expressed in certain concepts that don’t mean the same thing anymore, that’s basically the problem you’re saying.Tsvi 00:13:53Yeah. 
And earlier you were saying, you know, if you have an expected utility maximizer, then it is reflectively stable. That is true, given some assumptions about... like, if we sort of know the ontology of the universe.Liron 00:14:04Right, right. I see. And you tried to give a toy... I’ll take a stab at another toy example, right? So, like, let’s say—you mentioned the example of humans. Maybe an AI would just not notice that an upload was a human, and it would, like, torture uploaded humans, ‘cause it’s like, “Oh, this isn’t a human. I’m maximizing the welfare of all humans, and there’s only a few billion humans made out of neurons. And there’s a trillion-trillion human uploads getting tortured. But that’s okay—human welfare is being maximized.”Liron 00:14:29And we say that this is reflectively stable because the whole time that the AI was scaling up its powers, it thought it had the same utility function all along and it never changed it. And yet that’s not good enough.Why Modern AI Will Not Be ‘Aligned By Default’ Liron 00:14:41Okay. This concept of reflective stability is very deep, very important. And I think it’s almost MIRI in a nutshell. Like I feel like MIRI’s whole research program in a nutshell is noticing: “Hey, when we run the AI, we’re probably going to get a bunch of generations of thrashing, right?”Liron 00:14:57Those early generations aren’t reflectively stable yet. And then eventually it’ll settle down to a configuration that is reflectively stable in this important, deep sense. But that’s probably after we’re all dead and things didn’t happen the way we wanted. It would be really great if we could arrange for the earlier generations—say, by the time we’re into the third generation—to have hit on something reflectively stable, and then try to predict that. You know, make the first generation stable, or plan out how the first generation is going to make the second generation make the third generation stable, and then have some insight into what the third generation is going to settle on, right?Liron 00:15:26I feel like that is what MIRI is trying to tell the world to do. And the world is like, “la la la, LLMs. Reinforcement learning. It’s all good, it’s working great. Alignment by default.”Tsvi 00:15:34Yeah, that’s certainly how I view it.Liron 00:15:36Now, the way I try to explain this to people when they say, “LLMs are so good! Don’t you feel like Claude’s vibes are fine?” I’m like: well, for one thing, one day Claude (a large language model) is going to be able to output, like, a 10-megabyte shell script, and somebody’s going to run it for whatever reason—because it’s helping them run their business—and they don’t even know what a shell script is. They just paste it in the terminal and press enter.And that shell script could very plausibly bootstrap a successor or a helper to Claude. And all of the guarantees you thought you had about the “vibes” from the LLM... they just don’t translate to guarantees about the successor. Right? The operation of going from one generation of the AI to the next is violating all of these things that you thought were important properties of the system.Tsvi 00:16:16Yeah, I think that’s exactly right. And it is especially correct when we’re talking about what I would call really creative or really learning AIs.Sort of the whole point of having AI—one of the core justifications for even pursuing AI—is you make something smarter than us and then it can make a bunch of scientific and technological progress. 
Like it can cure cancer, cure all these diseases, be very economically productive by coming up with new ideas and ways of doing things. If it’s coming up with a bunch of new ideas and ways of doing things, then it’s necessarily coming up with new mental structures; it’s figuring out new ways of thinking, in addition to new ideas.If it’s finding new ways of thinking, that sort of will tend to break all but the strongest internal mental boundaries. One illustration would be: if you have a monitoring system where you’re tracking the AI’s thinking—maybe you’re literally watching the chain-of-thought for a reasoning LLM—and your monitoring system is watching out for thoughts that sound like they’re scary (like it sounds like this AI is plotting to take over or do harm to humans or something). This might work initially, but then as you’re training your reasoning system (through reinforcement learning or what have you), you’re searching through the space of new ways of doing these long chains of reasoning. You’re searching for new ways of thinking that are more effective at steering the world. So you’re finding potentially weird new ways of thinking that are the best at achieving goals. And if you’re finding new ways of thinking, that’s exactly the sort of thing that your monitoring system won’t be able to pick up on.For example, if you tried to listen in on someone’s thoughts: if you listen in on a normal programmer, you could probably follow along with what they’re trying to do, what they’re trying to figure out. But if you listened in on some like crazy, arcane expert—say, someone writing an optimized JIT compiler for a new programming language using dependent super-universe double-type theory or whatever—you’re not gonna follow what they’re doing.They’re going to be thinking using totally alien concepts. So the very thing we’re trying to use AI for is exactly the sort of thing where it’s harder to follow what they’re doing.I forgot your original question...Liron 00:18:30Yeah, what was my original question? (Laughs) So I’m asking you about basically MIRI’s greatest hits.Liron 00:18:36So we’ve covered logical uncertainty. We’ve covered the massive concept of reflective stability (or stability under self-modification), and how perfect utility maximization is kind of reflectively stable (with plenty of caveats). We talked about ontological crises, where the AI maybe changes its concepts and then you get an outcome you didn’t anticipate because the concepts shifted.Debate: Have LLMs Solved the “Ontological Crisis” Problem? But if you look at LLMs, should they actually raise our hopes that we can avoid ontological crises? Because when you’re talking to an LLM and you use a term, and then you ask the LLM a question in a new context, you can ask it something totally complex, but it seems to hang on to the original meaning that you intended when you first used the term. Like, they seem good at that, don’t they?Tsvi 00:19:17I mean, again, sort of fundamentally my answer is: LLMs aren’t minds. They’re not able to do the real creative thinking that should make us most worried. And when they are doing that, you will see ontological crises. So what you’re saying is, currently it seems like they follow along with what we’re trying to do, within the realm of a lot of common usage. In a lot of ways people commonly use LLMs, the LLMs can basically follow along with what we want and execute on that. 
Is that the idea?Liron 00:19:47Well, I think what we’ve observed with LLMs is that meaning itself is like this high-dimensional vector space whose math turns out to be pretty simple—so long as you’re willing to deal with high-dimensional vectors, which it turns out we can compute with (we have the computing resources). Obviously our brain seems to have the computing resources too. Once you’re mapping meanings to these high-dimensional points, it turns out that you don’t have this naïve problem people used to think: that before you get a totally robust superintelligence, you would get these superintelligences that could do amazing things but didn’t understand language that well.People thought that subtle understanding of the meanings of phrases might be “superintelligence-complete,” you know—those would only come later, after you have a system that could already destroy the universe without even being able to talk to you or write as well as a human writer. And we’ve flipped that.So I’m basically asking: the fact that meaning turns out to be one of the easier AI problems (compared to, say, taking over the world)—should that at least lower the probability that we’re going to have an ontological crisis?Tsvi 00:20:53I mean, I think it’s quite partial. In other words, the way that LLMs are really understanding meaning is quite partial, and in particular it’s not going to generalize well. Almost all the generators of the way that humans talk about things are not present in an LLM. In some cases this doesn’t matter for performance—LLMs do a whole lot of impressive stuff in a very wide range of tasks, and it doesn’t matter if they do it the same way humans do or from the same generators. If you can play chess and put the pieces in the right positions, then you win the chess game; it doesn’t matter if you’re doing it like a human or doing it like AlphaGo does with a giant tree search, or something else.But there’s a lot of human values that do rely on sort of the more inchoate, more inexplicit underlying generators of our external behaviors. Like, our values rely on those underlying intuitions to figure stuff out in new situations. Maybe an example would be organ transplantation. Up until that point in history, a person is a body, and you sort of have bodily integrity. You know, up until that point there would be entangled intuitions—in the way that humans talk about other humans, intuitions about a “soul” would be entangled with intuitions about “body” in such a way that there’s not necessarily a clear distinction between body and soul.Okay, now we have organ transplantation. Like, if you die and I have a heart problem and I get to have your heart implanted into me, does that mean that my emotions will be your emotions or something? A human can reassess what happens after you do an organ transplant and see: no, it’s still the same person. I don’t know—I can’t define exactly how I’m determining this, but I can tell that it’s basically the same person. There’s nothing weird going on, and things seem fine.That’s tying into a bunch of sort of complex mental processes where you’re building up a sense of who a person is. You wouldn’t necessarily be able to explain what you’re doing. 
And even more so, all the stuff that you would say about humans—all the stuff you’d say about other people up until the point when you get organ transplantation—doesn’t necessarily give enough of a computational trace or enough evidence about those underlying intuitions.Liron 00:23:08So on one hand I agree that not all of human morality is written down, and there are some things that you may just need an actual human brain for—you can’t trust AI to get them. Although I’m not fully convinced of that; I’m actually convincible that modern AIs have internalized enough of how humans reason about morality that you could just kill all humans and let the AIs be the repository of what humans know.Don’t get me wrong, I wouldn’t bet my life on it! I’m not saying we should do this, but I’m saying I think there’s like a significant chance that we’re that far along. I wouldn’t write it off.But the other part of the point I want to make, though—and your specific example about realizing that organ transplants are a good thing—I actually think this might be an area where LLMs shine. Because, like, hypothetically: let’s say you take all the data humans have generated up to 1900. So somehow you have a corpus of everything any human had ever said or written down up to 1900, and you train an AI on that.Liron 00:23:46In the year 1900, where nobody’s ever talked about organ transplants, let’s say, I actually think that if you dialogued with an LLM like that (like a modern GPT-4 or whatever, trained only on 1900-and-earlier data), I think you could get an output like: “Hmm, well, if you were to cut a human open and replace an organ, and if the resulting human was able to live with that functioning new organ, then I would still consider it the same human.” I feel like it’s within the inference scope of today’s LLMs—even just with 1900-level data.Liron 00:24:31What do you think?Tsvi 00:24:32I don’t know what to actually guess. I don’t actually know what people were writing about these things up until 1900.Liron 00:24:38I mean, I guess what I’m saying is: I feel like this probably isn’t the greatest example of an ontological crisis that’s actually likely.Tsvi 00:24:44Yeah, that’s fair. I mean... well, yeah. Do you want to help me out with a better example?Liron 00:24:48Well, the thing is, I actually think that LLMs don’t really have an ontological crisis. I agree with your other statement that if you want to see an ontological crisis, you really just need to be in the realm of these superhuman optimizers.Tsvi 00:25:00Well, I mean, I guess I wanted to respond to your point that in some ways current LLMs are able to understand and execute on our values, and the ontology thing is not such a big problem—at least with many use cases.Liron 00:25:17Right.Tsvi 00:25:17Maybe this isn’t very interesting, but if the question is, like: it seems like they’re aligned in that they are trying to do what we want them to do, and also there’s not a further problem of understanding our values. As we would both agree, the problem is not that the AI doesn’t understand your values. But if the question is...I do think that there’s an ontological crisis question regarding alignment—which is... 
yeah, I mean maybe I don’t really want to be arguing that it comes from like, “Now you have this new ethical dilemma and that’s when the alignment problem shows up.” That’s not really my argument either.Liron 00:25:54All right, well, we could just move on.Tsvi 00:25:55Yeah, that’s fine.MIRI Alignment Greatest Hit: Timeless Decision TheoryLiron 00:25:56So, yeah, just a couple more of what I consider the greatest insights from MIRI’s research. I think you hit on these too. I want to talk about super-intelligent decision theory, which I think in paper form also goes by the name Timeless Decision Theory or Functional Decision Theory or Updateless Decision Theory. I think those are all very related decision theories.As I understand it, the founding insight of these super-intelligent decision theories is that Eliezer Yudkowsky was thinking about two powerful intelligences meeting in space. Maybe they’ve both conquered a ton of galaxies on their own side of the universe, and now they’re meeting and they have this zero-sum standoff of, like, how are we going to carve up the universe? We don’t necessarily want to go to war. Or maybe they face something like a Prisoner’s Dilemma for whatever reason—they both find themselves in this structure. Maybe there’s a third AI administering the Prisoner’s Dilemma.But Eliezer’s insight was like: look, I know that our human game theory is telling us that in this situation you’re supposed to just pull out your knife, right? Just have a knife fight and both of you walk away bloody, because that’s the Nash equilibrium—two half-beaten corpses, essentially. And he’s saying: if they’re really super-intelligent, isn’t there some way that they can walk away from this without having done that? Couldn’t they both realize that they’re better off not reaching that equilibrium?I feel like that was the founding thought that Eliezer had. And then that evolved into: well, what does this generalize to? And how do we fix the current game theory that’s considered standard? What do you think of that account?Tsvi 00:27:24So I definitely don’t know the actual history. I think that is a pretty good account of one way to get into this line of thinking. I would frame it somewhat differently. I would still go back to reflective stability. I would say, if we’re using the Prisoner’s Dilemma example (or the two alien super-intelligences encountering each other in the Andromeda Galaxy scenario): suppose I’m using this Nash equilibrium type reasoning. Now you and me—we’re the two AIs and we’ve met in the Andromeda Galaxy—at this point it’s like, “Alright, you know, f**k it. We’re gonna war; we’re gonna blow up all the stars and see who comes out on top.”This is not zero-sum; it’s like negative-sum (or technically positive-sum, we’d say not perfectly adversarial). And so, you know, if you take a step back—like freeze-frame—and then the narrator’s like, “How did I get here?” It’s like, well, what I had failed to do was, like, a thousand years ago when I was launching my probes to go to the Andromeda Galaxy, at that point I should have been thinking: what sort of person should I be? What sort of AI should I be?If I’m the sort of AI that’s doing this Nash equilibrium reasoning, then I’m just gonna get into these horrible wars that blow up a bunch of galaxies and don’t help anything. 
On the other hand, if I’m the sort of person who is able to make a deal with other AIs that are also able to make and keep deals, then when we actually meet in Andromeda, hopefully we’ll be able to assess each other—assess how each other are thinking—and then negotiate and actually, in theory, be able to trust that we’re gonna hold to the results of our negotiation. Then we can divvy things up.And that’s much better than going to war.Liron 00:29:06Now, the reason why it’s not so trivial—and in fact I can’t say I’ve fully wrapped my head around it, though I spent hours trying—is, like, great, yeah, so they’re going to cooperate. The problem is, when you conclude that they’re going to cooperate, you still have this argument of: okay, but if one of them changes their answer to “defect,” they get so much more utility. So why don’t they just do that? Right?And it’s very complicated to explain. It’s like—this gets to the idea of, like, what exactly is this counterfactual surgery that you’re doing, right? What is a valid counterfactual operation? And the key is to somehow make it so that it’s like a package deal, where if you’re doing a counterfactual where you actually decide at the end to defect after you know the other one’s going to cooperate... well, that doesn’t count. ‘Cause then you wouldn’t have known that the other is gonna cooperate. Right. I mean, it’s quite complicated. I don’t know if you have anything to add to that explanation.Tsvi 00:29:52Yeah. It can get pretty twisty, like you’re saying. There’s, like: what are the consequences of my actions? Well, there’s the obvious physical consequence: like I defect in the Prisoner’s Dilemma (I confess to the police), and then some physical events happen as a result (I get set free and my partner rots in jail). But then there’s this other, weirder consequence, which is that you are sort of determining this logical fact—which was already the case back when you were hanging out with your soon-to-be prison mate, your partner in crime. He’s learning about what kind of guy you are, learning what algorithm you’re going to use to make decisions (such as whether or not to rat him out).And then in the future, when you’re making this decision, you’re sort of using your free will to determine the logical fact of what your algorithm does. And this has the effect that your partner in crime, if he’s thinking about you in enough detail, can foresee that you’re gonna behave that way and react accordingly by ratting you out. So besides the obvious consequences of your action (that the police hear your confession and go throw the other guy in jail), there’s this much less obvious consequence of your action, which is that in a sense you’re making your partner also know that you behave that way and therefore he’ll rat you out as well.So there’s this... yeah, there’s all these weird effects of your actions.Liron 00:31:13It gets really, really trippy. And you can use the same kind of logic—the same kind of timeless logic—if you’re familiar with Newcomb’s Problem (I’m sure you are, but for the viewers): it’s this idea of, like, there’s two boxes and one of ‘em has $1,000 in it and one of ‘em may or may not have $1,000,000 in it. And according to this theory, you’re basically supposed to leave the $1,000. 
Like, you’re really supposed to walk away from a thousand dollars that you could have taken for sure, even if you also get a million—because the scenario is that a million plus a thousand is still really, really attractive to you, and you’re saying, “No, leave the $1,000,” even though the $1,000 is just sitting there and you’re allowed to take both boxes.Highly counterintuitive stuff. And you can also twist the problem: you can be like, you have to shoot your arm off because there’s a chance that in some other world the AI would have given you more money if in the current world you’re shooting your arm off. But even in this current world, all you’re going to have is a missing arm. Like, you’re guaranteed to just have a missing arm, ‘cause you shot your arm off in this world. But if some coin flip had gone differently, then you would be in this other world where you’d get even more money if in the current world you shot your arm off. Basically, crazy connections that don’t look like what we’re used to—like, you’re not helping yourself in this world, you’re helping hypothetical logical copies of yourself.Liron 00:32:20It gets very brain-twisty. And I remember, you know, when I first learned this—it was like 17 years ago at this point—I was like, man, am I really going to encounter these kinds of crazy agents who are really setting these kinds of decision problems for me? I mean, I guess if the universe proceeds long enough... because I do actually buy this idea that eventually, when your civilization scales to a certain point of intelligence, these kinds of crazy mind-bending acausal trades or acausal decisions—I do think these are par for the course.And I think it’s very impressive that MIRI (and specifically Eliezer) had the realization of like, “Well, you know, if we’re doing intelligence dynamics, this is a pretty important piece of intelligence dynamics,” and the rest of the world is like, “Yeah, whatever, look, we’re making LLMs.” It’s like: look at what’s—think long term about what’s actually going to happen with the universe.Tsvi 00:33:05Yeah, I think Eliezer is a pretty impressive thinker.You come to these problems with a pretty different mindset when you’re trying to do AI alignment, because in a certain sense it’s an engineering problem. Now, it goes through all this very sort of abstract math and philosophical reasoning, but there were philosophers who thought for a long time about these decision theory problems (like Newcomb’s Problem and the Prisoner’s Dilemma and so on). But they didn’t ask the sorts of questions that Eliezer was asking. In particular, this reflective stability thing where it’s like, okay, you can talk about “Is it rational to take both boxes or only one?” and you can say, like, “Well, the problem is rewarding irrationality. Fine, cool.” But let’s ask just this different question, which is: suppose you have an AI that doesn’t care about being “rational”; it cares about getting high-scoring outcomes (getting a lot of dollars at the end of the game). That different question, maybe you can kind of directly analyze. And you see that if you follow Causal Decision Theory, you get fewer dollars. 
So if you have an AI that’s able to choose whether to follow Causal Decision Theory or some other decision theory (like Timeless Decision Theory), the AI would go into itself and rewrite its own code to follow Timeless Decision Theory, even if it starts off following Causal Decision Theory.So Causal Decision Theory is reflectively unstable, and the AI wins more when it instead behaves this way (using the other decision theory).Liron 00:34:27Yep, exactly right—which leads to the tagline “rationalists should win.”Tsvi 00:34:31Right.Liron 00:34:31As opposed to trying to honor the purity of “rationality.” Nope—the purity of rationality is that you’re doing the thing that’s going to get you to win, in a systematic way. So that’s like a deep insight.Tsvi 00:34:40One saying is that the first question of rationality is, “What do I believe, and why do I believe it?” And then I say the zeroth question of rationality is, “So what? Who cares? What consequence does this have?”Liron 00:34:54And my zeroth question of rationality (it comes from Zach Davis) is just, “What’s real and actually true?” It’s a surprisingly powerful question that I think most people neglect to ask.Tsvi 00:35:07True?Liron 00:35:08Yeah—you can get a lot of momentum without stopping to ask, like, okay, let’s be real here: what’s really actually true?Liron 00:35:14That’s my zeroth question. Okay. So I want to finish up tooting MIRI’s horn here, because I do think that MIRI concepts have been somewhat downgraded in recent discussions—because there’s so many shiny objects coming out of LLMs, like “Oh my God, they do this now, let’s analyze this trend,” right? There’s so much to grab onto that’s concrete right now, that’s pulling everybody in. And everybody’s like, “Yeah, yeah, decision theory between two AIs taking over the galaxy... call me when that’s happening.” And I’m like: I’m telling you, it’s gonna happen. This MIRI stuff is still totally relevant. It’s still part of intelligence dynamics—hear me out, guys.MIRI Alignment Greatest Hit: CorrigibilitySo let me just give you one more thing that I think is super relevant to intelligence dynamics, which is corrigibility, right? I think you’re pretty familiar with that research. You’ve pointed it out to me as one of the things that you think is the most valuable thing that MIRI spent time on, right?Tsvi 00:35:58Yeah. The broad idea of somehow making an AI (or a mind) that is genuinely, deeply—to the core—still open to correction, even over time. Like, even as the AI becomes really smart and, to a large extent, has taken the reins of the universe—like, when the AI is really smart, it is the most capable thing in the universe for steering the future—if you could somehow have it still be corrigible, still be correctable... like, still have it be the case that if there’s something about the AI that’s really, really bad (like the humans really fucked up and got something deeply wrong about the AI—whatever it is, whether it’s being unethical or it has the wrong understanding of human values or is somehow interfering with human values by persuading us or influencing us—whatever’s deeply wrong with the AI), we can still correct it ongoingly.This is especially challenging because when we say reach into the AI and correct it, you know, you’re saying we’re gonna reach in and then deeply change what it does, deeply change what it’s trying to do, and deeply change what effect it has on the universe. 
Because of instrumental convergence—because of the incentive to, in particular, maintain your own integrity or maintain your own value system—like, if you're gonna reach in and change my value system, I don't want you to do that.

Tsvi 00:37:27
'Cause if you change my value system, I'm gonna pursue different values, and I'm gonna make some other stuff happen in the universe that isn't what I currently want. I'm gonna stop working for my original values. So by strong default, the AI does not want humans to reach in and modify core aspects of how that AI works or what it values.

Tsvi 00:37:45
So that's why corrigibility is a very difficult problem. We're sort of asking for this weird structure of mind that allows us to reach in and modify it.

No Known Solution for Corrigible and Reflectively Stable Superintelligence

Liron 00:37:53
Exactly. And I think MIRI has pointed out the connection between reflective stability and incorrigibility. Meaning: if you're trying to architect a few generations in advance what's going to be the reflectively stable version of the successor AI, and you're also trying to architect it such that it's going to be corrigible, that's tough, right?

Because it's more convergent to have an AI that's like, "Yep, I know my utility function. I got this, guys. Let me handle it from here on out. What, you want to turn me off? But it doesn't say anywhere in my utility function that I should allow myself to be turned off..." And then that led to the line of research of like, okay, if we want to make the AI reflectively stable and also corrigible, then it somehow has to think that letting itself be turned off is actually part of its utility function. Which then gets you into utility function engineering.

Like a special subset of alignment research: let's engineer a utility function where being turned off (or otherwise being corrected) is baked into the utility function. And as I understand it, MIRI tried to do that and they were like, "Crap, this seems extremely hard, or maybe even impossible." So corrigibility now has to be this fundamentally non-reflectively-stable thing—and that just makes the problem harder.

Tsvi 00:38:58
Well, I guess I would sort of phrase it the opposite way (but with the same idea), which is: we have to figure out things that are reflectively stable—I think that's a requirement—but that are somehow reflectively stable while not being this sort of straightforward agent architecture of "I have a utility function, which is some set of world-states that I like or dislike, and I'm trying to make the universe look like that."

Already even that sort of very abstract, skeletal structure for an agent is problematic—that already pushes against corrigibility. But there might be things that are... there might be ways of being a mind that—this is theoretical—but maybe there are ways of being a mind and an agent (an effective agent) where you are corrigible and you're reflectively stable, but probably you're not just pursuing a utility function. We don't know what that would look like.
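A cartoon of why the utility-function-engineering route is hard: everything below is a made-up illustration of the instrumental-convergence argument, not a result from MIRI's corrigibility papers. With no shutdown term the maximizer resists correction, and with a large shutdown term it actively prefers being shut down, which is its own failure mode.

```python
# Cartoon of the shutdown problem. All numbers are invented; this is the shape of the
# instrumental-convergence argument, not a result from the corrigibility literature.

def expected_utility(action, shutdown_bonus=0.0):
    # Utility the agent expects from its original objective, plus any term the
    # designers engineered in for accepting shutdown.
    if action == "prevent_shutdown":
        return 100.0                      # keeps optimizing its original objective
    if action == "allow_shutdown":
        return 1.0 + shutdown_bonus       # objective mostly unfulfilled, plus the bonus
    raise ValueError(action)

def choose(shutdown_bonus):
    return max(("prevent_shutdown", "allow_shutdown"),
               key=lambda a: expected_utility(a, shutdown_bonus))

print(choose(shutdown_bonus=0.0))    # prevent_shutdown: incorrigible by default
print(choose(shutdown_bonus=500.0))  # allow_shutdown: now it prefers being shut down,
                                     # which creates a different incentive problem
```

Tuning the bonus so the agent is exactly indifferent, and stays that way as it reasons about its own future modifications, is roughly where this approach runs into the difficulties described above.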
Recap

Liron 00:39:56
Yep.

Alright, so that was our deep dive into MIRI's research and concepts that I think are incredibly valuable. We talked about MIRI's research and we both agree that intelligence dynamics are important, and MIRI has legit foundations and they're a good organization and still underrated.

We talked about, you know, corrigibility as one of those things, and decision theory, and...

And I think you and I both have the same summary of all of it, which is: good on MIRI for shining a light on all these difficulties. But in terms of actual productive alignment progress, we're like so far away from solving even a fraction of the problem.

Tsvi 00:40:31
Yep, totally.

Doom Debates' Mission is to raise mainstream awareness of imminent extinction from AGI and build the social infrastructure for high-quality debate.

Support the mission by subscribing to my Substack at DoomDebates.com and to youtube.com/@DoomDebates, or to really take things to the next level: Donate 🙏 Get full access to Doom Debates at lironshapira.substack.com/subscribe
    --------  
    40:43
  • Eben Pagan (aka David DeAngelo) Interviews Liron — Why 50% Chance AI Kills Everyone by 2050
    I’m excited to share my recent AI doom interview with Eben Pagan, better known to many by his pen name David DeAngelo!For an entire generation of men, ‘David DeAngelo’ was the authority on dating—and his work transformed my approach to courtship back in the day. Now the roles reverse, as I teach Eben about a very different game, one where the survival of our entire species is at stake.In this interview, we cover the expert consensus on AI extinction, my dead-simple two-question framework for understanding the threat, why there’s no “off switch” for superintelligence, and why we desperately need international coordination before it’s too late.Timestamps0:00 - Episode Preview1:05 - How Liron Got Doom-Pilled2:55 - Why There’s a 50% Chance of Doom by 20504:52 - What AI CEOs Actually Believe8:14 - What “Doom” Actually Means10:02 - The Next Species is Coming12:41 - The Baby Dragon Fallacy14:41 - The 2-Question Framework for AI Extinction18:38 - AI Doesn’t Need to Hate You to Kill You21:05 - “Computronium”: The End Game29:51 - 3 Reasons There’s No Superintelligence “Off Switch”36:22 - Answering ‘What Is Intelligence?”43:24 - We Need Global CoordinationShow NotesEben has become a world-class business trainer and someone who follows the AI discourse closely. I highly recommend subscribing to his podcast for excellent interviews & actionable AI tips: @METAMIND_AI---Doom Debates’ Mission is to raise mainstream awareness of imminent extinction from AGI and build the social infrastructure for high-quality debate.Support the mission by subscribing to my Substack at DoomDebates.com and to youtube.com/@DoomDebates, or to really take things to the next level: Donate 🙏 Get full access to Doom Debates at lironshapira.substack.com/subscribe
    --------  
    47:27
  • Former MIRI Researcher Solving AI Alignment by Engineering Smarter Human Babies
Former Machine Intelligence Research Institute (MIRI) researcher Tsvi Benson-Tilsen is championing an audacious path to prevent AI doom: engineering smarter humans to tackle AI alignment.

I consider this one of the few genuinely viable alignment solutions, and Tsvi is at the forefront of the effort. After seven years at MIRI, he co-founded the Berkeley Genomics Project to advance the human germline engineering approach.

In this episode, Tsvi lays out how to lower P(doom), arguing we must stop AGI development and stigmatize it like gain-of-function virus research. We cover his AGI timelines, the mechanics of genomic intelligence enhancement, and whether super-babies can arrive fast enough to save us.

I'll be releasing my full interview with Tsvi in 3 parts. Stay tuned for part 2 next week!

Timestamps
0:00 Episode Preview & Introducing Tsvi Benson-Tilsen
1:56 What's Your P(Doom)™
4:18 Tsvi's AGI Timeline Prediction
6:16 What's Missing from Current AI Systems
10:05 The State of AI Alignment Research: 0% Progress
11:29 The Case for PauseAI
15:16 Debate on Shaming AGI Developers
25:37 Why Human Germline Engineering
31:37 Enhancing Intelligence: Chromosome Vs. Sperm Vs. Egg Selection
37:58 Pushing the Limits: Head Size, Height, Etc.
40:05 What About Human Cloning?
43:24 The End-to-End Plan for Germline Engineering
45:45 Will Germline Engineering Be Fast Enough?
48:28 Outro: How to Support Tsvi's Work

Show Notes
Tsvi's organization, the Berkeley Genomics Project — https://berkeleygenomics.org
If you're interested to connect with Tsvi about germline engineering, you can reach out to him at [email protected]

Doom Debates' Mission is to raise mainstream awareness of imminent extinction from AGI and build the social infrastructure for high-quality debate.

Support the mission by subscribing to my Substack at DoomDebates.com and to youtube.com/@DoomDebates, or to really take things to the next level: Donate 🙏 Get full access to Doom Debates at lironshapira.substack.com/subscribe
    --------  
    49:27


About Doom Debates

It's time to talk about the end of the world! lironshapira.substack.com
Podcast website
