DEBATE: Is AGI Really Decades Away? | Ex-MIRI Researcher Tsvi Benson-Tilsen vs. Liron Shapira
Sparks fly in the finale of my series with ex-MIRI researcher Tsvi Benson-Tilsen as we debate his AGI timelines.

Tsvi is a champion of using germline engineering to create smarter humans who can solve AI alignment. I support the approach, even though I'm skeptical it'll gain much traction before AGI arrives.

Timestamps

0:00 Debate Preview
0:57 Tsvi's AGI Timeline Prediction
3:03 The Least Impressive Task AI Cannot Do in 2 Years
6:13 Proposed Task: Solve Cantor's Theorem From Scratch
8:20 AI Has Limitations Related to Sample Complexity
11:41 We Need Clear Goalposts for Better AGI Predictions
13:19 Counterargument: LLMs May Not Be a Path to AGI
16:01 Is Tsvi Setting a High Bar for Progress Towards AGI?
19:17 AI Models Are Missing a Spark of Creativity
28:17 Liron's "Black Box" AGI Test
32:09 Are We Going to Enter an AI Winter?
35:09 Who Is Being Overconfident?
42:11 If AI Makes Progress on Benchmarks, Would Tsvi Shorten His Timeline?
50:34 Recap & Tsvi's Research

Show Notes

Learn more about Tsvi's organization, the Berkeley Genomics Project — https://berkeleygenomics.org

Watch my previous 2 episodes with Tsvi:

Transcript

Debate Preview

Liron Shapira 00:00:00
Do you think we're going to enter an AI winter soon or no?

Tsvi Benson-Tilsen 00:00:03
Let's say 1-3% chance of real AGI in the next five years.

Liron 00:00:08
That is definitely lower than the consensus among most people.

Tsvi 00:00:11
There's several things that they can't do, each of which is strong evidence that they're not intelligent.

Liron 00:00:16
And I'm encouraging you to set down some goalposts.

Tsvi 00:00:19
I don't have to say this particular line is not gonna be crossed. That doesn't make sense.

Liron 00:00:22
Maybe you can put more work into choosing examples that…

Tsvi 00:00:25
Okay. Wait, wait. Okay. Listen. From my perspective, you're somewhat overconfident.

Liron 00:00:29
I just feel like you're the one who's coming in overconfident.

Liron 00:00:32
I mean, usually we can just point to "oh, this might be a boundary. This might be a boundary."

Tsvi 00:00:37
But I named, I named…

Liron 00:00:39
I mean, that's basically, that's my beef with you: I feel like… I think you should dig harder.

Tsvi 00:00:42
I mean, at some point, I'm just gonna say I'm busy.

Tsvi's AGI Timeline Prediction

Liron 00:00:57
Now let's talk about timelines. You mentioned that you think in decades.

Liron 00:01:00
So for example, you were saying that you'd be surprised if it came about in the next five years.

Tsvi 00:01:05
I'm not well calibrated about this. It's a very complicated thing, blah, blah, blah. But if you poke me hard enough, I will spit out numbers. So, let's say a 1-3% chance of AGI, of real AGI, in the next five years.

Liron
So that is definitely lower than the way we usually aggregate people's predictions. If you look at Metaculus, or even just people I've had on the show, usually the consensus among most people who seem informed is that there's a bell curve where it seems to average around 2031 for when we'll get superhuman intelligence.

Liron 00:01:40
But why are you so confident that it's probably not by 2031?

Tsvi 00:01:46
One part of my response boils down to: I don't really buy that the benchmarks are measuring the relevant things.
Experts in a field, if they try to discuss their ideas with an LLM, what they'll usually find is the LLM might have useful facts or references that are relevant. If there is any significant novelty, the LLM sort of falls apart and doesn't follow the relevance, doesn't really understand what's happening, makes up sort of nonsense.

Tsvi 00:02:15
And it's not just occasional hallucinations or 10% hallucinations. It's not helpful for thinking at the edge of things.

Liron 00:02:22
I do actually think there's something to that. I do agree with you that LLMs, they haven't gotten to what Steven Byrnes is calling brain-like AGI. You know Steven Byrnes?

Tsvi 00:02:32
Yeah. Yeah. He does good research on neuroscience.

Liron 00:02:35
Yeah, I recently started reading this guy, Steven Byrnes. I agree with Steven that there does seem to be a boundary. And also Jim Babcock came on my show.

Liron 00:02:41
There's an episode with Jim Babcock where we discussed the same kind of boundary, which is: with LLMs today, it seems like we lucked out, where they're not quite bootstrapping to superintelligence. They're not superhuman in some very important ways, and yet they're still so useful. And you're putting your finger on that distinction.

Liron 00:02:55
And you're saying that they're not a creative mind having new ideas. I wanna poke at your version of this distinction more. So let me ask you the question this way: what do you think is the least impressive thing that AI probably still can't do in two years?

The Least Impressive Task AI Cannot Do in 2 Years

Tsvi 00:03:10
So I'm gonna answer your question in a second, but I just have my retort ready, which is: why don't you say the most impressive thing that you do expect AI to do in two years?

Liron 00:03:22
Yeah, I mean, I'm happy to take a stab at that. I do have pretty wide confidence intervals. So I'm basically going to take things that I'm kind of impressed by today and just make 'em somewhat more impressive. I think a really good example of incremental improvement is how right now AIs can talk to you. When you use GPT voice mode, it's pretty good.

Liron 00:03:39
It talks to my kids, so I'm expecting two years from now, they'll really polish up the lag. So it'll really feel a hundred percent in terms of micro-lags and any sort of distinguishing characteristics whatsoever. I think those will all be smoothed out within two years. And I also think on the video side, maybe it won't be fully polished, but it'll be almost as good as the voice is today.

Liron 00:03:59
That's my prediction of things I'm pretty confident will happen in two years.

Tsvi 00:04:02
Just to check, would you agree that those things are not that impressive? What you listed?

Liron 00:04:07
Well, I'm 80% confident. If you want to get to 50% confident, oh boy. I mean, you know what I mean? Once you got me down to 15% confident, then I'll tell you the intelligence explosion. I'll predict superintelligence. So that's the difference between me being 15% confident and me being 80% confident.

Tsvi 00:04:21
Yeah. No, I mean, to be clear, I think your predictions are totally fair. I probably agree with them. That sounds in the ballpark of 80%. Partly I'm noting because now I'm gonna answer your question, and I'm gonna give things that are not necessarily the least impressive things that I expect AI will not be able to do in two years, but—

Liron 00:04:39
Why not?

Tsvi 00:04:41
Well, 'cause that's a harder question.
And the boundary is more…

Liron 00:04:45
Well, but the reason I'm asking you the question this way is because I feel like you think you know something about a firewall. I always bring up this concept of a firewall. Currently we're on this track, but the track isn't getting us to a creative mind having new ideas. That's your language. And David Deutsch uses similar language about creating new knowledge.

Liron 00:05:02
You think you're not seeing a creative mind having new ideas, and I think you're extrapolating out as much as 10-plus years where you're potentially still not seeing a creative mind having new ideas.

Liron 00:05:12
So that's why I think my question is very fair, where you're basically saying, hey, here's some things that look kind of easy relative to what I do today, and I'm certain that you won't get them within a few years, because that would require a creative mind having new ideas.

Tsvi 00:05:24
Well, certain is not something I would say. I mean, as I said—

Liron 00:05:28
90% certain, right?

Tsvi 00:05:30
Yeah, probably. Okay. Well, so I'll just answer your question. So, I think you are probably not going to see AI-led research, where most of the work is being done by an AI that produces concepts or ideas that are novel and interesting to humans, to human scientists or mathematicians, in the same way that human-produced concepts are. Yeah.

Liron 00:05:54
Can you get any less impressive than that?

Tsvi 00:05:56
Well, this is a bit bold, but let's say an AI won't be able to prove Cantor's theorem from scratch, working without human math content: theorems or definitions or proofs or what have you.

Proposed Task: Solve Cantor's Theorem From Scratch

Liron 00:06:13
I like this proposal. I'm not sure I agree with it. For the audience, let's try to dumb it down a little bit. So I think you're talking about the theorem that there are more real numbers than there are integers.

Tsvi 00:06:23
Mhm.

Liron 00:06:24
Yeah, so I like the example because it's like, where do you begin? It's like, how did Cantor attack this problem? And I think before Cantor, people were barely asking the question. They barely knew what question to ask, and then they barely knew where to begin proving it one way or the other. Right?

Tsvi 00:06:38
Yeah. Yeah, I think that's a big part of it.

Liron 00:06:40
Yeah. So I mean, I certainly consider that an impressive leap. I agree. This kind of goes into the book of really impressive leaps that humans have done, together with, let's say, Einstein's theory of relativity. 'Cause you have to come at things from a new angle, combine fresh concepts.

Liron 00:06:55
Really make what you might call a creative leap. That's a fair phrase. My observation, though, is that it's like we're drawing a line here where it's like, oh yeah, 98% of humans working in the field of math are just below the line. They're just waiting to receive ideas from on high, from the likes of a Cantor or Einstein.

Liron 00:07:14
And the AI is kind of catching up to all those humans, but then the top 2% who make these brilliant creative leaps, where it's like one out of every million humans does it once or twice in a lifetime, yeah, AI can't get there. It seems like the line's getting pretty high.

Tsvi 00:07:26
Yeah. I think that's a somewhat skewed way of viewing things. And so I would actually take a much more expansive position on who's doing this sort of creative thinking.
I would say more or less every human, at least when they're a child, is doing this sort of creative thinking all the time.

Tsvi 00:07:44
I mean, you have kids and I don't. Maybe you have some experience watching your kids sort of play around, goof around, pick some activity that they're having fun with and just sort of repeat it over and over again, maybe with variations.

Liron 00:07:57
Well, maybe my kids are delayed, because I just haven't seen 'em do anything Cantor-level.

Tsvi 00:08:02
Yeah. I'm not saying that they're inventing new physics ideas. What I'm saying is that they're sort of constructing their world, almost from scratch.

AI Has Limitations Related to Sample Complexity

Liron 00:08:11
If you're right about what human children are doing, shouldn't we be able to answer the question about the least impressive thing AI can do by referencing a specific thing that a human child is doing?

Tsvi 00:08:20
Yeah. So this would be an inspiration for another answer I would give, which would be about sample complexity. I would also be pretty surprised, I don't know exactly which numbers to give here, but let's say something on the order of a thousand x less data. If you produce an LLM that is comparably impressive to present-day systems, but it was trained using a thousand x less data than current systems, there's asterisks on here, but basically if you did that, then I would be quite surprised, more scared. Also probably more confused.

Liron 00:08:53
I mean, I'm feeling you on the magic ingredient that today's AI doesn't quite have. I agree that it seems to have to do with the amount of data that you need in order to get a result. So even if you look at a self-driving car, yeah, some of 'em seem to be better drivers than humans, probably better drivers than me.

Liron 00:09:09
But why do they need so much data when the best human drivers just don't need that much data? They need hundreds of hours of data. Probably, you know, some people can drive with less than that. So I agree, that's where I would also try to put my finger on what's missing.

Liron 00:09:25
But it just seems hard to propose a specific functional test. So it's like we think we have this intuition about what's missing, but when we try to translate it into, okay, what objective test won't they be able to do? So far, I feel like you've been pretty hand-wavy. You're basically saying the cream of the crop of scientists, or some abstract thing that kids do, but you're not really giving a specific input-output. Or, sorry, you did, you said amount of data.

Liron 00:09:49
But even that, it's talking about the internals of AI.

Tsvi 00:09:51
Well, that's not exactly internals, but I somewhat see your point.

Tsvi 00:09:56
I mean, you're not gonna like this, but you know, if something is really, really clear and operationalizable, it kind of already is a benchmark.

Tsvi 00:10:05
Just to frame the sort of things I'm saying: partly I'm not necessarily trying to give a nice, adversarially robust test, but rather I'm sort of trying to just give a hint, if someone's trying to think about this from first principles, give some hints about what I think I'm seeing, to direct attention. So it's, yeah.

Tsvi 00:10:24
I think you're putting your finger on something a little bit vaguely, but I agree.

Liron 00:10:27
That's also how I think about it too. Of course. I mean, I think a lot of people are having similar, kind of vague but related thoughts.
My only question for you is why does this translate into a feeling of confidence that whatever gap you think is left isn't going to be jumped?

Liron 00:10:43
By all the ways that so many different people are trying to jump it. You know, reinforcement learning comes to mind as a way to jump the gaps that LLMs have.

Tsvi 00:10:49
So again, the thing you're calling confidence is, I would say, three to 10% in the next 10 to 15 years.

Liron 00:10:54
So my mainline scenario is AI progress continues rapidly, and sure, there's kind of a stall that you would describe as it's having trouble inventing new ideas and it can't do the Cantor-Einstein mental motion, but a few years pass and other stuff happens and then it can.

Liron 00:11:12
Reinforcement learning is the buzzword that I use. There's other reinforcement learning based approaches that then combine with the LLMs, and now suddenly even that's not a firewall. And then by 2040, by 2050 let's say, then the chance that we're all dead is better than even.

Liron 00:11:24
Subjectively you've just felt the last few years like you keep reading about breakthroughs, and it's an interesting breakthrough, but it still doesn't have that spark of creativity. It's like in your mind they've all fallen on the unimpressive side of the ledger with respect to creativity.

Tsvi 00:11:37
First, yeah, in some sense, yeah. In the relevant sense.

We Need Clear Goalposts for Better AGI Predictions

Liron 00:11:41
Yeah, I mean, I would encourage you to try to be a little bit more concrete about specific data that would shock you. A specific test. Goalposts that are, you know, prediction-market level: did this happen or not? Because I suspect it's one of those things where it's tempting to move the goalposts, and I'm not saying you've done it in the past, but I'm saying it might be tempting to do so in the future.

Tsvi 00:12:02
Um, yeah, I don't really like this moving-the-goalposts critique that much. If someone keeps saying, in order to be smart… if you can do analogies, then you're smart and are like a human, but if you can't do analogies, then you're not. And then LLMs can do analogies, and then they're like, oh, nevermind.

Tsvi 00:12:21
It's this other thing. That's moving the goalposts. But if you're like, LLMs can't do X, Y, Z, so clearly they're not intelligent, and then LLMs can do X, Y, Z, and you're like, okay, well they can't do A, B, C, that's not necessarily moving the goalposts. It's just you're saying—

Tsvi 00:12:38
There's several things that they can't do, each of which is strong evidence that they're not intelligent.

Liron 00:12:43
Well, it sounds like you're kind of saying that you never set down your goalposts, and I'm encouraging you to set down some goalposts.

Tsvi 00:12:48
Are you not satisfied with the ones I listed? So, the Cantor from scratch, or a thousand x less training data?

Liron 00:12:56
I mean, a thousand x less training data I think is kind of reasonable. I think you might just be able to notice that you're shocked even before that. I feel like that is still… you're making yourself open your eyes late in the game, the way your goalpost is structured right now.

Counterargument: LLMs May Not Be a Path to AGI

Tsvi 00:13:12
Okay. I will, maybe I will spend more time thinking about that.
Let me give you a counter-suggestion. I perceive often, and I don't necessarily have counterevidence for you, although maybe you have done this, I perceive often that people who have what I would call confident short timelines haven't done much to imagine an alternative hypothesis to "LLMs, or the LLM training architecture, basically have the ingredients for AGI, for general intelligence."

Or when they do that, they're like, LLMs are almost there, but you just need online learning or self-play or reinforcement learning or long time horizons or something. And that's not really addressing the thing I'm trying to get at. The thing I'm trying to get at is: you've made several big updates on the observations of LLM performance.

Tsvi 00:14:12
Try, as much as you can, to get an alternative intuition where you're like, oh yeah, I see how all this is explained away by my other theory of what's going on with AI currently, as opposed to just, we basically figured out general intelligence.

Liron 00:14:21
Yeah, I mean, I'm not sure there's really anything to convince me of, 'cause I would've told you from the get-go that, yeah, there's a 25% chance that there is an AI winter and we won't have superintelligence 20 years from now. That is, if you ask me why are we going to survive, that is probably where most of my survival probability goes: like, oh yeah, hey, we had a few more decades.

Liron 00:14:41
Your scenario, it just seems like I would still be pretty surprised. I wouldn't be like, oh my God, none of this makes sense. I'd be like, no, I'm pretty surprised. That'd be my—

Tsvi 00:14:52
And can you put any words to what's going on there? What I mean is, maybe you wouldn't make any sort of mechanistic claims at all, and you would just say, look how impressive LLM performance is. Is that sort of your position? Or would you say: look how impressive LLM performance is?

Tsvi 00:15:09
So they probably have some algorithms or something, or they're thinking, or they're creative or whatever.

Liron 00:15:14
So my position is that I am no longer able to confidently draw a boundary between what LLMs can and can't do in terms of these tests. And you've shown that you don't see the world in terms of these input-output tests to the degree that I do, but I consider it a very important test where you set down all of these different goalposts in terms of specific challenges: like, okay, you give the AI this, it has to output this.

Liron 00:15:37
And had I done so this year or last year or any point in time, it just seems like most of the milestones I would've set down are being crossed. And it's now very hard for me to name milestones. And when I say a milestone like, okay, wash my dishes, sure. But it doesn't seem like that is a fundamental limitation.

Liron 00:15:54
Like it doesn't seem like they're going to need some major new spark to be able to wash my dishes. It feels like incremental progress is all it's going to take.

Is Tsvi Setting a High Bar for Progress Towards AGI?

Tsvi 00:16:01
Okay. And so if I say, to me the biggest update would be the AI doing impressive scientific advances, like coming up with new concepts or scientific insights or mathematical concepts, theorems or proofs that are interesting to humans, but that humans didn't already write about.

Tsvi 00:16:20
Your critique of that is, "Sure, that's fine.
That’s an input output thing, but it’s just a really, really high bar and you should be able to update before that point.” Liron 00:16:30Yes. And the reason I say it is because I think that you’re already going to notice if you’re looking at what is the state of the art of them proving stuff and helping mathematicians I think it’s very steadily creeping up in the same way, you know, with software engineering, I have a little bit more direct experience, they keep getting more useful.Liron 00:16:45You know, I’m not the only one who does this, but I just had a chat with GPT-o3 the other day about like, “hey, I’m using Firebase. Why is Firebase slow in this case? What could be wrong? And it’s like, okay, here’s the inspector tool you should use in your specific situation. You did this query, you should check on this.” Liron 00:17:00Right? So it’s already, you know, the waterline of how helpful it is keeps increasing. And I think that’s true in many different domains. And I’m not seeing a dam. You know, I feel like the waterline just chugs along.Tsvi 00:17:11Well, the dam is that it’s not gonna create new concepts and new insights. I mean, are you saying that that’s not a thing or you just—Liron 00:17:20So I’m just saying that you can break that down. If you try to cash that out as a series of tests, whatever series of tests you end up writing. It seems like the pattern is pretty strong that it’s just going to keep passing your easiest tests and then harder tests.Tsvi 00:17:35Yeah, this is a really weird line of argument to me. I mean, sorry, not weird. I mean, obviously lots of people make this line of argument. I’m just, you could say the same thing about software in general. You could be like, whenever we have a pretty clear idea of some task and it has the right properties, like it can be put into a computer at all, like it’s an information processing task, then what our software can do keeps creeping up over time and I’m like, yeah, that kind of is a valid argument and it’s related to AI, but that doesn’t really tell you now we have the insights and now we’re a few years away from—Liron 00:18:08I mean what our software can do creeping up over time. If you go back you know 30 years or whatever at the dawn of usable text to speech for example I could have laid out all these different milestones and a lot of ‘em would be AI related or whatever we wanna call AI. But yeah, you know, text, images, voice, motion, self-driving.Liron 00:18:25These things that seem like important milestones just from an outside view or just what seems like salient and you know, virtual reality, that’s a technological milestone that seems very salient even before you’ve crossed it. So I would’ve laid out all these milestones and I’m just noticing like, hey, all the milestones that I set out decades ago, they’re getting knocked out and I don’t really have many milestones left.Liron 00:19:01Yeah. And the falling of all of these milestones is connected to the same central engine. This LLM algorithm and the idea of scale is all you need in the transformer architecture. And it’s a remarkably simple architecture.AI Models Are Missing A Spark of CreativityTsvi 00:19:13Okay. I mean basically to me, I’m like, yeah, that seems true. But a very, very key component is that the core of these capabilities is coming from the giant amount of text data that we have, sort of demonstrating the capabilities. 
And then when you go outside of that… in some contexts, LLMs can go outside of that significantly, and they're definitely not behaving like just a human. They have a huge amount of knowledge, so to speak. They can bring in lots of facts. There's certain operations that they can do much, much better than humans. At least they can program much faster than humans for easy contexts, for easy problems. And then if you look at AI more broadly, it's superhuman chess, superhuman this, superhuman that, superhuman image generation. But I feel like a really, really big part of the explanation for how LLMs have been hitting these milestones that I think you're referring to is that that's the stuff that's demonstrated in the data from human text production.

Liron 00:20:12
Right. Okay. So I think we may not disagree as much as it seems, because I actually agree with you. I think most people would agree that LLM scaling is hitting a wall. I think GPT-4.5 showed that. Remember, they just threw more scale at the LLM, I think it might've even been 10 times more scale, and they're like, look, it's slightly better.

Liron 00:20:28
And everybody's like, oh, okay. So we need more pieces. We need to throw more puzzle pieces into this. So I agree with you there. The prediction I'm making when I point to all these goalposts is: I think we have enough tools. It's not gonna be just LLMs, but I agree with you. If you told me, fix the architecture to that of GPT-4 or 4.5 or whatever it is, fix the architecture and all you get to do is throw in more data and more GPUs, then I'd be like, okay, yeah, that probably will in fact not show a spark of creativity.

Liron 00:20:54
By your definition. Maybe even ever. I'd be 60% confident, okay, it's never gonna show the spark of creativity. But in reality we do have a few other puzzle pieces that we can stir into the cauldron here, and I think the stirring is happening constantly.

Tsvi 00:21:06
I agree with those two things. But why do you think that that does get us the spark of creativity?

Liron 00:21:10
Because of all the different milestones. If I'm just saying, look, black box, okay, don't even worry what's going on under the hood, just what are different challenges that show a spark? I would've said personally that self-driving better than humans, robustly, with a lower accident rate overall and in any environment, better than a human…

Liron 00:21:27
I would've been like, yep, that is evidence for… I wouldn't call that a spark of creativity, but a spark of general intelligence, at least.

Tsvi 00:21:35
Would you have said the same about—

Liron 00:21:36
That's an example of one of the black box milestones, and I'm like, well, that's evidence. I'm checking that off my list of milestones that tell us that the real deal is here.

Tsvi 00:21:45
The black box thing is confusing to me, and I don't wanna be too accusatory, and I'm not very confident at all. But let me just set this up as a conjecture, not as an accusation. There might be, I'm working this out right now, but it might be a motte-and-bailey between the black box and the not-black-box.

Tsvi 00:22:02
Where on the one hand you want to say, well, LLMs are maybe not the thing because of recent evidence, but we'll just have more stuff.
So you’re trying to screen off the mechanistic reasoning there, but you’re not screening off the mechanistic reason in the sense that you want to make the induction.Tsvi 00:22:19You wanna say, nah, maybe this is not a fair accusation.Liron 00:22:23I’m just saying the AI industry as a whole is now in this part where they are showing a lot of momentum on any measurable dimension that you wanna give them.Tsvi 00:22:33I mean you say people say that, but I don’t understand when you’re saying measurable. You mean some subset of measurable, like it is measurable whether an expert who tries to tinker with a new idea by talking to an LLM, it’s measurable whether they’re like this LLM had this insight that is nowhere, that I’m confident there’s nowhere in the data.Tsvi 00:22:51‘Cause I’m an expert in this particular field and I didn’t have it either. And the LLM had it. That was amazing. Does the expert say—Liron 00:22:57Right. So what I’m trying to tell you is when you take what you think is a boundary like that and you split it up into more continuous ticks like okay did it have a small insight? Did it have a bigger insight? You’re going to see that it’s having the small insights.Tsvi 00:23:09I am, yeah. I’m not sure. I really—Liron 00:23:12Well, and that’s why I’m encouraging you to make the test, because I think it’s easy for you to dismiss it by just being like well I just drew this line. It’s black and white, you know, Einstein versus not and I’m like well try drawing more lines. I think you’re going to see a trend.Tsvi 00:23:25Well okay. If I draw more lines, then more lines will be crossed. Yes.Liron 00:23:31Right and I think you’re already going to see momentum of them being crossed. So then you’d have to be like okay, but this particular line here is a special line.Tsvi 00:23:37I don’t have to say, I don’t have to say this particular line is not gonna be crossed. Why? That doesn’t make sense.Liron 00:23:46Well so if you agree with my premise here. So it’s like let’s say you’d be impressed if they do the Cantor diagonal proof without having that in their data. But imagine you write a bunch of less impressive proofs working up to it, and let’s say they’re getting halfway there or whatever. So you would just keep asserting like okay yeah, but out of all these different lines and milestones that I drew those are all easy up to this certain point that I’m pointing at.Tsvi 00:24:10I mean, I’m not sure I’m following. Whenever we do this, we’re going to be gaining new information. Like we didn’t, before you make GPT-4 or 3, you, even if you were anticipating, you’re gonna get this weird distribution of capabilities where you’re extremely superhuman in terms of how much knowledge and the ability to answer a huge array of questions and even the ability to solve certain kinds of problems.Tsvi 00:24:35But you’re not gonna be able to be creative in this way, blah, blah, blah. Even if you called that in advance, drawing the lines is gonna be extremely difficult. And if you try to draw the lines beforehand, you’re not gonna be successful.Liron 00:24:48I’m just arguing for what I think is good black box methodology. So the methodology is let’s say you wanna investigate the brilliance of making leaps during proving, okay? So make 10 steps. A score one to 10 difficulty leaps. And I think it’s going to advance on your scale.Tsvi 00:25:03Okay. Well, I don’t necessarily disagree with your proposed methodology. That sounds reasonably good to me. I’m not in the business of constructing these benchmarks. 
I take your point, though, that that would be a way to update more. I don't really agree with part of the argument, though, which is: you do this, you make 10 steps, and then the AI does the first 3, and you're making this strong update. You should make a weak update. You should update in favor of capabilities increasing.

Tsvi 00:25:32
But you should also update in terms of like, well, this is how far you can go with this limited AI method.

Liron 00:25:38
In the example of Cantor-Einstein, I feel like you have this example of a class that's like an 8, 9, or 10 difficulty, and you're like, well, it's not doing that, so I feel pretty good that it doesn't have a spark. And I'm like, okay, but did you see it's doing 1, 2, 3? And you're like, well, that doesn't mean much.

Liron 00:25:51
Maybe it'll stop at 3. And I'm like, well, why don't you think about this more and maybe tell me that 4 or 5 is your real firewall.

Tsvi 00:25:58
Wait, sorry, can you rephrase that? I didn't understand that.

Liron 00:26:01
Yeah. So this hypothetical example where a proof as good as Cantor's proof is like an 8, 9, or 10 difficulty of having a leap of insight. And I'm saying, but look at all these smaller proofs it's doing. These are still considered smaller leaps of insight, no? So basically, what do you think is going to happen?

Liron 00:26:15
Do you think it's just going to stop at 3 because it just got to 1 and then 2 and then 3? So are you claiming it's going to stop at 3, or are you claiming it's just going to stop sometime before 8? Can you maybe nail down the goalpost of when you'd first be like, "oh crap, it's not stopping where I thought it was gonna stop"?

Tsvi 00:26:29
The claim is that there's a smear. Like, yeah, I think it's gonna stop before 10, probably, with fairly high probability. Where exactly that happens will be smeared across things. And so I will update somewhat, but just not that strongly on any given step.

Liron 00:26:43
So from my perspective, it feels like you're choosing an example that you know is above the waterline today. And the problem is the example you chose, it's one that's only going to be knocked down very late in the game. So I guess I'm just asking you if maybe you can put more work into choosing examples that aren't quite as high above the waterline on the spectrum, for like, well—

Tsvi 00:27:03
Okay. Wait, wait. Okay. Listen, as a framing point, there's no obligation of reality to give you a nice set of observations that you'll receive at different times that will give you information about how close the intelligence explosion is. You could just not be able to tell.

Liron 00:27:26
Of course there's not, but I'm just observing what happens anytime I try to do the exercise. If I were to lay out a series of math milestones… to be fair, I haven't done it, but the impression I get reading other reports and headlines is that people keep getting more and more impressed. On any dimension or on any scale that you give it, it just seems like it's climbing the scale.

Tsvi 00:27:47
Well, on some scales and then not on others.

Tsvi 00:27:52
Okay, you're saying that's an artifact of me not having made the scale nice and continuous.

Liron 00:27:58
Exactly. I think it would be productive for you to make the scale. It's easy to claim, like, oh yeah, this super hard thing that only the top 1% of humans can do, the AI can't do that yet today.
But I think it would be productive to try to be like, okay, is there anything that the median human can totally do today that you would be shocked if the AI could do in a couple years?

Liron's "Black Box" AGI Test

Liron 00:28:16
My methodology is just talk about something, a useful application, and then have a scale that relates to the useful application.

Tsvi 00:28:23
But okay, I feel like this is sort of intentionally mixing in a bunch of different stuff. Like, I don't know, let's say language learning, as just a random example. LLMs totally have the core competency or something to be a good language teacher. Like, they can speak in many languages, as long as there's a reasonable amount of text data.

Tsvi 00:28:44
And also, using scaffolding and multiple agents, you can have guys checking your spelling, checking your pronunciation, checking your grammar, giving you advice, blah, blah, blah. So they definitely have the core competencies. And then there's a separate question of, like, can you implement it?

Tsvi 00:28:58
Do you have good taste? Did you roll out your product well? Blah, blah, blah. And if you're asking me to predict, will there be a nice product where I can just drop in and learn a new language in the way that I actually want to, that's useful to me, that's a really complicated question. But it also bears less on AI capabilities. Maybe I'm misunderstanding what you're—

Liron 00:29:18
You're basically saying that I'm trying to package up too many variables when I talk about an application. Yeah. But I still think that it's a pretty natural layer to ask the question.

Tsvi 00:29:27
I don't, because, I mean, if you package up a bunch of variables, you're going to get a range of how impressive each variable should be if the LLM could do it, and also how useful that variable is. So you're gonna get some variables that are kind of useful for the task, but shouldn't actually be that impressive, or wouldn't update me that much.

Liron 00:29:43
I mean, the specific examples… if we look at the kind of things that I think are meaningful tests, so I mentioned self-driving, and then there's writing essays that get an A. I mean, you don't think that's a meaningful benchmark?

Tsvi 00:29:55
Um, not very much. No.

Liron 00:29:59
I mean, don't you think that would've been an incredible benchmark to be talking about 10 years ago?

Tsvi 00:30:03
I agree. It's very surprising. But if you think about it longer, you're like, hmm, I guess it kind of is in the training data. Yeah.

Liron 00:30:14
Right. I mean, when you conclude "I guess it's in the training data," I see that as not that useful to my methodology, because I think my methodology should just allow, okay, yeah, the AI can exploit different paradigms, instead of retroactively judging, like, ah yes, all of the things it can do are just because of this paradigm, and therefore it's going to stop.

Liron 00:30:30
I think it's productive to just have a black box measurement and not open the black box until you first look at the results of a black box benchmark.

Tsvi 00:30:39
And why is that useful?

Liron 00:30:41
Black box benchmarks… I mean, it just prevents you from having the confirmation bias of being like, ah yes, well, I understand the paradigm. Which, of course, you get to learn the paradigm after seeing the results of the paradigm. I think it's a protection against confirmation bias.

Tsvi 00:30:54
I'm not sure I'm following—

Liron 00:30:55
Sorry. And it's not just that.
It’s also the thing of, I don’t want you to, it also prevents you from getting too attached by zooming into one paradigm. Like yes LLMs are a really important paradigm but I wanna make sure you’re thinking about the field of AI as a whole, which is mixing paradigms.Liron 00:31:10And the black box tests protect you from diving into the details of one paradigm that you think you understand well. And just looking at the bigger picture of the whole industry.Tsvi 00:31:19And then you infer from this 50% probability of AGI in the next 10 years.Liron 00:31:25Yes. Anytime. Yeah. The same way that most people do. By the way there’s an outside view argument. Most people are saying the same thing I see. Which is just any of these natural black box dimensions, like these tests where I’m like, look, I’m just standing back. I don’t even claim to understand AI. I’m just looking at all these tests and I don’t know where somebody is getting a scale that’s making em be like ah yes it’s stuck on the scale.Tsvi 00:31:46You don’t know where someone’s getting a scale or it’s stuck on the scale. I think what I’m — maybe the structure here is that I’m sort of to a significant extent, or probabilistically, I mean, trying to explain away the observations of LLM capabilities in particular and saying basically, well, it’s ‘cause it’s in the data.Are We Going to Enter an AI Winter?Liron 00:32:09Right. I mean, so my question is, do you think we’re subjectively going to enter an AI winter soon or no?Tsvi 00:32:15Okay, well again, you’re, that’s integrating a whole lot of variables. Like, um, so I mean—Liron 00:32:22I mean, I think you think we are right? ‘Cause you’re saying the probability of AGI soon is low and a lot of the companies are now promising AGI soon.Tsvi 00:32:28Right, so my guess is that we will not get AGI soon and not, not super strongly, 80, 90% depending on where you, try the year. But that doesn’t necessarily mean there’ll be a winter. Like they are already having pretty substantial revenue. They’re probably going to significantly expand the revenue.Tsvi 00:32:46I don’t know whether, I don’t know the economics of—Liron 00:32:48So I feel like you’re, I think you have to strongly predict that there’s totally going to be, very likely going to be a subjective, like disappointing AI winter, because you’re telling me—Tsvi 00:32:56Oh, oh, sorry. Maybe I misunderstood. You’re trying to, you’re saying will research keep progressing in will, will it keep, we keep getting similarly impressive things or not? Is that—Liron 00:33:09Yeah. And I mean so from my perspective. Any of these natural black box metrics are just going to keep scaling. I think you really have to go out on a limb. So you’re saying, ah, yes, no, you’re gonna slam into a wall and these companies are gonna miss their revenue targets. It sounds like that has to be—Tsvi 00:33:24Well slam into a wall is slightly strawman-y, but very, at a very coarse level. Probabilistically. Yes. I think that these things are really smeared out though, because; so you might say, and I would also say, well, we’re gonna do o3-style, o1-style, reinforcement learning, or we’ll do new things.Tsvi 00:33:44People will come up with new circuit breakers. People will come up with new training algorithms they’ll come up with.Liron 00:33:50And we both think that they will do that, right?Tsvi 00:33:52Mm-hmm. For, yeah, certainly. And so that’s going to at least kind of, what’s the word? There’s unhobbling, and then there’s also like, perform, unlocking performance. 
Liron 00:34:03
So in your mind, they're gonna be trying all this other stuff, but it's just still going to take decades for them to, for something to really click and restart the singularity.

Tsvi 00:34:13
You're going to get increases in capabilities. I don't know how to call in advance subjectively how impressive they'll be. You might get an intelligence explosion in three years or 10 years. But yeah, my mainline is that it's just more smeared out over 10, 20, 30, 40, 50 years, where, yeah, you have multiple paradigms, if you like, or just multiple insights, multiple algorithms, and working out how to combine them. Yeah.

Liron 00:34:41
So to summarize the crux of disagreement, I think it's like: I just see the amount of puzzle pieces that people are working with as already looking like there's just enough going on to probably finish the job soon.

Tsvi 00:34:52
So now you're doing mechanistic reasoning.

Liron 00:34:56
I mean, I admit that there's an interesting mechanistic statement to be made about how it looks like LLMs don't directly scale to superintelligence, and so you just need a little bit more. I'm willing to go, yeah, I'm happy to admit that is likely.

Who Is Being Overconfident?

Tsvi 00:35:09
Well, I'm trying to, I'm just trying to track where, from my perspective, you're somewhat overconfident. Or let's just say confident, of AGI in 10 or 20 years. So I'm trying to understand where that confidence is coming from. So you've talked about the black box thing, where you're explicitly saying don't be reasoning about the mechanism, but now you're saying, well, I see lots of people are producing mechanisms, and we have lots of little pieces of mechanisms, and that seems like it should add up to a generally intelligent mechanism.

Tsvi 00:35:41
I don't know.

Liron 00:35:42
Yeah, maybe the connection between the ideas is: I think the black box tests are really important to just kind of objectively tell you the momentum of things. Of course, momentum doesn't have to hold. Momentum can peter out, and there have probably been, you know, AI summers where a burst of progress happened and then petered out.

Liron 00:35:58
So, but I think that's a good starting point. And then, now, just based on that starting point, things seem to have high momentum across the board. And then when I open the black box and I'm like, okay, well, what's the driver of the momentum? One is the LLM scaling, which is like, okay, GPT-4.5 was kind of bending the curve. But I also see a lot of other puzzle pieces and a lot of other types of results going down, you know, like AlphaFold.

Liron 00:36:22
I'm like, hmm, okay, that's a different puzzle piece that people are also mixing in together with the LLM puzzle piece. And I'm like, well, it seems like these are powerful puzzle pieces getting mixed. And zooming out, it seems like the black box results keep getting more impressive. And then, you know, deferring to other people's opinions.

Liron 00:36:38
Like the consensus of people in the field. It's not like a bunch of experts in the field are being like, don't worry, guys. You know, Yann LeCun aside, who says it's maybe a little over 10 years, that's his prediction. Even he's not contradicting the consensus that much. So I'm just mashing it all together.

Liron 00:36:51
I mean, don't get me wrong, if tomorrow I check the AI news and everybody's like, hey, we've all changed our minds based on evaluating the data…
I’m totally willing to reconsider, but actually I just feel like you’re the one who’s coming in over confident.Tsvi 00:37:02Okay, so some of it is coming from other people’s opinions, which is fair.Liron 00:37:10Yeah, it’s signals about potency of the puzzle pieces. You can describe it that way. I’m only getting positive signals about there being a lot of potent puzzleTsvi 00:37:17Can you tell me a bit about the structure? Like are there a few people you could name who have 40% of the Liron belief to the extent that your belief is coming from other people saying things? Is there some small set of people you can name or is it like a category of people or?Liron 00:37:31It’s basically whoever’s voting you know in Metaculus. It is just commentators. There’s not that many commentators that I’ve seen that I respect who are like yeah, I totally understand AI and LLMs, but I would be shocked if it comes in less than 20 years. I mean, you know, Gary Marcus I guess, but even he’s admitted to having a 30% chance and he’s on the extreme of people who are skeptical that it’s coming soon. He said 30% chance coming less than 10 years on my show.Tsvi 00:37:57Mm-hmm. Okay. So it’s just the fact that most, that very few people say less than—Liron 00:38:03Well that combined with the objective tests. I mean the only other thing I can do is try to get a deeper gears level understanding of AI itself which isn’t my expertise. I mean, I’m interested in it, you know, I study it when I get a chance. But I mean, I just think I’ve got a calibrated, I mean, I’ve already given probability to it taking a long time. I gave you a 25% chance that it’ll take more than—Tsvi 00:38:21Mm-hmm. So yeah, it’s the people, it’s the black box observed capabilities, and it’s some sense of there’s a bunch of mechanisms maybe they add up, or seems like there’s a lot of mechanisms and there’s a lot of people working on combining them.Liron 00:38:36Right. And it and it’s this idea of like okay describe the scenario where it doesn’t happen. I asked you to do it. And you’re like oh well it’s a scenario where it peters out at being able to prove this thing but even your boundary seems like, you’re probably going to end up moving your boundary once you start seeing the progress. That’s the sense I get. I don’t get the sense that you’ve actually drawn a meaningful boundary.Tsvi 00:38:59Well, I’ve generally been pretty careful actually about saying I’ll be very surprised if X, Y, Z.Liron 00:39:07I mean I agree that you’ve drawn a somewhat meaningful high boundary. And I’m willing to believe that you will in fact be surprised right before the world ends.Tsvi 00:39:19Okay, so I mean, I guess my, if you’re accusing me of, you’re accusing me of sort of sloppy or poor epistemics in this particular case, basically, which is fine.Liron 00:39:31No. I mean, I don’t think, I think you’re, everything you said is reasonable. I guess I just wish you would be like, you know, what I should do is try to define a lower boundary for myself that would wake me up, that the puzzle—Tsvi 00:39:47Well, I, okay. Well, I guess I’m kind of saying it doesn’t feel that compelling because I don’t really feel like nature has to give me an indication like that.Liron 00:39:54Yeah. It doesn’t have to. I mean that is true. That is true but I think nature is going to be sufficiently generous to give you earlier warning signs than Cantor level. Yeah.Tsvi 00:40:04Thank you, nature.Liron 00:40:07I do think nature’s giving us some signs. 
I mean, I don’t think the herd of people including you know highly smart qualified AI experts who are converging on next decade. You know, the Turing Award winners of the world, the David Duvenauds, Andrew Critch. You know, just to throw out a couple. Geoff Hinton, I mean, Yoshua Bengio.Liron 00:40:26I mean, these are all highly qualified people who are like yeah it doesn’t seem like it’ll take more than a decade. And so it just, it does seem like you’re being very confident on your low level technological prediction here.Tsvi 00:40:40Okay. I mean, I guess what are the reasons for being confident that it’s coming soon?Liron 00:40:45What are they? I mean so when I talked to Critch and David Duvenaud I actually think that the vague stuff that I’ve said is probably similar to what they, I mean, they’d probably say additional stuff, but I don’t think I’m way off base. Like when I talked to Andrew Critch for example, I specifically remember him kind of agreeing with me of like, yeah, I just don’t see a firewall.Liron 00:41:05It just seems like things are just on track to creep up.Tsvi 00:41:08I mean, yeah. So another thing, another piece of intuition that I can bring in that won’t change anyone’s mind, but is, bridges don’t stand randomly. You don’t just pile steel up and then you have a bridge because you don’t see a good convincing, clear reason that it shouldn’t be a bridge or shouldn’t stand up.Liron 00:41:30Right. But intelligence—Tsvi 00:41:32Intelligence. Intelligence is a specific thing or a specific class of thing.Liron 00:41:37It’s a specific class of thing, but I think it’s productive to test—Tsvi 00:41:40It involves—Liron 00:41:41A series of tests for it.Tsvi 00:41:43Fair. Fair enough. Okay. But what we were trying—Liron 00:41:46I mean that, that’s basically, that’s my beef with you: Is I feel like you had this one test and you’re not willing to have other tests because you don’t think that reality should necessarily let you make other tests. But I think you should dig harder.Tsvi 00:41:58I mean, at some point, I’m just gonna say I’m busy. But I, well, I guess I’m sort of taking you as talking to people like me or something. So I’m responding that way.If AI Makes Progress on Benchmarks, Would Tsvi Shorten His Timeline? Liron 00:42:11I mean, when LLMs came out, did that at least shorten your timelines at all?Tsvi 00:42:15Uh, so I didn’t really have timelines before that. I agree. I agree that people should shorten their timelines when they see LLMs. Um, yeah. But I’m, yeah, I guess when I ask people why, you know, how did you update? So far I haven’t gotten very clear answers.Liron 00:42:35So the same way that reality showed you some evidence with LLMs, that timelines are shorter. I think it’s going to be generous enough to show you again. You just would, you know, work on your goalposts.Tsvi 00:42:45Um, I don’t, yeah. Yeah. I don’t really get, so does it make sense to you that there could be a series of ideas? There’s like five of them or something? We have two of them. Each time you get one, you get a significant burst of more capabilities. And then, you know, there’s a fast period of growth, then it tapers off and then you get another one, and then when you get the fifth one, you get an intelligence explosion.Tsvi 00:43:09Does that make sense? As a kind of world?Liron 00:43:12I mean those five things are, you could call those what I mean by puzzle pieces but I also think that at a high level we already know the puzzle pieces. 
Okay: high-dimensional spaces, transformer architecture, reinforcement learning. I think at a high level, we probably have all the high—

Tsvi 00:43:27
Why do you think that?

Liron 00:43:30
I think that because of my inability to draw these clear boundaries of what it can and can't do.

Tsvi 00:43:34
Sorry, sorry, keep going.

Liron 00:43:37
Yeah. No, that, that's pretty much it. It's just, there may be a boundary, but the fact that I can't even name what the boundary is, that's not a state I'm used to in any field. I mean, usually we can just point to, oh, this might be a boundary, this might be a boundary. And you're, you know, it might be a boundary is—

Tsvi 00:43:52
But I've named, I've named…

Liron 00:43:53
A spark of—

Tsvi 00:43:54
No, but you keep, you keep replacing… No, no. You keep, you keep replacing. No, I've given several specific things, though. And you're just like, well, it's a high bar. It is a—

Liron 00:44:04
Yeah.

Tsvi 00:44:04
I mean, I didn't give you the boundary, but I—

Liron 00:44:06
Right. Okay. So just to clarify then: you just think that the AI is probably going to march along and be really close, you know, just be way better than most human mathematicians at math, but just not as good as the top mathematicians.

Tsvi 00:44:20
No, it's way better than all humans at some pieces of math, or some aspects of doing math. And it's gonna be somewhat better at other aspects, and somewhat worse at more aspects, and then quite bad at other aspects.

Liron 00:44:37
Okay. So you think that, for the next two decades let's say, the 90th percentile mathematician is just going to be more qualified for a job, a better hire for a job, than the best AI.

Tsvi 00:44:50
Well, it depends what job it is. It is plausible that you will… I don't know about replacing jobs. I'm not sure. There's gonna be some jobs you can completely replace. There's some jobs that you can mostly replace, or replace where you have one human and 5 AIs doing the job of 10 people or whatever.

Tsvi 00:45:07
There's gonna be a whole spectrum like that. If you're talking about mathematics, like research—

Liron 00:45:12
Sure. Let's talk about the research, the publishing work. So for publishing, you're telling me that if my only choice… let's say I have a job as a research mathematician, and my only choice is to either use my own brain or use the best math AI brain from 20 years from now, you're saying that my own brain is gonna be better.

Tsvi 00:45:32
It depends what you're trying to do. Like, yes, it is better to use your brain if what you're trying to do is math research that is beautiful and interesting to other mathematicians, in a way where more people then talk about the concepts you were discussing. There'll be more and more use of computers and AIs in math research, for sure. But if you're trying to do that kind of math, you're still gonna want the human for a long while, probably.

Tsvi 00:46:07
Yes, there's gonna be some things where you mostly want to just use the AI, like, I dunno what, checking—

Liron 00:46:16
I mean, I think this is what I'd wanna dig into: you're not even just claiming the Cantor genius-level stuff. You're saying, hey, even mathematicians that are merely average professor level, even they do things that the AI is going to struggle with for decades.

Tsvi 00:46:31
Some of the stuff they do, yeah.

Liron 00:46:32
Yeah, I mean, I guess… so I mean, I think that's really the way forward.
I think what you wanna do is look at this somewhat average mathematics professor and isolate the aha moments. Like the one-hour periods where you're like, aha, this one-hour period, the AI couldn't have done that. And that could be your next goalpost.

Tsvi 00:46:50
Yeah, and then, I mean, all these things are gonna be hard to measure. So you need to be checking that it's not something that actually, basically, was already in the training set or the otherwise accessible data.

Liron 00:47:05
Mm-hmm.

Tsvi 00:47:06
Also, you might want to parameterize over compute power. Like, how much compute did the AI take to replicate the same thing? I'm not that big on caring about compute power, but that's another parameter you might wanna… yeah, you could do that.

Liron 00:47:24
Yeah. And obviously I know this isn't your main focus, you're not gonna take the time to do it. I would just claim that when you make those kinds of benchmarks, then we can even turn the dial of, like, okay, 90th percentile math professor, 10th percentile math professor. At some point you've got the world's worst math professor.

Liron 00:47:41
Like, surely he's going to be replaced by an AI pretty soon. And that's the spectrum I encourage you to look at.

Tsvi 00:47:48
Okay. I mean, again, I just don't feel like nature has to be that clean for you. So an example is: suppose I agreed to something like this, where I'm like, okay, yeah, if we start climbing this scale, then I'm in trouble, then I have to update. And then what we actually get is a particular scenario, and then I'll retrospectively be like, wait, wait, wait, wait.

Tsvi 00:48:12
I know what I said. I know I said that, but actually I wanna retrospectively revise my prediction. Where what happens is, there's a subset of what we would currently call creative research math where actually the AI does start having insights that equal or surpass higher and higher caliber mathematicians.

Tsvi 00:48:37
The subset could be, like, things that involve large amounts of computation, or large amounts of sort of—

Liron 00:48:44
Right. Yeah. But you could just take specific math professors where you're like, yeah, their subset of math seems like a representative subset of math. I mean, these aren't hard goalposts to set up, in my opinion.

Tsvi 00:48:53
They sound kind of hard to me, but yeah, someone could do that.

Liron 00:48:56
Like, if you just… right? If you pick 10 mathematicians and you just rank 'em in terms of how impressed you are by the overall amount of brainpower that it seems like they use in their publishing. Okay, so you rank the 10, and then one day you learn that the guy in the middle of the pack, that you thought was pretty damn impressive, turns out he's been letting an AI do his entire job. Like, he's literally not showing up for work.

Tsvi 00:49:17
Mm-hmm.

Liron 00:49:17
Don't you think that would be an earlier alarm than going all the way to the math geniuses? The top math geniuses?

Tsvi 00:49:23
No, that's, that's basically what I said, isn't it?

Liron 00:49:27
Well, you said that you expected reality to serve up a situation where, yes, he's been sleeping on the job, but then later you learned that his job actually was easier than you realized in some sense. But I'm saying, just pick him to not be like that.

Tsvi 00:49:38
Yeah.
At some point, that's sort of the scenario I'm saying where I would be surprised and would update, which is: if you have an AI that is producing many concepts, across some significant range of fields of endeavor, that are interesting to humans in the way that human math concepts are interesting.

Tsvi 00:49:55
Yeah, that would.

Liron 00:49:56
I mean, if you see math professor unemployment skyrocket, wouldn't that be an earlier alarm for you than what you've previously been…

Tsvi 00:50:03
Well, if it's because of, if it's because of this.

Liron 00:50:07
Right. I mean, that's all I'm asking for: like, okay, notice when math professor unemployment skyrockets.

Tsvi 00:50:12
Sure. But then I'm gonna look into it, and if it's for some other reason, like the NIH cut a bunch of funding or whatever, or the, whatever national funding, the NSF cut funding, then I'm like, well, nevermind.

Liron 00:50:24
Okay, sure, I agree that that could be a good excuse. But don't you think we're making progress toward letting you notice that the singularity is happening and shorten your timelines?

Tsvi 00:50:33
Not that much, but—

Liron 00:50:34
Okay. I feel like we're making a lot of progress, but whatever.

Closing Thoughts & Tsvi's Research

Liron 00:50:35
All right, man. Well, I think, I think this is a good point to just quickly recap. So we talked about MIRI's research, and we both agree that intellidynamics are important and MIRI has legit foundations, and they're a good organization and still underrated.

Liron 00:50:50
And we talked about, you know, corrigibility is one of those things, and decision theory. People can go back and listen to the details. And then we talked about your different hopes for how we're going to lower P(Doom), and it's probably not going to be alignment research. And so maybe it should be germline engineering. Or how would you tweak that summary?

Tsvi 00:51:07
Yeah. We should ban AI. We should talk to people, try to really understand why they're doing AGI capabilities research, and try to give them a way to do something else. And we should try to make smarter humans, maybe BCIs, but the way that will work is germline genetic engineering.

Liron 00:51:29
Eliezer Yudkowsky talks about this idea in his famous post "Death with Dignity", where it's like, it seems like we're probably gonna die, but at least what we can do is try to do actions that significantly lower the probability, or maybe not significantly, but measurably, in micro-dooms or whatever.

Liron 00:51:45
Like, at least you lowered it, you know, a millionth of a percent. That's something. If we all do that, maybe we can get to less than 50%. I want to commend you, because you are somebody who is genuinely helping us increase our dignity. I won't say the "die" part. You're helping us increase our… I think it's a dignified action by this standard to boost our capability for germline engineering, to get us on the path to making smarter humans.

Liron 00:52:07
I think that you're actually helping, and not just talking about it and joking about it and going to do your regular job and trying to be a rock star and make money. I think you're actually part of the solution, so thank you for that.

Tsvi 00:52:17
Thanks. Thanks for your work. Talking about this stuff seems pretty important. We should be, we should all be thinking about this a bit more.

Liron 00:52:26
Hell yeah, we should. All right, I appreciate it, Tsvi Benson-Tilsen. Thanks for coming on Doom Debates.

Tsvi 00:52:30
Thanks, Liron.
Doom Debates' mission is to raise mainstream awareness of imminent extinction from AGI and build the social infrastructure for high-quality debate.

Support the mission by subscribing to my Substack at DoomDebates.com and to youtube.com/@DoomDebates, or to really take things to the next level: Donate 🙏

Get full access to Doom Debates at lironshapira.substack.com/subscribe