
Ep 3. Our approach to AI is broken | George Morgan

72m 38s


The transcription recounts George Morgan's quest to locate a rare Sony robot in Japan, showcasing his dedication and eventual success in accessing and turning it on after years of research. It also introduces Symbolica, George's company, aiming to revolutionize AI by focusing on symbolic representation instead of statistical methods. The conversation delves into the differences between current machine learning models and Symbolica's approach, highlighting the goal of achieving general intelligence through symbolic manipulation. George explains the symbolic system as building relationships between entities to encode information, contrasting it with statistical-based algorithms. Symbolica's unique approach challenges the conventional supervised learning method by redefining how AI models are trained and providing a new perspective on achieving true cognition in artificial intelligence.

Transcription


And then I had to write an email to Elon explaining why I deserved to join the Autopilot team without a degree, yeah, exactly. And he responded to my email and asked me like three questions, he's like, "Why do you think that you're the right fit for this team?" I answered those and he responded, "Cool, just cool." And I didn't know what that meant, but like three days later I got the offer letter. Welcome to the Edge of Infinity, a podcast of builders talking to other builders in the AI space. Together we discuss where the field is at today and make predictions about what things are going to look like in five, ten, a hundred years' time. I'm your host, Lena Evansini-Colucci, thanks for joining me as we imagine the future of AI together. Today we talk with George Morgan, who's the founder and CEO of a moonshot AI company here in Silicon Valley. George believes that the way we train AI models today is fundamentally broken and will not get us to AGI. So the approach that he's taking at his company, Symbolica, is completely different. If it works, Symbolica's approach will change a lot about AI and the economics of AI. It'll enable us to have AI models that are orders of magnitude faster and cheaper than what exists today, plus with a lot more capabilities. George is also just a fascinating human being and there are so many fun stories in this episode. For example, George had to email Elon Musk in order to get his job at Tesla. So we'll talk about that email, what Elon said, what George learned from years of working directly with Elon, his differences with Andrej Karpathy while they were both at Tesla. George worked with MrBeast, who's the most famous YouTuber in the world, to recreate the Squid Game games on YouTube, and George built the hardware for that in his garage. George also has had this long obsession with robots, and one robot in particular took him on a 20-year-long saga around the world. We'll talk about all of that, plus our predictions for AGI and the future of AI, in this episode. And without further ado, I hope you enjoy my conversation with George. Welcome to the show. Thanks for having me, it's great to be here. So I was chatting with one of our mutual friends. He said that you're really into robots and that your search for a robot took you all the way to Japan one time. So I was wondering, what is this robot story that took you to Japan? Yeah, that's a great question. This is probably one of the weirdest, strangest and most ambitious things I've done. You know, one of the projects that I've worked on over the longest period of time that eventually ended up coming to fruition in such a strange way. It starts all the way back when I was seven years old. So my mom took me to Barnes & Noble, and you know, this is something that we would do together all the time, like just as a pastime. Like, my mom would always take me to Barnes & Noble and say you can buy one book, like whatever you want, as long as you promise to read it, you can buy the book. And so we went to Barnes & Noble and that day I found a book on robots and I was like, this looks really cool. This is something that I really want to learn more about. And of course, it's basically just a picture book when you're seven, but you know, I knew I wanted it. There was a picture of a robotic giraffe on the front of it, which I thought was really cool. And so I picked that book out, I came home and I read it, and on like the third page, there's a robot called the Sony QRIO.
And it's an entertainment robot built by Sony that never ended up being sold or produced. It was kind of like the companion to Sony's robot dog, the AIBO, like the humanoid version of the AIBO robot. Interesting. And I instantaneously fell in love with this robot and like, I wanted one just from the picture in the book. Absolutely. Oh, yeah. It was like I immediately became super obsessed with it and like I googled everything I possibly could about it. And I was pretty sad to realize you couldn't actually buy them because, you know, they weren't for sale. Now, at the time I was learning about this robot, Sony was still actively doing research on it. Sony has like an AI laboratory called the Sony Dynamics Research Laboratory. Sorry, the Sony Intelligence Dynamics Research Laboratory. It's since been rebranded to Sony AI, but that's kind of like a separate little story to tell. But yeah, Sony was still very much working on this robot, and kind of like my goal in life, you know, at the time, like my only goal in life was going to Japan and getting a chance to see this robot in person. At age 7. At age 7, yeah. And so I begged my mom, I was like, please, please take me to Japan, like, you know, it's all I want to do. I even started learning Japanese. I mean, learning it. Like, yeah, you know, it was just like one of my little obsessions, like, you know, that I had when I was seven. But yeah, the years passed and I ended up, you know, doing other things, and, you know, at some point along that process, Sony ended up canceling the project. So like news, publications, and information about the robot just completely ceased from like 2004 on. So I want to say in probably like 2019, I was doing some more research on the robot. I saw, you know, really nothing had been published about it since 2004, and I was kind of curious. I was like, you know, I wonder if any of these robots actually still exist. And so I kind of started pouring a lot of my time into trying to see if I could find anyone that knew anything about the robot. Like, I sent out a bunch of emails to a bunch of different people, most of which I never got a response from. But I eventually got a response from one of the people that actually worked on motion planning for the robot back at Sony at the lab. And he agreed to get lunch with me because he works in the Bay Area. And so we went and got lunch at a Japanese restaurant. He told me that all of the prototypes were crushed, just completely destroyed. On purpose? On purpose. Yeah. Because in order to save Sony from bankruptcy, they basically had to eliminate any sort of, like, you know, R&D output done by the team. And so they had to crush all these robots. And then they could do a big tax write-off. There's all sorts of corporate politics. But it in part saved the company. But he said, I do know one person who might know if one of them still exists. And so I got an introduction to that person. And that person said, I can't think of any that still exist, but I can keep my eyes peeled. A couple years later, I get an email from him saying, I found one. And I was like, I was elated, like I could not believe it. And this was like during the height of COVID, so it was impossible to get into Japan, like literally tourists were not allowed. You could not get into the country. The only way to get into the country was to get a business visa for business purposes.
And so I pulled some strings and I ended up having some friends of mine that were going to Japan for business purposes as well. And I kind of hopped on the train of their business visa. And I ended up, by like some crazy miracle, getting into the country. And then we went to the Sony building and, you know, pulled this thing out of storage and turned it on for the first time in 20-plus years. Oh my gosh. And it turned on. Yeah, it was amazing. It was one of the coolest things that we've done. That's incredible. You know, most people don't have their Moby Dick, like, saga start so early, but you had one, right, it's like a 20-plus-year saga. And what happened? Did you get to keep it? No, I didn't get to keep it. But you got to turn it on. Yeah, got to turn it on. You know, we got to go through all of its diagnostics menus and, like, you know, there's so much stuff that I learned about the robot that's just not documented anywhere at all, because all of the videos taken of it before the project was shut down were all from like digital cameras from like 2004. So it's all like potato-quality video and super compressed audio, and there's really not that much high-fidelity video of the robot on the internet. And so there are so many details of how it moves and how you interact with it that just aren't present on the internet. So I got to see all of these details and we got to go through all of its diagnostics menus, and everything was in Japanese too. So the person that I was with spoke Japanese and English and he was translating to me what the menus were saying. So cool. Did you get any photos with it, or was it top secret, you know, can't tell the world this exists? Yeah, so I got very high-resolution photos and video of it. Sony was just incredibly supportive of me coming in and taking pictures of it, like we got permission from Sony to do that. And not only that, but the day that I was there, the person who was with me, kind of like pulling this thing out of storage and turning it on, sent a message on the internal Sony Teams chat to the creator of the robot, the person that started the project, and said, hey, there's this kid here that really wanted to see this, you know, QRIO robot, you know, are you in the office today? And by some miracle, this was like the one day that month that he was in the office, and he came by and introduced himself to me and I got a photo with him. The creator of the robot. Oh my gosh. That's amazing. Well, if you have that photo, we can add it to the show notes of the podcast after, and also, you know, I need to go look up this robot now and we'll definitely include it in the show notes as well. Sounds great. Yeah, I can send those over too. That's the coolest story ever. So George, you're the founder of Symbolica and, you know, the way I describe your company to other folks is the most important company being built in Silicon Valley right now. It's a moonshot, right? That's what you guys are going after, but if things work the way you hope that they'll work, right, it's going to be a game changer for the way AI is done in the world. And so your mission is to make machines think like humans: symbolically, not statistically. So what does that mean and what are you guys doing at Symbolica? Yeah, you're absolutely right. Symbolica is kind of a total moonshot idea in AI. And I think you did a really concise job at kind of explaining what our mission is.
For some context, all current machine learning algorithms, you know, can effectively be thought of as a form of computational statistics. So basically, we all kind of took statistics classes in high school and learned about probability distributions and all sorts of these kinds of phenomena that happen in observed data. And machine learning is like an extremely beefy, scaled-up version of what are pretty, you know, simple statistical methods. And they're scaled up into high dimensions and they're trained using lots and lots of data. But ultimately, the name of the game in machine learning is figure out how to most accurately continue the pattern, statistically speaking. So for large language models, for instance, they've been trained on a massive corpus of text from the internet. And they've learned the statistics of language. So like, what's the likelihood that these words will appear around these words? What's the likelihood that this word will follow that word, through a process called co-occurrence? Like there are so many phenomena that, you know, these language models are picking up on and learning, but fundamentally what it all boils down to is just picking the right next word, the next predicted token essentially. And the model, basically inside of all of its weights and, you know, all of its complex internal machinery, is just kind of computing, like, oh, given all of the previous words, tokens, what's the most likely next word to come? So if it's like, you know, "my name," the most likely next word is going to be... Right? "My name is." Exactly. It's going to be a high-likelihood token. And then just extending that to, you know, given 1,000 previous tokens, what's the next most likely one? Precisely. And this analogy extends across any form of AI model today. You know, we have these image generation models, like diffusion, for instance, like Stable Diffusion and DALL-E and all of these kinds of diffusion-based models. You know, the way that these work is they start with completely random noise. And then at each step, the model is asked to predict what the most likely way to unscramble the noise is to get to something that matches the prompt that's provided. So it's learned a whole bunch of correlations between prompts and ways to unscramble the noise in order to kind of get closer and closer to something that statistically matches and is correlated with the prompts that it was trained on. And, you know, all sorts of image classification algorithms kind of use the same process. Like, every machine learning architecture that's widely deployed today is based on statistics, computational statistics. And, you know, this works extremely well, but in my opinion, it's lacking in a lot of things that are required to get to true cognition and build a true cognitive model. And there have been lots of proposals about how to extend these machine learning architectures with various kinds of cognitive models, like JEPA, you know, Voyager, all these other kinds of interesting techniques that are coming out, to actually kind of supplement these statistical embeddings with other cognitive tools that could be used to start to achieve logic, reasoning, localization in spaces, and stuff like that. But, you know, so far, not to make a sweeping generalization, most of these techniques have not worked, or at least they haven't really worked in a way that's had as big of an impact as something like GPT has with its statistical prediction.
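To make the co-occurrence idea concrete, here is a minimal sketch in Python (an illustration added for clarity, not code from Symbolica or OpenAI) of next-token prediction driven purely by counted statistics:

```python
from collections import Counter, defaultdict

corpus = "my name is george . my name is lena .".split()

# Count how often each token follows each token (bigram co-occurrence).
counts = defaultdict(Counter)
for prev, nxt in zip(corpus, corpus[1:]):
    counts[prev][nxt] += 1

def predict_next(token):
    """Return the statistically most likely next token."""
    return counts[token].most_common(1)[0][0]

print(predict_next("name"))  # -> "is"
```

The toy model "continues the pattern" without any notion of what a name is, which is exactly the limitation George goes on to describe.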
And so, Symbolica, the moonshot company, is to figure out a way to build a machine learning architecture that's not based on statistics, but is instead based upon the manipulation of symbols directly, that eventually yields things like online learning, causal reasoning, you know, the ability to understand, like, the ontology of information and so on, that ultimately leads to general intelligence, by some definition of that term, down the road. We have intermediary business goals, but, you know, the overarching North Star goal of the company is to build general intelligence. Fascinating. I have so many questions about different things you said, but just to try to provide a mental model here, what does it mean to represent things in symbols instead of statistics and floating point values? What does it mean to represent things symbolically? Yeah, so I mean, perhaps the most simple symbolic system that everyone's familiar with is algebra, right? Like, you know, super simple algebra, like addition, right? Like, everyone knows the rules of, like, one plus one equals two. You know, you can kind of have variable substitutions like x plus y equals something, and then you can start to chain multiple of these expressions together to form systems of equations, and you can, you know, use the relationships between these variables and these equations to figure out what other variables in the equations mean, and so on and so forth. And this, I think, is basically, at its core, all "symbolic" is. It's, you know, building up relationships between things that may not have explicit names, so they're symbols, and via those relationships between those things, you can, you know, express information and encode information in some way. Really, all it means to be symbolic is to be able to write down the explicit relationships between entities, you know, a quantum of data, some individual, you know, base thing, and other versions of those individual base things, to build a symbolic system. Mm-hmm. And tell me if I'm interpreting this correctly. Let's say, you know, I'm asking a model what is two plus two. Rather than kind of saying, oh, it is approximately four, you know, it's, no, I know that when you have x plus y, this is how I compute the relationship between them, this is how I compute the result. And so that's a very, you know, trite example, but is that the right mental model for thinking about it? Exactly. Right. Like, somehow the human brain, via whatever process it learns through, has learned to build a mental model of symbols, so like individual numbers, one, two, three, and so on. And you can kind of generalize up to infinity what those numbers would be, like we have patterns in our brain for, like, oh, adding another digit, carrying a digit, how to do the addition operation between all these things. That's all rule-based, right? And our brain has kind of learned those rules for manipulating these symbols. When we, you know, are asked, like, what's one plus two? Your brain doesn't just execute a neural function that then spits out the answer as some probability distribution. Your brain actually goes through and it steps through the procedure: you know, the symbol plus, that operation is a rule; I take the operands, I combine them through this rule, and you do that over and over again until you get to the end result, which is your evaluated function or whatever it is. Maybe for super simple functions like one plus one, we've all memorized the answer.
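The rule-stepping George describes can be made concrete with a tiny rule-based evaluator (an added sketch, not Symbolica code): answers come from applying explicit rules to symbols, not from collapsing a learned probability distribution.

```python
def evaluate(expr):
    """Evaluate a nested ("+", a, b) expression by stepping through rules."""
    if isinstance(expr, int):
        return expr            # a bare number is already a result
    op, left, right = expr
    if op == "+":
        # The rule for "+" is applied recursively to each operand.
        return evaluate(left) + evaluate(right)
    raise ValueError(f"no rule for symbol {op!r}")

# (1 + 2) + 4: the same rule applies no matter what the operands are.
print(evaluate(("+", ("+", 1, 2), 4)))  # -> 7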
Like, we've just memorized it, you know, which is to say that we've evaluated it once and then just stored the result of that computation and now we can use it. But if I asked you, like, what's some arbitrary five-digit number plus some arbitrary five-digit number, you'd have to sit there and kind of draw it in your head and actually go through the steps of evaluating it. It's not just a one-shot sort of, I collapse this probability distribution down to an answer, which is what a neural network would do if you, for example, trained it to add two numbers together. Makes sense. And how should I think about Symbolica's approach? So let's say, you know, you guys are going to create a version of ChatGPT, right, of large language models that can generate text. What would that process look like? Do you still need a massive data set to begin? Are you learning the relationships there through some kind of supervised approach? We kind of understand machine learning today, you know, in the supervised sense of you have your data and your labels and you're learning the relationships there. What does it look like in the Symbolica model of the world? So I think that that exact question almost exactly defines what makes Symbolica different. So when you're training a machine learning model, like, let's just take the transformer, right, like the T in GPT, right. Like, all of these language models are based on this transformer architecture and, you know, like we mentioned before, the goal of the transformer is, given some input sequence of tokens, predict the next token in that sequence. So when OpenAI is training these large GPT models, they're basically taking their entire data set, masking off all the ending tokens, saying, okay, here's just one token, predict the next token. The model will probably get it wrong. Then they show the model the actual next token in the data set. So this is a form of self-supervised learning, and then they correct it. And then they do this over a corpus of potentially trillions of tokens, and they may train it on the same sequence of tokens multiple times, right? What's actually happening is, inside the model, you have a function, you know, that takes some sequence and then it produces some token. And so there's some explicit notion of input and output and there's some explicit notion of being correct or incorrect. You either predicted the right next token or you did not predict the right next token. And as such, you know, you brought up this notion of data and labels. You can think of the data set as being both the data and labels at the same time, depending on how you mask out the next token and so on. Certainly. This is actually one of the kind of fundamental architectural issues that I see with neural networks: it's really, really hard to specify a network architecture that's not of the form of, like, f of x equals y, a function from an input layer to outputs. Like, that is kind of the definition of a neural network. You're using the framework of a universal function approximator to learn some map from some input space to some output space, from some data to some label. And the transformer is certainly exactly this. So in the symbolic architecture, kind of following by analogy, there is no notion of data and labels. You only have a notion of data.
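Schematically, the masking setup George describes turns one sequence into many (context, label) pairs; a quick added sketch, illustrative only:

```python
# One training sequence supplies its own labels: mask the ending,
# ask for the next token, reveal it, repeat.
tokens = ["the", "robot", "turned", "on"]
pairs = [(tokens[:i], tokens[i]) for i in range(1, len(tokens))]
for context, label in pairs:
    print(context, "->", label)
# ['the'] -> robot
# ['the', 'robot'] -> turned
# ['the', 'robot', 'turned'] -> on
```

That dual role of the data, serving as its own labels, is exactly what George unpacks next.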
And the intuition behind this is that the data set that trains these large transformer models, you know, I mean, it contains all of the labels already. It contains all of the information that you would use to correct the function. And so, you know, why would you need to explicitly provide the model with the labels if they're kind of already present in the data? And the reframing that I think has been really powerful for us building Symbolica is that the data, at least in the case of language, is the labels. And moreover, the data is actually the model itself. So you can think of using your data set as the model, like the data set is the function, or it is the definition of the rules. And instead, you, you know, almost flip the problem on its head, and instead of forcing some predefined neural network architecture like the transformer to conform itself to the problem of the data set, can you think of it the other way and, you know, form some algorithm over the data, fit the algorithm to the immutable data? And that's really how we approach the problem. So all the Symbolica models, you know, use data as instructions that update the model, the model being basically a way of interpreting the data, rather than using the data to update some weights in the network. Fascinating. Is it correct to think about it as, today we have the traditional model of training in a supervised way, there's data and labels, and a result of that training process is getting an embedding space, right, that is, you know, less human-interpretable, but still very semantically meaningful, of the distribution of data. Is what you're saying that today, with, let's say, the transformer architecture, we kind of fit all data into a specific kind of embedding space, and with Symbolica, you don't take a predefined approach and you just let the data tell you what its embedding, or what its kind of, you know, high-dimensional distribution, should be? That's exactly right. That's a really beautiful way to look at it. In fact, there's a hypothesis that precisely describes what you just said. It's called the manifold hypothesis. And basically what the manifold hypothesis states is that all of the data that we find in the natural world, whether it's images or text or audio or whatever, is embedded on a manifold. Well, what is a manifold? It's really just a fancy mathematical word for some surface that you could walk over if you were an ant. So like, imagine you're an ant on a donut. A donut would be a manifold, and no matter where you are on the surface of that manifold or donut, as an ant, you could always walk in a straight line around that surface and you could just keep walking and keep walking and keep walking. And the same thing is true of all of the data that exists in the natural world. It's just on the surface of some incomprehensibly complicated, high-dimensional shape of some sort that's got all these kinds of local properties and so on. And basically what you're saying is, well, we can't control the shape of that manifold. Like, it's just out there. It's kind of like the laws of physics. It's just a fact of life, you know, like the data of natural language is just on, you know, this surface somewhere.
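A minimal numerical picture of the manifold hypothesis (an added illustration, assuming nothing beyond the donut analogy above): data that appears two-dimensional but really lives on a one-dimensional surface.

```python
import numpy as np

# 100 points that live in R^2 but sit on a 1-D manifold, a circle:
# a single coordinate (the angle) fully describes every sample.
theta = np.random.uniform(0.0, 2.0 * np.pi, size=100)
data = np.stack([np.cos(theta), np.sin(theta)], axis=1)

# An "ant" on this manifold can walk forever by just increasing theta;
# real text or images live on far higher-dimensional analogues.
print(data.shape)                                  # (100, 2)
print(np.allclose((data ** 2).sum(axis=1), 1.0))   # every point is on the circle
```

Training, as George describes next, is the attempt to warp a synthetic surface onto a natural one like this.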
And what you're trying to do when you're trying to train a neural network is you're trying to take samples on that manifold that, you know, exists in the ether somewhere, and you're trying to force a low-fidelity representation, some, you know, synthetic manifold you're building, to conform to the shape of that natural manifold. And you're doing that by turning all these little knobs in the neural network. You know, each little knob, turning it a little bit, corresponds to changing the shape of that surface just very slightly. And you're trying to turn all those knobs such that that surface kind of warps a little bit, like Play-Doh, you know, over the natural manifold of things. And really the thesis of Symbolica is, why spend all that extra computation to force all these little knobs to be turned into, you know, something that's kind of vacuum-formed over the natural manifold? Why not just use the data, which in itself is kind of a sample of that natural manifold, to define the outline of it? You know, basically use a simple set of procedures to kind of, you know, wrap a little string around these little data points you already have in your data sets. And then you have a wireframe kind of representation of this manifold. Very cool. And so how many manifolds will there be? Is there a language one, like a generative language one, or does all language kind of fit in one? So what I mean by that is, ChatGPT, you know, synthesizes language, but it can also be used as a classifier for all sorts of NLP tasks: sentiment classification, named entity recognition, you know, basically any NLP task, it can do. In the Symbolica model of the world, do we have a language manifold and then a separate, you know, natural images manifold? Is it multimodal? Like, how do you see it playing out? That's a great question. So I've only been thinking about text so far. So anything beyond this would just kind of be speculation about what I think will happen. But I'm happy to speculate about it. Well, we can focus on the language. Is it one manifold for language, in the same way, you know, we have a massive embedding space in GPT-4 for language, or is it somehow different in this new paradigm? Right. So for just language, it's only one. So, you know, you've probably heard this term latent space, right? There exists some latent space that the model's operating within. And in a neural network, typically you have many, many latent spaces. In fact, the process of dimensionality reduction is basically kind of projecting one latent space into another. Each successive latent space gets a little tinier until you just end up with one really tiny, super easy to understand latent space, like R2 or R3 or something like that. In the symbolic compiler, you only have one latent space, and you're basically just performing operations, you know, over the fabric of that one latent space, over and over and over and over. You know, never sort of reducing dimensionality like you might expect in a neural network. Got it. And so, one of the benefits that you mentioned of this approach is online learning. What is online learning and what do you mean by that? I think it's probably easiest to kind of contrast against neural networks. So neural networks are basically, like I mentioned, just a function, you know, that maps some input to some output. The way that machine learning currently works is that you have to train that function.
Like, it starts out being completely incorrect and inaccurate. And over time, by turning these little knobs to kind of approximate this manifold that we were talking about, this high-dimensional surface, high-dimensional shape, you know, that function slowly gets a little bit more and more and more accurate. However, due to the algorithm that's used to train these models, gradient descent, once you've stopped descending through the gradient, like once you've stopped optimizing to find the global or local minima, you're done. Like, you know, that function is fixed, it doesn't change anymore. The input you provide to the function doesn't change the function. It's still just the same function. And there are these things, you know, like transformers, which are called autoregressive models, where you take part of the output of the function and you feed that output back into the input of the function. But, you know, of course, you can always just collapse that down and represent that as a static function as well. You know, the process of online learning is basically almost as if you continue running gradient descent, or whatever the analog of that is in whatever model we're talking about here, and you're not only always updating the function, but you're updating the process that updates the function. So this is what the brain is doing. Like, the brain is receiving information, and that information is producing an output, you know, like it's producing an action, a reaction in you, but it's also updating the function that you use to then interpret data in the future. So it's causing you to change the way you perceive future data. Your inputs actually impact the way that you perceive future inputs, and that's kind of how we define online learning. So it's the ability to update your internal correlations and predictions in real time, in response to information, instead of having to go through an entire new long training process. Exactly. Like, it enables you to interpret the next set of inputs differently based on your previous set of inputs that you got, without having to take a detour through some kind of offline optimization process. You know, in a neural network, it's theoretically possible to do this. Like, you could always feed some input into the neural network, run gradient descent, then feed the next thing into the neural network, run gradient descent. But due to sort of compute limitations, this isn't feasible. And also due to the kind of limitations of the way the problem is structured, like some input-to-output pair: if you don't have the label, if you don't have the output, if you don't know how to, you know, tell the model how it was wrong, then you can't do that. So for instance, with classification models, you might not always have the labels available to you online. So this isn't really something that can be done with neural networks very effectively. Yeah. And it's interesting because we've talked about Voyager in the past, and the thing that I think is so cool there, and my mental shift over the past few months, probably starting around that Voyager paper coming out, is really thinking of LLMs, you know, this new class of models, as reasoning engines rather than models that need to have specific knowledge encoded in them. And once I shifted my mental model of that, it seems so archaic to require a long training process, you know, backpropagation, to just update a model with information.
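To pin down the contrast (an added toy, not Symbolica's mechanism): a frozen function fixed after training, versus a learner whose every observation also updates the function itself.

```python
class FrozenModel:
    """Trained offline; inputs never change the function."""
    def __init__(self, w):
        self.w = w
    def __call__(self, x):
        return self.w * x

class OnlineLearner:
    """Every observed (input, target) pair also updates the function."""
    def __init__(self, w, lr=0.1):
        self.w, self.lr = w, lr
    def observe(self, x, target):
        error = target - self.w * x
        self.w += self.lr * error * x   # the function itself shifts

frozen = FrozenModel(w=0.0)
online = OnlineLearner(w=0.0)
for x, t in [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0)]:
    online.observe(x, t)

print(frozen(3.0), round(online.w * 3.0, 2))  # 0.0 vs. approaching 6.0
```

The frozen model never improves no matter what it sees; the online learner's past inputs change how it treats future ones, which is the property George is after.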
The Voyager approach to things is you have this reasoning unit and you can point it to stores of information, to knowledge stores, and have it reason about it and then write down its conclusions and build up from there, which is, you know, what I do as a human. I don't have everything in my brain. I go look at a textbook or I look at Wikipedia, reason about that information, and then draw conclusions. And what you're saying is the symbolic approach is similar to that. It's archaic to think of models as needing to, you know, at the extreme, be trained every hour because there's more information generated in the world every hour of the day. That doesn't make sense, right? A, that's not feasible. B, you know, there are alternatives, and what you guys are building is an alternative way to do this online learning and to update models. Precisely. Yeah. That is a great definition of exactly what we're trying to do. And so if we project out, you know, however many years, and you guys have achieved your vision, what will we be able to do? What does the world look like because Symbolica has achieved this new paradigm of training models? Yeah. So we're really aiming to achieve two things. If we achieve either one of them, I would consider it to be successful. In an ideal world we achieve both of them. You know, the first thing is the thing we just discussed. It's this notion of online learning, the ability to do causal reasoning. So, you know, eventually being able to understand how to compose the rules of addition, for instance, you know, at a high level, without being explicitly instructed or trained on exactly what those rules are a priori. That's kind of the first goal. The second goal is making this whole process much more compute-efficient to achieve. So one thing that I think doesn't get talked about enough is just how computationally expensive it is to train these large machine learning models. It cost OpenAI a reported $400 million just in compute to train GPT-4, which is, you know, completely out of the question for any, you know, start-up or even small-scale company to be able to attempt to do. You know, the only reason OpenAI can afford to do this is because they're backed by an insane amount of VC funding and Microsoft. And, you know, there are a lot of computer-architectural reasons why it's super expensive. But, you know, it kind of ultimately boils down to the fact that doing floating point operations, especially high-precision floating point operations, is basically the most expensive operation you can do on a computer. And the only hardware we have right now that's really good at doing general-purpose floating point operations is GPUs. And GPUs are extremely expensive. And then also, in general, the operations you're performing are costly in terms of how much time it takes to perform a single operation. I don't know the numbers right off the top of my head, but it takes somewhere on the order of, like, a hundred or 200 times longer to do a floating point operation than an integer operation, somewhere kind of in that ballpark. And all of these machine learning models are based on these operations. And so, you know, part of Symbolica's goal is to figure out how to do everything using just integer operations. If we can accomplish that, you know, we'd be able to train machine learning models on GPUs still, quickly, but we'd also be able to train them on CPUs.
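One way to picture the float-to-integer move is fixed-point quantization (an added toy; the episode doesn't say this is Symbolica's actual method): float values are scaled into integers, the arithmetic happens entirely in integer form, and the result is rescaled at the end.

```python
import numpy as np

SCALE = 2 ** 8  # fixed-point scale: 8 fractional bits

def quantize(x):
    """Map floats to 64-bit integers so the arithmetic is integer-only."""
    return np.round(x * SCALE).astype(np.int64)

a = np.array([0.5, -1.25, 2.0])
b = np.array([1.5, 0.75, -0.5])

float_dot = float(np.dot(a, b))                   # floating point ops
int_dot = int(np.dot(quantize(a), quantize(b)))   # integer ops only
print(float_dot, int_dot / SCALE ** 2)            # -1.1875 -1.1875
```

Integer pipelines like this run well on commodity hardware, which is where George picks up.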
And CPUs are drastically cheaper and more energy-efficient than GPUs. Those are the two goals of the business: make training these models and doing inference on them drastically cheaper. Even if it's no better than, you know, existing machine learning models today, that would still be a big deal. And then on the other hand, you know, using pure symbolics, we believe we can unlock these expanded capabilities. And even if we were able to accomplish that, but it wasn't necessarily cheaper, it'd still be a big deal. But we really want to accomplish these two things together. Fascinating. What will be, in your mind, the first milestone of showing the world what you guys can do? Like, what would be your ChatGPT moment? Yeah. So, we're currently working on this thing called SimChat. And SimChat is basically just the symbolic version of GPT. And it's, you know, basically designed to be interacted with the same as GPT, with, you know, one limitation removed, which is that there's no context window anymore. So, like, there's not a fixed number of tokens you can feed into the model. And the other limitation removed is that you would otherwise have to fine-tune the model in order to get the model to attend more, for instance, to specific kinds of information. So the goal is that you would be able to communicate with your Sim agent and basically just instruct it using natural language and teach it, basically using as many tokens as you need, to actually teach the model how to do specific tasks. You know, through the interaction of these two components, online learning and no context window limitation, you basically fine-tune your model to whatever task you want to fine-tune it to, just using natural language. And I think if we can get to that, that would be an absolutely amazing ChatGPT moment. I mean, I think you had most people at, you know, GPT without a context window limitation. As much as I want to keep going on this, I want to switch gears a little bit. So prior to starting Symbolica, you were at Tesla for many years. And so I'm curious, how was that, and what did you learn about running a company from Elon Musk? Yeah, I mean, Tesla was an incredibly formative part of my career. I'm extremely, extremely lucky that I had a chance to join Tesla when I did and join the team when I did. So for context, I ended up dropping out of college after two years and I bought a one-way plane ticket from Rochester, New York, where I went to college, to San Francisco. I had no idea what I was going to do, and, you know, I didn't have very much money in my savings account, and it was a fairly, fairly risky decision that I can't say made my friends too happy, but I was basically determined to make it work by whatever means that I could. And, you know, skipping a couple details, I eventually ended up getting an internship at Tesla. You know, when I was arguing with the recruiter about what internship position I'd be holding, I kept saying, I want to build robots, I want to do robotics, you know, I want to work on the Autopilot team. And the recruiter was basically like, we don't have any positions on the Autopilot team. And I was like, please, like, please, is there any way we could make an internship position? Like, I really, really want to work on that team, because, I mean, it's not that I don't care about cars, but I just didn't have any interest in cars.
I had interest in robots, and, you know, Autopilot just turns the car into a robot. So I was very interested in that. So I was begging the recruiter to please let me do an internship on the Autopilot team, and, you know, GG, absolutely, like, the best recruiter of all time, ended up pulling some strings and got me an internship on the Autopilot team. And so I got to get in at what was arguably the ground floor of Autopilot. Like, Andrej Karpathy had just joined, like, the system was barely based on neural nets doing kind of this computer vision stuff. It was fresh, you know, off hardware one, you know, hardware two had just been widely deployed. And, man, it was like getting thrown into the deep end as an intern. Like, it was like, hey, here you are, cool. You have all the responsibility, and more, of a full-time employee. Here's these 20 projects. Please get them all done. You know, it was absolutely this crazy experience that I had no expectation of what it would be like. Like, I'd had a couple of people tell me that working for Tesla was hard, but I was like, ah, it's probably fine. But no, I got there, and instantaneously, I got thrown into the deep end. And after about three months of working, I was like, this is what I want to do. Like, this is so much fun. I'm really good at this. And so basically, at the end of the summer, I went back to my recruiter, and I was like, can I please just join full time, even though I don't have a degree? And she was like, I really don't think that's possible. Like, you know, we've never, we don't really do that. Like, you know, especially for the Autopilot team. Like, the team was really, really exclusive. Like, I don't know if we're going to be able to swing this. And I don't really know exactly how it ended up happening. Like I said, GG, absolutely the best recruiter ever. She ended up somehow getting an exception from the director. And then I had to write an email to Elon explaining why I deserved to join the Autopilot team without a degree. Yeah, exactly. And he responded to my email and asked me like three questions. He's like, why do you think that, you know, you're the right fit for this team, you know, and some other questions I kind of forget off the top of my head, but I answered those. And he responded, cool, just cool. And I didn't know what that meant, but like three days later, I got the offer letter. I guess it went well. That's right. And so yeah, after doing a three-month internship, I got the full-time job at Tesla. And then after that, it was just a wild adventure. I mean, way too much to go into in the podcast, but, you know, weekly meetings with Elon, staying til 3 a.m., you know, going out on drives to fix bugs on, like, four hours of sleep, just absolutely crazy. You know, launching FSD at the AI day, the first autonomy day. Oh, man, that was just a life experience. Like, it was amazing. Like, the team is absolutely incredible. Working with Elon is very stressful, but, you know, an incredible learning opportunity. Is it stressful because he just demands a lot of you, or he's just a, you know, extreme personality? Like, what's difficult about working with Elon? Failure is not an option. In all my interactions with him, he was always very, like, understanding and calm, and there was never any, like, you know, aspect of being upset or short or anything like that.
But there were certainly cases where, like, when people weren't getting their job done, or saying that's not possible, or, you know, taking too long to solve the problem, you know, that's when the wrath came out. Like, you know, he was very much relentless in terms of solving the problem. And yeah, saying it's not possible is just not a solution. So you were always under pressure to figure out some way to get it done. And that was a very formative sort of educational process, by which, like, you know, if I didn't have a solution to the problem, I'd completely rewire how I was thinking about the problem to come up with a solution. And that's been dramatically valuable to me in starting the company. Fascinating. Yeah, I have different friends that work across his companies. I think one of my friends at Neuralink has said something very similar. You know, they're doing everything from the ground up there, from, you know, the material science, clean room, microfab, all of that. And just the amount that Elon pushes them. There was one point where they had a pretty, you know, fully formed chip, and they just started from scratch. Right? It's like, this is not working. We've hit a wall, like, let's just start from scratch. And that actually enabled them to find a path forward that, you know, would not have been possible with the previous approach. And I've just heard again and again, people saying, you know, he pushes you to do the best work of your life, you know, gets the best out of you. Yeah, the sunk cost fallacy is not a thing in his mind. Not a thing, right? Which is, it's such a human, it's such a normal human way of thinking. And he somehow doesn't have that at all. Were you ever in the room where he was, you know, unleashing his wrath on people? Oh, yeah. Oh, yeah. Yeah, he called me once, just out of the blue, on my cell phone, and I didn't recognize the number, and, you know, he was very, very passionate about a problem we were having and demanding solutions for it. And yeah, there were lots of meetings where we're all working together trying to solve a problem and things would get very intense. Like, it was part of the culture. But, you know, that's how things got done. You know, without that, there really wouldn't have been the impetus to invent drastically new technology. I mean, that is really inspiring to hear. And, you know, this podcast is inspired by one of my favorite books. It's called The Beginning of Infinity. I actually have it behind me there. And one of the phrases from the book is, problems are soluble. And then the other thing that that book changed about the way I think is: everything that doesn't break the laws of physics is possible. If you project out to infinity, everything that doesn't break the laws of physics will exist, because infinity is a very long time. And so what that means is, project out to infinity, figure out how things are going to be done there, right? And then backtrack to today and figure out, how can I start building towards that future? So it's inspiring to hear, you know, the failure-is-not-an-option mindset. And it's really this thinking, taken to the extreme, of problems are soluble. And as long as we're not trying to break the laws of physics, we can figure this out. And there will be a path forward here. I couldn't agree more. And Elon takes that thought process to the extreme.
And then you mentioned Andrej had just joined the team. What was that like, working with him, and what did you learn from Andrej? It's a good question. Well, I'll preface by saying Andrej and I don't really get along. And I think that, you know, most of that is ideological. You know, of course, in starting Symbolica, I'm basically making the statement that I don't believe in the future of deep learning. And, you know, basically all his life, Andrej's philosophy has been that deep learning is all you need. You know, deep learning will scale to infinity and solve all problems. He calls this Software 2.0. And this is exactly what he brought to Tesla. You know, the idea was that we would train bigger and bigger and bigger neural networks to solve the self-driving problem. And, you know, that ultimately ended up not really panning out. You know, Tesla still hasn't solved the self-driving problem. Not due to lack of talent or, you know, lack of ambition, but, in my opinion, due to the failings of deep learning. And, you know, of course, Andrej ended up leaving Tesla to go back to OpenAI. I'm not sure about the specific reasons for that. But I think that the process of, you know, making the promise so many times of, oh, scaling the neural networks will solve the problem, and then having that not be the case, you know, is really draining. And, you know, when that's your core philosophy, and that's the promise you keep making, and it keeps not happening, you know, I can imagine that taking a little bit of a psychological toll. You know, Andrej is clearly a very smart guy, like, you know, did an insane amount for Tesla, like built the vision team, built the labeling team, like, you know, really turned Autopilot into what it is today. But, you know, he and I disagree, to a point of great personal contention, ideologically about the future of AI. So I'll leave it at that. Yeah, yeah, it makes sense. And thank you for sharing that too. One of the things I always think is interesting to ask founders, you know, and for some this is the second company they've started, is, like, how would you describe the difference between being a founder and an employee? You know, I think a lot of this comes from what I learned from Elon, working at Tesla. You know, one of the things that Elon always said to us, usually at the end of all of our really intense meetings, he'd say, like, look guys, I know that this is an extremely hard problem. And I want you to know that you can call me anytime. Like, here's my cell, like, call me, email me, you know, set up a meeting, like, I don't care if it's at 4am, just call me, like, we're going to solve the problem. And I always thought that was so interesting. I was like, the world's richest person, probably the world's busiest person, is saying, call him, like, at 4am. You know, when I first heard this, it didn't compute in my brain. Like, I was like, this is so strange. Like, I didn't really understand what he was saying. But then, you know, over the years, eventually, it clicked. And, you know, Elon himself in interviews has described this as being an inverted pyramid. Like, his role is to be at the bottom of that pyramid and support the weight of everyone on top of him. And, you know, as an employee of Elon, you're really not actually working for Elon. He views it as he's working for you.
So, whatever you need to succeed as an employee, to solve the problems, you ask Elon for, at the bottom of the structure, to, you know, gain access to resources or compute or the person or whatever it is that you need to solve the problem. And he's always there to act as the intermediary between you and whatever you need to get to solve the problem. And as soon as I really realized that that's his philosophy, and, you know, how he operates and how he runs his companies, it immediately became clear to me that that is why all of these companies are so successful. Like, you're not sitting in a castle in the clouds as a CEO, lording over all the peasants or something like that. It's actually quite the opposite. You're at the bottom. You're sleeping on the factory floor. Like, you're doing everything you possibly can to make sure your employees are successful. And I very much carry that over to the new company. It's like, the only person driving this to success is you. If you're not there every moment of the day, thinking about the project and figuring out how to support your employees and figuring out how to get them access to the resources they need to solve the problem, you're not going to succeed. And so, that's really what I view the difference between being a founder, or a leader, and an employee as. As a founder, the job is much harder. It requires much more involvement and thinking. And I hope to emulate that as a founder at Symbolica. Yeah. And it sounds like it's also a shift in the employee mindset that Elon encourages: a shift to extreme ownership. It's on Elon to support the employees and make sure they're empowered to do their jobs and achieve the company's mission. And it's on the employees to be actively thinking through, what is standing in the way of me achieving my role in this company's mission? And it's interesting that the framing you put it in actually goes both ways. Both sides take extreme ownership. And it actually is very empowering for a team member to be like, this is my role, this is my mission, and I should be proactive in figuring out what I need to be unblocked on to achieve this mission. That's crucial. Like, Tesla hired the best teams because every employee had that attitude. It's like, what do I do to solve this problem? And how do I get access to the resources I need to solve it? So, thinking a little bit about AGI, you mentioned your mission at Symbolica is to create AGI, pave the way for AGI. My question is, what is your definition of AGI? And how will we know when we've achieved it? You know, AGI is one of those terms where it's very hand-wavy, it's not defined, you know, it's kind of an abstraction. But if I really, really had to point at what I would say, you know, like if I saw a system and I pointed at it and said, that's AGI, I think that it would be a system that's capable of learning how to solve any task a human is capable of solving, given the same input that a human would receive to learn how to solve that task. So, basically just by being instructed, the same way you might sit down with someone and show them how to do algebra, or sit down with somebody and teach them, you know, how to play the piano, or any task, right? Generally, if you could show a machine how to do it the same way you'd show a human how to do it, and with no information beyond that, it was able to do that same task, I think that would be AGI. Interesting.
And where do you think we're missing today? Because I would argue, in many realms, I would say GPT-4 has achieved that: I can give it less instruction than I would give a human and have it perform as desired. But there are certainly, you know, plenty of places where it makes stupid mistakes and, you know, has comical failures. So, where do you think the biggest gaps are in that definition? Yeah, so before I answer that question, I just want to point out one thing. GPT is absolutely phenomenal at tricking you into thinking it's human. It's basically the best example of anthropomorphization we've ever seen in a machine, or really maybe ever. It's like the machine is a perfect facsimile of what it's like to behave like a human. But really, all it is is a facsimile of the outer presentation of what it's like to behave like a human. It doesn't have a model of what is actually occurring inside the human mind. And it's actually extremely easy to prompt GPT with various kinds of questions that make it obvious that it doesn't have a consistent internal world model. And, you know, the fact that it doesn't think and reason like humans: for instance, there's a very interesting paper that came out a couple weeks ago that described the fact that GPT can tell you clearly that there's some relation from A to B. But then if you open a new chat window and you say, what's the relation from B to A, it won't have any idea. And this is not something that humans have a problem with. And, you know, I think the example I saw, just to make it clear, was, you know, who is so-and-so's mom. For example, let's say, who is Elon's mom, right? And it could answer that. But then if you're like, hey, who is, you know, I forget his mom's name, but, who is so-and-so's son, it would have no idea. That's right. That's a good example. And so this is at least indicative that the behavior of GPT is kind of purely just a facade, an act. It's an approximation of what the outward-facing behavior of a human is. So to get back to answering your question, in order to build a system that, you know, will approach AGI and, you know, get rid of some of these limitations of GPT, inside of the model itself, it has to be capable of doing these kinds of reasoning- and logic-based operations that currently these models are not capable of doing. For instance, GPT can, you know, generate pretty reasonably effective code, if that code is in the space of things it's seen in its training set. Or you could give it code it hasn't seen in its training set and ask it to refactor it, and as long as that code is just a couple lines long, it can, you know, infer the statistics of how we organize things to make it work. But if you ask GPT, you know, code up a new algorithm that does X, Y, and Z, for the most part, it completely fails at this. And the reason why is that, fundamentally, statistics will not get you there. You cannot extrapolate... like, statistics and computation are not the same thing. You can't extrapolate, just purely from the statistics of language, how to do problem solving and reasoning. And given that GPT is purely an engine that's extrapolating from statistics, it's fundamentally incapable of performing the actual logic, reasoning, and computation required to, say, you know, code a new algorithm, or solve a novel problem in quantum mechanics, or even something, you know, much simpler, like, you know, build a CAD model that does this thing.
You know, and I think that GPT is certainly an incredibly impressive feat of technology. But it's not going to get us to AGI. It's interesting. So I agree, I don't think GPT-4 is AGI. But I think I have a different point of view, which is: I don't fully know what AGI is. I mean, I think AGI is a lot of things thrown in there, intelligence plus consciousness, which I think is a very different vector. I don't know what consciousness is; maybe that is just going to be an emergent property. There's, you know, the ability for self-directed goals. There's the ability for cognitive flexibility. So there's a lot involved in AGI. But I actually think GPT-4 is very good at reasoning. And it sounds like one of your definitions, you gave the example of coding up a new algorithm, so you're thinking about intelligence as the ability to generate new knowledge, which I actually think is the definition of intelligence. Again, I don't know what AGI is, but pure intelligence, I think, is the ability to generate new knowledge. And, you know, even when we look at that OpenAI paper where GPT-4 was explaining the neurons of GPT-2. I saw that. Yeah. So it's an interesting example of having GPT-4 generate new knowledge. Humans could have painstakingly done something similar, but it would have taken forever. And so we had GPT-4 do it, and it does a fine job. It's not perfect, but it does a fine job. And it generated this new knowledge that the world did not know of before. And so I actually think sometimes people say, oh, these transformers are pure autoregressive models, and there's no way they can get to AGI. And again, I don't know about AGI, but for intelligence and reasoning capabilities, I think these are pretty damn impressive. And, you know, if we scale up to GPT-5 and -6, I think it'll be able to do most things that humans can do, on average, today. It's not going to be perfect, and I think you bring up some great points on what types of things are missing from it. But yeah, I think I have a different take on the current reasoning abilities. So I think, yeah, like, I understand what you're arguing, and these are certainly very strong arguments in the opposite direction. And, like, I don't know if I'm necessarily convinced, but that's okay. I mean, only time will tell what actually ends up happening, whether these systems will scale into fully reasoning agents. Like, I think nobody would have predicted that next-token prediction would have gotten us to where we are today, for instance. In fact, if you'd asked me back in, like, 2018, do I think that just predicting the next token will yield what we got, I would have said no. So I would have been wrong. However, one kind of subtle piece of information I could supply, that I'd just be interested in hearing what you think about, is how I think about this: having the source code, like having the files and folders of a repo, is nowhere near the same as having the git history of that repo. There's a bunch of meta-information that goes into synthesizing some artifact. So, for instance, the git history could kind of be thought of as the reasoning chain that got you from a blank piece of, you know, a blank editor with a blinking cursor, to a novel algorithm.
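To make that analogy concrete, here is a minimal sketch of extracting those per-commit deltas, each commit message paired with its diff, from a repository; the git commands are standard, but the repo path is hypothetical and this is only an illustration of the idea, not Symbolica's method:

```python
# Extract a repo's git history as (commit message, diff) pairs -- the
# "deltas" that trace how code evolved from a blank editor to its final
# form. The repo path below is hypothetical.
import subprocess

def run_git(repo: str, *args: str) -> str:
    """Run a git command in `repo` and return its stdout."""
    result = subprocess.run(
        ["git", "-C", repo, *args],
        capture_output=True, text=True, check=True,
    )
    return result.stdout

def reasoning_chain(repo: str) -> list[tuple[str, str]]:
    """Return (message, patch) for every commit, oldest first."""
    hashes = run_git(repo, "log", "--reverse", "--pretty=format:%H").splitlines()
    chain = []
    for h in hashes:
        message = run_git(repo, "show", "-s", "--pretty=format:%s", h)
        patch = run_git(repo, "show", "--pretty=format:", h)  # diff only
        chain.append((message, patch))
    return chain

if __name__ == "__main__":
    for message, patch in reasoning_chain("./some-repo"):  # hypothetical path
        print(f"--- {message} ---")
        print(patch[:200])  # preview each delta
```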
And there are these kinds of intermediary steps, refactors, moving things around, these processes of modifying the information, that are at least in some way an encoding of the human thought process, the human reasoning process by which we go from a blank canvas to a piece of artwork in the logical domain. Turns out it's a little bit easier, actually, and I think, you know, diffusion models have already kind of nailed it, to go from a blank canvas to a piece of artwork, because that's inherently a statistical operation. But for code, where it's not a statistical operation, something like the git history is actually a very necessary component of displaying how that reasoning chain happened. You know, git history happens to be a very unique thing we can point at and say, oh, this actually is data that encodes the reasoning chain. Like, these deltas, these diffs between each commit, kind of encode the reasoning chain coming out of the human brain. But for all the other information that trains a GPT model, that doesn't exist. Like, for the most part, all of that is just in our heads. We don't even have a way to communicate it to other people. The reasoning chain, and sort of the collapse of all of these things in our brains, is completely silent, in our head only, and probably encoded in a way that's only compatible with our own internal mental models. And so, given that the human brain is clearly the only example we have of a system that's capable of forming these reasoning chains and collecting them and so on, I think that if we wanted to have a machine learning model learn statistically to approximate that, or figure out the pattern of how that's happening, we would need that part of the training data. And none of the training data we have now has that information at all. So I think that maybe transformers and deep learning models could potentially do that; however, we don't have access to data to get them to do that. So the only way to get them to do that, with the data we have and the way that we've sampled and pulled the data out of the natural world, is to construct a model that's capable of learning via this bootstrapping process, this process of building relationships, ontologies, hierarchies at different zoom levels. And statistics just simply can't do that. So that's my rebuttal, for sure. Yeah, no, this is so interesting. I mean, I agree, I don't think transformers think in the same way that humans think. There are clearly some things missing, like you mentioned: being able to think in these explanatory-knowledge terms versus just statistical relationships between input and output. There are the whole thinking-fast-and-slow models of the human brain. There are the kind of recurrent thought patterns. And, you know, transformers spend the same amount of compute on every token they're predicting. That's not how we as humans think, right? I might say some things really fast, and then I spend more time thinking about the next, more difficult topic, really trying to wrap my head around what I think and what I want to say. And so, yeah, I agree, there are certainly things missing in having in-silico intelligence mimic what we're seeing in natural, neuronal, biological intelligence. So I definitely agree with that. So I had picked out this quote from The Beginning of Infinity, which I think is very, very relevant to this topic here.
So I would love to read it and then get your reactions to it. Again, the book is called The Beginning of Infinity, by David Deutsch, and for context, it was published in 2011, so over 10 years ago. The quote is: "Even if chatbots did at some point start becoming much better at imitating humans, or at fooling humans, that would still not be a path to AI. Becoming better at pretending to think is not the same as coming closer to being able to think. A non-AI program cannot fake AI." Yep. I feel like that's exactly what we were just saying. What a great take from 2011. Very prescient. Yeah, it's certainly my stance that these systems are phenomenal at faking what it looks like to think. So much so that it's convinced a lot of people that they are capable of thinking and reasoning. But I agree with David Deutsch. I don't think that gets you any closer to a truly reasoning and thinking system, which is, I think, what we are aliasing to AI. Yeah, I mean, I think the world needs to catch up here, right? There used to be the Turing test, and that has now been passed and exceeded, right? Probably a couple years ago already, we could fake a human conversation and fool most people into thinking they were conversing with a human. Yep. And with, you know, GPT-3.5, and certainly 4, we've kind of totally maxed out all the standard evaluation metrics. And yet we're still not there in terms of intelligence; we're still not there in terms of matching human intelligence. So we need to come up with new metrics and new ways to evaluate how close we are to intelligence. Totally agree. I feel like we could definitely keep going for a while here. I wanted to ask you about the Mr. Beast Squid Games video that you built hardware for and helped get up and running. I feel like we could talk about AGI for hours longer, but I think we might just save it for a part two. And so, George, to kind of wrap up here: anything you want to ask of the audience, and where can people stay up to date with your work? Totally. Yeah, I would say, if anyone in the audience is interested in what comes after the era of deep learning, check out our website, symbolica.ai, and follow us on X at @symbolica_ai. Amazing. Well, thank you so much, and we'll add those to the show notes as well. Thank you, George. This has been a blast. And until next time. Thanks so much, Lina. Appreciate it. Thanks for listening to the Edge of Infinity, an AI podcast. You can find all show notes, transcripts, and resource links for all of our episodes at blog.infinity.ai. I'm your host, Lina Evansini-Colucci. You can find me on Twitter @lina_colucci. That's L-I-N-A, underscore, C-O-L-U-C-C-I. Thanks for listening, and until next time.

Key Points:

  1. George Morgan's journey to find a rare Sony robot in Japan.
  2. Symbolica's mission to develop AI models based on symbolic manipulation, not statistics.
  3. Contrasting traditional machine learning models with Symbolica's approach.

FAQs

How did George get his job at Tesla?
He had to email Elon Musk directly, explaining why he deserved to join the Autopilot team.

How did George's robot story begin?
It began when he found a book on robots at age seven and became obsessed with the Sony Curio robot.

What is Symbolica's mission?
To build a machine learning architecture based on symbols, not statistics, in order to achieve general intelligence.

How does Symbolica's approach to learning differ from traditional machine learning?
Symbolica manipulates symbols directly, building relationships between entities, unlike traditional statistical-based machine learning.

How are large language models trained in the Symbolica model?
Training involves understanding relationships through symbolic manipulation instead of self-supervised learning based on input-output correction.
