Season 5, Episode 23: Meta's AI advertising playbook (with Matt Steiner)

44m 34s

The transcription includes an introduction to Clarisights as the sponsor of the podcast episode, emphasizing the need for quick access to marketing insights without relying on data teams. Matt Steiner from Meta discusses the ad ranking process, detailing how ads are selected based on predicted conversion probabilities and auction principles. He also highlights the significance of technologies like Lattice, Andromeda, and GEM in enhancing ad performance through transfer learning, personalized retrieval, and sequence learning. The conversation delves into the complexities of predicting user behavior and optimizing ad placements based on data analysis. Steiner's insights offer a glimpse into the evolving landscape of ad technology and the continual efforts to improve ad targeting and performance for advertisers.

Transcription

(upbeat music) The sponsor of this week's episode is Clarisights. Tired of waiting for data team support? Fed up with inflexible dashboards where you can't explore your questions? What if you could go from question to insight in minutes with easy access to internal data and attribution to prove your impact without dependencies on data teams or manual spreadsheets? That's why advanced performance marketing teams such as those of Uber, Tilting Point and Delivery Hero trust Clarisights for everything marketing insights. From creative optimization to channel deep dive. Go to Clarisights.com and schedule a demo to learn how leading performance marketing teams uncover new insights in minutes. - The problem is that the distinction needs to be drawn between the competence of the economists and the correctness of their analysis. - Hello and welcome to the Mobile Dev Memo podcast. I'm your host Eric Seufert and I'm joined today by Matt Steiner of Meta. Matt, welcome to the podcast. - Thanks, great to be here. - Well, it's nice to have you here. I think one of the questions that I get asked most frequently by, you know, investors or whoever is how does ad ranking work? And I have some conceptual ideas, right? You know, I've built ad systems but I don't know how it works at Meta. I've never worked at Meta. And so I thought it'd be really interesting to have someone on the podcast that could kind of elucidate that for the audience. And there I think is no better person to have than you. So welcome. - That's kind of you to say. Thank you very much. - Before we get into all that, can you introduce yourself to the audience in your own words and kind of talk about your role at Meta? - Yeah, happy to. So my name is Matt Steiner and I support the monetization infrastructure and ranking organization.
This is the set of teams that build the infrastructure that underlies our ad system, including the ad serving system, the conversion tracking system, the storage systems that house our ads, the batch processing systems that compute data about ads or metrics about how ads have been interacted with and the machine learning model training system for ads machine learning models. It also includes our teams that do AI research on ranking and recommendation, as well as our teams that do development of machine learning models as well as accelerating machine learning model training and really kind of end to end optimizing the training and serving of machine learning models that make predictions about which ad to show a person when they log into one of Meta's apps. - And you've been with the company for quite some time, right? - Yes, I believe I've been here about 10 years now. - Well, continuing the theme with all the Meta employees that I've interviewed. And have you always been within like the kind of, what you might call like the ad science portion of the company or how has your journey at the company changed over the years? - Yeah, I initially started working on rich media in Messenger. I spent about two years doing that. And then I moved into the ads world as the ads business was scaling pretty substantially and needed people who had supported larger organizations and had a deep technical background. 'Cause as you know, ads is a very technical, deeply mathematical kind of field. And so we want to select for people that have that kind of inclination. - Yeah, it's funny. I found this YouTube clip from, I want to say it's like 10 years old or even older. And I should have pulled it up before we started, but it's from, I want to say like, it was Meta's like chief auction economist or something. And he was actually filling in for the person that was supposed to give the talk. But it was, he was giving a talk at UC Berkeley, like to some MBA students.
And he was just walking through the process of serving an ad, right? And it was just so fascinating. I think that was my first exposure to really like auction theory. And it was really fascinating. And I went out and after watching it, I bought a book called Auction Theory. It's like the kind of canonical text about designing auctions. And it was just, it was mind blowing. It was just like how mathematical it was. Like every page is full of mathematical notation. And I went to, I guess I never thought about it before that. Like I never spent too much time thinking about auctions and how they work. But, and then just, I mean, I'm working on something right now just about like app pricing. And I've been reading through a lot of like Hal Varian's work. And he's written like the canonical, like economics, like intro textbook. And he has a whole chapter about auctions. It's just, I don't know, it's just funny. Like you just kind of wake up to like just how, first of all, like how complex it can be. 'Cause you wouldn't think, like, you see like an auctioneer like at a charity gala or whatever, like that's an auction. That seems pretty easy. But when you're designing this stuff for scale, for ads ranking and all that kind of stuff, it gets very complex very quickly. - Yeah. There's a lot of deep auction theory and a lot of really interesting incentive issues. And of course pricing is a challenge everywhere in business, but auctions give you a mechanism to drive kind of maximal fairness with your incentives and your auction. - Okay. So I recorded a podcast a few years ago called How a Bid Becomes a DAU, which is like kind of meant to be, you know, sort of a song to the tune of how a bill becomes a law. And it walked through the programmatic ad serving flow from an impression becoming available through to an ad being filled. I'd love to get like Meta's version of that. So just kind of like to kick off the conversation with just a sort of very high concept question.
Can you walk me through the process of how a bid becomes an ad on Meta's platform? - I think the way the ads ranking system at Meta works is very similar to how other kind of ad systems would work in that when an ad slot becomes available, one of the apps for Meta's businesses will call out to the ad system and ask for a set of ads to show. And it basically passes along, hello, this is a particular person that we have an ad slot available to show for and give me the top five ads that would be best suited to this person in this context, whether that's Reels or feed on Instagram or Reels or feed on Facebook, et cetera. And the first thing that happens is that the ad request goes to our indexing system and the indexing system collects all of the ads that have been targeted to this particular person. So that could be a very large number of ads. And as a result, you can't rank all of them with a heavyweight ranking model. So we do what we'd call a multi-pass ranking operation where we first apply a very lightweight ranking model to predict how likely a person is to be interested in each of the large number of candidate ads. And then once that number of ads is filtered down to a smaller number of ads with a higher probability of conversion as assessed by the lightweight ranking model, then the smaller number of ads are given to a much more heavyweight ranking model, which applies a much more sophisticated machine learning model to do better precision predictions about how interested a person would be in each of the candidate ads. And ultimately that produces a probability that somebody's gonna be interested in that ad for each of those ads. And that ad with its probability is then entered into an auction where that auction price is the expected probability of conversion, multiplied by the bid that the advertiser has bid for that ad slot, for that particular person.
And that produces an expected value of showing that ad, probability of conversion times expected value, or times bid value, which then gives you the expected value of showing the ad. And those are sorted based on the expected value of showing the ad. And the top five ad candidates are then returned to the application that asked for ad candidates to show a person. Then the application has final decision making authority over which of those five ads to show in which of the slots that are available. 'Cause you can imagine when someone loads a feed, there's not just one slot available, but there's a slot at the top and then there's a slot several places lower in the feed, et cetera. And so then the app can say, well, this ad more closely fits with the content around the top, or this ad more closely fits with the content around the second slot, et cetera. - Okay, that's very informative. So maybe can we just kind of hover here? And can you walk me through like, what is ad ranking at a conceptual level? Like what is the purpose of that? What is that designed to do? - Yeah, ad ranking is designed to select the ad that has the best probability of driving the outcomes that our advertisers have sought to purchase when interacting with Meta, as well as the probability that people who see the ad are gonna be satisfied with that ad, interested in that ad. And those kind of dual objectives are the purpose of the ads ranking system. And so it is taking in information about who the advertiser is trying to find and advertise to and potentially convert to purchase their product or service. And what kind of interests a person has, what they may be interested in seeing or doing and the types of goods or services they're interested in purchasing. And that kind of matching process is assessed on these probabilities that the ranking models effectively predict.
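The flow described above (a lightweight pass over every targeted ad, a heavyweight pass over the survivors, then a sort by probability of conversion times bid) can be sketched roughly as follows. This is only an illustration of the idea as stated in the conversation, not Meta's implementation: the `Ad` class, `rank_ads`, the `shortlist_size` parameter, and the toy probabilities are all hypothetical names and numbers chosen for the example.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass(frozen=True)
class Ad:
    ad_id: str
    bid: float  # hypothetical: what the advertiser pays per conversion


def rank_ads(
    candidates: List[Ad],
    light_model: Callable[[Ad], float],
    heavy_model: Callable[[Ad], float],
    shortlist_size: int = 100,
    slots: int = 5,
) -> List[Ad]:
    """Two-pass ranking followed by an expected-value sort."""
    # Pass 1: cheap score on every targeted ad; keep the most promising ones.
    shortlist = sorted(candidates, key=light_model, reverse=True)[:shortlist_size]
    # Pass 2: more expensive, more precise probability on the shortlist only.
    scored = [(heavy_model(ad) * ad.bid, ad) for ad in shortlist]
    # Auction ordering: expected value = P(conversion) * bid.
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [ad for _, ad in scored[:slots]]


# Toy data: the "models" are just lookup tables of made-up probabilities.
ads = [Ad("mug", 2.0), Ad("helmet", 10.0), Ad("socks", 1.0)]
p_light = {"mug": 0.30, "helmet": 0.05, "socks": 0.01}
p_heavy = {"mug": 0.20, "helmet": 0.08, "socks": 0.02}

top = rank_ads(
    ads,
    light_model=lambda ad: p_light[ad.ad_id],
    heavy_model=lambda ad: p_heavy[ad.ad_id],
    shortlist_size=2,
    slots=1,
)
print([ad.ad_id for ad in top])
```

In this toy case the mug has the higher predicted probability, but the helmet's larger bid gives it the higher expected value, so it takes the single slot. With equal probabilities, the higher bid wins; with equal bids, the higher probability wins.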
So ranking models are fine-tuned to predict as accurately as possible, what's gonna be interesting to you among the set of available ad candidates and then produce a ranking based on that interest to you. - And how like, I mean, I imagine these are very distinct systems, but like how does that differ from the kind of ranking process that you apply just to content, right? So like obviously you're filling the feed with ads and content, right? Do they just have like a different objective functions or like how do those differ? - Yeah, it's an interesting question. And I think one way to think about it is in feed your friends and content creators on the internet are not effectively bidding for how much they want to have an interaction with you. And so that has driven a lot more solely on interest, whereas in the advertising auction, of course, advertiser pricing is an influence on which content gets shown to which people. Advertisers are basically saying, I'm willing to pay this much for an impression, a click, a purchase conversion. And that of course influences the value of the ad slot that we have to show and the relative ranking among advertisers. Given an equal probability of interest from a particular person, the advertiser that bids higher will wind up in the ad slot. - Right, I think that's like a really, you just kind of critical concept to understand. Maybe we could kind of just unpack that with a little bit more detail. So you talked about the expected value piece and the idea there is like, well, we've got this set of candidate ads that are targeted at this person. And like what we want to do is we want to make sure that the one that gets shown to that person creates the most value, right? And the way we think about the value is, well, the advertiser bid against something. It could, some outcome, right? Like, and oftentimes you think about that as like a purchase, but it could be an app install, right? 
It could be like an add to cart, could be email registration, whatever the case may be. And they price that. That's not you pricing ads. They're telling you what that's worth to them. And so then you say, well, okay, that's great, but that's not the only consideration here because if we just filled every ad slot with like the highest bidding ad, they might not convert, they might convert terribly, right? Like even if the advertiser is willing to pay a lot for it, if no one wants it, then we'd be wasting those impressions. So they kind of adjust that with this probability. And that's the real work here, isn't it? Like coming up with the probability, I mean, beyond like running an auction in like milliseconds, like coming up with the probability, calculating the probability that that person will respond to that stimulus, which is the ad, in the way that the advertiser cares about. - Yeah, that's exactly right. That is the hard work of the ranking model. It's creating an accurate prediction of the probability that this person would achieve the goal, the objective the advertiser has set out, if we showed them that. And a lot of the complexity is kind of assessing what are you interested in? How interested are you in this thing? Kind of based on what we've previously seen you click on or read through as an ad, et cetera. There are a lot of important signals that influence the kind of ranking model. And it's largely your past history of interactions with content on the services. - How easy is it to generalize across products, right? So like you could say, and 'cause you could imagine like you could get as granular with like the definition of this thing as you want, right? Like, you know, just looking around my desk, I've got a coffee mug with a letter P on it 'cause my wife's name starts with a letter P. And so you could say, well, our prediction is that Eric, he'll be very receptive to a coffee mug with a letter P on it.
Or you could say a coffee mug, or you could say some sort of container that holds drinks, or you could say a household kitchenware. Like, is that just a function of how much data you have about me? Or is it how generalizable is that knowledge? - Yeah, I mean, this is an interesting kind of research problem in the AI space broadly. Like what level of generalization and granularity produces the kind of optimal results. And it varies a lot based on the kind of use case for example, if I'm into cycling, you can imagine there's a wide variety of cycling goods that I might be interested in purchasing. And specifically a signal about cycling is sufficient to increase the probability on each one of those ads. Cycling helmet, new bicycle, new bike shoes, new water bottles, aerodynamic handlebars, aerodynamic tires, et cetera. Whereas if we know something a little bit more like we've seen you purchase a bicycle previously and we've seen you purchase a bicycle helmet, maybe we should decrease the probability of showing those again, unless a certain amount of time has elapsed. And maybe instead we wanna be able to say, well, in your set of purchase journeys around cycling, you're more interested in new bike socks, new bike clip-in shoes, new bike aerodynamic handlebars, 'cause your probability of purchasing another bike is low for the next time window, right? And those are the kinds of purchase journeys that you can learn about if you see repeated kind of conversions. And that's what we are attempting to do with the gem modeling technology that we just started talking about. We wanna learn from those event sequences that we see from people in terms of their purchase journey. - Right, yeah, that's fascinating. I wanna dig in there. Let's maybe go back in time, I wanna say like 18 months. So I would love it if you could just kind of talk to me about the sort of, and let me know if I'm missing anything important in any big milestones. 
But to my mind, like the three most sort of like fundamental milestones here were Lattice, which I wanna say was like 18 months ago, and Andromeda, which was in December, if I remember correctly. Can you just talk to me about what each of those is? - Yeah, yeah, happy to. That's great. And maybe I'd add two other kind of interesting pieces that kind of dovetail with those three. The first is, of course, Advantage Plus Shopping, which I'm sure you've discussed previously and read about, but it has a very interesting dovetail with these pieces of technology. And the second is Generative AI for Ads Creative, which also fits into the story. So I would say starting from the earliest inception here, the Advantage Plus kind of shopping suite is really built on the key insight that a lot of marketing today is data-driven and advertisers want to generate the best return on advertising spend that they can, 'cause that's great for their business. It allows their business to grow faster, produce better outcomes. And one of the key insights that our advertising team had is a lot of human time in the marketing space was spent doing kind of meticulous measurement and spreadsheets over which advertising audiences, which creatives, what was the performance, copy, paste, huge numeric strings over and over again, trying to do analysis, right? The big lever in Advantage Plus Shopping is the machine learning models: they're always on, they're always measuring performance. They can move levers really rapidly in response to changing consumer sentiment, changing performance of a particular creative, new discoveries in which audience would be interested in this particular ad or this particular creative variant. And so that really powerful lever is the machine moving the bid, the budgeting, the audience, the creative levers for the humans. The machine learning model is always awake. It doesn't get tired. It makes consistent analysis of the data, which leads to improved performance, right?
So that is like one way that we're applying machine learning models to drive much better performance for advertisers and improve their return on ad spend. The kind of next technology breakthrough that we started to apply to the advertising space was a concept that we call or is called transfer learning. The idea here is that a model being trained on examples in a totally different domain can actually improve the performance of that same model in the first domain. And we like to use a kind of musical analogy when we talk about this. If I've been learning the piano from a young age and I get to junior high and piano is not offered, but I wanna continue to take music classes, maybe I decide to pick up the violin. And if I already have a piano background, I understand music theory and scores and tempo and how to read music and harmonies and melodies, et cetera, it's gonna be much easier for me to pick up the violin than a student that has never touched a musical instrument before. 'Cause I have this foundational structure of music that I've learned. And the same is true for transfer learning across models with different objectives. They learn some foundational structure that helps them improve performance when they're cross-trained on different objectives. So this kind of key technology breakthrough led us to develop Meta Lattice, which is multi-objective models trained on different objectives with different data sets that improve the performance of each objective over the original model that was solely trained on a single objective. So in the ads case here, we would train a model on clicks before, it would only see clicks and it would only be trained to predict the probability of a click. We would have another model that was trained on say landing page views. What is the probability that you'll actually get a landing page view by showing this particular ad? 
Well, it turns out you can improve the accuracy of the probability prediction for both models by training one model on each data set from both the click and the landing page view data sets. So there's no new data here, but just merging the models and then training the model on both data sets produces better outcomes for both objectives. So that was the Meta Lattice technology that allowed us to start this long series of model merging that produces better results on each objective as another data set is merged into the training set of the Lattice model. So that produces better outcomes for advertisers, more conversions per dollar spent, each time another model is merged into the Meta Lattice model. - Is that just kind of a case of like a mixture of experts type approach? Or are you actually taking the model, cutting the head off and then training a new head with different data? - I guess the way to think about this is it's a combination of mixture of experts and just scaling the examples that a model sees because each of those examples that a model sees has some signal about what you're interested in or not interested in in terms of a product or a feature that an advertiser is offering. It also has some signal about the type of creative that you're interested in, et cetera. And so part of the game there is just developing an architecture that allows you to predict different objectives independently while considering the expanded data set. But the technology breakthrough really is the transfer of learning. How much knowledge can you transfer from different data sets into a different objective? - I see, thank you. Okay, so that's Lattice. So that's the kind of like transfer learning breakthrough. And then talk to me about Andromeda. What is Andromeda? - So maybe one intervening technology development in the advertising space was of course the use of generative AI for the development of creatives. 
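One common way to write down the Lattice idea described above (a single model trained on both the click data set and the landing page view data set, with each objective still predicted separately) is as a shared representation with one head per objective. The notation below is an assumption made for illustration and is not taken from the conversation: $f_\theta$ is a shared trunk, $h_{\phi}$ are per-objective heads, $\ell$ is a per-example loss such as log loss, and $D_{\text{click}}$ and $D_{\text{lpv}}$ are the two existing data sets.

```latex
\[
\mathcal{L}(\theta, \phi_{\text{click}}, \phi_{\text{lpv}})
  = \sum_{(x, y) \in D_{\text{click}}}
      \ell\big(h_{\phi_{\text{click}}}(f_\theta(x)),\, y\big)
  + \sum_{(x, y) \in D_{\text{lpv}}}
      \ell\big(h_{\phi_{\text{lpv}}}(f_\theta(x)),\, y\big)
\]
```

Under this reading, no new data is introduced. The shared parameters $\theta$ simply receive gradient signal from both data sets, which is one plausible mechanism for the transfer effect Steiner describes; the actual Lattice architecture may differ.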
So advertisers were previously spending a lot of money and a lot of human time to develop small variations in creatives that they could test in different audiences or to determine what was interesting to people, improve the probability of conversion, et cetera. So one of the ways that Meta's ads team has attempted to help reduce the cost of testing and learning is to apply AI, specifically generative AI, to help generate those creative assets that you can test on. Whether that's variations in the text of a creative, the more variations you test, the more likely you are to find a variation that improves performance, as well as generative AI to develop backgrounds for your creatives. If you wanna highlight your product in a bunch of different backgrounds and discover which type of background improves your probability of conversion. And then of course being able to generate the images from text string directly. These reduce the development time and development cost of producing variants. And the more you can test, the more you can learn about which message, which background, which kind of overall creative is more appealing to different audiences, which message resonates with different audiences. And so really since Meta ads has existed, the secret to performance is to test and iterate, right? The more tests that you can run, the better the ultimate results that you can drive. And so this allowed advertisers, marketers, to drive down the cost of testing. The consequence of introducing dramatically more ad variants into the advertising system, of course is that original problem that we talked about where first you select, which ads are targeted to a particular user. Well, that problem has now become dramatically harder because there are way more ads and ad variants in the system than ever before. So it makes that retrieval step, the selection step really, really hard. 
As a result, the ads team started working on how do we do better at retrieval in the face of this Cambrian explosion of ads? The solution there really is that we threw dramatically more compute at the retrieval phase problem. So we worked with our hardware partner NVIDIA and our ads ranking teams and specifically our ads AI research teams to design end-to-end retrieval system that would be dramatically more performant than the old less intelligent retrieval system. In this case, we started to use GPUs at retrieval time and run a machine learning model that had personalization data in the machine learning model at retrieval time. You may have some preferences. You may like ads that look like this and not like that. You may like this particular phrasing and not this other phrasing. And this is all kind of data that has been learned in machine learning models from just watching you interact with previous ads that we've shown you. Well, before that information wouldn't be evaluated until the last ranking stage. So initially we'd select some number of ads. We'd pass them through to the ranking stage and then we'd realize 60% of these ads you're not gonna like anyway. And so this Andromeda step we can now apply personalization at retrieval or selection time, making sure that the ads that we select are ads that you're more likely to be interested in at selection time. So we do a lot more computation at retrieval that is driven by having access to a lot of compute by replacing the retrieval hardware with this Andromeda hardware. And we custom design a machine learning model that would not just select ads that were targeted at you and had high probability, but they would be ads that were targeted at you and would be more likely to convert for you. The key insight there is personalization of selection. Before we would select ads that are likely to convert. Now we select ads that are likely to convert for you. And that is the kind of Andromeda breakthrough. 
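The contrast drawn above between selecting "ads that are likely to convert" and "ads that are likely to convert for you" can be illustrated with a small sketch. Everything here is a hypothetical stand-in: the two-dimensional embeddings, the dot-product score, and the function names are assumptions for the example, and the real Andromeda system is described only as a custom model running on GPUs at retrieval time.

```python
from typing import Dict, List, Sequence


def dot(u: Sequence[float], v: Sequence[float]) -> float:
    """Plain dot product between two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def retrieve_global(avg_rate: Dict[str, float], k: int) -> List[str]:
    """Unpersonalized selection: keep the k ads that convert best on average."""
    return sorted(avg_rate, key=lambda ad: avg_rate[ad], reverse=True)[:k]


def retrieve_personalized(
    user_vec: Sequence[float], ad_vecs: Dict[str, Sequence[float]], k: int
) -> List[str]:
    """Personalized selection: keep the k ads that best match this user."""
    return sorted(ad_vecs, key=lambda ad: dot(user_vec, ad_vecs[ad]), reverse=True)[:k]


# Hypothetical embedding axes: [interest in cycling, interest in gardening].
ad_vecs = {"bike": [1.0, 0.0], "shears": [0.0, 1.0], "helmet": [0.9, 0.1]}
avg_rate = {"bike": 0.02, "shears": 0.05, "helmet": 0.01}
cyclist = [1.0, 0.1]

print(retrieve_global(avg_rate, k=2))
print(retrieve_personalized(cyclist, ad_vecs, k=2))
```

For this made-up cyclist, the unpersonalized pass forwards the gardening shears because they convert well on average, while the personalized pass forwards the bike and the helmet. That is the waste Steiner describes: ads passed to the ranking stage that the person was never going to like.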
It's a large end-to-end design for hardware, software infrastructure, machine learning infrastructure and a custom machine learning model with personalization at selection time that improves recall and conversions per dollar for advertisers. - That's so fascinating. When you rank them, so I guess this is kind of a classification task, right? In a sense, 'cause I imagine like the probabilities are probably pretty close at the high end. Do you do like a softmax, like blow up the highest, like the number one rank just to draw that sharp distinction? - I guess I would say there are a variety of algorithms that we use to try to produce the separation that we're looking for as well as kind of cap the prediction rates at what is reasonable there. Personalized retrieval is just incorporating the signals that we know that you're interested in in a way that they weren't able to be incorporated at retrieval time before. When those better ads are then passed to the ranking stage, then a ranking model is gonna apply even more sophisticated signals because there are fewer ads and so you can apply a lot more compute per ad to do better prediction. The first stage is lower accuracy prediction, but now it's personalized in a way that it wasn't before. The second stage is the higher accuracy prediction, which was already being personalized 'cause we had a much larger compute budget for a smaller number of ads to rank. - The sponsor of this week's episode is incremental. Connected TV is here with the premium audiences everyone wants, but how are you measuring performance? Hoping your users will scan QR codes or smudge the screen by clicking the TV? CTV joins the new era of advertising. Your measurement should too. Incremental is the future of measurement, helping you measure CTV, linear, influencers, podcasts and more without user tracking. Google incremental, the future of measurement or click the link in this episode's description. - Okay, so we talked about Lattice. 
We talked about the Gen AI tools. I wanna come back to that, obviously. We talked about Andromeda. So now talk to me about GEM. So that was, I think the big reveal was at earnings. I mean, I might be wrong about that. There may have been like a blog post or something, but I haven't seen it. Talk to me about what is GEM? - Yeah, so there's a blog post before earnings and then we talked about it in the earnings call. GEM is our machine learning model that has been adapted to use sequence learning. So I think I was mentioning this a little bit before, but sequence learning is of course the big breakthrough that is driving innovation in models like Llama and ChatGPT, large language models where they effectively learn to predict, say the next word in a sequence. So you can imagine previously, we were considering every ad that was shown to a person in a training example kind of independently. We showed you shoes and you were not interested in shoes. We showed you bicycles and you were not interested in bicycles. We showed you gardening shears. You were interested in gardening shears. And those are effectively like independent events that had independent probability assessment. When most of kind of user journeys are not single shot user journeys, right? Maybe it's the case that we show you an ad for a new cycle. We show you another ad for a new cycle. We show you another ad for a new cycle. And that one you linger on for a while and maybe you click on it, but then you decide you go off and do something else. A little while later, we saw that you're pretty interested in ads for new bicycles. And so we show you again another ad for a new bicycle and you again click and then maybe you browse around the advertiser's website looking for that new kind of bicycle that you may want to replace your current bicycle with.
Maybe you decide to not purchase and then a little bit later, we show you another ad for a bicycle, you click, you browse around to the advertiser's website, you decide to add to cart and you finally do a checkout. You press that enter your credit card information, you press the buy button. So there is a kind of user journey that is encapsulated in those sequence of events. They weren't all independent random events, right? There was actually a sequence of view, view, view, click, view, click, browse, view, click, browse, convert. And so that sequence in the user journey could be something that we could represent and learn from in terms of our machine learning modeling. So the machine learning modeling is moving beyond just independent probability estimates in terms of conversions, but moving towards understanding conversion, journeys, sequences of user events, and then sequences of events that actually lead to that type of conversion. And we can actually improve the probability, accuracy of our predictions for conversion by looking at the kind of user journeys that you've already seen. So if we've already seen that like, maybe you've browsed around a cycling website, maybe you've clicked on some gardening shears, maybe you've looked at some toys for your toddlers, those three things that have happened in the past may substantially increase the probability that you're likely to convert in one of those areas. And it's not independent like the previous kind of generation of machine learning models assumed, there's a sequence of user journeys that can be learned and those prior interactions in the sequence can improve the accuracy of predicting how likely you are to convert on an ad that we show you based on that sequence in the user journey that the machine learning model has learned. 
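To make the difference between independent events and a journey concrete, here is a deliberately tiny sketch. It is not GEM, which is described as sequence learning in the style of large language models. It only compares an overall conversion rate with a rate conditioned on the immediately preceding event, using simple counts. The first journey is the view/click/browse sequence Steiner lists; the other two journeys and the function names are invented for the example.

```python
from collections import Counter
from typing import List

journeys: List[List[str]] = [
    # The sequence from the conversation.
    ["view", "view", "view", "click", "view", "click", "browse",
     "view", "click", "browse", "convert"],
    # Two made-up journeys for contrast.
    ["view", "view", "click", "browse", "convert"],
    ["view", "view", "view"],
]


def marginal_rate(data: List[List[str]], target: str = "convert") -> float:
    """Share of all events that are the target, ignoring order entirely."""
    events = [event for journey in data for event in journey]
    return events.count(target) / len(events)


def next_event_rate(data: List[List[str]], prev: str, target: str = "convert") -> float:
    """Estimate P(next event is target | current event is prev) from pair counts."""
    follows: Counter = Counter()
    total = 0
    for journey in data:
        for current, nxt in zip(journey, journey[1:]):
            if current == prev:
                total += 1
                follows[nxt] += 1
    return follows[target] / total if total else 0.0


print(round(marginal_rate(journeys), 3))
print(round(next_event_rate(journeys, "browse"), 3))
print(round(next_event_rate(journeys, "view"), 3))
```

In this toy data a conversion is rare when events are pooled, never directly follows a plain view, and follows a browse two times out of three. Knowing where someone is in the journey changes the estimate, which is the intuition behind learning from event sequences rather than from isolated impressions.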
So there's no new data anywhere in the process, but we've taken the sequence learning technology, applied it to the ads use case and produced innovation in machine learning for ads that is likely to increase the conversions and return on ad spend for advertisers because our prediction accuracy has improved by learning from those user journey sequences, user event sequences. - So didn't Meta, then Facebook, kind of contribute to this sort of sequence to sequence architecture with like the differential equation solving model, right? Like it was a pretty early application of this. - Yes, and I think that's like a generic application of sequence learning technology and this is the specific application of sequence learning to the ads use case. Yeah, I think that's right. - No, right, just to point out that you've been a pioneer of the research there. - Mark has been pretty prescient in understanding how valuable AI would be to the company and made a huge bet on AI research almost a decade ago, starting FAIR, maybe it was a little bit more than a decade ago, starting the Facebook AI Research organization, trying to attract brilliant AI researchers around the world, having them work on a lot of problems that are relevant to the kind of Meta business problems, as well as some problems that weren't immediately applicable but wound up paying off down the road. As you can see, a lot of our business today is critically reliant on AI technology, including ranking posts in feeds or Reels, as well as predicting which ad to show to you based on your probability of conversion, as well as our new bets in the kind of Llama large language model space.
- Right, and I'm taking notes of all this stuff I wanna revisit, and I'm starting to fear that we probably won't get to many of these. So that talent issue is something I wanna come back to. But I think the natural next question, given we've been talking about the progression of these technologies, which I imagine are still being improved upon, right? Like, it's not the end, right? The question is: what's next? So we talked about the progression of the transfer learning with Lattice, and then you've got generative models, you've got Andromeda for ranking, and you've got GEM for kind of thinking through sequences. What's next? What's on the roadmap? - Yeah, I mean, you can imagine that we have a large team of AI researchers in the ad space that is really focused on the two elements that drive better performance for advertisers. The first is, of course, driving increased conversions per dollar spent. And there, our teams are working on innovations in the retrieval space. They're working on innovations in the ranking space. They're working on innovations in the Advantage Plus Shopping automation type of space. The second is the other side of the ROI calculation: how much do you have to spend in order to achieve those results? There we have investments along the lines of generative AI for creatives, which reduces your cost to generate creatives to test and learn. And of course, we've also talked about some of the investments that we're making in business messaging, which is also dependent on AI models predicting which messages are going to be relevant to which people. And there are some of our newer engagements with AI agents, where AI agents are helping businesses reduce costs in their efforts to support customers, both pre-sale and post-sale. - Got it. I want to kind of revisit the generative AI for creative component here, because, I mean, Meta has been very public about the advancements here.
And I still feel like it's, I don't want to say underappreciated, but I think the real innovation here is not sort of valued properly by advertisers, right? So there's one very obvious consequence of having access to generative AI creative: you spend less money on producing creative, which can be a substantial cost, right? But I don't think that's what provides the real value. The real value is the ability to reach a person with just a more relevant ad. And then you increase receptivity to the ad. And then we go back to your kind of original answer, which means that probability of conversion goes up, right? And you said something at the outset, that the retrieval process sort of sorts through the set of ads that are available to be shown to you, right? Well, that set of ads now is this sort of very discrete bank of images or videos or whatever that have been created, and you can't grow beyond that. You can't select from outside of that, right? I think you see where I'm going. What happens when you can, right? And if you think about the theoretical endpoint, or even the kind of years-out endpoint, is the goal to just be able to generate, on the fly, the ad that will produce the highest probability of X being done, where X is that outcome? Is that kind of the end goal here? - Yeah, I mean, I think the way that I would describe it is there are a variety of advertisers that want to operate their businesses in different ways. But there's certainly a class of advertisers who care most about performance. And at the end of the day, they want as much automated generation as possible that drives as much performance as possible. That won't be all of our advertisers, but for some of our advertisers, the holy grail would be that they would give us a URL to their website.
We would look through their product catalog, generate the appropriate creatives, automatically show those creatives to the right audiences, test and learn, and drive the greatest number of sales. And that would certainly drive down costs and be a dramatically easier experience for the businesses that are looking for that type of experience. Again, that's not going to be all of our advertising partners. Not everyone wants that. For example, with generative AI for ads creative, you talked about how that allows you to drive better results by just finding a better creative. But I think one important second order consequence here is what you can learn from which creatives actually drove better results. - Right, right. - That's where humans are going to become really valuable. Some human can sit down and look at the results and say, well, why would this messaging perform so much better with this audience for our product? Right, that's not a question a machine learning model can answer for you, but a human can do some user research. They can talk to people. They can meet with customers and really try to understand why that message is resonating better. And that may influence the direction you want to take your particular brand voice with that audience. So there's a really important second order component of the learning that you can achieve by testing more and then looking at the results, which performed better, which performed worse, and having humans try to understand why. I mean, that's an area where humans are kind of uniquely valuable, in that they can understand other humans' behavior, like why would a human do this? And they can ask people, why would this message resonate more with you? - I appreciate that answer. I find it's a very diplomatic answer. Let me speak briefly for the class of advertisers that are so mercenary, they don't care. I don't care why it works better. This is like a fun anecdote.
So I ran user acquisition teams for gaming companies for a long time. And so I played a lot of games, just 'cause I want to see the ads, right? And I also want to see the economy design. And I was playing a game one time and this ad pops up, and it's a mobile gaming ad, like a full-screen interstitial. And I'm watching a game of Pong being played in, like, slow motion. Like, that's what it was: super low fidelity, pixelated Pong being played at, like, an excruciatingly slow pace. And I'm like, what is this? And I click on it and it's, I won't say the name of the company, but it takes me to one of the biggest gaming companies' App Store pages, right? And I knew the CMO there. So I messaged him. I took a screenshot of the ad when I was watching it, and I was like, what is this? And he's like, that's the ad that performs the best. It is as simple as, we just tried a bunch of random stuff and that works the best. I'm not going to question it, right? And I think that's such a powerful idea in advertising. And obviously that applies to a specific subset of digital advertisers. There's a lot of advertisers for whom that idea is, like, abhorrent, right? But there are a lot of advertisers, and they spend a lot of money, who would say, just do your thing. Do your thing, get me the most relevant ads that work the best and that drive the best ROAS, and then take my money in doing that. - Yeah, there are a bunch of advertisers that will test anything. And sometimes that drives really great results for their business. I think the one other interesting signal there that is of course relevant in the ads ranking process is the negative feedback rate on an ad. So if your ad is full screen, blocking, really slow, annoying people, the more they click X on your ad, the lower your ad will be ranked going forward. So there is some kind of gating boundary condition that prevents really negative experiences.
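The negative feedback signal described here can be pictured as a discount on an ad's ranking score. The sketch below is only an illustration under stated assumptions: the linear discount, the `penalty_weight` parameter, and the function name are hypothetical, and the episode does not say how Meta actually combines this signal with other ranking inputs.

```python
def ranking_score(bid, p_conversion, negative_feedback_rate, penalty_weight=2.0):
    """Expected value of showing the ad, discounted by how often people dismiss it.

    All names and the linear discount are assumptions made for illustration.
    """
    expected_value = bid * p_conversion
    # The discount shrinks toward zero as the dismissal rate grows, and is floored
    # at zero so a heavily disliked ad can never earn a negative score.
    discount = max(0.0, 1.0 - penalty_weight * negative_feedback_rate)
    return expected_value * discount


if __name__ == "__main__":
    for rate in (0.0, 0.25, 0.5):
        print(rate, ranking_score(bid=4.0, p_conversion=0.25, negative_feedback_rate=rate))
```

With everything else held equal, a higher dismissal rate produces a lower score, and past a certain rate the ad stops competing at all, which is one way to read the "gating boundary condition" mentioned in the conversation.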
People have the ability to signal what they don't like in an ad. We of course incorporate that deeply into our ranking systems. - So how do, I mean, advertisers that come to Meta, they want performance, they want to utilize this full suite of tools. How do they stay on top of this? What advice would you give to advertisers who are interested in learning more about exactly what we're talking about? Aside from listening to the podcast, how do advertisers stay on top of this so that they can utilize it? - Yeah, I mean, obviously we try to communicate the innovation that we're driving in the advertising space on our blog, at conferences, et cetera, by meeting with popular podcasters like you. But really the best way to stay on top of the technology is to use the product, right? Testing is the secret to success with Meta advertising. You should know what's changed, how has that affected your return on advertiser spend, which features have you tested out? Are they working for you or not? Why are they not working for you? And that is a real opportunity for marketers that use Facebook's tooling to discover what works for their business in particular, as well as try to understand what's not working and why. Testing is a secret to success. The number of iterations you have allows you to learn faster: more iterations, you learn faster, and you can drive better results that way. - So I want to return to the talent question, 'cause you mentioned that Meta had built this AI research lab like a decade ago, or established it, you know, in the region of a decade ago, just to kind of get on top of this, to get ahead of this, right? And my sense is that with the things that you're talking about, when you're operating at the frontier of building these models, the number of people that could build these technologies is one you could fit in, you know, a football stadium, or something smaller than that, a basketball stadium.
At what point does that become a moat, just being able to assemble that talent? Because it's not like universities are pumping out thousands of these people a year, right? I mean, there's a fairly small pool. There's a lot of competition for their talents. At what point is that a moat, just having this pool of talent, where of the very small number of people that know how to build these things, we have some significant share? At what point does that in itself become a moat? - Yeah, I mean, it's a really good question. I think you're right that the market for talent here is obviously very hot, as there are so many applications of state-of-the-art AI today. Whether that's generative AI or your prior generation ranking and recommendations AI, there are so many critical use cases that are driving results in every kind of application of technology. I think maybe one observation is universities are pumping out thousands of people who understand AI every year. There are universities all over the world with lots of deep education in AI, and there's more AI content available online in an open source way than ever before. The number of people that have an understanding of AI when graduating from undergrad is dramatically higher than it was five years ago or 10 years ago, as it's the really hot part of the computer science field. The number of PhDs in machine learning is larger than ever before as well, both because there's an extreme demand for talent here and because there are so many interesting things that you can build with this kind of skill set. So I think there's obviously going to be a deep value for more experienced ML talent, and that's where you'll continue to see wars for experienced, competent ML engineers, and the companies that are able to attract and retain that talent have a distinct advantage in driving results for their business.
At the same time, the number of people entering the ML portion of the field is just growing every year, and we expect that trend to continue. When will the supply exceed the demand? That's a really interesting and open question. I think the same was true with the dot-com era in the early 2000s. There were more and more computer science graduates every year, and it seemed like the supply would never meet the demand, but it eventually caught up. It may have been after a crash and then a recovery, but now there's a really healthy job market for SWEs, as there has been for the last two decades, and we expect the same thing to be true for ML engineers. There'll be quite a bit of demand for ML engineers for a long time to come. - Matt, this was fascinating. I hope I didn't probe too much, but I really appreciate your time here today. I know, given it's a public company, it's hard to get these things to materialize, but I appreciate you being generous with your time today. - Yeah, thanks very much, Eric. It was a pleasure to be here and great talking to you.

Podcast Summary

Key Points:

  1. Introduction to Clarisights as the sponsor of the podcast episode.
  2. Discussion on ad ranking and its importance in marketing insights.
  3. Interview with Matt Steiner from Meta discussing the ad ranking process and technology advancements like Lattice and Andromeda.

Summary:

The transcription includes an introduction to Clarisights as the sponsor of the podcast episode, emphasizing the need for quick access to marketing insights without relying on data teams. Matt Steiner from Meta discusses the ad ranking process, detailing how ads are selected based on probabilities and auction principles. He also highlights the significance of technologies like Lattice and Andromeda in enhancing ad performance through machine learning models and transfer learning.

The conversation delves into the complexities of predicting user behavior and optimizing ad placements based on data analysis. Steiner's insights offer a glimpse into the evolving landscape of ad technology and the continual efforts to improve ad targeting and performance for advertisers.

FAQs

Clarisights is a sponsor offering a solution for quick access to marketing insights, internal data, and attribution without reliance on data teams. It caters to advanced performance marketing teams.

Matt Steiner supports the monetization infrastructure and ranking organization at Meta, focusing on ad serving systems, machine learning models, and AI research.

Ad ranking at Meta involves a multi-pass ranking operation using lightweight and heavyweight ranking models to predict the probability of a person's interest in ads. An auction system then determines which ad to show based on expected value.

Ad ranking aims to select ads that drive desired outcomes for advertisers and engage users effectively. It matches ads based on user interests and advertiser objectives.

Transfer learning involves training models in different domains to improve performance in the original domain. Meta's Lattice uses multi-objective models trained on different data sets to enhance ad prediction accuracy.

Meta's Advantage Plus Shopping suite leverages machine learning models for continuous performance measurement and optimization of ad campaigns, leading to improved return on ad spend for advertisers.
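The multi-pass ranking and expected-value auction described in the answers above can be sketched in a few lines. This is a simplified illustration, not Meta's system: the `Ad` fields, the `shortlist_size` parameter, and the scoring rule `bid * p_conversion` are assumptions made for the example, and both model outputs are supplied as precomputed numbers rather than produced by real models.

```python
from dataclasses import dataclass


@dataclass
class Ad:
    name: str
    bid: float           # hypothetical: what the advertiser pays per conversion
    cheap_score: float   # hypothetical output of a lightweight first-pass model
    p_conversion: float  # hypothetical output of a heavyweight second-pass model


def select_ad(candidates, shortlist_size=2):
    """Two-pass selection: trim with the cheap score, then pick by expected value."""
    # Pass 1: the lightweight score narrows a large candidate pool to a shortlist.
    # In a real system the heavyweight model would only be run on this shortlist.
    shortlist = sorted(candidates, key=lambda ad: ad.cheap_score, reverse=True)
    shortlist = shortlist[:shortlist_size]
    # Pass 2 and auction: the winner has the highest bid * P(conversion).
    return max(shortlist, key=lambda ad: ad.bid * ad.p_conversion)


if __name__ == "__main__":
    ads = [
        Ad("bicycle", bid=8.0, cheap_score=0.5, p_conversion=0.25),
        Ad("garden shears", bid=2.0, cheap_score=0.75, p_conversion=0.5),
        Ad("toddler toys", bid=100.0, cheap_score=0.125, p_conversion=0.5),
    ]
    print(select_ad(ads).name)
```

One design consequence worth noticing: an ad with a very high expected value can still lose if the lightweight pass filters it out, which is why the quality of the early passes matters as much as the final ranking.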
