Season 5, Episode 23: Meta's AI advertising playbook (with Matt Steiner)
44m 34s
The transcription includes an introduction to Clarisights as the sponsor of the podcast episode, emphasizing the need for quick access to marketing insights without relying on data teams. Matt Steiner from Meta discusses the ad ranking process, detailing how ads are selected based on probabilities and auction principles. He also highlights the significance of technologies like Lattice and Andromeda in enhancing ad performance through machine learning models and transfer learning. The conversation delves into the complexities of predicting user behavior and optimizing ad placements based on data analysis. Steiner's insights offer a glimpse into the evolving landscape of ad technology and the continual efforts to improve ad targeting and performance for advertisers.
Transcription
(upbeat music)
The sponsor of this week's episode is Clarisights.
Tired of waiting for data team support?
Fed up with inflexible dashboards
where you can't explore your questions?
What if you could go from question to insight in minutes
with easy access to internal data
and attribution to prove your impact
without dependencies on data teams or manual spreadsheets?
That's why advanced performance marketing teams
such as those of Uber, Tilting Point and Delivery Hero
trust Clarisights for everything marketing insights.
From creative optimization to channel deep dive.
Go to Clarisights.com and schedule a demo
to learn how leading performance marketing teams
uncover new insights in minutes.
- The problem is that the distinction needs to be drawn
between the competence of the economists
and the correctness of their analysis.
- Hello and welcome to the Mobile Dev Memo podcast.
I'm your host Eric Seufert
and I'm joined today by Matt Steiner of Meta.
Matt, welcome to the podcast.
- Thanks, great to be here.
- Well, it's nice to have you here.
I think one of the questions that I get asked most frequently
by, you know, investors or whoever
is how does ad ranking work?
And I have some conceptual ideas, right?
You know, I've built ad systems
but I don't know how it works at Meta.
I've never worked at Meta.
And so I thought it'd be really interesting
to have someone on the podcast
that could kind of elucidate that for the audience.
And there I think is no better person to have than you.
So welcome.
- That's kind of you to say.
Thank you very much.
- Before we get into all that,
can you introduce yourself to the audience
in your own words and kind of talk about your role at Meta?
- Yeah, happy to.
So my name is Matt Steiner
and I support the monetization infrastructure
and ranking organization.
This is the set of teams that build the infrastructure
that underlies our ad system,
including the ad serving system,
the conversion tracking system,
the storage systems that house our ads,
the batch processing systems that compute data
about ads or metrics
about how ads have been interacted with
and the machine learning model training system
for ads machine learning models.
It also includes our teams that do AI research
on ranking and recommendation research,
as well as our teams that do development
of machine learning models
as well as accelerating machine learning model training
and really kind of end to end optimizing the training
and serving of machine learning models
that make predictions about which ad to show a person
when they log into one of Meta's apps.
- And you've been with the company
for quite some time, right?
- Yes, I believe I've been here about 10 years now.
- Well, continuing the theme
with all the Meta employees that I've interviewed.
And have you always been within like the kind of,
what you might call like the ad science portion
of the company or how has your journey
at the company changed over the years?
- Yeah, I initially started working on rich media
in Messenger.
I spent about two years doing that.
And then I moved into the ads world
as the ads business was scaling pretty substantially
and needed people who had supported larger organizations
that had a deep technical background.
'Cause as you know, ads is a very technical,
deeply mathematical kind of field.
And so we want to select for people
that have that kind of inclination.
- Yeah, it's funny.
I found this YouTube clip from,
I want to say it's like 10 years old or even older.
And I wish I should have pulled it up before we started,
but it's from, I want to say like,
it was Meta's like chief auction economist or something.
And he was actually filling in
for the person that was supposed to give the talk.
But it was, he was giving a talk at UC Berkeley,
like to some MBA students.
And he was just walking through the process
of serving an ad, right?
And it was just so fascinating.
I think that was my first exposure
to really like auction theory.
And it was really fascinating.
And I went out and after watching it,
I bought a book called Auction Theory.
It's like the kind of canonical text
about designing auctions.
And it was just, it was mind blowing.
It was just like how mathematical it was.
Like every page is full of mathematical notation.
And I went to, I guess I never thought about it before that.
Like I never spent too much time thinking about auctions
and how they work.
But, and then just, I mean,
I'm working on something right now just about like app pricing.
And I've been reading through a lot of like Hal Varian's work.
And he's written like the canonical, like economics,
like intro textbook.
And he has a whole chapter about auctions.
It's just, I don't know, it's just funny.
Like you just kind of wake up to like just how,
first of all, like how complex it can be.
'Cause you wouldn't think so.
You see like an auctioneer like at a charity gala
or whatever, like that's an auction.
That seems pretty easy.
But when you're designing this stuff for scale,
for ads ranking and all that kind of stuff,
it gets very complex very quickly.
- Yeah.
There's a lot of deep auction theory
and a lot of really interesting incentive issues.
And of course pricing is a challenge everywhere in business,
but auctions give you a mechanism to drive kind of maximal
fairness with your incentives and your auction.
- Okay. So I recorded a podcast a few years ago
called how a bid becomes a DAU,
which is like kind of meant to be, you know,
sort of a song to the tune of how a bill becomes a law.
And it walked through the programmatic ad serving flow
from an impression becoming available
through to an ad being filled.
I'd love to get like Meta's version of that.
So just kind of like to kick off the conversation
with just a sort of very high concept question.
Can you walk me through the process of how a bid
becomes an ad on Meta's platform?
- I think the way the ads ranking system at Meta works
is very similar to how other kind of ad systems would work
in that when an ad slot becomes available,
one of the apps for Meta's businesses
will call out to the ad system
and ask for a set of ads to show.
And it basically passes along,
hello, this is a particular person
that we have an ad slot available to show for
and give me the top five ads that would be best suited
to this person in this context,
whether that's Reels or feed on Instagram
or Reels or feed on Facebook, et cetera.
And the first thing that happens
is that the ad request goes to our indexing system
and the indexing system collects all of the ads
that have been targeted to this particular person.
So that could be a very large number of ads.
And as a result, you can't rank all of them
with a heavyweight ranking model.
So we do what we'd call a multi-pass ranking operation
where we first apply a very lightweight ranking model
to predict how likely a person is to be interested
in each of the large number of candidate ads.
And then once those number of ads are filtered down
to a smaller number of ads
with a higher probability of conversion
as assessed by the lightweight ranking model,
then the smaller number of ads
are given to a much more heavyweight ranking model,
which applies a much more sophisticated machine learning
model to do better precision predictions
about how interested a person would be
in each of the candidate ads.
And ultimately that produces a probability
that somebody's gonna be interested in that ad
for each of those ads.
And that ad with its probability
is then entered into an auction
where that auction price
is the expected probability of conversion,
multiplied by the bid that the advertiser has bid
for that ad slot, for that particular person.
And that produces an expected value of showing that ad,
probability of conversion times expected value,
or times bid value,
which then gives you the expected value of showing the ad.
And those are sorted based on the expected value
of showing the ad.
And the top five ad candidates are then returned
to the application that asked for ad candidates
to show a person.
Then the application has final decision making authority
over which of those five ads to show
in which of the slots that are available.
'Cause you can imagine when someone loads a feed,
there's not just one slot available,
but there's a slot at the top
and then there's a slot several places lower
in the feed, et cetera.
And so then the app can say,
well, this ad more closely fits
with the content around the top,
or this ad more closely fits with the content
around the second slot, et cetera.
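The flow described above (a cheap first pass over every targeted ad, a heavier second pass over the survivors, then a sort by probability times bid) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not Meta's implementation: the `Ad` class, `rank_ads`, the shortlist size, and the lookup tables standing in for the lightweight and heavyweight models are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Ad:
    ad_id: str
    bid: float  # what the advertiser says the outcome is worth to them


def rank_ads(candidates, light_model, heavy_model, shortlist_size=5, slots=5):
    """Two-pass ranking sketch: cheap filter first, expensive scoring second."""
    # Pass 1: the lightweight model scores every targeted candidate,
    # and only the most promising ones survive.
    shortlist = sorted(candidates, key=light_model, reverse=True)[:shortlist_size]
    # Pass 2: the heavyweight model scores the shortlist only.
    # Expected value = predicted probability of conversion * bid.
    scored = [(heavy_model(ad) * ad.bid, ad) for ad in shortlist]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [(ad.ad_id, value) for value, ad in scored[:slots]]


# Hypothetical stand-ins for the two models: fixed probability tables.
LIGHT = {"helmet": 0.05, "bike": 0.01, "socks": 0.06, "mug": 0.001,
         "shears": 0.03, "toy": 0.002, "shoes": 0.04}
HEAVY = {"helmet": 0.05, "bike": 0.006, "socks": 0.08,
         "shears": 0.035, "shoes": 0.03}

ads = [Ad("helmet", 4.0), Ad("bike", 50.0), Ad("socks", 1.0), Ad("mug", 3.0),
       Ad("shears", 6.0), Ad("toy", 2.0), Ad("shoes", 8.0)]

top_ads = rank_ads(ads,
                   light_model=lambda ad: LIGHT[ad.ad_id],
                   heavy_model=lambda ad: HEAVY[ad.ad_id])
for ad_id, value in top_ads:
    print(ad_id, round(value, 3))
```

In this toy data the bicycle has the lowest predicted probability on the shortlist but the highest bid, so it still comes out on top by expected value, while the mug and the toy never reach the second pass.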
- Okay, that's very informative.
So maybe can we just kind of hover here?
And can you walk me through like,
what is ad ranking at a conceptual level?
Like what is the purpose of that?
What is that designed to do?
- Yeah, ad ranking is designed to select the ad
that has the best probability of driving the outcomes
that our advertisers have sought to purchase
when interacting with Meta,
as well as the probability that people who see the ad
are gonna be satisfied with that ad, interested in that ad.
And those kind of dual objectives
are the purpose of the ad's ranking system.
And so it is taking in information
about who the advertiser is trying to find
and advertise to and potentially convert
to purchase their product or service.
And what kind of interests a person has,
what they may be interested in seeing or doing
and the types of goods or services
they're interested in purchasing.
And that kind of matching process
is assessed on these probabilities,
the ranking models effectively predict.
So ranking models are fine-tuned
to predict as accurately as possible,
what's gonna be interesting to you
among the set of available ad candidates
and then produce a ranking based on that interest to you.
- And how like, I mean,
I imagine these are very distinct systems,
but like how does that differ from the kind of ranking process
that you apply just to content, right?
So like obviously you're filling the feed
with ads and content, right?
Do they just have like different objective functions
or like how do those differ?
- Yeah, it's an interesting question.
And I think one way to think about it is
in feed your friends and content creators on the internet
are not effectively bidding
for how much they want to have an interaction with you.
And so that is driven a lot more solely on interest,
whereas in the advertising auction,
of course, advertiser pricing is an influence
on which content gets shown to which people.
Advertisers are basically saying,
I'm willing to pay this much for an impression,
a click, a purchase conversion.
And that of course influences the value of the ad slot
that we have to show
and the relative ranking among advertisers.
Given an equal probability of interest
from a particular person,
the advertiser that bids higher
will wind up in the ad slot.
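As a small worked example of that last point, here is the same probability-times-bid arithmetic with made-up numbers. The advertiser names, probabilities, and bids are hypothetical and only illustrate how the comparison behaves.

```python
def expected_value(p_interest, bid):
    """Expected value of showing an ad: predicted probability times bid."""
    return p_interest * bid


# Two advertisers with the same predicted probability: the higher bid wins.
entries = {
    "advertiser_a": expected_value(0.02, 5.00),
    "advertiser_b": expected_value(0.02, 8.00),
    # A much higher bid with a very low predicted probability still loses.
    "advertiser_c": expected_value(0.001, 50.00),
}

winner = max(entries, key=entries.get)
print(winner, entries)
```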
- Right, I think that's like a really,
just kind of critical concept to understand.
Maybe we could kind of just unpack that
with a little bit more detail.
So you talked about the expected value piece
and the idea there is like,
well, we've got this set of candidate ads
that are targeted at this person.
And like what we want to do is we want to make sure
that the one that gets shown to that person
creates the most value, right?
And the way we think about the value is,
well, the advertiser bid against something.
It could, some outcome, right?
Like, and oftentimes you think about that as like a purchase,
but it could be an app install, right?
It could be like an add to cart,
could be email registration, whatever the case may be.
And they price that.
That's not you pricing ads.
They're telling you what that's worth to them.
And so then you say, well, okay, that's great,
but that's not the only consideration here
because if we just filled every ad slot
with like the highest bidding ad,
they might not convert, they might convert terribly, right?
Like even if the advertiser is willing to pay a lot for it,
if no one wants it,
then we'd be wasting those impressions.
So they kind of adjust that with this probability.
And that's the real work here, isn't it?
Like coming up with the probability,
I mean, beyond like running an auction in like milliseconds,
like coming up with the probability,
calculating the probability that that person
will respond to that stimulus, which is the ad,
in the way that the advertiser cares about.
- Yeah, that's exactly right.
That is the hard work of the ranking model.
It's creating an accurate prediction of the probability
that this person would achieve the goal,
the objective the advertiser has set out,
if we showed them that.
And a lot of the complexity is kind of assessing
what are you interested in?
How interested are you in this thing?
Kind of based on what we've previously seen you click on
or read through as an ad, et cetera.
There are a lot of important signals
that influence the kind of ranking model.
And it's largely your past history of interactions
with content on the services.
- How easy is it to generalize across products, right?
So like you could say, and 'cause you could imagine
like you could get as granular with like the definition
of this thing as you want, right?
Like, you know, just looking around my desk,
I've got a coffee mug with a letter P on it
'cause my wife's name starts with a letter P.
And so you could say, well, our prediction is that Eric,
he'll be very receptive to a coffee mug with a letter P on it.
Or you could say a coffee mug,
or you could say some sort of container that holds drinks,
or you could say a household kitchenware.
Like, is that just a function of how much data
you have about me?
Or is it how generalizable is that knowledge?
- Yeah, I mean, this is an interesting kind of research
problem in the AI space broadly.
Like what level of generalization and granularity produces
the kind of optimal results.
And it varies a lot based on the kind of use case
for example, if I'm into cycling,
you can imagine there's a wide variety of cycling goods
that I might be interested in purchasing.
And specifically a signal about cycling is sufficient
to increase the probability on each one of those ads.
Cycling helmet, new bicycle, new bike shoes,
new water bottles, aerodynamic handlebars,
aerodynamic tires, et cetera.
Whereas if we know something a little bit more
like we've seen you purchase a bicycle previously
and we've seen you purchase a bicycle helmet,
maybe we should decrease the probability
of showing those again, unless a certain amount of time
has elapsed.
And maybe instead we wanna be able to say,
well, in your set of purchase journeys around cycling,
you're more interested in new bike socks,
new bike clip-in shoes, new bike aerodynamic handlebars,
'cause your probability of purchasing another bike
is low for the next time window, right?
And those are the kinds of purchase journeys
that you can learn about if you see repeated kind of conversions.
And that's what we are attempting to do
with the GEM modeling technology
that we just started talking about.
We wanna learn from those event sequences
that we see from people in terms of their purchase journey.
- Right, yeah, that's fascinating.
I wanna dig in there.
Let's maybe go back in time, I wanna say like 18 months.
So I would love it if you could just kind of talk to me
about the sort of, and let me know
if I'm missing anything important in any big milestones.
But to my mind, like the three most sort of like
fundamental milestones here were Lattice,
which I wanna say was like 18 months ago,
and Andromeda, which was in December,
if I remember correctly.
Can you just talk to me about what each of those is?
- Yeah, yeah, happy to.
That's great.
And maybe I'd add two other kind of interesting pieces
that kind of dovetail with those three.
The first, of course, Advantage Plus Shopping,
which I'm sure you've discussed previously and read about,
but it has a very interesting dovetail
with these pieces of technology.
And the second is Generative AI for Ads Creative,
which also fits into the story.
So I would say starting from the earliest inception here,
Advantage Plus kind of shopping suite
is really built on the key insight
that a lot of marketing today is data-driven
and advertisers want to generate the best return
on advertising spend that they can,
'cause that's great for their business.
It allows their business to grow faster,
produce better outcomes.
And one of the key insights that our advertising team had
is a lot of human time in the marketing space
was spent doing kind of meticulous measurement
and spreadsheets over which advertising audiences,
which creatives, what were the performance,
copy, paste, huge numeric strings over and over again,
trying to do analysis, right?
The big lever in Advantage Plus Shopping
is the machine learning models: they're always on,
they're always measuring performance.
They can move levers really rapidly
in response to changing consumer sentiment,
changing performance of a particular creative,
new discoveries in which audience would be interested
in this particular ad or this particular creative variant.
And so that really powerful lever is the machine moving,
the bid, the budgeting, the audience,
the creative levers for the humans.
The machine learning model is always awake.
It doesn't get tired.
It makes consistent analysis of the data,
which leads to improved performance, right?
So that is like one way
that we're applying machine learning models
to drive much better performance for advertisers
and improve their return on ad spend.
The kind of next technology breakthrough
that we started to apply to the advertising space
was a concept that we call or is called transfer learning.
The idea here is that a model being trained
on examples in a totally different domain
can actually improve the performance
of that same model in the first domain.
And we like to use a kind of musical analogy
when we talk about this.
If I've been learning the piano from a young age
and I get to junior high and piano is not offered,
but I wanna continue to take music classes,
maybe I decide to pick up the violin.
And if I already have a piano background,
I understand music theory and scores and tempo
and how to read music and harmonies and melodies,
et cetera, it's gonna be much easier for me
to pick up the violin than a student
that has never touched a musical instrument before.
'Cause I have this foundational structure of music
that I've learned.
And the same is true for transfer learning
across models with different objectives.
They learn some foundational structure
that helps them improve performance
when they're cross-trained on different objectives.
So this kind of key technology breakthrough
led us to develop Meta Lattice,
which is multi-objective models
trained on different objectives with different data sets
that improve the performance of each objective
over the original model that was solely trained
on a single objective.
So in the ads case here,
we would train a model on clicks before,
it would only see clicks and it would only be trained
to predict the probability of a click.
We would have another model that was trained
on say landing page views.
What is the probability that you'll actually get
a landing page view by showing this particular ad?
Well, it turns out you can improve the accuracy
of the probability prediction for both models
by training one model on both data sets,
the click and the landing page view data sets.
So there's no new data here, but just merging the models
and then training the model on both data sets
produces better outcomes for both objectives.
So that was the Meta Lattice technology
that allowed us to start this long series of model merging
that produces better results on each objective
as another data set is merged into the training set
of the Lattice model.
So that produces better outcomes for advertisers,
more conversions per dollar spent,
each time another model is merged into the Meta Lattice model.
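To make the idea of one model trained on two objectives concrete, here is a generic "shared bottom" multi-task sketch: one shared hidden layer feeds a separate output head per objective, and both data sets update the shared layer. This is a standard textbook pattern used purely as an illustration. The actual Meta Lattice architecture is not described in this conversation, and the class name, the synthetic data, and all hyperparameters below are assumptions.

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


class SharedBottomModel:
    """One shared hidden layer feeding one output head per objective."""

    def __init__(self, n_features, n_hidden, objectives, seed=0):
        rng = random.Random(seed)
        self.W = [[rng.uniform(-0.1, 0.1) for _ in range(n_features)]
                  for _ in range(n_hidden)]
        self.heads = {name: [rng.uniform(-0.1, 0.1) for _ in range(n_hidden)]
                      for name in objectives}

    def hidden(self, x):
        return [math.tanh(sum(w * xi for w, xi in zip(row, x))) for row in self.W]

    def predict(self, x, objective):
        h = self.hidden(x)
        return sigmoid(sum(v * hj for v, hj in zip(self.heads[objective], h)))

    def sgd_step(self, x, objective, label, lr=0.1):
        h = self.hidden(x)
        head = self.heads[objective]
        p = sigmoid(sum(v * hj for v, hj in zip(head, h)))
        err = p - label  # gradient of the log loss with respect to the logit
        for j, hj in enumerate(h):
            # Backpropagate through tanh using the head weight before its update.
            grad_pre = err * head[j] * (1.0 - hj * hj)
            head[j] -= lr * err * hj          # only this objective's head moves
            for i, xi in enumerate(x):
                self.W[j][i] -= lr * grad_pre * xi  # the shared layer always moves


def log_loss(model, examples):
    total = 0.0
    for x, objective, label in examples:
        p = min(max(model.predict(x, objective), 1e-9), 1 - 1e-9)
        total -= label * math.log(p) + (1 - label) * math.log(1 - p)
    return total / len(examples)


def make_examples(objective, label_fn, n, rng):
    rows = []
    for _ in range(n):
        x = [rng.uniform(-1, 1), rng.uniform(-1, 1), 1.0]  # last feature is a bias
        rows.append((x, objective, label_fn(x)))
    return rows


# Two synthetic, related data sets (purely illustrative labels).
rng = random.Random(42)
click_data = make_examples("click", lambda x: int(x[0] > 0), 200, rng)
lpv_data = make_examples("landing_page_view",
                         lambda x: int(x[0] > 0 and x[1] > 0), 200, rng)
merged = click_data + lpv_data

model = SharedBottomModel(n_features=3, n_hidden=4,
                          objectives=["click", "landing_page_view"])
loss_before = log_loss(model, merged)
for _ in range(50):
    rng.shuffle(merged)
    for x, objective, label in merged:
        model.sgd_step(x, objective, label)
loss_after = log_loss(model, merged)
print(loss_before, loss_after)
```

The point of the design is that every example, whichever objective it belongs to, adjusts the shared layer, so what is learned from clicks is available to the landing page view head and vice versa.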
- Is that just kind of a case of like a mixture
of experts type approach?
Or are you actually taking the model, cutting the head off
and then training a new head with different data?
- I guess the way to think about this is
it's a combination of mixture of experts
and just scaling the examples that a model sees
because each of those examples that a model sees
has some signal about what you're interested in
or not interested in in terms of a product or a feature
that an advertiser is offering.
It also has some signal about the type of creative
that you're interested in, et cetera.
And so part of the game there is just developing
an architecture that allows you to predict different
objectives independently while considering
the expanded data set.
But the technology breakthrough really
is the transfer learning.
How much knowledge can you transfer
from different data sets into a different objective?
- I see, thank you.
Okay, so that's Lattice.
So that's the kind of like transfer learning breakthrough.
And then talk to me about Andromeda.
What is Andromeda?
- So maybe one intervening technology development
in the advertising space was of course
the use of generative AI for the development of creatives.
So advertisers were previously spending a lot of money
and a lot of human time to develop small variations
in creatives that they could test in different audiences
or to determine what was interesting to people,
improve the probability of conversion, et cetera.
So one of the ways that Meta's ads team has attempted
to help reduce the cost of testing and learning
is to apply AI, specifically generative AI,
to help generate those creative assets
that you can test on.
Whether that's variations in the text of a creative,
the more variations you test,
the more likely you are to find a variation
that improves performance,
as well as generative AI to develop backgrounds
for your creatives.
If you wanna highlight your product
in a bunch of different backgrounds
and discover which type of background
improves your probability of conversion.
And then of course being able to generate the images
from a text string directly.
These reduce the development time
and development cost of producing variants.
And the more you can test,
the more you can learn about which message,
which background, which kind of overall creative
is more appealing to different audiences,
which message resonates with different audiences.
And so really since Meta ads has existed,
the secret to performance is to test and iterate, right?
The more tests that you can run,
the better the ultimate results that you can drive.
And so this allowed advertisers, marketers,
to drive down the cost of testing.
The consequence of introducing dramatically more ad variants
into the advertising system,
of course is that original problem that we talked about
where first you select,
which ads are targeted to a particular user.
Well, that problem has now become dramatically harder
because there are way more ads and ad variants
in the system than ever before.
So it makes that retrieval step,
the selection step really, really hard.
As a result, the ads team started working on
how do we do better at retrieval
in the face of this Cambrian explosion of ads?
The solution there really is that we threw
dramatically more compute at the retrieval phase problem.
So we worked with our hardware partner NVIDIA
and our ads ranking teams
and specifically our ads AI research teams
to design an end-to-end retrieval system
that would be dramatically more performant
than the old less intelligent retrieval system.
In this case, we started to use GPUs at retrieval time
and run a machine learning model
that had personalization data
in the machine learning model at retrieval time.
You may have some preferences.
You may like ads that look like this and not like that.
You may like this particular phrasing
and not this other phrasing.
And this is all kind of data
that has been learned in machine learning models
from just watching you interact with previous ads
that we've shown you.
Well, before that information wouldn't be evaluated
until the last ranking stage.
So initially we'd select some number of ads.
We'd pass them through to the ranking stage
and then we'd realize 60% of these ads
you're not gonna like anyway.
And so with this Andromeda step
we can now apply personalization at retrieval
or selection time,
making sure that the ads that we select
are ads that you're more likely to be interested in
at selection time.
So we do a lot more computation at retrieval
that is driven by having access to a lot of compute
by replacing the retrieval hardware
with this Andromeda hardware.
And we custom designed a machine learning model
that would not just select ads that were targeted at you
and had high probability,
but they would be ads that were targeted at you
and would be more likely to convert for you.
The key insight there is personalization of selection.
Before we would select ads that are likely to convert.
Now we select ads that are likely to convert for you.
And that is the kind of Andromeda breakthrough.
It's a large end-to-end design for hardware,
software infrastructure, machine learning infrastructure
and a custom machine learning model with personalization
at selection time that improves recall
and conversions per dollar for advertisers.
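The difference between selecting ads that are likely to convert and ads that are likely to convert for you can be sketched with a toy retrieval step. This is an assumption-heavy illustration, not a description of Andromeda: the interest vectors, the dot-product score, the `avg_cvr` field, and both function names are hypothetical.

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def retrieve_global(ads, k):
    """Old-style selection sketch: rank by average conversion rate across everyone."""
    ranked = sorted(ads, key=lambda ad: ad["avg_cvr"], reverse=True)
    return [ad["id"] for ad in ranked[:k]]


def retrieve_personalized(ads, user_vec, k):
    """Personalized selection sketch: rank by affinity to this person's interests."""
    ranked = sorted(ads, key=lambda ad: dot(user_vec, ad["vec"]), reverse=True)
    return [ad["id"] for ad in ranked[:k]]


# Hypothetical interest dimensions: [cycling, gardening, toys].
ads = [
    {"id": "bike_helmet", "avg_cvr": 0.020, "vec": [1.0, 0.0, 0.0]},
    {"id": "bike_shoes", "avg_cvr": 0.015, "vec": [0.9, 0.0, 0.1]},
    {"id": "garden_shears", "avg_cvr": 0.040, "vec": [0.0, 1.0, 0.0]},
    {"id": "toy_blocks", "avg_cvr": 0.050, "vec": [0.0, 0.0, 1.0]},
]
cyclist = [0.9, 0.1, 0.0]

global_pick = retrieve_global(ads, k=2)
personal_pick = retrieve_personalized(ads, cyclist, k=2)
print(global_pick, personal_pick)
```

With these numbers the global selection passes along the toy and gardening ads, which the later ranking stage would then have to reject for a cyclist, while the personalized selection passes along the two cycling ads.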
- That's so fascinating.
When you rank them,
so I guess this is kind of a classification task, right?
In a sense, 'cause I imagine like the probabilities
are probably pretty close at the high end.
Do you do like a softmax, like blow up the highest,
like the number one rank just to draw that sharp distinction?
- I guess I would say there are a variety of algorithms
that we use to try to produce the separation
that we're looking for as well as kind of cap
the prediction rates at what is reasonable there.
Personalized retrieval is just incorporating the signals
that we know that you're interested in
in a way that they weren't able to be incorporated
at retrieval time before.
When those better ads are then passed to the ranking stage,
then a ranking model is gonna apply
even more sophisticated signals because there are fewer ads
and so you can apply a lot more compute per ad
to do better prediction.
The first stage is lower accuracy prediction,
but now it's personalized in a way that it wasn't before.
The second stage is the higher accuracy prediction,
which was already being personalized
'cause we had a much larger compute budget
for a smaller number of ads to rank.
- The sponsor of this week's episode is INCRMNTAL.
Connected TV is here with the premium audiences
everyone wants, but how are you measuring performance?
Hoping your users will scan QR codes
or smudge the screen by clicking the TV?
CTV joins the new era of advertising.
Your measurement should too.
INCRMNTAL is the future of measurement,
helping you measure CTV, linear, influencers, podcasts
and more without user tracking.
Google INCRMNTAL, the future of measurement
or click the link in this episode's description.
- Okay, so we talked about Lattice.
We talked about the GenAI tools.
I wanna come back to that, obviously.
We talked about Andromeda.
So now talk to me about GEM.
So that was, I think the big reveal was at earnings.
I mean, I might be wrong about that.
There may have been like a blog post or something,
but I haven't seen it.
Talk to me about what is GEM?
- Yeah, so there's a blog post before earnings
and then we talked about it in the earnings call.
GEM is our machine learning model
that has been adapted to use sequence learning.
So I think I was mentioning this a little bit before,
but sequence learning is of course the big breakthrough
that is driving innovation in models like Llama
and ChatGPT, large language models
where they effectively learn to predict,
say the next word in a sequence.
So you can imagine previously,
we were considering every ad that was shown to a person
in a training example kind of independently.
We showed you shoes and you were not interested in shoes.
We showed you bicycles
and you were not interested in bicycles.
We showed you gardening shears.
You were interested in gardening shears.
And those are effectively like independent events
that had independent probability assessment.
When most of kind of user journeys
are not single shot user journeys, right?
Maybe it's the case that we show you an ad for a new cycle.
We show you another ad for a new cycle.
We show you another ad for a new cycle.
And that one you linger on for a while
and maybe you click on it,
but then you decide you go off and do something else.
A little while later,
we saw that you're pretty interested in ads for new bicycles.
And so we show you again another ad for a new bicycle
and you again click
and then maybe you browse around the advertiser's website
looking for that new kind of bicycle
that you may want to replace your current bicycle with.
Maybe you decide to not purchase
and then a little bit later,
we show you another ad for a bicycle, you click,
you browse around to the advertiser's website,
you decide to add to cart
and you finally do a checkout.
You enter your credit card information,
you press the buy button.
So there is a kind of user journey
that is encapsulated in that sequence of events.
They weren't all independent random events, right?
There was actually a sequence of view, view, view, click,
view, click, browse, view, click, browse, convert.
And so that sequence in the user journey
could be something that we could represent and learn from
in terms of our machine learning modeling.
So the machine learning modeling is moving beyond
just independent probability estimates
in terms of conversions,
but moving towards understanding conversion
journeys, sequences of user events,
and then sequences of events
that actually lead to that type of conversion.
And we can actually improve the accuracy
of our probability predictions for conversion
by looking at the kind of user journeys
that you've already seen.
So if we've already seen that like,
maybe you've browsed around a cycling website,
maybe you've clicked on some gardening shears,
maybe you've looked at some toys for your toddlers,
those three things that have happened in the past
may substantially increase the probability
that you're likely to convert in one of those areas.
And it's not independent like the previous kind of generation
of machine learning models assumed,
there's a sequence of user journeys that can be learned
and those prior interactions in the sequence
can improve the accuracy of predicting
how likely you are to convert on an ad that we show you
based on that sequence in the user journey
that the machine learning model has learned.
So there's no new data anywhere in the process,
but we've taken the sequence learning technology,
applied it to the ads use case
and produced innovation in machine learning for ads
that is likely to increase the conversions
and return on ad spend for advertisers
because our prediction accuracy has improved
by learning from those user journey sequences,
user event sequences.
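A very small way to see why the sequence matters is to compare an estimate that ignores what came before with one that conditions on the previous event. The sketch below uses simple transition counts (a first-order Markov model) over made-up journeys. It is a deliberately tiny stand-in for sequence learning, not GEM itself, and the event names and journeys are hypothetical.

```python
from collections import Counter, defaultdict

# Hypothetical user journeys, each a time-ordered list of events.
journeys = [
    ["view", "view", "view", "click", "view", "click", "browse",
     "view", "click", "browse", "add_to_cart", "convert"],
    ["view", "view", "click", "browse", "add_to_cart", "convert"],
    ["view", "view", "view", "view"],
    ["view", "click", "view", "view"],
]

# Count how often each event follows each other event.
transitions = defaultdict(Counter)
for journey in journeys:
    for prev_event, next_event in zip(journey, journey[1:]):
        transitions[prev_event][next_event] += 1


def p_next_independent(event):
    """Probability that the next event is `event`, ignoring what came before."""
    total = sum(sum(counts.values()) for counts in transitions.values())
    hits = sum(counts[event] for counts in transitions.values())
    return hits / total


def p_next_given_prev(prev_event, event):
    """Probability that the next event is `event`, given the previous event."""
    counts = transitions[prev_event]
    total = sum(counts.values())
    return counts[event] / total if total else 0.0


independent = p_next_independent("convert")
after_cart = p_next_given_prev("add_to_cart", "convert")
after_view = p_next_given_prev("view", "convert")
print(independent, after_cart, after_view)
```

Treating every event independently gives one flat conversion probability, whereas conditioning on where the person is in the journey separates someone who has just added to cart from someone who has only seen an impression. Real sequence models condition on much longer histories than a single previous event.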
- So didn't Meta, then Facebook,
kind of contribute
to this sort of sequence-to-sequence architecture
with like the differential equation solving model, right?
Like it was a pretty early application of this.
- Yes, and I think that's like a generic application
of sequence learning technology
and this is the specific application of sequence learning
to the ads use case.
Yeah, I think that's right.
- No, right, just to point out
that you've been a pioneer of the research there.
- Mark has been pretty prescient
in understanding how valuable AI would be to the company
and made a huge bet on AI research almost a decade ago,
starting FAIR, maybe it was a little bit more
than a decade ago,
starting the Facebook AI research organization,
trying to attract brilliant AI researchers around the world,
having them work on a lot of problems
that are relevant to the kind of Meta business problems,
as well as some problems that weren't immediately applicable
but wound up paying off down the road.
As you can see, a lot of our business today
is critically reliant on AI technology,
including ranking posts in feeds or reels,
as well as predicting which ad to show to you
based on your probability of conversion,
as well as our new bets
in the kind of Llama large language model space.
- Right, and I'm taking notes of all this stuff
I wanna revisit and I'm starting to fear
that we probably won't get to many of these,
but so that talent issue is something I wanna come back to,
but I think just kind of the natural next question
when we've been talking about the progression
of these technologies,
that I imagine are still being kind of improved upon, right?
Like it's not the end, right?
But what's next?
So we talked about the sort of progression
of the transfer learning with Lattice
and then you've got generative models,
you've got Andromeda for ranking
and you've got GEM for kind of thinking through sequences.
What's next?
What's on the roadmap?
- Yeah, I mean, you can imagine
that we have a large team of AI researchers in the ad space
that is really focused on the two elements
that drive better performance for advertisers.
The first is, of course, driving increased conversions
per dollar spent.
And there, our teams are working on innovations
in the retrieval space.
They're working on innovations in the ranking space.
They're working on those innovations
in the Advantage Plus Shopping automation type of space,
as well as the other side of the ROI calculation.
How much do you have to spend
in order to achieve those results?
Where we have investments along the lines
of generative AI for creatives,
which reduces your cost to generate creatives
to test and learn.
And of course, we've also talked about some of the investments
that we're making in business messaging,
which is also dependent on AI models,
predicting which messages are going to be relevant
to which people.
And some of our newer engagements with AI agents,
where AI agents are helping businesses reduce costs
in their efforts to support customers,
both pre-sale and post-sale.
- Got it.
I want to kind of revisit the generative AI
for creative component here,
because I mean, like,
Meta has been very public about the advancements here.
And I still feel like it's,
I don't want to say underappreciated,
but I think the real innovation here
is not sort of valued properly by advertisers, right?
So like there's one, you know,
very obvious consequence of having access to generative AI
creative: you spend less money on producing creative,
which can be a substantial cost, right?
But I don't think that's what provides the real value.
The real value is the ability to reach a person
with just a more relevant ad.
And then you increase receptivity to the ad.
And then we go back to your kind of original answer,
which means that probability of conversion goes up, right?
And you said something at the outset
that the retrieval process,
it sort of sorts through the set of ads
that are available to be shown to you, right?
Well, that set of ads now is this, you know,
sort of very discrete bank of images or videos or whatever
that have been created and you can't grow beyond that.
You can't select from outside of that, right?
I think you see where I'm going.
What happens when you can, right?
And I think if you think about the theoretical endpoint
or even like the kind of years out endpoint,
is the goal to just be able to on the fly generate the ad
that will produce the highest probability of X being done
where X is that outcome?
Is that kind of the end goal here?
- Yeah, I mean, I think the way that I would describe it is
there are a variety of advertisers
that want to operate their businesses in different ways.
But there's certainly a class of advertisers
who care most about performance.
And at the end of the day,
they want as much automated generation as possible
that drives as much performance as possible.
That won't be all of our advertisers,
but some of our advertisers,
the holy grail for them would be
that they would give us a URL to their website.
We would look through their product catalog,
generate the appropriate creatives,
automatically show those creatives
to the right audiences, test and learn
and drive the most number of sales.
And that would certainly drive down costs
and be a dramatically easier experience
for the businesses
that are looking for that type of experience.
Again, that's not going to be all of our advertising partners.
That's not going to be everyone that wants that.
For example, with generative AI for ads creative,
you talked about how that allows you to drive better results
by just finding a better creative.
But I think one important second order consequence here
is what you can learn
from which creatives actually drove better results.
- Right, right.
- That's where humans are going to become really valuable.
Some human can sit down and look at the results and say,
well, why would this messaging perform so much better
with this audience for our product?
Right, that's not a question
a machine learning model can answer for you,
but a human can do some user research.
They can talk to people.
They can meet with customers
and really try to understand
why that message is resonating better.
And that may influence the direction
you want to take your particular brand voice
with that audience.
So there's a really important second order component
of the learning that you can achieve by testing more
and then looking at the results,
which perform better, which perform worse
and having humans try to understand why.
I mean, that's an area where humans are kind of uniquely valuable,
in that they can understand other humans' behavior,
like why would a human do this?
And they can ask people,
why would this message resonate more with you?
- I appreciate that answer.
I find it's a very diplomatic answer.
Let me speak briefly for the class of advertisers
that are so mercenary, they don't care.
I don't care why it works better.
This is like a fun anecdote.
So I ran user acquisition teams
for gaming companies for a long time.
And so I played a lot of games,
just because I want to see the ads, right?
And I also want to see the economy design.
And I was playing a game one time and this ad pops up
and it's a mobile gaming ad,
like a full-screen interstitial.
And I'm watching a game of Pong be played at like slow motion.
Like that's what it was,
like super low fidelity pixelated Pong
being played at like excruciatingly slow pace.
And I'm like, what is this?
And I click on it and it's,
I won't say the name of the company,
but it takes me to like
one of the biggest gaming companies' App Store pages, right?
And I know, I knew the CMO there.
So I messaged him, I'm like,
I took a screenshot of the ad when I was watching it.
And I was like, what is this?
And he's like, that's the ad that performs the best.
It's as simple as,
we just tried a bunch of random stuff
and that works the best.
I'm not going to question it, right?
And I think that's such a powerful idea in advertising.
And obviously that applies to a specific subset
of digital advertisers.
There's a lot of advertisers
for whom that idea is like abhorrent, right?
But there are a lot of advertisers
and they spend a lot of money where they would say,
just do your thing.
Do your thing, get me the most relevant ads
that work the best and that drive the best ROAS
and then take my money in doing that.
- Yeah, there are a bunch of advertisers
that will test anything.
And sometimes that drives really great results
for their business.
I think the one other interesting signal there
that is of course relevant in the ads ranking process
is the negative feedback rate on an ad.
So if your ad is full screen blocking,
really slow, annoying people,
the more they click X on your ad,
the lower your ad will be ranked kind of going forward.
So there is some kind of gating boundary condition
that prevents really negative experiences.
People have the ability to signal
like what they don't like in an ad.
We of course incorporate that deeply
into our ranking systems.
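
As a rough illustration of the point above, here is a minimal sketch of how a negative feedback rate could pull an ad down in a ranking. It assumes an ad's base score is its expected value (bid times predicted conversion probability) and that negative feedback scales that score down linearly. The formula, the `penalty_weight` parameter, and the example ads are hypothetical and are not taken from Meta's ranking system.

```python
from dataclasses import dataclass


@dataclass
class Ad:
    name: str
    bid: float                     # what the advertiser pays per conversion
    p_conversion: float            # predicted probability of conversion
    negative_feedback_rate: float  # share of impressions that were hidden or closed


def score(ad, penalty_weight=5.0):
    """Hypothetical score: expected value scaled down by negative feedback."""
    expected_value = ad.bid * ad.p_conversion
    quality = max(0.0, 1.0 - penalty_weight * ad.negative_feedback_rate)
    return expected_value * quality


def rank(ads, penalty_weight=5.0):
    """Return ads sorted from highest to lowest hypothetical score."""
    return sorted(ads, key=lambda ad: score(ad, penalty_weight), reverse=True)


# Illustrative ads: the annoying one converts slightly better
# but is closed far more often.
EXAMPLE_ADS = [
    Ad("slow_fullscreen", bid=10.0, p_conversion=0.05, negative_feedback_rate=0.10),
    Ad("plain_banner", bid=10.0, p_conversion=0.04, negative_feedback_rate=0.01),
]

if __name__ == "__main__":
    for ad in rank(EXAMPLE_ADS):
        print(ad.name, round(score(ad), 3))
```

With the penalty switched off (`penalty_weight=0`) the higher-converting ad ranks first. With it on, the ad people keep closing drops below the less annoying one.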
- So how do, I mean, advertisers that come to Meta,
they want performance,
they want to utilize this full suite of tools.
How do they stay on top of this?
What advice would you give to advertisers
who are interested in learning more
about exactly what we're talking about?
Aside from listening to the podcast,
like how do advertisers stay on top of this
so that they can utilize it?
- Yeah, I mean, obviously we try to communicate
the innovation that we're driving
in the advertising space on our blog,
at conferences, et cetera,
by meeting with popular podcasters like you.
But really the best way to stay on top of that,
the technology is to use the product, right?
Testing is the secret to success with Meta advertising.
You should know like what's changed,
how has that affected your return on ad spend,
which features have you tested out?
Are they working for you or not?
Why are they not working for you?
And that is a real opportunity for marketers
that use Facebook's tooling to discover
what works for their business in particular,
as well as try to understand what's not working and why.
Testing is the secret to success.
The number of iterations you have allows you to learn faster,
with more iterations you learn faster,
and you can drive better results that way.
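
One way to see why more test volume helps you learn faster: the uncertainty around a measured conversion rate shrinks as the number of observations grows. The sketch below uses the standard error of a binomial proportion. The function name and the example figures are illustrative assumptions, not numbers from the episode.

```python
import math


def conversion_rate_standard_error(rate, impressions):
    """Standard error of an observed conversion rate (binomial proportion)."""
    return math.sqrt(rate * (1.0 - rate) / impressions)


if __name__ == "__main__":
    # Same underlying 2% conversion rate, measured with more and more volume.
    for impressions in (1_000, 4_000, 16_000):
        se = conversion_rate_standard_error(0.02, impressions)
        print(impressions, "impressions -> standard error", round(se, 5))
```

Quadrupling the number of observations halves the standard error, so a real difference between two creatives shows up sooner.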
- So I want to return to the talent question
'cause you mentioned that Meta had built this
like AI research lab like a decade ago or established it,
you know, in the region of a decade ago.
And just to kind of get on top of this,
to get ahead of this, right?
And my sense is that like the things
that you're talking about,
like when you're operating at the frontier
of building these models,
the number of people that could build these technologies
is you could fit in, you know, in a football stadium
or something smaller than that, a basketball stadium.
At what point does that become a moat,
just being able to assemble that talent?
Because like it's not like universities are pumping out
thousands of these people a year, right?
I mean, there's a fairly small pool.
There's a lot of competition for their talents.
Like at what point is that a moat,
just having this pool of talent
of the sort of very small number of people
that know how to build these things,
like we have some significant share?
At what point does that in itself become a moat?
- Yeah, I mean, it's a really good question.
I think you're right that the market for talent here
is obviously very hot
as there are so many applications of state-of-the-art AI today,
whether that's generative AI
or your prior generation ranking and recommendations AI,
there are so many critical use cases
that are driving results
in every kind of application of technology space.
I think maybe one observation is
universities are pumping out thousands of people
who understand AI every year.
There are universities all over the world
with lots of deep education in AI,
and there's more kind of AI content available online
in an open source way than ever before.
The number of people that have an understanding of AI
when graduating from undergrad is dramatically higher
than it was five years ago or 10 years ago,
as it's the really kind of hot part
of the computer science field,
as well as the number of PhDs in machine learning
is larger than ever before as well,
both because there's an extreme demand for talent here,
and because there's so many interesting things
that you can build with this kind of skill set.
So I think there's obviously going to be a deep value
for more experienced ML talent,
and that's where you'll continue to see wars
for experienced competent ML engineers,
and the companies that are able to attract
and retain that talent have a distinct advantage
in driving results for their business.
At the same time, the number of people entering
the ML portion of the field is just growing every year,
and we expect that trend to continue.
When will the supply exceed the demand?
That's a really interesting and open question.
I think the same was true with the dot-com era
in the early 2000s.
There were more and more computer science graduates
every year, and it seemed like the supply
would never meet the demand, but it eventually caught up
and it may have been after a crash and then a recovery,
but now there's a really healthy job market
for SWEs, as there has been for the last two decades,
and we expect the same thing to be true for ML engineers.
There'll be quite a bit of demand for ML engineers
for a long time to come.
- Matt, this was fascinating.
I hope I didn't probe too much,
but I really appreciate your time here today.
I know, given it's a public company,
it's hard to get these things to materialize,
but I appreciate you being generous with your time today.
- Yeah, thanks very much, Eric.
It was a pleasure to be here and great talking to you.
Podcast Summary
Key Points:
Introduction to Claricites as the sponsor of the podcast episode.
Discussion on ad ranking and its importance in marketing insights.
Interview with Matt Steiner from Meta discussing ad ranking process and technology advancements like Lattice and Andromeda.
FAQs
Claricites is a sponsor offering a solution for quick access to marketing insights, internal data, and attribution without reliance on data teams. It caters to advanced performance marketing teams.
Matt Steiner supports the monetization infrastructure and ranking organization at Meta, focusing on ad serving systems, machine learning models, and AI research.
Ad ranking at Meta involves a multi-pass ranking operation using lightweight and heavyweight ranking models to predict the probability of a person's interest in ads. An auction system then determines which ad to show based on expected value.
Ad ranking aims to select ads that drive desired outcomes for advertisers and engage users effectively. It matches ads based on user interests and advertiser objectives.
Transfer learning involves training models in different domains to improve performance in the original domain. Meta's Lattice uses multi-objective models trained on different data sets to enhance ad prediction accuracy.
Meta's Advantage Plus Shopping suite leverages machine learning models for continuous performance measurement and optimization of ad campaigns, leading to improved return on ad spend for advertisers.