Prediction vs. Search Models: What Data Scientists Are Missing

As data scientists, we’ve become extremely focused on building algorithms, causal/predictive models, and recommendation systems (and now genAI). We optimize for accuracy, fine-tune hyperparameters, and look for the next big fancy model to deploy in prod. But in our focus on delivering a state-of-the-art implementation, we’ve overlooked a class of models that can reshape how we think about the business problem itself.

Consider the rise of platform companies like Amazon, Spotify, Netflix, Uber, and Upstart. While their industries appear vastly different, they fundamentally operate as intermediaries in search-and-matching markets between demand and supply agents. These companies’ value proposition lies in reducing search costs for customers by providing a platform and a matching algorithm to connect agents together under uncertainty and heterogeneous preferences.

The Core Challenge

In these markets, the fundamental questions aren’t just standard isolated machine learning problems such as “how do we predict demand?” or “how do ads impact churn rate?” Instead, the critical challenges are:

How many suppliers should we onboard given expected demand patterns?
How do we design matching mechanisms that generate the optimal allocation?
What pricing strategies maximize platform revenue while balancing platform growth and customer satisfaction?
How do we handle the downstream impact when a change in one model primitive has ripple effects?

Traditional data science approaches treat these as independent optimization problems and dedicate separate workstreams to them. However, economists have been working on these problems since the 1980s and have developed a unified theoretical framework, called search-theoretic models, that captures the interdependent nature of these platform dynamics. I studied these models deeply in graduate school but have not seen them applied in industry work, so I’d like to bring attention to them.

Why This Matters for Data Scientists

Data science as a field is great at measurement and algorithms, but falls behind in problem formulation (which we have left to PMs and execs). Understanding these theoretical foundations informs how we think about what metrics to measure and what algorithms to build. Instead of building isolated prediction models, we can design systems that work together to account for equilibrium effects, strategic behavior, and feedback loops. This theoretical lens helps us identify the correct experiment to run, understand when our models break down (cohort drift) due to changes in agent preferences, and design interventions that have a first-order impact on the equilibrium outcomes.

In this article, I’ll introduce the theory behind search models and demonstrate their practical application using a lending platform (Upstart/LendingClub/Prosper) that matches borrowers and banks as a concrete example. We’ll explore how this framework can inform partner acquisition strategies, pricing and fee mechanisms, and what levers should be used to drive growth. Interested readers can continue to the next section for a short background summarising how these models came to be, or skip straight to the practical example to understand how to design these models.

The Economic Literature

This modeling framework comes from economics in the 1980s, when Dale Mortensen, Christopher Pissarides, and Peter Diamond were trying to understand why unemployment exists even when there are job openings. This series of questions led them to win the Nobel Prize in 2010 for their work. Their Diamond-Mortensen-Pissarides (DMP) model changed how we think about markets. The core insight is that finding a job (or hiring someone) takes time (and costs money), leading to frictions in an otherwise competitive market. Diamond showed in 1982 that when searching is costly, wages aren’t determined by aggregate supply and demand. Instead, they’re negotiated between a specific worker and firm in a bilateral bargaining process. This negotiation uses Nash bargaining, where the wage depends on each party’s bargaining power and outside options. If either side has better outside options, they get a larger share of the value created by the match.

Mortensen expanded on this by showing that search costs create a pool of unemployed workers even in a healthy economy. Workers develop a “reservation wage”—the minimum they’ll accept based on what they expect to find if they keep searching. Firms similarly balance the cost of keeping a position open against the expected value a worker would bring. Pissarides then tied these individual negotiations to economy-wide patterns, showing how unemployment and job creation relate to business cycles.

In 2005, Duffie, Gârleanu, and Pedersen applied this same thinking to financial markets. In over-the-counter markets, buyers and sellers have to find each other, just like workers and firms. This search process creates bid-ask spreads and explains why the same asset can trade at different prices at the same time. A seller who needs cash immediately (high liquidity demand) might accept a lower price, while someone with enough time can wait for a better offer. Lagos and Rocheteau later relaxed restrictions on binary asset holdings and introduced a variable asset portfolio for each agent and showed how monetary policy affects these decentralized markets.

The third piece of the puzzle comes from platform economics. Platforms create a marketplace that requires both sellers and buyers. Ride-sharing platforms need both drivers and riders. Lending platforms need both borrowers and banks. The literature on two-sided markets shows how platforms can maximize their revenue by setting prices and jointly controlling the size of demand and supply agents. These platforms have to set a price to ensure that participants remain in the market (Incentive Compatibility constraint), and that accepting the transaction is beneficial for these agents (Individual Rationality constraint). Platforms could also handle instances of multiple markets (Amazon books/electronics), where demand/supply from one segment might have spillover effects into the other segment.

These three related streams of research can be combined to give us the tools to understand modern digital platform firms. Below I will show a practical example on how these concepts tie together in a theoretical model to understand the optimal behavior of a lending platform.

A Practical Example: Lending Platforms

Let’s apply this framework to lending platforms like Upstart, LendingClub, and Prosper. These companies use AI to underwrite loans, connecting banks that have available capital with consumers who need loans. They act as marketplaces where partner banks offer various loan types (personal, auto, mortgage) and consumers apply for credit. The platforms make money through origination fees, service fees, and late fees while reducing search costs for both sides since banks don’t need to find and evaluate borrowers themselves, and consumers don’t need to shop around multiple banks. From a platform perspective, these firms face key economic challenges:

Demand forecasting: How much loan demand will we see next quarter?
Supply management: How many partner banks do we need to handle that demand?
Competition design: How do we keep banks competing for borrowers without driving them away?
Matching mechanism: Should we use auctions, posted prices, or algorithmic matching to match borrowers and lenders?
Risk assessment: How do we model both bank risk appetite and borrower default probability?
Market segmentation: Are there any spillover effects between lending in different market segments?

None of these questions is easy to answer and each has many moving parts. You might forecast loan demand using time series models, but that aggregate number needs to be broken down by loan type, amount, and duration since banks have different preferences across these dimensions. Smaller banks with limited capital may only want to originate short-term loans to high-credit borrowers, while large banks might provide longer-term loans to riskier borrowers if they have excess capital. The matching algorithm needs to account for these preferences while ensuring both sides get enough value (trade surplus) to accept the offer.

In this framework, each loan represents a three-way negotiation between the borrower, bank, and platform. The borrower has the power to reject any offer, the bank has the ability to place a reservation interest rate, while the platform has the power to decide the allocation of the total trade surplus. The platform controls key parameters like interest rates and fees, and changing these affects participation on both sides. Rates that are too high cause borrowers to leave, lowering adoption and increasing churn. Rates that are too low reduce partner satisfaction and decrease the number of partners. Every decision shifts the equilibrium, and understanding these dynamics is crucial for platform growth.

The Model Environment

Let’s build the simplest model to understand these dynamics. We’ll start with assumptions that make the math tractable, which will make up our environment. This environment will only have one loan type lasting only one period, identical borrowers, and identical banks.

Our environment exists in discrete time $t \in \mathcal{T}$, with no inter-period discounting. There exists a loan of size $S$ with an interest rate of $r$, where $r$ is an endogenous variable (whose outcome is decided within the system and not a model primitive).

Borrowers arrive at the platform following an unconditional Poisson rate $\Lambda$. Borrowers come into the platform demanding a loan of size $S$, which they value at $V(S)$. They have a linear utility function $U_L = V(S) - (1+r)S$, the valuation they receive from the loan net of the payment that they have to make in the next period. The stock of unmatched borrowers at each time period is denoted $L_t$. Each borrower has a repayment probability $p$. When they have an offer for a loan, they can choose to either accept or reject that offer. If they reject the offer, they leave the market and exit the platform. The borrower always thinks that they will repay the loan.

On the banking side, there exists a set of banks $j \in \mathcal{J}$, each with a maximum capital capacity $K$ and a cost of origination $c$. Each loan of size $S$ has a maturity date of $T=1$ (a loan that is successfully originated reduces that bank’s available capital by $S$ for $1$ period). Each bank’s goal is to maximize profit by setting a minimum acceptable interest rate on the platform, and a bank will leave the platform if it cannot generate a profit.

In this environment, there exists a platform that has a matching technology $M(B,L)$ to match banks and borrowers. This platform can observe all parameters of each agent and determine the interest rate $r$ charged to the borrower and the origination fee $f$ charged to the bank that maximize the revenue of the platform. The platform also has the ability to onboard any number of banks it desires by setting $B$. When a match occurs, the platform selects one bank at random from the stock of willing banks and provides an offer: $ \{ S, r, f \} $ that must be incentive-compatible for both the bank and the borrower.

For this application we’ll use a standard matching technology, the Cobb-Douglas function (which is also used in the literature as a production function), that gives the aggregate matching rate for this market. This matching function takes as input the number of banks and borrowers and maps them into the number of matches per period:

$$ M(B,L) = \alpha B^\beta L^{1-\beta}$$

In each time period, the expected matching rate per bank is defined as the aggregate number of matches over the stock of banks: $\phi \equiv \frac{M(B,L)}{B} = \alpha B^{\beta-1} L^{1-\beta}$. If banks and borrowers are matched at random, the number of matches per bank per unit time is identical and denoted as $\phi$.
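A minimal Python sketch of this matching function may help make its properties concrete. The parameter values ($\alpha = 0.5$, $\beta = 0.5$), the stocks of banks and borrowers, and the function names are illustrative assumptions, not calibrated values from any real platform.

```python
def matches(B, L, alpha=0.5, beta=0.5):
    """Aggregate matches per period: M(B, L) = alpha * B^beta * L^(1 - beta)."""
    return alpha * B**beta * L ** (1 - beta)


def matches_per_bank(B, L, alpha=0.5, beta=0.5):
    """Expected matches per bank per period: phi = M(B, L) / B."""
    return matches(B, L, alpha, beta) / B


# Assumed stocks for illustration: 16 banks and 100 unmatched borrowers.
m_base = matches(16, 100)
phi_base = matches_per_bank(16, 100)

# Constant returns to scale: doubling both sides doubles total matches.
m_doubled = matches(32, 200)

# Adding banks while holding borrowers fixed lowers the per-bank match rate.
phi_more_banks = matches_per_bank(32, 100)

print(m_base, m_doubled, phi_base, phi_more_banks)
```

The last comparison is the congestion effect that drives the rest of the model: each additional bank raises total matches but dilutes the flow of loans reaching every other bank.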

This concludes our work in setting up the environment that this model lives in. The environment should contain enough information to find the equilibrium (outcomes) of all parameters of interests of the model.

Finding the Equilibrium

This section’s goal is to find solutions for all the model outcomes we are interested in. To solve for the equilibrium, we must solve for all of the endogenous (free) variables that have not been pre-defined by the environment. For this example, this means that we need to solve for the interest rate $r$, the origination fee $f$, and the number of banks $B$. There is no set order in which to solve for these objects, but it helps to first understand the participation decision of the agents, then solve for the matching rate, and finally solve the bargaining problem.

Under this full information framework, the optimal decision is to accept for all borrowers and banks. For each loan origination, the expected profit of the bank is given by:

$$\pi = p(1+r)S - (1+c)S - f$$

The first term is the expected repayment: the probability of repayment multiplied by the principal plus interest the bank receives if the borrower repays the loan. The second term is the cost of origination (since a bank must fund the loan from its own balance sheet/depositors and pay them a cost $c$). The third term is the fee the bank pays the platform for originating the loan. In reality, the expected profit calculation considers long maturity loans ($T>1$), cost of collection conditional on default, and other factors.

After we solve for the expected per-loan profit, we must figure out how many loans get originated per period. To have a steady-state stock of unmatched borrowers, the arrival rate of borrowers must equal the number of matches in the long run (since all borrowers accept the loan conditional on a match). This means that the flow rate of borrowers into the system $\Lambda$ must equal the flow rate of borrowers leaving the system $M(B,L)$:

$$ \Lambda = M(B,L) = \alpha B^\beta L^{1-\beta}$$

By solving for $L$, we get that $L = \Big[ \frac{\Lambda}{\alpha B^\beta} \Big]^\frac{1}{1-\beta}$. If necessary, we can also find the expected arrival rate of a loan for a borrower by dividing the matching function by the mass of borrowers. Since the match rate satisfies $M = \Lambda$ in steady state, the rate of arrival of loans for a bank is given by $\phi = \frac{\Lambda}{B}$.
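As a quick numerical check of the steady-state condition, the sketch below computes $L$ from the closed-form expression and confirms that matches equal arrivals at that stock. The values of $\Lambda$, $B$, $\alpha$, and $\beta$ are assumed purely for illustration, and the function names are hypothetical.

```python
def matches(B, L, alpha, beta):
    """Cobb-Douglas matching function M(B, L)."""
    return alpha * B**beta * L ** (1 - beta)


def steady_state_borrowers(arrival_rate, B, alpha, beta):
    """Stock of unmatched borrowers L that solves Lambda = M(B, L)."""
    return (arrival_rate / (alpha * B**beta)) ** (1.0 / (1.0 - beta))


# Assumed parameters for illustration only.
arrival_rate, alpha, beta = 20.0, 0.5, 0.5

L_16 = steady_state_borrowers(arrival_rate, 16.0, alpha, beta)
L_64 = steady_state_borrowers(arrival_rate, 64.0, alpha, beta)

# Outflow of matched borrowers should equal the inflow Lambda.
outflow_16 = matches(16.0, L_16, alpha, beta)

# Per-bank loan arrival rate in steady state: phi = Lambda / B.
phi_16 = arrival_rate / 16.0

print(L_16, L_64, outflow_16, phi_16)
```

With more banks on the platform, the same arrival flow is cleared with a smaller queue of waiting borrowers, which is the faster matching speed discussed later in the article.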

Since each loan that a bank funds takes up some part of its reserve capacity $K$, we can also solve for the maximum number of loans $l$ the bank can fund at once. The budget constraint for the bank is given by $S \cdot \phi \leq K$. Since we have already solved for the flow rate of loans, a bank’s number of loans per period is therefore given by $l^* = \min\{ \frac{\Lambda}{B}, \frac{K}{S}\}$. If the capacity constraint binds ($l^* = \frac{K}{S}$), lending supply is constrained and the platform should increase the number of banks that it partners with. Given that there is no free entry condition on the lender side, the platform can directly control the number of banks $B$ so that we stay in the unconstrained equilibrium, where $l^* = \frac{\Lambda}{B}$.
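The capacity logic can be sketched in a few lines of Python. The helper `min_banks_unconstrained` is a hypothetical name for the smallest $B$ satisfying $\frac{\Lambda}{B} \leq \frac{K}{S}$, and the dollar amounts are assumed for illustration.

```python
import math


def loans_per_bank(arrival_rate, B, K, S):
    """Loans funded per bank per period: l* = min(Lambda / B, K / S)."""
    return min(arrival_rate / B, K / S)


def min_banks_unconstrained(arrival_rate, K, S):
    """Smallest whole number of banks with Lambda / B <= K / S."""
    return math.ceil(arrival_rate * S / K)


# Assumed values: 20 borrowers arrive per period, each wants a 10,000 loan,
# and every bank can commit at most 50,000 of capital per period.
arrival_rate, K, S = 20.0, 50_000.0, 10_000.0

l_constrained = loans_per_bank(arrival_rate, 2, K, S)  # capacity K / S binds
B_min = min_banks_unconstrained(arrival_rate, K, S)
l_unconstrained = loans_per_bank(arrival_rate, B_min, K, S)

print(l_constrained, B_min, l_unconstrained)
```

With only two banks, each bank is offered ten loans per period but can fund only five, so half of the demand goes unserved. Onboarding up to `B_min` banks removes the bottleneck.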

Now that we know the number of loans, we can determine the bank’s profit per unit time:

$$ \Pi_B = \frac{\pi \Lambda}{B} = \frac{\Lambda(p(1+r)S - (1+c)S - f)}{B}$$

As we can see, increasing the number of banks partnered with the platform decreases the expected profit per bank by decreasing the number of loans that each bank can originate. Since the platform can set both the fees $f$ and the number of banks $B$, it is up to the platform to decide whether it wants a small number of banks and high per-bank profit (at the risk of inducing capacity constraints) or whether it wants to maximize the borrower’s surplus by increasing the number of banks or decreasing the interest rate $r$. This also gives us an upper bound on the fees that the platform can charge, since banks would not be willing to take on a loan if the profit is negative. This upper bound on the fees is given by $ \bar{f} = p(1+r)S - (1+c)S$.
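To see the trade-off numerically, here is a small sketch of the per-loan and per-period bank profit. The parameter values (a 95% repayment probability, a 20% rate, a 3% cost of origination, and a fee of 500 on a 10,000 loan) are assumptions chosen only to make the arithmetic easy to follow.

```python
def bank_profit_per_loan(p, r, c, S, f):
    """Expected profit per loan: pi = p(1+r)S - (1+c)S - f."""
    return p * (1 + r) * S - (1 + c) * S - f


def bank_profit_per_period(p, r, c, S, f, arrival_rate, B):
    """Per-bank profit flow in the unconstrained case: Pi_B = pi * Lambda / B."""
    return bank_profit_per_loan(p, r, c, S, f) * arrival_rate / B


# Assumed parameters for illustration only.
p, r, c, S, f, arrival_rate = 0.95, 0.20, 0.03, 10_000.0, 500.0, 20.0

pi = bank_profit_per_loan(p, r, c, S, f)
profit_4_banks = bank_profit_per_period(p, r, c, S, f, arrival_rate, 4)
profit_8_banks = bank_profit_per_period(p, r, c, S, f, arrival_rate, 8)

print(pi, profit_4_banks, profit_8_banks)
```

Doubling the number of banks leaves the per-loan profit unchanged but halves each bank's profit flow, because the same stream of borrowers is split across more lenders.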

If the platform increases the allocation of trade surplus towards the bank by increasing $r$, it can charge a higher fee and generate more revenue. However, this might also decrease the growth rate of borrowers moving onto the platform in reality. In this example, we set the arrival rate of borrowers as exogenous so it is not affected by the fee and rate, but we can envision an environment where $\Lambda = \Lambda(f, r, B)$, which would change this problem to one with a conditional entry rate. Since we allow banks to post a reservation rate $\underline{r}$ that sets their minimum required rate for any loan origination, we can model this lower bound on the interest rate as:

$$ \underline{r} = \frac{f + (1+c)S}{p S} - 1$$

If the platform decreases the fees charged, the banks can set a lower reservation rate, which increases borrower surplus. This is also possible if the probability of repayment increases, or if the cost of origination (risk-free rate) decreases.
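These comparative statics are easy to verify with a short sketch. The function names and all parameter values below are illustrative assumptions; the formulas are the fee upper bound $\bar{f}$ and the reservation rate $\underline{r}$ from the text.

```python
def fee_upper_bound(p, r, c, S):
    """Largest fee a bank will pay at rate r: f_bar = p(1+r)S - (1+c)S."""
    return p * (1 + r) * S - (1 + c) * S


def reservation_rate(f, p, c, S):
    """Lowest rate a bank accepts at fee f: r_low = (f + (1+c)S) / (pS) - 1."""
    return (f + (1 + c) * S) / (p * S) - 1


# Assumed baseline: 95% repayment, 3% cost of origination, 10,000 loan, fee 500.
p, c, S, f = 0.95, 0.03, 10_000.0, 500.0

r_base = reservation_rate(f, p, c, S)
r_lower_fee = reservation_rate(0.0, p, c, S)     # platform waives the fee
r_safer = reservation_rate(f, 0.98, c, S)        # higher repayment probability
r_cheaper = reservation_rate(f, p, 0.01, S)      # lower cost of origination

# The two bounds are inverses of each other: charging the maximum fee at
# rate r pushes the bank's reservation rate back up to exactly r.
r_round_trip = reservation_rate(fee_upper_bound(p, 0.20, c, S), p, c, S)

print(r_base, r_lower_fee, r_safer, r_cheaper, r_round_trip)
```

Each of the three levers lowers the bank's reservation rate relative to the baseline, which leaves more room below the borrower's maximum acceptable rate.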

The Negotiation

Now that we have fully described the aggregate matching and profit statistics, we need to pin down the behavior of each party during the negotiation along with the profit-maximizing parameters for the platform.

When a borrower and a bank get matched, the platform makes a take-it-or-leave-it offer and the borrower can choose to accept or reject. If the borrower rejects, they exit the market (no outside option). Therefore, the platform has to choose a set of parameters $\{ r,f\}$ to satisfy the participation constraints of both the borrower and the banks subject to $\{ \underline{r},\bar{f}\}$. From the linear utility specification, the borrower only accepts the loan if they receive non-negative utility from it (since they can just reject and get $U_L = 0$). This allows us to define a maximum on the interest rate:

$$\bar{r} = \frac{V(S)}{S} -1 $$
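Written out, the bound follows directly from the borrower's participation constraint. Combining it with the bank's reservation rate $\underline{r}$ also gives a condition for any mutually acceptable rate to exist; this second line is a small illustrative addition rather than a step stated in the derivation above:

$$
\begin{aligned}
U_L \geq 0 &\iff V(S) - (1+r)S \geq 0 \iff r \leq \frac{V(S)}{S} - 1 = \bar{r}, \\
\underline{r} \leq \bar{r} &\iff \frac{f + (1+c)S}{pS} \leq \frac{V(S)}{S} \iff f \leq p\,V(S) - (1+c)S.
\end{aligned}
$$

In words, a trade is feasible only when the fee does not exceed the expected value of the loan to the borrower, $p\,V(S)$, net of the bank's cost of origination.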

Now that we know the bounds for the free parameters $r$ and $f$, we can construct the maximization problem of the platform. The platform chooses a rate and fee that satisfy the incentives of each participating agent while maximizing its own net proceeds. Under this assumption, the platform maximizes:

$$
\begin{aligned}
\Pi_p = \max_{r, f, B} \;\; & f \, M(B,L) \\
\text{s.t.} \;\; & \Pi_B \geq 0 \\
& U_L \geq 0
\end{aligned}
$$

The platform chooses the interest rate $r$, the fee $f$, and the number of partner banks $B$ to maximize its fee revenue, which is the fee per loan times the number of matches. This problem has an analytical solution and can be solved in closed form to find the optimal parameters, or it can be solved numerically by grid search or constrained optimization to find the set of parameters that maximizes $\Pi_p$. I leave the closed-form solution as an exercise for the reader.
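As one way to approach the numerical route, the sketch below runs a brute-force grid search over $(r, f)$ for a fixed number of banks. It relies on the steady-state identity $M(B,L) = \Lambda$ from earlier, so platform revenue per period is $f\Lambda$. The function name `solve_platform`, the grid resolution, the tolerance, and all parameter values are illustrative assumptions, and the search range assumes $V(S) \geq S$ so that non-negative rates are feasible.

```python
def solve_platform(V, S, p, c, arrival_rate, n_grid=201, tol=1e-6):
    """Grid search over (r, f); returns the best feasible (r, f, revenue)."""
    r_bar = V / S - 1.0            # borrower accepts only if r <= r_bar
    f_top = p * V - (1.0 + c) * S  # no larger fee can satisfy both sides
    if f_top < 0:
        return None                # no mutually acceptable trade exists

    best = None
    for i in range(n_grid):
        r = r_bar * i / (n_grid - 1)      # every r on the grid gives U_L >= 0
        for j in range(n_grid):
            f = f_top * j / (n_grid - 1)
            bank_profit = p * (1.0 + r) * S - (1.0 + c) * S - f
            if bank_profit < -tol:        # bank participation fails
                continue
            revenue = f * arrival_rate    # f * M(B, L) with M = Lambda
            if best is None or revenue > best[2]:
                best = (r, f, revenue)
    return best


# Assumed parameters: borrowers value a 10,000 loan at 13,000, repay with
# probability 0.95, banks face a 3% cost, and 20 borrowers arrive per period.
r_star, f_star, revenue_star = solve_platform(
    V=13_000.0, S=10_000.0, p=0.95, c=0.03, arrival_rate=20.0
)
print(r_star, f_star, revenue_star)
```

In this parameterization the search lands on the borrower's maximum acceptable rate, with the fee absorbing the bank's remaining surplus. A constrained optimizer would likely be a better choice than a grid once the model has more free parameters or an endogenous arrival rate.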

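To close out this section, we define our equilibrium objects as the steady-state solution to the platform’s problem: the interest rate $r$, the origination fee $f$, and the number of banks $B$, together with the implied stock of unmatched borrowers $L$.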

What This Means for Business

This model reveals several key insights for platform strategy:

1. The choice of B: Increasing the number of partner lenders increases the surplus for the borrower. One channel is a faster matching speed, which decreases the steady-state number of unmatched borrowers. Since we modeled the borrower as leaving the market after the loan is rejected, this doesn’t put any downward pressure on the loan rate. However, if we assumed that borrowers can re-enter the market after they reject a loan, they would have a higher outside option. This gives banks less bargaining power and lowers the maximum rate that borrowers are willing to accept, $\bar{r}$. At the same time, increasing the number of partner banks also decreases each bank’s profit per period (since per-bank profit falls with the number of banks). This lowers the maximum amount the platform can charge for each transaction $\bar{f}$, decreasing platform profit.

2. The choice of r: Choosing the correct $r$ involves determining whether the platform wants the banks or the borrowers to profit. In this simple model, the platform would choose $r = \bar{r}$ since it only needs to satisfy the borrower’s participation constraint and does not have to worry about entry conditions. Any increase in $r$ allows the platform to extract more surplus from the trade through higher fees. In a more complex model where the entry rate of borrowers is positively correlated with their surplus, the optimal decision would be to shift some of the surplus allocation to the borrowers to increase the per-period matching speed, which could increase total revenue for the platform. Finally, in a model with limited information (where the platform does not know the true payoff of the borrower), the optimal interest rate relies on an expectation of the valuation $\mathbb{E}[V(S)]$ over the estimated distribution of borrowers. If there are differences across borrowers represented by $\theta$, the expectation becomes a conditional expectation given the borrower profile, $\mathbb{E}[V(S) | \theta ]$. If the borrower profile is unknown (common in cold start cases), we can replace $\theta$ with an ML-estimated version $\hat{\theta}$.

3. The choice of f: In this model, $f$ decides the allocation of trade surplus between the bank and the platform. A higher fee increases the revenue for the platform and correspondingly decreases the profit for the banks. In reality, banks can choose to participate in different competing platforms, and their participation depends on the revenue they expect to receive. This implies that it is likely optimal for the platform to allocate some of the trade surplus towards banks to increase the chances of signing new partners in later periods.

Final Remarks and Extensions

What We Haven’t Considered Yet

This basic model scratches the surface of platform dynamics. Real platforms deal with complexities we’ve intentionally ignored to keep the math tractable. For instance, we assumed borrowers exit after rejection (to make the outside option 0), but in reality they can either stay in the market or visit a competitor platform. We also assumed that both banks and borrowers are identical, but banks can be diverse in their risk appetite, capital funding, and maturity preferences. Borrowers can also differ in their set of observed and latent features, impacting their probability of repayment, loan valuation, and loan size. This heterogeneity changes the matching problem from random assignment to sorted matching, where the platform needs to decide which types should match with whom, which ties back to the value proposition of the platform itself.

We’ve also ignored information asymmetry. Banks don’t perfectly observe default risk, borrowers don’t know their true creditworthiness, and platforms have limited insight into outside options of both parties. This creates opportunities for signaling (borrowers trying to appear creditworthy), screening (banks designing different reservation interest rates for separate loan types), and mechanism design choices for the platform. Should a lending platform show borrowers all available rates or just the best match? Should they reveal a borrower’s credit score to banks or just their proprietary risk assessment? Can revealing too much information have a negative impact on match quality?

Extensions That Would Deepen Understanding

To make this framework operational, several natural extensions come to mind:

Dynamic Entry and Exit: Model how market conditions affect participation. When interest rates rise, some borrowers drop out while others become desperate. Banks adjust their risk appetite and capital ratio based on regulatory changes and balance sheet constraints. Machine learning plays a large role here since the platform needs to forecast these flows and adjust fees/rates accordingly.
Competition Between Platforms: What happens when borrowers can simultaneously search on Upstart, LendingClub, and Prosper? Multi-platform dynamics change bargaining power and force platforms to think deeply about how their decisions can impact the arrival flow rate and growth prospects. This could explain why some platforms focus on speed (instant approval) while others emphasize better rates. Understanding what niche each platform captures and which niche has unmet demand is critical to capturing a larger piece of the pie.
Reputation and Learning: Both sides build reputations over time, but only if they remain on the platform to build history. Banks that consistently offer competitive rates could attract more borrowers and receive a higher matching ratio. Borrowers who repay build a history on the platform, improving the accuracy of their profile. As time goes on and more data is captured, the platform’s sorted matching efficiency improves because more signals are available. Modeling these dynamics would help us understand customer lifetime value and decide whether the platform should focus mainly on acquisition or retention.
Mechanism Design: Instead of take-it-or-leave-it offers and randomizing borrowers to the matched banks, platforms could run auctions where banks bid on borrowers. Alternatively, the platform could require posted prices where banks commit to rate schedules. Each mechanism has different implications for efficiency, revenue, and market thickness. The correct choice depends on both regulatory constraints and the distribution of borrowers and banks.

From building models to modeling problems

This framework provides a strategic advantage because it forces you to think about both first and second-order effects. Most data scientists optimize metrics in isolation, such as reducing default rates, increasing conversion, and lowering churn. But in these types of markets, every model optimization affects all equilibrium objects. Lower default rates might mean a lower reservation rate for the bank, allowing the platform to capture more of the trade surplus through fees. If there is borrower heterogeneity, higher matching probabilities might attract worse borrowers, leading to a reduction in average match quality.

The framework also helps identify which metrics actually matter. A lending platform could accept negative margins on certain loans (loss leaders) if doing so keeps a high-value bank participating or has positive spillovers to other segments. Platforms might restrict borrower entry (or lower matches) when partner banks are already at high capital utilization. This type of thinking should help industry data scientists move away from measurement for measurement’s sake and take a step back to look at the bigger picture for whichever company they work for.

The platforms that win aren’t necessarily those that can predict repayment probability with 98% accuracy over ones with 93% accuracy, but the ones that understand the market dynamics their algorithms operate within. This framework aims to move your mindset away from building better models to modeling the right problems. If you have the opportunity to apply this concept in your own work, I’d love to hear about it. Please do not hesitate to reach out with questions, insights, or stories through my email or LinkedIn. If you have any feedback on this article, please also feel free to reach out. Thank you for reading!


The post Prediction vs. Search Models: What Data Scientists Are Missing first appeared on TechToday.
