We look at the problem of allocating the budget of an advertiser across multiple surfaces optimally when both the demand and the value are unknown. Consider an advertiser who uses the Facebook platform to advertise a product. They have a daily budget that they would like to spend on our platform. Advertisers want to reach users where they spend time, so they spread their budget over multiple platforms, like Facebook, Instagram, and others. They want an algorithm to help bid on their behalf on the different platforms and are increasingly relying on automation products to help them achieve it.

In this research, we model the problem of placement optimization as a stochastic bandit problem. In this problem, the algorithm is participating in *k* different auctions, one for each platform, and needs to decide the correct bid for each of the auctions. The algorithm is given a total budget *B* (e.g., the daily budget) and a time horizon *T* over which this budget should be spent. At each time-step, the algorithm should decide the bid it will associate with each of the *k* platform, which will be input into the auctions for the next set of requests on each of the platforms. At the end of a round (i.e., a sequence of requests), the algorithm sees the total reward it obtained (e.g., number of clicks) and the total budget that was consumed in the process, on each of the different platforms. Based on just this history, the algorithm should decide the next set of bid multipliers it needs to place. The goal of the algorithm is to maximize the total advertiser value with the given budget across the *k* platforms.

This problem can be tackled using a model of bandits called bandits with budgets. In this paper, we propose a modified algorithm that works optimally in the regime when the number of platforms *k* is large and the total possible value is small relative to the total number of plays. Online advertising data has this particular behavior where the budget spent and the total value the advertiser receives are much smaller compared with the total number of auctions they participate in, because of the scale in the competition pool. Thus, our algorithm is a significant improvement over prior works, which tend to focus on the regime where the total optimal value possible is comparable to the number of time-steps.

The key idea of the algorithm is to modify a primal-dual based approach in prior work [1] that can handle multiple platforms. In particular, we derive a new optimization program at each time-step whose optimal solution gives us the bid multiplier that needs to be placed at each time-step. Prior work [2] that solves an optimization program usually relies on also performing a rounding step. However, this rounding step works well only when the optimal value possible is at least √*T*, and hence the assumption of the optimal value being comparable to the number of time-steps is unavoidable. However, in this work, we rely on the property of this linear program [3] and show that for the special case of multiplatform bid optimization, the optimal solution is already integral, and thus, we do not need a rounding step. This is the key idea that leads to the optimal regret guarantee.

We use logged data to show that this algorithm works well in practice with desirable properties such as uniform budget consumption and small total regret. We compare it with the prior works and also other commonly used heuristics in the industry. We show that the proposed algorithm is indeed superior to all these algorithms.

On the business side, this research has potential benefits for advertisers, users, and platforms. Automated products that perform many of the targeting, placement, and creative optimization on advertisers’ behalf and their adoption is rapidly rising in the larger induis increasing in number. The key challenges with these automated products are scalability and budget management. The number of possible combinations explodes exponentially while the total budget provided by the advertiser roughly remains the same. This research provides scalable and simple algorithms that can help us in creating such automated solutions by automatically manipulating the bids, in real time, in the auction mechanism. Bid is one primary lever that advertisers use to change the delivery of the ads to their desired behavior. However, they usually do so in a black-box fashion because they do not have the required data to perform optimal bidding choices. However, the advantage of using the proposed algorithm is that the bidding is *near-optimal* thus, getting the most value for their spend. This has benefits for both the individual advertiser and the overall ecosystem.

On the research side, “Bandits with Budgets” has primarily been studied in the theoretical computer science/operations research community purely as a mathematical problem. This research bridges the gap between theory and practice of these algorithms — by applying it to a large-scale important problem. En route to this application, we also create a new, simpler algorithm that is optimal in the parameter ranges desired in the application.

Going forward, we hope that our paper opens the door for newer applications, both within online advertising and outside of it, for this extremely general and versatile model. We believe that this line of work has huge research potential for creating new algorithms as well as for affecting core business problems.

__Stochastic bandits for multiplatform budget optimization in online advertising__

[1] – Badanidiyuru, Ashwinkumar, Robert Kleinberg, and Aleksandrs Slivkins. “Bandits with knapsacks.” *2013 IEEE 54th Annual Symposium on Foundations of Computer Science*. IEEE, 2013.

[2] – Sankararaman, Karthik Abinav, and Aleksandrs Slivkins. “Combinatorial semi-bandits with knapsacks.” *International Conference on Artificial Intelligence and Statistics*. PMLR, 2018.

[3] – Avadhanula, Vashist, et al. “On the tightness of an LP relaxation for rational optimization and its applications.” *Operations Research Letters* 44.5 (2016): 612-617.