Price Optimization Model
By Mayssam Naji; Kukeshajanth Kodeswaran; Amit Borundiya
The goal is to provide an easy-to-use tool that searches for optimal price points for selected products while increasing the overall margin for upcoming campaigns and seasonal sales events. This approach also serves as the backbone for further optimization of marketing spend and the promotion calendar. It relies on two major components:
- Sensitivity Model: a model that is able to predict the demand of a product for the given conditions.
- Optimizer: A tool that is able to utilize the sensitivity model to efficiently search for an optimal price for each product.
The data used are product specifications, purchase history, past promotions, special days (specific to the business) and past marketing spend, along with a few engineered features that account for seasonality:
- Promo Bins: bins that group promotions according to their realistic impact on the business, so that the variation between promos is easier to capture.
- Calendar Data: data related to the US calendar and consumer behavior.
The sensitivity model aims to predict the demand for each product given a set of features (conditions). As can be seen from the earlier diagram, the problem looks like a time-series problem on the surface, so we initially formulated it as one. However, after experimenting with different approaches, gradient boosting on decision trees (CatBoost) proved to be the most efficient and effective.
The advantage of time-series models is their ability to detect trends and seasonality, which gives them a good understanding of the continuity in the data. However, our predictions have to be tailored to each product, and each product has a distinguishing set of attributes along with a distinct sell-through curve. A time-series model would therefore have to be trained separately on each product's historical data and would only be able to make predictions for that product. This is a major challenge for e-commerce use cases, as the majority of products are available for a limited period of time and are then removed from the market. Such a model would have a sparse training set of between 200 and 500 data points (days) per product. These factors made the time-series approach an undesirable formulation for a sensitivity model.
Unlike time-series models, CatBoost treats each data point independently, which let us keep the full dataset pooled across products and gave us flexibility in how the dataset is formulated. However, the model has no sense of continuity when predicting at the product level, so feature engineering played a major role in overcoming this challenge. We designed a diverse set of features that give the model a good understanding of the performance of each product and of the overall website behaviour. We separated our features according to the following structure:
There are four major families of features:
- Macro-inherent features: features that depend on the day of the prediction and describe the status of the website from a pricing perspective. They also give a glimpse of the consumer purchasing status.
- Macro-historical features: features that describe the performance of the e-commerce business in the previous year.
- Micro-inherent features: features that describe each product, its intended consumer, its pricing and discount, and its quality.
- Micro-historical features: features that enable the model to understand the sell-through curve of each product. They are built on the historical sell-through, both long term (yearly) and short term (latest few months), which gives the model a long-short memory of the performance.
With this structure, the model gets a good understanding of the products and their sell-through, as well as of the overall expected performance of the business.
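As an illustration of the micro-historical family, the sketch below derives a short-term and a long-term sell-through average for one product from its daily units sold. The window lengths (90 and 365 days), the feature names and the function name are assumptions made for this example, not the features used in the actual model.

```python
from statistics import mean


def history_features(daily_units, short_window=90, long_window=365):
    """Build short- and long-term sell-through features for one product.

    daily_units: units sold per day, oldest first, ending the day before
    the prediction day. Window lengths are illustrative assumptions.
    """
    if not daily_units:
        # A brand-new product has no history; fall back to zeros.
        return {"short_mean": 0.0, "long_mean": 0.0, "short_to_long": 0.0}

    short_mean = mean(daily_units[-short_window:])
    long_mean = mean(daily_units[-long_window:])
    # Ratio above 1 means the product is selling faster than its yearly pace.
    ratio = short_mean / long_mean if long_mean else 0.0
    return {
        "short_mean": short_mean,
        "long_mean": long_mean,
        "short_to_long": ratio,
    }


# Toy history: 300 days at 10 units/day, then 90 days at 20 units/day.
features = history_features([10] * 300 + [20] * 90)
print(features)
```

Because each row carries its own summary of the past, a model that treats rows independently can still see whether a product is accelerating or slowing down.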
Finally, to validate the sensitivity model, we measured its performance on predicting the demand for each product in a given time frame, and we aggregated all the predicted units sold to monitor its performance at the overall e-commerce level.
Overall Demand Predictions
On a given test period, the model makes predictions with an error margin of around 15%. Furthermore, to measure how well the model generalizes, we slice various time segments from our dataset and train and test the model on each of them.
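The two checks described here can be sketched as follows. The exact error metric is not specified in the text, so the absolute percent error on total units is an assumption, as are the function names and the sliding-window scheme used to slice time segments.

```python
def aggregate_percent_error(actual_units, predicted_units):
    """Absolute percent error of total predicted units vs. total actual units."""
    total_actual = sum(actual_units)
    total_predicted = sum(predicted_units)
    if total_actual == 0:
        raise ValueError("total actual demand is zero; percent error is undefined")
    return abs(total_predicted - total_actual) / total_actual * 100.0


def time_slices(n_days, train_days, test_days, step):
    """Yield (train, test) day-index ranges that slide forward in time."""
    start = 0
    while start + train_days + test_days <= n_days:
        train = range(start, start + train_days)
        test = range(start + train_days, start + train_days + test_days)
        yield train, test
        start += step


# Toy numbers: three products, actual vs. predicted units in the test period.
error = aggregate_percent_error([100, 120, 80], [90, 130, 95])
print(f"aggregate error: {error:.1f}%")

# Each test window always comes after its training window, never before it.
splits = list(time_slices(n_days=10, train_days=6, test_days=2, step=2))
for train, test in splits:
    print(list(train), "->", list(test))
```

Keeping the test window strictly after the training window avoids leaking future sales into training.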
Product Level Analysis
We also investigate the model's accuracy when making predictions for each product over the time segment. Below is an example of the predicted demand for a single product.
Moreover, when we investigate the distribution of the percent error for each product over the selected time segment, we can see that the model makes stable predictions for most products. After examining the products with high error margins, we found that these products are outliers.
Aside from its applicability in price optimization, the sensitivity model was used to cluster the products into different price elasticity groups. The sensitivity model was given a set of products along with the rest of the features to simulate demand for each product. The prices of these products were then gradually decreased to generate different demand simulations.
Price elasticity clustering is therefore another use case of the sensitivity model. The historical data does not contain a large set of unique price points per product, because of business constraints on pricing. The sensitivity model offers a solution to this problem by generating reliable simulations of demand at different price points. We monitored the simulated change in demand with respect to the change in price and were able to identify different elasticity clusters, as shown below:
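The simulation loop can be sketched as below. The `toy_demand` function is a hypothetical stand-in for the trained sensitivity model, and the midpoint (arc) elasticity formula, the 5% price step and the group thresholds are assumptions chosen for illustration, not the clustering method used in the project.

```python
from functools import partial


def toy_demand(price, base_units, sensitivity):
    """Hypothetical stand-in for the sensitivity model's demand prediction."""
    return base_units * price ** (-sensitivity)


def simulated_elasticity(demand_fn, price, step=0.05, n_steps=4):
    """Average arc elasticity while the price is lowered step by step."""
    elasticities = []
    current = price
    for _ in range(n_steps):
        lower = current * (1 - step)
        q0, q1 = demand_fn(current), demand_fn(lower)
        # Midpoint formula: percent changes relative to the average level.
        pct_demand = (q1 - q0) / ((q1 + q0) / 2)
        pct_price = (lower - current) / ((lower + current) / 2)
        elasticities.append(pct_demand / pct_price)
        current = lower
    return sum(elasticities) / len(elasticities)


def elasticity_group(elasticity):
    """Assign a group from the magnitude; thresholds are illustrative."""
    magnitude = abs(elasticity)
    if magnitude < 0.5:
        return "low"
    if magnitude < 1.5:
        return "medium"
    return "high"


# Three hypothetical products with different price sensitivity.
products = {
    "A": partial(toy_demand, base_units=500.0, sensitivity=0.2),
    "B": partial(toy_demand, base_units=500.0, sensitivity=1.0),
    "C": partial(toy_demand, base_units=500.0, sensitivity=2.5),
}
groups = {
    name: elasticity_group(simulated_elasticity(fn, price=40.0))
    for name, fn in products.items()
}
print(groups)
```

In practice the thresholds would more likely come from a clustering algorithm run on the simulated elasticities than from fixed cut-offs.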
As mentioned in the problem statement, there are two major objectives, demand and margin (though additional ones can be added), so the problem was formulated as a multi-objective optimization problem. Multi-objective optimization helps the business make optimal decisions in the presence of trade-offs between two or more conflicting objectives.
In the present scenario we needed to maximize the margin while also maximizing the sell-through. Common optimization strategies include Grid Search, Random Search and Bayesian Search. Given the large search space, a Bayesian optimizer is the right choice for getting results in a reasonable time and at a lower computational cost. The basic idea is not to pick points in the search space completely at random, but to use the information from prior runs to choose better points.
A state-of-the-art algorithm for this kind of search is the Tree-Structured Parzen Estimator (TPE). TPE is an iterative process that uses the history of already evaluated points in the search space to build a probabilistic model, which is then used to suggest the next set of points to evaluate. A few advantages of TPE are the following:
- TPE supports a wide variety of variables in the parameter search space, e.g. uniform, log-uniform, quantized log-uniform, normally-distributed real value, categorical.
- It is far more computationally efficient than conventional methods.
We chose Optuna as our optimization framework mainly because it provides multi-objective TPE (MOTPE) and can prune unpromising trials for faster results. Both features helped us reach our objective of providing faster and computationally cheaper solutions to large search space problems. During the development of the algorithms we faced several challenges in terms of time complexity and search space complexity.
Search Space Time Complexity Reduction
There are around 2000 products, and for markdown we need to optimize around 200 of them. For each product, the search space is the price range between its lower and upper bound, which usually spans 20 to 30 dollars. Our initial approach was to account for cannibalization, that is, the effect that a price change on one product has on the demand for similar products.
This requires a search space that covers combinations of prices across products. A brute-force search over 200 products, even in 1 dollar decrements, would have to cover 20^200 points (assuming a 20 dollar range per product).
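To make the size of the joint search concrete: with 20 candidate prices for each of 200 products, the number of price combinations is 20 multiplied by itself 200 times. The short calculation below uses the figures from the text and compares that with pricing every product independently, which is cheap but ignores cannibalization.

```python
n_products = 200
prices_per_product = 20  # 1 dollar steps over a 20 dollar range

# Joint search: every combination of one price per product.
joint_combinations = prices_per_product ** n_products
digits = len(str(joint_combinations))

# Independent search: each product priced on its own.
independent_evaluations = n_products * prices_per_product

print(f"joint search: a number with {digits} digits")
print(f"independent search: {independent_evaluations} evaluations")
```

The gap between the two figures is why exhaustive search is ruled out and a sampler that learns from previous trials is needed.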
Using MOTPE, we were able to search the space more efficiently. Still, getting a good result required 8 to 9 hours of optimization on an ml.c5.9xlarge instance. To reduce the search time, we created an initial warm-up run over a smaller number of products (5 products per search), with a fixed number of iterations (500). For 200 products, this warm-up runs 40 times and takes 37 minutes in total (55 seconds per run).
This gives us better initial price points for each product. We take two prices from the warm-up run, one that yields a high margin and one that yields high demand. We then initialize the optimizer with these two prices and the upper bound price. With this setup, the optimizer reached the same or better performance than the initial approach in 2 hours overall. In the graph on the left side, the search is very scattered, while with the new approach it is much more directed.
By breaking the problem down into a sensitivity model and an optimizer, we were able to develop a tool that sets an optimal price point for each product. The tool is easy for the business team to use and relies on state-of-the-art algorithms to make its predictions. Moreover, the presented formulation makes use of the historical data and takes into consideration external factors that have an impact on the business. Finally, the current approach can be applied to use cases other than price optimization, such as marketing expenditure optimization and price elasticity analysis.
Author: Mayssam Naji