By Peter Murray, Content Marketing Manager, Carto
Advances made possible by location data continue to transform site planning practices and processes. Now, with insights from new data streams, it is possible to determine which sites are most likely to increase sales for seasonal, temporary, and mobile businesses.
Food trucks, a lunchtime staple for some, operate on a location-dependent business model. Generally speaking, food trucks offer similar meal options for roughly the same price, which makes it difficult for them to differentiate themselves from nearby competitors. As a result, location can determine whether a food truck business succeeds or fails.
Recently, Carto helped a local food truck business identify prime spots for its trucks using revenue prediction models. The company provided one month’s worth of anonymised transaction data for each of its 10 food carts. With this information, Carto’s team of data scientists was able to measure current performance, build increasingly confident revenue models, and, finally, predict the six best-performing food truck locations.
Before predicting what locations should be selected to drive future sales, the team had to figure out a way to measure the current performance of each site in Manhattan and Brooklyn.
To get started, Wenfei and Dongjie, two of Carto’s data scientists, aggregated the data by truck and by hour to calculate the average spend per hour.
The graphs show that hourly revenue for each food truck usually peaks around lunchtime, although sometimes there are spikes in sales around breakfast time as well. Next, Wenfei and Mamata, Carto’s head of cartography (the science of drawing maps), mapped food truck sales using proportional circles reflecting revenue amounts for each location across Manhattan and Brooklyn.
As expected, high-traffic areas (Grand Central Station, SoHo, Times Square, etc.) are the most lucrative locations for this company’s food trucks.
Next, the team wanted to figure out where the best locations are for increasing sales, which meant identifying variables near and around the current locations that could serve as predictors in the revenue model. Traditionally, these predictors are identified using census data and points of interest (POI) data.
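The aggregation step can be sketched in a few lines of Python. This is a minimal illustration, not Carto’s code: it assumes each transaction is a hypothetical `(truck_id, timestamp, amount)` tuple, sums revenue per truck per calendar hour, and then averages those hourly totals across days for each hour of the day.

```python
from collections import defaultdict
from datetime import datetime

def average_spend_per_hour(transactions):
    """Average hourly revenue per truck and hour of day.

    transactions: iterable of (truck_id, timestamp, amount) tuples (hypothetical schema).
    Returns {truck_id: {hour_of_day: average revenue in that hour}}.
    Note: an hour with no sales on a given day is skipped, not counted as zero.
    """
    # Step 1: total revenue per truck per calendar hour (date + hour).
    hourly_totals = defaultdict(float)
    for truck_id, ts, amount in transactions:
        hourly_totals[(truck_id, ts.date(), ts.hour)] += amount

    # Step 2: average those totals across days for each truck and hour of day.
    sums, counts = defaultdict(float), defaultdict(int)
    for (truck_id, _day, hour), total in hourly_totals.items():
        sums[(truck_id, hour)] += total
        counts[(truck_id, hour)] += 1

    result = defaultdict(dict)
    for (truck_id, hour), total in sums.items():
        result[truck_id][hour] = total / counts[(truck_id, hour)]
    return dict(result)

# Hypothetical transactions for one truck over two days
txns = [
    ("truck_a", datetime(2019, 3, 4, 12, 5), 10.0),
    ("truck_a", datetime(2019, 3, 4, 12, 40), 8.0),
    ("truck_a", datetime(2019, 3, 5, 12, 15), 12.0),
    ("truck_a", datetime(2019, 3, 5, 8, 30), 6.0),
]
avg = average_spend_per_hour(txns)
print(avg)  # {'truck_a': {12: 15.0, 8: 6.0}}
```

Whether empty hours should count as zero depends on whether the truck was open; that choice changes the average and should follow the operating schedule.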
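The article does not say how the circles were scaled. A common convention for proportional symbol maps, assumed here, is to make each circle’s *area* (not its radius) proportional to the value, so the radius grows with the square root of revenue. The location names and revenue figures below are hypothetical.

```python
import math

def proportional_radii(revenue_by_location, max_radius_px=30.0):
    """Scale circle radii so circle area is proportional to revenue.

    The largest revenue gets max_radius_px; others shrink with sqrt(revenue).
    """
    max_rev = max(revenue_by_location.values())
    return {
        loc: max_radius_px * math.sqrt(rev / max_rev)
        for loc, rev in revenue_by_location.items()
    }

# Hypothetical revenue figures, for illustration only
radii = proportional_radii({"Grand Central": 4000.0, "SoHo": 1000.0})
print(radii)  # {'Grand Central': 30.0, 'SoHo': 15.0}
```

Scaling radius linearly instead would make a location with four times the revenue look sixteen times larger, which exaggerates differences on the map.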
The demographic insights available from census data are helpful for segmenting target customers, but this use case illustrates one of the significant limitations of working with census data.
The census provides residential data for the area of operations; in the image above, this information is presented at the census tract level. However, many food truck customers are workers who commute into the city or tourists visiting New York landmarks, which is likely why Grand Central Station and Times Square are among the most profitable locations.
As such, residential data offers few insights relevant to increasing sales among this target customer base.
POI data is more useful here for finding patterns of nearby attractions around high-performing food trucks that can serve as predictors for our models.
The first map shows every POI in Manhattan and Brooklyn, but there is so much noise that it is hard to determine which attractions appear and reappear near and around each of our food trucks. Since many customers select food trucks based on proximity, a 200-meter radius buffer (about a 2½- to 3-minute walk) was created around each cart so that predictor features could be identified more easily in the second map.
Now we’re ready to start building a gradient boosted regression (GBR) model that will allow us to determine which features from this data are most important when considering where to place our food trucks. In short, the GBR model ranks features by importance, giving us a list of predictors to look for when evaluating a potential food truck location.
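Counting POIs inside a 200-meter buffer can be sketched with the haversine great-circle distance. This is an illustrative stand-in for a GIS buffer-and-join, assuming POIs arrive as hypothetical `(category, lat, lon)` tuples; the coordinates are approximate and chosen only for the example.

```python
import math

EARTH_RADIUS_M = 6_371_000.0  # mean Earth radius in meters

def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in meters between two (lat, lon) points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))

def poi_counts_in_buffer(cart, pois, radius_m=200.0):
    """Count POIs by category within radius_m of a cart located at (lat, lon)."""
    counts = {}
    for category, lat, lon in pois:
        if haversine_m(cart[0], cart[1], lat, lon) <= radius_m:
            counts[category] = counts.get(category, 0) + 1
    return counts

# Hypothetical cart near Grand Central and two hypothetical POIs
cart = (40.7527, -73.9772)
pois = [
    ("subway_station", 40.7537, -73.9772),  # about 111 m north: inside the buffer
    ("museum", 40.7577, -73.9772),          # about 556 m north: outside the buffer
]
print(poi_counts_in_buffer(cart, pois))  # {'subway_station': 1}
```

At a walking pace of roughly 4 to 4.8 km/h, 200 meters takes about 2½ to 3 minutes, which matches the buffer size chosen above. For thousands of POIs, a spatial index or a GIS spatial join would be faster than this brute-force loop.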
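The article does not show how its GBR model was built. To make the idea concrete, here is a from-scratch toy version for squared-error loss using depth-1 trees (stumps), written only to expose the mechanics; a real project would typically use a library implementation. Feature importance here is each feature’s share of the total error reduction from the splits that used it, which is similar in spirit to impurity-based importance. The data and feature names are synthetic and hypothetical.

```python
import random

def fit_stump(X, residuals):
    """Best single split as (gain, feature, threshold, left_value, right_value), or None.

    gain is the reduction in squared error obtained by the split.
    """
    n, total, best = len(X), sum(residuals), None
    for j in range(len(X[0])):
        order = sorted(range(n), key=lambda i: X[i][j])
        left_sum = 0.0
        for k in range(n - 1):
            i = order[k]
            left_sum += residuals[i]
            x_here, x_next = X[i][j], X[order[k + 1]][j]
            if x_here == x_next:
                continue  # cannot place a threshold between equal values
            n_left, n_right = k + 1, n - k - 1
            right_sum = total - left_sum
            gain = left_sum ** 2 / n_left + right_sum ** 2 / n_right - total ** 2 / n
            if best is None or gain > best[0]:
                best = (gain, j, (x_here + x_next) / 2, left_sum / n_left, right_sum / n_right)
    return best

def fit_gbr(X, y, n_estimators=50, learning_rate=0.1):
    """Gradient boosting for squared loss with stumps; returns (model, importances)."""
    init = sum(y) / len(y)  # start from the mean prediction
    pred, stumps, gains = [init] * len(y), [], [0.0] * len(X[0])
    for _ in range(n_estimators):
        # For squared loss, the negative gradient is simply the residual.
        residuals = [yi - pi for yi, pi in zip(y, pred)]
        best = fit_stump(X, residuals)
        if best is None:
            break
        gain, j, thr, left_val, right_val = best
        gains[j] += gain  # credit this split's error reduction to feature j
        stumps.append((j, thr, left_val, right_val))
        pred = [p + learning_rate * (left_val if x[j] <= thr else right_val)
                for p, x in zip(pred, X)]
    total_gain = sum(gains) or 1.0
    model = {"init": init, "lr": learning_rate, "stumps": stumps}
    return model, [g / total_gain for g in gains]

def predict(model, x):
    out = model["init"]
    for j, thr, left_val, right_val in model["stumps"]:
        out += model["lr"] * (left_val if x[j] <= thr else right_val)
    return out

# Synthetic data: revenue driven mostly by foot traffic, a little by weekday, not by noise
random.seed(0)
feature_names = ["foot_traffic", "day_of_week", "noise"]
X, y = [], []
for _ in range(200):
    foot, day, noise = random.uniform(0, 1000), random.randint(0, 6), random.random()
    X.append([foot, day, noise])
    y.append(2.0 * foot + (50.0 if day < 5 else 0.0) + random.gauss(0, 20))

model, importances = fit_gbr(X, y)
for name, imp in sorted(zip(feature_names, importances), key=lambda t: -t[1]):
    print(f"{name}: {imp:.3f}")
```

Sorting the importances gives the ranked list of predictors described above; the top-ranked features are the ones worth looking for at candidate sites.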
The first revenue model was created using only traditional data sources, specifically census and POI data:
The GBR model returned an R-squared score, a measure, typically from 0 to 1, of how much of the variability in revenue the model explains, which can be used to gauge confidence in the model. An R-squared score of .38 means the model explains only about 38% of the variance in revenue, leaving most of it unexplained. In other words, the current data does not provide enough confidence about which features are most important when selecting a food truck location, and more data is needed to increase the score.
To improve the model, MasterCard spend data was added and the same model was retrained to see whether the R-squared score would increase.
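R-squared compares the model’s squared errors with those of a baseline that always predicts the mean: R² = 1 − SS_res / SS_tot. A minimal sketch with made-up revenue values:

```python
def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - (residual sum of squares / total sum of squares)."""
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    return 1 - ss_res / ss_tot

# Hypothetical actual and predicted weekly revenues
actual = [100.0, 200.0, 300.0, 400.0]
predicted = [150.0, 180.0, 320.0, 350.0]
print(round(r_squared(actual, predicted), 3))  # 0.884
```

A score of 1 means perfect predictions and 0 means no better than predicting the mean; on held-out data the score can even fall below 0 if the model does worse than that baseline.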
MasterCard spend scores provide aggregated and anonymised merchant-level transaction insights on where, when, and how people spend money. The transaction percentile score is particularly important here because it provides a measure of frequency. Since most food carts offer similar types of food for around the same price, this frequency measure provides insight into customer volume for each cart.
Here we see a sizeable score increase and greater alignment among points in the scatter plot. However, the R-squared score could still be stronger, so a layer of foot traffic data was added to the model.
The R-squared score has now risen to .63, up from .38 in the first model. This makes sense and supports our earlier assumption, made when drawing the POI buffers, that food trucks rely on foot traffic from nearby customers.
Notably, each additional derivative data layer added to the model improved the R-squared score.
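MasterCard’s exact scoring methodology is not described in the article, so the sketch below only illustrates what a percentile score means in general: each area’s transaction count is converted to the percentage of areas whose count is at or below it. The tile names and counts are hypothetical.

```python
from bisect import bisect_right

def percentile_scores(counts):
    """Map each key's count to its percentile rank (0-100]: share of keys with count <= it."""
    ordered = sorted(counts.values())
    n = len(ordered)
    return {key: 100.0 * bisect_right(ordered, c) / n for key, c in counts.items()}

# Hypothetical transaction counts per area
scores = percentile_scores({"tile_a": 12, "tile_b": 40, "tile_c": 40, "tile_d": 95})
print(scores)  # {'tile_a': 25.0, 'tile_b': 75.0, 'tile_c': 75.0, 'tile_d': 100.0}
```

Because a percentile score is rank-based, it is insensitive to a few extreme counts, which makes it a convenient model feature for comparing customer volume across very different areas.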
Without these new data streams, we would not be in a position to identify with much confidence where the best locations are for each food truck.
The image above presents the 12 features that our model identified as having the greatest influence on food truck sales, and the top four features were selected to serve as predictors for identifying new locations: 1. foot traffic in the previous hour, 2. foot traffic in the current hour, 3. day of the week, and 4. MasterCard frequency score.
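Two of the top predictors are time-based: foot traffic in the previous hour and day of the week. The sketch below derives both from hourly rows, assuming a hypothetical schema of dicts with `tile`, `date`, `hour`, and `foot_traffic` keys; it does not look back across midnight, so hour 0 gets no previous-hour value.

```python
from datetime import date

def add_time_features(rows):
    """Add 'foot_traffic_prev_hour' and 'day_of_week' (Monday=0) to each row in place.

    rows: list of dicts with 'tile', 'date' (ISO string), 'hour', 'foot_traffic'.
    The previous-hour value is None when that hour is missing (including hour 0).
    """
    lookup = {(r["tile"], r["date"], r["hour"]): r["foot_traffic"] for r in rows}
    for r in rows:
        r["foot_traffic_prev_hour"] = lookup.get((r["tile"], r["date"], r["hour"] - 1))
        r["day_of_week"] = date.fromisoformat(r["date"]).weekday()
    return rows

# Hypothetical hourly foot-traffic counts for one grid tile
rows = add_time_features([
    {"tile": "t1", "date": "2019-03-04", "hour": 11, "foot_traffic": 120},
    {"tile": "t1", "date": "2019-03-04", "hour": 12, "foot_traffic": 340},
])
print(rows[1]["foot_traffic_prev_hour"], rows[1]["day_of_week"])  # 120 0
```

A previous-hour feature lets the model capture momentum, for example the commuter flow that builds up just before lunchtime.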
Now it is time to start mapping the selected predictors across New York City using 100×100 meter grid tiles (roughly the size of a city block). Next, using a histogram, we looked at the sales distribution across the city and calculated the weekly sales average per truck to be approximately $2,786 (approx. RM11,360).
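Assigning points to 100×100 meter tiles can be sketched with a simple equirectangular projection, which is accurate enough at city scale. The grid origin below (roughly the south-west corner of New York City) is an assumption for illustration; production work would more likely use a projected coordinate system.

```python
import math

def grid_tile(lat, lon, origin=(40.49, -74.26), cell_m=100.0):
    """Return the (row, col) of the cell_m x cell_m tile containing (lat, lon).

    Uses an equirectangular approximation anchored at `origin` (an assumed corner).
    """
    lat0, lon0 = origin
    meters_per_deg_lat = 111_320.0
    meters_per_deg_lon = 111_320.0 * math.cos(math.radians(lat0))
    north_m = (lat - lat0) * meters_per_deg_lat
    east_m = (lon - lon0) * meters_per_deg_lon
    return (math.floor(north_m / cell_m), math.floor(east_m / cell_m))

# A point 150 m north and 250 m east of the assumed origin
lat0, lon0 = 40.49, -74.26
point = (lat0 + 150 / 111_320.0, lon0 + 250 / (111_320.0 * math.cos(math.radians(lat0))))
print(grid_tile(*point))  # (1, 2)
```

Once every observation has a tile index, predictor values and predicted sales can be aggregated per tile before building the histogram of the sales distribution.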
Since the goal is to find new locations that are likely to increase sales revenue, we selected the higher end of the revenue distribution and then clustered those tiles into revenue areas. Because the model’s R-squared score was .63, there is not quite enough confidence to pinpoint the exact location for each truck. Instead, the clustered revenue areas identify regions within a neighbourhood that have a higher likelihood of being profitable.
The image above shows how each of these operations changed the map. In the end, six locations were identified, each with a revenue prediction. Below, the six locations are ranked from highest to lowest weekly sales average.
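The article does not specify the threshold or clustering method, so the following is one plausible sketch under stated assumptions: keep tiles at or above a chosen quantile of predicted revenue, group edge-adjacent tiles into areas with a flood fill, and rank areas by mean predicted revenue. The quantile and the revenue values are hypothetical.

```python
def revenue_areas(predicted, quantile=0.9):
    """Group high-revenue grid tiles into contiguous areas.

    predicted: {(row, col): predicted weekly revenue}.
    Keeps tiles at or above the given quantile, joins edge-adjacent tiles,
    and returns areas (lists of tiles) sorted by mean predicted revenue, highest first.
    """
    values = sorted(predicted.values())
    cutoff = values[min(int(quantile * len(values)), len(values) - 1)]
    hot = {tile for tile, v in predicted.items() if v >= cutoff}

    areas, seen = [], set()
    for start in hot:
        if start in seen:
            continue
        stack, area = [start], []
        seen.add(start)
        while stack:  # flood fill over 4-connected neighbors
            r, c = stack.pop()
            area.append((r, c))
            for nb in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
                if nb in hot and nb not in seen:
                    seen.add(nb)
                    stack.append(nb)
        areas.append(area)
    return sorted(areas, key=lambda a: -sum(predicted[t] for t in a) / len(a))

# Hypothetical 4x4 grid: mostly low revenue, with two high-revenue pockets
predicted = {(r, c): 100.0 for r in range(4) for c in range(4)}
predicted.update({(0, 0): 3000.0, (0, 1): 3100.0, (3, 3): 2900.0})
areas = revenue_areas(predicted, quantile=0.85)
print([sorted(a) for a in areas])  # [[(0, 0), (0, 1)], [(3, 3)]]
```

Working with areas rather than single tiles reflects the model’s moderate R-squared: the prediction is trusted at neighbourhood scale rather than block by block.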
While there are the usual suspects on this list (Penn Station, Grand Central, etc.), it is surprising that Corona Park turns out to be the best location for increasing food truck sales revenue. When nearby tourist attractions and the area’s population density are taken into consideration, the results make sense.
New data streams are ushering in a new era of site planning, making previously impossible analyses possible. Indeed, as this food truck example highlights, the future of site planning depends on accessing and working with various types of data, from traditional sources to new derivative data sets, to identify, understand, and quantify the impact that mobility patterns will have on your sales revenue.
Article first appeared on the Carto Blog.
Lava is an authorised partner in Malaysia of Carto, a location intelligence platform that analyses and visualises large, complex data sets. With more than a decade of experience in the industry, we’re proud to say we’re one of the leading cloud solutions providers and consultants in the Asia Pacific region.