Where’s my Budget Pizza: A case study
A comparison of metropolitan and non-metropolitan prices of fast food
Have you ever wondered whether you should eat downtown in the city you’re travelling through, or on the way back to your suburban home? Did you notice any price differences between the two areas?
That brings us to the question: do you have more food-related expenses in a rural or an urban lifestyle? A lot of research indicates that grocery store prices are relatively cheaper in urban areas (inside the city) than in rural areas. Does the same trend hold for fast food joints, or do they break from it and follow a new pattern? Rather than a simple rural/urban split, I followed the Census data, which classifies each county into 9 levels, known as Rural-Urban Continuum Codes (RUCC), as follows:
Metropolitan counties: levels 1 to 3
Nonmetropolitan counties: levels 4 to 9
An easier classification for this article is:
Urban: levels 1, 2 and 3
Suburban: levels 4 and 5
Rural: levels 6, 7 and 8 (level 9 rarely has any fast food available, so it is excluded)
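The grouping above can be sketched as a small lookup. This is only an illustration: the function name `area_type` is my own and is not taken from the project code.

```python
def area_type(rucc):
    """Map a RUCC level (1-9) to the simplified grouping used in this article."""
    if rucc in (1, 2, 3):
        return "Urban"
    if rucc in (4, 5):
        return "Suburban"
    if rucc in (6, 7, 8):
        return "Rural"
    if rucc == 9:
        # Level 9 is excluded from the study, so it gets no group.
        return None
    raise ValueError(f"RUCC level must be between 1 and 9, got {rucc!r}")


for level in range(1, 10):
    print(level, area_type(level))
```

Returning `None` for level 9 keeps the exclusion explicit, so those counties can be filtered out before any averaging.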
For testing purposes, I’ll be scraping the pizza and bread prices from Little Caesars (referred to as LC hereafter) in various counties in California.
There are some underlying assumptions in this study:
1. Little Caesars prices are representative of all fast-food prices
2. All counties have the same number of accessible LC stores near them, and stores that show up more than once do not distort the mean
3. Nothing outside the business model itself drives the price differences. For example, if some counties have unique taxes, they aren’t accounted for in this study. (This may or may not matter, as tax is only added to the final price.)
Getting the Data:
Getting the data was a little tricky for this case study. We want to compare the prices in each county, and for that we need each county’s classification into the RUCC levels described above. The available Census data provides this, as shown below.
We have 58 counties classified into several RUCC levels. The data has the classification, but how do we link it to the counties and get the associated zip codes? We only have FIPS codes here. The zip codes are needed to find the nearest LC stores for each county, and each county has multiple zip codes.
A simple Google search turns up a conversion Excel sheet that associates zip codes with FIPS codes [Data available in GitHub]. It looks something like this:
Once we combine these two tables and filter for California counties, we get the following data:
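The combine-and-filter step can be sketched as a join on the FIPS code. This is a minimal sketch, not the project’s actual code: the rows, county names, and field names below are placeholders I made up for illustration. The only real-world fact it relies on is that California county FIPS codes start with the state prefix 06.

```python
# Placeholder rows only; the real tables are the RUCC file and the zip-to-FIPS sheet.
rucc_rows = [
    {"fips": "06001", "county": "County A", "rucc": 1},
    {"fips": "06003", "county": "County B", "rucc": 8},
    {"fips": "32001", "county": "County C", "rucc": 4},  # not in California
]
zip_rows = [
    {"zip": "90001", "fips": "06001"},
    {"zip": "90002", "fips": "06001"},
    {"zip": "96001", "fips": "06003"},
    {"zip": "89001", "fips": "32001"},
]

CALIFORNIA_STATE_FIPS = "06"


def zips_with_rucc(rucc_rows, zip_rows, state_fips):
    """Attach each zip code to its county's RUCC level, keeping one state only."""
    rucc_by_fips = {row["fips"]: row for row in rucc_rows}
    joined = []
    for z in zip_rows:
        county = rucc_by_fips.get(z["fips"])
        if county is None or not z["fips"].startswith(state_fips):
            continue  # unknown county, or a county outside the chosen state
        joined.append({
            "zip": z["zip"],
            "fips": z["fips"],
            "county": county["county"],
            "rucc": county["rucc"],
        })
    return joined


california = zips_with_rucc(rucc_rows, zip_rows, CALIFORNIA_STATE_FIPS)
for row in california:
    print(row)
```

Keeping FIPS and zip codes as strings matters here: read as integers, the leading zero in 06 is lost and the state filter no longer works.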
The next step is to extract the pizza prices, and for that we need the store ID of each store. After investigating the LC website and finding the right URL, we get the store IDs for each zip code using the following code. [Code in GitHub link]
Note: A single store can be accessible from multiple zip codes. For zip codes along a county border, or within 5 to 10 miles of one, stores from the neighboring county will still show up. These cases are too few to distort the price differences.
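One way to check how often this overlap happens is to count the stores that show up under more than one county. The sketch below assumes the lookups have already been collected as (county FIPS, store ID) pairs. The function name and the sample pairs are hypothetical and do not come from the project code.

```python
from collections import defaultdict


def stores_in_multiple_counties(lookups):
    """Return {store_id: [county FIPS, ...]} for stores seen under 2+ counties.

    `lookups` is an iterable of (county_fips, store_id) pairs, one pair per
    store returned for a zip code in that county.
    """
    counties_by_store = defaultdict(set)
    for fips, store_id in lookups:
        counties_by_store[store_id].add(fips)
    return {
        store_id: sorted(counties)
        for store_id, counties in counties_by_store.items()
        if len(counties) > 1
    }


# Hypothetical lookups: store "s2" is reachable from zip codes in two counties.
lookups = [
    ("06001", "s1"),
    ("06001", "s2"),
    ("06003", "s2"),
    ("06003", "s3"),
    ("06001", "s1"),  # same store found again via another zip code in the same county
]

shared = stores_in_multiple_counties(lookups)
print(shared)
```

If the resulting dictionary is small relative to the total number of stores, the assumption that overlaps do not distort the means is easier to defend.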
Using these store IDs, we can now extract the prices using the following code.
[Code in GitHub link]
Now, by grouping on the RUCC codes and calculating the mean, we get the following data. Note that there are no stores in RUCC type 9: with a population of less than 2,500, such counties are unlikely to be profitable.
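The grouping step can be sketched with the standard library alone. The rows below are hypothetical prices invented for illustration, not scraped values, and the item labels are generic placeholders.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical scraped rows: (RUCC code, menu item, price in USD).
rows = [
    (1, "pizza", 5.00),
    (1, "pizza", 5.50),
    (5, "pizza", 7.00),
    (1, "bread", 3.00),
    (5, "bread", 4.00),
    (5, "bread", 4.50),
]


def mean_price_by_rucc(rows):
    """Average the price of each menu item within each RUCC code."""
    grouped = defaultdict(list)
    for rucc, item, price in rows:
        grouped[(rucc, item)].append(price)
    return {key: round(mean(prices), 2) for key, prices in sorted(grouped.items())}


means = mean_price_by_rucc(rows)
for (rucc, item), price in means.items():
    print(f"RUCC {rucc}  {item:<6} {price:.2f}")
```

Averaging per item, rather than across all items at once, keeps each item’s price comparable between county types, which is what the paired test later relies on.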
The table above is hard to interpret, so here’s a simple plot:
If you assumed that rural prices would be cheaper, that’s partly right, but the key takeaway is that prices are higher in counties of classification types 5 and 6, which are:
5 = Urban population of 20,000 or more, not adjacent to a metro area
6 = Urban population of 2,500 to 19,999, adjacent to a metro area
So, is this the inflection point where prices change? The bigger question is:
Is the difference between these county types significant enough?
To answer that question, we need to step into statistics for a bit.
To reiterate, we have the mean prices as follows:
Here, we can choose from various kinds of tests, but the two important ones are:
1. Compare prices between RUCC type 1 to type 5
2. Compare prices between RUCC type 1 to type 8
Following the statistical test design above, we formulated the hypothesis at the beginning, then designed the test and gathered data by collecting the pizza prices. The next step is choosing a statistical test. Since we are comparing the same items (pizza menu items) across two sample groups (prices at different locations), we can use a paired t-test rather than an unpaired one. Like all hypothesis tests, the paired t-test starts with two hypotheses, the null and the alternate. For the paired t-test, they are based on the difference within each pair.
H0 (Null Hypothesis): Mean difference (Type 1 - Type 5) = 0
H1 (Alternate Hypothesis): Mean difference (Type 1 - Type 5) != 0
Let’s compute the price differences:
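A minimal sketch of that computation, using only the standard library: take the per-item differences, then divide their mean by their standard error to get the t statistic. The prices below are hypothetical and chosen only to illustrate the mechanics. Five items are used because the reported df = 4 implies five pairs. The function name `paired_t` is my own.

```python
import math
import statistics


def paired_t(a, b):
    """Return (mean difference, standard error, t statistic, df) for paired samples."""
    if len(a) != len(b):
        raise ValueError("paired samples must have the same length")
    diffs = [x - y for x, y in zip(a, b)]
    n = len(diffs)
    mean_diff = statistics.mean(diffs)
    # Standard error of the mean difference: sample std dev / sqrt(n).
    std_err = statistics.stdev(diffs) / math.sqrt(n)
    return mean_diff, std_err, mean_diff / std_err, n - 1


# Hypothetical mean prices (USD) for the same five menu items in two county types.
type_1 = [5.00, 6.00, 8.00, 9.00, 3.50]
type_5 = [6.50, 7.50, 10.00, 11.00, 5.50]

mean_diff, std_err, t_stat, df = paired_t(type_1, type_5)
print(f"mean difference = {mean_diff:.4f}")
print(f"standard error  = {std_err:.4f}")
print(f"t = {t_stat:.4f}, df = {df}")
```

With SciPy installed, `scipy.stats.ttest_rel(type_1, type_5)` computes the same statistic together with the two-tailed p-value.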
After doing the paired t-Test, we get the following results:
P-value and statistical significance:
The two-tailed P value equals 0.0037
By conventional criteria, this difference is considered to be very statistically significant.
Confidence interval:
The mean of RUCC 1 minus RUCC 5 equals -1.7420
95% confidence interval of this difference: From -2.5369 to -0.9471
Intermediate values used in calculations:
t = 6.0843
df = 4
standard error of difference = 0.286
There’s one more way to do the paired t-test: combine the metro areas (RUCC 1, 2 and 3) and compare them with the non-metro areas (RUCC 4 to 8). Here are the results for that test:
P-value and statistical significance:
The two-tailed P value equals 0.0044
By conventional criteria, this difference is considered to be very statistically significant.
Confidence interval:
The mean of Metro minus Non-Metro equals -1.0060
95% confidence interval of this difference: From -1.4865 to -0.5255
Intermediate values used in calculations:
t = 5.8124
df = 4
standard error of difference = 0.173
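The two result blocks above can be checked for internal consistency. Since t is the mean difference divided by its standard error, the standard error and the 95% confidence interval can be recovered from the reported mean difference and t value alone. The sketch below assumes the standard table value of about 2.7764 for the 97.5th percentile of Student’s t with 4 degrees of freedom. The helper name `summary` is my own.

```python
# 97.5th percentile of Student's t with 4 degrees of freedom (standard table value).
T_CRIT_DF4 = 2.7764


def summary(mean_diff, t_stat, t_crit=T_CRIT_DF4):
    """Recover the standard error and 95% CI from a mean difference and its t value."""
    std_err = abs(mean_diff) / abs(t_stat)
    half_width = t_crit * std_err
    return std_err, mean_diff - half_width, mean_diff + half_width


# (mean difference, t) exactly as reported in the two result blocks.
reported = {
    "RUCC 1 minus RUCC 5": (-1.7420, 6.0843),
    "Metro minus Non-Metro": (-1.0060, 5.8124),
}

results = {name: summary(m, t) for name, (m, t) in reported.items()}
for name, (std_err, low, high) in results.items():
    print(f"{name}: SE = {std_err:.3f}, 95% CI = {low:.4f} to {high:.4f}")
```

The reported t values are magnitudes. Because both mean differences are negative, the signed t statistics would be negative as well, which does not change the two-tailed p-values.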
Conclusion
So, do prices across counties differ based on population size? Yes, they do: it’s cheaper in densely populated areas and costlier in suburban areas, and that difference is statistically significant.
If pizza prices are any indication of purchasing power, it is costlier to live in these growing suburban areas. There could be any number of reasons for this: a smaller population cutting into profits, suburban rents being similar to urban ones, and many others. The reason needs its own study someday.
Furthermore, this case study does not include the price after taxes, and taxes vary by county. I presume the tax difference is about 10 cents, which is not big enough to change the results.
Relevant links:
Code is available here: https://github.com/ashabhi101/pizza-price-comparison
Data from: https://www.census.gov/programs-surveys/geography/guidance/geo-areas/urban-rural.html