DATA ANALYSIS: UBER & LYFT
PURPOSE:
To determine whether the cost of a ride from a major city to a nearby suburb (within a 100-mile radius) is greater than the anticipated cost of the same ride back (suburb to major city), and how this differs between Uber and Lyft.
GOAL:
Access the Uber and Lyft APIs
Extract distance, max cost, & min cost for 25 latitude and longitude pairs
Calculate the average max cost for the same ride at different times of day
Calculate the average difference between cost there and cost back for a pair
Visualize data for 5 pairs of latitudes and longitudes (one smaller city for each of our major cities)
Test our hypothesis
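The two averaging steps in the list above can be sketched roughly as follows. This is a minimal illustration, not the project's actual code: the record layout, the "there"/"back" labels, the function names, and every number are assumptions made up for the example.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical records: (pair_id, direction, time_of_day, max_cost_usd).
# The values are placeholders for illustration, not collected data.
rides = [
    (1, "there", "morning", 30.0),
    (1, "there", "evening", 34.0),
    (1, "back", "morning", 28.0),
    (1, "back", "evening", 30.0),
    (2, "there", "morning", 50.0),
    (2, "there", "evening", 54.0),
    (2, "back", "morning", 51.0),
    (2, "back", "evening", 55.0),
]


def average_max_cost(records):
    """Average the max cost over times of day for each (pair_id, direction)."""
    grouped = defaultdict(list)
    for pair_id, direction, _time_of_day, max_cost in records:
        grouped[(pair_id, direction)].append(max_cost)
    return {key: mean(costs) for key, costs in grouped.items()}


def there_minus_back(averages):
    """Per-pair difference: average cost there minus average cost back."""
    pair_ids = sorted({pair_id for pair_id, _direction in averages})
    return {
        pid: averages[(pid, "there")] - averages[(pid, "back")]
        for pid in pair_ids
    }


averages = average_max_cost(rides)
differences = there_minus_back(averages)
# Average difference across all pairs; positive means "there" cost more.
overall = mean(list(differences.values()))
```

A positive difference for a pair means the ride out of the major city was, on average, more expensive than the ride back.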
SCOPE:
Python, SQL, Matplotlib, Plotly
To view the GitHub repository, click here.
To view the final report and to find information on running each file, click here.
DATA COLLECTION AND STORAGE
DATA COLLECTION
We used the Uber and Lyft APIs to gather data from 5 different major cities and, for each city, 5 nearby suburbs/smaller cities (25 pairs of latitudes and longitudes total).
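One way to picture the 5 x 5 layout is as a mapping from each city to its suburbs, flattened into 25 numbered pairs. The sketch below only builds that structure and does not call either API. The Location class, the build_pairs helper, and all names and coordinates are hypothetical placeholders.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Location:
    """A named point; frozen so it can be used as a dictionary key."""
    name: str
    latitude: float
    longitude: float


def build_pairs(cities):
    """Return (pair_id, city, suburb) tuples for a {city: [suburbs]} mapping."""
    pairs = []
    pair_id = 1
    for city, suburbs in cities.items():
        for suburb in suburbs:
            pairs.append((pair_id, city, suburb))
            pair_id += 1
    return pairs


# Placeholder locations (not the real cities): 5 cities, 5 suburbs each.
cities = {
    Location(f"city_{c}", 40.0 + c, -80.0 - c): [
        Location(f"city_{c}_suburb_{s}", 40.0 + c + 0.1 * s, -80.0 - c)
        for s in range(1, 6)
    ]
    for c in range(1, 6)
}

pairs = build_pairs(cities)  # 25 pairs, numbered 1 through 25
```

Each pair would then be priced in both directions (city to suburb and suburb to city) so the two costs can be compared.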
DATA STORAGE
Using SQL, we stored our data in two tables in the database: RideShare (Lyft) and RideShareOtherCompany (Uber). (Note: the cost for Uber is displayed in dollars, while the cost for Lyft is displayed in cents, because the raw data from the two APIs came in different units.)
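Because the two tables use different units, any comparison between companies needs the Lyft costs converted to dollars first. The sketch below shows one way to do that with Python's built-in sqlite3 module. The table names come from this write-up, but the column names, the in-memory database, and the sample rows are assumptions for illustration only.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Table names are from the write-up; the column names are assumed.
conn.execute(
    "CREATE TABLE RideShare "  # Lyft, costs in cents
    "(pair_id INTEGER, time_of_day TEXT, max_cost INTEGER, min_cost INTEGER)"
)
conn.execute(
    "CREATE TABLE RideShareOtherCompany "  # Uber, costs in dollars
    "(pair_id INTEGER, time_of_day TEXT, max_cost REAL, min_cost REAL)"
)

# Placeholder rows, not collected data.
conn.execute("INSERT INTO RideShare VALUES (1, 'morning', 3200, 2500)")
conn.execute("INSERT INTO RideShareOtherCompany VALUES (1, 'morning', 30.0, 24.0)")

# Divide the Lyft values by 100.0 so both companies are reported in dollars.
query = """
SELECT 'Lyft' AS company, pair_id, time_of_day,
       max_cost / 100.0 AS max_cost_usd,
       min_cost / 100.0 AS min_cost_usd
FROM RideShare
UNION ALL
SELECT 'Uber', pair_id, time_of_day, max_cost, min_cost
FROM RideShareOtherCompany
ORDER BY company
"""
rows = conn.execute(query).fetchall()
```

Dividing by 100.0 rather than 100 keeps SQLite from doing integer division on the cent values.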
Filtering on pair_id shows all of the data associated with any one location pair at different times of day.
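A filter on pair_id might look like the following sketch, again using sqlite3 with an in-memory database. Only the RideShare table name and the pair_id column are taken from this write-up. The other columns, the rows_for_pair helper, and the sample rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Assumed schema for illustration; Lyft costs are stored in cents.
conn.execute(
    "CREATE TABLE RideShare "
    "(pair_id INTEGER, time_of_day TEXT, distance REAL, "
    "max_cost INTEGER, min_cost INTEGER)"
)
conn.executemany(
    "INSERT INTO RideShare VALUES (?, ?, ?, ?, ?)",
    [
        (1, "morning", 12.5, 3200, 2500),  # placeholder rows
        (1, "evening", 12.5, 3600, 2800),
        (2, "morning", 40.0, 9000, 7000),
    ],
)


def rows_for_pair(connection, pair_id):
    """Return every stored row for one location pair, ordered by time of day."""
    return connection.execute(
        "SELECT time_of_day, distance, max_cost, min_cost "
        "FROM RideShare WHERE pair_id = ? ORDER BY time_of_day",
        (pair_id,),  # parameter binding avoids building SQL by hand
    ).fetchall()


pair_rows = rows_for_pair(conn, 1)
```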
RESULTS
CALCULATIONS
VISUALIZATIONS
LYFT
UBER
CONCLUSION
Given our small sample size, we cannot confidently conclude that the cost from a major city to a nearby suburb is greater than the anticipated cost for the same ride back (suburb to major city); however, our results suggest that such price discrepancies exist. Further research is needed to draw firm conclusions. The costs for the two companies differ slightly, but we found no significant pattern. This, too, is something we would like to explore further.
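To make the sample-size caveat concrete, here is a sketch of one simple test that could be applied to the per-pair differences: a one-sided exact sign test. This write-up does not state which test was actually used, so the choice of test, the sign_test_p_value function, and the sample values are all assumptions for illustration.

```python
from math import comb


def sign_test_p_value(differences):
    """One-sided exact sign test.

    Returns the probability of seeing at least this many positive
    differences if positive and negative were equally likely.
    """
    nonzero = [d for d in differences if d != 0]  # ties carry no sign
    n = len(nonzero)
    if n == 0:
        return 1.0
    positives = sum(1 for d in nonzero if d > 0)
    # Upper binomial tail with success probability 0.5.
    tail = sum(comb(n, k) for k in range(positives, n + 1))
    return tail / 2 ** n


# Placeholder differences (cost there minus cost back), not collected data.
example = [1.5, 2.0, -0.5, 3.0, 0.0]
p_value = sign_test_p_value(example)
```

With only a handful of pairs, even a mostly one-sided result gives a large p-value, which is why a small sample cannot support a confident conclusion.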