DATA ANALYSIS: UBER & LYFT

PURPOSE:

To determine whether the cost of a ride from a major city to a nearby suburb (within a 100-mile radius) is greater than the anticipated cost of the same ride back (suburb to major city), and how this differs between Uber and Lyft.

GOAL:

 
  • Access the Uber and Lyft APIs

  • Extract distance, max cost, & min cost for 25 latitude and longitude pairs

  • Calculate the average max cost for the same ride at different times of day

  • Calculate the average difference between cost there and cost back for a pair

  • Visualize data for 5 pairs of latitudes and longitudes (one smaller city for each of our major cities)

  • Test our hypothesis
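
The two averaging steps in the list above can be sketched in plain Python. This is only an illustration: the record layout, the direction labels "there" and "back", the function names, and every cost value below are assumptions made for the example, not the project's actual data or code.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical records: (pair_id, direction, time_of_day, max_cost_usd).
# All values are invented for illustration only.
records = [
    (1, "there", "morning", 52.0),
    (1, "there", "evening", 58.0),
    (1, "back", "morning", 47.0),
    (1, "back", "evening", 51.0),
    (2, "there", "morning", 30.0),
    (2, "there", "evening", 34.0),
    (2, "back", "morning", 33.0),
    (2, "back", "evening", 35.0),
]


def average_max_cost(rows):
    """Average the max cost per (pair_id, direction) across times of day."""
    grouped = defaultdict(list)
    for pair_id, direction, _time_of_day, max_cost in rows:
        grouped[(pair_id, direction)].append(max_cost)
    return {key: mean(costs) for key, costs in grouped.items()}


def there_minus_back(averages):
    """For each pair, subtract the average cost back from the average cost there."""
    pair_ids = sorted({pair_id for pair_id, _direction in averages})
    return {
        pid: averages[(pid, "there")] - averages[(pid, "back")]
        for pid in pair_ids
        if (pid, "there") in averages and (pid, "back") in averages
    }


averages = average_max_cost(records)
differences = there_minus_back(averages)
# Average difference across all pairs; positive means "there" costs more.
overall_difference = mean(list(differences.values()))
```

A positive value in `differences` means the city-to-suburb ride was, on average, more expensive than the return ride for that pair.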

SCOPE:

Python, SQL, Matplotlib, Plotly


To view the GitHub repository, click here.

To view the final report and to find information on running each file, click here.

DATA COLLECTION AND STORAGE

DATA COLLECTION

We used the Uber and Lyft APIs to gather data for 5 major cities and, for each city, 5 nearby suburbs or smaller cities (25 latitude and longitude pairs in total).

DATA STORAGE

Using SQL, we stored our data in two database tables: RideShare (Lyft) and RideShareOtherCompany (Uber). (Note: Uber costs are stored in dollars while Lyft costs are stored in cents, because the two APIs return raw cost data in different units.)
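
Because the two tables use different units, any comparison across companies first needs a common unit. The sketch below shows one way to do that, assuming SQLite via Python's built-in sqlite3 module. The table names come from the description above; the column names and the sample rows are assumptions made for illustration.

```python
import sqlite3

# In-memory database for illustration; the project's actual database file is not shown here.
conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# Table names are from the write-up; the column names are assumed.
for table in ("RideShare", "RideShareOtherCompany"):
    cur.execute(
        f"""CREATE TABLE {table} (
            pair_id INTEGER,
            time_of_day TEXT,
            distance REAL,
            min_cost REAL,
            max_cost REAL
        )"""
    )

# Invented sample rows: Lyft costs in cents, Uber costs in dollars.
cur.execute("INSERT INTO RideShare VALUES (1, 'morning', 20.5, 4200, 5200)")
cur.execute("INSERT INTO RideShareOtherCompany VALUES (1, 'morning', 20.5, 40, 55)")

# Convert Lyft cents to dollars so both companies are directly comparable.
rows = cur.execute(
    """
    SELECT 'Lyft' AS company, pair_id, time_of_day,
           min_cost / 100.0, max_cost / 100.0
    FROM RideShare
    UNION ALL
    SELECT 'Uber', pair_id, time_of_day, min_cost, max_cost
    FROM RideShareOtherCompany
    ORDER BY company
    """
).fetchall()
```

Doing the conversion in the query leaves the stored raw values untouched, so the tables still match what the APIs returned.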

Screen Shot 2020-02-13 at 2.08.08 PM.png
 

Using the pair_id filter, you can better see all of the data associated with any one location pair at different times of day:

Screen Shot 2020-04-01 at 5.01.26 PM.png
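
A pair_id filter of this kind could be expressed as a parameterized query. The following is a minimal sketch, again assuming SQLite through Python's sqlite3 module; the reduced column set, the helper name rows_for_pair, and the sample rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Reduced, assumed schema: only the columns needed for this example.
conn.execute(
    "CREATE TABLE RideShare (pair_id INTEGER, time_of_day TEXT, max_cost REAL)"
)
# Invented rows (Lyft costs in cents).
conn.executemany(
    "INSERT INTO RideShare VALUES (?, ?, ?)",
    [(1, "morning", 5200), (1, "evening", 5800), (2, "morning", 3000)],
)


def rows_for_pair(connection, pair_id):
    """Return every stored observation for one location pair, in insertion order."""
    return connection.execute(
        "SELECT pair_id, time_of_day, max_cost "
        "FROM RideShare WHERE pair_id = ? ORDER BY rowid",
        (pair_id,),
    ).fetchall()


pair_rows = rows_for_pair(conn, 1)
```

Passing pair_id as a bound parameter, rather than formatting it into the SQL string, keeps the query safe if the value ever comes from user input.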
 

RESULTS

CALCULATIONS

Screen Shot 2020-04-01 at 5.03.48 PM.png
 

VISUALIZATIONS

LYFT

 
Screen Shot 2020-04-01 at 5.06.45 PM.png
 

UBER

 
Screen Shot 2020-04-01 at 5.06.56 PM.png
 

CONCLUSION

Given our small sample size, we cannot confidently conclude that the cost of a ride from a major city to a nearby suburb is greater than the anticipated cost of the same ride back (suburb to major city). Our results do, however, suggest that such price discrepancies exist, and further research is needed before drawing firm conclusions. There are slight differences in cost between the two companies, but no significant pattern. This, too, is something we would like to explore further.
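
One simple way to examine the there-versus-back question on a small sample is an exact one-sided sign test on the per-pair differences. The sketch below is not necessarily the test used in this project; the function name and the example differences are hypothetical and chosen only to show the mechanics.

```python
from math import comb


def sign_test_p_value(differences):
    """One-sided exact sign test.

    Returns the probability of seeing at least as many positive differences
    as observed if positive and negative differences were equally likely.
    Zero differences are dropped, as is conventional for this test.
    """
    nonzero = [d for d in differences if d != 0]
    n = len(nonzero)
    if n == 0:
        return 1.0
    positives = sum(1 for d in nonzero if d > 0)
    # Upper tail of a Binomial(n, 0.5) distribution.
    return sum(comb(n, i) for i in range(positives, n + 1)) / 2 ** n


# Invented per-pair differences (cost there minus cost back, in dollars).
example_differences = [6.0, -2.0, 3.5, 1.0, -0.5]
p_value = sign_test_p_value(example_differences)
```

With only a handful of pairs, a test like this has little power, which is consistent with the caution expressed above about drawing conclusions from a small sample.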