Tidytuesday Philly Parking Tickets

 
 
 

Tidytuesday Philly Parking Tickets

By: Patrick Ayers

12/3/2019

 

Basic Over Data

This week I will be looking at the Philadelphia parking violations data set. This data has all of the parking violations for Philadelphia for the year 2017. The whole data set was over 1GB in size, but it was cut down to less than 100Mb due to Git Hub restrictions. If you would like to see the full data set or who uploaded it to Git hub see https://github.com/rfordatascience/tidytuesday/tree/master/data/2019/2019-12-03.

Let's Take a look at what the structure of the data is. We can see that there are a total of seven variables and a bit over 1.2 million observations.

As always, we can look at the number of missing values. Looking at the table below, we can see that almost all of the data is coming from zip_code variables. Given that we have both latitude and longitudes values, we can use them to find the zip code they are connected too.

variables missing_values
violation_desc 0
issue_datetime 0
fine 0
issuing_agency 1
lat 0
lon 0
zip_code 173588

Let's take a more in-depth look at the variable of violations_desc and see what violations include, or if we need to do many more data cleaning. There are 96 unique values for these variables.

unique(tickets$violation_desc)
##  [1] "BUS ONLY ZONE"        "STOPPING PROHIBITED"  "OVER TIME LIMIT"     
##  [4] "STOP PROHIBITED CC"   "DOUBLE PARKED"        "OVER TIME LIMIT CC"  
##  [7] "FIRE HYDRANT"         "METER EXPIRED"        "HP RESERVED SPACE"   
## [10] "METER EXPIRED CC"     "LOADING ZONE   CC"    "STREET CLEANING"     
## [13] "PASSENGR LOADNG ZONE" "EXPIRED INSPECTION"   "SIDEWALK"            
## [16] "CORNER CLEARANCE"     "PARKING PROHBITED CC" "PARKING PROHBITED"   
## [19] "HP RAMP BLOCKED"      "CROSSWALK"            "IMPROPER ON 2WAY HWY"
## [22] "TAXI STAND"           "BLOCKING DRIVEWAY"    "SIDEWALK   CC"       
## [25] "PRIVATE PROPERTY"     "SCHOOL ZONE"          "BUS ONLY ZONE   CC"  
## [28] "UNREG/ABANDONED VEH"  "EXPIRED TAG"          "DOUBLE PARKED  CC"   
## [31] "LOADING ZONE"         "CAR SHARE VEHS ONLY"  "VALET ZONE VIOLATION"
## [34] "IMPROPER ON 1WAY HWY" "SNOW EMERGENCY ROUTE" "STOP MEDIAL STRIP"   
## [37] "STOP/BLOCK HIGHWAY"   "SNOW RELOCATION TOW"  "CORNER CLEARANCE  CC"
## [40] "TAXI NOT AT STAND"    "BUS NOT IN BUS STAND" "STOP IN INTERSECTION"
## [43] "DISPLAY VEH FOR SALE" "STOP ON R/R TRACK 44" "WASH/REPAIR VEH"     
## [46] "PARKING PROHIBITED"   "BLOCKING DRIVEWAY CC" "BLOCKNG MASS TRANSIT"
## [49] "CROSSWALK    CC"      "COUNTERFEIT HP PERM"  "COMMRCL VEH RES AREA"
## [52] "ON LIMITD ACCESS HW"  "ILLEGAL PLACED TICKT" "PK NEAR FIRE STAT 51"
## [55] "UNMETERED SPACE"      "SUSPENDED LICENSE"    "VALET VIOLATION"     
## [58] "PKG W/LIC SUSPD 75"   "PARK MOBILE HOME 28"  "EXCESSIVE IDLING"    
## [61] "BUS-IMP PSNG DISCHRG" "DISOBEY SIGN-PRHB 21" "PARKED ON BRIDGE 14" 
## [64] "STOP/BLOCK HIWY  CC"  "PARK METER IMPROPER"  "EXCESSIVE NOISE"     
## [67] "UNATTENDED VEH    29" "PARENTAL LIABILITY"   "PARK PROHIBITED 04"  
## [70] "IMPROPER-2WAY HWY 74" "PARKED ON GRASS"      "STOPPED IN SAFE ZONE"
## [73] "OBSTRUCT TROLLEY 52"  "UNLAWFUL ALARM"       "FRAUD PARK HP SPACE" 
## [76] "PARK PROHIB PLACE"    "INVALID CONT PERMIT"  "INVALID KIOSK RECPT" 
## [79] "DAMAGE TO METER"      "PK NEAR TRAFF SIGNAL" "INVALID RPP PERMIT"  
## [82] "DISOBEY SN-STANDING"  "BUS-IMPROPER LOADING" "ILLEGAL PLACD TKT 73"
## [85] "HIWAY CONSTRCTION 43" "PARK-TROLLEY STOP 23" "STOP PROHIBITED  CC" 
## [88] "OWNER ID ON COM VEH"  "STOP/BLOCK HWY  33"   "IMPROPER MOTORCYCLE" 
## [91] "PK OVER TIME LIM 17"  "+4HR IN LOADING ZONE" "PRK MTR IMPROPER 11" 
## [94] "INTERSECTION   CC"    "STOPPED SAFE ZONE CC"

Adding a new Variable

One thing I am seeing is how some of the values have “CC” after them, such as SIDEWALK CC vs. SIDEWALK or LOADING ZONE CC vs. LOADING ZONE. If you go and look at the http://www.philapark.org/faq/ we can see that there are different ticket values for non Center City and Center City. We can look at some of the violations that have the letters CC and see if there is a difference in the fines.

tickets %>% filter(violation_desc %in% c("METER EXPIRED CC","METER EXPIRED","DOUBLE PARKED  CC","DOUBLE PARKED","SIDEWALK   CC","SIDEWALK","OVER TIME LIMIT CC","OVER TIME LIMIT"  )) %>%
  group_by(violation_desc) %>% summarise(mean_value = dollar(mean(fine))) %>% kable()
violation_desc mean_value
DOUBLE PARKED $51
DOUBLE PARKED CC $76
METER EXPIRED $26
METER EXPIRED CC $36
OVER TIME LIMIT $26
OVER TIME LIMIT CC $36.00
SIDEWALK $51
SIDEWALK CC $76

I am seeing that the difference between the fines is about $25.00, which is matching up with what is said on the Philadelphia website. With this information, we can now and add a new variable called City_center to the data. I will also be removing “CC” form violation_desc because that information is now in the variable.

Tickets <- tickets %>%
  mutate('City_Center' = if_else(str_detect(violation_desc,"CC"),"City_center","Not_city_center")) 
#glimpse(Tickets)
Tickets$violation_desc <- trimws(str_remove(Tickets$violation_desc,"CC"))
kable(head(Tickets[,c(1,2,3,4,8)]))
violation_desc issue_datetime fine issuing_agency City_Center
BUS ONLY ZONE 2017-12-06 12:29:00 51 PPA Not_city_center
STOPPING PROHIBITED 2017-10-16 18:03:00 51 PPA Not_city_center
OVER TIME LIMIT 2017-11-02 22:09:00 26 PPA Not_city_center
OVER TIME LIMIT 2017-11-05 20:19:00 26 PPA Not_city_center
STOP PROHIBITED 2017-10-17 06:58:00 76 PPA City_center
DOUBLE PARKED 2017-10-02 10:40:00 51 POLICE Not_city_center

Diving into the Data

Now that we have cleaned the data let's take a look into what we can learn from this data set. First, let's see just the general information about the data set and see what the top 10 violation is.

Tickets %>% group_by(City_Center) %>% summarise(total_fine = dollar(sum(fine)),
                                                avg_fine = dollar(mean(fine)),
                                                total_fins_given = n()) %>% kable()
City_Center total_fine avg_fine total_fins_given
City_center $24,759,362 $48.96 505697
Not_city_center $32,498,768 $43.03 755194
Tickets %>% group_by(violation_desc) %>% summarise("Violation_Count" = n()) %>%
  arrange(desc(Violation_Count))  %>% top_n(10) %>% kable()
violation_desc Violation_Count
METER EXPIRED 462389
OVER TIME LIMIT 181444
EXPIRED INSPECTION 138575
STOP PROHIBITED 115899
PARKING PROHBITED 92314
STOPPING PROHIBITED 47395
BUS ONLY ZONE 30161
SIDEWALK 24412
PASSENGR LOADNG ZONE 24359
FIRE HYDRANT 20945

Meter expired is the most common ticket that was given out in 2017, let's add to this and see how much money did the fines generate. We can see that Meter Expired makes almost 15 million dollars, and the next highest violation amount is Stop Prohibited at 8.8 million dollars.

Tickets %>% group_by(violation_desc) %>% summarise("Violation_Count" = n(),
                                                   "Total_fines" = dollar(sum(fine))) %>%
  arrange(desc(Violation_Count)) %>% top_n(10,Violation_Count) %>% kable()
violation_desc Violation_Count Total_fines
METER EXPIRED 462389 $14,832,714
OVER TIME LIMIT 181444 $4,963,384
EXPIRED INSPECTION 138575 $5,681,835
STOP PROHIBITED 115899 $8,808,249
PARKING PROHBITED 92314 $4,235,694
STOPPING PROHIBITED 47395 $2,417,135
BUS ONLY ZONE 30161 $1,795,536
SIDEWALK 24412 $1,387,412
PASSENGR LOADNG ZONE 24359 $755,129
FIRE HYDRANT 20945 $1,591,820

This data was at the total level; let's take advantage of the new variable of City center and see how different the parking violations differences between city centers and non-city centers.

Tickets %>% filter(City_Center == "City_center") %>%  group_by(violation_desc) %>% 
  summarise("Violation_Count" = n(),
             "Total_fines" = dollar(sum(fine))) %>%
  arrange(desc(Violation_Count)) %>% top_n(10,Violation_Count) %>% kable()
violation_desc Violation_Count Total_fines
METER EXPIRED 281060 $10,118,160
STOP PROHIBITED 115899 $8,808,249
PARKING PROHBITED 45082 $2,299,182
OVER TIME LIMIT 24585 $885,050
LOADING ZONE 12826 $654,126
BUS ONLY ZONE 10294 $782,319
DOUBLE PARKED 7841 $595,916
SIDEWALK 5696 $432,896
BLOCKING DRIVEWAY 958 $72,808
CORNER CLEARANCE 812 $61,712
Tickets %>% filter(City_Center == "Not_city_center") %>%  group_by(violation_desc) %>% 
  summarise("Violation_Count" = n(),
             "Total_fines" = dollar(sum(fine))) %>%
  arrange(desc(Violation_Count)) %>% top_n(10,Violation_Count) %>% kable()
violation_desc Violation_Count Total_fines
METER EXPIRED 181329 $4,714,554
OVER TIME LIMIT 156859 $4,078,334
EXPIRED INSPECTION 138575 $5,681,835
STOPPING PROHIBITED 47395 $2,417,135
PARKING PROHBITED 47232 $1,936,512
PASSENGR LOADNG ZONE 24359 $755,129
FIRE HYDRANT 20945 $1,591,820
BUS ONLY ZONE 19867 $1,013,217
SIDEWALK 18716 $954,516
HP RESERVED SPACE 14554 $4,380,754

We can see that the top 3 violations are almost entirely different. The top 3 parking violations for city centers are Meter expired, stop prohibited, and parking prohibited. It is comparing that to non-city centers that have Meter expired, overtime limit, and expired inspection. I do find it interesting that that double-parked is in the top 10 for city centers but not for non-city centers.

The biggest take away that I am seeing from comparing the two tables is that the City centers violations generate almost 60% of all Meter expired violations.Lastly, we can create a bar graph of the show the differences between the city center and non-city centers. This plot is a great way to see the different distribution between the two different location types.

p1 <- Tickets %>% 
  select(violation_desc,City_Center) %>% 
  group_by(City_Center) %>%
  count(violation_desc) %>% 
  arrange(desc(n)) %>%
  top_n(5) %>%
  filter(City_Center == 'City_center') %>%
  ggplot(aes(violation_desc,n)) + geom_bar(stat = 'identity') + ggtitle("Top 5 Violations for City Centers") + ylim(0,300000)  +
  scale_y_continuous(breaks = c(50000,100000,150000,200000,250000,300000,350000)) + ylab("Number of violations") + xlab("Violation")


p2 <- Tickets %>% 
  select(violation_desc,City_Center) %>% 
  group_by(City_Center) %>%
  count(violation_desc) %>% 
  arrange(desc(n)) %>%
  top_n(5) %>%
  filter(City_Center != 'City_center') %>%
  ggplot(aes(violation_desc,n)) + geom_bar(stat = 'identity') + ggtitle("Top 5 Violations for non-City Centers")  +
  scale_y_continuous(breaks = c(50000,100000,150000,200000,250000)) + ylab("Number of violations") + xlab("Violation")

ggarrange(p1,p2,nrow = 2)

plot of chunk unnamed-chunk-11

Final Thoughts

This was a great data set to dig into — a lot of great information and many different ways to analyze the data. I did not even go into the variable of issuing_agency or latitude and longitudes. There is still more analysis that can be done on this, such as a heat map to see the area of Philadelphia has the highest density of violations. There can even be an analysis of the date-times that are included with the data set. A lot of fun working on this data, and lastly, if you are in Philadelphia, be shear to read the parking signs to avoid a ticket.

Previous
Previous

Work Out Summary Project

Next
Next

TidyTuesday Ramen analysis