I filtered for fatal cyclist crashes and arranged the rows by crash date. Here are the first 10 rows:
fatal_cyc_crashes = idot_crashes %>%
  # keep crashes typed as pedalcyclist whose most severe injury was fatal
  filter(first_crash_type == "PEDALCYCLIST" & most_severe_injury == "FATAL") %>%
  mutate(crash_date = as.POSIXct(crash_date)) %>%
  arrange(crash_date) %>%
  select(
    Date = crash_date,
    `Traffic Control Device` = traffic_control_device,
    `Device Condition` = device_condition,
    `Trafficway Type` = trafficway_type
  )
pander(head(fatal_cyc_crashes, 10))
Date | Traffic Control Device | Device Condition | Trafficway Type |
---|---|---|---|
2009-05-22 01:28:00 | Traffic Signal | Functioning Properly | Not Divided |
2009-06-20 12:36:00 | Traffic Signal | Functioning Improperly | One-Way or Ramp |
2009-06-25 10:50:00 | Stop Sign/Flasher | Functioning Properly | Not Divided |
2009-09-06 17:32:00 | No Controls | No Controls | Alley or Driveway |
2009-10-21 12:25:00 | Traffic Signal | Functioning Properly | Divided, No Median Barrier |
2010-03-30 16:39:00 | No controls | No controls | Two Way - Divided, no median barrier |
2010-04-18 20:25:00 | No controls | No controls | Two Way - Divided, no median barrier |
2010-08-29 21:09:00 | Lane use marking | Function properly | Two Way - Divided w/median barrier |
2010-09-02 23:31:00 | No controls | No controls | Two Way - Not divided |
2010-09-05 09:42:00 | Stop sign/flasher | Function properly | Two Way - Not divided |
As a quick check we can also filter for crashes occurring before May 5 of each year:
# POSIXlt stores the month zero-based (January is `mon = 0`, so May is `mon = 4`)
# and the year as years since 1900. The filter keeps crashes that occurred before
# May, or in May _and_ on or before the 5th.
crashes_before_may_5 = idot_crashes %>%
  filter(
    first_crash_type == "PEDALCYCLIST" & most_severe_injury == "FATAL" &
      (crash_date$mon < 4 | (crash_date$mon == 4 & crash_date$mday <= 5))
  ) %>%
  mutate(crash_year = crash_date$year + 1900) %>% # the year is represented as years since 1900
  group_by(crash_year) %>%
  # POSIXct (seconds since 1970-01-01, the UNIX epoch) is easier to sort
  summarize(count = n(), first_crash = min(as.POSIXct(crash_date)), last_crash = max(as.POSIXct(crash_date)))
pander(crashes_before_may_5)
crash_year | count | first_crash | last_crash |
---|---|---|---|
2010 | 2 | 2010-03-30 16:39:00 | 2010-04-18 20:25:00 |
2011 | 1 | 2011-01-17 18:00:00 | 2011-01-17 18:00:00 |
2015 | 2 | 2015-01-01 02:43:00 | 2015-04-05 13:30:00 |
2017 | 1 | 2017-01-11 22:38:00 | 2017-01-11 22:38:00 |
The data covers 2009 through 2017, so the years missing from this table (2009, 2012, 2013, 2014, and 2016) had no fatal crashes involving cyclists on or before May 5th. In 2010 and 2015 there were 2 fatal cyclist crashes by that date.
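To make the date test in the filter concrete, here is a minimal sketch that applies the same `mon`/`mday` condition to three timestamps. The helper name `on_or_before_may_5` and the choice of UTC are assumptions made for this illustration and are not part of the original analysis. The first two timestamps appear in the tables above; the third is a made-up boundary case.

```r
# Hypothetical helper (not in the original analysis): TRUE when a timestamp
# falls before May, or in May on or before the 5th.
on_or_before_may_5 <- function(x) {
  lt <- as.POSIXlt(x, tz = "UTC")  # UTC is an assumption for this sketch
  lt$mon < 4 | (lt$mon == 4 & lt$mday <= 5)
}

dates <- as.POSIXct(
  c("2010-04-18 20:25:00",  # April: mon = 3
    "2009-05-22 01:28:00",  # May 22: mon = 4, mday = 22
    "2015-05-05 23:59:00"), # made-up boundary case: mon = 4, mday = 5
  tz = "UTC"
)

flags <- on_or_before_may_5(dates)
# The year component is stored as years since 1900.
years <- as.POSIXlt(dates, tz = "UTC")$year + 1900

print(flags)  # TRUE FALSE TRUE
print(years)  # 2010 2009 2015
```

Because `mday <= 5` is inclusive, a crash late on May 5th still counts as "before May 5" in this analysis.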
The original question was on which date the 3rd fatal cyclist crash occurred each year. Here is the 3rd crash for each year in the dataset:
third_crashes = idot_crashes %>%
  filter(first_crash_type == "PEDALCYCLIST" & most_severe_injury == "FATAL") %>%
  mutate(crash_year = crash_date$year + 1900) %>%
  group_by(crash_year) %>%
  # number each year's crashes chronologically, then keep the third
  mutate(crash_number = rank(as.POSIXct(crash_date))) %>%
  filter(crash_number == 3) %>%
  mutate(crash_date = as.POSIXct(crash_date)) %>%
  select(
    Year = crash_year,
    Date = crash_date,
    `Traffic Control Device` = traffic_control_device,
    `Device Condition` = device_condition,
    `Trafficway Type` = trafficway_type
  )
pander(third_crashes)
Year | Date | Traffic Control Device | Device Condition | Trafficway Type |
---|---|---|---|---|
2009 | 2009-06-25 10:50:00 | Stop Sign/Flasher | Functioning Properly | Not Divided |
2010 | 2010-08-29 21:09:00 | Lane use marking | Function properly | Two Way - Divided w/median barrier |
2011 | 2011-06-14 16:30:00 | No Controls | No Controls | One-Way or Ramp |
2012 | 2012-07-12 20:40:00 | Traffic Signal | Functioning Properly | Not Divided |
2013 | 2013-12-06 23:45:00 | No Controls | No Controls | Not Divided |
2014 | 2014-07-03 10:36:00 | Traffic Signal | Functioning Properly | Divided, No Median Barrier |
2015 | 2015-07-29 04:35:00 | Lane Use Marking | Functioning Properly | Not Divided |
2016 | 2016-08-16 08:15:00 | Other | Other | Divided, No Median Barrier |
2017 | 2017-07-26 12:44:00 | Traffic Signal | Functioning Properly | Not Divided |
Now we can see that the third fatal cyclist crash did not occur in May, or earlier, in any year from 2009 through 2017. It typically occurred between mid-June and late August; the earliest was June 14 (2011). This data shows 2013’s third fatal cyclist crash (of 3 total) happening in December. However, Streetsblog Chicago reported 4 bicyclist fatalities in 2013, with the fourth occurring in December, which suggests this dataset may be missing one 2013 fatality.
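One caveat about the `rank()` step used above: by default `rank()` averages the ranks of tied values, so if two crashes in a year shared the same timestamp, `crash_number == 3` could match no row at all. The sketch below uses made-up timestamps (not from the dataset) to show how `ties.method = "first"` behaves differently. Whether any such ties exist in the crash data is not something the tables here tell us.

```r
# Made-up timestamps for illustration only; the 2nd and 3rd are identical.
times <- as.POSIXct(
  c("2001-03-01 08:00:00", "2001-06-10 09:30:00",
    "2001-06-10 09:30:00", "2001-12-06 23:45:00"),
  tz = "UTC"
)

avg_rank   <- rank(times)                         # default: ties get the average rank
first_rank <- rank(times, ties.method = "first")  # ties are numbered in order of appearance

print(avg_rank)    # 1.0 2.5 2.5 4.0
print(first_rank)  # 1 2 3 4

# With average ranks no element equals 3; with "first" exactly one does.
print(which(avg_rank == 3))    # integer(0)
print(which(first_rank == 3))  # 3
```

If ties were a concern, passing `ties.method = "first"` to `rank()` in the pipeline would guarantee exactly one row numbered 3 in any year with at least three crashes.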
We can also look at the more recent (but somewhat flawed) data in the city’s Data Portal from 2018 through the end of 2021.
Again, let’s look at crashes occurring before May 5th:
crashes_before_may_5 = all_crashes %>%
  filter(
    first_crash_type == "PEDALCYCLIST" & most_severe_injury == "FATAL" &
      (crash_date$mon < 4 | (crash_date$mon == 4 & crash_date$mday <= 5))
  ) %>%
  mutate(crash_year = crash_date$year + 1900) %>% # the year is represented as years since 1900
  group_by(crash_year) %>%
  # POSIXct (seconds since 1970-01-01, the UNIX epoch) is easier to sort
  summarize(count = n(), first_crash = min(as.POSIXct(crash_date)), last_crash = max(as.POSIXct(crash_date)))
pander(crashes_before_may_5)
crash_year | count | first_crash | last_crash |
---|---|---|---|
2019 | 1 | 2019-03-29 20:02:00 | 2019-03-29 20:02:00 |
2020 | 1 | 2020-02-29 00:35:00 | 2020-02-29 00:35:00 |
According to this data, no fatal cyclist crashes occurred on or before May 5th in 2018 or 2021.
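The years with no early-season fatalities can also be read off mechanically, as the set difference between the years the data covers and the years that appear in the summary table. A small sketch, with the year values typed in by hand from the table above rather than pulled from `crashes_before_may_5`:

```r
covered_years  <- 2018:2021      # years the Data Portal data spans, per the text
years_in_table <- c(2019, 2020)  # crash_year values shown in the summary table

# Years covered by the data that have no row in the summary table.
years_without_early_fatality <- setdiff(covered_years, years_in_table)
print(years_without_early_fatality)  # 2018 2021
```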
And the 3rd crash in each year:
third_crashes = all_crashes %>%
  filter(first_crash_type == "PEDALCYCLIST" & most_severe_injury == "FATAL") %>%
  mutate(crash_year = crash_date$year + 1900) %>%
  group_by(crash_year) %>%
  # number each year's crashes chronologically, then keep the third
  mutate(crash_number = rank(as.POSIXct(crash_date))) %>%
  filter(crash_number == 3) %>%
  mutate(crash_date = as.POSIXct(crash_date)) %>%
  select(
    Year = crash_year,
    Date = crash_date,
    `Traffic Control Device` = traffic_control_device,
    `Device Condition` = device_condition,
    `Address` = street_address
  )
pander(third_crashes)
Year | Date | Traffic Control Device | Device Condition | Address |
---|---|---|---|---|
2019 | 2019-11-06 06:58:00 | TRAFFIC SIGNAL | FUNCTIONING PROPERLY | 3800 N MILWAUKEE AVE |
2018 | 2018-08-09 07:10:00 | TRAFFIC SIGNAL | FUNCTIONING PROPERLY | 727 W MADISON ST |
2020 | 2020-07-13 02:21:00 | STOP SIGN/FLASHER | FUNCTIONING PROPERLY | 1502 S CENTRAL PARK AVE |
2021 | 2021-08-17 17:19:00 | NO CONTROLS | NO CONTROLS | 1431 S MUSEUM CAMPUS DR |