Number of crime events each year

Overall, the number of crime events showed an increasing trend from year 2006 to year 2021, and year 2013 is the watershed: the number of events before 2013 is generally lower than that after 2013. Sepcifically, the number of crime events is over 10,000 almost each year, for year 2016, 2017, 2018, and 2019, total number of crime events reached at 12,000. However, there is a sudden decrease from year 2019 to year 2020, the number of crime events reduced by a half, and that low quantity keeps in year 2021, updating to September, 2021.

sub_crime_year = 
  raw_sub_crime %>% 
  select(start_date, start_time_hour, crime_event, law_cat) %>% 
  mutate(start_date = substring(start_date, 1, 4))

plot_1 = 
  sub_crime_year %>% 
  group_by(start_date) %>% 
  summarise(event_num = n() / 1000) %>% 
  plot_ly(
    x = ~start_date, y = ~event_num, type = "bar", colors = "#2171B5"
  )

layout(plot_1, title = "Crime events over years", xaxis = list(title = "Year"), yaxis = list(title = "Number of Crime Events (K)"))

Number of crime events each season

Basically, the peak of crime events appeared on the first quarter for almost each year, the bottom point of the quantity of crime events appeared on the third or fourth quarter for nearly every year. The overall trend of quantity of events was increasing, and the overall quantity of events after the fourth quarter of 2013 is higher than that before the fourth quarter of 2013.

sub_crime_season = 
  raw_sub_crime %>% 
  select(start_date, start_time_hour, crime_event, law_cat) %>% 
  separate(start_date, into = c("year", "month", "day"), sep = "-") %>% 
  mutate(season = case_when(
    month %in% c("01","02","03") ~ "Q1",
    month %in% c("04","05","06") ~ "Q2",
    month %in% c("07","08","09") ~ "Q3",
    month %in% c("10","11","12") ~ "Q4"
  )) %>% 
  mutate(event_quarter = paste(year, season, sep = "-"))
  
plot_2 = 
  sub_crime_season %>% 
  group_by(event_quarter) %>% 
  summarise(event_num = n()) %>% 
  plot_ly(
    x = ~event_quarter, y = ~event_num, type = "scatter", mode = "points", colors = "#2171B5"
  )

layout(plot_2, title = "Crime events over quarter", xaxis = list(title = "Quarter"), yaxis = list(title = "Number of Crime Events"), width = 800)

Number of cirime events each month

The overall trend of the number of crime events in a year is decreasing, the peak of quantity appeared on January whereas the bottom of events number appeared on December. Generally, March, May, and October have comparatively high frequency of crime events whereas February, April, and July have comparatively low frequency of crime events.

As for three degrees of crime, violation events showed a flat trend over a year, as for felony and misdemeanor events, the high quantity of them both appeared on January, which is consistent with the total crime events. However, for felony events, the lowest frequency of events number appeared on April.

sub_crime_month = 
  raw_sub_crime %>% 
  select(start_date, start_time_hour, crime_event, law_cat) %>% 
  separate(start_date, into = c("year", "month", "day"), sep = "-") %>% 
  mutate(month = ifelse(month == "01", "January", ifelse(month == "02", "February", ifelse(month == "03", "March", ifelse(month == "04", "April", ifelse(month == "05", "May", ifelse(month == "06", "June", ifelse(month == "07", "July", ifelse(month == "08", "August", ifelse(month == "09", "September", ifelse(month == "10", "October", ifelse(month == "11", "November", "December")))))))))))) %>% 
  mutate(month = ordered(month, level = c("January", "February", "March", "April", "May", "June", "July", "August", "September", "October", "November", "December")))
  
plot_3 = 
  sub_crime_month %>% 
  group_by(month, law_cat) %>% 
  summarise(event_num = n() / 1000) %>% 
  plot_ly(
    x = ~month, y = ~event_num, type = "scatter", mode = "points", color = ~law_cat, colors = "viridis"
  )


plot_4 = 
  sub_crime_month %>% 
  group_by(month) %>% 
  summarise(event_num = n() / 1000) %>% 
  plot_ly(
    x = ~month, y = ~event_num, type = "scatter", mode = "points", color="TotalNumber", colors = "#2171B5"
  )

subplot = subplot(plot_4, plot_3)

layout(subplot, title = "Event number over month", xaxis = list(title = "Month"), yaxis = list(title = "Number of crime events (K)"), width = 1000)

Degrees of crime event v.s. Occurrence time

Based on the heat map, it is easier to obtain that the quantity of misdemeanor events is the largest, the second largest is felony, and the least is violation. For misdemeanor crime events, it most happened in the afternoon, around 3:00 pm and 4:00 pm, as well as felony events happened most frequently from 1:00 to 6:00 pm.

sub_crime_degree = 
  raw_sub_crime %>% 
  group_by(law_cat, start_time_hour) %>% 
  summarise(event_num = n()) %>% 
  rename("crime_degree" = "law_cat", "hour" = "start_time_hour")

plot_5 = 
  sub_crime_degree %>% 
  plot_ly(
    x = ~ hour, y = ~ crime_degree, z = ~ event_num, type = "heatmap", colors = "YlGn"
  ) %>%
  colorbar(title = "Events Number", x = 1.1, y = 0.8) 

layout(plot_5, title = "Crime frequency: Degree v.s. Hour", xaxis = list(title = "Hour"), yaxis = list(title = "Degree"), width = 850, height = 400)

Day of week v.s. Occurrence time

From the heat map, it is easier to be judged that most of crime events were mainly occurred in the afternoon from Tuesday to Thursday. We can say that in the noon around 12:00 pm on Tuesday and Wednesday may be the most dangerous time on in a week, whereas there are not that many crime events on Saturday and Sunday.

sub_crime_dow = 
  raw_sub_crime %>% 
  mutate(day_of_week = wday(as.Date(start_date), label=TRUE, abbr = FALSE)) %>% 
  mutate(day_of_week = fct_relevel(day_of_week, "Saturday", "Friday", "Thursday", "Wednesday", "Tuesday", "Monday", "Sunday")) %>%
  select(day_of_week, start_time_hour, crime_event) %>% 
  group_by(day_of_week, start_time_hour) %>% 
  summarise(crime_num = n())

plot_6 = 
  sub_crime_dow %>% 
  plot_ly(
    x = ~ start_time_hour, y = ~ day_of_week, z = ~ crime_num, type = "heatmap", colors = "BuPu"
  ) %>%
  colorbar(title = "Events Number", x = 1.1, y = 0.8) 

layout(plot_6, title = "Crime frequency: Day v.s. Hour", xaxis = list(title = "Hour"), yaxis = list(title = "Day of week"), width = 850, height = 430
    )

Proceeding time

When it comes to the proceeding time of three degrees of events, we can see that the median proceeding time between misdemeanor and violation events are not so different, whereas median time for felony events is slightly higher than the other two so misdemeanor and violation events may be easier to handle with, excluding some extreme outliers.

Generally, the range of the proceeding time for misdemeanor events is comparably less than the other two, and the maximum proceeding time for misdemeanor events is approximately 5 minutes less than felony events and violation events.

crime_prcd_time = 
  raw_sub_crime %>% 
  drop_na(start, end) %>%
  mutate(prcd_time = difftime(end, start, units = "mins")) %>% 
  filter(prcd_time < 35) %>% 
  filter(prcd_time != 0) %>% 
  mutate(quarters = quarters(as.Date(start_date)))

plot_7 = 
  crime_prcd_time %>% 
  plot_ly(y = ~ prcd_time, color = ~ law_cat, type = "box")

layout(plot_7, title = "Crime type", xaxis = list(title = "Proceeding time"), yaxis = list(title = "Crime type v.s. Proceeding time (mins)"))