Bellabeat is a manufacturer of health tracking devices that is looking to become a more dominant player in this space. They offer #different devices that monitor data activity such as sleep, stress, walking activity, and calories burned. They have provided a data #set to analyze in order to reveal opportunities to explore for potential business growth.
rm(list = ls())
# Install prereq software if needed.
# install.packages(c("plyr","ggpubr","ggrepel","RColorBrewer","plotly","waffle","scales","viridis","janitor","skimr","lubridate",
#"tidyverse","readr","dplyr","glmnet","plyr","writexl","kableExtra","highcharter"))
# Load Library
#install.packages("extrafont")
library(extrafont)
## Registering fonts with R
font_import(pattern = "lmroman*")
## Importing fonts may take a few minutes, depending on the number of fonts and the speed of the system.
## Continue? [y/n]
## Exiting.
loadfonts(device = "win")
par(family = "LM Roman 10")
gc()
## used (Mb) gc trigger (Mb) max used (Mb)
## Ncells 488014 26.1 1071408 57.3 645834 34.5
## Vcells 857961 6.6 8388608 64.0 2324874 17.8
library(tidyverse)
## -- Attaching packages --------------------------------------- tidyverse 1.3.1 --
## v ggplot2 3.3.5 v purrr 0.3.4
## v tibble 3.1.6 v dplyr 1.0.7
## v tidyr 1.1.4 v stringr 1.4.0
## v readr 2.1.1 v forcats 0.5.1
## -- Conflicts ------------------------------------------ tidyverse_conflicts() --
## x dplyr::filter() masks stats::filter()
## x dplyr::lag() masks stats::lag()
library(janitor)
##
## Attaching package: 'janitor'
## The following objects are masked from 'package:stats':
##
## chisq.test, fisher.test
library(lubridate)
##
## Attaching package: 'lubridate'
## The following objects are masked from 'package:base':
##
## date, intersect, setdiff, union
library(dplyr)
library(ggplot2)
library(tidyr)
library(scales) # For transforming numbers in percentage
##
## Attaching package: 'scales'
## The following object is masked from 'package:purrr':
##
## discard
## The following object is masked from 'package:readr':
##
## col_factor
library(RColorBrewer) # Pallete colors
library(kableExtra)
##
## Attaching package: 'kableExtra'
## The following object is masked from 'package:dplyr':
##
## group_rows
library(RColorBrewer)
library(hrbrthemes)
## NOTE: Either Arial Narrow or Roboto Condensed fonts are required to use these themes.
## Please use hrbrthemes::import_roboto_condensed() to install Roboto Condensed and
## if Arial Narrow is not on your system, please see https://bit.ly/arialnarrow
## 1.0 Import databases from csv files
daily_activity <- read_csv("./data/dailyActivity_merged.csv")
## Rows: 940 Columns: 15
## -- Column specification --------------------------------------------------------
## Delimiter: ","
## chr (1): ActivityDate
## dbl (14): Id, TotalSteps, TotalDistance, TrackerDistance, LoggedActivitiesDi...
##
## i Use `spec()` to retrieve the full column specification for this data.
## i Specify the column types or set `show_col_types = FALSE` to quiet this message.
hourly_calories <- read_csv("./data/hourlyCalories_merged.csv")
## Rows: 22099 Columns: 3
## -- Column specification --------------------------------------------------------
## Delimiter: ","
## chr (1): ActivityHour
## dbl (2): Id, Calories
##
## i Use `spec()` to retrieve the full column specification for this data.
## i Specify the column types or set `show_col_types = FALSE` to quiet this message.
hourly_intensities <- read_csv("./data/hourlyIntensities_merged.csv")
## Rows: 22099 Columns: 4
## -- Column specification --------------------------------------------------------
## Delimiter: ","
## chr (1): ActivityHour
## dbl (3): Id, TotalIntensity, AverageIntensity
##
## i Use `spec()` to retrieve the full column specification for this data.
## i Specify the column types or set `show_col_types = FALSE` to quiet this message.
hourly_steps <- read_csv("./data/hourlySteps_merged.csv")
## Rows: 22099 Columns: 3
## -- Column specification --------------------------------------------------------
## Delimiter: ","
## chr (1): ActivityHour
## dbl (2): Id, StepTotal
##
## i Use `spec()` to retrieve the full column specification for this data.
## i Specify the column types or set `show_col_types = FALSE` to quiet this message.
daily_sleep <- read_csv("./data/sleepDay_merged.csv")
## Rows: 413 Columns: 5
## -- Column specification --------------------------------------------------------
## Delimiter: ","
## chr (1): SleepDay
## dbl (4): Id, TotalSleepRecords, TotalMinutesAsleep, TotalTimeInBed
##
## i Use `spec()` to retrieve the full column specification for this data.
## i Specify the column types or set `show_col_types = FALSE` to quiet this message.
head(daily_activity,3)
## # A tibble: 3 x 15
## Id ActivityDate TotalSteps TotalDistance TrackerDistance LoggedActivitie~
## <dbl> <chr> <dbl> <dbl> <dbl> <dbl>
## 1 1.50e9 4/12/2016 13162 8.5 8.5 0
## 2 1.50e9 4/13/2016 10735 6.97 6.97 0
## 3 1.50e9 4/14/2016 10460 6.74 6.74 0
## # ... with 9 more variables: VeryActiveDistance <dbl>,
## # ModeratelyActiveDistance <dbl>, LightActiveDistance <dbl>,
## # SedentaryActiveDistance <dbl>, VeryActiveMinutes <dbl>,
## # FairlyActiveMinutes <dbl>, LightlyActiveMinutes <dbl>,
## # SedentaryMinutes <dbl>, Calories <dbl>
head(daily_sleep,3)
## # A tibble: 3 x 5
## Id SleepDay TotalSleepRecor~ TotalMinutesAsl~ TotalTimeInBed
## <dbl> <chr> <dbl> <dbl> <dbl>
## 1 1503960366 4/12/2016 12:00:0~ 1 327 346
## 2 1503960366 4/13/2016 12:00:0~ 2 384 407
## 3 1503960366 4/15/2016 12:00:0~ 1 412 442
head(hourly_calories,3)
## # A tibble: 3 x 3
## Id ActivityHour Calories
## <dbl> <chr> <dbl>
## 1 1503960366 4/12/2016 12:00:00 AM 81
## 2 1503960366 4/12/2016 1:00:00 AM 61
## 3 1503960366 4/12/2016 2:00:00 AM 59
head(hourly_intensities,3)
## # A tibble: 3 x 4
## Id ActivityHour TotalIntensity AverageIntensity
## <dbl> <chr> <dbl> <dbl>
## 1 1503960366 4/12/2016 12:00:00 AM 20 0.333
## 2 1503960366 4/12/2016 1:00:00 AM 8 0.133
## 3 1503960366 4/12/2016 2:00:00 AM 7 0.117
head(hourly_steps,3)
## # A tibble: 3 x 3
## Id ActivityHour StepTotal
## <dbl> <chr> <dbl>
## 1 1503960366 4/12/2016 12:00:00 AM 373
## 2 1503960366 4/12/2016 1:00:00 AM 160
## 3 1503960366 4/12/2016 2:00:00 AM 151
unique(hourly_calories)
## # A tibble: 22,099 x 3
## Id ActivityHour Calories
## <dbl> <chr> <dbl>
## 1 1503960366 4/12/2016 12:00:00 AM 81
## 2 1503960366 4/12/2016 1:00:00 AM 61
## 3 1503960366 4/12/2016 2:00:00 AM 59
## 4 1503960366 4/12/2016 3:00:00 AM 47
## 5 1503960366 4/12/2016 4:00:00 AM 48
## 6 1503960366 4/12/2016 5:00:00 AM 48
## 7 1503960366 4/12/2016 6:00:00 AM 48
## 8 1503960366 4/12/2016 7:00:00 AM 47
## 9 1503960366 4/12/2016 8:00:00 AM 68
## 10 1503960366 4/12/2016 9:00:00 AM 141
## # ... with 22,089 more rows
unique(hourly_steps)
## # A tibble: 22,099 x 3
## Id ActivityHour StepTotal
## <dbl> <chr> <dbl>
## 1 1503960366 4/12/2016 12:00:00 AM 373
## 2 1503960366 4/12/2016 1:00:00 AM 160
## 3 1503960366 4/12/2016 2:00:00 AM 151
## 4 1503960366 4/12/2016 3:00:00 AM 0
## 5 1503960366 4/12/2016 4:00:00 AM 0
## 6 1503960366 4/12/2016 5:00:00 AM 0
## 7 1503960366 4/12/2016 6:00:00 AM 0
## 8 1503960366 4/12/2016 7:00:00 AM 0
## 9 1503960366 4/12/2016 8:00:00 AM 250
## 10 1503960366 4/12/2016 9:00:00 AM 1864
## # ... with 22,089 more rows
unique(hourly_intensities)
## # A tibble: 22,099 x 4
## Id ActivityHour TotalIntensity AverageIntensity
## <dbl> <chr> <dbl> <dbl>
## 1 1503960366 4/12/2016 12:00:00 AM 20 0.333
## 2 1503960366 4/12/2016 1:00:00 AM 8 0.133
## 3 1503960366 4/12/2016 2:00:00 AM 7 0.117
## 4 1503960366 4/12/2016 3:00:00 AM 0 0
## 5 1503960366 4/12/2016 4:00:00 AM 0 0
## 6 1503960366 4/12/2016 5:00:00 AM 0 0
## 7 1503960366 4/12/2016 6:00:00 AM 0 0
## 8 1503960366 4/12/2016 7:00:00 AM 0 0
## 9 1503960366 4/12/2016 8:00:00 AM 13 0.217
## 10 1503960366 4/12/2016 9:00:00 AM 30 0.5
## # ... with 22,089 more rows
unique(daily_sleep)
## # A tibble: 410 x 5
## Id SleepDay TotalSleepRecor~ TotalMinutesAsl~ TotalTimeInBed
## <dbl> <chr> <dbl> <dbl> <dbl>
## 1 1503960366 4/12/2016 12:00:~ 1 327 346
## 2 1503960366 4/13/2016 12:00:~ 2 384 407
## 3 1503960366 4/15/2016 12:00:~ 1 412 442
## 4 1503960366 4/16/2016 12:00:~ 2 340 367
## 5 1503960366 4/17/2016 12:00:~ 1 700 712
## 6 1503960366 4/19/2016 12:00:~ 1 304 320
## 7 1503960366 4/20/2016 12:00:~ 1 360 377
## 8 1503960366 4/21/2016 12:00:~ 1 325 364
## 9 1503960366 4/23/2016 12:00:~ 1 361 384
## 10 1503960366 4/24/2016 12:00:~ 1 430 449
## # ... with 400 more rows
unique(daily_activity)
## # A tibble: 940 x 15
## Id ActivityDate TotalSteps TotalDistance TrackerDistance LoggedActivitie~
## <dbl> <chr> <dbl> <dbl> <dbl> <dbl>
## 1 1.50e9 4/12/2016 13162 8.5 8.5 0
## 2 1.50e9 4/13/2016 10735 6.97 6.97 0
## 3 1.50e9 4/14/2016 10460 6.74 6.74 0
## 4 1.50e9 4/15/2016 9762 6.28 6.28 0
## 5 1.50e9 4/16/2016 12669 8.16 8.16 0
## 6 1.50e9 4/17/2016 9705 6.48 6.48 0
## 7 1.50e9 4/18/2016 13019 8.59 8.59 0
## 8 1.50e9 4/19/2016 15506 9.88 9.88 0
## 9 1.50e9 4/20/2016 10544 6.68 6.68 0
## 10 1.50e9 4/21/2016 9819 6.34 6.34 0
## # ... with 930 more rows, and 9 more variables: VeryActiveDistance <dbl>,
## # ModeratelyActiveDistance <dbl>, LightActiveDistance <dbl>,
## # SedentaryActiveDistance <dbl>, VeryActiveMinutes <dbl>,
## # FairlyActiveMinutes <dbl>, LightlyActiveMinutes <dbl>,
## # SedentaryMinutes <dbl>, Calories <dbl>
sum(duplicated(daily_activity))
## [1] 0
sum(duplicated(daily_sleep))
## [1] 3
sum(duplicated(hourly_calories))
## [1] 0
sum(duplicated(hourly_intensities))
## [1] 0
sum(duplicated(hourly_steps))
## [1] 0
### Shows 3 duplicates.
daily_activity <- daily_activity %>%
distinct() %>%
drop_na()
daily_sleep <- daily_sleep %>%
distinct() %>%
drop_na()
hourly_calories <- hourly_calories %>%
distinct() %>%
drop_na()
hourly_intensities <- hourly_intensities %>%
distinct() %>%
drop_na()
hourly_steps <- hourly_steps %>%
distinct() %>%
drop_na()
sum(duplicated(hourly_calories))
## [1] 0
sum(duplicated(hourly_steps))
## [1] 0
sum(duplicated(hourly_intensities))
## [1] 0
sum(duplicated(daily_sleep))
## [1] 0
sum(duplicated(daily_activity))
## [1] 0
#### Check for N/A.
sum(is.na(hourly_calories))
## [1] 0
sum(is.na(hourly_steps))
## [1] 0
sum(is.na(hourly_intensities))
## [1] 0
sum(is.na(daily_sleep))
## [1] 0
sum(is.na(daily_activity))
## [1] 0
## --- formatting date and time for Hourly items. ----
hourly_calories$ActivityHour=as.POSIXct(hourly_calories$ActivityHour, format = "%m/%d/%Y %I:%M:%S %p",tz = Sys.timezone())
##hourly_steps
hourly_steps$ActivityHour=as.POSIXct(hourly_steps$ActivityHour, format = "%m/%d/%Y %I:%M:%S %p",tz = Sys.timezone())
##hourly_intensities
hourly_intensities$ActivityHour=as.POSIXct(hourly_intensities$ActivityHour, format = "%m/%d/%Y %I:%M:%S %p",tz = Sys.timezone())
##sleep_day
daily_sleep$SleepDay=as.POSIXct(daily_sleep$SleepDay, format = "%m/%d/%Y %I:%M:%S %p",tz = Sys.timezone())
##daily_activity
daily_activity$ActivityDate=as.POSIXct(daily_activity$ActivityDate, format = "%m/%d/%Y",tz = Sys.timezone())
###obtaining Day of Week from date
daily_activity <- daily_activity %>%
mutate(Day = format(ymd(ActivityDate), format = '%a'))
class(hourly_calories$ActivityHour)
## [1] "POSIXct" "POSIXt"
class(hourly_intensities$ActivityHour)
## [1] "POSIXct" "POSIXt"
class(hourly_steps$ActivityHour)
## [1] "POSIXct" "POSIXt"
class(daily_sleep$SleepDay)
## [1] "POSIXct" "POSIXt"
class(daily_activity$ActivityDate)
## [1] "POSIXct" "POSIXt"
#merging hourly data frames (steps,calories,intensities)----
gc()
## used (Mb) gc trigger (Mb) max used (Mb)
## Ncells 1237489 66.1 1967202 105.1 1967202 105.1
## Vcells 2358604 18.0 8388608 64.0 5118317 39.1
hourlies_df <- hourly_steps %>%
left_join(hourly_calories, by = c("Id", "ActivityHour")) %>%
left_join(hourly_intensities, by = c("Id", "ActivityHour")) %>%
separate(ActivityHour, sep = " ", into = c("date","time")) %>%
mutate(day = format(ymd(date), format = '%a')) %>%
mutate(time = format(parse_date_time(as.character(time), "HMS"), format = "%H:%M")) %>%
mutate(date = as.POSIXct(date))
head(hourlies_df,3)
## # A tibble: 3 x 8
## Id date time StepTotal Calories TotalIntensity
## <dbl> <dttm> <chr> <dbl> <dbl> <dbl>
## 1 1503960366 2016-04-12 00:00:00 00:00 373 81 20
## 2 1503960366 2016-04-12 00:00:00 01:00 160 61 8
## 3 1503960366 2016-04-12 00:00:00 02:00 151 59 7
## # ... with 2 more variables: AverageIntensity <dbl>, day <chr>
library(writexl)
write_xlsx(hourlies_df, "E:\\Websites\\R-Projects\\new_data\\hourlies_df.xlsx")
write_xlsx(daily_activity, "E:\\Websites\\R-Projects\\new_data\\daily_activity.xlsx")
write_xlsx(daily_sleep, "E:\\Websites\\R-Projects\\new_data\\daily_sleep.xlsx")
write_xlsx(hourly_calories, "E:\\Websites\\R-Projects\\new_data\\hourly_calories.xlsx")
write_xlsx(hourly_intensities, "E:\\Websites\\R-Projects\\new_data\\hourly_intensities.xlsx")
write_xlsx(hourly_steps, "E:\\Websites\\R-Projects\\new_data\\hourly_steps.xlsx")
daily_use2 <- daily_activity %>%
filter(TotalSteps >200 ) %>%
group_by(Id) %>%
dplyr::summarize(ActivityDate=sum(n())) %>%
mutate(Usage = case_when(
ActivityDate >= 1 & ActivityDate <= 15 ~ "Low Use",
ActivityDate >= 16 & ActivityDate <= 22 ~ "Moderate Use",
ActivityDate >= 23 & ActivityDate <= 31 ~ "High Use")) %>%
mutate(Usage = factor(Usage, level = c('Low Use','Moderate Use','High Use'))) %>%
rename(days_used = ActivityDate) %>%
group_by(Usage)
head(daily_use2,8)
## # A tibble: 8 x 3
## # Groups: Usage [3]
## Id days_used Usage
## <dbl> <int> <fct>
## 1 1503960366 30 High Use
## 2 1624580081 31 High Use
## 3 1644430081 30 High Use
## 4 1844505072 17 Moderate Use
## 5 1927972279 15 Low Use
## 6 2022484408 31 High Use
## 7 2026352035 31 High Use
## 8 2320127002 31 High Use
daily_use <- daily_activity %>%
left_join(daily_use2, by = 'Id') %>%
group_by(Usage) %>%
summarise(participants = n_distinct(Id)) %>%
mutate(perc = participants/sum(participants)) %>%
arrange(perc) %>%
mutate(perc = scales::percent(perc))
ggplot(daily_use,aes(fill=Usage ,y = participants, x="")) +
geom_bar(stat="identity", width=2, color="white") +
coord_polar("y", start=0)+
scale_fill_brewer(palette='Set1')+
theme_void()+
theme(axis.title.x= element_blank(),
axis.title.y = element_blank(),
panel.border = element_blank(),
panel.grid = element_blank(),
axis.ticks = element_blank(),
axis.text.x = element_blank(),
plot.title = element_text(hjust = 0.5,vjust= -5, size = 20, face = "bold")) +
geom_text(aes(label = perc, x=1.2),position = position_stack(vjust = 0.5))+
labs(title="Tracker use Percentage", tag = "figure 1")+
guides(fill = guide_legend(title = "Usage Amount"))
options(repr.plot.width = 2, repr.plot.height = 1)
#data manipulation to add Usage Types to 'daily_activity' df
daily_activity_usage <- daily_activity %>%
left_join(daily_use2, by = 'Id') %>%
mutate(day = format(ymd(ActivityDate), format = '%a')) %>%
mutate(total_minutes_worn = SedentaryMinutes+LightlyActiveMinutes+
FairlyActiveMinutes+VeryActiveMinutes) %>%
mutate(total_hours = seconds_to_period(total_minutes_worn * 60))
head(daily_activity_usage,6)
## # A tibble: 6 x 21
## Id ActivityDate TotalSteps TotalDistance TrackerDistance
## <dbl> <dttm> <dbl> <dbl> <dbl>
## 1 1503960366 2016-04-12 00:00:00 13162 8.5 8.5
## 2 1503960366 2016-04-13 00:00:00 10735 6.97 6.97
## 3 1503960366 2016-04-14 00:00:00 10460 6.74 6.74
## 4 1503960366 2016-04-15 00:00:00 9762 6.28 6.28
## 5 1503960366 2016-04-16 00:00:00 12669 8.16 8.16
## 6 1503960366 2016-04-17 00:00:00 9705 6.48 6.48
## # ... with 16 more variables: LoggedActivitiesDistance <dbl>,
## # VeryActiveDistance <dbl>, ModeratelyActiveDistance <dbl>,
## # LightActiveDistance <dbl>, SedentaryActiveDistance <dbl>,
## # VeryActiveMinutes <dbl>, FairlyActiveMinutes <dbl>,
## # LightlyActiveMinutes <dbl>, SedentaryMinutes <dbl>, Calories <dbl>,
## # Day <chr>, days_used <int>, Usage <fct>, day <chr>,
## # total_minutes_worn <dbl>, total_hours <Period>
#data for steps
steps_hour <- daily_activity_usage %>%
group_by(day) %>%
summarise(mean_steps = round(mean(TotalSteps))) %>%
mutate(day = factor(day, level = c('Mon', 'Tue', 'Wed','Thu', 'Fri', 'Sat', 'Sun')))
head(steps_hour)
## # A tibble: 6 x 2
## day mean_steps
## <fct> <dbl>
## 1 Fri 7448
## 2 Mon 7781
## 3 Sat 8153
## 4 Sun 6933
## 5 Thu 7406
## 6 Tue 8125
### * plot for avg steps by day
ggplot(steps_hour, aes(x = day, y= mean_steps, fill = mean_steps)) +
geom_col(color="darkblue", size = 0.1) +
scale_fill_gradientn(limits=c(0,9000), breaks=seq(0,9000, by = 1500), colors = brewer.pal(11,"Spectral")) +
scale_y_continuous(limits=c(0,9000), breaks=seq(0, 9000, by = 1500))+
labs(title= ("Average Steps"), tag = "figure 2", subtitle = ('By Day'), x="" , y="Steps")+
theme(plot.title=element_text(size = 16,hjust = 0))+
theme(plot.subtitle=element_text(size = 14,hjust = 0))+
theme(axis.text.y=element_text(size=14)) +
theme(axis.text.x=element_text(size=14,hjust= 0.5))+
theme(axis.title.x = element_text(margin = margin(t = 14, r = 0, b = 0, l = 0)))+
theme(axis.title.y = element_text(margin = margin(t = 0, r = 10, b = 0, l = 0)))+
theme(legend.position = "top")+
theme(legend.title=element_text(size=12))+
theme(legend.text=element_text(size=8))+
guides(fill = guide_colourbar(barwidth = 12))
options(repr.plot.width = 10, repr.plot.height = 8)
### Average steps by Group
stepsbygroup <- daily_activity_usage %>%
group_by(day,Usage) %>%
dplyr::select(Usage, TotalSteps, day) %>%
mutate(day = factor(day, level = c('Mon', 'Tue', 'Wed','Thu', 'Fri', 'Sat', 'Sun')))
head(stepsbygroup,8)
## # A tibble: 8 x 3
## # Groups: day, Usage [7]
## Usage TotalSteps day
## <fct> <dbl> <fct>
## 1 High Use 13162 Tue
## 2 High Use 10735 Wed
## 3 High Use 10460 Thu
## 4 High Use 9762 Fri
## 5 High Use 12669 Sat
## 6 High Use 9705 Sun
## 7 High Use 13019 Mon
## 8 High Use 15506 Tue
stepsbygroup %>%
ggplot(aes(x= Usage, y= TotalSteps, fill= Usage)) +
geom_boxplot() +
scale_y_continuous(limits=c(0,38000), breaks=seq(0,38000, by = 4000))+
theme(legend.position="none",
plot.title = element_text(size=11) ) +
ggtitle("A boxplot with jitter") +
xlab("") +
labs(title= ("Average Steps by Group"), tag = "figure 3",x="average per week" , y="Steps")+
theme(plot.title=element_text(size = 16,hjust = 0))+
theme(plot.subtitle=element_text(size = 14,hjust = 0))+
theme(axis.text.y=element_text(size=14)) +
theme(axis.text.x=element_text(size=14,hjust= 0.5))+
theme(axis.title.x = element_text(margin = margin(t = 14, r = 0, b = 0, l = 0)))+
theme(axis.title.y = element_text(margin = margin(t = 0, r = 10, b = 0, l = 0)))+
theme(axis.text.x=element_blank(),axis.ticks.x=element_blank())+
theme(legend.position = "top")+
theme(legend.title=element_text(size=12))+
theme(legend.text=element_text(size=8))+
facet_grid(~Usage)
options(repr.plot.width = 5, repr.plot.height = 6)
caloriesbp <- daily_activity_usage %>%
group_by(day,Usage) %>%
dplyr::select(Usage, Calories, day) %>%
mutate(day = factor(day, level = c('Mon', 'Tue', 'Wed','Thu', 'Fri', 'Sat', 'Sun')))
head(caloriesbp)
## # A tibble: 6 x 3
## # Groups: day, Usage [6]
## Usage Calories day
## <fct> <dbl> <fct>
## 1 High Use 1985 Tue
## 2 High Use 1797 Wed
## 3 High Use 1776 Thu
## 4 High Use 1745 Fri
## 5 High Use 1863 Sat
## 6 High Use 1728 Sun
caloriesbp %>%
ggplot(aes(x=day , y= Calories, fill= Usage)) +
geom_boxplot() +
scale_y_continuous(limits=c(0,3000), breaks=seq(0,3000, by = 400))+
theme(legend.position ="element_text(angle = 90))",
plot.title = element_text(size=11)) +
ggtitle("A boxplot with jitter") +
xlab("") +
labs(title= ("Average Calories"), tag = "figure 4", subtitle = ('By Usage Group'), x="Day" , y="Calories")+
theme(plot.title=element_text(size = 16,hjust = 0))+
theme(plot.subtitle=element_text(size = 14,hjust = 0))+
theme(axis.text.y=element_text(size=14)) +
theme(axis.text.x=element_text(angle = 90, size=12, hjust= 0, vjust = 0.3))+
theme(axis.title.x = element_text(margin = margin(t = 18, r = 0, b = 0, l = 3)))+
theme(axis.title.y = element_text(margin = margin(t = 0, r = 10, b = 0, l = 0)))+
theme(legend.title=element_text(size=8))+
theme(legend.text=element_text(size=8))+
facet_grid(~Usage)
## Warning: Removed 153 rows containing non-finite values (stat_boxplot).
# Observation 02 - Average Calories Per Group ### Observations 02:
some individuals being unmotivated on some days. Maybe some type of motivations alert may be helpful for them.
in the high use group.
### Merging hourly data frames (steps,calories,intensities)
hourlies_df <- hourly_steps %>%
left_join(hourly_calories, by = c("Id", "ActivityHour")) %>%
left_join(hourly_intensities, by = c("Id", "ActivityHour")) %>%
separate(ActivityHour, sep = " ", into = c("date","time")) %>%
mutate(day = format(ymd(date), format = '%a')) %>%
mutate(time = format(parse_date_time(as.character(time), "HMS"), format = "%H:%M")) %>%
mutate(date = as.POSIXct(date))
head(hourlies_df,3)
## # A tibble: 3 x 8
## Id date time StepTotal Calories TotalIntensity
## <dbl> <dttm> <chr> <dbl> <dbl> <dbl>
## 1 1503960366 2016-04-12 00:00:00 00:00 373 81 20
## 2 1503960366 2016-04-12 00:00:00 01:00 160 61 8
## 3 1503960366 2016-04-12 00:00:00 02:00 151 59 7
## # ... with 2 more variables: AverageIntensity <dbl>, day <chr>
stephr <- left_join(hourlies_df,daily_use2, by = 'Id','Usage')
stephr%>%
mutate(day = factor(day,level = c('Mon', 'Tue', 'Wed','Thu', 'Fri', 'Sat', 'Sun'))) %>%
group_by(Usage, time, day) %>%
summarize(steps = round(mean(StepTotal),2))
## `summarise()` has grouped output by 'Usage', 'time'. You can override using the `.groups` argument.
## # A tibble: 504 x 4
## # Groups: Usage, time [72]
## Usage time day steps
## <fct> <chr> <fct> <dbl>
## 1 Low Use 00:00 Mon 0
## 2 Low Use 00:00 Tue 1.64
## 3 Low Use 00:00 Wed 0.36
## 4 Low Use 00:00 Thu 5.64
## 5 Low Use 00:00 Fri 6
## 6 Low Use 00:00 Sat 0
## 7 Low Use 00:00 Sun 9.88
## 8 Low Use 01:00 Mon 0
## 9 Low Use 01:00 Tue 0.64
## 10 Low Use 01:00 Wed 33
## # ... with 494 more rows
## -- Make time as hour with only 2 characters in new hour column --
stephr$hour =substr(stephr$time,1,2)
stephr$hour <- as.numeric(stephr$hour)
sapply(stephr, class)
## $Id
## [1] "numeric"
##
## $date
## [1] "POSIXct" "POSIXt"
##
## $time
## [1] "character"
##
## $StepTotal
## [1] "numeric"
##
## $Calories
## [1] "numeric"
##
## $TotalIntensity
## [1] "numeric"
##
## $AverageIntensity
## [1] "numeric"
##
## $day
## [1] "character"
##
## $days_used
## [1] "integer"
##
## $Usage
## [1] "factor"
##
## $hour
## [1] "numeric"
### Grouped
stephr %>%
ggplot(aes(x = day, y = StepTotal, fill = Usage))+
geom_bar(stat = "identity", position = "dodge")+
labs(title ="",tag = "figure 5")+
ggtitle("The Relationship between Usage, Steps and Day of the week.")
### Lollipop Workout Chart
hrbrthemes::import_roboto_condensed()
## You will likely need to install these fonts on your system as well.
##
## You can find them in [C:/R_library/hrbrthemes/fonts/roboto-condensed]
intensity1 <- hourlies_df %>%
filter(TotalIntensity > 0) %>%
group_by(day) %>%
summarise(mean_intensity = round(mean(TotalIntensity)),
std_mean_intensity = round(sd(TotalIntensity))) %>%
mutate(day = factor(day, level = c('Mon', 'Tue', 'Wed','Thu', 'Fri', 'Sat', 'Sun')))
head(intensity1)
## # A tibble: 6 x 3
## day mean_intensity std_mean_intensity
## <fct> <dbl> <dbl>
## 1 Fri 20 23
## 2 Mon 21 26
## 3 Sat 22 25
## 4 Sun 21 25
## 5 Thu 20 23
## 6 Tue 20 24
ggplot(intensity1,aes(x=day, y=mean_intensity, fill = mean_intensity, group = 1)) +
geom_segment( aes(x=day, xend=day,y=0, yend=mean_intensity)) +
scale_y_continuous(limits=c(0,25), breaks=seq(0,26, by = 2))+
geom_point( size=5, color="red", fill=alpha("orange", 0.3), alpha=0.7, shape=21, stroke=2)+
labs(title ="",tag = "figure 6")+
xlab("Days of the week")+
ylab("Workout Intensity")+
ggtitle("Work out intensity")
# Workout Intensity ### Grouped workout chart
### Join database hourlies_df and Daily_use2 to make group_intensity
group_intensity <- hourlies_df %>%
left_join(daily_use2, by = 'Id') %>%
mutate(day = factor(day,level = c('Mon', 'Tue', 'Wed','Thu', 'Fri', 'Sat', 'Sun'))) %>%
filter(TotalIntensity > 0) %>%
group_by(Usage, day) %>%
summarise(intensity = round(mean(TotalIntensity, .groups = 'keep'),1))
## `summarise()` has grouped output by 'Usage'. You can override using the `.groups` argument.
ggplot(group_intensity, aes(x = day, y= intensity, fill = intensity)) +
geom_col(color="pink", size = 0.1)+
scale_fill_gradientn(limits=c(0,25), breaks=seq(0,25, by = 5), colours = brewer.pal(5, "YlOrRd")) +
scale_y_continuous(limits=c(0,25), breaks=seq(0,25, by = 5))+
labs(title= ("Average Intensity"),tag = "Figure 7", subtitle = ('By Days, Groups'), x="Days" , y="Intensity")+
theme(plot.title=element_text(size = 16,hjust = 0))+
theme(plot.subtitle=element_text(size = 14,hjust = 0))+
theme(axis.text.y=element_text(size=14)) +
theme(axis.text.x=element_text(size=14,hjust= 0.5))+
theme(axis.title.x = element_text(margin = margin(t = 14, r = 0, b = 0, l = 0)))+
theme(axis.title.y = element_text(margin = margin(t = 0, r = 10, b = 0, l = 0)))+
theme(legend.position = "top")+
theme(legend.title=element_text(size=12))+
theme(legend.text=element_text(size=8))+
guides(fill = guide_colourbar(barwidth = 12))+
coord_flip() +
facet_grid(~Usage)
### Saturdays seem to be intense for all groups, with Monday being the next insense workout in all groups. ### The high intensity group were pretty consistent over all. Nothing that unusualy with these plots, as they ### showed what would be expected for each group segment.
# Time worn
## Warning: Removed 1 rows containing missing values (position_stack).
## In this graph, we can see the users’ average minutes of sleep vs average distance traveled. A majority of these users sleep between 320 to 530 minutes (5.33 to 8.83 hours)
This study is limited to a small set of user (30), as such, it would not be an ideal situation for a case study. In any case, there should be enough data to show small patterns that we may be able to show a co-relationship that we can make educated decisions to help the stakeholders of BellaBeat.
From this data, we can see that there may be an opportunity for the marketing team to increase sales by making some small changes to the devies and marketing. The last chart (figure 9) shows a health benefit from getting between 7-9 hours of sleep. It also shows that any increase of over 8 miles in steps has an inverse relationship to sleep.
With that data combined, with the other data, and most notably the amount of time the users wear this type of device (Highest with the moderate users)(Figure 8), it seems there would be an opportunity to market towards a moderate user, instead of a hard core athlete.
Moderate users seem to wear the devices longer, do not tend to be the “cross trainer” type of athlete, but someone who would want such a device to improve a moderate or sedentary life style.
There would appear to be a marketing opportunity to market towards improving overall health, improve sleep, and to monitor daily activities for a gradual overall improvement in health.
Maybe add software to alert the user when they would become sedentary, or even make a scoring system where a user can compete daily with themselves.
The data seems to suggest the person interested in wearing such a device for a long period of time (figure 8) would be a more “middle of the road” type of person looking to make small changes.
It sounds like an opportunity that could be rewarding to this type of user that seems to have been overlooked in the marketplace of activity trackers.
```
More about this website.