Load Packages
# Run this code to load the required packages
suppressMessages(suppressWarnings(suppressPackageStartupMessages({
library(mosaic)
library(supernova)
library(Lock5withR)
})))
# To make slightly smaller plots
options(repr.plot.width = 5, repr.plot.height = 3)
Load Data
# Run this code to load the data
CensusSchool <- read.csv("https://docs.google.com/spreadsheets/d/e/2PACX-1vSVaWnM4odSxy0mlnhWvvGbeLtiKoZmsbqC6KLzXtBOjQfrF9EVKuX4RVh3XbP3iw/pub?gid=2100178416&single=true&output=csv", header = TRUE)
str(CensusSchool)
OUTPUT:
...
...
1.1 Part I: Intro/Overview of the Problem or Question
One very popular extracurricular activity for high schoolers today is playing video games. You’ve probably heard of the stereotype of a “gamer” who spends all day on their game console. You’ve also probably heard that boys play more video games than girls. But do these stereotypes hold true for high schoolers of today’s generation? Are there gender differences in time spent playing video games? This is an important question to answer in order to understand the ways in which America’s high school students choose to spend their time. If they are spending a lot of time on video games, and boys spend more time than girls, we should know this in order to examine the impact video games might have on their development.
To investigate the relationship between playing video games and gender, I will use data from the Census at School classroom project (CensusSchool) collected between 2010 and 2021. In this data set, 10,113 American high school students completed a survey assignment with questions about their preferences, habits, and characteristics; in all, 60 variables are captured. Using these data, I will look at the number of hours a week students report playing video games (Video_Games_Hours) and students’ self-reported gender (Gender).
My hypothesis is that gender will explain some of the variation in the hours of video games played per week: Video_Games_Hours = Gender + other stuff. Specifically, I predict that males will report spending more time playing video games than females.
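The word equation above can be written as a linear model with one categorical predictor. The sketch below is only an illustration under stated assumptions: it uses base R's lm() on a tiny hypothetical data frame (toy_games and its values are invented, not Census at School responses). With R's default treatment coding, the intercept equals the mean of the reference group (Female), the GenderMale coefficient equals the male mean minus the female mean, and the residuals play the role of "other stuff".

```r
# Hypothetical toy data (invented values, NOT Census at School responses)
toy_games <- data.frame(
  Gender = factor(c("Female", "Female", "Female", "Male", "Male", "Male")),
  Video_Games_Hours = c(0, 1, 2, 3, 6, 9)
)

# Gender model: Video_Games_Hours = Gender + other stuff
gender_model <- lm(Video_Games_Hours ~ Gender, data = toy_games)
coef(gender_model)   # (Intercept) 1, GenderMale 5 (up to floating-point rounding)

# The coefficients line up with the group means
group_means <- tapply(toy_games$Video_Games_Hours, toy_games$Gender, mean)
group_means          # Female 1, Male 6

# "Other stuff": what the model leaves unexplained for each student
residuals(gender_model)
```

On the cleaned data, the analogous call would be lm(Video_Games_Hours ~ Gender, data = gamedata_clean), and the supernova package loaded above can summarize such a model with supernova().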
1.2 Part II: Explore Variation
First, I explored variation in my outcome variable using a histogram and summary statistics. I noticed some odd patterns in the distribution of hours spent playing video games and determined I needed to clean my data before moving on. Specifically, the histogram of Video_Games_Hours showed that some students reported impossibly high values, approaching 100,000 hours a week, even though a week has only 168 hours. To account for this issue, I decided to remove students who reported unrealistically high values.
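A single extreme value is enough to distort the mean and standard deviation while leaving the median almost unchanged, which helps explain why the favstats() output for the uncleaned data shows a mean near 28 hours and an SD above 1,400 even though the median is 1. The sketch below uses a hypothetical vector of weekly hours (invented values, not the real data) with one 99999 entry added.

```r
# Hypothetical weekly gaming hours (invented values, not the real data)
hours <- c(0, 0, 1, 2, 5, 8)
hours_with_outlier <- c(hours, 99999)

# The mean is pulled far upward by one impossible value...
mean(hours)                  # 2.666667
mean(hours_with_outlier)     # about 14287.86
# ...and so is the standard deviation
sd(hours)
sd(hours_with_outlier)
# ...while the median barely moves
median(hours)                # 1.5
median(hours_with_outlier)   # 2
```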
Assuming that students spend a minimum of 42 hours a week sleeping (6 hours a night) and 40 hours a week in school, I set 86 hours (168 − 42 − 40) as the maximum possible weekly gaming time. Sixty students reported spending more than 86 hours a week playing games; these cases were removed from the data (see the R code for filtering). I also decided to keep only complete observations for the variables of interest (Gender and Video_Games_Hours), so I removed students with missing values (NAs).
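The cutoff is simple arithmetic on the assumptions stated above, and it can be checked directly in R. The second half of the sketch uses a hypothetical vector of reported hours (example_hours is invented for illustration) to show how the comparison behaves: values above the cutoff give FALSE, and a missing value gives NA rather than TRUE or FALSE, so it has to be dropped explicitly, as dplyr's filter() does for rows whose condition is NA.

```r
# Hours in a week
hours_per_week <- 7 * 24            # 168
# Assumptions from the text: at least 6 hours of sleep a night, 40 hours of school
min_sleep_hours <- 6 * 7            # 42
school_hours    <- 40
max_gaming_hours <- hours_per_week - min_sleep_hours - school_hours
max_gaming_hours                    # 86

# Hypothetical reports (invented values) checked against the cutoff
example_hours <- c(0, 20, 86, 87, 99999, NA)
keep <- example_hours <= max_gaming_hours
keep                                # TRUE TRUE TRUE FALSE FALSE NA
# which() keeps only the TRUE positions, so the NA is dropped too
example_hours[which(keep)]          # 0 20 86
```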
# Select only the variables of interest for my analysis and store them in a new data frame called gamedata
gamedata <- select(CensusSchool, Gender, Video_Games_Hours)
# Create a histogram to visualize the distribution of hours spent playing video games for all students
gf_histogram(~Video_Games_Hours, data = gamedata)
# Calculate summary statistics for hours spent playing video games per week
favstats(~Video_Games_Hours, data = gamedata)
OUTPUT:
Warning message:
“Removed 39 rows containing non-finite values (stat_bin).”
A data.frame: 1 × 9
  min Q1 median Q3   max     mean       sd     n missing
    0  0      1  6 99999 27.94044 1415.313 10074      39
# Tally up the number of students who reported spending more than 86 hours per week playing video games
tally(~Video_Games_Hours > 86, data = gamedata)
OUTPUT:
Video_Games_Hours > 86
 TRUE FALSE  <NA>
   60 10014    39
# Clean the data
# Filter to remove students who reported > 86 hours playing video games
gamedata_clean <- filter(gamedata, Video_Games_Hours <= 86)
# Omit NA values for the variables I am interested in
gamedata_clean <- na.omit(gamedata_clean)
# Tally up the number of students who reported more than 86 hours to make sure my data cleaning worked.
# If it works, I should see 0 for TRUE and 0 for NA.
tally(~Video_Games_Hours > 86, data = gamedata_clean)
# Inspect the cleaned data to make sure everything worked as expected
head(gamedata_clean)
OUTPUT:
Video_Games_Hours > 86
 TRUE FALSE
    0  9999
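The row counts in the tally() outputs can be reconciled with a little bookkeeping. The sketch below uses only numbers reported in this notebook. Because dplyr's filter() drops rows whose condition is NA, the filter step removes both the 39 missing hours values and the 60 values above 86; the remaining gap to 9999 rows is what na.omit() removed afterwards, presumably rows with a missing Gender (an inference, since only those two columns are in gamedata).

```r
# Row counts reported in this notebook's outputs
n_total       <- 10074 + 39  # favstats() n plus missing = 10113 rows in CensusSchool
n_missing_hrs <- 39          # NA for Video_Games_Hours
n_over_cutoff <- 60          # reported more than 86 hours
n_final       <- 9999        # rows in gamedata_clean (FALSE count after cleaning)

# Rows left after filter(): rows whose condition is NA are dropped along with those over 86
n_after_filter <- n_total - n_missing_hrs - n_over_cutoff
n_after_filter               # 10014, matching the FALSE count before cleaning

# Additional rows removed by na.omit() (presumably missing Gender)
n_after_filter - n_final     # 15
```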
Next, I visualized Video_Games_Hours again using my new, cleaned data set (gamedata_clean). The histogram produced by the code below shows a high peak around zero with a long right tail (right skew). This suggests that most students play very few hours, but some students play many hours a week. Running favstats() showed that the mean weekly gaming time in my sample is 5.14 hours (SD = 9.61), and values range from 0 to 84 hours.
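A right-skewed distribution like this one typically has a mean above its median, because the long upper tail pulls the mean upward. The sketch below shows this with a hypothetical right-skewed vector (invented values, not the real data) and a hand-written summary function, summarize_hours(), that is a rough base R analogue of the columns favstats() reports. It is an assumption for illustration, not mosaic's implementation, and its quartiles use R's default quantile() method.

```r
# Hypothetical right-skewed weekly hours (invented values, not the real data)
hours <- c(0, 0, 0, 0, 1, 1, 2, 3, 5, 10, 20, 40)

# Rough base R analogue of the favstats() columns (hypothetical helper)
summarize_hours <- function(x) {
  q <- quantile(x, probs = c(0, 0.25, 0.5, 0.75, 1), na.rm = TRUE, names = FALSE)
  data.frame(
    min = q[1], Q1 = q[2], median = q[3], Q3 = q[4], max = q[5],
    mean = mean(x, na.rm = TRUE), sd = sd(x, na.rm = TRUE),
    n = sum(!is.na(x)), missing = sum(is.na(x))
  )
}

hour_stats <- summarize_hours(hours)
hour_stats
# The long upper tail pulls the mean (about 6.83) above the median (1.5)
hour_stats$mean > hour_stats$median
```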
# Visualize the distribution of hours spent playing video games again after cleaning my data
gf_histogram(~Video_Games_Hours, data = gamedata_clean)
# Calculate the favstats for Video_Games_Hours in my cleaned sample data
favstats(~Video_Games_Hours, data = gamedata_clean)
To answer my research question, “Are there gender differences in time spent playing video games?”, I want to use Gender to explain variation in Video_Games_Hours. Because Gender is a categorical variable, I first made a bar graph to look at its distribution, and I used tally() to compute the proportions of males and females. In my sample, 52% of students are female.
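The proportions in the tally() output for this step can be reproduced with base R's table() and prop.table(). In the sketch below, the counts (5213 female, 4786 male) are back-calculated from the reported proportions 0.5213521 and 0.4786479 of the 9999 cleaned rows; the gender vector is a stand-in built from those counts, not the original data frame.

```r
# Stand-in vector built from counts back-calculated from the reported proportions
gender <- factor(c(rep("Female", 5213), rep("Male", 4786)))

gender_counts <- table(gender)
gender_props  <- prop.table(gender_counts)
round(gender_props, 7)       # Female 0.5213521, Male 0.4786479

# Percent female, rounded as in the text
round(100 * as.numeric(gender_props["Female"]))   # 52
```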
# Create a bar graph to visualize the distribution of males and females in my sample
gf_bar(~Gender, data = gamedata_clean)
# Calculate the proportion of males and females in the sample
tally(~Gender, data = gamedata_clean, format = "proportion")
OUTPUT:
Gender
Female Male
0.5213521 0.4786479
Next, I created a faceted density histogram to visualize the relationship between gender and hours spent playing video games. In the histogram, females appear to have a larger share of observations near zero than males, and the within-group variation for males looks greater than for females. These patterns are consistent with my hypothesis that males spend more hours a week playing video games than females.
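The visual comparison can be backed up with numbers for each group: the mean, the standard deviation (within-group variation), and the share of students reporting zero hours (the spike near zero). The sketch below computes these with base R's tapply() on a small hypothetical data frame (toy_games and its values are invented, not Census at School responses); on the real data the same calls would use gamedata_clean.

```r
# Hypothetical toy data (invented values, NOT Census at School responses)
toy_games <- data.frame(
  Gender = c("Female", "Female", "Female", "Female", "Male", "Male", "Male", "Male"),
  Video_Games_Hours = c(0, 0, 1, 3, 0, 2, 6, 12)
)

# Mean and SD of weekly hours within each gender group
group_mean <- tapply(toy_games$Video_Games_Hours, toy_games$Gender, mean)
group_sd   <- tapply(toy_games$Video_Games_Hours, toy_games$Gender, sd)
# Share of each group reporting zero hours
group_zero <- tapply(toy_games$Video_Games_Hours == 0, toy_games$Gender, mean)

data.frame(mean = group_mean, sd = group_sd, prop_zero = group_zero)
```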
# Create a faceted histogram to visualize the distribution of hours spent playing video games for male and female students
gf_dhistogram(~Video_Games_Hours, data = gamedata_clean) %>%
  gf_facet_grid(Gender ~ .)