Visualization Homework Help | Visualization Using R Programming

Aug 25, 2021 | 6 min read

In this blog we will cover several types of visualization that you can build with R programming, each useful for creating attractive, informative graphics tailored to your needs.

How to use Annotations, Text, Rectangle and An Arrow

The annotate() function adds specific details to a plot, which is handy for minor annotations such as text labels. The plot in this section shows the petal length and petal width of each flower in the iris dataset, colored by species, so the species can be separated by their feature values. Annotations like these make it easier to understand what the plot is actually telling us.

library(tidyverse)
data(iris)

# Load data here
head(iris)

##   Sepal.Length Sepal.Width Petal.Length Petal.Width Species
## 1          5.1         3.5          1.4         0.2  setosa
## 2          4.9         3.0          1.4         0.2  setosa
## 3          4.7         3.2          1.3         0.2  setosa
## 4          4.6         3.1          1.5         0.2  setosa
## 5          5.0         3.6          1.4         0.2  setosa
## 6          5.4         3.9          1.7         0.4  setosa

Do the following:

1. Make a plot. Any kind of plot will do (though it might be easiest to work with geom_point()).

2. Label (some or all of) the points using one of geom_text(), geom_label(), geom_text_repel(), or geom_label_repel(). You might need to make a new indicator variable so that you only highlight a few of the points instead of all of them.

3. Add at least two each of the following annotations somewhere on the plot using annotate():

– Text

– An arrow (make a curved arrow for bonus fun)

– A rectangle

You can add more if you want, but those three are the minimum. Try to incorporate the annotations into the design of the plot rather than just placing them wherever.

Implementation

# Base scatter plot of petal length vs. petal width, colored by species
p <- ggplot(iris, aes(x = Petal.Length, y = Petal.Width, color = Species)) +
  geom_point() +
  labs(title = "Petal length vs. petal width", x = "Petal Length", y = "Petal Width")

p +
  # Shaded rectangle and label for the virginica cluster
  annotate("rect", xmin = 5, xmax = 7, ymin = 1.75, ymax = 2.75, alpha = .2, fill = "blue") +
  annotate("text", x = 6, y = 2.6, label = "virginica") +
  # Shaded rectangle and label for the setosa cluster
  annotate("rect", xmin = 0.5, xmax = 2, ymin = 0, ymax = 0.75, alpha = .2, fill = "red") +
  annotate("text", x = 1, y = 0.7, label = "setosa") +
  # Curved arrows pointing from the "Extreme Ends" label to each rectangle
  annotate(geom = "curve", x = 2, y = 1.75, xend = 0.75, yend = 0.75,
           curvature = .3, arrow = arrow(length = unit(2, "mm"))) +
  annotate("text", x = 2.5, y = 1.75, label = "Extreme Ends") +
  annotate(geom = "curve", x = 3, y = 1.75, xend = 5, yend = 2.5,
           curvature = .4, arrow = arrow(length = unit(2, "mm")))

Output:
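Step 2 of the exercise (labelling only some of the points) is not covered by the implementation above. Here is a minimal sketch of one way to do it, assuming an indicator column is an acceptable approach. The column names highlight and id and the 2.4 cut-off are my own choices for illustration, not part of the original exercise.

```r
library(ggplot2)
library(dplyr)

# Flag the flowers to label. The 2.4 threshold is an arbitrary choice
# that picks out only the widest petals.
iris_flagged <- iris %>%
  mutate(id = row_number(),
         highlight = Petal.Width >= 2.4)

labelled_plot <- ggplot(iris_flagged,
                        aes(x = Petal.Length, y = Petal.Width, color = Species)) +
  geom_point() +
  # Only the flagged rows are passed to geom_text(), so only they get a label
  geom_text(data = filter(iris_flagged, highlight),
            aes(label = paste("Flower", id)),
            vjust = -0.8, show.legend = FALSE) +
  labs(title = "Petal length vs. petal width", x = "Petal Length", y = "Petal Width")

labelled_plot
```

If the labels overlap, geom_text_repel() from the ggrepel package can be swapped in for geom_text() with the same data and label arguments.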

How to use tidyverse and ggplot

The tidyverse and ggplot are two of my favorite tools. The tidyverse is excellent for exploratory data analysis, and the ability to ingest data, transform it, and visualise it that these tools provide is tremendously powerful. Beyond the core tidyverse there are many supplementary packages, and one function among them delivers a tremendous amount of value for a single call: ggplotly(), from the plotly package, turns a plot built with ggplot() into an interactive visualization. The primary benefits it provides, in my opinion, are:

You can move around the plot by zooming in and out.

You can see the value of points by hovering over them.

library(tidyverse)
library(plotly)
state = data.frame(state.x77, state.region, state.abb) 
head(state)

Output:

Do the following:

1. Make a plot. Any kind of plot will do (though it might be easiest to work with geom_point()).

2. Make the plot interactive with ggplotly().

3. Make sure the hovering tooltip is more informative than the default.

Implementation

# Boxplot of per capita income by region, ordered from highest to lowest median income
g <- ggplot(state, aes(x = reorder(state.region, -Income, median), y = Income)) +
  geom_boxplot() +
  labs(title = "Per capita income by region", x = "Region", y = "Per Capita Income")

# Convert to an interactive plot and add a caption above the panel
subplot(ggplotly(g, tooltip = c("text", "Income"))) %>%
  layout(showlegend = FALSE,
         annotations = list(
           list(x = 0.1, y = 1.05, text = "US Income Per Capita", showarrow = FALSE,
                xref = "paper", yref = "paper")))

Output:
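For step 3, a common way to get a more informative tooltip is to map a custom string to the unofficial text aesthetic and ask ggplotly() to show only that. The sketch below uses a scatter plot rather than a boxplot, because per-point tooltips are easier to customise there. The column names name, region and abb and the wording of the tooltip are my own assumptions.

```r
library(ggplot2)
library(plotly)

# Same built-in data as above, with the state name kept as a column
state <- data.frame(state.x77, region = state.region, abb = state.abb)
state$name <- rownames(state)

g_points <- ggplot(state,
                   aes(x = Illiteracy, y = Income, color = region,
                       # Custom tooltip text; <br> starts a new line in plotly
                       text = paste0(name, " (", abb, ")",
                                     "<br>Income: $", Income,
                                     "<br>Illiteracy: ", Illiteracy, "%"))) +
  geom_point() +
  labs(title = "Per capita income vs. illiteracy by state",
       x = "Illiteracy (%)", y = "Per Capita Income", color = "Region")

# tooltip = "text" hides the default x/y/color entries and shows only our string
interactive_plot <- ggplotly(g_points, tooltip = "text")
interactive_plot
```

Hovering over a point should then show the state name, its abbreviation, and both values instead of the raw aesthetic names.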

How to Create Time series graphs

This exercise matters for any time-related data. Time series graphs show how counts or numerical values change over time: points are drawn along the x-axis and joined by a continuous line, because date and time information is continuous. Time-related data is widely used in domains such as stock price prediction and weather prediction. A pattern that is hard to see in a table of numbers becomes easy to understand once it is plotted, even for a layman, which is why these plots play a major role in the corporate world when convincing stakeholders of the results of an analysis. The graph below shows US monthly natural gas consumption, and with its help the pattern is easy to visualize and understand.

library(tidyverse)
library(plotly)
library(TSstudio)
data("USgas")  # US monthly natural gas consumption (a ts object)
df <- ts_to_prophet(USgas)  # Convert the ts object to a data frame
colnames(df) <- c("date", "Billion_Cubic_Feet")
head(df)

Output:

##         date Billion_Cubic_Feet
## 1 2000-01-01             2510.5
## 2 2000-02-01             2330.7
## 3 2000-03-01             2050.6
## 4 2000-04-01             1783.3
## 5 2000-05-01             1632.9
## 6 2000-06-01             1513.1

Do the following:

1. Load some time-related data

2. Make a plot to show how that data changes over time.

3. Explain why you chose to visualize the data the way you did.

p <- plot_ly() %>%
  add_lines(data = df, x = ~date, y = ~Billion_Cubic_Feet)  %>%
  layout(title = "US Monthly Natural Gas Consumption" )
p

Output:
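One way to support the explanation asked for in step 3 is to overlay a smoothed trend on the raw series, so the seasonal swings and the long-run direction can be read separately. The sketch below is an assumption about how that could look: it uses a trailing 12-month mean computed with stats::filter(), and the trend column name is hypothetical.

```r
library(ggplot2)
library(TSstudio)

df <- ts_to_prophet(USgas)
colnames(df) <- c("date", "Billion_Cubic_Feet")

# Trailing 12-month mean; the first 11 values are NA because a full window
# is not yet available. stats:: is spelled out because dplyr masks filter().
df$trend <- as.numeric(
  stats::filter(df$Billion_Cubic_Feet, rep(1 / 12, 12), sides = 1)
)

trend_plot <- ggplot(df, aes(x = date)) +
  geom_line(aes(y = Billion_Cubic_Feet), color = "grey60") +
  geom_line(aes(y = trend), color = "steelblue", linewidth = 1, na.rm = TRUE) +
  labs(title = "US Monthly Natural Gas Consumption",
       subtitle = "Grey: monthly values; blue: trailing 12-month mean",
       x = NULL, y = "Billion Cubic Feet")

trend_plot
```

A line chart is a natural choice here because the observations are evenly spaced in time, and the smoothed line makes the yearly cycle easier to distinguish from the overall trend.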

How to Create a World Map

Geographical data visualization is one of the most interesting parts of R. It makes it possible to analyze data by continent and by country: plotting numerical values directly on a map makes the graph intuitive and interesting.

The map below shows that South Asian and African countries had the lowest levels of internet access in 2015.

library(tidyverse)
library(sf)

# Load and clean internet user data
internet_users <- read_csv("share-of-individuals-using-the-internet-1990-2015.csv") %>%
  # Rename country code column to ISO_A3 so it matches what's in the Natural Earth shapefile
  rename(users = `Individuals using the Internet (% of population) (% of population)`,
         ISO_A3 = Code)

# Load world shapefile from Natural Earth
# https://www.naturalearthdata.com/downloads/110m-cultural-vectors/
world_shapes <- read_sf("ne_110m_admin_0_countries/ne_110m_admin_0_countries.shp")
# Only look at 2015
users_2015 <- internet_users %>%
  filter(Year == 2015)

users_map <- world_shapes %>%
  left_join(users_2015, by = "ISO_A3") %>%
  filter(ISO_A3 != "ATA")  # No internet in Antarctica. Sorry penguins.

# Draw the choropleth: one polygon per country, filled by share of internet users
ggplot() +
  geom_sf(data = users_map, aes(fill = users)) +
  scale_fill_viridis_c() +
  coord_sf(crs = st_crs("EPSG:3395")) +  # World Mercator projection
  theme_void() +
  labs(title = "Access to the Internet in 2015", fill = "User Access") +
  theme(plot.title = element_text(face = "bold", hjust = 0.6),
        plot.subtitle = element_text(hjust = 0.5),
        legend.title = element_text(face = "bold"),
        legend.position = "right")

Output:
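Because the map depends on a left join by ISO_A3, any country whose code does not match in both tables ends up with a missing fill value. A quick check with anti_join() can reveal such rows before plotting. The sketch below uses two tiny made-up tables as stand-ins for world_shapes and users_2015. The codes and values are invented purely for illustration.

```r
library(dplyr)

# Hypothetical stand-ins for the shapefile attributes and the 2015 data
toy_shapes <- tibble(ISO_A3 = c("AAA", "BBB", "CCC"),
                     NAME = c("Country A", "Country B", "Country C"))
toy_users <- tibble(ISO_A3 = c("AAA", "BBB", "DDD"),
                    users = c(10, 20, 30))

# Map rows with no matching data: these would be drawn with an NA fill
unmatched_shapes <- anti_join(toy_shapes, toy_users, by = "ISO_A3")

# Data rows with no matching shape: these never appear on the map
unmatched_data <- anti_join(toy_users, toy_shapes, by = "ISO_A3")

unmatched_shapes
unmatched_data
```

With the real objects, the same two calls on world_shapes and users_2015 would list the countries to inspect.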

Text Analytics to Count Word Frequencies

Text analytics is very useful: we can extract insights from the whole corpus of text in a set of books and find interesting things in it.

Download 4+ books by some author on Project Gutenberg. Jane Austen, Victor Hugo, Emily Brontë, Lucy Maud Montgomery, Arthur Conan Doyle, Mark Twain, Henry David Thoreau, Fyodor Dostoyevsky, Leo Tolstoy. Anyone. Just make sure it’s all from the same author.

Make these two plots and describe what each tells you about your author's books:

1. Top 10 most frequent words in each book

2. Top 10 most unique words in each book (i.e. tf-idf)


library(tidyverse)
library(tidytext)
library(gutenbergr)
little_women_raw <- gutenberg_download(514, meta_fields = "title")
## Determining mirror for Project Gutenberg from http://www.gutenberg.org/robot/harvest
## Using mirror http://aleph.gutenberg.org
# 1524 - Hamlet
# 1532 - King Lear
# 1533 - Macbeth
# 1513 - Romeo and Juliet
tragedies_raw <- gutenberg_download(c(1524, 1532, 1533, 1513),
                                    meta_fields = "title")
tragedies_words <- tragedies_raw %>% 
  drop_na(text) %>% 
  unnest_tokens(word, text)

head(tragedies_words)

Output:

## # A tibble: 6 x 3
##   gutenberg_id title            word
##          <int> <chr>            <chr>
## 1         1513 Romeo and Juliet the
## 2         1513 Romeo and Juliet tragedy
## 3         1513 Romeo and Juliet of
## 4         1513 Romeo and Juliet romeo
## 5         1513 Romeo and Juliet and
## 6         1513 Romeo and Juliet juliet

top_words_tragedies <- tragedies_words %>% 
  # Remove stop words
  anti_join(stop_words) %>% 
  # Get rid of old timey words and stage directions
  filter(!(word %in% c("thou", "thy", "haue", "thee", 
                     "thine", "enter", "exeunt", "exit"))) %>% 
  # Count all the words in each play
  count(title, word, sort = TRUE) %>% 
  # Keep top 10 in each play
  group_by(title) %>% 
  top_n(10) %>% 
  ungroup() %>% 
  # Make the words an ordered factor so they plot in order
  mutate(word = fct_inorder(word))
## Joining, by = "word"
## Selecting by n
head(top_words_tragedies)

## # A tibble: 6 x 3
##   title                     word        n
##   <chr>                     <fct>   <int>
## 1 Hamlet, Prince of Denmark hamlet    461
## 2 Romeo and Juliet          romeo     300
## 3 Macbeth                   macbeth   282
## 4 The Tragedy of King Lear  lear      230
## 5 Hamlet, Prince of Denmark lord      223
## 6 Hamlet, Prince of Denmark king      196

ggplot(top_words_tragedies, aes(y = fct_rev(word), x = n, fill = title)) + 
  geom_col() + 
  guides(fill = "none") +
  labs(x = "Count", y = NULL, 
       title = "10 most frequent words in four Shakespearean tragedies") +
  facet_wrap(vars(title), scales = "free_y") +
  theme_bw()

Output:
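The second plot in the exercise (the most unique words per book, i.e. tf-idf) is not implemented above. Here is a minimal sketch of the calculation using bind_tf_idf() from tidytext. It runs on a tiny invented corpus so that it works without downloading anything. The toy titles and sentences are assumptions for illustration only.

```r
library(dplyr)
library(tidytext)

# Toy corpus standing in for the downloaded plays
toy_corpus <- tibble(
  title = c("Play A", "Play B", "Play C"),
  text = c("the king and the ghost",
           "the witches and the dagger",
           "the lovers and the poison")
)

toy_tf_idf <- toy_corpus %>%
  unnest_tokens(word, text) %>%
  # tf-idf needs a count of every word in every document
  count(title, word) %>%
  bind_tf_idf(word, title, n) %>%
  arrange(desc(tf_idf))

toy_tf_idf
```

Words that occur in every document (here "the" and "and") get a tf-idf of zero, so stop words do not need to be removed first. For the plays, the same count() and bind_tf_idf() steps could be applied to tragedies_words, followed by group_by(title) and slice_max(tf_idf, n = 10) to keep the ten most distinctive words per play.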

RealCode4You
