Monday, April 20, 2015

Plotting Histograms in R

A histogram is probably one of the first things we plot when looking at a continuous variable. In R you can draw a histogram using the built-in ‘hist’ function. Other packages, such as ggplot2, have more developed functions for plotting histograms, although you do need to learn how to use the functions within those packages.


First, let's simulate some data (generate fake data)
DAT = rnorm(1000, 100, 10)
The line above generates 1000 draws from a normal distribution with a mean of 100 and a standard deviation of 10.

Now let's take a look at the first few values we generated (DAT is a plain numeric vector, so head shows its first six elements rather than rows)
head(DAT)

Let's look at the summary of the data
summary(DAT)

Note: You will get different data every time you run this because we have not set a seed, but that is not important at this point.
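
If you do want the same draws on every run, a minimal sketch is to call set.seed before rnorm. The seed value 42 and the names DAT_A and DAT_B are arbitrary choices made for this illustration only.

```r
# Fix the state of the random number generator so the draws are reproducible.
# The seed value (42) is arbitrary; any integer works.
set.seed(42)
DAT_A = rnorm(1000, 100, 10)

# Resetting the same seed reproduces exactly the same draws.
set.seed(42)
DAT_B = rnorm(1000, 100, 10)

identical(DAT_A, DAT_B)   # TRUE
```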

Now draw the first histogram
hist(DAT)

Add color to the histogram
hist(DAT, col="blue")

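Now let's take control of the number of histogram bars. A single number given to breaks is only a suggestion, because R chooses "pretty" break points, so the actual number of bars may differ from what you asked for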
hist(DAT, col="blue", breaks=25)
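
If you need an exact number of bars, one option is to pass a vector of break points instead of a single number. The sketch below assumes 25 equal-width bins are wanted. The seed and the names my_breaks, h_suggested and h_exact are hypothetical, used only for this example.

```r
set.seed(1)                        # arbitrary seed, only so the sketch is repeatable
DAT = rnorm(1000, 100, 10)

# A single number is only a suggestion: check how many bars R actually uses.
h_suggested = hist(DAT, breaks=25, plot=FALSE)
length(h_suggested$counts)

# A vector of break points is used exactly as given.
# 26 break points from just below the minimum to just above the maximum
# define 25 equal-width bins that cover all of the data.
my_breaks = seq(floor(min(DAT)), ceiling(max(DAT)), length.out=26)
h_exact = hist(DAT, col="blue", breaks=my_breaks)
length(h_exact$counts)             # 25
```

The break points must span the full range of the data, which is why the sketch rounds the minimum down and the maximum up.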

Change the y-axis from frequency (the default) to density
hist(DAT, col="blue", breaks=25, probability=TRUE)
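
On the density scale the bar heights are scaled so that the total area of the bars is 1 (probability=TRUE is an alias for freq=FALSE). A small sketch to check this, assuming the same simulated data. The seed and the names h and bar_widths are only for illustration.

```r
set.seed(1)                        # arbitrary seed, only so the sketch is repeatable
DAT = rnorm(1000, 100, 10)

# plot=FALSE returns the histogram object without drawing it.
h = hist(DAT, breaks=25, plot=FALSE)

# Width of each bar, taken from the break points.
bar_widths = diff(h$breaks)

# Area of all bars: height (density) times width, summed over the bars.
sum(h$density * bar_widths)        # 1, up to floating-point rounding

# Each density is the count divided by (total count * bar width).
all.equal(h$density, h$counts / (sum(h$counts) * bar_widths))
```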

Add labels to the histogram
hist(DAT, col="blue", breaks=25, probability=TRUE,
     main="My Pretty Histogram",    ### For title of the figure
     xlab="My Fake Data")           ### For x-axis label

Add a dark green kernel density curve to the plot
lines(density(DAT), col="darkgreen", lwd=2) 

Add a red normal density curve to the plot
curve(dnorm(x, mean(DAT), sd(DAT)), add=TRUE, col="red", lwd=2) 
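
Depending on the draws, the peak of a curve can be higher than the tallest bar and get cut off at the top of the plot. One possible workaround, sketched here under the assumption that you want everything visible, is to compute the y-axis limit first and pass it to hist through ylim. The seed and the names h, d, normal_peak and y_top are hypothetical.

```r
set.seed(1)                        # arbitrary seed, only so the sketch is repeatable
DAT = rnorm(1000, 100, 10)

# Compute the pieces without drawing anything yet.
h = hist(DAT, breaks=25, plot=FALSE)                  # histogram bars
d = density(DAT)                                      # kernel density estimate
normal_peak = dnorm(mean(DAT), mean(DAT), sd(DAT))    # normal curve peaks at its mean

# Tallest thing that will appear on the plot.
y_top = max(h$density, d$y, normal_peak)

# Draw the histogram with enough vertical room, then add both curves.
hist(DAT, col="blue", breaks=25, probability=TRUE,
     ylim=c(0, y_top),
     main="My Pretty Histogram",
     xlab="My Fake Data")
lines(d, col="darkgreen", lwd=2)
curve(dnorm(x, mean(DAT), sd(DAT)), add=TRUE, col="red", lwd=2)
```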

Your plot should look similar to the one below:

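[Image: blue density histogram of the simulated data with a dark green kernel density curve and a red normal density curve]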