According to Revolution Analytics (http://www.revolutionanalytics.com/what-r), R is hot!
This assignment aims to back up this statement by examining why the programming tool is so appealing and by providing some information on R and its functions.
During the last decade, the momentum coming from both academia and industry has lifted the R programming language to become the single most important tool for computational statistics, visualization and data science.
To begin, we were asked to complete the R Programming course on tryr.codeschool.com; this course provided an excellent introduction to the world of R & some of its commands & functionality.
To get an understanding of RStudio, I decided to analyze a data file containing the nutritional content of foods, and used R to run some basic functions on it.
- I created a folder on my desktop and named it R
- The data is a .csv file containing a breakdown of food components (file sourced from Moodle)
- I saved the .csv file to the R folder so that it could be loaded into RStudio
- I opened R and set my working directory to the R folder I created on my desktop
- The first command asked R to read the file from the working directory
Reading the data
A: Get the current directory
Command = getwd()
B: Read the csv file and store it as USDA, so that the later commands can refer to it
Command = USDA <- read.csv("USDA.csv")
C: Structure of data
Command = str(USDA)
D: Summarise dataset
Command = summary(USDA)
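Steps A to D can be run together as one short script. The sketch below is only an illustration: the Moodle file is not reproduced here, so it first writes a tiny made-up CSV to a temporary folder. The file name USDA_sample.csv, the column names (Description, Protein, TotalFat, Sodium, Sugar) and every value in it are assumptions, not the real USDA data.

```r
# Hypothetical stand-in for USDA.csv: invented rows, for illustration only
sample_rows <- data.frame(
  Description = c("Butter", "Cheddar cheese", "Table salt", "Apple"),
  Protein     = c(0.9, 24.9, 0.0, 0.3),
  TotalFat    = c(81.1, 33.1, 0.0, 0.2),
  Sodium      = c(714, 621, 38758, 1),
  Sugar       = c(0.1, 0.5, 0.0, 10.4)
)
csv_path <- file.path(tempdir(), "USDA_sample.csv")
write.csv(sample_rows, csv_path, row.names = FALSE)

# A: confirm which folder R is currently working in
getwd()

# B: read the file and keep the result in an object for later commands
USDA <- read.csv(csv_path)

# C: structure of the data (column names, types, first few values)
str(USDA)

# D: per-column summary (minimum, quartiles, mean, maximum)
summary(USDA)
```

With the real file saved in the working directory, only step B would change, back to read.csv("USDA.csv").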
Screenshot of some of the commands run
E: To find which product contained the maximum amount of sodium, I used the next command, which returns the row number of that product rather than its name
Command = which.max(USDA$Sodium)
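Because which.max() gives back a row number, one more step is needed to see the product itself. This is a minimal sketch on invented values; the Description column name is an assumption about how the file labels its products.

```r
# Invented values for illustration; not the real USDA data
USDA <- data.frame(
  Description = c("Butter", "Table salt", "Apple"),
  Sodium      = c(714, 38758, NA)
)

# which.max() discards missing values and returns the position of the maximum
top_row <- which.max(USDA$Sodium)

# Use that position to look up the name of the product
USDA$Description[top_row]
```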
The article I read on Revolution Analytics (http://www.revolutionanalytics.com/what-r) emphasises that R is known for creating “beautiful and unique data visualization”. To put that to the test, I went on to run some plotting commands.
F: I wanted to look at the protein & total fat content
Command = plot(USDA$Protein, USDA$TotalFat)
G: I used the next command to compare the protein and fat content again, this time adding axis labels, a title and some colour to make the graph more appealing to the eye
Command = plot(USDA$Protein, USDA$TotalFat, xlab = "Protein", ylab = "Fat", main = "Protein vs Fat", col = "red")
H: The boxplot command was used next to show the sugar content
Command = boxplot(USDA$Sugar, ylab = "Sugar (g)", main = "Boxplot of Sugar")
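The plotting commands in F to H can be tried without the Moodle file. The sketch below uses a handful of invented values and writes the plots to a PDF instead of the screen, which is one possible way of getting a figure into a report. The data and the file name usda_plots.pdf are assumptions for illustration.

```r
# Invented values for illustration; not the real USDA data
USDA <- data.frame(
  Protein  = c(0.9, 24.9, 0.0, 0.3, 12.6),
  TotalFat = c(81.1, 33.1, 0.0, 0.2, 9.5),
  Sugar    = c(0.1, 0.5, 0.0, 10.4, 0.4)
)

# Send the plots to a PDF file in a temporary folder
out_file <- file.path(tempdir(), "usda_plots.pdf")
pdf(out_file)

# Scatter plot of protein against fat, with labels, a title and colour
plot(USDA$Protein, USDA$TotalFat,
     xlab = "Protein", ylab = "Fat", main = "Protein vs Fat", col = "red")

# Boxplot of the sugar content
boxplot(USDA$Sugar, ylab = "Sugar (g)", main = "Boxplot of Sugar")

# Close the file so that the plots are actually written to it
dev.off()
```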
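I: The final command was used to find out how many products have a higher than average fat and sodium content. It relies on HighSodium and HighFat columns, which flag the products that are above the average for sodium and fat and have to be added to the data beforehand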
Command = table(USDA$HighSodium, USDA$HighFat)
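The table command only works once the HighSodium and HighFat columns exist. One way they could be built is sketched below on invented values; marking a product 1 when it is above the column mean and 0 otherwise is an assumption about how the columns were defined.

```r
# Invented values for illustration; not the real USDA data
USDA <- data.frame(
  Sodium   = c(714, 621, 38758, 1, NA),
  TotalFat = c(81.1, 33.1, 0.0, 0.2, 5.0)
)

# 1 if the product is above the column average, 0 otherwise.
# na.rm = TRUE makes mean() skip missing values instead of returning NA.
USDA$HighSodium <- as.numeric(USDA$Sodium > mean(USDA$Sodium, na.rm = TRUE))
USDA$HighFat    <- as.numeric(USDA$TotalFat > mean(USDA$TotalFat, na.rm = TRUE))

# Cross-tabulate the two flags: rows are HighSodium, columns are HighFat.
# Rows with a missing flag are left out of the counts by default.
counts <- table(USDA$HighSodium, USDA$HighFat)
counts
```

The cell in row 1 and column 1 of the result counts the products that are above average for both sodium and fat.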
Having completed the above tasks in RStudio, I conclude that R is a great tool for data visualisation, and that it is accurate and fast at getting the results you want.