Try R CA 2 Data Management & Analytics

 Introduction

According to Revolution Analytics (http://www.revolutionanalytics.com/what-r), R is hot!

This assignment aims to back up that statement by examining why the language is so appealing and by providing some information on R & its functions.

Over the last decade, momentum from both academia and industry has made the R programming language one of the most important tools for computational statistics, visualization and data science.

Code School

To begin, we were asked to complete the R Programming course on tryr.codeschool.com; this course provided an excellent introduction to the world of R & some of its commands & functionality.

Code school - R completion

Case Study

To get an understanding of RStudio, I decided to analyze a data file containing data on the nutritional content of food & used R to run some basic functions.

Process steps

  1. I created a folder on my desktop & named it R
  2. I obtained a .csv file containing a breakdown of food components (file sourced from Moodle)
  3. I saved the .csv file to the R folder so that it could be loaded into RStudio
  4. I opened RStudio & set my working directory to the R folder I created on my desktop
  5. The first command asked R to read the file from the working directory

Reading the data

A: Get the current directory

Command = getwd()

B: Read the csv file

Command = USDA <- read.csv("USDA.csv")
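
read.csv() returns a data frame, so the result has to be assigned to a name before later commands such as str() can refer to it. A minimal, self-contained sketch of the read step follows. Because the Moodle file is not reproduced here, it first writes a tiny made-up CSV to a temporary folder; the file name USDA_toy.csv, the column names and the values are all assumptions for illustration, not the real data.

```r
# Hypothetical stand-in for USDA.csv: three invented rows, illustrative columns.
toy <- data.frame(
  Description = c("Butter", "Table salt", "Apple"),
  Sodium      = c(714, 38758, 1),
  Protein     = c(0.85, 0, 0.26),
  TotalFat    = c(81.1, 0, 0.17)
)

# Use a temporary folder as the working directory and write the toy file there.
old_wd <- setwd(tempdir())
write.csv(toy, "USDA_toy.csv", row.names = FALSE)

getwd()                               # confirm where R is looking for files
USDA_toy <- read.csv("USDA_toy.csv")  # assign the result so it can be reused
str(USDA_toy)                         # one line per column: name, type, first values

setwd(old_wd)                         # restore the original working directory
```

The same pattern applies to the real file: set the working directory to the folder that holds it, then assign the result of read.csv() to a name.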

C: Structure of data

Command = str(USDA)

 Results

Image 1

D: Summarise dataset

Command = summary(USDA)
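
As a rough illustration of what summary() reports, the sketch below runs it on one numeric column of a small made-up data frame (the values are invented; the real file is not reproduced here). For a numeric column it gives the minimum, quartiles, median and mean, plus a count of missing values when there are any.

```r
# Invented values, including one missing entry, to show how NAs are reported.
toy <- data.frame(
  Description = c("A", "B", "C", "D", "E"),
  Sodium      = c(10, 20, NA, 40, 50)
)

s <- summary(toy$Sodium)  # named vector: Min., 1st Qu., Median, Mean, 3rd Qu., Max., NA's
print(s)

# Individual entries can be picked out by name.
s[["Median"]]

# Most other functions need to be told to skip missing values explicitly.
mean(toy$Sodium)                # NA, because one value is missing
mean(toy$Sodium, na.rm = TRUE)  # mean of the four known values
```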

Screenshot of some of the commands run

Image 2

E: To find which product contains the maximum amount of sodium, I used the next command, which returns the row number of that product

Command = which.max(USDA$Sodium)
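
which.max() returns a row position rather than a product name, so a second step is needed to see which food it is. A small sketch with invented values follows; the Description column name is an assumption about how the file labels each product.

```r
# Invented rows; Description is an assumed name for the product label column.
toy <- data.frame(
  Description = c("Butter", "Table salt", "Apple"),
  Sodium      = c(714, 38758, NA),
  stringsAsFactors = FALSE
)

idx <- which.max(toy$Sodium)  # position of the largest value; missing values are skipped
idx

toy$Description[idx]          # name of the product in that row
toy[idx, ]                    # the whole row, for context
```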

R Graphics

Plots

The article I read on (http://www.revolutionanalytics.com/what-r) emphasises that R is known for creating “beautiful and unique data visualization”. To put that to the test, I went on to run some plotting commands.

F: I wanted to look at the protein & total fat content

Command = plot(USDA$Protein, USDA$TotalFat)

image 3

G: I used the next command to compare the protein to fat content & added in some colour to make the graph more appealing to the eye

Command = plot(USDA$Protein, USDA$TotalFat, xlab = "Protein", ylab = "Fat", main = "Protein vs Fat", col = "red")
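
To keep a copy of a plot like this outside RStudio, one option is to draw it to a file device. A minimal sketch, using made-up protein and fat values and a temporary file path (both are placeholders, not taken from the assignment data):

```r
# Invented values standing in for the Protein and TotalFat columns.
toy <- data.frame(
  Protein  = c(0.9, 20.5, 3.2, 25.1),
  TotalFat = c(81.1, 5.0, 0.4, 15.2)
)

out_file <- tempfile(fileext = ".pdf")  # placeholder path; any writable path works

pdf(out_file)                           # open a PDF graphics device
plot(toy$Protein, toy$TotalFat,
     xlab = "Protein", ylab = "Fat",
     main = "Protein vs Fat",
     col = "red", pch = 19)             # pch = 19 draws solid points
dev.off()                               # close the device so the file is written

file.exists(out_file)
```

The plot() call itself is unchanged; only the pdf() and dev.off() lines around it decide where the drawing ends up.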

Image 4

 

Boxplots

H: The boxplot command was used next to show the distribution of sugar content

Command = boxplot(USDA$Sugar, ylab = "Sugar (g)", main = "Boxplot of Sugar")
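
The numbers behind a boxplot can also be read off directly. The sketch below uses boxplot.stats() from base R on a handful of invented sugar values to show the five values that define the box and whiskers, and which points would be drawn separately as outliers.

```r
# Invented sugar values with one obviously extreme entry.
sugar <- c(0, 1, 2, 3, 4, 5, 60)

b <- boxplot.stats(sugar)

b$stats  # lower whisker, lower hinge, median, upper hinge, upper whisker
b$out    # values more than 1.5 box lengths beyond the hinges
```

With food data, a long tail of outliers on a sugar boxplot is plausible, since most items contain little sugar and a few contain a lot.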

I: The final command cross-tabulates two indicator columns, HighSodium & HighFat, to find out how many products have a higher than average fat & sodium content

Command = table(USDA$HighSodium, USDA$HighFat)
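
The table() call relies on HighSodium and HighFat columns existing in the data frame. The write-up does not show how they were created; one plausible way, sketched here on invented values, is to flag each row as 1 when it is above the column mean and 0 otherwise.

```r
# Invented values; one missing sodium entry to show how NAs behave.
toy <- data.frame(
  Sodium   = c(10, 20, 30, 500, NA),
  TotalFat = c(1, 50, 2, 60, 3)
)

# 1 if the value is above the column mean, 0 if not, NA if the value is missing.
toy$HighSodium <- as.numeric(toy$Sodium > mean(toy$Sodium, na.rm = TRUE))
toy$HighFat    <- as.numeric(toy$TotalFat > mean(toy$TotalFat, na.rm = TRUE))

# Rows of the table are HighSodium (0/1), columns are HighFat (0/1).
# Rows with a missing flag are left out by default.
counts <- table(toy$HighSodium, toy$HighFat)
counts

counts["1", "1"]  # products flagged as high in both sodium and fat
```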

Image 5

Conclusion

Having completed the above tasks using RStudio, I conclude that R is a great tool for data visualisation, and that it delivers accurate results quickly & efficiently.

 
