DATA 531: Lab 3 - Programming with R
Name: FirstName LastName
Date: September 15, 2020
Objectives:
Familiarity writing functions in R
Raw text formatting and Data Cleaning in R
Reading and analysing data from csv files in R
Question #1 - Creating and Calling Functions (15 marks)
You should already have solved this question in Python. Today you will implement the same problem in R. In this question you will practice creating and calling functions in R, and use a docstring to document the function so that others can read it.
Problem statement:
Create an R program that creates and calls a function for calculating the average of a list of values. Details:
Create a function named avglist that accepts three parameters: (1 mark)
lst - list of numbers
low - minimum value (not inclusive). Default = 0
high - maximum value (not inclusive). Default = 100
The function will calculate and return the average of all values in the range (low, high). (3 marks)
The function should have a docstring as shown in output. Hint: use the docstring package in R. (3 marks)
Call docstring(avglist) to display function information. (1 mark)
Generate a sequence of numbers from 1 to 10, call avglist() with the default range, and print the result. (2 marks)
Generate a sequence of numbers from 1 to 100, call avglist() with range (20, 80), and print the result. (2 marks)
Generate a sequence of 100 random numbers between 1 and 100, call avglist() with range (30, 100), and print the result. Hint: Use the sample function to generate the random integers. (3 marks)
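The steps above can be sketched roughly as follows. This is one possible approach, not the expected solution: the docstring wording is an assumption (the expected output is not reproduced in this handout), and returning NaN when no value falls in the range is simply what mean() does on an empty vector. The docstring package reads roxygen-style `#'` comments placed at the top of the function body.

```r
avglist <- function(lst, low = 0, high = 100) {
  #' Average of values in an open range
  #'
  #' Returns the mean of the elements of lst that lie strictly
  #' between low and high. (Docstring wording is an assumption.)
  #'
  #' @param lst numeric vector of values
  #' @param low lower bound (not inclusive), default 0
  #' @param high upper bound (not inclusive), default 100
  #' @return mean of the values in (low, high); NaN if none qualify

  # Keep only the values strictly inside (low, high)
  in_range <- lst[lst > low & lst < high]
  mean(in_range)
}

result_default <- avglist(1:10)           # default range (0, 100)
result_custom  <- avglist(1:100, 20, 80)  # keeps 21..79
print(result_default)
print(result_custom)

# Show the documentation only in an interactive session
# where the docstring package is installed
if (interactive() && requireNamespace("docstring", quietly = TRUE)) {
  docstring::docstring(avglist)
}
```

For the random case, sample(1:100, 100, replace = TRUE) is one way to draw 100 integers between 1 and 100; whether repeats are allowed is not specified in the question.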
## your code here
Question #2 - Data Cleaning in R (15 marks)
Create an R program that cleans the data in string format. The data set is given below:
data = "5:Joe:35000:1970-08-09
4:Steve:49999:1955-01-02
1:Leah:154000:1999-06-12
3:Sheyanne:255555:1987-05-14
2:Matt:24000:1972-11-03
7:Kyla:1000000:1950-02-01
8:Dave:15000:2000-09-05
"
Use strsplit() to separate data into rows (one per line). (1 mark)
Use a for loop to process each line: (1 mark)
Use strsplit() once again to divide data into four fields (id, name, salary, birthdate). Output the fields. (3 marks)
Calculate the age from the birthdate and the system date using the difftime function. Print age along with id, name, salary, birthdate. (3 marks)
Calculate and print the total number of people, average salary, highest salary, and youngest employee. (4 marks)
Use a for loop to process the data set again: (1 mark)
Increase the salary by 20% for any employee whose salary is below 40000 or whose name is less than 5 characters long. Print out the new and previous salary. (2 marks)
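A minimal sketch of the parsing and age steps, using only two rows from the data set to keep it short. The variable names (emp_names, salaries, births, gets_raise) are illustrative, dividing days by 365.25 is only an approximation of age in whole years, and the raise rule is shown in vectorized form as a shortcut even though the question asks for a second for loop.

```r
# Two rows copied from the lab's data set, kept short for illustration
data <- "5:Joe:35000:1970-08-09
8:Dave:15000:2000-09-05
"

rows <- strsplit(data, "\n")[[1]]   # one element per line
today <- Sys.Date()

emp_names <- character(0)
salaries <- numeric(0)
births <- character(0)

for (row in rows) {
  fields <- strsplit(row, ":")[[1]]   # id, name, salary, birthdate
  salary <- as.numeric(fields[3])
  days_alive <- as.numeric(difftime(today, as.Date(fields[4]), units = "days"))
  age <- floor(days_alive / 365.25)   # approximate age in whole years
  cat(fields[1], fields[2], salary, fields[4], age, "\n")

  emp_names <- c(emp_names, fields[2])
  salaries <- c(salaries, salary)
  births <- c(births, fields[4])
}

# The youngest employee has the latest birthdate
youngest <- emp_names[which.max(as.numeric(as.Date(births)))]
cat(length(rows), mean(salaries), max(salaries), youngest, "\n")

# Raise rule: salary below 40000 or a name shorter than 5 characters
gets_raise <- salaries < 40000 | nchar(emp_names) < 5
new_salaries <- ifelse(gets_raise, salaries * 1.2, salaries)
```

Note that strsplit() returns a list, which is why each call is followed by [[1]] to pull out the character vector.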
## your code here
Question #3 - Data Analysis in R (15 marks)
Perform data analysis using R with data in a CSV file. Details:
Use the sensor.csv file. Read the data set into a data frame. Note: You will want to add the parameter stringsAsFactors=FALSE for read.csv. (1 mark)
Display the first 5 rows using head. (1 mark)
Using substr, extract the day information into a numeric vector and add it to the data frame as a column called day. Note: You will need the as.numeric function. (1 mark)
Using substr, extract the time information into a vector and add it to the data frame as a column called time. (1 mark)
Display the last 10 rows of the data frame. (1 mark)
Create a dataset called sensors_clean that contains only those observations where the value is between 0 and 100 inclusive. Use as.numeric to convert the value column to numbers. (1 mark)
Create a list called data_summary that contains the following: (3 marks)
count of valid readings (value within bounds described earlier)
minimum reading
mean reading
range of readings
maximum reading of any sensor at site 2
total number of observations of site 1 sensor 2 (HINT: length() might come in handy)
Create a histogram of the sensors_clean data values using ggplot. (1 mark)
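The layout of sensor.csv is not shown in this handout, so the sketch below builds a small made-up data frame instead of reading the file. The column names (site, sensor, stamp, value), the "YYYY-MM-DD HH:MM" timestamp format, and the substr positions are all assumptions to adjust against the real file; with the real data the first step would be read.csv("sensor.csv", stringsAsFactors = FALSE). Computing the site and sensor entries on the cleaned data rather than the raw data is also an assumption.

```r
# Hypothetical stand-in for sensor.csv (columns and values are made up)
sensors <- data.frame(
  site = c(1, 1, 2, 2),
  sensor = c(2, 2, 1, 3),
  stamp = c("2020-09-01 08:00", "2020-09-02 09:30",
            "2020-09-02 10:15", "2020-09-03 11:45"),
  value = c("12.5", "150", "99", "-4"),
  stringsAsFactors = FALSE
)

# Character positions below match the assumed timestamp format only
sensors$day  <- as.numeric(substr(sensors$stamp, 9, 10))
sensors$time <- substr(sensors$stamp, 12, 16)

# Convert readings to numbers and keep those in [0, 100]
sensors$value <- as.numeric(sensors$value)
keep <- !is.na(sensors$value) & sensors$value >= 0 & sensors$value <= 100
sensors_clean <- sensors[keep, ]

site2 <- sensors_clean$site == 2
site1_sensor2 <- sensors_clean$site == 1 & sensors_clean$sensor == 2

data_summary <- list(
  count = nrow(sensors_clean),
  minimum = min(sensors_clean$value),
  mean = mean(sensors_clean$value),
  range = range(sensors_clean$value),
  max_site2 = max(sensors_clean$value[site2]),
  n_site1_sensor2 = length(sensors_clean$value[site1_sensor2])
)

# Build the histogram only if ggplot2 is installed; print(p) draws it
if (requireNamespace("ggplot2", quietly = TRUE)) {
  p <- ggplot2::ggplot(sensors_clean, ggplot2::aes(x = value)) +
    ggplot2::geom_histogram(bins = 10)
}
```

The head(sensors, 5) and tail(sensors, 10) calls cover the two display steps; the bins = 10 setting is an arbitrary choice for illustration.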
## your code here