Intro to R

Apr 14

R is confusing. I had to learn R for a class in undergrad, and apparently I remember none of it. I thought this course would be a bit easier, but it definitely wasn’t. In the Intro to R course, you learn about variables, data types, vectors, matrices, factors, data frames, and lists. This course was different from the others I’ve gone through: there were no videos explaining a concept before you hop into hands-on work. You just start with the hands-on work, with directions guiding you.

As a quick overview, the course went through these chapters: intro to basics, vectors, matrices, factors, data frames, and lists. If I had to sum up the Intro to R course in one sentence, I would say that it presents R as one big organizational tool.

You store values in a way that makes it easier to pull data back out later. The simplest case is a vector, which is a one-dimensional array.

To create a vector, it looks something like this: poker_winnings <- c(140, -50, 20, -120, 240). This stores your poker winnings (or losses) for each day you played during the week.
This lets you do simple analysis on the data, like total winnings, which days of the week you won money, etc.
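
Here is a small sketch of that kind of analysis, using the poker_winnings vector from above. The day names are my own assumption (Monday through Friday); the post only says the values cover the week.

```r
# Poker winnings (positive) and losses (negative), one value per day
poker_winnings <- c(140, -50, 20, -120, 240)

# Assumption for illustration: the five values are Monday through Friday
names(poker_winnings) <- c("Monday", "Tuesday", "Wednesday", "Thursday", "Friday")

# Total winnings for the week
total_poker <- sum(poker_winnings)

# Names of the days that ended with a profit
winning_days <- names(poker_winnings)[poker_winnings > 0]

print(total_poker)   # 230
print(winning_days)
```

The comparison poker_winnings > 0 gives back a TRUE/FALSE vector, and using it inside the square brackets keeps only the TRUE positions.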

Then you can store values in two-dimensional arrays, which are matrices. This is essentially constructing a table of data where every value has the same data type.

To create a matrix, you first combine more than one vector into a single vector. Then you can construct the matrix from it.
Example: you have 3 vectors named poker_winnings, roulette_winnings, and slots_winnings. You can combine all of them into 1 vector named gambling_winnings. It looks like this: gambling_winnings <- c(poker_winnings, roulette_winnings, slots_winnings).
You then construct the matrix with the matrix() function. It would look something like this:
1. gambling_winnings_matrix <- matrix(gambling_winnings, nrow = 3, byrow = TRUE). This returns a table with 3 rows, one per game, and one column for each input in the original vectors (winnings/losses in this case).
  1. nrow indicates how many rows the matrix should have
  2. byrow = TRUE indicates that the matrix is filled row by row. If byrow = FALSE (the default), the matrix is filled column by column.
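
Putting the matrix steps together, a minimal sketch might look like this. The roulette and slots numbers are made up for illustration, and the row and column names are my own additions to make the printed table easier to read.

```r
poker_winnings    <- c(140, -50, 20, -120, 240)
roulette_winnings <- c(-24, -50, 100, -350, 10)  # made-up values
slots_winnings    <- c(5, -10, 0, 15, -20)       # made-up values

# Combine the three vectors into one long vector of 15 values
gambling_winnings <- c(poker_winnings, roulette_winnings, slots_winnings)

# Fill a 3-row matrix row by row, so each game gets its own row
gambling_winnings_matrix <- matrix(gambling_winnings, nrow = 3, byrow = TRUE)

# Optional labels (assumed day names, Monday through Friday)
rownames(gambling_winnings_matrix) <- c("poker", "roulette", "slots")
colnames(gambling_winnings_matrix) <- c("Mon", "Tue", "Wed", "Thu", "Fri")

print(gambling_winnings_matrix)

# Total per game across the week
game_totals <- rowSums(gambling_winnings_matrix)
print(game_totals)
```

With byrow = FALSE, the same 15 values would be poured in column by column, and the rows would no longer line up with the games.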

We then took a detour to factors. A factor is a statistical data type used to store categorical variables. A categorical variable can only take a limited number of categories. Example: sex = “Male” or “Female”. Telling R which values are possible helps you better understand your data. You can also tell R that the categories have a natural order, so that some rank higher than others. Example: you can label data analysts as slow, medium, or fast. This lets you compare analysts on how quickly they work. Again, better understanding of your data.
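
A rough sketch of both kinds of factor, with made-up values for five people and five analysts:

```r
# Unordered factor: categories with no ranking (made-up values)
sex_vector <- c("Male", "Female", "Female", "Male", "Male")
sex_factor <- factor(sex_vector)
print(levels(sex_factor))

# Ordered factor: tell R that slow < medium < fast (made-up values)
speed_vector <- c("medium", "slow", "slow", "medium", "fast")
speed_factor <- factor(speed_vector,
                       ordered = TRUE,
                       levels = c("slow", "medium", "fast"))
print(speed_factor)

# Because the factor is ordered, analysts can be compared directly:
# is the second analyst slower than the fifth?
second_is_slower <- speed_factor[2] < speed_factor[5]
print(second_is_slower)
```

Without ordered = TRUE, the same comparison would not be meaningful, since R would have no ranking for the categories.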

Then we make our way to data frames, which are two-dimensional objects. A data frame is essentially a table whose columns can hold different types of data. Example: a survey brings back yes/no values, numeric values, character (text) values, etc.

The example in this chapter was data on planets. This included vectors like name (name of planets), type (terrestrial planet/gas planet), diameter, rotation, and rings (TRUE/FALSE values).
This allowed us to build a planets data frame using the data.frame() function: planets_df <- data.frame(name, type, diameter, rotation, rings). This returns a 5-column data frame, with one column for each vector.
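
As a sketch of what that looks like in practice, here is a cut-down version with four planets. The numbers are approximate and only for illustration (diameter relative to Earth, rotation period in Earth days); they are not copied from the course.

```r
# One vector per column; all vectors must have the same length
name     <- c("Mercury", "Earth", "Jupiter", "Saturn")
type     <- c("Terrestrial planet", "Terrestrial planet", "Gas giant", "Gas giant")
diameter <- c(0.382, 1, 11.209, 9.449)  # approximate, relative to Earth
rotation <- c(58.64, 1, 0.41, 0.43)     # approximate, in Earth days
rings    <- c(FALSE, FALSE, TRUE, TRUE)

# Build the data frame: text, numeric, and logical columns side by side
planets_df <- data.frame(name, type, diameter, rotation, rings)

# str() shows the type of each column
str(planets_df)

# Use the logical rings column to pull out only the planets with rings
ringed_planets <- planets_df[planets_df$rings, "name"]
print(ringed_planets)
```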

Last but not least…lists. A list in R allows you to gather a variety of objects under one name (that is, the name of the list) in an ordered way. These objects can be matrices, vectors, data frames, even other lists, etc. It is not even required that these objects are related to each other in any way. Printing out a list will output everything you stored in said list.
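
To make that concrete, here is a minimal sketch that bundles three unrelated objects into one list. All the object names and values are made up for illustration.

```r
# Three unrelated objects of different kinds (all made up)
my_vector <- 1:5
my_matrix <- matrix(1:6, nrow = 2)
my_df     <- data.frame(game = c("poker", "slots"), total = c(230, -10))

# Gather them under one name; naming the components is optional
my_list <- list(vec = my_vector, mat = my_matrix, df = my_df)

# Pull a single component back out by name
print(my_list$vec)
print(my_list[["mat"]])

# Printing the list shows everything stored in it
print(my_list)
```

Both my_list$vec and my_list[["vec"]] return the stored object itself, while single brackets like my_list["vec"] would return a smaller list.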

All in all, not too bad. A bit confusing. There’s some overlap between Python and R, which also got a bit confusing, but overall pretty good. Onto intermediate R next…

-TJ

TJ Maloney
