Basic Tutorials For R To Start Analyzing Data


Learning R can open the door to a promising career in different industries. Follow these tutorials for R to learn the basics of this language and what you can do with it.

Tutorials on R

Follow these tutorials to build your knowledge of the R programming language.

Introduction To R

What Is R?

R is a programming language specially designed for data science and statistical computing.

Its original creators, Ross Ihaka and Robert Gentleman, were two statisticians at the University of Auckland, New Zealand. They drew inspiration for R from S, a popular statistical language at the time. The result was R, an S implementation with many added features and improvements.

R is a highly extensible language. It supports a wide range of statistical techniques, including linear modeling, non-linear modeling, classification, and time-series analysis.

The official software implementation of R is open-source and free for everyone to download and use. On top of the language itself, this integrated software environment comes with many tools for data calculation, manipulation, and graphical display.

Where You Can Use R

While R was originally designed by statisticians, it can be, and has been, used by people in other fields.

It is one of the most popular languages for data analysis. Many researchers, scientists, and academics have relied on it to analyze their data and obtain insights from it.

Fintech

Finance, where statistics and money go hand in hand, is where R sees some of its most extensive use. Banks and other institutions use it to conduct risk analysis, detect fraud, and assess clients, as well as to run loan stress tests and model volatility, mortgages, and credit.

Retail

Retailers can use R to tailor their marketing strategies and assess risk. Its ecosystem includes machine learning tools, which are useful when you need to optimize your sales. For example, R can help you improve cross-selling at checkout, increasing sales and profits.

Manufacturing

Manufacturers always want to improve their products based on user feedback, an area where R excels. It can help brands monitor consumer sentiment and make proper adjustments to their design and production. R's tools also come in handy when they want to build models that predict which products and spare parts they will need in the near future.

Government

Weather forecasting agencies often write R programs to forecast the weather and predict disasters. R can also be used to predict drug reactions, analyze clinical trial data, and evaluate drugs.

Academia

R is widely used in universities and other academic environments. Many institutions offer R programming courses, often as a companion to their data science curriculums.

R Advantages And Weaknesses

R has several strengths that make it a popular choice for data analytics:

  • Free and open-source: R isn’t owned by any company, and you don’t need to pay any license fees or subscriptions to use and update your R environment. The development of R is led by a non-profit organization that commits to making it available for everyone.
  • Cross-platform: R users don’t have to limit themselves to any particular system to enjoy all of its benefits. You can find software for R on all major operating systems and architectures.
  • Thriving community and ecosystem: many R users also participate in community forums. Some of them even contribute back to the language by publishing packages and libraries they write.

With that said, there are many challenges when learning and using R that you must be aware of:

  • Complicated syntax: R isn't the easiest language to learn. Beginners may need quite some time to get used to its syntax.
  • Memory-intensive: to work with large data sets, R needs a huge amount of memory because it keeps all of its objects in RAM. You may have to either upgrade your RAM or optimize memory usage to prevent your system from freezing.
  • Security: this isn't a top priority for R developers. Malicious actors can try to exploit it, especially through the package ecosystem, so you have to trust the authors of any scripts and packages you plan to run.

Install R

You can get the official pre-compiled binaries for your system from R’s website.

Windows

Go to the R homepage and follow the link to the official installer. Run it after the download completes and follow the on-screen instructions.

macOS

Go to the download page and select the right .pkg file for your system. Pay attention to the CPU architecture: if you have an Apple silicon computer, select the arm64 installer; otherwise, pick the Intel 64-bit one.

Linux

Most popular Linux distributions have R and many of its packages in their official software repositories. Besides downloading R from the R homepage, you can install it directly from these repositories with your system's package manager.

Debian

sudo apt-get install r-base r-base-dev

Ubuntu

sudo apt-get install r-base

Fedora

sudo dnf install R

openSUSE

sudo zypper install R-base R-base-devel
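Whichever installation method you used, a quick way to confirm that R works is to start a session (type R in a terminal, or open RStudio) and query the version. This is a minimal check using base R's built-in R.version.string and R.version$major; the exact text it prints depends on the version you installed.

```r
# Print the full version string, e.g. "R version 4.x.y (...)"
print(R.version.string)

# The major version number is stored as a character value
major <- as.integer(R.version$major)
cat("Running R major version", major, "\n")
```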

Install RStudio

While the official binaries come with the necessary tools for running R scripts, they offer only a basic console interface. This is not ideal for newcomers, and even experienced users and professionals can benefit from a more user-friendly interface.

There are many third-party Integrated Development Environments (IDEs) with support for R, such as Visual Studio and Eclipse. However, RStudio is still the most popular choice. It has an open-source desktop version that, like R itself, is free to use.

Go to the RStudio Desktop download page and get the installer for your system. The installation process is simple and should finish quickly. After that, you can launch RStudio like any other application on your system.

Variables

Variables are names used as identifiers for values and objects in R. You can use these identifiers to refer to and manipulate those values and objects later.

Keep in mind that R is an interpreted language that uses dynamic typing. This means you don't have to explicitly declare data types for your variables when you write your programs. The interpreter only checks types at run time, so you won't be notified of any typing problems until the offending code actually runs.
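The short sketch below illustrates both points: the same variable can hold values of different types, and a type mismatch is only reported when the line containing it executes. It uses base R's tryCatch() to capture the error message instead of stopping the script.

```r
# The same variable can hold values of different types over time
x <- 5
print(class(x))  # [1] "numeric"
x <- "five"
print(class(x))  # [1] "character"

# The type problem only surfaces when this expression is evaluated
msg <- tryCatch(
  x + 1,
  error = function(e) conditionMessage(e)
)
print(msg)  # [1] "non-numeric argument to binary operator"
```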

Variable names in R can contain letters, digits, periods, and underscores. A name must start with a letter, or with a period that isn't followed by a digit.

These are valid variable names in R:

var_1

site1.domain

a2022

And these names can't be used as variables:

_designer

2year

.2name

You can't use special characters, such as &, in your R variable names. There are also reserved keywords that can't be used as variable names: if, else, while, for, repeat, function, break, next, etc.
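One way to check these naming rules is base R's make.names(), which turns any string into a syntactically valid name (prepending "X" where needed and appending "." to reserved words). If the result equals the input, the input was already valid. The helper is_valid_name below is a hypothetical name used only for this sketch.

```r
# Hypothetical helper: a name is syntactically valid if make.names()
# returns it unchanged
is_valid_name <- function(x) make.names(x) == x

candidates <- c("var_1", "site1.domain", "a2022", "_designer", "2year", "if")
print(is_valid_name(candidates))
# [1]  TRUE  TRUE  TRUE FALSE FALSE FALSE
```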

It is recommended to use nouns for variable names. Make sure they are meaningful and concise so your code is easier to read and maintain. For example, site_1 and site_one are better choices than first_site or siteone.

Numbers

There are 3 number types in R: integer, complex, and numeric.

Numeric

When you enter a number into R, it is represented by the numeric type. You can verify this data type with the is.numeric() function. It takes in a value and returns TRUE or FALSE depending on whether that value is of the numeric type.

> is.numeric(2)
[1] TRUE
> is.numeric(2.5)
[1] TRUE

You can also use the class() function to find the data type of those values:

> class(2)
[1] "numeric"
> class(2.5)
[1] "numeric"
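Under the hood, R stores numeric values as double-precision floating-point numbers. The base typeof() function reports this storage type, which is why it says "double" where class() says "numeric":

```r
# class() reports "numeric", but typeof() shows the underlying storage
print(class(2))     # [1] "numeric"
print(typeof(2))    # [1] "double"
print(typeof(2.5))  # [1] "double"

# Integers (covered next) have their own storage type
print(typeof(2L))   # [1] "integer"
```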

Integer

Numeric is the default type for both whole numbers and decimals in R. You can check this with the is.integer() function, which returns FALSE even for numbers that other languages would treat as integers:

> is.integer(2)
[1] FALSE

However, there is also a dedicated type for integers. To explicitly use it to represent the value 2 above, you can use the as.integer() function:

> a = as.integer(2)
> is.integer(a)
[1] TRUE
> class(a)
[1] "integer"

The is.integer() and class() functions confirm that the value stored in a is of the integer type.

Another option is to use the L suffix. Append it to your number when making an assignment, and R will understand you want it to use the integer type to store the value:

> b = 2L
> is.integer(b)
[1] TRUE
> class(b)
[1] "integer"

If you are using RStudio, the L suffix is also how this IDE indicates the integer type of values currently stored in memory.

The as.integer() function still has advantages over the simpler L suffix. You can use it to coerce non-integer values, such as decimals, into integers. The L suffix can't do this: R only issues a warning and stores the value with the numeric type instead:

> c = as.integer(3.14)
> c
[1] 3
> class(c)
[1] "integer"
> d = 3.14L
Warning message:
integer literal 3.14L contains decimal; using numeric value 
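Note that as.integer() drops the fractional part (it truncates toward zero) rather than rounding. This sketch shows the difference and uses base R's round() when you want the nearest whole number instead:

```r
# as.integer() truncates toward zero
print(as.integer(3.9))   # [1] 3
print(as.integer(-3.9))  # [1] -3

# Round first if you want the nearest whole number
print(as.integer(round(3.9)))  # [1] 4
```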

This conversion is even possible when the argument is a string containing a decimal value:

> d = as.integer("5.6")
> d
[1] 5
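The string has to look like a number for this to work. When it doesn't, as.integer() returns NA and R emits a "NAs introduced by coercion" warning, which you can detect with is.na():

```r
# A string that isn't a number becomes NA (with a warning)
x <- as.integer("abc")
print(x)         # [1] NA
print(is.na(x))  # [1] TRUE
```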

Complex

If your analytics call for complex numbers, R supports them as well with the complex data type.

To construct them in R, you will need to add an imaginary part with the letter i:

> z = 3 + 4i
> is.complex(z)
[1] TRUE
> class(z)
[1] "complex"

You can also use the as.complex() function to convert any number to the complex type. When you provide only the real part, it adds an imaginary part of zero:

> z = as.complex(4)
> z
[1] 4+0i

You will need this coercion when, for example, you want the complex square root of a negative number:

> sqrt(-1)
[1] NaN
Warning message:
In sqrt(-1) : NaNs produced
> sqrt(as.complex(-1))
[1] 0+1i

In the above example, sqrt(-1) returns NaN with a warning because -1 is of the numeric type by default, and -1 has no real square root. Converting it to a complex number with as.complex() gives you the result you are looking for.
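Once you have a complex value, base R provides functions to take it apart. This sketch uses Re(), Im(), Mod(), and Conj() on the value 3 + 4i from earlier:

```r
z <- 3 + 4i
print(Re(z))    # [1] 3     (real part)
print(Im(z))    # [1] 4     (imaginary part)
print(Mod(z))   # [1] 5     (modulus: sqrt(3^2 + 4^2))
print(Conj(z))  # [1] 3-4i  (complex conjugate)
```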

Strings

R uses the character type to store textual values. Wrap any sequence of characters in single or double quotes to create a string literal, which you can then assign to a variable:

> str = "learnshareit"
> class(str)
[1] "character"

You can also confirm this data type with the is.character() function:

> is.character(str)
[1] TRUE

Unlike other languages, the character type in R doesn’t imply just a single character. You can use it to store any text.

The as.character() function converts values of other types to the character type. For example, this command converts 10 to a string and stores it in the variable grade. Without as.character(), R would store the value with the numeric type instead.

> grade = as.character(10)
> class(grade)
[1] "character"

You can use the paste() function to concatenate two character values:

> str1 = "learnshareit"
> str2 = ".com"
> paste(str1, str2)
[1] "learnshareit .com"
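As the output shows, paste() separates its arguments with a space by default. To join strings without a separator, set sep = "" or use paste0(), both part of base R:

```r
str1 <- "learnshareit"
str2 <- ".com"

# Default separator is a single space
print(paste(str1, str2))            # [1] "learnshareit .com"

# No separator
print(paste(str1, str2, sep = ""))  # [1] "learnshareit.com"
print(paste0(str1, str2))           # [1] "learnshareit.com"
```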

Loops

R's two most common looping constructs are the for and while statements.

for

Use a for loop when you want to iterate over the elements of a sequence (a vector of numbers, for example) and perform a task for each one.

This example illustrates the basics of for loops in R:

> for (i in 1:3) {
+ print(i)
+ }
[1] 1
[1] 2
[1] 3

In this case, the index i iterates through the vector 1:3 and prints the current value of i at each step. You can use any variable you want, not just i.
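A for loop can also accumulate a result across iterations. This sketch sums the squares of a small vector, then uses base R's seq_along() to loop over positions, which stays safe even when the vector is empty (unlike 1:length(x)):

```r
# Accumulate the sum of squares of a numeric vector
values <- c(2, 4, 6)
total <- 0
for (v in values) {
  total <- total + v^2
}
print(total)  # [1] 56

# Loop over positions instead of values
for (idx in seq_along(values)) {
  cat("Element", idx, "is", values[idx], "\n")
}
```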

while

You will need a while loop when your program needs to repeat a specific block of code as long as certain conditions are still satisfied.

For example, this piece of code starts i at 0, prints it, and increases it by one on each pass, stopping once i exceeds 5:

> i <- 0
> while (i <= 5) {
+ print(i)
+ i <- i + 1
+ }
[1] 0
[1] 1
[1] 2
[1] 3
[1] 4
[1] 5
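A while loop is most useful when you don't know in advance how many iterations you will need. This sketch counts how many times 1 must be doubled before it exceeds 1000:

```r
# The number of iterations isn't known beforehand, so a while loop fits
value <- 1
steps <- 0
while (value <= 1000) {
  value <- value * 2
  steps <- steps + 1
}
print(steps)  # [1] 10
print(value)  # [1] 1024
```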

Making Maps With R

R doesn't just compute and produce textual output; it can also make beautiful and informative maps. There are plenty of libraries you can use for creating maps in R.

First, you will need to install the necessary packages:

install.packages(c("rnaturalearth", "rnaturalearthdata"))

The rnaturalearth and rnaturalearthdata packages provide Natural Earth map data for every country in the world.

You will need to load these packages:

library("rnaturalearth")
library("rnaturalearthdata")

Then you can get data for the whole world or certain places with the ne_countries() function. This is how you can pull data for Asia:

spdf_asia = ne_countries(continent = 'asia')

Then use the plot() function to create a simple map out of that data:

plot(spdf_asia)
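Depending on your rnaturalearth version, ne_countries() may return an sf object (the default in recent releases) or an sp object (older releases). Calling plot() on an sf object draws one panel per attribute column, which may not be what you want. A minimal sketch, assuming the sf package is installed (recent rnaturalearth versions depend on it), requests sf output explicitly and plots only the country outlines:

```r
library("rnaturalearth")
library("rnaturalearthdata")
library("sf")

# Ask for an sf data frame explicitly
asia <- ne_countries(continent = "asia", returnclass = "sf")

# st_geometry() keeps only the shapes, so plot() draws a single map
plot(st_geometry(asia))
```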

Summary

Our tutorials for R should help you get the hang of the most fundamental concepts of this programming language. They can help you install R and get familiar with how basic data types work. You should follow them up with more tutorials on this site to write your first-ever R program.
