Side-by-Side Comparison of R, Python, and Julia

Author

Affiliation

Dr Wei Miao

UCL School of Management

Published

September 16, 2024

Modified

September 16, 2024

Tip

This tutorial is designed for those who are familiar with one of R, Python, or Julia and would like to learn another.

In this tutorial, I will compare the basics of R, Python, and Julia side by side. We will cover the basic syntax, data types, and common operations.

If you discover any mistakes or outdated content in this tutorial, please let me know. I will be very grateful for your feedback.

library(reticulate)
use_condaenv("base")
library(JuliaCall)

1 Language Basics

1.1 Assignment of variables

Caution

In R and Python, an assignment does not print the assigned object.

Julia, by contrast, prints the value of an assignment by default. If you put a semicolon ; at the end of the line, Julia suppresses this output.

# create an object x with value 3
x <- 3
x

[1] 3

# create an object x with value 3
x = 3
x

# create an object x with value 3
x = 3; # the ; suppresses the output

1.2 Comment codes

You can put a # before any code to indicate that everything after the # on the same line is a comment, which R will not run.

It’s good practice to comment your code often, so that your future self can remember what you were trying to achieve.

# Is x 1 or 2 below?
x <- 1 # +1

Same as R: everything after a # on the same line is a comment and will not be run by Python.

# Is x 1 or 2 below?
x = 1 # +1

Same as R and Python: everything after a # on the same line is a comment and will not be run by Julia.

# Is x 1 or 2 below?

x = 1 # +1

1.3 Rules for naming objects

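A valid variable name in R must follow these rules:

It may contain only letters, numbers, dots (.), and underscores (_).
It cannot start with a number (e.g., 2iota) or an underscore. It may start with a dot, as long as the dot is not followed by a number: .iota is valid (though hidden from ls()), but .2iota is not.

# 2iota <- 2
# .2iota <- 2
# _iota <- 2

It cannot be a reserved word in R (e.g., if, for, function, TRUE). Names of built-in functions such as mean or sum are technically allowed, but reusing them makes code confusing and is best avoided.

# if <- 2     # error: 'if' is a reserved word
# mean <- 2   # allowed, but best avoided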

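A valid variable name in Python must follow these rules:

It may contain only letters, numbers, and underscores (_); dots are not allowed.
It cannot start with a number (e.g., 2iota). A leading underscore is allowed (e.g., _iota), but by convention it marks a name as internal.

# 2iota = 2   # SyntaxError
# .iota = 2   # SyntaxError
_iota = 2     # valid

It cannot be a reserved keyword in Python (e.g., if, for, def, True). Built-in function names such as sum are allowed, but assigning to them shadows the built-in function, so avoid it.

# for = 2     # SyntaxError: 'for' is a keyword
# sum = 2     # allowed, but afterwards sum([1, 2]) raises a TypeError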

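Julia’s rules are closer to Python’s than to R’s. Names may contain letters (including many Unicode characters, such as α), numbers, underscores, and exclamation marks (!), but they cannot contain dots and cannot start with a number. Reserved keywords such as if, for, function, and end cannot be used as names.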

2 Packages and Functions

Base R already comes with many useful built-in functions for basic tasks, but as data scientists, we need more.

To perform certain tasks (such as training a machine learning model), we could write our own code from scratch, but that takes a lot of unnecessary effort. Fortunately, many packages written by others are available for us to use directly.

To install a package, go to Tools -> Install Packages in RStudio and type the package name in the pop-up window, or run install.packages("dplyr") in the console. Now, install the package dplyr.
To load a package, use library().

library(dplyr)


Attaching package: 'dplyr'

The following objects are masked from 'package:stats':

    filter, lag

The following objects are masked from 'package:base':

    intersect, setdiff, setequal, union

Now that the package is loaded, you can use its functions. filter() is a function in the dplyr package that keeps only the rows satisfying a condition.

data(iris)  # load the built-in iris dataset
iris %>%
  filter(Species == "setosa")

Python has the same concept: third-party libraries are distributed as packages, which are made up of modules.

To install a package, you can use pip install in the terminal, or %pip install (or !pip install) in a Jupyter Notebook. You can also install packages through Anaconda Navigator.

# !pip install pandas

To load a package, use import. Once it is imported, you can use its functions.

import pandas as pd
iris = pd.read_csv('https://raw.githubusercontent.com/mwaskom/seaborn-data/master/iris.csv') # load iris

iris[iris['species'] == 'setosa']

    sepal_length  sepal_width  petal_length  petal_width species
0            5.1          3.5           1.4          0.2  setosa
1            4.9          3.0           1.4          0.2  setosa
2            4.7          3.2           1.3          0.2  setosa
3            4.6          3.1           1.5          0.2  setosa
4            5.0          3.6           1.4          0.2  setosa
5            5.4          3.9           1.7          0.4  setosa
6            4.6          3.4           1.4          0.3  setosa
7            5.0          3.4           1.5          0.2  setosa
8            4.4          2.9           1.4          0.2  setosa
9            4.9          3.1           1.5          0.1  setosa
10           5.4          3.7           1.5          0.2  setosa
11           4.8          3.4           1.6          0.2  setosa
12           4.8          3.0           1.4          0.1  setosa
13           4.3          3.0           1.1          0.1  setosa
14           5.8          4.0           1.2          0.2  setosa
15           5.7          4.4           1.5          0.4  setosa
16           5.4          3.9           1.3          0.4  setosa
17           5.1          3.5           1.4          0.3  setosa
18           5.7          3.8           1.7          0.3  setosa
19           5.1          3.8           1.5          0.3  setosa
20           5.4          3.4           1.7          0.2  setosa
21           5.1          3.7           1.5          0.4  setosa
22           4.6          3.6           1.0          0.2  setosa
23           5.1          3.3           1.7          0.5  setosa
24           4.8          3.4           1.9          0.2  setosa
25           5.0          3.0           1.6          0.2  setosa
26           5.0          3.4           1.6          0.4  setosa
27           5.2          3.5           1.5          0.2  setosa
28           5.2          3.4           1.4          0.2  setosa
29           4.7          3.2           1.6          0.2  setosa
30           4.8          3.1           1.6          0.2  setosa
31           5.4          3.4           1.5          0.4  setosa
32           5.2          4.1           1.5          0.1  setosa
33           5.5          4.2           1.4          0.2  setosa
34           4.9          3.1           1.5          0.2  setosa
35           5.0          3.2           1.2          0.2  setosa
36           5.5          3.5           1.3          0.2  setosa
37           4.9          3.6           1.4          0.1  setosa
38           4.4          3.0           1.3          0.2  setosa
39           5.1          3.4           1.5          0.2  setosa
40           5.0          3.5           1.3          0.3  setosa
41           4.5          2.3           1.3          0.3  setosa
42           4.4          3.2           1.3          0.2  setosa
43           5.0          3.5           1.6          0.6  setosa
44           5.1          3.8           1.9          0.4  setosa
45           4.8          3.0           1.4          0.3  setosa
46           5.1          3.8           1.6          0.2  setosa
47           4.6          3.2           1.4          0.2  setosa
48           5.3          3.7           1.5          0.2  setosa
49           5.0          3.3           1.4          0.2  setosa

Julia has a similar concept of packages.

To install a package, you can use Pkg.add() in the Julia terminal.


using Pkg

Pkg.add("DataFrames")
Pkg.add("CSV")

To load a package, you can use using. Now that the package is loaded, you can use the functions in it.


using DataFrames, CSV

iris = CSV.File(download("https://raw.githubusercontent.com/mwaskom/seaborn-data/master/iris.csv")) |> DataFrame;

# Filter the DataFrame where species is "setosa"
setosa_data = iris[iris.species .== "setosa", :];

# Display the first few rows of the filtered data
first(setosa_data, 5)

5×5 DataFrame
 Row │ sepal_length  sepal_width  petal_length  petal_width  species
     │ Float64       Float64      Float64       Float64      String15
─────┼────────────────────────────────────────────────────────────────
   1 │          5.1          3.5           1.4          0.2  setosa
   2 │          4.9          3.0           1.4          0.2  setosa
   3 │          4.7          3.2           1.3          0.2  setosa
   4 │          4.6          3.1           1.5          0.2  setosa
   5 │          5.0          3.6           1.4          0.2  setosa

3 Arithmetic, Logical, and Relational Operations

3.1 Arithmetic operations

# arithmetic operations
x <- 3 
x + 1 # addition

[1] 4

x - 1 # subtraction

[1] 2

x * 2 # multiplication

[1] 6

x / 2 # division

[1] 1.5

x^2 # square

[1] 9

x %% 2 # remainder

[1] 1

x %/% 2 # integer division

[1] 1

# math operations
log(x)  # natural logarithm

[1] 1.098612

exp(x)  # exponential

[1] 20.08554

sqrt(x) # square root

[1] 1.732051

log10(x) # log base 10

[1] 0.4771213

round(x/2) # round

[1] 2

floor(x/2) # floor

[1] 1

ceiling(x/2) # ceiling

[1] 2

# arithmetic operations
x = 3
x + 1 # addition

x - 1 # subtraction

x * 2 # multiplication

x / 2 # division

1.5

x ** 2 # square

x % 2 # remainder

x // 2 # integer division

# math operations
import math
math.log(x)  # natural logarithm

1.0986122886681098

math.exp(x)  # exponential

20.085536923187668

math.sqrt(x) # square root

1.7320508075688772

math.log10(x) # log base 10

0.47712125471966244

round(x/2) # round

math.floor(x/2) # floor

math.ceil(x/2) # ceiling
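A detail worth knowing when comparing outputs: Python’s built-in round() breaks exact .5 ties by rounding to the nearest even number ("banker’s rounding"), so round(x/2) with x = 3 gives 2, but round(2.5) also gives 2 rather than 3. R’s round() and Julia’s round() follow the same round-half-to-even rule by default. A minimal check in Python:

```python
# Python's round() breaks exact .5 ties toward the even neighbour
halves = [0.5, 1.5, 2.5, 3.5]
rounded = [round(v) for v in halves]
print(rounded)  # [0, 2, 2, 4]
```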


# arithmetic operations

x = 3


x + 1 # addition


x - 1 # subtraction


x * 2 # multiplication


x / 2 # division

1.5


x ^ 2 # square


x % 2 # remainder


div(x, 2) # integer division


# math operations

log(x)  # natural logarithm

1.0986122886681098


exp(x)  # exponential

20.085536923187668


sqrt(x) # square root

1.7320508075688772


log10(x) # log base 10

0.47712125471966244


round(x/2) # round

2.0


floor(x/2) # floor

1.0


ceil(x/2) # ceiling

2.0
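The remainder and integer-division operators agree for positive numbers, but not always for negative ones. Python’s // rounds down (floors) and its % takes the sign of the divisor, which matches R’s %/% and %% (in R, -7 %/% 2 is -4 and -7 %% 2 is 1). Julia’s % is the truncated remainder rem(), and div() truncates toward zero, so in Julia -7 % 2 is -1 and div(-7, 2) is -3; fld() and mod() give the floored versions. A small Python sketch of the floored behavior:

```python
import math

a, b = -7, 2

# Floored integer division and modulo (same results as R's %/% and %%)
q = a // b   # floor(-3.5) -> -4
r = a % b    # result takes the sign of the divisor -> 1
print(q, r)  # -4 1

# For floored division, a == b * q + r always holds
assert a == b * q + r

# A truncated remainder, like Julia's -7 % 2 (i.e. rem(-7, 2)), is available via math.fmod
print(math.fmod(a, b))  # -1.0
```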

3.2 Relational operations

# relational operations
x <- 3
x > 2 # larger than

[1] TRUE

x < 2 # smaller than

[1] FALSE

x == 2 # equal to

[1] FALSE

x != 2 # not equal to

[1] TRUE

# relational operations
x = 3
x > 2 # larger than

True

x < 2 # smaller than

False

x == 2 # equal to

False

x != 2 # not equal to

True


# relational operations

x = 3


x > 2 # larger than

true


x < 2 # smaller than

false


x == 2 # equal to

false


x != 2 # not equal to

true

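3.3 Logical operations

Caution

R: Boolean values are TRUE and FALSE (T and F are shorthands, but they can be overwritten, so TRUE and FALSE are safer).
Python: Boolean values are True and False (case-sensitive).
Julia: Boolean values are true and false (lowercase).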

T & F # and

[1] FALSE

T | F # or

[1] TRUE

!T # not

[1] FALSE

True and False # and (& also works on booleans and is the element-wise version for numpy arrays)

False

True or False # or (| likewise)

True

not True # not

False


true & false # and

false


true | false # or

true


!true # not

false

4 Vectors

4.1 Creating vectors

In R, a vector is a collection of elements of the same data type, which is often used to store a variable of a dataset. For instance, a vector can store the income of a group of people, the final grades of students, etc.
A vector can be created with the function c() by listing all the values inside the parentheses, separated by commas ‘,’.
c() stands for “combine”.

Income <- c(1, 3, 5, 10)
Income

[1]  1  3  5 10

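Vectors must contain elements of the same data type. If you mix types, R automatically coerces all elements to a common type, following the hierarchy logical < integer < double < character, so mixing numbers with a string gives a character vector.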

Income <- c(1, 3, 5, "10")
Income

[1] "1"  "3"  "5"  "10"

In Python, a list is an ordered collection of elements, which can be of different data types. Lists are often used to store a variable of a dataset; for instance, a list can store the income of a group of people, the final grades of students, etc.
A list is created with square brackets [] by listing all the values inside the brackets, separated by commas ‘,’.

Income = [1, 3, 5, 10]
Income

[1, 3, 5, 10]

A list can contain elements of different data types; unlike R, Python does not convert them to a common type.

Income = [1, 3, 5, "10"]
Income

[1, 3, 5, '10']

If you want an R-like vector whose elements all share the same numeric data type, you can use a numpy array.

import numpy as np
Income = np.array([1, 3, 5, 10])
Income

array([ 1,  3,  5, 10])
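numpy arrays resemble R vectors in another respect: every element must share one dtype, so mixing numbers with a string coerces everything to strings, much as c(1, 3, 5, "10") does in R. A quick sketch (the exact string dtype printed may vary by platform):

```python
import numpy as np

# Mixing numbers and a string: numpy picks one common dtype for all elements
mixed = np.array([1, 3, 5, "10"])
print(mixed)        # every element is now a string
print(mixed.dtype)  # a Unicode string dtype (e.g. <U21)
```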

In Julia, a vector is a one-dimensional array, which is often used to store a variable of a dataset. For instance, a vector can store the income of a group of people, the final grades of students, etc.
A vector is created with square brackets [] by listing all the values inside the brackets, separated by commas ‘,’.


Income = [1, 3, 5, 10]

4-element Vector{Int64}:
  1
  3
  5
 10

A vector can also contain elements of different data types. Note, however, that the element type is now Any rather than Int64; unlike R, Julia does not convert the numbers to strings.


Income = [1, 3, 5, "10"]

4-element Vector{Any}:
 1
 3
 5
  "10"

4.2 Indexing and subsetting

Caution

R, Python, and Julia have different indexing rules.

In R and Julia, the index starts from 1.
In Python, the index starts from 0.

To extract an element from a vector, we put the index of the element inside square brackets [ ].

Income <- c(1, 3, 5, 10)
Income[1] # extract the first element

[1] 1

If we want to extract multiple elements, we can use a vector of indices.

Income[c(1,3)] # extract the first and third elements

[1] 1 5

To extract an element from a list, we put the index of the element inside square brackets [ ].

Income = [1, 3, 5, 10]
Income[0] # extract the first element

If we want to extract multiple consecutive elements, we can use a slice start:stop. Note that the element at position stop is excluded.

Income[0:3] # extract the first three elements (indices 0, 1, 2)

[1, 3, 5]
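A plain slice start:stop always selects a contiguous run. To pick non-adjacent elements from a plain list, such as the first and third, one option is a slice with a step (start:stop:step); another is a list comprehension over the wanted indices:

```python
Income = [1, 3, 5, 10]

# Slice with a step: start at 0, stop before 3, take every 2nd element
first_third = Income[0:3:2]
print(first_third)  # [1, 5]

# List comprehension over explicit (zero-based) indices, closest to R's Income[c(1, 3)]
picked = [Income[i] for i in [0, 2]]
print(picked)       # [1, 5]
```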

With numpy arrays, we can also pass a list of indices, similar to R’s Income[c(1,3)], but the indices are still zero-based.

Income = np.array([1, 3, 5, 10])
Income[0] # extract the first element

Income[[0,2]] # extract the first and third elements

array([1, 5])

To extract an element from a vector, we put the index of the element inside square brackets [ ].


Income = [1, 3, 5, 10];

Income[1] # extract the first element

If we want to extract multiple consecutive elements, we can use a range. Unlike Python slices, Julia ranges include both end points.


Income[1:3] # extract the first three elements

3-element Vector{Int64}:
 1
 3
 5

4.3 Creating numeric sequences with fixed steps

It is also easy to create sequences with regular patterns.

Use seq() to create a sequence with a fixed step:

# use seq()
seq(from = 1, to = 2, by = 0.1)

 [1] 1.0 1.1 1.2 1.3 1.4 1.5 1.6 1.7 1.8 1.9 2.0

If the step is 1, there’s a convenient way using :

1:5

[1] 1 2 3 4 5

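In base Python, we can use range() to create a sequence of integers with a fixed step. Note that range() only accepts integers and excludes the end point.

# from 1 up to (but not including) 6, with step 1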
list(range(1, 6)) # range() returns a range object, we need to convert it to a list

[1, 2, 3, 4, 5]

Use np.arange() to create a sequence with a non-integer step. Like range(), it excludes the end point, so 2.0 is not included below.

np.arange(1, 2, 0.1)

array([1. , 1.1, 1.2, 1.3, 1.4, 1.5, 1.6, 1.7, 1.8, 1.9])
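Because np.arange() stops before the end point, and floating-point steps can accumulate rounding error, np.linspace() is often a safer way to reproduce R’s seq(from = 1, to = 2, by = 0.1): it takes the number of points rather than the step, and includes both ends. A minimal sketch:

```python
import numpy as np

# 11 evenly spaced points from 1 to 2; unlike np.arange(), both ends are included
seq = np.linspace(1, 2, num=11)
print(seq)
print(seq[-1])  # 2.0
```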

In Julia, a:b creates a sequence with step 1, and a:s:b creates one with step s (e.g., 1:0.1:2).

1:5

1:5

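Note, however, that the Julia object is not an integer vector but a UnitRange{Int64} object, which only stores its start and end points. Use collect(1:5) to convert it into a Vector{Int64}.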


typeof(1:5)

UnitRange{Int64}

4.4 Combine multiple vectors into one: c()

Sometimes, we may want to combine multiple vectors into one. For instance, we may have collected income data from two different sources, and we want to combine them into one vector.
We can use c() to combine different vectors; this is a very common way to concatenate vectors in R.

Income1 <- 1:3 
Income2 <- c(10, 15)

c(Income1,Income2)

[1]  1  2  3 10 15

In Python, we can use the + operator to concatenate lists.

Income1 = [1, 2, 3]
Income2 = [10, 15]

Income1 + Income2

[1, 2, 3, 10, 15]

For numpy arrays, we can use np.concatenate() to concatenate arrays.

Income1 = np.array([1, 2, 3])
Income2 = np.array([10, 15])

np.concatenate((Income1, Income2))

array([ 1,  2,  3, 10, 15])

In Julia, we can use the vcat() function to concatenate vectors.


Income1 = [1, 2, 3];

Income2 = [10, 15]; 

vcat(Income1, Income2)

5-element Vector{Int64}:
  1
  2
  3
 10
 15

4.5 Replicating elements

We can use the rep() function to replicate elements in a vector.

rep(1:3, times = 2) # replicate 1:3 twice

[1] 1 2 3 1 2 3

rep(1:3, each = 2) # replicate each element in 1:3 twice

[1] 1 1 2 2 3 3

We can use the * operator to repeat a whole list.

[1, 2, 3] * 2 # replicate [1, 2, 3] twice

[1, 2, 3, 1, 2, 3]
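Base Python has no built-in counterpart to rep(1:3, each = 2) for plain lists. One common idiom is a nested list comprehension that repeats each element before moving on to the next:

```python
values = [1, 2, 3]

# Repeat each element twice, like rep(1:3, each = 2) in R
each_twice = [v for v in values for _ in range(2)]
print(each_twice)  # [1, 1, 2, 2, 3, 3]
```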

For numpy arrays, np.tile() repeats the whole array, while np.repeat() repeats each element.

np.tile([1, 2, 3], 2) # replicate [1, 2, 3] twice

array([1, 2, 3, 1, 2, 3])

np.repeat([1, 2, 3], 2) # replicate each element in [1, 2, 3] twice

array([1, 1, 2, 2, 3, 3])

We can use the repeat() function to replicate elements in a vector.


repeat([1, 2, 3], 2) # replicate 1:3 twice

6-element Vector{Int64}:
 1
 2
 3
 1
 2
 3


repeat([1, 2, 3], inner = 2) # replicate each element in 1:3 twice

6-element Vector{Int64}:
 1
 1
 2
 2
 3
 3

4.6 Maximum and minimum

We can use the max() and min() functions to find the maximum and minimum values in a vector.

Income <- c(1, 3, 5, 10)

max(Income) # maximum

[1] 10

min(Income) # minimum

[1] 1

We can use the max() and min() functions to find the maximum and minimum values in a list.

Income = [1, 3, 5, 10]

max(Income) # maximum

min(Income) # minimum

For numpy arrays, we can use np.max() and np.min() to find the maximum and minimum values.

Income = np.array([1, 3, 5, 10])

np.max(Income) # maximum

np.min(Income) # minimum

We can use the maximum() and minimum() functions to find the maximum and minimum values in a vector.


Income = [1, 3, 5, 10];

maximum(Income) # maximum


minimum(Income) # minimum

4.7 Sum and mean

We can use the sum() and mean() functions to find the sum and mean values in a vector.

Income <- c(1, 3, 5, 10)

sum(Income, na.rm = T) # sum and remove missing values

[1] 19

mean(Income, na.rm = T) # mean and remove missing values

[1] 4.75

In base Python, sum() gives the sum of a list, but there is no built-in mean() function; here we use np.mean() from numpy.

Income = [1, 3, 5, 10]

sum(Income) # sum

np.mean(Income) # mean

4.75
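If you prefer not to import numpy just for an average, the mean of a plain list can be computed with sum() and len(), or with statistics.mean() from Python’s standard library:

```python
import statistics

Income = [1, 3, 5, 10]

manual_mean = sum(Income) / len(Income)   # 19 / 4
stdlib_mean = statistics.mean(Income)     # standard-library mean

print(manual_mean, stdlib_mean)  # 4.75 4.75
```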

For numpy arrays, we can use np.sum() and np.mean() to find the sum and mean values.

Income = np.array([1, 3, 5, 10])

np.sum(Income) # sum

np.mean(Income) # mean

4.75

sum() is available in base Julia, while mean() comes from the Statistics standard library, which must be loaded with using Statistics.


using Statistics

Income = [1, 3, 5, 10];

sum(Income) # sum


mean(Income) # mean

4.75

4.8 Missing values

Caution

In R, missing values are represented by NA.
In Python, missing values are represented by np.nan.
In Julia, missing values are represented by missing.

In R, missing values are represented by NA.

Income <- c(1, 3, 5, NA)

sum(Income, na.rm = T) # sum and remove missing values

[1] 9

mean(Income, na.rm = T) # mean and remove missing values

[1] 3

In Python, missing values are represented by np.nan.

Income = [1, 3, 5, np.nan]

np.nansum(Income) # sum and remove missing values

9.0

np.nanmean(Income) # mean and remove missing values

3.0
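The nan-prefixed functions matter because a plain np.sum() propagates the missing value and returns nan. If your data is in pandas, Series.sum() and Series.mean() skip NaN by default (skipna=True), which is closer to R’s na.rm = TRUE. A short sketch:

```python
import numpy as np
import pandas as pd

Income = [1, 3, 5, np.nan]

print(np.sum(Income))       # nan: the missing value propagates

s = pd.Series(Income)
print(s.sum(), s.mean())    # 9.0 3.0: NaN is skipped by default
print(s.sum(skipna=False))  # nan: behaves like np.sum
```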

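In Julia, missing values are represented by missing. To take the sum or mean while ignoring missing values, wrap the vector in skipmissing():


Income = [1, 3, 5, missing];

sum(skipmissing(Income)) # sum and remove missing values

using Statistics
mean(skipmissing(Income)) # mean and remove missing values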

4.9 Element-wise arithmetic operations

Caution

R applies arithmetic operations element-wise on vectors by default.
Python does not support element-wise arithmetic on lists by default. You need to use numpy arrays for element-wise operations.
Julia does not apply most operations element-wise automatically. You need to use the dot syntax (e.g., .+ and .*) to broadcast an operation over each element.

If you combine a vector with a single number, the operation is applied to every element of the vector.

Income <- c(1, 3, 5, 10)

Income + 2 # element-wise addition

[1]  3  5  7 12

Income * 2 # element-wise multiplication

[1]  2  6 10 20

However, base Python does not support element-wise operations on lists: adding a number raises an error, and multiplying by a number repeats the list.

Income = [1, 3, 5, 10]

Income + 2 # element-wise addition

TypeError: can only concatenate list (not "int") to list

Income * 2 # element-wise multiplication

[1, 3, 5, 10, 1, 3, 5, 10]
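To get element-wise results from a plain list without numpy, one idiomatic option is a list comprehension, which applies the operation to each element explicitly:

```python
Income = [1, 3, 5, 10]

plus_two = [v + 2 for v in Income]    # element-wise addition
times_two = [v * 2 for v in Income]   # element-wise multiplication

print(plus_two)   # [3, 5, 7, 12]
print(times_two)  # [2, 6, 10, 20]
```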

For numpy arrays, the behavior is the same as R.

Income = np.array([1, 3, 5, 10])

Income + 2 # element-wise addition

array([ 3,  5,  7, 12])

Income * 2 # element-wise multiplication

array([ 2,  6, 10, 20])

In Julia, to apply an operation between a vector and a single number to every element, put a dot before the operator (broadcasting). Without the dot, Income + 2 raises a MethodError. (Income * 2 happens to work without a dot, because multiplying an array by a scalar is defined, but the dot form keeps element-wise intent explicit.)


Income = [1, 3, 5, 10];

Income .+ 2 # element-wise addition

4-element Vector{Int64}:
  3
  5
  7
 12


Income .* 2 # element-wise multiplication

4-element Vector{Int64}:
  2
  6
 10
 20

4.10 Vector multiplication

If two vectors have the same length, we can apply element-wise operations to them, such as element-wise addition and element-wise multiplication.

Income1 <- c(1, 3, 5, 10)

Income2 <- c(2, 4, 6, 8)

Income1 + Income2 # element-wise addition

[1]  3  7 11 18

Income1 * Income2 # element-wise multiplication

[1]  2 12 30 80

For numpy arrays, the + and * operators already work element-wise; the equivalent functions are np.add() and np.multiply().

Income1 = np.array([1, 3, 5, 10])

Income2 = np.array([2, 4, 6, 8])

np.add(Income1, Income2) # element-wise addition

array([ 3,  7, 11, 18])

np.multiply(Income1, Income2) # element-wise multiplication

array([ 2, 12, 30, 80])
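For two plain lists, zip() pairs up corresponding elements, so a list comprehension over zip() gives the same element-wise results as the numpy functions above:

```python
Income1 = [1, 3, 5, 10]
Income2 = [2, 4, 6, 8]

# zip() pairs the i-th elements of both lists
added = [a + b for a, b in zip(Income1, Income2)]
multiplied = [a * b for a, b in zip(Income1, Income2)]

print(added)       # [3, 7, 11, 18]
print(multiplied)  # [2, 12, 30, 80]
```

Note that zip() silently stops at the end of the shorter list (unless you pass strict=True, available from Python 3.10), whereas R recycles the shorter vector and warns if the lengths are not multiples of each other.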

If two vectors have the same length, we can apply element-wise operations to them, such as element-wise addition and element-wise multiplication.


Income1 = [1, 3, 5, 10];

Income2 = [2, 4, 6, 8];

Income1 .+ Income2 # element-wise addition

4-element Vector{Int64}:
  3
  7
 11
 18


Income1 .* Income2 # element-wise multiplication

4-element Vector{Int64}:
  2
 12
 30
 80

4.11 Element-wise max and min of two vectors

We can use the pmax() and pmin() functions to find the element-wise maximum and minimum values of two vectors.

Income1 <- c(1, 3, 5, 10)

Income2 <- c(2, 4, 6, 8)

pmax(Income1, Income2) # element-wise maximum

[1]  2  4  6 10

pmin(Income1, Income2) # element-wise minimum

[1] 1 3 5 8

We can use the np.maximum() and np.minimum() functions to find the element-wise maximum and minimum values of two numpy arrays.

Income1 = np.array([1, 3, 5, 10])

Income2 = np.array([2, 4, 6, 8])

np.maximum(Income1, Income2) # element-wise maximum

array([ 2,  4,  6, 10])

np.minimum(Income1, Income2) # element-wise minimum

array([1, 3, 5, 8])

We can broadcast the max() and min() functions with the dot syntax, max.() and min.(), to find the element-wise maximum and minimum values of two vectors.


Income1 = [1, 3, 5, 10];

Income2 = [2, 4, 6, 8];

max.(Income1, Income2) # element-wise maximum

4-element Vector{Int64}:
  2
  4
  6
 10


min.(Income1, Income2) # element-wise minimum

4-element Vector{Int64}:
 1
 3
 5
 8

5 Character and String

5.1 Creating strings

Strings (called characters in R) are enclosed within a pair of quotation marks.
Single and double quotation marks both work.
Even if a string contains numbers, it is treated as text, and R will not perform any mathematical operations on it.

str1 <- "1 + 1 = 2"

Strings are enclosed within a pair of quotation marks.
Single and double quotation marks both work.

str1 = "1 + 1 = 2"

In Julia, single quotation marks (') are used for defining individual characters. Double quotation marks (") are used for defining strings.


character1 = '1'

'1': ASCII/Unicode U+0031 (category Nd: Number, decimal digit)

str1 = "1 + 1 = 2"

"1 + 1 = 2"

5.2 Concatenating strings

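We can use the paste() function to concatenate strings. By default, paste() separates its arguments with a space (hence the double space in the output below); use paste0() or sep = "" to join them without a separator.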

str1 <- "1 + 1 = "
str2 <- "2"

paste(str1, str2)

[1] "1 + 1 =  2"

We can use the + operator to concatenate strings.

str1 = "1 + 1 = "
str2 = "2"

str1 + str2

'1 + 1 = 2'
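To mimic R’s paste() more closely, str.join() takes the separator explicitly: " ".join() behaves like paste() with its default space separator, and "".join() like paste0(). f-strings are another common way to build strings:

```python
str1 = "1 + 1 = "
str2 = "2"

like_paste = " ".join([str1, str2])   # separator " ", as in paste(str1, str2)
like_paste0 = "".join([str1, str2])   # no separator, as in paste0(str1, str2)
with_fstring = f"{str1}{str2}"        # f-string interpolation

print(repr(like_paste))   # '1 + 1 =  2' (double space, as in R's paste())
print(repr(like_paste0))  # '1 + 1 = 2'
```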

We can use the * operator to concatenate strings.


str1 = "1 + 1 = "

"1 + 1 = "


str2 = "2"

"2"


str1 * str2

"1 + 1 = 2"

5.3 Checking the number of elements in a vector: length()

You can get the number of elements in a vector using length()

x <- c('R',' is', ' the', ' best', ' language')
length(x)

[1] 5

You can measure the length of a list using the command len()

x = ['R',' is', ' the', ' best', ' language']

len(x)

For numpy arrays, len() also works, and the shape attribute gives the size along each dimension.

x = np.array(['Python',' is', ' the', ' best', ' language'])

x.shape

(5,)

You can get the number of elements in a vector using length()


x = ["Julia", " is", " the", " best", " language"]

5-element Vector{String}:
 "Julia"
 " is"
 " the"
 " best"
 " language"


length(x)

5.4 Special relational operation: `%in%`

R has a special relational operator, %in%, which tests whether an element is contained in an object.

x <- c(1,3,8,7) 

3 %in% x

[1] TRUE

2 %in% x

[1] FALSE

In Python, we can use the in operator to test whether an element exists in the object.

x = [1, 3, 8, 7]

3 in x

True

2 in x

False
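R’s %in% is vectorized: c(2, 3) %in% x returns one logical value per element on the left. Python’s in operator tests a single value only; for an element-wise check, numpy provides np.isin():

```python
import numpy as np

x = [1, 3, 8, 7]

# One True/False per element of the first argument, like c(2, 3) %in% x in R
result = np.isin([2, 3], x)
print(result)  # False for 2, True for 3
```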

In Julia, we can use the in operator to test whether an element exists in the object.


x = [1, 3, 8, 7];

3 in x

true

6 Matrices

6.1 Matrices: creating matrices

Caution

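When you create a matrix in R with matrix(), the elements are filled column by column by default. This is known as column-major order (use byrow = TRUE to fill by row instead).

When you create a matrix in Python with np.array() from a list of lists, each inner list becomes a row, so the elements are laid out row by row. This is known as row-major order, which is also numpy’s default when reshaping. Julia, like R, stores matrices in column-major order, although its literal syntax [1 2 3; 4 5 6; 7 8 9] is written row by row.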

A matrix can be created using the command matrix():
- the first argument is the vector to be converted into a matrix
- the second argument is the number of rows
- the last argument is the number of columns

matrix(1:9, nrow = 3, ncol = 3)

     [,1] [,2] [,3]
[1,]    1    4    7
[2,]    2    5    8
[3,]    3    6    9

In Python, a matrix can be created with numpy’s np.array() function, passing a list of lists in which each inner list is one row of the matrix.

import numpy as np

np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])

array([[1, 2, 3],
       [4, 5, 6],
       [7, 8, 9]])
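To reproduce R’s matrix(1:9, nrow = 3, ncol = 3) exactly in numpy, one option is to reshape a sequence with order="F" (Fortran, i.e. column-major), which fills the matrix column by column as R does; the default order="C" fills it row by row:

```python
import numpy as np

seq = np.arange(1, 10)                     # 1, 2, ..., 9

by_column = seq.reshape(3, 3, order="F")   # column-major, like R's matrix()
by_row = seq.reshape(3, 3)                 # default row-major (order="C")

print(by_column)  # rows: [1 4 7], [2 5 8], [3 6 9]
print(by_row)     # rows: [1 2 3], [4 5 6], [7 8 9]
```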

In base Julia, a matrix can be created with square brackets [], separating the elements of a row with spaces and separating rows with semicolons ;.


[1 2 3; 4 5 6; 7 8 9]

3×3 Matrix{Int64}:
 1  2  3
 4  5  6
 7  8  9

6.2 Creating matrices: combine matrices

We can use cbind() and rbind() to concatenate vectors and matrices into new matrices.

cbind() does the column binding

a <- matrix(1:6, nrow = 2, ncol = 3)

a

     [,1] [,2] [,3]
[1,]    1    3    5
[2,]    2    4    6

cbind(a, a) # column bind

     [,1] [,2] [,3] [,4] [,5] [,6]
[1,]    1    3    5    1    3    5
[2,]    2    4    6    2    4    6

rbind() does the row binding

rbind(a, a) # row bind

     [,1] [,2] [,3]
[1,]    1    3    5
[2,]    2    4    6
[3,]    1    3    5
[4,]    2    4    6

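We can use np.concatenate() to concatenate arrays: axis = 1 binds columns and axis = 0 binds rows (np.hstack() and np.vstack() are shortcuts for the same operations).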

a = np.array([[1, 2, 3], [4, 5, 6]])

a

array([[1, 2, 3],
       [4, 5, 6]])

np.concatenate((a, a), axis = 1) # column bind

array([[1, 2, 3, 1, 2, 3],
       [4, 5, 6, 4, 5, 6]])

np.concatenate((a, a), axis = 0) # row bind

array([[1, 2, 3],
       [4, 5, 6],
       [1, 2, 3],
       [4, 5, 6]])

We can use the hcat() and vcat() functions to concatenate matrices horizontally (column bind) and vertically (row bind); the shorthands [a a] and [a; a] do the same.


a = [1 2 3; 4 5 6]

2×3 Matrix{Int64}:
 1  2  3
 4  5  6


hcat(a, a) # column bind

2×6 Matrix{Int64}:
 1  2  3  1  2  3
 4  5  6  4  5  6


vcat(a, a) # row bind

4×3 Matrix{Int64}:
 1  2  3
 4  5  6
 1  2  3
 4  5  6

6.3 Matrices: indexing and subsetting

Matrices have two dimensions: rows and columns. Therefore, to extract elements from a matrix, we just need to specify which row(s) and which column(s) we want.

x <- matrix(1:9, nrow = 3, ncol = 3)
x

     [,1] [,2] [,3]
[1,]    1    4    7
[2,]    2    5    8
[3,]    3    6    9

Extract the element in the 2nd row, 3rd column.
- use square brackets with a comma inside [ , ] to indicate subsetting; the argument before the comma is the row index, and the argument after the comma is the column index.
  - 2 is specified for the row index, so we will extract elements from the second row
  - 3 is specified for the column index, so we will extract elements from the third column
  - Altogether, we extract a single element in row 2, column 3.

x[2,3] # the element in the 2nd row, 3rd column

[1] 8

If we leave an index blank, we extract all elements along that dimension.
- to take out the entire first row:
  - 1 is specified for the row index
  - the column index is left blank

x[1,] # all elements in the first row

[1] 1 4 7

x = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])

x

array([[1, 2, 3],
       [4, 5, 6],
       [7, 8, 9]])

Extract the element in the 2nd row, 3rd column. Remember that Python indices start from 0, so these are indices 1 and 2.

x[1,2] # the element in the 2nd row, 3rd column

Unlike R, we cannot leave an index blank; instead, we use : to select all elements along that dimension.

x[0,:] # all elements in the first row

array([1, 2, 3])


x = [1 2 3; 4 5 6; 7 8 9];

Extract the element in the 2nd row, 3rd column.


x[2,3] # the element in the 2nd row, 3rd column

Unlike R, we need to use : to extract all elements along that dimension.


x[1,:] # all elements in the first row

3-element Vector{Int64}:
 1
 2
 3

6.4 Matrices: check dimensions and variable types

You can check the dimensions of a matrix using dim(), or nrow() and ncol()

x <- matrix(1:9, nrow = 3, ncol = 3)

dim(x)

[1] 3 3

nrow(x)

[1] 3

ncol(x)

[1] 3

You can inspect the data type and structure using str()

str(x)

 int [1:3, 1:3] 1 2 3 4 5 6 7 8 9

You can verify the size of the matrix using the shape attribute

x = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])

x.shape

(3, 3)

You can get the data type info using the dtype attribute

x.dtype

dtype('int64')

You can verify the size of the matrix using the size() function


x = [1 2 3; 4 5 6; 7 8 9]

3×3 Matrix{Int64}:
 1  2  3
 4  5  6
 7  8  9


size(x)

(3, 3)

6.5 Matrices: special operations

6.5.1 Creating a diagonal matrix

We can use the diag() function to create a diagonal matrix.

diag(1:3)

     [,1] [,2] [,3]
[1,]    1    0    0
[2,]    0    2    0
[3,]    0    0    3

We can use the np.diag() function to create a diagonal matrix.

np.diag([1, 2, 3])

array([[1, 0, 0],
       [0, 2, 0],
       [0, 0, 3]])

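We can use the diagm() function from the LinearAlgebra standard library to create a diagonal matrix; 0 => [1, 2, 3] places the vector on the main diagonal (offset 0).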

using LinearAlgebra
diagm(0 => [1, 2, 3])

3×3 Matrix{Int64}:
 1  0  0
 0  2  0
 0  0  3

6.5.2 Creating an identity matrix

Passing a single number n to diag() creates an n × n identity matrix.

diag(3)

     [,1] [,2] [,3]
[1,]    1    0    0
[2,]    0    1    0
[3,]    0    0    1

We can use the np.eye() function to create an identity matrix.

np.eye(3)

array([[1., 0., 0.],
       [0., 1., 0.],
       [0., 0., 1.]])

We can use I from the LinearAlgebra standard library (loaded above) to create an identity matrix; I(3) returns a 3×3 Diagonal matrix of Bool values.


I(3)

3×3 Diagonal{Bool, Vector{Bool}}:
 1  ⋅  ⋅
 ⋅  1  ⋅
 ⋅  ⋅  1

6.6 Matrices’ operations: matrix addition and multiplication

If two matrices have the same dimensions, we can apply element-wise operations to them, such as element-wise addition and element-wise multiplication.

set.seed(123)

x = matrix(rnorm(9), nrow = 3, ncol = 3)

z = matrix(rnorm(9), nrow = 3, ncol = 3)

x + z   # elementwise addition

           [,1]      [,2]       [,3]
[1,] -1.0061376 0.4712798  2.2478293
[2,]  0.9939043 0.2399705 -0.7672108
[3,]  1.9185221 1.1592239 -2.6534700

x * x   # elementwise multiplication

           [,1]        [,2]      [,3]
[1,] 0.31413295 0.004971433 0.2124437
[2,] 0.05298168 0.016715318 1.6003799
[3,] 2.42957161 2.941447909 0.4717668

If we want to perform matrix multiplication as in linear algebra, we need to use %*%
- x and y must have conforming dimensions: the number of columns of x must equal the number of rows of y

x

           [,1]       [,2]       [,3]
[1,] -0.5604756 0.07050839  0.4609162
[2,] -0.2301775 0.12928774 -1.2650612
[3,]  1.5587083 1.71506499 -0.6868529

y = matrix(rnorm(9), nrow = 3, ncol = 3)
x %*% y # matrix multiplication

           [,1]       [,2]       [,3]
[1,] -0.9186059 -0.2861301  0.6175429
[2,]  1.1282999  0.8396152 -1.1340507
[3,]  1.0157790 -1.5987826 -4.4424790

If two matrices have the same dimensions, we can apply element-wise operations to them, such as element-wise addition and element-wise multiplication.

x = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])

y = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])

x + y # elementwise addition

array([[ 2,  4,  6],
       [ 8, 10, 12],
       [14, 16, 18]])

x * y # elementwise multiplication

array([[ 1,  4,  9],
       [16, 25, 36],
       [49, 64, 81]])

If we want to perform matrix multiplication as in linear algebra, we need to use @
- x and y must have conforming dimensions: the number of columns of x must equal the number of rows of y

x @ y # matrix multiplication

array([[ 30,  36,  42],
       [ 66,  81,  96],
       [102, 126, 150]])
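The difference between * and @, and the conforming-dimensions rule, can be checked directly. Below is a minimal NumPy sketch. The arrays of ones with shapes (3, 2), (2, 4) and (3, 4) are arbitrary shapes chosen for illustration. A non-conforming product raises a ValueError rather than returning a result.

```python
import numpy as np

x = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])

# * multiplies entry by entry; @ computes row-by-column dot products
elementwise = x * x
matrix_product = x @ x
print(elementwise[0, 0], matrix_product[0, 0])  # 1 30

# Conforming dimensions: (3 x 2) @ (2 x 4) gives a 3 x 4 result
a = np.ones((3, 2))
b = np.ones((2, 4))
print((a @ b).shape)  # (3, 4)

# Non-conforming dimensions: a has 2 columns but the right operand has 3 rows
try:
    a @ np.ones((3, 4))
except ValueError as err:
    print("ValueError:", err)
```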

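If two matrices have the same dimensions, we can apply element-wise operations to them. In Julia, + and - already work element-wise on same-size matrices, but * means matrix multiplication (x * y is the linear-algebra product). It is therefore recommended to use the dot operators, such as .+ and .*, to mark element-wise operations explicitly.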


x = [1 2 3; 4 5 6; 7 8 9]

3×3 Matrix{Int64}:
 1  2  3
 4  5  6
 7  8  9


y = [1 2 3; 4 5 6; 7 8 9]

3×3 Matrix{Int64}:
 1  2  3
 4  5  6
 7  8  9


x .+ y # elementwise addition

3×3 Matrix{Int64}:
  2   4   6
  8  10  12
 14  16  18

6.7 Matrices’ operations: inverse and transpose

We use t() to transpose a matrix.

x = matrix(rnorm(9), nrow = 3, ncol = 3)
x

           [,1]       [,2]      [,3]
[1,]  0.1533731  0.4264642 0.8781335
[2,] -1.1381369 -0.2950715 0.8215811
[3,]  1.2538149  0.8951257 0.6886403

t(x) # transpose

          [,1]       [,2]      [,3]
[1,] 0.1533731 -1.1381369 1.2538149
[2,] 0.4264642 -0.2950715 0.8951257
[3,] 0.8781335  0.8215811 0.6886403

We use solve() to get the inverse of a matrix; the matrix must be square and non-singular.

solve(t(x)%*%x) # inverse; must be on a square matrix

          [,1]      [,2]      [,3]
[1,]  417.2893 -803.5341  299.4938
[2,] -803.5341 1548.5735 -577.2074
[3,]  299.4938 -577.2074  215.6665

We use the .T attribute to transpose a NumPy array.

x = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])

x

array([[1, 2, 3],
       [4, 5, 6],
       [7, 8, 9]])

x.T # transpose

array([[1, 4, 7],
       [2, 5, 8],
       [3, 6, 9]])

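We use np.linalg.inv() to get the inverse of a matrix. Note that this particular x is singular because its rows are linearly dependent, so x.T @ x has no true inverse. The huge entries below are floating-point artifacts rather than a real inverse. For other singular inputs, NumPy raises a LinAlgError instead.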

np.linalg.inv(x.T @ x) # inverse; must be on a square matrix

array([[ 5.62949953e+14, -1.12589991e+15,  5.62949953e+14],
       [-1.12589991e+15,  2.25179981e+15, -1.12589991e+15],
       [ 5.62949953e+14, -1.12589991e+15,  5.62949953e+14]])
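Before trusting an inverse, it can help to check that the matrix is actually invertible. The sketch below assumes NumPy's np.linalg.matrix_rank and np.allclose as the checks. The 2 x 2 matrix a is a hypothetical non-singular example. The sketch shows that the x above has rank 2 rather than 3. Since rank(x.T @ x) equals rank(x), x.T @ x is singular as well. A genuine inverse, by contrast, multiplies back to the identity.

```python
import numpy as np

x = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])

# The third row is 2 * (second row) - (first row), so the rows are linearly dependent
print(np.array_equal(2 * x[1] - x[0], x[2]))  # True
print(np.linalg.matrix_rank(x))               # 2, not 3, so x is singular

# A non-singular matrix, chosen here for illustration, has a proper inverse
a = np.array([[2.0, 1.0], [1.0, 3.0]])
a_inv = np.linalg.inv(a)

# A matrix times its inverse should give the identity, up to rounding error
print(np.allclose(a @ a_inv, np.eye(2)))      # True
```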

We use transpose() to transpose a matrix. For real matrices, the adjoint x' gives the same result.


x = [1 2 3; 4 5 6; 7 8 9]

3×3 Matrix{Int64}:
 1  2  3
 4  5  6
 7  8  9


transpose(x) # transpose

3×3 transpose(::Matrix{Int64}) with eltype Int64:
 1  4  7
 2  5  8
 3  6  9

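We use inv() to get the inverse of a matrix. As in the Python example, transpose(x) * x is singular here, so the output below is dominated by floating-point error rather than being a true inverse. For exactly singular inputs, Julia throws a SingularException.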


inv(transpose(x) * x) # inverse; must be on a square matrix

3×3 Matrix{Float64}:
  5.6295e14  -1.1259e15   5.6295e14
 -1.1259e15   2.2518e15  -1.1259e15
  5.6295e14  -1.1259e15   5.6295e14

7 Programming Basics: Flow Control

Indentation Difference

In R, the code block is enclosed by curly braces {}. Indentation is not necessary and does not affect the code execution.
In Python, the code block is defined by indentation. Indentation is necessary and affects the code execution.
In Julia, a code block starts with a keyword such as if, for, while, or function and is closed by end. Indentation does not affect code execution.

7.1 if/else

Sometimes you want to run different code depending on a condition. For instance, if an observation is missing, you might impute it with the population average. This is where if/else comes in.

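if (condition1) {
  # action 1: runs when condition1 is TRUE
} else if (condition2) {
  # action 2: runs when condition1 is FALSE and condition2 is TRUE
} else {
  # action 3: runs when neither condition is TRUE
}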

Example 1:

a <- 15

if (a > 10) {
  larger_than_10 <- TRUE
} else {
  larger_than_10 <- FALSE
}

larger_than_10

[1] TRUE

Example 2:

x <- -5
if (x >= 0) {
  print("x is a non-negative number")
} else {
  print("x is a negative number")
}

[1] "x is a negative number"

Example 1:

a = 15

if a > 10:
    larger_than_10 = True
else:
    larger_than_10 = False

larger_than_10

True

Example 2:

x = -5

if x >= 0:
    print("x is a non-negative number")
else:
    print("x is a negative number")

x is a negative number
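The imputation idea from the start of this section can be sketched by combining if/else with a loop, which is covered in the next section. In this minimal Python sketch, the observations list is hypothetical and None marks a missing value. The average of the observed values stands in for the population average.

```python
# Hypothetical observations; None marks a missing value
observations = [4.0, None, 6.0, 8.0]

# Average of the non-missing values, used as the imputation value
observed = [v for v in observations if v is not None]
average = sum(observed) / len(observed)

imputed = []
for value in observations:
    if value is None:
        imputed.append(average)  # missing: replace with the average
    else:
        imputed.append(value)    # observed: keep as-is

print(imputed)  # [4.0, 6.0, 6.0, 8.0]
```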


Example 1:

a = 15


if a > 10
    larger_than_10 = true
else
    larger_than_10 = false
end

true


larger_than_10

true

Example 2:


x = -5

-5


if x >= 0
    println("x is a non-negative number")
else
    println("x is a negative number")
end

x is a negative number

7.2 Loops

Caution

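Loops in R and plain Python are relatively slow, because each iteration is executed by the interpreter. Code should therefore be written in vector or matrix form to take advantage of vectorization (for example, NumPy in Python) as much as possible.

In contrast, Julia compiles code to efficient machine code, so loops (especially inside functions) are fast. Code readability can therefore be prioritized over vectorization.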
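To make the vectorization advice concrete, here is a minimal Python sketch of the same computation written as an explicit loop and as a single NumPy call. The sketch only checks that the two versions agree. The actual speed difference depends on your machine, so it is worth measuring yourself, for example with the standard timeit module.

```python
import numpy as np

values = np.arange(1, 100_001, dtype=np.int64)  # 1, 2, ..., 100000

# Loop version: the interpreter handles one element per iteration
total_loop = 0
for v in values:
    total_loop += int(v)

# Vectorized version: one NumPy call that runs in compiled code
total_vectorized = int(values.sum())

print(total_loop == total_vectorized)  # True
print(total_vectorized)                # 5000050000
```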

As the name suggests, a loop repeats a set of instructions until a stopping criterion is met. For a for loop, this means until the iterator has gone through every value.

Loops are useful for repetitive tasks.

for (i in 1:10){ # i is the iterator
  # loop body: gets executed each time
  # the value of i changes with each iteration
}

Example:

for (i in 1:5){
  print(i)
}

[1] 1
[1] 2
[1] 3
[1] 4
[1] 5

for i in range(1, 6):  # range(1, 6) yields 1, 2, 3, 4, 5; the stop value 6 is excluded
    print(i)


for i in 1:5
    println(i)
end

7.3 User-Defined Functions

A function takes arguments as input, runs some specified actions, and then returns the result to us.

Functions are very useful. When we would like to test different ideas, we can combine functions with loops: We can write a function which takes different parameters as input, and we can use a loop to go through all the possible combinations of parameters.

Here is how to define a function in general:

function_name <- function(arg1, arg2 = default_value){
  # write the actions to be done with arg1 and arg2
  # you can have any number of arguments, with or without defaults
  return(result) # return a value; without return(), R returns the value of the last expression
}

Example:

magic <- function(x, y){
  return(x^2 + y)
}

magic(1,3)

[1] 4

Here is how to define a function in general:

default_value = 1  # Python evaluates defaults when def runs, so this name must already exist

def function_name(arg1, arg2=default_value):
    # write the actions to be done with arg1 and arg2
    # you can have any number of arguments, with or without defaults
    result = (arg1, arg2)  # placeholder for the real computation
    return result  # without a return statement, Python returns None

Example:

def magic(x, y):
    return x**2 + y

magic(1, 3)
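The general template above mentions arguments with defaults. The sketch below shows how a default behaves when it is omitted, passed positionally, or overridden by keyword. The function name magic_with_default and its default y=1 are hypothetical variants of magic(), introduced only for illustration.

```python
# Hypothetical variant of magic() with a default value for y
def magic_with_default(x, y=1):
    return x**2 + y

print(magic_with_default(2))         # 5: y falls back to its default, 1
print(magic_with_default(2, 10))     # 14: a positional y overrides the default
print(magic_with_default(y=0, x=3))  # 9: keyword arguments can come in any order
```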

Here is how to define a function in general:


function function_name(arg1, arg2 = default_value)
    # write the actions to be done with arg1 and arg2
    # you can have any number of arguments, with or without defaults
    return result # return some value; without return, Julia returns the value of the last expression
end

function_name (generic function with 2 methods)

Example:


function magic(x, y)
    return x^2 + y
end

magic (generic function with 1 method)


magic(1, 3)

8 A comprehensive exercise

Task: write a function that takes a vector as input and returns its maximum value.

get_max <- function(input){
  max_value <- input[1]
  # seq_along(input)[-1] is 2, ..., length(input), and is empty when input has one element
  for (i in seq_along(input)[-1]) {
    if (input[i] > max_value) {
      max_value <- input[i]
    }
  }
  
  return(max_value)
}

get_max(c(-1,3,2))

[1] 3

def get_max(input):
    max_value = input[0]
    for i in range(1, len(input)):
        if input[i] > max_value:
            max_value = input[i]
    return max_value

get_max([-1, 3, 2])
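A slightly more defensive variant iterates over the elements directly and rejects empty input, where input[0] would otherwise fail with an IndexError. The name get_max_values is hypothetical and is used here only to avoid clashing with get_max. In practice, Python's built-in max() does the same job.

```python
def get_max_values(values):
    """Return the largest element (hypothetical variant of get_max)."""
    if len(values) == 0:
        raise ValueError("get_max_values() needs at least one element")
    max_value = values[0]
    for value in values[1:]:  # iterate over elements directly, skipping the first
        if value > max_value:
            max_value = value
    return max_value

print(get_max_values([-1, 3, 2]))  # 3
print(max([-1, 3, 2]))             # 3: the built-in gives the same answer
```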


function get_max(input)
    max_value = input[1]
    for i in 2:length(input)
        if input[i] > max_value
            max_value = input[i]
        end
    end
    return max_value
end

get_max (generic function with 1 method)


get_max([-1, 3, 2])

9 Conclusion about R and Python

Below are the most common mistakes people make when switching between R and Python:

In R, the index starts from 1; in Python, the index starts from 0.
In R, missing values are represented by NA; in Python, missing numeric values are usually represented by np.nan (in NumPy and pandas), while plain Python code often uses None.
In R, the code block is enclosed by curly braces {}; in Python, the code block is defined by indentation.
In R, the : operator creates a sequence with a step of 1 that includes both ends (1:5 gives 1 to 5); in Python, the range() function creates such a sequence but excludes the stop value (range(1, 6) gives 1 to 5).
In R, the c() function is used to combine vectors; in Python, the + operator is used to combine lists.
In R, the rep() function is used to replicate elements in a vector; in Python, the * operator is used to replicate elements in a list.
In R, the %in% operator is used to test whether an element exists in the object; in Python, the in operator is used to test whether an element exists in the object.
In R, the %*% operator is used to perform matrix multiplication; in Python, the @ operator is used to perform matrix multiplication.
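Several of these pitfalls can be seen side by side in a few lines of Python. The sketch below puts the R equivalent in a comment on each line. The list v and the matrix m are arbitrary examples. The last line also shows why missing values should be tested with np.isnan() (the analogue of R's is.na()) rather than with ==.

```python
import numpy as np

v = [10, 20, 30]
print(v[0])               # 10: indexing starts at 0 (R's v[1])
print(list(range(1, 6)))  # [1, 2, 3, 4, 5]: stop is excluded (R's 1:5)
print([1, 2] + [3])       # [1, 2, 3]: + joins lists (R's c(c(1, 2), 3))
print([0] * 3)            # [0, 0, 0]: * repeats a list (R's rep(0, 3))
print(20 in v)            # True: membership test (R's 20 %in% v)

m = np.array([[1, 2], [3, 4]])
print(m @ m)              # matrix product (R's m %*% m)

print(np.nan == np.nan)   # False: use np.isnan() to detect missing values
```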
