Side-by-Side Comparison between R, Python, and Julia
This tutorial is designed for those who are familiar with either R, Python or Julia, and would like to learn another language.
In this tutorial, I will compare the basics of R, Python, and Julia side by side. We will cover the basic syntax, data types, and functionalities.
If you discover any mistakes or outdated content in this tutorial, please let me know. I will be very grateful for your feedback.
1 Language Basics
1.1 Assignment of variables
In R and Python, an assignment does not print the assigned object.
Julia, by contrast, prints the assigned value by default. To suppress the output, put a semicolon ; at the end of the line.
# create an object x with value 3
x <- 3
x
[1] 3
# create an object x with value 3
x = 3
x
3
# create an object x with value 3
x = 3; # the ; suppresses the output
x
3
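1.3 Rules for naming objects
For a variable name to be valid in R, it should follow these rules:
- It should contain only letters, numbers, dots, and underscores.
- It cannot start with a number (e.g., 2iota) or an underscore. It may start with a dot, but only if the dot is not followed by a number.
# 2iota <- 2
# .2iota <- 2
# _iota <- 2
- It should not be a reserved word in R (e.g., if, for, TRUE). Names of built-in functions such as mean and sum are not reserved, but reusing them for your own objects is confusing and best avoided.
# mean <- 2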
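For a variable name to be valid in Python, it should follow these rules:
- It should contain only letters, numbers, and underscores.
- It cannot start with a number (e.g., 2iota) and cannot contain a dot. A leading underscore is allowed; by convention it marks a name as internal.
# 2iota = 2
# .iota = 2
- It should not be a reserved keyword in Python (e.g., if, for, class). Names of built-in functions such as sum are not reserved, but reassigning them hides the built-in.
# sum = 2
Julia's rules are close to Python's: a name may contain letters, numbers, and underscores, cannot start with a number or contain a dot, and cannot be a reserved keyword (e.g., if, for, end).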
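The Python naming rules can be checked programmatically. The sketch below uses two standard-library tools, str.isidentifier() and keyword.iskeyword(); the helper name is_valid_name and the list of candidate names are made up for illustration.

```python
import keyword


def is_valid_name(name: str) -> bool:
    """Return True if `name` can be used as a Python variable name."""
    # isidentifier() checks the character rules; iskeyword() rules out
    # reserved words such as `for` or `class`.
    return name.isidentifier() and not keyword.iskeyword(name)


candidates = ["iota2", "2iota", ".iota", "_iota", "for", "sum"]
for name in candidates:
    print(f"{name!r}: {is_valid_name(name)}")
```

Note that sum passes the check: it is a built-in function name, not a reserved keyword, so Python lets you overwrite it even though doing so is rarely a good idea.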
2 Packages and Functions
Base R already comes with many useful built-in functions to perform basic tasks, but as data scientists, we need more.
To perform certain tasks (such as fitting a machine learning model), we could write our own code from scratch, but that takes a lot of (unnecessary) effort. Fortunately, many packages have been written by others for us to use directly.
To download a package, click Tools -> Install Packages in RStudio, and type the package name in the pop-up window. Now, download the package dplyr.
To load a package, we call library() with the package name.
library(dplyr)
Attaching package: 'dplyr'
The following objects are masked from 'package:stats':
filter, lag
The following objects are masked from 'package:base':
intersect, setdiff, setequal, union
- Now that the package is loaded, you can use the functions in it. filter() is a function in the dplyr package that can be used to filter data.
Python has a similar concept: reusable code is distributed as packages, which you import as modules.
- To install a package, you can use pip install in the terminal, or !pip install in Jupyter Notebook. You can also install packages in the Anaconda Navigator.
# !pip install pandas
- To load a module, you can use import. Now that the module is loaded, you can use the functions in it.
import pandas as pd
iris = pd.read_csv('https://raw.githubusercontent.com/mwaskom/seaborn-data/master/iris.csv') # load iris
iris[iris['species'] == 'setosa']
sepal_length sepal_width petal_length petal_width species
0 5.1 3.5 1.4 0.2 setosa
1 4.9 3.0 1.4 0.2 setosa
2 4.7 3.2 1.3 0.2 setosa
3 4.6 3.1 1.5 0.2 setosa
4 5.0 3.6 1.4 0.2 setosa
5 5.4 3.9 1.7 0.4 setosa
6 4.6 3.4 1.4 0.3 setosa
7 5.0 3.4 1.5 0.2 setosa
8 4.4 2.9 1.4 0.2 setosa
9 4.9 3.1 1.5 0.1 setosa
10 5.4 3.7 1.5 0.2 setosa
11 4.8 3.4 1.6 0.2 setosa
12 4.8 3.0 1.4 0.1 setosa
13 4.3 3.0 1.1 0.1 setosa
14 5.8 4.0 1.2 0.2 setosa
15 5.7 4.4 1.5 0.4 setosa
16 5.4 3.9 1.3 0.4 setosa
17 5.1 3.5 1.4 0.3 setosa
18 5.7 3.8 1.7 0.3 setosa
19 5.1 3.8 1.5 0.3 setosa
20 5.4 3.4 1.7 0.2 setosa
21 5.1 3.7 1.5 0.4 setosa
22 4.6 3.6 1.0 0.2 setosa
23 5.1 3.3 1.7 0.5 setosa
24 4.8 3.4 1.9 0.2 setosa
25 5.0 3.0 1.6 0.2 setosa
26 5.0 3.4 1.6 0.4 setosa
27 5.2 3.5 1.5 0.2 setosa
28 5.2 3.4 1.4 0.2 setosa
29 4.7 3.2 1.6 0.2 setosa
30 4.8 3.1 1.6 0.2 setosa
31 5.4 3.4 1.5 0.4 setosa
32 5.2 4.1 1.5 0.1 setosa
33 5.5 4.2 1.4 0.2 setosa
34 4.9 3.1 1.5 0.2 setosa
35 5.0 3.2 1.2 0.2 setosa
36 5.5 3.5 1.3 0.2 setosa
37 4.9 3.6 1.4 0.1 setosa
38 4.4 3.0 1.3 0.2 setosa
39 5.1 3.4 1.5 0.2 setosa
40 5.0 3.5 1.3 0.3 setosa
41 4.5 2.3 1.3 0.3 setosa
42 4.4 3.2 1.3 0.2 setosa
43 5.0 3.5 1.6 0.6 setosa
44 5.1 3.8 1.9 0.4 setosa
45 4.8 3.0 1.4 0.3 setosa
46 5.1 3.8 1.6 0.2 setosa
47 4.6 3.2 1.4 0.2 setosa
48 5.3 3.7 1.5 0.2 setosa
49 5.0 3.3 1.4 0.2 setosa
Julia has a similar concept of packages.
- To install a package, you can use Pkg.add() in the Julia terminal.
using Pkg
Pkg.add("DataFrames")
Pkg.add("CSV")
- To load a package, you can use using. Now that the package is loaded, you can use the functions in it.
using DataFrames, CSV
iris = CSV.File(download("https://raw.githubusercontent.com/mwaskom/seaborn-data/master/iris.csv")) |> DataFrame;
# Filter the DataFrame where species is "setosa"
setosa_data = iris[iris.species .== "setosa", :];
# Display the first few rows of the filtered data
first(setosa_data, 5)
5×5 DataFrame
Row │ sepal_length sepal_width petal_length petal_width species
│ Float64 Float64 Float64 Float64 String15
─────┼────────────────────────────────────────────────────────────────
1 │ 5.1 3.5 1.4 0.2 setosa
2 │ 4.9 3.0 1.4 0.2 setosa
3 │ 4.7 3.2 1.3 0.2 setosa
4 │ 4.6 3.1 1.5 0.2 setosa
5 │ 5.0 3.6 1.4 0.2 setosa
3 Arithmetic, Logical, and Relational Operations
3.1 Arithmetic operations
# arithmetic operations
x <- 3
x + 1 # addition
[1] 4
x - 1 # subtraction
[1] 2
x * 2 # multiplication
[1] 6
x / 2 # division
[1] 1.5
x^2 # square
[1] 9
x %% 2 # remainder
[1] 1
x %/% 2 # integer division
[1] 1
# math operations
log(x) # natural logarithm
[1] 1.098612
exp(x) # exponential
[1] 20.08554
sqrt(x) # square root
[1] 1.732051
log10(x) # log base 10
[1] 0.4771213
round(x/2) # round
[1] 2
floor(x/2) # floor
[1] 1
ceiling(x/2) # ceiling
[1] 2
# arithmetic operations
x = 3
x + 1 # addition
4
x - 1 # subtraction
2
x * 2 # multiplication
6
x / 2 # division
1.5
x ** 2 # square
9
x % 2 # remainder
1
x // 2 # integer division
1
# math operations
import math
math.log(x) # natural logarithm
1.0986122886681098
math.exp(x) # exponential
20.085536923187668
math.sqrt(x) # square root
1.7320508075688772
math.log10(x) # log base 10
0.47712125471966244
round(x/2) # round
2
math.floor(x/2) # floor
1
math.ceil(x/2) # ceiling
2
# arithmetic operations
x = 3
3
x + 1 # addition
4
x - 1 # subtraction
2
x * 2 # multiplication
6
x / 2 # division
1.5
x ^ 2 # square
9
x % 2 # remainder
1
div(x, 2) # integer division
1
# math operations
log(x) # natural logarithm
1.0986122886681098
exp(x) # exponential
20.085536923187668
sqrt(x) # square root
1.7320508075688772
log10(x) # log base 10
0.47712125471966244
round(x/2) # round
2.0
floor(x/2) # floor
1.0
ceil(x/2) # ceiling
2.0
3.2 Relational operations
# relational operations
x <- 3
x > 2 # larger than
[1] TRUE
x < 2 # smaller than
[1] FALSE
x == 2 # equal to
[1] FALSE
x != 2 # not equal to
[1] TRUE
# relational operations
x = 3
x > 2 # larger than
True
x < 2 # smaller than
False
x == 2 # equal to
False
x != 2 # not equal to
True
# relational operations
x = 3
3
x > 2 # larger than
true
x < 2 # smaller than
false
x == 2 # equal to
false
x != 2 # not equal to
true
3.3 Logical operations
- R: Boolean values are TRUE and FALSE (T and F are abbreviations).
- Python: Boolean values are True and False (case-sensitive).
- Julia: Boolean values are true and false.
T & F # and
[1] FALSE
T | F # or
[1] TRUE
!T # not
[1] FALSE
True & False # and
False
True | False # or
True
not True # not
False
true & false # and
false
true | false # or
true
!true # not
false
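In Python, & and | are bitwise operators, while and and or are the logical operators. On plain Boolean values both give the same answer, but & binds more tightly than comparisons, which is an easy trap for R users. The following sketch illustrates the difference; the variable names are chosen for illustration only.

```python
x = 1

# `and` / `or` are the logical operators; they evaluate lazily.
logical = x > 2 and x < 5

# `&` works on Booleans too, as long as each comparison is parenthesized.
bitwise_parenthesized = (x > 2) & (x < 5)

# Without parentheses, `2 & x` is evaluated first, so this is parsed as
# x > (2 & x) < 5, i.e. 1 > 0 < 5, which is True.
bitwise_bare = x > 2 & x < 5

print(logical, bitwise_parenthesized, bitwise_bare)
```

When combining comparisons on single values, and and or are the safer choice; & and | are mainly needed for element-wise logic on numpy arrays, where each comparison should be wrapped in parentheses.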
4 Vectors
4.1 Creating vectors
In R, a vector is a collection of elements of the same data type, which is often used to store a variable of a dataset. For instance, a vector can store the income of a group of people, the final grades of students, etc.
A vector can be created using the function c() by listing all the values in the parentheses, separated by commas. c() stands for “combine”.
Income <- c(1, 3, 5, 10)
Income
[1] 1 3 5 10
- Vectors must contain elements of the same data type. If they do not, R automatically coerces the elements to a common type (usually character).
Income <- c(1, 3, 5, "10")
Income
[1] "1" "3" "5" "10"
In Python, a list is a collection of elements that may have different data types. Like an R vector, it can be used to store a variable of a dataset, such as the income of a group of people or the final grades of students.
A list can be created using square brackets [] by listing all the values in the brackets, separated by commas.
Income = [1, 3, 5, 10]
Income
[1, 3, 5, 10]
- A list can contain elements of different data types.
Income = [1, 3, 5, "10"]
Income
[1, 3, 5, '10']
- If you want a collection whose elements all share the same numeric data type, you can use an array from the numpy package.
import numpy as np
Income = np.array([1, 3, 5, 10])
Income
array([ 1, 3, 5, 10])
In Julia, a vector is a collection of elements of the same data type, which is often used to store a variable of a dataset. For instance, a vector can store the income of a group of people, the final grades of students, etc.
A vector can be created using square brackets [] by listing all the values in the brackets, separated by commas.
Income = [1, 3, 5, 10]
4-element Vector{Int64}:
1
3
5
10
- A vector can contain elements of different data types. However, note that the element type then becomes Any rather than Int64.
Income = [1, 3, 5, "10"]
4-element Vector{Any}:
1
3
5
"10"
4.2 Indexing and subsetting
R, Python, and Julia have different indexing rules.
- In R and Julia, the index starts from 1.
- In Python, the index starts from 0.
- To extract an element from a vector, we put the index of the element in square brackets [ ].
Income <- c(1, 3, 5, 10)
Income[1] # extract the first element
[1] 1
- If we want to extract multiple elements, we can use a vector of indices.
Income[c(1,3)] # extract the first and third elements
[1] 1 5
- To extract an element from a list, we put the index of the element in square brackets [ ].
Income = [1, 3, 5, 10]
Income[0] # extract the first element
1
- If we want to extract multiple consecutive elements, we can use a slice. The end of a slice is excluded.
Income[0:3] # extract the first three elements
[1, 3, 5]
- With a numpy array, we can also pass a list of indices, as in R.
Income = np.array([1, 3, 5, 10])
Income[0] # extract the first element
1
Income[[0,2]] # extract the first and third elements
array([1, 5])
- To extract an element from a vector, we put the index of the element in square brackets [ ].
Income = [1, 3, 5, 10];
Income[1] # extract the first element
1
- If we want to extract multiple consecutive elements, we can use a range. Unlike a Python slice, the end of the range is included.
Income[1:3] # extract the first three elements
3-element Vector{Int64}:
1
3
5
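Because the index base differs between the languages, it can help to see the translation spelled out. The sketch below shows one way to select elements from a plain Python list by 1-based positions, as R's Income[c(1, 3)] does; the helper pick is hypothetical and not part of any library.

```python
income = [1, 3, 5, 10]


def pick(values, positions):
    """Select elements by 1-based positions, like Income[c(1, 3)] in R."""
    return [values[p - 1] for p in positions]  # shift to 0-based indexing


first_and_third = pick(income, [1, 3])
first_three = income[0:3]  # slice: start included, stop excluded
last_item = income[-1]     # a negative index counts from the end

print(first_and_third, first_three, last_item)
```

The negative index is another difference to keep in mind: in Python, Income[-1] returns the last element, whereas in R, Income[-1] returns the vector without its first element.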
4.3 Creating numeric sequences with fixed steps
It is also possible to create sequences that follow a regular pattern.
- Use seq() to create a sequence with fixed steps.
# use seq()
seq(from = 1, to = 2, by = 0.1)
[1] 1.0 1.1 1.2 1.3 1.4 1.5 1.6 1.7 1.8 1.9 2.0
- If the step is 1, there’s a convenient shortcut using the : operator.
1:5
[1] 1 2 3 4 5
- In base Python, we can use range() to create a sequence with fixed integer steps.
# from 1 up to (but not including) 6, with step 1
list(range(1, 6)) # range() returns a range object, we need to convert it to a list
[1, 2, 3, 4, 5]
- Use np.arange() to create a sequence with fixed steps, including non-integer steps. Unlike seq() in R, the end point is excluded.
np.arange(1, 2, 0.1)
array([1. , 1.1, 1.2, 1.3, 1.4, 1.5, 1.6, 1.7, 1.8, 1.9])
- In Julia, we can use 1:5 to create a sequence with a step of 1.
1:5
1:5
- However, the Julia object is not an integer vector, but a UnitRange{Int64} object. Use collect(1:5) if you need an actual vector.
typeof(1:5)
UnitRange{Int64}
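Since range() and np.arange() both exclude the end point, reproducing R's seq(from = 1, to = 2, by = 0.1) takes a little care. The function below is a minimal standard-library sketch, not a full port of seq(); its name and the tolerance value are assumptions made for this example. With numpy, np.linspace(1, 2, 11) is a common alternative that includes both end points.

```python
import math


def seq(start, stop, step):
    """Inclusive arithmetic sequence, loosely mirroring R's seq(from, to, by)."""
    if step <= 0:
        raise ValueError("step must be positive in this sketch")
    # Number of whole steps that fit between start and stop; the small
    # tolerance guards against float division landing just below an integer.
    n_steps = math.floor((stop - start) / step + 1e-9)
    # Build each value from `start` so that rounding error does not accumulate,
    # then round away representation noise such as 1.2000000000000002.
    return [round(start + i * step, 10) for i in range(n_steps + 1)]


values = seq(1, 2, 0.1)
print(values)
```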
4.4 Combine multiple vectors into one: c()
Sometimes, we may want to combine multiple vectors into one. For instance, we may have collected income data from two different sources, and we want to combine them into one vector.
We can use c() to combine different vectors; this is the usual way to concatenate vectors in R.
Income1 <- 1:3
Income2 <- c(10, 15)
c(Income1,Income2)
[1] 1 2 3 10 15
- In Python, we can use the + operator to concatenate lists.
Income1 = [1, 2, 3]
Income2 = [10, 15]
Income1 + Income2
[1, 2, 3, 10, 15]
- For numpy arrays, we can use np.concatenate() to concatenate arrays.
Income1 = np.array([1, 2, 3])
Income2 = np.array([10, 15])
np.concatenate((Income1, Income2))
array([ 1, 2, 3, 10, 15])
- In Julia, we can use the vcat() function to concatenate vectors.
Income1 = [1, 2, 3];
Income2 = [10, 15];
vcat(Income1, Income2)
5-element Vector{Int64}:
1
2
3
10
15
4.5 Replicating elements
- We can use the rep() function to replicate elements in a vector.
rep(1:3, times = 2) # replicate 1:3 twice
[1] 1 2 3 1 2 3
rep(1:3, each = 2) # replicate each element in 1:3 twice
[1] 1 1 2 2 3 3
- We can use the * operator to replicate a list.
[1, 2, 3] * 2 # replicate 1:3 twice
[1, 2, 3, 1, 2, 3]
- For numpy arrays, we can use np.tile() to repeat the whole array and np.repeat() to repeat each element.
np.tile([1, 2, 3], 2) # replicate 1:3 twice
array([1, 2, 3, 1, 2, 3])
np.repeat([1, 2, 3], 2) # replicate each element in 1:3 twice
array([1, 1, 2, 2, 3, 3])
- We can use the repeat() function to replicate elements in a vector.
repeat([1, 2, 3], 2) # replicate 1:3 twice
6-element Vector{Int64}:
1
2
3
1
2
3
repeat([1, 2, 3], inner = 2) # replicate each element in 1:3 twice
6-element Vector{Int64}:
1
1
2
2
3
3
4.6 Maximum and minimum
Income = [1, 3, 5, 10]
max(Income) # maximum
10
min(Income) # minimum
1
- For numpy arrays, we can use np.max() and np.min() to find the maximum and minimum values.
Income = np.array([1, 3, 5, 10])
np.max(Income) # maximum
10
np.min(Income) # minimum
1
- We can use the maximum() and minimum() functions to find the maximum and minimum values in a vector.
Income = [1, 3, 5, 10];
maximum(Income) # maximum
10
minimum(Income) # minimum
1
4.7 Sum and mean
Income = [1, 3, 5, 10]
sum(Income) # sum
19
np.mean(Income) # mean
4.75
- For numpy arrays, we can use np.sum() and np.mean() to find the sum and mean values.
Income = np.array([1, 3, 5, 10])
np.sum(Income) # sum
19
np.mean(Income) # mean
4.75
4.8 Missing values
In R, missing values are represented by NA.
In Python, missing values are represented by np.nan.
In Julia, missing values are represented by missing.
- In R, missing values are represented by NA.
- In Python, missing values are represented by np.nan.
Income = [1, 3, 5, np.nan]
np.nansum(Income) # sum and remove missing values
9.0
np.nanmean(Income) # mean and remove missing values
3.0
- In Julia, missing values are represented by missing. To take the sum or mean while ignoring missing values, wrap the vector in skipmissing().
Income = [1, 3, 5, missing];
sum(skipmissing(Income)) # sum and remove missing values
9
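To see why dedicated functions such as np.nansum() are needed, it helps to look at how a missing value behaves in ordinary arithmetic. The sketch below uses only the standard library (math.nan is the same floating-point NaN that np.nan refers to) and drops the missing entries by hand; the variable names are illustrative.

```python
import math

income = [1.0, 3.0, 5.0, math.nan]

# NaN propagates through ordinary arithmetic, so a plain sum is NaN as well.
naive_total = sum(income)

# Drop the missing entries first, which is what nansum/nanmean do for us.
observed = [v for v in income if not math.isnan(v)]
total = sum(observed)
mean = total / len(observed)

# NaN is not equal to itself, so `==` cannot be used to detect it.
nan_equals_itself = math.nan == math.nan

print(naive_total, total, mean, nan_equals_itself)
```

The last line is the reason to test for missing values with math.isnan() or np.isnan() rather than with ==.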
4.9 Element-wise arithmetic operations
- R by default supports element-wise operations on vectors.
- Python by default does not support element-wise operations on lists. You need to use numpy arrays to do element-wise operations.
- Julia does not apply arithmetic element by element by default. You need to put a dot before the operator, as in .+ or .*, to do element-wise operations.
- If you operate on a vector with a single number, the operation will be applied to all elements in the vector
Income <- c(1, 3, 5, 10)
Income + 2 # element-wise addition
[1] 3 5 7 12
Income * 2 # element-wise multiplication
[1] 2 6 10 20
- However, base Python does not support element-wise operations on lists.
Income = [1, 3, 5, 10]
Income + 2 # fails: + can only concatenate two lists
TypeError: can only concatenate list (not "int") to list
Income * 2 # not element-wise: * repeats the list
[1, 3, 5, 10, 1, 3, 5, 10]
- For numpy arrays, the behavior is the same as R.
Income = np.array([1, 3, 5, 10])
Income + 2 # element-wise addition
array([ 3, 5, 7, 12])
Income * 2 # element-wise multiplication
array([ 2, 6, 10, 20])
- In Julia, applying an operation between a vector and a single number element by element requires the dot syntax: put a . before the operator.
Income = [1, 3, 5, 10];
Income .+ 2 # element-wise addition
4-element Vector{Int64}:
3
5
7
12
Income .* 2 # element-wise multiplication
4-element Vector{Int64}:
2
6
10
20
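If numpy is not available, element-wise arithmetic on plain Python lists can still be written with list comprehensions. The following is a small sketch of that approach; the second list, other, is invented for the example.

```python
income = [1, 3, 5, 10]
other = [2, 4, 6, 8]

# Vector-with-scalar operations: apply the operation to every element.
plus_two = [v + 2 for v in income]
times_two = [v * 2 for v in income]

# Vector-with-vector operations: zip() pairs up elements by position.
pairwise_sum = [a + b for a, b in zip(income, other)]
pairwise_product = [a * b for a, b in zip(income, other)]

print(plus_two, times_two, pairwise_sum, pairwise_product)
```

Note that zip() silently stops at the shorter list, so it is worth checking that both lists have the same length before relying on the result.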
4.10 Vector multiplication
- If the two vectors are of the same length, they can do element-wise operations, including element-wise addition and element-wise multiplication
- For numpy arrays, we can use np.add() and np.multiply() to do element-wise addition and multiplication.
Income1 = np.array([1, 3, 5, 10])
Income2 = np.array([2, 4, 6, 8])
np.add(Income1, Income2) # element-wise addition
array([ 3, 7, 11, 18])
np.multiply(Income1, Income2) # element-wise multiplication
array([ 2, 12, 30, 80])
- If the two vectors are of the same length, we can apply element-wise operations to them, including element-wise addition and element-wise multiplication.
Income1 = [1, 3, 5, 10];
Income2 = [2, 4, 6, 8];
Income1 .+ Income2 # element-wise addition
4-element Vector{Int64}:
3
7
11
18
Income1 .* Income2 # element-wise multiplication
4-element Vector{Int64}:
2
12
30
80
4.11 Max and min of 2 vectors
- We can use the np.maximum() and np.minimum() functions to find the element-wise maximum and minimum values of two numpy arrays.
Income1 = np.array([1, 3, 5, 10])
Income2 = np.array([2, 4, 6, 8])
np.maximum(Income1, Income2) # element-wise maximum
array([ 2, 4, 6, 10])
np.minimum(Income1, Income2) # element-wise minimum
array([1, 3, 5, 8])
- We can apply the max() and min() functions with the dot syntax, max.() and min.(), to find the element-wise maximum and minimum values of two vectors.
Income1 = [1, 3, 5, 10];
Income2 = [2, 4, 6, 8];
max.(Income1, Income2) # element-wise maximum
4-element Vector{Int64}:
2
4
6
10
min.(Income1, Income2) # element-wise minimum
4-element Vector{Int64}:
1
3
5
8
5 Character and String
5.1 Creating strings
Character strings are enclosed within a pair of quotation marks.
Single or double quotation marks both work.
Even if a character string contains numbers, it is treated as text, and R will not perform any mathematical operations on it.
str1 <- "1 + 1 = 2"
Strings are enclosed within a pair of quotation marks.
Single or double quotation marks both work.
str1 = "1 + 1 = 2"
- In Julia, single quotation marks (') are used for defining individual characters. Double quotation marks (") are used for defining strings.
character1 = '1'
'1': ASCII/Unicode U+0031 (category Nd: Number, decimal digit)
str1 = "1 + 1 = 2"
"1 + 1 = 2"
5.2 Concatenating strings
- We can use the paste() function to concatenate strings.
str1 <- "1 + 1 = "
str2 <- "2"
paste(str1, str2)
[1] "1 + 1 = 2"
- We can use the + operator to concatenate strings.
str1 = "1 + 1 = "
str2 = "2"
str1 + str2
'1 + 1 = 2'
- We can use the * operator to concatenate strings.
str1 = "1 + 1 = "
"1 + 1 = "
str2 = "2"
"2"
str1 * str2
"1 + 1 = 2"
5.3 Checking the number of elements in a vector: length()
- You can measure the length of a vector using the function length().
- You can measure the length of a list using the function len().
x = ['R',' is', ' the', ' best', ' language']
len(x)
5
- For numpy arrays, you can use the shape attribute to get the shape of the array.
x = np.array(['Python',' is', ' the', ' best', ' language'])
x.shape
(5,)
- You can measure the length of a vector using the function length().
x = ["Julia", " is", " the", " best", " language"]
5-element Vector{String}:
"Julia"
" is"
" the"
" best"
" language"
length(x)
5
5.4 Special relational operation: %in%
- A special relational operation in R is %in%, which tests whether an element exists in an object.
- In Python, we can use the in operator to test whether an element exists in an object.
x = [1, 3, 8, 7]
3 in x
True
2 in x
False
- In Julia, we can use the in operator to test whether an element exists in an object.
x = [1, 3, 8, 7];
3 in x
true
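One difference from R is worth illustrating: %in% is vectorized, so c(3, 2, 7) %in% x returns one TRUE/FALSE per element, whereas Python's in tests a single value. A rough equivalent for plain lists is sketched below; candidates is a made-up list for this example. For numpy arrays, np.isin() serves the same purpose.

```python
x = [1, 3, 8, 7]
candidates = [3, 2, 7]

# One True/False per candidate, similar to `c(3, 2, 7) %in% x` in R.
membership = [value in x for value in candidates]

# For long lists, converting to a set first makes each lookup cheaper.
x_set = set(x)
membership_fast = [value in x_set for value in candidates]

print(membership)
```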
6 Matrices
6.1 Matrices: creating matrices
When creating an R matrix using matrix(), the elements are filled in column by column. This is called column-major order.
When creating a Python matrix using np.array() from a list of lists, each inner list becomes a row, so the elements are laid out row by row. This is called row-major order.
- A matrix can be created using the function matrix()
- the first argument is the vector to be converted into a matrix
- the second argument is the number of rows
- the last argument is the number of columns
matrix(1:9, nrow = 3, ncol = 3)
[,1] [,2] [,3]
[1,] 1 4 7
[2,] 2 5 8
[3,] 3 6 9
- A matrix can be created using the np.array() function from the numpy package, where the argument is a list of lists and each inner list is a row of the matrix.
import numpy as np
np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
array([[1, 2, 3],
[4, 5, 6],
[7, 8, 9]])
- A matrix can be created in base Julia using square brackets [], with spaces separating the elements of a row and semicolons ; separating rows.
[1 2 3; 4 5 6; 7 8 9]
3×3 Matrix{Int64}:
1 2 3
4 5 6
7 8 9
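The difference between column-major and row-major filling is easiest to see by arranging the same flat sequence both ways. The two helpers below are hypothetical and written with plain lists purely for illustration; with numpy, np.arange(1, 10).reshape(3, 3, order="F") produces the column-major layout that matrix(1:9, nrow = 3, ncol = 3) gives in R.

```python
def fill_by_column(values, nrow, ncol):
    """Arrange a flat sequence into nrow x ncol, filling column by column."""
    if len(values) != nrow * ncol:
        raise ValueError("number of values must equal nrow * ncol")
    return [[values[col * nrow + row] for col in range(ncol)] for row in range(nrow)]


def fill_by_row(values, nrow, ncol):
    """Arrange a flat sequence into nrow x ncol, filling row by row."""
    if len(values) != nrow * ncol:
        raise ValueError("number of values must equal nrow * ncol")
    return [[values[row * ncol + col] for col in range(ncol)] for row in range(nrow)]


values = list(range(1, 10))
by_column = fill_by_column(values, 3, 3)  # the layout R's matrix() produces
by_row = fill_by_row(values, 3, 3)        # the layout of a list of rows

for row in by_column:
    print(row)
```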
6.2 Creating matrices: combine matrices
We can use cbind() and rbind() to concatenate vectors and matrices into new matrices.
- cbind() does the column binding
a <- matrix(1:6, nrow = 2, ncol = 3)
a
[,1] [,2] [,3]
[1,] 1 3 5
[2,] 2 4 6
cbind(a, a) # column bind
[,1] [,2] [,3] [,4] [,5] [,6]
[1,] 1 3 5 1 3 5
[2,] 2 4 6 2 4 6
- rbind() does the row binding
rbind(a, a) # row bind
[,1] [,2] [,3]
[1,] 1 3 5
[2,] 2 4 6
[3,] 1 3 5
[4,] 2 4 6
- We can use np.concatenate() to concatenate arrays.
a = np.array([[1, 2, 3], [4, 5, 6]])
a
array([[1, 2, 3],
[4, 5, 6]])
np.concatenate((a, a), axis = 1) # column bind
array([[1, 2, 3, 1, 2, 3],
[4, 5, 6, 4, 5, 6]])
np.concatenate((a, a), axis = 0) # row bind
array([[1, 2, 3],
[4, 5, 6],
[1, 2, 3],
[4, 5, 6]])
- We can use the hcat() and vcat() functions to concatenate matrices.
a = [1 2 3; 4 5 6]
2×3 Matrix{Int64}:
1 2 3
4 5 6
hcat(a, a) # column bind
2×6 Matrix{Int64}:
1 2 3 1 2 3
4 5 6 4 5 6
vcat(a, a) # row bind
4×3 Matrix{Int64}:
1 2 3
4 5 6
1 2 3
4 5 6
6.3 Matrices: indexing and subsetting
Matrices have two dimensions: rows and columns. Therefore, to extract elements from a matrix, we just need to specify which row(s) and which column(s) we want.
x <- matrix(1:9, nrow = 3, ncol = 3)
x
[,1] [,2] [,3]
[1,] 1 4 7
[2,] 2 5 8
[3,] 3 6 9
- Extract the element in the 2nd row, 3rd column.
- Use square brackets with a comma inside, [ , ], to indicate subsetting; the argument before the comma is the row index, and the argument after the comma is the column index.
- 2 is specified for the row index, so we extract elements from the second row.
- 3 is specified for the column index, so we extract elements from the third column.
- Altogether, we extract a single element in row 2, column 3.
x[2,3] # the element in the 2nd row, 3rd column
[1] 8
- If we leave a dimension blank, we extract all elements along that dimension.
- If we want to take out the entire first row:
- 1 is specified for the row index
- the column index is left blank
x[1,] # all elements in the first row
[1] 1 4 7
x = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
x
array([[1, 2, 3],
[4, 5, 6],
[7, 8, 9]])
- Extract the element in the 2nd row, 3rd column. Because Python indexing starts from 0, the indices are 1 and 2.
x[1,2] # the element in the 2nd row, 3rd column
6
- To extract all elements along a dimension, we use : for that dimension.
x[0,:] # all elements in the first row
array([1, 2, 3])
x = [1 2 3; 4 5 6; 7 8 9];
- Extract the element in the 2nd row, 3rd column.
x[2,3] # the element in the 2nd row, 3rd column
6
- Different from R, we need to use : to extract all elements along a dimension.
x[1,:] # all elements in the first row
3-element Vector{Int64}:
1
2
3
6.4 Matrices: check dimensions and variable types
- You can verify the size of the matrix using the shape attribute.
x = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
x.shape
(3, 3)
- You can get the data type info using the dtype attribute.
x.dtype
dtype('int64')
- You can verify the size of the matrix using the size() function.
x = [1 2 3; 4 5 6; 7 8 9]
3×3 Matrix{Int64}:
1 2 3
4 5 6
7 8 9
size(x)
(3, 3)
6.5 Matrices: special operations
6.5.1 Creating a diagonal matrix
- We can use the diag() function to create a diagonal matrix.
diag(1:3)
[,1] [,2] [,3]
[1,] 1 0 0
[2,] 0 2 0
[3,] 0 0 3
- We can use the np.diag() function to create a diagonal matrix.
np.diag([1, 2, 3])
array([[1, 0, 0],
[0, 2, 0],
[0, 0, 3]])
- We can use the diagm() function from the LinearAlgebra package to create a diagonal matrix.
using LinearAlgebra
diagm(0 => [1, 2, 3])
3×3 Matrix{Int64}:
1 0 0
0 2 0
0 0 3
6.5.2 Creating an identity matrix
- We can use the diag() function to create an identity matrix.
diag(3)
[,1] [,2] [,3]
[1,] 1 0 0
[2,] 0 1 0
[3,] 0 0 1
- We can use the np.eye() function to create an identity matrix.
np.eye(3)
array([[1., 0., 0.],
[0., 1., 0.],
[0., 0., 1.]])
- We can use I(n) from the LinearAlgebra package to create an identity matrix.
I(3)
3×3 Diagonal{Bool, Vector{Bool}}:
1 ⋅ ⋅
⋅ 1 ⋅
⋅ ⋅ 1
6.6 Matrices’ operations: matrix addition and multiplication
- If the two matrices are of the same dimensions, they can do element-wise operations, including element-wise addition and element-wise multiplication
set.seed(123)
x = matrix(rnorm(9), nrow = 3, ncol = 3)
z = matrix(rnorm(9), nrow = 3, ncol = 3)
x + z # elementwise addition
[,1] [,2] [,3]
[1,] -1.0061376 0.4712798 2.2478293
[2,] 0.9939043 0.2399705 -0.7672108
[3,] 1.9185221 1.1592239 -2.6534700
x * x # elementwise multiplication (here x with itself)
[,1] [,2] [,3]
[1,] 0.31413295 0.004971433 0.2124437
[2,] 0.05298168 0.016715318 1.6003799
[3,] 2.42957161 2.941447909 0.4717668
- If we want to perform matrix multiplication as in linear algebra, we need to use %*%
- x and y must have conforming dimensions
x
[,1] [,2] [,3]
[1,] -0.5604756 0.07050839 0.4609162
[2,] -0.2301775 0.12928774 -1.2650612
[3,] 1.5587083 1.71506499 -0.6868529
[,1] [,2] [,3]
[1,] -0.9186059 -0.2861301 0.6175429
[2,] 1.1282999 0.8396152 -1.1340507
[3,] 1.0157790 -1.5987826 -4.4424790
- If the two matrices are of the same dimensions, they can do element-wise operations, including element-wise addition and element-wise multiplication
x = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
y = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
x + y # elementwise addition
array([[ 2, 4, 6],
[ 8, 10, 12],
[14, 16, 18]])
x * y # elementwise multiplication
array([[ 1, 4, 9],
[16, 25, 36],
[49, 64, 81]])
- If we want to perform matrix multiplication as in linear algebra, we need to use @
- x and y must have conforming dimensions
x @ y # matrix multiplication
array([[ 30, 36, 42],
[ 66, 81, 96],
[102, 126, 150]])
- If the two matrices are of the same dimensions, we can apply element-wise operations to them, including element-wise addition and element-wise multiplication. It’s recommended to use the dot syntax, such as .+ and .*, to indicate element-wise operations.
x = [1 2 3; 4 5 6; 7 8 9]
3×3 Matrix{Int64}:
1 2 3
4 5 6
7 8 9
y = [1 2 3; 4 5 6; 7 8 9]
3×3 Matrix{Int64}:
1 2 3
4 5 6
7 8 9
x .+ y # elementwise addition
3×3 Matrix{Int64}:
2 4 6
8 10 12
14 16 18
6.7 Matrices’ operations: inverse and transpose
- We use t() to do matrix transpose
[,1] [,2] [,3]
[1,] 0.1533731 0.4264642 0.8781335
[2,] -1.1381369 -0.2950715 0.8215811
[3,] 1.2538149 0.8951257 0.6886403
t(x) # transpose
[,1] [,2] [,3]
[1,] 0.1533731 -1.1381369 1.2538149
[2,] 0.4264642 -0.2950715 0.8951257
[3,] 0.8781335 0.8215811 0.6886403
- We use solve() to get the inverse of a matrix
- We use the T attribute to do matrix transpose
x = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
x
array([[1, 2, 3],
[4, 5, 6],
[7, 8, 9]])
x.T # transpose
array([[1, 4, 7],
[2, 5, 8],
[3, 6, 9]])
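- We use np.linalg.inv() to get the inverse of a matrix. Note that this particular x is singular (its third row equals twice the second row minus the first), so x.T @ x has no true inverse; the huge values below are floating-point artifacts rather than a meaningful result.
np.linalg.inv(x.T @ x) # inverse; must be on a square matrix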
array([[ 5.62949953e+14, -1.12589991e+15, 5.62949953e+14],
[-1.12589991e+15, 2.25179981e+15, -1.12589991e+15],
[ 5.62949953e+14, -1.12589991e+15, 5.62949953e+14]])
- We use transpose() to do matrix transpose
x = [1 2 3; 4 5 6; 7 8 9]
3×3 Matrix{Int64}:
1 2 3
4 5 6
7 8 9
transpose(x) # transpose
3×3 transpose(::Matrix{Int64}) with eltype Int64:
1 4 7
2 5 8
3 6 9
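- We use inv() to get the inverse of a matrix. As in the Python example, this particular x is singular, so the huge values below are floating-point artifacts rather than a meaningful inverse.
inv(transpose(x) * x) # inverse; must be on a square matrix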
3×3 Matrix{Float64}:
5.6295e14 -1.1259e15 5.6295e14
-1.1259e15 2.2518e15 -1.1259e15
5.6295e14 -1.1259e15 5.6295e14
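Entries on the order of 1e14 or 1e15 in an inverse are a warning sign that the matrix is singular or nearly so. A quick way to confirm this for the example matrix is to compute the determinant exactly with integer arithmetic. The helpers below are a plain-Python sketch written for this 3x3 case only; their names are made up for illustration.

```python
def transpose(m):
    """Swap rows and columns of a matrix stored as a list of rows."""
    return [list(col) for col in zip(*m)]


def matmul(a, b):
    """Matrix product of two matrices stored as lists of rows."""
    return [[sum(p * q for p, q in zip(row, col)) for col in zip(*b)] for row in a]


def det3(m):
    """Determinant of a 3x3 matrix by cofactor expansion along the first row."""
    (a, b, c), (d, e, f), (g, h, i) = m
    return a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g)


x = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
xtx = matmul(transpose(x), x)

print(xtx)
print(det3(x), det3(xtx))  # both determinants are exactly zero
```

Because the determinant is zero, no inverse exists. An example meant to demonstrate inversion would need an invertible matrix instead, for instance a diagonal matrix with non-zero diagonal entries.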
7 Programming Basics: Flow Control
In R, a code block is enclosed by curly braces {}. Indentation is not necessary and does not affect code execution.
In Python, a code block is defined by indentation. Indentation is necessary and affects code execution.
In Julia, a code block starts with a keyword such as if or for and ends with end. Indentation does not affect code execution.
7.1 if/else
Sometimes, you want to run your code based on different conditions. For instance, if the observation is a missing value, then use the population average to impute the missing value. This is where if/else comes in.
if (condition1) {
  action 1
} else if (condition2) {
  action 2
} else {
  action 3
}
Example 1:
a <- 15
if (a > 10) {
  larger_than_10 <- TRUE
} else {
  larger_than_10 <- FALSE
}
larger_than_10
[1] TRUE
Example 1:
a = 15
if a > 10:
    larger_than_10 = True
else:
    larger_than_10 = False
larger_than_10
True
Example 2:
x = -5
if x >= 0:
    print("x is a non-negative number")
else:
    print("x is a negative number")
x is a negative number
Example 1:
a = 15
15
if a > 10
    larger_than_10 = true
else
    larger_than_10 = false
end
true
larger_than_10
true
Example 2:
x = -5
-5
if x >= 0
    println("x is a non-negative number")
else
    println("x is a negative number")
end
x is a negative number
7.2 Loops
Explicit loops in R and Python are comparatively slow, because each iteration is interpreted. Code should therefore be written with vectorized (whole-vector or matrix) operations where possible.
In contrast, Julia compiles loops to efficient machine code, so code readability can be prioritized over vectorization.
As the name suggests, in a loop the program repeats a set of instructions many times, until the stopping criterion is met.
Loops are very useful for repetitive jobs.
for (i in 1:10) { # i is the iterator
  # loop body: gets executed each time
  # the value of i changes with each iteration
}
Example:
for (i in 1:5) {
  print(i)
}
[1] 1
[1] 2
[1] 3
[1] 4
[1] 5
for i in range(1, 6):
    print(i)
1
2
3
4
5
for i in 1:5
    println(i)
end
1
2
3
4
5
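To make the advice about avoiding explicit loops concrete, here is a small sketch that computes a sum of squares twice: once with a hand-written loop and once by handing the accumulation to the built-in sum(). The data and variable names are invented for the example. With numpy, the same idea would be written as np.sum(arr ** 2), which performs the whole computation in compiled code.

```python
values = list(range(1, 6))  # 1, 2, 3, 4, 5

# Explicit loop: every step is executed by the interpreter.
total_loop = 0
for v in values:
    total_loop += v ** 2

# Same computation, with the accumulation handled by the built-in sum().
total_builtin = sum(v ** 2 for v in values)

print(total_loop, total_builtin)
```

Both versions give the same answer; on short inputs like this one the speed difference is negligible, and it only starts to matter on large data.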
7.3 User-Defined Functions
A function takes arguments as input, runs some specified actions, and then returns the result to us.
Functions are very useful. When we would like to test different ideas, we can combine functions with loops: We can write a function which takes different parameters as input, and we can use a loop to go through all the possible combinations of parameters.
Here is how to define a function in general:
function_name <- function(arg1, arg2 = default_value) {
  # write the actions to be done with arg1 and arg2
  # you can have any number of arguments, with or without defaults
  return() # the last line is to return some value
}
Example:
magic <- function(x, y) {
  return(x^2 + y)
}
magic(1,3)
[1] 4
Here is how to define a function in general:
def function_name(arg1, arg2 = default_value):
    # write the actions to be done with arg1 and arg2
    # you can have any number of arguments, with or without defaults
    return # the last line is to return some value
NameError: name 'default_value' is not defined
(The NameError appears because default_value is only a placeholder in this template; replace it with an actual value.)
Example:
def magic(x, y):
    return x**2 + y
magic(1, 3)
4
Here is how to define a function in general:
function function_name(arg1, arg2 = default_value)
    # write the actions to be done with arg1 and arg2
    # you can have any number of arguments, with or without defaults
    return # the last line is to return some value
end
function_name (generic function with 2 methods)
Example:
function magic(x, y)
    return x^2 + y
end
magic (generic function with 1 method)
magic(1, 3)
4
8 A comprehensive exercise
Task: write a function, which takes a vector as input, and returns the max value of the vector
def get_max(input):
    max_value = input[0]
    for i in range(1, len(input)):
        if input[i] > max_value:
            max_value = input[i]
    return max_value
get_max([-1, 3, 2])
3
function get_max(input)
    max_value = input[1]
    for i in 2:length(input)
        if input[i] > max_value
            max_value = input[i]
        end
    end
    return max_value
end
get_max (generic function with 1 method)
get_max([-1, 3, 2])
3
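The exercise solutions assume that the input has at least one element; on an empty input, reading the first element fails. One possible refinement, sketched below in Python, is to check for that case explicitly and to compare the result against the built-in max(). The error message and the sample inputs are illustrative choices, not part of the original exercise.

```python
def get_max(values):
    """Return the largest element of a non-empty sequence."""
    if len(values) == 0:
        raise ValueError("get_max() needs at least one element")
    max_value = values[0]
    for value in values[1:]:
        if value > max_value:
            max_value = value
    return max_value


samples = [[-1, 3, 2], [5], [-4, -9]]
results = [get_max(s) for s in samples]
print(results)

# Sanity check against the built-in max() on the same inputs.
print(all(get_max(s) == max(s) for s in samples))
```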
9 Conclusion about R and Python
Below are the easiest mistakes to make when you are switching between R and Python:
- In R, the index starts from 1; in Python, the index starts from 0.
- In R, missing values are represented by NA; in Python, missing values are represented by np.nan.
- In R, the code block is enclosed by curly braces {}; in Python, the code block is defined by indentation.
- In R, the : operator is used to create a sequence with a step of 1; in Python, the range() function is used to create a sequence with a step of 1.
- In R, the c() function is used to combine vectors; in Python, the + operator is used to combine lists.
- In R, the rep() function is used to replicate elements in a vector; in Python, the * operator is used to replicate elements in a list.
- In R, the %in% operator is used to test whether an element exists in the object; in Python, the in operator is used to test whether an element exists in the object.
- In R, the %*% operator is used to perform matrix multiplication; in Python, the @ operator is used to perform matrix multiplication.
1.2 Comment codes
You can put a # before any code to indicate that everything after the # on the same line is a comment and will not be run by R.
It’s good practice to comment your code often, so that the future you can remember what you were trying to achieve.
Same as R. You can put a # before any code to indicate that everything after the # on the same line is a comment and will not be run by Python.
Same as R and Python. You can put a # before any code to indicate that everything after the # on the same line is a comment and will not be run by Julia.