library(reticulate)
use_condaenv("base")
library(JuliaCall)
Side-by-Side Comparison between R, Python, and Julia
This tutorial is designed for those who are familiar with either R, Python or Julia, and would like to learn another language.
In this tutorial, I will compare the basics of R, Python, and Julia side by side. We will cover the basic syntax, data types, and functionalities.
If you discover any mistakes or outdated content in this tutorial, please let me know. I will be very grateful for your feedback.
1 Language Basics
1.1 Assignment of variables
In R and Python, assignment operations do not print the assigned object by default.
Julia, by contrast, prints the assigned object by default. To suppress the output, put a semicolon ; at the end of the line.
# create an object x with value 3
x <- 3
x
[1] 3
# create an object x with value 3
x = 3
x
3
# create an object x with value 3
x = 3; # the ; suppresses the output
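1.3 Rules for naming objects
For a variable name to be valid in R, it should follow these rules:
- It may contain only letters, numbers, dots, and underscores.
- It cannot start with a number (eg: 2iota) or an underscore. It may start with a dot, as long as the dot is not followed by a number.
# 2iota <- 2   # invalid: starts with a number
# _iota <- 2   # invalid: starts with an underscore
# .iota <- 2   # valid, but the object is hidden from ls() by default
- It cannot be a reserved word in R (eg: if, for, TRUE, NULL). Names of built-in functions such as mean and sum are not reserved, but reusing them for your own objects is confusing and best avoided.
# mean <- 2    # allowed, but confusing
For a variable name to be valid in Python, it should follow these rules:
- It may contain only letters, numbers, and underscores.
- It cannot start with a number (eg: 2iota) and cannot contain a dot. It may start with an underscore.
# 2iota = 2    # invalid: starts with a number
# .iota = 2    # invalid: contains a dot
# _iota = 2    # valid
- It cannot be a reserved word in Python (eg: if, for, None, True). Built-in names such as sum are not reserved, but reusing them hides the built-in function.
# sum = 2      # allowed, but shadows the built-in sum()
Julia's rules are close to Python's: a name may contain letters, numbers, and underscores, may start with an underscore, and cannot start with a number or be a reserved word (eg: if, for, end).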
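The Python naming rules above can be checked programmatically. The following is a minimal sketch using only the standard library; the helper name is_valid_python_name and the list of candidate names are illustrative assumptions, not part of the original tutorial.

```python
import keyword

def is_valid_python_name(name: str) -> bool:
    """Return True if `name` can be used as a Python variable name."""
    # str.isidentifier() checks the character rules;
    # keyword.iskeyword() rejects reserved words such as `if` or `for`.
    return name.isidentifier() and not keyword.iskeyword(name)

candidates = ["iota", "_iota", "iota_2", "2iota", ".iota", "for", "sum"]
results = {name: is_valid_python_name(name) for name in candidates}
for name, ok in results.items():
    print(f"{name!r}: {ok}")
```

Note that sum passes the check: it is a built-in name rather than a reserved word, so Python allows the assignment even though it hides the built-in function.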
2 Packages and Functions
Base R already comes with many useful built-in functions to perform basic tasks, but as data scientists, we need more.
To perform certain tasks (such as fitting a machine learning model), we can certainly write our own code from scratch, but it takes lots of (unnecessary) effort. Fortunately, many packages have been written by others for us to use directly.
- To download a package, click Tools -> Install Packages in RStudio, and type the package name in the pop-up window. Now, download the package dplyr.
- To load a package, we need to call library().
library(dplyr)
Attaching package: 'dplyr'
The following objects are masked from 'package:stats':
    filter, lag
The following objects are masked from 'package:base':
    intersect, setdiff, setequal, union
- Now that the package is loaded, you can use the functions in it. filter() is a function in the dplyr package that can be used to filter data.
Python has a similar concept of packages, but they are called modules.
- To install a module, you can use pip install in the terminal, or !pip install in Jupyter Notebook. You can also install a module in the Anaconda Navigator.
# !pip install pandas
- To load a module, you can use import. Now that the module is loaded, you can use the functions in it.
import pandas as pd
iris = pd.read_csv('https://raw.githubusercontent.com/mwaskom/seaborn-data/master/iris.csv') # load iris
iris[iris['species'] == 'setosa']
sepal_length sepal_width petal_length petal_width species
0 5.1 3.5 1.4 0.2 setosa
1 4.9 3.0 1.4 0.2 setosa
2 4.7 3.2 1.3 0.2 setosa
3 4.6 3.1 1.5 0.2 setosa
4 5.0 3.6 1.4 0.2 setosa
5 5.4 3.9 1.7 0.4 setosa
6 4.6 3.4 1.4 0.3 setosa
7 5.0 3.4 1.5 0.2 setosa
8 4.4 2.9 1.4 0.2 setosa
9 4.9 3.1 1.5 0.1 setosa
10 5.4 3.7 1.5 0.2 setosa
11 4.8 3.4 1.6 0.2 setosa
12 4.8 3.0 1.4 0.1 setosa
13 4.3 3.0 1.1 0.1 setosa
14 5.8 4.0 1.2 0.2 setosa
15 5.7 4.4 1.5 0.4 setosa
16 5.4 3.9 1.3 0.4 setosa
17 5.1 3.5 1.4 0.3 setosa
18 5.7 3.8 1.7 0.3 setosa
19 5.1 3.8 1.5 0.3 setosa
20 5.4 3.4 1.7 0.2 setosa
21 5.1 3.7 1.5 0.4 setosa
22 4.6 3.6 1.0 0.2 setosa
23 5.1 3.3 1.7 0.5 setosa
24 4.8 3.4 1.9 0.2 setosa
25 5.0 3.0 1.6 0.2 setosa
26 5.0 3.4 1.6 0.4 setosa
27 5.2 3.5 1.5 0.2 setosa
28 5.2 3.4 1.4 0.2 setosa
29 4.7 3.2 1.6 0.2 setosa
30 4.8 3.1 1.6 0.2 setosa
31 5.4 3.4 1.5 0.4 setosa
32 5.2 4.1 1.5 0.1 setosa
33 5.5 4.2 1.4 0.2 setosa
34 4.9 3.1 1.5 0.2 setosa
35 5.0 3.2 1.2 0.2 setosa
36 5.5 3.5 1.3 0.2 setosa
37 4.9 3.6 1.4 0.1 setosa
38 4.4 3.0 1.3 0.2 setosa
39 5.1 3.4 1.5 0.2 setosa
40 5.0 3.5 1.3 0.3 setosa
41 4.5 2.3 1.3 0.3 setosa
42 4.4 3.2 1.3 0.2 setosa
43 5.0 3.5 1.6 0.6 setosa
44 5.1 3.8 1.9 0.4 setosa
45 4.8 3.0 1.4 0.3 setosa
46 5.1 3.8 1.6 0.2 setosa
47 4.6 3.2 1.4 0.2 setosa
48 5.3 3.7 1.5 0.2 setosa
49 5.0 3.3 1.4 0.2 setosa
Julia has a similar concept of packages.
- To install a package, you can use Pkg.add() in the Julia terminal.
using Pkg
Pkg.add("DataFrames")
Pkg.add("CSV")
- To load a package, you can use using. Now that the package is loaded, you can use the functions in it.
using DataFrames, CSV
iris = CSV.File(download("https://raw.githubusercontent.com/mwaskom/seaborn-data/master/iris.csv")) |> DataFrame;
# Filter the DataFrame where species is "setosa"
setosa_data = iris[iris.species .== "setosa", :];
# Display the first few rows of the filtered data
first(setosa_data, 5)
5×5 DataFrame
Row │ sepal_length sepal_width petal_length petal_width species
│ Float64 Float64 Float64 Float64 String15
─────┼────────────────────────────────────────────────────────────────
1 │ 5.1 3.5 1.4 0.2 setosa
2 │ 4.9 3.0 1.4 0.2 setosa
3 │ 4.7 3.2 1.3 0.2 setosa
4 │ 4.6 3.1 1.5 0.2 setosa
5 │ 5.0 3.6 1.4 0.2 setosa
3 Arithmetic, Logical, and Relational Operations
3.1 Arithmetic operations
# arithmetic operations
x <- 3
x + 1 # addition
[1] 4
x - 1 # subtraction
[1] 2
x * 2 # multiplication
[1] 6
x / 2 # division
[1] 1.5
x^2 # square
[1] 9
x %% 2 # remainder
[1] 1
x %/% 2 # integer division
[1] 1
# math operations
log(x) # natural logarithm
[1] 1.098612
exp(x) # exponential
[1] 20.08554
sqrt(x) # square root
[1] 1.732051
log10(x) # log base 10
[1] 0.4771213
round(x/2) # round
[1] 2
floor(x/2) # floor
[1] 1
ceiling(x/2) # ceiling
[1] 2
# arithmetic operations
x = 3
x + 1 # addition
4
x - 1 # subtraction
2
x * 2 # multiplication
6
x / 2 # division
1.5
x ** 2 # square
9
x % 2 # remainder
1
x // 2 # integer division
1
# math operations
import math
math.log(x) # natural logarithm
1.0986122886681098
math.exp(x) # exponential
20.085536923187668
math.sqrt(x) # square root
1.7320508075688772
math.log10(x) # log base 10
0.47712125471966244
round(x/2) # round
2
math.floor(x/2) # floor
1
math.ceil(x/2) # ceiling
2
# arithmetic operations
x = 3
3
x + 1 # addition
4
x - 1 # subtraction
2
x * 2 # multiplication
6
x / 2 # division
1.5
x ^ 2 # square
9
x % 2 # remainder
1
div(x, 2) # integer division
1
# math operations
log(x) # natural logarithm
1.0986122886681098
exp(x) # exponential
20.085536923187668
sqrt(x) # square root
1.7320508075688772
log10(x) # log base 10
0.47712125471966244
round(x/2) # round
2.0
floor(x/2) # floor
1.0
ceil(x/2) # ceiling
2.0
3.2 Relational operations
# relational operations
x <- 3
x > 2 # larger than
[1] TRUE
x < 2 # smaller than
[1] FALSE
x == 2 # equal to
[1] FALSE
x != 2 # not equal to
[1] TRUE
# relational operations
x = 3
x > 2 # larger than
True
x < 2 # smaller than
False
x == 2 # equal to
False
x != 2 # not equal to
True
# relational operations
x = 3
3
x > 2 # larger than
true
x < 2 # smaller than
false
x == 2 # equal to
false
x != 2 # not equal to
true
3.3 Logical operations
- R: Boolean values are TRUE and FALSE (T and F are shorthand).
- Python: Boolean values are True and False (case-sensitive). The keywords and, or, and not are the idiomatic logical operators; & and | also work on Boolean values.
- Julia: Boolean values are true and false.
T & F # and
[1] FALSE
T | F # or
[1] TRUE
!T # not
[1] FALSE
True & False # and
False
True | False # or
True
not True # not
False
true & false # and
false
true | false # or
true
!true # not
false
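One pitfall when moving from R to Python is operator precedence. In Python, & and | are bitwise operators that bind more tightly than comparisons, so combining comparisons with & and no parentheses can silently give a different answer than and. A minimal sketch (the variable names are illustrative):

```python
x = 6

# `and` combines the two comparisons as intended: (x > 2) and (x < 5)
with_and = x > 2 and x < 5

# `&` binds tighter than `>` and `<`, so this is parsed as x > (2 & x) < 5
without_parens = x > 2 & x < 5

# Parentheses restore the intended meaning when `&` is used
with_parens = (x > 2) & (x < 5)

print(with_and, without_parens, with_parens)
```

Using and / or for single Boolean values, and parenthesizing every comparison when & or | is needed (for example with numpy arrays), avoids this surprise.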
4 Vectors
4.1 Creating vectors
In R, a vector is a collection of elements of the same data type, which is often used to store a variable of a dataset. For instance, a vector can store the income of a group of people, the final grades of students, etc.
A vector can be created using the function c() by listing all the values in the parentheses, separated by commas. c() stands for “combine”.
Income <- c(1, 3, 5, 10)
Income
[1]  1  3  5 10
- Vectors must contain elements of the same data type. If they do not, R automatically converts the elements into the same type (usually character type).
Income <- c(1, 3, 5, "10")
Income
[1] "1"  "3"  "5"  "10"
In Python, a list is a collection of elements that may have different data types; it is often used to store a variable of a dataset. For instance, a list can store the income of a group of people, the final grades of students, etc.
A list can be created using square brackets [] by listing all the values in the brackets, separated by commas.
Income = [1, 3, 5, 10]
Income
[1, 3, 5, 10]
- A list can contain elements of different data types.
Income = [1, 3, 5, "10"]
Income
[1, 3, 5, '10']
- If you want to create an array whose elements all share the same numeric data type, you can use the numpy package.
import numpy as np
Income = np.array([1, 3, 5, 10])
Income
array([ 1,  3,  5, 10])
In Julia, a vector is a collection of elements of the same data type, which is often used to store a variable of a dataset. For instance, a vector can store the income of a group of people, the final grades of students, etc.
A vector can be created using square brackets [] by listing all the values in the brackets, separated by commas.
Income = [1, 3, 5, 10]
4-element Vector{Int64}:
 1
 3
 5
 10
- A vector can contain elements of different data types. However, note that the element type then becomes Any rather than Int64.
Income = [1, 3, 5, "10"]
4-element Vector{Any}:
1
3
5
"10"
4.2 Indexing and subsetting
R, Python, and Julia have different indexing rules.
- In R and Julia, the index starts from 1.
- In Python, the index starts from 0.
- To extract an element from a vector, we put the index of the element in square brackets [ ].
Income <- c(1, 3, 5, 10)
Income[1] # extract the first element
[1] 1
- If we want to extract multiple elements, we can use a vector of indices.
Income[c(1,3)] # extract the first and third elements
[1] 1 5
- To extract an element from a list, we put the index of the element in square brackets [ ].
Income = [1, 3, 5, 10]
Income[0] # extract the first element
1
- If we want to extract multiple consecutive elements, we can use a slice. The end index of a slice is excluded.
Income[0:3] # extract the first three elements
[1, 3, 5]
- With a numpy array, we can also pass a list of indices, similar to R.
Income = np.array([1, 3, 5, 10])
Income[0] # extract the first element
1
Income[[0,2]] # extract the first and third elements
array([1, 5])
- To extract an element from a vector, we put the index of the element in square brackets [ ].
Income = [1, 3, 5, 10];
Income[1] # extract the first element
1
- If we want to extract multiple consecutive elements, we can use a range. Unlike a Python slice, the end index is included.
Income[1:3] # extract the first three elements
3-element Vector{Int64}:
1
3
5
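The indexing differences above are a common source of off-by-one errors when translating R code to base Python. The sketch below translates a few R expressions by hand; the variable names are illustrative, and the list comprehension is just one possible way to pick arbitrary positions from a plain list.

```python
income = [1, 3, 5, 10]

# R: Income[1] (first element) -> Python uses index 0
first = income[0]

# R: Income[c(1, 3)] -> base Python lists do not accept a list of indices,
# so a list comprehension is one way to pick arbitrary positions
r_style_positions = [1, 3]  # 1-based positions, as written in R
picked = [income[pos - 1] for pos in r_style_positions]

# R: Income[1:3] includes both ends; a Python slice excludes the end index
first_three = income[0:3]

# Negative indices count from the end in Python (R would drop elements instead)
last = income[-1]

print(first, picked, first_three, last)
```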
4.3 Creating numeric sequences with fixed steps
It is also possible to easily create sequences with patterns.
- Use seq() to create a sequence with fixed steps.
# use seq()
seq(from = 1, to = 2, by = 0.1)
[1] 1.0 1.1 1.2 1.3 1.4 1.5 1.6 1.7 1.8 1.9 2.0
- If the step is 1, there’s a convenient way using :
1:5
[1] 1 2 3 4 5
- In base Python, we can use range() to create a sequence with fixed integer steps. The end value is excluded.
# from 1 up to (but not including) 6, with step 1
list(range(1, 6)) # range() returns a range object, we need to convert it to a list
[1, 2, 3, 4, 5]
- Use np.arange() to create a sequence with fixed steps, including non-integer steps. The end value is again excluded.
np.arange(1, 2, 0.1)
array([1. , 1.1, 1.2, 1.3, 1.4, 1.5, 1.6, 1.7, 1.8, 1.9])
- In Julia, we can use 1:5 to create a sequence with a step of 1.
1:5
1:5
- However, the Julia object is not an integer vector, but a UnitRange{Int64} object.
typeof(1:5)
UnitRange{Int64}
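Because range() and np.arange() exclude the end value while R's seq() includes it, a direct translation drops the last element. The following is a minimal base-Python sketch of an end-inclusive sequence; seq_inclusive is a hypothetical helper written for this illustration, and the small epsilon is an assumption chosen to absorb floating-point round-off.

```python
import math

def seq_inclusive(start, stop, step):
    """Hypothetical helper mimicking R's seq(from, to, by) for step > 0."""
    # Number of steps that fit between start and stop; the small epsilon
    # (an assumption of this sketch) guards against floating-point round-off.
    n_steps = math.floor((stop - start) / step + 1e-9)
    # Multiply instead of adding repeatedly to avoid accumulating error.
    return [start + i * step for i in range(n_steps + 1)]

values = seq_inclusive(1, 2, 0.1)
print(len(values), values[0], values[-1])
```

With numpy, np.linspace(1, 2, 11) is another option: it takes the number of points instead of the step and includes the end value by default.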
4.4 Combine multiple vectors into one: c()
Sometimes, we may want to combine multiple vectors into one. For instance, we may have collected income data from two different sources, and we want to combine them into one vector.
We can use c() to combine different vectors; this is very commonly used to concatenate vectors.
Income1 <- 1:3
Income2 <- c(10, 15)
c(Income1, Income2)
[1]  1  2  3 10 15
- In Python, we can use the + operator to concatenate lists.
Income1 = [1, 2, 3]
Income2 = [10, 15]
Income1 + Income2
[1, 2, 3, 10, 15]
- For numpy arrays, we can use np.concatenate() to concatenate arrays.
Income1 = np.array([1, 2, 3])
Income2 = np.array([10, 15])
np.concatenate((Income1, Income2))
array([ 1,  2,  3, 10, 15])
- In Julia, we can use the vcat() function to concatenate vectors.
Income1 = [1, 2, 3];
Income2 = [10, 15];
vcat(Income1, Income2)
5-element Vector{Int64}:
1
2
3
10
15
4.5 Replicating elements
- We can use the rep() function to replicate elements in a vector.
rep(1:3, times = 2) # replicate 1:3 twice
[1] 1 2 3 1 2 3
rep(1:3, each = 2) # replicate each element in 1:3 twice
[1] 1 1 2 2 3 3
- We can use the * operator to replicate elements in a list.
[1, 2, 3] * 2 # replicate the list twice
[1, 2, 3, 1, 2, 3]
- For numpy arrays, we can use np.tile() to replicate the whole array and np.repeat() to replicate each element.
np.tile([1, 2, 3], 2) # replicate the array twice
array([1, 2, 3, 1, 2, 3])
np.repeat([1, 2, 3], 2) # replicate each element twice
array([1, 1, 2, 2, 3, 3])
- We can use the repeat() function to replicate elements in a vector.
repeat([1, 2, 3], 2) # replicate the vector twice
6-element Vector{Int64}:
 1
 2
 3
 1
 2
 3
repeat([1, 2, 3], inner = 2) # replicate each element twice
6-element Vector{Int64}:
1
1
2
2
3
3
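The list * operator only covers the times = 2 case of R's rep(). For the each = 2 case in base Python (without numpy), a nested list comprehension is one possible approach, sketched below with illustrative variable names.

```python
values = [1, 2, 3]

# Equivalent of R's rep(1:3, times = 2): repeat the whole list
repeated_whole = values * 2

# Equivalent of R's rep(1:3, each = 2): repeat every element in place
repeated_each = [v for v in values for _ in range(2)]

print(repeated_whole)
print(repeated_each)
```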
4.6 Maximum and minimum
Income = [1, 3, 5, 10]
max(Income) # maximum
10
min(Income) # minimum
1
- For numpy arrays, we can use np.max() and np.min() to find the maximum and minimum values.
Income = np.array([1, 3, 5, 10])
np.max(Income) # maximum
10
np.min(Income) # minimum
1
- We can use the maximum() and minimum() functions to find the maximum and minimum values in a vector.
Income = [1, 3, 5, 10];
maximum(Income) # maximum
10
minimum(Income) # minimum
1
4.7 Sum and mean
Income = [1, 3, 5, 10]
sum(Income) # sum
19
np.mean(Income) # mean
4.75
- For numpy arrays, we can use np.sum() and np.mean() to find the sum and mean values.
Income = np.array([1, 3, 5, 10])
np.sum(Income) # sum
19
np.mean(Income) # mean
4.75
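Base Python has a built-in sum() but no built-in mean(), which is why the list example above falls back on np.mean(). If numpy is not available, the standard-library statistics module offers an alternative, as in this small sketch (variable names are illustrative):

```python
import statistics

income = [1, 3, 5, 10]

total = sum(income)                # built-in sum
average = statistics.mean(income)  # standard-library mean
print(total, average)
```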
4.8 Missing values
- In R, missing values are represented by NA.
- In Python, missing values are represented by np.nan. The functions np.nansum() and np.nanmean() ignore missing values.
Income = [1, 3, 5, np.nan]
np.nansum(Income) # sum, ignoring missing values
9.0
np.nanmean(Income) # mean, ignoring missing values
3.0
- In Julia, missing values are represented by missing. To take the sum or mean while ignoring missing values, wrap the vector in skipmissing().
Income = [1, 3, 5, missing];
sum(skipmissing(Income)) # sum, ignoring missing values
9
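NaN behaves differently from an ordinary number, which matters when translating R code that relies on NA handling. The sketch below uses only the standard library (math.nan is the same floating-point NaN that np.nan represents) and shows why missing values must be filtered explicitly; the variable names are illustrative.

```python
import math

income = [1, 3, 5, math.nan]

# NaN is not equal to anything, including itself, so == cannot detect it
nan_equals_itself = math.nan == math.nan

# Any arithmetic involving NaN yields NaN
naive_total = sum(income)

# Filter out NaN explicitly with math.isnan() before aggregating
observed = [v for v in income if not math.isnan(v)]
clean_total = sum(observed)
clean_mean = clean_total / len(observed)

print(nan_equals_itself, naive_total, clean_total, clean_mean)
```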
4.9 Element-wise arithmetic operations
- R by default supports element-wise operations on vectors.
- Python by default does not support element-wise operations on lists. You need to use numpy arrays to do element-wise operations.
- In Julia, element-wise operations are written with the dot syntax (such as .+ and .*).
- If you operate on a vector with a single number, the operation will be applied to all elements in the vector.
Income <- c(1, 3, 5, 10)
Income + 2 # element-wise addition
[1]  3  5  7 12
Income * 2 # element-wise multiplication
[1]  2  6 10 20
- However, base Python does not support element-wise operations on lists.
Income = [1, 3, 5, 10]
Income + 2 # not element-wise: raises an error
TypeError: can only concatenate list (not "int") to list
Income * 2 # not element-wise: repeats the list twice
[1, 3, 5, 10, 1, 3, 5, 10]
- For numpy arrays, the behavior is the same as R.
Income = np.array([1, 3, 5, 10])
Income + 2 # element-wise addition
array([ 3,  5,  7, 12])
Income * 2 # element-wise multiplication
array([ 2,  6, 10, 20])
- In Julia, element-wise operations use the dot syntax: put a . in front of the operator (for example .+ or .*). If you operate on a vector with a single number this way, the operation is applied to all elements in the vector. A plain + between a vector and a single number raises an error.
Income = [1, 3, 5, 10];
Income .+ 2 # element-wise addition
4-element Vector{Int64}:
 3
 5
 7
 12
Income .* 2 # element-wise multiplication
4-element Vector{Int64}:
2
6
10
20
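When numpy is not available, element-wise operations on plain Python lists can still be written with list comprehensions. This is a minimal sketch with illustrative variable names; for large data, numpy arrays remain the usual choice.

```python
income = [1, 3, 5, 10]

# Element-wise operations with a single number
plus_two = [v + 2 for v in income]
times_two = [v * 2 for v in income]

# Element-wise operations between two lists of the same length
other = [2, 4, 6, 8]
pairwise_sum = [a + b for a, b in zip(income, other)]

print(plus_two, times_two, pairwise_sum)
```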
4.10 Vector multiplication
- If the two vectors are of the same length, element-wise operations, including element-wise addition and element-wise multiplication, can be applied to them.
- For numpy arrays, we can use np.add() and np.multiply() to do element-wise addition and multiplication.
Income1 = np.array([1, 3, 5, 10])
Income2 = np.array([2, 4, 6, 8])
np.add(Income1, Income2) # element-wise addition
array([ 3,  7, 11, 18])
np.multiply(Income1, Income2) # element-wise multiplication
array([ 2, 12, 30, 80])
- If the two vectors are of the same length, element-wise operations, including element-wise addition and element-wise multiplication, can be applied to them with the dot syntax.
Income1 = [1, 3, 5, 10];
Income2 = [2, 4, 6, 8];
Income1 .+ Income2 # element-wise addition
4-element Vector{Int64}:
 3
 7
 11
 18
Income1 .* Income2 # element-wise multiplication
4-element Vector{Int64}:
2
12
30
80
4.11 Max and min of 2 vectors
- We can use the np.maximum() and np.minimum() functions to find the element-wise maximum and minimum values of two numpy arrays.
Income1 = np.array([1, 3, 5, 10])
Income2 = np.array([2, 4, 6, 8])
np.maximum(Income1, Income2) # element-wise maximum
array([ 2,  4,  6, 10])
np.minimum(Income1, Income2) # element-wise minimum
array([1, 3, 5, 8])
- We can use the max() and min() functions with the dot syntax, max.() and min.(), to find the element-wise maximum and minimum values of two vectors.
Income1 = [1, 3, 5, 10];
Income2 = [2, 4, 6, 8];
max.(Income1, Income2) # element-wise maximum
4-element Vector{Int64}:
 2
 4
 6
 10
min.(Income1, Income2) # element-wise minimum
4-element Vector{Int64}:
1
3
5
8
5 Character and String
5.1 Creating strings
Characters are enclosed within a pair of quotation marks.
Single or double quotation marks can both work.
Even if a character string contains numbers, it is treated as a character string, and R will not perform any mathematical operations on it.
str1 <- "1 + 1 = 2"
Strings are enclosed within a pair of quotation marks.
Single or double quotation marks can both work.
str1 = "1 + 1 = 2"
- In Julia, single quotation marks (') are used for defining individual characters. Double quotation marks (") are used for defining strings.
character1 = '1'
'1': ASCII/Unicode U+0031 (category Nd: Number, decimal digit)
str1 = "1 + 1 = 2"
"1 + 1 = 2"
5.2 Concatenating strings
- We can use the paste() function to concatenate strings. By default, paste() inserts a space between its arguments; paste0() concatenates without a separator.
str1 <- "1 + 1 = "
str2 <- "2"
paste(str1, str2)
[1] "1 + 1 =  2"
- We can use the + operator to concatenate strings.
str1 = "1 + 1 = "
str2 = "2"
str1 + str2
'1 + 1 = 2'
- We can use the * operator to concatenate strings.
str1 = "1 + 1 = "
"1 + 1 = "
str2 = "2"
"2"
str1 * str2
"1 + 1 = 2"
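R's paste() converts numbers to text automatically, but Python's + operator does not. The sketch below shows two common ways to combine a string with a number in Python and what happens without a conversion; the variable names are illustrative.

```python
label = "1 + 1 = "
result = 1 + 1

# "+" only joins strings, so a number must be converted with str() first
joined_with_str = label + str(result)

# An f-string formats the value in place and is often easier to read
joined_with_fstring = f"{label}{result}"

# Joining a string and an int directly raises a TypeError
try:
    label + result
    mixed_types_allowed = True
except TypeError:
    mixed_types_allowed = False

print(joined_with_str, joined_with_fstring, mixed_types_allowed)
```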
5.3 Checking the number of elements in a vector: length()
- You can measure the length of a vector using the command length()
- You can measure the length of a list using the command len()
x = ['R',' is', ' the', ' best', ' language']
len(x)
5
- For numpy arrays, you can use the shape attribute to get the shape of the array.
x = np.array(['Python',' is', ' the', ' best', ' language'])
x.shape
(5,)
- You can measure the length of a vector using the command length()
x = ["Julia", " is", " the", " best", " language"]
5-element Vector{String}:
 "Julia"
 " is"
 " the"
 " best"
 " language"
length(x)
5
5.4 Special relational operation: %in%
- A special relational operation is %in% in R, which tests whether an element exists in the object.
- In Python, we can use the in operator to test whether an element exists in the object.
x = [1, 3, 8, 7]
3 in x
True
2 in x
False
- In Julia, we can use the in operator to test whether an element exists in the object.
x = [1, 3, 8, 7];
3 in x
true
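R's %in% is vectorized: with several values on the left-hand side it returns one result per value. Python's in tests a single element, so a list comprehension is one way to reproduce the vectorized behavior. A minimal sketch with illustrative variable names:

```python
x = [1, 3, 8, 7]
candidates = [3, 2, 7]

# R: c(3, 2, 7) %in% x returns one TRUE/FALSE per element on the left
membership = [value in x for value in candidates]

# For long collections, a set makes each lookup fast (hashable values assumed)
x_set = set(x)
membership_via_set = [value in x_set for value in candidates]

print(membership)
```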
6 Matrices
6.1 Matrices: creating matrices
When creating an R matrix using matrix(), the elements are filled in column by column. This is called column-major order.
When creating a Python matrix using np.array() from a list of lists, each inner list becomes a row, so the elements are laid out row by row. This is called row-major order.
- A matrix can be created using the command matrix():
  - the first argument is the vector to be converted into a matrix
  - the second argument is the number of rows
  - the last argument is the number of columns
matrix(1:9, nrow = 3, ncol = 3)
     [,1] [,2] [,3]
[1,]    1    4    7
[2,]    2    5    8
[3,]    3    6    9
- A matrix can be created using the np.array() function from the numpy package, where the argument is a list of lists and each inner list is a row of the matrix.
import numpy as np
np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
array([[1, 2, 3],
       [4, 5, 6],
       [7, 8, 9]])
- A matrix can be created in base Julia using square brackets [], with spaces separating the columns and semicolons ; separating the rows.
[1 2 3; 4 5 6; 7 8 9]
3×3 Matrix{Int64}:
1 2 3
4 5 6
7 8 9
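To make the difference between column-major and row-major filling concrete, the sketch below arranges the numbers 1 to 9 into a 3 by 3 matrix both ways, using plain Python lists. fill_matrix is a hypothetical helper written for this illustration, not a library function.

```python
def fill_matrix(values, nrow, ncol, by_column=True):
    """Hypothetical helper: arrange a flat sequence into a list of rows."""
    values = list(values)
    if len(values) != nrow * ncol:
        raise ValueError("number of values must equal nrow * ncol")
    if by_column:
        # Column-major (R's default): consecutive values run down a column
        return [[values[j * nrow + i] for j in range(ncol)] for i in range(nrow)]
    # Row-major: consecutive values run along a row
    return [[values[i * ncol + j] for j in range(ncol)] for i in range(nrow)]

column_major = fill_matrix(range(1, 10), 3, 3)
row_major = fill_matrix(range(1, 10), 3, 3, by_column=False)
print(column_major)
print(row_major)
```

With numpy, the same choice is exposed through the order argument of reshape: np.arange(1, 10).reshape(3, 3, order="F") fills by column, while the default order="C" fills by row.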
6.2 Creating matrices: combine matrices
We can use cbind() and rbind() to concatenate vectors and matrices into new matrices.
- cbind() does the column binding
a <- matrix(1:6, nrow = 2, ncol = 3)
a
     [,1] [,2] [,3]
[1,]    1    3    5
[2,]    2    4    6
cbind(a, a) # column bind
     [,1] [,2] [,3] [,4] [,5] [,6]
[1,]    1    3    5    1    3    5
[2,]    2    4    6    2    4    6
- rbind() does the row binding
rbind(a, a) # row bind
     [,1] [,2] [,3]
[1,]    1    3    5
[2,]    2    4    6
[3,]    1    3    5
[4,]    2    4    6
- We can use np.concatenate() to concatenate arrays.
a = np.array([[1, 2, 3], [4, 5, 6]])
a
array([[1, 2, 3],
       [4, 5, 6]])
np.concatenate((a, a), axis = 1) # column bind
array([[1, 2, 3, 1, 2, 3],
       [4, 5, 6, 4, 5, 6]])
np.concatenate((a, a), axis = 0) # row bind
array([[1, 2, 3],
       [4, 5, 6],
       [1, 2, 3],
       [4, 5, 6]])
- We can use the hcat() and vcat() functions to concatenate matrices.
a = [1 2 3; 4 5 6]
2×3 Matrix{Int64}:
 1 2 3
 4 5 6
hcat(a, a) # column bind
2×6 Matrix{Int64}:
 1 2 3 1 2 3
 4 5 6 4 5 6
vcat(a, a) # row bind
4×3 Matrix{Int64}:
1 2 3
4 5 6
1 2 3
4 5 6
6.3 Matrices: indexing and subsetting
Matrices have two dimensions: rows and columns. Therefore, to extract elements from a matrix, we just need to specify which row(s) and which column(s) we want.
x <- matrix(1:9, nrow = 3, ncol = 3)
x
     [,1] [,2] [,3]
[1,]    1    4    7
[2,]    2    5    8
[3,]    3    6    9
- Extract the element in the 2nd row, 3rd column.
  - Use square brackets with a comma inside, [ , ], to indicate subsetting; the argument before the comma is the row index, and the argument after the comma is the column index.
  - 2 is specified for the row index, so we extract elements from the second row.
  - 3 is specified for the column index, so we extract elements from the third column.
  - Altogether, we extract a single element in row 2, column 3.
x[2,3] # the element in the 2nd row, 3rd column
[1] 8
- If we leave a dimension blank, we extract all elements along that dimension.
  - If we want to take out the entire first row:
    - 1 is specified for the row index
    - the column index is left blank
x[1,] # all elements in the first row
[1] 1 4 7
x = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
x
array([[1, 2, 3],
       [4, 5, 6],
       [7, 8, 9]])
- Extract the element in the 2nd row, 3rd column. Because indexing starts from 0, this is x[1,2].
x[1,2] # the element in the 2nd row, 3rd column
6
- If we put : for a dimension, we extract all elements along that dimension.
x[0,:] # all elements in the first row
array([1, 2, 3])
x = [1 2 3; 4 5 6; 7 8 9];
- Extract the element in the 2nd row, 3rd column.
x[2,3] # the element in the 2nd row, 3rd column
6
- Different from R, we need to use : to extract all elements along that dimension.
x[1,:] # all elements in the first row
3-element Vector{Int64}:
1
2
3
6.4 Matrices: check dimensions and variable types
- You can verify the size of the matrix using the shape attribute
x = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
x.shape
(3, 3)
- You can get the data type info using the dtype attribute
x.dtype
dtype('int64')
- You can verify the size of the matrix using the size() function
x = [1 2 3; 4 5 6; 7 8 9]
3×3 Matrix{Int64}:
 1 2 3
 4 5 6
 7 8 9
size(x)
(3, 3)
6.5 Matrices: special operations
6.5.1 Creating a diagonal matrix
- We can use the diag() function to create a diagonal matrix.
diag(1:3)
     [,1] [,2] [,3]
[1,]    1    0    0
[2,]    0    2    0
[3,]    0    0    3
- We can use the np.diag() function to create a diagonal matrix.
np.diag([1, 2, 3])
array([[1, 0, 0],
       [0, 2, 0],
       [0, 0, 3]])
- We can use the diagm() function to create a diagonal matrix.
using LinearAlgebra
diagm(0 => [1, 2, 3])
3×3 Matrix{Int64}:
1 0 0
0 2 0
0 0 3
6.5.2 Creating an identity matrix
- We can use the diag() function to create an identity matrix.
diag(3)
     [,1] [,2] [,3]
[1,]    1    0    0
[2,]    0    1    0
[3,]    0    0    1
- We can use the np.eye() function to create an identity matrix.
np.eye(3)
array([[1., 0., 0.],
       [0., 1., 0.],
       [0., 0., 1.]])
- We can use I() from the LinearAlgebra package to create an identity matrix.
I(3)
3×3 Diagonal{Bool, Vector{Bool}}:
1 ⋅ ⋅
⋅ 1 ⋅
⋅ ⋅ 1
6.6 Matrices’ operations: matrix addition and multiplication
- If the two matrices are of the same dimensions, element-wise operations, including element-wise addition and element-wise multiplication, can be applied to them.
set.seed(123)
x = matrix(rnorm(9), nrow = 3, ncol = 3)
z = matrix(rnorm(9), nrow = 3, ncol = 3)
x + z # elementwise addition
           [,1]      [,2]       [,3]
[1,] -1.0061376 0.4712798  2.2478293
[2,]  0.9939043 0.2399705 -0.7672108
[3,]  1.9185221 1.1592239 -2.6534700
x * x # elementwise multiplication of x with itself
           [,1]        [,2]      [,3]
[1,] 0.31413295 0.004971433 0.2124437
[2,] 0.05298168 0.016715318 1.6003799
[3,] 2.42957161 2.941447909 0.4717668
- If we want to perform the matrix multiplication as in linear algebra, we need to use %*%
  - x and y must have conforming dimensions
x
           [,1]       [,2]       [,3]
[1,] -0.5604756 0.07050839  0.4609162
[2,] -0.2301775 0.12928774 -1.2650612
[3,]  1.5587083 1.71506499 -0.6868529
[,1] [,2] [,3]
[1,] -0.9186059 -0.2861301 0.6175429
[2,] 1.1282999 0.8396152 -1.1340507
[3,] 1.0157790 -1.5987826 -4.4424790
- If the two matrices are of the same dimensions, element-wise operations, including element-wise addition and element-wise multiplication, can be applied to them.
x = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
y = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
x + y # elementwise addition
array([[ 2,  4,  6],
       [ 8, 10, 12],
       [14, 16, 18]])
x * y # elementwise multiplication
array([[ 1,  4,  9],
       [16, 25, 36],
       [49, 64, 81]])
- If we want to perform the matrix multiplication as in linear algebra, we need to use @
  - x and y must have conforming dimensions
x @ y # matrix multiplication
array([[ 30,  36,  42],
       [ 66,  81,  96],
       [102, 126, 150]])
- If the two matrices are of the same dimensions, element-wise operations, including element-wise addition and element-wise multiplication, can be applied to them. Use the dot syntax (.+, .*) to indicate element-wise operations; a plain * between two matrices performs matrix multiplication.
x = [1 2 3; 4 5 6; 7 8 9]
3×3 Matrix{Int64}:
 1 2 3
 4 5 6
 7 8 9
y = [1 2 3; 4 5 6; 7 8 9]
3×3 Matrix{Int64}:
 1 2 3
 4 5 6
 7 8 9
x .+ y # elementwise addition
3×3 Matrix{Int64}:
2 4 6
8 10 12
14 16 18
6.7 Matrices’ operations: inverse and transpose
- We use t() to do matrix transpose
x
           [,1]       [,2]      [,3]
[1,]  0.1533731  0.4264642 0.8781335
[2,] -1.1381369 -0.2950715 0.8215811
[3,]  1.2538149  0.8951257 0.6886403
t(x) # transpose
          [,1]       [,2]      [,3]
[1,] 0.1533731 -1.1381369 1.2538149
[2,] 0.4264642 -0.2950715 0.8951257
[3,] 0.8781335  0.8215811 0.6886403
- We use solve() to get the inverse of a matrix
- We use T to do matrix transpose
x = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
x
array([[1, 2, 3],
       [4, 5, 6],
       [7, 8, 9]])
x.T # transpose
array([[1, 4, 7],
       [2, 5, 8],
       [3, 6, 9]])
- We use np.linalg.inv() to get the inverse of a matrix. Note that this example matrix is singular (its rows are linearly dependent), so x.T @ x has no true inverse and the huge values below are floating-point artifacts.
np.linalg.inv(x.T @ x) # inverse; must be on a square matrix
array([[ 5.62949953e+14, -1.12589991e+15,  5.62949953e+14],
       [-1.12589991e+15,  2.25179981e+15, -1.12589991e+15],
       [ 5.62949953e+14, -1.12589991e+15,  5.62949953e+14]])
- We use transpose() to do matrix transpose
x = [1 2 3; 4 5 6; 7 8 9]
3×3 Matrix{Int64}:
 1 2 3
 4 5 6
 7 8 9
transpose(x) # transpose
3×3 transpose(::Matrix{Int64}) with eltype Int64:
 1 4 7
 2 5 8
 3 6 9
- We use inv() to get the inverse of a matrix. Note that this example matrix is singular, so the huge values below are floating-point artifacts rather than a true inverse.
inv(transpose(x) * x) # inverse; must be on a square matrix
3×3 Matrix{Float64}:
5.6295e14 -1.1259e15 5.6295e14
-1.1259e15 2.2518e15 -1.1259e15
5.6295e14 -1.1259e15 5.6295e14
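The very large entries in the inverse shown above are a warning sign: the example matrix with rows 1 2 3, 4 5 6, 7 8 9 is singular, so neither it nor its cross-product has a true inverse. One way to check this before inverting is to compute the determinant exactly with integer arithmetic. The sketch below does so in base Python; det3 is a hypothetical helper for 3 by 3 matrices only.

```python
def det3(m):
    """Determinant of a 3x3 matrix given as a list of rows (cofactor expansion)."""
    (a, b, c), (d, e, f), (g, h, i) = m
    return a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g)

singular = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
invertible = [[2, 0, 0], [0, 3, 0], [0, 0, 4]]

# Integer arithmetic keeps the result exact, so 0 really means "singular"
print(det3(singular))    # determinant of the example matrix
print(det3(invertible))  # determinant of a diagonal matrix
```

Since the determinant of the cross-product equals the square of the determinant of the matrix itself, a zero here means the cross-product is singular too. With numpy, np.linalg.matrix_rank() and np.linalg.cond() are commonly used for the same kind of check on floating-point matrices.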
7 Programming Basics: Flow Control
In R, the code block is enclosed by curly braces {}. Indentation is not necessary and does not affect the code execution.
In Python, the code block is defined by indentation. Indentation is necessary and affects the code execution.
In Julia, the code block starts with a keyword such as if or for and is closed by end. Indentation does not affect the code execution.
7.1 if/else
Sometimes, you want to run your code based on different conditions. For instance, if the observation is a missing value, then use the population average to impute the missing value. This is where if/else kicks in.
if (condition == TRUE) {
  action 1
} else if (condition == TRUE) {
  action 2
} else {
  action 3
}
Example 1:
a <- 15
if (a > 10) {
  larger_than_10 <- TRUE
} else {
  larger_than_10 <- FALSE
}
larger_than_10
[1] TRUE
Example 2:
a = 15
if a > 10:
    larger_than_10 = True
else:
    larger_than_10 = False
larger_than_10
True
Example 2:
x = -5
if x >= 0:
    print("x is a non-negative number")
else:
    print("x is a negative number")
x is a negative number
a = 15
15
if a > 10
    larger_than_10 = true
else
    larger_than_10 = false
end
true
larger_than_10
true
Example 2:
x = -5
-5
if x >= 0
    println("x is a non-negative number")
else
    println("x is a negative number")
end
x is a negative number
7.2 Loops
Loops in R and Python are comparatively slow. Therefore, code should be written in vectorized (matrix) form as much as possible.
In contrast, Julia is very efficient at loops. Thus code readability can be prioritized over vectorization.
As the name suggests, in a loop the program repeats a set of instructions many times, until the stopping criterion is met.
Loops are very useful for repetitive jobs.
for (i in 1:10){ # i is the iterator
  # loop body: gets executed each time
  # the value of i changes with each iteration
}
Example:
for (i in 1:5){
  print(i)
}
[1] 1
[1] 2
[1] 3
[1] 4
[1] 5
for i in range(1, 6):
    print(i)
1
2
3
4
5
for i in 1:5
    println(i)
end
1
2
3
4
5
7.3 User-Defined Functions
A function takes arguments as input, runs some specified actions, and then returns the result to us.
Functions are very useful. When we would like to test different ideas, we can combine functions with loops: We can write a function which takes different parameters as input, and we can use a loop to go through all the possible combinations of parameters.
Here is how to define a function in general:
function_name <- function(arg1, arg2 = default_value){
  # write the actions to be done with arg1 and arg2
  # you can have any number of arguments, with or without defaults
  return() # the last line is to return some value
}
Example:
magic <- function(x, y){
  return(x^2 + y)
}
magic(1,3)
[1] 4
Here is how to define a function in general:
def function_name(arg1, arg2=0):  # 0 stands in for any default value
    # write the actions to be done with arg1 and arg2
    # you can have any number of arguments, with or without defaults
    return arg1  # the last line is to return some value
Example:
def magic(x, y):
    return x**2 + y
magic(1, 3)
4
Here is how to define a function in general:
function function_name(arg1, arg2 = default_value)
    # write the actions to be done with arg1 and arg2
    # you can have any number of arguments, with or without defaults
    return # the last line is to return some value
end
function_name (generic function with 2 methods)
Example:
function magic(x, y)
    return x^2 + y
end
magic (generic function with 1 method)
magic(1, 3)
4
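The general templates above mention default values, but the magic() examples do not use one. The sketch below is a variation of the Python example with a default for y; the choice of 0 as the default is an assumption made for illustration.

```python
def magic(x, y=0):
    """Return x squared plus y; y falls back to 0 when it is not supplied."""
    return x**2 + y

both_given = magic(1, 3)      # positional arguments
default_used = magic(2)       # y takes its default value
by_keyword = magic(y=3, x=1)  # keyword arguments can be given in any order

print(both_given, default_used, by_keyword)
```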
8 A comprehensive exercise
Task: write a function, which takes a vector as input, and returns the max value of the vector
def get_max(input):
    max_value = input[0]
    for i in range(1, len(input)):
        if input[i] > max_value:
            max_value = input[i]
    return max_value
get_max([-1, 3, 2])
3
function get_max(input)
    max_value = input[1]
    for i in 2:length(input)
        if input[i] > max_value
            max_value = input[i]
        end
    end
    return max_value
end
get_max (generic function with 1 method)
get_max([-1, 3, 2])
3
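A hand-written function like get_max() is easy to cross-check against the built-in max(). The sketch below is a variant of the Python solution that also guards against an empty input; treating an empty input as an error is an assumption of this sketch, and the sample inputs are illustrative.

```python
def get_max(values):
    """Return the largest element of a non-empty sequence."""
    if len(values) == 0:
        # Assumption of this sketch: an empty input is treated as an error
        raise ValueError("get_max() requires a non-empty sequence")
    max_value = values[0]
    for value in values[1:]:
        if value > max_value:
            max_value = value
    return max_value

# Cross-check the loop against Python's built-in max()
samples = [[-1, 3, 2], [5], [2, 2, 1], [-7, -3, -9]]
all_match = all(get_max(s) == max(s) for s in samples)
print(all_match)
```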
9 Conclusion about R and Python
Below are the easiest mistakes to make when you are switching between R and Python:
- In R, the index starts from 1; in Python, the index starts from 0.
- In R, missing values are represented by NA; in Python, missing values are represented by np.nan.
- In R, the code block is enclosed by curly braces {}; in Python, the code block is defined by indentation.
- In R, the : operator is used to create a sequence with a step of 1; in Python, the range() function is used to create a sequence with a step of 1.
- In R, the c() function is used to combine vectors; in Python, the + operator is used to combine lists.
- In R, the rep() function is used to replicate elements in a vector; in Python, the * operator is used to replicate elements in a list.
- In R, the %in% operator is used to test whether an element exists in the object; in Python, the in operator is used to test whether an element exists in the object.
- In R, the %*% operator is used to perform matrix multiplication; in Python, the @ operator is used to perform matrix multiplication.
1.2 Commenting code
You can put a # before any code to indicate that everything after the # on the same line is a comment and will not be run by R.
It’s good practice to comment your code often, so that you can help the future you to remember what you were trying to achieve.
Same as R. You can put a # before any code to indicate that everything after the # on the same line is a comment and will not be run by Python.
Same as R and Python. You can put a # before any code to indicate that everything after the # on the same line is a comment and will not be run by Julia.