Miao, Wei (缪玮)

Installing R Packages that Require Compilation

macOS
R
For some R packages (fst, qs2, data.table, etc.), you need to compile from source to get the best performance out of them. This post is a guide to installing these packages on macOS.
Author

Wei Miao

Published

April 21, 2025

Modified

April 21, 2025

1 Preparation

In order to compile R packages from source, you need to have the following tools installed on your system:

  • Xcode Command Line Tools: These include the compilers and build tools needed to compile R packages from source. Install them by running the following command in your terminal; if the tools are not yet installed, a dialog will prompt you to install them. If you already have Xcode installed, you can skip this step; running the command anyway just reports that the command line tools are already installed.
xcode-select --install
  • Homebrew: This is a package manager for macOS that makes it easy to install and manage software packages. If you don’t have Homebrew installed, you can install it by running the following command in your terminal. You can also check the official Homebrew installation guide for more details.
/bin/bash -c "$(curl -fsSL https://raw.githubusercontent.com/Homebrew/install/HEAD/install.sh)"
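Before moving on, it can help to confirm that the tools are actually on your PATH. The following is a minimal sketch in POSIX sh; check_tools is a hypothetical helper, and it only checks that each command exists, not its version.

```shell
#!/bin/sh
# Sketch: confirm build prerequisites are on PATH before compiling.
# check_tools is a hypothetical helper name, not part of any package.
check_tools() {
  missing=0
  for tool in "$@"; do
    if command -v "$tool" >/dev/null 2>&1; then
      echo "found:   $tool -> $(command -v "$tool")"
    else
      echo "missing: $tool"
      missing=1
    fi
  done
  # Non-zero exit status if any tool was not found.
  return "$missing"
}

# clang and make come from the Command Line Tools, brew from Homebrew.
check_tools clang make brew || echo "Install the missing tools before continuing."
```

On macOS, xcode-select -p additionally prints the Command Line Tools directory when they are installed, and clang --version shows which version of the default clang you have.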

2 data.table

Caution

macOS upgrades sometimes update the default compiler (clang) included with the Xcode Command Line Tools, which can lead to compilation issues for certain R packages such as data.table. For example, a recent issue arose with the macOS Sequoia beta, where the updated clang version caused compilation failures (see Rdatatable/data.table#6622).

If you encounter such compilation problems after a macOS update, a potential workaround is to install an older version of the compiler using Homebrew, such as LLVM 16:

brew install llvm@16

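You can then set the CC and CXX environment variables to point to the older compiler before installing the package. Using brew --prefix avoids hard-coding the Homebrew location, which is /usr/local on Intel Macs and /opt/homebrew on Apple Silicon:

export CC="$(brew --prefix llvm@16)/bin/clang"
export CXX="$(brew --prefix llvm@16)/bin/clang++"

Be aware that R normally takes CC and CXX from its own Makeconf file, whose assignments take precedence over environment variables. If the exported values seem to have no effect, put the equivalent CC= and CXX= lines in ~/.R/Makevars instead.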
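R reads ~/.R/Makevars whenever it builds a package from source, so one way to make the LLVM choice stick is to write CC and CXX lines there. A minimal sketch, assuming Homebrew's llvm@16 is installed; llvm_makevars_lines is a hypothetical helper, and the prefix is passed in so the same code works on Intel and Apple Silicon Macs.

```shell
#!/bin/sh
# Sketch: emit Makevars lines that point R at a Homebrew LLVM toolchain.
# llvm_makevars_lines is a hypothetical helper name used for illustration.
llvm_makevars_lines() {
  prefix="$1"   # normally the output of: brew --prefix llvm@16
  printf 'CC=%s/bin/clang\n' "$prefix"
  printf 'CXX=%s/bin/clang++\n' "$prefix"
}

# Usage on a Mac with Homebrew (appends to your personal Makevars):
#   mkdir -p ~/.R
#   llvm_makevars_lines "$(brew --prefix llvm@16)" >> ~/.R/Makevars
```

Because ~/.R/Makevars applies to every package you compile, consider removing these lines once the default clang works again.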

2.1 Introduction to data.table

data.table is a high-performance extension of R’s data.frame that provides a syntax for data manipulation that is concise, consistent, and efficient. It’s particularly optimized for large datasets and offers significant performance improvements over base R and tidyverse’s dplyr.

2.2 Why data.table is superior to dplyr

  1. Speed: data.table is typically faster than dplyr, especially for large datasets, thanks to its C implementation and internal optimizations such as radix-based ordering and GForce (optimized grouped aggregations).

  2. Memory efficiency: data.table can add or modify columns by reference (with := and the set* functions), which reduces memory overhead compared with dplyr, where each verb returns a new data frame under R’s copy-on-modify semantics.

  3. Concise syntax: Complex operations can be expressed in a single line of code using data.table’s [i, j, by] syntax, which is more compact than dplyr’s pipe-based approach.

  4. Advanced features: data.table offers powerful features like rolling joins, non-equi joins, and specialized grouped operations that aren’t as easily accessible in dplyr.

2.3 Installation notes

If you install data.table with install.packages() on macOS, R downloads a pre-compiled binary from CRAN by default, so nothing is compiled on your machine. However, this may not give the best performance. The biggest disadvantage of the binary version is that on macOS it is usually built without OpenMP, a parallel programming model that lets data.table use multiple CPU cores, so the package runs single-threaded.

To install data.table from source, follow these steps:

  1. Preparation: Follow the steps in Section 1 to install the Xcode Command Line Tools and Homebrew.

  2. Install OpenMP: Install libomp, the LLVM OpenMP runtime library, with Homebrew:

brew install libomp
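  3. Customize Makevars: Create or edit the ~/.R/Makevars file to include the following lines. Homebrew installs libomp as keg-only, so its headers and library are not on the default search paths and must be passed explicitly. The value below assumes Apple Silicon; on an Intel Mac, set HOMEBREW_LOC to /usr/local (brew --prefix prints the right value).
# ~/.R/Makevars
HOMEBREW_LOC = /opt/homebrew
CPPFLAGS += -I$(HOMEBREW_LOC)/opt/libomp/include -Xclang -fopenmp
LDFLAGS += -L$(HOMEBREW_LOC)/opt/libomp/lib -lomp
Note
  • CPPFLAGS: Flags for the C/C++ preprocessor, so they apply to both C and C++ sources. -I adds the libomp header directory, and -Xclang -fopenmp passes -fopenmp directly to the clang front end, because Apple clang’s driver does not accept -fopenmp on its own.
  • LDFLAGS: Flags for the linker. -L adds the libomp library directory, and -lomp links against the OpenMP runtime library.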
  • If you don’t have the ~/.R/Makevars file, you can create it using the following command:
mkdir -p ~/.R && touch ~/.R/Makevars
  4. Install data.table: Once the steps above are done, install data.table from source with the following command in R:
install.packages("data.table", type = "source")
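If you rebuild your setup often, the Makevars edit can be scripted so that re-running it does not add duplicate lines. A minimal sketch, assuming Homebrew’s keg-only libomp; append_once and setup_openmp_makevars are hypothetical helpers, and the Makevars path and libomp prefix are passed in rather than hard-coded.

```shell
#!/bin/sh
# Sketch: add OpenMP flags to an R Makevars file idempotently.
# append_once and setup_openmp_makevars are hypothetical helper names.
append_once() {
  target="$1"
  line="$2"
  mkdir -p "$(dirname "$target")"
  touch "$target"
  # -q quiet, -x whole-line match, -F fixed string (no regex)
  grep -qxF -- "$line" "$target" || printf '%s\n' "$line" >> "$target"
}

setup_openmp_makevars() {
  makevars="$1"    # normally ~/.R/Makevars
  omp_prefix="$2"  # normally the output of: brew --prefix libomp
  # libomp is keg-only, so pass its include and lib directories explicitly.
  append_once "$makevars" "CPPFLAGS += -I$omp_prefix/include -Xclang -fopenmp"
  append_once "$makevars" "LDFLAGS += -L$omp_prefix/lib -lomp"
}

# Usage on a Mac with Homebrew:
#   setup_openmp_makevars "$HOME/.R/Makevars" "$(brew --prefix libomp)"
# After reinstalling data.table from source, check that OpenMP is active:
#   Rscript -e 'data.table::getDTthreads(verbose = TRUE)'
```

The whole-line match with grep -qxF is what keeps repeated runs from appending the same flags twice.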

3 qs2

3.1 Introduction to qs2

qs2 is a package for fast serialization and deserialization of R objects. It is particularly useful for saving and loading large datasets quickly, making it a great choice for data-intensive applications.

I used to use the fst package for serialization and deserialization, but I now prefer qs2: fst can only save data frames, whereas qs2 can save arbitrary R objects.

3.2 Installation notes

To install qs2 from source, follow these steps:

  1. Preparation: Follow the steps in Section 1 to install the Xcode Command Line Tools and Homebrew.
  2. Install TBB: Install the tbb formula with Homebrew. It provides Intel’s Threading Building Blocks (TBB, now developed as oneTBB), a C++ library for parallel programming that qs2 uses to enable parallel serialization and deserialization.
brew install tbb
  3. Set the TBB environment variables: Add the following lines to your ~/.zshrc file so that qs2 can find the TBB library during installation. Replace /usr/local/opt/tbb with the output of brew --prefix tbb on your system (on Apple Silicon this is typically /opt/homebrew/opt/tbb), then open a new terminal or run source ~/.zshrc so the variables take effect.
export TBB="/usr/local/opt/tbb"
export TBB_INC="$TBB/include"
export TBB_LIB="$TBB/lib"
  4. Install qs2: Once the steps above are done, install qs2 from source with the following command in R. See the official qs2 installation guide for more details.
install.packages("qs2", type = "source", configure.args = "--with-TBB --with-simd=AVX2")
Note
  • --with-TBB: This flag tells the qs2 package to use the TBB library for parallel serialization and deserialization.
  • --with-simd=AVX2: This flag tells qs2 to use AVX2 SIMD (Single Instruction, Multiple Data) instructions, which process several values with a single instruction and can speed up serialization. AVX2 is an x86-64 instruction set available on many Intel Macs; Apple Silicon (ARM) processors do not support it, so drop this flag on M-series Macs.
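Because AVX2 exists only on x86-64, the right configure.args depend on the Mac. The following is a minimal sketch that also sets the TBB variables for the current shell. It assumes R runs natively on the architecture that uname -m reports (a Rosetta terminal reports x86_64 even on Apple Silicon) and that an x86_64 CPU supports AVX2, which very old Intel models do not. set_tbb_env and qs2_configure_args are hypothetical helpers.

```shell
#!/bin/sh
# Sketch: prepare a terminal session for building qs2 from source.
# set_tbb_env and qs2_configure_args are hypothetical helpers; TBB, TBB_INC
# and TBB_LIB are the variable names used in the steps above.
set_tbb_env() {
  TBB="$1"               # normally the output of: brew --prefix tbb
  TBB_INC="$TBB/include"
  TBB_LIB="$TBB/lib"
  export TBB TBB_INC TBB_LIB
}

# Print configure.args for qs2, defaulting to the shell's CPU architecture.
qs2_configure_args() {
  arch="${1:-$(uname -m)}"
  case "$arch" in
    x86_64) echo "--with-TBB --with-simd=AVX2" ;;
    # AVX2 is an x86-64 extension, so it is left out on arm64 (Apple Silicon).
    *) echo "--with-TBB" ;;
  esac
}

# Usage on a Mac with Homebrew and R on PATH:
#   set_tbb_env "$(brew --prefix tbb)"
#   args="$(qs2_configure_args)"
#   Rscript -e "install.packages('qs2', type = 'source', repos = 'https://cloud.r-project.org', configure.args = '$args')"
```

Variables exported this way reach R only when it is started from that shell; R.app or RStudio launched from the Dock does not read ~/.zshrc and may not see them.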

Copyright 2025, Wei Miao