Making a heatmap • ggcorrheatmap

The gghm() function can be used to make a heatmap from a matrix of data frame. Rows and columns are ordered the same way as in the input.

library(ggcorrheatmap)
library(ggplot2)
library(dplyr)
library(patchwork) # For pasting together plots

gghm(mtcars)

A heatmap of the mtcars data with a colour scale going from dark blue to blue.

The ggcorrhm() function uses gghm() to build a correlation heatmap. The correlations between the columns in the input are computed and plotted.

If the matrix being plotted is square and has the same row- and column names, the axis labels are drawn on the diagonal. This behaviour can be changed using the show_names_diag, show_names_rows and show_names_cols arguments. Such a heatmap can also be plotted using triangular and mixed triangular layouts, explained more in the correlation matrix and mixed layouts articles.

ggcorrhm(mtcars)

A symmetric heatmap showing the Pearson correlation between columns in the mtcars data. Colours range from blue (low) to red (high). The names of the variables are displayed on the diagonal.

Specifying colours

The colours of the heatmap can be adjusted by passing a string specifying a Brewer or Viridis scale or a ggplot2::scale_fill_* function call to the col_scale argument. The limits can be changed by giving a vector to the limits argument.

# Use the Viridis D (can also be "viridis") option
gghm(mtcars, col_scale = "D")

A heatmap of the whole mtcars data using the viridis colour scale.

The bins argument creates a binned scale (only for continuous data). If bins is a float number (class ‘double’), ggplot2 will try to put the breaks at nice numbers at the cost of changing the number of bins. If bins is an integer the exact number of bins will instead be prioritised, but the breaks may end up at strange numbers.

plt1 <- gghm(mtcars, col_scale = "A", bins = 6)
plt2 <- gghm(mtcars, col_scale = "A", bins = 6L)

plt1 + plt2

Two heatmaps of the mtcars data using the Viridis magma colour scale. The scale of the first plot is divided into five bins with breaks at 100, 200, 300, and 400. The second scale has six bins with breaks at 78.67, 157.33, 236.0, 314.67, and 393.33.

If the values in the input matrix are discrete a discrete scale will be used. If the input is a data frame with mixed types (e.g. some columns are character vectors and some numeric), the resulting heatmap will be discrete.

# Heatmap with discrete values
# Border colour and linewidth can be changed using the `border_col` and `border_lwd` arguments
set.seed(123)
gghm(replicate(15, sample(letters[1:3], 15, TRUE)),
     col_scale = "Pastel2", border_col = "thistle1",
     border_lwd = 1)

A heatmap of a 15 by 15 matrix filled randomly with a, b, and c. The Brewer Pastel2 palette is used for filling cells and the cell border colours have been changed to thistle1.

For gghm() fill, colour and size legends are shown by default. The legend order can be changed using the legend_order argument. ggcorrhm() tries to hide some legends by default. Legends and scales are covered more in the scales and legends article.

Plotting modes

The mode argument takes a string and determines how the values will be plotted. ‘heatmap’ or ‘hm’ gives the default heatmap, ‘text’ results in the values being written out, and any number from 1 to 25 changes the cells to the corresponding shapes (pch). There is also a ‘none’ mode which draws empty tiles.

In ‘text’ mode the text colour scales with the values in the cells. Instead of fill_scale, use the col_scale argument to set the scale.

gghm(mtcars[1:15, ], mode = "text",
     # Viridis "A" ("magma")
     col_scale = "A")

A heatmap of the first 15 rows of the mtcars data, where the cells have the values written inside instead of being filled. The colour of the text scales with the value.

In shape modes, 1-20 use colour scales and 21-25 use fill scales. The sizes scale with the cell values and the scale can be customised by passing ggplot2::scale_size_* functions to the size_scale argument.

For fillable shapes the border colour and linewidth can be changed using the border_col and border_lwd arguments.

gghm(mtcars[1:15, ], mode = 22,
     border_col = "lightblue", border_lwd = 1,
     col_scale = "G",
     # The range argument sets the range of sizes to plot
     size_scale = scale_size_continuous(range = c(1, 7)))

A heatmap of the first 15 rows of mtcars, where the cells have been substituted for stars (R pch 8). Both the size and colour scale with the value.

gghm(mtcars[1:15, ], mode = "none")

A heatmap of the first 15 rows of mtcars. The cells are not filled, leaving only emptiness within the cell borders.

Cell borders

Cell borders are by default grey, but can be removed by setting border_col (border colour) or border_lwd (border linewidth) to NA or border_lty (border linetype) to 0 (if mode is not a number) or border_lwd to 0 (if mode is a number).

# Use the volcano data
# It contains many rows and columns making the axes hard to read.
# Can either turn off labels with `show_names_rows` and `show_names_cols`, or use
# ggplot2::scale_x/y_discrete to change the labels
# (keep in mind that the y label order is reversed in the default layout and that overwriting the scale can change the label positions)
gghm(volcano, col_scale = "H",
     border_col = NA, border_lwd = 0,
     show_names_rows = FALSE, show_names_cols = FALSE)

A heatmap of the volcano R dataset using the viridis colour scale for filling cells. The cell borders have been removed.

border_col, border_lwd, and border_lty can take either values of length 1 or the same length as the input data. Using the return_data argument to get the plotting data makes it easier to implement conditional customisation of borders.

# Single values as input
plt1 <- gghm(cor(mtcars), col_scale = "G",
             border_col = "orange", border_lwd = 1, border_lty = 2)

# Two-step approach:
# - Make a plot with the data of interest, retrieve the plotting data
# - Make col, lwd, lty vectors based on some condition in the plotting data and pass to gghm
plt_dat <- gghm(cor(mtcars), return_data = TRUE)$plot_data
plt_dat <- mutate(plt_dat,
                  # The plotting data contains the columns 'row', 'col', and 'value', so avoid those names
                  colr = case_when(row == "hp" ~ "red", col == "vs" ~ "yellow", TRUE ~ "grey"),
                  lwd = case_when(row == "hp" | col == "vs" ~ 1.5, TRUE ~ 0.5),
                  lty = case_when(value > 0.5 ~ 1, TRUE ~ 2))
plt2 <- gghm(cor(mtcars), col_scale = "G",
             border_col = plt_dat$colr,
             border_lwd = plt_dat$lwd,
             border_lty = plt_dat$lty)

# Make one figure with both plots using the patchwork package
plt1 + plt2

Two correlation heatmaps of the mtcars data. Both use the viridis mako scale for filling cells. In the first plot the cell borders are thick, orange and dashed. In the second one, all cells where the row is hp are red and thick, cells wherer the column is vs are yellow and thick, and the rest are grey and thin. All cell borders are dashed.

Handling NAs

NAs in the data can be removed using the na_remove argument or set to a specific colour using the na_col and col_scale arguments. Setting na_remove to TRUE removes NAs from the plot, but they can still interfere with clustering if a whole row or column is all NAs.

# Introduce NAs
df_na <- as.matrix(mtcars)
set.seed(123)
df_na[sample(length(df_na), 100)] <- NA

# With NAs and the default colour
na_plt1 <- gghm(df_na, legend_order = NA,
                # Darker borders for demonstration
                border_col = "black", border_lwd = .3) +
  labs(title = "NAs default")
# Colour NAs using scale function
na_plt2 <- gghm(df_na, legend_order = NA, show_names_rows = FALSE,
                border_col = "black", border_lwd = .3,
                na_col = "white") +
  labs(title = "Change NA colour")
# Remove NAs, removes the border as well
na_plt3 <- gghm(df_na, legend_order = NA, show_names_rows = FALSE,
                border_col = "black", border_lwd = .3,
                na_remove = TRUE) +
  labs(title = "NAs removed")

na_plt1 + na_plt2 + na_plt3 & theme(plot.title = element_text(hjust = 0.5))

Three heatmaps of the mtcars data. 100 random cells have been changed to NA. In the first plot the NA cells are grey, in the second they are white In the last plot, NA cells are removed, leaving holes with no cell borders where there are NAs.

Clustering and annotation

Clustering can be applied to the rows and columns using the cluster_rows and cluster_cols arguments. See the clustering article for more on clustering.

gghm(mtcars, cluster_rows = TRUE, cluster_cols = TRUE,
     # The `scale_data` argument allows for scaling
     # NULL or 'none' for no scaling, 'rows' or 'columns'
     # (or a substring of the beginning) for scaling either dimension
     scale_data = "col")

A heatmap of mtcars, where rows and columns have been reordered according to hierarchical clustering results. Dendrograms are drawn to the right of and below the heatmap.

Row and column annotation is added by providing data frames to the annot_rows_df and annot_cols_df arguments. More details on annotation can be found in the annotation article.

set.seed(123)
# Make annotation data, use 15 rows for visibility
row_annot <- data.frame(.names = rownames(mtcars)[1:15],
                        a = sample(letters[1:4], 15, TRUE))

gghm(mtcars[1:15, ], annot_rows_df = row_annot, annot_rows_names_side = "top")

Heatmap of mtcars with a thin column added to the right containing row annotation.

Clustering and annotation can of course be added simultaneously.

gghm(mtcars[1:15, ], cluster_rows = TRUE, scale_data = "col",
     annot_rows_df = row_annot, annot_rows_names_side = "top")

Heatmap of mtcars. The rows have been clustered and along the right side is an annotation column with a dendrogram on its right.

Gaps

Gaps can be added (via facets) to highlight specific groups in the data. The split_rows and split_cols arguments take facet specifications of different forms, explained more in the facets article.

gghm(mtcars, split_rows = c(10, 20), split_cols = 5)

A heatmap of the mtcars data. The rows are split into three groups after rows 10 and 20 and the columns are split into two groups after column 5.

Cell labels

Cell labels can be displayed by using the cell_labels argument which takes either a logical or a matrix or data frame. If logical, the cell values are displayed. If a matrix/data frame, those values are used instead. The matrix should contain the same row and column names as the input, any rows or columns with different names or cells with NAs will be silently dropped.

set.seed(123)
# Make 5x5 matrix to plot with values between 1 and 10
plot_data <- as.matrix(replicate(5, sample(1:10, 5, TRUE)))

# Make label data with letters
label_mat <- plot_data
label_mat[] <- letters[1:25]
# Use a lighter scale for visibility
fill_scl <- scale_fill_gradient(high = "pink", low = "white")

plt1 <- gghm(plot_data, col_scale = fill_scl, cell_labels = TRUE, cell_label_size = 5) +
  labs(title = "Using cell values")
plt2 <- gghm(plot_data, col_scale = fill_scl, cell_labels = label_mat, cell_label_size = 5) +
  labs(title = "Using external values")

plt1 + plt2 & theme(plot.title = element_text(size = 18))

Two 5x5 heatmaps with cells coloured by their values, going from 2 (white) to 10 (pink). The first heatmap has the numbers written in black text in each cell, while the second one has cells with one letter each, gonig from a to y starting in the top left and moving down each column.

cell_label_col, cell_label_size and cell_label_digits control the colours, sizes and number of decimals (if numeric) of the labels. Keep in mind that data frames passed to cell_labels are converted to matrices, so if the data frame contains numbers and strings all numbers are converted to strings and the numbers will not be rounded. In that case the numbers can be rounded before being passed.

As mentioned before, the ‘text’ mode makes cells with text that scales with cell values. If cell_labels is TRUE or a matrix/data frame, the labels will be displayed instead, meaning that user-supplied cell labels can have use colour scale (also meaning that ‘text’ mode with cell_labels being FALSE is the same as ‘text’ mode with cell_labels as TRUE).

plt1 <- gghm(plot_data, cell_labels = TRUE, col_scale = fill_scl,
             cell_label_size = 5, cell_label_col = "#63B8FF", cell_label_digits = 4) +
  labs(title = "size, colours, decimals")

# Text requires colour instead of fill scale
col_scl <- scale_colour_gradient(high = "pink", low = "white")
plt2 <- gghm(plot_data, cell_labels = label_mat, mode = "text",
             col_scale = col_scl, cell_label_size = 8) +
  labs(title = "'text' mode")

plt1 + plt2 & theme(plot.title = element_text(size = 18))

Two 5x5 heatmaps. The first one has coloured cells and a colour scale ranging from 2 (white) to 10 (pink). The values are written in the cells as blue numbers. The second heatmap has white cells, each with one letter written in pink (starting from a in the top right and going down each column, ending in y.

External customisation

The plotting data can be returned by setting the return_data argument to TRUE, allowing for further customisation of e.g. cell labels by adding geoms to the finished plot as shown below.

As explained in the ‘Cell borders’ section, arguments where the values are passed on to ggplot2::geom_* functions can be vectors with one value per cell. The drawing order can get a bit confusing; for the full, top right and bottom left layouts the cells are drawn from the top left moving down each column; by contrast, for top left and bottom right layouts cells are drawn from the bottom left moving up each column (due to the y-axis levels being reversed). The rows are ordered in their drawing order in the data returned when return_data is TRUE so that can be used as a reference. However, in general, passing vectors to static aesthetics in ggplot2 can become tricky and adding another geom to the output plot is more robust.

plot_list <- gghm(scale(mtcars), return_data = TRUE, border_col = NA,
                  col_scale = scale_fill_gradient2(low = "deepskyblue", mid = "white", high = "salmon"))

# Add own labels based on value (also possible by supplying a matrix to cell_labels)
plot_list$plot_data$lab <- ntile(plot_list$plot_data$value, 5)

plt1 <- plot_list$plot +
  # Use col for x and row for y
  geom_text(mapping = aes(x = col, y = row, label = lab),
            data = plot_list$plot_data, size = 3)

# Or maybe points (cell_labels uses `geom_text` so the asterisks would not be positioned in the middle of the cells)
plot_list$plot_data$lab <- ifelse(plot_list$plot_data$value > 1 |
                                  plot_list$plot_data$value < -1, "*", NA)
plt2 <- plot_list$plot +
  geom_point(data = subset(plot_list$plot_data, !is.na(lab)),
             mapping = aes(x = col, y = row),
             shape = "*", size = 4)

plt1 + plt2

Two heatmaps of z-scores of mtcars. The colours are changed to a divergent scale going from blue to red. The first plot has had its cells labelled with numbers ranging from 1 to 5 based the cell value (1 is low, 5 is high). The second plot has asterisks in cells where the z-score is above 1 or below -1.

There are many arguments, allowing for quite some flexibility in the heatmap layout. Most of the arguments are showcased in the articles, hopefully they can be of some help. They go more into making correlation heatmaps, clustering and annotation, adding gaps, making mixed layout heatmaps, ways to deal with legends, and how to use the package with tidy data.

Other packages

While ggcorrheatmap supports making general heatmaps, as a heatmap package it is relatively lacking in features. The pheatmap package and the ComplexHeatmap package can be used for making heatmaps with many more options for customisation (although not with ggplot2).