Make a correlation matrix from long format data.

Usage

cor_long(
  x,
  rows,
  cols,
  values,
  y = NULL,
  rows2 = NULL,
  cols2 = NULL,
  values2 = NULL,
  out_format = c("wide", "long"),
  method = "pearson",
  use = "everything",
  p_values = FALSE,
  p_adjust = "none",
  p_thresholds = c(`***` = 0.001, `**` = 0.01, `*` = 0.05, 1),
  p_sym_add = NULL,
  p_sym_digits = 2
)

Arguments

x

A long format data frame containing the data to correlate.

rows, cols

The columns in x containing the row and column identifiers of the wide matrix built from x. The resulting correlation matrix holds the column-column correlations (see Details).

values

Name of the column in x containing the values to correlate (the cells of the wide matrix passed to stats::cor()).

y

Optional second data frame for correlating with the data frame from x.

rows2, cols2

Optional names of columns with values for the rows and columns of a second matrix (taken from y).

values2

Optional name of the column with the values of a second matrix (taken from y).

out_format

Format of the output correlation matrix ("wide" or "long"; the default is "wide").

method

Correlation method given to stats::cor().

use

Missing value strategy of stats::cor().

p_values

Logical indicating if p-values should be calculated.

p_adjust

String specifying the multiple testing adjustment method to use for the p-values (default is "none"). Passed to stats::p.adjust().

p_thresholds

Named numeric vector specifying the p-value thresholds to mark, in ascending order. The last element must be 1 or higher (to set the upper limit). Names must be unique, but one element can be left unnamed (by default 1 is unnamed, meaning p-values between the largest threshold below 1 and 1 get no symbol). If NULL, no thresholding is done and p-value intervals are not marked with symbols.

p_sym_add

String with the name of the column to add to p-value symbols from p_thresholds (one of 'values', 'p_val', 'p_adj'). NULL (default) results in just the symbols.

p_sym_digits

Number of digits to use for the column in p_sym_add.
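
The interplay of p_adjust and p_thresholds can be illustrated with base R alone. The snippet below is only a sketch: it assumes that each p-value receives the name of the first threshold it does not exceed, and it uses made-up p-values. How cor_long() treats values that fall exactly on a threshold is not stated here, so the right-closed intervals produced by cut() are an assumption.

```r
# Made-up raw p-values, for illustration only
p_val <- c(0.0004, 0.003, 0.02, 0.2)

# Multiple testing adjustment, as done by stats::p.adjust()
p_adj <- stats::p.adjust(p_val, method = "bonferroni")

# Default thresholds from the usage section; the last element is unnamed
p_thresholds <- c(`***` = 0.001, `**` = 0.01, `*` = 0.05, 1)

# Assumption: intervals are (lower, upper], with 0 included in the first one
p_sym <- as.character(cut(p_adj,
                          breaks = c(0, p_thresholds),
                          labels = names(p_thresholds),
                          include.lowest = TRUE))

data.frame(p_val, p_adj, p_sym)
```

Because the element 1 is unnamed, adjusted p-values above 0.05 end up with an empty string, which corresponds to no symbol.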

Value

A correlation matrix (if wide format) or a long format data frame with the columns 'row', 'col', and 'value' (containing correlations).
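
To make the two output shapes concrete, the following base R sketch converts a wide correlation matrix into a long data frame with the columns 'row', 'col' and 'value'. It does not call cor_long(); the row order and column types of the actual long output are not specified here and may differ.

```r
set.seed(123)

# A small wide input matrix: 10 observations (rows) of 5 variables (columns)
wide <- matrix(rnorm(50), nrow = 10,
               dimnames = list(letters[1:10], LETTERS[1:5]))

# Wide format: a 5 x 5 correlation matrix
corr_wide <- stats::cor(wide)

# Long format: one row per cell of the correlation matrix
corr_long <- as.data.frame(as.table(corr_wide), stringsAsFactors = FALSE)
names(corr_long) <- c("row", "col", "value")

head(corr_long)
```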

Details

If there is only one input data frame (x) and rows2, cols2 and values2 are NULL (the default), a wide matrix is constructed from x and passed to stats::cor(), resulting in a correlation matrix with the column-column correlations.
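
The single-input case can be sketched in base R as follows. This is an approximation of the described behaviour, not the function's actual code: the reshaping step with tapply() is an assumption, chosen because each row/column pair occurs once in the example data.

```r
set.seed(123)
long_x <- data.frame(row = rep(letters[1:10], each = 5),
                     col = rep(LETTERS[1:5], 10),
                     val = rnorm(50))

# Long to wide: one cell per row/col pair, so the first value is the only value
wide_x <- tapply(long_x$val, list(long_x$row, long_x$col),
                 FUN = function(v) v[1])

# Column-column correlations of the wide matrix
corr <- stats::cor(wide_x)
dim(corr)
```

The 10 levels of row become the observations and the 5 levels of col become the variables, so the result is a 5 x 5 matrix.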

If y is a data frame and rows2, cols2 and values2 are specified, the wide versions of x and y are correlated (stats::cor(wide_x, wide_y)), resulting in a correlation matrix with the columns of x in the rows and the columns of y in the columns. The wide versions of x and y must have the same number of rows, as stats::cor() requires.
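
The two-input case can be sketched the same way. The helper to_wide() below is hypothetical and only stands in for whatever reshaping cor_long() performs internally; the data mirror the shapes used in the Examples section.

```r
set.seed(123)
long_x <- data.frame(row = rep(letters[1:10], each = 5),
                     col = rep(LETTERS[1:5], 10),
                     val = rnorm(50))
long_y <- data.frame(rows = rep(letters[1:10], each = 10),
                     cols = rep(letters[1:10], 10),
                     values = rnorm(100))

# Hypothetical helper: long to wide, assuming one value per row/col pair
to_wide <- function(rows, cols, values) {
  tapply(values, list(rows, cols), FUN = function(v) v[1])
}

wide_x <- to_wide(long_x$row, long_x$col, long_x$val)        # 10 x 5
wide_y <- to_wide(long_y$rows, long_y$cols, long_y$values)   # 10 x 10

# Columns of x end up in the rows, columns of y in the columns
corr_xy <- stats::cor(wide_x, wide_y)
dim(corr_xy)
```

Both wide matrices share the same 10 row identifiers, which is what allows stats::cor() to pair their rows.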

Examples

set.seed(123)
cor_in <- data.frame(row = rep(letters[1:10], each = 5),
                     col = rep(LETTERS[1:5], 10),
                     val = rnorm(50))
# Wide format output (default)
corr_wide <- cor_long(cor_in, row, col, val)

# Long format output
corr_long <- cor_long(cor_in, row, col, val,
                      out_format = "long")

# Correlation between two matrices
cor_in2 <- data.frame(rows = rep(letters[1:10], each = 10),
                      cols = rep(letters[1:10], 10),
                      values = rnorm(100))
corr2 <- cor_long(cor_in, row, col, val,
                  cor_in2, rows, cols, values)