Make a correlation matrix from long format data.
Arguments
- x
A long format data frame containing the data to correlate.
- rows, cols
The columns in x containing the values that should be in the rows and columns of the correlation matrix.
- values
Name of the column in x containing the values of the correlation matrix.
- y
Optional second data frame for correlating with the data frame from x.
- rows2, cols2
Optional names of columns with values for the rows and columns of a second matrix (taken from y).
- values2
Optional column for the values of a second matrix.
- out_format
Format of output correlation matrix ("long" or "wide").
- method
Correlation method given to stats::cor().
- use
Missing value strategy of stats::cor().
- p_values
Logical indicating if p-values should be calculated.
- p_adjust
String specifying the multiple testing adjustment method to use for the p-values (default is "none"). Passed to stats::p.adjust().
- p_thresholds
Named numeric vector of p-value thresholds to mark, in ascending order. The last element must be 1 or higher, since it sets the upper limit. Names must be unique, but one element can be left unnamed (by default 1 is unnamed, so values between the largest threshold below 1 and 1 are not marked in the plot). If NULL, no thresholding is done and p-value intervals are not marked with symbols.
- p_sym_add
String with the name of the column to add to p-value symbols from p_thresholds (one of 'values', 'p_val', 'p_adj'). NULL (default) results in just the symbols.
- p_sym_digits
Number of digits to use for the column in p_sym_add.
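The interplay of p_adjust and p_thresholds can be illustrated with a small base R sketch. It does not call cor_long(); the p-values are made up, the thresholds are illustrative, and the interval rule (each p-value gets the symbol of the smallest threshold above it) is an assumption about how the marking works, not a confirmed description of the package internals.

```r
# Hypothetical raw p-values, for illustration only
p_val <- c(0.0004, 0.003, 0.02, 0.04, 0.3, 0.9)

# Multiple testing adjustment with stats::p.adjust()
p_adj <- stats::p.adjust(p_val, method = "bonferroni")

# Thresholds in ascending order; the last element (1) is left unnamed
p_thresholds <- c("***" = 0.001, "**" = 0.01, "*" = 0.05, 1)

# Assumed rule: a p-value is marked with the name of the first threshold it falls below.
# include.lowest = TRUE keeps p-values equal to the upper limit (1) in the last interval.
p_sym <- as.character(cut(p_adj,
                          breaks = c(-Inf, p_thresholds),
                          labels = names(p_thresholds),
                          right = FALSE,
                          include.lowest = TRUE))

data.frame(p_val, p_adj, p_sym)
```

P-values in the unnamed interval end up with an empty symbol, which matches the description of the unnamed element above.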
Value
A correlation matrix (if wide format) or a long format data frame with the columns 'row', 'col', and 'value' (containing correlations).
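The relationship between the two output formats can be sketched in base R. This is a minimal illustration that assumes the long format is simply the wide matrix unrolled into one row per (row, col) pair; cor_mat and cor_long_df are hypothetical names, and the mtcars columns are only example data.

```r
# Small illustrative correlation matrix (wide format)
cor_mat <- stats::cor(mtcars[, c("mpg", "hp", "wt")])

# Wide -> long: one row per (row, col) pair
cor_long_df <- as.data.frame(as.table(cor_mat), stringsAsFactors = FALSE)
names(cor_long_df) <- c("row", "col", "value")

head(cor_long_df, 3)
```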
Details
If there is only one input data frame (x) and rows2, cols2 and values2 are NULL (the default), a wide matrix is constructed from x and passed to stats::cor(), resulting in a correlation matrix with the column-column correlations.
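The single-input case can be approximated in base R without calling cor_long(). This is a sketch under the assumption that the reshaping behaves like a plain long-to-wide pivot; long_x, wide_x and cor_mat are hypothetical names, and the package internals may differ.

```r
set.seed(123)
long_x <- data.frame(row = rep(letters[1:10], each = 5),
                     col = rep(LETTERS[1:5], 10),
                     val = rnorm(50))

# Pivot to a wide matrix: one row per 'row' value, one column per 'col' value.
# FUN = mean only matters if a (row, col) pair occurs more than once.
wide_x <- tapply(long_x$val, list(long_x$row, long_x$col), FUN = mean)

# Column-column correlations of the wide matrix
cor_mat <- stats::cor(wide_x, method = "pearson", use = "everything")

dim(cor_mat)
```

The result has one row and one column per distinct value of col, since stats::cor() correlates the columns of its input.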
If y is a data frame and rows2, cols2 and values2 are specified, the wide versions of x and y are correlated (stats::cor(wide_x, wide_y)), resulting in a correlation matrix with the columns of x in the rows and the columns of y in the columns.
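The two-input case can be sketched the same way in base R. Again this is only an illustration: long_x, long_y and cross_cor are hypothetical names, the pivot via tapply() is an assumption about the reshaping step, and the row check reflects the general requirement of stats::cor() that both matrices have matching observations.

```r
set.seed(123)
long_x <- data.frame(row = rep(letters[1:10], each = 5),
                     col = rep(LETTERS[1:5], 10),
                     val = rnorm(50))
long_y <- data.frame(rows = rep(letters[1:10], each = 10),
                     cols = rep(letters[1:10], 10),
                     values = rnorm(100))

# Pivot both inputs to wide matrices
wide_x <- tapply(long_x$val, list(long_x$row, long_x$col), FUN = mean)
wide_y <- tapply(long_y$values, list(long_y$rows, long_y$cols), FUN = mean)

# Both wide matrices need the same rows (observations), in the same order
stopifnot(identical(rownames(wide_x), rownames(wide_y)))

# Columns of x end up in the rows, columns of y in the columns
cross_cor <- stats::cor(wide_x, wide_y)

dim(cross_cor)
```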
Examples
set.seed(123)
cor_in <- data.frame(row = rep(letters[1:10], each = 5),
                     col = rep(LETTERS[1:5], 10),
                     val = rnorm(50))

# Wide format output (default)
corr_wide <- cor_long(cor_in, row, col, val)

# Long format output
corr_long <- cor_long(cor_in, row, col, val,
                      out_format = "long")

# Correlation between two matrices
cor_in2 <- data.frame(rows = rep(letters[1:10], each = 10),
                      cols = rep(letters[1:10], 10),
                      values = rnorm(100))
corr2 <- cor_long(cor_in, row, col, val,
                  cor_in2, rows, cols, values)