CorrP (Compute correlations in parallel)

The CorrP package, under development by the Meantrix team and based on the cor2 function by Srikanth KS (talegari), gives R users a way to compute correlation matrices for large data.frames, tibbles, or data.tables using a parallel backend.

The data.frame may contain columns of four classes: integer, numeric, factor, and character. Character columns are treated as categorical variables.

In this new package, the correlation measure is chosen automatically according to the types of the two variables in each pair.

The statistical significance of every correlation value in the matrix is also tested. If a test does not reach a significance level lower than the p.value parameter, the null hypothesis cannot be rejected and, by default, the correlation for that variable pair is set to zero.
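The type-dependent computation and the significance filter described above can be sketched in base R. This is only an illustration under stated assumptions, not the package's code: the helper name pair_cor_sketch is hypothetical, and the choice of measure per pair (Pearson for numeric pairs, Cramér's V for categorical pairs, the square root of R² from a linear model for mixed pairs) is assumed here rather than taken from the text above.

```r
# Hypothetical helper, NOT part of corrP: one association measure per pair,
# chosen by column type and set to zero when the test is not significant.
pair_cor_sketch <- function(x, y, p.value = 0.05) {
  ok <- complete.cases(x, y)           # drop rows with missing values
  x <- x[ok]
  y <- y[ok]
  if (is.character(x)) x <- factor(x)  # characters are treated as categorical
  if (is.character(y)) y <- factor(y)

  if (is.numeric(x) && is.numeric(y)) {
    # numeric/numeric: Pearson correlation with its t-test (assumed rule)
    ct  <- cor.test(x, y, method = "pearson")
    est <- unname(ct$estimate)
    p   <- ct$p.value
  } else if (is.factor(x) && is.factor(y)) {
    # categorical/categorical: Cramer's V from a chi-squared test (assumed rule)
    tab <- table(droplevels(x), droplevels(y))
    ch  <- suppressWarnings(chisq.test(tab, correct = FALSE))
    est <- sqrt(unname(ch$statistic) / (sum(tab) * (min(dim(tab)) - 1)))
    p   <- ch$p.value
  } else {
    # mixed pair: square root of R^2 from a linear model (assumed rule)
    if (is.factor(x)) { tmp <- x; x <- y; y <- tmp }  # make x the numeric one
    fit <- summary(lm(x ~ y))
    est <- sqrt(fit$r.squared)
    f   <- fit$fstatistic
    p   <- unname(pf(f[1], f[2], f[3], lower.tail = FALSE))
  }

  # keep the estimate only when the test is significant at the given level
  if (is.na(p) || p >= p.value) 0 else est
}

# Usage on two numeric columns of the built-in airquality data
pair_cor_sketch(airquality$Wind, airquality$Temp, p.value = 0.05)
```

Applying such a helper to every pair of columns would fill the matrix; since the pairs are independent of each other, that loop is the part that lends itself to a parallel backend.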

Example:

library(corrP)
# compute the correlation matrix on a parallel backend with 4 cores
air_cor <- corrP(airquality, parallel = TRUE, n.cores = 4, p.value = 0.05)
# plot the resulting matrix
corrplot::corrplot(air_cor)
corrgram::corrgram(air_cor)

Another function in the package, rh_corrP, removes highly correlated variables from a data.frame using the CorrP matrix.

air_cor <- corrP(airquality)
airqualityH <- rh_corrP(df = airquality, corrmat = air_cor, cutoff = 0.5)

# columns that were removed
setdiff(colnames(airquality), colnames(airqualityH))

[1] "Ozone" "Temp"
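
To make the idea of filtering by a cutoff concrete, here is a minimal sketch of one common greedy strategy. It is not the algorithm used by rh_corrP, which is not described here: the helper name drop_correlated_sketch and the rule for choosing which member of a pair to drop (the one with the larger mean absolute correlation) are assumptions made for illustration.

```r
# Hypothetical helper, NOT rh_corrP itself: greedily drop one variable from
# the most correlated pair until no pair exceeds the cutoff.
drop_correlated_sketch <- function(df, corrmat, cutoff = 0.75) {
  m <- abs(corrmat)
  diag(m) <- 0                       # ignore self-correlations
  keep <- colnames(m)
  while (length(keep) > 1 && max(m[keep, keep]) > cutoff) {
    sub  <- m[keep, keep]
    # row/column position of the strongest remaining pair
    idx  <- which(sub == max(sub), arr.ind = TRUE)[1, ]
    pair <- keep[idx]
    # assumed rule: drop the member with the larger mean absolute correlation
    avg  <- rowMeans(sub[pair, , drop = FALSE])
    keep <- setdiff(keep, pair[which.max(avg)])
  }
  df[, keep, drop = FALSE]
}

# Usage with a small hand-made correlation matrix
toy_df  <- data.frame(a = 1:3, b = 4:6, c = 7:9)
toy_cor <- matrix(c(1, 0.95, 0.3,
                    0.95, 1, 0.1,
                    0.3, 0.1, 1),
                  nrow = 3, dimnames = list(c("a", "b", "c"), c("a", "b", "c")))
colnames(drop_correlated_sketch(toy_df, toy_cor, cutoff = 0.9))
```

In the toy matrix only the pair a and b exceeds the cutoff, and a is dropped because it is also more strongly correlated with c.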

The CorrP package is still very new, but it already offers some interesting features. In upcoming versions we plan to add several types of plots built from the corrP correlation matrix.
