

{"id":118,"date":"2020-02-21T19:33:34","date_gmt":"2020-02-21T22:33:34","guid":{"rendered":"http:\/\/meantrix.com\/blog\/?p=118"},"modified":"2020-02-24T12:07:37","modified_gmt":"2020-02-24T15:07:37","slug":"corrp-compute-correlations-in-parallel","status":"publish","type":"post","link":"https:\/\/meantrix.com\/blog\/2020\/02\/21\/corrp-compute-correlations-in-parallel\/","title":{"rendered":"CorrP (Compute correlations in parallel)"},"content":{"rendered":"\n<p>The <a href=\"https:\/\/github.com\/meantrix\/corrP\">CorrP package<\/a>, under development by the Meantrix team and based on the cor2 function by Srikanth KS (talegari), gives R users a way to compute correlation matrices for large data.frames, tibbles or data.tables through a parallel backend.<\/p>\n\n\n\n<p>The data.frame may contain columns of four classes: integer, numeric, factor and character. Character columns are treated as categorical variables.<\/p>\n\n\n\n<p>In this package, the correlation measure is chosen automatically according to the types of the two variables:<\/p>\n\n\n\n<ul class=\"wp-block-list\"><li>integer\/numeric pair: <a href=\"https:\/\/en.wikipedia.org\/wiki\/Pearson_correlation_coefficient\">Pearson correlation test<\/a>;<\/li><li>integer\/numeric &#8211; factor\/categorical pair: correlation coefficient or square root of the <a href=\"https:\/\/en.wikipedia.org\/wiki\/Coefficient_of_determination\">R^2 coefficient of linear regression<\/a>;<\/li><li>factor\/categorical pair: <a href=\"https:\/\/en.wikipedia.org\/wiki\/Cra\n\n\n\n<p>Also, the statistical significance of every correlation value in the matrix is tested. 
If the p-value of a test is not lower than the p.value parameter, the null hypothesis cannot be rejected and, by default, the correlation for that variable pair is set to zero.<\/p>\n\n\n\n<p>Example:<\/p>\n\n\n\n<pre class=\"wp-block-preformatted\"> <pre class=\"brush: r; title: ; notranslate\" title=\"\"> \nlibrary(corrP)\n# compute the correlation matrix on a parallel backend with 4 cores\nair_cor = corrP(airquality, parallel = TRUE, n.cores = 4, p.value = 0.05)\n# plot the resulting matrix\ncorrplot::corrplot(air_cor)\ncorrgram::corrgram(air_cor)\n <\/pre> <\/pre>\n\n\n\n<figure class=\"wp-block-image size-large\"><img loading=\"lazy\" decoding=\"async\" width=\"784\" height=\"524\" src=\"http:\/\/meantrix.com\/blog\/wp-content\/uploads\/2020\/02\/corrP_meantrix.png\" alt=\"\" class=\"wp-image-136\" srcset=\"https:\/\/meantrix.com\/blog\/wp-content\/uploads\/2020\/02\/corrP_meantrix.png 784w, https:\/\/meantrix.com\/blog\/wp-content\/uploads\/2020\/02\/corrP_meantrix-300x201.png 300w, https:\/\/meantrix.com\/blog\/wp-content\/uploads\/2020\/02\/corrP_meantrix-768x513.png 768w, https:\/\/meantrix.com\/blog\/wp-content\/uploads\/2020\/02\/corrP_meantrix-750x500.png 750w\" sizes=\"(max-width: 784px) 100vw, 784px\" \/><\/figure>\n\n\n\n<p>Another function in the package, <strong>rh_corrP<\/strong>, removes highly correlated variables from a data.frame using the CorrP matrix.<\/p>\n\n\n\n<pre class=\"wp-block-preformatted\"> <pre class=\"brush: r; title: ; notranslate\" title=\"\"> \nair_cor = corrP(airquality)\n# remove highly correlated variables (cutoff = 0.5)\nairqualityH = rh_corrP(df = airquality, corrmat = air_cor, cutoff = 0.5)\n\n# list the columns that were removed\nsetdiff(colnames(airquality), colnames(airqualityH))\n <\/pre> <\/pre>\n\n\n<div class=\"wp-block-syntaxhighlighter-code \"><pre class=\"brush: plain; title: ; notranslate\" title=\"\">\n&#x5B;1] \"Ozone\" \"Temp\" \n<\/pre><\/div>\n\n\n<p>The <a href=\"https:\/\/github.com\/meantrix\/corrP\">CorrP package<\/a> is still very new, but it already provides some useful features. 
In upcoming versions, we plan to include several types of plots built from the corrP correlation matrix.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>The CorrP package, under development by the Meantrix team and based on the cor2 function by Srikanth KS (talegari), gives R users a way to compute correlation matrices for large data.frames, tibbles or data.tables through a parallel backend. The data.frame may contain columns of four classes: integer, numeric, factor and character. The character [&hellip;]<\/p>\n","protected":false},"author":2,"featured_media":136,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[3],"tags":[9,10,8],"class_list":["post-118","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-ds","tag-correlation","tag-parallel","tag-r"],"_links":{"self":[{"href":"https:\/\/meantrix.com\/blog\/wp-json\/wp\/v2\/posts\/118"}],"collection":[{"href":"https:\/\/meantrix.com\/blog\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/meantrix.com\/blog\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/meantrix.com\/blog\/wp-json\/wp\/v2\/users\/2"}],"replies":[{"embeddable":true,"href":"https:\/\/meantrix.com\/blog\/wp-json\/wp\/v2\/comments?post=118"}],"version-history":[{"count":49,"href":"https:\/\/meantrix.com\/blog\/wp-json\/wp\/v2\/posts\/118\/revisions"}],"predecessor-version":[{"id":175,"href":"https:\/\/meantrix.com\/blog\/wp-json\/wp\/v2\/posts\/118\/revisions\/175"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/meantrix.com\/blog\/wp-json\/wp\/v2\/media\/136"}],"wp:attachment":[{"href":"https:\/\/meantrix.com\/blog\/wp-json\/wp\/v2\/media?parent=118"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/meantrix.com\/blog\/wp-json\/wp\/v2\/categories?post=118"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/meantrix.com\/blog\/wp-json\
/wp\/v2\/tags?post=118"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}