The purpose of `distplyr` is to equip every analyst with a tool to seamlessly draw powerful insights using distributions. Distributions add colour to your analysis. They show the complete picture of uncertainty.

Use `distplyr` to:

- Create and meld distributions using a wide palette of base forms and tools.
- Draw properties from those distributions.

Many distributions in practice are built in “layers”, by transforming and combining other distributions. The result is a tailored distribution that does not follow a basic parametric form such as “Normal” or “Exponential”. The motivation behind the name of `distplyr` is that distributions are built by manipulation, akin to the package `dplyr`.

**Note**: This package is still in its infancy. There are many other critical features to come. Expect breaking changes as long as this package is marked as “Experimental”.

`distplyr`:

- Keeps all components of a distribution together in a single object.
- Computes only when needed, by dispatching an appropriate S3 method on call.
- Manages the discrete components of all distributions, often arising from empirical estimates.
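The "compute only when needed" idea can be illustrated with plain S3 dispatch in base R. The sketch below is not `distplyr`'s internal code: the `toy_unif` and `toy_dst` classes are hypothetical names invented for this example. It only shows how an object can carry its parameters and defer all computation to a method that runs when a generic such as `mean()` is called.

```r
# Hypothetical constructor (not part of distplyr): stores the parameters
# only; nothing is computed at construction time.
toy_unif <- function(min, max) {
  stopifnot(min < max)
  structure(list(min = min, max = max), class = c("toy_unif", "toy_dst"))
}

# S3 method for the base generic mean(); it runs only when mean() is
# called on a "toy_unif" object.
mean.toy_unif <- function(x, ...) {
  (x$min + x$max) / 2
}

# Fallback for any other "toy_dst" subclass that lacks its own method.
mean.toy_dst <- function(x, ...) {
  stop("mean() is not implemented for this toy distribution")
}

d <- toy_unif(2, 5)
mean(d)
```

Because the class vector is `c("toy_unif", "toy_dst")`, R tries `mean.toy_unif` first and falls back to `mean.toy_dst` only if the more specific method is missing.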

    library(distplyr)

There are many parametric families of distributions at your disposal. Here is a Uniform distribution:

    (d1 <- dst_unif(2, 5))
    #> Uniform Distribution
    #>
    #> Parameters:
    #> # A tibble: 2 x 2
    #>   parameter value
    #>   <chr>     <dbl>
    #> 1 min           2
    #> 2 max           5
    #>
    #> Number of Discontinuities:  0

Empirical distributions are accommodated, too.

    (d2 <- stepdst(mpg, data = mtcars))
    #> Step Distribution
    #>
    #> Number of Discontinuities:  25

Manipulate distributions. Here’s an example of a mixture distribution of two Normals:

    (d3 <- mix(
      dst_norm(-5, 1),
      dst_norm(0, 1),
      weights = c(1, 4)
    ))
    #> Mixture Distribution
    #>
    #> Components:
    #> # A tibble: 2 x 2
    #>   distribution weight
    #>   <chr>         <dbl>
    #> 1 Gaussian        0.2
    #> 2 Gaussian        0.8
    #>
    #> Number of Discontinuities:  0

    plot(d3)
    #> Warning in get_lower(cdf, level = at[1L]): This function doesn't work properly
    #> yet!
    #> Warning in get_higher(cdf, level = at[n_x]): This function doesn't work properly
    #> yet!
    #> Warning in get_lower(cdf, level = at[1L]): This function doesn't work properly
    #> yet!
    #> Warning in get_higher(cdf, level = at[n_x]): This function doesn't work properly
    #> yet!
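To make the printed weights concrete, here is a minimal base R sketch of the same mixture, written without `distplyr`. It assumes the usual definition of a mixture (a weighted average of the component densities or CDFs) and that the raw weights `c(1, 4)` are normalized to sum to one, which matches the 0.2 and 0.8 shown in the output. The names `mix_pdf` and `mix_cdf` are hypothetical helpers for this illustration. Since the second argument of each Normal is 1, it makes no difference here whether it denotes a variance or a standard deviation.

```r
# Normalize the raw weights so they sum to one.
raw_weights <- c(1, 4)
w <- raw_weights / sum(raw_weights)

# Component means; both components have unit spread.
means <- c(-5, 0)

# Mixture density and CDF as weighted averages of the components.
mix_pdf <- function(x) {
  w[1] * dnorm(x, means[1], 1) + w[2] * dnorm(x, means[2], 1)
}
mix_cdf <- function(x) {
  w[1] * pnorm(x, means[1], 1) + w[2] * pnorm(x, means[2], 1)
}

w
mix_cdf(c(-5, 0))
```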

Generate a sample from a distribution.

    realise(d3, n = 10)
    #>  [1] -0.77434763 -5.44541849  0.31158146  0.47065853  0.68328860 -0.10685444
    #>  [7]  0.17142651 -0.07658614  2.24661477  0.75396800

Calculate properties of a distribution.

    mean(d1)
    #> [1] 3.5
    variance(d2)
    #> [1] 35.18897
    median(d3)
    #> Warning in get_lower(cdf, level = at[1L]): This function doesn't work properly
    #> yet!
    #> Warning in get_higher(cdf, level = at[n_x]): This function doesn't work properly
    #> yet!
    #> [1] -0.3186384
    evi(d1)
    #> [1] -1
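As a rough cross-check, the first three properties can be recomputed by hand in base R. This sketch is an illustration and not how `distplyr` computes them. It assumes that the empirical distribution puts mass 1/n on each observation, so its variance uses a 1/n denominator instead of the n - 1 used by `var()`; that assumption appears consistent with the printed value. The mixture median is found by solving F(x) = 0.5 numerically with `uniroot()`.

```r
# Uniform(2, 5): the mean is the midpoint of the interval.
unif_mean <- (2 + 5) / 2

# Empirical distribution of mtcars$mpg: variance with a 1/n denominator
# (an assumption about the step distribution, see the text above).
x <- mtcars$mpg
emp_var <- mean((x - mean(x))^2)

# Mixture of N(-5, 1) and N(0, 1) with weights 0.2 and 0.8:
# the median solves mix_cdf(x) = 0.5.
w <- c(1, 4) / 5
mix_cdf <- function(x) w[1] * pnorm(x, -5, 1) + w[2] * pnorm(x, 0, 1)
mix_median <- uniroot(
  function(x) mix_cdf(x) - 0.5,
  interval = c(-10, 10),
  tol = 1e-9
)$root

c(unif_mean = unif_mean, emp_var = emp_var, mix_median = mix_median)
```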

Evaluate distributional representations:

    eval_density(d1, at = c(2, 3.5, 4.5))
    #> [1] 0.3333333 0.3333333 0.3333333
    enframe_cdf(d2, at = 1:5)
    #> # A tibble: 5 x 2
    #>    .arg  .cdf
    #>   <int> <dbl>
    #> 1     1     0
    #> 2     2     0
    #> 3     3     0
    #> 4     4     0
    #> 5     5     0
    enframe_hazard(d3, at = 1:5)
    #> # A tibble: 5 x 2
    #>    .arg .hazard
    #>   <int>   <dbl>
    #> 1     1    1.53
    #> 2     2    2.37
    #> 3     3    3.28
    #> 4     4    4.23
    #> 5     5    5.19
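For a continuous distribution, the hazard function is the density divided by the survival function, h(x) = f(x) / (1 - F(x)). The base R sketch below applies that standard definition to the mixture of two Normals used above; `mix_pdf`, `mix_surv`, and `mix_hazard` are hypothetical helper names for this illustration, not `distplyr` functions.

```r
# Mixture of N(-5, 1) and N(0, 1) with normalized weights 0.2 and 0.8.
w <- c(1, 4) / 5

mix_pdf <- function(x) {
  w[1] * dnorm(x, -5, 1) + w[2] * dnorm(x, 0, 1)
}

# Survival function 1 - F(x); lower.tail = FALSE keeps precision in the tail.
mix_surv <- function(x) {
  w[1] * pnorm(x, -5, 1, lower.tail = FALSE) +
    w[2] * pnorm(x, 0, 1, lower.tail = FALSE)
}

# Hazard = density / survival.
mix_hazard <- function(x) mix_pdf(x) / mix_surv(x)

round(mix_hazard(1:5), 2)
```

Rounded to two decimals, these values agree with the `.hazard` column printed above.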

`distplyr` is not on CRAN yet, so the best way to install it is:

    devtools::install_github("vincenzocoia/distplyr")

`distplyr` in Context

`distplyr` is *not* a modelling package, meaning it won’t optimize a distribution’s fit to data.

The `distributions3` package is similar in that it bundles parametric distributions together using S3 objects, but it is less flexible.

The `distr` package allows you to make distributions, including empirical ones, and transform them, using S4 classes. `distplyr` aims to provide a simpler interface using S3 objects.

Please note that the ‘distplyr’ project is released with a Contributor Code of Conduct. By contributing to this project, you agree to abide by its terms.