`stepdst.Rmd`

A common way of representing data is through a parametric distribution, such as a Normal, Exponential, or Poisson distribution. A useful alternative is an empirical distribution, or more generally what `distplyr` calls a *step distribution*.

To make a step distribution, use the function `stepdst()`. Printing the resulting object (here `d1`, the marginal distribution of `hp` in `mtcars`) gives:

```
## Step Distribution
##
## Number of Discontinuities: 22
```
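The call that produced this output isn't shown here; judging from the `stepdst()` call used later in this document, it was presumably along the lines of `d1 <- stepdst(hp, data = mtcars)`. As a cross-check that doesn't rely on `distplyr`, here's a minimal base-R sketch of the same empirical distribution using `ecdf()`. The names `F_hp` and `n_jumps` are only illustrative.

```r
# Empirical cdf of horsepower, using only base R (no distplyr needed).
F_hp <- ecdf(mtcars$hp)

# The cdf jumps at each distinct observed value, so the number of knots
# is the number of discontinuities.
n_jumps <- length(knots(F_hp))
n_jumps

# Proportion of cars with hp at or below 110.
F_hp(110)
```

`mtcars$hp` has 22 distinct values, which is where the 22 discontinuities in the printout come from.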

The “step” in the name comes from the cdf:

`plot(d1, "cdf", n = 1001)`

You can also weight the outcomes differently. This is useful for explicitly specifying a probability mass function, as well as for other applications such as using kernel smoothing to find a conditional distribution. Here is an estimate of the conditional distribution of `hp` given `disp = 150`, with its cdf drawn as the dashed line and the marginal cdf as the solid line:

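```
# Gaussian kernel with a bandwidth (sd) of 25 units of disp
K <- function(x) dnorm(x, sd = 25)
# Weight each car by how close its disp is to 150
d2 <- stepdst(hp, data = mtcars, weights = K(disp - 150))
# Marginal cdf (solid), then the conditional estimate (dashed) on the same axes
plot(d1, "cdf", n = 1001)
plot(d2, "cdf", n = 1001, lty = 2, add = TRUE)
```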
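To see what the weights are doing, here's a rough base-R sketch of a weighted step cdf. It assumes the weights are simply normalised to sum to 1 and pooled over tied `hp` values; that's a guess at what `stepdst()` does internally, not a description of its implementation. The names `w`, `p`, `x`, and `cdf_w` are illustrative.

```r
# Kernel weights centred at disp = 150, as in the code above.
K <- function(x) dnorm(x, sd = 25)
w <- K(mtcars$disp - 150)

# Add up the weight landing on each distinct hp value, then normalise
# so the probabilities sum to 1.
p <- tapply(w, mtcars$hp, sum) / sum(w)
x <- as.numeric(names(p))   # distinct hp values, in increasing order

# Right-continuous step function: 0 below the smallest value, then the
# running total of the probabilities.
cdf_w <- stepfun(x, c(0, cumsum(as.vector(p))))
cdf_w(c(50, 100, 200))
```

Cars whose `disp` is far from 150 get weights close to zero, so their `hp` values contribute almost nothing to the cdf.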

This is much more informative than a point prediction of `hp` when `disp = 150`. Such a prediction might be:

`get_mean(d2)`

`## [1] 109.961`

With a distribution, you can get much more, such as a prediction interval. Here’s a 90% interval:

`eval_quantile(d2, at = c(0.05, 0.95))`

`## [1] 62 175`
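Both numbers can be reproduced by hand, which may help clarify what they mean. The sketch below uses base R only and assumes that a quantile of a step distribution is the smallest value whose cumulative probability reaches the requested level; whether `eval_quantile()` uses exactly this convention is an assumption. The helper `step_quantile()` is hypothetical.

```r
K <- function(x) dnorm(x, sd = 25)
w <- K(mtcars$disp - 150)

# Weighted mean of hp: the point prediction.
m <- weighted.mean(mtcars$hp, w)

# Weighted pmf and cdf over the distinct hp values.
p  <- tapply(w, mtcars$hp, sum) / sum(w)
x  <- as.numeric(names(p))
cp <- cumsum(as.vector(p))

# Quantile = smallest hp value whose cumulative probability reaches `prob`.
step_quantile <- function(prob) x[which(cp >= prob)[1]]
q <- vapply(c(0.05, 0.95), step_quantile, numeric(1))

round(m, 3)
q
```

Because the distribution is discrete, the interval endpoints are always observed `hp` values rather than interpolated ones.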

Here’s the proportion of the marginal variance that is removed by conditioning on `disp = 150`:

`1 - get_variance(d2) / get_variance(d1)`

`## [1] 0.8031741`
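Here's a base-R sketch of the same calculation. It treats the variance as the probability-weighted squared deviation from the mean, with no `n - 1` correction; that this matches `get_variance()` is an assumption, and `wvar()` is a hypothetical helper.

```r
K  <- function(x) dnorm(x, sd = 25)
hp <- mtcars$hp
w  <- K(mtcars$disp - 150)

# Variance of a discrete distribution: probability-weighted squared
# deviation from the mean (no n - 1 correction).
wvar <- function(x, w) {
  p  <- w / sum(w)
  mu <- sum(p * x)
  sum(p * (x - mu)^2)
}

v_marginal    <- wvar(hp, rep(1, length(hp)))  # equal weights
v_conditional <- wvar(hp, w)                   # kernel weights

reduction <- 1 - v_conditional / v_marginal
reduction
```

Note that `var(mtcars$hp)` divides by `n - 1`, so using it for the marginal would give a slightly different ratio.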

You can extract the step discontinuities in any distribution using the `discontinuities()` function. It gives the location of each discontinuity and the size of the jump in the cdf:

`discontinuities(d2)`

```
##    location         size
## 1        52 1.471993e-03
## 2        62 1.208194e-01
## 3        65 8.376466e-04
## 4        66 4.247905e-03
## 5        91 6.017982e-02
## 6        93 2.971967e-02
## 7        95 1.138973e-01
## 8        97 5.960867e-02
## 9       105 1.353927e-03
## 10      109 6.219092e-02
## 11      110 2.250234e-01
## 12      113 1.093317e-02
## 13      123 1.902518e-01
## 14      150 7.207284e-10
## 15      175 1.194633e-01
## 16      180 1.160516e-06
## 17      205 1.154540e-37
## 18      215 4.981505e-35
## 19      230 7.355083e-31
## 20      245 1.601543e-15
## 21      264 1.119890e-15
## 22      335 1.458954e-09
```
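The same kind of table can be recovered from any step cdf by differencing it at its knots. Here's a small base-R sketch for the unweighted (marginal) distribution of `hp`; the data frame `jumps` is illustrative and is not the output of `discontinuities()`.

```r
# Empirical cdf of hp and the points where it jumps.
F_hp <- ecdf(mtcars$hp)
loc  <- knots(F_hp)

# Jump size at each knot = cdf value there minus the cdf value just before.
size <- diff(c(0, F_hp(loc)))

jumps <- data.frame(location = loc, size = size)
head(jumps)
```

With equal weights every jump is a multiple of 1/32, for example 3/32 at `hp = 110`, which occurs three times in `mtcars`.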

For continuous distributions, there are no discontinuities:

`discontinuities(dst_norm(0, 1))`

```
## [1] location size
## <0 rows> (or 0-length row.names)
```