stats._distn_infrastructure

Module Contents

Classes

rv_frozen(self,dist,*args,**kwds)
rv_generic(self,seed=None) Class which encapsulates common functionality between rv_discrete and rv_continuous.
rv_continuous(self,momtype=1,a=None,b=None,xtol=1e-14,badvalue=None,name=None,longname=None,shapes=None,extradoc=None,seed=None) A generic continuous random variable class meant for subclassing.
rv_discrete(self,a=0,b=inf,name=None,badvalue=None,moment_tol=1e-08,values=None,inc=1,longname=None,shapes=None,extradoc=None,seed=None) A generic discrete random variable class meant for subclassing.
rv_sample(self,a=0,b=inf,name=None,badvalue=None,moment_tol=1e-08,values=None,inc=1,longname=None,shapes=None,extradoc=None,seed=None) A ‘sample’ discrete distribution defined by the support and values.

Functions

instancemethod(func,obj,cls)
_moment(data,n,mu=None)
_moment_from_stats(n,mu,mu2,g1,g2,moment_func,args)
_skew(data) skew is third central moment / variance**(1.5)
_kurtosis(data) kurtosis is fourth central moment / variance**2 - 3
argsreduce(cond,*args) Return the sequence of ravel(args[i]) where ravel(condition) is True in 1D.
_ncx2_log_pdf(x,df,nc)
_ncx2_pdf(x,df,nc)
_ncx2_cdf(x,df,nc)
_drv2_moment(self,n,*args) Non-central moment of discrete distribution.
_drv2_ppfsingle(self,q,*args)
entropy(pk,qk=None,base=None) Calculate the entropy of a distribution for given probability values.
_expect(fun,lb,ub,x0,inc,maxcount=1000,tolerance=1e-10,chunksize=32) Helper for computing the expectation value of fun.
_iter_chunked(x0,x1,chunksize=4,inc=1) Iterate from x0 to x1 in chunks of chunksize and steps inc.
get_distribution_names(namespace_pairs,rv_base_class) Collect names of statistical distributions and their generators.
instancemethod(func, obj, cls)
_moment(data, n, mu=None)
_moment_from_stats(n, mu, mu2, g1, g2, moment_func, args)
_skew(data)

skew is third central moment / variance**(1.5)

_kurtosis(data)

kurtosis is fourth central moment / variance**2 - 3

class rv_frozen(dist, *args, **kwds)
__init__(dist, *args, **kwds)
random_state()
random_state(seed)
pdf(x)
logpdf(x)
cdf(x)
logcdf(x)
ppf(q)
isf(q)
rvs(size=None, random_state=None)
sf(x)
logsf(x)
stats(moments="mv")
median()
mean()
var()
std()
moment(n)
entropy()
pmf(k)
logpmf(k)
interval(alpha)
expect(func=None, lb=None, ub=None, conditional=False, **kwds)
argsreduce(cond, *args)

Return the sequence of ravel(args[i]) where ravel(condition) is True in 1D.

>>> import numpy as np
>>> rand = np.random.random_sample
>>> A = rand((4, 5))
>>> B = 2
>>> C = rand((1, 5))
>>> cond = np.ones(A.shape)
>>> [A1, B1, C1] = argsreduce(cond, A, B, C)
>>> B1.shape
(20,)
>>> cond[2,:] = 0
>>> [A2, B2, C2] = argsreduce(cond, A, B, C)
>>> B2.shape
(15,)
_ncx2_log_pdf(x, df, nc)
_ncx2_pdf(x, df, nc)
_ncx2_cdf(x, df, nc)
class rv_generic(seed=None)

Class which encapsulates common functionality between rv_discrete and rv_continuous.

__init__(seed=None)
random_state()

Get or set the RandomState object for generating random variates.

This can be either None or an existing RandomState object.

If None (or np.random), use the RandomState singleton used by np.random. If already a RandomState instance, use it. If an int, use a new RandomState instance seeded with seed.

random_state(seed)
__getstate__()
__setstate__(state)
_construct_argparser(meths_to_inspect, locscale_in, locscale_out)

Construct the parser for the shape arguments.

Generates the argument-parsing functions dynamically and attaches them to the instance. This method is meant to be called in __init__ of the class for each distribution.

If self.shapes is a non-empty string, interprets it as a comma-separated list of shape parameters.

Otherwise inspects the call signatures of meths_to_inspect and constructs the argument-parsing functions from these. In this case also sets shapes and numargs.

_construct_doc(docdict, shapes_vals=None)

Construct the instance docstring with string substitutions.

_construct_default_doc(longname=None, extradoc=None, docdict=None, discrete="continuous")

Construct instance docstring from the default template.

freeze(*args, **kwds)

Freeze the distribution for the given arguments.

arg1, arg2, arg3,… : array_like
The shape parameter(s) for the distribution. Should include all the non-optional arguments, may include loc and scale.
rv_frozen : rv_frozen instance
The frozen distribution.
__call__(*args, **kwds)
_stats(*args, **kwds)
_munp(n, *args)
_argcheck_rvs(*args, **kwargs)
_argcheck(*args)

Default check for correct values on args and keywords.

Returns condition array of 1’s where arguments are correct and
0’s where they are not.
_support_mask(x)
_open_support_mask(x)
_rvs(*args)
_logcdf(x, *args)
_sf(x, *args)
_logsf(x, *args)
_ppf(q, *args)
_isf(q, *args)
rvs(*args, **kwds)

Random variates of given type.

arg1, arg2, arg3,… : array_like
The shape parameter(s) for the distribution (see docstring of the instance object for more information).
loc : array_like, optional
Location parameter (default=0).
scale : array_like, optional
Scale parameter (default=1).
size : int or tuple of ints, optional
Defining number of random variates (default is 1).
random_state : None or int or np.random.RandomState instance, optional
If int or RandomState, use it for drawing the random variates. If None, rely on self.random_state. Default is None.
rvs : ndarray or scalar
Random variates of given size.
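
The random_state argument described above can be illustrated with a short sketch. It uses the built-in scipy.stats.norm as a stand-in for any distribution instance; the seed value and variable names are arbitrary choices for illustration.

```python
import numpy as np
from scipy import stats

# Passing the same integer seed to two separate calls should give the same draws.
first = stats.norm.rvs(loc=0.0, scale=1.0, size=5, random_state=1234)
second = stats.norm.rvs(loc=0.0, scale=1.0, size=5, random_state=1234)
same_seed_matches = np.array_equal(first, second)

# size may be a tuple, in which case it sets the shape of the returned array.
sample_shape = stats.norm.rvs(size=(2, 3), random_state=0).shape

print(same_seed_matches, sample_shape)
```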
stats(*args, **kwds)

Some statistics of the given RV.

arg1, arg2, arg3,… : array_like
The shape parameter(s) for the distribution (see docstring of the instance object for more information)
loc : array_like, optional
location parameter (default=0)
scale : array_like, optional (continuous RVs only)
scale parameter (default=1)
moments : str, optional
composed of letters [‘mvsk’] defining which moments to compute: ‘m’ = mean, ‘v’ = variance, ‘s’ = (Fisher’s) skew, ‘k’ = (Fisher’s) kurtosis. (default is ‘mv’)
stats : sequence
of requested moments.
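
As a small illustration of the moments string, the sketch below requests all four statistics from the built-in scipy.stats.norm and the default pair from scipy.stats.expon. The loc and scale values are arbitrary.

```python
from scipy import stats

# 'mvsk' requests mean, variance, skew, and (Fisher) kurtosis, in that order.
mean, var, skew, kurt = stats.norm.stats(loc=1.0, scale=2.0, moments="mvsk")

# With the default moments='mv', only mean and variance are returned.
expon_mean, expon_var = stats.expon.stats(scale=2.0)

print(mean, var, skew, kurt)
print(expon_mean, expon_var)
```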
entropy(*args, **kwds)

Differential entropy of the RV.

arg1, arg2, arg3,… : array_like
The shape parameter(s) for the distribution (see docstring of the instance object for more information).
loc : array_like, optional
Location parameter (default=0).
scale : array_like, optional (continuous distributions only).
Scale parameter (default=1).

Entropy is defined base e:

>>> import numpy as np
>>> from scipy.stats import rv_discrete
>>> drv = rv_discrete(values=((0, 1), (0.5, 0.5)))
>>> np.allclose(drv.entropy(), np.log(2.0))
True
moment(n, *args, **kwds)

n-th order non-central moment of distribution.

n : int, n >= 1
Order of moment.
arg1, arg2, arg3,… : float
The shape parameter(s) for the distribution (see docstring of the instance object for more information).
loc : array_like, optional
location parameter (default=0)
scale : array_like, optional
scale parameter (default=1)
median(*args, **kwds)

Median of the distribution.

arg1, arg2, arg3,… : array_like
The shape parameter(s) for the distribution (see docstring of the instance object for more information)
loc : array_like, optional
Location parameter, Default is 0.
scale : array_like, optional
Scale parameter, Default is 1.
median : float
The median of the distribution.
stats.distributions.rv_discrete.ppf
Inverse of the CDF
mean(*args, **kwds)

Mean of the distribution.

arg1, arg2, arg3,… : array_like
The shape parameter(s) for the distribution (see docstring of the instance object for more information)
loc : array_like, optional
location parameter (default=0)
scale : array_like, optional
scale parameter (default=1)
mean : float
the mean of the distribution
var(*args, **kwds)

Variance of the distribution.

arg1, arg2, arg3,… : array_like
The shape parameter(s) for the distribution (see docstring of the instance object for more information)
loc : array_like, optional
location parameter (default=0)
scale : array_like, optional
scale parameter (default=1)
var : float
the variance of the distribution
std(*args, **kwds)

Standard deviation of the distribution.

arg1, arg2, arg3,… : array_like
The shape parameter(s) for the distribution (see docstring of the instance object for more information)
loc : array_like, optional
location parameter (default=0)
scale : array_like, optional
scale parameter (default=1)
std : float
standard deviation of the distribution
interval(alpha, *args, **kwds)

Confidence interval with equal areas around the median.

alpha : array_like of float
Probability that an rv will be drawn from the returned range. Each value should be in the range [0, 1].
arg1, arg2, … : array_like
The shape parameter(s) for the distribution (see docstring of the instance object for more information).
loc : array_like, optional
location parameter, Default is 0.
scale : array_like, optional
scale parameter, Default is 1.
a, b : ndarray of float
end-points of range that contain 100 * alpha % of the rv’s possible values.
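
The equal-tail construction can be checked against ppf. This is a sketch using scipy.stats.norm; the probability is passed positionally, and the comparison with ppf reflects the "equal areas around the median" description rather than a statement about the internal implementation.

```python
import numpy as np
from scipy import stats

alpha = 0.95
low, high = stats.norm.interval(alpha)

# Equal areas around the median: (1 - alpha) / 2 is left in each tail.
low_ppf = stats.norm.ppf((1 - alpha) / 2)
high_ppf = stats.norm.ppf((1 + alpha) / 2)

print(low, high)
print(np.isclose(low, low_ppf), np.isclose(high, high_ppf))
```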
class rv_continuous(momtype=1, a=None, b=None, xtol=1e-14, badvalue=None, name=None, longname=None, shapes=None, extradoc=None, seed=None)

A generic continuous random variable class meant for subclassing.

rv_continuous is a base class to construct specific distribution classes and instances for continuous random variables. It cannot be used directly as a distribution.

momtype : int, optional
The type of generic moment calculation to use: 0 for pdf, 1 (default) for ppf.
a : float, optional
Lower bound of the support of the distribution, default is minus infinity.
b : float, optional
Upper bound of the support of the distribution, default is plus infinity.
xtol : float, optional
The tolerance for fixed point calculation for generic ppf.
badvalue : float, optional
The value in a result array that indicates that some argument restriction is violated; default is np.nan.
name : str, optional
The name of the instance. This string is used to construct the default example for distributions.
longname : str, optional
This string is used as part of the first line of the docstring returned when a subclass has no docstring of its own. Note: longname exists for backwards compatibility, do not use for new subclasses.
shapes : str, optional
The shape of the distribution. For example "m, n" for a distribution that takes two integers as the two shape arguments for all its methods. If not provided, shape parameters will be inferred from the signature of the private methods, _pdf and _cdf of the instance.
extradoc : str, optional, deprecated
This string is used as the last part of the docstring returned when a subclass has no docstring of its own. Note: extradoc exists for backwards compatibility, do not use for new subclasses.
seed : None or int or numpy.random.RandomState instance, optional
This parameter defines the RandomState object to use for drawing random variates. If None (or np.random), the global np.random state is used. If integer, it is used to seed the local RandomState instance. Default is None.

rvs pdf logpdf cdf logcdf sf logsf ppf isf moment stats entropy expect median mean std var interval __call__ fit fit_loc_scale nnlf

Public methods of an instance of a distribution class (e.g., pdf, cdf) check their arguments and pass valid arguments to private, computational methods (_pdf, _cdf). For pdf(x), x is valid if it is within the support of a distribution, self.a <= x <= self.b. Whether a shape parameter is valid is decided by an _argcheck method (which defaults to checking that its arguments are strictly positive.)

Subclassing

New random variables can be defined by subclassing the rv_continuous class and re-defining at least the _pdf or the _cdf method (normalized to location 0 and scale 1).

If positive argument checking is not correct for your RV then you will also need to re-define the _argcheck method.

Correct but potentially slow defaults exist for the remaining methods. For speed and/or accuracy you can override:

_logpdf, _cdf, _logcdf, _ppf, _rvs, _isf, _sf, _logsf

Rarely would you override _isf, _sf or _logsf, but you could.

Methods that can be overwritten by subclasses

_rvs
_pdf
_cdf
_sf
_ppf
_isf
_stats
_munp
_entropy
_argcheck

There are additional (internal and private) generic methods that can be useful for cross-checking and for debugging, but might not work in all cases when called directly.

A note on shapes: subclasses need not specify them explicitly. In this case, shapes will be automatically deduced from the signatures of the overridden methods (pdf, cdf etc). If, for some reason, you prefer to avoid relying on introspection, you can specify shapes explicitly as an argument to the instance constructor.

Frozen Distributions

Normally, you must provide shape parameters (and, optionally, location and scale parameters) to each call of a method of a distribution.

Alternatively, the object may be called (as a function) to fix the shape, location, and scale parameters returning a “frozen” continuous RV object:

rv = generic(<shape(s)>, loc=0, scale=1)
frozen RV object with the same methods but holding the given shape, location, and scale fixed
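
A minimal sketch of freezing, using the built-in scipy.stats.gamma in place of the generic placeholder above. The shape, loc, and scale values are arbitrary.

```python
import numpy as np
from scipy import stats

shape, loc, scale = 2.0, 1.0, 3.0

# Calling the distribution object fixes its parameters and returns a frozen RV.
rv = stats.gamma(shape, loc=loc, scale=scale)

# The frozen object's methods no longer take shape, loc, or scale.
frozen_pdf = rv.pdf(4.0)
direct_pdf = stats.gamma.pdf(4.0, shape, loc=loc, scale=scale)
frozen_mean = rv.mean()

print(np.isclose(frozen_pdf, direct_pdf), frozen_mean)
```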

Statistics

Statistics are computed using numerical integration by default. For speed you can redefine this using _stats:

  • take shape parameters and return mu, mu2, g1, g2
  • If you can’t compute one of these, return it as None
  • Can also be defined with a keyword argument moments, which is a string composed of “m”, “v”, “s”, and/or “k”. Only the components appearing in string should be computed and returned in the order “m”, “v”, “s”, or “k” with missing values returned as None.

Alternatively, you can override _munp, which takes n and shape parameters and returns the n-th non-central moment of the distribution.
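
As a sketch of the _stats override, the hypothetical subclass below (the name std_gaussian_gen is made up for illustration) supplies closed-form moments for a standard normal so that stats() does not need numerical integration. The public stats() method then applies loc and scale.

```python
import numpy as np
from scipy.stats import rv_continuous


class std_gaussian_gen(rv_continuous):
    """Hypothetical standard normal with closed-form moments."""

    def _pdf(self, x):
        return np.exp(-x**2 / 2.0) / np.sqrt(2.0 * np.pi)

    def _stats(self):
        # Returns mu, mu2 (variance), g1 (skew), g2 (excess kurtosis).
        return 0.0, 1.0, 0.0, 0.0


std_gaussian = std_gaussian_gen(name="std_gaussian")

# loc shifts the mean; scale multiplies the variance by scale**2.
mean, var = std_gaussian.stats(loc=2.0, scale=3.0, moments="mv")
print(mean, var)
```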

To create a new Gaussian distribution, we would do the following:

>>> import numpy as np
>>> from scipy.stats import rv_continuous
>>> class gaussian_gen(rv_continuous):
...     "Gaussian distribution"
...     def _pdf(self, x):
...         return np.exp(-x**2 / 2.) / np.sqrt(2.0 * np.pi)
>>> gaussian = gaussian_gen(name='gaussian')

scipy.stats distributions are instances, so here we subclass rv_continuous and create an instance. With this, we now have a fully functional distribution with all relevant methods automagically generated by the framework.

Note that above we defined a standard normal distribution, with zero mean and unit variance. Shifting and scaling of the distribution can be done by using loc and scale parameters: gaussian.pdf(x, loc, scale) essentially computes y = (x - loc) / scale and gaussian._pdf(y) / scale.
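
The loc/scale relation can be checked numerically. The sketch below repeats the gaussian_gen definition so that it runs on its own, and compares against the built-in scipy.stats.norm purely as a reference; the values of x, loc, and scale are arbitrary.

```python
import numpy as np
from scipy.stats import norm, rv_continuous


class gaussian_gen(rv_continuous):
    "Gaussian distribution"

    def _pdf(self, x):
        return np.exp(-x**2 / 2.0) / np.sqrt(2.0 * np.pi)


gaussian = gaussian_gen(name="gaussian")

x, loc, scale = 1.5, 1.0, 2.0
y = (x - loc) / scale

lhs = gaussian.pdf(x, loc, scale)   # public method applies loc and scale
rhs = gaussian._pdf(y) / scale      # standardized density, rescaled by hand
reference = norm.pdf(x, loc, scale)

# The cdf was never defined; the framework derives it from _pdf numerically.
cdf_at_zero = gaussian.cdf(0.0)

print(lhs, rhs, reference, cdf_at_zero)
```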

__init__(momtype=1, a=None, b=None, xtol=1e-14, badvalue=None, name=None, longname=None, shapes=None, extradoc=None, seed=None)
_updated_ctor_param()

Return the current version of _ctor_param, possibly updated by user.

Used by freezing and pickling. Keep this in sync with the signature of __init__.

_ppf_to_solve(x, q, *args)
_ppf_single(q, *args)
_mom_integ0(x, m, *args)
_mom0_sc(m, *args)
_mom_integ1(q, m, *args)
_mom1_sc(m, *args)
_pdf(x, *args)
_logpdf(x, *args)
_cdf_single(x, *args)
_cdf(x, *args)
pdf(x, *args, **kwds)

Probability density function at x of the given RV.

x : array_like
quantiles
arg1, arg2, arg3,… : array_like
The shape parameter(s) for the distribution (see docstring of the instance object for more information)
loc : array_like, optional
location parameter (default=0)
scale : array_like, optional
scale parameter (default=1)
pdf : ndarray
Probability density function evaluated at x
logpdf(x, *args, **kwds)

Log of the probability density function at x of the given RV.

This uses a more numerically accurate calculation if available.

x : array_like
quantiles
arg1, arg2, arg3,… : array_like
The shape parameter(s) for the distribution (see docstring of the instance object for more information)
loc : array_like, optional
location parameter (default=0)
scale : array_like, optional
scale parameter (default=1)
logpdf : array_like
Log of the probability density function evaluated at x
cdf(x, *args, **kwds)

Cumulative distribution function of the given RV.

x : array_like
quantiles
arg1, arg2, arg3,… : array_like
The shape parameter(s) for the distribution (see docstring of the instance object for more information)
loc : array_like, optional
location parameter (default=0)
scale : array_like, optional
scale parameter (default=1)
cdf : ndarray
Cumulative distribution function evaluated at x
logcdf(x, *args, **kwds)

Log of the cumulative distribution function at x of the given RV.

x : array_like
quantiles
arg1, arg2, arg3,… : array_like
The shape parameter(s) for the distribution (see docstring of the instance object for more information)
loc : array_like, optional
location parameter (default=0)
scale : array_like, optional
scale parameter (default=1)
logcdf : array_like
Log of the cumulative distribution function evaluated at x
sf(x, *args, **kwds)

Survival function (1 - cdf) at x of the given RV.

x : array_like
quantiles
arg1, arg2, arg3,… : array_like
The shape parameter(s) for the distribution (see docstring of the instance object for more information)
loc : array_like, optional
location parameter (default=0)
scale : array_like, optional
scale parameter (default=1)
sf : array_like
Survival function evaluated at x
logsf(x, *args, **kwds)

Log of the survival function of the given RV.

Returns the log of the “survival function,” defined as (1 - cdf), evaluated at x.

x : array_like
quantiles
arg1, arg2, arg3,… : array_like
The shape parameter(s) for the distribution (see docstring of the instance object for more information)
loc : array_like, optional
location parameter (default=0)
scale : array_like, optional
scale parameter (default=1)
logsf : ndarray
Log of the survival function evaluated at x.
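
One reason to call the log variants directly is tail accuracy. The sketch below uses scipy.stats.norm far in the upper tail, where the survival function is too small for a double but its logarithm is still representable. The evaluation point 40.0 is an arbitrary choice.

```python
import numpy as np
from scipy import stats

x = 40.0

# The survival function this far out is below the smallest positive double.
sf_value = stats.norm.sf(x)

# logsf stays finite here, so it is the safer choice for tail computations.
logsf_value = stats.norm.logsf(x)

print(sf_value, logsf_value, np.isfinite(logsf_value))
```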
ppf(q, *args, **kwds)

Percent point function (inverse of cdf) at q of the given RV.

q : array_like
lower tail probability
arg1, arg2, arg3,… : array_like
The shape parameter(s) for the distribution (see docstring of the instance object for more information)
loc : array_like, optional
location parameter (default=0)
scale : array_like, optional
scale parameter (default=1)
x : array_like
quantile corresponding to the lower tail probability q.
isf(q, *args, **kwds)

Inverse survival function (inverse of sf) at q of the given RV.

q : array_like
upper tail probability
arg1, arg2, arg3,… : array_like
The shape parameter(s) for the distribution (see docstring of the instance object for more information)
loc : array_like, optional
location parameter (default=0)
scale : array_like, optional
scale parameter (default=1)
x : ndarray or scalar
Quantile corresponding to the upper tail probability q.
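
The relationship between ppf, isf, and sf can be sketched with scipy.stats.norm. The tail probability q is arbitrary.

```python
import numpy as np
from scipy import stats

q = 0.025

# isf takes an upper tail probability; ppf takes a lower tail probability.
upper_quantile = stats.norm.isf(q)
same_via_ppf = stats.norm.ppf(1 - q)

# sf evaluated at the returned quantile recovers q.
roundtrip = stats.norm.sf(upper_quantile)

print(upper_quantile, same_via_ppf, roundtrip)
```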
_nnlf(x, *args)
_unpack_loc_scale(theta)
nnlf(theta, x)

Return negative loglikelihood function.

This is -sum(log pdf(x, theta), axis=0) where theta are the parameters (including loc and scale).
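
The definition above can be checked by hand. This sketch uses scipy.stats.norm, which has no shape parameters, so theta is just (loc, scale). The data values are made up for illustration.

```python
import numpy as np
from scipy import stats

x = np.array([0.5, 1.0, 1.5, 2.5])
loc, scale = 1.0, 2.0

# theta holds any shape parameters first, followed by loc and scale.
theta = (loc, scale)
nll = stats.norm.nnlf(theta, x)

# The same quantity computed directly from logpdf.
manual = -np.sum(stats.norm.logpdf(x, loc=loc, scale=scale))

print(nll, manual)
```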

_nnlf_and_penalty(x, args)
_penalized_nnlf(theta, x)

Return penalized negative loglikelihood function, i.e., - sum (log pdf(x, theta), axis=0) + penalty

where theta are the parameters (including loc and scale)
_fitstart(data, args=None)
_reduce_func(args, kwds)
fit(data, *args, **kwds)

Return MLEs for shape (if applicable), location, and scale parameters from data.

MLE stands for Maximum Likelihood Estimate. Starting estimates for the fit are given by input arguments; for any arguments not provided with starting estimates, self._fitstart(data) is called to generate them.

One can hold some parameters fixed to specific values by passing in keyword arguments f0, f1, …, fn (for shape parameters) and floc and fscale (for location and scale parameters, respectively).

data : array_like
Data to use in calculating the MLEs.
args : floats, optional
Starting value(s) for any shape-characterizing arguments (those not provided will be determined by a call to _fitstart(data)). No default value.
kwds : floats, optional

Starting values for the location and scale parameters; no default. Special keyword arguments are recognized as holding certain parameters fixed:

  • f0…fn : hold respective shape parameters fixed. Alternatively, shape parameters to fix can be specified by name. For example, if self.shapes == "a, b", fa and fix_a are equivalent to f0, and fb and fix_b are equivalent to f1.
  • floc : hold location parameter fixed to specified value.
  • fscale : hold scale parameter fixed to specified value.
  • optimizer : The optimizer to use. The optimizer must take func, and starting position as the first two arguments, plus args (for extra arguments to pass to the function to be optimized) and disp=0 to suppress output as keyword arguments.
mle_tuple : tuple of floats
MLEs for any shape parameters (if applicable), followed by those for location and scale. For most random variables, shape statistics will be returned, but there are exceptions (e.g. norm).

This fit is computed by maximizing a log-likelihood function, with a penalty applied for samples outside of the range of the distribution. The returned answer is not guaranteed to be the globally optimal MLE: it may only be locally optimal, or the optimization may fail altogether.

Generate some data to fit: draw random variates from the beta distribution

>>> from scipy.stats import beta
>>> a, b = 1., 2.
>>> x = beta.rvs(a, b, size=1000)

Now we can fit all four parameters (a, b, loc and scale):

>>> a1, b1, loc1, scale1 = beta.fit(x)

We can also use some prior knowledge about the dataset: let’s keep loc and scale fixed:

>>> a1, b1, loc1, scale1 = beta.fit(x, floc=0, fscale=1)
>>> loc1, scale1
(0, 1)

We can also keep shape parameters fixed by using f-keywords. To keep the zero-th shape parameter a equal to 1, use f0=1 or, equivalently, fa=1:

>>> a1, b1, loc1, scale1 = beta.fit(x, fa=1, floc=0, fscale=1)
>>> a1
1

Not all distributions return estimates for the shape parameters. norm for example just returns estimates for location and scale:

>>> from scipy.stats import norm
>>> x = norm.rvs(a, b, size=1000, random_state=123)
>>> loc1, scale1 = norm.fit(x)
>>> loc1, scale1
(0.92087172783841631, 2.0015750750324668)
_fit_loc_scale_support(data, *args)

Estimate loc and scale parameters from data accounting for support.

data : array_like
Data to fit.
arg1, arg2, arg3,… : array_like
The shape parameter(s) for the distribution (see docstring of the instance object for more information).
Lhat : float
Estimated location parameter for the data.
Shat : float
Estimated scale parameter for the data.
fit_loc_scale(data, *args)

Estimate loc and scale parameters from data using 1st and 2nd moments.

data : array_like
Data to fit.
arg1, arg2, arg3,… : array_like
The shape parameter(s) for the distribution (see docstring of the instance object for more information).
Lhat : float
Estimated location parameter for the data.
Shat : float
Estimated scale parameter for the data.
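
A sketch of the moment-matching estimate, using scipy.stats.norm. Because the standard normal has mean 0 and variance 1, the estimates here should reduce to the sample mean and the population standard deviation of the data. The data values are made up for illustration.

```python
import numpy as np
from scipy import stats

data = np.array([1.0, 2.0, 3.0, 4.0, 5.0])

# Match the first two moments of the data to those of the distribution.
loc_hat, scale_hat = stats.norm.fit_loc_scale(data)

print(loc_hat, scale_hat)
print(data.mean(), data.std())
```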
_entropy(*args)
expect(func=None, args=tuple, loc=0, scale=1, lb=None, ub=None, conditional=False, **kwds)

Calculate expected value of a function with respect to the distribution.

The expected value of a function f(x) with respect to a distribution dist is defined as:

            ubound
E[f(x)] = Integral(f(x) * dist.pdf(x))
            lbound
func : callable, optional
Function for which integral is calculated. Takes only one argument. The default is the identity mapping f(x) = x.
args : tuple, optional
Shape parameters of the distribution.
loc : float, optional
Location parameter (default=0).
scale : float, optional
Scale parameter (default=1).
lb, ub : scalar, optional
Lower and upper bound for integration. Default is set to the support of the distribution.
conditional : bool, optional
If True, the integral is corrected by the conditional probability of the integration interval. The return value is the expectation of the function, conditional on being in the given interval. Default is False.

Additional keyword arguments are passed to the integration routine.

expect : float
The calculated expected value.

The integration behavior of this function is inherited from integrate.quad.
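
A short sketch of expect with scipy.stats.norm. The first call computes a second moment, which for a normal distribution is the variance plus the squared mean. The other two calls show the effect of conditional on a half-line. The parameter values are arbitrary.

```python
import numpy as np
from scipy import stats

# E[x**2] for a normal with loc=1, scale=2 is scale**2 + loc**2.
second_moment = stats.norm.expect(lambda x: x**2, loc=1.0, scale=2.0)

# Integrating the constant 1 over [0, inf) gives the probability mass there.
half_mass = stats.norm.expect(lambda x: 1.0, lb=0.0)

# With conditional=True the result is divided by that probability mass.
conditional_mass = stats.norm.expect(lambda x: 1.0, lb=0.0, conditional=True)

print(second_moment, half_mass, conditional_mass)
```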

_drv2_moment(self, n, *args)

Non-central moment of discrete distribution.

_drv2_ppfsingle(self, q, *args)
entropy(pk, qk=None, base=None)

Calculate the entropy of a distribution for given probability values.

If only probabilities pk are given, the entropy is calculated as S = -sum(pk * log(pk), axis=0).

If qk is not None, then compute the Kullback-Leibler divergence S = sum(pk * log(pk / qk), axis=0).

This routine will normalize pk and qk if they don’t sum to 1.

pk : sequence
Defines the (discrete) distribution. pk[i] is the (possibly unnormalized) probability of event i.
qk : sequence, optional
Sequence against which the relative entropy is computed. Should be in the same format as pk.
base : float, optional
The logarithmic base to use, defaults to e (natural logarithm).
S : float
The calculated entropy.
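
The two formulas above can be sketched with the public scipy.stats.entropy function. The probability vectors are made up for illustration.

```python
import numpy as np
from scipy.stats import entropy

# A fair coin carries one bit of entropy when base=2.
fair_coin_bits = entropy([0.5, 0.5], base=2)

# Unnormalized input is normalized first, so [2, 2] behaves like [0.5, 0.5].
unnormalized_bits = entropy([2, 2], base=2)

# With qk given, the Kullback-Leibler divergence is returned instead.
pk = np.array([0.5, 0.5])
qk = np.array([0.9, 0.1])
kl = entropy(pk, qk)
kl_manual = np.sum(pk * np.log(pk / qk))

print(fair_coin_bits, unnormalized_bits, kl, kl_manual)
```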
class rv_discrete(a=0, b=inf, name=None, badvalue=None, moment_tol=1e-08, values=None, inc=1, longname=None, shapes=None, extradoc=None, seed=None)

A generic discrete random variable class meant for subclassing.

rv_discrete is a base class to construct specific distribution classes and instances for discrete random variables. It can also be used to construct an arbitrary distribution defined by a list of support points and corresponding probabilities.

a : float, optional
Lower bound of the support of the distribution, default: 0
b : float, optional
Upper bound of the support of the distribution, default: plus infinity
moment_tol : float, optional
The tolerance for the generic calculation of moments.
values : tuple of two array_like, optional
(xk, pk) where xk are integers with non-zero probabilities pk with sum(pk) = 1.
inc : integer, optional
Increment for the support of the distribution. Default is 1. (other values have not been tested)
badvalue : float, optional
The value in a result array that indicates that some argument restriction is violated; default is np.nan.
name : str, optional
The name of the instance. This string is used to construct the default example for distributions.
longname : str, optional
This string is used as part of the first line of the docstring returned when a subclass has no docstring of its own. Note: longname exists for backwards compatibility, do not use for new subclasses.
shapes : str, optional
The shape of the distribution. For example “m, n” for a distribution that takes two integers as the two shape arguments for all its methods. If not provided, shape parameters will be inferred from the signatures of the private methods, _pmf and _cdf of the instance.
extradoc : str, optional
This string is used as the last part of the docstring returned when a subclass has no docstring of its own. Note: extradoc exists for backwards compatibility, do not use for new subclasses.
seed : None or int or numpy.random.RandomState instance, optional
This parameter defines the RandomState object to use for drawing random variates. If None, the global np.random state is used. If integer, it is used to seed the local RandomState instance. Default is None.

rvs pmf logpmf cdf logcdf sf logsf ppf isf moment stats entropy expect median mean std var interval __call__

This class is similar to rv_continuous. Whether a shape parameter is valid is decided by an _argcheck method (which defaults to checking that its arguments are strictly positive.) The main differences are:

  • the support of the distribution is a set of integers
  • instead of the probability density function, pdf (and the corresponding private _pdf), this class defines the probability mass function, pmf (and the corresponding private _pmf.)
  • scale parameter is not defined.

To create a new discrete distribution, we would do the following:

>>> from numpy import exp
>>> from scipy.special import factorial
>>> from scipy.stats import rv_discrete
>>> class poisson_gen(rv_discrete):
...     "Poisson distribution"
...     def _pmf(self, k, mu):
...         return exp(-mu) * mu**k / factorial(k)

and create an instance:

>>> poisson = poisson_gen(name="poisson")

Note that above we defined the Poisson distribution in the standard form. Shifting the distribution can be done by providing the loc parameter to the methods of the instance. For example, poisson.pmf(x, mu, loc) delegates the work to poisson._pmf(x-loc, mu).
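
The effect of loc can be sketched with the built-in scipy.stats.poisson (used here instead of the poisson_gen class above). The public pmf method is called on both sides rather than the private _pmf. The values of k, mu, and loc are arbitrary.

```python
import numpy as np
from scipy.stats import poisson

k, mu, loc = 5, 3.0, 2

# Shifting by loc moves the support; the mass at k equals the standard mass at k - loc.
shifted = poisson.pmf(k, mu, loc=loc)
standard = poisson.pmf(k - loc, mu)

# Points below the shifted support have zero mass.
below_support = poisson.pmf(loc - 1, mu, loc=loc)

print(shifted, standard, below_support)
```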

Discrete distributions from a list of probabilities

Alternatively, you can construct an arbitrary discrete rv defined on a finite set of values xk with Prob{X=xk} = pk by using the values keyword argument to the rv_discrete constructor.

Custom made discrete distribution:

>>> import numpy as np
>>> from scipy import stats
>>> xk = np.arange(7)
>>> pk = (0.1, 0.2, 0.3, 0.1, 0.1, 0.0, 0.2)
>>> custm = stats.rv_discrete(name='custm', values=(xk, pk))
>>>
>>> import matplotlib.pyplot as plt
>>> fig, ax = plt.subplots(1, 1)
>>> ax.plot(xk, custm.pmf(xk), 'ro', ms=12, mec='r')
>>> ax.vlines(xk, 0, custm.pmf(xk), colors='r', lw=4)
>>> plt.show()

Random number generation:

>>> R = custm.rvs(size=100)
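
A sketch of a few other methods on the same custom distribution, without the plotting step. The seed passed to rvs is an arbitrary choice so that the draws are reproducible.

```python
import numpy as np
from scipy import stats

xk = np.arange(7)
pk = (0.1, 0.2, 0.3, 0.1, 0.1, 0.0, 0.2)
custm = stats.rv_discrete(name="custm", values=(xk, pk))

# Mean is sum(xk * pk); cdf(2) accumulates the first three probabilities.
mean = custm.mean()
cdf_at_2 = custm.cdf(2)

# Draws only take values from the support xk.
draws = custm.rvs(size=100, random_state=0)
all_in_support = bool(np.all(np.isin(draws, xk)))

print(mean, cdf_at_2, all_in_support)
```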
__new__(a=0, b=inf, name=None, badvalue=None, moment_tol=1e-08, values=None, inc=1, longname=None, shapes=None, extradoc=None, seed=None)
__init__(a=0, b=inf, name=None, badvalue=None, moment_tol=1e-08, values=None, inc=1, longname=None, shapes=None, extradoc=None, seed=None)
_construct_docstrings(name, longname, extradoc)
_updated_ctor_param()

Return the current version of _ctor_param, possibly updated by user.

Used by freezing and pickling. Keep this in sync with the signature of __init__.

_nonzero(k, *args)
_pmf(k, *args)
_logpmf(k, *args)
_cdf_single(k, *args)
_cdf(x, *args)
rvs(*args, **kwargs)

Random variates of given type.

arg1, arg2, arg3,… : array_like
The shape parameter(s) for the distribution (see docstring of the instance object for more information).
loc : array_like, optional
Location parameter (default=0).
size : int or tuple of ints, optional
Defining number of random variates (Default is 1). Note that size has to be given as keyword, not as positional argument.
random_state : None or int or np.random.RandomState instance, optional
If int or RandomState, use it for drawing the random variates. If None, rely on self.random_state. Default is None.
rvs : ndarray or scalar
Random variates of given size.
pmf(k, *args, **kwds)

Probability mass function at k of the given RV.

k : array_like
Quantiles.
arg1, arg2, arg3,… : array_like
The shape parameter(s) for the distribution (see docstring of the instance object for more information)
loc : array_like, optional
Location parameter (default=0).
pmf : array_like
Probability mass function evaluated at k
logpmf(k, *args, **kwds)

Log of the probability mass function at k of the given RV.

k : array_like
Quantiles.
arg1, arg2, arg3,… : array_like
The shape parameter(s) for the distribution (see docstring of the instance object for more information).
loc : array_like, optional
Location parameter. Default is 0.
logpmf : array_like
Log of the probability mass function evaluated at k.
cdf(k, *args, **kwds)

Cumulative distribution function of the given RV.

k : array_like, int
Quantiles.
arg1, arg2, arg3,… : array_like
The shape parameter(s) for the distribution (see docstring of the instance object for more information).
loc : array_like, optional
Location parameter (default=0).
cdf : ndarray
Cumulative distribution function evaluated at k.
logcdf(k, *args, **kwds)

Log of the cumulative distribution function at k of the given RV.

k : array_like, int
Quantiles.
arg1, arg2, arg3,… : array_like
The shape parameter(s) for the distribution (see docstring of the instance object for more information).
loc : array_like, optional
Location parameter (default=0).
logcdf : array_like
Log of the cumulative distribution function evaluated at k.
sf(k, *args, **kwds)

Survival function (1 - cdf) at k of the given RV.

k : array_like
Quantiles.
arg1, arg2, arg3,… : array_like
The shape parameter(s) for the distribution (see docstring of the instance object for more information).
loc : array_like, optional
Location parameter (default=0).
sf : array_like
Survival function evaluated at k.
logsf(k, *args, **kwds)

Log of the survival function of the given RV.

Returns the log of the “survival function,” defined as 1 - cdf, evaluated at k.

k : array_like
Quantiles.
arg1, arg2, arg3,… : array_like
The shape parameter(s) for the distribution (see docstring of the instance object for more information).
loc : array_like, optional
Location parameter (default=0).
logsf : ndarray
Log of the survival function evaluated at k.
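The reason to evaluate sf and logsf directly, rather than as 1 - cdf, shows up in the far tail. The sketch below is illustrative only and assumes a geometric distribution on k = 1, 2, ... for which P(K > k) = (1 - p)**k; the variable names are hypothetical.

```python
import math

p, k = 0.5, 60

# The cdf is so close to 1 that it rounds to exactly 1.0 in double precision.
cdf = 1.0 - (1.0 - p) ** k

# Subtracting from 1 therefore loses the whole tail probability.
naive_sf = 1.0 - cdf

# Evaluating the tail directly keeps full precision.
direct_sf = (1.0 - p) ** k

# The log of the survival function is well behaved even further out.
logsf = k * math.log1p(-p)
```

Here `naive_sf` is zero, so taking its log would fail, while `direct_sf` and `logsf` remain usable.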
ppf(q, *args, **kwds)

Percent point function (inverse of cdf) at q of the given RV.

q : array_like
Lower tail probability.
arg1, arg2, arg3,… : array_like
The shape parameter(s) for the distribution (see docstring of the instance object for more information).
loc : array_like, optional
Location parameter (default=0).
k : array_like
Quantile corresponding to the lower tail probability, q.
isf(q, *args, **kwds)

Inverse survival function (inverse of sf) at q of the given RV.

q : array_like
Upper tail probability.
arg1, arg2, arg3,… : array_like
The shape parameter(s) for the distribution (see docstring of the instance object for more information).
loc : array_like, optional
Location parameter (default=0).
k : ndarray or scalar
Quantile corresponding to the upper tail probability, q.
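For a discrete distribution the cdf is a step function, so it has no exact inverse. A common convention, assumed here, is that ppf returns the smallest k in the support with cdf(k) >= q, and that isf(q) equals ppf(1 - q). The sketch below follows that convention for 0 < q < 1 and ignores boundary handling; `binom_cdf`, `binom_ppf` and `binom_isf` are hypothetical helpers, not SciPy functions.

```python
import math

def binom_cdf(k, n, p):
    """P(K <= k) for a binomial(n, p) variable, k an integer."""
    return sum(
        math.comb(n, j) * p**j * (1 - p) ** (n - j)
        for j in range(0, min(k, n) + 1)
    )

def binom_ppf(q, n, p):
    """Smallest k in 0..n whose cdf is at least q (assumes 0 < q < 1)."""
    for k in range(n + 1):
        if binom_cdf(k, n, p) >= q:
            return k
    return n

def binom_isf(q, n, p):
    """Inverse survival function via ppf(1 - q) (assumes 0 < q < 1)."""
    return binom_ppf(1.0 - q, n, p)

# cdf values for n=4, p=0.5: 0.0625, 0.3125, 0.6875, 0.9375, 1.0
median = binom_ppf(0.5, 4, 0.5)
upper = binom_isf(0.3, 4, 0.5)
```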
_entropy(*args)
expect(func=None, args=(), loc=0, lb=None, ub=None, conditional=False, maxcount=1000, tolerance=1e-10, chunksize=32)

Calculate the expected value of a function with respect to a discrete distribution.

func : callable, optional
Function for which the expectation value is calculated. Takes only one argument. The default is the identity mapping f(k) = k.
args : tuple, optional
Shape parameters of the distribution.
loc : float, optional
Location parameter. Default is 0.
lb, ub : int, optional
Lower and upper bounds for the summation. The default is the support of the distribution, inclusive (lb <= k <= ub).
conditional : bool, optional
If True, the expectation is corrected by the conditional probability of the summation interval. The return value is then the expectation of func conditional on k being in the given interval (lb <= k <= ub). Default is False.
maxcount : int, optional
Maximal number of terms to evaluate (to avoid an endless loop for an infinite sum). Default is 1000.
tolerance : float, optional
Absolute tolerance for the summation. Default is 1e-10.
chunksize : int, optional
Iterate over the support of a distribution in chunks of this size. Default is 32.
expect : float
Expected value.

For heavy-tailed distributions, the expected value may or may not exist, depending on the function, func. If it does exist but the sum converges slowly, the accuracy of the result may be rather low. For instance, for zipf(4), the accuracy of the computed mean and variance is only about 1e-5. Increasing maxcount and/or chunksize may improve the result, but may also make the computation for zipf very slow.

The function is not vectorized.
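
The roles of func, lb, ub and conditional can be sketched as a plain summation. This is a simplified illustration, not the SciPy implementation: `expect_sketch` and `poisson_pmf` are hypothetical names, the bounds must be finite, and the maxcount, tolerance and chunksize controls are left out.

```python
import math

def poisson_pmf(k, mu):
    """Poisson pmf evaluated through its log for numerical stability."""
    return math.exp(k * math.log(mu) - mu - math.lgamma(k + 1))

def expect_sketch(pmf, func=lambda k: k, lb=0, ub=200, conditional=False):
    """Sum func(k) * pmf(k) over lb <= k <= ub (both bounds inclusive)."""
    total = 0.0
    mass = 0.0
    for k in range(lb, ub + 1):
        w = pmf(k)
        total += func(k) * w
        mass += w
    # With conditional=True, renormalise by the probability of the interval.
    return total / mass if conditional else total

def pmf(k):
    return poisson_pmf(k, 3.0)

mean = expect_sketch(pmf)                                # close to mu
second = expect_sketch(pmf, func=lambda k: k * k)        # close to mu + mu**2
cond = expect_sketch(pmf, lb=0, ub=2, conditional=True)  # E[K | 0 <= K <= 2]
```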

_expect(fun, lb, ub, x0, inc, maxcount=1000, tolerance=1e-10, chunksize=32)

Helper for computing the expectation value of fun.

_iter_chunked(x0, x1, chunksize=4, inc=1)

Iterate from x0 to x1 in chunks of chunksize and steps inc.

x0 must be finite; x1 need not be, in which case the iterator is infinite. Handles both x0 < x1 and x0 > x1; when x0 > x1 it iterates downwards (make sure to set inc < 0).

>>> [x for x in _iter_chunked(2, 5, inc=2)]
[array([2, 4])]
>>> [x for x in _iter_chunked(2, 11, inc=2)]
[array([2, 4, 6, 8]), array([10])]
>>> [x for x in _iter_chunked(2, -5, inc=-2)]
[array([ 2,  0, -2, -4])]
>>> [x for x in _iter_chunked(2, -9, inc=-2)]
[array([ 2,  0, -2, -4]), array([-6, -8])]
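The chunking behaviour shown above can be reproduced with a short generator. This is a list-based sketch rather than the NumPy-based helper itself; `iter_chunked_sketch` is a hypothetical name, and treating x1 as an exclusive bound is an assumption that is consistent with the outputs above but not stated in the text.

```python
import math
from itertools import islice

def iter_chunked_sketch(x0, x1, chunksize=4, inc=1):
    """Yield lists of at most chunksize values stepping from x0 towards x1."""
    if inc == 0:
        raise ValueError("inc must be non-zero")
    if chunksize <= 0:
        raise ValueError("chunksize must be positive")
    x = x0
    # (x - x1) * inc < 0 holds while x has not yet reached x1,
    # for both upward (inc > 0) and downward (inc < 0) iteration.
    while (x - x1) * inc < 0:
        chunk = []
        while len(chunk) < chunksize and (x - x1) * inc < 0:
            chunk.append(x)
            x += inc
        yield chunk

up = list(iter_chunked_sketch(2, 11, inc=2))
down = list(iter_chunked_sketch(2, -9, inc=-2))

# With an infinite x1 the generator never stops, so take a finite slice.
first_two = list(islice(iter_chunked_sketch(0, math.inf, chunksize=3), 2))
```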
class rv_sample(a=0, b=inf, name=None, badvalue=None, moment_tol=1e-08, values=None, inc=1, longname=None, shapes=None, extradoc=None, seed=None)

A ‘sample’ discrete distribution defined by the support and values.

The constructor ignores most of the arguments; only the values argument is needed.

__init__(a=0, b=inf, name=None, badvalue=None, moment_tol=1e-08, values=None, inc=1, longname=None, shapes=None, extradoc=None, seed=None)
_pmf(x)
_cdf(x)
_ppf(q)
_rvs()
_entropy()
generic_moment(n)
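
To make the idea of a distribution "defined by the support and values" concrete, here is a minimal pure-Python sketch. It is not the rv_sample implementation: `SampleDistSketch` is a hypothetical class, and it assumes that values is a pair (xk, pk) of support points and probabilities that sum to 1.

```python
from bisect import bisect_left, bisect_right
from itertools import accumulate

class SampleDistSketch:
    """Discrete distribution given by support points xk and probabilities pk."""

    def __init__(self, values):
        xk, pk = values
        if len(xk) != len(pk):
            raise ValueError("xk and pk must have the same length")
        if abs(sum(pk) - 1.0) > 1e-9:
            raise ValueError("probabilities must sum to 1")
        pairs = sorted(zip(xk, pk))
        self.xk = [x for x, _ in pairs]
        self.pk = [p for _, p in pairs]
        self.qk = list(accumulate(self.pk))  # cumulative probabilities

    def pmf(self, x):
        i = bisect_left(self.xk, x)
        if i < len(self.xk) and self.xk[i] == x:
            return self.pk[i]
        return 0.0

    def cdf(self, x):
        i = bisect_right(self.xk, x)
        return self.qk[i - 1] if i > 0 else 0.0

    def ppf(self, q):
        # Smallest support point whose cumulative probability is >= q.
        i = bisect_left(self.qk, q)
        return self.xk[min(i, len(self.xk) - 1)]

d = SampleDistSketch(values=([1, 2, 4], [0.25, 0.5, 0.25]))
```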
get_distribution_names(namespace_pairs, rv_base_class)

Collect names of statistical distributions and their generators.

namespace_pairs : sequence
A snapshot of (name, value) pairs in the namespace of a module.
rv_base_class : class
The base class of random variable generator classes in a module.
distn_names : list of strings
Names of the statistical distributions.
distn_gen_names : list of strings
Names of the generators of the statistical distributions. Note that these are not simply the names of the statistical distributions, with a _gen suffix added.
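
One way such a collection step could work is sketched below. The logic is an assumption based on the description above, not a copy of the library code: `collect_names_sketch`, `BaseRV` and `normal_gen` are hypothetical, and skipping names that start with an underscore is likewise assumed.

```python
import math

class BaseRV:
    """Hypothetical base class for random variable generators."""

class normal_gen(BaseRV):
    """Hypothetical generator class."""

def collect_names_sketch(namespace_pairs, rv_base_class):
    """Split a namespace into distribution names and generator class names."""
    distn_names = []
    distn_gen_names = []
    for name, value in namespace_pairs:
        if name.startswith("_"):
            continue  # private names are skipped (assumption)
        if isinstance(value, type):
            # Generator classes: subclasses whose name ends in '_gen'.
            if name.endswith("_gen") and issubclass(value, rv_base_class):
                distn_gen_names.append(name)
        elif isinstance(value, rv_base_class):
            # Distributions: instances of a generator class.
            distn_names.append(name)
    return distn_names, distn_gen_names

namespace_pairs = [
    ("BaseRV", BaseRV),
    ("normal_gen", normal_gen),
    ("bell", normal_gen()),       # instance name differs from the class name
    ("_private", normal_gen()),
    ("math", math),
]
names, gen_names = collect_names_sketch(namespace_pairs, BaseRV)
```

In this example the distribution is called `bell` while its generator is `normal_gen`, which illustrates why the generator names cannot be derived by appending a _gen suffix.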