stats.mstats_basic

An extension of scipy.stats.stats to support masked arrays

Module Contents

Functions

_chk_asarray(a,axis)
_chk2_asarray(a,b,axis)
_chk_size(a,b)
argstoarray(*args) Constructs a 2D array from a group of sequences.
find_repeats(arr) Find repeats in arr and return a tuple (repeats, repeat_count).
count_tied_groups(x,use_missing=False) Counts the number of tied values.
rankdata(data,axis=None,use_missing=False) Returns the rank (also known as order statistics) of each data point
mode(a,axis=0) Returns an array of the modal (most common) value in the passed array.
_betai(a,b,x)
msign(x) Returns the sign of x, or 0 if x is masked.
pearsonr(x,y) Calculates a Pearson correlation coefficient and the p-value for testing
spearmanr(x,y,use_ties=True) Calculates a Spearman rank-order correlation coefficient and the p-value
kendalltau(x,y,use_ties=True,use_missing=False) Computes Kendall’s rank correlation tau on two variables x and y.
kendalltau_seasonal(x) Computes a multivariate Kendall’s rank correlation tau, for seasonal data.
pointbiserialr(x,y) Calculates a point biserial correlation coefficient and its p-value.
linregress(x,y=None) Linear regression calculation
theilslopes(y,x=None,alpha=0.95) Computes the Theil-Sen estimator for a set of points (x, y).
sen_seasonal_slopes(x)
ttest_1samp(a,popmean,axis=0) Calculates the T-test for the mean of ONE group of scores.
ttest_ind(a,b,axis=0,equal_var=True) Calculates the T-test for the means of TWO INDEPENDENT samples of scores.
ttest_rel(a,b,axis=0) Calculates the T-test on TWO RELATED samples of scores, a and b.
mannwhitneyu(x,y,use_continuity=True) Computes the Mann-Whitney statistic
kruskal(*args) Compute the Kruskal-Wallis H-test for independent samples
ks_twosamp(data1,data2,alternative="two-sided") Computes the Kolmogorov-Smirnov test on two samples.
trima(a,limits=None,inclusive=tuple) Trims an array by masking the data outside some given limits.
trimr(a,limits=None,inclusive=tuple,axis=None) Trims an array by masking some proportion of the data on each end.
trim(a,limits=None,inclusive=tuple,relative=False,axis=None) Trims an array by masking the data outside some given limits.
trimboth(data,proportiontocut=0.2,inclusive=tuple,axis=None) Trims the smallest and largest data values.
trimtail(data,proportiontocut=0.2,tail="left",inclusive=tuple,axis=None) Trims the data by masking values from one tail.
trimmed_mean(a,limits=tuple,inclusive=tuple,relative=True,axis=None) Returns the trimmed mean of the data along the given axis.
trimmed_var(a,limits=tuple,inclusive=tuple,relative=True,axis=None,ddof=0) Returns the trimmed variance of the data along the given axis.
trimmed_std(a,limits=tuple,inclusive=tuple,relative=True,axis=None,ddof=0) Returns the trimmed standard deviation of the data along the given axis.
trimmed_stde(a,limits=tuple,inclusive=tuple,axis=None) Returns the standard error of the trimmed mean along the given axis.
_mask_to_limits(a,limits,inclusive) Mask an array for values outside of given limits.
tmean(a,limits=None,inclusive=tuple,axis=None) Compute the trimmed mean.
tvar(a,limits=None,inclusive=tuple,axis=0,ddof=1) Compute the trimmed variance
tmin(a,lowerlimit=None,axis=0,inclusive=True) Compute the trimmed minimum
tmax(a,upperlimit=None,axis=0,inclusive=True) Compute the trimmed maximum
tsem(a,limits=None,inclusive=tuple,axis=0,ddof=1) Compute the trimmed standard error of the mean.
winsorize(a,limits=None,inclusive=tuple,inplace=False,axis=None) Returns a Winsorized version of the input array.
moment(a,moment=1,axis=0) Calculates the nth moment about the mean for a sample.
variation(a,axis=0) Computes the coefficient of variation, the ratio of the biased standard
skew(a,axis=0,bias=True) Computes the skewness of a data set.
kurtosis(a,axis=0,fisher=True,bias=True) Computes the kurtosis (Fisher or Pearson) of a dataset.
describe(a,axis=0,ddof=0,bias=True) Computes several descriptive statistics of the passed array.
stde_median(data,axis=None) Returns the McKean-Schrader estimate of the standard error of the sample
skewtest(a,axis=0) Tests whether the skew is different from the normal distribution.
kurtosistest(a,axis=0) Tests whether a dataset has normal kurtosis
normaltest(a,axis=0) Tests whether a sample differs from a normal distribution.
mquantiles(a,prob=list,alphap=0.4,betap=0.4,axis=None,limit=tuple) Computes empirical quantiles for a data array.
scoreatpercentile(data,per,limit=tuple,alphap=0.4,betap=0.4) Calculate the score at the given ‘per’ percentile of the
plotting_positions(data,alpha=0.4,beta=0.4) Returns plotting positions (or empirical percentile points) for the data.
obrientransform(*args) Computes a transform on input data (any number of columns). Used to
sem(a,axis=0,ddof=1) Calculates the standard error of the mean of the input array.
f_oneway(*args) Performs a 1-way ANOVA, returning an F-value and probability given
friedmanchisquare(*args) Friedman Chi-Square is a non-parametric, one-way within-subjects ANOVA.
_chk_asarray(a, axis)
_chk2_asarray(a, b, axis)
_chk_size(a, b)
argstoarray(*args)

Constructs a 2D array from a group of sequences.

Sequences are filled with missing values to match the length of the longest sequence.

args : sequences
Group of sequences.
argstoarray : MaskedArray
A ( m x n ) masked array, where m is the number of arguments and n the length of the longest argument.

numpy.ma.row_stack has identical behavior, but is called with a sequence of sequences.
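
As a small illustration of the padding described above, the following sketch stacks two sequences of unequal length. The sample values and variable names are made up for the example.

```python
import numpy as np
from scipy.stats import mstats

# Two sequences of different lengths (illustrative values).
long_seq = [1.0, 2.0, 3.0]
short_seq = [4.0, 5.0]

stacked = mstats.argstoarray(long_seq, short_seq)

# One row per argument, one column per element of the longest argument.
print(stacked.shape)
# The shorter row is padded with a masked entry.
print(stacked)
# Number of unmasked values in each row.
print(stacked.count(axis=1))
```

The masked padding means later reductions (mean, count, and so on) ignore the filler positions instead of treating them as data.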

find_repeats(arr)

Find repeats in arr and return a tuple (repeats, repeat_count).

The input is cast to float64. Masked values are discarded.

arr : sequence
Input array. The array is flattened if it is not 1D.
repeats : ndarray
Array of repeated values.
counts : ndarray
Array of counts.
count_tied_groups(x, use_missing=False)

Counts the number of tied values.

x : sequence
Sequence of data on which to count the ties.
use_missing : bool, optional
Whether to consider missing values as tied.
count_tied_groups : dict
Returns a dictionary (nb of ties: nb of groups).
>>> import numpy as np
>>> from scipy.stats import mstats
>>> z = [0, 0, 0, 2, 2, 2, 3, 3, 4, 5, 6]
>>> mstats.count_tied_groups(z)
{2: 1, 3: 2}

In the above example, the ties were 0 (3x), 2 (3x) and 3 (2x).

>>> z = np.ma.array([0, 0, 1, 2, 2, 2, 3, 3, 4, 5, 6])
>>> mstats.count_tied_groups(z)
{2: 2, 3: 1}
>>> z[[1,-1]] = np.ma.masked
>>> mstats.count_tied_groups(z, use_missing=True)
{2: 2, 3: 1}
rankdata(data, axis=None, use_missing=False)

Returns the rank (also known as order statistics) of each data point along the given axis.

If some values are tied, their rank is averaged. If some values are masked, their rank is set to 0 if use_missing is False, or set to the average rank of the unmasked values if use_missing is True.

data : sequence
Input data. The data is transformed to a masked array
axis : {None,int}, optional
Axis along which to perform the ranking. If None, the array is first flattened. An exception is raised if the axis is specified for arrays with more than 2 dimensions.
use_missing : bool, optional
Whether the masked values have a rank of 0 (False) or equal to the average rank of the unmasked values (True).
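
The effect of use_missing can be seen on a short array with one tie and one masked entry. This is a minimal sketch with made-up data; the variable names are not part of the module.

```python
import numpy as np
from scipy.stats import mstats

# Illustrative data: 20 appears twice, and the last entry is masked.
data = np.ma.array([10, 20, 20, 30, 99], mask=[0, 0, 0, 0, 1])

# Masked entry receives rank 0.
ranks_zero = mstats.rankdata(data)
# Masked entry receives the average rank of the 4 unmasked values.
ranks_avg = mstats.rankdata(data, use_missing=True)

print(ranks_zero)
print(ranks_avg)
```

In both calls the tied values share the mean of the ranks they would otherwise occupy (2 and 3).
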
mode(a, axis=0)

Returns an array of the modal (most common) value in the passed array.

a : array_like
n-dimensional array of which to find mode(s).
axis : int or None, optional
Axis along which to operate. Default is 0. If None, compute over the whole array a.
mode : ndarray
Array of modal values.
count : ndarray
Array of counts for each mode.

For more details, see stats.mode.

_betai(a, b, x)
msign(x)

Returns the sign of x, or 0 if x is masked.

pearsonr(x, y)

Calculates a Pearson correlation coefficient and the p-value for testing non-correlation.

The Pearson correlation coefficient measures the linear relationship between two datasets. Strictly speaking, Pearson’s correlation requires that each dataset be normally distributed. Like other correlation coefficients, this one varies between -1 and +1 with 0 implying no correlation. Correlations of -1 or +1 imply an exact linear relationship. Positive correlations imply that as x increases, so does y. Negative correlations imply that as x increases, y decreases.

The p-value roughly indicates the probability of an uncorrelated system producing datasets that have a Pearson correlation at least as extreme as the one computed from these datasets. The p-values are not entirely reliable but are probably reasonable for datasets larger than 500 or so.

x : 1-D array_like
Input
y : 1-D array_like
Input
pearsonr : float
Pearson’s correlation coefficient, 2-tailed p-value.

http://www.statsoft.com/textbook/glosp.html#Pearson%20Correlation

spearmanr(x, y, use_ties=True)

Calculates a Spearman rank-order correlation coefficient and the p-value to test for non-correlation.

The Spearman correlation is a nonparametric measure of the monotonicity of the relationship between two datasets. Unlike the Pearson correlation, the Spearman correlation does not assume that both datasets are normally distributed. Like other correlation coefficients, this one varies between -1 and +1 with 0 implying no correlation. Correlations of -1 or +1 imply an exact monotonic relationship. Positive correlations imply that as x increases, so does y. Negative correlations imply that as x increases, y decreases.

Missing values are discarded pair-wise: if a value is missing in x, the corresponding value in y is masked.

The p-value roughly indicates the probability of an uncorrelated system producing datasets that have a Spearman correlation at least as extreme as the one computed from these datasets. The p-values are not entirely reliable but are probably reasonable for datasets larger than 500 or so.

x : array_like
The length of x must be > 2.
y : array_like
The length of y must be > 2.
use_ties : bool, optional
Whether the correction for ties should be computed.
correlation : float
Spearman correlation coefficient
pvalue : float
2-tailed p-value.

[CRCProbStat2000] section 14.7
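
To make the difference from the Pearson coefficient concrete, the sketch below computes both on data that is monotonic but not linear. It relies on the standard fact that Spearman's coefficient is the Pearson coefficient of the ranks. It is plain NumPy, not this module's implementation, and the helper simple_ranks is hypothetical and assumes no ties and no masked values.

```python
import numpy as np

def simple_ranks(values):
    """Rank values from 1 to n (hypothetical helper; assumes no ties)."""
    order = np.argsort(values)
    ranks = np.empty(len(values), dtype=float)
    ranks[order] = np.arange(1, len(values) + 1)
    return ranks

# Monotonic but clearly non-linear relationship (illustrative data).
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = x ** 3

pearson_r = np.corrcoef(x, y)[0, 1]
spearman_r = np.corrcoef(simple_ranks(x), simple_ranks(y))[0, 1]

print(pearson_r)   # below 1, because the relationship is not a straight line
print(spearman_r)  # 1 up to rounding, because the ordering is preserved
```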

kendalltau(x, y, use_ties=True, use_missing=False)

Computes Kendall’s rank correlation tau on two variables x and y.

x : sequence
First data list (for example, time).
y : sequence
Second data list.
use_ties : {True, False}, optional
Whether ties correction should be performed.
use_missing : {False, True}, optional
Whether missing data should be allocated a rank of 0 (False) or the average rank (True)
correlation : float
Kendall tau
pvalue : float
Approximate 2-sided p-value.
kendalltau_seasonal(x)

Computes a multivariate Kendall’s rank correlation tau, for seasonal data.

x : 2-D ndarray
Array of seasonal data, with seasons in columns.
pointbiserialr(x, y)

Calculates a point biserial correlation coefficient and its p-value.

x : array_like of bools
Input array.
y : array_like
Input array.
correlation : float
R value
pvalue : float
2-tailed p-value

Missing values are considered pair-wise: if a value is missing in x, the corresponding value in y is masked.

For more details on pointbiserialr, see stats.pointbiserialr.

linregress(x, y=None)

Linear regression calculation

Note that the non-masked version is used, and that this docstring is replaced by the non-masked docstring + some info on missing data.

theilslopes(y, x=None, alpha=0.95)

Computes the Theil-Sen estimator for a set of points (x, y).

theilslopes implements a method for robust linear regression. It computes the slope as the median of all slopes between paired values.

y : array_like
Dependent variable.
x : array_like or None, optional
Independent variable. If None, use arange(len(y)) instead.
alpha : float, optional
Confidence degree between 0 and 1. Default is 95% confidence. Note that alpha is symmetric around 0.5, i.e. both 0.1 and 0.9 are interpreted as “find the 90% confidence interval”.
medslope : float
Theil slope.
medintercept : float
Intercept of the Theil line, as median(y) - medslope*median(x).
lo_slope : float
Lower bound of the confidence interval on medslope.
up_slope : float
Upper bound of the confidence interval on medslope.

For more details on theilslopes, see stats.theilslopes.
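
The median-of-slopes idea and the intercept formula given above can be sketched in a few lines of NumPy. This is an illustration with made-up data, not the library routine: it ignores masked values and does not compute the confidence bounds.

```python
import numpy as np
from itertools import combinations

x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y = np.array([1.0, 3.0, 5.0, 7.0, 50.0])  # the last point is an outlier

# Slopes between every pair of points with distinct x values.
pair_slopes = [
    (y[j] - y[i]) / (x[j] - x[i])
    for i, j in combinations(range(len(x)), 2)
    if x[j] != x[i]
]

medslope = np.median(pair_slopes)
medintercept = np.median(y) - medslope * np.median(x)

# Ordinary least squares, for comparison, is pulled up by the outlier.
ols_slope = np.polyfit(x, y, 1)[0]

print(medslope, medintercept)
print(ols_slope)
```

Six of the ten pairwise slopes equal 2, so the median stays at 2 even though one observation is far off the line.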

sen_seasonal_slopes(x)
ttest_1samp(a, popmean, axis=0)

Calculates the T-test for the mean of ONE group of scores.

a : array_like
sample observation
popmean : float or array_like
Expected value in the null hypothesis. If array_like, then it must have the same shape as a excluding the axis dimension.
axis : int or None, optional
Axis along which to compute test. If None, compute over the whole array a.
statistic : float or array
t-statistic
pvalue : float or array
two-tailed p-value

For more details on ttest_1samp, see stats.ttest_1samp.

ttest_ind(a, b, axis=0, equal_var=True)

Calculates the T-test for the means of TWO INDEPENDENT samples of scores.

a, b : array_like
The arrays must have the same shape, except in the dimension corresponding to axis (the first, by default).
axis : int or None, optional
Axis along which to compute test. If None, compute over the whole arrays, a, and b.
equal_var : bool, optional
If True, perform a standard independent 2 sample test that assumes equal population variances. If False, perform Welch’s t-test, which does not assume equal population variance. Added in version 0.17.0.
statistic : float or array
The calculated t-statistic.
pvalue : float or array
The two-tailed p-value.

For more details on ttest_ind, see stats.ttest_ind.

ttest_rel(a, b, axis=0)

Calculates the T-test on TWO RELATED samples of scores, a and b.

a, b : array_like
The arrays must have the same shape.
axis : int or None, optional
Axis along which to compute test. If None, compute over the whole arrays, a, and b.
statistic : float or array
t-statistic
pvalue : float or array
two-tailed p-value

For more details on ttest_rel, see stats.ttest_rel.

mannwhitneyu(x, y, use_continuity=True)

Computes the Mann-Whitney statistic

Missing values in x and/or y are discarded.

x : sequence
Input
y : sequence
Input
use_continuity : {True, False}, optional
Whether a continuity correction (1/2.) should be taken into account.
statistic : float
The Mann-Whitney statistic.
pvalue : float
Approximate p-value assuming a normal distribution.
kruskal(*args)

Compute the Kruskal-Wallis H-test for independent samples

sample1, sample2, … : array_like
Two or more arrays with the sample measurements can be given as arguments.
statistic : float
The Kruskal-Wallis H statistic, corrected for ties
pvalue : float
The p-value for the test using the assumption that H has a chi square distribution

For more details on kruskal, see stats.kruskal.

ks_twosamp(data1, data2, alternative="two-sided")

Computes the Kolmogorov-Smirnov test on two samples.

Missing values are discarded.

data1 : array_like
First data set
data2 : array_like
Second data set
alternative : {‘two-sided’, ‘less’, ‘greater’}, optional
Indicates the alternative hypothesis. Default is ‘two-sided’.
d : float
Value of the Kolmogorov Smirnov test
p : float
Corresponding p-value.
trima(a, limits=None, inclusive=tuple)

Trims an array by masking the data outside some given limits.

Returns a masked version of the input array.

a : array_like
Input array.
limits : {None, tuple}, optional
Tuple of (lower limit, upper limit) in absolute values. Values of the input array lower (greater) than the lower (upper) limit will be masked. A limit of None indicates an open interval.
inclusive : (bool, bool) tuple, optional
Tuple of (lower flag, upper flag), indicating whether values exactly equal to the lower (upper) limit are allowed.
trimr(a, limits=None, inclusive=tuple, axis=None)

Trims an array by masking some proportion of the data on each end. Returns a masked version of the input array.

a : sequence
Input array.
limits : {None, tuple}, optional
Tuple of the percentages to cut on each side of the array, with respect to the number of unmasked data, as floats between 0. and 1. Noting n the number of unmasked data before trimming, the (n*limits[0])th smallest data and the (n*limits[1])th largest data are masked, and the total number of unmasked data after trimming is n*(1.-sum(limits)). The value of one limit can be set to None to indicate an open interval.
inclusive : {(True,True) tuple}, optional
Tuple of flags indicating whether the number of data being masked on the left (right) end should be truncated (True) or rounded (False) to integers.
axis : {None,int}, optional
Axis along which to trim. If None, the whole array is trimmed, but its shape is maintained.
trim(a, limits=None, inclusive=tuple, relative=False, axis=None)

Trims an array by masking the data outside some given limits.

Returns a masked version of the input array.

>>> from scipy.stats.mstats import trim
>>> z = [ 1, 2, 3, 4, 5, 6, 7, 8, 9,10]
>>> print(trim(z,(3,8)))
[-- -- 3 4 5 6 7 8 -- --]
>>> print(trim(z,(0.1,0.2),relative=True))
[-- 2 3 4 5 6 7 8 -- --]
trimboth(data, proportiontocut=0.2, inclusive=tuple, axis=None)

Trims the smallest and largest data values.

Trims the data by masking the int(proportiontocut * n) smallest and int(proportiontocut * n) largest values of data along the given axis, where n is the number of unmasked values before trimming.

data : ndarray
Data to trim.
proportiontocut : float, optional
Percentage of trimming (as a float between 0 and 1). If n is the number of unmasked values before trimming, the number of values after trimming is (1 - 2*proportiontocut) * n. Default is 0.2.
inclusive : {(bool, bool) tuple}, optional
Tuple indicating whether the number of data being masked on each side should be rounded (True) or truncated (False).
axis : int, optional
Axis along which to perform the trimming. If None, the input array is first flattened.
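
A short usage sketch of the count given above, with n = 10 and the default proportiontocut. The input values are illustrative.

```python
import numpy as np
from scipy.stats import mstats

data = np.arange(10)  # n = 10 unmasked values
trimmed = mstats.trimboth(data, proportiontocut=0.2)

# int(0.2 * 10) = 2 values are masked at each end.
print(trimmed)
# Unmasked values left after trimming.
print(trimmed.count())
```
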
trimtail(data, proportiontocut=0.2, tail="left", inclusive=tuple, axis=None)

Trims the data by masking values from one tail.

data : array_like
Data to trim.
proportiontocut : float, optional
Percentage of trimming. If n is the number of unmasked values before trimming, the number of values after trimming is (1 - proportiontocut) * n. Default is 0.2.
tail : {‘left’, ‘right’}, optional
If ‘left’ the proportiontocut lowest values will be masked. If ‘right’ the proportiontocut highest values will be masked. Default is ‘left’.
inclusive : {(bool, bool) tuple}, optional
Tuple indicating whether the number of data being masked on each side should be rounded (True) or truncated (False). Default is (True, True).
axis : int, optional
Axis along which to perform the trimming. If None, the input array is first flattened. Default is None.
trimtail : ndarray
Returned array of same shape as data with masked tail values.
trimmed_mean(a, limits=tuple, inclusive=tuple, relative=True, axis=None)

Returns the trimmed mean of the data along the given axis.

trimmed_var(a, limits=tuple, inclusive=tuple, relative=True, axis=None, ddof=0)

Returns the trimmed variance of the data along the given axis.

ddof : {0, integer}, optional
Means Delta Degrees of Freedom. The denominator used during computations is (n-ddof). DDOF=0 corresponds to a biased estimate, DDOF=1 to an unbiased estimate of the variance.
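
The role of ddof in the denominator can be checked with NumPy's masked-array variance. This sketch does not call trimmed_var; it only illustrates the (n-ddof) convention on made-up data, where n counts unmasked values.

```python
import numpy as np

# Illustrative data with one masked entry, so n = 4 unmasked values.
a = np.ma.array([1.0, 2.0, 3.0, 4.0, 100.0], mask=[0, 0, 0, 0, 1])

biased = float(a.var(ddof=0))    # sum of squared deviations / n
unbiased = float(a.var(ddof=1))  # sum of squared deviations / (n - 1)

print(biased)
print(unbiased)
```
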
trimmed_std(a, limits=tuple, inclusive=tuple, relative=True, axis=None, ddof=0)

Returns the trimmed standard deviation of the data along the given axis.

ddof : {0, integer}, optional
Means Delta Degrees of Freedom. The denominator used during computations is (n-ddof). DDOF=0 corresponds to a biased estimate, DDOF=1 to an unbiased estimate of the variance.
trimmed_stde(a, limits=tuple, inclusive=tuple, axis=None)

Returns the standard error of the trimmed mean along the given axis.

a : sequence
Input array
limits : {(0.1,0.1), tuple of float}, optional
Tuple (lower percentage, upper percentage) to cut on each side of the array, with respect to the number of unmasked data.

If n is the number of unmasked data before trimming, the (n * limits[0]) smallest values and the (n * limits[1]) largest values are masked, and the total number of unmasked data after trimming is n * (1.-sum(limits)). In each case, the value of one limit can be set to None to indicate an open interval. If limits is None, no trimming is performed.

inclusive : {(bool, bool) tuple}, optional
Tuple indicating whether the number of data being masked on each side should be rounded (True) or truncated (False).
axis : int, optional
Axis along which to trim.

trimmed_stde : scalar or ndarray

_mask_to_limits(a, limits, inclusive)

Mask an array for values outside of given limits.

This is primarily a utility function.

a : array
limits : (float or None, float or None)
A tuple consisting of the (lower limit, upper limit). Values in the input array less than the lower limit or greater than the upper limit will be masked out. None implies no limit.
inclusive : (bool, bool)
A tuple consisting of the (lower flag, upper flag). These flags determine whether values exactly equal to lower or upper are allowed.

A MaskedArray.

A ValueError if there are no values within the given limits.

tmean(a, limits=None, inclusive=tuple, axis=None)

Compute the trimmed mean.

a : array_like
Array of values.
limits : None or (lower limit, upper limit), optional
Values in the input array less than the lower limit or greater than the upper limit will be ignored. When limits is None (default), then all values are used. Either of the limit values in the tuple can also be None representing a half-open interval.
inclusive : (bool, bool), optional
A tuple consisting of the (lower flag, upper flag). These flags determine whether values exactly equal to the lower or upper limits are included. The default value is (True, True).
axis : int or None, optional
Axis along which to operate. If None, compute over the whole array. Default is None.

tmean : float

For more details on tmean, see stats.tmean.
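
The limit-based trimming described above can be imitated with numpy.ma: mask whatever falls outside the limits, then average what is left. This is a sketch of the idea with both limits inclusive and made-up values, not a call to tmean itself.

```python
import numpy as np

a = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 100.0])
lower, upper = 2.0, 5.0  # illustrative limits, both inclusive

# Mask everything outside the closed interval [lower, upper].
kept = np.ma.masked_outside(a, lower, upper)
# Average only the unmasked values.
trimmed_mean_value = float(kept.mean())

print(kept)
print(trimmed_mean_value)
```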

tvar(a, limits=None, inclusive=tuple, axis=0, ddof=1)

Compute the trimmed variance

This function computes the sample variance of an array of values, while ignoring values which are outside of given limits.

a : array_like
Array of values.
limits : None or (lower limit, upper limit), optional
Values in the input array less than the lower limit or greater than the upper limit will be ignored. When limits is None, then all values are used. Either of the limit values in the tuple can also be None representing a half-open interval. The default value is None.
inclusive : (bool, bool), optional
A tuple consisting of the (lower flag, upper flag). These flags determine whether values exactly equal to the lower or upper limits are included. The default value is (True, True).
axis : int or None, optional
Axis along which to operate. If None, compute over the whole array. Default is zero.
ddof : int, optional
Delta degrees of freedom. Default is 1.
tvar : float
Trimmed variance.

For more details on tvar, see stats.tvar.

tmin(a, lowerlimit=None, axis=0, inclusive=True)

Compute the trimmed minimum

a : array_like
array of values
lowerlimit : None or float, optional
Values in the input array less than the given limit will be ignored. When lowerlimit is None, then all values are used. The default value is None.
axis : int or None, optional
Axis along which to operate. Default is 0. If None, compute over the whole array a.
inclusive : {True, False}, optional
This flag determines whether values exactly equal to the lower limit are included. The default value is True.

tmin : float, int or ndarray

For more details on tmin, see stats.tmin.

tmax(a, upperlimit=None, axis=0, inclusive=True)

Compute the trimmed maximum

This function computes the maximum value of an array along a given axis, while ignoring values larger than a specified upper limit.

a : array_like
array of values
upperlimit : None or float, optional
Values in the input array greater than the given limit will be ignored. When upperlimit is None, then all values are used. The default value is None.
axis : int or None, optional
Axis along which to operate. Default is 0. If None, compute over the whole array a.
inclusive : {True, False}, optional
This flag determines whether values exactly equal to the upper limit are included. The default value is True.

tmax : float, int or ndarray

For more details on tmax, see stats.tmax.

tsem(a, limits=None, inclusive=tuple, axis=0, ddof=1)

Compute the trimmed standard error of the mean.

This function finds the standard error of the mean for given values, ignoring values outside the given limits.

a : array_like
array of values
limits : None or (lower limit, upper limit), optional
Values in the input array less than the lower limit or greater than the upper limit will be ignored. When limits is None, then all values are used. Either of the limit values in the tuple can also be None representing a half-open interval. The default value is None.
inclusive : (bool, bool), optional
A tuple consisting of the (lower flag, upper flag). These flags determine whether values exactly equal to the lower or upper limits are included. The default value is (True, True).
axis : int or None, optional
Axis along which to operate. If None, compute over the whole array. Default is zero.
ddof : int, optional
Delta degrees of freedom. Default is 1.

tsem : float

For more details on tsem, see stats.tsem.

winsorize(a, limits=None, inclusive=tuple, inplace=False, axis=None)

Returns a Winsorized version of the input array.

The (limits[0])th lowest values are set to the (limits[0])th percentile, and the (limits[1])th highest values are set to the (1 - limits[1])th percentile. Masked values are skipped.

a : sequence
Input array.
limits : {None, tuple of float}, optional
Tuple of the percentages to cut on each side of the array, with respect to the number of unmasked data, as floats between 0. and 1. Noting n the number of unmasked data before trimming, the (n*limits[0])th smallest data and the (n*limits[1])th largest data are masked, and the total number of unmasked data after trimming is n*(1.-sum(limits)). The value of one limit can be set to None to indicate an open interval.
inclusive : {(True, True) tuple}, optional
Tuple indicating whether the number of data being masked on each side should be rounded (True) or truncated (False).
inplace : {False, True}, optional
Whether to winsorize in place (True) or to use a copy (False)
axis : {None, int}, optional
Axis along which to trim. If None, the whole array is trimmed, but its shape is maintained.

This function is applied to reduce the effect of possibly spurious outliers by limiting the extreme values.
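
A brief usage sketch on ten illustrative values: with limits of (0.1, 0.2) the single lowest value and the two highest values are replaced by their nearest remaining neighbours.

```python
import numpy as np
from scipy.stats.mstats import winsorize

a = np.array([10, 4, 9, 8, 5, 3, 7, 2, 1, 6])

# Winsorize the lowest 10% and the highest 20% of the values.
w = winsorize(a, limits=(0.1, 0.2))

print(w)  # 1 becomes 2; 9 and 10 become 8
print(a)  # the input is left untouched because inplace defaults to False
```

Unlike the trimming functions, nothing is masked here: the array keeps all ten entries, only the extremes are clipped.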

moment(a, moment=1, axis=0)

Calculates the nth moment about the mean for a sample.

a : array_like
data
moment : int, optional
order of central moment that is returned
axis : int or None, optional
Axis along which the central moment is computed. Default is 0. If None, compute over the whole array a.
n-th central moment : ndarray or float
The appropriate moment along the given axis or over all values if axis is None. The denominator for the moment calculation is the number of observations, no degrees of freedom correction is done.

For more details about moment, see stats.moment.

variation(a, axis=0)

Computes the coefficient of variation, the ratio of the biased standard deviation to the mean.

a : array_like
Input array.
axis : int or None, optional
Axis along which to calculate the coefficient of variation. Default is 0. If None, compute over the whole array a.
variation : ndarray
The calculated variation along the requested axis.

For more details about variation, see stats.variation.
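
The definition above (biased standard deviation divided by the mean) can be reproduced directly with numpy.ma. This sketch uses made-up data with one masked outlier and does not call variation itself.

```python
import numpy as np

# Illustrative data; the final value is masked and therefore ignored.
a = np.ma.array([2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0, 1000.0],
                mask=[0, 0, 0, 0, 0, 0, 0, 0, 1])

# Biased standard deviation (ddof=0) over the unmasked values, divided by their mean.
cv = float(a.std(ddof=0) / a.mean())

print(cv)
```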

skew(a, axis=0, bias=True)

Computes the skewness of a data set.

a : ndarray
data
axis : int or None, optional
Axis along which skewness is calculated. Default is 0. If None, compute over the whole array a.
bias : bool, optional
If False, then the calculations are corrected for statistical bias.
skewness : ndarray
The skewness of values along an axis, returning 0 where all values are equal.

For more details about skew, see stats.skew.

kurtosis(a, axis=0, fisher=True, bias=True)

Computes the kurtosis (Fisher or Pearson) of a dataset.

Kurtosis is the fourth central moment divided by the square of the variance. If Fisher’s definition is used, then 3.0 is subtracted from the result to give 0.0 for a normal distribution.

If bias is False then the kurtosis is calculated using k statistics to eliminate bias coming from biased moment estimators

Use kurtosistest to see if result is close enough to normal.

a : array
data for which the kurtosis is calculated
axis : int or None, optional
Axis along which the kurtosis is calculated. Default is 0. If None, compute over the whole array a.
fisher : bool, optional
If True, Fisher’s definition is used (normal ==> 0.0). If False, Pearson’s definition is used (normal ==> 3.0).
bias : bool, optional
If False, then the calculations are corrected for statistical bias.
kurtosis : array
The kurtosis of values along an axis. If all values are equal, return -3 for Fisher’s definition and 0 for Pearson’s definition.

For more details about kurtosis, see stats.kurtosis.
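
The relation between the two definitions can be worked through by hand. The sketch below computes the biased (bias=True) estimate from central moments in plain NumPy on illustrative data, without masked values and without calling kurtosis.

```python
import numpy as np

a = np.array([1.0, 2.0, 3.0, 4.0, 5.0])

dev = a - a.mean()
m2 = np.mean(dev ** 2)  # biased variance (second central moment)
m4 = np.mean(dev ** 4)  # fourth central moment

pearson_kurtosis = m4 / m2 ** 2          # a normal distribution gives 3.0
fisher_kurtosis = pearson_kurtosis - 3.0  # a normal distribution gives 0.0

print(pearson_kurtosis, fisher_kurtosis)
```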

describe(a, axis=0, ddof=0, bias=True)

Computes several descriptive statistics of the passed array.

a : array_like
Data array
axis : int or None, optional
Axis along which to calculate statistics. Default 0. If None, compute over the whole array a.
ddof : int, optional
degree of freedom (default 0); note that default ddof is different from the same routine in stats.describe
bias : bool, optional
If False, then the skewness and kurtosis calculations are corrected for statistical bias.
nobs : int
size of the data (discarding missing values)
minmax : (int, int)
min, max
mean : float
arithmetic mean
variance : float
variance computed with the given ddof (biased for the default ddof=0)
skewness : float
biased skewness
kurtosis : float
biased kurtosis
>>> import numpy as np
>>> from scipy.stats.mstats import describe
>>> ma = np.ma.array(range(6), mask=[0, 0, 0, 1, 1, 1])
>>> describe(ma)
DescribeResult(nobs=3, minmax=(masked_array(data = 0,
             mask = False,
       fill_value = 999999)
, masked_array(data = 2,
             mask = False,
       fill_value = 999999)
), mean=1.0, variance=0.66666666666666663, skewness=masked_array(data = 0.0,
             mask = False,
       fill_value = 1e+20)
, kurtosis=-1.5)
stde_median(data, axis=None)

Returns the McKean-Schrader estimate of the standard error of the sample median along the given axis. Masked values are discarded.

data : ndarray
Input data.
axis : {None,int}, optional
Axis along which to compute the standard error. If None, the input array is first flattened.
skewtest(a, axis=0)

Tests whether the skew is different from the normal distribution.

a : array
The data to be tested
axis : int or None, optional
Axis along which statistics are calculated. Default is 0. If None, compute over the whole array a.
statistic : float
The computed z-score for this test.
pvalue : float
a 2-sided p-value for the hypothesis test

For more details about skewtest, see stats.skewtest.

kurtosistest(a, axis=0)

Tests whether a dataset has normal kurtosis

a : array
array of the sample data
axis : int or None, optional
Axis along which to compute test. Default is 0. If None, compute over the whole array a.
statistic : float
The computed z-score for this test.
pvalue : float
The 2-sided p-value for the hypothesis test

For more details about kurtosistest, see stats.kurtosistest.

normaltest(a, axis=0)

Tests whether a sample differs from a normal distribution.

a : array_like
The array containing the data to be tested.
axis : int or None, optional
Axis along which to compute test. Default is 0. If None, compute over the whole array a.
statistic : float or array
s^2 + k^2, where s is the z-score returned by skewtest and k is the z-score returned by kurtosistest.
pvalue : float or array
A 2-sided chi squared probability for the hypothesis test.

For more details about normaltest, see stats.normaltest.
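
The statistic's definition as s^2 + k^2 can be checked numerically. This is a minimal sketch on a seeded random sample; the sample size and seed are arbitrary choices for the illustration.

```python
import numpy as np
from scipy.stats import mstats

rng = np.random.default_rng(0)
sample = rng.normal(size=200)

s, _ = mstats.skewtest(sample)       # z-score of the skewness test
k, _ = mstats.kurtosistest(sample)   # z-score of the kurtosis test
k2, pvalue = mstats.normaltest(sample)

# The omnibus statistic should equal the sum of the squared z-scores.
print(float(k2), float(s) ** 2 + float(k) ** 2)
print(float(pvalue))
```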

mquantiles(a, prob=list, alphap=0.4, betap=0.4, axis=None, limit=tuple)

Computes empirical quantiles for a data array.

Sample quantiles are defined by Q(p) = (1-gamma)*x[j] + gamma*x[j+1], where x[j] is the j-th order statistic, j = floor(n*p + m), m = alphap + p*(1 - alphap - betap), and gamma = n*p + m - j.

Reinterpreting the above equations to compare to R leads to the equation: p(k) = (k - alphap)/(n + 1 - alphap - betap)

Typical values of (alphap,betap) are:
  • (0,1) : p(k) = k/n : linear interpolation of cdf (R type 4)
  • (.5,.5) : p(k) = (k - 1/2.)/n : piecewise linear function (R type 5)
  • (0,0) : p(k) = k/(n+1) : (R type 6)
  • (1,1) : p(k) = (k-1)/(n-1): p(k) = mode[F(x[k])]. (R type 7, R default)
  • (1/3,1/3): p(k) = (k-1/3)/(n+1/3): Then p(k) ~ median[F(x[k])]. The resulting quantile estimates are approximately median-unbiased regardless of the distribution of x. (R type 8)
  • (3/8,3/8): p(k) = (k-3/8)/(n+1/4): Blom. The resulting quantile estimates are approximately unbiased if x is normally distributed (R type 9)
  • (.4,.4) : approximately quantile unbiased (Cunnane)
  • (.35,.35): APL, used with PWM
a : array_like
Input data, as a sequence or array of dimension at most 2.
prob : array_like, optional
List of quantiles to compute.
alphap : float, optional
Plotting positions parameter, default is 0.4.
betap : float, optional
Plotting positions parameter, default is 0.4.
axis : int, optional
Axis along which to compute the quantiles. If None (default), the input array is first flattened.
limit : tuple, optional
Tuple of (lower, upper) values. Values of a outside this open interval are ignored.
mquantiles : MaskedArray
An array containing the calculated quantiles.

This formulation is very similar to R's, except for the calculation of m from alphap and betap; in R, m is defined separately for each type.

[1] R statistical software: http://www.r-project.org/
[2] R quantile function: http://stat.ethz.ch/R-manual/R-devel/library/stats/html/quantile.html
>>> import numpy as np
>>> from scipy.stats.mstats import mquantiles
>>> a = np.array([6., 47., 49., 15., 42., 41., 7., 39., 43., 40., 36.])
>>> mquantiles(a)
array([ 19.2,  40. ,  42.8])

Using a 2D array, specifying axis and limit.

>>> data = np.array([[   6.,    7.,    1.],
...                  [  47.,   15.,    2.],
...                  [  49.,   36.,    3.],
...                  [  15.,   39.,    4.],
...                  [  42.,   40., -999.],
...                  [  41.,   41., -999.],
...                  [   7., -999., -999.],
...                  [  39., -999., -999.],
...                  [  43., -999., -999.],
...                  [  40., -999., -999.],
...                  [  36., -999., -999.]])
>>> print(mquantiles(data, axis=0, limit=(0, 50)))
[[ 19.2   14.6    1.45]
 [ 40.    37.5    2.5 ]
 [ 42.8   40.05   3.55]]
>>> data[:, 2] = -999.
>>> print(mquantiles(data, axis=0, limit=(0, 50)))
[[19.200000000000003 14.6 --]
 [40.0 37.5 --]
 [42.800000000000004 40.05 --]]
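
The quantile formula Q(p) = (1-gamma)*x[j] + gamma*x[j+1] can be traced by hand with a plain-Python sketch. The helper name mquantile_sketch is hypothetical. Taking gamma equal to g clipped to [0, 1], clamping j to 1..n-1, returning None when no value survives the limit, and excluding values exactly on a bound are all assumptions of this sketch rather than documented behaviour.

```python
import math


def mquantile_sketch(values, p, alphap=0.4, betap=0.4, limit=None):
    """Single empirical quantile Q(p) following the formula above (illustrative)."""
    x = sorted(values)
    if limit is not None:
        lower, upper = limit
        # Keep values strictly inside the interval; the treatment of values
        # exactly on a bound is an assumption of this sketch.
        x = [v for v in x if lower < v < upper]
    n = len(x)
    if n == 0:
        return None  # nothing left to summarize (stands in for a masked result)
    if n == 1:
        return x[0]
    m = alphap + p * (1.0 - alphap - betap)
    aleph = n * p + m
    # Clamp j to 1..n-1 so that x[j] and x[j+1] (1-based) both exist.
    j = int(math.floor(min(max(aleph, 1.0), n - 1.0)))
    # gamma is taken as g = n*p + m - j, clipped to [0, 1].
    gamma = min(max(aleph - j, 0.0), 1.0)
    # Convert the 1-based order statistics to 0-based list indices.
    return (1.0 - gamma) * x[j - 1] + gamma * x[j]
```

With the 11-value sample from the first example, p = 0.25, 0.5 and 0.75 give approximately 19.2, 40 and 42.8, matching the output shown above.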
scoreatpercentile(data, per, limit=tuple, alphap=0.4, betap=0.4)

Calculate the score at the given ‘per’ percentile of the sequence data. For example, the score at per=50 is the median.

This function is a shortcut to mquantiles.

plotting_positions(data, alpha=0.4, beta=0.4)

Returns plotting positions (or empirical percentile points) for the data.

Plotting positions are defined as (i-alpha)/(n+1-alpha-beta), where:
  • i is the rank order statistic
  • n is the number of unmasked values along the given axis
  • alpha and beta are two parameters.
Typical values for alpha and beta are:
  • (0,1) : p(k) = k/n, linear interpolation of cdf (R, type 4)
  • (.5,.5) : p(k) = (k-1/2.)/n, piecewise linear function (R, type 5)
  • (0,0) : p(k) = k/(n+1), Weibull (R type 6)
  • (1,1) : p(k) = (k-1)/(n-1), in this case, p(k) = mode[F(x[k])]. That’s R default (R type 7)
  • (1/3,1/3): p(k) = (k-1/3)/(n+1/3), then p(k) ~ median[F(x[k])]. The resulting quantile estimates are approximately median-unbiased regardless of the distribution of x. (R type 8)
  • (3/8,3/8): p(k) = (k-3/8)/(n+1/4), Blom. The resulting quantile estimates are approximately unbiased if x is normally distributed (R type 9)
  • (.4,.4) : approximately quantile unbiased (Cunnane)
  • (.35,.35): APL, used with PWM
  • (.3175, .3175): used in scipy.stats.probplot
data : array_like
Input data, as a sequence or array of dimension at most 2.
alpha : float, optional
Plotting positions parameter. Default is 0.4.
beta : float, optional
Plotting positions parameter. Default is 0.4.
positions : MaskedArray
The calculated plotting positions.
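
The definition (i-alpha)/(n+1-alpha-beta) can be written out directly. In this plain-Python sketch the helper name plotting_positions_sketch is hypothetical, masked values are not modelled, and tied values simply receive consecutive ranks in input order, which is an assumption.

```python
def plotting_positions_sketch(values, alpha=0.4, beta=0.4):
    """Plotting position of each value, returned in the input order (illustrative)."""
    n = len(values)
    # Indices of the values in ascending order; sorted() is stable, so ties
    # keep their input order (an assumption of this sketch).
    order = sorted(range(n), key=lambda idx: values[idx])
    positions = [0.0] * n
    for rank, idx in enumerate(order, start=1):
        positions[idx] = (rank - alpha) / (n + 1 - alpha - beta)
    return positions
```

With alpha = beta = 0 this reduces to k/(n+1), so the three values 3, 1, 2 map to 0.75, 0.25 and 0.5.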
obrientransform(*args)

Computes a transform on input data (any number of columns). Used to test for homogeneity of variance prior to running one-way stats. Each array in *args is one level of a factor. If an f_oneway() run on the transformed data is found significant, the variances are unequal. From Maxwell and Delaney, p.112.

Returns: transformed data for use in an ANOVA
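
The transform itself is not spelled out above. The sketch below uses the commonly cited form of O'Brien's transform, which is an assumption here and not taken from this page; the helper name obrien_transform_sketch is hypothetical. A useful check is that the mean of each transformed group equals that group's sample variance, which is why an ANOVA on the transformed data compares variances.

```python
def obrien_transform_sketch(*groups):
    """O'Brien-style transform of each group (illustrative, assumed formula)."""
    transformed = []
    for g in groups:
        n = len(g)
        if n < 3:
            raise ValueError("each group needs at least 3 values")
        mean = sum(g) / n
        sq = [(v - mean) ** 2 for v in g]   # squared deviations from the mean
        sumsq = sum(sq)
        transformed.append(
            [((n - 1.5) * n * s - 0.5 * sumsq) / ((n - 1) * (n - 2)) for s in sq]
        )
    return transformed
```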

sem(a, axis=0, ddof=1)

Calculates the standard error of the mean of the input array.

Also sometimes called standard error of measurement.

a : array_like
An array containing the values for which the standard error is returned.
axis : int or None, optional
If axis is None, ravel a first. If axis is an integer, this will be the axis over which to operate. Defaults to 0.
ddof : int, optional
Delta degrees-of-freedom. How many degrees of freedom to adjust for bias in limited samples relative to the population estimate of variance. Defaults to 1.
s : ndarray or float
The standard error of the mean in the sample(s), along the input axis.

The default value for ddof changed in scipy 0.15.0 to be consistent with stats.sem as well as with the most common definition used (like in the R documentation).

Find standard error along the first axis:

>>> import numpy as np
>>> from scipy import stats
>>> a = np.arange(20).reshape(5,4)
>>> print(stats.mstats.sem(a))
[2.8284271247461903 2.8284271247461903 2.8284271247461903
 2.8284271247461903]

Find standard error across the whole array, using n degrees of freedom:

>>> print(stats.mstats.sem(a, axis=None, ddof=0))
1.2893796958227628
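
The role of ddof can be seen in a plain-Python sketch of the usual definition, the standard deviation computed with n - ddof in the denominator, divided by sqrt(n). The helper name sem_sketch is hypothetical and masked values are not modelled.

```python
import math


def sem_sketch(values, ddof=1):
    """Standard error of the mean of a flat sequence (illustrative)."""
    n = len(values)
    mean = sum(values) / n
    sum_sq = sum((v - mean) ** 2 for v in values)
    # Standard deviation with n - ddof degrees of freedom, scaled by sqrt(n).
    return math.sqrt(sum_sq / (n - ddof)) / math.sqrt(n)
```

Applied to the first column of the array above, [0, 4, 8, 12, 16], this gives about 2.8284, and applied to all twenty values with ddof=0 it gives about 1.2894, in line with the outputs shown.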
f_oneway(*args)

Performs a 1-way ANOVA, returning an F-value and probability given any number of groups. From Heiman, pp.394-7.

Usage: f_oneway(*args), where *args is 2 or more arrays, one per treatment group.

statistic : float
The computed F-value of the test.
pvalue : float
The associated p-value from the F-distribution.
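
For orientation, the F-value of a one-way ANOVA is the between-group mean square divided by the within-group mean square. The plain-Python sketch below computes only that ratio. The helper name f_statistic_sketch is hypothetical, masked values are not modelled, and the p-value is omitted because the F-distribution is not available in the standard library.

```python
def f_statistic_sketch(*groups):
    """One-way ANOVA F-value for two or more groups (illustrative)."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n_total
    ss_between = 0.0
    ss_within = 0.0
    for g in groups:
        mean = sum(g) / len(g)
        ss_between += len(g) * (mean - grand_mean) ** 2
        ss_within += sum((v - mean) ** 2 for v in g)
    # Mean squares use k - 1 and N - k degrees of freedom respectively.
    return (ss_between / (k - 1)) / (ss_within / (n_total - k))
```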
friedmanchisquare(*args)

Friedman Chi-Square is a non-parametric, one-way within-subjects ANOVA. This function calculates the Friedman Chi-square test for repeated measures and returns the result, along with the associated probability value.

Each input is considered a given group. Ideally, the number of treatments among each group should be equal. If this is not the case, only the first n treatments are taken into account, where n is the number of treatments of the smallest group. If a group has some missing values, the corresponding treatments are masked in the other groups. The test statistic is corrected for ties.

Masked values in one group are propagated to the other groups.

statistic : float
The test statistic.
pvalue : float
The associated p-value.
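
The truncation to the smallest group and the ranking across groups can be sketched in plain Python. This is a simplified illustration under stated assumptions: no ties (so no tie correction is applied), no masked values, and no p-value. The helper name friedman_statistic_sketch is hypothetical.

```python
def friedman_statistic_sketch(*groups):
    """Friedman chi-square statistic without tie correction (illustrative)."""
    k = len(groups)
    # Only the first n treatments are used, n being the smallest group size.
    n = min(len(g) for g in groups)
    rank_sums = [0.0] * k
    for t in range(n):
        column = [g[t] for g in groups]
        if len(set(column)) != k:
            raise ValueError("ties are not handled in this sketch")
        # Rank the k groups within treatment t (1 = smallest value).
        order = sorted(range(k), key=lambda i: column[i])
        for rank, i in enumerate(order, start=1):
            rank_sums[i] += rank
    expected = n * (k + 1) / 2.0   # rank sum expected under the null hypothesis
    ss = sum((r - expected) ** 2 for r in rank_sums)
    return 12.0 * ss / (n * k * (k + 1))
```

In this sketch, three groups of four treatments in which the first group is always lowest and the third always highest give rank sums 4, 8 and 12 and a statistic of 8.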