Title: Partition Exchangeability Toolkit
Description: Implements Bayesian supervised predictive classifiers, hypothesis testing, and parametric estimation under Partition Exchangeability. The two classifiers implemented are a marginal classifier, which assumes the test data are i.i.d., and a more computationally costly but more accurate simultaneous classifier, which labels the entire test dataset at once, using all of the test data to predict each label. We also provide the Maximum Likelihood Estimate (MLE) of the single underlying parameter of the partition exchangeability generative model, as well as hypothesis tests for the equality of this parameter with a given value or between multiple samples. Functions are also included to simulate sequences from the Ewens Sampling Formula as realisations of the Poisson-Dirichlet distribution, and to compute their probabilities.
Authors: Ville Kinnula [aut], Jing Tang [ctb], Ali Amiryousefi [aut, cre]
Maintainer: Ali Amiryousefi <[email protected]>
License: MIT + file LICENSE
Version: 1.0.0.9000
Built: 2024-11-19 05:25:51 UTC
Source: https://github.com/amiryousefilab/pekit
A function to calculate the abundance vector, or frequencies of frequencies, of a discrete or partly discrete data vector x. The abundance vector is used as input in the functions dPD(), MLEp(), and LMTp().
abundance(x)
x: Data vector.
This function is equivalent to table(table(x)).
This function returns a named vector with the frequencies of the frequencies in the data vector x. The function base::table(x) returns a contingency table with the frequencies in the input data vector x as values. The names(table(x)) are the unique values in the data vector x. In abundance(x), the unique values of table(x) become the names of the values, while the values themselves are the frequencies of the frequencies of the data vector x.
set.seed(111)
x <- rpois(10, 10)
## The frequency table of x:
print(table(x))
## The frequency table of the frequency table of x:
abundance(x)
Fits the model according to training data x, where x is assumed to follow the Poisson-Dirichlet distribution, and discrete labels y.
classifier.fit(x, y)
x: Data vector, or matrix with rows as data points and columns as features.
y: Training data label vector of length equal to the number of rows in x.
This function is used to learn the model parameters from the training data, and to gather them into an object used by the classification algorithms tMarLab() and tSimLab(). The parameters it learns are the Maximum Likelihood Estimates of psi for each feature within each class in the training data. It also records the frequencies of the data for each feature within each class. These are used in calculating the predictive probability of each test data point belonging to each of the classes.
Returns an object used as the training data object for the classification algorithms tMarLab() and tSimLab(). If x is multidimensional, each list described below is returned for each dimension. Returns a list of classwise lists, each with components:
frequencies: the frequencies of values in the class.
psi: the Maximum Likelihood Estimate of psi for the class.
## Create training data x and its class labels y from Poisson-Dirichlet
## distributions with different psis:
set.seed(111)
x1 <- rPD(5000, 10)
x2 <- rPD(5000, 100)
x <- c(x1, x2)
y1 <- rep("1", 5000)
y2 <- rep("2", 5000)
y <- c(y1, y2)
fit <- classifier.fit(x, y)

## With multidimensional x:
set.seed(111)
x1 <- cbind(rPD(5000, 10), rPD(5000, 50))
x2 <- cbind(rPD(5000, 100), rPD(5000, 500))
x <- rbind(x1, x2)
y1 <- rep("1", 5000)
y2 <- rep("2", 5000)
y <- c(y1, y2)
fit <- classifier.fit(x, y)
Distribution function for the Poisson-Dirichlet distribution.
dPD(abund, psi)
abund: An abundance vector.
psi: Dispersal parameter psi.
Given an abundance vector abund, calculates the probability of a data vector x given by the Poisson-Dirichlet distribution. The higher the dispersal parameter psi, the higher the number of distinct observed species. In terms of the paintbox process, a high psi increases the size of the continuous part of the process, while a low psi increases the size of the discrete parts.
The probability given by the Poisson-Dirichlet distribution for the input abundance vector (i.e. an exchangeable random partition) and dispersal parameter psi.
W.J. Ewens, The sampling theory of selectively neutral alleles, Theoretical Population Biology, Volume 3, Issue 1, 1972, Pages 87-112, ISSN 0040-5809, <doi:10.1016/0040-5809(72)90035-4>.
## Get a random sample from the Poisson-Dirichlet distribution, and
## find the probability of such a sample with psi=5:
set.seed(111)
s <- rPD(n = 100, psi = 5)
a <- abundance(s)
dPD(a, psi = 5)
This function performs a statistical test on the null hypothesis that a given sample's underlying distribution is the Poisson-Dirichlet distribution. It calculates a test statistic that is then used to gain a p-value from an empirical distribution of the statistic from simulated samples from a PD distribution.
is.PD(x, rounds)
x: A discrete data vector.
rounds: How many samples are simulated to obtain the empirical distribution.
The calculated test statistic is the homozygosity
W = sum_i (f_i / n)^2,
which is calculated from the sample. Here f_i are the frequencies of each unique value in the sample, and n is the sample size. The MLE of psi is then estimated from the sample with the function MLEp(), and a number of samples equal to the input parameter rounds are generated with that estimate of psi and sample size n. The test statistic W is then calculated for each of the simulated samples. The original W is then given a p-value based on what percentage of the simulated statistics it exceeds.
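The statistic above can be sketched directly from a data vector (a minimal illustration; the function name is hypothetical, not part of the package):

```r
# Sketch of the homozygosity-style statistic described above (illustrative).
# f are the frequencies of each unique value; n is the sample size.
watterson_stat <- function(x) {
  f <- as.numeric(table(x))   # frequencies of unique values
  sum((f / sum(f))^2)         # sum of squared relative frequencies
}

watterson_stat(c(1, 1, 2, 3))  # (2/4)^2 + (1/4)^2 + (1/4)^2
```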
A p-value.
Watterson, G.A., (1978), The homozygosity test of neutrality. Genetics. 88(2):405-417.
##Test whether a typical sample follows PD:
x <- rPD(1000, 10)
is.PD(x, 1000)

##Test a very atypical sample, where the frequencies of different values
##are similar:
x <- c(rep(1, 200), rep(2, 200), rep(3, 200), rep(4, 200), rep(5, 200))
is.PD(x, 1000)
Numerically searches for the MLE of psi given an abundance vector, using a binary search algorithm.
MLEp(abund)
abund: An abundance vector.
Numerically searches for the MLE of psi as the root of the equation
K = sum_{i=0}^{n-1} psi / (psi + i),
where K is the observed number of different species in the sample and n is the sample size. The right side of the equation is monotonically increasing in psi, so a binary search is used to find the root. An accepted psi sets the value of the right side of the equation within R's smallest possible numerical difference of the observed K.
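The binary search can be sketched as follows (a minimal illustration under the equation above; the function name and bounds are hypothetical, not the package's implementation):

```r
# Sketch of the binary search for the MLE of psi (illustrative only).
# K: observed number of distinct species; n: sample size.
mle_psi_sketch <- function(K, n, lower = 1e-8, upper = 1e8, tol = 1e-10) {
  # Expected number of distinct species as a function of psi:
  # sum_{i=0}^{n-1} psi / (psi + i), monotonically increasing in psi.
  expected_K <- function(psi) sum(psi / (psi + 0:(n - 1)))
  while (upper - lower > tol * lower) {
    mid <- (lower + upper) / 2
    if (expected_K(mid) < K) lower <- mid else upper <- mid
  }
  (lower + upper) / 2
}

mle_psi_sketch(K = 5, n = 100)
```

Because the right-hand side is monotone, each halving of the interval is guaranteed to keep the root bracketed.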
The MLE of psi.
W.J. Ewens, The sampling theory of selectively neutral alleles, Theoretical Population Biology, Volume 3, Issue 1, 1972, Pages 87-112, ISSN 0040-5809, <doi:10.1016/0040-5809(72)90035-4>.
##Find the MLE of psi of the vector (1,2,2).
##The frequencies of the frequencies of the data vector are given as input:
MLEp(abundance(c(1, 2, 2)))

##Find the MLE of psi of a sample from the Poisson-Dirichlet distribution:
set.seed(1000)
x <- rPD(n = 10000, psi = 100)
MLEp(abundance(x))
A bootstrapped confidence interval for the Maximum Likelihood Estimate of psi.
MLEp.bsci(x, level = 0.95, rounds = 1000, frac = 0.8)
x: A data vector.
level: Level of the confidence interval, as a number between 0 and 1. Default is 0.95.
rounds: Number of bootstrap rounds. Default is 1000.
frac: Percentage of the data x used in each bootstrap round, as a number between 0 and 1. Default is 0.8.
The MLE of psi, as well as the lower and upper bounds of the bootstrap confidence interval.
## Find a 95% confidence interval for the MLE of psi given a sample from the
## Poisson-Dirichlet distribution:
x <- rPD(n = 10000, psi = 100)
MLEp.bsci(x, 0.95)
Likelihood ratio test for the hypotheses H0: psi_1 = psi_2 = ... = psi_d and H1: the psi_i are not all equal, where psi_1, ..., psi_d are the dispersal parameters of the d samples in the columns of the input data array x.
mult.sample.test(x)
x: The data array to be tested. Each column of x is a separate sample; shorter samples are padded with NA (see the example).
Calculates the Likelihood Ratio Test statistic
Lambda = -2 log( L(psi_hat) / L(psi_hat_1, ..., psi_hat_d) ),
where L is the likelihood function of observing the input samples given a single psi in the numerator, and d different parameters psi_1, ..., psi_d for each sample respectively in the denominator. According to the theory of Likelihood Ratio Tests, this statistic converges in distribution to a chi-squared distribution when the null hypothesis is true, with degrees of freedom equal to the difference in the number of parameters between the considered models (d - 1 here). To calculate the statistic, the Maximum Likelihood Estimates of psi_1, ..., psi_d under H1 and the shared psi under H0 are calculated.
Gives a vector with the Likelihood Ratio Test statistic Lambda, as well as the p-value of the test p.
Neyman, J., & Pearson, E. S. (1933). On the problem of the most efficient tests of statistical hypotheses. Philosophical Transactions of the Royal Society of London. Series A, Containing Papers of a Mathematical Or Physical Character, 231(694-706), 289-337. <doi:10.1098/rsta.1933.0009>.
##Create samples with different n and psi:
set.seed(111)
x <- rPD(1200, 15)
y <- c(rPD(1000, 20), rep(NA, 200))
z <- c(rPD(800, 30), rep(NA, 400))
samples <- cbind(cbind(x, y), z)
##Run the test:
mult.sample.test(samples)
rPD samples randomly from the PD distribution with a given psi by simulating the Hoppe urn model.
rPD(n, psi)
n: Number of observations.
psi: Dispersal parameter.
Samples random values with a given psi from the Poisson-Dirichlet distribution by simulating the Hoppe urn model.
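The urn scheme can be sketched as follows (a minimal illustration of the Hoppe urn, not the package's actual rPD() implementation; the function name is hypothetical):

```r
# Sketch of the Hoppe urn (illustrative). The urn starts with a single
# "mutator" ball of weight psi; each draw either founds a new species,
# with probability psi / (psi + i - 1), or copies an earlier observation
# chosen uniformly at random.
hoppe_urn_sketch <- function(n, psi) {
  x <- integer(n)
  n_species <- 0L
  for (i in seq_len(n)) {
    if (runif(1) < psi / (psi + i - 1)) {
      n_species <- n_species + 1L        # new species gets a fresh label
      x[i] <- n_species
    } else {
      x[i] <- x[sample.int(i - 1L, 1L)]  # copy an earlier observation
    }
  }
  x
}

set.seed(1)
s <- hoppe_urn_sketch(1000, 10)
```

The first draw always founds a new species, and a larger psi yields more distinct species in the sample.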
Returns a vector with a sample of size n from the Hoppe urn model with parameter psi.
Hoppe, F.M. The sampling theory of neutral alleles and an urn model in population genetics. J. Math. Biology 25, 123–159 (1987). <doi:10.1007/BF00276386>.
W.J. Ewens, The sampling theory of selectively neutral alleles, Theoretical Population Biology, Volume 3, Issue 1, 1972, Pages 87-112, ISSN 0040-5809, <doi:10.1016/0040-5809(72)90035-4>.
## Get random samples from the PD distribution with different psi,
## and estimate the psi of each sample:
s1 <- rPD(1000, 10)
s2 <- rPD(1000, 50)
print(c(MLEp(abundance(s1)), MLEp(abundance(s2))))
Performs the Lagrange Multiplier test for the equality of the dispersal parameter psi of a sample. The null hypothesis of the test is H0: psi = psi_0, where psi_0 is given as input.
sample.test(abund, psi = "a")
abund: An abundance vector of a sample.
psi: Target positive number psi_0 to be tested. The default value "a" tests psi = 1, and "r" tests psi = n (see the examples).
Calculates the Lagrange Multiplier test statistic
S = U(psi_0)^2 / I(psi_0),
where U is the derivative of the log-likelihood function of psi (the score) and I is its Fisher information. The statistic S follows a chi-squared distribution with 1 degree of freedom when the null hypothesis H0: psi = psi_0 is true.
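Under the Ewens sampling formula the log-likelihood in psi is K log(psi) - sum_{i=0}^{n-1} log(psi + i), up to a psi-free constant, so the score and Fisher information have closed forms. A minimal sketch (illustrative derivation; the function name is hypothetical):

```r
# Sketch of the score (Lagrange multiplier) statistic for H0: psi = psi0
# (illustrative; assumes the Ewens sampling formula likelihood).
# K: observed number of distinct values; n: sample size.
lm_stat_sketch <- function(K, n, psi0) {
  i <- 0:(n - 1)
  U <- K / psi0 - sum(1 / (psi0 + i))  # score: d/dpsi of the log-likelihood
  I <- sum(i / (psi0 * (psi0 + i)^2))  # expected Fisher information
  U^2 / I                              # approx. chi-squared(1) under H0
}

lm_stat_sketch(K = 20, n = 1000, psi0 = 5)
```

The resulting value can be compared against the chi-squared(1) distribution for a p-value, e.g. with pchisq(S, 1, lower.tail = FALSE).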
The statistic and a p-value of the two-sided test of the hypothesis.
Radhakrishna Rao, C, (1948), Large sample tests of statistical hypotheses concerning several parameters with applications to problems of estimation. Mathematical Proceedings of the Cambridge Philosophical Society, 44(1), 50-57. <doi:10.1017/S0305004100023987>
## Test the psi of a sample from the Poisson-Dirichlet distribution:
set.seed(10000)
x <- rPD(1000, 10)
## Find the abundance of the data vector:
abund <- abundance(x)
## Test for the psi that was used, as well as a higher and a lower one:
sample.test(abund, 10)
sample.test(abund, 15)
sample.test(abund, 5)
sample.test(abund)      # test for psi=1
sample.test(abund, "r") # test for psi=n
Classifies the test data x based on the training data object. The test data are considered i.i.d., so each data point is classified one by one.
tMarLab(training, x)
training: A training data object from the function classifier.fit().
x: Test data vector or matrix with rows as data points and columns as features.
Independently assigns a class label to each test data point according to a maximum predictive probability rule. Assuming the training data of size n_c in class c arises from a Poisson-Dirichlet(psi_c) distribution, the predictive probability of a test data point x_i arising from class c is
psi_c / (n_c + psi_c)
if no value equal to x_i exists in the training data of class c, and
n_{x_i} / (n_c + psi_c)
if there does, where n_{x_i} is the frequency of the value of x_i in the training data of class c.
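The two-case predictive weight above can be sketched for a single class (a minimal illustration; the function name and argument layout are hypothetical, not the package's internals):

```r
# Sketch of the marginal predictive weight of one test value for one class
# (illustrative). freqs: named frequency table of the class's training
# data; psi: the class's estimated dispersal parameter.
predictive_weight <- function(value, freqs, psi) {
  n_c <- sum(freqs)                  # training sample size of the class
  f <- freqs[as.character(value)]
  if (is.na(f)) {
    psi / (n_c + psi)                # previously unseen value
  } else {
    unname(f) / (n_c + psi)          # seen value: proportional to its count
  }
}

freqs <- c(`1` = 3, `2` = 2)
predictive_weight(1, freqs, 10)  # seen:   3 / 15
predictive_weight(7, freqs, 10)  # unseen: 10 / 15
```

The marginal classifier would compute this weight for every class and assign the label with the largest value.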
A vector of predicted labels for test data x.
Amiryousefi A. (2021). Asymptotic supervised predictive classifiers under partition exchangeability. arXiv preprint, https://arxiv.org/abs/2101.10950.
Corander, J., Cui, Y., Koski, T., and Siren, J.: Have I seen you before? Principles of Bayesian predictive classification revisited. Springer, Stat. Comput. 23, (2011), 59–73, (<doi:10.1007/s11222-011-9291-7>).
## Create random samples x from Poisson-Dirichlet distributions with different
## psis, treating each sample as coming from a class of its own:
set.seed(111)
x1 <- rPD(10500, 10)
x2 <- rPD(10500, 1000)
test.ind1 <- sample.int(10500, 500) # Sample test datasets from the
test.ind2 <- sample.int(10500, 500) # original samples
x <- c(x1[-test.ind1], x2[-test.ind2])
## Create training data labels:
y1 <- rep("1", 10000)
y2 <- rep("2", 10000)
y <- c(y1, y2)
## Test data t, with the first half belonging to class "1", the second to "2":
t1 <- x1[test.ind1]
t2 <- x2[test.ind2]
t <- c(t1, t2)
fit <- classifier.fit(x, y)
## Run the classifier, which returns the predicted labels:
tM <- tMarLab(fit, t)

## With multidimensional x:
set.seed(111)
x1 <- cbind(rPD(5500, 10), rPD(5500, 50))
x2 <- cbind(rPD(5500, 100), rPD(5500, 500))
test.ind1 <- sample.int(5500, 500)
test.ind2 <- sample.int(5500, 500)
x <- rbind(x1[-test.ind1, ], x2[-test.ind2, ])
y1 <- rep("1", 5000)
y2 <- rep("2", 5000)
y <- c(y1, y2)
fit <- classifier.fit(x, y)
t1 <- x1[test.ind1, ]
t2 <- x2[test.ind2, ]
t <- rbind(t1, t2)
tM <- tMarLab(fit, t)
Classifies the test data x based on the training data object. All of the test data is used simultaneously to make the classification.
tSimLab(training, x)
training: A training data object from the function classifier.fit().
x: Test data vector or matrix with rows as data points and columns as features.
The test data are first labeled with the marginal classifier. The simultaneous classifier then iterates over all test data, assigning each a label by finding the maximum predictive probability given the current classification structure of the test data as a whole. This is repeated until the classification structure converges after iterating over all data.
A vector of predicted labels for test data x.
Amiryousefi A. (2021). Asymptotic supervised predictive classifiers under partition exchangeability. arXiv preprint, https://arxiv.org/abs/2101.10950.
Corander, J., Cui, Y., Koski, T., and Siren, J.: Have I seen you before? Principles of Bayesian predictive classification revisited. Springer, Stat. Comput. 23, (2011), 59–73, (<doi:10.1007/s11222-011-9291-7>).
## Create random samples x from Poisson-Dirichlet distributions with different
## psis, treating each sample as coming from a class of its own:
set.seed(111)
x1 <- rPD(10500, 10)
x2 <- rPD(10500, 1000)
test.ind1 <- sample.int(10500, 500) # Sample test datasets from the
test.ind2 <- sample.int(10500, 500) # original samples
x <- c(x1[-test.ind1], x2[-test.ind2])
## Create training data labels:
y1 <- rep("1", 10000)
y2 <- rep("2", 10000)
y <- c(y1, y2)
## Test data t, with the first half belonging to class "1", the second to "2":
t1 <- x1[test.ind1]
t2 <- x2[test.ind2]
t <- c(t1, t2)
fit <- classifier.fit(x, y)
## Run the classifier, which returns the predicted labels:
tS <- tSimLab(fit, t)

## With multidimensional x:
set.seed(111)
x1 <- cbind(rPD(5500, 10), rPD(5500, 50))
x2 <- cbind(rPD(5500, 100), rPD(5500, 500))
test.ind1 <- sample.int(5500, 500)
test.ind2 <- sample.int(5500, 500)
x <- rbind(x1[-test.ind1, ], x2[-test.ind2, ])
y1 <- rep("1", 5000)
y2 <- rep("2", 5000)
y <- c(y1, y2)
fit <- classifier.fit(x, y)
t1 <- x1[test.ind1, ]
t2 <- x2[test.ind2, ]
t <- rbind(t1, t2)
tS <- tSimLab(fit, t)
Likelihood ratio test for the hypotheses H0: psi_1 = psi_2 and H1: psi_1 != psi_2, where psi_1 and psi_2 are the dispersal parameters of the two input samples s1 and s2.
two.sample.test(s1, s2)
s1, s2: The two data vectors to be tested.
Calculates the Likelihood Ratio Test statistic
Lambda = -2 log( L(psi_hat) / L(psi_hat_1, psi_hat_2) ),
where L is the likelihood function of observing the two input samples given a single psi in the numerator and two different parameters psi_1 and psi_2 for each sample respectively in the denominator. According to the theory of Likelihood Ratio Tests, this statistic converges in distribution to a chi-squared distribution under the null hypothesis, with degrees of freedom equal to the difference in the number of parameters between the considered models, which is 1 here. To calculate the statistic, the Maximum Likelihood Estimates of psi_1 and psi_2 and the shared psi of the pooled samples are calculated.
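The construction of Lambda can be sketched from the Ewens log-likelihood, K log(psi) - sum_{i=0}^{n-1} log(psi + i) up to a psi-free constant (an illustrative sketch in terms of the distinct-value counts K and sample sizes n; the function names are hypothetical, not the package's implementation):

```r
# Sketch of the two-sample likelihood-ratio test for H0: psi1 = psi2
# (illustrative; uses the Ewens log-likelihood up to a psi-free constant).
loglik <- function(K, n, psi) K * log(psi) - sum(log(psi + 0:(n - 1)))

# Per-sample MLE: root of sum(psi / (psi + i)) = K.
mle <- function(K, n) {
  uniroot(function(p) sum(p / (p + 0:(n - 1))) - K, c(1e-8, 1e8))$root
}

lrt_sketch <- function(K1, n1, K2, n2) {
  # Shared psi maximizes the pooled log-likelihood under H0.
  shared <- optimize(function(p) loglik(K1, n1, p) + loglik(K2, n2, p),
                     c(1e-6, 1e6), maximum = TRUE)$maximum
  Lambda <- -2 * (loglik(K1, n1, shared) + loglik(K2, n2, shared)
                  - loglik(K1, n1, mle(K1, n1)) - loglik(K2, n2, mle(K2, n2)))
  c(Lambda = Lambda, p.value = pchisq(Lambda, df = 1, lower.tail = FALSE))
}

lrt_sketch(K1 = 20, n1 = 1000, K2 = 60, n2 = 1000)
```

Samples with very different numbers of distinct values yield a large Lambda and a small p-value, since the shared-psi constraint fits both samples poorly.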
Gives a vector with the Likelihood Ratio Test statistic Lambda, as well as the p-value of the test p.
Neyman, J., & Pearson, E. S. (1933). On the problem of the most efficient tests of statistical hypotheses. Philosophical Transactions of the Royal Society of London. Series A, Containing Papers of a Mathematical Or Physical Character, 231(694-706), 289-337. <doi:10.1098/rsta.1933.0009>.
##Create samples with different n and psi:
set.seed(111)
x <- rPD(500, 15)
y <- rPD(1000, 20)
z <- rPD(800, 30)
##Run the tests:
two.sample.test(x, y)
two.sample.test(x, z)
two.sample.test(y, z)