Title: | g-and-k and g-and-h Distribution Functions |
---|---|
Description: | Functions for the g-and-k and generalised g-and-h distributions. |
Authors: | Dennis Prangle [aut, cre, cph] |
Maintainer: | Dennis Prangle <[email protected]> |
License: | GPL-2 |
Version: | 0.6.0 |
Built: | 2024-11-10 05:31:00 UTC |
Source: | https://github.com/dennisprangle/gk |
Approximate Bayesian computation (ABC) inference for the g-and-k or g-and-h distribution.
abc( x, N, model = c("gk", "generalised_gh", "tukey_gh", "gh"), logB = FALSE, rprior, M, sumstats = c("all order statistics", "octiles", "moment estimates"), silent = FALSE )
abc( x, N, model = c("gk", "generalised_gh", "tukey_gh", "gh"), logB = FALSE, rprior, M, sumstats = c("all order statistics", "octiles", "moment estimates"), silent = FALSE )
x |
Vector of observations. |
N |
Number of iterations to perform. |
model |
Which model to check: "gk", "generalised_gh" or "tukey_gh". For backwards compatibility, "gh" acts the same as "generalised_gh". |
logB |
When true, the second parameter is log(B) rather than B. |
rprior |
A function with single argument, n, which returns a matrix with n rows consisting of samples from the prior distribution for 4 parameters e.g. (A,B,g,k). |
M |
Number of simulations to accept. |
sumstats |
Which summary statistics to use. |
silent |
When |
This function performs approximate Bayesian inference for iid data from a g-and-k or g-and-h distribution, avoiding expensive density calculations. The algorithm samples many parameter vectors from the prior and simulates corresponding data from the model. The parameters are accepted or rejected based on how similar the simulations are to the observed data. Similarity is measured using weighted Euclidean distance between summary vectors of the simulations and observations. Several summaries can be used, including the complete order statistics or summaries based on octiles. In the latter case only the corresponding order statistics are simulated, speeding up the method.
Matrix whose rows are accepted parameter estimates plus a column giving the ABC distances.
D. Prangle. gk: An R package for the g-and-k and generalised g-and-h distributions, 2017.
set.seed(1) x = rgk(10, A=3, B=1, g=2, k=0.5) ##An unusually small dataset for fast execution of this example rprior = function(n) { matrix(runif(4*n,0,10), ncol=4) } abc(x, N=1E4, rprior=rprior, M=100)
set.seed(1) x = rgk(10, A=3, B=1, g=2, k=0.5) ##An unusually small dataset for fast execution of this example rprior = function(n) { matrix(runif(4*n,0,10), ncol=4) } abc(x, N=1E4, rprior=rprior, M=100)
Interal function carrying out part of ABC inference calculations
abc_batch(sobs, priorSims, simStats, M, v = NULL)
abc_batch(sobs, priorSims, simStats, M, v = NULL)
sobs |
Vector of observed summary statistics. |
priorSims |
Matrix whose rows are vector of parameters drawn from the prior. |
simStats |
Function mapping a vector of parameters to summary statistics. |
M |
How many simulations to accept in this batch. |
v |
Optional vector of estimated variances for each summary statistic |
Outputs a list containing: 1) matrix whose rows are accepted parameter vectors, and a column of resulting distances 2) value of v. If v is not supplied then variances are estimated here and returned. (The idea is that this is done for the first batch, and these values are reused from then on.)
Density, distribution function, quantile function and random generation for the generalised g-and-h distribution
dgh(x, A, B, g, h, c = 0.8, log = FALSE, type = c("generalised", "tukey")) pgh(q, A, B, g, h, c = 0.8, zscale = FALSE, type = c("generalised", "tukey")) qgh(p, A, B, g, h, c = 0.8, type = c("generalised", "tukey")) rgh(n, A, B, g, h, c = 0.8, type = c("generalised", "tukey"))
dgh(x, A, B, g, h, c = 0.8, log = FALSE, type = c("generalised", "tukey")) pgh(q, A, B, g, h, c = 0.8, zscale = FALSE, type = c("generalised", "tukey")) qgh(p, A, B, g, h, c = 0.8, type = c("generalised", "tukey")) rgh(n, A, B, g, h, c = 0.8, type = c("generalised", "tukey"))
x |
Vector of quantiles. |
A |
Vector of A (location) parameters. |
B |
Vector of B (scale) parameters. Must be positive. |
g |
Vector of g parameters. |
h |
Vector of h parameters. Must be non-negative. |
c |
Vector of c parameters (used for generalised g-and-h). Often fixed at 0.8 which is the default. |
log |
If true the log density is returned. |
type |
Can be "generalised" (default) or "tukey". |
q |
Vector of quantiles. |
zscale |
If true the N(0,1) quantile of the cdf is returned. |
p |
Vector of probabilities. |
n |
Number of draws to make. |
The Tukey and generalised g-and-h distributions are defined by their quantile functions. For Tukey's g-and-h distribution, this is
The generalised g-and-h instead uses
In both cases z is the standard normal quantile of p.
Note that neither family of distributions is a special case of the other.
Parameter restrictions include and
.
The generalised g-and-h distribution typically takes
.
For more background information see the references.
rgh
and qgh
use quick direct calculations. However dgh
and pgh
involve slower numerical inversion of the quantile function.
Especially extreme values of the inputs will produce pgh
output rounded to 0 or 1 (-Inf or Inf for zscale=TRUE
).
The corresponding dgh
output will be 0 or -Inf for log=TRUE
.
dgh
gives the density, pgh
gives the distribution, qgh
gives the quantile function, and rgh
generates random deviates
Haynes ‘Flexible distributions and statistical models in ranking and selection procedures, with applications’ PhD Thesis QUT (1998) Rayner and MacGillivray ‘Numerical maximum likelihood estimation for the g-and-k and generalized g-and-h distributions’ Statistics and Computing, 12, 57-75 (2002) Tukey ‘Modern techniques in data analysis’ (1977) Yan and Genton ‘The Tukey g-and-h distribution’ Significance, 16, 12-13 (2019)
p = 1:9/10 ##Some probabilities x = qgh(seq(0.1,0.9,0.1), A=3, B=1, g=2, h=0.5) ##g-and-h quantiles rgh(5, A=3, B=1, g=2, h=0.5) ##g-and-h draws dgh(x, A=3, B=1, g=2, h=0.5) ##Densities of x under g-and-h dgh(x, A=3, B=1, g=2, h=0.5, log=TRUE) ##Log densities of x under g-and-h pgh(x, A=3, B=1, g=2, h=0.5) ##Distribution function of x under g-and-h
p = 1:9/10 ##Some probabilities x = qgh(seq(0.1,0.9,0.1), A=3, B=1, g=2, h=0.5) ##g-and-h quantiles rgh(5, A=3, B=1, g=2, h=0.5) ##g-and-h draws dgh(x, A=3, B=1, g=2, h=0.5) ##Densities of x under g-and-h dgh(x, A=3, B=1, g=2, h=0.5, log=TRUE) ##Log densities of x under g-and-h pgh(x, A=3, B=1, g=2, h=0.5) ##Distribution function of x under g-and-h
Density, distribution function, quantile function and random generation for the g-and-k distribution.
dgk(x, A, B, g, k, c = 0.8, log = FALSE) pgk(q, A, B, g, k, c = 0.8, zscale = FALSE) qgk(p, A, B, g, k, c = 0.8) rgk(n, A, B, g, k, c = 0.8)
dgk(x, A, B, g, k, c = 0.8, log = FALSE) pgk(q, A, B, g, k, c = 0.8, zscale = FALSE) qgk(p, A, B, g, k, c = 0.8) rgk(n, A, B, g, k, c = 0.8)
x |
Vector of quantiles. |
A |
Vector of A (location) parameters. |
B |
Vector of B (scale) parameters. Must be positive. |
g |
Vector of g parameters. |
k |
Vector of k parameters. Must be at least -0.5. |
c |
Vector of c parameters. Often fixed at 0.8 which is the default. |
log |
If true the log density is returned. |
q |
Vector of quantiles. |
zscale |
If true the N(0,1) quantile of the cdf is returned. |
p |
Vector of probabilities. |
n |
Number of draws to make. |
The g-and-k distribution is defined by its quantile function:
where z is the standard normal quantile of p.
Parameter restrictions include and
. Typically c=0.8. For more
background information see the references.
rgk
and qgk
use quick direct calculations. However dgk
and pgk
involve slower numerical inversion of the quantile function.
Especially extreme values of the inputs will produce pgk
output rounded to 0 or 1 (-Inf or Inf for zscale=TRUE
).
The corresponding dgk
output will be 0 or -Inf for log=TRUE
.
dgk
gives the density, pgk
gives the distribution, qgk
gives the quantile function, and rgk
generates random deviates
Haynes ‘Flexible distributions and statistical models in ranking and selection procedures, with applications’ PhD Thesis QUT (1998) Rayner and MacGillivray ‘Numerical maximum likelihood estimation for the g-and-k and generalized g-and-h distributions’ Statistics and Computing, 12, 57-75 (2002)
p = 1:9/10 ##Some probabilities x = qgk(seq(0.1,0.9,0.1), A=3, B=1, g=2, k=0.5) ##g-and-k quantiles rgk(5, A=3, B=1, g=2, k=0.5) ##g-and-k draws dgk(x, A=3, B=1, g=2, k=0.5) ##Densities of x under g-and-k dgk(x, A=3, B=1, g=2, k=0.5, log=TRUE) ##Log densities of x under g-and-k pgk(x, A=3, B=1, g=2, k=0.5) ##Distribution function of x under g-and-k
p = 1:9/10 ##Some probabilities x = qgk(seq(0.1,0.9,0.1), A=3, B=1, g=2, k=0.5) ##g-and-k quantiles rgk(5, A=3, B=1, g=2, k=0.5) ##g-and-k draws dgk(x, A=3, B=1, g=2, k=0.5) ##Densities of x under g-and-k dgk(x, A=3, B=1, g=2, k=0.5, log=TRUE) ##Log densities of x under g-and-k pgk(x, A=3, B=1, g=2, k=0.5) ##Distribution function of x under g-and-k
Finite difference stochastic approximation (FDSA) inference for the g-and-k or g-and-h distribution
fdsa( x, N, model = c("gk", "generalised_gh", "tukey_gh", "gh"), logB = FALSE, theta0, batch_size = 100, alpha = 1, gamma = 0.49, a0 = 1, c0 = NULL, A = 100, theta_min = c(-Inf, ifelse(logB, -Inf, 1e-05), -Inf, 1e-05), theta_max = c(Inf, Inf, Inf, Inf), silent = FALSE, plotEvery = 100 )
fdsa( x, N, model = c("gk", "generalised_gh", "tukey_gh", "gh"), logB = FALSE, theta0, batch_size = 100, alpha = 1, gamma = 0.49, a0 = 1, c0 = NULL, A = 100, theta_min = c(-Inf, ifelse(logB, -Inf, 1e-05), -Inf, 1e-05), theta_max = c(Inf, Inf, Inf, Inf), silent = FALSE, plotEvery = 100 )
x |
Vector of observations. |
N |
number of iterations to perform. |
model |
Which model to check: "gk", "generalised_gh" or "tukey_gh". For backwards compatibility, "gh" acts the same as "generalised_gh". |
logB |
When true, the second parameter is log(B) rather than B. |
theta0 |
Vector of initial value for 4 parameters. |
batch_size |
Mini-batch size. |
alpha |
Gain decay for step size. |
gamma |
Gain decay for finite difference. |
a0 |
Multiplicative step size tuning parameter (or vector of 4 values). |
c0 |
Multiplicative finite difference step tuning parameter (or vector of 4 values). |
A |
Additive step size tuning parameter. |
theta_min |
Vector of minimum values for each parameter.
Note: for |
theta_max |
Vector of maximum values for each parameter. |
silent |
When |
plotEvery |
How often to plot the results if |
fdsa
performs maximum likelihood inference for iid data from a g-and-k or g-and-h distribution, using simulataneous perturbation stochastic approximation. This should be faster than directly maximising the likelihood.
Matrix whose rows are FDSA states: the initial state theta0
and N subsequent states.
The final row is the MLE estimate.
D. Prangle gk: An R package for the g-and-k and generalised g-and-h distributions, 2017.
set.seed(1) x = rgk(10, A=3, B=1, g=2, k=0.5) ##An unusually small dataset for fast execution of this example out = fdsa(x, N=100, theta0=c(mean(x),sd(x),0,1E-5), theta_min=c(-5,1E-5,-5,1E-5), theta_max=c(5,5,5,5))
set.seed(1) x = rgk(10, A=3, B=1, g=2, k=0.5) ##An unusually small dataset for fast execution of this example out = fdsa(x, N=100, theta0=c(mean(x),sd(x),0,1E-5), theta_min=c(-5,1E-5,-5,1E-5), theta_max=c(5,5,5,5))
Performs a demo analysis of exchange rate data
fx(type = c("standard", "quick", "for paper"))
fx(type = c("standard", "quick", "for paper"))
type |
What type of demo to perform. |
fx
performs the analysis of exchange rate data in the supporting reference (Prangle 2017).
D. Prangle gk: An R package for the g-and-k and generalised g-and-h distributions, 2017.
## Not run: fx() ## End(Not run)
## Not run: fx() ## End(Not run)
Returns log density of an improper prior for the g-and-k or g-and-h distribution
improper_uniform_log_density(theta)
improper_uniform_log_density(theta)
theta |
A vector of 4 parameters representing (A,B,g,k) or (A,B,g,h) |
improper_uniform_log_density
takes a 4 parameter vector as input and returns a log density value.
The output corresponds to an improper uniform with constraints that the second and fourth parameters should be non-negative.
These ensure that the resulting parameters are valid to use in the g-and-k or g-and-h distribution is valid.
This function is supplied as a convenient default prior to use in the mcmc
function.
Value of an (unnormalised) log density
improper_uniform_log_density(c(0,1,0,0)) ##Valid parameters - returns 0 improper_uniform_log_density(c(0,-1,0,0)) ##Invalid parameters - returns -Inf
improper_uniform_log_density(c(0,1,0,0)) ##Valid parameters - returns 0 improper_uniform_log_density(c(0,-1,0,0)) ##Invalid parameters - returns -Inf
Check whether parameter choices produce a valid g-and-k or g-and-h distribution.
isValid( g, k_or_h, c = 0.8, model = c("gk", "generalised_gh", "tukey_gh", "gh"), initial_z = seq(-1, 1, 0.2) )
isValid( g, k_or_h, c = 0.8, model = c("gk", "generalised_gh", "tukey_gh", "gh"), initial_z = seq(-1, 1, 0.2) )
g |
Vector of g parameters. |
k_or_h |
Vector of k or h parameters. |
c |
Vector of c parameters. |
model |
Which model to check: "gk", "generalised_gh" or "tukey_gh". For backwards compatibility, "gh" acts the same as "generalised_gh". |
initial_z |
Vector of initial z values to use in each optimisation. |
This function tests whether parameter choices provide a valid distribution.
Only g k and c parameters need be supplied as A and B>0 have no effect.
The function operates by numerically minimising the derivative of the quantile function,
and returning TRUE
if the minimum is positive.
It is possible that a local minimum is found, so it is recommended to use multiple optimisation starting points, and to beware that false positive may still result!
Logical vector denoting whether each parameter combination is valid.
isValid(0:10, -0.5) isValid(0:10, 0.5, c=0.9, model="generalised_gh") isValid(0:10, 0.5, model="tukey_gh")
isValid(0:10, -0.5) isValid(0:10, 0.5, c=0.9, model="generalised_gh") isValid(0:10, 0.5, model="tukey_gh")
Markov chain Monte Carlo (MCMC) inference for the g-and-k or g-and-h distribution
mcmc( x, N, model = c("gk", "generalised_gh", "tukey_gh", "gh"), logB = FALSE, get_log_prior = improper_uniform_log_density, theta0, Sigma0, t0 = 100, epsilon = 1e-06, silent = FALSE, plotEvery = 100 )
mcmc( x, N, model = c("gk", "generalised_gh", "tukey_gh", "gh"), logB = FALSE, get_log_prior = improper_uniform_log_density, theta0, Sigma0, t0 = 100, epsilon = 1e-06, silent = FALSE, plotEvery = 100 )
x |
Vector of observations. |
N |
Number of MCMC steps to perform. |
model |
Which model to check: "gk", "generalised_gh" or "tukey_gh". For backwards compatibility, "gh" acts the same as "generalised_gh". |
logB |
When true, the second parameter is log(B) rather than B. |
get_log_prior |
A function with one argument (corresponding to a vector of 4 parameters e.g. A,B,g,k) returning the log prior density. This should ensure the parameters are valid - i.e. return -Inf for invalid parameters - as the |
theta0 |
Vector of initial value for 4 parameters. |
Sigma0 |
MCMC proposal covariance matrix |
t0 |
Tuning parameter (number of initial iterations without adaptation). |
epsilon |
Tuning parameter (weight given to identity matrix in covariance calculation). |
silent |
When |
plotEvery |
How often to plot the results if |
mcmc
performs Markov chain Monte Carlo inference for iid data from a g-and-k or g-and-h distribution, using the adaptive Metropolis algorithm of Haario et al (2001).
Matrix whose rows are MCMC states: the initial state theta0
and N subsequent states.
D. Prangle gk: An R package for the g-and-k and generalised g-and-h distributions, 2017. H. Haario, E. Saksman, and J. Tamminen. An adaptive Metropolis algorithm. Bernoulli, 2001.
set.seed(1) x = rgk(10, A=3, B=1, g=2, k=0.5) ##An unusually small dataset for fast execution of this example out = mcmc(x, N=1000, theta0=c(mean(x),sd(x),0,0), Sigma0=0.1*diag(4))
set.seed(1) x = rgk(10, A=3, B=1, g=2, k=0.5) ##An unusually small dataset for fast execution of this example out = mcmc(x, N=1000, theta0=c(mean(x),sd(x),0,0), Sigma0=0.1*diag(4))
Convert octiles to estimates of location, scale, skewness and kurtosis.
momentEstimates(octiles)
momentEstimates(octiles)
octiles |
Vector of octiles. |
Converts octiles to robust estimate of location, scale, skewness and kurtosis as used by Drovandi and Pettitt (2011).
Vector of moment estimates.
C. C. Drovandi and A. N. Pettitt. Likelihood-free Bayesian estimation of multivariate quantile distributions. Computational Statistics & Data Analysis, 2011.
Sample a subset of order statistics of the Uniform(0,1) distribution
orderstats(n, orderstats)
orderstats(n, orderstats)
n |
Total number of independent draws |
orderstats |
Which order statistics to generate, in increasing order |
Uniform order statistics are generated by the exponential spacings method (see Ripley for example).
A vector of order statistics equal in length to orderstats
Brian Ripley ‘Stochastic Simulation’ Wiley (1987)
orderstats(100, c(25,50,75))
orderstats(100, c(25,50,75))