saddle.distn {boot} | R Documentation |
Approximate an entire distribution using saddlepoint methods. This function
can calculate simple and conditional saddlepoint distribution approximations
for a univariate quantity of interest. For the simple saddlepoint the
quantity of interest is a linear combination of W, where W is a vector of
random variables. For the conditional saddlepoint we require the distribution
of one linear combination given the values of any number of other linear
combinations. The distribution of W must be one of multinomial, Poisson or
binary. The primary use of this function is to calculate quantiles of
bootstrap distributions using saddlepoint approximations. Such quantiles are
required by the library function control to approximate the distribution of
the linear approximation to a statistic.
saddle.distn(A, u=NULL, alpha=NULL, wdist="m", type="simp", npts=20, t=NULL, t0=NULL, init=rep(0.1, d), mu=rep(0.5, n), LR=FALSE, strata=NULL, ...)
A |
This is a matrix of known coefficients or a function which returns such a
matrix. If a function then its first argument must be the point t at
which a saddlepoint is required. The most common reason
for A being a function would be if the statistic is not itself a linear
combination of the W but is the solution to a linear estimating
equation.
|
u |
If A is a function then u must also be a function returning a vector
with length equal to the number of columns of the matrix returned by A .
Usually all components other than the first will be constants as the
other components are the values of the conditioning variables.
If A is a matrix with more than one column (such as when type="cond" )
then u should be a vector with length one less than ncol(A) . In this
case u specifies the values of the conditioning variables. If A is
a matrix with one column or a vector then u is not used.
|
alpha |
The alpha levels for the quantiles of the distribution which should be returned. By default the 0.1, 0.5, 1, 2.5, 5, 10, 20, 50, 80, 90, 95, 97.5, 99, 99.5 and 99.9 percentiles are calculated. |
wdist |
The distribution of W. Possible values are "m" (multinomial),
"p" (Poisson), or "b" (binary).
|
type |
The type of saddlepoint to be used. Possible values are "simp" (simple
saddlepoint) and "cond" (conditional). If wdist is "m" ,
type is set to "simp" .
|
npts |
The number of points at which the saddlepoint approximation should be calculated and then used to fit the spline. |
t |
A vector of points at which the saddlepoint approximations are calculated.
These points should extend beyond the extreme quantiles required but still
be in the possible range of the bootstrap distribution. The observed value of
the statistic should not be included in t as the distribution function
approximation breaks down at that point. The points should, however, cover
the entire effective range of the distribution, including close to the centre.
If t is supplied then npts is set to length(t) .
When t is not supplied, the function attempts to find the effective range of
the distribution and then selects points to cover this range.
|
t0 |
If t is not supplied then a vector of length 2 should be passed as t0 .
The first component of t0 should be the centre of the distribution
and the second should be an estimate of spread (such as a standard error).
These two are then used to find the effective range of the distribution.
The range-finding mechanism relies on an accurate estimate of location in
t0[1].
|
init |
When wdist is "m" , this vector should contain the initial values
to be passed to nlmin when it is called to solve the saddlepoint equations.
|
mu |
The vector of parameter values for the distribution. The default is that the components of W are identically distributed. |
LR |
A logical flag. When LR is TRUE the Lugannani-Rice cdf approximations
are calculated and used to fit the spline. Otherwise the cdf approximations
used are based on Barndorff-Nielsen's r*.
|
strata |
A vector giving the strata when the rows of A relate to stratified data. This
is used only when wdist is "m" .
|
... |
When A and u are functions any additional arguments are passed unchanged
each time one of them is called.
|
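The role of A as the coefficients of a linear combination of W can be illustrated with a minimal sketch for the bootstrap mean. The data vector x below is hypothetical and used only for illustration; the construction A = x/n mirrors the A=aircondit$hours/12 call in the examples, and the two-column form mirrors the conditional Poisson call with u=12.

```r
# Hypothetical data, for illustration only (not taken from the boot package)
x <- c(2, 4, 7, 11, 16)
n <- length(x)

# Coefficients of the linear combination: the bootstrap mean is sum(A * W)
A <- x / n

# One bootstrap resample and its multinomial frequency vector W
set.seed(1)
idx <- sample(n, n, replace = TRUE)
W <- tabulate(idx, nbins = n)

boot_mean <- mean(x[idx])   # statistic computed from the resample
lin_comb  <- sum(A * W)     # the same statistic as a linear combination of W

# Conditional form: the extra column of ones encodes the constraint
# sum(W) = n, so the conditioning value u would be n.
A_cond <- cbind(A, 1)
u <- n
both <- unname(drop(crossprod(A_cond, W)))   # c(sum(A * W), sum(W))
```

Here lin_comb agrees with boot_mean, and the second component of both equals u, which is the value the conditional saddlepoint would condition on.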
The range over which the saddlepoint is used is such that the cdf
approximation at the endpoints is more extreme than required by the extreme
values of alpha. The lower endpoint is found by evaluating the saddlepoint at
the points t0[1]-2*t0[2], t0[1]-4*t0[2], t0[1]-8*t0[2] etc. until a point is
found with a cdf approximation less than min(alpha)/10; a bisection method is
then used to find the endpoint which has cdf approximation in the range
(min(alpha)/1000, min(alpha)/10). Then a number of equally spaced points are
chosen between the lower endpoint and t0[1] until a total of npts/2
approximations have been made. The remaining npts/2 points are chosen to the
right of t0[1] in a similar manner. Any points which are very close to the
centre of the distribution are then omitted as the cdf approximations are not
reliable at the centre. A smoothing spline is then fitted to the probit of the
saddlepoint distribution function approximations at the remaining points and
the required quantiles are predicted from the spline.
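The range search and spline step described above can be sketched roughly as follows. This is not the code used inside saddle.distn: a gamma cdf stands in for the saddlepoint cdf approximation, and the helper find_endpoint, the cut-off for "very close to the centre" and the choice to fit the points as a function of their probit are all assumptions made for illustration.

```r
# Stand-in for the saddlepoint cdf approximation (assumption: a gamma cdf)
shape <- 12
rate <- 12 / 108
approx_cdf <- function(t) pgamma(t, shape = shape, rate = rate)

# Default alpha levels listed in the argument table, as proportions
alpha <- c(0.001, 0.005, 0.01, 0.025, 0.05, 0.1, 0.2, 0.5,
           0.8, 0.9, 0.95, 0.975, 0.99, 0.995, 0.999)
t0 <- c(shape / rate, sqrt(shape) / rate)   # centre and spread
npts <- 20

# Hypothetical helper: step out by 2, 4, 8, ... spreads in direction dir
# (-1 left, +1 right), then bisect until the tail probability lies in
# (min(alpha)/1000, min(alpha)/10).
find_endpoint <- function(tail_prob, t0, alpha, dir, max_steps = 50) {
  hi <- min(alpha) / 10
  lo <- min(alpha) / 1000
  inner <- t0[1]
  k <- 2
  found <- FALSE
  for (i in seq_len(max_steps)) {
    outer <- t0[1] + dir * k * t0[2]
    if (tail_prob(outer) < hi) { found <- TRUE; break }
    inner <- outer
    k <- 2 * k
  }
  if (!found) stop("Unable to find range")
  pt <- outer
  for (i in seq_len(max_steps)) {
    p <- tail_prob(pt)
    if (p > lo && p < hi) return(pt)
    if (p >= hi) inner <- pt else outer <- pt   # shrink the bracket
    pt <- (inner + outer) / 2
  }
  stop("Unable to find range")
}

lower <- find_endpoint(approx_cdf, t0, alpha, dir = -1)
upper <- find_endpoint(function(t) 1 - approx_cdf(t), t0, alpha, dir = 1)

# npts/2 equally spaced points on each side of the centre
pts <- c(seq(lower, t0[1], length.out = npts / 2),
         seq(t0[1], upper, length.out = npts / 2))
# Omit points close to the centre (this cut-off is hypothetical)
pts <- pts[abs(pts - t0[1]) > t0[2] / 4]

# Spline of the points against the probit of the cdf approximation,
# then read off the requested quantiles
z <- qnorm(approx_cdf(pts))
fit <- smooth.spline(z, pts)
quantiles <- predict(fit, qnorm(alpha))$y
```

Because the endpoints have tail probabilities below min(alpha)/10, every requested quantile is obtained by interpolation within the fitted range rather than by extrapolation.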
Sometimes the function will terminate with the message "Unable to find range".
There are two main reasons why this may occur. One is that the distribution is
too discrete and/or the required quantiles are too extreme; this can cause the
function to be unable to find a point within the allowable range which is
beyond the extreme quantiles. Another possibility is that the value of t0[2]
is too small and so too many steps are required to find the range. The first
problem cannot be solved except by asking for less extreme quantiles, although
for very discrete distributions the approximations may not be very good. In
the second case using a larger value of t0[2] will usually solve the problem.
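One way to act on the second remedy is to retry with a larger spread whenever the range search fails. The sketch below is only an illustration: retry_with_larger_spread is a hypothetical helper, and it assumes the failure surfaces as an R error whose message contains "Unable to find range". A dummy function stands in for a call such as function(t0) saddle.distn(A=A, t0=t0).

```r
# Hypothetical helper: call fit_fun(t0), and if it fails with
# "Unable to find range", multiply t0[2] by `factor` and try again.
retry_with_larger_spread <- function(fit_fun, t0, max_tries = 5, factor = 2) {
  for (i in seq_len(max_tries)) {
    result <- tryCatch(fit_fun(t0), error = function(e) e)
    if (!inherits(result, "error")) {
      return(list(value = result, t0 = t0))
    }
    # Re-raise any error that is not the range failure
    if (!grepl("Unable to find range", conditionMessage(result), fixed = TRUE)) {
      stop(result)
    }
    t0[2] <- factor * t0[2]
  }
  stop("Unable to find range after ", max_tries, " attempts")
}

# Dummy stand-in that fails until the spread estimate is large enough
dummy_fit <- function(t0) {
  if (t0[2] < 4) stop("Unable to find range")
  "ok"
}

out <- retry_with_larger_spread(dummy_fit, t0 = c(10, 1))
```

The returned list records the value of t0 that finally succeeded, so the enlarged spread can be reported or reused. This does not help with the first cause, where only less extreme quantiles can be requested.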
The returned value is an object of class "saddle.distn". See the help file
for saddle.distn.object for a description of such an object.
Booth, J.G. and Butler, R.W. (1990) Randomization distributions and saddlepoint approximations in generalized linear models. Biometrika, 77, 787-796.
Canty, A.J. and Davison, A.C. (1997) Implementation of saddlepoint approximations to resampling distributions. Computing Science and Statistics; Proceedings of the 28th Symposium on the Interface, 248-253.
Davison, A.C. and Hinkley, D.V. (1997) Bootstrap Methods and their Application. Cambridge University Press.
Jensen, J.L. (1995) Saddlepoint Approximations. Oxford University Press.
lines.saddle.distn, saddle, saddle.distn.object, smooth.spline
library(boot)   # provides saddle.distn and the data sets used below
# smooth.spline is in package stats in current R (formerly in modreg)

# The bootstrap distribution of the mean of the air-conditioning
# failure data: fails to find value on R (and probably on S too)
data(aircondit)
air.t0 <- c(mean(aircondit$hours), sqrt(var(aircondit$hours)/12))
saddle.distn(A=aircondit$hours/12, t0=air.t0)

# alternatively using the conditional poisson
saddle.distn(A=cbind(aircondit$hours/12,1), u=12, wdist="p",
             type="cond", t0=air.t0)

# Distribution of the ratio of a sample of size 10 from the bigcity
# data, taken from Example 9.16 of Davison and Hinkley (1997).
data(city); data(bigcity)
ratio <- function(d, w) sum(d$x * w)/sum(d$u * w)
city.v <- var.linear(empinf(data=city, statistic=ratio))
bigcity.t0 <- c(mean(bigcity$x)/mean(bigcity$u), sqrt(city.v))
Afn <- function(t, data) cbind(data$x-t*data$u, 1)
ufn <- function(t, data) c(0, 10)
saddle.distn(A=Afn, u=ufn, wdist="b", type="cond",
             t0=bigcity.t0, data=bigcity)

# From Example 9.16 of Davison and Hinkley (1997) again, we find the
# conditional distribution of the ratio given the sum of city$u.
Afn <- function(t, data) cbind(data$x-t*data$u, data$u, 1)
ufn <- function(t, data) c(0, sum(data$u), 10)
city.t0 <- c(mean(city$x)/mean(city$u), sqrt(city.v))
saddle.distn(A=Afn, u=ufn, wdist="p", type="cond", t0=city.t0,
             data=city)