| Title: | Fractional Weighted Bootstrap |
|---|---|
| Description: | An implementation of the fractional weighted bootstrap to be used as a drop-in for functions in the 'boot' package. The fractional weighted bootstrap (also known as the Bayesian bootstrap) involves drawing weights randomly that are applied to the data rather than resampling units from the data. See Xu et al. (2020) <doi:10.1080/00031305.2020.1731599> for details. |
| Authors: | Noah Greifer [aut, cre] (ORCID: <https://orcid.org/0000-0003-3067-7154>) |
| Maintainer: | Noah Greifer <[email protected]> |
| License: | GPL (>= 2) |
| Version: | 0.6.0 |
| Built: | 2026-05-28 20:29:56 UTC |
| Source: | https://github.com/ngreifer/fwb |
The data consist of 1703 aircraft engines put into service over time. There were 6 failures and 1697 right-censored observations. These data were originally given in Abernethy et al. (1983) and were reanalyzed in Meeker and Escobar (1998, Ch. 8). The dataset used here specifically comes from Xu et al. (2020) and is used in a Weibull analysis of failure times.
data("bearingcage")data("bearingcage")
A data frame with 1703 rows and 2 variables:
hoursinteger; the number of hours until failure or censoring
failurelogical; whether a failure occurred
Abernethy, R. B., Breneman, J. E., Medlin, C. H., and Reinman, G. L. (1983), "Weibull Analysis Handbook," Technical Report, Air Force Wright Aeronautical Laboratories. doi:10.21236/ADA143100
Meeker, W. Q., and Escobar, L. A. (1998), Statistical Methods for Reliability Data, New York: Wiley.
Xu, L., Gotwalt, C., Hong, Y., King, C. B., & Meeker, W. Q. (2020). Applications of the Fractional-Random-Weight Bootstrap. The American Statistician, 74(4), 345–358. doi:10.1080/00031305.2020.1731599
fwb() implements the fractional (random) weighted bootstrap, also known as the Bayesian bootstrap. Rather than resampling units to include in bootstrap samples, weights are drawn to be applied to a weighted estimator.
fwb( data, statistic, R = 999, cluster = NULL, simple = NULL, wtype = getOption("fwb_wtype", "exp"), strata = NULL, drop0 = FALSE, verbose = NULL, cl = NULL, ... ) ## S3 method for class 'fwb' print( x, digits = getOption("digits", 3L), index = seq_len(ncol(x[["t"]])), ... )fwb( data, statistic, R = 999, cluster = NULL, simple = NULL, wtype = getOption("fwb_wtype", "exp"), strata = NULL, drop0 = FALSE, verbose = NULL, cl = NULL, ... ) ## S3 method for class 'fwb' print( x, digits = getOption("digits", 3L), index = seq_len(ncol(x[["t"]])), ... )
data |
the dataset used to compute the statistic |
statistic |
a function, which, when applied to |
R |
the number of bootstrap replicates. Default is 999 but more is always better. For the percentile bootstrap confidence interval to be exact, it can be beneficial to use one less than a multiple of 100. |
cluster |
optional; a vector containing cluster membership. If supplied, will run the cluster bootstrap. See Details. Evaluated first in |
simple |
|
wtype |
string; the type of weights to use. Allowable options include |
strata |
optional; a vector containing stratum membership for stratified bootstrapping. If supplied, will essentially perform a separate bootstrap within each level of |
drop0 |
|
verbose |
|
cl |
a cluster object created by |
... |
other arguments passed to |
x |
an |
digits |
the number of significant digits to print |
index |
the index or indices of the position of the quantity of interest in |
fwb() implements the fractional weighted bootstrap and is meant to function as a drop-in for boot::boot(., stype = "f") (i.e., the usual bootstrap but with frequency weights representing the number of times each unit is drawn). In each bootstrap replication, when wtype = "exp" (the default), the weights are sampled from independent exponential distributions with rate parameter 1 and then normalized to have a mean of 1, equivalent to drawing the weights from a Dirichlet distribution. Other weights are allowed as determined by the wtype argument (see below for details). The function supplied to statistic must incorporate the weights to compute a weighted statistic. For example, if the output is a regression coefficient, the weights supplied to the w argument of statistic should be supplied to the weights argument of lm(). These weights should be used any time frequency weights would be, since they are meant to function like frequency weights (which, in the case of the traditional bootstrap, would be integers). Unfortunately, there is no way for fwb() to know whether you are using the weights correctly, so care should be taken to ensure weights are correctly incorporated into the estimator.
When fitting binomial regression models (e.g., logistic) using glm(), it may be useful to change the family to a "quasi" variety (e.g., quasibinomial()) to avoid a spurious warning about "non-integer #successes".
The cluster bootstrap can be requested by supplying a vector of cluster membership to cluster. Rather than generating a weight for each unit, a weight is generated for each cluster and then applied to all units in that cluster.
Bootstrapping can be performed within strata by supplying a vector of stratum membership to strata. This essentially rescales the weights within each stratum to have a mean of 1, ensuring that the sum of weights in each stratum is equal to the stratum size. For multinomial weights, using strata is equivalent to drawing samples with replacement from each stratum. Strata do not affect bootstrapping when using Poisson weights.
Ideally, statistic should not involve a random element, or else it will not be straightforward to replicate the bootstrap results using the seed included in the output object. Setting a seed using set.seed() is always advised. See vignette("fwb-rep") for details.
The print() method displays the value of the statistics, the bias (the difference between the statistic and the mean of its bootstrap distribution), and the standard error (the standard deviation of the bootstrap distribution).
Different types of weights can be supplied to the wtype argument. A global default can be set using set_fwb_wtype(). The allowable weight types are described below.
"exp"Draws weights from an exponential distribution with rate parameter 1 using rexp(). These weights are the usual "Bayesian bootstrap" weights described in Xu et al. (2020). They are equivalent to drawing weights from a uniform Dirichlet distribution, which is what gives these weights the interpretation of a Bayesian prior. These positive weights have a mean and variance of 1 and skewness of 2. The weights are scaled to have a mean of 1 within each stratum (or in the full sample if strata is not supplied).
"multinom"Draws integer weights using sample(), which samples unit indices with replacement and uses the tabulation of the indices as frequency weights. This is equivalent to drawing weights from a multinomial distribution. Using wtype = "multinom" is the same as using boot::boot(., stype = "f") instead of fwb() (i.e., the resulting estimates will be identical). When strata is supplied, unit indices are drawn with replacement within each stratum so that the sum of the weights in each stratum is equal to the stratum size.
"poisson"Draws integer weights from a Poisson distribution with using rpois(). This is an alternative to the multinomial weights that yields similar estimates (especially as the sample size grows) but can be faster. Note strata is ignored when using "poisson".
"mammen"Draws weights from a modification of the distribution described by Mammen (1983) for use in the wild bootstrap. These positive weights have a mean, variance, and skewness of 1, making them second-order accurate (in contrast to the usual exponential weights, which are only first-order accurate). The weights are drawn such that and as described by Owen (2025). The weights are scaled to have a mean of 1 within each stratum (or in the full sample if strata is not supplied).
"beta"Draws weights from a distribution using rbeta() as described by Owen (2025). These positive weights have a mean, variance, and skewness of 1 when scaled by a factor of 4, making them second-order accurate. The weights are scaled to have a mean of 1 within each stratum (or in the full sample if strata is not supplied).
"power"Draws weights from a distribution using rbeta() as described by Owen (2025). These positive weights have a mean and variance of 1 and skewness of when scaled by a factor of . The weights are scaled to have a mean of 1 within each stratum (or in the full sample if strata is not supplied).
"exp" is the default due to it being the formulation described in Xu et al. (2020) and in the most formulations of the Bayesian bootstrap; it should be used if one wants to remain in line with these guidelines or to maintain a Bayesian flavor to the analysis, whereas other distributions might be preferred for their frequentist operating characteristics, though more research is needed on their general performance. Owen (2025) recommends "beta" and "power", as these provided close to nominal confidence interval coverage without excessively large intervals in the context of estimating the mean in a small sample. "multinom" and "poisson" should primarily be used for comparison purposes or as an alternative interface to boot.
To speed up evaluation, parallel processing can be enabled. One way to do so is to supply an argument to cl. This can be either an integer(not available on Windows), a cluster object created by parallel::makeCluster(), or the string "future". Another general way is to use functionality in the futurize package, which is compatible with fwb. See vignette("futurize-81-fwb", package = "futurize") for details. See also vignette("fwb-rep") for information on replicability with (and without) parallel processing.
An <fwb> object, which also inherits from <boot>, with the following components:
t0 |
The observed value of |
t |
A matrix with |
R |
The value of |
data |
The |
seed |
The value of |
statistic |
The function |
call |
The original call to |
cluster |
The vector passed to |
strata |
The vector passed to |
wtype |
The type of weights used as determined by the |
<fwb> objects have coef() and vcov() methods, which extract the t0 component and covariance of the t components, respectively.
print(fwb): Print an fwb object
Mammen, E. (1993). Bootstrap and Wild Bootstrap for High Dimensional Linear Models. The Annals of Statistics, 21(1). doi:10.1214/aos/1176349025
Owen, A. B. (2025). Better bootstrap t confidence intervals for the mean. arXiv. doi:10.48550/arXiv.2508.10083
Rubin, D. B. (1981). The Bayesian Bootstrap. The Annals of Statistics, 9(1), 130–134. doi:10.1214/aos/1176345338
Xu, L., Gotwalt, C., Hong, Y., King, C. B., & Meeker, W. Q. (2020). Applications of the Fractional-Random-Weight Bootstrap. The American Statistician, 74(4), 345–358. doi:10.1080/00031305.2020.1731599
fwb.ci() for calculating confidence intervals
summary.fwb() for displaying output in a clean way
plot.fwb() for plotting the bootstrap distributions
vcovFWB() for estimating the covariance matrix of estimates using the FWB
set_fwb_wtype() for an example of using weights other than the default exponential weights
boot::boot() for the traditional bootstrap.
See vignette("fwb-rep") for information on reproducibility.
# Performing a Weibull analysis of the Bearing Cage # failure data as done in Xu et al. (2020) set.seed(123, "L'Ecuyer-CMRG") data("bearingcage") weibull_est <- function(data, w) { fit <- survival::survreg(survival::Surv(hours, failure) ~ 1, data = data, weights = w, dist = "weibull") c(eta = unname(exp(coef(fit))), beta = 1/fit$scale) } boot_est <- fwb(bearingcage, statistic = weibull_est, R = 199, verbose = FALSE) boot_est #Get standard errors and CIs; uses bias-corrected #percentile CI by default summary(boot_est, ci.type = "bc") #Plot statistic distributions plot(boot_est, index = "beta", type = "hist")# Performing a Weibull analysis of the Bearing Cage # failure data as done in Xu et al. (2020) set.seed(123, "L'Ecuyer-CMRG") data("bearingcage") weibull_est <- function(data, w) { fit <- survival::survreg(survival::Surv(hours, failure) ~ 1, data = data, weights = w, dist = "weibull") c(eta = unname(exp(coef(fit))), beta = 1/fit$scale) } boot_est <- fwb(bearingcage, statistic = weibull_est, R = 199, verbose = FALSE) boot_est #Get standard errors and CIs; uses bias-corrected #percentile CI by default summary(boot_est, ci.type = "bc") #Plot statistic distributions plot(boot_est, index = "beta", type = "hist")
fwb.array() returns the bootstrap weights generated by fwb().
fwb.array(fwb.out)fwb.array(fwb.out)
fwb.out |
an |
The original seed is used to recover the bootstrap weights before being reset.
Bootstrap weights are used in computing BCa confidence intervals by approximating the empirical influence function for each unit with respect to each parameter (see Examples).
A matrix with R rows and n columns, where R is the number of bootstrap replications and n is the number of observations in boot.out$data.
fwb() for performing the fractional weighted bootstrap
boot::boot.array() for the equivalent function in boot
See vignette("fwb-rep") for information on replicability.
set.seed(123, "L'Ecuyer-CMRG") data("infert") fit_fun <- function(data, w) { fit <- glm(case ~ spontaneous + induced, data = data, family = "quasibinomial", weights = w) coef(fit) } fwb_out <- fwb(infert, fit_fun, R = 300, verbose = FALSE) fwb_weights <- fwb.array(fwb_out) dim(fwb_weights) # Recover computed estimates: est1 <- fit_fun(infert, fwb_weights[1, ]) stopifnot(all.equal(est1, fwb_out$t[1, ])) # Compute empirical influence function: empinf <- lm.fit(x = fwb_weights / ncol(fwb_weights), y = fwb_out$t)$coefficients empinf <- sweep(empinf, 2L, colMeans(empinf))set.seed(123, "L'Ecuyer-CMRG") data("infert") fit_fun <- function(data, w) { fit <- glm(case ~ spontaneous + induced, data = data, family = "quasibinomial", weights = w) coef(fit) } fwb_out <- fwb(infert, fit_fun, R = 300, verbose = FALSE) fwb_weights <- fwb.array(fwb_out) dim(fwb_weights) # Recover computed estimates: est1 <- fit_fun(infert, fwb_weights[1, ]) stopifnot(all.equal(est1, fwb_out$t[1, ])) # Compute empirical influence function: empinf <- lm.fit(x = fwb_weights / ncol(fwb_weights), y = fwb_out$t)$coefficients empinf <- sweep(empinf, 2L, colMeans(empinf))
fwb.ci() generates several types of equi-tailed two-sided nonparametric confidence intervals. These include the normal approximation, the basic bootstrap interval, the percentile bootstrap interval, the bias-corrected percentile bootstrap interval, and the bias-correct and accelerated (BCa) bootstrap interval.
fwb.ci( fwb.out, conf = 0.95, type = "bc", index = 1L, h = base::identity, hinv = base::identity, ... ) ## S3 method for class 'fwbci' print(x, hinv = NULL, ...)fwb.ci( fwb.out, conf = 0.95, type = "bc", index = 1L, h = base::identity, hinv = base::identity, ... ) ## S3 method for class 'fwbci' print(x, hinv = NULL, ...)
fwb.out |
an |
conf |
the desired confidence level. Default is .95 for 95% confidence intervals. |
type |
the type of confidence interval desired. Allowable options include |
index |
the index of the position of the quantity of interest in |
h |
a function defining a transformation. The intervals are calculated on the scale of |
hinv |
a function, like |
... |
ignored |
x |
an |
fwb.ci() functions similarly to boot::boot.ci() in that it takes in a bootstrapped object and computes confidence intervals. This interface is a bit old-fashioned, but was designed to mimic that of boot.ci(). For a more modern interface, see summary.fwb().
The bootstrap intervals are defined as follows, with 1 - conf, the estimate in the original sample, the average of the bootstrap estimates, the standard deviation of the bootstrap estimates, the set of ordered estimates with corresponding to their quantile, and and the upper and lower critical scores.
"wald"
This method is valid when the statistic is normally distributed around the estimate.
"norm" (normal approximation)
This involves subtracting the "bias" () from the estimate and using a standard Wald-type confidence interval. This method is valid when the statistic is normally distributed.
"basic"
"perc" (percentile confidence interval)
"bc" (bias-corrected percentile confidence interval)
, , where is the normal cumulative density function (i.e., pnorm()) and where is the proportion of bootstrap estimates less than the original estimate . This is similar to the percentile confidence interval but changes the specific quantiles of the bootstrap estimates to use, correcting for bias in the original estimate. It is described in Xu et al. (2020). When is the median of the bootstrap distribution, the "perc" and "bc" intervals coincide.
"bca" (bias-corrected and accelerated confidence interval)
, , using the same definitions as above, but with the additional acceleration parameter , where . is the empirical influence value of each unit, which is computed using the regression method described in boot::empinf(). When , the "bca" and "bc" intervals coincide. The acceleration parameter corrects for bias and skewness in the statistic. It can only be used when clusters are absent and the number of bootstrap replications is larger than the sample size. Note that BCa intervals cannot be requested when simple = TRUE and there is randomness in the statistic supplied to fwb().
Interpolation on the normal quantile scale is used when a non-integer order statistic is required, as in boot::boot.ci(). Note that unlike with boot::boot.ci(), studentized confidence intervals (type = "stud") are not allowed.
An <fwbci> object, which inherits from <bootci> and has the following components:
R |
the number of bootstrap replications in the original call to |
t0 |
the observed value of the statistic on the same scale as the intervals (i.e., after applying |
call |
the call to |
There will be additional components named after each confidence interval type requested. For "wald" and "norm", this is a matrix with one row containing the confidence level and the two confidence interval limits. For the others, this is a matrix with one row containing the confidence level, the indices of the two order statistics used in the calculations, and the confidence interval limits.
print(fwbci): Print a bootstrap confidence interval
fwb() for performing the fractional weighted bootstrap
get_ci() for extracting confidence intervals from an fwbci object
summary.fwb() for producing clean output from fwb() that includes confidence intervals calculated by fwb.ci()
boot::boot.ci() for computing confidence intervals from the traditional bootstrap
vcovFWB() for computing parameter estimate covariance matrices using the fractional weighted bootstrap
set.seed(123, "L'Ecuyer-CMRG") data("infert") fit_fun <- function(data, w) { fit <- glm(case ~ spontaneous + induced, data = data, family = "quasibinomial", weights = w) coef(fit) } fwb_out <- fwb(infert, fit_fun, R = 199, verbose = FALSE) # Bias corrected percentile interval bcci <- fwb.ci(fwb_out, index = "spontaneous", type = "bc") bcci # Using `get_ci()` to extract confidence limits get_ci(bcci) # Interval calculated on original (log odds) scale, # then transformed by exponentiation to be on OR fwb.ci(fwb_out, index = "induced", type = "norm", hinv = exp)set.seed(123, "L'Ecuyer-CMRG") data("infert") fit_fun <- function(data, w) { fit <- glm(case ~ spontaneous + induced, data = data, family = "quasibinomial", weights = w) coef(fit) } fwb_out <- fwb(infert, fit_fun, R = 199, verbose = FALSE) # Bias corrected percentile interval bcci <- fwb.ci(fwb_out, index = "spontaneous", type = "bc") bcci # Using `get_ci()` to extract confidence limits get_ci(bcci) # Interval calculated on original (log odds) scale, # then transformed by exponentiation to be on OR fwb.ci(fwb_out, index = "induced", type = "norm", hinv = exp)
bootci Objectget_ci() extracts the confidence intervals from the output of a call to boot::boot.ci() or fwb.ci() in a clean way. Normally the confidence intervals can be a bit challenging to extract because of the unusual structure of the object.
get_ci(x, type = "all")get_ci(x, type = "all")
x |
a |
type |
the type of confidence intervals to extract. Only those available in |
A list with an entry for each confidence interval type; each entry is a numeric vector of length 2 with names "L" and "U" for the lower and upper interval bounds, respectively. The "conf" attribute contains the confidence level.
fwb.ci(), confint.fwb(), boot::boot.ci()
#See example at help("fwb.ci")#See example at help("fwb.ci")
plot.fwb() takes an <fwb> object and produces plots for the bootstrap replicates of the statistic of interest.
## S3 method for class 'fwb' plot( x, index = 1L, qdist = "norm", nclass = NULL, df, type = c("hist", "qq"), ... )## S3 method for class 'fwb' plot( x, index = 1L, qdist = "norm", nclass = NULL, df, type = c("hist", "qq"), ... )
x |
an |
index |
the index of the position of the quantity of interest in |
qdist |
|
nclass |
when a histogram is requested (as it is by default; see |
df |
if |
type |
the type of plot to display. Allowable options include |
... |
ignored. |
This function can produces two side-by-side plots: a histogram of the bootstrap replicates and a Q-Q plot of the bootstrap replicates against theoretical quantiles of a supplied distribution (normal or chi-squared). For the histogram, a vertical dotted line indicates the position of the estimate computed in the original sample. For the Q-Q plot, the expected line is plotted.
x is returned invisibly.
fwb(), summary.fwb(), boot::plot.boot(), hist(), qqplot()
# See examples at `help("fwb")`# See examples at `help("fwb")`
Set the default for the type of weights used in the weighted bootstrap computed by fwb() and vcovFWB().
set_fwb_wtype(wtype = getOption("fwb_wtype", "exp")) get_fwb_wtype(fwb)set_fwb_wtype(wtype = getOption("fwb_wtype", "exp")) get_fwb_wtype(fwb)
wtype |
string; the type of weights to use. Allowable options include |
fwb |
optional; an |
set_fwb_wtype(x) is equivalent to calling options(fwb_wtype = x). get_fwb_wtype() is equivalent to calling getOption("fwb_wtype") when no argument is supplied and to extracting the wtype component of an <fwb> object when supplied.
set_fwb_wtype() returns a call to options() with the options set to those prior to set_fwb_wtype() being called. This makes it so that calling options(op), where op is the output of set_fwb_wtype(), resets the fwb_wtype to its original value. get_fwb_wtype() returns a string containing the fwb_wtype value set globally (if no argument is supplied) or used in the supplied <fwb> object.
fwb() for a definition of each types of weights; vcovFWB(); options(); boot::boot() for the traditional bootstrap.
# Performing a Weibull analysis of the Bearing Cage # failure data as done in Xu et al. (2020) set.seed(123, "L'Ecuyer-CMRG") data("bearingcage") #Set fwb type to "beta" op <- set_fwb_wtype("beta") weibull_est <- function(data, w) { fit <- survival::survreg(survival::Surv(hours, failure) ~ 1, data = data, weights = w, dist = "weibull") c(eta = unname(exp(coef(fit))), beta = 1/fit$scale) } boot_est <- fwb(bearingcage, statistic = weibull_est, R = 199, verbose = FALSE) boot_est #Get the fwb type used in the bootstrap get_fwb_wtype(boot_est) get_fwb_wtype() #Restore original options options(op) get_fwb_wtype()# Performing a Weibull analysis of the Bearing Cage # failure data as done in Xu et al. (2020) set.seed(123, "L'Ecuyer-CMRG") data("bearingcage") #Set fwb type to "beta" op <- set_fwb_wtype("beta") weibull_est <- function(data, w) { fit <- survival::survreg(survival::Surv(hours, failure) ~ 1, data = data, weights = w, dist = "weibull") c(eta = unname(exp(coef(fit))), beta = 1/fit$scale) } boot_est <- fwb(bearingcage, statistic = weibull_est, R = 199, verbose = FALSE) boot_est #Get the fwb type used in the bootstrap get_fwb_wtype(boot_est) get_fwb_wtype() #Restore original options options(op) get_fwb_wtype()
fwb Outputsummary() creates a regression summary-like table that displays the bootstrap estimates, their empirical standard errors, their confidence intervals, and, optionally, p-values for tests against a null value. confint() produces just the confidence intervals, and is called internally by summary().
## S3 method for class 'fwb' summary( object, conf = 0.95, ci.type = "bc", p.value = FALSE, index = seq_len(ncol(object$t)), null = 0, simultaneous = FALSE, ... ) ## S3 method for class 'fwb' confint(object, parm, level = 0.95, ci.type = "bc", simultaneous = FALSE, ...)## S3 method for class 'fwb' summary( object, conf = 0.95, ci.type = "bc", p.value = FALSE, index = seq_len(ncol(object$t)), null = 0, simultaneous = FALSE, ... ) ## S3 method for class 'fwb' confint(object, parm, level = 0.95, ci.type = "bc", simultaneous = FALSE, ...)
object |
an |
conf, level
|
the desired confidence level. Default is .95 for 95% confidence intervals. Set to 0 to prevent calculation of confidence intervals. |
ci.type |
the type of confidence interval desired. Allowable options include |
p.value |
|
index, parm
|
the index or indices of the position of the quantity of interest if more than one was specified in |
null |
|
simultaneous |
|
... |
ignored. |
P-values are computed by inverting the confidence interval for each parameter, i.e., finding the largest confidence level yielding a confidence interval that excludes null, and taking the p-value to be one minus that level. This ensures conclusions from tests based on the p-value and whether the confidence interval contains the null value always yield the same conclusion. Prior to version 0.5.0, all p-values were based on inverting Wald confidence intervals, regardless of ci.type.
Simultaneous confidence intervals are computed using the "sup-t" confidence band, which involves modifying the confidence level so that the intersection of all the adjusted confidence intervals contain the whole parameter vector with the specified coverage. This will always be less conservative than Bonferroni or Holm adjustment. See Olea and Plagborg-Møller (2019) for details on implementation for Wald and percentile intervals. Simultaneous p-values are computed by inverting the simultaneous bands. Simultaneous inference is only allowed when ci.type is "wald" or "perc" and index has length greater than 1. When ci.type = "wald", the mvtnorm package must be installed.
tidy() and print() methods are available for <summary.fwb> objects.
For summary(), a <summary.fwb> object, which is a matrix with the following columns:
Estimate: the statistic estimated in the original sample
Std. Error: the standard deviation of the bootstrap estimates
CI {L}% and CI {U}%: the upper and lower confidence interval bounds computed using the argument to ci.type (only when conf is not 0).
z value: when p.value = TRUE and ci.type = "wald", the z-statistic for the test of the estimate against against null.
Pr(>|z|): when p.value = TRUE, the p-value for the test of the estimate against against null.
For confint(), a matrix with a row for each statistic and a column for the upper and lower confidence interval limits.
Montiel Olea, J. L., & Plagborg-Møller, M. (2019). Simultaneous confidence bands: Theory, implementation, and an application to SVARs. Journal of Applied Econometrics, 34(1), 1–17. doi:10.1002/jae.2656
fwb() for performing the fractional weighted bootstrap; fwb.ci() for computing multiple confidence intervals for a single bootstrapped quantity
set.seed(123, "L'Ecuyer-CMRG") data("infert") fit_fun <- function(data, w) { fit <- glm(case ~ spontaneous + induced, data = data, family = "quasibinomial", weights = w) coef(fit) } fwb_out <- fwb(infert, fit_fun, R = 199, verbose = FALSE) # Basic confidence interval for both estimates summary(fwb_out, ci.type = "basic") # Just for "induced" coefficient; p-values requested, # no confidence intervals summary(fwb_out, ci.type = "norm", conf = 0, index = "induced", p.value = TRUE)set.seed(123, "L'Ecuyer-CMRG") data("infert") fit_fun <- function(data, w) { fit <- glm(case ~ spontaneous + induced, data = data, family = "quasibinomial", weights = w) coef(fit) } fwb_out <- fwb(infert, fit_fun, R = 199, verbose = FALSE) # Basic confidence interval for both estimates summary(fwb_out, ci.type = "basic") # Just for "induced" coefficient; p-values requested, # no confidence intervals summary(fwb_out, ci.type = "norm", conf = 0, index = "induced", p.value = TRUE)
vcovFWB() estimates the covariance matrix of model coefficient estimates using the fractional weighted bootstrap. It serves as a drop-in for stats::vcov() or sandwich::vcovBS(). Clustered covariances are can be requested.
vcovFWB( x, cluster = NULL, R = 1000, start = FALSE, wtype = getOption("fwb_wtype", "exp"), drop0 = FALSE, ..., fix = FALSE, use = "pairwise.complete.obs", .coef = stats::coef, verbose = FALSE, cl = NULL )vcovFWB( x, cluster = NULL, R = 1000, start = FALSE, wtype = getOption("fwb_wtype", "exp"), drop0 = FALSE, ..., fix = FALSE, use = "pairwise.complete.obs", .coef = stats::coef, verbose = FALSE, cl = NULL )
x |
a fitted model object, such as the output of a call to |
cluster |
a variable indicating the clustering of observations,
a |
R |
the number of bootstrap replications. Default is 1000 (more is better but slower). |
start |
|
wtype |
string; the type of weights to use. Allowable options include |
drop0 |
|
... |
ignored. |
fix |
|
use |
|
.coef |
a function used to extract the coefficients from each fitted model. Must return a numeric vector. By default, |
verbose |
|
cl |
a cluster object created by |
vcovFWB() functions like other vcov()-like functions, such as those in the sandwich package, in particular, sandwich::vcovBS(), which implements the traditional bootstrap (and a few other bootstrap varieties for linear models). Sets of weights are generated as described in the documentation for fwb(), and the supplied model is re-fit using those weights. When the fitted model already has weights, these are multiplied by the bootstrap weights.
For lm objects, the model is re-fit using .lm.fit() for speed, and, similarly, glm objects are re-fit using glm.fit() (or whichever fitting method was originally used). For other objects, update() is used to populate the weights and re-fit the model (this assumes the fitting function accepts non-integer case weights through a weights argument). If a model accepts weights in some other way, fwb() should be used instead; vcovFWB() is inherently limited in its ability to handle all possible models. It is important that the original model was not fit using frequency weights (i.e., weights that allow one row of data to represent multiple full, identical, individual units) unless clustering is used.
See sandwich::vcovBS() and sandwich::vcovCL() for more information on clustering covariance matrices, and see fwb() for more information on how clusters work with the fractional weighted bootstrap. When clusters are specified, each cluster is given a bootstrap weight, and all members of the cluster are given that weight; estimation then proceeds as normal. By default, when cluster is unspecified, each unit is considered its own cluster.
To speed up evaluation, parallel processing can be enabled. One way to do so is to supply an argument to cl. This can be either an integer(not available on Windows), a cluster object created by parallel::makeCluster(), or the string "future". Another general way is to use functionality in the futurize package, which is compatible with fwb. See vignette("futurize-81-fwb", package = "futurize") for details. See also vignette("fwb-rep") for information on replicability with (and without) parallel processing.
A matrix containing the covariance matrix estimate.
fwb() for performing the fractional weighted bootstrap on an arbitrary quantity
fwb.ci() for computing nonparametric confidence intervals for fwb objects
summary.fwb() for producing standard errors and confidence intervals for fwb objects
sandwich::vcovBS() for computing covariance matrices using the traditional bootstrap (the fractional weighted bootstrap is also available but with limited options).
set.seed(123, "L'Ecuyer-CMRG") data("infert") fit <- glm(case ~ spontaneous + induced, data = infert, family = "binomial") lmtest::coeftest(fit, vcov. = vcovFWB, R = 200) # Example from help("vcovBS", package = "sandwich") data("PetersenCL", package = "sandwich") m <- lm(y ~ x, data = PetersenCL) # Note: this is not to compare performance, just to # demonstrate the syntax cbind( "BS" = sqrt(diag(sandwich::vcovBS(m))), "FWB" = sqrt(diag(vcovFWB(m))), "BS-cluster" = sqrt(diag(sandwich::vcovBS(m, cluster = ~firm))), "FWB-cluster" = sqrt(diag(vcovFWB(m, cluster = ~firm))) ) # Using `wtype = "multinom"` exactly reproduces # `sandwich::vcovBS()` set.seed(11) s <- sandwich::vcovBS(m, R = 200) set.seed(11) f <- vcovFWB(m, R = 200, wtype = "multinom") all.equal(s, f) # Using a custom argument to `.coef` set.seed(123) data("infert") fit <- nnet::multinom(education ~ age, data = infert, trace = FALSE) # vcovFWB(fit, R = 200) ## error coef(fit) # coef() returns a matrix # Write a custom function to extract vector of # coefficients (can also use marginaleffects::get_coef()) coef_multinom <- function(x) { p <- t(coef(x)) setNames(as.vector(p), paste(colnames(p)[col(p)], rownames(p)[row(p)], sep = ":")) } coef_multinom(fit) # returns a vector vcovFWB(fit, R = 200, .coef = coef_multinom)set.seed(123, "L'Ecuyer-CMRG") data("infert") fit <- glm(case ~ spontaneous + induced, data = infert, family = "binomial") lmtest::coeftest(fit, vcov. = vcovFWB, R = 200) # Example from help("vcovBS", package = "sandwich") data("PetersenCL", package = "sandwich") m <- lm(y ~ x, data = PetersenCL) # Note: this is not to compare performance, just to # demonstrate the syntax cbind( "BS" = sqrt(diag(sandwich::vcovBS(m))), "FWB" = sqrt(diag(vcovFWB(m))), "BS-cluster" = sqrt(diag(sandwich::vcovBS(m, cluster = ~firm))), "FWB-cluster" = sqrt(diag(vcovFWB(m, cluster = ~firm))) ) # Using `wtype = "multinom"` exactly reproduces # `sandwich::vcovBS()` set.seed(11) s <- sandwich::vcovBS(m, R = 200) set.seed(11) f <- vcovFWB(m, R = 200, wtype = "multinom") all.equal(s, f) # Using a custom argument to `.coef` set.seed(123) data("infert") fit <- nnet::multinom(education ~ age, data = infert, trace = FALSE) # vcovFWB(fit, R = 200) ## error coef(fit) # coef() returns a matrix # Write a custom function to extract vector of # coefficients (can also use marginaleffects::get_coef()) coef_multinom <- function(x) { p <- t(coef(x)) setNames(as.vector(p), paste(colnames(p)[col(p)], rownames(p)[row(p)], sep = ":")) } coef_multinom(fit) # returns a vector vcovFWB(fit, R = 200, .coef = coef_multinom)
These functions are designed to compute weighted statistics (mean, variance, standard deviation, covariance, correlation, median and quantiles) to perform weighted transformation (scaling, centering, and standardization) using bootstrap weights. These automatically extract bootstrap weights when called from within fwb() to facilitate computation of bootstrap statistics without needing to think to hard about how to correctly incorporate weights.
w_mean(x, w = NULL, na.rm = FALSE) w_var(x, w = NULL, na.rm = FALSE) w_sd(x, w = NULL, na.rm = FALSE) w_cov(x, w = NULL, na.rm = FALSE) w_cor(x, w = NULL) w_quantile( x, w = NULL, probs = seq(0, 1, by = 0.25), na.rm = FALSE, names = TRUE, type = 7L, digits = 7L ) w_median(x, w = NULL, na.rm = FALSE) w_std(x, w = NULL, na.rm = TRUE, scale = TRUE, center = TRUE) w_scale(x, w = NULL, na.rm = TRUE) w_center(x, w = NULL, na.rm = TRUE)w_mean(x, w = NULL, na.rm = FALSE) w_var(x, w = NULL, na.rm = FALSE) w_sd(x, w = NULL, na.rm = FALSE) w_cov(x, w = NULL, na.rm = FALSE) w_cor(x, w = NULL) w_quantile( x, w = NULL, probs = seq(0, 1, by = 0.25), na.rm = FALSE, names = TRUE, type = 7L, digits = 7L ) w_median(x, w = NULL, na.rm = FALSE) w_std(x, w = NULL, na.rm = TRUE, scale = TRUE, center = TRUE) w_scale(x, w = NULL, na.rm = TRUE) w_center(x, w = NULL, na.rm = TRUE)
x |
a numeric variable; for |
w |
optional; a numeric vector of weights with length equal to the length of |
na.rm |
logical; whether to exclude |
probs, names, type, digits
|
see |
scale, center
|
logical; whether to scale or center the variable. |
These function automatically incorporate bootstrap weights when used inside fwb() or vcovFWB(). This works because fwb() and vcovFWB() temporarily set an option with the environment of the function that calls the estimation function with the sampled weights, and the w_*() functions access the bootstrap weights from that environment, if any. So, using, e.g., w_mean() inside the function supplied to the statistic argument of fwb(), computes the weighted mean using the bootstrap weights. Using these functions outside fwb() works just like other functions that compute weighted statistics: when w is supplied, the statistics are weighted, and otherwise they are unweighted.
See below for how each statistic is computed.
For all weighted statistics, the weights are first rescaled to sum to 1. in the formulas below refers to these weights.
w_mean()Calculates the weighted mean as
This is the same as weighted.mean().
w_var()Calculates the weighted variance as
w_sd()Calculates the weighted standard deviation as
w_cov()Calculates the weighted covariances as
This is the same as cov.wt().
w_cor()Calculates the weighted correlation as
This is the same as cov.wt().
w_quantile()Calculates the weighted quantiles using linear interpolation of the weighted cumulative density function.
w_median()Calculates the weighted median as w_quantile(., probs = .5).
Weighted transformations use the weighted mean and/or standard deviation to re-center or re-scale the given variable. In the formulas below, is the original variable and are the weights.
w_center()Centers the variable at its (weighted) mean.
w_scale()Scales the variable by its (weighted) standard deviation.
w_std()Centers and scales the variable by its (weighted) mean and standard deviation.
w_scale() and w_center() are efficient wrappers to w_std() with center = FALSE and scale = FALSE, respectively.
w_mean(), w_var(), w_sd(), and w_median() return a numeric scalar. w_cov() and w_cor() return numeric matrices. w_quantile() returns a numeric vector of length equal to probs. w_std(), w_scale(), and w_center() return numeric vectors of length equal to the length of x.
mean() and weighted.mean() for computing the unweighted and weighted mean
var() and sd() for computing the unweighted variance and standard deviation
median() and quantile() for computing the unweighted median and quantiles
cov(), cor(), and cov.wt() for unweighted and weighted covariance and correlation matrices
scale() for standardizing variables using arbitrary (by default, unweighted) centering and scaling factors
# G-computation of average treatment effects using lalonde # dataset data("lalonde", package = "cobalt") ate_est <- function(data, w) { fit <- lm(re78 ~ treat * (age + educ + married + race + nodegree + re74 + re75), data = data, weights = w) p0 <- predict(fit, newdata = transform(data, treat = 0)) p1 <- predict(fit, newdata = transform(data, treat = 1)) # Weighted means using bootstrap weights m0 <- w_mean(p0) m1 <- w_mean(p1) c(m0 = m0, m1 = m1, ATE = m1 - m0) } set.seed(123, "L'Ecuyer-CMRG") boot_est <- fwb(lalonde, statistic = ate_est, R = 199, verbose = FALSE) summary(boot_est) # Using `w_*()` data transformations inside a model # supplied to vcovFWB(): fit <- lm(re78 ~ treat * w_center(age), data = lalonde) lmtest::coeftest(fit, vcov = vcovFWB, R = 500)# G-computation of average treatment effects using lalonde # dataset data("lalonde", package = "cobalt") ate_est <- function(data, w) { fit <- lm(re78 ~ treat * (age + educ + married + race + nodegree + re74 + re75), data = data, weights = w) p0 <- predict(fit, newdata = transform(data, treat = 0)) p1 <- predict(fit, newdata = transform(data, treat = 1)) # Weighted means using bootstrap weights m0 <- w_mean(p0) m1 <- w_mean(p1) c(m0 = m0, m1 = m1, ATE = m1 - m0) } set.seed(123, "L'Ecuyer-CMRG") boot_est <- fwb(lalonde, statistic = ate_est, R = 199, verbose = FALSE) summary(boot_est) # Using `w_*()` data transformations inside a model # supplied to vcovFWB(): fit <- lm(re78 ~ treat * w_center(age), data = lalonde) lmtest::coeftest(fit, vcov = vcovFWB, R = 500)