Title: | Targeted Stable Balancing Weights Using Optimization |
---|---|
Description: | Use optimization to estimate weights that balance covariates for binary, multinomial, and continuous treatments in the spirit of Zubizarreta (2015) <doi:10.1080/01621459.2015.1023805>. The degree of balance can be specified for each covariate. In addition, sampling weights can be estimated that allow a sample to generalize to a population specified with given target moments of covariates. |
Authors: | Noah Greifer [aut, cre] |
Maintainer: | Noah Greifer <[email protected]> |
License: | GPL |
Version: | 0.2.5.9001 |
Built: | 2024-11-04 03:18:13 UTC |
Source: | https://github.com/ngreifer/optweight |
Checks whether proposed target population mean values for targets are suitable in number and order for submission to optweight and optweight.svy. Users should include one value per variable in formula. For factor variables, one value per level of the variable is required. The output of check.targets can also be used as an input to targets in optweight and optweight.svy.
check.targets(formula, data = NULL, targets, stop = FALSE)

## S3 method for class 'optweight.targets'
print(x, digits = 5, ...)
formula |
A formula with the covariates to be balanced with |
data |
An optional data set in the form of a data frame that contains the variables in |
targets |
A vector of target population mean values for each covariate. These should be in the order corresponding to the order of the corresponding variable in |
stop |
|
x |
An |
digits |
How many digits to print. |
... |
Ignored. |
The purpose of check.targets is to allow users to ensure that their proposed input to targets in optweight and optweight.svy is correct both in the number of entries and their order. This is especially important when factor variables and interactions are included in the formula because factor variables are split into several dummies and interactions are moved to the end of the variable list, both of which can cause some confusion and potential error when entering targets values.
Factor variables are internally split into a dummy variable for each level, so the user must specify a target population mean value for each level of the factor. These must add up to 1, and an error will be displayed if they do not. These values represent the proportion of units in the target population with each factor level.
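The sum-to-1 requirement follows from how dummy variables behave. A minimal base R sketch on toy data is given here; the data, the column naming produced by model.matrix, and the helper targets_sum_to_one are illustrative assumptions, not part of optweight.

```r
# Toy data; the variable and its levels are illustrative only.
d <- data.frame(
  race = factor(c("black", "hispan", "white", "white", "black", "white"))
)

# One dummy column per factor level (no reference level is dropped).
dummies <- model.matrix(~ race - 1, data = d)

# The mean of each dummy is the proportion of units with that level.
level_props <- colMeans(dummies)
print(level_props)

# Proportions across the levels of one factor always sum to 1,
# so a valid set of targets for a factor has to sum to 1 as well.
stopifnot(isTRUE(all.equal(sum(level_props), 1)))

# Hypothetical helper mirroring the documented requirement.
targets_sum_to_one <- function(p, tol = 1e-8) abs(sum(p) - 1) < tol

targets_sum_to_one(c(.4, .1, .5))  # a valid set of factor targets
targets_sum_to_one(c(.4, .1, .4))  # an invalid set (sums to .9)
```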
Interactions (e.g., a:b or a*b in the formula input) are always sent to the end of the variable list even if they are specified elsewhere in the formula. It is important to run check.targets to ensure the order of the proposed targets corresponds to the represented order of covariates used in the formula. You can run check.targets with targets = NULL to see the order of covariates that is required without specifying any targets.
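Base R's terms() sorts formula terms by interaction order by default, which reproduces the kind of reordering described above. Whether optweight relies on terms() internally is an assumption; the sketch only illustrates why interactions end up last.

```r
# A formula with interactions written first and in the middle.
f <- treat ~ age:educ + married * race + nodegree + re74

# terms() lists main effects first (in order of appearance),
# followed by the two-way interactions.
term_order <- attr(terms(f), "term.labels")
print(term_order)
# "married" "race" "nodegree" "re74" "age:educ" "married:race"
```

Values supplied for this formula would therefore need to follow the printed order, not the order in which the terms were typed.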
An optweight.targets object, which is a named vector of target population mean values, one for each (expanded) covariate specified in formula. This can be supplied as the targets input to optweight and optweight.svy.
Noah Greifer
library("cobalt")
data("lalonde", package = "cobalt")

#Checking if the correct number of entries are included:
check.targets(treat ~ age + race + married + nodegree + re74,
              data = lalonde,
              targets = c(25, .4, .1, .5, .3, .5, 4000))

#Notice race is split into three values (.4, .1, and .5)
Checks whether proposed tolerance values for tols are suitable in number and order for submission to optweight. Users should include one value per item in formula. The output can also be used as an input to tols in optweight.
check.tols(formula, data = NULL, tols, stop = FALSE)

## S3 method for class 'optweight.tols'
print(x, internal = FALSE, digits = 5, ...)
formula |
A formula with the covariates to be balanced with |
data |
An optional data set in the form of a data frame that contains the variables in |
tols |
A vector of balance tolerance values in standardized mean difference units for each covariate. These should be in the order corresponding to the order of the corresponding variable in |
stop |
|
x |
An |
internal |
|
digits |
How many digits to print. |
... |
Ignored. |
The purpose of check.tols is to allow users to ensure that their proposed input to tols in optweight is correct both in the number of entries and their order. This is especially important when factor variables and interactions are included in the formula because factor variables are split into several dummies and interactions are moved to the end of the variable list, both of which can cause some confusion and potential error when entering tols values.
Factor variables are internally split into a dummy variable for each level, but the user only needs to specify one tolerance value per original variable; check.tols automatically expands the tols input to match the newly created variables.
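The expansion from one tolerance per variable to one tolerance per dummy can be pictured with a small base R sketch. The helper expand_tols and the dummy naming scheme are hypothetical and are not the package's internal code.

```r
# Toy data with one numeric covariate and one three-level factor.
d <- data.frame(
  age  = c(23, 31, 45, 28, 39, 52),
  race = factor(c("black", "hispan", "white", "white", "black", "white"))
)

# One tolerance per original variable, as the user would supply it.
user_tols <- c(age = .01, race = .02)

# Hypothetical helper: repeat a factor's tolerance once per level.
expand_tols <- function(tols, data) {
  out <- lapply(names(tols), function(v) {
    x <- data[[v]]
    if (is.factor(x)) {
      setNames(rep(tols[[v]], nlevels(x)), paste0(v, "_", levels(x)))
    } else {
      setNames(tols[[v]], v)
    }
  })
  unlist(out)
}

internal_tols <- expand_tols(user_tols, d)
print(internal_tols)
```

Here the two user-supplied values become four internal values: one for age and one for each level of race.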
Interactions (e.g., a:b or a*b in the formula input) are always sent to the end of the variable list even if they are specified elsewhere in the formula. It is important to run check.tols to ensure the order of the proposed tols corresponds to the represented order of covariates used in optweight. You can run check.tols with no tols input to see the order of covariates that is required.
check.tols was designed to be used primarily for its message printing and print method, but you can also assign its output to an object for use as an input to tols in optweight.
Note that only one formula and vector of tolerance values can be assessed at a time; for multiple treatment periods, each formula and tolerance vector must be entered separately.
An optweight.tols object, which is a named vector of tolerance values, one for each variable specified in formula. This can be supplied as the tols input to optweight. The "internal.tols" attribute contains the tolerance values to be used internally by optweight. These will differ from the vector values when there are factor variables that are split up; the user only needs to submit one tolerance per factor variable, but separate tolerance values are produced for each new dummy created.
Noah Greifer
library("cobalt")
data("lalonde", package = "cobalt")

#Checking if the correct number of entries are included:
check.tols(treat ~ age + educ + married + nodegree + re74,
           data = lalonde,
           tols = c(.01, .02, .03, .04))

#Checking the order of interactions; notice they go
#at the end even if specified at the beginning. The
#.09 values are where the interactions might be expected
#to be, but they are in fact not.
c <- check.tols(treat ~ age:educ + married*race + nodegree + re74,
                data = lalonde,
                tols = c(.09, .01, .01, .09, .01, .01))
print(c, internal = TRUE)
Estimate balancing weights for treatments and covariates specified in formula. The degree of balance for each covariate is specified by tols and the target population can be specified with targets or estimand. See Zubizarreta (2015), Wang & Zubizarreta (2020), and Yiu & Su (2018) for details of the properties of the weights and the methods used to fit them.
optweight(formula, data = NULL, tols = 0, estimand = "ATE",
          targets = NULL, s.weights = NULL, b.weights = NULL,
          focal = NULL, verbose = FALSE, force = FALSE, ...)

## S3 method for class 'optweight'
print(x, ...)

## S3 method for class 'optweightMSM'
print(x, ...)
formula |
A formula with a treatment variable on the left hand side and the covariates to be balanced on the right hand side, or a list thereof. See |
data |
An optional data set in the form of a data frame that contains the variables in |
tols |
A vector of balance tolerance values for each covariate, or a list thereof. The resulting weighted balance statistics will be at least as small as these values. If only one value is supplied, it will be applied to all covariates. Can also be the output of a call to |
estimand |
The desired estimand, which determines the target population. For binary treatments, can be "ATE", "ATT", "ATC", or |
targets |
A vector of target population mean values for each baseline covariate. The resulting weights will yield sample means within |
s.weights |
A vector of sampling weights or the name of a variable in |
b.weights |
A vector of base weights or the name of a variable in |
focal |
When multinomial treatments are used and the "ATT" is requested, which group to consider the "treated" or focal group. This group will not be weighted, and the other groups will be weighted to be more like the focal group. If specified, |
verbose |
Whether information on the optimization problem solution should be printed. This information contains how many iterations it took to estimate the weights and whether the solution is optimal. |
force |
optweights are currently not valid for use with longitudinal treatments, and will produce an error message if attempted. Set to |
... |
For |
x |
An |
The optimization is performed by the lower-level function optweight.fit using solve_osqp in the osqp package, which provides a straightforward interface to specifying the constraints and objective function for quadratic optimization problems and uses a fast and flexible solving algorithm.
For binary and multinomial treatments, weights are estimated so that the weighted mean differences of the covariates are within the given tolerance thresholds (unless std.binary or std.cont are TRUE, in which case standardized mean differences are considered for binary and continuous variables, respectively). For a covariate with a specified tolerance, the weighted means of each group will be within that tolerance of each other. Additionally, when the ATE is specified as the estimand or a target population is specified, the weighted means of each group will each be within a tolerance-based bound of the target means; this ensures generalizability to the same population from which the original sample was drawn.
If standardized tolerance values are requested, the standardization factor corresponds to the estimand requested: when the ATE is requested or a target population specified, the standardization factor is the square root of the average variance for that covariate across treatment groups, and when the ATT or ATC are requested, the standardization factor is the standard deviation of the covariate in the focal group. The standardization factor is always unweighted.
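The two standardization factors described above can be computed by hand. The sketch below uses simulated data and assumes a simple unweighted average of the group variances for the ATE case; the variable names are illustrative and the code is not taken from the package.

```r
set.seed(1)

# Simulated binary treatment and one continuous covariate.
treat <- rep(c(0, 1), times = c(60, 40))
x <- rnorm(100, mean = 30 + 2 * treat, sd = 5)

# Unweighted variance of the covariate within each treatment group.
group_vars <- tapply(x, treat, var)

# ATE (or a specified target population): square root of the
# average of the group variances.
sf_ate <- sqrt(mean(group_vars))

# ATT / ATC: standard deviation in the focal group only.
sf_att <- sd(x[treat == 1])
sf_atc <- sd(x[treat == 0])

print(c(ATE = sf_ate, ATT = sf_att, ATC = sf_atc))

# A raw mean difference is turned into a standardized one by
# dividing by the factor that matches the estimand.
raw_diff <- mean(x[treat == 1]) - mean(x[treat == 0])
std_diff_ate <- raw_diff / sf_ate
```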
For continuous treatments, weights are estimated so that the weighted correlation between the treatment and each covariate is within the specified tolerance threshold. If the ATE is requested or a target population is specified, the means of the weighted covariates and treatment are restricted to be equal to those of the target population to ensure generalizability to the desired target population. The weighted correlation is computed as the weighted covariance divided by the product of the unweighted standard deviations. The means used to center the variables in computing the covariance are those specified in the target population.
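The weighted correlation defined in this paragraph can be written out directly. The function below is a hand-rolled sketch under the assumption that the weights are normalized to sum to 1; weighted_cor and its arguments are hypothetical names, not part of the package.

```r
set.seed(2)
n <- 200
x <- rnorm(n)              # covariate
a <- 0.5 * x + rnorm(n)    # continuous treatment
w <- rep(1, n)             # placeholder weights (uniform)

# Weighted covariance, centered at the target means, divided by
# the product of the unweighted standard deviations.
weighted_cor <- function(a, x, w, a_target = mean(a), x_target = mean(x)) {
  w <- w / sum(w)
  wcov <- sum(w * (a - a_target) * (x - x_target))
  wcov / (sd(a) * sd(x))
}

# With uniform weights and the sample means as targets, this equals
# the ordinary correlation up to the factor (n - 1) / n.
weighted_cor(a, x, w)
cor(a, x) * (n - 1) / n
```

Balancing weights for a continuous treatment would aim to bring this quantity within the tolerance for each covariate.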
For longitudinal treatments, only "wide" data sets, where each row corresponds to a unit's entire variable history, are supported. You can use reshape or other functions to transform your data into this format; see the example in the documentation for weightitMSM in the WeightIt package. Currently, longitudinal treatments are not recommended as optweight's use with them has not been validated.
Two types of constraints may be associated with each covariate: target constraints and balance constraints. Target constraints require the mean of the covariate to be at (or near) a specific target value in each treatment group (or for the whole group when treatment is continuous). Balance constraints require the means of the covariate in pairs of treatments to be near each other. For binary and multinomial treatments, balance constraints are redundant if target constraints are provided for a variable. For continuous treatments, balance constraints refer to the correlation between treatment and the covariate and are not redundant with target constraints. In the duals component of the output, each covariate has a dual variable for each nonredundant constraint placed on it.
The dual variable for each constraint is the instantaneous rate of change of the objective function at the optimum due to a change in the constraint. Because this relationship is not linear, large changes in the constraint will not exactly map onto corresponding changes in the objective function at the optimum, but will be close for small changes in the constraint. For example, for a covariate with a balance constraint of .01 and a corresponding dual variable of .4, increasing (i.e., relaxing) the constraint to .025 will decrease the value of the objective function at the optimum by approximately (.025 - .01) * .4 = .006. When the L2 norm is used, this change corresponds to a change in the variance of the weights, which directly affects the effective sample size (though the magnitude of this effect depends on the original value of the effective sample size).
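The arithmetic in this example, and the link between weight variability and effective sample size, can be checked in a few lines. The effective sample size formula used here is the standard Kish approximation, (sum of weights)^2 / (sum of squared weights); the weights are made up for illustration.

```r
# First-order approximation from the example above.
old_tol <- .01
new_tol <- .025
dual    <- .4
approx_decrease <- (new_tol - old_tol) * dual
print(approx_decrease)  # 0.006

# Kish effective sample size: more variable weights give a smaller ESS.
ess <- function(w) sum(w)^2 / sum(w^2)

w_equal  <- rep(1, 100)
w_varied <- c(rep(0.5, 50), rep(1.5, 50))

ess(w_equal)   # 100
ess(w_varied)  # 80
```

Relaxing a constraint with a large dual variable is therefore the most efficient way to reduce weight variability and recover effective sample size.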
For factor variables, optweight takes the sum of the absolute dual variables for the constraints for all levels and reports it as the single dual variable for the variable itself. This summed dual variable works the same way as dual variables for continuous variables do.
Sometimes the optimization will fail to converge at a solution. There are a variety of reasons why this might happen, which include that the constraints are nearly impossible to satisfy or that the optimization surface is relatively flat. It can be hard to know the exact cause or how to solve it, but this section offers some solutions one might try.
Rarely is the problem too few iterations, though this is possible. Most problems can be solved in the default 200,000 iterations, but sometimes it can help to increase this number with the max_iter argument. Usually, though, this just ends up taking more time without finding a solution.
If the problem is that the constraints are too tight, it can be helpful to loosen the constraints. Sometimes examining the dual variables of a solution that has failed to converge can reveal which constraints are causing the problem.
Sometimes a suboptimal solution is possible; such a solution does not satisfy the constraints exactly but will come pretty close. To allow these solutions, the arguments eps_abs and eps_rel can be increased from 1E-8 to larger values. These should be adjusted together since they both must be satisfied for convergence to occur; this can be done easily using the shortcut argument eps, which changes both eps_abs and eps_rel to the set value.
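A call that loosens the convergence criteria and raises the iteration limit might look like the sketch below. It assumes the optweight and cobalt packages are installed and that eps and max_iter are passed through ... as described on this page; the particular values are arbitrary choices for illustration, and the call is skipped when either package is missing.

```r
# Illustrative solver settings; the values are not recommendations.
solver_args <- list(eps = 1e-6, max_iter = 500000L)

# Only run the fit when both packages are available.
if (requireNamespace("optweight", quietly = TRUE) &&
    requireNamespace("cobalt", quietly = TRUE)) {
  data("lalonde", package = "cobalt")

  ow_loose <- optweight::optweight(
    treat ~ age + educ + married + nodegree + re74,
    data = lalonde,
    tols = .01,
    estimand = "ATE",
    eps = solver_args$eps,            # sets both eps_abs and eps_rel
    max_iter = solver_args$max_iter   # raises the iteration limit
  )
  print(ow_loose)
}
```

Balance should still be assessed afterwards, since looser criteria permit solutions that satisfy the constraints only approximately.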
With continuous treatments, solutions that failed to converge may still be usable. Make sure to assess balance and examine the weights even when an optimal solution is not found, because the solution that is found may be good enough.
If only one time point is specified, an optweight object with the following elements:
weights |
The estimated weights, one for each unit. |
treat |
The values of the treatment variable. |
covs |
The covariates used in the fitting. Only includes the raw covariates, which may have been altered in the fitting process. |
s.weights |
The provided sampling weights. |
b.weights |
The provided base weights. |
estimand |
The estimand requested. |
focal |
The focal variable if the ATT was requested with a multinomial treatment. |
call |
The function call. |
tols |
The tolerance values for each covariate. |
duals |
A data.frame containing the dual variables for each covariate. See Details for interpretation of these values. |
info |
The |
Otherwise, if multiple time points are specified, an optweightMSM object with the following elements:
weights |
The estimated weights, one for each unit. |
treat.list |
A list of the values of the treatment variables at each time point. |
covs.list |
A list of the covariates at each time point used in the fitting. Only includes the raw covariates, which may have been altered in the fitting process. |
s.weights |
The provided sampling weights. |
b.weights |
The provided base weights. |
call |
The function call. |
tols |
A list of tolerance values for each covariate at each time point. |
duals |
A list of data.frames containing the dual variables for each covariate at each time point. See Details for interpretation of these values. |
info |
The |
Noah Greifer
Anderson, E. (2018). osqp: Quadratic Programming Solver using the 'OSQP' Library. R package version 0.1.0. https://CRAN.R-project.org/package=osqp
Wang, Y., & Zubizarreta, J. R. (2020). Minimal dispersion approximately balancing weights: Asymptotic properties and practical considerations. Biometrika, 107(1), 93–105. doi:10.1093/biomet/asz050
Yiu, S., & Su, L. (2018). Covariate association eliminating weights: a unified weighting framework for causal effect estimation. Biometrika. doi:10.1093/biomet/asy015
Zubizarreta, J. R. (2015). Stable Weights that Balance Covariates for Estimation With Incomplete Outcome Data. Journal of the American Statistical Association, 110(511), 910–922. doi:10.1080/01621459.2015.1023805
https://osqp.org/docs/index.html for more information on osqp, the underlying solver, and the options for solve_osqp.
osqpSettings for details on options for solve_osqp.
optweight.fit, the lower-level function that performs the fitting.
The package sbw, which was the inspiration for this package and provides additional functionality for binary treatments.
library("cobalt")
data("lalonde", package = "cobalt")

#Balancing covariates between treatment groups (binary)
(ow1 <- optweight(treat ~ age + educ + married + nodegree + re74,
                  data = lalonde,
                  tols = c(.01, .02, .03, .04, .05),
                  estimand = "ATE"))
bal.tab(ow1)

#Exactly balancing covariates with respect to race (multinomial)
(ow2 <- optweight(race ~ age + educ + married + nodegree + re74,
                  data = lalonde, tols = 0,
                  estimand = "ATT", focal = "black"))
bal.tab(ow2)

# #Balancing covariates with longitudinal treatments
# #NOT VALID; DO NOT DO THIS.
# library("twang")
# data("iptwExWide")
#
# ##Weighting more recent covariates more strictly
# (ow3 <- optweight(list(tx1 ~ use0 + gender + age,
#                        tx2 ~ tx1 + use1 + use0 + gender +
#                          age,
#                        tx3 ~ tx2 + use2 + tx1 + use1 +
#                          use0 + gender + age),
#                   data = iptwExWide,
#                   tols = list(c(.001, .001, .001),
#                               c(.001, .001, .01, .01, .01),
#                               c(.001, .001, .01, .01,
#                                 .1, .1, .1))))
# bal.tab(ow3)

#Balancing covariates between treatment groups (binary)
#and requesting a specified target population
(ow4a <- optweight(treat ~ age + educ + married + nodegree + re74,
                   data = lalonde, tols = 0,
                   targets = c(26, 12, .4, .5, 1000),
                   estimand = NULL))
bal.tab(ow4a, disp.means = TRUE)

#Balancing covariates between treatment groups (binary)
#and not requesting a target population
(ow4b <- optweight(treat ~ age + educ + married + nodegree + re74,
                   data = lalonde, tols = 0,
                   targets = NULL,
                   estimand = NULL))
bal.tab(ow4b, disp.means = TRUE)
optweight.fit performs the optimization (via osqp; Anderson, 2018) for optweight and should, in most cases, not be used directly. No processing of inputs is performed, so they must be given exactly as described below.
optweight.fit(treat.list, covs.list, tols, estimand = "ATE",
              targets = NULL, s.weights = NULL, focal = NULL,
              norm = "l2", std.binary = FALSE, std.cont = TRUE,
              min.w = 1E-8, verbose = FALSE, force = FALSE, ...)
treat.list |
A list containing one vector of treatment statuses for each time point. Non-numeric (i.e., factor or character) vectors are allowed. |
covs.list |
A list containing one matrix of covariates to be balanced for each time point. All matrices must be numeric but do not have to be full rank. |
tols |
A list containing one vector of balance tolerance values for each time point. |
estimand |
The desired estimand, which determines the target population. For binary treatments, can be "ATE", "ATT", "ATC", or |
targets |
A vector of target population mean values for each baseline covariate. The resulting weights will yield sample means within |
s.weights |
A vector of sampling weights. Optimization occurs on the product of the sampling weights and the estimated weights. |
b.weights |
A vector of base weights or the name of a variable in |
focal |
When multinomial treatments are used and the "ATT" is requested, which group to consider the "treated" or focal group. This group will not be weighted, and the other groups will be weighted to be more like the focal group. |
norm |
A string containing the name of the norm corresponding to the objective function to minimize. The options are |
std.binary , std.cont
|
|
min.w |
A single |
verbose |
Whether information on the optimization problem solution should be printed. This information contains how many iterations it took to estimate the weights and whether the solution is optimal. |
force |
optweights are currently not valid for use with longitudinal treatments, and will produce an error message if attempted. Set to |
... |
Options that are passed to |
optweight.fit transforms the inputs into the required inputs for solve_osqp, which are (sparse) matrices and vectors, and then supplies the outputs (the weights, dual variables, and convergence information) back to optweight. No processing of inputs is performed, as this is normally handled by optweight.
The default values for some of the parameters sent to solve_osqp are not the same as those in osqpSettings. The following are the differences: max_iter is set to 200000 and eps_abs and eps_rel are set to 1E-8 (i.e., 10^-8). All other values are the same.
Note that optweights with longitudinal treatments are not valid and should not be used until further research is done.
An optweight.fit object with the following elements:
w |
The estimated weights, one for each unit. |
duals |
A data.frame containing the dual variables for each covariate, or a list thereof. See Zubizarreta (2015) for interpretation of these values. |
info |
The |
Noah Greifer
Anderson, E. (2018). osqp: Quadratic Programming Solver using the 'OSQP' Library. R package version 0.1.0. https://CRAN.R-project.org/package=osqp
Wang, Y., & Zubizarreta, J. R. (2017). Approximate Balancing Weights: Characterizations from a Shrinkage Estimation Perspective. ArXiv:1705.00998 [Math, Stat]. Retrieved from http://arxiv.org/abs/1705.00998
Yiu, S., & Su, L. (2018). Covariate association eliminating weights: a unified weighting framework for causal effect estimation. Biometrika. doi:10.1093/biomet/asy015
Zubizarreta, J. R. (2015). Stable Weights that Balance Covariates for Estimation With Incomplete Outcome Data. Journal of the American Statistical Association, 110(511), 910–922. doi:10.1080/01621459.2015.1023805
optweight, which you should use for estimating the balancing weights, unless you know better.
https://osqp.org/docs/index.html for more information on osqp, the underlying solver, and the options for solve_osqp.
osqpSettings for details on options for solve_osqp.
library("cobalt")
data("lalonde", package = "cobalt")

treat.list <- list(lalonde$treat)
covs.list <- list(splitfactor(lalonde[2:8], drop.first = "if2"))
tols.list <- list(rep(.01, ncol(covs.list[[1]])))

ow.fit <- optweight.fit(treat.list,
                        covs.list,
                        tols = tols.list,
                        estimand = "ATE",
                        norm = "l2")
Estimate targeting weights for covariates specified in formula. The target means are specified with targets and the maximum distance between each weighted covariate mean and the corresponding target mean is specified by tols. See Zubizarreta (2015) for details of the properties of the weights and the methods used to fit them.
optweight.svy(formula, data = NULL, tols = 0, targets = NULL,
              s.weights = NULL, verbose = FALSE, ...)

## S3 method for class 'optweight.svy'
print(x, ...)
formula |
A formula with nothing on the left hand side and the covariates to be targeted on the right hand side. See |
data |
An optional data set in the form of a data frame that contains the variables in |
tols |
A vector of target balance tolerance values for each covariate. The resulting weighted covariate means will be no further away from the targets than the specified values. If only one value is supplied, it will be applied to all covariates. Can also be the output of a call to |
targets |
A vector of target population mean values for each covariate. The resulting weights will yield sample means within |
s.weights |
A vector of sampling weights or the name of a variable in |
verbose |
Whether information on the optimization problem solution should be printed. This information contains how many iterations it took to estimate the weights and whether the solution is optimal. |
... |
For |
x |
An |
The optimization is performed by the lower-level function optweight.svy.fit using solve_osqp in the osqp package, which provides a straightforward interface to specifying the constraints and objective function for quadratic optimization problems and uses a fast and flexible solving algorithm.
Weights are estimated so that the standardized differences between the weighted covariate means and the corresponding targets are within the given tolerance thresholds (unless std.binary or std.cont are FALSE, in which case unstandardized mean differences are considered for binary and continuous variables, respectively). For a covariate with a specified tolerance, the weighted mean will be within that tolerance of the target. If standardized tolerance values are requested, the standardization factor is the standard deviation of the covariate in the whole sample. The standardization factor is always unweighted.
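The quantity being constrained can be computed by hand for any candidate set of weights. In the sketch below, std_diff is a hypothetical helper, and the linearly tilted weights are just a convenient way to hit the target exactly; they are not a reproduction of the weights optweight.svy would estimate.

```r
set.seed(3)
x <- rnorm(500, mean = 25, sd = 7)  # simulated covariate
target <- 24                        # desired population mean
tol <- 0.01                         # tolerance in standardized units

# Difference between the weighted mean and the target, optionally
# divided by the unweighted standard deviation of the whole sample.
std_diff <- function(x, w, target, standardize = TRUE) {
  d <- weighted.mean(x, w) - target
  if (standardize) d / sd(x) else d
}

# Uniform weights leave the sample mean where it is.
w_unif <- rep(1, length(x))

# Linearly tilted weights chosen so the weighted mean equals the target.
b <- (target - mean(x)) * length(x) / sum((x - mean(x))^2)
w_tilt <- 1 + b * (x - mean(x))

abs(std_diff(x, w_unif, target)) <= tol  # constraint check, unweighted
abs(std_diff(x, w_tilt, target)) <= tol  # constraint check, tilted weights
```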
See the optweight help page for information on interpreting dual variables and solving convergence failure.
An optweight.svy object with the following elements:
weights |
The estimated weights, one for each unit. |
covs |
The covariates used in the fitting. Only includes the raw covariates, which may have been altered in the fitting process. |
s.weights |
The provided sampling weights. |
call |
The function call. |
tols |
The tolerance values for each covariate. |
duals |
A data.frame containing the dual variables for each covariate. See Details for interpretation of these values. |
info |
The |
Noah Greifer
Anderson, E. (2018). osqp: Quadratic Programming Solver using the 'OSQP' Library. R package version 0.1.0. https://CRAN.R-project.org/package=osqp
Zubizarreta, J. R. (2015). Stable Weights that Balance Covariates for Estimation With Incomplete Outcome Data. Journal of the American Statistical Association, 110(511), 910–922. doi:10.1080/01621459.2015.1023805
https://osqp.org/docs/index.html for more information on osqp, the underlying solver, and the options for solve_osqp.
osqpSettings for details on options for solve_osqp.
optweight.svy.fit, the lower-level function that performs the fitting.
optweight for estimating weights that balance treatment groups.
library("cobalt")
data("lalonde", package = "cobalt")

cov.formula <- ~ age + educ + race + married + nodegree

targets <- check.targets(cov.formula, data = lalonde,
                         targets = c(23, 9, .3, .3, .4, .2, .5))
tols <- check.tols(cov.formula, data = lalonde, tols = 0)

ows <- optweight.svy(cov.formula,
                     data = lalonde,
                     tols = tols,
                     targets = targets)
ows

covs <- splitfactor(lalonde[c("age", "educ", "race", "married", "nodegree")],
                    drop.first = FALSE)

#Unweighted means
apply(covs, 2, mean)

#Weighted means; same as targets
apply(covs, 2, weighted.mean, w = ows$weights)
optweight.svy.fit performs the optimization (via osqp; Anderson, 2018) for optweight.svy and should, in most cases, not be used directly. No processing of inputs is performed, so they must be given exactly as described below.
optweight.svy.fit(covs, tols = 0, targets, s.weights = NULL,
                  norm = "l2", std.binary = FALSE, std.cont = TRUE,
                  min.w = 1E-8, verbose = FALSE, ...)
covs |
A matrix of covariates to be targeted. Must be numeric but does not have to be full rank. |
tols |
A vector of target balance tolerance values. |
targets |
A vector of target population mean values for each covariate. The resulting weights will yield sample means within |
s.weights |
A vector of sampling weights. Optimization occurs on the product of the sampling weights and the estimated weights. |
norm |
A string containing the name of the norm corresponding to the objective function to minimize. The options are |
std.binary , std.cont
|
|
min.w |
A single |
verbose |
Whether information on the optimization problem solution should be printed. This information contains how many iterations it took to estimate the weights and whether the solution is optimal. |
... |
Options that are passed to |
optweight.svy.fit
transforms the inputs into the required inputs for solve_osqp
, which are (sparse) matrices and vectors, and then supplies the outputs (the weights, dual variables, and convergence information) back to optweight.svy
. No processing of inputs is performed, as this is normally handled by optweight.svy
.
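As a rough sketch of the kind of problem being handed to the solver when norm = "l2" and no sampling weights are supplied, the optimization can be written in the spirit of Zubizarreta (2015) as below. The symbols are assumptions introduced here for illustration: x_{ij} is the value of covariate j for unit i, t_j is the corresponding entry of targets, \delta_j is the corresponding entry of tols, and w_{\min} is min.w. The exact scaling used internally (for example, how std.binary and std.cont rescale the tolerances, or how s.weights enter) is not shown and may differ.

```latex
\begin{aligned}
\min_{w_1, \dots, w_n} \quad & \sum_{i=1}^{n} \left( w_i - \bar{w} \right)^2 \\
\text{subject to} \quad
  & \left| \frac{1}{n} \sum_{i=1}^{n} w_i x_{ij} - t_j \right| \le \delta_j,
    \quad j = 1, \dots, p, \\
  & \frac{1}{n} \sum_{i=1}^{n} w_i = 1, \\
  & w_i \ge w_{\min}, \quad i = 1, \dots, n.
\end{aligned}
```

Because the objective is quadratic in the weights and every constraint is linear, a problem of this form fits the (sparse) matrix-and-vector input that a quadratic programming solver expects.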
An optweight.svy.fit
object with the following elements:
w |
The estimated weights, one for each unit. |
duals |
A data.frame containing the dual variables for each covariate. See Zubizarreta (2015) for interpretation of these values. |
info |
The |
Noah Greifer
Anderson, E. (2018). osqp: Quadratic Programming Solver using the 'OSQP' Library. R package version 0.1.0. https://CRAN.R-project.org/package=osqp
Wang, Y., & Zubizarreta, J. R. (2017). Approximate Balancing Weights: Characterizations from a Shrinkage Estimation Perspective. ArXiv:1705.00998 [Math, Stat]. Retrieved from http://arxiv.org/abs/1705.00998
Zubizarreta, J. R. (2015). Stable Weights that Balance Covariates for Estimation With Incomplete Outcome Data. Journal of the American Statistical Association, 110(511), 910–922. doi:10.1080/01621459.2015.1023805
optweight.svy
, which should be used to estimate the balancing weights in most cases.
https://osqp.org/docs/index.html for more information on osqp, the underlying solver, and the options for solve_osqp
.
osqpSettings
for details on options for solve_osqp
.
library("cobalt")
data("lalonde", package = "cobalt")

covs <- splitfactor(lalonde[c("age", "educ", "race", "married", "nodegree")],
                    drop.first = FALSE)

targets <- c(23, 9, .3, .3, .4, .2, .5)

tols <- rep(0, 7)

ows.fit <- optweight.svy.fit(covs, tols = tols,
                             targets = targets,
                             norm = "l2")

#Unweighted means
apply(covs, 2, mean)

#Weighted means; same as targets
apply(covs, 2, weighted.mean, w = ows.fit$w)
Plots the dual variables resulting from optweight
in a way similar to Figure 2 of Zubizarreta (2015), which explains how to interpret these values. These represent the cost of changing the constraint on the variance of the resulting weights. For covariates with large values of the dual variable, tightening the constraint will increase the variability of the weights, and loosening the constraint will decrease the variability of the weights, both to a greater extent than would doing the same for covariates with small values of the dual variable.
## S3 method for class 'optweight'
plot(x, which.time = 1, ...)

## S3 method for class 'optweight.svy'
plot(x, ...)
x |
An |
which.time |
For longitudinal treatments, which time period to display. Only one may be displayed at a time. |
... |
Ignored. |
A ggplot
object that can be used with other ggplot2 functions.
Noah Greifer
Zubizarreta, J. R. (2015). Stable Weights that Balance Covariates for Estimation With Incomplete Outcome Data. Journal of the American Statistical Association, 110(511), 910–922. doi:10.1080/01621459.2015.1023805
optweight
or optweight.svy
to estimate the weights and the dual variables
plot.summary.optweight
for plots of the distribution of weights
library("cobalt")
data("lalonde", package = "cobalt")

#Balancing covariates between treatment groups (binary)
ow1 <- optweight(treat ~ age + educ + married +
                   nodegree + re74, data = lalonde,
                 tols = c(.1, .1, .1, .1, .1),
                 estimand = "ATT")

summary(ow1) # Note the coefficient of variation
             # and effective sample size (ESS)

plot(ow1) # age has a low value, married is high

ow2 <- optweight(treat ~ age + educ + married +
                   nodegree + re74, data = lalonde,
                 tols = c(0, .1, .1, .1, .1),
                 estimand = "ATT")

summary(ow2) # Notice that tightening the constraint
             # on age had a negligible effect on the
             # variability of the weights and ESS

ow3 <- optweight(treat ~ age + educ + married +
                   nodegree + re74, data = lalonde,
                 tols = c(.1, .1, 0, .1, .1),
                 estimand = "ATT")

summary(ow3) # In contrast, tightening the constraint
             # on married had a large effect on the
             # variability of the weights, shrinking
             # the ESS
These functions summarize the weights resulting from a call to optweight
or optweight.svy
. summary
produces summary statistics on the distribution of weights, including their range and variability, and the effective sample size of the weighted sample (computed using the formula in McCaffrey, Ridgeway, & Morral, 2004). plot
creates a histogram of the weights.
## S3 method for class 'optweight'
summary(object, top = 5, ignore.s.weights = FALSE, ...)

## S3 method for class 'optweightMSM'
summary(object, top = 5, ignore.s.weights = FALSE, ...)

## S3 method for class 'optweight.svy'
summary(object, top = 5, ignore.s.weights = FALSE, ...)

## S3 method for class 'summary.optweight'
print(x, ...)

## S3 method for class 'summary.optweightMSM'
print(x, ...)

## S3 method for class 'summary.optweight.svy'
print(x, ...)

## S3 method for class 'summary.optweight'
plot(x, ...)
object |
An |
top |
How many of the largest and smallest weights to display. Default is 5. |
ignore.s.weights |
Whether or not to ignore sampling weights when computing the weight summary. If |
x |
A |
... |
Additional arguments. For |
For point treatments (i.e., optweight
objects), summary
returns a summary.optweight
object with the following elements:
weight.range |
The range (minimum and maximum) of the weights in each treatment group. |
weight.top |
The units with the greatest weights in each treatment group; how many are included is determined by |
coef.of.var |
The coefficient of variation (standard deviation divided by mean) of the weights in each treatment group and overall. When no sampling weights are used, this is simply the standard deviation of the weights. |
mean.abs.dev |
The mean absolute deviation of the weights in each treatment group and overall. |
effective.sample.size |
The effective sample size for each treatment group before and after weighting. |
For longitudinal treatments (i.e., optweightMSM
objects), a list of the above elements for each treatment period.
For optweight.svy
objects, a list of the above elements but with no treatment group divisions.
plot
returns a ggplot
object with a histogram displaying the distribution of the estimated weights. If the estimand is the ATT or ATC, only the weights for the non-focal group(s) will be displayed (since the weights for the focal group are all 1). A dotted line is displayed at the mean of the weights (usually 1).
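The summary statistics described above can be approximated by hand in base R. The following is a minimal sketch, not the package's own code: the helper names ess, coef_of_var, and mean_abs_dev and the weight vector w are made up for illustration, and it is an assumption that summary uses exactly these definitions (for example, the sample standard deviation and deviations taken from the mean). Sampling weights are ignored here.

```r
# Hypothetical weights for illustration only; not output from optweight().
w <- c(0.5, 0.8, 1.0, 1.2, 1.5)

# Effective sample size: squared sum of weights over sum of squared weights.
ess <- function(w) sum(w)^2 / sum(w^2)

# Coefficient of variation: standard deviation divided by mean.
coef_of_var <- function(w) sd(w) / mean(w)

# Mean absolute deviation of the weights from their mean (assumed definition).
mean_abs_dev <- function(w) mean(abs(w - mean(w)))

ess(w)
coef_of_var(w)
mean_abs_dev(w)

# Equal weights lose no information: the ESS equals the number of units.
ess(rep(1, 10))
```

More variable weights give a smaller effective sample size, which is why tightening a balance constraint on a covariate with a large dual variable tends to shrink the ESS.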
Noah Greifer
McCaffrey, D. F., Ridgeway, G., & Morral, A. R. (2004). Propensity Score Estimation With Boosted Regression for Evaluating Causal Effects in Observational Studies. Psychological Methods, 9(4), 403–425. doi:10.1037/1082-989X.9.4.403
plot.optweight
for plotting the values of the dual variables.
library("cobalt")
data("lalonde", package = "cobalt")

#Balancing covariates between treatment groups (binary)
(ow1 <- optweight(treat ~ age + educ + married +
                    nodegree + re74, data = lalonde,
                  tols = .001,
                  estimand = "ATT"))

(s <- summary(ow1))

plot(s, breaks = 12)