I have a logistic regression model and output an $R^2$ value. I then add another predictor variable and fit a second model, which gives a new $R^2$ value. When I run an ANOVA test, I see no significant improvement in the second model, but I want to assess the power associated with including the additional variable in model 2. I have found an example for linear regression that uses an $F$-test, and I want to do something similar for logistic regression using G*Power. But there appears to be very little documentation on multiple logistic regression models like my situation, and I don't know how to do a more detailed power analysis for multiple logistic regression. From what I understand, in G*Power I set Test family == z tests and Statistical test == Logistic regression . But I am not sure what to set R² other X equal to. Is that the improvement in $R^2$? The tutorial in section 27.4 of the software manual does not mention any variant of $R^2$, and this example does not discuss improvements in $R^2$ either.
R2 other X is probably not some "log reg pseudo-R2"; rather, it is the R-squared of the variable of interest regressed on all the other covariables in the model, ignoring the response completely.
Commented Mar 10, 2019 at 13:05

The problem is that there isn't really an $R^2$ for logistic regression. Instead there are many different "pseudo-$R^2$s" that may resemble the $R^2$ from a linear model in different ways. You can find a list of some at UCLA's statistics help website here.
In addition, the effect (e.g., odds ratio) of the added variable, $x_2$, isn't sufficient to determine your power to detect that effect. It matters how $x_2$ is distributed: The more widely spread the values are, the more powerful your test, even if the odds ratio is held constant. It further matters what the correlation between $x_2$ and $x_1$ is: The more correlated they are, the more data would be required to achieve the same power.
As a result of these facts, the way I try to calculate the power in these more complicated situations is to simulate. In that vein, it may help you to read my answer here: Simulation of logistic regression power analysis - designed experiments.
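As a rough sketch of that simulation approach (this is an illustration I am adding here, not the code from the linked answer: the sample size, effect sizes, intercept, and the hand-rolled Newton/IRLS fitter are all assumptions made for the example), one could estimate the power of the Wald test for an added predictor $x_2$ at a chosen correlation with $x_1$:

```python
import numpy as np

rng = np.random.default_rng(1)

def fit_logistic(X, y, iters=25):
    """Plain Newton-Raphson (IRLS) logistic fit; returns coefficients and standard errors."""
    beta = np.zeros(X.shape[1])
    for _ in range(iters):
        p = 1.0 / (1.0 + np.exp(-X @ beta))
        W = p * (1.0 - p)
        H = X.T @ (X * W[:, None])            # Fisher information
        beta = beta + np.linalg.solve(H, X.T @ (y - p))
    se = np.sqrt(np.diag(np.linalg.inv(H)))
    return beta, se

def sim_power(n, beta2, rho, n_sims=500):
    """Estimated power of the two-sided 5% Wald test for x2 when corr(x1, x2) = rho."""
    z_crit = 1.959963984540054                # qnorm(0.975)
    hits = 0
    for _ in range(n_sims):
        x1 = rng.standard_normal(n)
        x2 = rho * x1 + np.sqrt(1.0 - rho**2) * rng.standard_normal(n)
        eta = 0.5 * x1 + beta2 * x2           # zero intercept -> prevalence near 0.5
        y = (rng.random(n) < 1.0 / (1.0 + np.exp(-eta))).astype(float)
        X = np.column_stack([np.ones(n), x1, x2])
        beta, se = fit_logistic(X, y)
        hits += abs(beta[2] / se[2]) > z_crit
    return hits / n_sims
```

Comparing, say, `sim_power(400, beta2=0.3, rho=0.0)` against `sim_power(400, beta2=0.3, rho=0.7)` shows directly how much power is lost when the new predictor is correlated with the existing one.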
Looking at G*Power's documentation, they use a method based on Hsieh, Bloch, & Larsen (1998). The idea is that you first regress $x_2$ on $x_1$ (or whatever predictor variables went into the first model) using a linear regression, and use the regular $R^2$ from that fit. (That value should lie in the interval $[0,\ 1]$.) It goes in the R² other X field you are referring to. You then specify the distribution of $x_2$ in the next few fields ( X distribution , X parm μ , and X parm σ ).
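To make the R² other X input concrete, here is a small sketch of that auxiliary regression (the data, coefficients, and sample size are made up for illustration): regress the new predictor $x_2$ on the existing predictor $x_1$ by ordinary least squares and take that $R^2$; the outcome variable plays no role.

```python
import numpy as np

rng = np.random.default_rng(0)
# hypothetical data: x1 is the existing predictor, x2 the one being added
x1 = rng.standard_normal(200)
x2 = 0.6 * x1 + 0.8 * rng.standard_normal(200)

# ordinary linear regression of x2 on x1 (with intercept)
X = np.column_stack([np.ones_like(x1), x1])
coef, *_ = np.linalg.lstsq(X, x2, rcond=None)
resid = x2 - X @ coef
r2_other_x = 1 - resid.var() / x2.var()   # the value to enter as "R² other X"
```

With more than one existing predictor, the design matrix `X` simply gains extra columns; the $R^2$ is computed the same way.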
The excellent book Regression Methods in Biostatistics: Linear, Logistic, Survival, and Repeated Measures Models has a treatment of power analysis for logistic regression (in section 5.7), with some simple, useful (approximate) formulas, very possibly the formulas used by G*Power referred to in another answer. If those approximations are not good enough, simulation will probably be needed.
Consider two-sided testing of $H_0\colon \beta_j=0$ (log-odds scale) versus $H_1\colon \beta_j=\beta_j^a$ with level $\alpha$ and power $\gamma$, where the standard deviation of the predictor $x_j$ is $\sigma_{x_j}$, $p$ is the marginal prevalence of the outcome, and $\rho_j^2$ is the squared multiple correlation of $x_j$ with all the other predictors (this is the R-squared reported by a linear multiple regression of $x_j$ as response on all the other predictors, and does not involve the response of the logistic regression at all).
The minimum sample size is then $$ n=\frac{(z_{1-\alpha/2}+z_\gamma)^2}{(\beta_j^a \sigma_{x_j})^2\, p(1-p)(1-\rho_j^2)} $$
A graph showing the minimum sample size as a function of the alternative coefficient $\beta_j^a$:
For completeness, some related formulas from the same source:
If the sample size $n$ is decided, then the power is $$ \gamma=1-\Phi\left(z_{1-\alpha/2}-|\beta_j^a| \sigma_{x_j}\sqrt{n p(1-p)(1-\rho_j^2)}\right)$$ where $\Phi$ is the standard normal cumulative distribution function. The minimum detectable effect (on the log-odds scale) is $$ \pm \beta_j^a = \frac{z_{1-\alpha/2}+z_\gamma}{\sigma_{x_j}\sqrt{n p(1-p)(1-\rho_j^2)}}$$
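These two formulas translate directly into code. A Python sketch using only the standard library (the function and argument names are mine, not from the book):

```python
from math import sqrt
from statistics import NormalDist

nd = NormalDist()  # standard normal

def power(n, beta_a, sigma_x, p, rho2, alpha=0.05):
    """Power of the two-sided Wald test, per the formula above."""
    z_a = nd.inv_cdf(1 - alpha / 2)
    return 1 - nd.cdf(z_a - abs(beta_a) * sigma_x * sqrt(n * p * (1 - p) * (1 - rho2)))

def mde(n, sigma_x, p, rho2, alpha=0.05, gamma=0.8):
    """Minimum detectable coefficient (log-odds scale) at power gamma."""
    z_a = nd.inv_cdf(1 - alpha / 2)
    z_g = nd.inv_cdf(gamma)
    return (z_a + z_g) / (sigma_x * sqrt(n * p * (1 - p) * (1 - rho2)))
```

For example, `power(1570, 0.2, 1, 0.5, 0.5)` comes out near 0.8, consistent with the sample-size example in this answer.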
An R implementation of the sample-size formula (reconstructed here; the defaults $\alpha=0.05$ and power $\gamma=0.8$ are assumptions, chosen because they reproduce the output shown):

```r
min_n <- function(beta_a, sigma_x, p, R2, alpha = 0.05, power = 0.8) {
    ceiling((qnorm(1 - alpha / 2) + qnorm(power))^2 /
            ((beta_a * sigma_x)^2 * p * (1 - p) * (1 - R2)))
}

min_n(beta_a = 0.2, sigma_x = 1, p = 0.5, R2 = 0.5)
## [1] 1570
```