Adding log odds to combine statistics
This is an answer to a question on Stats Overflow.
I want to estimate the probability that a person aged 40-49 in Delaware is vaccinated, but I only have nationwide statistics on vaccination levels by age and the overall vaccination level in Delaware, with no age breakdown for that state.
I will need to make some independence assumptions, notably that the age distribution of vaccinations in Delaware is the same as nationwide. See below for another assumption I have to make to work with the provided data.
The method I use is to manually calculate the coefficients in a logistic regression model. As you will see, we cannot add and subtract percentages directly, but we can add and subtract log odds.
Logistic regression model
To begin, we need two percentages from the official statistics: the nationwide (full) vaccination rate and the rate in Delaware.
us_vacc_p = .561
del_vacc_p = .566
0.566
I'm going to be using the following functions. The programming language I'm using is Julia, but I'm using only two basic functions and assignments so the code is going to be pretty much the same as in Python or R.
function logit(p)
    log(p / (1 - p))
end

function logistic(x)
    exp(x) / (exp(x) + 1)
end
The logit function calculates the so-called log odds of a probability.
us_vacc_logodds = logit(us_vacc_p)
0.24522149244752528
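Written out for the nationwide figure, the odds are the ratio of vaccinated to unvaccinated, and the log odds are their natural logarithm. The intermediate values here are rounded:
\[ \text{logit}(0.561) = \log\frac{0.561}{1 - 0.561} = \log\frac{0.561}{0.439} \approx \log 1.278 \approx 0.245 \]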
The logistic function inverts this operation. A logistic regression model for this looks like
\[ \text{logit}(p) = \text{base} \]
for the general population, and
\[ \text{logit}(p) = \text{base} + \text{coefficient for Delaware} \]
for persons living in Delaware, where \(p\) is the probability of that person being vaccinated. Because the logistic function is the inverse of the logit function, we can calculate \(p\), the probability we are after, with the formula
\[ p = \text{logistic}\left(\text{base} + \text{coefficient for Delaware}\right) \]
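The inverse relationship can be verified directly from the two function definitions, using \(\exp(\log(x)) = x\):
\[ \text{logistic}(\text{logit}(p)) = \frac{\frac{p}{1 - p}}{\frac{p}{1 - p} + 1} = \frac{p}{p + (1 - p)} = p \]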
Now the trick is that we can manually calculate the coefficient for Delaware using the following formula.
del_vacc_coef = logit(del_vacc_p) - logit(us_vacc_p)
0.020328051655252644
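A difference of two log odds is the logarithm of an odds ratio, so this coefficient can be read as follows (intermediate values rounded):
\[ \text{coefficient for Delaware} = \log\frac{0.566 / 0.434}{0.561 / 0.439} \approx \log 1.0205 \approx 0.0203 \]
In other words, the odds of being vaccinated are about 2% higher in Delaware than nationwide.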
To check this, let's use this model to calculate the vaccination probability of the general US population,
logistic(us_vacc_logodds)
0.561
and for Delaware we use the coefficient as well, and we get
logistic(us_vacc_logodds + del_vacc_coef)
0.566
Now the next step is to calculate the coefficient for the age group, and add that to our model as well.
Calculating the age coefficient for 40-49
The official statistics aren't yet in the form we need. From the graphs, 14.2% of those vaccinated are in the age group 40-49. What we want to know is what share of this age group is vaccinated. A complication is that only 91% of those vaccinated have reported their age. We need another assumption here, namely that this nonresponse is independent of age group. Under that assumption, 14.2% of all vaccinated people are in the age group 40-49.
vac_n = 186387228          # total number of vaccinated
age_vacc_n = .142 * vac_n  # in the age group 40-49
26466986.376
Also, we need the total number of people in the US in this age group, which isn't listed directly either. From the graph, this group is 12.2% of the total population. The total population isn't listed either, but 56.1% of the population is vaccinated, so we can calculate the total population from that.
us_n = vac_n / .561
332241048.1283422
So the percentage vaccinated in the age group 40-49 is
age_n = .122 * us_n
age_p = age_vacc_n / age_n
0.6529672131147541
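The total number of vaccinated cancels out of this calculation, so it amounts to Bayes' rule applied to the three shares used in the code (0.142, 0.561 and 0.122):
\[ P(\text{vaccinated} \mid \text{40-49}) = \frac{P(\text{40-49} \mid \text{vaccinated}) \, P(\text{vaccinated})}{P(\text{40-49})} = \frac{0.142 \times 0.561}{0.122} \approx 0.653 \]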
Converting this to log odds, the calculation of the coefficient for the age group 40-49 is the same as earlier for Delaware:
age_vacc_coef = logit(age_p) - logit(us_vacc_p)
0.38688616375890655
In the final calculation I combine the age-based coefficient with the coefficient for Delaware. This is the step where I need the assumption that the age distribution is the same in Delaware.
logistic(us_vacc_logodds + del_vacc_coef + age_vacc_coef)
0.6575591337731559
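The whole calculation can be collected into one small function. This is only a sketch: `combine_logodds` is a hypothetical helper name that does not appear in the steps above, and it relies on the same independence assumptions. It starts from the base rate and adds one log odds coefficient per subgroup rate.

```julia
# Helper functions, repeated here so the block runs on its own.
logit(p) = log(p / (1 - p))
logistic(x) = exp(x) / (exp(x) + 1)

# Hypothetical helper (an assumption for illustration): add one log odds
# coefficient per subgroup rate to the log odds of the base rate.
function combine_logodds(base_p, group_ps...)
    base = logit(base_p)
    total = base
    for p in group_ps
        total += logit(p) - base   # coefficient for this subgroup
    end
    return logistic(total)
end

age_p = 0.142 * 0.561 / 0.122      # share vaccinated among ages 40-49
estimate = combine_logodds(0.561, 0.566, age_p)
println(round(estimate, digits = 4))
```

Called with only the base rate, the function returns the base rate itself, which mirrors the check done earlier for the general population.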
So with the listed assumptions I estimate the probability of a person aged 40-49 living in Delaware to be vaccinated at 65.8%.
It is interesting to compare this to the original probabilities, with 65.3% of this age group being vaccinated in general, which is then corrected by comparing the 56.6% Delaware population average to the 56.1% general US population average.
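For rates this close to 50%, adding the percentage-point differences directly would give almost the same answer. The sketch below compares both approaches, first with the figures from this post and then with hypothetical rates close to 1, which are made up purely to show where direct addition breaks down. The function names `add_percentages` and `add_logodds` are assumptions for illustration.

```julia
logit(p) = log(p / (1 - p))
logistic(x) = exp(x) / (exp(x) + 1)

# Combine two subgroup rates with a base rate, once by adding
# percentage-point differences and once by adding log odds differences.
add_percentages(base, a, b) = base + (a - base) + (b - base)
# logit(base) + (logit(a) - logit(base)) + (logit(b) - logit(base)), simplified:
add_logodds(base, a, b) = logistic(logit(a) + logit(b) - logit(base))

# Figures from this post: US rate, Delaware rate, rate for ages 40-49.
age_p = 0.142 * 0.561 / 0.122
naive = add_percentages(0.561, 0.566, age_p)
combined = add_logodds(0.561, 0.566, age_p)

# Hypothetical rates close to 1, chosen only to show the failure mode.
naive_extreme = add_percentages(0.90, 0.97, 0.98)
logodds_extreme = add_logodds(0.90, 0.97, 0.98)

println((naive, combined))
println((naive_extreme, logodds_extreme))
```

With the hypothetical rates, direct addition yields a value above 1, which is not a valid probability, while the log odds version always stays strictly between 0 and 1.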