Adding log odds to combine statistics

This is an answer to a question on Stats Overflow.

I want to estimate the probability that a person aged 40-49 in Delaware is vaccinated, but I only have nationwide statistics on vaccination levels by age and the overall vaccination level in Delaware, with no age breakdown for that state.

I will need to make some independence assumptions, notably that the age distribution of vaccinations in Delaware is the same as nationwide. See below for another assumption I have to make to work with the provided data.

The method I use is to manually calculate the coefficients in a logistic regression model. As you will see, we cannot add and subtract percentages directly, but we can add and subtract log odds.
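To see why percentages cannot simply be added, here is a minimal sketch with made-up numbers (the 0.90 and 0.15 below are purely illustrative and are not taken from the vaccination statistics). Adding 15 percentage points to a probability of 0.90 gives a value above 1, while applying the equivalent shift on the log-odds scale and converting back stays between 0 and 1.

```julia
# Helper functions, same definitions as used later in this post
logit(p) = log(p / (1 - p))
logistic(x) = exp(x) / (exp(x) + 1)

p = 0.90       # illustrative base probability (made up)
bump = 0.15    # illustrative increase in percentage points (made up)

# Naive approach: add the percentage points directly
naive = p + bump                          # greater than 1, not a valid probability

# Log-odds approach: express the same bump as a shift relative to a 50% baseline
shift = logit(0.5 + bump) - logit(0.5)
shifted = logistic(logit(p) + shift)      # always strictly between 0 and 1

println(naive)
println(shifted)
```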

Logistic regression model

To begin, we need two percentages from the official statistics: the nationwide share of fully vaccinated people and the same share for Delaware.

us_vacc_p = .561
del_vacc_p = .566

0.566

I'm going to be using the following functions. The programming language I'm using is Julia, but I'm using only two basic functions and assignments so the code is going to be pretty much the same as in Python or R.

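# Log odds of a probability p, for 0 < p < 1
function logit(p)
    log(p / (1 - p))
end

# Inverse of logit: maps log odds x back to a probability
function logistic(x)
    exp(x) / (exp(x) + 1)
end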

The logit function calculates the so-called log odds of a probability.

us_vacc_logodds = logit(us_vacc_p)

0.24522149244752528

The logistic function inverts this operation. A logistic regression model for this looks like

\[ \text{logit}(p) = \text{base} \]

for the general population, and

\[ \text{logit}(p) = \text{base} + \text{coefficient for Delaware} \]

for persons living in Delaware, where \(p\) is the probability of that person being vaccinated. Because the logistic function is the inverse of the logit function, we can calculate \(p\), the probability we are after, with the formula

\[ p = \text{logistic}\left(\text{base} + \text{coefficient for Delaware}\right) \]
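For completeness, here is why the logistic function undoes the logit. Writing \(x\) for the log odds (my notation, not from the official statistics) and solving for \(p\) gives

\[ x = \log\frac{p}{1-p} \;\Rightarrow\; e^{x} = \frac{p}{1-p} \;\Rightarrow\; p = \frac{e^{x}}{e^{x}+1} = \text{logistic}(x) \]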

Now the trick is that we can manually calculate the coefficient for Delaware using the following formula.

del_vacc_coef = logit(del_vacc_p) - logit(us_vacc_p)

0.020328051655252644
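Spelled out, this formula follows from the two model equations above. Writing \(p_{\text{US}}\) and \(p_{\text{DE}}\) for the nationwide and Delaware vaccination rates (symbols introduced here only for illustration), the base is fixed by the general population, and the Delaware coefficient is whatever remains:

\[ \text{base} = \text{logit}(p_{\text{US}}), \qquad \text{coefficient for Delaware} = \text{logit}(p_{\text{DE}}) - \text{base} = \text{logit}(p_{\text{DE}}) - \text{logit}(p_{\text{US}}) \]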

To check this, let's use this model to calculate the vaccination probability of the general US population,

logistic(us_vacc_logodds)

0.561

and for Delaware we use the coefficient as well, and we get

logistic(us_vacc_logodds + del_vacc_coef)

0.566

Now the next step is to calculate the coefficient for the age group, and add that to our model as well.

Calculating the age coefficient for 40-49

The official statistics aren't yet in the form we need. From the graphs, 14.2% of those vaccinated are in the age group 40-49. What we want to know is what share of this age group is vaccinated. A complication is that only 91% of those vaccinated have reported their age, so we need another assumption: that this nonresponse is independent of age group. Under that assumption, 14.2% of all vaccinated people are in the age group 40-49.

vac_n = 186387228           # total number of vaccinated
age_vacc_n = .142 * vac_n   # in the age group 40-49

26466986.376

Also, we need the total number of people in the US in this age group, which isn't listed directly either. From the graph, it's 12.2% of the total population. The total population isn't listed either, but 56.1% of the population is vaccinated, so we can calculate the total population from that.

us_n = vac_n / .561

332241048.1283422

So the percentage vaccinated in the age group 40-49 is

age_n = .122 * us_n
age_p = age_vacc_n / age_n

0.6529672131147541
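Note that the total number of vaccinated people cancels out of this calculation, so only the three percentages matter; this is Bayes' rule applied to the shares. A minimal sketch of that shortcut, using the same figures as above (the names share_of_vaccinated and share_of_population are my own, chosen for this example):

```julia
us_vacc_p = 0.561             # share of the US population that is vaccinated
share_of_vaccinated = 0.142   # share of vaccinated people aged 40-49
share_of_population = 0.122   # share of the US population aged 40-49

# The total count appears in numerator and denominator, so it drops out
age_p_short = share_of_vaccinated * us_vacc_p / share_of_population

# The same calculation via absolute counts, for comparison
vac_n = 186387228
age_p_long = (share_of_vaccinated * vac_n) / (share_of_population * (vac_n / us_vacc_p))

println(age_p_short)
println(age_p_long)
```

Both lines should print the same value up to floating-point rounding.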

Converting this to log odds, the coefficient for the age group 40-49 is calculated the same way as for Delaware earlier.

age_vacc_coef = logit(age_p) - logit(us_vacc_p)

0.38688616375890655
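One way to read these coefficients, which is standard for logistic regression but not part of the calculation itself, is that exponentiating a coefficient gives an odds ratio: how many times higher the odds of being vaccinated are in the group than nationwide. A small sketch, recomputing everything from the percentages used in this post:

```julia
logit(p) = log(p / (1 - p))

us_vacc_p = 0.561
del_vacc_p = 0.566
age_p = 0.142 * 0.561 / 0.122   # vaccinated share in the age group 40-49

del_vacc_coef = logit(del_vacc_p) - logit(us_vacc_p)
age_vacc_coef = logit(age_p) - logit(us_vacc_p)

# exp(coefficient) is the ratio of the group's odds to the nationwide odds
del_odds_ratio = exp(del_vacc_coef)   # roughly 1.02
age_odds_ratio = exp(age_vacc_coef)   # roughly 1.47

println(del_odds_ratio)
println(age_odds_ratio)
```

The Delaware ratio is close to 1, which matches the small difference between 56.6% and 56.1%; the age group's ratio is noticeably larger.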

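In the final calculation I combine the age-based coefficient with the coefficient for Delaware. This is the step where I need the assumption that the age distribution of vaccinations is the same in Delaware: on the log-odds scale, the age effect is taken to be the same there as nationwide.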

logistic(us_vacc_logodds + del_vacc_coef + age_vacc_coef)

0.6575591337731559

So with the listed assumptions, I estimate the probability that a person aged 40-49 living in Delaware is vaccinated at 65.8%.

It is interesting to compare this to the original probabilities: 65.3% of this age group is vaccinated nationwide, and this figure is then corrected upward by comparing the 56.6% Delaware average to the 56.1% US average.
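The whole calculation can be wrapped in a small helper. This is only a sketch under the same assumptions as above; the function name combine_logodds is hypothetical and not part of any library. For comparison it also computes what adding percentage points directly would give.

```julia
logit(p) = log(p / (1 - p))
logistic(x) = exp(x) / (exp(x) + 1)

# Combine a base rate with any number of subgroup rates by adding, on the
# log-odds scale, each subgroup's difference from the base rate.
function combine_logodds(base_p, group_ps...)
    base = logit(base_p)
    total = base
    for p in group_ps
        total += logit(p) - base   # coefficient for this subgroup
    end
    logistic(total)
end

us_vacc_p = 0.561
del_vacc_p = 0.566
age_p = 0.142 * 0.561 / 0.122      # vaccinated share in the age group 40-49

estimate = combine_logodds(us_vacc_p, del_vacc_p, age_p)
naive = age_p + (del_vacc_p - us_vacc_p)   # adding percentage points instead

println(estimate)   # about 0.6576
println(naive)      # about 0.6580
```

The two results nearly agree here because the Delaware correction is small and the probabilities are far from 0 and 1. With larger shifts or rates near the extremes, the two approaches can diverge, and only the log-odds version is guaranteed to return a valid probability.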
