@@ -94,18 +94,17 @@ found on our [questions and coding page](../../symptom-survey/coding.md).
9494
9595## ILI and CLI Indicators
9696
97- Of primary interest for the API are the symptoms defining a COVID-like illness
98- (fever, along with cough, or shortness of breath, or difficulty breathing) or
99- influenza-like illness (fever, along with cough or sore throat). Using this
100- survey data, we estimate the percentage of people (age 18 or older) who have a
101- COVID-like illness, or influenza-like illness, in a given location, on a given
102- day.
103-
104- Signals beginning `raw_w` or `smoothed_w` are [adjusted using survey weights
105- to be demographically representative](#survey-weighting) as described below.
106- Weighted signals have 1-2 days of lag, so if low latency is paramount,
107- unweighted signals are also available. These begin `smoothed_` or `raw_`,
108- such as `raw_cli` instead of `raw_wcli`.
97+ We define COVID-like illness (fever, along with cough, or shortness of breath,
98+ or difficulty breathing) and influenza-like illness (fever, along with cough or
99+ sore throat) for use in forecasting and modeling. Using this survey data, we
100+ estimate the percentage of people (age 18 or older) who have a COVID-like
101+ illness, or influenza-like illness, in a given location, on a given day.
102+
103+ Signals beginning `raw_w` or `smoothed_w` are [adjusted using survey weights to
104+ be demographically representative](#survey-weighting-and-estimation) as
105+ described below. Weighted signals have 1-2 days of lag, so if low latency is
106+ paramount, unweighted signals are also available. These begin `smoothed_` or
107+ `raw_`, such as `raw_cli` instead of `raw_wcli`.
109108
110109| Signals | Description |
111110| --- | --- |
@@ -187,58 +186,21 @@ p = 100 \cdot \frac{x}{n}
187186q = 100 \cdot \frac{y}{n}.
188187$$
189188
190- We estimate $$ p $$ and $$ q $$ across 4 aggregation schemes:
191-
192- 1 . daily, at the county level;
193- 2 . daily, at the MSA (metropolitan statistical area) level;
194- 3 . daily, at the HRR (hospital referral region) level;
195- 4 . daily, at the state level.
196-
197- These are possible because we have the ZIP code of the household from Q4 of the
198- survey. Our current rule-of-thumb is to discard any estimate (whether at a
199- county, MSA, HRR, or state level) that is based on fewer than 100 survey
200- responses. When our geographical mapping data indicates that a ZIP code is part
201- of multiple geographical units in a single aggregation, we assign weights
202- $$ w_i^\text{geodiv} $$ to each of these units (based on the ZIP code's overlap
203- with each geographical unit) and use these weights as part of the survey
204- weighting, as [described below](#survey-weighting).
205-
206- In a given aggregation unit (for example, daily-county), let $$ X_i $$ and
207- $$ Y_i $$ denote number of ILI and CLI cases in the household, respectively
208- (computed according to the simple strategy described above), and let $$ N_i $$
209- denote the total number of people in the household, in survey $$ i $$ , out of
210- $$ m $$ surveys we collected. Then our estimates of $$ p $$ and $$ q $$ (see
211- the [appendix](#appendix) for motivating details) are:
189+ In a given aggregation unit (for example, daily-county), let $$ X_i $$ and $$ Y_i $$
190+ denote the number of ILI and CLI cases in the household, respectively (computed
191+ according to the simple strategy [described
192+ above](#defining-household-ili-and-cli)), and let $$ N_i $$ denote the total
193+ number of people in the household, in survey $$ i $$, out of $$ m $$ surveys we
194+ collected. Then our unweighted estimates of $$ p $$ and $$ q $$ are:
212195
213196$$
214197\hat{p} = 100 \cdot \frac{1}{m}\sum_{i=1}^m \frac{X_i}{N_i}
215198\quad\text{and}\quad
216199\hat{q} = 100 \cdot \frac{1}{m}\sum_{i=1}^m \frac{Y_i}{N_i}.
217200$$
218201
219- Their estimated standard errors are:
220-
221- $$
222- \begin{aligned}
223- \widehat{\mathrm{se}}(\hat{p}) &= 100 \cdot \frac{1}{m+1}\sqrt{
224- \left(\frac{1}{2} - \frac{\hat{p}}{100}\right)^2 +
225- \sum_{i=1}^m \left(\frac{X_i}{N_i} - \frac{\hat{p}}{100}\right)^2
226- } \\
227- \widehat{\mathrm{se}}(\hat{q}) &= 100 \cdot \frac{1}{m+1}\sqrt{
228- \left(\frac{1}{2} - \frac{\hat{q}}{100}\right)^2 +
229- \sum_{i=1}^m \left(\frac{Y_i}{N_i} - \frac{\hat{q}}{100}\right)^2
230- },
231- \end{aligned}
232- $$
233-
234- the standard deviations of the estimators after adding a single
235- pseudo-observation at 1/2 (treating $$ m $$ as fixed). The use of the
236- pseudo-observation prevents standard error estimates of zero, and in simulations
237- improves the quality of the standard error estimates.
238-
239- The pseudo-observation is not used in $$ \hat{p} $$ and $$ \hat{q} $$ themselves, to
240- avoid potentially large amounts of estimation bias, as $$ p $$ and $$ q $$ are
241- expected to be small.
202+ [See below](#adjusting-household-ili-and-cli) for details on weighting and
203+ standard errors for these estimates.
242204
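As a rough illustration of the unweighted estimator above, the following sketch computes $$ \hat{p} $$ (or $$ \hat{q} $$) from per-survey case counts and household sizes. The function name `unweighted_household_pct` and its argument names are hypothetical and chosen only for this example; it does not apply survey weights or the minimum-response rule, and it is not the production pipeline.

```python
from typing import Sequence


def unweighted_household_pct(cases: Sequence[int], sizes: Sequence[int]) -> float:
    """Return 100 * (1/m) * sum_i cases[i] / sizes[i].

    cases[i] plays the role of X_i (or Y_i) and sizes[i] the role of N_i
    for survey i. Both names are illustrative assumptions.
    """
    if len(cases) != len(sizes):
        raise ValueError("cases and sizes must have the same length")
    m = len(cases)
    if m == 0:
        raise ValueError("at least one survey is required")
    if any(n <= 0 for n in sizes):
        raise ValueError("every household size must be positive")
    # Average the per-household fractions, then express as a percentage.
    total = sum(x / n for x, n in zip(cases, sizes))
    return 100.0 * total / m


# Three surveys: 1 of 4, 0 of 2, and 2 of 2 household members with symptoms.
p_hat = unweighted_household_pct([1, 0, 2], [4, 2, 2])
```

Note that each household contributes its own fraction with equal weight, so a two-person household counts as much as a four-person one in this unweighted form.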
243205### Estimating "Community CLI"
244206
@@ -254,36 +216,16 @@ a = 100 \cdot \frac{u}{n}
254216b = 100 \cdot \frac{y}{n}.
255217$$
256218
257- We will estimate $$ a $$ and $$ b $$ across the same 4 aggregation schemes as
258- before.
259-
260219For a single survey, let:
261220
262221- $$ U = 1 $$ if and only if a positive number is reported for Q2 or Q5;
263222- $$ V = 1 $$ if and only if a positive number is reported for Q2.
264223
265- In a given aggregation unit (for example, daily-county), let $$ U_i $$ and
266- $$ V_i $$ denote these quantities for survey $$ i $$ , and $$ m $$ denote the number of
267- surveys total. Then to estimate $$ a $$ and $$ b $$ , we simply use:
268-
269- $$
270- \hat{a} = 100 \cdot \frac{1}{m} \sum_{i=1}^m U_i
271- \quad\text{and}\quad
272- \hat{b} = 100 \cdot \frac{1}{m} \sum_{i=1}^m V_i.
273- $$
274-
275- Hence $$ \hat{a} $$ is reported in the ` hh_cmnty_cli ` signals and $$ \hat{b} $$ in
276- the ` nohh_cmnty_cli ` signals. Their estimated standard errors are:
277-
278- $$
279- \begin{aligned}
280- \widehat{\mathrm{se}}(\hat{a}) &= 100 \cdot \sqrt{\frac{\frac{\hat{a}}{100}(1-\frac{\hat{a}}{100})}{m}} \\
281- \widehat{\mathrm{se}}(\hat{b}) &= 100 \cdot \sqrt{\frac{\frac{\hat{b}}{100}(1-\frac{\hat{b}}{100})}{m}},
282- \end{aligned}
283- $$
284-
285- which are the plug-in estimates of the standard errors of the binomial
286- proportions (treating $$ m $$ as fixed).
224+ Let $$ U_i $$ and $$ V_i $$ denote these quantities for survey $$ i $$, and $$ m $$
225+ denote the total number of surveys. We report the percentage of surveys where
226+ $$ U_i = 1 $$ in the `hh_cmnty_cli` signals and the percentage where $$V_i =
227+ 1$$ in the `nohh_cmnty_cli` signals. The exact estimators are [described
228+ below](#adjusting-other-percentage-estimators).
287229
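The indicator definitions above can be sketched in a few lines. This is a minimal, unadjusted version: it assumes each survey is given as a hypothetical `(q2, q5)` pair of reported counts, and it returns the plain percentages of surveys with $$ U_i = 1 $$ and $$ V_i = 1 $$. The reported signals may differ, since the exact estimators include the adjustments described elsewhere in this document.

```python
from typing import Iterable, Tuple


def community_cli_pcts(responses: Iterable[Tuple[int, int]]) -> Tuple[float, float]:
    """Return (percentage with U_i = 1, percentage with V_i = 1).

    Each response is a hypothetical (q2, q5) pair of counts reported
    for survey questions Q2 and Q5.
    """
    rows = list(responses)
    m = len(rows)
    if m == 0:
        raise ValueError("at least one survey is required")
    # U_i = 1 iff a positive number is reported for Q2 or Q5.
    u = [1 if (q2 > 0 or q5 > 0) else 0 for q2, q5 in rows]
    # V_i = 1 iff a positive number is reported for Q2.
    v = [1 if q2 > 0 else 0 for q2, _ in rows]
    return 100.0 * sum(u) / m, 100.0 * sum(v) / m


# Four surveys given as (Q2 count, Q5 count).
pct_u, pct_v = community_cli_pcts([(0, 0), (2, 0), (0, 1), (1, 3)])
```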
288230Note that $$ \sum_{i=1}^m U_i $$ is the number of survey respondents who know
289231someone in their community with *either ILI or CLI*, and not CLI alone; and