Commit 0b03330

Simplify initial sections on CLI
Remove technical detail and point to the later sections, so we don't show the estimators twice.
1 parent 3637083 commit 0b03330

1 file changed

+24
-82
lines changed

docs/api/covidcast-signals/fb-survey.md

Lines changed: 24 additions & 82 deletions
@@ -94,18 +94,17 @@ found on our [questions and coding page](../../symptom-survey/coding.md).
9494

9595
## ILI and CLI Indicators
9696

97-
Of primary interest for the API are the symptoms defining a COVID-like illness
98-
(fever, along with cough, or shortness of breath, or difficulty breathing) or
99-
influenza-like illness (fever, along with cough or sore throat). Using this
100-
survey data, we estimate the percentage of people (age 18 or older) who have a
101-
COVID-like illness, or influenza-like illness, in a given location, on a given
102-
day.
103-
104-
Signals beginning `raw_w` or `smoothed_w` are [adjusted using survey weights
105-
to be demographically representative](#survey-weighting) as described below.
106-
Weighted signals have 1-2 days of lag, so if low latency is paramount,
107-
unweighted signals are also available. These begin `smoothed_` or `raw_`,
108-
such as `raw_cli` instead of `raw_wcli`.
97+
We define COVID-like illness (fever, along with cough, or shortness of breath,
98+
or difficulty breathing) and influenza-like illness (fever, along with cough or
99+
sore throat) for use in forecasting and modeling. Using this survey data, we
100+
estimate the percentage of people (age 18 or older) who have a COVID-like
101+
illness, or influenza-like illness, in a given location, on a given day.
102+
103+
Signals beginning `raw_w` or `smoothed_w` are [adjusted using survey weights to
104+
be demographically representative](#survey-weighting-and-estimation) as
105+
described below. Weighted signals have 1-2 days of lag, so if low latency is
106+
paramount, unweighted signals are also available. These begin `smoothed_` or
107+
`raw_`, such as `raw_cli` instead of `raw_wcli`.
109108

110109
| Signals | Description |
111110
| --- | --- |
@@ -187,58 +186,21 @@ p = 100 \cdot \frac{x}{n}
187186
q = 100 \cdot \frac{y}{n}.
188187
$$
189188

190-
We estimate $$p$$ and $$q$$ across 4 aggregation schemes:
191-
192-
1. daily, at the county level;
193-
2. daily, at the MSA (metropolitan statistical area) level;
194-
3. daily, at the HRR (hospital referral region) level;
195-
4. daily, at the state level.
196-
197-
These are possible because we have the ZIP code of the household from Q4 of the
198-
survey. Our current rule-of-thumb is to discard any estimate (whether at a
199-
county, MSA, HRR, or state level) that is based on fewer than 100 survey
200-
responses. When our geographical mapping data indicates that a ZIP code is part
201-
of multiple geographical units in a single aggregation, we assign weights
202-
$$w_i^\text{geodiv}$$ to each of these units (based on the ZIP code's overlap
203-
with each geographical unit) and use these weights as part of the survey
204-
weighting, as [described below](#survey-weighting).
205-
206-
In a given aggregation unit (for example, daily-county), let $$X_i$$ and
207-
$$Y_i$$ denote number of ILI and CLI cases in the household, respectively
208-
(computed according to the simple strategy described above), and let $$N_i$$
209-
denote the total number of people in the household, in survey $$i$$, out of
210-
$$m$$ surveys we collected. Then our estimates of $$p$$ and $$q$$ (see
211-
the [appendix](#appendix) for motivating details) are:
189+
In a given aggregation unit (for example, daily-county), let $$X_i$$ and $$Y_i$$
190+
denote the number of ILI and CLI cases in the household, respectively (computed
191+
according to the simple strategy [described
192+
above](#defining-household-ili-and-cli)), and let $$N_i$$ denote the total
193+
number of people in the household, in survey $$i$$, out of $$m$$ surveys we
194+
collected. Then our unweighted estimates of $$p$$ and $$q$$ are:
212195

213196
$$
214197
\hat{p} = 100 \cdot \frac{1}{m}\sum_{i=1}^m \frac{X_i}{N_i}
215198
\quad\text{and}\quad
216199
\hat{q} = 100 \cdot \frac{1}{m}\sum_{i=1}^m \frac{Y_i}{N_i}.
217200
$$
218201

219-
Their estimated standard errors are:
220-
221-
$$
222-
\begin{aligned}
223-
\widehat{\mathrm{se}}(\hat{p}) &= 100 \cdot \frac{1}{m+1}\sqrt{
224-
\left(\frac{1}{2} - \frac{\hat{p}}{100}\right)^2 +
225-
\sum_{i=1}^m \left(\frac{X_i}{N_i} - \frac{\hat{p}}{100}\right)^2
226-
} \\
227-
\widehat{\mathrm{se}}(\hat{q}) &= 100 \cdot \frac{1}{m+1}\sqrt{
228-
\left(\frac{1}{2} - \frac{\hat{q}}{100}\right)^2 +
229-
\sum_{i=1}^m \left(\frac{Y_i}{N_i} - \frac{\hat{q}}{100}\right)^2
230-
},
231-
\end{aligned}
232-
$$
233-
234-
the standard deviations of the estimators after adding a single
235-
pseudo-observation at 1/2 (treating $$m$$ as fixed). The use of the
236-
pseudo-observation prevents standard error estimates of zero, and in simulations
237-
improves the quality of the standard error estimates.
238-
239-
The pseudo-observation is not used in $$\hat{p}$$ and $$\hat{q}$$ themselves, to
240-
avoid potentially large amounts of estimation bias, as $$p$$ and $$q$$ are
241-
expected to be small.
202+
[See below](#adjusting-household-ili-and-cli) for details on weighting and
203+
standard errors for these estimates.
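
Reviewer note: the unweighted estimator kept in this hunk is the mean of the per-household fractions $$X_i / N_i$$, scaled to a percentage. A minimal Python sketch, assuming the per-survey counts arrive as two equal-length lists; the function name `unweighted_percentage` and its arguments are hypothetical and not taken from the pipeline:

```python
from typing import Sequence


def unweighted_percentage(cases: Sequence[int], household_sizes: Sequence[int]) -> float:
    """Return 100 * (1/m) * sum(X_i / N_i) over m surveys.

    cases[i] plays the role of X_i (or Y_i) and household_sizes[i] the role
    of N_i. Both names are illustrative assumptions.
    """
    if len(cases) != len(household_sizes):
        raise ValueError("cases and household_sizes must have the same length")
    m = len(cases)
    if m == 0:
        raise ValueError("need at least one survey")
    # Average the per-household fractions, then scale to a percentage.
    return 100.0 * sum(x / n for x, n in zip(cases, household_sizes)) / m


# Four hypothetical surveys: fractions 0, 0.25, 0 and 0.5.
p_hat = unweighted_percentage([0, 1, 0, 2], [2, 4, 1, 4])
```

The sketch omits survey weights and standard errors, which the revised text defers to the later section.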
242204

243205
### Estimating "Community CLI"
244206

@@ -254,36 +216,16 @@ a = 100 \cdot \frac{u}{n}
254216
b = 100 \cdot \frac{y}{n}.
255217
$$
256218

257-
We will estimate $$a$$ and $$b$$ across the same 4 aggregation schemes as
258-
before.
259-
260219
For a single survey, let:
261220

262221
- $$U = 1$$ if and only if a positive number is reported for Q2 or Q5;
263222
- $$V = 1$$ if and only if a positive number is reported for Q2.
264223

265-
In a given aggregation unit (for example, daily-county), let $$U_i$$ and
266-
$$V_i$$ denote these quantities for survey $$i$$, and $$m$$ denote the number of
267-
surveys total. Then to estimate $$a$$ and $$b$$, we simply use:
268-
269-
$$
270-
\hat{a} = 100 \cdot \frac{1}{m} \sum_{i=1}^m U_i
271-
\quad\text{and}\quad
272-
\hat{b} = 100 \cdot \frac{1}{m} \sum_{i=1}^m V_i.
273-
$$
274-
275-
Hence $$\hat{a}$$ is reported in the `hh_cmnty_cli` signals and $$\hat{b}$$ in
276-
the `nohh_cmnty_cli` signals. Their estimated standard errors are:
277-
278-
$$
279-
\begin{aligned}
280-
\widehat{\mathrm{se}}(\hat{a}) &= 100 \cdot \sqrt{\frac{\frac{\hat{a}}{100}(1-\frac{\hat{a}}{100})}{m}} \\
281-
\widehat{\mathrm{se}}(\hat{b}) &= 100 \cdot \sqrt{\frac{\frac{\hat{b}}{100}(1-\frac{\hat{b}}{100})}{m}},
282-
\end{aligned}
283-
$$
284-
285-
which are the plug-in estimates of the standard errors of the binomial
286-
proportions (treating $$m$$ as fixed).
224+
Let $$U_i$$ and $$V_i$$ denote these quantities for survey $$i$$, and $$m$$
225+
denote the number of surveys total. We report the percentage of surveys where
226+
$$U_i = 1$$ in the `hh_cmnty_cli` signals and the percentage where $$V_i =
227+
1$$ in the `nohh_cmnty_cli` signals. The exact estimators are [described
228+
below](#adjusting-other-percentage-estimators).
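
Reviewer note: the percentages described in this paragraph can be sketched as below. This assumes each survey is reduced to a pair of counts for Q2 and Q5; the function name `community_cli_percentages` and the input layout are hypothetical, and the sketch shows only the unweighted percentages, not the exact estimators the text links to.

```python
from typing import Sequence, Tuple


def community_cli_percentages(responses: Sequence[Tuple[int, int]]) -> Tuple[float, float]:
    """Return (hh_cmnty_cli, nohh_cmnty_cli) as unweighted percentages.

    Each element of responses is a hypothetical (q2, q5) pair of reported counts.
    """
    m = len(responses)
    if m == 0:
        raise ValueError("need at least one survey")
    # U_i = 1 iff a positive number is reported for Q2 or Q5.
    u = [1 if (q2 > 0 or q5 > 0) else 0 for q2, q5 in responses]
    # V_i = 1 iff a positive number is reported for Q2.
    v = [1 if q2 > 0 else 0 for q2, _ in responses]
    return 100.0 * sum(u) / m, 100.0 * sum(v) / m


# Four hypothetical surveys.
hh, nohh = community_cli_percentages([(0, 0), (1, 0), (0, 3), (2, 1)])
```

Here three of the four surveys have $$U_i = 1$$ and two have $$V_i = 1$$.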
287229

288230
Note that $$\sum_{i=1}^m U_i$$ is the number of survey respondents who know
289231
someone in their community with *either ILI or CLI*, and not CLI alone; and
