Commit 163c6a1: reload

DragonflyStats committed Jan 9, 2025 · 1 parent d5259f4
Showing 57 changed files with 4,746 additions and 57 deletions.
55 changes: 55 additions & 0 deletions 00_StatsResource/Bivariate_Formulas.Rmd
@@ -0,0 +1,55 @@
---
title: "Formulas for Bivariate Analyses"
subtitle: "Formulas and Tables"
author: StatsResource
output:
prettydoc::html_pretty:
theme: cayman
highlight: github
---

### Introduction

* This sheet deals specifically with formulas for linear models and related bivariate analyses.

* Material related to categorical data will be published elsewhere.

### Bivariate Summations

$$\begin{eqnarray}
S_{XY} &=&
\sum x_iy_i - \frac{\sum x_i\sum y_i}{n}\\
S_{XX} &=&
\sum x_i^2 - \frac{(\sum x_i)^2}{n}\\
S_{YY} &=&
\sum y_i^2 - \frac{(\sum y_i)^2}{n}\\
\end{eqnarray}$$
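
The following R sketch, using small made-up vectors `x` and `y`, computes these summations directly:

```r
# Hypothetical example data
x <- c(1, 2, 3, 4, 5)
y <- c(2.1, 3.9, 6.2, 8.1, 9.8)
n <- length(x)

# Bivariate summations
Sxy <- sum(x * y) - sum(x) * sum(y) / n
Sxx <- sum(x^2) - sum(x)^2 / n
Syy <- sum(y^2) - sum(y)^2 / n
```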

### Correlation

**Pearson's correlation coefficient**

$$\begin{eqnarray}
r = \frac{S_{XY}}{\sqrt{S_{XX} \times S_{YY}}}
\end{eqnarray}$$
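
Continuing the sketch above with the same `x` and `y`, the coefficient can be computed from the summations and checked against R's built-in `cor()`:

```r
# Pearson's correlation from the bivariate summations
r <- Sxy / sqrt(Sxx * Syy)

# Should agree with the built-in function
cor(x, y)
```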

### Linear Regression Estimates

**Slope Estimate**
$$\begin{eqnarray}
b_1 = \frac{S_{XY}}{S_{XX}}
\end{eqnarray}$$

**Intercept Estimate**
$$\begin{eqnarray}
b_0 = \bar{y} -b_1\bar{x}
\end{eqnarray}$$

**Standard Error of the Slope**
$$\begin{eqnarray*}
S.E.(b_1) = \sqrt{\frac{s^2}{S_{XX}}}
\end{eqnarray*}$$

where $s^2 = \frac{SSE}{n-2}$

and $SSE = S_{YY} - b_1S_{XY}$.
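
Continuing the same sketch, the estimates and the standard error of the slope can be computed and compared with the output of `lm()`:

```r
# Slope and intercept estimates
b1 <- Sxy / Sxx
b0 <- mean(y) - b1 * mean(x)

# Residual variance and standard error of the slope
SSE <- Syy - b1 * Sxy
s2 <- SSE / (n - 2)
se_b1 <- sqrt(s2 / Sxx)

# Compare against R's linear model fit
summary(lm(y ~ x))$coefficients
```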
117 changes: 117 additions & 0 deletions 00_StatsResource/Bivariate_Formulas.html

Large diffs are not rendered by default.

27 changes: 27 additions & 0 deletions 00_StatsResource/Chi_Square_Test.Rmd
@@ -0,0 +1,27 @@
---
title: "Chi Square Test"
subtitle: "Formulas and Tables"
author: StatsResource
output:
prettydoc::html_pretty:
theme: cayman
highlight: github
---


### Critical Values for the Chi-Square Test

$$\begin{array}{|c|c|c|c|c|}
\hline
df & \alpha=0.10 & \alpha=0.05 & \alpha=0.025 & \alpha=0.01 \\ \hline
1 & 2.706 & 3.841 & 5.024 & 6.635 \\ \hline
2 & 4.605 & 5.991 & 7.378 & 9.210 \\ \hline
3 & 6.251 & 7.815 & 9.348 & 11.345 \\ \hline
4 & 7.779 & 9.488 & 11.143 & 13.277 \\ \hline
5 & 9.236 & 11.070 & 12.833 & 15.086 \\ \hline
6 & 10.645 & 12.592 & 14.449 & 16.812 \\ \hline
7 & 12.017 & 14.067 & 16.013 & 18.475 \\ \hline
8 & 13.362 & 15.507 & 17.535 & 20.090 \\ \hline
9 & 14.684 & 16.919 & 19.023 & 21.666 \\ \hline
10 & 15.987 & 18.307 & 20.483 & 23.209 \\ \hline
\end{array}$$
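
The tabulated values can be reproduced in R with `qchisq()`; in the sketch below, `chisq_obs` is a hypothetical observed test statistic:

```r
# Upper-tail chi-square critical values for df = 1 to 10
alpha <- c(0.10, 0.05, 0.025, 0.01)
df <- 1:10
crit <- outer(df, alpha, function(d, a) qchisq(1 - a, df = d))
round(crit, 3)

# Decision rule: reject H0 when the observed statistic exceeds the critical value
chisq_obs <- 11.2                    # hypothetical test statistic
chisq_obs > qchisq(1 - 0.05, df = 4) # TRUE here, since 11.2 > 9.488
```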
103 changes: 103 additions & 0 deletions 00_StatsResource/Chi_Square_Test.html

Large diffs are not rendered by default.

50 changes: 50 additions & 0 deletions 00_StatsResource/Control_Charts_Factors.Rmd
@@ -0,0 +1,50 @@
---
title: "Factors for Control Charts"
subtitle: "Formulas and Tables"
author: StatsResource
output:
prettydoc::html_pretty:
theme: cayman
highlight: github
---


## Factors for Control Charts

Control chart factors, also known as control chart constants, are critical components in the construction and interpretation of control charts in statistical process control (SPC). They are used to calculate control limits and other chart parameters so that process variability is accurately monitored. The constants tabulated below ($c_4$, $c_5$, $d_2$, $d_3$, $D_3$, $D_4$) are used, directly or via derived factors such as $A_2$, $B_3$, and $B_4$, to set the upper and lower control limits for charts such as the X-bar chart, R-chart, and S-chart. Each factor depends on the sample (subgroup) size and is derived from the sampling distribution of the sample range or sample standard deviation.


$$\begin{array}{|c|c|c|c|c|c|c|}
\hline
\text{Sample Size (n)} & c_4 & c_5 & d_2 & d_3 & D_3 & D_4 \\ \hline
2 & 0.7979 & 0.6028 & 1.128 & 0.853 & 0 & 3.267 \\
3 & 0.8862 & 0.4633 & 1.693 & 0.888 & 0 & 2.574 \\
4 & 0.9213 & 0.3889 & 2.059 & 0.88 & 0 & 2.282 \\
5 & 0.9400 & 0.3412 & 2.326 & 0.864 & 0 & 2.114 \\
6 & 0.9515 & 0.3076 & 2.534 & 0.848 & 0 & 2.004 \\
7 & 0.9594 & 0.282 & 2.704 & 0.833 & 0.076 & 1.924 \\
8 & 0.9650 & 0.2622 & 2.847 & 0.82 & 0.136 & 1.864 \\
9 & 0.9693 & 0.2459 & 2.970 & 0.808 & 0.184 & 1.816 \\
10 & 0.9727 & 0.2321 & 3.078 & 0.797 & 0.223 & 1.777 \\
11 & 0.9754 & 0.2204 & 3.173 & 0.787 & 0.256 & 1.744 \\
12 & 0.9776 & 0.2105 & 3.258 & 0.778 & 0.283 & 1.717 \\
13 & 0.9794 & 0.2019 & 3.336 & 0.770 & 0.307 & 1.693 \\
14 & 0.9810 & 0.1940 & 3.407 & 0.763 & 0.328 & 1.672 \\
15 & 0.9823 & 0.1873 & 3.472 & 0.756 & 0.347 & 1.653 \\
16 & 0.9835 & 0.1809 & 3.532 & 0.750 & 0.363 & 1.637 \\
17 & 0.9845 & 0.1754 & 3.588 & 0.744 & 0.378 & 1.622 \\
18 & 0.9854 & 0.1703 & 3.64 & 0.739 & 0.391 & 1.608 \\
19 & 0.9862 & 0.1656 & 3.689 & 0.734 & 0.403 & 1.597 \\
20 & 0.9869 & 0.1613 & 3.735 & 0.729 & 0.415 & 1.585 \\
21 & 0.9876 & 0.1570 & 3.778 & 0.724 & 0.425 & 1.575 \\
22 & 0.9882 & 0.1532 & 3.819 & 0.720 & 0.434 & 1.566 \\
23 & 0.9887 & 0.1499 & 3.858 & 0.716 & 0.443 & 1.557 \\
24 & 0.9892 & 0.1466 & 3.895 & 0.712 & 0.451 & 1.548 \\
25 & 0.9896 & 0.1438 & 3.931 & 0.708 & 0.459 & 1.541 \\
\hline
\end{array}$$


### Usage

By using these constants, practitioners can effectively detect process variations and maintain quality control in manufacturing and other operational processes.
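
As an illustrative sketch, the factors for a subgroup size of $n = 5$ can be used in R to compute X-bar and R chart limits; the subgroup means and ranges below are invented:

```r
# Hypothetical subgroup means and ranges (subgroup size n = 5)
xbar <- c(10.2, 9.8, 10.1, 10.4, 9.9, 10.0)
R <- c(1.1, 0.9, 1.3, 1.0, 1.2, 0.8)

# Factors for n = 5, taken from the table above
d2 <- 2.326; D3 <- 0; D4 <- 2.114
n <- 5

xbarbar <- mean(xbar)        # grand mean
Rbar <- mean(R)              # average range
A2 <- 3 / (d2 * sqrt(n))     # derived X-bar chart factor (approx. 0.577)

# X-bar chart limits
c(LCL = xbarbar - A2 * Rbar, CL = xbarbar, UCL = xbarbar + A2 * Rbar)

# R chart limits
c(LCL = D3 * Rbar, CL = Rbar, UCL = D4 * Rbar)
```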
125 changes: 125 additions & 0 deletions 00_StatsResource/Control_Charts_Factors.html

Large diffs are not rendered by default.

50 changes: 50 additions & 0 deletions 00_StatsResource/Dixon_Q_Test_Tables.Rmd
@@ -0,0 +1,50 @@
---
title: "Dixon Q Test for Outliers"
subtitle: "Formulas and Tables"
author: StatsResource
output:
prettydoc::html_pretty:
theme: cayman
highlight: github
---

## Dixon Q Test for Outliers

The **Dixon Q Test**, also known simply as the **Q Test**, is a statistical test used to identify and reject outliers in a small data set. It's particularly useful for normally distributed data sets with fewer than 30 observations.


### Test Statistic
The test calculates a Q statistic: the gap between the suspected outlier and its nearest neighbouring value, divided by the range of the data set.


The test statistic for this procedure is as follows:

$$Q_{TS} = \frac{\mbox{Gap}}{\mbox{Range}}$$



### Critical Values

If the Q statistic exceeds a critical value from a Q table (which varies based on sample size and confidence level), the suspected data point is considered an outlier and can be rejected.

$$\begin{array}{|c|c|c|c|}
\hline
n & \alpha=0.10 & \alpha=0.05 & \alpha=0.01 \\ \hline
3 & 0.941 & 0.970 & 0.994 \\ \hline
4 & 0.765 & 0.829 & 0.926 \\ \hline
5 & 0.642 & 0.710 & 0.821 \\ \hline
6 & 0.560 & 0.625 & 0.740 \\ \hline
7 & 0.507 & 0.568 & 0.680 \\ \hline
8 & 0.468 & 0.526 & 0.634 \\ \hline
9 & 0.437 & 0.493 & 0.598 \\ \hline
10 & 0.412 & 0.466 & 0.568 \\ \hline
11 & 0.392 & 0.444 & 0.542 \\ \hline
12 & 0.376 & 0.426 & 0.522 \\ \hline
13 & 0.361 & 0.410 & 0.503 \\ \hline
14 & 0.349 & 0.396 & 0.488 \\ \hline
15 & 0.338 & 0.384 & 0.475 \\ \hline
\end{array}$$


If the test statistic is greater than the critical value, reject the null hypothesis and treat the suspect value as an outlier:
$$Q_{TS} > Q_{CV}$$
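
A minimal R sketch of the procedure, using an invented sample of size $n = 7$ with one suspiciously large value; the critical value is taken from the $\alpha = 0.05$ column of the table above:

```r
# Hypothetical sample with one suspect high value
x <- sort(c(12.1, 12.4, 12.3, 12.2, 12.5, 12.6, 13.9))
n <- length(x)

# Q statistic for the largest observation
gap <- x[n] - x[n - 1]
rng <- x[n] - x[1]
Q_TS <- gap / rng               # 1.3 / 1.8 = 0.72

Q_CV <- 0.568                   # n = 7, alpha = 0.05, from the table
Q_TS > Q_CV                     # TRUE: reject H0, flag 13.9 as an outlier
```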
118 changes: 118 additions & 0 deletions 00_StatsResource/Dixon_Q_Test_Tables.html

Large diffs are not rendered by default.

42 changes: 42 additions & 0 deletions 00_StatsResource/KS_Test.Rmd
@@ -0,0 +1,42 @@

---
title: "Kolmogorov-Smirnov Test"
subtitle: "Inference Procedures with R"
author: StatsResource
output:
prettydoc::html_pretty:
theme: cayman
highlight: github
---

## Kolmogorov-Smirnov Test

The Kolmogorov-Smirnov test is defined by:

* H$_0$: The data follow a specified distribution
* H$_1$: The data do not follow the specified distribution

**Test Statistic:** The Kolmogorov-Smirnov test statistic is defined as

$$D = \max_{1 \leq i \leq N} \left( F(Y_i) - \frac{i-1}{N},\; \frac{i}{N} - F(Y_i) \right)$$

where $F$ is the theoretical cumulative distribution function of the distribution being tested, which must be a continuous distribution (i.e., no discrete distributions such as the binomial or Poisson) and must be fully specified.


### Characteristics and Limitations of the K-S Test

#### Advantages
An attractive feature of this test is that the distribution of the K-S test statistic itself does not depend on the underlying cumulative distribution function being tested. Another advantage is that it is an exact test (the chi-square goodness-of-fit test depends on an adequate sample size for the approximations to be valid).

#### Limitations

Despite these advantages, the K-S test has several important limitations:

1. It only applies to continuous distributions.

2. It tends to be more sensitive near the center of the distribution than at the tails.

3. Perhaps the most serious limitation is that the distribution must be fully specified. That is, if location, scale, and shape parameters are estimated from the data, the critical region of the K-S test is no longer valid. It typically must be determined by simulation.

Due to limitations 2 and 3 above, many analysts prefer to use the Anderson-Darling goodness-of-fit test.

However, the Anderson-Darling test is only available for a few specific distributions.
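
A brief sketch in R, using a simulated sample: `ks.test()` carries out the one-sample test against a fully specified continuous distribution.

```r
set.seed(123)
x <- rnorm(50, mean = 10, sd = 2)   # simulated sample

# One-sample K-S test against a fully specified normal distribution
ks.test(x, "pnorm", mean = 10, sd = 2)

# Estimating the mean and sd from the same data (limitation 3 above)
# would invalidate the usual critical values.
```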
126 changes: 126 additions & 0 deletions 00_StatsResource/KS_Test.html

Large diffs are not rendered by default.
