Commit 163c6a1 (1 parent: d5259f4). Showing 57 changed files with 4,746 additions and 57 deletions.
@@ -0,0 +1,55 @@

---
title: "Formulas for Bivariate Analyses"
subtitle: "Formulas and Tables"
author: StatsResource
output:
  prettydoc::html_pretty:
    theme: cayman
    highlight: github
---

### Introduction

* This sheet deals specifically with formulas for linear models and related bivariate analyses.

* Material related to categorical data will be published elsewhere.

### Bivariate Summations

$$\begin{eqnarray}
S_{XY} &=& \sum x_iy_i - \frac{\sum x_i\sum y_i}{n}\\
S_{XX} &=& \sum x_i^2 - \frac{(\sum x_i)^2}{n}\\
S_{YY} &=& \sum y_i^2 - \frac{(\sum y_i)^2}{n}
\end{eqnarray}$$

### Correlation

**Pearson's correlation coefficient**

$$r = \frac{S_{XY}}{\sqrt{S_{XX} \times S_{YY}}}$$

### Linear Regression Estimates

**Slope Estimate**
$$b_1 = \frac{S_{XY}}{S_{XX}}$$

**Intercept Estimate**
$$b_0 = \bar{y} - b_1\bar{x}$$

**Standard Error of the Slope**
$$\text{S.E.}(b_1) = \sqrt{\frac{s^2}{S_{XX}}}$$

where $s^2 = \frac{SSE}{n-2}$

and $SSE = S_{YY} - b_1 S_{XY}$
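
The following R chunk is a minimal sketch (with made-up illustrative data) showing how these summations reproduce the correlation, slope, and intercept returned by R's built-in `cor()` and `lm()`.

```r
# Illustrative data only -- not from the source document
x <- c(1.2, 2.4, 3.1, 4.8, 5.0, 6.3)
y <- c(2.1, 3.9, 5.2, 8.1, 8.4, 10.9)
n <- length(x)

# Bivariate summations
Sxy <- sum(x * y) - sum(x) * sum(y) / n
Sxx <- sum(x^2) - sum(x)^2 / n
Syy <- sum(y^2) - sum(y)^2 / n

r  <- Sxy / sqrt(Sxx * Syy)    # Pearson's correlation coefficient
b1 <- Sxy / Sxx                # slope estimate
b0 <- mean(y) - b1 * mean(x)   # intercept estimate

SSE   <- Syy - b1 * Sxy
s2    <- SSE / (n - 2)
se_b1 <- sqrt(s2 / Sxx)        # standard error of the slope

# Cross-check against base R
all.equal(r, cor(x, y))
coef(lm(y ~ x))                # should match c(b0, b1)
```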
@@ -0,0 +1,27 @@

---
title: "Chi Square Test"
subtitle: "Formulas and Tables"
author: StatsResource
output:
  prettydoc::html_pretty:
    theme: cayman
    highlight: github
---

### Critical Values for Chi Square Test

$$\begin{array}{|c|c|c|c|c|}
\hline
df & \alpha=0.10 & \alpha=0.05 & \alpha=0.025 & \alpha=0.01 \\ \hline
1 & 2.706 & 3.841 & 5.024 & 6.635 \\ \hline
2 & 4.605 & 5.991 & 7.378 & 9.210 \\ \hline
3 & 6.251 & 7.815 & 9.348 & 11.345 \\ \hline
4 & 7.779 & 9.488 & 11.143 & 13.277 \\ \hline
5 & 9.236 & 11.070 & 12.833 & 15.086 \\ \hline
6 & 10.645 & 12.592 & 14.449 & 16.812 \\ \hline
7 & 12.017 & 14.067 & 16.013 & 18.475 \\ \hline
8 & 13.362 & 15.507 & 17.535 & 20.090 \\ \hline
9 & 14.684 & 16.919 & 19.023 & 21.666 \\ \hline
10 & 15.987 & 18.307 & 20.483 & 23.209 \\ \hline
\end{array}$$
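
As a quick check, these upper-tail critical values can be regenerated in R with `qchisq()`:

```r
# Regenerate the table above: upper-tail chi-square quantiles
df    <- 1:10
alpha <- c(0.10, 0.05, 0.025, 0.01)
crit  <- sapply(alpha, function(a) qchisq(1 - a, df = df))
dimnames(crit) <- list(df = df, alpha = alpha)
round(crit, 3)
```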
@@ -0,0 +1,50 @@

---
title: "Factors for Control Charts"
subtitle: "Formulas and Tables"
author: StatsResource
output:
  prettydoc::html_pretty:
    theme: cayman
    highlight: github
---

## Factors for Control Charts

Control chart factors, also known as control chart constants, are used to construct and interpret control charts in statistical process control (SPC). They are used to calculate control limits and other chart parameters so that process variability is monitored accurately. Commonly used factors include the A2, D3, D4, B3, and B4 constants, which set the upper and lower control limits for charts such as the X-bar chart, R-chart, and S-chart. Each factor depends on the subgroup sample size and is derived from the sampling distribution of the relevant statistic. The table below lists the unbiasing constants $c_4$ and $c_5$, the relative-range constants $d_2$ and $d_3$, and the R-chart limit factors $D_3$ and $D_4$ for sample sizes 2 to 25.

$$\begin{array}{|c|c|c|c|c|c|c|}
\hline
\text{Sample Size (n)} & c_4 & c_5 & d_2 & d_3 & D_3 & D_4 \\ \hline
2 & 0.7979 & 0.6028 & 1.128 & 0.853 & 0 & 3.267 \\
3 & 0.8862 & 0.4633 & 1.693 & 0.888 & 0 & 2.574 \\
4 & 0.9213 & 0.3889 & 2.059 & 0.880 & 0 & 2.282 \\
5 & 0.9400 & 0.3412 & 2.326 & 0.864 & 0 & 2.114 \\
6 & 0.9515 & 0.3076 & 2.534 & 0.848 & 0 & 2.004 \\
7 & 0.9594 & 0.2820 & 2.704 & 0.833 & 0.076 & 1.924 \\
8 & 0.9650 & 0.2622 & 2.847 & 0.820 & 0.136 & 1.864 \\
9 & 0.9693 & 0.2459 & 2.970 & 0.808 & 0.184 & 1.816 \\
10 & 0.9727 & 0.2321 & 3.078 & 0.797 & 0.223 & 1.777 \\
11 & 0.9754 & 0.2204 & 3.173 & 0.787 & 0.256 & 1.744 \\
12 & 0.9776 & 0.2105 & 3.258 & 0.778 & 0.283 & 1.717 \\
13 & 0.9794 & 0.2019 & 3.336 & 0.770 & 0.307 & 1.693 \\
14 & 0.9810 & 0.1940 & 3.407 & 0.763 & 0.328 & 1.672 \\
15 & 0.9823 & 0.1873 & 3.472 & 0.756 & 0.347 & 1.653 \\
16 & 0.9835 & 0.1809 & 3.532 & 0.750 & 0.363 & 1.637 \\
17 & 0.9845 & 0.1754 & 3.588 & 0.744 & 0.378 & 1.622 \\
18 & 0.9854 & 0.1703 & 3.640 & 0.739 & 0.391 & 1.608 \\
19 & 0.9862 & 0.1656 & 3.689 & 0.734 & 0.403 & 1.597 \\
20 & 0.9869 & 0.1613 & 3.735 & 0.729 & 0.415 & 1.585 \\
21 & 0.9876 & 0.1570 & 3.778 & 0.724 & 0.425 & 1.575 \\
22 & 0.9882 & 0.1532 & 3.819 & 0.720 & 0.434 & 1.566 \\
23 & 0.9887 & 0.1499 & 3.858 & 0.716 & 0.443 & 1.557 \\
24 & 0.9892 & 0.1466 & 3.895 & 0.712 & 0.451 & 1.548 \\
25 & 0.9896 & 0.1438 & 3.931 & 0.708 & 0.459 & 1.541 \\
\hline
\end{array}$$

### Usage

By using these constants, practitioners can effectively detect process variations and maintain quality control in manufacturing and other operational processes.
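
As a rough illustration, the sketch below uses simulated (purely hypothetical) subgroup data and the $d_2$, $D_3$, and $D_4$ factors for $n = 5$ from the table above to compute X-bar and R chart limits; the $A_2$ factor is derived here from the standard relationship $A_2 = 3/(d_2\sqrt{n})$.

```r
# Hypothetical subgroup data: 20 subgroups of size n = 5
set.seed(1)
samples <- matrix(rnorm(20 * 5, mean = 10, sd = 1), nrow = 20)

xbar    <- rowMeans(samples)                              # subgroup means
R       <- apply(samples, 1, function(s) diff(range(s)))  # subgroup ranges
xbarbar <- mean(xbar)
Rbar    <- mean(R)

n  <- 5
d2 <- 2.326; D3 <- 0; D4 <- 2.114   # factors for n = 5 (table above)
A2 <- 3 / (d2 * sqrt(n))            # A2 derived from d2 and n

# X-bar chart limits
c(LCL = xbarbar - A2 * Rbar, CL = xbarbar, UCL = xbarbar + A2 * Rbar)
# R chart limits
c(LCL = D3 * Rbar, CL = Rbar, UCL = D4 * Rbar)
```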
@@ -0,0 +1,50 @@

---
title: "Dixon Q Test for Outliers"
subtitle: "Formulas and Tables"
author: StatsResource
output:
  prettydoc::html_pretty:
    theme: cayman
    highlight: github
---

## Dixon Q Test for Outliers

The **Dixon Q Test**, also known simply as the **Q Test**, is a statistical test used to identify and reject outliers in a small data set. It's particularly useful for normally distributed data sets with fewer than 30 observations.

### Test Statistic

The test calculates a Q statistic, which is the ratio of the gap between the suspected outlier and the closest data point to the range of the data set.

The test statistic for this procedure is as follows:

$$Q_{TS} = \frac{\text{Gap}}{\text{Range}}$$

### Critical Values

If the Q statistic exceeds a critical value from a Q table (which varies with sample size and confidence level), the suspected data point is considered an outlier and can be rejected.

$$\begin{array}{|c|c|c|c|}
\hline
n & \alpha=0.10 & \alpha=0.05 & \alpha=0.01 \\ \hline
3 & 0.941 & 0.970 & 0.994 \\ \hline
4 & 0.765 & 0.829 & 0.926 \\ \hline
5 & 0.642 & 0.710 & 0.821 \\ \hline
6 & 0.560 & 0.625 & 0.740 \\ \hline
7 & 0.507 & 0.568 & 0.680 \\ \hline
8 & 0.468 & 0.526 & 0.634 \\ \hline
9 & 0.437 & 0.493 & 0.598 \\ \hline
10 & 0.412 & 0.466 & 0.568 \\ \hline
11 & 0.392 & 0.444 & 0.542 \\ \hline
12 & 0.376 & 0.426 & 0.522 \\ \hline
13 & 0.361 & 0.410 & 0.503 \\ \hline
14 & 0.349 & 0.396 & 0.488 \\ \hline
15 & 0.338 & 0.384 & 0.475 \\ \hline
\end{array}$$

If the test statistic is greater than the critical value, reject the null hypothesis:
$$Q_{TS} > Q_{CV}$$
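
A minimal sketch in R, using a hypothetical data vector, of applying the test to the smallest and largest observations with the $\alpha = 0.05$ critical value for $n = 10$ from the table above:

```r
# Hypothetical measurements
x <- c(0.189, 0.167, 0.187, 0.183, 0.186, 0.182, 0.181, 0.184, 0.181, 0.177)

x_sorted <- sort(x)
n <- length(x_sorted)

# Gaps for the suspected low and high outliers, and the overall range
gap_low  <- x_sorted[2] - x_sorted[1]
gap_high <- x_sorted[n] - x_sorted[n - 1]
range_x  <- x_sorted[n] - x_sorted[1]

Q_low  <- gap_low  / range_x
Q_high <- gap_high / range_x

Q_crit <- 0.466   # critical value for n = 10, alpha = 0.05 (table above)
c(Q_low = Q_low, Q_high = Q_high,
  reject_low = Q_low > Q_crit, reject_high = Q_high > Q_crit)
```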
@@ -0,0 +1,42 @@

---
title: "Kolmogorov-Smirnov Test"
subtitle: "Inference Procedures with R"
author: StatsResource
output:
  prettydoc::html_pretty:
    theme: cayman
    highlight: github
---

## Kolmogorov-Smirnov Test

The Kolmogorov-Smirnov test is defined by:

* H$_0$: The data follow a specified distribution
* H$_1$: The data do not follow the specified distribution

**Test Statistic:** The Kolmogorov-Smirnov test statistic is defined as

$$D = \sup_x \left| F_n(x) - F(x) \right|$$

where $F$ is the theoretical cumulative distribution of the distribution being tested, which must be a continuous distribution (i.e., no discrete distributions such as the binomial or Poisson) and must be fully specified, and $F_n$ is the empirical cumulative distribution function of the sample.

### Characteristics and Limitations of the K-S Test

#### Advantages

An attractive feature of this test is that the distribution of the K-S test statistic itself does not depend on the underlying cumulative distribution function being tested. Another advantage is that it is an exact test (the chi-square goodness-of-fit test depends on an adequate sample size for the approximations to be valid).

#### Limitations

Despite these advantages, the K-S test has several important limitations:

1. It only applies to continuous distributions.

2. It tends to be more sensitive near the center of the distribution than at the tails.

3. Perhaps the most serious limitation is that the distribution must be fully specified. That is, if location, scale, and shape parameters are estimated from the data, the critical region of the K-S test is no longer valid. It typically must be determined by simulation.

Due to limitations 2 and 3 above, many analysts prefer to use the Anderson-Darling goodness-of-fit test.

However, the Anderson-Darling test is only available for a few specific distributions.
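
A minimal sketch using base R's `ks.test()` on simulated data, illustrating both the fully specified case and the estimated-parameter case flagged in limitation 3:

```r
# Simulated (hypothetical) sample
set.seed(42)
x <- rnorm(50)

# H0: the data follow N(0, 1); the distribution is fully specified,
# so the standard K-S critical region is valid here.
ks.test(x, "pnorm", mean = 0, sd = 1)

# Estimating the parameters from the data, as below, violates
# limitation 3: the reported p-value is then no longer exact.
ks.test(x, "pnorm", mean = mean(x), sd = sd(x))
```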