Commit e40f51f

docs: Fix a few typos (#94)
There are small typos in:

- code/ch01/README.md
- code/ch12/optional-streamlined-neuralnet.py
- faq/ai-and-ml.md
- faq/naive-bayes-vartypes.md
- faq/scale-training-test.md

Fixes:

- Should read `were` rather than `weere`.
- Should read `occurrences` rather than `occurences`.
- Should read `laborious` rather than `labrorious`.
- Should read `initializing` rather than `initalizing`.
- Should read `distribution` rather than `distribtion`.
- Should read `dataset` rather than `datast`.
1 parent 23853c5 · commit e40f51f

5 files changed (+5 −5 lines)

code/ch01/README.md

+1 −1

@@ -93,7 +93,7 @@ The version numbers of the major Python packages that were used for writing this
 
 ## Python/Jupyter Notebook
 
-Some readers weere wondering about the `.ipynb` of the code files -- these files are IPython notebooks. I chose IPython notebooks over plain Python `.py` scripts, because I think that they are just great for data analysis projects! IPython notebooks allow us to have everything in one place: Our code, the results from executing the code, plots of our data, and documentation that supports the handy Markdown and powerful LaTeX syntax!
+Some readers were wondering about the `.ipynb` of the code files -- these files are IPython notebooks. I chose IPython notebooks over plain Python `.py` scripts, because I think that they are just great for data analysis projects! IPython notebooks allow us to have everything in one place: Our code, the results from executing the code, plots of our data, and documentation that supports the handy Markdown and powerful LaTeX syntax!
 
 ![](./images/ipynb_ex1.png)

code/ch12/optional-streamlined-neuralnet.py

+1 −1

@@ -25,7 +25,7 @@ class NeuralNetMLP(object):
     minibatche_size : int (default: 1)
         Number of training samples per minibatch.
     seed : int (default: None)
-        Random seed for initalizing weights and shuffling.
+        Random seed for initializing weights and shuffling.
 
     Attributes
     -----------
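
As an aside on that `seed` parameter: the usual pattern is to feed it into a single NumPy `RandomState` that drives both the weight initialization and the shuffling. Here is a minimal sketch of that idea; the attribute and method names are illustrative, not necessarily the ones in the book's full implementation:

```python
import numpy as np

class NeuralNetMLP(object):
    def __init__(self, n_hidden=30, minibatch_size=1, seed=None):
        # One seeded RandomState makes both weight initialization and
        # epoch-wise shuffling reproducible across runs.
        self.random = np.random.RandomState(seed)
        self.n_hidden = n_hidden
        self.minibatch_size = minibatch_size

    def _init_weights(self, n_features):
        # Small random initial weights drawn from the seeded generator.
        return self.random.normal(loc=0.0, scale=0.1,
                                  size=(n_features, self.n_hidden))
```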

faq/ai-and-ml.md

+1 −1

@@ -1,6 +1,6 @@
 # How are Artificial Intelligence and Machine Learning related?
 
-Artifical Intellicence (AI) started as a subfield of computer science with the focus on solving tasks that humans can but computers can't do (for instance, image recognition). AI can be approached in many ways, for example, writing a computer program that implements a set of rules devised by domain experts. Now, hand-crafting rules can be very labrorious and time consuming.
+Artifical Intellicence (AI) started as a subfield of computer science with the focus on solving tasks that humans can but computers can't do (for instance, image recognition). AI can be approached in many ways, for example, writing a computer program that implements a set of rules devised by domain experts. Now, hand-crafting rules can be very laborious and time consuming.
 
 The field of machine learning -- originally, we can consider it as a subfield of AI -- was concerned with the development of algorithms so that computers can automatically learn (predictive) models from data.

faq/naive-bayes-vartypes.md

+1 −1

@@ -42,7 +42,7 @@ To come back to the original question, let us consider the multi-variate Bernoul
 
 
 We use the Bernoulli distribution to compute the likelihood of a binary variable.
-For example, we could estimate P(x<sub>k</sub>=1 | &omega;<sub>j</sub>) via MLE as the frequency of occurences in the training set:
+For example, we could estimate P(x<sub>k</sub>=1 | &omega;<sub>j</sub>) via MLE as the frequency of occurrences in the training set:
 &theta; = P&#770;(x<sub>k</sub>=1 | &omega;<sub>j</sub>) = N<sub>x<sub>k</sub>, &omega;<sub>j</sub></sub> / N<sub> &omega;<sub>j</sub></sub>
 which reads "number of training samples in class &omega;<sub>j</sub> that have the property x<sub>k</sub>=1 (N<sub>x<sub>k</sub>, &omega;<sub>j</sub></sub>) divided by by all training samples in &omega;<sub>j</sub></sub> (N<sub> &omega;<sub>j</sub></sub>)." In context of text classification, this is basically the set of documents in class &omega;<sub>j</sub> that contain a particular word divided by all documents in &omega;<sub>j</sub>.
 Now, we can compute the likelihood of the binary feature vector **x** given class &omega;<sub>j</sub> as
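
To make the estimate above concrete, here is a small NumPy sketch of the Bernoulli MLE and the resulting likelihood; the toy data and variable names are mine, not from the FAQ:

```python
import numpy as np

# Toy binary document-term matrix: rows = documents, columns = words
# (1 = word occurs in the document, 0 = it does not).
X = np.array([[1, 0, 1],
              [1, 1, 0],
              [0, 1, 1],
              [1, 0, 0]])
y = np.array([0, 0, 1, 1])  # class label omega_j for each document

for j in np.unique(y):
    X_j = X[y == j]
    # MLE: number of class-j documents containing word k,
    # divided by the total number of class-j documents.
    theta = X_j.sum(axis=0) / X_j.shape[0]
    # Bernoulli likelihood of a new binary vector x given class j:
    # P(x | omega_j) = prod_k theta_k^x_k * (1 - theta_k)^(1 - x_k)
    x_new = np.array([1, 0, 1])
    likelihood = np.prod(theta**x_new * (1 - theta)**(1 - x_new))
    print(f"class {j}: theta={theta}, P(x_new|class)={likelihood:.3f}")
```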

faq/scale-training-test.md

+1 −1

@@ -59,7 +59,7 @@ Now, let's say our model has learned the following hypotheses: It classifies sam
 - sample5: 6 cm -> class ?
 - sample6: 7 cm -> class ?
 
-If we look at the "unstandardized “length in cm" values in our training datast, it is intuitive to say that all of these samples are likely belonging to class 2. However, if we standardize these by re-computing the *standard deviation* and and *mean* from the new data, we would get similar values as before (i.e., properties of a standard normal distribtion) in the training set and our classifier would (probably incorrectly) assign the “class 2” label to the samples 4 and 5.
+If we look at the "unstandardized “length in cm" values in our training dataset, it is intuitive to say that all of these samples are likely belonging to class 2. However, if we standardize these by re-computing the *standard deviation* and and *mean* from the new data, we would get similar values as before (i.e., properties of a standard normal distribution) in the training set and our classifier would (probably incorrectly) assign the “class 2” label to the samples 4 and 5.
 
 - sample5: -1.21 -> class 2
 - sample6: 0 -> class 2
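
A minimal NumPy sketch of the practice this FAQ argues for: estimate the *mean* and *standard deviation* on the training set once, then reuse those parameters for new samples (the numbers below are made up for illustration):

```python
import numpy as np

# Made-up training lengths (cm), plus the new samples 5 and 6
# from the example above.
X_train = np.array([[10.0], [20.0], [30.0]])
X_new = np.array([[6.0], [7.0]])

# Compute mean and standard deviation on the TRAINING data only ...
mu, sigma = X_train.mean(axis=0), X_train.std(axis=0)

# ... and reuse those parameters for new data. Re-estimating them on
# X_new would map 6 cm and 7 cm back onto a standard normal scale and
# hide the fact that they lie far below the training distribution.
X_train_std = (X_train - mu) / sigma
X_new_std = (X_new - mu) / sigma
print(X_new_std)
```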
