A collection of three hands-on projects exploring core linear algebra concepts and their real-world applications, implemented in Python using Jupyter Notebooks.
Gram-Schmidt Process (Gram-Schmidt process.ipynb)
Implements the Gram-Schmidt orthogonalization algorithm from scratch. Given a set of linearly independent vectors, the algorithm constructs an orthonormal basis for the spanned subspace. Key concepts covered:
- Orthogonal projection and vector decomposition
- Iterative orthonormalization
- Detecting the dimension of a spanned subspace
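
The steps above can be sketched in a few lines of NumPy. This is a minimal illustration, not the notebook's code: the function name `gram_schmidt`, the tolerance `tol`, and the sample vectors are assumptions chosen for the example. It uses the modified Gram-Schmidt variant (projecting the running residual), which is usually more stable numerically.

```python
import numpy as np

def gram_schmidt(vectors, tol=1e-10):
    """Return an orthonormal basis (as rows) for the span of `vectors`."""
    basis = []
    for v in np.asarray(vectors, dtype=float):
        # Remove the component of v along each basis vector found so far.
        w = v.copy()
        for q in basis:
            w -= np.dot(q, w) * q
        norm = np.linalg.norm(w)
        # A near-zero residual means v depends linearly on earlier vectors,
        # so it adds no new direction and is skipped.
        if norm > tol:
            basis.append(w / norm)
    return np.array(basis)

# Hypothetical input: the third vector is the sum of the first two,
# so the spanned subspace is two-dimensional.
vectors = [[1.0, 1.0, 0.0],
           [1.0, 0.0, 1.0],
           [2.0, 1.0, 1.0]]
Q = gram_schmidt(vectors)
print("subspace dimension:", Q.shape[0])
```

The number of rows in `Q` is the dimension of the spanned subspace, and `Q @ Q.T` should be close to the identity.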
Random Projections (random_projections.ipynb)
Explores orthogonal projections onto 1D and N-dimensional subspaces, applied to real Persian digit image data (TrainData.txt). Tasks include:
- Implementing 1D and ND projection matrices using the formula $\pi_U(\mathbf{x}) = \mathbf{B}(\mathbf{B}^T\mathbf{B})^{-1}\mathbf{B}^T\mathbf{x}$
- Projecting image vectors onto random subspaces
- Finding the minimum subspace dimension that preserves image recognizability
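
As a rough sketch of the projection formula, the snippet below builds the projection matrix for a random basis and applies it to a vector. The function name `projection_matrix`, the dimensions `D` and `M`, and the random stand-in for an image vector are assumptions made for illustration; the notebook works with the vectors in TrainData.txt instead.

```python
import numpy as np

def projection_matrix(B):
    """Projection matrix onto the column space of B (columns assumed independent)."""
    B = np.asarray(B, dtype=float)
    # P = B (B^T B)^{-1} B^T; solve() avoids forming the inverse explicitly.
    return B @ np.linalg.solve(B.T @ B, B.T)

rng = np.random.default_rng(0)
D, M = 64, 10                      # ambient and subspace dimensions (illustrative)
B = rng.standard_normal((D, M))    # random basis for an M-dimensional subspace
x = rng.standard_normal(D)         # stand-in for a flattened image vector

P = projection_matrix(B)
x_proj = P @ x                     # closest point to x inside the subspace
print("residual norm:", np.linalg.norm(x - x_proj))
```

The 1D case is the same call with a single-column `B`. Increasing `M` and inspecting `x_proj` is one way to look for the smallest subspace dimension that keeps an image recognizable.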
PCA (PCA.ipynb)
Implements PCA step by step on the Fashion-MNIST dataset (fashion_MNIST.zip). The pipeline covers:
- Data standardization — zero mean, unit variance normalization
- Covariance matrix computation — capturing pairwise feature relationships
- Eigendecomposition — extracting principal components from the symmetric covariance matrix
- Dimensionality reduction — projecting data onto the top-k eigenvectors
- Reconstruction — recovering images from compressed representations and finding the minimum number of components needed for recognizable reconstruction
- Visualization — 2D and 3D scatter plots of projected data
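
The pipeline above might look roughly like this in NumPy. It is a sketch under stated assumptions: `pca_reconstruct` is a hypothetical helper name, and the small synthetic matrix stands in for the flattened Fashion-MNIST images, which are not loaded here.

```python
import numpy as np

def pca_reconstruct(X, k):
    """Standardize X, project onto the top-k principal components, and map back."""
    # Standardization: zero mean and unit variance per feature.
    mu = X.mean(axis=0)
    sigma = X.std(axis=0)
    sigma = np.where(sigma == 0, 1.0, sigma)  # leave constant features unscaled
    Z = (X - mu) / sigma

    # Covariance matrix and its eigendecomposition (eigh: symmetric input).
    S = np.cov(Z, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(S)      # eigenvalues in ascending order
    order = np.argsort(eigvals)[::-1][:k]     # indices of the k largest
    W = eigvecs[:, order]                     # shape (D, k)

    codes = Z @ W                             # dimensionality reduction
    X_rec = (codes @ W.T) * sigma + mu        # reconstruction in original units
    return codes, X_rec, eigvals[order]

# Synthetic stand-in data: 200 samples, 6 features, roughly rank 3.
rng = np.random.default_rng(0)
X = rng.standard_normal((200, 3)) @ rng.standard_normal((3, 6))
X += 0.01 * rng.standard_normal(X.shape)

codes, X_rec, top_vals = pca_reconstruct(X, k=3)
_, X_full, _ = pca_reconstruct(X, k=6)
print("relative error, k=3:", np.linalg.norm(X - X_rec) / np.linalg.norm(X))
```

With `k` equal to the number of features the reconstruction is exact up to rounding, so sweeping `k` upward and viewing `X_rec` gives a simple way to judge how many components are needed.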
PageRank (PageRank.ipynb)
Implements Google's PageRank algorithm using a NetworkX baseline and three independent methods, applied to a real web-graph dataset:
- NetworkX built-in: baseline using networkx.algorithms.pagerank
- Eigendecomposition: computing the L1-normalized eigenvector of the Google matrix corresponding to eigenvalue 1
- Power Method: iterative approximation via $b_{k+1} = \frac{A b_k}{\|A b_k\|}$ until convergence
- Random Walk: stochastic simulation of a surfer traversing the graph
The Google matrix incorporates a damping factor, which models a surfer who occasionally jumps to a random page instead of following a link.
Results are visualized as directed graphs with node sizes proportional to rank.
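
To make the eigendecomposition and Power Method variants concrete, here is a small sketch on a toy graph. Everything specific in it is an assumption: the function names, the four-node adjacency matrix, and the damping value `d=0.85` (a commonly used default, not necessarily the value used in the notebook).

```python
import numpy as np

def google_matrix(adj, d=0.85):
    """Build a column-stochastic Google matrix; adj[i, j] = 1 means i links to j."""
    A = np.asarray(adj, dtype=float)
    n = A.shape[0]
    deg = A.sum(axis=1)[:, None]               # out-degree of each node
    # Row-normalize; dangling nodes (no out-links) jump uniformly to any node.
    rows = np.where(deg > 0, A / np.where(deg > 0, deg, 1.0), 1.0 / n)
    return d * rows.T + (1 - d) / n * np.ones((n, n))

def power_method(G, tol=1e-12, max_iter=1000):
    """Iterate b <- G b / ||G b||_1 until the L1 change drops below tol."""
    b = np.full(G.shape[0], 1.0 / G.shape[0])
    for _ in range(max_iter):
        b_next = G @ b
        b_next /= np.abs(b_next).sum()
        if np.abs(b_next - b).sum() < tol:
            return b_next
        b = b_next
    return b

def eigen_rank(G):
    """L1-normalized eigenvector for the dominant eigenvalue (which is 1)."""
    vals, vecs = np.linalg.eig(G)
    v = np.real(vecs[:, np.argmax(np.real(vals))])
    return v / v.sum()

# Hypothetical toy graph; node 3 is dangling.
adj = [[0, 1, 1, 0],
       [0, 0, 1, 0],
       [1, 0, 0, 0],
       [0, 0, 0, 0]]
G = google_matrix(adj)
ranks_power = power_method(G)
ranks_eig = eigen_rank(G)
print(np.round(ranks_power, 4))
```

Because every entry of `G` is positive, both approaches should agree on the same rank vector up to numerical tolerance.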
TextRank (TextRank.ipynb)
Applies the PageRank idea to NLP for unsupervised keyword and sentence extraction, based on the Mihalcea & Tarau (2004) paper. Applied to two fairy tale texts (Cinderella and Beauty and the Beast):
- Tokenization and part-of-speech tagging using NLTK
- Co-occurrence graph construction with a sliding word window
- Weighted PageRank via the Power Method for keyword ranking
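
The keyword-ranking steps can be sketched without any NLP dependencies. Note the substitutions: the notebook uses NLTK for tokenization and part-of-speech tagging, while this example swaps in a regex tokenizer and a tiny hand-written stopword set. The function names, the window size, the damping value `d=0.85`, and the sample sentence are all assumptions for illustration.

```python
import re
from collections import defaultdict
import numpy as np

# Stand-in for POS filtering: a small, hypothetical stopword set.
STOPWORDS = {"the", "a", "an", "and", "of", "to", "in", "was",
             "her", "she", "with", "at", "for"}

def cooccurrence_weights(words, window=2):
    """Count how often two distinct words appear within `window` positions."""
    weights = defaultdict(float)
    for i, w in enumerate(words):
        for u in words[i + 1 : i + window + 1]:
            if u != w:
                weights[(w, u)] += 1.0
                weights[(u, w)] += 1.0   # undirected graph: store both directions
    return weights

def textrank_keywords(text, window=2, d=0.85, tol=1e-10, max_iter=500):
    words = [w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOPWORDS]
    vocab = sorted(set(words))
    index = {w: i for i, w in enumerate(vocab)}
    n = len(vocab)
    W = np.zeros((n, n))
    for (u, v), wt in cooccurrence_weights(words, window).items():
        W[index[v], index[u]] = wt       # column u holds edge weights leaving u
    col_sums = W.sum(axis=0)
    # Column-normalize the weights; isolated words spread rank uniformly.
    M = np.where(col_sums > 0, W / np.where(col_sums > 0, col_sums, 1.0), 1.0 / n)
    G = d * M + (1 - d) / n
    b = np.full(n, 1.0 / n)
    for _ in range(max_iter):            # Power Method on the weighted graph
        b_next = G @ b
        b_next /= b_next.sum()
        if np.abs(b_next - b).sum() < tol:
            break
        b = b_next
    return sorted(zip(vocab, b_next), key=lambda pair: -pair[1])

# Made-up sample text, not taken from the data files.
text = ("Cinderella lost her glass slipper at the ball. "
        "The prince found the glass slipper and searched for Cinderella.")
ranking = textrank_keywords(text)
for word, score in ranking[:3]:
    print(f"{word}: {score:.3f}")
```

Words that co-occur with many other well-connected words rise to the top, which is the same intuition as PageRank applied to links between pages.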

Repository structure:

    Linear_Algebra_Projects/
    ├── Project1/
    │   ├── Gram-Schmidt process.ipynb
    │   ├── random_projections.ipynb
    │   └── TrainData.txt                 # Persian digit image vectors
    ├── Project2/
    │   ├── PCA.ipynb
    │   └── fashion_MNIST.zip             # Fashion-MNIST dataset
    └── Project3/
        ├── PageRank.ipynb
        ├── TextRank.ipynb
        ├── graphs.py                     # Graph loading & visualization utilities
        ├── page_rank.py                  # PageRank helper functions
        ├── text_rank.py                  # TextRank / NLP utilities
        └── data/
            ├── Beauty_and_the_Beast.txt
            └── Cinderalla.txt

Requirements:
- Python 3.9+
- numpy
- pandas
- matplotlib
- networkx
- nltk
- scikit-learn (for PCA comparison / data loading)
- Jupyter Notebook / JupyterLab

Install all dependencies:

    pip install numpy pandas matplotlib networkx nltk scikit-learn jupyter

For TextRank, also download the required NLTK data:

    import nltk
    nltk.download('punkt')
    nltk.download('averaged_perceptron_tagger')

Clone the repository and launch Jupyter:

    git clone https://github.com/Sepovsky/Linear_Algebra_Projects.git
    cd Linear_Algebra_Projects
    jupyter notebook

Then open any .ipynb file inside Project1/, Project2/, or Project3/ and run the cells sequentially.