This project loads and cleans up FT-ICR (Fourier Transform Ion Cyclotron Resonance) experimental data. The data is first requested from an online repository and downloaded locally. Fourier analysis is then performed on the raw data to generate m/z spectra. These spectra are denoised and finally undergo a peak-picking procedure in which mass peaks are highlighted for further analysis.
Requesting data from RU repository
Loading and transforming local data
Diffusion Autoencoder Model
Peak selection
The data used in this analysis is stored in the Radboud Data Repository in HDF5 format. Functions defined in RDR_request read the credentials from the config file and use them to send a request to the repository, which downloads the data locally.
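As an illustration of the credential step described above, here is a minimal sketch that reads a username and password from a config file. It assumes an INI-style config; the function name `load_credentials`, the section name `RDR`, and the keys `username` and `password` are hypothetical and may not match what RDR_request actually uses. The request to the repository itself is not shown.

```python
import configparser


def load_credentials(config_path, section="RDR"):
    """Read (username, password) from an INI-style config file.

    The section and key names are assumptions made for this sketch.
    """
    parser = configparser.ConfigParser()
    # ConfigParser.read returns the list of files it managed to parse.
    if not parser.read(config_path):
        raise FileNotFoundError(f"Could not read config file: {config_path}")
    return parser[section]["username"], parser[section]["password"]
```

Keeping credentials in a separate config file means they stay out of the source code and can be excluded from version control.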
After the .h5 files are downloaded locally, the next step is to extract the relevant data. In the main file, the files are read and the data is transformed from the transient (time) domain to the mass domain using Fourier analysis, so that m/z spectra can be plotted.
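The transformation described above can be sketched with NumPy as follows. This is a simplified illustration, not the project's implementation: the function name `transient_to_mz_spectrum` and its parameters are hypothetical, the Hann window is an added assumption, and the frequency-to-mass conversion uses the simplest one-parameter calibration m/z = A / f, whereas real instruments typically use a calibration with additional terms.

```python
import numpy as np


def transient_to_mz_spectrum(transient, sampling_rate_hz, cal_a):
    """Convert a time-domain transient into an (m/z, magnitude) spectrum.

    cal_a is the calibration constant A in the assumed relation m/z = A / f.
    """
    transient = np.asarray(transient, dtype=float)
    # Apply a Hann window to reduce spectral leakage before the FFT.
    windowed = transient * np.hanning(transient.size)
    magnitude = np.abs(np.fft.rfft(windowed))
    freqs = np.fft.rfftfreq(transient.size, d=1.0 / sampling_rate_hz)
    # Drop the zero-frequency bin, where m/z is undefined.
    freqs, magnitude = freqs[1:], magnitude[1:]
    mz = cal_a / freqs
    # High frequency corresponds to low m/z, so reverse to get ascending m/z.
    return mz[::-1], magnitude[::-1]
```

The returned arrays can be passed directly to a plotting function to draw the m/z spectrum.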
Random measurement noise is removed using a Diffusion Autoencoder Model. This is an unsupervised neural network that learns to remove random noise by adding its own artificial noise to the spectra and comparing the noisy and original versions. The denoising step is accompanied by a classification step that uses Principal Component Analysis (PCA) and a random forest classifier.
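The idea of adding artificial noise and comparing against the original can be sketched without a neural network. The example below only shows the corruption step and a mean-squared-error comparison for an arbitrary denoising function; it is not the Diffusion Autoencoder Model itself. The function names, the Gaussian noise model, and the choice of mean squared error are assumptions made for illustration.

```python
import numpy as np


def add_gaussian_noise(spectrum, noise_std, rng):
    """Return a copy of the spectrum with artificial Gaussian noise added."""
    return spectrum + rng.normal(0.0, noise_std, size=spectrum.shape)


def reconstruction_loss(denoiser, clean, noise_std, rng):
    """Mean squared error between the denoised noisy spectrum and the original.

    denoiser is any callable that maps a noisy spectrum to a cleaned one.
    """
    noisy = add_gaussian_noise(clean, noise_std, rng)
    return float(np.mean((denoiser(noisy) - clean) ** 2))
```

In a trained model, the network's weights would be adjusted to minimize a loss of this kind over many spectra. Because the clean target is the spectrum itself, no manual labels are needed.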
Lastly, the peaks that remain after denoising are highlighted using the Isotope Prediction file. Spectra are generated that contain all relevant peaks together with their corresponding m/z values, which allows for manual analysis of the results.
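As a rough illustration of peak picking, the sketch below marks local maxima above an intensity threshold and returns their m/z values. This is a generic threshold-based approach, not necessarily the method used in the Isotope Prediction file; the function name `pick_peaks` and the `min_intensity` parameter are hypothetical.

```python
import numpy as np


def pick_peaks(mz, intensity, min_intensity):
    """Return (m/z, intensity) pairs for local maxima above a threshold."""
    mz = np.asarray(mz, dtype=float)
    intensity = np.asarray(intensity, dtype=float)
    interior = intensity[1:-1]
    # A point is a peak if it rises above its left neighbour, does not fall
    # below its right neighbour, and reaches the minimum intensity.
    is_peak = (
        (interior > intensity[:-2])
        & (interior >= intensity[2:])
        & (interior >= min_intensity)
    )
    indices = np.nonzero(is_peak)[0] + 1  # shift back to full-array indices
    return list(zip(mz[indices].tolist(), intensity[indices].tolist()))
```

For example, `pick_peaks([100, 101, 102, 103, 104, 105, 106], [0, 5, 1, 0.5, 9, 2, 0], min_intensity=2)` returns `[(101.0, 5.0), (104.0, 9.0)]`.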