From ecbd3c9d306eaa6a66e2b2704738e5b0b9b2ddd1 Mon Sep 17 00:00:00 2001 From: Francesco Fiusco Date: Tue, 4 Feb 2025 16:53:30 +0100 Subject: [PATCH] Fixed typos in linear-algebra and added JLD2 to data-science.rst --- content/data-science.rst | 96 +++++++++++++------------------------- content/linear-algebra.rst | 22 ++++----- 2 files changed, 43 insertions(+), 75 deletions(-) diff --git a/content/data-science.rst b/content/data-science.rst index cd3eb7d..dcf7cc9 100644 --- a/content/data-science.rst +++ b/content/data-science.rst @@ -23,14 +23,12 @@ Data science and machine learning Working with data ----------------- -Via Data Formats and Dataframes lesson, we explored a Julian approach -to manipulating and visualization of data. +In the Data Formats and Dataframes lesson, we explored a Julian approach +to manipulation and visualisation of data. -Julia is a good language to use for data science problems as -it will perform well and alleviate the need to translate -computationally demanding parts to another language. -Here we will learn and clustering, classification, machine learning and deep learning (toy example). Use penguin data.machine learning. +Here we will learn about clustering, classification, machine learning and deep learning with some toy examples. + Download a dataset ^^^^^^^^^^^^^^^^^^ @@ -55,50 +53,6 @@ of characteristic features of different penguin species. using PalmerPenguins -Dataframes -^^^^^^^^^^ - -.. todo:: Dataframes - - We will use `DataFrames.jl `_ - package function here to analyze the penguins dataset, but first we need to install it: - - .. code-block:: julia - - Pkg.add("DataFrames") - using DataFrames - - We now create a dataframe containing the PalmerPenguins dataset. - - .. 
code-block:: julia - - # using PalmerPenguins - table = PalmerPenguins.load() - df = DataFrame(table) - - # the raw data can be loaded by - #tableraw = PalmerPenguins.load(; raw = true) - - Summary statistics can be displayed with the ``describe`` function: - - .. code-block:: julia - - describe(df) - - .. code-block:: text - - 7×7 DataFrame - Row │ variable mean min median max nmissing eltype - │ Symbol Union… Any Union… Any Int64 Type - ─────┼────────────────────────────────────────────────────────────────────────────────────────── - 1 │ species Adelie Gentoo 0 String - 2 │ island Biscoe Torgersen 0 String - 3 │ bill_length_mm 43.9219 32.1 44.45 59.6 2 Union{Missing, Float64} - 4 │ bill_depth_mm 17.1512 13.1 17.3 21.5 2 Union{Missing, Float64} - 5 │ flipper_length_mm 200.915 172 197.0 231 2 Union{Missing, Int64} - 6 │ body_mass_g 4201.75 2700 4050.0 6300 2 Union{Missing, Int64} - 7 │ sex female male 11 Union{Missing, String} - As it was done in the Data Formats and Dataframes lesson, we can .. code-block:: julia @@ -119,7 +73,8 @@ Saving the Current Setup ------------------------ There are several ways to save the current setup in Julia. -This section will cover three methods: saving the environment, saving data as a CSV file, and saving data using JLD.jl. +This section will cover three parts: saving the environment to +have reproducible code and saving data using CSV files or ``JLD``. 1. Saving the Environment ^^^^^^^^^^^^^^^^^^^^^^^^^ @@ -144,7 +99,7 @@ This section will cover three methods: saving the environment, saving data as a This will display the list of packages in the current environment along with their versions. To save the state of your environment, Julia uses two files: ``Project.toml`` and ``Manifest.toml``. 
- The ``Project.tom`` file specifies the packages that you explicitly added to your environment, + The ``Project.toml`` file specifies the packages that you explicitly added to your environment, while the ``Manifest.toml`` file records the exact versions of these packages and all their dependencies1. When you add packages using ``Pkg.add()``, Julia automatically updates these files. @@ -162,12 +117,12 @@ This section will cover three methods: saving the environment, saving data as a 2. Saving Data as a CSV File ^^^^^^^^^^^^^^^^^^^^^^^^^^^^ -(The way we use in this lesson). +As shown in the Data Formats and DataFrames lesson, a DataFrame can easily be dumped into a CSV file using +the ``CSV.jl`` package, which also allows for reading tabular data. .. todo:: - (Include the content about saving data as a CSV file here) - You can use the CSV.jl package to save your DataFrame as a CSV file, which can be loaded later. + You can use the CSV.jl package to save a DataFrame as a CSV file, which can be re-read later. .. code-block:: julia @@ -182,35 +137,48 @@ This section will cover three methods: saving the environment, saving data as a df = CSV.read("penguins.csv", DataFrame) -3. Saving Data Using JLD.jl +3. Saving Data Using JLD/JLD2 ^^^^^^^^^^^^^^^^^^^^^^^^^^^ Another option is to use `JLD.jl `_ - The `JLD.jl` package provides a way to save and load Julia variables while preserving native types. - It is a specific "dialect" of HDF5, a cross-platform, multi-language data storage format most frequently used for scientific data. + The ``JLD.jl`` package provides a way to save and load Julia variables while preserving native types. + It is based on HDF5, a cross-platform, multi-language data storage format most frequently used for scientific data. + Unlike the pure-Julia ``JLD2`` package, it relies on the HDF5 C library through ``HDF5.jl``. - To use the `JLD.jl` module, you can start your code with `using JLD`. 
- If you want to save a few variables and don't care to use the more advanced features, then a simple syntax is: + The ``JLD`` package can be imported in the usual way: .. code-block:: julia using Pkg Pkg.add("JLD") - Now, we can save our DataFrame `df` to a JLD file. + A DataFrame can be saved to file in the following way: .. code-block:: julia using JLD save("penguins.jld", "df", df) - Here we're saving `df` as "df" within `penguins.jld`. You can load this DataFrame back in with: + Here we're saving ``df`` as "df" within ``penguins.jld``. You can load this DataFrame back in with: .. code-block:: julia df = load("penguins.jld", "df") - This will return the DataFrame `df` from the file and assign it back to `df`. + This will return the DataFrame ``df`` from the file and assign it back to ``df``. + In the past years, the ``JLD2.jl`` package came forward as an alternative to ``JLD``. It + is also based on HDF5 and can read h5 files saved by other HDF5 implementations. It exposes an interface + similar to ``JLD`` with ``save()`` and ``load()`` functions, but the more user-friendly function ``jldsave()`` + is also available: + + .. code-block:: julia + + using JLD2 + jldsave("penguins.jld2"; df) # This is equivalent to the save command above + df = load("penguins.jld2", "df") + + Moreover, a ``jldopen()`` function provides a file-like interface. More information can be found + `here `__. 
Machine learning ---------------- @@ -609,4 +577,4 @@ Quantum ^^^^^^^ - https://juliapackages.com/c/quantum-mechanics - - Swedish Quantum Society | SQS – https://swedishquantumsociety.vercel.app/ \ No newline at end of file + - Swedish Quantum Society | SQS – https://swedishquantumsociety.vercel.app/ diff --git a/content/linear-algebra.rst b/content/linear-algebra.rst index 80ffaff..6b78afc 100644 --- a/content/linear-algebra.rst +++ b/content/linear-algebra.rst @@ -22,12 +22,12 @@ Linear algebra Vectors and matrices in Julia ----------------------------- -We will start with a breif look at how we can form arrays +We will start with a brief look at how we can create arrays and vectors in Julia and how to perform vector and matrix operations. .. code-block:: julia - # range notation, list from 1 to 10 + # lazy range notation, list from 1 to 10 1:10 # make into vector @@ -36,7 +36,7 @@ and vectors in Julia and how to perform vector and matrix operations. # another way to make ranges range(1, 10) -.. code-block:: text +.. code-block:: julia-repl julia> Vector(1:10) 10-element Vector{Int64}: @@ -49,7 +49,7 @@ and vectors in Julia and how to perform vector and matrix operations. 9 10 -Picking out elements or parts of vectors and matrices can be done with sclicing as in Python or Matlab. +Indexing elements or parts of vectors and matrices can be done with slicing as in Python or Matlab. .. code-block:: julia @@ -75,7 +75,7 @@ Picking out elements or parts of vectors and matrices can be done with sclicing ones(5) # [1,1,1,1,1] ones(5,5) # 5x5-matrix of ones -.. code-block:: text +.. code-block:: julia-repl julia> u 4-element Vector{Int64}: @@ -106,7 +106,7 @@ Picking out elements or parts of vectors and matrices can be done with sclicing 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 -To perform vector and matrix operations we can use syntax similar to Matlab och Python. +To perform vector and matrix operations we can use a syntax similar to Matlab or Python. .. 
code-block:: julia @@ -145,7 +145,7 @@ To perform vector and matrix operations we can use syntax similar to Matlab och # vector matrix multiplication A*v - # matrix multiplicaiton + # matrix multiplication B = A*A # Matrix multiplication @@ -165,9 +165,9 @@ Below we will discuss Principal Component Analysis and in that context we recall here the notion of eigenvectors and eigenvalues of a square matrix :math:`M`. -.. callout:: +.. callout:: Eigendecomposition - A vector :math:`u \neq 0` is called an eigenvector of :math:`M` + A vector :math:`u \neq 0` is called an eigenvector of a square matrix :math:`M` with eigenvalue :math:`\lambda \in \mathbb{R}` if :math:`Mu=\lambda u`. Let us for illustration say that :math:`\lambda=2`. Then :math:`Mu=2u` and the linear map :math:`M` maps :math:`u` to a vector @@ -217,11 +217,11 @@ it down to a smaller dimensional space. that approximates the dataset in a least squares sense. This means that the points are as close to the linear space as possible measured in the sum of squared distances. The approximating linear space is spanned by so-called - principal components which are ordered in terms of imporance: the first + principal components which are ordered in terms of importance: the first principal component, the second principal component and so on. It turns out the principal components are eigenvectors of the so-called - covaraince matrix of the data. The corresponding eigenvalues rank the principal + covariance matrix of the data. The corresponding eigenvalues rank the principal components in importance, where the biggest eigenvalue marks the first principal component.
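
The statement that the principal components are eigenvectors of the covariance matrix can be illustrated with a minimal sketch. It is not part of the patch: the toy matrix ``X`` and the variable names are assumptions made for illustration, and only the ``LinearAlgebra`` and ``Statistics`` standard libraries are used.

.. code-block:: julia

   using LinearAlgebra, Statistics

   # Hypothetical toy data: 200 observations (rows) of 3 features (columns)
   X = randn(200, 3) * [2.0 0.0 0.0; 0.5 1.0 0.0; 0.0 0.0 0.1]

   # centre each column, then form the 3x3 covariance matrix
   Xc = X .- mean(X, dims=1)
   C = cov(Xc)

   # eigen returns the eigenvalues of a Symmetric matrix in ascending order
   E = eigen(Symmetric(C))

   # reorder so that the largest eigenvalue comes first
   order = sortperm(E.values, rev=true)
   λ = E.values[order]
   P = E.vectors[:, order]   # columns are the principal components

   # project the centred data onto the first two principal components
   Z = Xc * P[:, 1:2]

Here ``λ[1]`` belongs to the first principal component ``P[:, 1]``, and ``Z`` is the two-dimensional approximation of the dataset. A dedicated package would normally be used in practice; the sketch only makes the link to eigenvectors explicit.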