content/dataformats-dataframes.rst
+96 -98 (96 additions and 98 deletions)
@@ -48,12 +48,12 @@ You can perform various operations on a DataFrame, such as filtering,
48
48
sorting, grouping, joining, and aggregating data.
49
49
50
50
The `DataFrames.jl <https://dataframes.juliadata.org/stable/>`_
51
-
package is Julia's version of the ``pandas`` library in Python and
51
+
package offers functionality similar to the ``pandas`` library in Python and
52
52
the ``data.frame()`` function in R.
53
-
DataFrames.jl also provides a rich set of functions for data cleaning,
53
+
``DataFrames.jl`` also provides a rich set of functions for data cleaning,
54
54
transformation, and visualization, making it a popular choice for
55
55
data science and machine learning tasks in Julia. Just like in Python and R,
56
-
the DataFrames.jl package provides functionality for data manipulation and analysis.
56
+
the ``DataFrames.jl`` package provides functionality for data manipulation and analysis.
57
57
58
58
59
59
Download a dataset
@@ -68,7 +68,7 @@ of characteristic features of different penguin species.
68
68
69
69
Artwork by @allison_horst
70
70
71
-
To obtain the data we simply add the PalmerPenguins package.
71
+
The dataset is bundled within the ``PalmerPenguins`` package, so we need to add that:
72
72
73
73
.. code-block:: julia
74
74
@@ -90,9 +90,9 @@ Here's how you can create a new dataframe:
90
90
using DataFrames
91
91
names = ["Ali", "Clara", "Jingfei", "Stefan"]
92
92
age = ["25", "39", "14", "45"]
93
-
df = DataFrame(; name=names, age=age)
93
+
df = DataFrame(name=names, age=age)
94
94
95
-
.. code-block:: text
95
+
.. code-block:: julia-repl
96
96
97
97
4×2 DataFrame
98
98
Row │ name age
@@ -105,24 +105,46 @@ Here's how you can create a new dataframe:
105
105
106
106
.. todo:: Dataframes
107
107
108
-
The following code loads the `PalmerPenguins` dataset into a DataFrame.
109
-
Then it demonstrates how to write and read the data in CSV, JSON, and
110
-
Parquet formats using the `CSV`, `JSONTables`, and `Parquet` packages respectively.
108
+
The following code loads the ``PalmerPenguins`` dataset into a DataFrame.
111
109
112
-
More about `Types of scientific data` one can find at `ENCCS High Performance Data Analytics in Python <https://enccs.github.io/hpda-python/scientific-data/#types-of-scientific-data>`_ training.
110
+
.. code-block:: julia
113
111
114
-
.. tabs::
112
+
using DataFrames, PalmerPenguins
113
+
# Load the PalmerPenguins dataset
114
+
table = PalmerPenguins.load()
115
+
df = DataFrame(table);
116
+
# The raw data can be loaded with:
117
+
# tableraw = PalmerPenguins.load(; raw = true)
115
118
116
-
.. tab:: DataFrame
117
-
.. code-block:: julia
119
+
first(df, 5)
120
+
121
+
.. code-block:: text
122
+
123
+
344×7 DataFrame
124
+
Row │ species island bill_length_mm bill_depth_mm flipper_length_mm body_mass_g sex
- **Long format**: In this format, each row is a single observation, and each column is a variable. This format is also known as "tidy" data.
291
275
- **Wide format**: In this format, each row is a subject, and each column is an observation. This format is also known as "spread" data.
292
276
293
-
The `DataFrames.jl` package provides functions to reshape data between long and wide formats. These functions are `stack`, `unstack`, `melt`, and `pivot`.
The ``DataFrames.jl`` package provides functions to reshape data between long and wide formats: ``stack`` converts wide data to long, and ``unstack`` converts long data to wide. Older tutorials also mention ``melt``, which ``stack`` has superseded.
278
+
Further examples can be found in the `official documentation <https://dataframes.juliadata.org/stable/man/reshaping_and_pivoting/>`__.
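
The round trip between the two formats can be sketched as follows. This is a minimal illustration on a made-up two-row table (the ``id`` column and the values are assumptions chosen for the example, not taken from the penguin dataset):

.. code-block:: julia

   using DataFrames

   # Hypothetical toy data in wide format: one row per subject,
   # one column per measurement
   wide = DataFrame(id = [1, 2],
                    bill_length_mm = [39.1, 46.5],
                    bill_depth_mm = [18.7, 17.9])

   # Wide -> long: the two measurement columns are folded into
   # a `variable` column (the old column name) and a `value` column
   long = stack(wide, [:bill_length_mm, :bill_depth_mm])

   # Long -> wide: spread `variable` back into separate columns,
   # using `id` to identify each row
   wide_again = unstack(long, :id, :variable, :value)

Here ``long`` has one row per (``id``, measurement) pair, and ``wide_again`` holds the same values as ``wide``.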
In this example, `groupby(df, [:species, :island])` groups your DataFrame by the `species` and `island` columns.
333
-
Then, `combine(df_grouped, :body_mass_g => mean)` calculates the mean of the `body_mass_g` column for each group.
334
-
The `mean` function is used for aggregation.
335
327
336
-
The result is a new DataFrame where each unique value in the `:species` column forms a row, each unique
337
-
value in the `:island` column forms a column, and the mean body mass for each species-island combination fills the DataFrame.
328
+
In this example, ``groupby(df, [:species, :island])`` groups the DataFrame by the ``species`` and ``island`` columns.
329
+
Then, ``combine(df_grouped, :body_mass_g => mean)`` calculates the mean of the ``:body_mass_g`` column for each group.
330
+
The ``mean`` function is used for aggregation.
338
331
339
-
Note that if you don't provide an aggregation function and there are multiple values for a given row-column combination,
340
-
`pivot` will throw an error. To handle this, you can provide an aggregation function like `mean`, `sum`, etc.,
341
-
which will be applied to all values that fall into each cell of the resulting DataFrame.
332
+
The result is a new DataFrame with one row for each unique ``:species``-``:island`` combination,
333
+
and a ``body_mass_g_mean`` column that holds the mean body mass for that combination.
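
The same split-apply-combine pattern can be tried on a small hand-made table. The sketch below uses invented values (not the real penguin measurements) so that the result is easy to check by hand:

.. code-block:: julia

   using DataFrames
   using Statistics  # provides `mean`

   # Hypothetical toy data: two species-island combinations, two birds each
   df = DataFrame(species = ["Adelie", "Adelie", "Gentoo", "Gentoo"],
                  island = ["Torgersen", "Torgersen", "Biscoe", "Biscoe"],
                  body_mass_g = [3700.0, 3800.0, 5000.0, 5200.0])

   # Split the rows into groups, one per species-island combination
   df_grouped = groupby(df, [:species, :island])

   # Apply `mean` to `body_mass_g` within each group and combine the results;
   # the output column is named `body_mass_g_mean` automatically
   result = combine(df_grouped, :body_mass_g => mean)

If a column contains ``missing`` values, ``mean`` returns ``missing`` for the affected groups. One common way around this is to skip them explicitly, for example ``combine(df_grouped, :body_mass_g => mean ∘ skipmissing => :mean_mass)``, where ``:mean_mass`` is just an illustrative output name.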
342
334
343
335
344
-
Creating and merging DataFrames like in SQL
345
-
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
336
+
(Optional) Creating and merging DataFrames
337
+
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
346
338
347
339
Creating DataFrames
340
+
~~~~~~~~~~~~~~~~~~~
348
341
349
-
In Julia, you can create a DataFrame from scratch using the `DataFrame` constructor from the `DataFrames` package.
342
+
In Julia, you can create a DataFrame from scratch using the ``DataFrame`` constructor from the ``DataFrames`` package.
350
343
This constructor allows you to create a DataFrame by passing column vectors as keyword arguments or pairs.
351
-
For example, to create a DataFrame with two columns named `:A` and `:B`, you can use the following code:
352
-
`DataFrame(A = 1:3, B = ["x", "y", "z"])`
353
-
You can also create a DataFrame from other data structures such as dictionaries, named tuples, vectors of vectors, matrices, and more.
344
+
For example, to create a DataFrame with two columns named ``:A`` and ``:B``, the following works:
345
+
346
+
``DataFrame(A = 1:3, B = ["x", "y", "z"])``
347
+
348
+
A DataFrame can also be created from other data structures such as dictionaries, named tuples, vectors of vectors, matrices, and more.
354
349
You can find more information about creating DataFrames in Julia in the `official documentation <https://dataframes.juliadata.org/stable/man/getting_started/>`__.
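
A few of these construction routes are sketched below. The variable names are illustrative only, and the list is not exhaustive:

.. code-block:: julia

   using DataFrames

   # Keyword arguments: column name = column vector
   df_kw = DataFrame(A = 1:3, B = ["x", "y", "z"])

   # Pairs: handy when the column names are stored as strings
   df_pairs = DataFrame("A" => 1:3, "B" => ["x", "y", "z"])

   # A vector of named tuples: one named tuple per row
   df_rows = DataFrame([(A = 1, B = "x"), (A = 2, B = "y"), (A = 3, B = "z")])

   # A matrix, with the column names given explicitly
   df_mat = DataFrame([1 2; 3 4; 5 6], [:A, :B])

The first three calls build the same table; the last one builds a 3×2 table of integers.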
355
350
356
351
Merging DataFrames
352
+
~~~~~~~~~~~~~~~~~~
357
353
358
-
Also, you can merge two or more DataFrames using the `join` function from the `DataFrames` package.
354
+
You can also merge DataFrames using the join functions from the ``DataFrames`` package: ``innerjoin``, ``leftjoin``, ``rightjoin``, ``outerjoin``, ``semijoin``, and ``antijoin``.
359
355
This function allows you to perform various types of joins, such as inner join, left join, right join, outer join, semi join, and anti join.
360
-
You can specify the columns used to determine which rows should be combined during a join by passing them as the `on` argument to the `join` function.
361
-
For example, to perform an inner join on two DataFrames `df1` and `df2` using the `:ID` column as the key, you can use the following code: `join(df1, df2, on = :ID, kind = :inner)`.
362
-
You can find more information about joining DataFrames in Julia in the `official documentation <https://dataframes.juliadata.org/stable/man/joins/>`_
356
+
You can specify the columns used to determine which rows should be combined during a join by passing them as the ``on`` keyword argument of the join function you call.
357
+
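For example, to perform an inner join on two DataFrames ``df1`` and ``df2`` using the ``:ID`` column as the key, you can use the following code:
358
+
``innerjoin(df1, df2, on = :ID)``. Older releases of ``DataFrames.jl`` wrote this as ``join(df1, df2, on = :ID, kind = :inner)``; the dedicated join functions have replaced that form.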
359
+
You can find more information about joining DataFrames in Julia in the `official documentation <https://dataframes.juliadata.org/stable/man/joins/>`__.
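
The difference between the join types is easiest to see on two small tables that share only some keys. The sketch below uses made-up data; ``df1``, ``df2``, and their contents are assumptions for illustration:

.. code-block:: julia

   using DataFrames

   # Hypothetical tables: IDs 2 and 3 appear in both, 1 and 4 in only one
   df1 = DataFrame(ID = [1, 2, 3], name = ["Ali", "Clara", "Jingfei"])
   df2 = DataFrame(ID = [2, 3, 4], age = [39, 14, 45])

   # Keep only the rows whose ID occurs in both tables
   inner = innerjoin(df1, df2, on = :ID)

   # Keep every row of df1; `age` is `missing` where df2 has no match
   left = leftjoin(df1, df2, on = :ID)

   # Keep every row of both tables, filling the gaps with `missing`
   outer = outerjoin(df1, df2, on = :ID)

   # Keep the rows of df1 that have no match in df2
   anti = antijoin(df1, df2, on = :ID)

The row order of a join result is not something to rely on in general, so sort the result explicitly if the order matters.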