diff --git a/layouts/partials/data-science.html b/layouts/partials/data-science.html
index af6bf16569..c3993f55d8 100644
--- a/layouts/partials/data-science.html
+++ b/layouts/partials/data-science.html
@@ -9,38 +9,34 @@
 NumPy lies at the core of a rich ecosystem of data science libraries.
-
-
- Data science is the analysis of massive amounts of data
- to gain insight. A typical workflow might be:
+ A typical exploratory data science workflow might look like:
- Pandas helps in data discovery and handling,
- Intake helps with
- data access and distribution, while
- Beautiful Soup
- is widely used for web-scraping and gathering data sets.
- Seaborn is well known for
- exploratory data analysis (EDA);
- scikit-learn and
- SciPy (statistical computing) serve some
- of the backbone processes required for machine learning (regression methods,
- classification, clustering, model validation and selection).
- Statistical data exploration, estimation of various statistical models,
- and conducting statistical tests are some of the functions offered by
- statsmodels.
+
+ For high data volumes, Dask and
+ Ray are designed to scale. Stable production
+ environments rely on data versioning (DVC),
+ experiment tracking (MLFlow), and
+ workflow automation (Airflow and
+ Prefect).
- Effective data analytics requires deep knowledge of the data domain (e.g.,
- retail, healthcare, marketing, finance, social media, automation, sales, travel,
- etc.) as well as other core disciplines of data science, data engineering, and
- data visualization. Tools such as MLFlow address
- experiment hyperparameter and result tracking needs, while
- DVC provides data version control for data science
- and machine learning workflows.
-