DOC: Emphasize NumPy in Ecosystem openers #242

Merged
merged 2 commits into from
May 22, 2020
25 changes: 12 additions & 13 deletions layouts/partials/array-libraries.html
@@ -1,17 +1,16 @@
<!-- Array libraries Tab Content -->
<li class="array-libraries">
<p>
Numpy array forms the core of the organically growing numeric
Python <b>array library</b> ecosystem that now supports GPUs, sparse,
distributed arrays and more.
</p>
<p>
Several of these newer libraries such as CuPy, Sparse and Dask,
implement the NumPy API adding support for modern user cases,
newer hardware and higher scalability of array computing. Other
array libraries such as Xarray, TensorLy consume NumPy API and
build newer functionality on top of it, thus enhancing array
computing in Python beyond Numpy capabilities.
When libraries emerge to exploit new
hardware technologies and architectures, they take
NumPy as their starting point.
<a href="https://cupy.chainer.org">CuPy</a>,
<a href="https://sparse.pydata.org/en/latest/">Sparse</a>, and
<a href="https://dask.org/">Dask</a>
implement the NumPy API with support for modern user cases and
scalable hardware;
<a href="https://xarray.pydata.org/en/stable/index.html">Xarray</a> and
<a href="http://tensorly.org/stable/home.html">Tensor.ly</a> add newer functionality.
</p>
<table>
<tr class="highlight-th">
@@ -56,7 +55,7 @@
astronomy, satellite imagery and mobile network modeling.</td>
</tr>
<tr>
<td><img class="first-column-layout" src="images/content_images/arlib/CuPy.png" alt="CuPy"></td>
<td><img class="first-column-layout" src="images/content_images/arlib/cupy.png" alt="CuPy"></td>
<td class="full-center-text"><a href="https://cupy.chainer.org">CuPy</a></td>
<td class="left-text">NumPy-compatible matrix library accelerated by CUDA used to implement Neural Networks
for Deep Learning.</td>
@@ -82,7 +81,7 @@
</tr>
<tr>
<td><img class="first-column-layout" src="images/content_images/arlib/xtensor.png" alt="xtensor"></td>
<td class="full-center-text"><a href="" https://github.com/xtensor-stack/xtensor-python>xtensor </a> </td>
<td class="full-center-text"><a href="https://github.com/xtensor-stack/xtensor-python">xtensor</a> </td>
<td class="left-text">Multi-dimensional arrays with broadcasting and lazy computing for numerical
analysis.</td>
</tr>
63 changes: 40 additions & 23 deletions layouts/partials/data-science.html
@@ -8,27 +8,44 @@
</div>
<div>
<p>
Data Science makes it possible to analyze massive amounts of data
and gain meaningful insights. A typical data science workflow involves
various techniques and tools such as:
NumPy lies at the core of a rich ecosystem of data science libraries.
</p>
<p>
Data science is the analysis of massive amounts of data
to gain insight. A typical workflow might be:

<ul class="content-tab">
<li><b>Extract, Transform, Load (ETL):</b> Pandas, Beautiful Soup, Intake</li>
<li><b>Explore:</b> Seaborn, Matplotlib</li>
<li><b>Model:</b> Scikit-learn, SciPy, statsmodels</li>
<li><b>Evaluate:</b> NumPy, TensorFlow </li>
<li><b>Extract, Transform, Load (ETL):</b>
<a href="https://pandas.pydata.org">Pandas</a>,
<a href="https://www.crummy.com/software/BeautifulSoup/">Beautiful Soup</a>,
<a href="https://intake.readthedocs.io/en/latest/"> Intake</a>
</li>

<li><b>Explore:</b>
<a href="https://seaborn.pydata.org"> Seaborn</a>,
<a href="https://matplotlib.org">Matplotlib</a>,

</li>

<li><b>Model:</b>
<a href="https://scikit-learn.org">scikit-learn</a>,
<a href="https://www.scipy.org">SciPy</a>,
<a href="https://www.statsmodels.org/stable/index.html"> statsmodels</a>.
</li>

<li><b>Evaluate:</b>
Review comment (Member): Will rethink the contents here - having NumPy right at the end again is a little odd perhaps.

NumPy,
<a href="https://www.tensorflow.org">TensorFlow</a>
</li>

<li>
<b>Presentation:</b>
<b>Display:</b>
<a href="./index.html/#tab-visual"> Data Visualization Tools</a>
Review comment (Member): New issue found: this doesn't work as well as it should; it opens the correct tab, but the view jumps back to the top of the page. Noting here, can deal with it later.

</li>
</ul>
</p>
</div>
</div>
<p>
Python has a rich ecosystem of libraries that enable Data Science
workflows. <b> NumPy</b> is the foundation of almost all of these tools
such as Pandas, Seaborn, Beautiful Soup and several others.
</p>
<div class="grid-container">
<div>
<p>
@@ -37,13 +54,13 @@
data access and distribution, while
<a href="https://www.crummy.com/software/BeautifulSoup/">Beautiful Soup</a>
is widely used for web-scraping and gathering data sets.
<a href="https://seaborn.pydata.org"> Seaborn</a> is well known for its
<a href="https://towardsdatascience.com/how-to-perform-exploratory-data-analysis-with-seaborn-97e3413e841d">exploratory data analysis (EDA)</a>
capabilities, <a href="https://scikit-learn.org">Scikit-learn</a> and
<a href="https://www.scipy.org">Scipy</a> (statistical computing) serve some
<a href="https://seaborn.pydata.org"> Seaborn</a> is well known for
<a href="https://towardsdatascience.com/how-to-perform-exploratory-data-analysis-with-seaborn-97e3413e841d">exploratory data analysis (EDA)</a>;
<a href="https://scikit-learn.org">scikit-learn</a> and
<a href="https://www.scipy.org">SciPy</a> (statistical computing) serve some
of the backbone processes required for machine learning (regression methods,
classification, clustering, model validation and selection).
Statistical data exploration, estimation of various statistical models
Statistical data exploration, estimation of various statistical models,
and conducting statistical tests are some of the functions offered by
<a href="https://www.statsmodels.org/stable/index.html"> statsmodels</a>.
</p>
@@ -53,11 +70,11 @@
</div>
</div>
<p>
Effective data analytics require deep knowledge of the data domain (e.g.,
Retail, Healthcare, Marketing, Finance, Social Media, Automation, Sales, Travel,
etc.) as well as other core disciplines of Data Science, Data Engineering and
Data Visualization. Tools such as <a href="https://mlflow.org">MLFlow</a> address
experiment hyper-parameter and result tracking needs, while
Effective data analytics requires deep knowledge of the data domain (e.g.,
retail, healthcare, marketing, finance, social media, automation, sales, travel,
etc.) as well as other core disciplines of data science, data engineering, and
data visualization. Tools such as <a href="https://mlflow.org">MLFlow</a> address
experiment hyperparameter and result tracking needs, while
<a href="https://dvc.org"> DVC</a> provides data version control for data science
and machine learning workflows.
Review comment (@rgommers, Member, May 22, 2020): Still would like to make this tab a little more compact. Perhaps in a follow-up, all these changes look good. EDIT: followed up in gh-262

</p>
48 changes: 20 additions & 28 deletions layouts/partials/machine-learning.html
@@ -13,29 +13,21 @@
</div>
<div>
<p>
<b>Machine learning</b> (ML) enables computers to learn using
data, without having to be explicitly programmed.
<b>NumPy</b> is the foundation of all data pre-processing
that happens in the implementation of several ML Algorithms.
</p>
<p>
Python’s rich machine language and deep learning ecosystem
provides powerful tools such as
<a href="https://scikit-learn.org/stable/">Scikit-learn</a>
that is built on top of NumPy and
<a href="https://www.scipy.org">SciPy</a> and offers data
mining and analytics using classical ML algorithms.
</p>
<p>
<a href="https://www.tensorflow.org">Tensorflow’s</a>
deep learning capabilities help to define and run
computations involving tensors that have broad
applications in Speech and image recognition, Text-based
applications, Time-Series analysis and Video Detection.
<a href="https://pytorch.org">PyTorch </a> is another deep
learning library that is very popular among researchers for
computer vision and NLP applications. <a href="https://github.com/apache/incubator-mxnet">MXNet</a>
is another AI package that provides blueprints and
NumPy forms the basis of powerful machine learning libraries
like
<a href="https://scikit-learn.org">scikit-learn</a> and
<a href="https://www.scipy.org">SciPy</a>.
As machine learning grows, so does the
list of libraries built on NumPy.
<a href="https://www.tensorflow.org">TensorFlow’s</a>
deep learning capabilities have broad
applications &mdash; among them speech and image recognition, text-based
applications, time-series analysis, and video detection.
<a href="https://pytorch.org">PyTorch</a>, another deep
learning library, is popular among researchers in
computer vision and natural language processing.
<a href="https://github.com/apache/incubator-mxnet">MXNet</a>
is another AI package, providing blueprints and
templates for deep learning.
</p>
</div>
@@ -44,15 +36,15 @@
<div>
<p>
Statistical techniques called
<a href="https://towardsdatascience.com/ensemble-methods-bagging-boosting-and-stacking-c9214a10a205">Ensemble</a>
<a href="https://towardsdatascience.com/ensemble-methods-bagging-boosting-and-stacking-c9214a10a205">ensemble</a>
methods such as binning,
bagging, stacking and boosting are widely used in various ML
bagging, stacking, and boosting are among the ML
algorithms implemented by tools such as
<a href="https://github.com/dmlc/xgboost">XGBoost</a>,
<a href="https://lightgbm.readthedocs.io/en/latest/">LightGBM</a>,
<a href="https://catboost.ai">CatBoost</a> - one of the
<a href="https://lightgbm.readthedocs.io/en/latest/">LightGBM</a>, and
<a href="https://catboost.ai">CatBoost</a> &mdash; one of the
fastest inference engines.
<a href="https://www.scikit-yb.org/en/latest/">Yellowbrick</a>,
<a href="https://www.scikit-yb.org/en/latest/">Yellowbrick</a> and
<a href="https://eli5.readthedocs.io/en/latest/">Eli5</a>
offer machine learning visualizations.
</p>
21 changes: 7 additions & 14 deletions layouts/partials/scientific-domains.html
@@ -1,14 +1,12 @@
<!-- Scientific Domains Tab Content -->
<li class="scientific-domains">
<p>
Data acquisition (experimental, simulation), processing and
visualization are the core data related tasks in almost all the
scientific domains. Visualization of results through high quality
figures make scientific reports and publications easy to
understand. Python is easier to learn and computationally
efficient for scientific computing. NumPy, SciPy and Matplotlib
form the core Python packages that are used across various
scientific domains.
Nearly every scientist working in Python draws on the power of NumPy.
</p>
<p>
NumPy brings the computational power of languages like C and Fortran
to Python, a language much easier to learn and use. With this power
comes simplicity: a solution in NumPy is often clear and elegant.
</p>
<!-- First Row -->
<table>
@@ -99,7 +97,7 @@
<td class="center-text"></td>
<td class="center-text"></td>
<td class="center-text"><a
href="https://towardsdatascience.com/easy-steps-to-plot-geographic-data-on-a-map-python-11217859a2db">NumPy</a>
href="https://towardsdatascience.com/easy-steps-to-plot-geographic-data-on-a-map-python-11217859a2db">NumPy</a>
</td>
<td class="center-text"></td>
</tr>
@@ -122,9 +120,4 @@
<td class="lastrow-center-text"></td>
</tr>
</table>
<p>
NumPy’s powerful array processing capabilities and elegant syntax
helps to clearly and efficiently express computational algorithms
in various scientific computing domains.
</p>
</li>
35 changes: 6 additions & 29 deletions layouts/partials/visualization.html
@@ -54,47 +54,24 @@
</div>
<div>
<p>
<a href="https://www.slideshare.net/Visage/data-visualization-101-how-to-design-chartsandgraphs">Data
Visualization</a>
exposes patterns, trends and correlations in textual-data, making it easier for humans to
analyse and interpret large volumes of data.
</p>
<p>
<a href="https://python-graph-gallery.com">Visualization elements</a>
such as bar graphs, pie charts, line charts, maps, infographics, dashboards,
geographic maps, heatmaps, and interactive images offer valuable
insights for making data-driven decisions.
</p>
</div>
<div>
<p>
NumPy is the key data transformation building block for the burgeoning
<a href="https://pyviz.org/overviews/index.html">Python visualization landscape</a> comprising of
NumPy is an essential component in the burgeoning
<a href="https://pyviz.org/overviews/index.html">Python visualization landscape</a>, which includes
<a href="https://matplotlib.org">Matplotlib</a>,
<a href="https://seaborn.pydata.org">Seaborn</a>,
<a href="https://plot.ly">Plotly</a>,
<a href="https://altair-viz.github.io">Altair</a>,
<a href="https://docs.bokeh.org/en/latest/">Bokeh</a>,
<a href="http://holoviz.org">Holoviz</a>,
<a href="http://vispy.org">Vispy</a> and
<a href="http://vispy.org">Vispy</a>, and
<a href="https://github.com/napari/napari">Napari</a>,
to name a few.
</p>
<p>
By performing parallel operations on large arrays, all at once, NumPy accelerates data-processing and
visualization of large quantities of data, beyond Python's native performance levels, for data
visualization at
scale.
NumPy's accelerated processing of large arrays allows researchers to visualize
datasets far larger than native Python could handle.
</p>
</div>
<div>
<p>
<a href="https://rougier.github.io/python-visualization-landscape/landscape-colors.png">
<img src="images/content_images/vis-landscape.png"
alt="Mindmap linking several concepts, such as Javascript, Matplotlib, d3js and OpenGL."
align="left">
</a>
</p>
</div>
</div>
</li>
</li>