From 79773dcede641178427923743a37b269d41dd1f9 Mon Sep 17 00:00:00 2001 From: Thiago Teixeira Date: Wed, 24 Sep 2025 17:49:33 -0300 Subject: [PATCH 1/4] Add Parallel Fragments spec, draft 1. --- candidates/parallel fragments.md | 386 +++++++++++++++++++++++++++++++ 1 file changed, 386 insertions(+) create mode 100644 candidates/parallel fragments.md diff --git a/candidates/parallel fragments.md b/candidates/parallel fragments.md new file mode 100644 index 0000000..a608856 --- /dev/null +++ b/candidates/parallel fragments.md @@ -0,0 +1,386 @@ +# Summary + +Make it possible for `st.fragment`s to run in a parallel thread. + +# Background + +Optional. Is there anything we need to know about this project before we continue? + +# Problem statement + +Dashboards are one of the most common classes of apps in Streamlit. In a dashboard, data is typically +loaded, then transformed (sometimes after some user input), then finally displayed as charts and +other widgets. + +It's very common for the load-transform code paths of any given chart to be completely +distinct from the code paths of other charts. However, these code paths are typically executed +sequentially, which leads to a slow loading pattern for the app, where one section will only load +once the previous has done so. 
+
+Toy example:
+
+```py
+import time
+
+import numpy as np
+import streamlit as st
+
+def load_user_growth():
+    time.sleep(1)
+    return np.random.randn(100, 2)
+
+def load_revenue_growth():
+    time.sleep(1)
+    return np.random.randn(100, 2)
+
+def load_expenses_growth():
+    time.sleep(1)
+    return np.random.randn(100, 2)
+
+def transform_user_growth(arr, x):
+    time.sleep(1)
+    return arr + x
+
+def transform_revenue_growth(arr, x):
+    time.sleep(1)
+    return arr - x
+
+def transform_expenses_growth(arr, x):
+    time.sleep(1)
+    return arr * x
+
+slider1 = st.slider("Pick a number", 0, 200, 123)
+slider2 = st.slider("Pick a second number", 0, 500, 456)
+
+arr1 = load_user_growth()
+arr1 = transform_user_growth(arr1, slider1)
+st.line_chart(arr1)
+
+arr2 = load_revenue_growth()
+arr2 = transform_revenue_growth(arr2, slider2)
+st.line_chart(arr2)
+
+arr3 = load_expenses_growth()
+arr3 = transform_expenses_growth(arr3, slider2)
+st.line_chart(arr3)
+```
+
+In this app, each step runs sequentially after the previous one is done,
+so the whole thing takes 6s to draw:
+
+```mermaid
+flowchart
+  s1@{ label: "slider1" }
+  s2@{ label: "slider2" }
+  l1@{ label: "load_user_growth" }
+  l2@{ label: "load_revenue_growth" }
+  l3@{ label: "load_expenses_growth" }
+  t4@{ label: "transform_user_growth" }
+  t5@{ label: "transform_revenue_growth" }
+  t6@{ label: "transform_expenses_growth" }
+  d7@{ label: "draw_user_growth_line_chart" }
+  d8@{ label: "draw_revenue_growth_line_chart" }
+  d9@{ label: "draw_expenses_growth_line_chart" }
+  startCircle@{ shape: "circle", label: "Start" }
+  endCircle@{ shape: "circle", label: "End" }
+  startCircle --> s1
+  s1 --> s2
+  s2 --> l1
+  l1 --> t4
+  l2 --> t5
+  l3 --> t6
+  t4 --> d7
+  t5 --> d8
+  t6 --> d9
+  d7 --> l2
+  d8 --> l3
+  d9 --> endCircle
+  style s1 fill:#e0f2fe
+  style s2 fill:#e0f2fe
+  style l1 fill:#fce7f3
+  style l2 fill:#fce7f3
+  style l3 fill:#fce7f3
+  style t4 fill:#ecfccb
+  style t5 fill:#ecfccb
+  style t6 fill:#ecfccb
+  style d7 fill:#fef9c3
+  style d8 fill:#fef9c3
+  style d9 fill:#fef9c3
+  style startCircle fill:#eee
+  style endCircle fill:#eee
+```
+
+**Question 1:** Given that these code paths are so different, it would make a lot more sense to load
+them in parallel instead. What would be a simple, Streamlit-y API that is powerful
+enough to cover the more common patterns for this?
+
+**Question 2:** When a user moves the sliders, the entire app reloads. How can we make sure only
+the fragments that depend on that slider reload instead? **For now we'll leave this unanswered,
+as it will be the subject of a separate StEP.** But you should have this question in mind as you think
+through this StEP since we don't want the solution to Question 1 to preclude a great solution
+for Question 2.
+
+
+## Goals
+
+- Make it possible to run `@st.fragment`s in a separate thread.
+- Very easy to use.
+- Covers major use cases.
+- Does not break existing apps.
+
+## **Non-goals**
+
+- It it not necessary to cover _all_ use cases
+
+
+# Proposed solution
+
+To address **question 1**, let's extend the fragments primitive to support parallel execution, so
+the example above looks more like this:
+
+_(NOTE: Ignore the exact API right now)_
+
+```py
+import time
+
+import numpy as np
+import streamlit as st
+
+def load_user_growth():
+    time.sleep(1)
+    return np.random.randn(100, 2)
+
+def load_revenue_growth():
+    time.sleep(1)
+    return np.random.randn(100, 2)
+
+def load_expenses_growth():
+    time.sleep(1)
+    return np.random.randn(100, 2)
+
+def transform_user_growth(arr, x):
+    time.sleep(1)
+    return arr + x
+
+def transform_revenue_growth(arr, x):
+    time.sleep(1)
+    return arr - x
+
+def transform_expenses_growth(arr, x):
+    time.sleep(1)
+    return arr * x
+
+slider1 = st.slider("Pick a number", 0, 200, 123)
+slider2 = st.slider("Pick a second number", 0, 500, 456)
+
+@st.fragment(parallelize=True)
+def chart1():
+    arr1 = load_user_growth()
+    arr1 = transform_user_growth(arr1, slider1)
+    st.line_chart(arr1)
+
+@st.fragment(parallelize=True)
+def chart2():
+    arr2 = load_revenue_growth()
+    arr2 = transform_revenue_growth(arr2, slider2)
+    st.line_chart(arr2)
+
+@st.fragment(parallelize=True)
+def chart3():
+    arr3 = load_expenses_growth()
+    arr3 = transform_expenses_growth(arr3, slider2)
+    st.line_chart(arr3)
+
+chart1()
+chart2()
+chart3()
+```
+
+With parallel fragments, the three load-transform paths run concurrently, so the app draws in
+~2s instead of 6s. The execution flow would look like this:
+
+```mermaid
+flowchart
+  l1@{ label: "load_user_growth" }
+  l2@{ label: "load_revenue_growth" }
+  l3@{ label: "load_expenses_growth" }
+  t4@{ label: "transform_user_growth" }
+  t5@{ label: "transform_revenue_growth" }
+  t6@{ label: "transform_expenses_growth" }
+  d7@{ label: "draw_user_growth_line_chart" }
+  d8@{ label: "draw_revenue_growth_line_chart" }
+  d9@{ label: "draw_expenses_growth_line_chart" }
+  startCircle@{ shape: "circle", label: "Start" }
+  endCircle@{ shape: "circle", label: "End" }
+  startCircle --> l1
+  startCircle --> l2
+  startCircle --> l3
+  l1 --> t4
+  l2 --> t5
+  l3 --> t6
+  t4 --> d7
+  t5 --> d8
+  t6 --> d9
+  d7 --> endCircle
+  d8 --> endCircle
+  d9 --> endCircle
+  style l1 fill:#fce7f3
+  style l2 fill:#fce7f3
+  style l3 fill:#fce7f3
+  style t4 fill:#ecfccb
+  style t5 fill:#ecfccb
+  style t6 fill:#ecfccb
+  style d7 fill:#fef9c3
+  style d8 fill:#fef9c3
+  style d9 fill:#fef9c3
+  style startCircle fill:#eee
+  style endCircle fill:#eee
+```
+
+
+## API
+
+How should we declare that a given fragment can be executed in a parallel thread?
+
+<details>
+<summary>
**Option 1: New keyword argument** + +```py +st.fragment(func=None, *, run_every=None, parallelize=True) +``` + +**Pros** +- Doesn't introduce a new primitive in Streamlit +- Very discoverable +- ? + +**Cons** +- A bit wordy +- ? + +**Naming** +1. parallelize +1. thread +1. background +1. bg +1. async +1. task +1. daemon +1. background_task +1. run_in_thread +1. run_in_parallel +1. run_in_background +1. run_in_bg +1. run_async + +
+ +
**Option 2: New decorator** + +```py +st.parallel_fragment(func=None, *, run_every=None) +``` + +**Pros** +- Very discoverable +- ? + +**Cons** +- Introduces a new flow control primitive in Streamlit. + + People tend to be confused by the primitives we already support (`cache_resource`, `cache_data`, + `fragment`, `form`), so I'd rather not make things more complicated for them. +- ? + +**Naming:** +1. @st.parallel_fragment +1. @st.threaded_fragment +1. @st.async_fragment +1. @st.thread +1. @st.fragment_thread +1. @st.daemon +1. @st.task +1. @st.async + +
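Whichever spelling wins between Options 1 and 2, the runtime mechanics would likely be the same: each parallel fragment body is handed to a worker thread instead of running inline. Here is a minimal, framework-free sketch of that idea using plain `concurrent.futures` — no Streamlit involved, and the fragment names are purely illustrative:

```python
import time
from concurrent.futures import ThreadPoolExecutor

def fragment_body(name: str) -> str:
    """Stand-in for one fragment: load + transform for a single chart."""
    time.sleep(0.3)  # simulate slow I/O, e.g. a database query
    return f"{name} ready"

names = ["user_growth", "revenue_growth", "expenses_growth"]

start = time.perf_counter()
with ThreadPoolExecutor(max_workers=3) as pool:
    # Submit all three fragment bodies at once; .result() waits for each.
    futures = [pool.submit(fragment_body, n) for n in names]
    results = [f.result() for f in futures]
elapsed = time.perf_counter() - start

print(results)
print(f"wall time: {elapsed:.1f}s")  # ~0.3s, not the 0.9s a sequential loop takes
```

Because the slow parts here are I/O-style waits, threads overlap cleanly despite the GIL; the same would hold for typical dashboard workloads (database queries, HTTP calls).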
+ +
**Option 3: Async def** ✅ CURRENT FAVORITE + +With this option, there would be no change to the `@st.fragment` signature: + +```py +st.fragment(func=None, *, run_every=None) +``` + +...but if `func` is declared with `async def`, then it automatically executes in parallel: + +```py +@st.fragment +async def chart1(): + ... +``` + +**Pros** +- Doesn't introduce a new primitive in Streamlit +- [Opinion] Feels really natural +- ? + +**Cons** +- Harder to discover +- This somewhat stretches the definition of `async` in Python +- ? + +
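To make Option 3 concrete: Python can already drive several `async def` bodies concurrently with `asyncio.gather`, which is roughly the machinery `@st.fragment` would hook into here. A hedged sketch with no Streamlit involved — the fragment names are illustrative, and the real feature might instead run each coroutine on a worker thread's event loop:

```python
import asyncio
import time

async def chart_fragment(name: str) -> str:
    """Stand-in for an async fragment body: load, transform, draw."""
    await asyncio.sleep(0.3)  # simulate awaitable I/O
    return f"{name} drawn"

async def run_all() -> list[str]:
    # gather() schedules all three coroutines and waits for them together,
    # returning their results in argument order.
    return await asyncio.gather(
        chart_fragment("user_growth"),
        chart_fragment("revenue_growth"),
        chart_fragment("expenses_growth"),
    )

start = time.perf_counter()
results = asyncio.run(run_all())
elapsed = time.perf_counter() - start

print(results)
print(f"wall time: {elapsed:.1f}s")  # ~0.3s: the three sleeps overlap on one event loop
```

Note this is cooperative concurrency on a single thread, not true parallelism — which is part of why the "stretches the definition of `async`" con above matters.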
+ +## Design + +This is a Python-only feature. No impact on design. + +## Behavior + +The return value of an async fragment is ignored. + +Another option would be to return a `Future` or to somehow stuff the return value into Session +State, but it's unclear that any of this is needed. So let's leave this feature out for now +and see if there's a need. We can always add this later. + +## Other solutions considered + +_(This will be updated when we make a decision on which option to go with, above)_ + +## Metrics + +* Impact on metrics: + +The hope is that this would make a certain class of apps faster. However, it may be hard to +measure this since we'd need to look at performance metrics from _before_ and _after_ the change. + + +* Requires new metrics: + +If going with **Option 3**, we'll need to add some telemetry logic to be able to tell how much +usage this feature is getting. + +Otherwise, **Options 1 and 2** should get automatically tracked with the current telemetry logic. + + + + +# Implementation + +_Once there's a prototype implementation, we'll link the Github branch for it here._ From 97a64ca6524c91776d21e4cdb04da0f6af44e410 Mon Sep 17 00:00:00 2001 From: Thiago Teixeira Date: Wed, 24 Sep 2025 17:54:12 -0300 Subject: [PATCH 2/4] Update mermaid diagram --- candidates/parallel fragments.md | 12 +++++++++--- 1 file changed, 9 insertions(+), 3 deletions(-) diff --git a/candidates/parallel fragments.md b/candidates/parallel fragments.md index a608856..5d6db13 100644 --- a/candidates/parallel fragments.md +++ b/candidates/parallel fragments.md @@ -194,6 +194,8 @@ With parallel fragments, the execution flow of the app would look like this: ```mermaid flowchart + s1@{ label: "slider1" } + s2@{ label: "slider2" } l1@{ label: "load_user_growth" } l2@{ label: "load_revenue_growth" } l3@{ label: "load_expenses_growth" } @@ -205,9 +207,11 @@ flowchart d9@{ label: "draw_expenses_growth_line_chart" } startCircle@{ shape: "circle", label: "Start" } endCircle@{ shape: 
"circle", label: "End" } - startCircle --> l1 - startCircle --> l2 - startCircle --> l3 + startCircle --> s1 + startCircle --> s2 + s1 --> l1 + s2 --> l2 + s2 --> l3 l1 --> t4 l2 --> t5 l3 --> t6 @@ -217,6 +221,8 @@ flowchart d7 --> endCircle d8 --> endCircle d9 --> endCircle + style s1 fill:#e0f2fe + style s2 fill:#e0f2fe style l1 fill:#fce7f3 style l2 fill:#fce7f3 style l3 fill:#fce7f3 From 9cb6372b025811165935b4e6c29717e37b7c297e Mon Sep 17 00:00:00 2001 From: Thiago Teixeira Date: Wed, 24 Sep 2025 17:58:23 -0300 Subject: [PATCH 3/4] Fix syntax and wording. --- candidates/parallel fragments.md | 8 ++++---- 1 file changed, 4 insertions(+), 4 deletions(-) diff --git a/candidates/parallel fragments.md b/candidates/parallel fragments.md index 5d6db13..850dd55 100644 --- a/candidates/parallel fragments.md +++ b/candidates/parallel fragments.md @@ -127,7 +127,7 @@ for Question 2. ## **Non-goals** -- It it not necessary to cover _all_ use cases +- Covering every possible scenario. # Proposed solution @@ -241,7 +241,7 @@ flowchart How should we declare that a given fragment can be executed in a parallel thread? -
**Option 1: New keyword argument** +
Option 1: New keyword argument ```py st.fragment(func=None, *, run_every=None, parallelize=True) @@ -273,7 +273,7 @@ st.fragment(func=None, *, run_every=None, parallelize=True)
-
**Option 2: New decorator** +
Option 2: New decorator ```py st.parallel_fragment(func=None, *, run_every=None) @@ -302,7 +302,7 @@ st.parallel_fragment(func=None, *, run_every=None)
-
**Option 3: Async def** ✅ CURRENT FAVORITE +
Option 3: Async def ✅ CURRENT FAVORITE With this option, there would be no change to the `@st.fragment` signature: From 39411ead58f27cbed1b905a0e5b7d1b0477b63b1 Mon Sep 17 00:00:00 2001 From: Thiago Teixeira Date: Wed, 24 Sep 2025 20:24:32 -0300 Subject: [PATCH 4/4] Replace with stub --- candidates/parallel fragments.md | 391 +------------------------------ 1 file changed, 1 insertion(+), 390 deletions(-) diff --git a/candidates/parallel fragments.md b/candidates/parallel fragments.md index 850dd55..202262b 100644 --- a/candidates/parallel fragments.md +++ b/candidates/parallel fragments.md @@ -1,392 +1,3 @@ -# Summary - Make it possible for `st.fragment`s to run in a parallel thread. -# Background - -Optional. Is there anything we need to know about this project before we continue? - -# Problem statement - -Dashboards are one of the most common classes of apps in Streamlit. In a dashboard, data is typically -loaded, then transformed (sometimes after some user input), then finally displayed as charts and -other widgets. - -It's very common for the load-transform code paths of any given chart to be completely -distinct from the code paths of other charts. However, these code paths are typically executed -sequentially, which leads to a slow loading pattern for the app, where one section will only load -once the previous has done so. 
-
-Toy example:
-
-```py
-import time
-
-import numpy as np
-import streamlit as st
-
-def load_user_growth():
-    time.sleep(1)
-    return np.random.randn(100, 2)
-
-def load_revenue_growth():
-    time.sleep(1)
-    return np.random.randn(100, 2)
-
-def load_expenses_growth():
-    time.sleep(1)
-    return np.random.randn(100, 2)
-
-def transform_user_growth(arr, x):
-    time.sleep(1)
-    return arr + x
-
-def transform_revenue_growth(arr, x):
-    time.sleep(1)
-    return arr - x
-
-def transform_expenses_growth(arr, x):
-    time.sleep(1)
-    return arr * x
-
-slider1 = st.slider("Pick a number", 0, 200, 123)
-slider2 = st.slider("Pick a second number", 0, 500, 456)
-
-arr1 = load_user_growth()
-arr1 = transform_user_growth(arr1, slider1)
-st.line_chart(arr1)
-
-arr2 = load_revenue_growth()
-arr2 = transform_revenue_growth(arr2, slider2)
-st.line_chart(arr2)
-
-arr3 = load_expenses_growth()
-arr3 = transform_expenses_growth(arr3, slider2)
-st.line_chart(arr3)
-```
-
-In this app, each step runs sequentially after the previous one is done,
-so the whole thing takes 6s to draw:
-
-```mermaid
-flowchart
-  s1@{ label: "slider1" }
-  s2@{ label: "slider2" }
-  l1@{ label: "load_user_growth" }
-  l2@{ label: "load_revenue_growth" }
-  l3@{ label: "load_expenses_growth" }
-  t4@{ label: "transform_user_growth" }
-  t5@{ label: "transform_revenue_growth" }
-  t6@{ label: "transform_expenses_growth" }
-  d7@{ label: "draw_user_growth_line_chart" }
-  d8@{ label: "draw_revenue_growth_line_chart" }
-  d9@{ label: "draw_expenses_growth_line_chart" }
-  startCircle@{ shape: "circle", label: "Start" }
-  endCircle@{ shape: "circle", label: "End" }
-  startCircle --> s1
-  s1 --> s2
-  s2 --> l1
-  l1 --> t4
-  l2 --> t5
-  l3 --> t6
-  t4 --> d7
-  t5 --> d8
-  t6 --> d9
-  d7 --> l2
-  d8 --> l3
-  d9 --> endCircle
-  style s1 fill:#e0f2fe
-  style s2 fill:#e0f2fe
-  style l1 fill:#fce7f3
-  style l2 fill:#fce7f3
-  style l3 fill:#fce7f3
-  style t4 fill:#ecfccb
-  style t5 fill:#ecfccb
-  style t6 fill:#ecfccb
-  style d7 fill:#fef9c3
-  style d8 fill:#fef9c3
-  style d9 fill:#fef9c3
-  style startCircle fill:#eee
-  style endCircle fill:#eee
-```
-
-**Question 1:** Given that these code paths are so different, it would make a lot more sense to load
-them in parallel instead. What would be a simple, Streamlit-y API that is powerful
-enough to cover the more common patterns for this?
-
-**Question 2:** When a user moves the sliders, the entire app reloads. How can we make sure only
-the fragments that depend on that slider reload instead? **For now we'll leave this unanswered,
-as it will be the subject of a separate StEP.** But you should have this question in mind as you think
-through this StEP since we don't want the solution to Question 1 to preclude a great solution
-for Question 2.
-
-
-## Goals
-
-- Make it possible to run `@st.fragment`s in a separate thread.
-- Very easy to use.
-- Covers major use cases.
-- Does not break existing apps.
-
-## **Non-goals**
-
-- Covering every possible scenario.
-
-
-# Proposed solution
-
-To address **question 1**, let's extend the fragments primitive to support parallel execution, so
-the example above looks more like this:
-
-_(NOTE: Ignore the exact API right now)_
-
-```py
-import time
-
-import numpy as np
-import streamlit as st
-
-def load_user_growth():
-    time.sleep(1)
-    return np.random.randn(100, 2)
-
-def load_revenue_growth():
-    time.sleep(1)
-    return np.random.randn(100, 2)
-
-def load_expenses_growth():
-    time.sleep(1)
-    return np.random.randn(100, 2)
-
-def transform_user_growth(arr, x):
-    time.sleep(1)
-    return arr + x
-
-def transform_revenue_growth(arr, x):
-    time.sleep(1)
-    return arr - x
-
-def transform_expenses_growth(arr, x):
-    time.sleep(1)
-    return arr * x
-
-slider1 = st.slider("Pick a number", 0, 200, 123)
-slider2 = st.slider("Pick a second number", 0, 500, 456)
-
-@st.fragment(parallelize=True)
-def chart1():
-    arr1 = load_user_growth()
-    arr1 = transform_user_growth(arr1, slider1)
-    st.line_chart(arr1)
-
-@st.fragment(parallelize=True)
-def chart2():
-    arr2 = load_revenue_growth()
-    arr2 = transform_revenue_growth(arr2, slider2)
-    st.line_chart(arr2)
-
-@st.fragment(parallelize=True)
-def chart3():
-    arr3 = load_expenses_growth()
-    arr3 = transform_expenses_growth(arr3, slider2)
-    st.line_chart(arr3)
-
-chart1()
-chart2()
-chart3()
-```
-
-With parallel fragments, the three load-transform paths run concurrently, so the app draws in
-~2s instead of 6s. The execution flow would look like this:
-
-```mermaid
-flowchart
-  s1@{ label: "slider1" }
-  s2@{ label: "slider2" }
-  l1@{ label: "load_user_growth" }
-  l2@{ label: "load_revenue_growth" }
-  l3@{ label: "load_expenses_growth" }
-  t4@{ label: "transform_user_growth" }
-  t5@{ label: "transform_revenue_growth" }
-  t6@{ label: "transform_expenses_growth" }
-  d7@{ label: "draw_user_growth_line_chart" }
-  d8@{ label: "draw_revenue_growth_line_chart" }
-  d9@{ label: "draw_expenses_growth_line_chart" }
-  startCircle@{ shape: "circle", label: "Start" }
-  endCircle@{ shape: "circle", label: "End" }
-  startCircle --> s1
-  startCircle --> s2
-  s1 --> l1
-  s2 --> l2
-  s2 --> l3
-  l1 --> t4
-  l2 --> t5
-  l3 --> t6
-  t4 --> d7
-  t5 --> d8
-  t6 --> d9
-  d7 --> endCircle
-  d8 --> endCircle
-  d9 --> endCircle
-  style s1 fill:#e0f2fe
-  style s2 fill:#e0f2fe
-  style l1 fill:#fce7f3
-  style l2 fill:#fce7f3
-  style l3 fill:#fce7f3
-  style t4 fill:#ecfccb
-  style t5 fill:#ecfccb
-  style t6 fill:#ecfccb
-  style d7 fill:#fef9c3
-  style d8 fill:#fef9c3
-  style d9 fill:#fef9c3
-  style startCircle fill:#eee
-  style endCircle fill:#eee
-```
-
-
-## API
-
-How should we declare that a given fragment can be executed in a parallel thread?
-
-<details>
-<summary>
Option 1: New keyword argument - -```py -st.fragment(func=None, *, run_every=None, parallelize=True) -``` - -**Pros** -- Doesn't introduce a new primitive in Streamlit -- Very discoverable -- ? - -**Cons** -- A bit wordy -- ? - -**Naming** -1. parallelize -1. thread -1. background -1. bg -1. async -1. task -1. daemon -1. background_task -1. run_in_thread -1. run_in_parallel -1. run_in_background -1. run_in_bg -1. run_async - -
- -
Option 2: New decorator - -```py -st.parallel_fragment(func=None, *, run_every=None) -``` - -**Pros** -- Very discoverable -- ? - -**Cons** -- Introduces a new flow control primitive in Streamlit. - - People tend to be confused by the primitives we already support (`cache_resource`, `cache_data`, - `fragment`, `form`), so I'd rather not make things more complicated for them. -- ? - -**Naming:** -1. @st.parallel_fragment -1. @st.threaded_fragment -1. @st.async_fragment -1. @st.thread -1. @st.fragment_thread -1. @st.daemon -1. @st.task -1. @st.async - -
- -
Option 3: Async def ✅ CURRENT FAVORITE - -With this option, there would be no change to the `@st.fragment` signature: - -```py -st.fragment(func=None, *, run_every=None) -``` - -...but if `func` is declared with `async def`, then it automatically executes in parallel: - -```py -@st.fragment -async def chart1(): - ... -``` - -**Pros** -- Doesn't introduce a new primitive in Streamlit -- [Opinion] Feels really natural -- ? - -**Cons** -- Harder to discover -- This somewhat stretches the definition of `async` in Python -- ? - -
- -## Design - -This is a Python-only feature. No impact on design. - -## Behavior - -The return value of an async fragment is ignored. - -Another option would be to return a `Future` or to somehow stuff the return value into Session -State, but it's unclear that any of this is needed. So let's leave this feature out for now -and see if there's a need. We can always add this later. - -## Other solutions considered - -_(This will be updated when we make a decision on which option to go with, above)_ - -## Metrics - -* Impact on metrics: - -The hope is that this would make a certain class of apps faster. However, it may be hard to -measure this since we'd need to look at performance metrics from _before_ and _after_ the change. - - -* Requires new metrics: - -If going with **Option 3**, we'll need to add some telemetry logic to be able to tell how much -usage this feature is getting. - -Otherwise, **Options 1 and 2** should get automatically tracked with the current telemetry logic. - - - - -# Implementation - -_Once there's a prototype implementation, we'll link the Github branch for it here._ +PR: https://github.com/streamlit/streamlit-enhancement-proposals/pull/2