# Developer docs
These are things which are useful to refer back to. At some point in the future they might make their way into a proper docs page on RTD. These notes can be rough and might not always be up to date. If it's a quick answer then put it inline here; if it's a longer read then just link to it.
Source: https://github.com/mckinsey/vizro/pull/775
When `serve_locally=True` (the default), Dash serves component library resources (generally CSS and JS) through the Flask server using the `_dash-component-suites` route.

- For Vizro library components (currently just KPI cards), this should happen when Vizro is imported.
- For Vizro framework components (everything else), this should happen only when `Vizro()` is instantiated.

This makes our footprint as small as possible and reduces the risk of CSS name clashes when someone wants to use Vizro just as a library but doesn't instantiate `Vizro()` (not common at all now, but maybe it will be in the future).
When `serve_locally=False`, Dash switches to serving resources through `external_url`, where it's specified. For Vizro components we use jsDelivr as a CDN for this.
A few complications (see the sketch after this list for where these settings live):

- Files that aren't CSS/JS (e.g. fonts, maps) can still be served through the same routes but should not have a `<script>` or `<link>` tag in the HTML source. This is achieved with `dynamic=True`.
- The CDN minifies CSS/JS files automatically, but some we have minified manually ourselves (currently just `vizro-bootstrap.min.css`).
- It's not possible to serve JS as modules this way, which means we can't easily use `import`/`export` in them.
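As a rough illustration, Dash component libraries declare their resources through `_css_dist`/`_js_dist` lists; the entries below are hypothetical (paths and URL are placeholders), but show where `external_url` and `dynamic` fit in:

```python
# Hypothetical resource declarations following Dash's _css_dist convention.
_css_dist = [
    {
        "namespace": "vizro",
        "relative_package_path": "static/css/vizro-bootstrap.min.css",
        # Used instead of the local _dash-component-suites route when serve_locally=False:
        "external_url": "https://cdn.jsdelivr.net/.../vizro-bootstrap.min.css",  # placeholder URL
    },
    {
        "namespace": "vizro",
        "relative_package_path": "static/fonts/some-font.woff2",
        # Still served through the route, but no <link> tag is inserted into the HTML:
        "dynamic": True,
    },
]
```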
In future:

- When we release vizro-bootstrap, nothing changes for the Vizro framework. Pure Dash users would treat it like any other Bootstrap theme, i.e. set it through `external_stylesheets` pointing to the stylesheet on our CDN, or download the file to their own `assets` folder. We'd have a shortcut for this like `vizro.theme`, or do it through Bootswatch if that's possible.
- Ideally we would webpack our JS and ship the `.min.js` rather than just relying on the CDN to minify it. This would let us write "proper" rather than just in-browser JS and mean we benefit from tree-shaking etc. rather than just the minification that the CDN does. In reality the optimisations would make very little difference to performance, but it's kind of the "right" way to do things. It's more effort than it's worth to set up at the moment, but if we end up maintaining a bigger JS codebase we might do it.
- The vizro-bootstrap.css map file and all the Sass should live in the repo so that they can be handled correctly through developer tools in a browser.
The order Dash inserts stylesheets is always:

1. `external_stylesheets`
2. library stylesheets, as we add on `import vizro`
3. stylesheets added through `append_css`, as we do in `Vizro()`
4. user assets (these also go through `append_css`, but only when the server gets its first request)
The problem was that `figures.css` was served in stage 2 and therefore could come before `vizro-bootstrap.min.css`. I hoped this wouldn't cause any issues but unfortunately it did...

So now what we do is remove the library stylesheets in `Vizro()` and then add them using the framework's `append_css` mechanism. This means that `vizro-bootstrap.min.css` always comes first for a framework user because we sort the stylesheets added in stage 3 to put it first (the rest are in alphabetical order). For a Dash user, it will be specified using `external_stylesheets` and so always comes first anyway.
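A minimal sketch of that stage-3 sorting rule (illustrative only, not the actual Vizro internals):

```python
stylesheets = ["figures.css", "tables.css", "vizro-bootstrap.min.css"]

# vizro-bootstrap.min.css first, everything else in alphabetical order:
ordered = sorted(stylesheets, key=lambda name: (name != "vizro-bootstrap.min.css", name))
assert ordered == ["vizro-bootstrap.min.css", "figures.css", "tables.css"]
```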
See https://github.com/mckinsey/vizro/pull/615, later altered slightly by https://github.com/mckinsey/vizro/pull/598#pullrequestreview-2200302196.
Source: https://github.com/mckinsey/vizro/pull/580
## Development
`Vizro().build(dashboard).run()` and then `python app.py`, which is what we do across our docs. This only works while you're developing, but I like recommending it as the first port of call for users because it's simple, quick and easy, like Vizro should be. There's no need to define `app`.
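A complete minimal `app.py` for development might look like this (the dashboard itself is just an illustrative placeholder):

```python
# app.py (development): no need to define app
import vizro.models as vm
from vizro import Vizro

dashboard = vm.Dashboard(
    pages=[vm.Page(title="Home", components=[vm.Card(text="Hello, Vizro!")])]
)

Vizro().build(dashboard).run()
```

Run it with `python app.py`.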
## Deployment
```python
app = Vizro().build(dashboard)

# If you also want it to run during development with python app.py you also need this:
if __name__ == "__main__":
    app.run()
```
and then e.g. `gunicorn app:app`. The key change of this PR is that in this context there's no longer any need to define `server = app.dash.server` (although that will still work).

The integration tests in this repo do something a bit different, but that's just due to some technicalities of how they run, so they don't show a generally recommended pattern.
Source: https://github.com/mckinsey/vizro/pull/151
Here's the rules for how we should write code so that paths are always correctly formed:
- always use
dash.get_relative_path
to link to pages withhref
(see_make_page_404_layout
example link) - always use
dash.get_relative_path(f"/{STATIC_URL_PREFIX}/..")
to refer to built-in assets in thestatic
folder (see_make_page_404_layout
examplehtml.Img
) - always use
dash.get_asset_url
to refer to things in the userassets
folder, e.g. the logo is done this way
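To make the three rules concrete (`STATIC_URL_PREFIX` is Vizro's internal constant, shown here with a placeholder value, and the asset names are made up):

```python
import dash

STATIC_URL_PREFIX = "vizro"  # placeholder for the real internal constant

page_link = dash.get_relative_path("/my-page")  # rule 1: hrefs that link to pages
built_in_image = dash.get_relative_path(f"/{STATIC_URL_PREFIX}/images/logo.svg")  # rule 2: built-in static assets
user_logo = dash.get_asset_url("logo.png")  # rule 3: files in the user assets folder
```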
Source: https://github.com/mckinsey/vizro/pull/188

- Prefer `None` over `html.Div(hidden=True)` in the case that we don't need to refer to the element in any way (basically whenever you don't set an `id`), e.g. `html.P(self.title) if self.title else None`.
- Prefer `html.Div(hidden=True)` over `None` in the case that we do need to refer to the element (basically when you do set an `id`), e.g. `html.Div(hidden=True, id="nav_panel_outer")`. Generally these can be identified by the fact that `build` return values have a type like `_NavBuildType`.
- Prefer `""` as the default value for optional fields which are `str`. These fields do not need to accept `None` values at all.
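For the last rule, a minimal sketch (assuming a plain pydantic model, since Vizro's models are pydantic-based; the model and fields are made up):

```python
from pydantic import BaseModel

class ExampleModel(BaseModel):
    text: str
    title: str = ""  # optional str field: default to "", no need to accept None
```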
Source: https://github.com/mckinsey/vizro/pull/367#issuecomment-1994052080

When it comes to using `CapturedCallable`, we should always prefer the highest-level interface possible to avoid delving into private/protected things. There are basically three categories of attributes here:

- Dunder attributes like `__call__` and `__getitem__`: these are the main point of entry for any callers and should be used wherever possible.
- Protected attributes like `_function` and `_arguments`: OK to use if needed, but they will be removed or made into proper public things in due course, so put some thought into exactly what you're trying to do and whether you really need to use them or whether you can already achieve it just with dunder attributes.
- Private attributes like `__arguments`: you should never need to use these.
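A hedged sketch of the dunder interface, built with the public `@capture` decorator (the exact call pattern is an assumption based on the user-facing API):

```python
import plotly.express as px
from vizro.models.types import capture

@capture("graph")
def my_chart(data_frame, x):
    return px.histogram(data_frame, x=x)

captured = my_chart(data_frame=px.data.iris(), x="sepal_width")  # a CapturedCallable

fig = captured()        # __call__: the main entry point for callers
x_arg = captured["x"]   # __getitem__: look up a bound argument -> "sepal_width"
```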
Source: https://vizro.readthedocs.io/en/stable/pages/user-guides/data/
This summary can help you quickly decide which Vizro data type and configuration to use in future examples.
Static data:

- When to use: for data that does not need to be reloaded while the dashboard is running.
- Production ready: 🟢
- Performance: 🟢
- Limitations: data can only be reloaded/refreshed by restarting the dashboard app.
- Use cases: any time your data does not need to be reloaded while the dashboard is running.
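A minimal static data sketch, following the pattern in the linked data user guide (pass the DataFrame directly; the page content is illustrative):

```python
import vizro.models as vm
import vizro.plotly.express as px

df = px.data.iris()  # loaded once at build time; refreshing requires an app restart

page = vm.Page(
    title="Static data",
    components=[vm.Graph(figure=px.scatter(df, x="sepal_width", y="sepal_length"))],
)
```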
Dynamic data:

- When to use: for data that does need to be reloaded while the dashboard is running.
- Production ready: 🟠 because performance might degrade your app.
- Performance: 🟠 Your dashboard performance may suffer if loading your data is a slow operation, you have many figures that use dynamic data, or many users use the app at the same time. (Use the cache to solve this problem.)
- Limitations: performance.
- Use cases: when loading your data is a fast operation and you strictly need the very latest results from the data source. For example:
  - displaying the results of a workflow just triggered by the user (e.g. some model interaction flow),
  - repeatedly reading logs from a file,
  - chat apps.
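A minimal dynamic data sketch (register a function, not a DataFrame, with the data manager; the file name is hypothetical):

```python
import pandas as pd
from vizro.managers import data_manager

def load_latest_logs():
    # Re-executed on page refresh, so the dashboard shows the latest file contents.
    return pd.read_csv("logs.csv")  # hypothetical data source

data_manager["logs"] = load_latest_logs  # figures referring to "logs" now use dynamic data
```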
Dynamic data with cache:

- When to use: to improve app performance when dynamic data is used. Use it only when you don't need the very latest data to always be displayed.
- Production ready: SimpleCache: 🔴, FileSystemCache: 🟢, RedisCache: 🟢
- Performance: SimpleCache: 🟢, FileSystemCache: 🟠, RedisCache: 🟢
- Limitations: loaded/displayed data can be up to the (user-specified) timeout number of seconds old, which means that the real data (from its source) and the displayed data can differ.
- Use cases: when loading your data is a slow operation or you don't need the very latest results from the data source. For example:
  - a forecast app (because you don't need the latest results),
  - data that presents search engine results,
  - reducing the number of external API calls (especially if you pay per API call).
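A hedged configuration sketch, continuing the previous example and following the data user guide (cache type and timeout values are illustrative):

```python
from flask_caching import Cache
from vizro.managers import data_manager

# FileSystemCache persists across worker processes; SimpleCache does not.
data_manager.cache = Cache(config={"CACHE_TYPE": "FileSystemCache", "CACHE_DIR": "cache"})
data_manager["logs"].timeout = 300  # serve the cached copy for up to 300 seconds
```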
Parametrised dynamic data:

- When to use: when the entire data source, its version, or the chunk of it that is loaded into the app should depend on the user's UI input.
- Production ready: same as dynamic data.
- Performance: same as dynamic data.
- Limitations: same as dynamic data, plus filter/parameter options are not natively updated according to the newly loaded data.
- Use cases: for example:
  - selecting the source from which the data is retrieved (e.g. choosing between linkedin_results, twitter_results and instagram_results),
  - displaying a certain version of the model interaction results,
  - displaying only a chunk of a big dataset (where the concrete chunk depends on the user's input).

P.S. Parametrised data loading is compatible with caching. The cache uses memoization, so the dynamic data function's arguments are included in the cache key. The pros/cons of parametrised dynamic data with or without the cache are the same as the pros/cons of dynamic data with and without the cache, presented above.
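A sketch of parametrised dynamic data, following the pattern in the data user guide (the source names echo the example above; the column name and ids are made up):

```python
import pandas as pd
import vizro.models as vm
import vizro.plotly.express as px
from vizro.managers import data_manager

def load_results(source: str = "linkedin_results"):
    return pd.read_csv(f"{source}.csv")  # which file is read depends on the UI input

data_manager["results"] = load_results

page = vm.Page(
    title="Results",
    components=[vm.Graph(id="graph", figure=px.histogram("results", x="score"))],
    controls=[
        vm.Parameter(
            targets=["graph.data_frame.source"],  # routes the selector value into load_results
            selector=vm.Dropdown(
                options=["linkedin_results", "twitter_results", "instagram_results"],
                multi=False,
            ),
        )
    ],
)
```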
See https://github.com/mckinsey/vizro/pull/195.
We follow GitHub flow. In short:

- `main` is the only long-lived branch and is always in a releasable state
- `main` is a protected branch and cannot be committed to directly
- contributions to `main` must go through a PR process
- contributions must be up to date with `main` before merging
- PRs are merged using Squash and merge
To keep your PR in sync with `main` you can use rebase or merge. There are pros and cons to each method. Rebasing rewrites commit history, which can make it cleaner but complicate things if there are multiple contributors to a branch. Ask yourself "do I understand what update by rebase does and think it's a good idea to use it here?". If yes, then do it. If no, then update by merge instead. So if in doubt, just use merge (the default).
You should try to avoid long-lived PRs. Tips to achieve this:
- keep code changes small. As a very general rule, it's better to have two small PRs rather than one big one. Consider basing one feature off another to break your work down into more manageable chunks
- make reviewers' lives easy, e.g. with a clear PR description, clean commit history (e.g. use rebase if you understand it), instructions on how to review
- reviewers should try to review quickly (e.g. within a day). PR authors should remind reviewers if required
- several long conversations on PRs and multiple rounds of reviewing can be slow and hard to follow. Consider just talking directly to the PR reviewers
- for complex changes, raise a draft PR early for visibility of your work and to get initial comments and feedback. Talk to PR reviewers and other developers before and while you do the work rather than just waiting for a single "big bang" review when it's complete
- consider merging a feature that's work in progress (e.g. code without tests) so long as you keep it undocumented and ideally private (use a leading `_`). This allows an incomplete feature to be present in the codebase without being visible to users. Only do this sparingly though, or things get confusing
Sometimes it's impossible to avoid long-lived PRs, e.g. for some big new features, large refactoring work, etc. This is ok. It just shouldn't be the norm.
Ideally, all of the following happen on the same merge to `main` (as above, this doesn't prevent you opening multiple PRs that point to a feature branch):
- source code
- tests
- changelog
- docs
Sometimes it might not be feasible to achieve all of these in one merge to `main`. How then do we keep `main` always release-ready? The key is that a feature is publicly available only when it is visible in the documentation or changelog. This is ultimately what defines our functionality, rather than the existence of source code or tests in our codebase. This means that it's OK to merge code to `main` that you are not yet happy for the general public to use, so long as it is not publicly documented and does not break existing functionality. If such code is released then this is fine, because the feature isn't yet visible to users. The important thing is to not make the documentation/changelog public until you are comfortable that the feature can be used.