Developer docs

Antony Milne edited this page Oct 10, 2024 · 22 revisions

These are things which are useful to refer back to. At some point in the future they might make their way into a proper docs page on RTD. These notes can be rough and might not always be up to date. If it's a quick answer then put it inline here; if it's a longer read then just link to it.


Technical

How Vizro resources are served

Source: https://github.com/mckinsey/vizro/pull/775

When serve_locally=True (the default), Dash serves component library resources (generally CSS and JS) through the Flask server using the _dash-component-suites route.

  • For Vizro library components (currently just KPI cards), this should happen when Vizro is imported.
  • For Vizro framework components (everything else), this should happen only when Vizro() is instantiated.

This makes our footprint as small as possible and reduces the risk of CSS name clashes when someone wants to use Vizro just as a library without instantiating Vizro() (not common at all now, but maybe will be in the future).

When serve_locally=False, Dash switches to serving resources through external_url, where it's specified. For Vizro components we use jsDelivr as a CDN for this.

A few complications:

  • files that aren't CSS/JS (e.g. fonts, maps) can still be served through the same routes but should not have a <script> or <link> in the HTML source. This is achieved with dynamic=True
  • the CDN minifies CSS/JS files automatically, but some we have minified manually ourselves (currently just vizro-bootstrap.min.css)
  • it's not possible to serve JS as modules this way, which means we can't easily do import/export in them

In future:

  • when we release vizro-bootstrap, nothing changes for Vizro framework. Pure Dash users would treat it like any other bootstrap theme i.e. set it through external_stylesheets that points to the stylesheet on our CDN or download the file to their own assets folder. We'd have a shortcut for this like vizro.theme or do it through bootswatch if that's possible.
  • ideally we would webpack our JS and ship the .min.js rather than just relying on the CDN to minify it. This would let us write "proper" rather than just in-browser JS and mean we benefit from tree-shaking etc. rather than just minification that the CDN does. In reality the optimisations would make very little difference to performance, but it's kind of the "right" way to do things. It's more effort than it's worth to set up at the moment, but if we end up maintaining a bigger JS codebase we might do it
  • vizro-bootstrap.css map file + all the SASS should live in the repo so that it can be handled correctly through developer tools in a browser

The order Dash inserts stylesheets is always:

  1. external_stylesheets
  2. library stylesheets as we add on import vizro
  3. stylesheets added through append_css as we do in Vizro()
  4. user assets (also go through append_css but only when server gets its first request)

The problem was that figures.css was served in stage 2 and therefore could come before vizro-bootstrap.min.css. I hoped this wouldn't cause any issues but unfortunately it did...

So now what we do is remove the library stylesheets in Vizro() and then add them using the framework's append_css mechanism. This means that vizro-bootstrap.min.css always comes first for a framework user because we sort the stylesheets added in stage 3 to put it first (the rest are in alphabetical order). For a Dash user, it will be specified using external_stylesheets so it always comes first anyway.
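
The sorting rule for stage 3 can be sketched roughly as below. This is only an illustration of the ordering described above, not the actual Vizro code: sort_stylesheets is a hypothetical helper and every file name apart from vizro-bootstrap.min.css and figures.css is made up.

```python
# Hypothetical helper: illustrates the ordering rule only, not Vizro's real code.
PRIORITY_STYLESHEET = "vizro-bootstrap.min.css"


def sort_stylesheets(paths):
    """Return paths with vizro-bootstrap.min.css first and the rest alphabetical."""
    # The key is a tuple: False sorts before True, so the priority file wins,
    # and the remaining files are then ordered alphabetically by path.
    return sorted(paths, key=lambda path: (not path.endswith(PRIORITY_STYLESHEET), path))


# Made-up file list in arbitrary order.
stylesheets = [
    "css/layout.css",
    "css/figures.css",
    "css/vizro-bootstrap.min.css",
    "css/aggrid.css",
]
ordered = sort_stylesheets(stylesheets)
print(ordered)
```

Using a tuple key keeps it to a single sort call, with no need to remove the file and re-insert it at the front.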

Templates for plotly figures

See https://github.com/mckinsey/vizro/pull/615. Then altered slightly by:

Page loading

See https://github.com/mckinsey/vizro/pull/598#pullrequestreview-2200302196.

How to run a Vizro app

Source: https://github.com/mckinsey/vizro/pull/580

Development

Put Vizro().build(dashboard).run() in app.py and then run python app.py, which is what we do across our docs. This only works while you're developing but I like recommending it as the first port of call for users because it's simple, quick and easy, like Vizro should be. There's no need to define app.

Deployment

app = Vizro().build(dashboard)

# If you want it to run during development with python app.py then you also need this:
if __name__ == "__main__":
    app.run()

and then e.g. gunicorn app:app. The key change of this PR is that in this context there's no longer any need to define server = app.dash.server (although that will still work).
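
For background, a WSGI server like gunicorn only needs app:app to point at a callable that takes (environ, start_response). One way a wrapper object can satisfy that without exposing its underlying server is to delegate in __call__. The sketch below shows that general pattern with a toy class. It's an assumption about the mechanism, not a copy of Vizro's implementation, and ToyApp and hello_server are hypothetical names.

```python
class ToyApp:
    """Illustrative stand-in for an app wrapper; NOT Vizro's implementation."""

    def __init__(self, wsgi_server):
        # In a real framework this would be the underlying Flask server.
        self.server = wsgi_server

    def __call__(self, environ, start_response):
        # Delegating here makes the wrapper itself a valid WSGI application,
        # so a WSGI server can be pointed at the wrapper directly.
        return self.server(environ, start_response)


def hello_server(environ, start_response):
    """Tiny WSGI application used as the wrapped server."""
    start_response("200 OK", [("Content-Type", "text/plain")])
    return [b"hello"]


app = ToyApp(hello_server)

# Simulate what a WSGI server does for a single request.
statuses = []
body = app(
    {"REQUEST_METHOD": "GET", "PATH_INFO": "/"},
    lambda status, headers: statuses.append(status),
)
print(statuses, body)
```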

The integration tests in this repo do something a bit different but that's just due to some technicalities of how they run, so they don't show a generally recommended pattern.

Link to a page, use an asset

Source: https://github.com/mckinsey/vizro/pull/151

Here are the rules for how we should write code so that paths are always correctly formed:

  • always use dash.get_relative_path to link to pages with href (see _make_page_404_layout example link)
  • always use dash.get_relative_path(f"/{STATIC_URL_PREFIX}/..") to refer to built-in assets in the static folder (see _make_page_404_layout example html.Img)
  • always use dash.get_asset_url to refer to things in the user assets folder, e.g. the logo is done this way
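
The reason these rules matter is that an app can be served under a path prefix (Dash's requests_pathname_prefix setting), in which case a hardcoded href like "/my-page" points to the wrong place. Here is a rough, simplified sketch of the kind of prefixing that dash.get_relative_path does. relative_path is a hypothetical stand-in written for illustration, not Dash's actual code, and the paths are made up.

```python
def relative_path(requests_pathname_prefix, path):
    """Simplified sketch: prepend the deployment prefix to an app-internal path."""
    if not path.startswith("/"):
        raise ValueError("path must start with '/'")
    # Strip the duplicate slash at the join so "/my-app/" + "/my-page"
    # becomes "/my-app/my-page" rather than "/my-app//my-page".
    return requests_pathname_prefix.rstrip("/") + "/" + path.lstrip("/")


# App served at the domain root: the path is unchanged.
root_href = relative_path("/", "/my-page")

# App served under a sub-path (e.g. behind a proxy): the prefix is prepended.
prefixed_href = relative_path("/my-app/", "/my-page")

print(root_href, prefixed_href)
```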

html.Div(hidden=True) vs. None

Source: https://github.com/mckinsey/vizro/pull/188

  • prefer to use None over html.Div(hidden=True) in the case that we don't need to refer to the element in any way (basically whenever you don't set an id). e.g. html.P(self.title) if self.title else None
  • prefer to use html.Div(hidden=True) over None in the case that we do need to refer to the element (basically when you do set an id). e.g. html.Div(hidden=True, id="nav_panel_outer"). Generally these can be identified by the fact that build return values have a type like _NavBuildType
  • prefer to use "" as default value for optional fields which are str. These fields do not need to accept None values at all

CapturedCallable attributes

Source: https://github.com/mckinsey/vizro/pull/367#issuecomment-1994052080

When it comes to using CapturedCallable we should always prefer to use the highest-level interface possible to avoid delving into private/protected things. There are basically three categories of attributes here:

  • dunder attributes like __call__ and __getitem__: these are the main point of entry to any callers and should be used wherever possible
  • protected attributes like _function and _arguments: ok to use if needed but will be removed or made into proper public things in due course, so put some thought into exactly what you're trying to do and whether you really need to use them or if you can already achieve it just with dunder attributes
  • private attributes like __arguments: you should never need to use these
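
To make the three categories concrete, here's a toy class that follows the same naming conventions. It is NOT Vizro's CapturedCallable and its behaviour is an assumption made purely for illustration. The point is how Python treats each kind of attribute: in particular, double-underscore names are mangled, which is why code outside the class should never need them.

```python
class ToyCapturedCallable:
    """Illustrative stand-in; NOT Vizro's CapturedCallable implementation."""

    def __init__(self, function, **arguments):
        self._function = function     # protected: single leading underscore
        self.__arguments = arguments  # private: name-mangled by Python

    def __call__(self, **overrides):
        # Preferred entry point: call the object, optionally overriding bound arguments.
        return self._function(**{**self.__arguments, **overrides})

    def __getitem__(self, name):
        # Preferred entry point: look up a bound argument by name.
        return self.__arguments[name]


def scale(value, factor):
    return value * factor


captured = ToyCapturedCallable(scale, value=3, factor=2)

result_default = captured()            # uses the bound arguments
result_override = captured(factor=10)  # overrides one bound argument
bound_factor = captured["factor"]      # reads a bound argument via __getitem__

# Outside the class the private name is not reachable as written,
# because Python stores it as _ToyCapturedCallable__arguments.
has_plain_private = hasattr(captured, "__arguments")
print(result_default, result_override, bound_factor, has_plain_private)
```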

Static vs Dynamic Data

Source: https://vizro.readthedocs.io/en/stable/pages/user-guides/data/

This summary can help you quickly decide which Vizro data type and configuration to use in your future examples.

Static data:

  • When to use: Use for data that does not need to be reloaded while the dashboard is running.
  • Production ready: 🟢
  • Performance: 🟢
  • Limitations: Data can only be reloaded/refreshed by restarting the dashboard app.
  • Use cases: Any time when your data does not need to be reloaded while the dashboard is running.

Dynamic data:

  • When to use: Use for data that does need to be reloaded while the dashboard is running.
  • Production ready: 🟠 because performance might degrade your app.
  • Performance: 🟠 Your dashboard performance may suffer if any of the following apply (use the cache to solve the problem):
    • loading your data is a slow operation or
    • you have many figures that use dynamic data or
    • many users use the app at the same time.
  • Limitations: Performance
  • Use cases: When loading your data is a fast operation and you strictly need the very latest results from the data source. For example:
    • Displaying the results of a workflow that the user has just triggered (e.g. some model interaction flow).
    • Repeatedly reading logs from a file.
    • Chat apps.

Dynamic data with cache:

  • When to use: Use to improve app performance when dynamic data is used. Use it only if you don't need the very latest data to always be displayed.
  • Production ready: SimpleCache: 🔴 FileSystemCache: 🟢 and RedisCache: 🟢
  • Performance: SimpleCache: 🟢 FileSystemCache: 🟠 and RedisCache: 🟢
  • Limitations: Loaded/displayed data can be up to timeout seconds old (the timeout is user specified), which means that the real data at its source and the displayed data can differ.
  • Use cases: When loading your data is a slow operation or you don't need the very latest results from the data source. For example:
    • A forecast app (because you don't need the latest results).
    • Data that presents search engine results.
    • Reducing the number of external API calls (especially if you pay per API call).

Parametrised dynamic data:

  • When to use: Use when the entire data source, its version, or the chunk of it that is loaded into the app should depend on the user's UI input.
  • Production ready: Same as Dynamic data
  • Performance: Same as Dynamic data
  • Limitations: Same as Dynamic data + filter/parameter options are not natively updated according to the newly loaded data.
  • Use cases: For example:
    • Selecting the source from where the data is going to be retrieved (e.g. selecting between: linkedin_result, twitter_results, instagram_results options).
    • Displaying a certain version of the model interaction results.
    • Displaying only a chunk of a big dataset (where the specific chunk depends on the user's input).

P.S. Parametrised data loading is compatible with caching. The cache uses memoization, so the dynamic data function's arguments are included in the cache key. The pros and cons of using parametrised dynamic data with or without the cache are the same as those for dynamic data with and without the cache, as described above.
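
As a rough illustration of what "arguments are included in the cache key" means, here is a toy memoizer with a timeout. It's a minimal sketch using only the standard library, not the caching backend Vizro actually uses. memoize_with_timeout and load_data are hypothetical names and the source names are example values.

```python
import time


def memoize_with_timeout(timeout):
    """Toy memoizer: the cache key includes the call arguments and entries
    expire after `timeout` seconds."""

    def decorator(function):
        cache = {}

        def wrapper(*args, **kwargs):
            # Different arguments produce different keys, so each parameter
            # value gets its own cached result.
            key = (args, tuple(sorted(kwargs.items())))
            now = time.monotonic()
            if key in cache:
                value, stored_at = cache[key]
                if now - stored_at < timeout:
                    return value  # still fresh: skip the real load
            value = function(*args, **kwargs)
            cache[key] = (value, now)
            return value

        return wrapper

    return decorator


load_calls = []  # records every real (uncached) load


@memoize_with_timeout(timeout=60)
def load_data(source):
    load_calls.append(source)
    return f"rows from {source}"


first = load_data(source="twitter_results")
second = load_data(source="twitter_results")   # same key: served from the cache
third = load_data(source="instagram_results")  # new key: triggers a real load
print(load_calls)
```

The second call returns the cached value without touching the data source, which is exactly the staleness trade-off described in the limitations above.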

assert_component_equal

See https://github.com/mckinsey/vizro/pull/195.


Process

Contribution flow

We follow GitHub flow. In short:

  • main is the only long-lived branch and is always in a releasable state
  • main is a protected branch and cannot be committed to directly
  • contributions to main must go through a PR process
  • contributions must be up to date with main before merging
  • PRs are merged using Squash and merge

To keep your PR in sync with main you can use rebase or merge. There are pros and cons to each method. Rebasing rewrites commit history, which can make it cleaner but complicate things if there are multiple contributors to a branch. Ask yourself "do I understand what update by rebase does and think it's a good idea to use it here?". If yes, then do it. If no, then update by merge instead. So if in doubt, just use merge (the default).

You should try to avoid long-lived PRs. Tips to achieve this:

  • keep code changes small. As a very general rule, it's better to have two small PRs rather than one big one. Consider basing one feature off another to break your work down into more manageable chunks
  • make reviewers' lives easy, e.g. with a clear PR description, clean commit history (e.g. use rebase if you understand it), instructions on how to review
  • reviewers should try to review quickly (e.g. within a day). PR authors should remind reviewers if required
  • several long conversations on PRs and multiple rounds of reviewing can be slow and hard to follow. Consider just talking directly to the PR reviewers
  • for complex changes, raise a draft PR early for visibility of your work and to get initial comments and feedback. Talk to PR reviewers and other developers before and while you do the work rather than just waiting for a single "big bang" review when it's complete
  • consider merging a feature that's work in progress (e.g. code without tests) so long as you keep it undocumented and ideally private (use _). This allows an incomplete feature to be present in the codebase without being visible to users. Only do this sparingly or things get confusing though

Sometimes it's impossible to avoid long-lived PRs, e.g. for some big new features, large refactoring work, etc. This is ok. It just shouldn't be the norm.

Ideally, all the following happen on the same merge to main (as above, this doesn't prevent you opening multiple PRs that point to a feature branch):

  • source code
  • tests
  • changelog
  • docs

Sometimes it might not be feasible to achieve all of these in one merge to main. How then do we keep main always release-ready? The key is that a feature is publicly available only when it is visible in documentation or changelog. This is ultimately what defines our functionality, rather than the existence of source code or tests in our codebase. This means that it's ok to merge code to main that you are not yet happy for the general public to use, so long as it is not publicly documented and does not break existing functionality. If such code is released then this is fine because the feature isn't yet visible to users. The important thing is to not make documentation/changelog public until you are comfortable that the feature can be used.