
Visualization: Simplify 'author' app, make it linkable and linked from 'contributors' app #107

Merged
jnareb merged 29 commits into main from visualization, Mar 13, 2025

@jnareb commented Mar 13, 2025

This series of commits makes the parameters of both the 'contributors' and 'author' apps present in the URL (as query parameters). This way they are linkable; there is a link from each individual author's card in the 'contributors' app leading to the 'author' app with the appropriate repo and author selected.

Simplify both the 'contributors' and 'author' apps by removing the exploratory (and debugging) panes at the bottom of each page, moving them to the corresponding exploration page: either alone (only the exploratory panes), as in the case of 'contributors', or providing the old look, as in the case of 'author'.

Improve the Dockerfile to create a multi-page app, instead of starting with the app selector.

jnareb added 29 commits March 12, 2025 16:56
This should make the interface a bit more understandable for someone
who is not well versed in how PatchScope and this web app work.

While at it, rename the controls for selecting the JSON file with timeline
data, and for selecting a specific subset of this JSON file.

The Sankey flow diagram is better suited for showing contributions
of an individual author than for showing the distribution of paths
and line types in repository commits.

This function is used to turn the full path to an input JSON file into the key
that is used to select said file.  Currently this is the part of
the basename up to the first '.', a convention introduced in the earlier
commit 377b8e1.

With this it would be easy to use this key as a query parameter.

Because we want to use the key and not the value of the Select widget,
we cannot use pn.state.location.sync().  Instead, we have to synchronize
the query parameter and the Select widget manually:

- setting the value from the query param, and setting the query param
  from the value on page load (so that it is always present), follows
  https://discourse.holoviz.org/t/manually-setting-url-state-with-update-query/3445/2

- setting the query parameter on change is done via a param watcher,
  see https://panel.holoviz.org/how_to/links/watchers.html

TODO: do the same for the 'author' app, and maybe also for the explore apps.
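
A framework-free sketch of the manual synchronization described above, assuming the Select widget options map a key (the repo name, as used in the URL) to the full path of the JSON file. The `SelectState` class and the `OPTIONS` paths are hypothetical stand-ins; in the app itself the same roles would presumably be played by `pn.state.location.query_params`, `pn.state.location.update_query()` and a `param.watch()` callback on the Select widget.

```python
from urllib.parse import parse_qs, urlencode

# Hypothetical Select options: key (used in the URL) -> value (full path)
OPTIONS = {
    "hellogitworld": "data/hellogitworld.timeline.json",
    "qtile": "data/qtile.timeline.json",
}


class SelectState:
    """Stand-in for a Select widget kept in sync with a query parameter."""

    def __init__(self, options, query_string, param_name="repo"):
        self.options = options
        self.param_name = param_name
        # keep only the first value of each query parameter
        self.query = {k: v[0] for k, v in parse_qs(query_string).items()}
        # on page load: set the value from the query param, if it is valid...
        key = self.query.get(param_name)
        if key not in options:
            key = next(iter(options))  # ...else fall back to the first option
        self.value = options[key]
        # ...then set the query param from the value, so it is always present
        self._update_query()

    def key_for(self, value):
        """Reverse lookup: the URL stores the key, not the widget value."""
        for key, val in self.options.items():
            if val == value:
                return key
        raise KeyError(value)

    def set_value(self, value):
        """Change the selection; plays the role of a watcher on 'value'."""
        self.value = value
        self._update_query()

    def _update_query(self):
        self.query[self.param_name] = self.key_for(self.value)

    @property
    def query_string(self):
        return urlencode(self.query)
```

For example, `SelectState(OPTIONS, "")` selects the first option and sets `repo=hellogitworld`, so the parameter is present in the URL even on the first visit.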

Save the values of the resample frequency ('freq') and the date range ('from')
as query parameters, and recover them from there, keeping them synchronized:
https://panel.holoviz.org/how_to/state/url.html#sync-and-unsync

The idea was copied from the 02-contributors_graph.py notebook.
It is meant to work in a similar way to the query parameter with the same name
that GitHub now uses for its per-repo Contributors graph:
https://github.blog/changelog/2025-02-25-repositories-updated-insight-views-general-availability/

The function was using the %a symbol (abbreviated weekday name)
instead of the correct %b symbol (abbreviated month name).
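
A minimal illustration of the difference; the function names are hypothetical, and the expected output assumes the default C locale for month and weekday names.

```python
from datetime import date


def format_month_buggy(d: date) -> str:
    # %a is the abbreviated *weekday* name
    return d.strftime("%a %Y")


def format_month(d: date) -> str:
    # %b is the abbreviated *month* name
    return d.strftime("%b %Y")


d = date(2025, 3, 13)
print(format_month_buggy(d))  # Thu 2025
print(format_month(d))        # Mar 2025
```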

If DEBUG is False, and the debugging and exploratory panels are not shown,
there is no need to load extensions useful only for this type of panels.
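
A sketch of choosing the Panel extensions depending on DEBUG. Which extensions the debug panes really need is an assumption here ('tabulator', 'perspective' and 'terminal' for the Tabulator, Perspective and Terminal panes), and `panel_extensions()` is a hypothetical helper whose result would be passed on as `pn.extension(*names)`.

```python
# Assumed extensions used only by the exploratory/debugging panes
DEBUG_ONLY_EXTENSIONS = ("tabulator", "perspective", "terminal")


def panel_extensions(debug: bool, base: tuple = ()) -> list:
    """Names of Panel extensions to load; debug-only ones only if DEBUG."""
    names = list(base)
    if debug:
        # avoid duplicates if an extension is already in the base set
        names.extend(ext for ext in DEBUG_ONLY_EXTENSIONS if ext not in names)
    return names
```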

The name of the author (contributor) is not always a proper name;
sometimes what is given is a pseudonym, and a mixture of proper names
and pseudonyms doesn't look nice.

Unfortunately, query params are passed URI-encoded; for example,
'tycho@tycho.ws' is present in the URL as 'tycho%40tycho.ws'.
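
With the Python standard library, the encoding and decoding look as follows (a sketch; whether Panel already hands the app decoded values is not assumed here). Note the caveat about '+' in email addresses.

```python
from urllib.parse import parse_qs, quote, unquote

email = "tycho@tycho.ws"

# Encoding: '@' is not a safe character, so it becomes '%40'
encoded = quote(email)
print(encoded)  # tycho%40tycho.ws

# Decoding a single value
assert unquote(encoded) == email

# parse_qs() decodes the whole query string at once
params = parse_qs(f"repo=hellogitworld&author={encoded}")
print(params["author"][0])  # tycho@tycho.ws

# Caveat: in a query string '+' means a space, so an email containing '+'
# must be percent-encoded (as '%2B') to survive the round trip
tricky = "user+tag@example.com"
assert parse_qs(f"author={quote(tricky)}")["author"][0] == tricky
assert parse_qs(f"author={tricky}")["author"][0] == "user tag@example.com"
```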

Make the list of authors (author emails) in the Select widget for "author"
sorted by the number of contributions (number of commits), in descending order.
This change means that the default author would be the one with the most
commits authored.

TODO: Use "<Name> <email>" as the user-visible selector, or maybe even
"<Name> <email>: <number of commits>".
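
A sketch of the sorting, together with the "<Name> <email>: <number of commits>" label from the TODO, using hypothetical (name, email) pairs, one per commit. `Counter.most_common()` returns counts in descending order and keeps the first-seen order for ties.

```python
from collections import Counter

# Hypothetical input: one (author name, author email) pair per commit
commits = [
    ("Alice", "alice@example.com"),
    ("Bob", "bob@example.com"),
    ("Alice", "alice@example.com"),
    ("Carol", "carol@example.com"),
    ("Alice", "alice@example.com"),
    ("Bob", "bob@example.com"),
]

n_commits = Counter(email for _name, email in commits)
names = {email: name for name, email in commits}  # last seen name wins

# author emails sorted by the number of commits, descending
authors = [email for email, _count in n_commits.most_common()]
default_author = authors[0]  # the one with the most commits


def label(email: str) -> str:
    """User-visible label, as proposed in the TODO."""
    return f"{names[email]} <{email}>: {n_commits[email]}"


# Select options: label shown to the user -> email used as the value
options = {label(email): email for email in authors}
```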

By itself this does not bring much, but coupled with putting 'repo'
into query params it will make it possible to link to per-author information
(the 'author' app) from per-repo information (the 'contributors' app).

Note that fixing this issue exposed another problem with NaN values:

  Error running application handler <panel.io.handlers.ScriptHandler object at 0x000001D24E97A6B0>:
    ufunc 'isfinite' not supported for the input types, and the inputs
    could not be safely coerced to any supported types according to the casting rule ''safe''
  File '.venv/[...]/numpy/ma/core.py', line 2415, in masked_invalid:
  res = masked_where(~(np.isfinite(a)), a, copy=copy)

Because we want to use the key and not the value of the Select widget,
we cannot use pn.state.location.sync().  Instead, we have to synchronize
the query parameter and the Select widget manually, as was done for the
'contributors' app in d78a6e9.

Split CMD into multiple lines for better readability, and add an EXPOSE
instruction as a kind of documentation for the person running the container:
https://docs.docker.com/reference/dockerfile/#expose

The container builds and runs without problems.

Configure `panel serve` to reuse sessions when serving the initial
request (there is no per-user state), and to add a global loading
spinner to the applications.
This follows the setup for multi-page apps from the documentation:
https://panel.holoviz.org/how_to/streamlit_migration/multipage_apps.html

The consequence is that with this configuration both of the following
URLs point to the same page:
  http://localhost:7860/
  http://localhost:7860/contributors

Solving this would probably involve using a server other than Bokeh's,
for example FastAPI, or something like that.

We need to run both 'contributors' and 'author' apps to be able to
provide working hyperlinks from 'contributors' to 'author' pages
for individual authors.

This link makes it possible to examine a specific author from the authors
grid part of the 'contributors' app, by going to the 'author' page for
that specific repo and author.

Note that currently there is no link in the reverse direction.
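
A sketch of building such a link, assuming the 'author' page is served under the `author` route next to `contributors` and reads the `repo` and `author` query parameters; `author_link()` is a hypothetical helper. `urlencode()` takes care of URI-encoding the email address.

```python
from urllib.parse import urlencode


def author_link(repo: str, author_email: str, route: str = "author") -> str:
    """Relative link to the per-author page for the given repo and author."""
    # urlencode() percent-encodes values, e.g. '@' becomes '%40'
    return f"{route}?{urlencode({'repo': repo, 'author': author_email})}"


print(author_link("hellogitworld", "tycho@tycho.ws"))
# author?repo=hellogitworld&author=tycho%40tycho.ws
```

Being relative, such a link resolves to `/author?...` when followed from the `/contributors` page.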

In almost all cases path_to_name() was used on strings, and had to
convert the string to pathlib.Path itself.  Make path_to_name() more robust
by also accepting a 'str'-typed parameter.

Make use of this new feature in the code, simplifying the callers.
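
A minimal sketch of a `path_to_name()` accepting both `str` and `pathlib.Path`, following the convention that the key is the part of the basename up to the first '.'; this illustrates the convention and is not necessarily the project's actual code. The example paths are hypothetical.

```python
from pathlib import Path
from typing import Union


def path_to_name(file_path: Union[str, Path]) -> str:
    """Turn the full path to an input JSON file into its selection key."""
    # Path() accepts both str and Path, so callers need not convert
    basename = Path(file_path).name
    # the key is the part of the basename up to the first '.'
    return basename.split(".", maxsplit=1)[0]


print(path_to_name("data/stats/hellogitworld.timeline.purpose-to-type.json"))
# hellogitworld
```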

In the case where the 'repo' query parameter is set (its value
is the key that defines the input JSON file), the widget that denotes the part
of the data from that file that we want to analyze also needs to be updated.

Without this change, when reloading a page where the 'repo' parameter is set,
the second widget in the sidebar gets an empty value.
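
A framework-free sketch of keeping the second sidebar widget in step with the first one. The `DATA` layout (repo key mapped to named parts of the JSON file) and the `Sidebar` class are hypothetical; the point is that setting the repo, including from the URL on page load, goes through the same code path that refreshes the second widget.

```python
# Hypothetical data: for each 'repo' key, the named parts of its JSON file
DATA = {
    "hellogitworld": {"part_a": {}, "part_b": {}},
    "qtile": {"part_a": {}},
}


class Sidebar:
    """Stand-in for the two dependent widgets in the sidebar."""

    def __init__(self, data_by_repo, repo):
        self.data_by_repo = data_by_repo
        self.subset = None
        self.set_repo(repo)  # also used when 'repo' comes from the URL

    def set_repo(self, repo):
        self.repo = repo
        # Update the options of the second widget together with the first;
        # otherwise it would keep stale options and an empty value
        self.subset_options = sorted(self.data_by_repo.get(repo, {}))
        if self.subset not in self.subset_options:
            self.subset = self.subset_options[0] if self.subset_options else None
```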

This commit keeps all those different plots available, even
after the 'author' app gets simplified.

This turns on the exploratory panes: JSON view, tabulators, perspectives,
and a terminal with debug (log) output.  Those were turned off by default
in the 'author' app, and remain turned off there.

Leave for now only the following plots:
- number of commits over time per sample period, full width
- -/+ plot of aggregate of -/+ changed lines per sample period
- bi-histogram of -/+ changed lines per commit
- heatmap of -/+ line types over time, full width
- local day of the week vs local hour heatmap with marginal histograms,
  to show periodic behavior, full width
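
The last plot needs commit counts per (weekday, hour) in the author's local time, plus the marginal sums. A stdlib sketch of that aggregation, assuming ISO 8601 timestamps that carry the author's UTC offset (for example as printed by `git log --date=iso-strict`); the function name and sample timestamps are hypothetical, and the app itself presumably does this with its own data frames.

```python
from collections import Counter
from datetime import datetime


def weekday_hour_counts(iso_timestamps):
    """Count commits per (local weekday, local hour), with marginal sums.

    Weekdays are numbered as in datetime.weekday(): 0 = Monday.
    """
    grid = Counter()
    for ts in iso_timestamps:
        # fromisoformat() keeps the author's UTC offset, so .weekday()
        # and .hour are in the author's local time, not in UTC
        dt = datetime.fromisoformat(ts)
        grid[(dt.weekday(), dt.hour)] += 1
    # marginal histograms: sums over hours and over weekdays
    by_weekday = Counter()
    by_hour = Counter()
    for (weekday, hour), n in grid.items():
        by_weekday[weekday] += n
        by_hour[hour] += n
    return grid, by_weekday, by_hour


timestamps = [
    "2025-03-12T16:56:00+01:00",  # Wednesday
    "2025-03-12T17:10:00+01:00",  # Wednesday
    "2025-03-13T09:30:00-05:00",  # Thursday
]
grid, by_weekday, by_hour = weekday_hour_counts(timestamps)
```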

TODO: Sankey flow diagram for contributions by a given author.

NOTE: some variables might be present even if they are not used.
Due to the use of reactive expressions (*.rx()), they should not
introduce any performance penalty.
@jnareb added the enhancement (New feature or request) label Mar 13, 2025
@jnareb self-assigned this Mar 13, 2025
@dagshub bot commented Mar 13, 2025

@github-actions

Coverage Report
| File | Stmts | Miss | Cover | Missing |
|------|------:|-----:|------:|---------|
| **src/diffannotator** | | | | |
| `__init__.py` | 0 | 0 | 100% | |
| `annotate.py` | 823 | 136 | 83% | 52, 75–76, 85, 92, 101–104, 106–108, 110, 368–369, 373–374, 376, 413, 415–417, 419, 421–423, 425–426, 428–430, 528, 531, 534–535, 577, 727, 756, 799–801, 805, 836, 914, 1149, 1163–1166, 1168, 1172–1175, 1177, 1354, 1357, 1526, 1528, 1565, 1582–1583, 1669, 1672, 1784–1785, 1787, 1804, 1846–1847, 1859–1860, 1863, 1878, 1893–1894, 1925, 1988, 1999, 2025, 2035, 2037, 2046–2050, 2054–2058, 2060, 2086, 2173, 2187, 2189–2190, 2192–2194, 2196, 2198–2200, 2202–2203, 2205, 2207, 2221, 2224, 2239–2241, 2247, 2260, 2269, 2277, 2280, 2350, 2356, 2358–2359, 2372, 2381–2386, 2500–2501, 2504–2505, 2509, 2554 |
| `config.py` | 50 | 7 | 86% | 52–56, 58, 93 |
| `gather_data.py` | 344 | 53 | 84% | 61, 87, 91, 93, 101, 106, 108, 121, 129, 161, 228–229, 232–233, 236, 239, 248, 259–265, 288–289, 309–310, 325, 343, 365–366, 531, 535, 568, 696, 709, 711, 714, 718, 720, 758, 784–785, 838, 890, 1076, 1108–1112, 1116 |
| `generate_patches.py` | 38 | 3 | 92% | 102, 104, 126 |
| `languages.py` | 101 | 16 | 84% | 170, 173, 176, 179, 182, 186, 217–218, 230, 249, 261, 263–264, 267, 273, 311 |
| `lexer.py` | 29 | 2 | 93% | 87–88 |
| **src/diffannotator/utils** | | | | |
| `__init__.py` | 0 | 0 | 100% | |
| `git.py` | 487 | 85 | 82% | 107, 233, 316, 318, 321–322, 324, 326–329, 331–333, 335–337, 342–343, 347, 349–350, 354–356, 358–359, 361–362, 364–365, 367, 475–476, 479, 516, 526, 529–531, 537, 568, 572, 579, 603, 616, 671, 753, 800, 807, 837, 841, 845, 885, 887, 898, 903, 908, 999–1000, 1047–1049, 1052–1053, 1087, 1091, 1157, 1162, 1164, 1167–1168, 1170, 1229–1230, 1344, 1346, 1369–1370, 1382, 1384, 1396–1397, 1431, 1445 |
| **TOTAL** | 1872 | 302 | 83% | |

| Tests | Skipped | Failures | Errors | Time |
|------:|--------:|---------:|-------:|-----:|
| 81 | 8 💤 | 0 ❌ | 0 🔥 | 9.266s ⏱️ |

@jnareb merged commit dba25aa into main Mar 13, 2025
2 checks passed

Labels

enhancement New feature or request

1 participant