- bring up to date with latest rust / libraries and make repeatable
- rust `1.75` to `1.82`
- update dependencies:
- for each of these, run the following to confirm everything still works: `cargo clean`, `cargo build`, `cargo test`
- do `cargo update` on libraries
- change dependencies to be specified as minor only
- needs to be done in all Cargo.toml files i.e. also those in sub-packages
- update all libraries to latest compatible minor version
- install https://github.com/killercup/cargo-edit to get `cargo upgrade`
- run `cargo upgrade --compatible`
- create repeatable DEVELOPMENT.md / Justfile
- for at least schedule import, indexing, visualisation generation, and running api (locally)
- doesn't need to cover slide or video indexing
- (re)publish to fly.io staging
- using "fosdem-fly-staging" openai key
- (re)publish to fly.io main
- using "fosdem-fly-prod" openai key
- start supporting fosdem 2025 schedule
- upgrade to latest Bulma
- move from bulma 0.9.4 -> 1.0.2
- add subresource integrity (`integrity`) attributes to bulma and fontawesome
- basic "bookmarks" system that works between tabs, and across my laptop and phone
- I'd prefer to avoid something that relies on a fast network connection (wifi can be iffy on the day) or an accounts system (can't be arsed owning/securing/paying for that)
- So, I'm gonna try to avoid any backend if possible
- I also want to play with some local-first stuff :-)
- works between tabs
- add hooks to markup that allow a bookmark with a local viewmodel to be enabled/disabled by JS
- represent the core page model for bookmarks as `data-` attributes on event card
- set `data-event-id` and initial `data-bookmark-status` from backend (not bookmarked)
- store state of bookmark in parent element with `data-bookmark-status`
- style bookmark based on status of parent element with `data-bookmark-status`
- for each bookmark, find containing event card then:
- toggle `data-bookmark-status` based on bookmark click
- use tinybase to support sharing between tabs and persistence of `data-bookmark-status`
- create store
- set `data-bookmark-status` based on tinybase store (persistence across reloads)
- use `MutationObserver` on `data-bookmark-status` to update tinybase store based on changes
- set `data-bookmark-status` based on changes in tinybase store
- sync between browser tabs
- add a '/bookmarks' endpoint which can show all items currently bookmarked
- refactor router into separate module per route
- add a new route which surfaces all events
- hide/display based on whether it is bookmarked
- link in to top nav, but only enable if bookmarks working
- works between laptop/phone/ipad
- I'm not going to go for live-sync of all bookmarks and all CRDT state, including deletions. This is for a few reasons:
- still requires some sort of backend (e.g. for websockets or webrtc or similar). This makes it more complicated and also means I still require a remote connection at some point.
- it also means I need to have separate "users" if I want to avoid accidentally mixing my bookmarks with anyone else who happens to use the site
- Instead, I'll use a local transfer system that relies only on 'exporting' and 'importing' bookmarks via copy/paste. This will be a merge, i.e. it will only export bookmarks that are set and import the same. So, it's not syncing two devices to be identical.
- export all set bookmarks as a text string (via copy)
- import all bookmarks from a text string (via paste)
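The export/import merge above is really just a set union. A minimal sketch of it in Rust, assuming bookmarks are keyed by numeric event id and exported as a comma-separated string (both assumptions for illustration; `export_bookmarks` and `import_bookmarks` are hypothetical names, and on the site this would run in the browser rather than in Rust):

```rust
use std::collections::BTreeSet;

/// Export every set bookmark as one copyable string.
/// Assumption: bookmarks are numeric event ids joined with commas.
fn export_bookmarks(bookmarked: &BTreeSet<u32>) -> String {
    bookmarked
        .iter()
        .map(|id| id.to_string())
        .collect::<Vec<_>>()
        .join(",")
}

/// Merge a pasted export into the local bookmarks and return how many were new.
/// This is a union: nothing already bookmarked locally is ever removed,
/// and fragments that don't parse as ids are ignored.
fn import_bookmarks(existing: &mut BTreeSet<u32>, pasted: &str) -> usize {
    let before = existing.len();
    for part in pasted.split(',') {
        if let Ok(id) = part.trim().parse::<u32>() {
            existing.insert(id);
        }
    }
    existing.len() - before
}

fn main() {
    let laptop: BTreeSet<u32> = [101, 205].into_iter().collect();
    let mut phone: BTreeSet<u32> = [205, 310].into_iter().collect();

    let text = export_bookmarks(&laptop); // copy on the laptop...
    let added = import_bookmarks(&mut phone, &text); // ...paste on the phone
    println!("{text} -> added {added}, phone now has {phone:?}");
}
```

Because import only ever adds, pasting the same string twice is harmless, and neither device loses bookmarks the other doesn't know about.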
- make each event have its own detail page with associated related events
- extract simple detail page from search pages
- make nav links in search go to detail pages
- make clicks from "connections" go to detail page
- Weekend tweaks
- Make entries on Bookmarks page sorted by start time of event
- Add Bookmarks filter to "Next" i.e. can show only what is bookmarked
- Add a breakdown of events by room (a "Room" page)
- Link to room page from events
- Add bookmarks filter to room page
- Bring schedule up to date and reindex
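Sorting the Bookmarks page by start time and building the Room page are both small transformations over the event list. A sketch with a cut-down, hypothetical `Event` struct (start times as minutes since the start of the weekend, which is an assumption) and made-up example data:

```rust
use std::collections::BTreeMap;

/// Hypothetical, cut-down event; `start` is minutes since the start of the weekend.
struct Event {
    id: u32,
    room: &'static str,
    start: u32,
}

/// Bookmarks page (or a bookmarks filter): only bookmarked events, earliest first.
fn bookmarked_by_start<'a>(events: &'a [Event], bookmarked: &[u32]) -> Vec<&'a Event> {
    let mut picked: Vec<&Event> = events.iter().filter(|e| bookmarked.contains(&e.id)).collect();
    picked.sort_by_key(|e| e.start);
    picked
}

/// Room page: events grouped by room (rooms in name order),
/// with each room's events sorted by start time.
fn by_room<'a>(events: &'a [Event]) -> BTreeMap<&'a str, Vec<&'a Event>> {
    let mut rooms: BTreeMap<&'a str, Vec<&'a Event>> = BTreeMap::new();
    for event in events {
        rooms.entry(event.room).or_default().push(event);
    }
    for room_events in rooms.values_mut() {
        room_events.sort_by_key(|e| e.start);
    }
    rooms
}

fn main() {
    let events = [
        Event { id: 1, room: "H.1302", start: 600 },
        Event { id: 2, room: "K.1.105", start: 540 },
        Event { id: 3, room: "H.1302", start: 480 },
    ];
    let ids: Vec<u32> = bookmarked_by_start(&events, &[1, 2]).iter().map(|e| e.id).collect();
    println!("bookmarks: {ids:?}");
    for (room, room_events) in by_room(&events) {
        let ids: Vec<u32> = room_events.iter().map(|e| e.id).collect();
        println!("{room}: {ids:?}");
    }
}
```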
- design/other tweaks (just a holding ground as I see things)
- fix sojourner links (seems to be using `guid` instead of `id` now as identifier in URL)
- add ability to find related events to `InMemoryOpenAIQueryable`
- switch bookmarks css to use nested css
- remove / refactor `current_event`, as it seems to be hanging around where not needed
- use `RoomId` instead of `String` in `Event`
- make all `details` elements closed by default, and open them via JS if on a larger screen
- minimal thing which gets some semantic content and allows finding similar content
- get FOSDEM content (pentabarf)
- look up and store vectors based on title and abstract of event
- find similar events based on vector distance
- see `snippets.sql`
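Finding similar events from stored vectors amounts to ranking every other event by its distance to the target. A minimal in-memory sketch, assuming cosine similarity over the embeddings and a hypothetical `most_similar` helper; at this stage the lookup was presumably done in SQL (see `snippets.sql`), which isn't reproduced here.

```rust
/// Cosine similarity between two embedding vectors of equal length.
fn cosine_similarity(a: &[f32], b: &[f32]) -> f32 {
    let dot: f32 = a.iter().zip(b).map(|(x, y)| x * y).sum();
    let norm_a = a.iter().map(|x| x * x).sum::<f32>().sqrt();
    let norm_b = b.iter().map(|x| x * x).sum::<f32>().sqrt();
    if norm_a == 0.0 || norm_b == 0.0 {
        0.0
    } else {
        dot / (norm_a * norm_b)
    }
}

/// Ids of the `k` events most similar to the event at index `target`,
/// best match first, excluding the target itself. (Hypothetical helper.)
fn most_similar(target: usize, embeddings: &[(u32, Vec<f32>)], k: usize) -> Vec<u32> {
    let (target_id, target_vec) = &embeddings[target];
    let mut scored: Vec<(u32, f32)> = embeddings
        .iter()
        .filter(|(id, _)| id != target_id)
        .map(|(id, v)| (*id, cosine_similarity(target_vec, v)))
        .collect();
    // Highest similarity first (equivalently, smallest cosine distance).
    scored.sort_by(|a, b| b.1.total_cmp(&a.1));
    scored.into_iter().take(k).map(|(id, _)| id).collect()
}

fn main() {
    let embeddings = vec![
        (1, vec![1.0, 0.0]),
        (2, vec![0.9, 0.1]),
        (3, vec![0.0, 1.0]),
    ];
    println!("{:?}", most_similar(0, &embeddings, 5));
}
```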
- minimal thing which allows querying existing content by an open query
- connect to remote supabase DB
- run a query from a local cli to a remote DB
- call openai for a string and find related events
- cleanup
- switch from `dotenv` to `dotenvy` (`dotenv` is no longer maintained)
- allow urls for events to be opened
- minimal website that allows searches and showing of links
- create empty shuttle service
- extract querying into shared library
- expose as shuttle service which does query and returns json
- get working locally
- add size protections on input
- publish remotely
- expose as minimal website with a form which accepts open query and formats results
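For the size protections on input, something like the following could sit in front of the query handler, before anything is sent to OpenAI. It's a sketch: the limits (`MAX_QUERY_CHARS`, `MAX_LIMIT`) and the `validate_search` function are hypothetical, not the values or code the site actually uses.

```rust
/// Hypothetical limits, for illustration only.
const MAX_QUERY_CHARS: usize = 200;
const MAX_LIMIT: usize = 20;

/// Reject empty or over-long queries and clamp the requested result limit,
/// so oversized input never reaches the OpenAI call.
fn validate_search(q: &str, limit: Option<usize>) -> Result<(&str, usize), String> {
    let q = q.trim();
    if q.is_empty() {
        return Err("query must not be empty".to_string());
    }
    // Count characters rather than bytes so non-ASCII queries are treated fairly.
    let len = q.chars().count();
    if len > MAX_QUERY_CHARS {
        return Err(format!("query is {len} characters; max is {MAX_QUERY_CHARS}"));
    }
    let limit = limit.unwrap_or(MAX_LIMIT).clamp(1, MAX_LIMIT);
    Ok((q, limit))
}

fn main() {
    println!("{:?}", validate_search("  Ceph ", Some(100)));
    println!("{:?}", validate_search("   ", None));
}
```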
- add fly.io as an option
- simple "hello world" axum project working locally
- (-) building locally in docker (podman)
- does work, but is very very slow on my M1 (hours)
- building and running remotely on fly.io
- extract core of webapp separate from shuttle.rs usage (e.g. just Router)
- use core in a fly.io shell, but with different secrets to distinguish usage
- leave deployed side-by-side in both fly.io and shuttle.rs, for a day or so, before declaring fly better
- remove shuttle support (switch to fly.io)
- switch plausible.io setup to use fly.dev domain name
- remove shuttle code and config files
- upgrade to latest libraries for axum etc (shuttle required older versions)
- show related items
- show 5 related items per search item
- speed-up so that finding all related items is faster (less than a second for 20 items)
- (-) make error-handling clearer in `Queryable`
- visualise all related items via D3
- use times and durations
- import and show next to events in display
- use the time of day to color items in D3 vis
- polish / UI
- add design system (bulma?)
- add icons
- general release
- switch to bespoke domain name, https://fosdem.houseofmoran.io
- get cert
- switch plausible.io to domain name
- switch "home" to always go to https://fosdem.houseofmoran.io
- add some example queries that you won't find on main site
- add "connections" to main nav, and use "connections" consistently
- log what searches people are doing
- simple "now and next"
- show a current talk (happening in current hour)
- 'now' is clamped to be either earliest or latest hour of the weekend
- show all those starting some time in the following hour
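The "now and next" rules can be sketched like this, assuming event times are stored as minutes since the start of the weekend and that "current hour" means the clock hour containing the clamped current time. `current_hour`, `now_events` and `next_events` are hypothetical names, and clamping to the first/last event start is one reading of "earliest or latest hour of the weekend".

```rust
/// Hypothetical, cut-down event; times are minutes since the start of the weekend.
struct Event {
    title: &'static str,
    start: u32,
    duration: u32,
}

const HOUR: u32 = 60;

/// Clamp `now` to the span of the schedule, then round down to the hour.
fn current_hour(now: u32, events: &[Event]) -> u32 {
    let first = events.iter().map(|e| e.start).min().unwrap_or(0);
    let last = events.iter().map(|e| e.start).max().unwrap_or(0);
    let clamped = now.clamp(first, last);
    clamped - clamped % HOUR
}

/// "Now": events that overlap the current hour at all.
fn now_events(hour: u32, events: &[Event]) -> Vec<&'static str> {
    events
        .iter()
        .filter(|e| e.start < hour + HOUR && e.start + e.duration > hour)
        .map(|e| e.title)
        .collect()
}

/// "Next": events starting some time in the following hour.
fn next_events(hour: u32, events: &[Event]) -> Vec<&'static str> {
    let next = hour + HOUR;
    events
        .iter()
        .filter(|e| e.start >= next && e.start < next + HOUR)
        .map(|e| e.title)
        .collect()
}

fn main() {
    let events = [
        Event { title: "A", start: 540, duration: 50 },
        Event { title: "B", start: 600, duration: 25 },
        Event { title: "C", start: 650, duration: 30 },
    ];
    // Well before the first talk, so "now" clamps to the first hour.
    let hour = current_hour(0, &events);
    println!("now: {:?}, next: {:?}", now_events(hour, &events), next_events(hour, &events));
}
```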
- more searchable/usable content
- standardise event display
- add rooms
- add track
- re-index in openai (fetch new embeddings based on new info)
- re-fetch connection distances
- remove (external) DB dependency
- convert `Queryable` into a trait
- re-implement `Queryable` using a "DB" which can just take the CSV files as input, and which uses nalgebra for vector distance
- update Docker setup and test by deploying to staging
- remove DB impl
- regenerate related items
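Turning `Queryable` into a trait means callers don't care whether results come from the remote DB or from an in-memory store. A minimal sketch with a hypothetical method signature and struct name (`InMemoryQueryable`); the real signatures aren't recorded here, CSV loading is left out, and a hand-written Euclidean distance stands in for nalgebra.

```rust
use std::collections::HashMap;

/// Hypothetical shape of the trait.
trait Queryable {
    /// Ids of up to `limit` events related to `event_id`, closest first.
    fn find_related_events(&self, event_id: u32, limit: usize) -> Result<Vec<u32>, String>;
}

/// Hypothetical in-memory implementation; embeddings would be loaded from
/// the CSV files at startup.
struct InMemoryQueryable {
    embeddings: HashMap<u32, Vec<f32>>,
}

/// Euclidean distance; nalgebra vectors provide the same via `.norm()`
/// on the difference of two vectors.
fn distance(a: &[f32], b: &[f32]) -> f32 {
    a.iter().zip(b).map(|(x, y)| (x - y).powi(2)).sum::<f32>().sqrt()
}

impl Queryable for InMemoryQueryable {
    fn find_related_events(&self, event_id: u32, limit: usize) -> Result<Vec<u32>, String> {
        let target = self
            .embeddings
            .get(&event_id)
            .ok_or_else(|| format!("unknown event {event_id}"))?;
        let mut scored: Vec<(u32, f32)> = self
            .embeddings
            .iter()
            .filter(|(id, _)| **id != event_id)
            .map(|(id, v)| (*id, distance(target, v)))
            .collect();
        // Smallest distance first.
        scored.sort_by(|a, b| a.1.total_cmp(&b.1));
        Ok(scored.into_iter().take(limit).map(|(id, _)| id).collect())
    }
}

fn main() {
    let embeddings = HashMap::from([(1, vec![0.0, 0.0]), (2, vec![3.0, 4.0]), (3, vec![1.0, 0.0])]);
    // Callers only see the trait, so the backing store can be swapped.
    let queryable: Box<dyn Queryable> = Box::new(InMemoryQueryable { embeddings });
    println!("{:?}", queryable.find_related_events(1, 5));
}
```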
- simple bookmarks / improve discovery
- add link to open item in sojourner
- make "related" link to now-and-next instead of fosdem site (so it can more easily be bookmarked in sojourner)
- use slide content
- process latest version of schedule
- update schedule to include slide links
- setup tika on fly.io for usage in slide content extraction
- iterate over all slides, fetch content, and save to a local dir
- when generating embeddings, use slide text content and index that as well
- update related
- refactor / cleanup
- switch to writing/reading events as json files via serde
- update Dockerfile
- represent as directly and completely as possible e.g.
- record list of slide urls rather than single slide url
- add presenter names
- this was previously hard, as it was a list, with occasionally embedded quotes, and so hard to represent in CSV
- import persons as presenters
- show presenter names
- use presenter name in the embedding input
- switch to writing/reading embeddings as json files via serde
- warnings / clippy pass
- update README.md to capture current impl
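Keeping slide urls and presenters as real lists makes the embedding input easy to assemble. A sketch assuming a simplified `Event` struct and a hypothetical `embedding_input` helper; the field names and text layout are illustrative rather than the project's actual format, and serde derives are omitted to keep it dependency-free.

```rust
/// Hypothetical event shape: lists stay lists instead of being squashed
/// into single CSV cells.
struct Event {
    title: String,
    abstract_text: String,
    presenters: Vec<String>,
    slide_urls: Vec<String>,
}

/// Build the text sent for embedding: title, abstract, presenter names, then
/// any previously extracted text for each slide url. Layout is an assumption.
fn embedding_input(event: &Event, slide_text: impl Fn(&str) -> Option<String>) -> String {
    let mut parts = vec![event.title.clone(), event.abstract_text.clone()];
    if !event.presenters.is_empty() {
        parts.push(format!("Presenters: {}", event.presenters.join(", ")));
    }
    // Look up text saved locally for each slide url; skip slides with none.
    parts.extend(event.slide_urls.iter().filter_map(|url| slide_text(url.as_str())));
    parts.join("\n")
}

fn main() {
    let event = Event {
        title: "Ceph at scale".to_string(),
        abstract_text: "Lessons from running Ceph.".to_string(),
        presenters: vec!["Ada".to_string(), "Grace".to_string()],
        slide_urls: vec!["https://example.org/slides.pdf".to_string()],
    };
    let text = embedding_input(&event, |_url| Some("Slide text...".to_string()));
    println!("{text}");
}
```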
- video search
- update schedule to include video links
- write driver cli that:
- downloads mp4 to a `video` dir
- uses ffmpeg to extract the audio from the video and convert it to wav, saved in `audio` dir
- runs whisper across it, to get a WebVTT file
- take all WebVTT files and extract text from them; add this to the content we use for embedding
- add an endpoint for showing content of videos with associated WebVTT captions
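Extracting text from the WebVTT files mostly means keeping the cue payloads and dropping everything else. A minimal line-based sketch with a hypothetical `webvtt_text` helper; it aims at simple cue layouts like whisper's output rather than the full WebVTT spec (e.g. inline `<v Speaker>` tags are left in).

```rust
/// Extract spoken text from a WebVTT file, dropping the `WEBVTT` header,
/// cue identifiers, timing lines and NOTE blocks.
fn webvtt_text(vtt: &str) -> String {
    let mut text = Vec::new();
    let mut in_cue = false;
    for line in vtt.lines() {
        let line = line.trim();
        if line.is_empty() {
            // A blank line ends the current cue (or header/NOTE block).
            in_cue = false;
        } else if line.contains("-->") {
            // Timing line, e.g. "00:00.000 --> 00:02.000": cue text follows.
            in_cue = true;
        } else if in_cue {
            text.push(line);
        }
    }
    text.join(" ")
}

fn main() {
    let vtt = "WEBVTT\n\n1\n00:00.000 --> 00:02.000\nHello there.\n\n2\n00:02.000 --> 00:04.000\nWelcome to the\nRust devroom.\n";
    println!("{}", webvtt_text(vtt));
}
```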
- investigate higher latency in asia regions
- context:
- as of 9th Mar, I have 5 machine instances in fly.io, spread across 5 regions: LHR, LAX, NRT, SYD and SIN
- however, looking in https://updown.io/vrp1, which is the URL https://fosdem.houseofmoran.io/search?q=Ceph&limit=20, the latency for Asian regions seems to be 1.1s or more, whereas other regions are 723ms or less; see investigations/latency_Mar_2024/fosdem-search-20240309.png
- change update frequency of updown.io check to once every 15s (from once a minute) to get more data
- setup opentelemetry to send to honeycomb.io from fly.io
- ensure it automatically runs in different regions
- honeycomb may require different endpoints (US vs EU) to be contacted when in different fly.io regions
- seems to work fine when run in `fra`, so will just continue to use the US instance
- register local/staging/prod as environment attribute
- add region as an attribute
- ensure we log to console and to opentelemetry
- ensure a failure to initialise opentelemetry doesn't kill the app on startup, and it just falls back to default
- deploy to prod and monitor for a few days
- try some (safe) experiments:
- switch all machines to be in US (lax), on assumption it is the hop to OpenAI which is the slow part
- I tried this and it made latencies worse; see investigations/latency_Mar_2024/fosdem-search-20240316.png
- reverted to having a single machine in each of sin,syd,nrt,lhr,lax
- it's not exactly the same as before, but now closer: investigations/latency_Mar_2024/fosdem-search-20240318.png
- apply some speedups on top of OpenAI call:
- from traces, it looks like dispatching `find_related_events` async on separate threads doesn't have much benefit, as traces still look like a waterfall. So, switch to just doing it serially on a single thread to save dispatch/sync overhead
- did not see any major benefit in this, but it's simpler, so keeping it.
- note that I am not convinced I was definitely dispatching in parallel properly at all before, so may revisit again in the future
- note: I dunno why, but overall latencies seem to be < 1s now, see: investigations/latency_Mar_2024/fosdem-search-20240319.png
- revert updown.io check to once a minute (to save on credits)
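The "fall back rather than die" behaviour for opentelemetry, plus the environment and region attributes, could look roughly like this. It's a dependency-free sketch: `init_opentelemetry` is a hypothetical stand-in for building the real exporter, `APP_ENV` is a hypothetical variable naming local/staging/prod, and `FLY_REGION` is the variable fly.io sets on each machine.

```rust
use std::env;

/// Resource attributes to attach to every span, with keys modelled on
/// OpenTelemetry semantic conventions. APP_ENV is hypothetical.
fn resource_attributes() -> Vec<(&'static str, String)> {
    let environment = env::var("APP_ENV").unwrap_or_else(|_| "local".to_string());
    let region = env::var("FLY_REGION").unwrap_or_else(|_| "local".to_string());
    vec![("deployment.environment", environment), ("cloud.region", region)]
}

/// Hypothetical stand-in for building the real exporter; here it only
/// fails when no endpoint is configured.
fn init_opentelemetry(endpoint: Option<&str>, attributes: &[(&str, String)]) -> Result<(), String> {
    let endpoint = endpoint.ok_or("no OTLP endpoint configured")?;
    println!("exporting to {endpoint} with {attributes:?}");
    Ok(())
}

/// Console logging is always on; opentelemetry is best-effort, so a failure
/// is reported and startup carries on instead of exiting.
fn init_telemetry(endpoint: Option<&str>) -> bool {
    println!("console logging enabled");
    match init_opentelemetry(endpoint, &resource_attributes()) {
        Ok(()) => true,
        Err(err) => {
            eprintln!("opentelemetry disabled, falling back to console only: {err}");
            false
        }
    }
}

fn main() {
    let otel_enabled = init_telemetry(None);
    println!("app starting (opentelemetry enabled: {otel_enabled})");
}
```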
- stable / usable clustering
- pre-cluster on Rust side
- don't re-start sim each time
- fix non-disappearing lines