Webstream relay crashes with decodebin typefind failure on transient upstream stalls #217

@chrobione

What happened

After deploying v1.39.17 (2026-05-10 ~07:19 UTC), the webstream relay for the Grim Leftover's mount (f58c7e4a-151a-4d69-8587-32a1c73f1210, station 0e4edda8-fb53-44b5-83db-5792c512789d) crashed once with:

WRN webstream pipeline crashed, attempting reconnection mount=f58c7e4a-… webstream="Grim Leftover's"
WRN gstreamer pipeline exited with error error="exit status 1" mount=f58c7e4a-…
    stderr="ERROR: from element /GstPipeline:pipeline0/GstDecodeBin:decodebin0/GstTypeFindElement:typefind:
            Could not determine type of stream."

The auto-reconnect succeeded on attempt 1 (≈8 s gap), so listener impact was a brief audio dropout, not an outage. Frequency over the last 2 h: 1 occurrence.

Pipeline involved

souphttpsrc location="https://rlmradio.xyz/live/grim-leftovers" is-live=true do-timestamp=true iradio-mode=true
  ! queue max-size-time=5000000000
  ! watchdog timeout=15000
  ! decodebin
  ! audioconvert ! audioresample ! …

Note that the upstream URL is itself a Grimnir-served mount on the same host, so this is Grimnir relaying its own output. The likely sequence: a momentary stall on the source mount makes souphttpsrc deliver too few bytes (or a response with the wrong content type, such as an HTML error page) for typefind to classify; decodebin then gives up, the whole pipeline exits, and the relay has to tear down and reconnect.

Suggested mitigation

Two complementary options, in order of cost:

  1. Treat early typefind failures as a soft retry in the webstream relay's reconnect logic — they are recoverable and normal for live HTTP sources, and shouldn't surface as WRN webstream pipeline crashed. Today the reconnect already kicks in, but every typefind miss costs an audible gap and a noisy WARN.
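  2. Set retries=N timeout=… on the existing souphttpsrc, and/or add a multiqueue ahead of decodebin, so brief upstream stalls are absorbed before typefind sees them. (hlsdemux would only apply if the upstream were an HLS playlist rather than a continuous HTTP stream.) Worth comparing with the watchdog timeout=15000 (15 s) already in place.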

Severity

P3. Self-recovering, low frequency, but worth fixing because:

  • Each event = a real listener-audible gap.
  • The same pattern likely affects other relayed-internal mounts on stall.
  • Now that the per-track signal-interrupt log noise is gone (v1.39.17), this WARN is one of the few real signals left in the playout log, so addressing it keeps the signal-to-noise ratio high.

Repro / further data

  • Container start: 2026-05-10 07:18 UTC
  • Crash log: 07:19:13 UTC
  • Reconnect succeeded: 07:19:21 UTC
  • Source mount: https://rlmradio.xyz/live/grim-leftovers
  • Webstream record: e4fd2190-f62e-444c-9fc4-f0654e03c699

While reading the log around this incident I also noticed a separate SQLSTATE 22P02 from internal/webstream/icy_metadata.go:141 on the same mount — same class as the v1.39.16 fix (empty media_id on a webstream PlayHistory Save). Worth filing as its own issue if not already known; happy to do so.
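
For reference on the 22P02 side note: SQLSTATE 22P02 is PostgreSQL's invalid_text_representation, which is what an empty string produces when it is sent for a uuid column. A rough sketch of one way to guard against that, assuming media_id is a nullable uuid column and that the value is passed through database/sql (both unverified assumptions; nullableID is a hypothetical helper, not something in the codebase):

```go
package main

import (
	"database/sql"
	"fmt"
	"strings"
)

// nullableID maps an empty or whitespace-only ID to SQL NULL so the database
// never receives '' for a uuid column. Non-empty values pass through as-is.
func nullableID(id string) sql.NullString {
	id = strings.TrimSpace(id)
	if id == "" {
		return sql.NullString{} // Valid=false is sent as NULL
	}
	return sql.NullString{String: id, Valid: true}
}

func main() {
	// A webstream play has no media item, so media_id is empty.
	fmt.Println(nullableID("").Valid)

	// Placeholder UUID for illustration only.
	fmt.Println(nullableID("00000000-0000-0000-0000-000000000001").Valid)
}
```

If the column is NOT NULL in the real schema, the fix would need to live elsewhere (for example, skipping the Save for webstream plays), so treat this only as an illustration of the failure class.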
