
Signal Representation and dB Reporting on MDT-SIC #2

@Abraxas3d

MDT-SIC Signal Representation and dB Reporting

Date: 9 May 2026
Session: Post-reorg bring-up verification (mdt_sic + haifuraiya partition split)
Hardware: iCE40UP5K-B-EVN + NUCLEO-H753ZI

The functional bring-up succeeded: SPI link between STM32 and FPGA is healthy, the
4-point FFT in the polyphase channelizer is producing mathematically correct output,
and the readout chain through UART/ST-Link/PuTTY is delivering frames cleanly. While
inspecting the data, two findings emerged that should be handled before the next phase
of development. The first is a localized firmware bug. The second is a deeper
architectural question that the MDT-SIC mission needs a decision on.


Finding 1: dB readout is not really working

The PuTTY output reports per-channel I, Q, magnitude, and a dB value. The dB column
shows the same value (-50.0 dB) for magnitudes that differ by more than a factor of
four:

Channel, frame    Magnitude (LSBs)    Reported dB
CH0, frame 1      67                  -50.0
CH0, frame 2      232                 -50.0
CH0, frame 3      117                 -50.0
CH1, frame 1      51                  -50.0
CH3, frame 2      172                 -50.0

I only have an MSEE, so I might be wrong, but 16-bit signed full scale is 32767, so
the three CH0 magnitudes (67, 232, 117) correspond to expected levels of roughly -54,
-43, and -49 dBFS, using dBFS = 20 × log₁₀(magnitude / 32767). The dB readout is
collapsing an 11 dB range into a single bin, when it definitely should not be doing
that. Zero-magnitude readings correctly report -99.9 dB, so the formatter is not
entirely broken: it can tell "it's there" from "it's not", but it is not
distinguishing among non-zero values in this regime.
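
As a reference for what the readout should show, here is a minimal Python sketch of the conversion, assuming 0 dBFS is 16-bit signed full scale (32767) and keeping the existing -99.9 dB convention for zero magnitude. The function name is made up for illustration and is not taken from the firmware.

```python
import math

FULL_SCALE = 32767      # 16-bit signed full scale, as stated above
ZERO_FLOOR_DB = -99.9   # value the readout already prints for zero magnitude

def magnitude_to_dbfs(magnitude, full_scale=FULL_SCALE):
    """Convert a linear magnitude in LSBs to dB relative to full scale."""
    if magnitude <= 0:
        return ZERO_FLOOR_DB
    return 20.0 * math.log10(magnitude / full_scale)

# Magnitudes from the table above.
observed = [("CH0, frame 1", 67), ("CH0, frame 2", 232), ("CH0, frame 3", 117),
            ("CH1, frame 1", 51), ("CH3, frame 2", 172)]

for label, mag in observed:
    print(f"{label}: {mag:4d} LSB -> {magnitude_to_dbfs(mag):6.1f} dBFS")
```

For the five rows this gives roughly -53.8, -43.0, -48.9, -56.2, and -45.6 dBFS, so a working readout should show a spread of about 13 dB across the table.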

Ideas worth investigating (all in the STM32 firmware, specifically the
magnitude-to-dB conversion code):

  1. Floor saturation. A max(computed_dB, -50.0) clamp somewhere, intended for
    display sanity, that's eating real information.
  2. Wrong reference. If 0 dB is referenced to something other than 16-bit full
    scale (maybe 8-bit got in there, or a higher-bit fixed-point value is coming along),
    the displayed dB values could be saturating against the formatter's print width or
    against an unintended floor.
  3. Coarse log lookup. A small fixed-point lookup table that quantizes log₁₀ to a
    handful of bins, with this magnitude range happening to all land in the -50 dB bin.
  4. Bit-stripping. If the conversion operates on only the top byte of a 16-bit
    magnitude, magnitudes below 256 LSBs would all map to the same bin (or to the
    noise-floor case). This happened over on pluto_msk and the C++ modem.

Investigation: print the raw magnitude next to the computed dB, read through the
conversion code, single-step a few values, and identify which of the four patterns
matches. If none does, try to derive the pattern from the spew.
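
One way to narrow the candidates before opening the debugger is to model a hypothesis offline and compare it against the observed column. The sketch below is a guess at what ideas 1 and 2 might look like, not the firmware's actual code: the function, the -50 dB floor, and the 32-bit reference are all assumptions chosen for illustration.

```python
import math

def clamped_db(magnitude, full_scale, floor_db=-50.0, zero_db=-99.9):
    """Hypothetical model: zero is special-cased, everything else is clamped at floor_db."""
    if magnitude == 0:
        return zero_db
    return max(20.0 * math.log10(magnitude / full_scale), floor_db)

magnitudes = [67, 232, 117, 51, 172]   # from the Finding 1 table

# Idea 1 alone: correct 16-bit reference plus a -50 dB clamp.
clamp_only = [round(clamped_db(m, 32767), 1) for m in magnitudes]

# Ideas 1 and 2 together: clamp plus a (hypothetical) 32-bit full-scale reference.
clamp_wrong_ref = [round(clamped_db(m, 2**31 - 1), 1) for m in magnitudes]

print("clamp only:         ", clamp_only)
print("clamp + 32-bit ref: ", clamp_wrong_ref)
```

Under this model a clamp with the correct reference would still print about -43 dB for the 232 LSB reading, so a clamp alone does not reproduce the observed column, while a clamp combined with a too-large reference does. That is one combination consistent with the data, not a diagnosis.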

This is not a blocker for the current bring-up but it is a blocker for any meaningful
work where SNR is important. SIC validation in particular will require measuring
"residual" interferer power after cancellation in dB, so this needs to
be metrically correct before weak-signal work begins for realsies.


Finding 2: Conjugate symmetry in the output indicates a real-valued input. I didn't actually finish the complex value work.

The 4-channel output shows a structural pattern that's worth being explicit about,
because it has implications for the architecture going forward:

Frame    CH1 (I, Q)      CH3 (I, Q)
1        (-34, -34)      (-34, 33)
2        (-115, -116)    (-115, 115)
3        (57, 58)        (57, -59)
4        (-24, -25)      (-24, 24)

CH1 and CH3 have the same real part and opposite imaginary parts, within +/-1 LSB of
quantization. CH0 has zero imaginary part. CH2 is at the noise floor.

This is the spectral signature of a real-valued input passing through a 4-point
DFT. For a real input x[n], the DFT bins obey X[N-k] = X[k]*. The N=4 case gives
X[3] = X[1]*, which is exactly what's observed. CH0 (the DC bin) is purely real,
CH2 (the Nyquist bin) is purely real, and CH1/CH3 are conjugate twins. I meant to go
complex, did the real-only version to get the display working, and then started
working on something else. This needs to be finished. We need complex signals for
real weak signal work.
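
The symmetry argument can be checked numerically. This is a small Python sketch using a direct 4-point DFT on an arbitrary real-valued input (the test values are made up), followed by the same conjugate check applied to the (I, Q) pairs from the table with a 1 LSB tolerance.

```python
import cmath

def dft(x):
    """Direct N-point DFT: X[k] = sum_n x[n] * exp(-2j*pi*k*n/N)."""
    n_pts = len(x)
    return [sum(x[n] * cmath.exp(-2j * cmath.pi * k * n / n_pts) for n in range(n_pts))
            for k in range(n_pts)]

# Arbitrary real-valued test input (hypothetical values, Q = 0).
X = dft([3.0, -1.0, 4.0, 2.0])

symmetric = abs(X[3] - X[1].conjugate()) < 1e-9
dc_and_nyquist_real = abs(X[0].imag) < 1e-9 and abs(X[2].imag) < 1e-9

# Same check on the (CH1, CH3) pairs from the table, allowing 1 LSB of quantization.
frames = [((-34, -34), (-34, 33)), ((-115, -116), (-115, 115)),
          ((57, 58), (57, -59)), ((-24, -25), (-24, 24))]
table_ok = all(abs(i1 - i3) <= 1 and abs(q1 + q3) <= 1
               for (i1, q1), (i3, q3) in frames)

print(symmetric, dc_and_nyquist_real, table_ok)   # prints: True True True
```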

The fact that the math works correctly is not an accident. The FFT, the polyphase
filterbank, and the SPI readout all preserve the structure. The deeper question is
whether real-valued input is the right architecture for the MDT-SIC mission. It isn't.


The real-vs-complex question for weak-signal and SIC work

Real-valued sampling and complex (I/Q) sampling are not equivalent for the kinds of
problems MDT-SIC needs to solve. The differences matter most in exactly the regimes
the project cares about: low SNR, strong-interferer cancellation, and coherent
demodulation. We started out with real on Locutus, and transitioned to complex later on.
We should do the same thing here. For Haifuraiya, we start out with complex.

Independent channels

With N real samples taken at rate Fₛ, the usable spectrum runs 0 to Fₛ/2, and
the N-point DFT produces only N/2 + 1 independent bins; the rest are conjugate
mirrors. The 4-channel channelizer fed with real input therefore gives only three
independent frequency bins (DC, Fₛ/4, Fₛ/2), with CH1 and CH3 carrying
duplicate information.

With N complex samples at the same rate Fₛ, the usable spectrum runs from -Fₛ/2
to +Fₛ/2, and all N DFT bins are independent. The same 4-channel channelizer
gives genuinely four channels of resolution. I think that's justification enough.

Image rejection

A real-sampled receiver cannot distinguish positive from negative frequencies. An
interferer at +f₀ and an interferer at -f₀ (relative to the local oscillator) fold
onto the same bin and become indistinguishable. For a SIC receiver trying to
characterize and cancel a strong interferer, this is a fundamental limit: the cancel
estimate is corrupted by image-band content that the receiver cannot separately
identify.

Complex sampling, by contrast, separates +f₀ and -f₀ into different bins. I think that
the SIC algorithm sees the interferer cleanly without image contamination. Plus, we get
phase information preserved.
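
A small Python sketch of the folding argument, using a 4-point DFT to match the channelizer and a tone one bin away from the LO (both choices are assumptions for illustration). Real samples of a tone at +f₀ and at -f₀ give the same bin magnitudes, while the complex versions land in different bins.

```python
import cmath
import math

def dft(x):
    """Direct N-point DFT."""
    n_pts = len(x)
    return [sum(x[n] * cmath.exp(-2j * cmath.pi * k * n / n_pts) for n in range(n_pts))
            for k in range(n_pts)]

def bin_magnitudes(samples):
    """Magnitude of each DFT bin, rounded to suppress floating-point dust."""
    return [round(abs(v), 6) for v in dft(samples)]

N = 4    # matches the 4-channel channelizer
K0 = 1   # tone offset in bins relative to the LO (hypothetical)

real_pos = [math.cos(+2 * math.pi * K0 * n / N) for n in range(N)]
real_neg = [math.cos(-2 * math.pi * K0 * n / N) for n in range(N)]
cplx_pos = [cmath.exp(+2j * math.pi * K0 * n / N) for n in range(N)]
cplx_neg = [cmath.exp(-2j * math.pi * K0 * n / N) for n in range(N)]

print("real +f0:   ", bin_magnitudes(real_pos))
print("real -f0:   ", bin_magnitudes(real_neg))
print("complex +f0:", bin_magnitudes(cplx_pos))
print("complex -f0:", bin_magnitudes(cplx_neg))
```

The two real cases split their energy equally between CH1 and CH3 and cannot be told apart; the complex cases put all of it in CH1 or CH3 respectively.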

Coherent detection of weak signals

The weakest signals the FunCube/MDT mission needs to handle, which are well below the
noise floor in a single sample, are recoverable only through coherent integration. The
matched filter requires knowing the carrier phase. Real-valued processing makes this
harder. Complex processing makes it easier.
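
One way to see the phase dependence is a noise-free Python sketch with made-up parameters (64-sample integration, a tone with a whole number of cycles). The I/Q correlator output magnitude does not depend on the unknown carrier phase, while an I-only correlator scales with the cosine of the phase error and vanishes at 90 degrees.

```python
import cmath
import math

N = 64        # integration length in samples (hypothetical)
CYCLES = 4    # tone completes a whole number of cycles over N samples
AMP = 1.0

def complex_correlator(phase):
    """Correlate an I/Q tone against a complex reference; return the magnitude."""
    acc = sum(AMP * cmath.exp(1j * (2 * math.pi * CYCLES * n / N + phase))
              * cmath.exp(-2j * math.pi * CYCLES * n / N) for n in range(N))
    return abs(acc)

def real_correlator(phase):
    """Correlate a real tone against a cosine reference (I only, no Q)."""
    acc = sum(AMP * math.cos(2 * math.pi * CYCLES * n / N + phase)
              * math.cos(2 * math.pi * CYCLES * n / N) for n in range(N))
    return abs(acc)

for deg in (0, 45, 90):
    ph = math.radians(deg)
    print(deg, round(complex_correlator(ph), 3), round(real_correlator(ph), 3))
```

The complex output stays at N × AMP = 64 for every phase, while the I-only output falls from 32 to about 22.6 to 0. A real receiver can recover from this by adding a quadrature reference, which amounts to going complex anyway.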

Dynamic range

Two ADC streams (I and Q) with the same bit depth as a single real stream double
the total information bandwidth into the FPGA. For SIC, where the strong interferer
may be 60+ dB above the desired signal, every bit of effective dynamic range matters.
Complex sampling effectively gives a √2 improvement in noise floor for the same ADC
bit depth, on top of the qualitative benefits above. While we may not use the original
very high resolution ADCs that Martin Ling baselined for the MDT-SIC prototype, we
really don't want to throw away any dynamic range with the wrong type of math.

DC and 1/f offsets

A direct-conversion (zero-IF) real receiver puts the signal of interest right at DC,
where the front-end's DC offset and 1/f noise are largest. A complex zero-IF
receiver still has the DC issue, but the signal can be placed off-DC at synthesis
time by mixing with a complex local oscillator, sidestepping the noise-floor
contamination that hurts a real architecture. We do this in Locutus.


Architectural implications

Going complex requires significant but manageable changes:

  • Input path. Two real streams from the ADC (I and Q from a quadrature
    downconverter, or output of a digital Hilbert transformer) instead of one. Doubles
    the input bus width.
  • Polyphase filterbank. Same topology, but each branch processes complex samples
    instead of real ones. Roughly 2× LUTs and 2× DSP block usage for the filtering
    stages.
  • FFT. The 4-point FFT (fft_4pt.vhd) is already a complex DFT. It has complex
    inputs and complex outputs. The current build feeds it real I with Q = 0. Feeding
    it true complex input requires no FFT changes, just wiring up the Q path.
  • Output. Each channel is already (I, Q) on the wire. Nothing changes downstream.
  • SPI/STM32 side. No changes at the bus or formatter level. The per-channel
    format already carries I and Q.
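
To illustrate the FFT point, here is a Python model of a 4-point complex DFT fed first with real input (Q tied to 0) and then with true complex input. It is a behavioural sketch only, not a transcription of fft_4pt.vhd, and the sample values are made up.

```python
def fft4(x):
    """4-point DFT of complex inputs using only adds, subtracts, and swaps.

    Multiplying by -j maps (re, im) to (im, -re), so no real multiplier is needed.
    """
    a, b, c, d = x
    s02, d02 = a + c, a - c
    s13, d13 = b + d, b - d
    minus_j_d13 = complex(d13.imag, -d13.real)   # -j * d13 as a swap and negate
    return [s02 + s13,            # X[0]
            d02 + minus_j_d13,    # X[1]
            s02 - s13,            # X[2]
            d02 - minus_j_d13]    # X[3]

real_in = [complex(v, 0) for v in (3, -1, 4, 2)]   # Q tied to 0, as in the current build
cplx_in = [complex(3, 1), complex(-1, 2), complex(4, -3), complex(2, 5)]

print(fft4(real_in))   # X[3] is the conjugate of X[1]
print(fft4(cplx_in))   # all four bins carry independent values
```

The same function handles both cases unchanged, which is the sense in which only the Q wiring has to change.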

The major cost is the filterbank doubling. Whether this fits on the iCE40UP5K depends
on what the current implementation is using and what budget remains. See the next
section.


Framing correction: iCE40UP5K is the deployment target

I was initially treating mdt_sic as a benchtop prototype that could hand off the
"real" complex work to the STM. That can't really happen.

  • haifuraiya is another satellite project with a larger FPGA (ZCU102, Zynq UltraScale+).
    It is higher powered, has generous resources, and runs the 64-channel complex
    channelizer for downlink processing. Complex from day one.
  • mdt_sic is the flight payload for FunCube+, and must run on something that fits the
    satellite power budget. AMSAT-UK will not fly anything more power-hungry than the
    iCE40 class. The iCE40UP5K is not a development convenience for mdt_sic, it is the
    deployment constraint. The STM co-processor is not an escape valve.

"Go complex" and "fit in iCE40UP5K" are therefore not alternatives. They are both hard
requirements. The dB formatter fix is the small piece. The complex conversion on this
device is the real engineering work the project needs to do from this point, toward
the Friedrichshafen Milestone.


Resource utilization (sic_receiver_impl_1.mrp, May 9 2026 build, check this out)

Resource                Used     Available    %      Headroom
LUT4 sites              4,614    5,280        87%    666
Slice registers         2,499    5,280        47%    2,781
MAC16 (DSP)             4        8            50%    4
EBR (4 kbit blocks)     1        30           3%     29
SPRAM (32 kB blocks)    0        4            0%     4
IO sites                19       39           49%    20
PLLs                    0        1            0%     1

LUT4 breakdown of the 4,614 used:

  • 2,953 logic LUT4s (actual computation)
  • 1,135 feedthru LUT4s (routing-only, no logic)
  • 520 ripple logic LUT4s (arithmetic carry chains)
  • 6 replicated
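
The table arithmetic is easy to re-derive whenever a new map report comes out. This is a minimal Python sketch with the numbers above typed in by hand; it does not parse the .mrp file, and the dictionary layout is just an illustration.

```python
resources = {   # name: (used, available), copied from the table above
    "LUT4": (4614, 5280), "registers": (2499, 5280), "MAC16": (4, 8),
    "EBR": (1, 30), "SPRAM": (0, 4), "IO": (19, 39), "PLL": (0, 1),
}
lut_breakdown = {"logic": 2953, "feedthru": 1135, "ripple": 520, "replicated": 6}

for name, (used, avail) in resources.items():
    pct = 100 * used / avail
    print(f"{name:10s} {used:5d} / {avail:<5d} {pct:5.1f}%  headroom {avail - used}")

breakdown_total = sum(lut_breakdown.values())
feedthru_share = lut_breakdown["feedthru"] / breakdown_total
print(f"LUT breakdown total: {breakdown_total}, feedthru share: {100 * feedthru_share:.1f}%")
```

The breakdown sums to the 4,614 LUT4 sites in the table, and the feedthru share comes out just under 25%.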

The 87% LUT figure is not the iCE40UP5K telling us "this is what complex SIC costs."
It is the current implementation telling us it isn't using the device well. The chip
has a substantial memory and multiplier subsystem sitting essentially idle while LUTs
do work that those blocks could absorb. The 1,135 feedthru LUTs (about 25% of total
LUT usage) are the router burning logic cells just to move signals around, mostly
because of the 256-deep shift-register delay lines fanning out across the fabric.
That is an implementation choice, not a device limit. A redesigned mdt_sic that uses
block RAM for delay lines, time-shares DSPs for complex multiplies, and lets the
iCE40's actual architecture do its job should fit complex 4-channel SIC processing in
a comparable or smaller LUT envelope than the current real 4-channel design.


Path to complex on iCE40UP5K

The redesign is a stack of moves, each measurable on synthesis, that together free up
the LUT budget for the complex doubling and the cancellation pipeline. Roughly in
order of Wonder Woman Lassos of Truth:

  1. Move delay lines from registers/LUTs into EBR or SPRAM. Dominant current
    routing burden is the 256-deep shift registers per branch × 4 branches.
    EBR-backed delay lines clock-gate cleanly, eliminate the long fan-out nets, and
    let the router relax dramatically. For complex (I and Q paths), 2× storage is
    needed. It still fits in ~8 EBRs out of our 30. Estimated saving: 15–25% of current
    LUT use, before any complex doubling. This is also a power win because EBR clock
    gating is more aggressive than register clock gating. Win win paradigms!

  2. DSP time-sharing for complex multiplies. Each complex multiply costs 3–4 real
    multiplies (3 with Karatsuba, 4 naive). Four complex multipliers × 4 real = 16
    real multiplies per sample period. With 8 MAC16 blocks running at a comfortable
    internal clock of 30–50 MHz and a 30 kSps complex sample rate, each MAC16 has
    roughly 1,000 to 1,700 cycles between samples. 2–4 DSPs in time-multiplexed mode
    cover the full complex multiplier load. The DSPs absorb work the LUT-based
    multipliers are currently doing. Unless there's some lurking landmine.

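  3. FFT serialized through shared multipliers. The 4-point FFT has 8 multiplies in
    the full butterfly. At these sample rates one DSP serving them sequentially is
    fine, with intermediate state in registers. Eliminates whatever LUT footprint the
    current parallel multipliers in fft_4pt.vhd are using. Worth checking first: for
    N = 4 the twiddle factors are only ±1 and ±j, so those multiplies can in principle
    be replaced by swaps and sign changes with no multiplier at all.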

  4. Reconsider filter tap count. 256 taps per branch is generous for 30 kHz total
    bandwidth across 4 channels. For a 4× decimating polyphase channelizer with
    ~7.5 kHz per channel, 64–128 taps per branch can give 60–80 dB stopband if the
    coefficient set is designed for it (Remez or windowed-sinc rather than a default
    sinc). Halving taps approximately halves per-branch storage and MAC throughput.
    Tradeoff: somewhat worse stopband attenuation. For amateur satellite passbands
    with controlled interferers, 60–80 dB might be plenty. But Martin wanted really good
    performance here.

  5. Keep channel count at 4. With moves 1–4 unlocking resources, no need to cut
    channels. 4 complex channels cover +/-15 kHz around DC with ~7.5 kHz per channel,
    which is exactly the resolution the SIC use case wants.
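
The sizing claims in moves 1, 2, and 4 can be sanity-checked with back-of-envelope arithmetic. The Python sketch below assumes 16-bit samples in the delay lines, a prototype filter of 4 × (taps per branch) taps at the 30 kSps input rate, and Kaiser's rule-of-thumb for FIR length; none of these come from the VHDL, and the function names are hypothetical.

```python
import math

SAMPLE_BITS = 16            # assumption: 16-bit samples in the delay lines
EBR_BITS = 4096             # one 4 kbit EBR block
DEPTH, BRANCHES = 256, 4    # delay-line depth and branch count from move 1

def ebr_blocks(complex_input):
    """EBR blocks needed to hold all delay lines, by raw bit count."""
    streams = 2 if complex_input else 1   # I and Q double the storage
    bits = DEPTH * BRANCHES * SAMPLE_BITS * streams
    return math.ceil(bits / EBR_BITS)

def cycles_per_sample(clock_hz, sample_rate_hz=30_000):
    """Clock cycles available to a time-shared DSP between input samples."""
    return clock_hz // sample_rate_hz

def kaiser_transition_hz(total_taps, stopband_db, sample_rate_hz=30_000):
    """Kaiser's estimate, taps ~ (A - 7.95) / (14.36 * df / fs), solved for df."""
    return (stopband_db - 7.95) * sample_rate_hz / (14.36 * total_taps)

print(ebr_blocks(False), ebr_blocks(True))
print(cycles_per_sample(30_000_000), cycles_per_sample(50_000_000))
print(round(kaiser_transition_hz(4 * 64, 80)), round(kaiser_transition_hz(4 * 128, 80)))
```

Under these assumptions the delay lines need 4 EBRs for real input and 8 for complex, each DSP gets 1,000 to 1,666 cycles per sample, and an 80 dB stopband with 64 or 128 taps per branch implies a transition band of very roughly 590 Hz or 295 Hz. Actual EBR usage depends on how the memories are organized, so treat the block count as a lower bound.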

Rough estimate: with moves 1 to 4 applied, complex 4-channel SIC fits in 60–75% of the
LUT budget. That leaves comfortable room for the cancellation pipeline itself
(parameter estimator, re-modulator, subtractor), which is where the real SIC algorithm
work lives.
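
The 3-versus-4 multiply count mentioned in move 2 can be sketched in Python as follows. This is the standard three-multiplication rearrangement of a complex product, shown with integers; the function names are made up and this is not code from the design.

```python
def cmul_naive(ar, ai, br, bi):
    """Complex product with 4 real multiplies and 2 adds."""
    return ar * br - ai * bi, ar * bi + ai * br

def cmul_three(ar, ai, br, bi):
    """Complex product with 3 real multiplies and 5 adds."""
    k1 = br * (ar + ai)
    k2 = ar * (bi - br)
    k3 = ai * (br + bi)
    return k1 - k3, k1 + k2   # (real part, imaginary part)

# Example operands: a CH1 sample from the Finding 2 table times an arbitrary coefficient.
print(cmul_naive(57, 58, -3, 7))
print(cmul_three(57, 58, -3, 7))
```

Both calls return (-577, 225). The trade is one multiplier for three extra adders, and the pre-additions grow the operands by one bit, which matters when the multiplier inputs are 16 bits wide.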


Effort estimate

Phase                                                                          Estimate
dB formatter fix (firmware-only)                                               hours
Clean up unconnected fpga_rst_n top-level port                                 minutes
EBR-backed delay-line refactor of polyphase_filterbank.vhd                     up to 2 weeks
Wire up Q-path through filterbank and into the FFT                             up to a week
DSP time-sharing for complex multiplies                                        folded into above
Cancellation pipeline (estimator, re-modulator, subtractor)                    a week??
Integration, on-hardware verification, sensitivity and residual measurement    1–2 weeks maybe more

Call it five to seven weeks of focused work for a flight-quality complex SIC payload
on iCE40UP5K. Each phase produces a measurable result on synthesis (the LUT count
should fall after the EBR refactor, then rise again with complex, but the final
number should be below the current 87%).


Recommendations and Plan

In execution order:

  1. Fix the dB formatter (this session, before moving on). Print raw magnitude
    alongside computed dB, single-step a few values, identify which of the four bug
    patterns applies, fix it. Without metrically correct dB readout we cannot measure
    cancellation depth, so this gates everything later.

  2. Clean up the fpga_rst_n warning (this session). The map report twice flagged
    "Top module port 'fpga_rst_n' does not connect to anything". Either remove
    the port from the top-level entity or actually connect it. Cheap, and the design
    is fresh in our head right now.

  3. EBR-backed delay-line refactor. Highest leverage, also the highest-risk
    change because it touches the filterbank's most timing-critical path. Do this as
    its own branch off main, synthesize after each major piece, watch LUT utilization
    come down. Back out if anything breaks.

  4. Q-path wiring through the filterbank and FFT. With LUTs freed by step 3,
    wire in the Q path. The FFT entity needs no change, it is already complex
    internally.

  5. DSP time-sharing. Convert the per-branch dedicated multipliers to a shared
    pool. Folds naturally into step 4.

  6. Cancellation pipeline. New work on top of the now-complex channelizer.
    Parameter estimator (amplitude, frequency offset, phase) to re-modulator to
    subtractor. This is where the SIC algorithm actually lives.

  7. Integration and characterization. Synthesize stimulus (known strong + known
    weak signal at known offsets), run through the pipeline, measure residual after
    cancellation in dB (which requires step 1 to be done), document sensitivity and
    dynamic range. This is the data that goes in the AMSAT-UK deliverable.

Two things to track as the work proceeds:

  • Resource budget at each commit. Pull the .mrp numbers, log them. If LUT
    utilization starts climbing back toward 80% we want to know before place-and-route
    fails.
  • Power. The iCE40UP5K is one of the lower-power FPGAs available, but the design
    topology affects power significantly. EBR-backed storage and DSP-based multipliers
    are both more power-efficient than the LUT-based equivalents. The redesign should
    be a net power win, not a wash.

Achievements

  • Bring-up of post-reorg mdt_sic on iCE40UP5K-B-EVN + NUCLEO-H753ZI is complete
    and verified (MD5 match on flash readback, conjugate symmetry confirmed in
    channelizer output, SPI link working, frames showing up in PuTTY).
  • Last commits on origin/main: 0551644 (fft_64pt rdf cleanup), bd54035
    (generated VHDL deletion + gitignore patterns).
  • Working tree should be clean! And it is!
  • Next action: dB formatter investigation in
    mdt_sic/firmware/stm32/sic_receiver/Core/Src/sic_fpga.c (or wherever the dB
    conversion lives). After that, the fpga_rst_n cleanup in the top-level VHDL.
    After that, the EBR-delay-line refactor as the first redesign branch.

Labels: bug, documentation, enhancement
