MDT-SIC Signal Representation and dB Reporting
Date: 9 May 2026
Session: Post-reorg bring-up verification (mdt_sic + haifuraiya partition split)
Hardware: iCE40UP5K-B-EVN + NUCLEO-H753ZI
The functional bring-up succeeded: SPI link between STM32 and FPGA is healthy, the
4-point FFT in the polyphase channelizer is producing mathematically correct output,
and the readout chain through UART/ST-Link/PuTTY is delivering frames cleanly. While
inspecting the data, two findings emerged that should be handled before the next phase
of development. The first is a localized firmware bug. The second is a deeper
architectural question that the MDT-SIC mission needs a decision on.
Finding 1: dB readout is not really working
The PuTTY output reports per-channel I, Q, magnitude, and a dB value. The dB column
shows the same value (-50.0 dB) across magnitudes that span more than an order of
magnitude in power (about 4.5× in amplitude, from 51 to 232 LSBs):
| Channel | Magnitude (LSBs) | Reported dB |
|---|---|---|
| CH0, frame 1 | 67 | -50.0 |
| CH0, frame 2 | 232 | -50.0 |
| CH0, frame 3 | 117 | -50.0 |
| CH1, frame 1 | 51 | -50.0 |
| CH3, frame 2 | 172 | -50.0 |
I only have an MSEE, so I might be wrong, but 16-bit signed full scale is 32767, so
the three CH0 magnitudes correspond to expected levels of roughly -54, -43, and -49
dBFS. The dB readout is collapsing an 11 dB range into a single bin, and it definitely
should not be doing that.
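As a cross-check on the expected levels, here is a minimal Python sketch of the dBFS arithmetic. It assumes the reference is 16-bit signed full scale (32767) and that the magnitude is an amplitude, so the conversion is 20·log10. The helper name dbfs and the dictionary layout are illustrative only and are not taken from the firmware.

```python
import math

FULL_SCALE = 32767  # 16-bit signed full scale, as stated in the text


def dbfs(magnitude, full_scale=FULL_SCALE):
    """Amplitude in LSBs -> dB relative to full scale (hypothetical helper)."""
    return 20.0 * math.log10(magnitude / full_scale)


# Magnitudes copied from the table above.
readings = {
    "CH0, frame 1": 67,
    "CH0, frame 2": 232,
    "CH0, frame 3": 117,
    "CH1, frame 1": 51,
    "CH3, frame 2": 172,
}

for label, mag in readings.items():
    print(f"{label}: {mag:4d} LSB -> {dbfs(mag):6.1f} dBFS")
```

Whatever the firmware prints for these five magnitudes should change from row to row. A constant column is the bug.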
Zero-magnitude readings correctly report -99.9 dB, so the formatter is not entirely
broken: it can tell "it's there" from "it's not". It just does not distinguish among
non-zero values in this regime.
Ideas worth investigating (all in the STM32 firmware, specifically the
magnitude-to-dB conversion code):
- Floor saturation. A max(computed_dB, -50.0) clamp somewhere, intended for display
sanity, that's eating real information.
- Wrong reference. If 0 dB is referenced to something other than 16-bit full
scale (maybe 8-bit got in there, or a higher-bit fixed-point value is coming along),
the displayed dB values could be saturating against the formatter's print width or
against an unintended floor.
- Coarse log lookup. A small fixed-point lookup table that quantizes log₁₀ to a
handful of bins, with this magnitude range happening to all land in the -50 dB bin.
- Bit-stripping. If the conversion operates on only the top byte of a 16-bit
magnitude, magnitudes below 256 LSBs would all map to the same bin (or to the
noise-floor case). Happened over on pluto_msk and C++ modem.
Investigation: Print the raw magnitude and the computed dB.
Maybe actually look harder at the conversion code, single-step
a few values, and identify which of the four patterns matches.
If not, then try to derive the pattern from the spew.
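One way to narrow the list before single-stepping is to model a candidate pattern on the host and compare it with the observed rows. The sketch below is a hypothetical model of two of the patterns (floor clamp and top-byte stripping). It is not a reconstruction of the firmware, and every function name in it is made up for illustration.

```python
import math

FULL_SCALE = 32767
OBSERVED = [67, 232, 117, 51, 172]  # magnitudes from the table in Finding 1


def dbfs(mag, ref=FULL_SCALE):
    return 20.0 * math.log10(mag / ref)


def floor_clamp_model(mag, floor_db=-50.0):
    # Hypothetical pattern 1: correct 16-bit reference, then max(dB, floor).
    return max(dbfs(mag), floor_db)


def top_byte_model(mag):
    # Hypothetical pattern 4: only the top byte of the 16-bit magnitude is used.
    return mag >> 8


clamped = [round(floor_clamp_model(m), 1) for m in OBSERVED]
top_bytes = [top_byte_model(m) for m in OBSERVED]

print("floor clamp model:", clamped)
print("top byte model:   ", top_bytes)
```

Under these assumptions, only the 67 and 51 LSB rows fall below -50 dBFS, so a clamp on a correctly referenced value would not flatten the whole column by itself. It would have to be combined with a wrong reference. Top-byte stripping maps every observed magnitude to the same value, which matches the flat column, but it would still need an explanation for why the result prints as -50.0 rather than as the zero case.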
This is not a blocker for the current bring-up but it is a blocker for any meaningful
work where SNR is important. SIC validation in particular will require measuring
"residual" interferer power after cancellation in dB, so this needs to
be metrically correct before weak-signal work begins for realsies.
Finding 2: Conjugate symmetry in the output indicates a real-valued input. I didn't actually finish the complex value work.
The 4-channel output shows a structural pattern that's worth being explicit about,
because it has implications for the architecture going forward:
| Frame | CH1 (I, Q) | CH3 (I, Q) |
|---|---|---|
| 1 | (-34, -34) | (-34, 33) |
| 2 | (-115, -116) | (-115, 115) |
| 3 | (57, 58) | (57, -59) |
| 4 | (-24, -25) | (-24, 24) |
CH1 and CH3 have the same real part and opposite imaginary parts, within +/-1 LSB of
quantization. CH0 has zero imaginary part. CH2 is at the noise floor.
This is the spectral signature of a real-valued input passing through a 4-point
DFT. For a real input x[n], the DFT bins obey X[N-k] = X[k]*. The N=4 case gives
X[3] = X[1]*, which is exactly what's observed. CH0 (the DC bin) is purely real,
CH2 (the Nyquist bin) is purely real, and CH1/CH3 are conjugate twins. I meant to do
complex, did this to get the display working, and then started working on something
else. This needs to be finished. We need complex signals for real weak signal work.
The fact that the math works correctly is not an accident. The FFT, the polyphase
filterbank, and the SPI readout all preserve the structure. The deeper question is
whether real-valued input is the right architecture for the MDT-SIC mission. It isn't.
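The conjugate-symmetry argument can be checked numerically. The sketch below runs a direct 4-point DFT on an arbitrary real-valued input (the sample values are made up) and then applies the same X[3] = X[1]* test, with a ±1 LSB tolerance, to the CH1/CH3 pairs from the table. The function names are illustrative.

```python
import cmath


def dft(x):
    """Direct N-point DFT: X[k] = sum over n of x[n] * exp(-j*2*pi*k*n/N)."""
    n_pts = len(x)
    return [
        sum(x[n] * cmath.exp(-2j * cmath.pi * k * n / n_pts) for n in range(n_pts))
        for k in range(n_pts)
    ]


# Arbitrary real-valued test input (values invented for illustration).
X = dft([3.0, -1.0, 4.0, 1.5])
symmetric = abs(X[3] - X[1].conjugate()) < 1e-9

# (CH1, CH3) pairs as (I, Q), copied from the table above.
frames = [
    ((-34, -34), (-34, 33)),
    ((-115, -116), (-115, 115)),
    ((57, 58), (57, -59)),
    ((-24, -25), (-24, 24)),
]


def is_conjugate_pair(ch1, ch3, tol=1):
    """True if ch3 equals conj(ch1) within tol LSBs on both I and Q."""
    return abs(ch1[0] - ch3[0]) <= tol and abs(ch1[1] + ch3[1]) <= tol


print(symmetric, all(is_conjugate_pair(a, b) for a, b in frames))
```

Once the Q path carries real data, the same check on hardware output should start failing, which makes it a quick regression test that the input is truly complex.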
The real-vs-complex question for weak-signal and SIC work
Real-valued sampling and complex (I/Q) sampling are not equivalent for the kinds of
problems MDT-SIC needs to solve. The differences matter most in exactly the regimes
the project cares about: low SNR, strong-interferer cancellation, and coherent
demodulation. We started out with real on Locutus, and transitioned to complex later on.
We should do the same thing here. For Haifuraiya, we start out with complex.
Independent channels
With N real samples taken at rate Fₛ, the usable spectrum runs 0 to Fₛ/2, and
the N-point DFT produces only N/2 + 1 independent bins; the rest are conjugate
mirrors. The 4-channel channelizer fed with real input therefore gives only three
independent frequency bins (DC, Fₛ/4, Fₛ/2), with CH1 and CH3 carrying
duplicate information.
With N complex samples at the same rate Fₛ, the usable spectrum runs from -Fₛ/2
to +Fₛ/2, and all N DFT bins are independent. The same 4-channel channelizer
gives genuinely four channels of resolution. I think that's justification enough.
Image rejection
A real-sampled receiver cannot distinguish positive from negative frequencies. An
interferer at +f₀ and an interferer at -f₀ (relative to the local oscillator) fold
onto the same bin and become indistinguishable. For a SIC receiver trying to
characterize and cancel a strong interferer, this is a fundamental limit: the
cancellation estimate is corrupted by image-band content that the receiver cannot
separately identify.
Complex sampling, by contrast, separates +f₀ and -f₀ into different bins. I think that
the SIC algorithm sees the interferer cleanly without image contamination. Plus, we get
phase information preserved.
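The folding of +f₀ and -f₀ can be shown with a small host-side sketch. It places a tone on one DFT bin, once as a complex exponential and once as its real part only, and compares the bin magnitudes. The DFT length of 8 and the bin index are arbitrary choices for illustration, not project parameters.

```python
import cmath
import math

N = 8   # DFT length, chosen only for illustration
K0 = 1  # tone placed on bin 1, i.e. f0 = Fs/8


def dft_mags(x):
    """Magnitude of each DFT bin, rounded to suppress floating-point dust."""
    return [
        round(abs(sum(x[n] * cmath.exp(-2j * math.pi * k * n / N) for n in range(N))), 6)
        for k in range(N)
    ]


tone_pos = [cmath.exp(+2j * math.pi * K0 * n / N) for n in range(N)]  # +f0
tone_neg = [cmath.exp(-2j * math.pi * K0 * n / N) for n in range(N)]  # -f0

complex_pos = dft_mags(tone_pos)
complex_neg = dft_mags(tone_neg)
real_pos = dft_mags([v.real for v in tone_pos])  # real-sampled version of +f0
real_neg = dft_mags([v.real for v in tone_neg])  # real-sampled version of -f0

print("complex +f0:", complex_pos)
print("complex -f0:", complex_neg)
print("real    +f0:", real_pos)
print("real    -f0:", real_neg)
```

With complex input the two tones land in different bins. With real input the two spectra are identical, so nothing downstream can tell which side of the local oscillator the interferer was on.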
Coherent detection of weak signals
The weakest signals the FunCube/MDT mission needs to handle, which are well below the
noise floor in a single sample, are recoverable only through coherent integration. The
matched filter requires knowing the carrier phase. Real-valued processing makes this
harder because phase has to be inferred across samples. Complex processing makes it
easier because each I/Q sample carries amplitude and phase directly.
Dynamic range
Two ADC streams (I and Q) with the same bit depth as a single real stream double
the total information bandwidth into the FPGA. For SIC, where the strong interferer
may be 60+ dB above the desired signal, every bit of effective dynamic range matters.
Complex sampling effectively gives a √2 (about 3 dB) improvement in noise floor for
the same ADC bit depth, assuming the quantization noise on I and Q is uncorrelated, on
top of the qualitative benefits above. While we may not use the original very high
resolution ADCs that Martin Ling baselined for the MDT-SIC prototype, we really don't
want to throw away any dynamic range with the wrong type of math.
DC and 1/f offsets
A direct-conversion (zero-IF) real receiver puts the signal of interest right at DC,
where the front-end's DC offset and 1/f noise are largest. A complex zero-IF
receiver still has the DC issue, but the signal can be placed off-DC at synthesis
time by mixing with a complex local oscillator, sidestepping the noise-floor
contamination that hurts a real architecture. We do this in Locutus.
Architectural implications
Going complex is a significant change, but a manageable one:
- Input path. Two real streams from the ADC (I and Q from a quadrature
downconverter, or output of a digital Hilbert transformer) instead of one. Doubles
the input bus width.
- Polyphase filterbank. Same topology, but each branch processes complex samples
instead of real ones. Roughly 2× LUTs and 2× DSP block usage for the filtering
stages.
- FFT. The 4-point FFT (fft_4pt.vhd) is already a complex DFT. It has complex inputs
and complex outputs. The current build feeds it real I with Q = 0. Feeding it true
complex input requires no FFT changes, just wiring up the Q path.
- Output. Each channel is already (I, Q) on the wire. Nothing changes downstream.
- SPI/STM32 side. No changes at the bus or formatter level. The per-channel
format already carries I and Q.
The major cost is the filterbank doubling. Whether this fits on the iCE40UP5K depends
on what the current implementation is using and what budget remains. See the next
section.
Framing correction: iCE40UP5K is the deployment target
I was initially treating mdt_sic as a benchtop prototype that could hand off the
"real" complex work to the STM. That can't really happen.
- haifuraiya is another satellite project on a larger FPGA (ZCU102, Zynq UltraScale+).
It is higher powered, has generous resources, and runs the 64-channel complex
channelizer for downlink processing. Complex from day one.
- mdt_sic is the flight payload for FunCube+, and must run on something that fits the
satellite power budget. AMSAT-UK will not fly anything more power-hungry than the
iCE40 class. The iCE40UP5K is not a development convenience for mdt_sic, it is the
deployment constraint. The STM co-processor is not an escape valve.
"Go complex" and "fit in iCE40UP5K" are therefore not alternatives. They are both hard
requirements. The dB formatter fix is the small piece. The complex conversion on this
device is the real engineering work the project needs to do from this point, towards
Friedrichshafen Milestone.
Resource utilization (sic_receiver_impl_1.mrp, May 9 2026 build, check this out)
| Resource | Used | Available | % | Headroom |
|---|---|---|---|---|
| LUT4 sites | 4,614 | 5,280 | 87% | 666 |
| Slice registers | 2,499 | 5,280 | 47% | 2,781 |
| MAC16 (DSP) | 4 | 8 | 50% | 4 |
| EBR (4 kbit blocks) | 1 | 30 | 3% | 29 |
| SPRAM (32 kB blocks) | 0 | 4 | 0% | 4 |
| IO sites | 19 | 39 | 49% | 20 |
| PLLs | 0 | 1 | 0% | 1 |
LUT4 breakdown of the 4,614 used:
- 2,953 logic LUT4s (actual computation)
- 1,135 feedthru LUT4s (routing-only, no logic)
- 520 ripple logic LUT4s (arithmetic carry chains)
- 6 replicated
The 87% LUT figure is not the iCE40UP5K telling us "this is what complex SIC costs."
It is the current implementation telling us it isn't using the device well. The chip
has a substantial memory and multiplier subsystem sitting essentially idle while LUTs
do work that those blocks could absorb. The 1,135 feedthru LUTs (about 25% of total
LUT usage) are the router burning logic cells just to move signals around, mostly
because of the 256-deep shift-register delay lines fanning out across the fabric.
That is an implementation choice, not a device limit. A redesigned mdt_sic that uses
block RAM for delay lines, time-shares DSPs for complex multiplies, and lets the
iCE40's actual architecture do its job should fit complex 4-channel SIC processing in
a comparable or smaller LUT envelope than the current real 4-channel design.
Path to complex on iCE40UP5K
The redesign is a stack of moves, each measurable on synthesis, that together free up
the LUT budget for the complex doubling and the cancellation pipeline. Roughly in
order of Wonder Woman Lassos of Truth:
1. Move delay lines from registers/LUTs into EBR or SPRAM. The dominant current
routing burden is the 256-deep shift registers per branch × 4 branches.
EBR-backed delay lines clock-gate cleanly, eliminate the long fan-out nets, and
let the router relax dramatically. For complex (I and Q paths), 2× storage is
needed. It still fits in ~8 EBRs out of our 30. Estimated saving: 15–25% of current
LUT use, before any complex doubling. This is also a power win because EBR clock
gating is more aggressive than register clock gating. Win win paradigms!
2. DSP time-sharing for complex multiplies. Each complex multiply costs 3–4 real
multiplies (3 with Karatsuba, 4 naive). Four complex multipliers × 4 real = 16
real multiplies per sample period. With 8 MAC16 blocks running at a comfortable
internal clock of 30–50 MHz and a 30 kSps complex sample rate, each MAC16 has
roughly 1,000 to 1,670 clock cycles between samples. 2–4 DSPs in time-multiplexed
mode cover the full complex multiplier load. The DSPs absorb work the LUT-based
multipliers are currently doing. Unless there's some lurking landmine.
3. FFT serialized through shared multipliers. The 4-point FFT has 8 multiplies in
the full butterfly. At these sample rates one DSP serving them sequentially is
fine, with intermediate state in registers. Eliminates whatever LUT footprint the
current parallel multipliers in fft_4pt.vhd are using. It is worth checking first
whether they are needed at all, since the twiddle factors of a 4-point DFT are only
±1 and ±j, which reduce to sign flips and I/Q swaps.
4. Reconsider filter tap count. 256 taps per branch is generous for 30 kHz total
bandwidth across 4 channels. For a 4× decimating polyphase channelizer with
~7.5 kHz per channel, 64–128 taps per branch can give 60–80 dB stopband if the
coefficient set is designed for it (Remez or windowed-sinc rather than a default
sinc). Halving taps approximately halves per-branch storage and MAC throughput.
Tradeoff: somewhat worse stopband attenuation. For amateur satellite passbands
with controlled interferers, 60–80 dB might be plenty. But Martin wanted really good
performance here.
5. Keep channel count at 4. With moves 1–4 unlocking resources, no need to cut
channels. 4 complex channels cover +/-15 kHz around DC with ~7.5 kHz per channel,
which is exactly the resolution the SIC use case wants.
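The 3-versus-4 multiply count quoted for the DSP time-sharing move can be illustrated with a short sketch. It compares the naive four-multiply complex product with a three-multiply rearrangement, using plain Python integers to stand in for fixed-point samples. The function names are hypothetical, and nothing here reflects how the existing VHDL is written.

```python
def cmul_naive(a, b, c, d):
    """(a + jb) * (c + jd) with four real multiplies."""
    return (a * c - b * d, a * d + b * c)


def cmul_three(a, b, c, d):
    """Same product with three real multiplies and extra additions."""
    k1 = c * (a + b)
    k2 = a * (d - c)
    k3 = b * (c + d)
    return (k1 - k3, k1 + k2)


# Arbitrary integer operands, chosen only to exercise the identity.
print(cmul_naive(3, -4, 5, 7), cmul_three(3, -4, 5, 7))
```

The saved multiply is paid for with pre-additions, and each pre-added operand is one bit wider than its inputs. Whether that extra bit fits the MAC16 operand width without giving up sample precision is something to confirm before committing to the three-multiply form.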
Rough estimate: with moves 1 to 4 applied, complex 4-channel SIC fits in 60–75% of the
LUT budget. That leaves comfortable room for the cancellation pipeline itself
(parameter estimator, re-modulator, subtractor), which is where the real SIC algorithm
work lives.
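The cycle and storage arithmetic behind these moves is easy to recompute whenever a parameter changes. The sketch below uses the figures quoted above (30 kSps, 30–50 MHz, 16 real multiplies per sample, 256 taps × 4 branches, 4 kbit EBRs). The 16-bit delay-line word width and the one-multiply-per-DSP-per-clock model are assumptions, and the function names are illustrative.

```python
import math

SAMPLE_RATE_HZ = 30_000     # complex sample rate from the text
REAL_MULTS_PER_SAMPLE = 16  # four complex multipliers x 4 real multiplies
TAPS_PER_BRANCH = 256
BRANCHES = 4
EBR_BITS = 4096             # one 4 kbit EBR block
SAMPLE_BITS = 16            # assumption: 16-bit delay-line words


def cycles_per_sample(clock_hz, sample_rate_hz=SAMPLE_RATE_HZ):
    return clock_hz // sample_rate_hz


def dsps_needed(real_mults, clock_hz):
    # Assumes one multiply per DSP per clock, with no pipeline stalls.
    return math.ceil(real_mults / cycles_per_sample(clock_hz))


def ebrs_for_delay_lines(paths):
    """paths = 1 for real input, 2 for complex (I and Q)."""
    bits = TAPS_PER_BRANCH * BRANCHES * SAMPLE_BITS * paths
    return math.ceil(bits / EBR_BITS)


for clock_hz in (30_000_000, 50_000_000):
    print(clock_hz, cycles_per_sample(clock_hz),
          dsps_needed(REAL_MULTS_PER_SAMPLE, clock_hz))
print("EBRs, real:", ebrs_for_delay_lines(1), "complex:", ebrs_for_delay_lines(2))
```

The 16-multiply figure covers only the multipliers named in the text. The per-tap filter MACs would need to be added to the load passed to dsps_needed for a full budget, which is part of why planning for 2–4 DSPs rather than one leaves sensible margin.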
Effort estimate
| Phase | Estimate |
|---|---|
| dB formatter fix (firmware-only) | hours |
| Clean up unconnected fpga_rst_n top-level port | minutes |
| EBR-backed delay-line refactor of polyphase_filterbank.vhd | up to 2 weeks |
| Wire up Q-path through filterbank and into the FFT | up to a week |
| DSP time-sharing for complex multiplies | folded into above |
| Cancellation pipeline (estimator, re-modulator, subtractor) | a week?? |
| Integration, on-hardware verification, sensitivity and residual measurement | 1–2 weeks maybe more |
Call it five to seven weeks of focused work for a flight-quality complex SIC payload
on iCE40UP5K. Each phase produces a measurable result on synthesis (the LUT count
should fall after the EBR refactor, then rise again with complex, but the final
number should be below the current 87%).
Recommendations and Plan
In execution order:
1. Fix the dB formatter (this session, before moving on). Print raw magnitude
alongside computed dB, single-step a few values, identify which of the four bug
patterns applies, fix it. Without metrically correct dB readout we cannot measure
cancellation depth, so this gates everything later.
2. Clean up the fpga_rst_n warning (this session). The map report flagged
"Top module port 'fpga_rst_n' does not connect to anything" twice. Either remove
the port from the top-level entity or actually connect it. Cheap, and the design
is fresh in our heads right now.
3. EBR-backed delay-line refactor. Highest leverage, also the highest-risk
change because it touches the filterbank's most timing-critical path. Do this as
its own branch off main, synthesize after each major piece, watch LUT utilization
come down. Back out if anything breaks.
4. Q-path wiring through the filterbank and FFT. With LUTs freed by step 3,
wire in the Q path. The FFT entity needs no change, it is already complex
internally.
5. DSP time-sharing. Convert the per-branch dedicated multipliers to a shared
pool. Folds naturally into step 4.
6. Cancellation pipeline. New work on top of the now-complex channelizer.
Parameter estimator (amplitude, frequency offset, phase) to re-modulator to
subtractor. This is where the SIC algorithm actually lives.
7. Integration and characterization. Synthesize stimulus (known strong + known
weak signal at known offsets), run through the pipeline, measure residual after
cancellation in dB (which requires step 1 to be done), document sensitivity and
dynamic range. This is the data that goes in the AMSAT-UK deliverable.
Two things to track as the work proceeds:
- Resource budget at each commit. Pull the .mrp numbers, log them. If LUT
utilization starts climbing back toward 80% we want to know before place-and-route
fails.
- Power. The iCE40UP5K is one of the lower-power FPGAs available, but the design
topology affects power significantly. EBR-backed storage and DSP-based multipliers
are both more power-efficient than the LUT-based equivalents. The redesign should
be a net power win, not a wash.
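For the per-commit resource log, even a tiny host-side check helps. The sketch below takes used/available pairs typed in by hand from the map report, computes utilization, and flags anything at or above the 80% warning level mentioned above. It does not parse the .mrp file, since the report layout is not reproduced here. The dictionary layout and function names are assumptions.

```python
LUT_WARN_PERCENT = 80.0  # warning threshold taken from the text above

# (used, available) copied from the 9 May 2026 utilization table.
baseline = {
    "LUT4": (4614, 5280),
    "Slice registers": (2499, 5280),
    "MAC16": (4, 8),
    "EBR": (1, 30),
    "SPRAM": (0, 4),
}


def utilization(report):
    """Percent used for each resource in a {name: (used, available)} dict."""
    return {name: 100.0 * used / avail for name, (used, avail) in report.items()}


def over_budget(report, limit=LUT_WARN_PERCENT):
    """Names of resources at or above the warning threshold, sorted."""
    return sorted(name for name, pct in utilization(report).items() if pct >= limit)


for name, pct in utilization(baseline).items():
    print(f"{name:16s} {pct:5.1f}%")
print("over budget:", over_budget(baseline))
```

Logging one such dictionary per commit gives a trend line, so a climb back toward the threshold shows up before place-and-route starts failing.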
Achievements
- Bring-up of post-reorg mdt_sic on iCE40UP5K-B-EVN + NUCLEO-H753ZI is complete
and verified (MD5 match on flash readback, conjugate symmetry confirmed in
channelizer output, SPI link working, frames arriving in PuTTY).
- Last commits on origin/main: 0551644 (fft_64pt rdf cleanup), bd54035
(generated VHDL deletion + gitignore patterns).
- Working tree should be clean! And it is!
- Next action: dB formatter investigation in
mdt_sic/firmware/stm32/sic_receiver/Core/Src/sic_fpga.c (or wherever the dB
conversion lives). After that, the fpga_rst_n cleanup in the top-level VHDL.
After that, the EBR-delay-line refactor as the first redesign branch.