Performance Optimization for Browser-Based SDR

Overview

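Real-time signal processing in the browser shares the main thread with rendering and input handling, and runs on a garbage-collected runtime. This guide covers strategies for keeping DSP and visualization work inside their per-frame time budgets.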

Performance Targets

Minimum Requirements

FFT computation: <16ms per frame (60 FPS)
Audio processing: <2ms latency for real-time
Waterfall update: 30-60 FPS smooth scrolling
UI responsiveness: <100ms for user interactions

Optimization Goals

Maintain 60 FPS spectrum/waterfall updates
Process 2+ MS/s sample rate in real-time
Keep CPU usage <50% on target hardware
Minimize memory allocations/garbage collection
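
To make these targets concrete, the sketch below converts a sample rate and a frame rate into a per-frame budget. The helper name `frameBudget` and its return shape are illustrative assumptions, not part of the project.

```javascript
// Hypothetical helper: per-frame time and sample budget for a given
// input sample rate (samples/s) and display frame rate (frames/s).
function frameBudget(sampleRate, fps) {
  return {
    frameTimeMs: 1000 / fps,
    samplesPerFrame: Math.ceil(sampleRate / fps),
  };
}

const budget = frameBudget(2_000_000, 60);
console.log(budget.frameTimeMs.toFixed(2)); // 16.67
console.log(budget.samplesPerFrame); // 33334
```

At 2 MS/s and 60 FPS, every frame has to absorb about 33,000 new samples within roughly 16.7 ms, which is why the FFT and rendering targets above are stated per frame.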

Web Worker Architecture

Why Web Workers

Problem: JavaScript runs on a single main thread, so heavy DSP work blocks the UI.

Solution: Offload processing to background threads (Web Workers).

Worker Structure

// main.js - UI thread
const dspWorker = new Worker("dsp-worker.js");

// Send samples to worker
dspWorker.postMessage(
  {
    type: "process",
    samples: audioBuffer,
    sampleRate: 48000,
  },
  [audioBuffer.buffer],
); // Transfer ownership

// Receive results
dspWorker.onmessage = (e) => {
  const { spectrum, audio } = e.data;
  updateDisplay(spectrum);
  playAudio(audio);
};

// dsp-worker.js - Worker thread
self.onmessage = (e) => {
  const { type, samples, sampleRate } = e.data;

  if (type === "process") {
    const spectrum = computeFFT(samples);
    const audio = demodulate(samples);

    self.postMessage(
      {
        spectrum,
        audio,
      },
      [spectrum.buffer, audio.buffer],
    ); // Transfer back
  }
};

Multiple Workers

Strategy: Parallel processing pipeline

const fftWorker = new Worker('fft-worker.js');
const demodWorker = new Worker('demod-worker.js');
const audioWorker = new Worker('audio-worker.js');

// Pipeline:
// rawSamples → fftWorker → spectrum display
//            ↘ demodWorker → audioWorker → audio output

Transferable Objects

Critical: Use transferable objects to avoid copying.

// ❌ SLOW - Copies data
worker.postMessage({ buffer: myFloat32Array });

// ✅ FAST - Transfers ownership
worker.postMessage({ buffer: myFloat32Array }, [myFloat32Array.buffer]);

Note: After transfer, the original buffer is detached (byteLength = 0) and any typed-array views on it have length 0.
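
The detachment can be observed without a worker. This sketch uses `structuredClone` with a transfer list, which moves the buffer the same way `postMessage` does; it assumes an environment that provides `structuredClone` (current browsers, Node 17+).

```javascript
// structuredClone with a transfer list moves the ArrayBuffer the same way
// postMessage does, so it can demonstrate detachment without a worker.
const samples = new Float32Array(1024);
console.log(samples.byteLength); // 4096

const moved = structuredClone(samples, { transfer: [samples.buffer] });

console.log(samples.byteLength); // 0 (detached)
console.log(moved.byteLength); // 4096
```

Any code that still holds the original view after a transfer sees an empty array, so buffers should be treated as gone once they are posted.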

SharedArrayBuffer for Zero-Copy

Advanced technique for maximum performance (requires HTTPS + security headers):

SharedArrayBuffer allows multiple workers and main thread to access the same memory without copying, eliminating transfer overhead entirely.

Browser Requirements:

HTTPS deployment
Cross-Origin-Opener-Policy: same-origin header
Cross-Origin-Embedder-Policy: require-corp header
Chrome 92+, Firefox 79+, Safari 15.2+

Zero-Copy Pattern:

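// Create shared buffer: one Int32 status flag followed by 1024 floats
const sharedBuffer = new SharedArrayBuffer(4 + 1024 * 4);
const flag = new Int32Array(sharedBuffer, 0, 1); // Atomics.wait/notify need an integer view
const sharedArray = new Float32Array(sharedBuffer, 4, 1024);

// Pass to worker (no copy!)
worker.postMessage({ sharedBuffer });

// Worker receives same memory
// worker.js
self.onmessage = (e) => {
  const flag = new Int32Array(e.data.sharedBuffer, 0, 1);
  const samples = new Float32Array(e.data.sharedBuffer, 4, 1024);

  // Process directly - no copy needed
  for (let i = 0; i < samples.length; i++) {
    samples[i] = processFFT(samples[i]);
  }

  // Signal completion
  Atomics.store(flag, 0, 1);
  Atomics.notify(flag, 0); // Wakes any worker blocked in Atomics.wait(flag, 0, 0)
  self.postMessage({ type: "done" });
};

// Main thread: Atomics.wait() throws on the main thread, so react to the
// worker's message instead of blocking
worker.onmessage = () => {
  if (Atomics.load(flag, 0) === 1) {
    const processedData = sharedArray; // Already updated!
  }
};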

Synchronization with Atomics:

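// Ring buffer for continuous streaming (single producer, single consumer)
class SharedRingBuffer {
  // Pass an existing SharedArrayBuffer to attach from another thread
  constructor(size, buffer = new SharedArrayBuffer((size + 2) * 4)) {
    this.buffer = buffer;
    this.header = new Int32Array(this.buffer, 0, 2); // [writePos, readPos]
    this.data = new Float32Array(this.buffer, 8, size); // Skip header
    this.size = size;
  }

  // Number of samples waiting to be read
  available() {
    const writePos = Atomics.load(this.header, 0);
    const readPos = Atomics.load(this.header, 1);
    return (writePos - readPos + this.size) % this.size;
  }

  write(samples) {
    // One slot stays empty so a full buffer is distinguishable from an empty one
    if (samples.length > this.size - 1 - this.available()) return false;
    const writePos = Atomics.load(this.header, 0);

    for (let i = 0; i < samples.length; i++) {
      this.data[(writePos + i) % this.size] = samples[i];
    }

    Atomics.store(this.header, 0, (writePos + samples.length) % this.size);
    Atomics.notify(this.header, 0, 1);
    return true;
  }

  read(count) {
    // Refuse to read past the write position
    if (count > this.available()) return null;
    const readPos = Atomics.load(this.header, 1);
    const samples = new Float32Array(count);

    for (let i = 0; i < count; i++) {
      samples[i] = this.data[(readPos + i) % this.size];
    }

    Atomics.store(this.header, 1, (readPos + count) % this.size);
    return samples;
  }
}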

Performance Benefits:

Transfer Method	Latency	Throughput (1MB)
postMessage (copy)	1-5ms	~200 MB/s
Transferable	0.1-0.5ms	~2 GB/s
SharedArrayBuffer	<0.01ms	~10+ GB/s

Use Cases:

Real-time SDR sample streaming (20+ MS/s)
Multi-worker parallel processing
Low-latency audio pipelines
High-frequency data visualization

Security Considerations:

Requires cross-origin isolation (Spectre mitigation)
May not work in embedded contexts (iframes)
Server configuration needed
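
Because shared memory depends on cross-origin isolation, it helps to check for it at startup and fall back to transferables otherwise. The sketch below relies on the standard `crossOriginIsolated` global; the function names and the returned transport labels are illustrative assumptions.

```javascript
// Returns true when shared memory can actually be used in the given scope.
// `scope` defaults to globalThis; passing it in keeps the check testable.
function canUseSharedMemory(scope = globalThis) {
  return (
    typeof scope.SharedArrayBuffer === "function" &&
    scope.crossOriginIsolated === true
  );
}

// Hypothetical fallback decision: shared ring buffer vs. transferables
function pickTransport(scope = globalThis) {
  return canUseSharedMemory(scope) ? "shared" : "transferable";
}

console.log(pickTransport({ SharedArrayBuffer, crossOriginIsolated: true })); // shared
console.log(pickTransport({ crossOriginIsolated: false })); // transferable
```

Running the check once and storing the result keeps the hot path free of feature tests.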

Memory Management

Typed Arrays

Always use typed arrays for numerical data:

// ✅ GOOD
const samples = new Float32Array(1024);
const spectrum = new Float32Array(512);

// ❌ BAD
const samples = [];
const spectrum = [];

Pre-allocation

Avoid allocations in hot paths:

// ✅ GOOD - Allocate once
class FFTProcessor {
  constructor(size) {
    this.buffer = new Float32Array(size);
    this.window = this.createWindow(size);
    this.fftReal = new Float32Array(size);
    this.fftImag = new Float32Array(size);
  }

  process(input) {
    // Reuse buffers
    for (let i = 0; i < input.length; i++) {
      this.buffer[i] = input[i] * this.window[i];
    }
    // ... process
  }
}

// ❌ BAD - Allocates every call
function process(input) {
  const buffer = new Float32Array(input.length);
  const window = createWindow(input.length);
  // ...
}

Object Pooling

Reuse objects instead of creating new ones:

class BufferPool {
  constructor(size, count) {
    this.size = size;
    this.available = [];
    this.inUse = new Set();

    for (let i = 0; i < count; i++) {
      this.available.push(new Float32Array(size));
    }
  }

  acquire() {
    if (this.available.length === 0) {
      console.warn("Pool exhausted");
      return new Float32Array(this.size);
    }
    const buffer = this.available.pop();
    this.inUse.add(buffer);
    return buffer;
  }

  release(buffer) {
    this.inUse.delete(buffer);
    this.available.push(buffer);
  }
}

// Usage
const pool = new BufferPool(4096, 10);
const buffer = pool.acquire();
// ... use buffer
pool.release(buffer);

Garbage Collection

Minimize GC pressure:

Avoid allocations in loops
Reuse buffers
Use object pools
Drop references to large arrays when done so they can be collected (array.fill(0) zeroes the contents but frees no memory)

FFT Optimization

Library Selection

Options:

fft.js: Pure JS, fast, well-tested
kiss-fft.js: Port of KISS FFT
dsp.js: Full DSP suite
Web Audio FFT: Built-in, but limited

Recommendation: fft.js for flexibility, Web Audio for simplicity

FFT Size Selection

Trade-offs:

Larger FFT = Better frequency resolution, more CPU
Power of 2 sizes are much faster
Common: 256, 512, 1024, 2048, 4096

Adaptive sizing:

function selectFFTSize(sampleRate, desiredResolution) {
  const minSize = Math.ceil(sampleRate / desiredResolution);
  return Math.pow(2, Math.ceil(Math.log2(minSize)));
}
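
As a worked example of the sizing rule, the bin width of an FFT is the sample rate divided by the FFT size. The snippet repeats `selectFFTSize` so it runs on its own; `binWidth` is a hypothetical helper added for illustration.

```javascript
// Same sizing rule as above, repeated so the example runs on its own
function selectFFTSize(sampleRate, desiredResolution) {
  const minSize = Math.ceil(sampleRate / desiredResolution);
  return 2 ** Math.ceil(Math.log2(minSize));
}

// Width of one FFT bin in Hz
function binWidth(sampleRate, fftSize) {
  return sampleRate / fftSize;
}

const size = selectFFTSize(48000, 10); // needs at least 4800 points
console.log(size); // 8192
console.log(binWidth(48000, size).toFixed(2)); // 5.86
```

Rounding up to a power of two means the achieved resolution (5.86 Hz here) is at least as fine as the requested one (10 Hz), at the cost of a larger transform.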

Overlap Processing

Technique: Process overlapping windows for smoother display.

class OverlapProcessor {
  constructor(fftSize, overlap = 0.5) {
    this.fftSize = fftSize;
    this.hopSize = Math.floor(fftSize * (1 - overlap));
    this.buffer = new Float32Array(fftSize);
    this.position = 0;
  }

  process(newSamples) {
    const results = [];

    for (let i = 0; i < newSamples.length; i++) {
      this.buffer[this.position++] = newSamples[i];

      if (this.position >= this.fftSize) {
        results.push(this.computeFFT(this.buffer));

        // Shift buffer by hop size
        this.buffer.copyWithin(0, this.hopSize);
        this.position -= this.hopSize;
      }
    }

    return results;
  }
}

Window Function Caching

Pre-compute window once:

class WindowCache {
  constructor() {
    this.cache = new Map();
  }

  getWindow(type, size) {
    const key = `${type}-${size}`;
    if (!this.cache.has(key)) {
      this.cache.set(key, this.computeWindow(type, size));
    }
    return this.cache.get(key);
  }

  computeWindow(type, size) {
    const window = new Float32Array(size);
    // ... compute window
    return window;
  }
}
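
The body of `computeWindow` is elided above. As one possible filling for a "hann" type, the sketch below computes a periodic Hann window; the choice of the periodic form (dividing by N rather than N - 1) is an assumption, not something the guide specifies.

```javascript
// Periodic Hann window: w[n] = 0.5 * (1 - cos(2*pi*n / N))
function hannWindow(size) {
  const w = new Float32Array(size);
  for (let n = 0; n < size; n++) {
    w[n] = 0.5 * (1 - Math.cos((2 * Math.PI * n) / size));
  }
  return w;
}

const w = hannWindow(8);
console.log(w[0]); // 0
console.log(w[4]); // 1
```

The periodic form is the usual choice for spectral analysis with overlapping frames; a symmetric window would divide by `size - 1` instead.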

WebAssembly SIMD Acceleration

Advanced optimization for modern browsers (2024+):

WebAssembly SIMD (Single Instruction, Multiple Data) provides 2-4x additional speedup on top of regular WASM by performing operations on multiple data points simultaneously.

Browser Support:

Chrome 91+ ✅
Firefox 89+ ✅
Safari 16.4+ ✅
Edge 91+ ✅

Feature Detection:

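// Example function - actual implementation: isWasmSIMDSupported() in src/utils/dspWasm.ts
function detectWasmSIMD() {
  try {
    // Minimal WASM module whose function returns a v128 and uses
    // i8x16.splat / i8x16.popcnt, so it only validates with SIMD support
    return WebAssembly.validate(
      new Uint8Array([
        0, 97, 115, 109, 1, 0, 0, 0, 1, 5, 1, 96, 0, 1, 123, 3, 2, 1, 0, 10, 10,
        1, 8, 0, 65, 0, 253, 15, 253, 98, 11,
      ]),
    );
  } catch {
    return false;
  }
}

// .wasm files cannot be loaded with import() unless a bundler adds support,
// so fetch and instantiate the chosen build explicitly
const wasmUrl = detectWasmSIMD()
  ? "./dsp-simd.wasm" // SIMD-optimized WASM module
  : "./dsp.wasm"; // Fallback to regular WASM
const { instance } = await WebAssembly.instantiateStreaming(fetch(wasmUrl));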

Implementation Strategies:

Vector Operations: Process 4 float32 values per instruction
Butterfly FFT: SIMD-optimized complex multiply-add
Parallel Windowing: Apply window to 4+ samples simultaneously
SIMD Intrinsics: Use v128 types in AssemblyScript/Rust

Example - SIMD FFT (AssemblyScript):

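// Load 4 complex samples, stored as separate real and imaginary arrays
let xr = v128.load(realPtr);
let xi = v128.load(imagPtr);

// Complex multiply by twiddle factors (4 parallel operations):
// (xr + j*xi) * (twiddleReal + j*twiddleImag)
let real = f32x4.sub(f32x4.mul(xr, twiddleReal), f32x4.mul(xi, twiddleImag));
let imag = f32x4.add(f32x4.mul(xr, twiddleImag), f32x4.mul(xi, twiddleReal));

// Store result
v128.store(outRealPtr, real);
v128.store(outImagPtr, imag);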

Performance Gains:

Operation	Regular WASM	WASM + SIMD	Total Speedup
FFT 2048	6-8ms	2-3ms	8-10x vs JS
FFT 4096	25-30ms	8-12ms	7-9x vs JS
Windowing	0.1-0.15ms	0.03-0.05ms	4-7x vs JS

Limitations:

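No runtime detection in WASM itself (must build separate modules)
Unaligned v128 loads and stores are permitted, but 16-byte aligned data is faster on some hardware
Limited to 128-bit wide vectors
Modules containing SIMD instructions fail to validate on engines without SIMD support (careful fallback needed)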

WebGL Acceleration

Waterfall Display

Use GPU for rendering large waterfall displays:

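class WebGLWaterfall {
  constructor(canvas, width, height) {
    this.gl = canvas.getContext("webgl2");
    this.width = width;
    this.height = height;
    this.row = 0; // Next texture row to overwrite
    this.texture = this.createTexture();
    this.setupShaders();
  }

  updateLine(spectrumData) {
    const gl = this.gl;

    // Treat the texture as a ring buffer instead of scrolling it:
    // copyTexSubImage2D reads from the framebuffer, not from the texture,
    // so it cannot shift a texture's own rows.
    gl.bindTexture(gl.TEXTURE_2D, this.texture);
    gl.pixelStorei(gl.UNPACK_ALIGNMENT, 1); // Rows are 1 byte per pixel
    gl.texSubImage2D(
      gl.TEXTURE_2D,
      0,
      0,
      this.row, // Overwrite the oldest line
      this.width,
      1,
      gl.LUMINANCE,
      gl.UNSIGNED_BYTE,
      spectrumData, // Uint8Array of length width
    );
    this.row = (this.row + 1) % this.height;

    // The shader offsets its v coordinate by row / height (REPEAT wrapping)
    // so the newest line always appears at the same screen edge
    this.render();
  }
}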
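
Uploading with `UNSIGNED_BYTE` means each spectrum line has to be converted to a `Uint8Array` first. The sketch below shows one way to map dB values to intensities; the function name and the display limits `minDb` and `maxDb` are assumptions chosen for illustration.

```javascript
// Map power values in dB to 0..255 intensity bytes.
// minDb and maxDb are assumed display limits, not project defaults.
function spectrumToBytes(spectrumDb, minDb, maxDb, out = new Uint8Array(spectrumDb.length)) {
  const scale = 255 / (maxDb - minDb);
  for (let i = 0; i < spectrumDb.length; i++) {
    const v = (spectrumDb[i] - minDb) * scale;
    out[i] = v < 0 ? 0 : v > 255 ? 255 : Math.round(v);
  }
  return out;
}

const row = spectrumToBytes(new Float32Array([-120, -100, -60, -20, 0]), -100, -20);
console.log(Array.from(row)); // [ 0, 0, 128, 255, 255 ]
```

Passing a preallocated `out` array keeps the conversion allocation-free when it runs once per waterfall line.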

Spectrum Display

GPU-accelerated line rendering:

// Upload spectrum data as texture
const texture = gl.createTexture();
gl.bindTexture(gl.TEXTURE_2D, texture);
gl.texImage2D(
  gl.TEXTURE_2D,
  0,
  gl.R32F,
  spectrumLength,
  1,
  0,
  gl.RED,
  gl.FLOAT,
  spectrumData,
);

// Vertex shader samples texture, renders line

WebGPU Compute Acceleration

Next-generation GPU computing (2024+ browsers):

WebGPU provides direct access to GPU compute capabilities. In the measurements below, large FFTs and parallel DSP operations run roughly 8-15x faster than with WASM.

Browser Support:

Chrome 113+ ✅
Edge 113+ ✅
Safari 26+ ✅ (available behind a feature flag in Safari 18)
Firefox 141+ on Windows ✅, other platforms in development 🔄

Feature Detection:

async function initWebGPU() {
  if (!navigator.gpu) {
    console.warn("WebGPU not supported");
    return null;
  }

  const adapter = await navigator.gpu.requestAdapter();
  if (!adapter) {
    console.warn("No GPU adapter found");
    return null;
  }

  const device = await adapter.requestDevice();
  return device;
}

Compute Shader FFT Example:

// WGSL compute shader for FFT butterfly operation
@group(0) @binding(0) var<storage, read_write> data: array<vec2<f32>>;
@group(0) @binding(1) var<storage, read> twiddles: array<vec2<f32>>;

@compute @workgroup_size(64)
fn fft_butterfly(@builtin(global_invocation_id) id: vec3<u32>) {
    let idx = id.x;
    let half_size = arrayLength(&data) / 2u;

    if (idx >= half_size) {
        return;
    }

    // Butterfly operation
    let even = data[idx];
    let odd = data[idx + half_size];
    let twiddle = twiddles[idx];

    // Complex multiplication: odd * twiddle
    let odd_mul = vec2<f32>(
        odd.x * twiddle.x - odd.y * twiddle.y,
        odd.x * twiddle.y + odd.y * twiddle.x
    );

    // Combine
    data[idx] = even + odd_mul;
    data[idx + half_size] = even - odd_mul;
}
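
The shader above applies a single butterfly stage; a complete FFT repeats such stages log2(N) times with different strides. A CPU reference of the same stage is useful for checking GPU output. The sketch below mirrors the shader's indexing on interleaved (re, im) pairs; the function name is hypothetical.

```javascript
// CPU reference for the butterfly stage above. `data` and `twiddles` hold
// interleaved (re, im) pairs, matching array<vec2<f32>> in the shader.
function butterflyStage(data, twiddles) {
  const half = data.length / 4; // complex element count / 2
  for (let idx = 0; idx < half; idx++) {
    const e = 2 * idx; // float offset of the "even" element
    const o = 2 * (idx + half); // float offset of the "odd" element
    const evenRe = data[e];
    const evenIm = data[e + 1];
    // odd * twiddle (complex multiplication)
    const mulRe = data[o] * twiddles[e] - data[o + 1] * twiddles[e + 1];
    const mulIm = data[o] * twiddles[e + 1] + data[o + 1] * twiddles[e];
    data[e] = evenRe + mulRe;
    data[e + 1] = evenIm + mulIm;
    data[o] = evenRe - mulRe;
    data[o + 1] = evenIm - mulIm;
  }
  return data;
}

// Two complex values (1+0i, 2+0i) with twiddle 1+0i
const out = butterflyStage(new Float32Array([1, 0, 2, 0]), new Float32Array([1, 0]));
console.log(Array.from(out)); // [ 3, 0, -1, 0 ]
```

Comparing this output against the mapped GPU buffer for a few random inputs is a cheap way to catch indexing or binding mistakes.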

JavaScript Integration:

class WebGPUFFT {
  async initialize(device, fftSize) {
    this.device = device;
    this.fftSize = fftSize;

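    // Create buffers
    this.dataBuffer = device.createBuffer({
      size: fftSize * 8, // vec2<f32> = 8 bytes
      usage:
        GPUBufferUsage.STORAGE |
        GPUBufferUsage.COPY_DST |
        GPUBufferUsage.COPY_SRC,
    });

    // Twiddle factors W_k = exp(-2*pi*i*k/N) for k < N/2, as (re, im) pairs
    const twiddles = new Float32Array(fftSize);
    for (let k = 0; k < fftSize / 2; k++) {
      const angle = (-2 * Math.PI * k) / fftSize;
      twiddles[2 * k] = Math.cos(angle);
      twiddles[2 * k + 1] = Math.sin(angle);
    }
    this.twiddleBuffer = device.createBuffer({
      size: twiddles.byteLength,
      usage: GPUBufferUsage.STORAGE | GPUBufferUsage.COPY_DST,
    });
    device.queue.writeBuffer(this.twiddleBuffer, 0, twiddles);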

    // Load compute shader
    const module = device.createShaderModule({
      code: fftShaderCode,
    });

    // Create compute pipeline
    this.pipeline = device.createComputePipeline({
      layout: "auto",
      compute: {
        module,
        entryPoint: "fft_butterfly",
      },
    });

    // Create bind group
    this.bindGroup = device.createBindGroup({
      layout: this.pipeline.getBindGroupLayout(0),
      entries: [
        { binding: 0, resource: { buffer: this.dataBuffer } },
        { binding: 1, resource: { buffer: this.twiddleBuffer } },
      ],
    });
  }

  async compute(samples) {
    // Upload data
    this.device.queue.writeBuffer(this.dataBuffer, 0, samples);

    // Create command encoder
    const encoder = this.device.createCommandEncoder();
    const pass = encoder.beginComputePass();

    // Dispatch compute shader
    pass.setPipeline(this.pipeline);
    pass.setBindGroup(0, this.bindGroup);
    pass.dispatchWorkgroups(Math.ceil(this.fftSize / 64));
    pass.end();

    // Submit
    this.device.queue.submit([encoder.finish()]);

    // Read results
    const resultBuffer = this.device.createBuffer({
      size: this.dataBuffer.size,
      usage: GPUBufferUsage.COPY_DST | GPUBufferUsage.MAP_READ,
    });

    const copyEncoder = this.device.createCommandEncoder();
    copyEncoder.copyBufferToBuffer(
      this.dataBuffer,
      0,
      resultBuffer,
      0,
      this.dataBuffer.size,
    );
    this.device.queue.submit([copyEncoder.finish()]);

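    await resultBuffer.mapAsync(GPUMapMode.READ);
    // Copy out before unmapping: unmap() detaches the mapped ArrayBuffer
    const result = new Float32Array(resultBuffer.getMappedRange()).slice();
    resultBuffer.unmap();
    resultBuffer.destroy();

    return result;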
  }
}

Performance Characteristics:

Operation	WASM (ms)	WebGPU (ms)	Speedup
FFT 2048	6-8	0.5-1.0	8-12x
FFT 4096	25-30	2-4	8-12x
FFT 8192	100-120	8-12	10-12x
FFT 16384	400-500	25-35	12-15x

Best Practices:

Batch Operations: Process multiple FFTs in one shader invocation
Workgroup Size: Tune for target GPU (typically 64-256)
Memory Management: Reuse buffers, avoid allocations
Async Patterns: Use timestamps for profiling
Fallback Path: Always provide WASM/JS fallback
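
The fallback practice can be expressed as a small selection function that runs once at startup. The capability flags and backend names below are illustrative assumptions; the flags would come from checks such as `initWebGPU()` and `detectWasmSIMD()` shown elsewhere in this guide.

```javascript
// Pick the fastest available backend from detected capabilities.
// The capability flags and backend names are illustrative.
function selectBackend({ webgpu = false, wasmSimd = false, wasm = false } = {}) {
  if (webgpu) return "webgpu";
  if (wasmSimd) return "wasm-simd";
  if (wasm) return "wasm";
  return "js";
}

console.log(selectBackend({ wasm: true, wasmSimd: true })); // wasm-simd
console.log(selectBackend()); // js
```

Keeping the decision in one place makes it easy to force a specific backend when benchmarking or debugging.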

Direct Rendering Pipeline:

// Compute FFT and render in a single command submission
class WebGPUSpectrumAnalyzer {
  async renderFrame(samples) {
    const encoder = this.device.createCommandEncoder();

    // 1. Compute pass: FFT
    const computePass = encoder.beginComputePass();
    computePass.setPipeline(this.fftPipeline);
    computePass.setBindGroup(0, this.fftBindGroup);
    computePass.dispatchWorkgroups(this.workgroupCount);
    computePass.end();

    // 2. Render pass: Visualize
    const renderPass = encoder.beginRenderPass({
      colorAttachments: [
        {
          // this.context is the canvas's GPUCanvasContext, configured at setup
          view: this.context.getCurrentTexture().createView(),
          clearValue: { r: 0, g: 0, b: 0, a: 1 },
          loadOp: "clear",
          storeOp: "store",
        },
      ],
    });
    renderPass.setPipeline(this.renderPipeline);
    renderPass.setBindGroup(0, this.renderBindGroup); // Uses FFT output
    renderPass.draw(this.vertexCount);
    renderPass.end();

    // Submit both passes together
    this.device.queue.submit([encoder.finish()]);
  }
}

Optimization Tips:

Use timestamp queries for profiling GPU work
Minimize CPU-GPU synchronization
Leverage shared memory within workgroups
Consider relaxed memory ordering for non-critical data
Profile with browser GPU debuggers (Chrome DevTools, Safari Web Inspector)

Audio Processing Optimization

Web Audio API

Use native audio graph for efficiency:

const audioContext = new AudioContext();

// Create processing chain
const source = audioContext.createMediaStreamSource(stream);
const filter = audioContext.createBiquadFilter();
const analyser = audioContext.createAnalyser();
const gain = audioContext.createGain();

// Connect
source.connect(filter);
filter.connect(analyser);
analyser.connect(gain);
gain.connect(audioContext.destination);

// Configure
filter.type = "lowpass";
filter.frequency.value = 3000;
analyser.fftSize = 2048;

AudioWorklet

For custom DSP, use AudioWorklet (not ScriptProcessor):

// main.js
await audioContext.audioWorklet.addModule("demodulator.js");
const demodNode = new AudioWorkletNode(audioContext, "demodulator");

// demodulator.js
class DemodulatorProcessor extends AudioWorkletProcessor {
  process(inputs, outputs, parameters) {
    const input = inputs[0][0];
    const output = outputs[0][0];

    // Nothing connected yet: leave the output silent but keep the node alive
    if (!input) return true;

    for (let i = 0; i < input.length; i++) {
      output[i] = this.demodulate(input[i]);
    }

    return true;
  }
}

registerProcessor("demodulator", DemodulatorProcessor);
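
The worklet above calls `this.demodulate`, which this guide does not define. As one illustration of what such a routine might do, the sketch below is a quadrature FM discriminator. It assumes complex baseband input as separate I and Q arrays, which differs from the single real input in the worklet example, and the class name is hypothetical.

```javascript
// Quadrature FM discriminator sketch. Input is complex baseband as separate
// I and Q arrays; the output is the phase change per sample, in radians.
class FmDemodulator {
  constructor() {
    // Previous sample, kept across calls so block boundaries are seamless
    this.prevI = 1;
    this.prevQ = 0;
  }

  process(i, q, out = new Float32Array(i.length)) {
    for (let n = 0; n < i.length; n++) {
      // Multiply the current sample by the conjugate of the previous one;
      // the angle of the product is the phase difference.
      const re = i[n] * this.prevI + q[n] * this.prevQ;
      const im = q[n] * this.prevI - i[n] * this.prevQ;
      out[n] = Math.atan2(im, re);
      this.prevI = i[n];
      this.prevQ = q[n];
    }
    return out;
  }
}

// A tone whose phase advances 0.1 rad per sample demodulates to a constant
const iSamples = Float32Array.from({ length: 16 }, (_, k) => Math.cos(0.1 * k));
const qSamples = Float32Array.from({ length: 16 }, (_, k) => Math.sin(0.1 * k));
const audio = new FmDemodulator().process(iSamples, qSamples);
console.log(audio[5].toFixed(3)); // 0.100
```

The demodulator keeps only two numbers of state and accepts a caller-supplied output array, so it fits the no-allocation rule for audio callbacks.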

Sample Rate Conversion

Use OfflineAudioContext for resampling:

async function resample(buffer, targetRate) {
  const ctx = new OfflineAudioContext(
    buffer.numberOfChannels,
    Math.ceil(buffer.duration * targetRate), // Length must be an integer frame count
    targetRate,
  );

  const source = ctx.createBufferSource();
  source.buffer = buffer;
  source.connect(ctx.destination);
  source.start();

  return await ctx.startRendering();
}

Decimation for Display

Don't render every sample:

function decimateForDisplay(data, targetPoints) {
  if (data.length <= targetPoints) return data;

  const result = new Float32Array(targetPoints);
  const step = data.length / targetPoints;

  for (let i = 0; i < targetPoints; i++) {
    const start = Math.floor(i * step);
    const end = Math.floor((i + 1) * step);

    // Min-max decimation preserves peaks
    let min = Infinity,
      max = -Infinity;
    for (let j = start; j < end; j++) {
      if (data[j] < min) min = data[j];
      if (data[j] > max) max = data[j];
    }

    result[i] = i % 2 === 0 ? min : max;
  }

  return result;
}

Rendering Optimization

Canvas vs WebGL

Canvas 2D:

✅ Simple API
✅ Good for <1000 points
❌ Slow for complex scenes

WebGL:

✅ Very fast for large data
✅ GPU accelerated
❌ More complex

Use Canvas for UI, WebGL for spectrum/waterfall

requestAnimationFrame

Sync updates with display refresh:

class DisplayManager {
  constructor() {
    this.needsUpdate = false;
    this.isAnimating = false;
  }

  requestUpdate() {
    this.needsUpdate = true;
    if (!this.isAnimating) {
      this.isAnimating = true;
      requestAnimationFrame(() => this.render());
    }
  }

  render() {
    if (this.needsUpdate) {
      this.updateDisplay();
      this.needsUpdate = false;
    }

    this.isAnimating = false;
  }
}

Batch Updates

Group DOM operations:

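// ❌ BAD - Interleaved layout reads and style writes force a reflow per iteration
for (let i = 0; i < elements.length; i++) {
  elements[i].style.top = elements[i].offsetTop + offsets[i] + "px";
}

// ✅ GOOD - Read everything first, then write everything
const tops = elements.map((el) => el.offsetTop);
elements.forEach((el, i) => {
  el.style.top = tops[i] + offsets[i] + "px";
});

// ✅ BETTER - Use transform (GPU accelerated, no layout)
element.style.transform = `translateY(${position}px)`;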

Profiling and Debugging

Performance API

Measure critical paths:

performance.mark("fft-start");
const spectrum = computeFFT(samples);
performance.mark("fft-end");
performance.measure("fft", "fft-start", "fft-end");

const measure = performance.getEntriesByName("fft")[0];
console.log(`FFT took ${measure.duration.toFixed(2)}ms`);
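
A single measurement is noisy, so it is often more useful to time many runs and look at the mean and tail. The sketch below uses `performance.now()`; the helper names `summarize` and `timeIt` and the nearest-rank percentile are choices made for this example.

```javascript
// Summarize a list of durations in milliseconds.
function summarize(durations) {
  const sorted = [...durations].sort((a, b) => a - b);
  const mean = sorted.reduce((sum, d) => sum + d, 0) / sorted.length;
  // Nearest-rank 95th percentile
  const rank = Math.ceil(0.95 * sorted.length) - 1;
  const p95 = sorted[Math.min(sorted.length - 1, rank)];
  return { mean, p95, max: sorted[sorted.length - 1] };
}

// Time a function repeatedly with performance.now()
function timeIt(fn, iterations = 100) {
  const durations = [];
  for (let i = 0; i < iterations; i++) {
    const start = performance.now();
    fn();
    durations.push(performance.now() - start);
  }
  return summarize(durations);
}

console.log(summarize([1, 2, 3, 4, 10])); // { mean: 4, p95: 10, max: 10 }
```

The 95th percentile and maximum matter more than the mean for real-time work, because a single slow frame is what the user sees as a stutter.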

Chrome DevTools

Profile CPU usage:

Open DevTools > Performance
Record during operation
Look for:
- Long tasks (>50ms)
- GC pauses
- Layout thrashing

Memory Profiling

Check for leaks:

DevTools > Memory
Take heap snapshot
Compare before/after
Look for detached objects

Platform-Specific Optimization

Mobile Devices

Considerations:

Lower CPU/GPU power
Battery constraints
Smaller screens

Strategies:

Reduce FFT size (512-1024)
Lower update rate (30 FPS)
Simplify visualizations
Use more aggressive decimation

Desktop

Take advantage of:

Multiple cores (more workers)
Larger FFT sizes (4096-8192)
Higher resolution displays
WebGL 2.0 features

Common Pitfalls

❌ Don't: Allocate in hot loops

for (let i = 0; i < 1000; i++) {
  const temp = new Float32Array(1024);
  process(temp);
}

✅ Do: Reuse buffers

const temp = new Float32Array(1024);
for (let i = 0; i < 1000; i++) {
  process(temp);
}

❌ Don't: Block main thread

const spectrum = heavyFFT(largeSamples); // Freezes UI
updateDisplay(spectrum);

✅ Do: Use workers

worker.postMessage({ samples: largeSamples });
worker.onmessage = (e) => updateDisplay(e.data.spectrum);

❌ Don't: Update every sample

samples.forEach((s) => updateDisplay(s));

✅ Do: Batch updates

let buffer = [];
samples.forEach((s) => {
  buffer.push(s);
  if (buffer.length >= BATCH_SIZE) {
    updateDisplay(buffer);
    buffer = [];
  }
});
// Flush the final partial batch
if (buffer.length > 0) updateDisplay(buffer);

Benchmarking

Test Suite

class PerformanceTest {
  async runTests() {
    const sizes = [256, 512, 1024, 2048, 4096];

    for (const size of sizes) {
      const samples = new Float32Array(size);
      const iterations = 1000;

      const start = performance.now();
      for (let i = 0; i < iterations; i++) {
        computeFFT(samples);
      }
      const end = performance.now();

      const avg = (end - start) / iterations;
      console.log(`FFT ${size}: ${avg.toFixed(2)}ms avg`);
    }
  }
}
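
The benchmark calls `computeFFT`, which this guide does not define. To make the suite runnable, the sketch below provides a plain radix-2 reference implementation. It is an assumption about what `computeFFT` might look like (returning magnitudes for a real input block), not the project's implementation, and it favors clarity over speed.

```javascript
// In-place iterative radix-2 FFT on separate real and imaginary arrays.
// The length must be a power of two.
function fftInPlace(re, im) {
  const n = re.length;
  if (n === 0 || (n & (n - 1)) !== 0) {
    throw new Error("length must be a power of two");
  }

  // Bit-reversal permutation
  for (let i = 1, j = 0; i < n; i++) {
    let bit = n >> 1;
    for (; j & bit; bit >>= 1) j ^= bit;
    j ^= bit;
    if (i < j) {
      [re[i], re[j]] = [re[j], re[i]];
      [im[i], im[j]] = [im[j], im[i]];
    }
  }

  // Butterfly stages of length 2, 4, ..., n
  for (let len = 2; len <= n; len <<= 1) {
    const angle = (-2 * Math.PI) / len;
    for (let start = 0; start < n; start += len) {
      for (let k = 0; k < len / 2; k++) {
        const wr = Math.cos(angle * k);
        const wi = Math.sin(angle * k);
        const a = start + k;
        const b = a + len / 2;
        const tr = re[b] * wr - im[b] * wi;
        const ti = re[b] * wi + im[b] * wr;
        re[b] = re[a] - tr;
        im[b] = im[a] - ti;
        re[a] += tr;
        im[a] += ti;
      }
    }
  }
}

// Hypothetical computeFFT: magnitudes for a real input block.
// Allocates per call for simplicity; a tuned version would reuse buffers.
function computeFFT(samples) {
  const re = Float64Array.from(samples);
  const im = new Float64Array(samples.length);
  fftInPlace(re, im);
  return re.map((r, k) => Math.hypot(r, im[k]));
}

const impulse = new Float32Array(8);
impulse[0] = 1;
console.log(Array.from(computeFFT(impulse))); // eight bins, each with magnitude 1
```

Because the twiddle factors are recomputed inside the inner loop, this version is a correctness baseline for comparing optimized JS, WASM, and GPU results rather than a performance target.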

Target Performance Metrics

Acceptable Performance

FFT 2048: <5ms
FFT 4096: <10ms
Spectrum render: <16ms (60 FPS)
Waterfall update: <16ms
Total processing: <50% CPU

Excellent Performance

FFT 2048: <2ms
FFT 4096: <5ms
Spectrum render: <8ms (120 FPS capable)
Waterfall update: <8ms
Total processing: <30% CPU

Resources

Official Documentation

Web Workers: MDN Web Workers API
SharedArrayBuffer: MDN SharedArrayBuffer
WebGL: WebGL Fundamentals
WebGPU: WebGPU Fundamentals
Audio Worklet: MDN AudioWorklet
Performance API: MDN Performance

WebAssembly and SIMD

WebAssembly: Official Specification
WASM SIMD: V8 SIMD Features
RustFFT WASM: WASM SIMD Implementation
pffft.wasm: Fast FFT Library
WebFFT: Meta-Library for FFT
AssemblyScript: TypeScript to WASM

Performance Optimization Guides

Web.dev Performance: web.dev/performance
High-Performance JS: Off-Main-Thread
Signal Analyzer Example: Building with Modern Web Tech
WebGPU Optimization: Compute Shader Performance

Internal Documentation

Performance Benchmarks: Detailed measurements and optimization results
WASM Runtime Flags: WebAssembly configuration
FFT Implementation: FFT algorithm details
WebGL Visualization: GPU rendering techniques
