Real-time signal processing in browsers presents unique challenges. This guide covers optimization strategies for maintaining high performance.
- FFT computation: <16ms per frame (60 FPS)
- Audio processing: <2ms latency for real-time
- Waterfall update: 30-60 FPS smooth scrolling
- UI responsiveness: <100ms for user interactions
- Maintain 60 FPS spectrum/waterfall updates
- Process 2+ MS/s sample rate in real-time
- Keep CPU usage <50% on target hardware
- Minimize memory allocations/garbage collection
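To make these goals concrete, the sketch below works out how many samples arrive during one display frame and how much time that frame allows. The helper name `frameBudget` is an assumption made for illustration; it is not part of any library.

```javascript
// Hypothetical helper: per-frame budget for a given sample rate and frame rate.
function frameBudget(sampleRate, fps) {
  return {
    // Time available to process and draw one frame, in milliseconds
    frameTimeMs: 1000 / fps,
    // Samples that arrive while one frame is on screen
    samplesPerFrame: Math.ceil(sampleRate / fps),
  };
}

const budget = frameBudget(2_000_000, 60);
console.log(budget.frameTimeMs.toFixed(2)); // 16.67
console.log(budget.samplesPerFrame); // 33334
```

At 2 MS/s and 60 FPS, every frame has to absorb roughly 33,000 new samples in under 17 ms, which is why the rest of this guide moves DSP work off the main thread.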
Problem: JavaScript is single-threaded. Heavy DSP blocks UI.
Solution: Offload processing to background threads.
// main.js - UI thread
const dspWorker = new Worker("dsp-worker.js");
// Send samples to worker
dspWorker.postMessage(
{
type: "process",
samples: audioBuffer,
sampleRate: 48000,
},
[audioBuffer.buffer],
); // Transfer ownership
// Receive results
dspWorker.onmessage = (e) => {
const { spectrum, audio } = e.data;
updateDisplay(spectrum);
playAudio(audio);
};
// dsp-worker.js - Worker thread
self.onmessage = (e) => {
const { type, samples, sampleRate } = e.data;
if (type === "process") {
const spectrum = computeFFT(samples);
const audio = demodulate(samples);
self.postMessage(
{
spectrum,
audio,
},
[spectrum.buffer, audio.buffer],
); // Transfer back
}
};
Strategy: Parallel processing pipeline
const fftWorker = new Worker('fft-worker.js');
const demodWorker = new Worker('demod-worker.js');
const audioWorker = new Worker('audio-worker.js');
// Pipeline:
// rawSamples → fftWorker → spectrum display
//            ↘ demodWorker → audioWorker → audio output
Critical: Use transferable objects to avoid copying.
// ❌ SLOW - Copies data
worker.postMessage({ buffer: myFloat32Array });
// ✅ FAST - Transfers ownership
worker.postMessage({ buffer: myFloat32Array }, [myFloat32Array.buffer]);
Note: After transfer, the original buffer is detached on the sending side (its length reads as 0) and can no longer be used there.
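The detachment described in the note can be observed without starting a worker. The sketch below uses `structuredClone()`, which accepts the same kind of transfer list as `postMessage()`; the variable names are illustrative only.

```javascript
// Transfer a typed array's buffer and inspect the sender's view afterwards.
const samples = new Float32Array([0.1, 0.2, 0.3, 0.4]);
const received = structuredClone(samples, { transfer: [samples.buffer] });

console.log(received.length); // 4
console.log(samples.length); // 0 (the sender's view is detached)
console.log(samples.buffer.byteLength); // 0
```

Any code that still needs the samples after sending them must either keep its own copy or wait for the worker to transfer the buffer back.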
Advanced technique for maximum performance (requires HTTPS + security headers):
SharedArrayBuffer allows multiple workers and main thread to access the same memory without copying, eliminating transfer overhead entirely.
Browser Requirements:
- HTTPS deployment
- Cross-Origin-Opener-Policy: same-origin header
- Cross-Origin-Embedder-Policy: require-corp header
- Chrome 92+, Firefox 79+, Safari 15.2+
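Whether these requirements are met can be checked at runtime before choosing a transport. The sketch below relies on the standard `crossOriginIsolated` global; the function name `canUseSharedMemory` and the `env` parameter are assumptions added to keep the check testable.

```javascript
// Returns true when shared memory can actually be used in this context.
// `env` defaults to the global object; passing a stand-in makes testing easy.
function canUseSharedMemory(env = globalThis) {
  return (
    typeof env.SharedArrayBuffer === "function" &&
    env.crossOriginIsolated === true
  );
}

// Example: choose a transport once at startup.
const transport = canUseSharedMemory() ? "shared" : "transferable";
console.log(`Using ${transport} buffers`);
```

Falling back to transferable buffers keeps the application working on pages that are served without the isolation headers.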
Zero-Copy Pattern:
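// Create shared buffers: one for samples, one Int32 flag for signaling
// (Atomics.wait/notify only accept Int32Array or BigInt64Array views)
const sharedBuffer = new SharedArrayBuffer(1024 * 4); // 1024 floats
const sharedArray = new Float32Array(sharedBuffer);
const flagBuffer = new SharedArrayBuffer(4);
const flag = new Int32Array(flagBuffer); // 0 = pending, 1 = done
// Pass to worker (no copy!)
worker.postMessage({ sharedBuffer, flagBuffer });
// Worker receives same memory
// worker.js
self.onmessage = (e) => {
const samples = new Float32Array(e.data.sharedBuffer);
const done = new Int32Array(e.data.flagBuffer);
// Process directly - no copy needed
for (let i = 0; i < samples.length; i++) {
samples[i] = processFFT(samples[i]);
}
// Signal completion
Atomics.store(done, 0, 1);
Atomics.notify(done, 0, 1);
};
// Main thread: Atomics.wait() throws on the main thread, so wait asynchronously.
// Atomics.waitAsync is not available in every browser; a postMessage from the
// worker is a portable alternative.
const pending = Atomics.waitAsync(flag, 0, 0);
Promise.resolve(pending.value).then(() => {
const processedData = sharedArray; // Already updated!
});
Synchronization with Atomics: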
// Ring buffer for continuous streaming
class SharedRingBuffer {
constructor(size) {
this.buffer = new SharedArrayBuffer((size + 2) * 4);
this.data = new Float32Array(this.buffer, 8); // Skip header
this.header = new Int32Array(this.buffer, 0, 2);
this.size = size;
}
write(samples) {
const writePos = Atomics.load(this.header, 0);
for (let i = 0; i < samples.length; i++) {
this.data[(writePos + i) % this.size] = samples[i];
}
Atomics.store(this.header, 0, (writePos + samples.length) % this.size);
Atomics.notify(this.header, 0, 1);
}
read(count) {
const readPos = Atomics.load(this.header, 1);
const samples = new Float32Array(count);
for (let i = 0; i < count; i++) {
samples[i] = this.data[(readPos + i) % this.size];
}
Atomics.store(this.header, 1, (readPos + count) % this.size);
return samples;
}
}
Performance Benefits:
| Transfer Method | Latency | Throughput (1MB) |
|---|---|---|
| postMessage (copy) | 1-5ms | ~200 MB/s |
| Transferable | 0.1-0.5ms | ~2 GB/s |
| SharedArrayBuffer | <0.01ms | ~10+ GB/s |
Use Cases:
- Real-time SDR sample streaming (20+ MS/s)
- Multi-worker parallel processing
- Low-latency audio pipelines
- High-frequency data visualization
Security Considerations:
- Requires cross-origin isolation (Spectre mitigation)
- May not work in embedded contexts (iframes)
- Server configuration needed
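The `SharedRingBuffer` shown earlier does not track how much unread data it holds, so a fast producer can silently overwrite samples. The sketch below adds a fill-level check under two assumptions: there is a single producer and a single consumer, and one slot is always left empty so that a full buffer can be told apart from an empty one. The class name `BoundedSharedRingBuffer` is hypothetical.

```javascript
// Sketch of a bounded single-producer/single-consumer ring buffer.
// Layout: two Int32 header slots (write position, read position)
// followed by `size` Float32 samples.
class BoundedSharedRingBuffer {
  constructor(size) {
    this.size = size;
    this.buffer = new SharedArrayBuffer(8 + size * 4);
    this.header = new Int32Array(this.buffer, 0, 2);
    this.data = new Float32Array(this.buffer, 8, size);
  }

  // Number of samples written but not yet read.
  available() {
    const writePos = Atomics.load(this.header, 0);
    const readPos = Atomics.load(this.header, 1);
    return (writePos - readPos + this.size) % this.size;
  }

  // Returns false (and writes nothing) if unread data would be overwritten.
  write(samples) {
    const free = this.size - 1 - this.available();
    if (samples.length > free) return false;
    const writePos = Atomics.load(this.header, 0);
    for (let i = 0; i < samples.length; i++) {
      this.data[(writePos + i) % this.size] = samples[i];
    }
    Atomics.store(this.header, 0, (writePos + samples.length) % this.size);
    return true;
  }

  // Reads up to `count` samples; the result may be shorter than requested.
  read(count) {
    const n = Math.min(count, this.available());
    const readPos = Atomics.load(this.header, 1);
    const out = new Float32Array(n);
    for (let i = 0; i < n; i++) {
      out[i] = this.data[(readPos + i) % this.size];
    }
    Atomics.store(this.header, 1, (readPos + n) % this.size);
    return out;
  }
}

const ring = new BoundedSharedRingBuffer(8);
ring.write(new Float32Array([1, 2, 3]));
console.log(ring.available()); // 3
console.log(Array.from(ring.read(2))); // [1, 2]
console.log(ring.write(new Float32Array(7))); // false: only 6 slots are free
```

Rejecting the write lets the producer decide whether to drop samples or wait, which is usually preferable to corrupting data the consumer has not seen yet.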
Always use typed arrays for numerical data:
// ✅ GOOD
const samples = new Float32Array(1024);
const spectrum = new Float32Array(512);
// ❌ BAD
const samples = [];
const spectrum = [];
Avoid allocations in hot paths:
// ✅ GOOD - Allocate once
class FFTProcessor {
constructor(size) {
this.buffer = new Float32Array(size);
this.window = this.createWindow(size);
this.fftReal = new Float32Array(size);
this.fftImag = new Float32Array(size);
}
process(input) {
// Reuse buffers
for (let i = 0; i < input.length; i++) {
this.buffer[i] = input[i] * this.window[i];
}
// ... process
}
}
// ❌ BAD - Allocates every call
function process(input) {
const buffer = new Float32Array(input.length);
const window = createWindow(input.length);
// ...
}
Reuse objects instead of creating new ones:
class BufferPool {
constructor(size, count) {
this.size = size;
this.available = [];
this.inUse = new Set();
for (let i = 0; i < count; i++) {
this.available.push(new Float32Array(size));
}
}
acquire() {
if (this.available.length === 0) {
console.warn("Pool exhausted");
return new Float32Array(this.size);
}
const buffer = this.available.pop();
this.inUse.add(buffer);
return buffer;
}
release(buffer) {
this.inUse.delete(buffer);
this.available.push(buffer);
}
}
// Usage
const pool = new BufferPool(4096, 10);
const buffer = pool.acquire();
// ... use buffer
pool.release(buffer);
Minimize GC pressure:
- Avoid allocations in loops
- Reuse buffers
- Use object pools
- Clear large arrays when done: array.fill(0)
Options:
- fft.js: Pure JS, fast, well-tested
- kiss-fft.js: Port of KISS FFT
- dsp.js: Full DSP suite
- Web Audio FFT: Built-in, but limited
Recommendation: fft.js for flexibility, Web Audio for simplicity
Trade-offs:
- Larger FFT = Better frequency resolution, more CPU
- Power of 2 sizes are much faster
- Common: 256, 512, 1024, 2048, 4096
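The resolution side of this trade-off follows from the bin width of an FFT, which is the sample rate divided by the FFT size. The short sketch below tabulates it for a 48 kHz stream; the function name `binWidthHz` is an assumption for illustration.

```javascript
// Frequency resolution (bin width) of an FFT: sampleRate / fftSize.
function binWidthHz(sampleRate, fftSize) {
  return sampleRate / fftSize;
}

for (const size of [256, 1024, 4096]) {
  console.log(`${size}: ${binWidthHz(48000, size)} Hz per bin`);
}
// 256: 187.5 Hz per bin
// 1024: 46.875 Hz per bin
// 4096: 11.71875 Hz per bin
```

Each doubling of the FFT size halves the bin width, at the cost of more computation per frame and a longer capture window.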
Adaptive sizing:
function selectFFTSize(sampleRate, desiredResolution) {
const minSize = Math.ceil(sampleRate / desiredResolution);
return Math.pow(2, Math.ceil(Math.log2(minSize)));
}
Technique: Process overlapping windows for smoother display.
class OverlapProcessor {
constructor(fftSize, overlap = 0.5) {
this.fftSize = fftSize;
this.hopSize = Math.floor(fftSize * (1 - overlap));
this.buffer = new Float32Array(fftSize);
this.position = 0;
}
process(newSamples) {
const results = [];
for (let i = 0; i < newSamples.length; i++) {
this.buffer[this.position++] = newSamples[i];
if (this.position >= this.fftSize) {
results.push(this.computeFFT(this.buffer));
// Shift buffer by hop size
this.buffer.copyWithin(0, this.hopSize);
this.position -= this.hopSize;
}
}
return results;
}
}
Pre-compute window once:
class WindowCache {
constructor() {
this.cache = new Map();
}
getWindow(type, size) {
const key = `${type}-${size}`;
if (!this.cache.has(key)) {
this.cache.set(key, this.computeWindow(type, size));
}
return this.cache.get(key);
}
computeWindow(type, size) {
const window = new Float32Array(size);
// ... compute window
return window;
}
}
Advanced optimization for modern browsers (2024+):
WebAssembly SIMD (Single Instruction, Multiple Data) provides 2-4x additional speedup on top of regular WASM by performing operations on multiple data points simultaneously.
Browser Support:
- Chrome 91+ ✅
- Firefox 89+ ✅
- Safari 16.4+ ✅
- Edge 91+ ✅
Feature Detection:
// Example function - actual implementation: isWasmSIMDSupported() in src/utils/dspWasm.ts
function detectWasmSIMD() {
try {
// Minimal WASM module: a function returning v128 that uses i8x16.splat and i8x16.popcnt
return WebAssembly.validate(
new Uint8Array([
0, 97, 115, 109, 1, 0, 0, 0, 1, 5, 1, 96, 0, 1, 123, 3, 2, 1, 0, 10, 10,
1, 8, 0, 65, 0, 253, 15, 253, 98, 11,
]),
);
} catch {
return false;
}
}
if (detectWasmSIMD()) {
// Load SIMD-optimized WASM module
import("./dsp-simd.wasm");
} else {
// Fallback to regular WASM
import("./dsp.wasm");
}
Implementation Strategies:
- Vector Operations: Process 4 float32 values per instruction
- Butterfly FFT: SIMD-optimized complex multiply-add
- Parallel Windowing: Apply window to 4+ samples simultaneously
- SIMD Intrinsics: Use v128 types in AssemblyScript/Rust
Example - SIMD FFT (AssemblyScript):
// Load 4 samples at once
let samples = v128.load(samplesPtr);
// Multiply by twiddle factors (4 parallel operations)
let real = f32x4.mul(samples, twiddleReal);
let imag = f32x4.mul(samples, twiddleImag);
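// Combine the partial products (simplified: a full complex rotation also
// needs the imaginary inputs and a subtraction for the real part)
let result = f32x4.add(real, imag);
// Store result
v128.store(outputPtr, result);
Performance Gains: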
| Operation | Regular WASM | WASM + SIMD | Total Speedup |
|---|---|---|---|
| FFT 2048 | 6-8ms | 2-3ms | 8-10x vs JS |
| FFT 4096 | 25-30ms | 8-12ms | 7-9x vs JS |
| Windowing | 0.1-0.15ms | 0.03-0.05ms | 4-7x vs JS |
Limitations:
- No runtime detection in WASM itself (must build separate modules)
- Alignment requirements (16-byte boundaries)
- Limited to 128-bit wide vectors
- Trap on unsupported platforms (careful fallback needed)
Use GPU for rendering large waterfall displays:
class WebGLWaterfall {
constructor(canvas, width, height) {
this.gl = canvas.getContext("webgl2");
this.width = width;
this.height = height;
this.texture = this.createTexture();
this.setupShaders();
}
updateLine(spectrumData) {
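// Scroll up by one row. copyTexSubImage2D copies from the currently bound
// read framebuffer, so the previous frame must still be in that framebuffer.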
this.gl.copyTexSubImage2D(
this.gl.TEXTURE_2D,
0,
0,
1,
0,
0,
this.width,
this.height - 1,
);
// Add new line at bottom
this.gl.texSubImage2D(
this.gl.TEXTURE_2D,
0,
0,
0,
this.width,
1,
this.gl.LUMINANCE,
this.gl.UNSIGNED_BYTE,
spectrumData,
);
this.render();
}
}
GPU-accelerated line rendering:
// Upload spectrum data as texture
const texture = gl.createTexture();
gl.bindTexture(gl.TEXTURE_2D, texture);
gl.texImage2D(
gl.TEXTURE_2D,
0,
gl.R32F,
spectrumLength,
1,
0,
gl.RED,
gl.FLOAT,
spectrumData,
);
// Vertex shader samples texture, renders line
Next-generation GPU computing (2024+ browsers):
WebGPU provides direct access to GPU compute capabilities, enabling 5-10x speedup for large FFTs and parallel DSP operations compared to WASM.
Browser Support:
- Chrome 113+ ✅
- Edge 113+ ✅
- Safari 18+ ✅
- Firefox: In development 🔄
Feature Detection:
async function initWebGPU() {
if (!navigator.gpu) {
console.warn("WebGPU not supported");
return null;
}
const adapter = await navigator.gpu.requestAdapter();
if (!adapter) {
console.warn("No GPU adapter found");
return null;
}
const device = await adapter.requestDevice();
return device;
}
Compute Shader FFT Example:
// WGSL compute shader for FFT butterfly operation
@group(0) @binding(0) var<storage, read_write> data: array<vec2<f32>>;
@group(0) @binding(1) var<storage, read> twiddles: array<vec2<f32>>;
@compute @workgroup_size(64)
fn fft_butterfly(@builtin(global_invocation_id) id: vec3<u32>) {
let idx = id.x;
let half_size = arrayLength(&data) / 2u;
if (idx >= half_size) {
return;
}
// Butterfly operation
let even = data[idx];
let odd = data[idx + half_size];
let twiddle = twiddles[idx];
// Complex multiplication: odd * twiddle
let odd_mul = vec2<f32>(
odd.x * twiddle.x - odd.y * twiddle.y,
odd.x * twiddle.y + odd.y * twiddle.x
);
// Combine
data[idx] = even + odd_mul;
data[idx + half_size] = even - odd_mul;
}
JavaScript Integration:
class WebGPUFFT {
async initialize(device, fftSize) {
this.device = device;
this.fftSize = fftSize;
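// Create buffers
this.dataBuffer = device.createBuffer({
size: fftSize * 8, // vec2<f32> = 8 bytes
usage:
GPUBufferUsage.STORAGE |
GPUBufferUsage.COPY_DST |
GPUBufferUsage.COPY_SRC,
});
// Twiddle factors: one vec2<f32> per butterfly (fftSize / 2 entries).
// Fill with device.queue.writeBuffer() before the first dispatch.
this.twiddleBuffer = device.createBuffer({
size: (fftSize / 2) * 8,
usage: GPUBufferUsage.STORAGE | GPUBufferUsage.COPY_DST,
});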
// Load compute shader
const module = device.createShaderModule({
code: fftShaderCode,
});
// Create compute pipeline
this.pipeline = device.createComputePipeline({
layout: "auto",
compute: {
module,
entryPoint: "fft_butterfly",
},
});
// Create bind group
this.bindGroup = device.createBindGroup({
layout: this.pipeline.getBindGroupLayout(0),
entries: [
{ binding: 0, resource: { buffer: this.dataBuffer } },
{ binding: 1, resource: { buffer: this.twiddleBuffer } },
],
});
}
async compute(samples) {
// Upload data
this.device.queue.writeBuffer(this.dataBuffer, 0, samples);
// Create command encoder
const encoder = this.device.createCommandEncoder();
const pass = encoder.beginComputePass();
// Dispatch compute shader
pass.setPipeline(this.pipeline);
pass.setBindGroup(0, this.bindGroup);
pass.dispatchWorkgroups(Math.ceil(this.fftSize / 64));
pass.end();
// Submit
this.device.queue.submit([encoder.finish()]);
// Read results
const resultBuffer = this.device.createBuffer({
size: this.dataBuffer.size,
usage: GPUBufferUsage.COPY_DST | GPUBufferUsage.MAP_READ,
});
const copyEncoder = this.device.createCommandEncoder();
copyEncoder.copyBufferToBuffer(
this.dataBuffer,
0,
resultBuffer,
0,
this.dataBuffer.size,
);
this.device.queue.submit([copyEncoder.finish()]);
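await resultBuffer.mapAsync(GPUMapMode.READ);
// Copy out before unmap(): the mapped range is detached once the buffer is unmapped
const result = new Float32Array(resultBuffer.getMappedRange()).slice();
resultBuffer.unmap();
resultBuffer.destroy();
return result;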
}
}
Performance Characteristics:
| Operation | WASM (ms) | WebGPU (ms) | Speedup |
|---|---|---|---|
| FFT 2048 | 6-8 | 0.5-1.0 | 8-12x |
| FFT 4096 | 25-30 | 2-4 | 8-12x |
| FFT 8192 | 100-120 | 8-12 | 10-12x |
| FFT 16384 | 400-500 | 25-35 | 12-15x |
Best Practices:
- Batch Operations: Process multiple FFTs in one shader invocation
- Workgroup Size: Tune for target GPU (typically 64-256)
- Memory Management: Reuse buffers, avoid allocations
- Async Patterns: Use timestamps for profiling
- Fallback Path: Always provide WASM/JS fallback
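The workgroup-size advice can be made concrete with a little arithmetic. A radix-2 FFT of size N has log2(N) butterfly stages with N/2 butterflies each, so the number of workgroups per stage depends on the chosen workgroup size. The sketch below computes both; the helper name `fftDispatchPlan` is an assumption, and the default of 64 matches the `@workgroup_size(64)` used in the shader above.

```javascript
// Hypothetical helper: dispatch sizes for a radix-2 FFT on the GPU.
function fftDispatchPlan(fftSize, workgroupSize = 64) {
  if (fftSize < 2 || (fftSize & (fftSize - 1)) !== 0) {
    throw new RangeError("fftSize must be a power of two");
  }
  const butterfliesPerStage = fftSize / 2;
  return {
    stages: Math.log2(fftSize),
    workgroupsPerStage: Math.ceil(butterfliesPerStage / workgroupSize),
  };
}

console.log(fftDispatchPlan(2048)); // { stages: 11, workgroupsPerStage: 16 }
console.log(fftDispatchPlan(16384, 256)); // { stages: 14, workgroupsPerStage: 32 }
```

Small FFTs produce only a handful of workgroups per stage, which is one reason batching several FFTs into a single dispatch tends to use the GPU better.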
Direct Rendering Pipeline:
// Compute FFT and render in single GPU pass
class WebGPUSpectrumAnalyzer {
async renderFrame(samples) {
const encoder = this.device.createCommandEncoder();
// 1. Compute pass: FFT
const computePass = encoder.beginComputePass();
computePass.setPipeline(this.fftPipeline);
computePass.setBindGroup(0, this.fftBindGroup);
computePass.dispatchWorkgroups(this.workgroupCount);
computePass.end();
// 2. Render pass: Visualize
const renderPass = encoder.beginRenderPass({
colorAttachments: [
{
view: this.context.getCurrentTexture().createView(), // this.context: canvas GPUCanvasContext, assumed set during setup
loadOp: "clear",
storeOp: "store",
},
],
});
renderPass.setPipeline(this.renderPipeline);
renderPass.setBindGroup(0, this.renderBindGroup); // Uses FFT output
renderPass.draw(this.vertexCount); // assumed set during setup
renderPass.end();
// Submit both passes together
this.device.queue.submit([encoder.finish()]);
}
}
Optimization Tips:
- Use timestamp queries for profiling GPU work
- Minimize CPU-GPU synchronization
- Leverage shared memory within workgroups
- Consider relaxed memory ordering for non-critical data
- Profile with browser GPU debuggers (Chrome DevTools, Safari Web Inspector)
Use native audio graph for efficiency:
const audioContext = new AudioContext();
// Create processing chain
const source = audioContext.createMediaStreamSource(stream);
const filter = audioContext.createBiquadFilter();
const analyser = audioContext.createAnalyser();
const gain = audioContext.createGain();
// Connect
source.connect(filter);
filter.connect(analyser);
analyser.connect(gain);
gain.connect(audioContext.destination);
// Configure
filter.type = "lowpass";
filter.frequency.value = 3000;
analyser.fftSize = 2048;
For custom DSP, use AudioWorklet (not ScriptProcessor):
// main.js
await audioContext.audioWorklet.addModule("demodulator.js");
const demodNode = new AudioWorkletNode(audioContext, "demodulator");
// demodulator.js
class DemodulatorProcessor extends AudioWorkletProcessor {
process(inputs, outputs, parameters) {
const input = inputs[0][0];
const output = outputs[0][0];
// inputs[0] is empty while nothing is connected; keep the node alive
if (!input) return true;
for (let i = 0; i < input.length; i++) {
output[i] = this.demodulate(input[i]);
}
return true;
}
}
registerProcessor("demodulator", DemodulatorProcessor);
Use OfflineAudioContext for resampling:
async function resample(buffer, targetRate) {
const ctx = new OfflineAudioContext(
buffer.numberOfChannels,
Math.ceil(buffer.duration * targetRate), // length must be a whole number of frames
targetRate,
);
const source = ctx.createBufferSource();
source.buffer = buffer;
source.connect(ctx.destination);
source.start();
return await ctx.startRendering();
}
Don't render every sample:
function decimateForDisplay(data, targetPoints) {
if (data.length <= targetPoints) return data;
const result = new Float32Array(targetPoints);
const step = data.length / targetPoints;
for (let i = 0; i < targetPoints; i++) {
const start = Math.floor(i * step);
const end = Math.floor((i + 1) * step);
// Min-max decimation preserves peaks
let min = Infinity,
max = -Infinity;
for (let j = start; j < end; j++) {
if (data[j] < min) min = data[j];
if (data[j] > max) max = data[j];
}
result[i] = i % 2 === 0 ? min : max;
}
return result;
}
Canvas 2D:
- ✅ Simple API
- ✅ Good for <1000 points
- ❌ Slow for complex scenes
WebGL:
- ✅ Very fast for large data
- ✅ GPU accelerated
- ❌ More complex
Use Canvas for UI, WebGL for spectrum/waterfall
Sync updates with display refresh:
class DisplayManager {
constructor() {
this.needsUpdate = false;
this.isAnimating = false;
}
requestUpdate() {
this.needsUpdate = true;
if (!this.isAnimating) {
this.isAnimating = true;
requestAnimationFrame(() => this.render());
}
}
render() {
if (this.needsUpdate) {
this.updateDisplay();
this.needsUpdate = false;
}
this.isAnimating = false;
}
}
Group DOM operations:
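// ❌ BAD - Multiple reflows: each layout read flushes the pending style write
for (let i = 0; i < elements.length; i++) {
elements[i].style.top = positions[i] + "px";
heights[i] = elements[i].offsetHeight;
}
// ✅ GOOD - Single update: batch all writes, then do all reads
for (let i = 0; i < elements.length; i++) {
elements[i].style.top = positions[i] + "px";
}
for (let i = 0; i < elements.length; i++) {
heights[i] = elements[i].offsetHeight;
}
// ✅ BETTER - Use transform (GPU accelerated)
element.style.transform = `translateY(${position}px)`;
Measure critical paths: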
performance.mark("fft-start");
const spectrum = computeFFT(samples);
performance.mark("fft-end");
// Use the returned entry: getEntriesByName("fft")[0] would always be the first measurement
const measure = performance.measure("fft", "fft-start", "fft-end");
console.log(`FFT took ${measure.duration.toFixed(2)}ms`);
Profile CPU usage:
- Open DevTools > Performance
- Record during operation
- Look for:
- Long tasks (>50ms)
- GC pauses
- Layout thrashing
Check for leaks:
- DevTools > Memory
- Take heap snapshot
- Compare before/after
- Look for detached objects
Considerations:
- Lower CPU/GPU power
- Battery constraints
- Smaller screens
Strategies:
- Reduce FFT size (512-1024)
- Lower update rate (30 FPS)
- Simplify visualizations
- Use more aggressive decimation
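Lowering the update rate can be done by skipping frames inside the `requestAnimationFrame` callback. The sketch below is one minimal way to do that, assuming the caller passes in the callback's timestamp in milliseconds; the class name `FrameThrottle` is hypothetical.

```javascript
// Limits rendering to a target frame rate, given requestAnimationFrame-style
// timestamps in milliseconds.
class FrameThrottle {
  constructor(targetFps) {
    this.minIntervalMs = 1000 / targetFps;
    this.lastRenderMs = -Infinity;
  }

  // Returns true if enough time has passed to draw another frame.
  shouldRender(nowMs) {
    if (nowMs - this.lastRenderMs < this.minIntervalMs) return false;
    this.lastRenderMs = nowMs;
    return true;
  }
}

// Feed one second of 1 ms ticks through a 30 FPS throttle.
const throttle = new FrameThrottle(30);
let rendered = 0;
for (let t = 0; t < 1000; t++) {
  if (throttle.shouldRender(t)) rendered++;
}
console.log(rendered); // 30
```

On a real display the timestamps jitter around the interval boundary, so a small tolerance on the comparison may be needed to avoid occasionally skipping an extra frame.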
Take advantage of:
- Multiple cores (more workers)
- Larger FFT sizes (4096-8192)
- Higher resolution displays
- WebGL 2.0 features
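Using more workers on multi-core machines needs an estimate of how many cores are available. The sketch below reads the standard `navigator.hardwareConcurrency` value where it exists and leaves one core for the UI thread; the function name `pickWorkerCount` and the cap of 4 are assumptions, not recommendations from measurements.

```javascript
// Picks a DSP worker count from the number of logical cores, leaving one
// core for the main (UI) thread. `maxWorkers` is an assumed cap.
function pickWorkerCount(logicalCores, maxWorkers = 4) {
  const cores =
    Number.isInteger(logicalCores) && logicalCores > 0 ? logicalCores : 1;
  return Math.max(1, Math.min(cores - 1, maxWorkers));
}

// navigator.hardwareConcurrency is available in browsers and in workers.
const detected = globalThis.navigator?.hardwareConcurrency;
console.log(pickWorkerCount(detected));
console.log(pickWorkerCount(8)); // 4
console.log(pickWorkerCount(2)); // 1
```

When the core count is unknown the function falls back to a single worker, which is also a reasonable setting for the mobile case described above.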
// ❌ BAD - Allocates on every iteration
for (let i = 0; i < 1000; i++) {
const temp = new Float32Array(1024);
process(temp);
}
// ✅ GOOD - Allocate once, reuse
const temp = new Float32Array(1024);
for (let i = 0; i < 1000; i++) {
process(temp);
}
// ❌ BAD - Heavy work on the main thread
const spectrum = heavyFFT(largeSamples); // Freezes UI
updateDisplay(spectrum);
// ✅ GOOD - Offload to a worker
worker.postMessage({ samples: largeSamples });
worker.onmessage = (e) => updateDisplay(e.data.spectrum);
// ❌ BAD - One display update per sample
samples.forEach((s) => updateDisplay(s));
// ✅ GOOD - Batch updates
let buffer = [];
samples.forEach((s) => {
buffer.push(s);
if (buffer.length >= BATCH_SIZE) {
updateDisplay(buffer);
buffer = [];
}
});
class PerformanceTest {
async runTests() {
const sizes = [256, 512, 1024, 2048, 4096];
for (const size of sizes) {
const samples = new Float32Array(size);
const iterations = 1000;
const start = performance.now();
for (let i = 0; i < iterations; i++) {
computeFFT(samples);
}
const end = performance.now();
const avg = (end - start) / iterations;
console.log(`FFT ${size}: ${avg.toFixed(2)}ms avg`);
}
}
}
- FFT 2048: <5ms
- FFT 4096: <10ms
- Spectrum render: <16ms (60 FPS)
- Waterfall update: <16ms
- Total processing: <50% CPU
- FFT 2048: <2ms
- FFT 4096: <5ms
- Spectrum render: <8ms (120 FPS capable)
- Waterfall update: <8ms
- Total processing: <30% CPU
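When checking measurements against targets like these, a mean over a timing loop can be skewed by JIT warm-up and occasional GC pauses. The sketch below is one hedged alternative that discards warm-up runs and reports the median; the function name `benchmark`, its options, and the stand-in workload are assumptions for illustration.

```javascript
// Runs `fn` repeatedly and returns the median duration in milliseconds.
function benchmark(fn, { iterations = 200, warmup = 20 } = {}) {
  for (let i = 0; i < warmup; i++) fn(); // give the JIT time to settle
  const times = new Float64Array(iterations);
  for (let i = 0; i < iterations; i++) {
    const start = performance.now();
    fn();
    times[i] = performance.now() - start;
  }
  times.sort(); // typed arrays sort numerically
  const mid = iterations >> 1;
  return iterations % 2 ? times[mid] : (times[mid - 1] + times[mid]) / 2;
}

// Stand-in workload; replace with the real FFT call.
const input = new Float32Array(2048);
const medianMs = benchmark(() => input.reduce((sum, x) => sum + x * x, 0));
console.log(`median: ${medianMs.toFixed(4)}ms`);
```

Browsers may coarsen `performance.now()` for security reasons, so very short operations are better timed in batches than one call at a time.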
- Web Workers: MDN Web Workers API
- SharedArrayBuffer: MDN SharedArrayBuffer
- WebGL: WebGL Fundamentals
- WebGPU: WebGPU Fundamentals
- Audio Worklet: MDN AudioWorklet
- Performance API: MDN Performance
- WebAssembly: Official Specification
- WASM SIMD: V8 SIMD Features
- RustFFT WASM: WASM SIMD Implementation
- pffft.wasm: Fast FFT Library
- WebFFT: Meta-Library for FFT
- AssemblyScript: TypeScript to WASM
- Web.dev Performance: web.dev/performance
- High-Performance JS: Off-Main-Thread
- Signal Analyzer Example: Building with Modern Web Tech
- WebGPU Optimization: Compute Shader Performance
- Performance Benchmarks: Detailed measurements and optimization results
- WASM Runtime Flags: WebAssembly configuration
- FFT Implementation: FFT algorithm details
- WebGL Visualization: GPU rendering techniques
- Performance Benchmarks Documentation - Comprehensive benchmark results and optimization roadmap
- Visualization Performance Guide - Detailed visualization optimization patterns