Skip to content
Draft
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
185 changes: 185 additions & 0 deletions investigation.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,185 @@
# Investigation: Duplicate Source File Uploads in sentry-cli

## 🔍 Summary

**YES, duplicate uploads can occur**, but only in specific scenarios. The colleague's concern is valid for certain build configurations.

## 📋 Scenarios Analysis

### ✅ Scenario 1: Typical Modern Build (NO DUPLICATION)

**Setup:**
```
dist/
├── app.min.js (minified bundle)
└── app.min.js.map (sourcemap with sourcesContent)
```

**What happens:**
1. User runs: `sentry-cli sourcemaps upload dist/`
2. Files collected: `app.min.js` and `app.min.js.map` (both match default extensions)
3. Files uploaded: 2 files total
4. Original sources are embedded in `sourcesContent` field of the sourcemap
5. **Result: NO DUPLICATION** ✅

This is the most common scenario. Build tools (webpack, rollup, esbuild, etc.) typically output:
- Minified/bundled files in the output directory
- Sourcemaps with `sourcesContent` already populated
- Original source files remain in `src/` or similar directories (not uploaded)

### ⚠️ Scenario 2: Mixed Output Directory (DUPLICATION OCCURS)

**Setup:**
```
dist/
├── app.js (original unbundled source)
├── app.min.js (minified bundle)
└── app.min.js.map (sourcemap referencing app.js, with sourcesContent)
```

**What happens:**
1. User runs: `sentry-cli sourcemaps upload dist/`
2. Files collected: ALL `.js` and `.map` files (per `DEFAULT_EXTENSIONS`)
- `app.js` ✓
- `app.min.js` ✓
- `app.min.js.map` ✓
3. Files uploaded: 3 files total
4. **Result: DUPLICATION** ⚠️
- `app.js` is uploaded as a standalone file
- `app.js` is ALSO embedded in `app.min.js.map` via `sourcesContent`

### 🔍 Scenario 3: TypeScript/JSX Sources (NO DUPLICATION)

**Setup:**
```
dist/
├── app.tsx (original TypeScript source)
├── app.min.js (transpiled & minified)
└── app.min.js.map (sourcemap with sourcesContent)
```

**What happens:**
1. User runs: `sentry-cli sourcemaps upload dist/`
2. Files collected: Only `app.min.js` and `app.min.js.map`
3. **Result: NO DUPLICATION** ✅
- `app.tsx` is NOT uploaded (doesn't match `.js` or `.map` extensions)
- Original TypeScript source is only in `sourcesContent`

## 🛠️ Technical Details

### Default File Extensions

From `src/commands/sourcemaps/upload.rs:22`:
```rust
const DEFAULT_EXTENSIONS: &[&str] = &["js", "cjs", "mjs", "map", "jsbundle", "bundle"];
```

Any file matching these extensions in the upload path will be collected and uploaded.

### The Rewrite Process

When `--no-rewrite` is NOT set (default behavior), `processor.rewrite()` is called with:

```rust
let options = sourcemap::RewriteOptions {
load_local_source_contents: true,
strip_prefixes: prefixes,
..Default::default()
};
```

The `load_local_source_contents: true` option tells the sourcemap library to:
1. Read the original source files from disk (if they exist)
2. Embed them into the sourcemap's `sourcesContent` field
3. This happens even if `sourcesContent` already exists

However, this does NOT prevent the original files from being uploaded if they were already collected.

### Upload Behavior

From the code flow in `src/utils/sourcemaps.rs` and `src/utils/file_upload.rs`:

1. **Collection phase**: All files matching extensions are added to `processor.sources`
2. **Rewrite phase**: Sourcemaps get their `sourcesContent` populated/updated
3. **Upload phase**: ALL files in `processor.sources` are uploaded

The validation logic in `validate_regular()` (line 1236) shows this clearly:
```rust
if sm.get_source_contents(idx).is_some() || source_urls.contains(source_url) {
info!("validator found source ({source_url})");
}
```

A source is considered valid if it's EITHER in `sourcesContent` OR in the uploaded files. Both conditions can be true simultaneously.

## 📊 Evidence from Tests

From `tests/integration/_cases/sourcemaps/sourcemaps-upload-some-debugids.trycmd`:
- Input: `tests/integration/_fixtures/upload_some_debugids` (20 files)
- Output: "Bundled 20 files for upload"
- Breakdown:
- 13 Scripts (.js files)
- 7 Source Maps (.map files)

The sourcemap `server/chunks/1.js.map` contains:
- 163 sources in the `sources` field
- 163 entries in `sourcesContent` (all populated, not null)
- But the original source files (e.g., `node_modules/@sentry/core/build/cjs/api.js`) do NOT exist in the fixture directory

This demonstrates typical behavior where sourcemaps reference files via `webpack://` URLs that don't exist locally, so no duplication occurs.

## 🎯 When Does Duplication Occur?

Duplication happens when **all** of these conditions are met:

1. ✅ Original source files have `.js`, `.cjs`, or `.mjs` extensions
2. ✅ Original source files are in the same directory tree as the upload path
3. ✅ Sourcemaps already contain `sourcesContent` for these files
4. ✅ User uploads the entire directory without filtering

## 💡 Likelihood Assessment

**Low to Medium** likelihood in practice because:

- Modern build tools (webpack, vite, rollup, esbuild) typically output minified files separate from source files
- Most projects have `dist/` or `build/` folders containing only build outputs
- TypeScript/JSX sources don't match the default extensions
- Sourcemaps often reference sources via `webpack://` or similar URLs that don't exist as local files

However, it **can occur** if:
- Build configuration outputs both bundled and unbundled JS files to the same directory
- Developer explicitly copies source files to the dist folder
- Using a build setup that preserves original `.js` files alongside minified versions

## 🔧 Potential Solutions

If this is deemed a problem worth addressing:

### Option 1: Filter out files already in sourcesContent
Before uploading, check if a file is already embedded in any sourcemap's `sourcesContent` and skip uploading it as a standalone file.

**Pros:** Reduces upload size and duplication
**Cons:** Complex logic; might break edge cases where both are intentionally needed

### Option 2: Document expected usage patterns
Clarify in documentation that users should:
- Only upload their `dist/` or `build/` directories
- Not mix original sources with build outputs in the upload path
- Use `--ext` flag to be more selective if needed

**Pros:** Simple, no code changes needed
**Cons:** Relies on user behavior

### Option 3: Add a flag like `--skip-embedded-sources`
Allow users to opt into skipping files that are already embedded in sourcemaps.

**Pros:** Gives users control; backward compatible
**Cons:** Adds complexity; another flag to understand

## 📝 Conclusion

The colleague's observation is **correct**: sentry-cli can upload sources twice when both the original source files and sourcemaps (with sourcesContent) are in the upload path. However, this is **not the typical use case** due to how modern build tools organize their output.

The current behavior is technically correct (uploading what was requested) but could be optimized for the edge case where duplication occurs.

**Recommendation:** Document the expected usage pattern and potentially add a warning when duplication is detected, rather than changing the default behavior which might break existing workflows.
Loading