feat: add Facebook video extractor by eagle1maledetto · Pull Request #38 · govdbot/govd

eagle1maledetto · 2026-03-09T21:10:04Z

Summary

Adds a new extractor for downloading videos from Facebook, supporting all common URL formats:

Reels: facebook.com/reel/<id>
Videos: facebook.com/videos/<id>, facebook.com/<user>/videos/<id>
Watch: facebook.com/watch/?v=<id> (with v= in any query parameter position)
Posts: facebook.com/posts/<id>, facebook.com/<user>/posts/<id>
Share links: facebook.com/share/r|v|p/<id> (resolved via HTTP redirect)
Mobile: m.facebook.com and mbasic.facebook.com variants
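
As a rough illustration of the formats listed above, the following sketch matches them with a single regular expression and pulls out the ID. It is not the pattern used in the PR: `videoURLPattern` and `extractID` are hypothetical names, the accepted ID character sets are assumptions, and share links are left out because they are resolved separately.

```go
package main

import "regexp"

// videoURLPattern is a hypothetical pattern covering the URL formats listed
// in the summary. The ID character classes are assumptions.
var videoURLPattern = regexp.MustCompile(
	`^https?://(?:www\.|m\.|mbasic\.)?facebook\.com/` +
		`(?:reel/(\d+)` + // /reel/<id>
		`|(?:[^/?#]+/)?videos/(\d+)` + // /videos/<id>, /<user>/videos/<id>
		`|watch/?\?(?:[^#]*&)?v=(\d+)` + // /watch/?v=<id>, v= in any position
		`|(?:[^/?#]+/)?posts/([A-Za-z0-9]+))`) // /posts/<id>, /<user>/posts/<id>

// extractID returns the ID captured by whichever alternative matched.
func extractID(rawURL string) (string, bool) {
	m := videoURLPattern.FindStringSubmatch(rawURL)
	if m == nil {
		return "", false
	}
	for _, group := range m[1:] {
		if group != "" {
			return group, true
		}
	}
	return "", false
}
```

For example, `extractID("https://www.facebook.com/watch/?ref=saved&v=789")` yields `789`, while a URL on another host is rejected.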

How it works

The extractor fetches the Facebook page HTML using a Chrome-like User-Agent and auth cookies. The cookies are required and are loaded automatically from private/cookies/facebook.txt in Netscape format, the same mechanism the Twitter extractor already uses.

Video URLs are extracted from the embedded JSON data in the HTML by matching progressive_url entries with "quality": "HD" or "quality": "SD" metadata. Both HD and SD formats are provided when available, letting the downstream format selector choose the best quality.
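
A minimal sketch of this kind of extraction is shown below. The exact shape of the embedded JSON is an assumption (a `progressive_url` string followed, inside the same object, by a `quality` field), and `progressiveRe` and `extractProgressiveURLs` are hypothetical names. The sketch also uses `encoding/json` to unescape the URL for brevity, whereas the PR states that it relies only on `bytes`, `regexp`, and `strings`.

```go
package main

import (
	"encoding/json"
	"regexp"
)

// progressiveRe assumes each progressive_url is followed, within the same
// JSON object, by a "quality" field set to "HD" or "SD".
var progressiveRe = regexp.MustCompile(
	`"progressive_url"\s*:\s*"([^"]+)"[^}]*?"quality"\s*:\s*"(HD|SD)"`)

// extractProgressiveURLs returns the first URL found for each quality label.
func extractProgressiveURLs(body []byte) map[string]string {
	out := make(map[string]string)
	for _, m := range progressiveRe.FindAllSubmatch(body, -1) {
		quality := string(m[2])
		if _, seen := out[quality]; seen {
			continue // keep the first occurrence per quality
		}
		// Decode JSON escapes such as \/ and \u0026 by parsing the
		// captured text as a JSON string literal.
		var decoded string
		if err := json.Unmarshal([]byte(`"`+string(m[1])+`"`), &decoded); err != nil {
			continue
		}
		out[quality] = decoded
	}
	return out
}
```

The returned map holds an `HD` and/or `SD` entry, which mirrors the idea of handing both formats to a downstream selector.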

Video-ID-anchored extraction

Facebook pages embed data for multiple videos in a single HTML response (feed recommendations, related content, etc.). A naive global regex match on progressive_url would return the first occurrence, which often belongs to a different video than the one requested.

To solve this, the findVideoSection() function scopes the regex search to the correct video's data block:

1. Locates the dash_mpd_debug.mpd?v=<videoID> anchor, which is unique per video within the videoDeliveryResponseResult JSON structure
2. Bounds the search to the section ending at "id":"<videoID>"
3. Applies HD/SD extraction regex only within this scoped section
4. Falls back to full-body search for single-video pages (e.g., direct reel links)
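
The scoping steps can be sketched roughly as follows. This is a reconstruction from the description, not the code in `util.go`: the signature of `findVideoSection` is assumed, `indexAnchor` is a hypothetical helper, and the check that the anchor is not merely a prefix of a longer ID is an addition of this sketch.

```go
package main

import "bytes"

// indexAnchor finds the first occurrence of anchor that is not immediately
// followed by another digit, so that v=123 does not match v=1234.
func indexAnchor(body, anchor []byte) int {
	offset := 0
	for {
		i := bytes.Index(body[offset:], anchor)
		if i < 0 {
			return -1
		}
		end := offset + i + len(anchor)
		if end == len(body) || body[end] < '0' || body[end] > '9' {
			return offset + i
		}
		offset = end
	}
}

// findVideoSection returns the slice of body that starts at the
// dash_mpd_debug.mpd?v=<videoID> anchor and ends at "id":"<videoID>".
// If the anchor is missing, the whole body is returned as a fallback.
func findVideoSection(body []byte, videoID string) []byte {
	start := indexAnchor(body, []byte("dash_mpd_debug.mpd?v="+videoID))
	if start < 0 {
		return body // single-video page: search everything
	}
	endMarker := []byte(`"id":"` + videoID + `"`)
	rel := bytes.Index(body[start:], endMarker)
	if rel < 0 {
		return body[start:] // no closing marker: search to the end
	}
	return body[start : start+rel+len(endMarker)]
}
```

The HD/SD regex would then run on the returned slice instead of the full response, so entries belonging to recommended or related videos are never seen.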

Watch URL normalization

/watch/?v=XXX pages return HTML with inconsistent or misleading video data when scraped. These URLs are normalized to /reel/XXX before fetching, which yields reliable single-video pages. The rewrite is transparent to the user: they submit a watch URL and get the correct video.
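
One way such a normalization could look is sketched below. `normalizeWatchURL` is a hypothetical name, and rewriting every host to `www.facebook.com` is an assumption of this sketch rather than something the PR states.

```go
package main

import (
	"net/url"
	"strings"
)

// normalizeWatchURL rewrites /watch/?v=<id> URLs to /reel/<id> and returns
// any other URL unchanged.
func normalizeWatchURL(raw string) string {
	u, err := url.Parse(raw)
	if err != nil {
		return raw
	}
	if strings.Trim(u.Path, "/") != "watch" {
		return raw
	}
	id := u.Query().Get("v") // works regardless of parameter order
	if id == "" {
		return raw
	}
	// Assumption: the canonical host is used for the rewritten URL.
	return "https://www.facebook.com/reel/" + id
}
```

Parsing the query with `net/url` rather than matching the raw string is what makes the position of `v=` irrelevant.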

Share link handling

ShareExtractor is registered before the main Extractor to match /share/r|v|p/<id> URLs first. It follows the HTTP redirect to the canonical Facebook URL, then returns it for re-matching against the main extractor pattern.
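
To show why registration order matters, here is a small first-match-wins sketch. The `extractor` type, the `extractors` slice, the names, and both patterns are hypothetical and much simpler than the project's real registry. The second pattern is deliberately broad so that it would also match share links if it were tried first.

```go
package main

import "regexp"

// extractor is a hypothetical, stripped-down stand-in for a registered
// extractor: a name plus the URL pattern it claims.
type extractor struct {
	name    string
	pattern *regexp.Regexp
}

// extractors is scanned in order, so the share pattern must come first.
var extractors = []extractor{
	{"facebook_share", regexp.MustCompile(`^https?://(?:www\.|m\.)?facebook\.com/share/[rvp]/[A-Za-z0-9]+`)},
	{"facebook", regexp.MustCompile(`^https?://(?:www\.|m\.|mbasic\.)?facebook\.com/.+`)},
}

// matchExtractor returns the name of the first extractor whose pattern
// matches, or an empty string if none does.
func matchExtractor(rawURL string) string {
	for _, e := range extractors {
		if e.pattern.MatchString(rawURL) {
			return e.name
		}
	}
	return ""
}
```

With the two entries swapped, a `/share/r/<id>` URL would be claimed by the general extractor and never reach the redirect step.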

Files

| File | Description |
| --- | --- |
| `internal/extractors/facebook/main.go` | `ShareExtractor` + `Extractor` definitions, `GetMedia()`, `buildMedia()` |
| `internal/extractors/facebook/models.go` | `VideoData` struct (HD/SD URLs, title, dimensions) |
| `internal/extractors/facebook/util.go` | `GetVideoData()`, `parseVideoFromBody()`, `findVideoSection()`, URL/Unicode unescape helpers |
| `internal/extractors/main.go` | Registration of both extractors in the global `Extractors` slice |

Compliance with project standards

File structure: follows the standard 3-file extractor pattern (main.go, models.go, util.go) used by all existing extractors
Cookie handling: uses the existing private/cookies/<id>.txt Netscape format mechanism (same as Twitter)
Extractor registration: added to internal/extractors/main.go with ShareExtractor before Extractor (order matters for URL matching)
Commit style: Conventional Commits (feat: ...)
No new dependencies: uses only bytes, regexp, strings, and existing project packages
No tests: consistent with the rest of the codebase (no *_test.go files exist in the project)

Configuration

Requires a private/cookies/facebook.txt file with valid Facebook session cookies in Netscape format. The following cookies are needed: datr, sb, c_user, fr, xs.
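
As a convenience, a cookie file can be checked for the listed names before it is deployed. The sketch below is not part of the PR and `missingCookies` is a hypothetical helper. It assumes the usual Netscape layout of seven tab-separated fields per line (domain, subdomain flag, path, secure flag, expiry, name, value), with `#` comment lines and an optional `#HttpOnly_` prefix.

```go
package main

import "strings"

// requiredCookies lists the cookie names named in the configuration notes.
var requiredCookies = []string{"datr", "sb", "c_user", "fr", "xs"}

// missingCookies reports which required cookies are absent from the given
// Netscape-format cookie file content.
func missingCookies(content string) []string {
	seen := make(map[string]bool)
	for _, line := range strings.Split(content, "\n") {
		line = strings.TrimRight(line, "\r")
		// Some exporters mark HttpOnly cookies with this prefix.
		line = strings.TrimPrefix(line, "#HttpOnly_")
		if line == "" || strings.HasPrefix(line, "#") {
			continue // blank line or comment
		}
		fields := strings.Split(line, "\t")
		if len(fields) != 7 {
			continue // not a cookie line
		}
		seen[fields[5]] = true // field 6 is the cookie name
	}
	var missing []string
	for _, name := range requiredCookies {
		if !seen[name] {
			missing = append(missing, name)
		}
	}
	return missing
}
```

An empty result means all five names are present. It says nothing about whether the session is still valid.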

Additionally, it is recommended to enable TLS fingerprint impersonation (impersonate: true) in the extractor YAML config to avoid detection by Facebook's bot protection.

Add a new extractor for Facebook video URLs, supporting:

- /reel/<id> (Reels)
- /videos/<id> and /video/<id>
- /watch/?v=<id> (Watch, with any query parameter order)
- /posts/<id> and /<user>/videos/<id> variants
- /share/r|v|p/<id> (Share short links, resolved via redirect)
- mobile (m.facebook.com) and mbasic variants

## Architecture

Follows the standard extractor structure (`main.go`, `models.go`, `util.go`) consistent with existing extractors like TikTok, Twitter, and Instagram.

### Files

- `main.go`: defines two extractors:
  - `ShareExtractor`: handles /share/ URLs by following the redirect to the canonical Facebook URL, then delegating to the main extractor.
  - `Extractor`: handles all other video URL patterns. Requires auth cookies (loaded automatically from `private/cookies/facebook.txt` in Netscape format, same mechanism used by Twitter).
- `models.go`: minimal `VideoData` struct holding HD/SD URLs, title, and dimensions.
- `util.go`: core scraping logic. Fetches the Facebook page HTML and extracts video URLs from the embedded JSON data using regex patterns matching `progressive_url` entries with HD/SD quality metadata.

### Key design decisions

**Video-ID-anchored extraction**: Facebook pages embed multiple videos (feed recommendations, related content) in a single HTML response. A naive global regex match on `progressive_url` returns the FIRST occurrence, which often belongs to a different video. The `findVideoSection()` function solves this by:

1. Locating the `dash_mpd_debug.mpd?v=<videoID>` anchor specific to the requested video's delivery data block.
2. Bounding the search to the section ending at `"id":"<videoID>"`.
3. Applying HD/SD regex only within this scoped section.
4. Falling back to full-body search for pages with a single video.

**Watch URL normalization**: `/watch/?v=XXX` pages return inconsistent video data when scraped. These URLs are converted to `/reel/XXX` which yields reliable, single-video pages.

## Compliance

- Standard 3-file extractor structure (main.go, models.go, util.go)
- Cookie loading via existing `private/cookies/<id>.txt` mechanism
- Registered in `internal/extractors/main.go` (ShareExtractor first to match /share/ URLs before the general pattern)
- Conventional Commits style
- No new dependencies

stefanodvx merged commit c49232d into govdbot:main Mar 13, 2026
3 checks passed

eagle1maledetto deleted the feat/facebook-extractor branch March 13, 2026 12:18

stefanodvx merged 1 commit into govdbot:main from eagle1maledetto:feat/facebook-extractor
