Fix: Robust SEC Filing Retrieval via Local Caching & Deep Scanning #147
LutherAbel wants to merge 3 commits into virattt:main
This PR overhauls the SEC scraping logic to eliminate 402/403 errors and handle non-standard filing formats.

Key Changes:

1. Offline Ticker Cache: Added `company_tickers.json` as a static asset. The scraper now prioritizes reading CIKs from disk to prevent API rate-limiting and IP bans.
2. Deep Scanning Logic: Increased filing scan depth from 5 to 80 in `sec_scraper.ts`. This ensures filings for active companies (e.g., SOFI, ENTG) aren't pushed out of view by high volumes of Form 4s.
3. Hard Fallback Interceptor: Updated `financial-search.ts` to intercept "402 Payment Required" errors. Failed API calls now automatically force-route to the direct SEC scraper.
4. Brute Force Payload Discovery: Improved logic to find exhibit links (e.g., 99.1 Press Releases) even in non-standard HTML structures.
5. System Prompt Restoration: Restored the full financial routing prompt while maintaining the new SEC intercept rules.
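For reviewers who want the gist of changes 1 and 2 without opening the diff, here is a minimal sketch. It is not the code in `sec_scraper.ts`: the names `TickerCache`, `FilingRef`, `lookupCik`, and `findLatestFiling` are hypothetical, and the cache object is assumed to be already parsed from `company_tickers.json` (entries with `cik_str`, `ticker`, and `title`, as in the file the SEC publishes).

```typescript
// Illustrative sketch only; every name here is hypothetical, not taken from the PR.

// One entry of company_tickers.json, assumed to follow the SEC's published layout.
interface TickerEntry {
  cik_str: number;
  ticker: string;
  title: string;
}

// The file is an object keyed by index strings ("0", "1", ...).
type TickerCache = Record<string, TickerEntry>;

// Resolve a ticker to a 10-digit, zero-padded CIK using only the local cache,
// so no network request is needed for the lookup.
function lookupCik(cache: TickerCache, ticker: string): string | undefined {
  const wanted = ticker.trim().toUpperCase();
  const entry = Object.values(cache).find(
    (e) => e.ticker.toUpperCase() === wanted,
  );
  return entry === undefined
    ? undefined
    : String(entry.cik_str).padStart(10, "0");
}

// Minimal view of one row in a company's recent-filings list (newest first).
interface FilingRef {
  form: string;
  accessionNumber: string;
}

// Scan the newest `scanDepth` filings for the first one of the wanted form type.
// With a shallow depth, an 8-K sitting behind many Form 4s is never reached.
function findLatestFiling(
  filings: FilingRef[],
  wantedForm: string,
  scanDepth: number = 80,
): FilingRef | undefined {
  const wanted = wantedForm.toUpperCase();
  return filings
    .slice(0, scanDepth)
    .find((f) => f.form.toUpperCase() === wanted);
}
```

With ten Form 4s ahead of an 8-K, `findLatestFiling(filings, "8-K", 5)` returns `undefined`, while the default depth of 80 finds it. That is the failure mode the deeper scan is meant to avoid.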
Note on CI Failure: The CI build failed with a frozen-lockfile error. I modified `package.json` to fix the execution script (switching to `tsx`), but I am working in a Node.js/npm environment and cannot regenerate the `bun.lockb` binary lockfile. Please run `bun install` to update the lockfile on your end. The logic changes themselves are verified and working.
Hey @virattt, I saw your excellent deep-dive on X regarding Dexter's architecture. The Subagent design and the "raw data straight to LLM" philosophy are incredibly elegant.

I built this PR specifically to harden the financial-search subagent you mentioned. It adds an offline CIK cache and a "Hard Fallback" mechanism to ensure the agent doesn't crash on 402/403 errors, keeping the data retrieval resilient even when APIs fail. It fully aligns with your design of keeping the decision space small and letting the subagent handle the heavy lifting of parsing non-standard SEC exhibits (like NBIS or SOFI Form 4s).

I know you're busy building FD. Let me know if you'd like to integrate this resilience layer into the core, or if I should just close this PR to keep your roadmap clean. Cheers!
@virattt could you take a look pls
Description:
This PR addresses the "402 Payment Required" error that occurs when users do not have a paid subscription to the Financial Datasets API.
The Goal:
To allow the agent to seamlessly retrieve SEC filings (8-K, 6-K) directly from EDGAR when the paid API is unavailable, instead of failing or falling back to a generic web search.
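The fallback behaviour described above can be sketched roughly as follows. This is an illustration under assumptions, not the actual code in `financial-search.ts`: `HttpStatusError`, `shouldFallBack`, and `withSecFallback` are hypothetical names, and the paid API call and the EDGAR scraper are passed in as plain async functions so the routing logic stands on its own.

```typescript
// Illustrative sketch only; every name here is hypothetical, not taken from the PR.

// Error type carrying the HTTP status of a failed API call.
class HttpStatusError extends Error {
  readonly status: number;

  constructor(status: number, message: string) {
    super(message);
    this.name = "HttpStatusError";
    this.status = status;
  }
}

// Statuses that should trigger the direct-SEC route. Only 402 is listed here;
// whether other codes (such as 403) belong in this set is left as an assumption.
const FALLBACK_STATUSES: ReadonlySet<number> = new Set([402]);

// True when the error is an HTTP failure with one of the fallback statuses.
function shouldFallBack(err: unknown): boolean {
  return err instanceof HttpStatusError && FALLBACK_STATUSES.has(err.status);
}

// Try the paid API first; on "402 Payment Required", route to the scraper.
// Any other error is rethrown so real failures are not silently masked.
async function withSecFallback<T>(
  primary: () => Promise<T>,
  fallback: () => Promise<T>,
): Promise<T> {
  try {
    return await primary();
  } catch (err) {
    if (shouldFallBack(err)) {
      return fallback();
    }
    throw err;
  }
}
```

Keeping the status check in one small predicate means the set of intercepted codes can change without touching the routing itself.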
Key Changes:
Verification:
Verified that queries like "search NBIS 6-k" and "search SOFI 8-k" now return correct data even without a paid API key.