fix(auth): replace raw Cookie header with httpx cookie jar#287
fix(auth): replace raw Cookie header with httpx cookie jar#287MCR-GLOBAL wants to merge 2 commits intoteng-lin:mainfrom
Conversation
) httpx drops raw Cookie headers on cross-domain redirects by design. When Google's short-lived cookies expire (~10 min), Google redirects notebooklm.google.com -> accounts.google.com, and the raw Cookie header is lost, causing auth failure. Replace raw Cookie headers with httpx.Cookies cookie jar in both fetch_tokens() and ClientCore.open(). The cookie jar correctly sends domain-appropriate cookies on cross-domain redirects. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
|
No actionable comments were generated in the recent review. 🎉 ℹ️ Recent review info⚙️ Run configurationConfiguration used: defaults Review profile: CHILL Plan: Pro Run ID: 📒 Files selected for processing (2)
🚧 Files skipped from review as they are similar to previous changes (1)
📝 WalkthroughWalkthroughCookie handling switched from injecting a raw Changes
Sequence Diagram(s)(No sequence diagram generated.) Estimated code review effort🎯 3 (Moderate) | ⏱️ ~20 minutes Poem
🚥 Pre-merge checks | ✅ 5✅ Passed checks (5 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Code Review
This pull request replaces raw Cookie headers with httpx.Cookies jars in _core.py and auth.py to ensure authentication persists during cross-domain redirects to accounts.google.com. Review feedback suggests centralizing the cookie jar creation logic within the AuthTokens class to reduce code duplication and recommends updating the existing client cookie jar directly in update_auth_headers for better efficiency and idiomatic code usage.
| jar = httpx.Cookies() | ||
| for name, value in self.auth.cookies.items(): | ||
| jar.set(name, value, domain=".google.com") |
There was a problem hiding this comment.
The logic for creating an httpx.Cookies jar from a dictionary with a hardcoded .google.com domain is duplicated in multiple places (here, in update_auth_headers, and in fetch_tokens in auth.py). To improve maintainability and ensure consistency, consider centralizing this logic. Since AuthTokens is the primary container for these cookies, adding a method like to_cookie_jar() to that class would be a cleaner approach.
There was a problem hiding this comment.
Addressed in commit 057d063: centralized cookie jar creation into _build_cookie_jar() helper in auth.py, with AuthTokens.to_cookie_jar() method. Also update cookies in place in update_auth_headers().
| jar = httpx.Cookies() | ||
| for name, value in self.auth.cookies.items(): | ||
| jar.set(name, value, domain=".google.com") | ||
| self._http_client.cookies = jar |
There was a problem hiding this comment.
Instead of creating a new httpx.Cookies object and overwriting the client's cookie jar, you can update the existing jar directly. This avoids the overhead of creating a new object and is more idiomatic when updating an existing client's state.
| jar = httpx.Cookies() | |
| for name, value in self.auth.cookies.items(): | |
| jar.set(name, value, domain=".google.com") | |
| self._http_client.cookies = jar | |
| for name, value in self.auth.cookies.items(): | |
| self._http_client.cookies.set(name, value, domain=".google.com") |
There was a problem hiding this comment.
Actionable comments posted: 1
🧹 Nitpick comments (1)
src/notebooklm/_core.py (1)
141-147: Extract cookie-jar construction into one helper to prevent behavior drift.
open()andupdate_auth_headers()duplicate the same jar-building logic. Centralizing this will keep auth behavior consistent when cookie handling changes again.♻️ Suggested refactor
class ClientCore: + def _build_auth_cookie_jar(self) -> httpx.Cookies: + jar = httpx.Cookies() + for name, value in self.auth.cookies.items(): + jar.set(name, value, domain=".google.com") + return jar + async def open(self) -> None: ... - jar = httpx.Cookies() - for name, value in self.auth.cookies.items(): - jar.set(name, value, domain=".google.com") + jar = self._build_auth_cookie_jar() self._http_client = httpx.AsyncClient( headers={ "Content-Type": "application/x-www-form-urlencoded;charset=UTF-8", }, cookies=jar, timeout=timeout, ) def update_auth_headers(self) -> None: ... - jar = httpx.Cookies() - for name, value in self.auth.cookies.items(): - jar.set(name, value, domain=".google.com") - self._http_client.cookies = jar + self._http_client.cookies = self._build_auth_cookie_jar()Also applies to: 181-184
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/notebooklm/_core.py` around lines 141 - 147, Extract the duplicated cookie-jar construction into a single helper (e.g. _build_google_cookie_jar) that creates and returns an httpx.Cookies instance, sets each (name, value) from self.auth.cookies with domain=".google.com", and use that helper inside open() and update_auth_headers() instead of duplicating the loop; ensure the helper's return type is httpx.Cookies and that both callers replace their current jar construction with a call to this helper so behavior remains identical.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@src/notebooklm/auth.py`:
- Around line 667-670: The current loop that builds jar = httpx.Cookies() and
sets every cookie with domain=".google.com" loses original cookie scopes; update
fetch_tokens() to preserve domain metadata by either changing its input to
accept httpx.Cookies (so you can pass the result of load_httpx_cookies()) or
extend AuthTokens to carry per-cookie domain/path/host-only flags and
reconstruct the jar using that metadata (instead of unconditionally setting
domain=".google.com"); replace the hardcoded jar.set(name, value,
domain=".google.com") logic with a call to load_httpx_cookies() or equivalent
reconstruction that uses each cookie's original domain scope.
---
Nitpick comments:
In `@src/notebooklm/_core.py`:
- Around line 141-147: Extract the duplicated cookie-jar construction into a
single helper (e.g. _build_google_cookie_jar) that creates and returns an
httpx.Cookies instance, sets each (name, value) from self.auth.cookies with
domain=".google.com", and use that helper inside open() and
update_auth_headers() instead of duplicating the loop; ensure the helper's
return type is httpx.Cookies and that both callers replace their current jar
construction with a call to this helper so behavior remains identical.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: c461300a-0f90-4b85-811d-409269bfe205
📒 Files selected for processing (2)
src/notebooklm/_core.pysrc/notebooklm/auth.py
| jar = httpx.Cookies() | ||
| for name, value in cookies.items(): | ||
| jar.set(name, value, domain=".google.com") | ||
|
|
There was a problem hiding this comment.
🧩 Analysis chain
🌐 Web query:
In httpx, what is the exact cookie-matching behavior for cookies set with domain=".google.com" versus host-only cookies like accounts.google.com? Can host-only cookie scope be preserved if only name/value pairs are available?
💡 Result:
In httpx, cookie matching follows standard RFC 6265 semantics via Python's stdlib http.cookiejar.CookieJar, which httpx.Cookies wraps. For a cookie set with domain=".google.com": - domain_specified=True, domain_initial_dot=True (leading dot ignored for matching but flags it as domain cookie). - Matches any host in .google.com (e.g., accounts.google.com, www.google.com) per domain-matches algorithm: host ends with .google.com or is google.com. For a host-only cookie like for accounts.google.com (no Domain or Domain=accounts.google.com): - domain_specified=False or domain_initial_dot=False. - Matches only exactly accounts.google.com. When sending a request to a URL (e.g., https://accounts.google.com/...), CookieJar's add_cookie_header selects cookies where: - domain_return_ok(domain, request) and path_return_ok(path, request), then full return_ok(cookie, request). - Domain cookie (.google.com) matches accounts.google.com. - Host-only matches only if exact host match. Both can be sent together if they match; CookieJar handles multiple cookies per name by path/domain specificity (longer path first). If only name/value pairs available (e.g., dict{"name": "value"}), creating httpx.Cookies({"name": "value}): - Calls self.set(name, value), which uses jar.set_cookie(Cookie(... domain="", domain_specified=False, domain_initial_dot=False, ...)). - Becomes host-only for the request domain (inferred from request URL). - Original scope (domain/path) lost; cannot preserve ".google.com" domain or host-only distinction—defaults to unspecified domain (host-only for request host). To preserve scope with name/value only: impossible, as http.cookiejar requires full Cookie attrs (domain, path, etc.) for proper matching. Must reconstruct full Cookie(..., domain=".google.com", ...) or parse original Set-Cookie. Sources confirm httpx.Cookies uses CookieJar for std RFC behavior (13,19,27). Tests show empty domain=host-only (1). Docs note domain/path in set (19,22).
Citations:
- 1: https://github.com/encode/httpx/blob/master/tests/client/test_cookies.py
- 2: https://developer.mozilla.org/en-US/docs/Web/HTTP/Guides/Cookies
- 3: https://stackoverflow.com/questions/16301690/should-a-cookie-with-the-host-only-flag-replace-a-cookie-without-it
- 4: https://github.com/encode/httpx/blob/master/httpx/_models.py
- 5: https://www.python-httpx.org/api/
- 6: https://www.python-httpx.org/quickstart/
- 7: https://docs.python.org/3.15/library/http.cookiejar.html
🏁 Script executed:
# Check the file and lines 667-670 with surrounding context
cat -n src/notebooklm/auth.py | sed -n '650,685p'Repository: teng-lin/notebooklm-py
Length of output: 1544
🏁 Script executed:
# Get the full function signature
cat -n src/notebooklm/auth.py | sed -n '630,670p'Repository: teng-lin/notebooklm-py
Length of output: 1757
🏁 Script executed:
# Find where fetch_tokens is called
rg "fetch_tokens" src/notebooklm/ -B 2 -A 2Repository: teng-lin/notebooklm-py
Length of output: 3677
🏁 Script executed:
# Find load_auth_from_storage definition and return type
rg "def load_auth_from_storage" src/notebooklm/ -A 30Repository: teng-lin/notebooklm-py
Length of output: 2010
🏁 Script executed:
# Find extract_cookies_from_storage definition
rg "def extract_cookies_from_storage" src/notebooklm/ -A 50Repository: teng-lin/notebooklm-py
Length of output: 3449
🏁 Script executed:
# Get the complete extract_cookies_from_storage function
cat -n src/notebooklm/auth.py | sed -n '565,650p'Repository: teng-lin/notebooklm-py
Length of output: 3779
🏁 Script executed:
# Check where load_httpx_cookies is defined and used
rg "load_httpx_cookies" src/notebooklm/ -B 2 -A 2Repository: teng-lin/notebooklm-py
Length of output: 1888
🏁 Script executed:
# Check ALLOWED_COOKIE_DOMAINS constant
rg "ALLOWED_COOKIE_DOMAINS" src/notebooklm/ -B 5 -A 15Repository: teng-lin/notebooklm-py
Length of output: 7910
🏁 Script executed:
# Check the extract_cookies_from_storage implementation in full to confirm what it returns
cat -n src/notebooklm/auth.py | sed -n '490,550p'Repository: teng-lin/notebooklm-py
Length of output: 3006
🏁 Script executed:
# Check the complete extract_cookies_from_storage function to see all cookie handling
cat -n src/notebooklm/auth.py | sed -n '420,480p'Repository: teng-lin/notebooklm-py
Length of output: 2741
🏁 Script executed:
# Find the complete extract_cookies_from_storage function
cat -n src/notebooklm/auth.py | sed -n '350,430p'Repository: teng-lin/notebooklm-py
Length of output: 3910
Preserve original cookie domains or use domain-aware cookies consistently with load_httpx_cookies().
On line 669, setting all cookies to domain=".google.com" loses the original domain scope from storage. While domain=".google.com" will match .google.com subdomains like accounts.google.com per RFC 6265, this approach loses domain-specific semantics that may exist in storage (e.g., host-only cookies vs. domain cookies).
The codebase already recognizes this pattern: load_httpx_cookies() (lines 597–642) exists specifically to preserve domain information, with the docstring stating: "This is required for downloads that follow redirects across Google domains." For consistency and resilience, fetch_tokens() should either:
- Accept
httpx.Cookiesinstead ofdict[str, str], or - Extract and preserve domain metadata through
AuthTokensso each cookie's original scope is maintained
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@src/notebooklm/auth.py` around lines 667 - 670, The current loop that builds
jar = httpx.Cookies() and sets every cookie with domain=".google.com" loses
original cookie scopes; update fetch_tokens() to preserve domain metadata by
either changing its input to accept httpx.Cookies (so you can pass the result of
load_httpx_cookies()) or extend AuthTokens to carry per-cookie
domain/path/host-only flags and reconstruct the jar using that metadata (instead
of unconditionally setting domain=".google.com"); replace the hardcoded
jar.set(name, value, domain=".google.com") logic with a call to
load_httpx_cookies() or equivalent reconstruction that uses each cookie's
original domain scope.
Address review feedback: - Add _build_cookie_jar() helper in auth.py used by both fetch_tokens() and AuthTokens.to_cookie_jar() - Use auth.to_cookie_jar() in _core.py instead of inline jar creation - Update cookies in place in update_auth_headers() instead of replacing jar Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Summary
Cookie:header withhttpx.Cookiescookie jar infetch_tokens()andClientCore.open()update_auth_headers()to use cookie jar for consistencyaccounts.google.comTest plan
🤖 Generated with Claude Code
Summary by CodeRabbit