Fix: Exclude deprecated properties from CrawlResult serialization #1356
- Add explicit exclusion of deprecated properties (markdown_v2, fit_markdown, fit_html) in CrawlResult.model_dump() method
- Prevents AttributeError when serializing models that contain these deprecated properties
- Properly handles merging with existing exclude parameters
Walkthrough
Sequence Diagram(s)
sequenceDiagram
participant User
participant CrawlResult
participant JSONResponse
User->>CrawlResult: Call model_dump(kwargs)
CrawlResult->>CrawlResult: Merge deprecated properties into exclude set
CrawlResult->>CrawlResult: Prepare serializable dict (excluding non-serializable properties)
CrawlResult->>JSONResponse: Return serializable output
Estimated code review effort: 2 (Simple) | ~7 minutes
Assessment against linked issues
Out-of-scope changes: No out-of-scope changes detected.
Actionable comments posted: 0
🧹 Nitpick comments (1)
crawl4ai/models.py (1)
255-267: LGTM! Solid fix for the serialization issue.
The implementation correctly addresses the core problem by excluding deprecated properties that raise AttributeError during serialization. The logic properly handles different types of exclude parameters.
Minor suggestion for robustness: consider preserving unknown exclude parameter types rather than replacing them entirely:
 else:
-    kwargs['exclude'] = exclude_properties
+    # For unknown types, try to combine if possible, otherwise use deprecated properties
+    try:
+        kwargs['exclude'] = set(kwargs['exclude']) | exclude_properties
+    except (TypeError, ValueError):
+        kwargs['exclude'] = exclude_properties
This would handle edge cases where users might pass custom exclude objects that support set operations.
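The fallback suggested in this review can be tried in isolation. The following is a minimal sketch, assuming a standalone helper (the name merge_unknown_exclude is hypothetical and does not appear in crawl4ai); it only illustrates the try/except merge, not the actual code in models.py.

```python
# Names of the deprecated properties, taken from the PR description.
DEPRECATED = {"markdown_v2", "fit_markdown", "fit_html"}


def merge_unknown_exclude(exclude, deprecated=DEPRECATED):
    """Combine a user-supplied exclude value with the deprecated names.

    Hypothetical helper for illustration. Falls back to the deprecated
    names alone when the value cannot be converted to a set.
    """
    try:
        # Works for any iterable of hashable items (list, tuple, frozenset).
        return set(exclude) | set(deprecated)
    except (TypeError, ValueError):
        # Non-iterable or unhashable input: keep only the safe default.
        return set(deprecated)


merged_from_list = merge_unknown_exclude(["url"])
merged_from_int = merge_unknown_exclude(5)
```

One caveat with this approach: a bare string is itself iterable, so set("url") yields its individual characters rather than the single name "url". Callers would need to wrap single names in a list or set.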
- Try to convert unknown exclude types to set before replacing
- Preserves user-specified exclusions when possible
- More graceful error handling as suggested in PR review
Summary
This PR fixes a serialization issue in the CrawlResult class where deprecated properties were causing AttributeError exceptions during model serialization.
Problem
The CrawlResult class has three deprecated properties (markdown_v2, fit_markdown, and fit_html) that raise AttributeError when accessed. When calling model_dump() on the model, Pydantic attempts to access these properties during serialization, causing the process to fail. This particularly affects API usage where the CrawlResult needs to be serialized to JSON.
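The failure mode can be illustrated without Pydantic. The sketch below uses a plain Python class as a hypothetical stand-in (DeprecatedFieldsDemo and naive_dump are invented for illustration and are not part of crawl4ai); it only shows how a property that raises AttributeError breaks any serializer that reads every attribute.

```python
class DeprecatedFieldsDemo:
    """Hypothetical stand-in for a result object; not the real CrawlResult."""

    def __init__(self, url, html):
        self.url = url
        self.html = html

    @property
    def fit_html(self):
        # Deprecated accessor: fails loudly instead of returning data.
        raise AttributeError("'fit_html' is deprecated")


def naive_dump(obj, names):
    """Serialize by reading every listed attribute, deprecated or not."""
    return {name: getattr(obj, name) for name in names}


demo = DeprecatedFieldsDemo("https://example.com", "<html></html>")

try:
    naive_dump(demo, ["url", "html", "fit_html"])
    failed = False
except AttributeError:
    # Reading the deprecated property aborts the whole dump.
    failed = True

# Leaving the deprecated name out lets serialization succeed.
safe = naive_dump(demo, ["url", "html"])
```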
Solution
The fix modifies the model_dump() method to explicitly exclude these deprecated properties from serialization. This prevents Pydantic from attempting to access them.
Implementation details:
- Excluded properties: {'markdown_v2', 'fit_markdown', 'fit_html'}
- Modified method: model_dump()
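For readers who want to see the shape of the override, here is a minimal sketch under stated assumptions. BaseRecord is a hypothetical stand-in that only mimics the exclude argument of a Pydantic model, and Result is not the real CrawlResult; the actual change in crawl4ai/models.py may differ in detail.

```python
DEPRECATED_PROPERTIES = {"markdown_v2", "fit_markdown", "fit_html"}


class BaseRecord:
    """Hypothetical stand-in for a Pydantic model; only mimics `exclude`."""

    fields = ("url", "html", "fit_html")

    def model_dump(self, exclude=None):
        skipped = set(exclude or ())  # a dict contributes its keys
        return {n: getattr(self, n) for n in self.fields if n not in skipped}


class Result(BaseRecord):
    def __init__(self, url, html):
        self.url = url
        self.html = html

    @property
    def fit_html(self):
        raise AttributeError("'fit_html' is deprecated")

    def model_dump(self, **kwargs):
        exclude = kwargs.get("exclude")
        if exclude is None:
            # No user exclusions: skip only the deprecated names.
            kwargs["exclude"] = set(DEPRECATED_PROPERTIES)
        elif isinstance(exclude, dict):
            # Dict form: add the deprecated names as extra keys.
            extra = {name: True for name in DEPRECATED_PROPERTIES}
            kwargs["exclude"] = {**exclude, **extra}
        else:
            # Set-like form: union with the deprecated names.
            kwargs["exclude"] = set(exclude) | DEPRECATED_PROPERTIES
        return super().model_dump(**kwargs)


result = Result("https://example.com", "<html></html>")
dumped_default = result.model_dump()
dumped_custom = result.model_dump(exclude={"html"})
```

Merging into the caller's exclusions rather than overwriting them keeps user-specified exclusions intact while still guaranteeing the deprecated properties are never read.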
Testing
The fix ensures that:
Related Issues
TypeError: Object of type property is not JSON serializable
This fix prevents the serialization errors that occur when using the CrawlResult model with deprecated property definitions, particularly in API contexts.