-
Notifications
You must be signed in to change notification settings - Fork 29k
[SPARK-54610][SDP][DOCS] Improve SDP Programming Guide #53346
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
[SPARK-54610][SDP][DOCS] Improve SDP Programming Guide #53346
Conversation
|
@sryza Please review 🙏 |
|
@sryza Please please please 🙏 |
sryza
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks a ton for this @jaceklaskowski – important to maximize clarity of the docs. My main feedback is on the headings.
| ``` | ||
|
|
||
| ### Loading Data from a Streaming Source | ||
| ### Loading Data from Streaming Source |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think "from a Streaming Source" is more grammatically correct.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
not in the titles IMHO (but I'm not an English native speaking either)
Let's ask ChatGPT...it's not clear too 👎
But a "hack" is to use plurals 👍
| - The function used to define a dataset must return a Spark DataFrame. | ||
| - The function used to define a dataset must return a Spark `pyspark.sql.DataFrame`. | ||
| - Never use methods that save or write to files or tables as part of your SDP dataset code. | ||
| - When using the `for` loop pattern to define datasets in Python, ensure that the list of values passed to the `for` loop is always additive. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it might be clearer to actually just leave this out – we don't discuss a for loop pattern above.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We do, in Creating Tables in For Loop in Python earlier
Co-authored-by: Sandy Ryza <[email protected]>
|
How does it look now, Mr. @sryza? 🤔 Care to review again? 🙏 |
What changes were proposed in this pull request?
Improved the SDP Programming Guide
Why are the changes needed?
Better docs
Does this PR introduce any user-facing change?
Yes, given that the changes are user docs-related.
How was this patch tested?
Reviewed locally
Was this patch authored or co-authored using generative AI tooling?
No (unless auto-completion in IntelliJ IDEA counts)