A note explaining why using serverless compute instead of a dedicated cluster can be problematic

# Suggestion for Course Improvement: Cluster Configuration Guidance

I ran into an issue during the course that future students could avoid with some additional guidance.

While the course already encourages students to set up a single-node cluster, the complications I experienced show where it would help to explain which parts of the course are affected if serverless compute is used instead.

During my learning process, I utilized both the Azure free trial and Databricks' 14-day free trial. My cluster setup encountered an issue where the specified compute resource size was unavailable, which I overlooked in the logs. As a result, I proceeded with the labs using serverless compute instead.

This decision created downstream problems when reaching the section on using Spark for data querying and extraction. The provided notebook, which uses Derar Alhussein's anonymous S3 bucket as a data source, failed to execute properly. I believe this occurred because the configuration settings for S3 resource access in Spark had been renamed or modified in the serverless compute runtime version.
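To make the failure easier to spot, a small check along these lines could run at the top of the notebook. This is only a sketch under assumptions: I have not confirmed which setting the course notebook relies on, so the Hadoop S3A anonymous credentials provider below is a guess at the kind of configuration involved, and the helper name `try_anonymous_s3` is hypothetical. `spark` is the session that a Databricks notebook provides.

```python
# Assumption: the notebook needs anonymous S3 access via the Hadoop S3A
# connector. The key and class are real Hadoop names, but whether the
# course notebook uses them is not confirmed.
S3A_PROVIDER_KEY = "fs.s3a.aws.credentials.provider"
ANONYMOUS_PROVIDER = "org.apache.hadoop.fs.s3a.AnonymousAWSCredentialsProvider"


def try_anonymous_s3(spark):
    """Try to enable anonymous S3 access and return (ok, message)."""
    try:
        spark.conf.set(S3A_PROVIDER_KEY, ANONYMOUS_PROVIDER)
    except Exception as exc:  # the exact exception type depends on the runtime
        return False, f"could not set {S3A_PROVIDER_KEY}: {exc}"
    return True, f"{S3A_PROVIDER_KEY} set to {ANONYMOUS_PROVIDER}"
```

In a notebook cell, `ok, msg = try_anonymous_s3(spark)` followed by `print(msg)` would show right away whether the current compute accepts the setting, instead of the read failing later with a less obvious error.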

Additionally, I had trouble with the section demonstrating global temporary view lifetimes. These views are registered in the `global_temp` schema, and in my experience that schema could only be referenced from a dedicated cluster, not from serverless compute.
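A quick probe like the one below could tell a student up front whether the global temporary view lesson will work on their current compute. It is a minimal sketch, not part of the course material: the function name `global_temp_views_work` and the view name `gtv_probe` are hypothetical, and `spark` is assumed to be the notebook's session.

```python
def global_temp_views_work(spark, view_name="gtv_probe"):
    """Return True if a global temporary view can be created and queried."""
    try:
        # Register a one-row DataFrame as a global temp view, then read it
        # back through the global_temp schema.
        spark.range(1).createOrReplaceGlobalTempView(view_name)
        spark.sql(f"SELECT * FROM global_temp.{view_name}").collect()
        return True
    except Exception:
        # Any failure here is treated as "not supported on this compute".
        return False
    finally:
        # Best-effort cleanup; ignore errors if the view was never created.
        try:
            spark.catalog.dropGlobalTempView(view_name)
        except Exception:
            pass
```

If `global_temp_views_work(spark)` returns `False`, switching to the dedicated cluster before starting that section would avoid the confusion I ran into.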

## Recommendation

While the course already instructs students to use their configured cluster for exercises, I recommend adding a specific note explaining:

1. The importance of using the exact cluster configuration specified in the instructions
2. Potential compatibility issues when using serverless compute instead of dedicated clusters
3. How runtime versions may affect S3 connection parameters in the Spark configuration

This additional guidance would help students quickly identify and resolve configuration-related issues, particularly when working with external data sources and schema-dependent operations.
