[SPARK-56454][DOCS][FOLLOWUP] Document supported SRIDs in geospatial types #55207

Closed
pratham76 wants to merge 1 commit into apache:master from pratham76:srid-doc

@pratham76
Contributor

What changes were proposed in this pull request?

#54780 added a pre-built SRID registry of standard spatial reference systems, but the corresponding documentation was never written. This PR adds it.

Why are the changes needed?

So that users know which SRIDs are supported.

Does this PR introduce any user-facing change?

No

How was this patch tested?

Only Doc Changes.

Was this patch authored or co-authored using generative AI tooling?

No

@pratham76
Contributor Author

@szehon-ho @cloud-fan @uros-db Could you have a look at the doc additions for #54780? Thanks!

@pratham76
Contributor Author

@szehon-ho gentle ping!

Member

@szehon-ho left a comment

Thanks for adding SRID documentation, @pratham76 — the "Commonly Used SRIDs" table is a useful addition. I have some suggestions to improve accuracy and reduce redundancy with existing content on the page.

Also, minor note: the PR description references #54780, but that PR was closed (not merged). You may want to update the description to reference the correct merged work.

Comment thread docs/sql-ref-geospatial-types.md Outdated
| 2154 | RGF93 / Lambert-93 | French national coordinate system | France-specific mapping and GIS |
| 32633 | WGS 84 / UTM zone 33N | Universal Transverse Mercator, zone 33 North | Central Europe (12°E to 18°E) |
| 32634 | WGS 84 / UTM zone 34N | Universal Transverse Mercator, zone 34 North | Eastern Europe (18°E to 24°E) |
| 32635 | WGS 84 / UTM zone 35N | Universal Transverse Mercator, zone 35 North | Eastern Europe/Western Asia (24°E to 30°E) |
Member

Consider adding a CRS Identifier column. Spark maps SRIDs to CRS strings internally, and these strings are visible to users in df.schema.json() output and in Parquet/Delta/Iceberg storage metadata. For example, GEOMETRY(4326) stores as geometry(OGC:CRS84) in JSON schema — not EPSG:4326. This is a common source of confusion.

The key mappings are:

| SRID | CRS Identifier |
|------|----------------|
| 0    | `SRID:0`       |
| 3857 | `EPSG:3857`    |
| 4326 | `OGC:CRS84`    |
| 4267 | `OGC:CRS27`    |
| 4269 | `OGC:CRS83`    |

Also worth noting which SRIDs are valid for GEOGRAPHY vs GEOMETRY. For instance, GEOMETRY(3857) works but GEOGRAPHY(3857) will error because 3857 is a projected (non-geographic) CRS. That's a real pitfall for users.

Contributor Author

Noted! I have added a CRS Identifier column to the table, containing the mappings listed above. I have also introduced a Type column that indicates whether each SRID is valid for GEOGRAPHY, GEOMETRY, or both.

Along with these, I have added some notes based on the above comments. Let me know if this helps. Thanks!

Comment thread docs/sql-ref-geospatial-types.md Outdated
| 32634 | WGS 84 / UTM zone 34N | Universal Transverse Mercator, zone 34 North | Eastern Europe (18°E to 24°E) |
| 32635 | WGS 84 / UTM zone 35N | Universal Transverse Mercator, zone 35 North | Eastern Europe/Western Asia (24°E to 30°E) |

The registry includes many additional SRIDs for various UTM zones, national coordinate systems, and other projections. For a complete list, refer to the [EPSG Geodetic Parameter Dataset](https://epsg.org/).
Member

The registry also includes ESRI entries (e.g., ESRI:102100), not just EPSG. And it's pinned to PROJ 9.7.1 — not synced live with EPSG. The link to epsg.org could be misleading since users may find SRIDs there that aren't in Spark's registry, or miss ESRI SRIDs that are. Consider referencing the actual registry CSV or at least mentioning the PROJ version and ESRI inclusion.

Contributor Author

Thanks for pointing this out, have updated the note.

Comment thread docs/sql-ref-geospatial-types.md Outdated

#### Using Different SRIDs

**Creating tables with specific SRIDs:**
Member

Most of the examples in sections "Using Different SRIDs", "Converting between SRIDs", and "SRID Validation" repeat what the page already covers in "Creating Tables" (lines 62–79) and "Built-in Geospatial Functions" (lines 129–137). Consider replacing them with examples that show genuinely new behavior:

  • SRID validation error: The 99999 case is useful — keep it.
  • GEOGRAPHY vs GEOMETRY pitfall: Show that GEOGRAPHY(3857) errors because 3857 is non-geographic — this is a real user trap not documented elsewhere.
  • OGC CRS strings in metadata: Show that df.schema.json() for GEOMETRY(4326) contains OGC:CRS84, so users know what to expect in Parquet/storage metadata.

Contributor Author

Thank you for the comments! I have updated the examples.

Comment thread docs/sql-ref-geospatial-types.md Outdated
);
```

**Converting between SRIDs:**
Member

The heading "Converting between SRIDs" implies coordinate reprojection, but ST_SetSrid only changes metadata. Suggest renaming to something like "Setting or Changing SRID Metadata".

Also, the example changes a point from SRID 4326 (lat/lon in degrees) to 3857 (Web Mercator in meters) — this produces a semantically incorrect result since the coordinates are still degree values but now labeled as meters. A better example would set SRID on data that was created without one, e.g. SRID 0 → 4326, which is the common real-world use case. The existing doc already shows an ST_SetSrid example (line 136) that does this correctly.
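
To make the relabeling problem concrete, here is a minimal sketch in plain Python (not Spark code) that compares a point whose SRID label is merely switched to 3857 with the same point actually reprojected. It assumes the standard spherical Web Mercator forward formula; the sample point (1°, 2°) and the helper name `lonlat_to_web_mercator` are illustrative choices, not taken from the PR.

```python
import math

# WGS 84 semi-major axis in meters, the sphere radius used by Web Mercator (EPSG:3857).
EARTH_RADIUS_M = 6378137.0


def lonlat_to_web_mercator(lon_deg, lat_deg):
    """Project longitude/latitude in degrees to Web Mercator meters."""
    x = EARTH_RADIUS_M * math.radians(lon_deg)
    y = EARTH_RADIUS_M * math.log(math.tan(math.pi / 4 + math.radians(lat_deg) / 2))
    return x, y


# Illustrative point: longitude 1 degree, latitude 2 degrees (SRID 4326).
lon, lat = 1.0, 2.0

# Changing only the SRID label keeps the numbers (1.0, 2.0), which would then be
# read as a position about 1 m east and 2 m north of the projection origin.
relabeled = (lon, lat)

# A real reprojection moves the point by hundreds of kilometers.
reprojected = lonlat_to_web_mercator(lon, lat)

print("relabeled  :", relabeled)
print("reprojected:", reprojected)
```

The two results differ by roughly five orders of magnitude, which is why a metadata-only change between 4326 and 3857 yields a semantically wrong geometry.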

Contributor Author

Thank you, noted, have removed the repeated section.

Comment thread docs/sql-ref-geospatial-types.md Outdated

```sql
-- Valid: 4326 is in the registry
SELECT ST_GeomFromWKB(X'0101000000000000000000F03F0000000000000040', 4326);
Member

The 4326 and 3857 examples here repeat what's already shown in the "Built-in Geospatial Functions" section above. Consider trimming to just the 99999 error case — that's the genuinely new and useful example. You could also add a GEOGRAPHY(3857) failure example here, since that's a real pitfall not documented elsewhere.
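
For readers unfamiliar with the hex literal used in these examples, the following sketch decodes it with Python's standard `struct` module. It handles only a 2D WKB point, and `decode_wkb_point` is a hypothetical helper written for this illustration, not a Spark API.

```python
import struct

# The WKB literal from the SQL examples in this thread.
WKB_HEX = "0101000000000000000000F03F0000000000000040"


def decode_wkb_point(hex_str):
    """Decode a 2D WKB point and return its (x, y) coordinates."""
    data = bytes.fromhex(hex_str)
    # Byte 0 is the byte-order flag: 1 = little-endian, 0 = big-endian.
    byte_order = "<" if data[0] == 1 else ">"
    # Bytes 1-4 hold the geometry type; 1 means Point.
    (geom_type,) = struct.unpack(byte_order + "I", data[1:5])
    if geom_type != 1:
        raise ValueError(f"expected WKB geometry type 1 (Point), got {geom_type}")
    # Bytes 5-20 hold two IEEE 754 doubles: x then y.
    x, y = struct.unpack(byte_order + "dd", data[5:21])
    return x, y


print(decode_wkb_point(WKB_HEX))  # (1.0, 2.0)
```

The literal is therefore `POINT(1 2)`; the SRID is not part of these bytes and is supplied separately as the second argument in the SQL examples.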

Contributor Author

Thanks! incorporated

Comment thread docs/sql-ref-geospatial-types.md Outdated

#### SRID 0 (Unspecified)

SRID 0 represents an unspecified or unknown coordinate system. It is allowed for GEOMETRY types but should be used with caution:
Member

A few issues here:

  1. "should be used with caution" is overstated — SRID 0 is the default for ST_GeomFromWKB(wkb) and is actively used in CREATE TABLE (e.g., CREATE TABLE t (geom GEOMETRY(0)) USING PARQUET in the test suite). It's a standard convention (PostGIS uses the same).

  2. Missing GEOGRAPHY restriction — SRID 0 is not valid for GEOGRAPHY types (it's registered as non-geographic, so GeographicSpatialReferenceSystemMapper rejects it). This is important to document.

  3. Could be confused with GEOMETRY(ANY) — Worth clarifying that GEOMETRY(0) means a fixed SRID of 0 (Cartesian, no defined CRS), not "per-row SRID." Per-row SRIDs use GEOMETRY(ANY).

Contributor Author

Thanks! incorporated

@pratham76
Contributor Author

> Thanks for adding SRID documentation, @pratham76 — the "Commonly Used SRIDs" table is a useful addition. I have some suggestions to improve accuracy and reduce redundancy with existing content on the page.
>
> Also, minor note: the PR description references #54780, but that PR was closed (not merged). You may want to update the description to reference the correct merged work.

Thank you @szehon-ho for the review comments. I referenced #54780 because it appears to be the PR referenced in the JIRA, and it also appears to be the one that was merged. Please let me know if I missed anything, and whether the changes look okay. Thanks!

@pratham76 pratham76 force-pushed the srid-doc branch 4 times, most recently from 6b2bdf7 to 41989b4 Compare April 8, 2026 16:11
@pratham76 pratham76 requested a review from szehon-ho April 9, 2026 03:21
Member

@dongjoon-hyun left a comment

The Apache Spark community recommends filing a proper JIRA issue for traceability, @pratham76. Please create a JIRA ID and use it in the PR title.

@pratham76 changed the title from "[MINOR][DOCS][FOLLOWUP] Document supported SRIDs in geospatial types" to "[SPARK-56454][DOCS][FOLLOWUP] Document supported SRIDs in geospatial types" on Apr 12, 2026
@pratham76
Contributor Author

> The Apache Spark community recommends filing a proper JIRA issue for traceability, @pratham76. Please create a JIRA ID and use it in the PR title.

Thanks @dongjoon-hyun for the heads-up. I have updated the PR title to point to the corresponding JIRA issue.

@pratham76
Contributor Author

@szehon-ho Could you have a look at the changes? I've addressed all the above review comments. Please let me know if any other improvements are needed. Thanks!

@pratham76
Contributor Author

@szehon-ho gentle ping!

Comment thread docs/sql-ref-geospatial-types.md Outdated

### Supported SRIDs

Spark includes a pre-built registry of standard Spatial Reference Identifiers (SRIDs) from the PROJ database, with overrides to support OGC standards. This registry enables validation and proper handling of coordinate systems for geospatial data.
Member

Since users can query PROJ directly, we should list only the OGC overrides.

Comment thread docs/sql-ref-geospatial-types.md Outdated

#### Commonly Used SRIDs

| SRID | CRS Identifier | Name | Type | Description | Typical Use Case |
Member

The "Type" column says GEOMETRY only or GEOGRAPHY or GEOMETRY per SRID. This frames the restriction backwards and requires readers to memorize per-SRID compatibility. The actual rule is simple:

  • GEOMETRY accepts all SRIDs in the registry (geographic + projected + SRID 0)
  • GEOGRAPHY only accepts geographic SRIDs (lat/lon coordinate systems)

I'd suggest:

  1. State this rule clearly before the table (once), rather than repeating it per row.
  2. Replace the "Type" column with "CRS Type" showing Geographic or Projected — which is the intrinsic property of the CRS. The GEOMETRY/GEOGRAPHY compatibility follows naturally: if Geographic, it works with both GEOGRAPHY and GEOMETRY; if Projected, GEOMETRY only.

Also consider dropping the "Typical Use Case" column — between "Name" and "Description" it's already covered, and 6 columns is very wide in Markdown.
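
The rule above can be sketched as a small check in plain Python. This is a hypothetical model, not Spark's implementation: the `IS_GEOGRAPHIC` table holds only the five SRIDs discussed in this thread, and the function name `is_valid_srid` is invented for illustration.

```python
# Hypothetical excerpt of the registry: SRID -> whether the CRS is geographic.
# SRID 0 is treated as non-geographic, as described in this review.
IS_GEOGRAPHIC = {
    0: False,     # unspecified, Cartesian
    4326: True,   # WGS 84
    4267: True,   # NAD27
    4269: True,   # NAD83
    3857: False,  # Web Mercator (projected)
}


def is_valid_srid(type_name, srid):
    """Return True if `srid` is accepted by the given geospatial type."""
    if srid not in IS_GEOGRAPHIC:
        return False  # not in the registry at all
    if type_name == "GEOMETRY":
        return True  # GEOMETRY accepts every registered SRID
    if type_name == "GEOGRAPHY":
        return IS_GEOGRAPHIC[srid]  # GEOGRAPHY needs a geographic CRS
    raise ValueError(f"unknown type: {type_name}")


for srid in (0, 4326, 3857, 99999):
    print(srid, is_valid_srid("GEOMETRY", srid), is_valid_srid("GEOGRAPHY", srid))
```

Because compatibility follows from a single per-CRS property, one "CRS Type" column is enough; readers do not need a separate compatibility entry for each SRID.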

Comment thread docs/sql-ref-geospatial-types.md Outdated
CREATE TABLE locations (id BIGINT, point GEOMETRY(4326));

-- The schema will show OGC:CRS84, not EPSG:4326
SELECT schema_of_json('{"point": ...}');
Member

schema_of_json is a function for inferring a schema from a JSON string value — it doesn't inspect a table's schema. This example won't produce the claimed output.

A correct way to show this would be:

CREATE TABLE locations (id BIGINT, point GEOMETRY(4326)) USING PARQUET;
DESCRIBE locations;
-- point column shows: geometry(4326)

-- In Scala/Python the JSON schema shows the CRS string:
-- spark.table("locations").schema.json() contains "geometry(OGC:CRS84)"

Or simply remove this sub-section and fold the key point ("GEOMETRY(4326) stores as geometry(OGC:CRS84) in schema JSON and Parquet metadata") into the existing "CRS Identifier Mapping" bullet in the Important Notes — that bullet already says this.

Comment thread docs/sql-ref-geospatial-types.md Outdated
```sql
-- Error: 99999 is not a valid SRID in the registry
SELECT ST_GeomFromWKB(X'0101000000000000000000F03F0000000000000040', 99999);
-- Throws error: Invalid SRID
Member

The error examples here show invented text. The actual error is:

[ST_INVALID_SRID_VALUE] Invalid or unsupported SRID (spatial reference identifier) value: <srid>.

Either show the real error class/message, or just say -- Throws ST_INVALID_SRID_VALUE without inventing the phrasing. Users who search for the error text in the doc should be able to find it. Same applies to the GEOGRAPHY(3857) example below.

Comment thread docs/sql-ref-geospatial-types.md Outdated
| 4267 | `OGC:CRS27` | NAD27 | GEOGRAPHY or GEOMETRY | North American Datum 1927 | Legacy North American data |
| 4269 | `OGC:CRS83` | NAD83 | GEOGRAPHY or GEOMETRY | North American Datum 1983 | North American mapping |
| 3857 | `EPSG:3857` | Web Mercator | GEOMETRY only | Pseudo-Mercator projection | Web maps (Google Maps, OpenStreetMap, Bing Maps) |
| 2154 | `EPSG:2154` | RGF93 / Lambert-93 | GEOMETRY only | French national coordinate system | France-specific mapping and GIS |
Member

The table mixes truly common SRIDs (0, 4326, 3857) with region-specific ones (2154, 32633–32635) that most users won't encounter. Consider trimming to the 4–5 most universal entries (0, 4326, 4267, 4269, 3857) and noting that the full registry includes thousands more. The existing bullet at the bottom already says this, so these extra rows add length without proportional value.

Comment thread docs/sql-ref-geospatial-types.md Outdated
-- Output includes: geometry(OGC:CRS84)
```

This CRS identifier is also stored in Parquet, Delta, and Iceberg metadata, so downstream tools see `OGC:CRS84` rather than `EPSG:4326`.
Member

This depends on the data source; we should mention only Spark's native Parquet data source.

Comment thread docs/sql-ref-geospatial-types.md Outdated
| 32635 | `EPSG:32635` | WGS 84 / UTM zone 35N | GEOMETRY only | Universal Transverse Mercator, zone 35 North | Eastern Europe/Western Asia (24°E to 30°E) |

**Important Notes:**
* **GEOGRAPHY vs GEOMETRY**: Only geographic (latitude/longitude) SRIDs can be used with GEOGRAPHY types. Projected coordinate systems like Web Mercator (3857) or UTM zones, as well as SRID 0, work only with GEOMETRY.
Member

If the GEOMETRY/GEOGRAPHY SRID rule is stated once before the table (per the earlier inline comment on the table header), this bullet restates the same thing. Consider removing it.

Comment thread docs/sql-ref-geospatial-types.md Outdated
**Important Notes:**
* **GEOGRAPHY vs GEOMETRY**: Only geographic (latitude/longitude) SRIDs can be used with GEOGRAPHY types. Projected coordinate systems like Web Mercator (3857) or UTM zones, as well as SRID 0, work only with GEOMETRY.
* **SRID 0**: Represents Cartesian coordinates with no defined CRS. `GEOMETRY(0)` means a fixed SRID of 0 (all rows use SRID 0), not per-row SRIDs. For per-row SRIDs, use `GEOMETRY(ANY)`.
* **CRS Identifier Mapping**: When you create `GEOMETRY(4326)`, it stores as `geometry(OGC:CRS84)` in JSON schema, not `EPSG:4326`. This is an OGC standard override.
Member

This is technically accurate, but the CRS string (OGC:CRS84 vs EPSG:4326) only surfaces if you call df.schema.json() programmatically or inspect Parquet/Iceberg file metadata directly. In all user-facing contexts (DESCRIBE, printSchema(), dtypes, error messages), users see the integer SRID (geometry(4326)). This is a niche detail that may cause more confusion than it resolves — consider dropping it, or shortening to a parenthetical like: "Note: the programmatic schema.json() API and storage-level metadata use CRS strings (e.g. OGC:CRS84) rather than integer SRIDs."

Comment thread docs/sql-ref-geospatial-types.md Outdated
* **SRID 0**: Represents Cartesian coordinates with no defined CRS. `GEOMETRY(0)` means a fixed SRID of 0 (all rows use SRID 0), not per-row SRIDs. For per-row SRIDs, use `GEOMETRY(ANY)`.
* **CRS Identifier Mapping**: When you create `GEOMETRY(4326)`, it stores as `geometry(OGC:CRS84)` in JSON schema, not `EPSG:4326`. This is an OGC standard override.
* **Registry Source**: The SRID registry is based on PROJ 9.7.1 and includes both EPSG and ESRI coordinate systems (e.g., `ESRI:102100`). The registry is pinned to this PROJ version and not synced live with external databases.
* The registry includes many additional SRIDs for various UTM zones, national coordinate systems, and other projections from both EPSG and ESRI sources.
Member

This restates what the "Registry Source" bullet directly above already says. Consider removing.

Comment thread docs/sql-ref-geospatial-types.md Outdated
* **Registry Source**: The SRID registry is based on PROJ 9.7.1 and includes both EPSG and ESRI coordinate systems (e.g., `ESRI:102100`). The registry is pinned to this PROJ version and not synced live with external databases.
* The registry includes many additional SRIDs for various UTM zones, national coordinate systems, and other projections from both EPSG and ESRI sources.

#### CRS Identifiers in Metadata
Member

The schema_of_json example is incorrect (see earlier inline comment), and the only useful content here — that GEOMETRY(4326) uses OGC:CRS84 in metadata — is already covered by the "CRS Identifier Mapping" bullet above. Consider removing this entire sub-section.

Comment thread docs/sql-ref-geospatial-types.md Outdated
-- Throws error: SRID 3857 is not valid for GEOGRAPHY (projected coordinate system)
```

This is a common pitfall: Web Mercator (3857) and UTM zones are projected (planar) coordinate systems and can only be used with GEOMETRY, not GEOGRAPHY. Only geographic (latitude/longitude) SRIDs like 4326, 4267, or 4269 work with GEOGRAPHY.
Member

This repeats the GEOGRAPHY/GEOMETRY rule for the third time (pre-table intro, Important Notes bullet, and now here). The code example above already demonstrates the pitfall. Consider removing this paragraph.

@szehon-ho
Member

To tie together the inline comments: the section has good content but a lot of redundancy. Here's a suggestion for the overall shape after trimming (~25 lines instead of ~60):

### Supported SRIDs

Spark includes a pre-built registry of standard Spatial Reference Identifiers (SRIDs) from the
PROJ database. GEOMETRY accepts all SRIDs in the registry (geographic, projected, and SRID 0).
GEOGRAPHY only accepts geographic SRIDs (latitude/longitude coordinate systems), because it
performs spherical calculations that assume geographic coordinates. Attempting to use a projected
SRID like 3857 with GEOGRAPHY will raise an error.

#### Commonly Used SRIDs

| SRID | CRS Identifier | Name         | CRS Type   | Description                                        |
|------|----------------|--------------|------------|----------------------------------------------------|
| 0    | `SRID:0`       | Unspecified  | Projected  | Cartesian, no defined CRS (default for `ST_GeomFromWKB(wkb)`) |
| 4326 | `OGC:CRS84`    | WGS 84       | Geographic | Longitude/latitude, WGS 84 (default for GEOGRAPHY) |
| 4267 | `OGC:CRS27`    | NAD27        | Geographic | North American Datum 1927                          |
| 4269 | `OGC:CRS83`    | NAD83        | Geographic | North American Datum 1983                          |
| 3857 | `EPSG:3857`    | Web Mercator | Projected  | Pseudo-Mercator for web maps                       |

**Notes:**
* `GEOMETRY(0)` means a fixed SRID of 0 (all rows use SRID 0), not per-row SRIDs.
  For per-row SRIDs, use `GEOMETRY(ANY)`.
* `GEOMETRY(ANY)` and `GEOGRAPHY(ANY)` are valid for in-memory and query use, but cannot be
  persisted — the [Parquet](https://github.com/apache/parquet-format/blob/master/Geospatial.md)
  and [Iceberg](https://github.com/apache/iceberg/blob/master/format/spec.md) geospatial
  specifications require a fixed SRID per column.
* The registry is based on PROJ 9.7.1 and includes both EPSG and ESRI coordinate systems.
  It is pinned to this version and not synced live with external databases.

#### SRID Validation

```sql
-- Invalid SRID: not in the registry
SELECT ST_GeomFromWKB(X'0101000000000000000000F03F0000000000000040', 99999);
-- Throws [ST_INVALID_SRID_VALUE]

-- Projected SRID with GEOGRAPHY type
CREATE TABLE t (id BIGINT, loc GEOGRAPHY(3857));
-- Throws [ST_INVALID_SRID_VALUE] (3857 is projected, not geographic)
```

@pratham76
Contributor Author

Thanks @szehon-ho for detailed comments. Have incorporated all the changes that you have mentioned above. Please do have a look if any more changes are required.

@pratham76 pratham76 requested a review from szehon-ho April 17, 2026 18:56
@pratham76
Contributor Author

Gentle ping @szehon-ho — I’ve addressed all the comments from your previous review. Could you take another look when you get a chance?

Comment thread docs/sql-ref-geospatial-types.md Outdated
persisted — the [Parquet](https://github.com/apache/parquet-format/blob/master/Geospatial.md)
and [Iceberg](https://github.com/apache/iceberg/blob/master/format/spec.md) geospatial
specifications require a fixed SRID per column.
* The registry is based on PROJ 9.7.1 and includes both EPSG and ESRI coordinate systems.
Member

@szehon-ho, Apr 21, 2026

Let's be more specific here: say Spark 4.2 => PROJ 9.7.1. We can expand the table later.

Let's also include the OGC overrides exactly (and in separate table)

Contributor Author

Do we want to pin the version as 4.2.0 here? Also, for the separate OGC overrides table, how should we organize the content?

Member

Yeah, maybe something like a 'since version' table? Here is an example from other docs: https://spark.apache.org/docs/latest/sql-ref-ansi-compliance.html

Yes, a separate table. I am thinking:

  • PROJ version table
  • OGC overrides table
  • Commonly used SRIDs

My initial hunch was that there's no huge value in 'commonly used SRIDs', as they can probably be found on the web, but I can go either way.

The first two are more important imo, as they describe exactly the algorithm used to select supported SRIDs.
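
A minimal sketch of that selection algorithm in plain Python, assuming the registry is a PROJ-derived base set with the OGC overrides layered on top and SRID 0 added as a special entry. Every name here is hypothetical and `PROJ_ENTRIES` is a four-row stand-in for the real PROJ data; only the version pairing, the override strings, and the error text come from this thread.

```python
# Spark release -> pinned PROJ version (the pairing stated in this thread).
PROJ_VERSION_BY_SPARK_RELEASE = {"4.2": "9.7.1"}

# Stand-in for entries derived from the pinned PROJ database. The real set
# contains thousands of EPSG and ESRI entries; only four are modeled here.
PROJ_ENTRIES = {
    3857: "EPSG:3857",
    4267: "EPSG:4267",
    4269: "EPSG:4269",
    4326: "EPSG:4326",
}

# OGC overrides named in the review; they replace the PROJ-derived strings.
OGC_OVERRIDES = {4326: "OGC:CRS84", 4267: "OGC:CRS27", 4269: "OGC:CRS83"}


def build_registry(proj_entries, overrides):
    """Layer the overrides on top of the base entries and add SRID 0."""
    registry = dict(proj_entries)
    registry.update(overrides)
    registry[0] = "SRID:0"  # assumption: SRID 0 is registered as a special case
    return registry


REGISTRY = build_registry(PROJ_ENTRIES, OGC_OVERRIDES)


def crs_identifier(srid):
    """Look up the CRS string for an SRID, mimicking the documented error text."""
    if srid not in REGISTRY:
        # Python stand-in for Spark's ST_INVALID_SRID_VALUE error class.
        raise ValueError(
            "[ST_INVALID_SRID_VALUE] Invalid or unsupported SRID "
            f"(spatial reference identifier) value: {srid}."
        )
    return REGISTRY[srid]


for srid in (0, 3857, 4326, 4267, 4269):
    print(srid, "->", crs_identifier(srid))
```

Documenting the PROJ version and the override table separately mirrors these two inputs, so a reader can work out whether an SRID is supported without a full listing.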

Contributor Author

Agreed. I have updated the doc with a table mapping Spark releases to the pinned PROJ version, and added an OGC overrides table. With those in place, the table of commonly used SRIDs seems a bit repetitive. Let me know your thoughts. Thanks!

Comment thread docs/sql-ref-geospatial-types.md Outdated
| 3857 | `EPSG:3857` | Web Mercator | Projected | Pseudo-Mercator projection used by web mapping services |

**Notes:**
* `GEOMETRY(0)` means a fixed SRID of 0 (all rows use SRID 0), not per-row SRIDs.
Member

nit: a bit repetitive, can remove second part (as covered by the second bullet)

Comment thread docs/sql-ref-geospatial-types.md Outdated
* `GEOMETRY(0)` means a fixed SRID of 0 (all rows use SRID 0), not per-row SRIDs.
For per-row SRIDs, use `GEOMETRY(ANY)`.
* `GEOMETRY(ANY)` and `GEOGRAPHY(ANY)` are valid for in-memory and query use, but cannot be
persisted — the [Parquet](https://github.com/apache/parquet-format/blob/master/Geospatial.md)
Member

Let's rephrase this (without saying it cannot be persisted in general); other formats may support it later. In any case, it's orthogonal to Spark (as compute).

Persistence Notes - Iceberg (...) and Parquet (...) cannot persist GEOMETRY(ANY) and GEOGRAPHY(ANY)

Contributor Author

Thanks, addressed

@szehon-ho
Member

@uros-db can you also take a look?

@pratham76
Contributor Author

@szehon-ho @uros-db Have addressed all the above comments, PTAL

Contributor

@cloud-fan left a comment

Doc accurately reflects the registry implementation — verified PROJ 9.7.1, the OGC overrides (4326/4267/4269), SRID 0 handling, default SRIDs for ST_GeomFromWKB/ST_GeogFromWKB, and the ST_INVALID_SRID_VALUE error class against the source. One small URL nit below.

Comment thread docs/sql-ref-geospatial-types.md Outdated
**Notes:**
* `GEOMETRY(0)` means a fixed SRID of 0. For per-row SRIDs, use `GEOMETRY(ANY)`.
* [Parquet](https://github.com/apache/parquet-format/blob/master/Geospatial.md)
and [Iceberg](https://github.com/apache/iceberg/blob/master/format/spec.md) geospatial
Contributor

apache/iceberg's default branch is main; the /master URL works via GitHub redirect but isn't canonical.

Suggested change
and [Iceberg](https://github.com/apache/iceberg/blob/master/format/spec.md) geospatial
and [Iceberg](https://github.com/apache/iceberg/blob/main/format/spec.md) geospatial

Contributor Author

Thanks @cloud-fan, I have fixed this. Could you please let me know whether this can be merged if no more changes are needed?

Comment thread docs/sql-ref-geospatial-types.md Outdated
| 3857 | `EPSG:3857` | Web Mercator | Projected | Pseudo-Mercator projection used by web mapping services |

**Notes:**
* `GEOMETRY(0)` means a fixed SRID of 0. For per-row SRIDs, use `GEOMETRY(ANY)`.
Contributor

Suggested change
* `GEOMETRY(0)` means a fixed SRID of 0. For per-row SRIDs, use `GEOMETRY(ANY)`.
* `GEOMETRY(0)` means a fixed SRID of 0. For mixed per-row SRIDs, use `GEOMETRY(ANY)`.

Contributor Author

Thanks @uros-db, have accommodated this change.

Contributor

@uros-db left a comment

LGTM, thanks @pratham76! Let's just make sure to include this in 4.2, cc @cloud-fan.

Also cc @szehon-ho PTAL and make sure to update this section after PROJ upgrade.

@cloud-fan
Contributor

thanks, merging to master/4.x/4.2!

@cloud-fan cloud-fan closed this in 3bb6a67 May 7, 2026
cloud-fan pushed a commit that referenced this pull request May 7, 2026
…types

### What changes were proposed in this pull request?

#54780 added support for a pre-built SRID registry with standard spatial reference systems, but seems like corresponding documentation is missed out, adding through this PR.

### Why are the changes needed?

Users would know supported SRIDs

### Does this PR introduce _any_ user-facing change?

No

### How was this patch tested?

Only Doc Changes.

### Was this patch authored or co-authored using generative AI tooling?

No

Closes #55207 from pratham76/srid-doc.

Authored-by: Pratham Manja <prathammanja76@gmail.com>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
(cherry picked from commit 3bb6a67)
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
cloud-fan pushed a commit that referenced this pull request May 7, 2026