
Commit 37a6c57

Fix a bunch of typos
1 parent 5abf644 commit 37a6c57

File tree

1 file changed, 13 insertions(+), 13 deletions(-)

sections/cloud-scale-data.qmd

Lines changed: 13 additions & 13 deletions
@@ -90,13 +90,13 @@ In the remainder of this chapter, we will primarily focus on the data
 volume challenge, in particular exploring how different decisions about
 data storage formats and layouts enable (or constrain) us as we attempt
 to work with data at large scale. We'll formalize a couple of concepts
-we've alluded at throughout the course, introduce a few new
+we've alluded to throughout the course, introduce a few new
 technologies, and characterize the current state of best practice --
 with caveat that this is an evolving area!
 
 ## Cloud optimized data
 
-As we've seen, when it comes global and even regional environmental
+As we've seen, when it comes to global and even regional environmental
 phenomena observed using some form of remote-sensing technology, much of
 our data is now _cloud-scale_. In the most basic sense, this means many
 datasets -- and certainly relevant collections of datasets that you
@@ -117,7 +117,7 @@ be beneficial when the compute platform is "near" the data, reducing
 network and perhaps even I/O latency. However, to the extent that
 realizing these benefits may require committing to some particular
 commercial cloud provider (with its associated cost structure), this may
-or may not be desirable change.
+or may not be a desirable change.
 :::
 
 Practically speaking, accessing data in the cloud means using HTTP to
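
The hunk above ends mid-sentence on the key mechanism: HTTP. As a minimal sketch of the primitive involved (assuming the `requests` package and a placeholder URL), a client can ask a server for just a byte range of a remote file:

```python
# A minimal sketch of an HTTP range request, the building block that
# cloud-optimized formats rely on. The URL is a placeholder.
import requests

url = "https://example.com/data/scene.tif"  # hypothetical remote file
resp = requests.get(url, headers={"Range": "bytes=0-16383"})
resp.raise_for_status()

# A server that supports range requests replies 206 Partial Content and
# sends only the first 16 KiB -- often enough to read a cloud-optimized
# file's header and metadata without downloading the rest.
print(resp.status_code, len(resp.content))
```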
@@ -173,7 +173,7 @@ Geospatial Consortium
 (OGC)](https://www.ogc.org/announcement/cloud-optimized-geotiff-cog-published-as-official-ogc-standard/)
 in 2023. However, COGs _are_ GeoTIFFs, which have been around since the
 1990s, and GeoTIFFs are TIFFs, which date back to the 1980s. Let's work
-our way through this linage.
+our way through this lineage.
 
 First we have the original **TIFF** format, which stands for Tagged Image
 File Format. Although we often think of TIFFs as image files, they're
@@ -227,7 +227,7 @@ the same compressed tile.
 
 A set of overviews (lower resolution tile pyramids) are computed from
 the main full resolution data and stored in the file, again following a
-tiling scheme and and arranged in order. This allows clients to load a
+tiling scheme and arranged in order. This allows clients to load a
 lower resolution version of the data when appropriate, without needing
 to read the full resolution data itself.
 :::
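
For a concrete sense of the overview-building step, here is a sketch using `rasterio` (an assumption; GDAL's `gdaladdo` utility does the same job) against a placeholder local GeoTIFF:

```python
# A sketch: build internal overviews (a lower-resolution pyramid) on an
# existing GeoTIFF. "example.tif" is a placeholder filename.
import rasterio
from rasterio.enums import Resampling

with rasterio.open("example.tif", "r+") as src:
    # Each factor downsamples by that amount: 1/2, 1/4, 1/8 resolution.
    src.build_overviews([2, 4, 8], Resampling.average)
```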
@@ -261,7 +261,7 @@ particular desired subset of the image at a particular resolution
 the desired bounding box into pixel coordinates, then identifying which
 tile(s) in the COG intersect with the area of interest, then determining
 the associated byte ranges of the tile(s) based on the metadata read in
-teh first step. And the best part is that "client" here refers to the
+the first step. And the best part is that "client" here refers to the
 underlying software, which takes care of all of the details. As a user,
 typically all you need to do is specify the file location, area of
 interest, and desired overview level (if relevant)!
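
From the user's side, that whole sequence collapses to a windowed read. A sketch, assuming `rasterio` and a placeholder COG URL:

```python
# A sketch: read a spatial subset of a remote COG. The library reads the
# header, maps the bounding box to pixel coordinates, finds intersecting
# tiles, and fetches just those byte ranges over HTTP.
import rasterio
from rasterio.windows import from_bounds

url = "https://example.com/data/scene.tif"  # hypothetical COG location
with rasterio.open(url) as src:
    window = from_bounds(
        left=400_000, bottom=5_000_000, right=410_000, top=5_010_000,
        transform=src.transform,  # bounds assumed in the file's CRS
    )
    subset = src.read(1, window=window)  # only intersecting tiles fetched
```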
@@ -286,7 +286,7 @@ configuration (shape) optimized for expected usage patterns. This
 enables a client interested in a subset of data to retrieve the relevant
 data without receiving too much additional unwanted data. In addition,
 chunk layout should be such that, under expected common usage patterns,
-proximal chunks are morely likely to be requested together. On average,
+proximal chunks are more likely to be requested together. On average,
 this will reduce the number of separate read requests a client must
 issue to retrieve and piece together any particular desired data subset.
 In addition, chunks should almost certainly be compressed with a
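
To make the chunk-shape decision concrete, here is a sketch assuming the `zarr` package and made-up dimensions, laying out a (time, y, x) array so that reading one time step's full map touches few chunks:

```python
# A sketch: chunk a (time, y, x) array for whole-map reads at a single
# time step, a common access pattern. Chunks are compressed with zarr's
# default codec. The store path and dimensions are placeholders.
import zarr

z = zarr.open(
    "example.zarr",
    mode="w",
    shape=(365, 1024, 1024),  # hypothetical daily grid for one year
    chunks=(1, 512, 512),     # one time step, quarter of the grid, per chunk
    dtype="f4",
)
```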
@@ -339,7 +339,7 @@ choice of how to break the data into separately compressed and
 addressable subsets is now decoupled from the choice of how to break the
 data into separate files; a massive dataset can be segmented into a very
 large number of small chunks without necessarily creating a
-correspondingingly large number of small individual files, which can
+correspondingly large number of small individual files, which can
 cause problems in certain contexts. In some sense, this allows a Zarr
 store to behave a little more like a COG, with its many small,
 addressable tiles contained in a single file.
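
That chunk/file decoupling is what Zarr's sharding provides. A sketch, assuming a zarr-python version (v3 or later) that exposes a `shards=` option; exact names may vary across versions:

```python
# A sketch: many small chunks packed into fewer shard objects. Each
# chunk stays separately compressed and addressable within its shard.
import zarr

z = zarr.create_array(
    store="sharded.zarr",   # placeholder store location
    shape=(8192, 8192),
    chunks=(512, 512),      # unit of compression and addressing
    shards=(2048, 2048),    # 4 x 4 = 16 chunks stored per shard object
    dtype="f4",
)
```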
@@ -640,8 +640,8 @@ our own custom multidimensional data array from a large collection of
 data resources that themselves be arbitrarily organized with respect to
 our specific use case.
 
-Interested in learning more about STAC? If so, head over the [STAC
-Index](https://stacindex.org/), and online resource listing many
+Interested in learning more about STAC? If so, head over to the [STAC
+Index](https://stacindex.org/), an online resource listing many
 published STAC catalogs, along with various related software and
 tooling.
 
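For a quick programmatic taste of STAC, here is a sketch assuming the `pystac-client` package and Element 84's public Earth Search endpoint (one of the catalogs listed on the STAC Index; the collection id is an assumption):

```python
# A sketch: search a public STAC API for items intersecting a bounding
# box and date range, then list the data assets each item points to.
from pystac_client import Client

catalog = Client.open("https://earth-search.aws.element84.com/v1")
search = catalog.search(
    collections=["sentinel-2-l2a"],     # assumed collection id
    bbox=[-122.5, 37.7, -122.3, 37.9],  # lon/lat bounding box
    datetime="2023-06-01/2023-06-30",
)
for item in search.items():
    print(item.id, list(item.assets))
```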
@@ -661,7 +661,7 @@ structured netCDF and GeoTIFF files -- historically successful and
 efficiently used in local storage, but often suboptimal at cloud scale.
 The second represents simple, ad hoc approaches to splitting larger data
 into smaller files, thrown somewhere on a network-accessible server, but
-without efficiently readable overaching metadata and without any optimal
+without efficiently readable overarching metadata and without any optimal
 structure. The next two represent cloud optimized approaches, with data
 split into addressable units described by up-front metadata that clients
 can use to efficiently access the data. The first of these resembles a
@@ -694,7 +694,7 @@ use of external metadata -- whether as Zarr metadata or STAC catalogs --
 that allows clients to issue data requests that "just work"
 regardless of the underlying implementation details.
 
-As a final takeway, insofar as there's a community consensus around the
+As a final takeaway, insofar as there's a community consensus around the
 best approaches for managing data today, it probably looks something
 like this:
 
@@ -703,7 +703,7 @@ like this:
 cloud
 - **Zarr stores** (with intelligent chunking and sharding), potentially
   referenced by STAC catalogs, as the go-to approach for storing and
-  provisioing multidimensional Earth array data in the cloud
+  provisioning multidimensional Earth array data in the cloud
 - **Virtual Zarr stores**, again potentially in conjunction with STAC
   catalogs, as a cost-effective approach for cloud-enabling many legacy
   data holdings in netCDF format
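
For the last bullet, here is a sketch of what cloud-enabling a legacy netCDF file can look like, assuming the `kerchunk` and `fsspec` packages and a placeholder URL; the output is a small JSON reference file, not a copy of the data:

```python
# A sketch: generate virtual Zarr references for a legacy netCDF-4/HDF5
# file so Zarr clients can read it in place via byte-range requests.
import json

import fsspec
from kerchunk.hdf import SingleHdf5ToZarr

url = "https://example.com/data/legacy.nc"  # hypothetical netCDF file
with fsspec.open(url) as f:
    refs = SingleHdf5ToZarr(f, url).translate()

with open("legacy_refs.json", "w") as out:
    json.dump(refs, out)  # lightweight metadata; original bytes stay put
```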
