fhdsl · ehumph · Nov 6, 2025
diff --git a/02-importing_with_tables.Rmd → 01-importing_with_tables.Rmd b/02-importing_with_tables.Rmd → 01-importing_with_tables.Rmd
@@ -1,19 +1,68 @@
-# (PART\*) Uploading Your Own Data {-}
+# (PART\*) Bringing Your Own Data {-}
 
 
 ```{r, include = FALSE}
 ottrpal::set_knitr_image_path()
 ```
 
-# Temporary Stub
+# Uploading from your desktop
 
-Data Tables provide a way to organize data and metadata, including URI links to storage buckets.  These tables are a convenient way to organize input for analyses as well as tracking workflow outputs.
+In this example, we'll upload some genomic data into AnVIL.
 
-```{r, echo=FALSE, fig.alt="Image shows a schematic of the data storage locations in an AnVIL Workspace. The Data Table is highlighted with a number 'three'."}
-ottrpal::include_slide("https://docs.google.com/presentation/d/1H5onDH7cBLK2m7fCcJ6ZodAAQ3wtJO8tNc2rwptrTPM/edit#slide=id.gf982a3c0cd_0_0")
+TODO: Add information about what the data is
+
+The starting point for bringing your own data to AnVIL is the Workspace Dashboard. At the bottom right, you'll find the full path to the Google Bucket information corresponding to your Workspace. You can click the clipboard icon on the right to copy the name of your Workspace Bucket. You will be able to see any uploaded files by clicking the "Open in browser" link.
+
+```{r, echo=FALSE, fig.alt="Image shows a screenshot of the Workspace Dashboard. Google Bucket information, including the Google Bucket name, location, and 'Open in browser' link, at the bottom right of the screen is highlighted."}
+ottrpal::include_slide("https://docs.google.com/presentation/d/1H5onDH7cBLK2m7fCcJ6ZodAAQ3wtJO8tNc2rwptrTPM/edit#slide=id.gf5172664d7_0_142")
+```
+
+::: {.dictionary}
+**Buckets** are the containers used to store files and objects on Google Cloud. Everything you store on Google Cloud _must_ be in a bucket. Each bucket has its own unique name and location (URI). When we move data files into AnVIL workspaces, we use the URI to tell AnVIL where the data should be stored. (We can also use a URI to tell AnVIL where to find the data we want to upload.)
+
+You can read more about Google Cloud buckets [here](https://docs.cloud.google.com/storage/docs/buckets).
+:::
+
+You can also see any uploaded files by clicking the "Files" directory at the bottom left in the Data Tab.
+
+```{r, echo=FALSE, fig.alt='Image shows a screenshot of the Workspace Data tab. The Files directory and link on the bottom left is highlighted.'}
+ottrpal::include_slide("https://docs.google.com/presentation/d/1H5onDH7cBLK2m7fCcJ6ZodAAQ3wtJO8tNc2rwptrTPM/edit#slide=id.gf55fadc51c_0_3")
 ```
 
 
+## Clone Workspace
+
+## Identify bucket path
+
+## Upload file 
+
+## Check file status
+
+## Bring file into a workspace
+
+## Summary
+
+
+# Uploading from the cloud
+
+In this example, we'll upload some genomic data into AnVIL that is currently stored in the cloud (specifically, in a Google bucket).
+
+::: {.dictionary}
+**Buckets** are the containers used to store files and objects on Google Cloud. Everything you store on Google Cloud _must_ be in a bucket. Each bucket has its own unique name and location (URI). When we move data files into AnVIL workspaces, we use the URI to tell AnVIL where the data should be stored. (We can also use a URI to tell AnVIL where to find the data we want to upload.)
+
+You can read more about Google Cloud buckets [here](https://docs.cloud.google.com/storage/docs/buckets).
+:::
+
+We're going to upload some fastq files for a SARS-CoV-2 sample. The bucket we're accessing contains 5 files: two compressed fastq files, a fasta file for a SARS-CoV-2 reference genome, and two uncompressed fastq files. The bucket ID is `fc-80d0e1cd-61e9-472f-b1bd-c6a8223bd1cd`, so its URI is `gs://fc-80d0e1cd-61e9-472f-b1bd-c6a8223bd1cd`. For this activity, you will retrieve the two uncompressed fastq files and upload them into your workspace.
+
+```{r, echo=FALSE, fig.alt="Image shows the contents of a Google bucket used in the SARS-CoV-2 on Galaxy activity."}
+ottrpal::include_slide("https://docs.google.com/presentation/d/1H5onDH7cBLK2m7fCcJ6ZodAAQ3wtJO8tNc2rwptrTPM/edit?slide=id.g3ad3a8a2073_0_0#slide=id.g3ad3a8a2073_0_0")
+```
+
+## Step One: Create your workspace
+
+
+
 The starting point for bringing your own data to AnVIL is the Workspace Dashboard. At the bottom right, you'll find the full path to the Google Bucket information corresponding to your Workspace. You can click the clipboard icon on the right to copy the name of your Workspace Bucket. You will be able to see any uploaded files by clicking the "Open in browser" link.
 
 ```{r, echo=FALSE, fig.alt="Image shows a screenshot of the Workspace Dashboard. Google Bucket information, including the Google Bucket name, location, and 'Open in browser' link, at the bottom right of the screen is highlighted."}
@@ -26,6 +75,46 @@ You can also see any uploaded files by clicking the "Files" directory at the bot
 ottrpal::include_slide("https://docs.google.com/presentation/d/1H5onDH7cBLK2m7fCcJ6ZodAAQ3wtJO8tNc2rwptrTPM/edit#slide=id.gf55fadc51c_0_3")
 ```
 
+
+Data Tables provide a way to organize data and metadata, including URI links to storage buckets. These tables are a convenient way to organize input for analyses as well as tracking workflow outputs.
+
+```{r, echo=FALSE, fig.alt="Image shows a schematic of the data storage locations in an AnVIL Workspace. The Data Table is highlighted with a number 'three'."}
+ottrpal::include_slide("https://docs.google.com/presentation/d/1H5onDH7cBLK2m7fCcJ6ZodAAQ3wtJO8tNc2rwptrTPM/edit#slide=id.gf982a3c0cd_0_0")
+```
+
+
+## Access Data Uploader
+
+## Create a data collection
+
+## Upload data collection
+
+## Upload data table with metadata
+
+## Summary
+
+
+# Uploading from a remote cluster (HPC)
+
+## Install `gsutil` on your local server
+
+## Copy files
+
+## Check file status
+
+## Bring file into a workspace
+
+## Summary
+
+
+# Additional Resources
+
+You can read documentation about bringing your own data to AnVIL on the [Portal](https://anvilproject.org/learn/find-data/bringing-your-own-data).
+
+More details can be found in the [Terra documentation](https://support.terra.bio/hc/en-us/sections/360004147951).
+
+## Information from Getting Started guide
+
 ## Browser: Upload Single Files
 
 Click the "Files" directory at the bottom left of the Data Tab. Then click the "+" button in the bottom right corner of the screen. This will prompt a file browser on your local machine.
@@ -48,7 +137,7 @@ Here, you can upload files and manage your data and folders. You can also upload
 ottrpal::include_slide("https://docs.google.com/presentation/d/1H5onDH7cBLK2m7fCcJ6ZodAAQ3wtJO8tNc2rwptrTPM/edit#slide=id.gf57004a098_0_9")
 ```
 
-# `gsutil`: Local to Cloud
+## `gsutil`: Local to Cloud
 
 `gsutil` is a Python application that lets you access Cloud Storage from the command line in a terminal. The terminal you use can be run on your local machine (local instance) or built into the Workspace Cloud Environment.
 
@@ -100,8 +189,3 @@ gsutil cp users/name/data/test.bam gs://ab5-27x
 Remember that you can easily copy the Workspace Bucket ID using the clipboard button on the [Workspace Dashboard]({#bring-data-overview}). Please see the [`gsutil cp` documentation](https://cloud.google.com/storage/docs/gsutil/commands/cp) for more details, such as how to do parallel multi-threaded/multi-processing copying or copying an entire directory tree. The `gsutil cp` command can also be used to copy files from one Workspace Bucket to another (cloud-to-cloud copying).
 
 
-# Additional Resources
-
-You can read documentation about bringing your own data to AnVIL on the [Portal](https://anvilproject.org/learn/find-data/bringing-your-own-data)
-
-More details can be found in the [Terra documentation](https://support.terra.bio/hc/en-us/sections/360004147951)
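
As a rough illustration of the cloud-to-cloud copy mentioned in the `gsutil cp` passage above, here is a minimal Python sketch that assembles the commands needed to pull the two uncompressed fastq files from the shared bucket into a Workspace Bucket. It assumes `gsutil` is installed and authenticated. The helper functions, the destination bucket ID, and the two file names are hypothetical placeholders (the diff does not list the actual file names). The script only prints the commands and does not run them.

```python
import shlex


def gs_uri(bucket_id, object_path=""):
    """Return a gs:// URI for a bucket ID and an optional object path."""
    # Accept either a bare bucket ID or one already prefixed with gs://
    if bucket_id.startswith("gs://"):
        bucket_id = bucket_id[len("gs://"):]
    bucket_id = bucket_id.strip("/")
    if not bucket_id:
        raise ValueError("bucket_id must not be empty")
    return "gs://{}/{}".format(bucket_id, object_path.lstrip("/"))


def gsutil_cp_command(source, destination):
    """Return the argument list for `gsutil cp SOURCE DESTINATION`."""
    return ["gsutil", "cp", source, destination]


# Bucket ID quoted in the "Uploading from the cloud" section.
SOURCE_BUCKET = "fc-80d0e1cd-61e9-472f-b1bd-c6a8223bd1cd"
# Hypothetical placeholders: replace with your Workspace Bucket ID
# and the real names of the two uncompressed fastq files.
WORKSPACE_BUCKET = "fc-your-workspace-bucket"
FASTQ_FILES = ["example_R1.fastq", "example_R2.fastq"]

commands = [
    gsutil_cp_command(gs_uri(SOURCE_BUCKET, name), gs_uri(WORKSPACE_BUCKET))
    for name in FASTQ_FILES
]

for cmd in commands:
    # Dry run: print only. To perform the copy, pass cmd to
    # subprocess.run(cmd, check=True) instead.
    print(shlex.join(cmd))
```

Printing the commands first makes it easy to check the source and destination URIs before any data moves.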
diff --git a/01-intro.Rmd b/01-intro.Rmd
diff --git a/03-data_explorer.Rmd → 02-data_explorer.Rmd b/03-data_explorer.Rmd → 02-data_explorer.Rmd
@@ -33,7 +33,7 @@ ottrpal::include_slide("https://docs.google.com/presentation/d/1H5onDH7cBLK2m7fC
 
 # AnVIL Data Explorer
 
-The [AnVIL Data Explorer](https://explore.anvilproject.org/datasets) enables faceted searches of open and managed access datasets hosted in AnVIL, making it easier for researchers to find and custom-build cohorts. 
+The AnVIL Data Explorer enables faceted searches of open and managed access datasets hosted in AnVIL, making it easier for researchers to find and custom-build cohorts. 
 
 ```{r, echo=FALSE, fig.alt='Image shows a screenshot of the AnVIL Data Explorer website landing page.'}
 ottrpal::include_slide("https://docs.google.com/presentation/d/1H5onDH7cBLK2m7fCcJ6ZodAAQ3wtJO8tNc2rwptrTPM/edit#slide=id.g30d935bde8e_0_0")

diff --git a/04-importing_with_SRA.Rmd → 03-importing_with_SRA.Rmd b/04-importing_with_SRA.Rmd → 03-importing_with_SRA.Rmd
@@ -1,17 +1,21 @@
 
-# (PART\*) SRA ON AnVIL {-}
+# (PART\*) Importing Data from SRA {-}
 
 
 ```{r, include = FALSE}
 ottrpal::set_knitr_image_path()
 ```
 
-# Quick Start {#quick-start}
+# Quick Start: Importing a single file {#quick-start-sra}
 
-In this module, we'll bring some metagenomic data into AnVIL.
+In this example, we'll bring some metagenomic data into AnVIL.
 
 This data comes from [this BioProject](https://www.ncbi.nlm.nih.gov/bioproject/PRJNA904247), which collected soil samples to study bacterial communities in tallgrass prairie. Bacteria play an important role in this ecosystem, but can be changed by disturbance, management, and the presence of herbivores.
 
+We will bring this data into AnVIL from the **Sequence Read Archive**, or SRA. You can check out the [SRA website](https://www.ncbi.nlm.nih.gov/sra) to learn more:
+
+> Sequence Read Archive (SRA) data, available through multiple cloud providers and NCBI servers, is the largest publicly available repository of high throughput sequencing data. The archive accepts data from all branches of life as well as metagenomic and environmental surveys. SRA stores raw sequencing data and alignment information to enhance reproducibility and facilitate new discoveries through data analysis. 
+
 The SRA Data corresponding to this project is located [here](https://www.ncbi.nlm.nih.gov/Traces/study/?acc=SRP409181&o=acc_s%3Aa).
 
 ```{r, fig.align='center', echo = FALSE, fig.alt= "Microbiome diversity has many beneficial properties ranging from soil to plant health.", out.width = '100%'}

diff --git a/05-controlled_access_data.Rmd → 04-controlled_access_data.Rmd b/05-controlled_access_data.Rmd → 04-controlled_access_data.Rmd
diff --git a/Feedback.Rmd b/Feedback.Rmd
@@ -0,0 +1,17 @@
+# (PART\*) Appendix {-}
+
+
+```{r, include = FALSE}
+ottrpal::set_knitr_image_path()
+```
+
+# Give us Feedback
+
+Thank you for your interest in this book! There are a few ways you can suggest improvements:
+
+<br>
+<!-- The capital letter above alters the formatting for the numbered points below -->
+
+1. Fill out this [Google form](https://docs.google.com/forms/d/e/1FAIpQLScrDVb_utm55pmb_SHx-RgELTEbCCWdLea0T3IzS0Oj00GE4w/viewform?usp=pp_url&entry.1565230805=AnVIL+Book+Getting+Started){target="_blank"}.
+1. If you have a GitHub account, you can [raise an issue](https://github.com/fhdsl/Data_on_AnVIL/issues){target="_blank"} in our repository.
+1. Submit a pull request!  Click the pencil icon on any page (top left) to view the source `.Rmd` for the page and suggest changes.
diff --git a/_bookdown.yml b/_bookdown.yml
@@ -2,11 +2,11 @@ book_filename: "Data on AnVIL"
 chapter_name: "Chapter"
 repo: https://github.com/jhudsl/AnVIL_Template/
 rmd_files: ["index.Rmd",
-            "01-intro.Rmd",
-            "02-importing_with_tables.Rmd",
-            "03-data_explorer.Rmd",
-            "04-importing_with_SRA.Rmd",
-            "05-controlled_access_data.Rmd",
+            "01-importing_with_tables.Rmd",
+            "02-data_explorer.Rmd",
+            "03-importing_with_SRA.Rmd",
+            "04-controlled_access_data.Rmd",
+            "Feedback.Rmd",
             "About.Rmd",
             "References.Rmd"]
 new_session: yes

diff --git a/index.Rmd b/index.Rmd
@@ -1,40 +1,41 @@
 ---
-title: "AnVIL Book Name"
+title: "Data on AnVIL"
 date: "`r format(Sys.time(), '%B %d, %Y')`"
 site: bookdown::bookdown_site
 documentclass: book
 bibliography: book.bib
 biblio-style: apalike
 link-citations: yes
-description: Description about Course/Book.
+description: This book contains vignettes on how to upload, find, and use data within an AnVIL workspace.
 favicon: assets/AnVIL_style/anvil_favicon.ico
 ---
 
 
 # About this Book {-}
 
-This book is part of a series of books for the Genomic Data Science Analysis, Visualization, and Informatics Lab-space (AnVIL) of the National Human Genome Research Institute (NHGRI). Learn more about AnVIL by visiting https://anvilproject.org or reading the [article in Cell Genomics](https://www.sciencedirect.com/science/article/pii/S2666979X21001063).
+The chapters within this book contain hands-on activities to demonstrate how users can access and use data within an AnVIL workspace. Topics include bringing your own data from an HPC, finding data already hosted on AnVIL with tools like the Data Explorer, importing data from online data repositories like SRA, and getting access to protected data stored in places like dbGaP.
+
+It can be very exciting to learn how much data is at your fingertips! Once you have settled on some data to use, you'll want to bring it into AnVIL if it's not already there.
+
+Navigate to the menu on the left to get started!
 
 ## Skills Level {-} 
 
 ::: {.notice}
 _Genetics_
-<!-- **Novice**: no genetics knowledge needed -->
+
+**Novice**: no genetics knowledge needed
 
 _Programming skills_
-<!-- **Novice**: no programming experience needed -->
+
+**Novice**: no programming experience needed
 :::
 
 ## AnVIL Collection {-}
 
-Please check out our full collection of AnVIL and related resources: https://hutchdatascience.org/AnVIL_Collection/
-
-# Learning Objectives {-}
+This module is part of a series of books for the Genomic Data Science Analysis, Visualization, and Informatics Lab-space (AnVIL) of the National Human Genome Research Institute (NHGRI). 
 
-<!-- Learning objectives for this activity come from the [Genetics Core Competencies](https://genetics-gsa.org/education/genetics-learning-framework/): -->
+Please check out our full collection of AnVIL and related resources: https://hutchdatascience.org/AnVIL_Collection/
 
-<!-- - Objective 1 -->
-<!-- - Objective 2 -->
-<!-- - Objective 3 -->
+Learn more about AnVIL by visiting https://anvilproject.org or reading the [article in Cell Genomics](https://www.sciencedirect.com/science/article/pii/S2666979X21001063).
 
-<!-- Please also see the Bioinformatics core competencies for undergraduate life sciences education from NIBLSE: https://journals.plos.org/plosone/article/figure?id=10.1371/journal.pone.0196878.t002 -->
diff --git a/resources/dictionary.txt b/resources/dictionary.txt
@@ -22,6 +22,7 @@ Glimma
 glimmaMDS
 Gmail
 GTEx
+HPC
 impactful
 Inclusivity
 ingressing
@@ -54,7 +55,9 @@ timeframe
 TSA
 TSV
 underserved
+Uploader
 URI
+workspaces
 Workspaces
 Workspace's
 www