Commit b450737

Paul-Cornell, ajaykrish2303, and ajay23-uns authored
[Hold] Single local file UI quickstart (#566)
Co-authored-by: ajaykrish2303 <[email protected]> Co-authored-by: ajay23-uns <[email protected]>
1 parent 1a2e77d commit b450737

16 files changed: +234 -32 lines changed

api-reference/workflow/overview.mdx

Lines changed: 1 addition & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -17,7 +17,7 @@ scenarios as well as for documentation, reporting, and recovery needs.
1717

1818
Choose one of the following options to get started with the Unstructured Workflow Endpoint:
1919

20-
- Follow the [quickstart](#quickstart), which uses the Unstructured Python SDK from a remote hosted Google Collab notebook.
20+
- Follow the [quickstart](#quickstart), which uses the Unstructured Python SDK from a remote hosted Google Colab notebook.
2121
- Start using the [Unstructured Python SDK](#unstructured-python-sdk).
2222
- Start using a [REST](#rest-endpoints) client, such as `curl` or Postman.
2323

examplecode/tools/vectorshift.mdx

Lines changed: 2 additions & 2 deletions
@@ -24,7 +24,7 @@ allowfullscreen
2424
></iframe>
2525

2626
import PineconeShared from '/snippets/general-shared-text/pinecone.mdx';
27-
import GetStartedSimpleUIAPI from '/snippets/general-shared-text/get-started-simple-ui-api.mdx';
27+
import GetStartedSimpleAPIOnly from '/snippets/general-shared-text/get-started-simple-api-only.mdx';
2828

2929
<PineconeShared />
3030

@@ -34,7 +34,7 @@ Also:
3434
- [Sign up for a VectorShift Starter account](https://app.vectorshift.ai/api/signup).
3535
- Sign up for an Unstructured account:
3636

37-
<GetStartedSimpleUIAPI />
37+
<GetStartedSimpleAPIOnly />
3838

3939
## Create and run the demonstration project
4040


img/ui/Workflow-Test-Source.png

27.9 KB

snippets/general-shared-text/get-started-simple-ui-api.mdx

Lines changed: 0 additions & 16 deletions
This file was deleted.
Lines changed: 10 additions & 0 deletions
@@ -0,0 +1,10 @@
1+
Go to [https://platform.unstructured.io](https://platform.unstructured.io) and use your email address, Google account, or GitHub account to
2+
sign up for an Unstructured account (if you do not already have one) and sign in to the account at the same time. The
3+
[Unstructured user interface (UI)](/ui/overview) appears, and you can start using it right away.
4+
5+
<Tip>
6+
By following the preceding instructions, you are signed up for a [Developer](https://unstructured.io/developers) pay-per-page account by default.
7+
8+
To save money, consider switching to a [Subscribe & Save](https://unstructured.io/subscribeandsave) account instead. To save even more money,
9+
consider switching to an [Enterprise](https://unstructured.io/enterprise) account instead.
10+
</Tip>

snippets/quickstarts/platform-api.mdx

Lines changed: 1 addition & 1 deletion
@@ -1,5 +1,5 @@
11
This quickstart uses the Unstructured Python SDK to call the Unstructured Workflow Endpoint to get your data RAG-ready. The Python code for this
2-
quickstart is in a remote hosted Google Collab notebook. Data is processed on Unstructured-hosted compute resources.
2+
quickstart is in a remote hosted Google Colab notebook. Data is processed on Unstructured-hosted compute resources.
33

44
The requirements are as follows:
55

snippets/quickstarts/platform.mdx

Lines changed: 2 additions & 2 deletions
@@ -16,11 +16,11 @@ allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; pic
1616
allowfullscreen
1717
></iframe>
1818

19-
import GetStartedSimpleUIAPI from '/snippets/general-shared-text/get-started-simple-ui-api.mdx';
19+
import GetStartedSimpleUIOnly from '/snippets/general-shared-text/get-started-simple-ui-only.mdx';
2020

2121
<Steps>
2222
<Step title="Sign up and sign in">
23-
<GetStartedSimpleUIAPI />
23+
<GetStartedSimpleUIOnly />
2424
</Step>
2525
<Step title="Set the source (input) location">
2626
![Sources in the sidebar](/img/ui/Sources-Sidebar.png)
Lines changed: 166 additions & 0 deletions
@@ -0,0 +1,166 @@
1+
This quickstart uses a no-code, point-and-click user interface (UI) in your web browser to have Unstructured process a single file that is stored on your local machine.
2+
3+
The file is first processed on Unstructured-hosted compute resources. The UI then shows the processed data that Unstructured generates for that file.
4+
You can download that processed data as a `.json` file to your local machine.
5+
6+
This approach enables rapid, local, run-adjust-repeat prototyping of end-to-end Unstructured ETL+ workflows with a full range of Unstructured features.
7+
After you get the results you want, you can then attach remote source and destination connectors to both ends of your existing workflow to begin processing remote files and data at scale in production.
8+
9+
To run this quickstart, you will need a local file with a size of 10 MB or less and one of the following file types:
10+
11+
| File type |
12+
|---|
13+
| `.bmp` |
14+
| `.csv` |
15+
| `.doc` |
16+
| `.docx` |
17+
| `.email` |
18+
| `.epub` |
19+
| `.heic` |
20+
| `.html` |
21+
| `.jpg` |
22+
| `.md` |
23+
| `.odt` |
24+
| `.org` |
25+
| `.pdf` |
26+
| `.pot` |
27+
| `.potm` |
28+
| `.ppt` |
29+
| `.pptm` |
30+
| `.pptx` |
31+
| `.rst` |
32+
| `.rtf` |
33+
| `.sgl` |
34+
| `.tiff` |
35+
| `.txt` |
36+
| `.tsv` |
37+
| `.xls` |
38+
| `.xlsx` |
39+
| `.xml` |
40+
41+
<Note>
42+
For processing remote files at scale in production, Unstructured supports many more file types than these. [See the list of supported file types](/ui/supported-file-types).
43+
44+
Unstructured also supports processing files from remote object stores, and data from remote sources in websites, web apps, databases, and vector stores. For more information, see the [source connector overview](/ui/sources/overview) and the [remote quickstart](/ui/quickstart#remote-quickstart)
45+
for how to set up and run production-ready Unstructured ETL+ workflows at scale.
46+
</Note>
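
The file requirements listed above can be checked locally before you open the UI. The following is a minimal Python sketch, not part of Unstructured's tooling: the `check_local_file` helper and its constants are hypothetical names introduced here for illustration. It assumes that "10 MB" can be treated conservatively as 10,000,000 bytes and that file extensions are matched case-insensitively; the UI's exact rules may differ.

```python
from pathlib import Path

# Assumption: treat "10 MB" conservatively as 10,000,000 bytes.
MAX_SIZE_BYTES = 10 * 1000 * 1000

# File types copied from the table in this quickstart.
SUPPORTED_EXTENSIONS = {
    ".bmp", ".csv", ".doc", ".docx", ".email", ".epub", ".heic", ".html",
    ".jpg", ".md", ".odt", ".org", ".pdf", ".pot", ".potm", ".ppt", ".pptm",
    ".pptx", ".rst", ".rtf", ".sgl", ".tiff", ".txt", ".tsv", ".xls",
    ".xlsx", ".xml",
}


def check_local_file(path: str) -> list[str]:
    """Return a list of problems; an empty list means the file looks eligible."""
    file_path = Path(path)
    problems = []

    # Assumption: extension matching is case-insensitive.
    if file_path.suffix.lower() not in SUPPORTED_EXTENSIONS:
        problems.append(f"unsupported file type: {file_path.suffix or '(none)'}")

    if not file_path.is_file():
        problems.append("file not found")
    elif file_path.stat().st_size > MAX_SIZE_BYTES:
        problems.append("file is larger than 10 MB")

    return problems
```

For example, `check_local_file("report.pdf")` returns an empty list if `report.pdf` exists and is within the size limit.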
47+
48+
If you do not have any files available, you can use one of the sample files that Unstructured offers in the UI. Or, you can download one or more sample files
49+
from the [example-docs](https://github.com/Unstructured-IO/unstructured-ingest/tree/main/example-docs) folder in the Unstructured repo on GitHub.
50+
51+
import GetStartedSimpleUIOnly from '/snippets/general-shared-text/get-started-simple-ui-only.mdx';
52+
53+
<Steps>
54+
<Step title="Sign up and sign in">
55+
<GetStartedSimpleUIOnly />
56+
</Step>
57+
<Step title="Create a workflow">
58+
1. In the Unstructured UI, on the sidebar, click **Workflows**.
59+
60+
![Workflows in the sidebar](/img/ui/Workflows-Sidebar.png)
61+
62+
2. Click **New Workflow**.
63+
3. Select **Build it Myself**, if it is not already selected.
64+
4. Click **Continue**. The visual workflow editor appears.
65+
66+
![Visual workflow designer](/img/ui/Workflow-Single-File-Design.png)
67+
68+
The workflow is represented visually as a series of directed acyclic graph (DAG) nodes. Each node represents a
69+
step in the workflow. The workflow proceeds end to end from left to right. By default, the workflow starts with three nodes:
70+
71+
- **Source**: This node represents the location where you have your files or data for Unstructured to process. For this quickstart, this node represents a single file on your local machine.
72+
After you get the results you want, you can update this node to represent files or data in a remote location at scale in production.
73+
- **Partitioner**: This node represents the [partitioning](/ui/partitioning) step, which extracts content from unstructured files and data and outputs it as structured
74+
[document elements](/ui/document-elements) for consistent representation across varying kinds of file and data types. For this quickstart, this node extracts the contents of a single file on
75+
your local machine and outputs it as a series of structured document elements in JSON format.
76+
- **Destination**: This node represents the location where you want Unstructured to put the processed files or data.
77+
After you get the results you want, you can update this node to have Unstructured put the processed files or data into a remote location at scale in production.
78+
79+
</Step>
80+
<Step title="Process a local file">
81+
1. Drag the file that you want Unstructured to process from your local machine's file browser app and drop it into the **Source** node's **Drop file to test** area.
82+
The file must have a size of 10 MB or less and one of the file types listed at the beginning of this quickstart.
83+
84+
If you are not able to drag and drop the file, you can click **Drop file to test** and then browse to and select the file instead.
85+
86+
Alternatively, you can use a sample file that Unstructured offers. To do this, click the **Source** node, and then in the **Source** pane, with
87+
**Details** selected, on the **Local file** tab, click one of the files under **Or use a provided sample file**. To view the file's contents before you
88+
select it, click the eyes button next to the file.
89+
90+
2. Above the **Source** node, click **Test**.
91+
92+
![Testing a single local file workflow](/img/ui/Workflow-Test-Source.png)
93+
94+
Unstructured displays a visual representation of the file and begins processing its contents, sending it through each of the workflow's nodes in sequence.
95+
Depending on the file's size and the workflow's complexity, this processing could take several minutes.
96+
97+
After Unstructured has finished its processing, the processed data appears in the **Test output** pane, as a series of structured elements in JSON format.
98+
99+
![Viewing single local file output](/img/ui/Workflow-Test-Single-File-Output.png)
100+
101+
3. In the **Test output** pane, you can:
102+
103+
- Search through the processed, JSON-formatted representation of the file by using the **Search JSON** box.
104+
- Download the full JSON as a `.json` file to your local machine by clicking **Download full JSON**.
105+
106+
4. When you are done, click the **Close** button in the **Test output** pane.
107+
108+
</Step>
109+
<Step title="Add more nodes to the workflow">
110+
1. You can now add more nodes to the workflow to further test various Unstructured features, with the option of eventually moving the workflow into production. For example, you can:
111+
112+
![Adding a node to the workflow](/img/ui/Workflow-Add-Node.png)
113+
114+
- Add a **Chunker** node after the **Partitioner** node, to chunk the partitioned data into smaller pieces for your retrieval augmented generation (RAG) applications.
115+
To do this, click the add (**+**) button to the right of the **Partitioner** node, and then click **Enrich > Chunker**. Click the new **Chunker** node and
116+
specify its settings. For help, click the **FAQ** button in the **Chunker** node's pane. [Learn more about chunking and chunker settings](/ui/chunking).
117+
- Add an **Enrichment** node after the **Chunker** node, to apply enrichments to the chunked data such as image summaries, table summaries, table-to-HTML transforms, and
118+
named entity recognition (NER). To do this, click the add (**+**) button to the right of the **Chunker** node, and then click **Enrich > Enrichment**.
119+
Click the new **Enrichment** node and specify its settings. For help, click the **FAQ** button in the **Enrichment** node's pane. [Learn more about enrichments and enrichment settings](/ui/enriching/overview).
120+
- Add an **Embedder** node after the **Enrichment** node, to generate vector embeddings for performing vector-based searches. To do this, click the add (**+**) button to the
121+
right of the **Enrichment** node, and then click **Transform > Embedder**. Click the new **Embedder** node and specify its settings. For help, click the **FAQ** button
122+
in the **Embedder** node's pane. [Learn more about embedding and embedding settings](/ui/embedding).
123+
124+
2. Each time you add a node or change its settings, you can click **Test** above the **Source** node again to test the current workflow end to end and see the results of the changes, if any.
125+
126+
3. Repeat this step as many times as needed until you get the results you want.
127+
128+
</Step>
129+
<Step title="Next steps">
130+
After you get the results you want, you have the option of moving your workflow into production. To do this, complete the following instructions.
131+
132+
<Note>
133+
The following instructions have you create a new workflow that is suitable for production.
134+
This limitation is planned to be removed in a future release, which will allow you to update the workflow that you just created instead of creating a new one.
135+
</Note>
136+
137+
1. With your workflow remaining open in the visual workflow editor, open a new tab in your web browser, and in this new tab,
138+
sign in to your Unstructured account, at [https://platform.unstructured.io](https://platform.unstructured.io).
139+
2. In this new tab, create a [source connector](/ui/sources/overview) for your remote source location. This is the location in production where you have files or data in a file or object store, website, database, or vector store that you want Unstructured to process.
140+
141+
![Connectors button on the sidebar](/img/ui/Sources-Sidebar.png)
142+
143+
3. Create a [destination connector](/ui/destinations/overview) for your remote destination location. This is the location in production where you want Unstructured to put the processed data as `.json` files in a file or object store, or as records in a database or vector store.
144+
4. Create a workflow: on the sidebar, click **Workflows**, and then click **New Workflow**. Select **Build it Myself**, and then click **Continue** to open the visual workflow editor.
145+
5. In the visual workflow editor, click **Source**.
146+
6. In the **Source** pane, with **Details** selected, on the **Connectors** tab, select the source connector that you just created.
147+
7. Click the **Destination** node.
148+
8. In the **Destination** pane, with **Details** selected, select the destination connector that you just created.
149+
9. Using your original workflow on the other tab as a guide, add any additional nodes to this new workflow as needed, and configure the new nodes' settings to match those in the original workflow.
150+
10. Click **Save**.
151+
11. To run the workflow:
152+
153+
a. Make sure to click **Save** first.<br/>
154+
b. Click the **Close** button next to the workflow's name in the top navigation bar.<br/>
155+
c. On the sidebar, click **Workflows**.<br/>
156+
d. In the list of available workflows, click the **Run** button for the workflow that you just saved.<br/>
157+
e. On the sidebar, click **Jobs**.<br/>
158+
159+
![Viewing the list of available jobs](/img/ui/Select-Job.png)
160+
161+
f. In the list of available jobs, click the job that you just ran.<br/>
162+
g. After the job status shows **Finished**, go to your destination location to see the processed files or data that Unstructured put there.
163+
164+
See also the [remote quickstart](/ui/quickstart#remote-quickstart) for more details on how to set up and run production-ready Unstructured ETL+ workflows at scale.
165+
</Step>
166+
</Steps>
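
After you download the full JSON from the **Test output** pane, you might want to inspect it locally. The following is a minimal Python sketch, offered as an illustration rather than an official utility. It assumes that the downloaded file contains a JSON array of element objects and that each element has a `type` field; the helper names `load_elements` and `summarize_elements` are hypothetical.

```python
import json
from collections import Counter


def load_elements(path: str) -> list[dict]:
    """Load a downloaded output file, assumed to be a JSON array of elements."""
    with open(path, encoding="utf-8") as f:
        data = json.load(f)
    if not isinstance(data, list):
        raise ValueError("expected a JSON array of document elements")
    return data


def summarize_elements(elements: list[dict]) -> Counter:
    """Count elements by their 'type' field (assumed field name)."""
    return Counter(element.get("type", "Unknown") for element in elements)
```

For example, `summarize_elements(load_elements("output.json"))` gives a per-type count, which can help when comparing test runs after you add or reconfigure nodes. The file name `output.json` is a placeholder for whatever name you saved the download under.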

ui/enriching/image-descriptions.mdx

Lines changed: 4 additions & 1 deletion
@@ -57,4 +57,7 @@ Select **Image**, and then choose one of the following provider (and model) comb
5757
- **OpenAI (GPT-4o)**. [Learn more](https://openai.com/index/hello-gpt-4o/).
5858
- **Anthropic (Claude 3.5 Sonnet)**. [Learn more](https://www.anthropic.com/news/claude-3-5-sonnet).
5959
- **Amazon Bedrock (Claude 3.5 Sonnet)**. [Learn more](https://aws.amazon.com/bedrock/claude/).
60-
- **Vertex AI (Gemini 2.0 Flash)**. [Learn more](https://cloud.google.com/vertex-ai/generative-ai/docs/gemini-v2).
60+
- **Vertex AI (Gemini 2.0 Flash)**. [Learn more](https://cloud.google.com/vertex-ai/generative-ai/docs/gemini-v2).
61+
62+
You can also customize the prompt used for image enrichment. In the **Details** tab, under **Prompt**, click **Edit**. Use the preview button on the
63+
**Input Sample** tab to view the image input, and click **Run Prompt** in the **Edit & Test Prompt** section to test the prompt.

ui/enriching/overview.mdx

Lines changed: 3 additions & 0 deletions
@@ -22,4 +22,7 @@ To add an enrichment, in an **Enrichment** node in a workflow, select one of the
2222
- **Table** can also provide a representation of each detected table in HTML markup format. [Learn more](/ui/enriching/table-to-html).
2323
- **Text** to provide a list of recognized entities and their types by using a technique called _named entity recognition_ (NER). [Learn more](/ui/enriching/ner).
2424

25+
All enrichment types also support custom prompts. In the **Details** tab, first select the input type (**Image**, **Table**, or **Text**) and choose a provider (and model) combination. Then, under **Prompt**, click **Edit**.
26+
For **Image** and **Table**, use the preview button on the **Input Sample** tab to see the input before running the prompt. You can test your custom prompt in the **Edit & Test Prompt** section by clicking **Run Prompt**.
27+
2528
To add multiple enrichments, create an additional **Enrichment** node for each enrichment type that you want to add.

ui/enriching/table-descriptions.mdx

Lines changed: 4 additions & 1 deletion
@@ -67,4 +67,7 @@ Select **Table**, and then choose one of the following provider (and model) comb
6767
- **Vertex AI (Gemini 2.0 Flash)**. [Learn more](https://cloud.google.com/vertex-ai/generative-ai/docs/gemini-v2).
6868

6969
Make sure after you choose the provider and model, that **Table Description** is also displayed. If **Table Description** and **Table to HTML** are both
70-
displayed, be sure to select **Table Description**.
70+
displayed, be sure to select **Table Description**.
71+
72+
You can also customize the prompt used for table enrichment. In the **Details** tab, under **Prompt**, click **Edit**. Use the preview button on the
73+
**Input Sample** tab to view the table input, and click **Run Prompt** in the **Edit & Test Prompt** section to test the prompt.
