Invoice parsing
vl-albedo committed Nov 2, 2023
1 parent 0a18fe1 commit 60f714e
Showing 5 changed files with 312 additions and 1 deletion.
2 changes: 1 addition & 1 deletion ocr/_index.md
@@ -37,5 +37,5 @@ Even the complex recognition tasks can be done with a couple of API calls. You d
Master the Aspose.OCR Cloud API and build your own cross-platform OCR applications.
- [How-to's](/ocr/how-to/)
Find answers to the most common questions you may have when using Aspose.OCR Cloud.
- [Release Notes](https://releases.aspose.cloud/ocr/release-notes/)
- [Release Notes](/ocr/release-notes/)
Read a summary of recent changes, enhancements and bug fixes in Aspose.OCR Cloud.
33 changes: 33 additions & 0 deletions ocr/developer-reference/recognize-parse-invoice/_index.md
@@ -0,0 +1,33 @@
---
weight: 70
date: "2023-11-02"
author: "Vladimir Lapin"
type: docs
url: /recognize-parse-invoice/
feedback: OCRCLOUD
title: Extracting information from scanned invoices
description: Extract information such as numbers, dates, items, and totals from scanned invoices using the Aspose.OCR Cloud API.
keywords:
- OCR
- recognize
- invoice
- parse
- details
- statement
---

Invoices are commonly exchanged as scanned documents in many business and financial transactions. Aspose.OCR Cloud goes beyond traditional optical character recognition: it uses natural language processing to extract specific information, such as dates, supplier and receiver details, and totals, from scanned invoices. The results can be used to generate summary reports, stored in a database, or integrated into accounting, financial, and banking software.

The processing is performed in 3 API calls:

1. [Get access token](/ocr/authorization/)
2. [Send invoice for recognition](/ocr/send-invoice-for-recognition/)
3. [Fetch machine-readable invoice data](/ocr/fetch-invoice-recognition-result/)
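A minimal shell sketch of step 1. It assumes the client-credentials token endpoint `https://api.aspose.cloud/connect/token` with the `grant_type`, `client_id`, and `client_secret` form fields and an `access_token` property in the JSON response; verify these against the [authorization](/ocr/authorization/) article before relying on them. The function name `get_access_token` and the `ASPOSE_TOKEN` variable are hypothetical conventions used only for illustration.

```bash
# Hypothetical helper: request an access token using the client credentials flow.
# Assumption: token endpoint URL and form field names; confirm them in the Authorization article.
# Usage: ASPOSE_TOKEN=$(get_access_token "<Client Id>" "<Client Secret>")
get_access_token() {
  response=$(curl --silent --show-error --fail --location --request POST \
    'https://api.aspose.cloud/connect/token' \
    --header 'Content-Type: application/x-www-form-urlencoded' \
    --data-urlencode 'grant_type=client_credentials' \
    --data-urlencode "client_id=$1" \
    --data-urlencode "client_secret=$2") || return 1
  # Naive extraction of "access_token" from the JSON response (no jq required).
  token=$(printf '%s' "$response" | tr -d '\n' \
    | sed -n 's/.*"access_token": *"\([^"]*\)".*/\1/p')
  # Fail if the response did not contain a token.
  [ -n "$token" ] || return 1
  printf '%s\n' "$token"
}
```

The printed token is then passed as `Authorization: Bearer <token>` in steps 2 and 3.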

Because Aspose.OCR Cloud is provided as a REST API, invoice processing can be performed from any platform with Internet access.

Aspose also provides open-source [SDKs](/ocr/invoice-recognition-sdk/) for all popular programming languages that wrap routine invoice processing into a few native methods. This makes interaction with Aspose.OCR Cloud services much easier, allowing you to focus on the task at hand rather than technical details.

{{% alert color="primary" %}}
Make sure the application has access to the **api.aspose.cloud** domain.
{{% /alert %}}
@@ -0,0 +1,123 @@
---
weight: 20
date: "2023-11-02"
author: "Vladimir Lapin"
type: docs
url: /fetch-invoice-recognition-result/
feedback: OCRCLOUD
title: Fetching invoice processing result
description: How to get the parsed invoice data from the Aspose.OCR Cloud queue.
keywords:
- OCR
- recognize
- queue
- get
- obtain
- fetch
- result
- invoice
---

When an invoice is [submitted](/ocr/send-invoice-for-recognition/) for processing, it is [queued](/ocr/recognition-workflow/) to ensure a stable response even under high load. To obtain the result, send a **GET** request to the `https://api.aspose.cloud/v5.0/ocr/RecognizeAndParseInvoice` Aspose.OCR Cloud REST API endpoint. To authorize the request, pass the [access token](/ocr/authorization/) in the **Authorization** header (_Bearer authentication_).

Provide the [unique identifier](/ocr/send-invoice-for-recognition/#return-value) of the invoice processing task in the `id` query parameter:

```bash
curl --request GET --location 'https://api.aspose.cloud/v5.0/ocr/RecognizeAndParseInvoice?id=39b37b24-86e8-4e91-9a99-6c2574853eb5' \
--header 'Authorization: Bearer eyJhbGciOiJSUzI1NiIsInR5cCI6IkpXVCJ9...HaRYOxBcCRCPLnrFCVXpw7UA'
```

## Processing results

The processing result is returned in JSON format in the response body.

```json
{
"id": "39b37b24-86e8-4e91-9a99-6c2574853eb5",
"responseStatusCode": "Ok",
"taskStatus": "Completed",
"results": [
{
"type": "Text",
"data": "eyJpc3N1ZV9kYXRlIjogIjIwMTctMTEtMjciL...sICJhY2NvdW50IjogIjJ4eHh4MG9veGsifQ=="
}
],
"error": null
}
```

{{% alert color="primary" %}}
Processing results are stored in the Aspose cloud and can be obtained by the task ID within **24 hours** after the invoice was sent to Aspose.OCR Cloud.
{{% /alert %}}

Property | Type | Description
--------- | ---- | -----------
`id` | string | Unique identifier of the invoice processing task. Equals the value of the `id` request parameter.
`taskStatus` | string | [Current state](#task-statuses) of the invoice processing task in the queue.
`responseStatusCode` | string | Processing response status.
`results` | object[] | Processing results. The `data` property of each item contains the [invoice details](#invoice-details) as a Base64 encoded JSON string.<br />You must decode it before deserializing it into an object, displaying it on the screen, or saving it to a file.
`error/messages` | string[] | Processing error messages, if any.<br />Even if the invoice was processed, you can still get notifications and warnings about non-fatal processing errors.
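The `data` value can be decoded in a shell with standard tools. The sketch below uses a hypothetical `decode_invoice_data` function, assumes a single item in `results`, and extracts the value with `sed`, which is adequate for the flat response shown above; a real JSON parser such as `jq` is the safer choice for production code.

```bash
# Hypothetical helper: print the decoded invoice JSON from a RecognizeAndParseInvoice response.
# Assumes one item in "results"; the sed-based extraction is a sketch, not a JSON parser.
decode_invoice_data() {
  printf '%s' "$1" | tr -d '\n' \
    | sed -n 's/.*"data": *"\([^"]*\)".*/\1/p' \
    | base64 --decode
}
```

For example, `decode_invoice_data "$(cat response.json)" > invoice.json` saves the parsed invoice to a file.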

## Invoice details

The parsed invoice contents are returned in JSON format:

```json
{
"issue_date": "2017-11-27",
"due_date": "",
"supplier_name": "abc exports",
"supplier_address": "4300 longbeach blvd longbeach california 90807 united states",
"supplier_email": "",
"supplier_phone": "15627349957",
"supplier_tax_id": "",
"receiver_name": "abc imports",
"receiver_address": "140 wecker road manstield brisbane queensland 4122 australia",
"receiver_tax_id": "",
"currency": "usd",
"total_amount": 43550.0,
"vat": -1,
"net_amount": 43550.0,
"bank_name": "bank of america",
"bic": "",
"account": "2xxxx0ooxk"
}
```

At the moment, Aspose.OCR Cloud recognizes the following invoice data:

Property | Type | Description
-------- | ---- | -----------
`issue_date` | string | Invoice issue date in _YYYY-MM-DD_ format.
`due_date` | string | Invoice due date in _YYYY-MM-DD_ format.
`supplier_name` | string | Supplier or service provider name.
`supplier_address` | string | Supplier or service provider address (as one string).
`supplier_email` | string | Supplier or service provider email address.
`supplier_phone` | string | Supplier or service provider phone number (as provided in the invoice, without conversion to international format).
`supplier_tax_id` | string | Supplier or service provider TIN or similar ID.
`receiver_name` | string | Receiver name.
`receiver_address` | string | Receiver address (as one string).
`receiver_tax_id` | string | Receiver TIN or similar ID.
`currency` | string | Invoice currency.
`total_amount` | number | Total (raw) amount due.
`vat` | number | VAT, percent. `-1` if the value is missing in the invoice.
`net_amount` | number | Net amount due.
`bank_name` | string | Supplier's bank name.
`bic` | string | Supplier's SWIFT or similar code.
`account` | string | Supplier's account number.

{{% alert color="primary" %}}
The availability of the properties above depends on the invoice text and structure.
{{% /alert %}}
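Because string properties may be empty (`""`) and `vat` is `-1` when the invoice does not state it, code that consumes the decoded data should treat those values as missing. A minimal shell sketch with a hypothetical `invoice_value` function: it prints one property of the decoded invoice JSON and returns a non-zero status if the value is missing. It relies on `sed` and the flat structure shown above rather than a full JSON parser, and applies the `-1` rule only to `vat`, as documented.

```bash
# Hypothetical helper: print one property of the decoded invoice JSON.
# Returns 1 if the property is absent, an empty string, or (for "vat") the -1 placeholder.
# Naive sed-based extraction that assumes the flat structure shown above.
invoice_value() {
  flat=$(printf '%s' "$1" | tr -d '\n')
  field="$2"
  # String properties: "field": "value"
  value=$(printf '%s' "$flat" | sed -n "s/.*\"$field\": *\"\([^\"]*\)\".*/\1/p")
  if [ -z "$value" ]; then
    # Numeric properties: "field": 43550.0 or "field": -1
    value=$(printf '%s' "$flat" | sed -n "s/.*\"$field\": *\(-\{0,1\}[0-9][0-9.]*\).*/\1/p")
  fi
  [ -n "$value" ] || return 1
  if [ "$field" = "vat" ] && [ "$value" = "-1" ]; then
    return 1
  fi
  printf '%s\n' "$value"
}
```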

## Task statuses

Processing may take up to several seconds depending on the Aspose.OCR cloud load and the size of the original scan or photo. The status of the processing task is indicated in the `taskStatus` property of the processing result.

Status code | Description | To do
----------- | ----------- | ------
Pending | The invoice is queued for processing, but not yet processed. | Try fetching the result in a couple of seconds using the same ID.
Processing | The invoice is currently being processed. | Fetch the result again using the same ID.
Completed | The invoice is processed. | Read the result from `results` property.
Error | An error occurred during processing. | Check messages in the `error` property for more information.
NotExist | The request with the specified ID does not exist, or the result has already been deleted from the cloud storage. | Check the ID or [send the invoice for processing](/ocr/send-invoice-for-recognition/) again with the same parameters.
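The "To do" column of the table above translates into a simple polling loop. A minimal sketch with a hypothetical `wait_for_invoice` function, assuming the access token is stored in the `ASPOSE_TOKEN` environment variable; the attempt count and delay defaults are arbitrary illustrative values, and `taskStatus` is extracted with `sed` rather than a JSON parser.

```bash
# Hypothetical helper: poll the queue until the invoice task leaves the Pending/Processing states.
# Assumes the access token is stored in the ASPOSE_TOKEN environment variable.
# Usage: wait_for_invoice TASK_ID [MAX_ATTEMPTS] [DELAY_SECONDS]
OCR_ENDPOINT='https://api.aspose.cloud/v5.0/ocr/RecognizeAndParseInvoice'

wait_for_invoice() {
  task_id="$1"
  max_attempts="${2:-10}"
  delay="${3:-2}"
  attempt=1
  while [ "$attempt" -le "$max_attempts" ]; do
    response=$(curl --silent --show-error --location \
      "$OCR_ENDPOINT?id=$task_id" \
      --header "Authorization: Bearer $ASPOSE_TOKEN") || return 1
    # Naive extraction of "taskStatus" from the JSON response.
    status=$(printf '%s' "$response" | tr -d '\n' \
      | sed -n 's/.*"taskStatus": *"\([^"]*\)".*/\1/p')
    case "$status" in
      Completed) printf '%s\n' "$response"; return 0 ;;
      Pending|Processing) sleep "$delay" ;;
      # Error, NotExist, or an unexpected reply: show it and stop.
      *) printf '%s\n' "$response" >&2; return 1 ;;
    esac
    attempt=$((attempt + 1))
  done
  echo "Task $task_id is still not completed after $max_attempts attempts" >&2
  return 1
}
```

Remember that results are kept for 24 hours only; a `NotExist` status after that period means the invoice must be sent again.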
@@ -0,0 +1,58 @@
---
weight: 30
date: "2023-11-02"
author: "Vladimir Lapin"
type: docs
url: /invoice-recognition-sdk/
feedback: OCRCLOUD
title: Invoice processing with Aspose.OCR Cloud SDK
description: How to use Aspose.OCR Cloud SDK for parsing scanned or photographed invoices.
keywords:
- OCR
- process
- parse
- programming
- development
- SDK
- invoice
---

Although you can directly call the Aspose.OCR Cloud REST API to [send invoices for processing](/ocr/send-invoice-for-recognition/) and [fetch parsed data](/ocr/fetch-invoice-recognition-result/), there is a much easier way to implement OCR functionality in your applications. We provide software development kits (SDKs) for all popular programming languages. They wrap up all routine operations such as establishing connections, sending API requests, and parsing responses into a few simple methods. It makes interaction with Aspose.OCR Cloud services much easier, allowing you to focus on business logic rather than technical details.

{{< tabs tabID="1" tabTotal="1" tabName1=".NET" >}}

{{< tab tabNum="1" >}}
```csharp
using System;
using System.IO;
using System.Text;
using Aspose.OCR.Cloud.SDK.Api;
using Aspose.OCR.Cloud.SDK.Model;

namespace Example
{
    internal class Program
    {
        static void Main(string[] args)
        {
            /** Authorize your requests to Aspose.OCR Cloud API */
            RecognizeAndParseInvoiceApi api = new RecognizeAndParseInvoiceApi("<Client Id>", "<Client Secret>");
            /** Read invoice image to array of bytes */
            byte[] invoice = File.ReadAllBytes("invoice.png");
            /** Specify recognition language */
            OCRSettingsRecognizeAndParseInvoice recognitionSettings = new OCRSettingsRecognizeAndParseInvoice {
                Language = Language.English
            };
            /** Send invoice for processing */
            OCRRecognizeAndParseInvoiceBody source = new OCRRecognizeAndParseInvoiceBody(invoice, recognitionSettings);
            string taskID = api.PostRecognizeAndParseInvoice(source);
            /** Fetch recognition result. Processing may take a few seconds;
                if the task is not completed yet, fetch again using the same task ID. */
            OCRResponse result = api.GetRecognizeAndParseInvoice(taskID);
            /** The invoice details are returned as UTF-8 encoded JSON */
            Console.WriteLine(Encoding.UTF8.GetString(result.Results[0].Data));
        }
    }
}
```

Visit our GitHub repository for working code and sample files: https://github.com/aspose-ocr-cloud/aspose-ocr-cloud-dotnet
{{< /tab >}}

{{< /tabs >}}
@@ -0,0 +1,97 @@
---
weight: 10
date: "2023-11-02"
author: "Vladimir Lapin"
type: docs
url: /send-invoice-for-recognition/
feedback: OCRCLOUD
title: Sending invoice for recognition
description: How to send a photo or scan of the invoice for processing to the Aspose.OCR Cloud API.
keywords:
- OCR
- recognize
- queue
- send
- invoice
---

To extract information from a scanned or photographed invoice, send a **POST** request to the `https://api.aspose.cloud/v5.0/ocr/RecognizeAndParseInvoice` Aspose.OCR Cloud REST API endpoint. To authorize the request, pass the [access token](/ocr/authorization/) in the **Authorization** header (_Bearer authentication_).

The invoice and recognition parameters are provided in JSON format in the request body.

```json
{
"image": "Base64 string",
"settings": {
"language": "English",
"makeSkewCorrect": true,
"rotate": 0,
"makeBinarization": false,
"makeUpsampling": false,
"makeSpellCheck": false
}
}
```

## Providing invoice image

Provide the photo or scan of the invoice as a Base64 encoded string in the `image` property.

{{% alert color="caution" %}}
A Base64 encoded file can be very long, especially for scans and high-resolution photos. As a result, you may encounter an error when passing it to cURL on the command line. Use the `getconf ARG_MAX` command to check the maximum length of the command arguments (in bytes).
{{% /alert %}}
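One way to stay below the argument length limit is to write the request body to a file and let cURL read it with `--data-binary @file`. A minimal sketch with a hypothetical `build_invoice_body` function that streams the Base64 encoded image into the JSON body described above, so the encoded file never appears on a command line. Only the `language` setting is written; the other settings keep their default values.

```bash
# Hypothetical helper: write a RecognizeAndParseInvoice request body to a file.
# Usage: build_invoice_body IMAGE_FILE OUTPUT_JSON [LANGUAGE]
build_invoice_body() {
  image_file="$1"
  output_json="$2"
  language="${3:-English}"
  [ -r "$image_file" ] || { echo "Cannot read $image_file" >&2; return 1; }
  {
    printf '{"image": "'
    # Single-line Base64: remove the line breaks that base64 may insert.
    base64 < "$image_file" | tr -d '\n'
    printf '", "settings": {"language": "%s"}}\n' "$language"
  } > "$output_json"
}
```

For example, run `build_invoice_body invoice.png body.json`, then use the request from the cURL example below with `--data-binary @body.json` in place of `--data-raw '…'`.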

## Recognition settings

Property | Type | Default&nbsp;value | Description
------- | ---- | ------------- | -----------
`language` | string | `English` | Specify a [language](/ocr/supported-languages/) for recognition.
`makeSkewCorrect` | boolean | `true` | Automatically correct invoice image tilt (deskew) before proceeding to recognition.<br />Automatic deskew works for images rotated 15 degrees or less. If the scan or photo is rotated by a larger degree or upside down, you must manually specify the rotation angle.
`rotate` | integer | `0` | Rotate the invoice image by the specified angle, in degrees.<br />Use it when the image is rotated by a significant angle or turned upside down.
`makeBinarization` | boolean | `false` | Automatically convert an invoice to black and white before proceeding to recognition.
`makeUpsampling` | boolean | `false` | Intelligently upscale an invoice image to improve small font recognition and detection of dense lines.
`makeSpellCheck` | boolean | `false` | Automatically replace commonly misspelled words in recognition results with the correct ones. The dictionary is based on the [selected recognition language](/ocr/supported-languages/).

## Image preprocessing order

If image preprocessing filters are enabled, they are applied one after the other in the following order:

1. [Upsampling](/ocr/upsample-image/#using-the-recognition-setting) (`"makeUpsampling": true`)
2. [Skew correction](/ocr/deskew-image/#using-the-recognition-setting) (`"makeSkewCorrect": true`)

If you want to apply preprocessing filters in another order, disable the corresponding recognition settings and use [self-managed preprocessing](/ocr/preprocess-image/).

## Return value

If successful, this method returns a string with a unique identifier (GUID) of the invoice recognition request in the [queue](/ocr/recognition-workflow/).

Otherwise, it returns an HTTP status code corresponding to the error.
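In a shell script, cURL's `--fail` option turns an HTTP error status into a non-zero exit code, which makes both outcomes easy to handle. A minimal sketch with a hypothetical `send_invoice` function; it assumes the access token is stored in the `ASPOSE_TOKEN` environment variable and that the request body has already been written to a file. The GUID check is a loose pattern match, not a full validation.

```bash
# Hypothetical helper: send a prepared request body and print the task ID.
# Assumes the access token is stored in the ASPOSE_TOKEN environment variable.
# Usage: TASK_ID=$(send_invoice body.json)
OCR_ENDPOINT='https://api.aspose.cloud/v5.0/ocr/RecognizeAndParseInvoice'

send_invoice() {
  body_file="$1"
  # --fail: exit with a non-zero code (22) on HTTP errors instead of printing the error body.
  task_id=$(curl --silent --show-error --fail --location --request POST "$OCR_ENDPOINT" \
    --header 'Accept: text/plain' \
    --header 'Content-Type: application/json' \
    --header "Authorization: Bearer $ASPOSE_TOKEN" \
    --data-binary "@$body_file") || return 1
  # Loose sanity check: the documented return value is a GUID (8-4-4-4-12 characters).
  case "$task_id" in
    ????????-????-????-????-????????????) printf '%s\n' "$task_id" ;;
    *) echo "Unexpected response: $task_id" >&2; return 1 ;;
  esac
}
```

The printed ID is the value to pass in the `id` parameter when [fetching the result](/ocr/fetch-invoice-recognition-result/).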

## What's next

Recognition and processing take a few seconds, depending on the size of the source file and the current Aspose.Cloud load. See the article [Fetching invoice processing result](/ocr/fetch-invoice-recognition-result/) for information on how to get the parsed invoice data in JSON format from the server.

## cURL example

{{< tabs tabID="1" tabTotal="2" tabName1="Request" tabName2="Response" >}}
{{< tab tabNum="1" >}}
```bash
curl --location --request POST 'https://api.aspose.cloud/v5.0/ocr/RecognizeAndParseInvoice' \
--header 'Accept: text/plain' \
--header 'Content-Type: application/json' \
--header 'Authorization: Bearer eyJhbGciOiJSUzI1NiIsInR5cCI6IkpXVCJ9...HaRYOxBcCRCPLnrFCVXpw7UA' \
--data-raw '{
"image": "/9j/4AAQSkZJRgABAQEBLAEsAAD...8AkTf/2Q==",
"settings": {
"language": "English",
"makeSpellCheck": true
}
}'
```
{{< /tab >}}
{{< tab tabNum="2" >}}
```
39b37b24-86e8-4e91-9a99-6c2574853eb5
```
{{< /tab >}}
{{< /tabs >}}
