[WIP] PARSeq Model #2089

sineeli · 2025-02-10T22:36:45Z

PARSeq Model

Description of the Change

This PR adds an end-to-end scene text recognition model, PARSeq, to KerasHub. PARSeq is a ViT-based OCR model that enables iterative decoding for robust text recognition in natural scenes.

Closes the first half of #<issue_number>

Reference

For details, see Scene Text Recognition with Permuted Autoregressive Sequence Models (PARSeq paper). The model and configuration are based on the official paper and open-source implementation.

Colab Notebook

Usage and numerics matching Colab:

Checklist

I have added all the necessary unit tests for my change.
I have verified that my change does not break existing code and works with all backends (TensorFlow, JAX, and PyTorch).
My PR is based on the latest changes of the main branch (if unsure, rebase the code).
I have followed the Keras Hub Model contribution guidelines in making these changes.
I have followed the Keras Hub API design guidelines in making these changes.
I have signed the Contributor License Agreement.

keras_hub/src/models/parseq/parseq_tokenizer.py

abheesht17 · 2025-02-20T16:01:27Z

@sineeli - which parts of the PR are ready for review? Asking because it's still marked as draft

sineeli · 2025-02-20T18:52:00Z

Sure @abheesht17

I think the preprocessing and tokenizer parts are ready for review first, as they are the primary steps.

keras_hub/src/models/parseq/parseq_tokenizer.py
keras_hub/src/models/text_recognition_preprocessor.py

abheesht17

Thanks for the PR! Left some comments on the tokeniser. Will take a look at the text recognition preprocessor soon.

Sorry for the delay in reviewing

abheesht17 · 2025-02-25T01:41:14Z

keras_hub/src/models/parseq/parseq_tokenizer.py

+        "keras_hub.models.PARSeqTokenizer",
+    ]
+)
+class PARSeqTokenizer(tokenizer.Tokenizer):


Please add a doc-string here, with examples. Makes it easier to review when we have examples :P

Let's add unit tests as well

Yes, will add them

keras_hub/src/models/parseq/parseq_tokenizer.py

abheesht17 · 2025-02-25T02:24:03Z

keras_hub/src/models/parseq/parseq_tokenizer.py

+        self.char_to_id = tf.lookup.StaticHashTable(
+            initializer=tf.lookup.KeyValueTensorInitializer(
+                keys=list(self._stoi.keys()),
+                values=list(self._stoi.values()),
+                key_dtype=tf.string,
+                value_dtype=tf.int32,
+            ),
+            default_value=0,
+        )
+        self.id_to_char = tf.lookup.StaticHashTable(
+            initializer=tf.lookup.KeyValueTensorInitializer(
+                keys=list(self._stoi.values()),
+                values=list(self._stoi.keys()),
+                key_dtype=tf.int32,
+                value_dtype=tf.string,
+            ),
+            default_value=self.pad_token,
+        )


The defaults don't match. EOS is the 0th token, and pad is the (len(vocabulary) - 1)th token.

I noticed the same in the original code. It seems they use EOS -> 0 and BOS -> len(vocabulary), but when padding they put BOS first and then EOS at the end.
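To make the mismatch discussed above concrete, here is a minimal pure-Python sketch of one way the two lookup directions could share a fallback. It assumes a layout with EOS at index 0 and BOS and PAD at the end; the token strings, the three-character charset, and the helper names are hypothetical and are not taken from the PR.

```python
# Hypothetical special-token strings and charset, for illustration only.
EOS, BOS, PAD = "[E]", "[B]", "[P]"
CHARSET = "abc"

# EOS first, then the characters, then BOS and PAD last.
itos = [EOS, *CHARSET, BOS, PAD]
stoi = {token: index for index, token in enumerate(itos)}


def char_to_id(char):
    """Map a character to its id; unknown characters fall back to PAD."""
    return stoi.get(char, stoi[PAD])


def id_to_char(index):
    """Map an id to its token; out-of-range ids fall back to PAD."""
    return itos[index] if 0 <= index < len(itos) else PAD


def encode(label):
    """Wrap the label ids as BOS + characters + EOS."""
    return [stoi[BOS], *(char_to_id(c) for c in label), stoi[EOS]]
```

With both directions defaulting to PAD, an unknown character round-trips to the same token instead of silently becoming EOS. Whether PAD is the right fallback for the real tokenizer is a design choice for the PR, not something this sketch settles.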

abheesht17 · 2025-02-25T02:24:23Z

keras_hub/src/models/parseq/parseq_tokenizer.py

+            ),
+            default_value=0,
+        )
+        self.id_to_char = tf.lookup.StaticHashTable(


Do we need this? We aren't using it anywhere

But if a user wants to bulk convert token ids to characters, it will be helpful.

keras_hub/src/models/parseq/parseq_tokenizer.py

abheesht17 · 2025-02-25T02:29:14Z

keras_hub/src/models/parseq/parseq_tokenizer.py

+            label = tf.strings.upper(label)
+
+        label = tf.strings.regex_replace(label, self.unsupported_regex, "")
+        label = tf.strings.substr(label, 0, self.max_label_length)


Why are we truncating the input to 25 characters?

When preparing the dataset, the original preprocessing simply ignores any datapoint whose label is longer than 25 characters. Instead, I truncate the label, and we can add the start and end tokens afterwards.

Ref: https://github.com/baudm/parseq/blob/1902db043c029a7e03a3818c616c06600af574be/strhub/data/dataset.py#L112
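The two strategies in this thread, dropping over-long labels versus truncating them, can be compared with a small pure-Python sketch. The charset, the lowercase normalization, and the function names are assumptions for illustration only; the PR itself uses `tf.strings` ops.

```python
import re

MAX_LABEL_LENGTH = 25
# Hypothetical charset; the real tokenizer's vocabulary may differ.
CHARSET = "0123456789abcdefghijklmnopqrstuvwxyz"
UNSUPPORTED = re.compile(f"[^{re.escape(CHARSET)}]")


def clean(label):
    """Normalize case and strip characters outside the charset."""
    return UNSUPPORTED.sub("", label.lower())


def filter_label(label):
    """Drop strategy: return None when the cleaned label is too long."""
    cleaned = clean(label)
    return cleaned if len(cleaned) <= MAX_LABEL_LENGTH else None


def truncate_label(label):
    """Truncate strategy: keep the first MAX_LABEL_LENGTH characters."""
    return clean(label)[:MAX_LABEL_LENGTH]
```

Dropping keeps every training target complete but discards the sample. Truncating keeps the sample, at the cost of a target that no longer matches the full text in the image.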

keras_hub/src/models/parseq/parseq_tokenizer.py

keras_hub/src/models/parseq/parseq_causal_lm.py

sineeli · 2025-05-30T21:09:54Z

@sachinprasadhs, @abheesht17, @mattdangerw

Can you take a look at the PR when you get some time, thank you!

sachinprasadhs

Thanks, added some comments.
Could you please add a PR description following the recent PR description template, which includes a Colab notebook link with an end-to-end working demo and numerics verification?
Also add the original implementation reference in the PR description.

keras_hub/src/models/parseq/parseq_backbone.py

sachinprasadhs · 2025-06-09T21:05:12Z

keras_hub/src/models/parseq/parseq_backbone.py

+        dropout_rate: float. The dropout rate. Defaults to `0.1`.
+        attention_dropout: float. The dropout rate for the attention weights.
+        Defaults to `0.1`.
+        dtype: str. The dtype used for layers.


Follow the same arg description we use for other models for dtype.

sachinprasadhs · 2025-06-09T22:24:15Z

keras_hub/src/models/parseq/parseq_backbone.py

+        Defaults to `0.1`.
+        dtype: str. The dtype used for layers.
+        **kwargs: Additional keyword arguments passed to the base
+            `keras.Model` constructor.


Add an Examples section demonstrating sample usage of the backbone

Adding it in the causal_lm file rather than here. It's more suitable there.

sachinprasadhs · 2025-06-09T22:29:01Z

keras_hub/src/models/parseq/parseq_decoder.py

+        hidden_dim: int. The dimension of the hidden layers.
+        num_heads: int. The number of attention heads.
+        mlp_dim: int. The dimension of the MLP hidden layer.
+        dropout_rate: float. The dropout rate.


Update it to say where exactly dropout will be applied, e.g. the MLP stage.

sachinprasadhs · 2025-06-09T22:32:57Z

keras_hub/src/models/parseq/parseq_tokenizer.py

+            type (e.g., "int32") or a string type ("string").
+            Defaults to `"int32"`.
+        **kwargs: Additional keyword arguments passed to the base
+            `keras.layers.Layer` constructor.


Add an Examples section as well. Unit tests are still pending, I guess?

In the preprocessor section we test both the image converter and the tokenizer.

sineeli added 13 commits January 31, 2025 11:11

Base for parseq model

528d3a4

make it vit compatiable with diff height and width sizes

3bf11cd

correct vit conv scripts

a8fb177

make class token optional in backbone by default its included

6f4363a

add flags to adjust vit network

d1cece0

add test case for without class_token

92b2745

Merge branch 'master' into parseq

ed00b73

decoder file

25f661c

parseq tokenizer base

f97fab1

add api for parseq tokenizer

d424210

Add missing arg max_label_length.

3f3ad0d

nit

bb4457e

Merge branch 'master' into parseq

68829f8

sineeli commented Feb 10, 2025

sineeli added 5 commits February 11, 2025 15:28

add missing normalization step using tf_text

1bde466

add missing config for preprocessor

e6c5379

add default start, pad and end tokens

5b08c93

nit

49260ef

correct special token order

b4150ed

abheesht17 self-assigned this Feb 18, 2025

divyashreepathihalli requested a review from abheesht17 February 18, 2025 17:20

sineeli added 3 commits February 18, 2025 10:33

return padding mask as well

ed8b9d7

use proper keras ops

4e4511c

nit

9222331

abheesht17 requested changes Feb 25, 2025

sineeli added 3 commits March 3, 2025 11:42

add decoder for parseq

78a07a0

Build unbuilt layers for model validation

decc12c

fix forward pass and decoder

7aa2b67

try to fix jax backend concretization error

032515d

sineeli commented May 8, 2025

sineeli added 16 commits May 9, 2025 11:10

fix mask broadcast error

1f92e17

fix repeat for mismatch output length

85e9df2

ignore permutation based training

09157f1

fix dtype and add test case for parseq

a9e367a

Merge branch 'master' into parseq

eba3e69

fix input format and add causal lm testing

0e7cbbd

use numpy random images

a87ae57

fix jax backend issue when reduction set to "mean_with_sample_weight"

7c1fe2c

remove redudant classes and use causal lm base calsses itself.

58917dd

nit

3cf997c

fix decoder_head_dim usage

f3f3cef

fix preprocessing issues

eb5d4ef

Merge branch 'master' into parseq

f5e21ed

add checkpoint convertion script

b6b7a26

add missing flag

8c6f14c

validate convertion outputs

e89398b

sineeli marked this pull request as ready for review May 19, 2025 17:50

nit

764a204

sineeli requested review from abheesht17 and mattdangerw May 19, 2025 21:11

sineeli and others added 2 commits May 20, 2025 12:11

fix training for permutation logic

180774d

Merge branch 'master' into parseq

4201d0b

sineeli requested a review from sachinprasadhs May 30, 2025 21:10

sachinprasadhs reviewed Jun 9, 2025

sineeli added 3 commits June 18, 2025 22:56

add example usage for backbone and causal lm

751b0a8

nit

3860843

Merge remote-tracking branch 'upstream/master' into parseq

6f5f093
