LSTM layer doesn't learn with TextVectorization output (padding ?) #20898
Labels
stat:awaiting response from contributor
type:support
I have an issue on my own textual data when training an LSTM for sentiment analysis. I used to encode the textual data (in Keras 2) the old way, with `Tokenizer` + `pad_sequences`, and switched to the new `TextVectorization` layer, but the model doesn't learn anymore (losses don't change, accuracies stay around .50).

So I tried with an example from the documentation: I reran the "Text classification from scratch" example (https://keras.io/examples/nlp/text_classification_from_scratch/), and it works fine as is (validation accuracy goes up).
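For reference, the "old" Keras 2 encoding path I mention above looked roughly like this (a minimal sketch with placeholder texts, not my exact preprocessing); note that `pad_sequences` pads at the beginning of each sequence by default:

```python
# Rough sketch of the Keras 2 encoding path (Tokenizer + pad_sequences).
from keras.preprocessing.text import Tokenizer
from keras.preprocessing.sequence import pad_sequences

texts = ["this movie was great", "terrible, do not watch"]

tokenizer = Tokenizer(num_words=20000)
tokenizer.fit_on_texts(texts)
sequences = tokenizer.texts_to_sequences(texts)

# Default padding='pre' -> zeros are added at the *start* of each sequence.
x = pad_sequences(sequences, maxlen=10)
print(x)
# e.g. [[0 0 0 0 0 0 1 2 3 4]
#       [0 0 0 0 0 0 5 6 7 8]]
```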
Then I replaced the Conv1D and GlobalMaxPooling1D layers with an LSTM layer, and the model doesn't learn: the training and validation accuracies stay around .50.
However, if I pass `go_backwards=True` to the LSTM layer, it learns correctly (but consequently also reads each text backwards).

It might be due to the fact that the `TextVectorization` layer "post"-pads its output (the input vectors are filled with zeros at the end), whereas the LSTM layer expects "pre"-padded inputs (zeros at the beginning), and therefore never actually iterates over the input tokens.

Indeed, the "Bidirectional LSTM on IMDB" example (https://keras.io/examples/nlp/bidirectional_lstm_imdb/) works well: it loads an already tokenized (but not padded) version of IMDB, so it doesn't use the `TextVectorization` layer but the `keras.utils.pad_sequences` function, which "pre"-pads by default. What is odd, though, is that it still learns when setting `padding='post'`, only with the validation accuracy going up much more slowly at each epoch than with `padding='pre'`. So it might be more complicated than it seems, but it still looks like a padding issue.

Still, it might be easily solved by allowing to choose between pre- and post-padding in the `TextVectorization` layer, similarly to the `padding` parameter of the `keras.utils.pad_sequences` function (a quick demonstration is sketched below).

I got the same results on two different machines, on CPU and GPU (GeForce GTX 1650, 6 GB), with the TensorFlow and JAX backends, on Keras 3.8.0, and Python 3.10 and 3.12.
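As a quick illustration (a minimal sketch with placeholder texts and an arbitrary sequence length), the padding difference can be seen directly, and a possible workaround in the meantime would be to re-pad the `TextVectorization` output with `keras.utils.pad_sequences`:

```python
# Minimal sketch showing the padding difference (placeholder texts, arbitrary length).
import numpy as np
import keras
from keras import layers

texts = ["this movie was great", "terrible"]

vectorizer = layers.TextVectorization(
    max_tokens=20000, output_mode="int", output_sequence_length=8)
vectorizer.adapt(texts)

post_padded = np.asarray(vectorizer(texts))  # zeros are appended at the *end* of each row
print(post_padded)

# Possible workaround while TextVectorization has no padding option:
# drop the trailing zeros and re-pad so the zeros end up at the beginning.
trimmed = [row[row != 0] for row in post_padded]
pre_padded = keras.utils.pad_sequences(trimmed, maxlen=8, padding="pre")
print(pre_padded)
```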
Here is the modified "Text classification from scratch" example with an LSTM layer instead of the Conv1D and GlobalMaxPooling1D layers:
Standalone code to reproduce the issue
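(The sketch below reconstructs the change described above rather than reproducing the full original script: it assumes the data pipeline and hyperparameters of the published example, omits its custom HTML-stripping standardization for brevity, and only swaps the Conv1D/GlobalMaxPooling1D block for an LSTM.)

```python
# Sketch: model from https://keras.io/examples/nlp/text_classification_from_scratch/
# with the two Conv1D layers and GlobalMaxPooling1D replaced by a single LSTM.
# Assumes the aclImdb dataset has been downloaded and extracted as in the example.
import keras
from keras import layers

max_features = 20000
sequence_length = 500
embedding_dim = 128
batch_size = 32

raw_train_ds = keras.utils.text_dataset_from_directory(
    "aclImdb/train", batch_size=batch_size, validation_split=0.2,
    subset="training", seed=1337)
raw_val_ds = keras.utils.text_dataset_from_directory(
    "aclImdb/train", batch_size=batch_size, validation_split=0.2,
    subset="validation", seed=1337)

# TextVectorization as in the example; its output is "post"-padded
# (zeros appended up to output_sequence_length).
vectorize_layer = layers.TextVectorization(
    max_tokens=max_features,
    output_mode="int",
    output_sequence_length=sequence_length)
vectorize_layer.adapt(raw_train_ds.map(lambda x, y: x))

def vectorize_text(text, label):
    return vectorize_layer(text), label

train_ds = raw_train_ds.map(vectorize_text)
val_ds = raw_val_ds.map(vectorize_text)

inputs = keras.Input(shape=(None,), dtype="int64")
x = layers.Embedding(max_features, embedding_dim)(inputs)
x = layers.Dropout(0.5)(x)
# Original example:
# x = layers.Conv1D(128, 7, padding="valid", activation="relu", strides=3)(x)
# x = layers.Conv1D(128, 7, padding="valid", activation="relu", strides=3)(x)
# x = layers.GlobalMaxPooling1D()(x)
x = layers.LSTM(128)(x)  # stays around .50 accuracy; learns with go_backwards=True
x = layers.Dense(128, activation="relu")(x)
x = layers.Dropout(0.5)(x)
outputs = layers.Dense(1, activation="sigmoid", name="predictions")(x)

model = keras.Model(inputs, outputs)
model.compile(loss="binary_crossentropy", optimizer="adam", metrics=["accuracy"])
model.fit(train_ds, validation_data=val_ds, epochs=3)
```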
Relevant log output