Transformer Summarizer

Introduction

The Transformer Summarizer project implements a text summarization model based on the Transformer architecture, built from scratch. The model combines an encoder and a decoder to generate concise, accurate summaries from input text. Because attention lets it weigh every part of the input against every other part, it captures the context and meaning of the input, and the same building blocks carry over to other natural language processing tasks.

Features

  • Transformer Architecture: Uses a full encoder-decoder stack for text summarization.
  • Positional Encoding: Adds sinusoidal position signals so the model knows the order of tokens in a sequence.
  • Masking Techniques: Implements padding and look-ahead masks so the model ignores padding and cannot peek at future tokens.
  • Self-Attention Mechanism: Lets the model weigh the importance of different parts of the input.
  • Multi-Head Attention: Lets the model attend to different parts of the input simultaneously.
  • Comprehensive Preprocessing: Handles text normalization, tokenization, and special start-of-sentence and end-of-sentence tokens.

Table of Contents

  1. Introduction
  2. Importing the Dataset
  3. Preprocessing the Data
  4. Positional Encoding
  5. Masking
  6. Self-Attention
  7. Encoder
  8. Decoder
  9. Transformer
  10. Model Initialization
  11. Training the Model
  12. Summarization
  13. Contributing
  14. License

Importing the Dataset

The dataset consists of dialogue and summary pairs stored in JSON format. The loading function in utils.py reads the JSON files and returns the pairs ready for preprocessing.
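
A minimal loading sketch is shown below; the file path and the "dialogue"/"summary" field names are illustrative assumptions, and the actual function in utils.py may differ.

```python
import json

def load_pairs(path):
    # Hypothetical loader: expects a JSON list of records with
    # "dialogue" and "summary" fields (field names are an assumption).
    with open(path, encoding="utf-8") as f:
        records = json.load(f)
    dialogues = [r["dialogue"] for r in records]
    summaries = [r["summary"] for r in records]
    return dialogues, summaries

train_dialogues, train_summaries = load_pairs("data/train.json")  # path is illustrative
```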

Preprocessing the Data

Preprocessing involves several steps to prepare the text data for model training (a minimal sketch follows the list):

  • Lowercasing: Converts all text to lowercase to maintain consistency.
  • Removing Newlines and Extra Spaces: Cleans the text data for better processing.
  • Adding Special Tokens: Wraps each sequence with a start-of-sentence ([SOS]) and an end-of-sentence ([EOS]) token so the model can identify the boundaries of the input and output.
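
A sketch of these steps in plain Python, assuming the bracketed [SOS]/[EOS] markers described above:

```python
import re

def preprocess(text):
    # Lowercase, collapse newlines and repeated spaces, and add boundary tokens.
    text = text.lower()
    text = re.sub(r"\s+", " ", text).strip()
    return f"[SOS] {text} [EOS]"

print(preprocess("Amanda: I baked cookies.\nDo you want some?"))
# [SOS] amanda: i baked cookies. do you want some? [EOS]
```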

Positional Encoding

Positional encoding is a crucial component in the transformer model, allowing it to capture the order of input sequences. This is achieved by adding sinusoidal positional encodings to the input embeddings, ensuring the model retains information about the position of each word in the sequence.
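
The standard sinusoidal formulation can be sketched in a few lines of NumPy (the returned shape and data type are choices made for this sketch):

```python
import numpy as np

def positional_encoding(max_len, d_model):
    # Returns an array of shape (1, max_len, d_model) with sinusoidal position signals.
    positions = np.arange(max_len)[:, np.newaxis]        # (max_len, 1)
    dims = np.arange(d_model)[np.newaxis, :]             # (1, d_model)
    angle_rates = 1 / np.power(10000, (2 * (dims // 2)) / np.float32(d_model))
    angles = positions * angle_rates
    angles[:, 0::2] = np.sin(angles[:, 0::2])            # sine on even dimensions
    angles[:, 1::2] = np.cos(angles[:, 1::2])            # cosine on odd dimensions
    return angles[np.newaxis, ...].astype(np.float32)
```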

Masking

Masking techniques help the model focus on relevant parts of the input (a sketch follows the list):

  • Padding Mask: Masks the padding tokens in the input sequences to prevent the model from considering them in its computations.
  • Look-Ahead Mask: Masks future tokens in a sequence during training, ensuring the model only considers past and present tokens for predicting the next word.
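
Both masks, plus a helper that combines them for the decoder, can be sketched in TensorFlow as below; the pad token id of 0, the function names, and the boolean convention (True = may attend) are assumptions of this sketch:

```python
import tensorflow as tf

def padding_mask(seq, pad_id=0):
    # True where attention is allowed, False at padding positions.
    # Shape (batch, 1, seq_len) so it broadcasts over the query dimension.
    return tf.not_equal(seq, pad_id)[:, tf.newaxis, :]

def look_ahead_mask(size):
    # Lower-triangular matrix: position i may attend to positions 0..i only.
    return tf.cast(tf.linalg.band_part(tf.ones((size, size)), -1, 0), tf.bool)

def create_masks(enc_input, dec_input, pad_id=0):
    enc_padding_mask = padding_mask(enc_input, pad_id)   # encoder self-attention
    dec_padding_mask = padding_mask(enc_input, pad_id)   # decoder cross-attention over encoder output
    combined_mask = tf.logical_and(                      # decoder self-attention:
        padding_mask(dec_input, pad_id),                 # hide padding in the target...
        look_ahead_mask(tf.shape(dec_input)[1]))         # ...and all future tokens
    return enc_padding_mask, combined_mask, dec_padding_mask
```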

Self-Attention

Self-attention enables the model to weigh the importance of different words in the input sequence. By calculating attention scores, the model can focus on the most relevant parts of the input while generating the output.
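
Concretely, this is scaled dot-product attention. A minimal NumPy sketch with the softmax written out (the repository's from-scratch version may organize this differently):

```python
import numpy as np

def scaled_dot_product_attention(q, k, v, mask=None):
    # q, k, v: (..., seq_len, depth); mask: True where attention is allowed.
    scores = q @ k.swapaxes(-1, -2) / np.sqrt(q.shape[-1])  # query-key similarities
    if mask is not None:
        scores = np.where(mask, scores, -1e9)               # blocked positions get ~zero weight
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)          # softmax over the key dimension
    return weights @ v, weights                             # weighted sum of values + attention map
```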

Encoder

The encoder consists of a stack of identical layers, each comprising the components below (a code sketch follows the list):

  • Multi-Head Attention: Allows the model to attend to different parts of the input simultaneously, enhancing its understanding of the context.
  • Feed-Forward Neural Network: Applies a series of linear transformations and activation functions to process the input.
  • Layer Normalization and Dropout: Improve model generalization and prevent overfitting.
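
A sketch of one encoder layer built from tf.keras components; this assumes a TensorFlow implementation and uses the stock Keras MultiHeadAttention layer, which the repository's from-scratch code may implement by hand instead:

```python
import tensorflow as tf

class EncoderLayer(tf.keras.layers.Layer):
    # One encoder block: self-attention then feed-forward, each with residual + layer norm.
    def __init__(self, d_model, num_heads, d_ff, dropout_rate=0.1):
        super().__init__()
        self.mha = tf.keras.layers.MultiHeadAttention(num_heads=num_heads,
                                                      key_dim=d_model // num_heads)
        self.ffn = tf.keras.Sequential([
            tf.keras.layers.Dense(d_ff, activation="relu"),
            tf.keras.layers.Dense(d_model),
        ])
        self.norm1 = tf.keras.layers.LayerNormalization(epsilon=1e-6)
        self.norm2 = tf.keras.layers.LayerNormalization(epsilon=1e-6)
        self.drop1 = tf.keras.layers.Dropout(dropout_rate)
        self.drop2 = tf.keras.layers.Dropout(dropout_rate)

    def call(self, x, padding_mask=None, training=False):
        # Self-attention over the full input sequence (padding positions masked out).
        attn_out = self.mha(query=x, value=x, key=x, attention_mask=padding_mask)
        x = self.norm1(x + self.drop1(attn_out, training=training))
        # Position-wise feed-forward network.
        ffn_out = self.ffn(x)
        return self.norm2(x + self.drop2(ffn_out, training=training))
```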

Decoder

The decoder mirrors the structure of the encoder but adds mechanisms for output generation (a code sketch follows the list):

  • Masked Multi-Head Attention: Ensures the model only attends to previous tokens in the sequence.
  • Multi-Head Attention with Encoder Outputs: Allows the decoder to attend to the encoder's output, incorporating context from the entire input sequence.
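
A matching sketch of one decoder layer under the same TensorFlow assumptions; it keeps the encoder's feed-forward sub-layer and adds the two attention blocks listed above:

```python
class DecoderLayer(tf.keras.layers.Layer):
    # One decoder block: masked self-attention, cross-attention over the encoder output,
    # then a feed-forward network, each followed by residual + layer norm.
    def __init__(self, d_model, num_heads, d_ff, dropout_rate=0.1):
        super().__init__()
        self.self_mha = tf.keras.layers.MultiHeadAttention(num_heads=num_heads,
                                                           key_dim=d_model // num_heads)
        self.cross_mha = tf.keras.layers.MultiHeadAttention(num_heads=num_heads,
                                                            key_dim=d_model // num_heads)
        self.ffn = tf.keras.Sequential([
            tf.keras.layers.Dense(d_ff, activation="relu"),
            tf.keras.layers.Dense(d_model),
        ])
        self.norm1 = tf.keras.layers.LayerNormalization(epsilon=1e-6)
        self.norm2 = tf.keras.layers.LayerNormalization(epsilon=1e-6)
        self.norm3 = tf.keras.layers.LayerNormalization(epsilon=1e-6)
        self.dropout = tf.keras.layers.Dropout(dropout_rate)

    def call(self, x, enc_output, look_ahead_mask=None, padding_mask=None, training=False):
        # Masked self-attention: each position may only attend to earlier positions.
        attn1 = self.self_mha(query=x, value=x, key=x, attention_mask=look_ahead_mask)
        x = self.norm1(x + self.dropout(attn1, training=training))
        # Cross-attention: the decoder attends to the encoder's output.
        attn2 = self.cross_mha(query=x, value=enc_output, key=enc_output,
                               attention_mask=padding_mask)
        x = self.norm2(x + self.dropout(attn2, training=training))
        ffn_out = self.ffn(x)
        return self.norm3(x + self.dropout(ffn_out, training=training))
```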

Transformer

The transformer model ties the encoder and decoder together: the encoder builds a contextual representation of the input text, and the decoder attends to that representation while generating the summary token by token.
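
A hedged composition sketch that reuses positional_encoding, EncoderLayer, and DecoderLayer from the sections above; the shared embedding table for dialogues and summaries is an assumption of this sketch:

```python
class Transformer(tf.keras.Model):
    # Embed and positionally encode both sequences, run the encoder stack,
    # run the decoder stack with cross-attention, then project each decoder
    # position to vocabulary logits.
    def __init__(self, num_layers, d_model, num_heads, d_ff,
                 vocab_size, max_len, dropout_rate=0.1):
        super().__init__()
        self.embedding = tf.keras.layers.Embedding(vocab_size, d_model)
        self.pos_encoding = tf.constant(positional_encoding(max_len, d_model))
        self.enc_layers = [EncoderLayer(d_model, num_heads, d_ff, dropout_rate)
                           for _ in range(num_layers)]
        self.dec_layers = [DecoderLayer(d_model, num_heads, d_ff, dropout_rate)
                           for _ in range(num_layers)]
        self.final_layer = tf.keras.layers.Dense(vocab_size)

    def call(self, enc_input, dec_input, enc_padding_mask=None,
             look_ahead_mask=None, dec_padding_mask=None, training=False):
        x = self.embedding(enc_input) + self.pos_encoding[:, :tf.shape(enc_input)[1], :]
        for layer in self.enc_layers:
            x = layer(x, padding_mask=enc_padding_mask, training=training)
        y = self.embedding(dec_input) + self.pos_encoding[:, :tf.shape(dec_input)[1], :]
        for layer in self.dec_layers:
            y = layer(y, x, look_ahead_mask=look_ahead_mask,
                      padding_mask=dec_padding_mask, training=training)
        return self.final_layer(y)   # (batch, target_len, vocab_size) logits
```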

Model Initialization

The model is initialized with specific hyperparameters, including the number of layers, attention heads, and hidden units. These hyperparameters control the model's capacity and are chosen to balance summary quality against training cost.
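
An illustrative initialization; every value below is a placeholder rather than the repository's actual configuration:

```python
# Placeholder hyperparameters; the repository's actual values may differ.
NUM_LAYERS = 2      # encoder/decoder layers
D_MODEL = 128       # embedding / hidden size
NUM_HEADS = 4       # attention heads per layer
D_FF = 512          # feed-forward inner size
VOCAB_SIZE = 20000  # size of the trained tokenizer's vocabulary
MAX_LEN = 256       # longest sequence covered by the positional encoding

model = Transformer(NUM_LAYERS, D_MODEL, NUM_HEADS, D_FF,
                    VOCAB_SIZE, MAX_LEN, dropout_rate=0.1)
optimizer = tf.keras.optimizers.Adam(learning_rate=1e-4)
```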

Training the Model

The training process feeds the preprocessed dataset into the model so it learns the mapping between dialogues and their corresponding summaries. Each step of the training loop includes (a sketch follows the list):

  • Loss Calculation: Measures the discrepancy between the predicted and actual summaries.
  • Backpropagation and Optimization: Adjusts the model's weights to minimize the loss, improving its summarization capabilities.
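
A hedged sketch of one training step with teacher forcing and a padding-aware loss; it reuses create_masks from the masking section and assumes pad id 0, which may not match the repository's setup:

```python
loss_object = tf.keras.losses.SparseCategoricalCrossentropy(
    from_logits=True, reduction="none")

def masked_loss(real, pred):
    # Average the cross-entropy only over non-padding positions (pad id 0 assumed).
    mask = tf.cast(tf.not_equal(real, 0), tf.float32)
    loss = loss_object(real, pred) * mask
    return tf.reduce_sum(loss) / tf.reduce_sum(mask)

@tf.function
def train_step(dialogues, summaries):
    # Teacher forcing: the decoder sees the summary shifted right
    # and is trained to predict the next token at every position.
    dec_input, dec_target = summaries[:, :-1], summaries[:, 1:]
    enc_mask, look_ahead, dec_mask = create_masks(dialogues, dec_input)
    with tf.GradientTape() as tape:
        logits = model(dialogues, dec_input, enc_mask, look_ahead, dec_mask, training=True)
        loss = masked_loss(dec_target, logits)
    grads = tape.gradient(loss, model.trainable_variables)
    optimizer.apply_gradients(zip(grads, model.trainable_variables))
    return loss
```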

Summarization

After training, the model can generate summaries for new input texts. The summarization process involves encoding the input sequence, decoding it to produce the output sequence, and applying post-processing to generate the final summary.
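
A greedy-decoding sketch, reusing preprocess and create_masks from earlier sections; the tokenizer interface and the [SOS]/[EOS] token ids are assumptions of this sketch:

```python
def summarize(model, tokenizer, text, max_len=64, sos_id=1, eos_id=2):
    # Encode the dialogue once, then grow the summary one token at a time,
    # always picking the most likely next token (greedy decoding).
    enc_input = tf.constant([tokenizer.encode(preprocess(text))])   # (1, input_len)
    output = tf.constant([[sos_id]])                                # decoder starts with [SOS]
    for _ in range(max_len):
        enc_mask, look_ahead, dec_mask = create_masks(enc_input, output)
        logits = model(enc_input, output, enc_mask, look_ahead, dec_mask, training=False)
        next_id = tf.argmax(logits[:, -1, :], axis=-1, output_type=tf.int32)
        output = tf.concat([output, next_id[:, tf.newaxis]], axis=-1)
        if int(next_id[0]) == eos_id:                               # stop once [EOS] is produced
            break
    return tokenizer.decode(output[0, 1:].numpy().tolist())         # drop [SOS] before decoding
```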

Contributing

Contributions are welcome! Please open an issue or submit a pull request for any changes or enhancements.

License

This project is licensed under the MIT License.

