#28 Docs #25
38 changes: 21 additions & 17 deletions README_FASTTEXT.md
@@ -54,36 +54,40 @@ class Program

### Custom models

#### Download the Pretrained Models
The default model for this package is the [quantized](https://fasttext.cc/docs/en/language-identification.html#:~:text=size%20of%20126MB%20%3B-,lid.176.ftz,-%2C%20which%20is%20the) `lid.176.ftz` model (see below).
You can use the default model by calling the `LoadDefaultModel()` method.

Depending on your needs, download one of the pretrained language identification (LID) models provided by Facebook:
We recommend the following models, but you can use any model that fits your needs.
It can even be a model for another text classification task, e.g.
[supervised models](https://fasttext.cc/docs/en/supervised-tutorial.html).

- For the LID model with 176 languages:
```sh
curl --location -o /models/fasttext176.bin https://dl.fbaipublicfiles.com/fasttext/supervised-models/lid.176.bin
```
| Model | Vendor | Languages | Label format | Learn more | Download |
| :---------- | :------------------- | :-------: | :----------- | :--------- | :------: |
| **lid.176** | Meta Platforms, Inc. | 176 | `__label__en` `__label__uk` `__label__hi` | [fasttext.cc](https://fasttext.cc/docs/en/language-identification.html) | [lid.176.bin](https://dl.fbaipublicfiles.com/fasttext/supervised-models/lid.176.bin) |
| **lid218e** | Meta Platforms, Inc. | 217 | `__label__eng_Latn` `__label__ukr_Cyrl` `__label__hin_Deva` | [@facebook/fasttext-language-identification](https://huggingface.co/facebook/fasttext-language-identification) | [model.bin](https://huggingface.co/facebook/fasttext-language-identification/resolve/main/model.bin?download=true) |
| **GlotLID** | CIS, LMU Munich | 2155(?) | `__label__eng_Latn` `__label__ukr_Cyrl` `__label__hin_Deva` | [@cis-lmu/glotlid](https://huggingface.co/cis-lmu/glotlid) | [model_v3.bin](https://huggingface.co/cis-lmu/glotlid/resolve/main/model_v3.bin?download=true) |
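
The label formats in the table differ between models: `lid.176` returns a bare language code, while `lid218e` and GlotLID append a script code after an underscore. A minimal sketch of how a caller might split such labels, assuming the `__label__` prefix shown in the table; `ParseLabel` is a hypothetical helper and not part of this package:

```csharp
#nullable enable
using System;

// Hypothetical helper: splits "__label__eng_Latn" into ("eng", "Latn")
// and "__label__en" into ("en", null). The label layout is taken from the table above.
static (string Language, string? Script) ParseLabel(string label)
{
    const string prefix = "__label__";
    var code = label.StartsWith(prefix, StringComparison.Ordinal)
        ? label.Substring(prefix.Length)
        : label;

    // Labels with a script part look like "language_Script".
    var separator = code.IndexOf('_');
    return separator < 0
        ? (code, null)
        : (code.Substring(0, separator), code.Substring(separator + 1));
}

foreach (var label in new[] { "__label__uk", "__label__ukr_Cyrl" })
{
    var (language, script) = ParseLabel(label);
    var scriptText = script ?? "(none)";
    Console.WriteLine($"{label}: language={language}, script={scriptText}");
}
```

Keeping the script optional lets the same code handle labels from all three models.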

- For the LID model with 217 languages:
```sh
curl --location -o /models/fasttext217.bin https://huggingface.co/facebook/fasttext-language-identification/resolve/main/model.bin?download=true
```
#### Use custom model in code

Learn more about these models here:
- [176 languages](https://fasttext.cc/docs/en/language-identification.html)
- [217 languages + script](https://huggingface.co/facebook/fasttext-language-identification)
**You can use the model included in this NuGet package:**
```
using var fastText = new FastTextDetector();
fastText.LoadDefaultModel();
```

#### Use custom model in code
**You can specify the path to the model file:**
```
using var fastText = new FastTextDetector();

var modelPath = "/models/fasttext176.bin";
var modelPath = "/path/to/model/fasttext176.bin";
fastText.LoadModel(modelPath);
```
OR

**You can also load the model from a stream:**
```
using var fastText = new FastTextDetector();

var modelPath = "/models/fasttext176.bin";
var modelPath = "/path/to/model/fasttext176.bin";
using var stream = File.Open(modelPath, FileMode.Open);
fastText.LoadModel(stream);
```
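
One detail worth noting for the stream variant: `File.Open(path, FileMode.Open)` requests read/write access with no sharing, so it can fail on a read-only model file or on one that another process has open. A minimal sketch using `File.OpenRead` instead; the temporary file is only a stand-in so the snippet runs on its own, and the `LoadModel` call is left as a comment:

```csharp
using System;
using System.IO;

// Hypothetical stand-in for a real model file, so the sketch is self-contained.
var modelPath = Path.Combine(Path.GetTempPath(), "example-model.bin");
File.WriteAllBytes(modelPath, new byte[] { 1, 2, 3 });

// File.OpenRead opens the file with read-only access and allows other readers.
using (var stream = File.OpenRead(modelPath))
{
    Console.WriteLine($"CanRead: {stream.CanRead}, CanWrite: {stream.CanWrite}");
    // fastText.LoadModel(stream); // the call from the README snippet above
}

File.Delete(modelPath);
```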
12 changes: 11 additions & 1 deletion README_MEDIAPIPE.md
@@ -65,13 +65,23 @@ Learn more about this model here:
- [Google AI Edge](https://ai.google.dev/edge/mediapipe/solutions/text/language_detector)

#### Use custom model in code

**You can use the model included in this NuGet package:**
```
using var mediaPipe = new MediaPipeDetector(
options: MediaPipeOptions.FromDefault()
);
```

**You can specify the path to the model file:**
```
var modelPath = "/models/mediapipe_language_detector.tflite";
using var mediaPipe = new MediaPipeDetector(
options: MediaPipeOptions.FromFile(modelPath)
);
```
OR

**You can also load the model from a stream:**
```
var modelPath = "/models/mediapipe_language_detector.tflite";
using var stream = File.Open(modelPath, FileMode.Open);
16 changes: 15 additions & 1 deletion src/LanguageIdentification.CLD2/CLD2Detector.cs
@@ -8,12 +8,23 @@
namespace Panlingo.LanguageIdentification.CLD2
{
/// <summary>
/// .NET wrapper for CLD2
/// <para>Example:</para>
/// <code>
/// using var cld2 = new CLD2Detector();
/// var predictions = cld2.PredictLanguage("Привіт, як справи?");
/// </code>
///
/// <para>A <c>using</c> declaration (or an explicit <c>Dispose</c> call) is required to release unmanaged resources after use.</para>
/// </summary>
public class CLD2Detector : IDisposable
{
private readonly Lazy<ImmutableHashSet<string>> _labels;

/// <summary>
/// <para>Creates an instance of <see cref="CLD2Detector"/>.</para>
/// <inheritdoc cref="CLD2Detector"/>
/// </summary>
/// <exception cref="NotSupportedException"></exception>
public CLD2Detector()
{
if (!IsSupported())
@@ -44,6 +55,9 @@ public CLD2Detector()
);
}

/// <summary>
/// Checks whether the current platform is supported. The key criteria are the operating system and the processor architecture.
/// </summary>
public static bool IsSupported()
{
return RuntimeInformation.OSArchitecture switch
16 changes: 15 additions & 1 deletion src/LanguageIdentification.CLD3/CLD3Detector.cs
@@ -8,7 +8,13 @@
namespace Panlingo.LanguageIdentification.CLD3
{
/// <summary>
/// .NET wrapper for CLD3
/// <para>Example:</para>
/// <code>
/// using var cld3 = new CLD3Detector(minNumBytes: 0, maxNumBytes: 512);
/// var prediction = cld3.PredictLanguage("Привіт, як справи?");
/// </code>
///
/// <para>A <c>using</c> declaration (or an explicit <c>Dispose</c> call) is required to release unmanaged resources after use.</para>
/// </summary>
public class CLD3Detector : IDisposable
{
@@ -17,6 +23,11 @@ public class CLD3Detector : IDisposable
private IntPtr _detector;
private bool _disposed = false;

/// <summary>
/// <para>Creates an instance of <see cref="CLD3Detector"/>.</para>
/// <inheritdoc cref="CLD3Detector"/>
/// </summary>
/// <exception cref="NotSupportedException"></exception>
public CLD3Detector(int minNumBytes, int maxNumBytes)
{
if (!IsSupported())
@@ -51,6 +62,9 @@ public CLD3Detector(int minNumBytes, int maxNumBytes)
);
}

/// <summary>
/// Checks whether the current platform is supported. The key criteria are the operating system and the processor architecture.
/// </summary>
public static bool IsSupported()
{
return RuntimeInformation.OSArchitecture switch
25 changes: 22 additions & 3 deletions src/LanguageIdentification.FastText/FastTextDetector.cs
@@ -9,14 +9,30 @@
namespace Panlingo.LanguageIdentification.FastText
{
/// <summary>
/// .NET wrapper for FastText
/// <para>Example:</para>
/// <code>
/// using var fastText = new FastTextDetector();
/// fastText.LoadDefaultModel();
///
/// var predictions = fastText.Predict(
/// text: "Привіт, як справи?",
/// count: 10
/// );
/// </code>
///
/// <para>A <c>using</c> declaration (or an explicit <c>Dispose</c> call) is required to release unmanaged resources after use.</para>
/// </summary>
public class FastTextDetector : IDisposable
{
private IntPtr _detector;
private readonly SemaphoreSlim _semaphore;
private bool _disposed = false;

/// <summary>
/// <para>Creates an instance of <see cref="FastTextDetector"/>.</para>
/// <inheritdoc cref="FastTextDetector"/>
/// </summary>
/// <exception cref="NotSupportedException"></exception>
public FastTextDetector()
{
if (!IsSupported())
@@ -30,6 +46,9 @@ public FastTextDetector()
_semaphore = new SemaphoreSlim(1, 1);
}

/// <summary>
/// Checks whether the current platform is supported. The key criteria are the operating system and the processor architecture.
/// </summary>
public static bool IsSupported()
{
return RuntimeInformation.OSArchitecture switch
@@ -45,7 +64,7 @@ Architecture.Arm64 when RuntimeInformation.IsOSPlatform(OSPlatform.OSX) => true,
public string ModelPath { get; private set; } = string.Empty;

/// <summary>
/// Loads model file located on path
/// Loads a model file from the given path. Supports the *.bin and *.ftz file formats.
/// </summary>
/// <param name="path">Path to *.bin or *.ftz model file</param>
public void LoadModel(string path)
@@ -68,7 +87,7 @@ public void LoadModel(string path)
}

/// <summary>
/// Loads model file from binary stream
/// Loads a model from a binary stream. Supports the *.bin and *.ftz file formats.
/// </summary>
/// <param name="stream">Stream of *.bin or *.ftz model file</param>
public void LoadModel(Stream stream)
11 changes: 10 additions & 1 deletion src/LanguageIdentification.Lingua/LinguaDetector.cs
@@ -9,7 +9,7 @@
namespace Panlingo.LanguageIdentification.Lingua
{
/// <summary>
/// .NET wrapper for Lingua
/// <inheritdoc cref="LinguaDetectorBuilder"/>
/// </summary>
public class LinguaDetector : IDisposable
{
@@ -42,6 +42,9 @@
);
}

/// <summary>
/// Checks whether the current platform is supported. The key criteria are the operating system and the processor architecture.
/// </summary>
public static bool IsSupported()
{
return RuntimeInformation.OSArchitecture switch
@@ -151,6 +154,12 @@
}
}

/// <summary>
/// Converts <see cref="LinguaLanguage"/> to an ISO 639-1 or ISO 639-3 code string.
/// </summary>
/// <param name="language"></param>
/// <returns>Language code according to ISO 639-1 or ISO 639-3</returns>
/// <exception cref="WhatlangDetectorException"></exception>

Check warning on line 162 in src/LanguageIdentification.Lingua/LinguaDetector.cs

View workflow job for this annotation

GitHub Actions / 🚀 Pack Lingua

XML comment has cref attribute 'WhatlangDetectorException' that could not be resolved

public string GetLanguageCode(LinguaLanguage language, LinguaLanguageCode code)
{
CheckDisposed();
18 changes: 17 additions & 1 deletion src/LanguageIdentification.Lingua/LinguaDetectorBuilder.cs
@@ -5,14 +5,30 @@
namespace Panlingo.LanguageIdentification.Lingua
{
/// <summary>
/// .NET wrapper for Lingua
/// <para>Example:</para>
/// <code>
/// using var linguaBuilder = new LinguaDetectorBuilder(Enum.GetValues&lt;LinguaLanguage&gt;())
/// .WithPreloadedLanguageModels() // optional
/// .WithMinimumRelativeDistance(0.95) // optional
/// .WithLowAccuracyMode(); // optional
///
/// using var lingua = linguaBuilder.Build();
/// var predictions = lingua.PredictLanguages("Привіт, як справи?");
/// </code>
///
/// <para>A <c>using</c> declaration (or an explicit <c>Dispose</c> call) is required to release unmanaged resources after use.</para>
/// </summary>
public class LinguaDetectorBuilder : IDisposable
{
private readonly LinguaLanguage[] _languages;
private IntPtr _builder;
private bool _disposed = false;

/// <summary>
/// <para>Creates an instance of <see cref="LinguaDetectorBuilder"/>.</para>
/// <inheritdoc cref="LinguaDetectorBuilder"/>
/// </summary>
/// <exception cref="NotSupportedException"></exception>
public LinguaDetectorBuilder(LinguaLanguage[] languages)
{
if (!LinguaDetector.IsSupported())
20 changes: 19 additions & 1 deletion src/LanguageIdentification.MediaPipe/MediaPipeDetector.cs
@@ -10,7 +10,16 @@
namespace Panlingo.LanguageIdentification.MediaPipe
{
/// <summary>
/// .NET wrapper for MediaPipe
/// <para>Example:</para>
/// <code>
/// using var mediaPipe = new MediaPipeDetector(
/// options: MediaPipeOptions.FromDefault()
/// );
///
/// var predictions = mediaPipe.PredictLanguages("Привіт, як справи?");
/// </code>
///
/// <para>A <c>using</c> declaration (or an explicit <c>Dispose</c> call) is required to release unmanaged resources after use.</para>
/// </summary>
public class MediaPipeDetector : IDisposable
{
@@ -22,6 +31,11 @@ public class MediaPipeDetector : IDisposable

private const string LABEL_FILE_NAME = "labels.txt";

/// <summary>
/// <para>Creates an instance of <see cref="MediaPipeDetector"/>.</para>
/// <inheritdoc cref="MediaPipeDetector"/>
/// </summary>
/// <exception cref="NotSupportedException"></exception>
public MediaPipeDetector(MediaPipeOptions options)
{
if (!IsSupported())
@@ -66,6 +80,7 @@ public MediaPipeDetector(MediaPipeOptions options)
throw new InvalidOperationException("Model data not specified");
}

// A *.tflite file is actually a zip archive. We need to read 'labels.txt' inside it to get the list of labels.
_labels = new Lazy<ImmutableHashSet<string>>(
() =>
{
@@ -128,6 +143,9 @@ public MediaPipeDetector(MediaPipeOptions options)
}
}

/// <summary>
/// Checks whether the current platform is supported. The key criteria are the operating system and the processor architecture.
/// </summary>
public static bool IsSupported()
{
return RuntimeInformation.OSArchitecture switch
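
The comment added to the `MediaPipeDetector` constructor says that the `*.tflite` file is a zip archive with a `labels.txt` entry. A minimal sketch of reading such an entry with `System.IO.Compression`, assuming one label per line; `ReadLabels` is a hypothetical helper, not the method used in this PR, and the in-memory archive only stands in for a real model file:

```csharp
#nullable enable
using System;
using System.Collections.Generic;
using System.IO;
using System.IO.Compression;

// Hypothetical helper: reads one label per line from a named zip entry.
static List<string> ReadLabels(Stream archiveStream, string entryName)
{
    using var zip = new ZipArchive(archiveStream, ZipArchiveMode.Read, leaveOpen: true);
    var entry = zip.GetEntry(entryName)
        ?? throw new InvalidOperationException($"'{entryName}' not found in archive");

    using var reader = new StreamReader(entry.Open());
    var labels = new List<string>();
    string? line;
    while ((line = reader.ReadLine()) != null)
    {
        if (line.Length > 0) labels.Add(line);
    }
    return labels;
}

// Build a tiny in-memory archive as a stand-in for a real *.tflite file.
var buffer = new MemoryStream();
using (var builder = new ZipArchive(buffer, ZipArchiveMode.Create, leaveOpen: true))
using (var writer = new StreamWriter(builder.CreateEntry("labels.txt").Open()))
{
    writer.Write("en\nuk\nhi\n");
}

buffer.Position = 0;
Console.WriteLine(string.Join(", ", ReadLabels(buffer, "labels.txt")));
```

Passing `leaveOpen: true` keeps the caller's stream usable after the labels are read, which matters if the same stream is later handed to the native detector.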
@@ -92,30 +92,30 @@ pub enum WhatlangLanguage {
#[repr(u8)]
#[derive(Debug, Copy, Clone)]
pub enum WhatlangScript {
Arabic = 0,
Armenian = 1,
Bengali = 2,
Cyrillic = 3,
Devanagari = 4,
Ethiopic = 5,
Georgian = 6,
Greek = 7,
Gujarati = 8,
Gurmukhi = 9,
Hangul = 10,
Hebrew = 11,
Hiragana = 12,
Kannada = 13,
Katakana = 14,
Khmer = 15,
Latin = 16,
Malayalam = 17,
Mandarin = 18,
Myanmar = 19,
Oriya = 20,
Sinhala = 21,
Tamil = 22,
Telugu = 23,
Arab = 0,
Armn = 1,
Beng = 2,
Cyrl = 3,
Deva = 4,
Ethi = 5,
Geor = 6,
Grek = 7,
Gujr = 8,
Guru = 9,
Hang = 10,
Hebr = 11,
Hira = 12,
Knda = 13,
Kana = 14,
Khmr = 15,
Latn = 16,
Mlym = 17,
Mand = 18,
Mymr = 19,
Orya = 20,
Sinh = 21,
Taml = 22,
Telu = 23,
Thai = 24,
}
