Pre-release PR (#44)
* fix: add twitter and some optimizations for runs of asciis

* add twitter result to README

* added sse and avx results

* simplified the algorithm

* updating numbers

* tweaking

* preparing release

* integrating AVX-512

---------

Co-authored-by: Daniel Lemire <[email protected]>
lemire and Daniel Lemire committed Jun 20, 2024
1 parent 92da59a commit 441c72c
Showing 6 changed files with 15,726 additions and 145 deletions.
55 changes: 41 additions & 14 deletions README.md
@@ -88,7 +88,8 @@ To run just one benchmark, use a filter:

```
cd benchmark
dotnet run --configuration Release --filter "*Arabic-Lipsum*"
dotnet run --configuration Release --filter "*Twitter*"
dotnet run --configuration Release --filter "*Lipsum*"
```

If you are under macOS or Linux, you may want to run the benchmarks in privileged mode:
@@ -98,26 +99,52 @@ cd benchmark
sudo dotnet run -c Release
```


--anyCategories sse avx avx512
## Results (x64)

To be completed.
On an Intel Ice Lake system, our validation function is up to 13 times
faster than the standard library.
A realistic input is Twitter.json, which is mostly ASCII with some Unicode content;
on that file we are 2.4 times faster.

| data set | SimdUnicode current AVX2 (GB/s) | .NET speed (GB/s) | speed up |
|:----------------|:------------------------|:-------------------|:-------------------|
| Twitter.json | 29 | 12 | 2.4 x |
| Arabic-Lipsum | 12 | 2.3 | 5.2 x |
| Chinese-Lipsum | 12 | 3.9 | 3.0 x |
| Emoji-Lipsum | 12 | 0.9 | 13 x |
| Hebrew-Lipsum | 12 | 2.3 | 5.2 x |
| Hindi-Lipsum | 12 | 2.1 | 5.7 x |
| Japanese-Lipsum | 10 | 3.5 | 2.9 x |
| Korean-Lipsum | 10 | 1.3 | 7.7 x |
| Latin-Lipsum | 76 | 76 | --- |
| Russian-Lipsum | 12 | 1.2 | 10 x |



On x64 systems, we offer several functions: a fallback function for legacy systems,
an SSE4.2 function for older CPUs, an AVX2 function for current x64 systems and
an AVX-512 function for the most recent processors (AMD Zen 4 or better, Intel
Ice Lake, etc.).
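
The tiers listed above suggest a runtime dispatch along the following lines. This is only a minimal sketch, assuming .NET 8 or later: `KernelSelector` and `SelectKernel` are hypothetical names that are not part of SimdUnicode, and the checks simply mirror the ones the benchmark configuration in this commit uses to choose its categories.

```csharp
using System;
using System.Runtime.InteropServices;
using System.Runtime.Intrinsics;
using System.Runtime.Intrinsics.X86;

// Hypothetical helper, not part of the SimdUnicode API.
public static class KernelSelector
{
    // Returns a label for the widest kernel tier the current machine supports.
    // The checks go from widest to narrowest so the first match wins.
    public static string SelectKernel()
    {
        if (RuntimeInformation.ProcessArchitecture == Architecture.Arm64)
        {
            return "arm64";
        }

        if (RuntimeInformation.ProcessArchitecture == Architecture.X64)
        {
            // Same AVX-512 condition as the benchmark: 512-bit vectors are
            // accelerated and the VBMI extension is available.
            if (Vector512.IsHardwareAccelerated && Avx512Vbmi.IsSupported)
            {
                return "avx512";
            }
            if (Avx2.IsSupported)
            {
                return "avx2";
            }
            // Assumption: the benchmark gates its SSE tier on Ssse3.IsSupported,
            // so this sketch does the same.
            if (Ssse3.IsSupported)
            {
                return "sse";
            }
        }

        // Fallback for legacy x64 CPUs and any other architecture.
        return "scalar";
    }
}
```

Ordering the checks from widest to narrowest matters here, since a CPU with AVX-512 also reports AVX2 and SSSE3 support.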

## Results (ARM)

On an Apple M2 system, our validation function is two to three times
On an Apple M2 system, our validation function is 1.5 to four times
faster than the standard library.

| data set | SimdUnicode speed (GB/s) | .NET speed (GB/s) |
|:----------------|:-----------|:--------------------------|
| Arabic-Lipsum | 6.7 | 3.5 |
| Chinese-Lipsum | 6.7 | 4.8 |
| Emoji-Lipsum | 6.7 | 2.5 |
| Hebrew-Lipsum | 6.7 | 3.5 |
| Hindi-Lipsum | 6.8 | 3.0 |
| Japanese-Lipsum | 6.8 | 4.6  |
| Korean-Lipsum | 6.6 | 1.8 |
| Latin-Lipsum | 87 | 38 |
| Russian-Lipsum | 6.7 | 2.6 |
| data set | SimdUnicode speed (GB/s) | .NET speed (GB/s) | speed up |
|:----------------|:-----------|:--------------------------|:-------------------|
| Twitter.json | 25 | 14 | 1.8 x |
| Arabic-Lipsum | 7.4 | 3.5 | 2.1 x |
| Chinese-Lipsum | 7.4 | 4.8 | 1.5 x |
| Emoji-Lipsum | 7.4 | 2.5 | 3.0 x |
| Hebrew-Lipsum | 7.4 | 3.5 | 2.1 x |
| Hindi-Lipsum | 7.3 | 3.0 | 2.4 x |
| Japanese-Lipsum | 7.3 | 4.6  | 1.6 x |
| Korean-Lipsum | 7.4 | 1.8 | 4.1 x |
| Latin-Lipsum | 87 | 38 | 2.3 x |
| Russian-Lipsum | 7.4 | 2.7 | 2.7 x |


## Building the library
49 changes: 30 additions & 19 deletions benchmark/Benchmark.cs
@@ -62,58 +62,70 @@ public string GetValue(Summary summary, BenchmarkCase benchmarkCase)
[Config(typeof(Config))]
public class RealDataBenchmark
{
// We only inform the user once about the SIMD support of the system.
private static bool printed;
#pragma warning disable CA1812
private sealed class Config : ManualConfig
{
public Config()
{
AddColumn(new Speed());


if (RuntimeInformation.ProcessArchitecture == Architecture.Arm64)
{
if (!printed)
{
#pragma warning disable CA1303
Console.WriteLine("ARM64 system detected.");
AddFilter(new AnyCategoriesFilter(["arm64", "scalar", "runtime"]));

Console.WriteLine("ARM64 system detected.");
printed = true;
}
}
else if (RuntimeInformation.ProcessArchitecture == Architecture.X64)
{
if (Vector512.IsHardwareAccelerated && System.Runtime.Intrinsics.X86.Avx512Vbmi.IsSupported)
{
if (!printed)
{
#pragma warning disable CA1303
Console.WriteLine("X64 system detected (Intel, AMD,...) with AVX-512 support.");
AddFilter(new AnyCategoriesFilter(["avx512", "avx", "sse", "scalar", "runtime"]));
Console.WriteLine("X64 system detected (Intel, AMD,...) with AVX-512 support.");
printed = true;
}
}
else if (Avx2.IsSupported)
{
if (!printed)
{
#pragma warning disable CA1303
Console.WriteLine("X64 system detected (Intel, AMD,...) with AVX2 support.");
AddFilter(new AnyCategoriesFilter(["avx", "sse", "scalar", "runtime"]));
Console.WriteLine("X64 system detected (Intel, AMD,...) with AVX2 support.");
printed = true;
}
}
else if (Ssse3.IsSupported)
{
if (!printed)
{
#pragma warning disable CA1303
Console.WriteLine("X64 system detected (Intel, AMD,...) with Sse4.2 support.");
AddFilter(new AnyCategoriesFilter(["sse", "scalar", "runtime"]));
Console.WriteLine("X64 system detected (Intel, AMD,...) with Sse4.2 support.");
printed = true;
}
}
else
{
if (!printed)
{
#pragma warning disable CA1303
Console.WriteLine("X64 system detected (Intel, AMD,...) without relevant SIMD support.");
AddFilter(new AnyCategoriesFilter(["scalar", "runtime"]));
Console.WriteLine("X64 system detected (Intel, AMD,...) without relevant SIMD support.");
printed = true;
}
}
}
else
{
AddFilter(new AnyCategoriesFilter(["scalar", "runtime"]));

}
AddFilter(new AnyCategoriesFilter(["default"]));

}
}
// Parameters and variables for real data
[Params(@"data/Arabic-Lipsum.utf8.txt",
[Params(@"data/twitter.json",
@"data/Arabic-Lipsum.utf8.txt",
@"data/Hebrew-Lipsum.utf8.txt",
@"data/Korean-Lipsum.utf8.txt",
@"data/Chinese-Lipsum.utf8.txt",
@@ -285,7 +297,6 @@ public unsafe void SIMDUtf8ValidationRealDataSse()
});
}
}

}
public class Program
{
3 changes: 3 additions & 0 deletions benchmark/benchmark.csproj
@@ -22,6 +22,9 @@
<None Update="data\*.utf8.txt">
<CopyToOutputDirectory>Always</CopyToOutputDirectory>
</None>
<None Update="data\twitter.json">
<CopyToOutputDirectory>Always</CopyToOutputDirectory>
</None>
</ItemGroup>

