Releases: ermig1979/Simd

Simd v5.0.116

15 Aug 15:47

Algorithms

New features
  • Base implementation, SSE4.1, AVX2, AVX-512BW optimizations of function Yuva444pToBgraV2.
  • Function SimdEmpty.
  • Checking of no man's land watermarks in function SimdFree.
Improving
  • AVX-512BW optimizations of AMX tile emulation.
  • AMX optimizations of class SynetConvolution32fBf16Nhwc.
  • AMX optimizations of class SynetMergedConvolution32fBf16Cdc.
  • AMX optimizations of class SynetMergedConvolution32fBf16Cd.
Bug fixing
  • GCC linker error when SIMD_AMX_EMULATE macro is switched on.
  • Error in SSE4.1, AVX2, AVX-512BW, AMX optimizations of class SynetConvolution32fBf16Nhwc.
  • Wrong assert in SSE4.1 and AVX-512BW optimizations of class ResizerNearest.
  • Error in AVX optimizations of class SynetMergedConvolution32fCdc.
  • Error in Base implementation, SSE4.1, AVX2, AVX-512BW, AVX-512BF16, AMX optimizations of class SynetMergedConvolution32fBf16Cdc.
  • External buffer reading overflow in class SynetMergedConvolution32fBf16Cdc.
  • External buffer reading overflow in class SynetMergedConvolution32fBf16Cd.
  • External buffer reading overflow in class SynetMergedConvolution32fBf16Dc.
  • FP32 overflow in SSE2, AVX2, AVX-512BW, NEON optimizations of function Tanh.
  • Error in function Base::SynetConvolution32fGemmNN::ImgToCol.
  • Error in SSE4.1, AVX2 optimizations of class ResizerByteArea2x2.
  • Buffer overrun in SSE4.1, AVX2 optimizations of class ResizerNearest.
  • Buffer overrun in SSE4.1, AVX2 optimizations of class ResizerByteBilinear.
  • Buffer overrun in SSE4.1, AVX2 optimizations of class ResizerByteBicubic.
  • Buffer overrun in SSE4.1, AVX2 optimizations of class ResizerByteArea1x1.
  • Buffer overrun in SSE4.1, AVX2 optimizations of class ResizerByteArea2x2.
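
The release notes do not say how the FP32 overflow in Tanh arose or how it was fixed. As a hedged illustration only, the sketch below shows one common way a single-precision tanh can overflow: a formula based on exp(2x) produces +inf for large x, and inf / inf then yields NaN. The function names are hypothetical and are not part of the Simd API.

```cpp
#include <cassert>
#include <cmath>

// Naive form: exp(2x) overflows to +inf for large x, and inf / inf is NaN.
inline float TanhNaive(float x)
{
    float e = std::exp(2.0f * x);
    return (e - 1.0f) / (e + 1.0f);
}

// Overflow-tolerant form: 2 / (inf + 1) is 0, so the result saturates at 1.
inline float TanhSaturating(float x)
{
    float e = std::exp(2.0f * x);
    return 1.0f - 2.0f / (e + 1.0f);
}

// Small self-check contrasting the two forms (illustrative only).
inline void TanhOverflowDemo()
{
    assert(std::isnan(TanhNaive(100.0f)));      // overflow turns into NaN
    assert(TanhSaturating(100.0f) == 1.0f);     // saturates instead
    assert(TanhSaturating(-100.0f) == -1.0f);   // exp underflows to 0
}
```

Vectorized code can reach the same effect by clamping the argument before the exponent is evaluated; which approach the library takes is not stated here.
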
Replacing
  • Replace SSE2 optimizations to SSE4.1 for function SvmSumLinear.
  • Replace SSE2 optimizations to SSE4.1 for function AbsDifference.
  • Replace SSE2 optimizations to SSE4.1 for function AbsDifferenceSum.
  • Replace SSE2 optimizations to SSE4.1 for function AbsDifferenceSumMasked.
  • Replace SSE2 optimizations to SSE4.1 for function AbsDifferenceSums3x3.
  • Replace SSE2 optimizations to SSE4.1 for function AbsDifferenceSums3x3Masked.
  • Replace SSE2 optimizations to SSE4.1 for function AbsGradientSaturatedSum.
  • Replace SSE2 optimizations to SSE4.1 for function AddFeatureDifference.
  • Replace SSE2 optimizations to SSE4.1 for function AlphaBlending.
  • Replace SSE2 optimizations to SSE4.1 for function AlphaBlendingUniform.
  • Replace SSE2 optimizations to SSE4.1 for function AlphaFilling.
  • Replace SSE2 optimizations to SSE4.1 for function AlphaPremultiply.
  • Replace SSE2 optimizations to SSE4.1 for function BackgroundGrowRangeSlow.
  • Replace SSE2 optimizations to SSE4.1 for function BackgroundGrowRangeFast.
  • Replace SSE2 optimizations to SSE4.1 for function BackgroundIncrementCount.
  • Replace SSE2 optimizations to SSE4.1 for function BackgroundAdjustRange.
  • Replace SSE2 optimizations to SSE4.1 for function BackgroundAdjustRangeMasked.
  • Replace SSE2 optimizations to SSE4.1 for function BackgroundShiftRange.
  • Replace SSE2 optimizations to SSE4.1 for function BackgroundShiftRangeMasked.
  • Replace SSE2 optimizations to SSE4.1 for function BackgroundInitMask.
  • Replace SSE2 optimizations to SSE4.1 for function EdgeBackgroundGrowRangeSlow.
  • Replace SSE2 optimizations to SSE4.1 for function EdgeBackgroundGrowRangeFast.
  • Replace SSE2 optimizations to SSE4.1 for function EdgeBackgroundIncrementCount.
  • Replace SSE2 optimizations to SSE4.1 for function EdgeBackgroundAdjustRange.
  • Replace SSE2 optimizations to SSE4.1 for function EdgeBackgroundAdjustRangeMasked.
  • Replace SSE2 optimizations to SSE4.1 for function EdgeBackgroundShiftRangeMasked.
  • Replace SSE2 optimizations to SSE4.1 for function BayerToBgra.
  • Replace SSE2 optimizations to SSE4.1 for function BgraToGray.
  • Replace SSE2 optimizations to SSE4.1 for function BgraToYuv420p.
  • Replace SSE2 optimizations to SSE4.1 for function BgraToYuv422p.
  • Replace SSE2 optimizations to SSE4.1 for function BgraToYuv444p.
  • Replace SSE2 optimizations to SSE4.1 for function BgraToYuva420p.
  • Replace SSE2 optimizations to SSE4.1 for function BgrToGray.
  • Replace SSE2 optimizations to SSE4.1 for function RgbaToGray.
  • Replace SSE2 optimizations to SSE4.1 for function Bgr48pToBgra32.
  • Replace SSE2 optimizations to SSE4.1 for function Binarization.
  • Replace SSE2 optimizations to SSE4.1 for function AveragingBinarization.
  • Replace SSE2 optimizations to SSE4.1 for function ConditionalCount8u.
  • Replace SSE2 optimizations to SSE4.1 for function ConditionalCount16i.
  • Replace SSE2 optimizations to SSE4.1 for function ConditionalSum.
  • Replace SSE2 optimizations to SSE4.1 for function ConditionalSquareSum.
  • Replace SSE2 optimizations to SSE4.1 for function ConditionalSquareGradientSum.
  • Replace SSE2 optimizations to SSE4.1 for function ConditionalFill.
  • Replace SSE2 optimizations to SSE4.1 for function DeinterleaveUv.
  • Replace SSE2 optimizations to SSE4.1 for function Fill32f.
  • Replace SSE2 optimizations to SSE4.1 for function FillBgr.
  • Replace SSE2 optimizations to SSE4.1 for function FillBgra.
  • Replace SSE2 optimizations to SSE4.1 for function FillPixel.
  • Replace SSE2 optimizations to SSE4.1 for function CosineDistance32f.
  • Replace SSE2 optimizations to SSE4.1 for function Float32ToUint8.
  • Replace SSE2 optimizations to SSE4.1 for function Uint8ToFloat32.
  • Replace SSE2 optimizations to SSE4.1 for function GaussianBlur3x3.
  • Replace SSE2 optimizations to SSE4.1 for function GrayToBgra.
  • Replace SSE2 optimizations to SSE4.1 for function AbsSecondDerivativeHistogram.
  • Replace SSE2 optimizations to SSE4.1 for function HistogramMasked.
  • Replace SSE2 optimizations to SSE4.1 for function HistogramConditional.
  • Replace SSE2 optimizations to SSE4.1 for function HogDirectionHistograms.
  • Replace SSE2 optimizations to SSE4.1 for function HogDeinterleave.
  • Replace SSE2 optimizations to SSE4.1 for function HogFilterSeparable.
  • Replace SSE2 optimizations to SSE4.1 for function Int16ToGray.
  • Replace SSE2 optimizations to SSE4.1 for function InterferenceIncrement.
  • Replace SSE2 optimizations to SSE4.1 for function InterferenceIncrementMasked.
  • Replace SSE2 optimizations to SSE4.1 for function InterferenceDecrement.
  • Replace SSE2 optimizations to SSE4.1 for function InterferenceDecrementMasked.
  • Replace SSE2 optimizations to SSE4.1 for function InterleaveUv.
  • Replace SSE2 optimizations to SSE4.1 for function Laplace.
  • Replace SSE2 optimizations to SSE4.1 for function LbpEstimate.
  • Replace SSE2 optimizations to SSE4.1 for function MeanFilter3x3.
  • Replace SSE2 optimizations to SSE4.1 for function MedianFilterRhomb3x3.
  • Replace SSE2 optimizations to SSE4.1 for function MedianFilterRhomb5x5.
  • Replace SSE2 optimizations to SSE4.1 for function MedianFilterSquare3x3.
  • Replace SSE2 optimizations to SSE4.1 for function MedianFilterSquare5x5.
  • Replace SSE2 optimizations to SSE4.1 for function NeuralAddConvolution2x2Forward.
  • Replace SSE2 optimizations to SSE4.1 for function NeuralAddConvolution3x3Forward.
  • Replace SSE2 optimizations to SSE4.1 for function NeuralAddConvolution4x4Forward.
  • Replace SSE2 optimizations to SSE4.1 for function NeuralAddConvolution5x5Forward.
  • Replace SSE2 optimizations to SSE4.1 for function NeuralAddConvolution2x2Backward.
  • Replace SSE2 optimizations to SSE4.1 for function NeuralAddConvolution3x3Backward.
  • Replace SSE2 optimizations to SSE4.1 for function NeuralAddConvolution4x4Backward.
  • Replace SSE2 optimizations to SSE4.1 for function NeuralAddConvolution5x5Backward.
  • Replace SSE2 optimizations to SSE4.1 for function NeuralAddConvolution2x2Sum.
  • Replace SSE2 optimizations to SSE4.1 for function NeuralAddConvolution3x3Sum.
  • Replace SSE2 optimizations to SSE4.1 for function NeuralAddConvolution4x4Sum.
  • Replace SSE2 optimizations to SSE4.1 for function NeuralAddConvolution5x5Sum.
  • Replace SSE2 optimizations to SSE4.1 for function NeuralAdaptiveGradientUpdate.
  • Replace SSE2 optimizations to SSE4.1 for function NeuralAddVectorMultipliedByValue.
  • Replace SSE2 optimizations to SSE4.1 for function NeuralAddVector.
  • Replace SSE2 optimizations to SSE4.1 for function NeuralAddValue.
  • Replace SSE2 optimizations to SSE4.1 for function NeuralConvert.
  • Replace SSE2 optimizations to SSE4.1 for function NeuralDerivativeRelu.
  • Replace SSE2 optimizations to SSE4.1 for function NeuralDerivativeSigmoid.
  • Replace SSE2 optimizations to SSE4.1 for function NeuralDerivativeTanh.
  • Replace SSE2 optimizations to SSE4.1 for function NeuralPooling1x1Max3x3.
  • Replace SSE2 optimizations to SSE4.1 for function NeuralPooling2x2Max2x2.
  • Replace SSE2 optimizations to SSE4.1 for function NeuralPooling2x2Max3x3.
  • Replace SSE2 optimizations to SSE4.1 for function NeuralPow.
  • Replace SSE2 optimizations to SSE4.1 for function NeuralProdu...

Simd v5.0.115

01 Jul 16:22

Algorithms

New features
  • Base implementation, SSE4.1, AVX2, AVX-512BW, AVX-512BF16, AMX optimizations of class SynetMergedConvolution32fBf16Cdc.
  • Base implementation, SSE4.1, AVX2, AVX-512BW, AVX-512BF16, AMX optimizations of class SynetMergedConvolution32fBf16Cd.
  • Base implementation, SSE4.1, AVX2, AVX-512BW, AVX-512BF16, AMX optimizations of class SynetMergedConvolution32fBf16Dc.
  • AVX-512BF16 extension support.
  • AVX-512BF16 optimizations of function Float32ToBFloat16.
  • AVX-512BF16, AMX optimizations of class SynetConvolution32fBf16Nhwc.
  • AMX extension support.
  • Support of 3D pooling in Base implementation, SSE4.1, AVX2, AVX-512BW optimizations of function SynetPoolingMax32f.
Improving
  • AVX-512BW optimizations of function Fill32f.
Renaming
  • Rename function SynetPoolingForwardAverage to SynetPoolingAverage.
  • Rename function SynetPoolingForwardMax32f to SynetPoolingMax32f.
  • Rename function SynetPoolingForwardMax8u to SynetPoolingMax8u.
Replacing
  • Replace AVX-512F optimizations to AVX-512BW for function SvmSumLinear.
  • Replace AVX-512F optimizations to AVX-512BW for function Fill32f.
  • Replace AVX-512F optimizations to AVX-512BW for class ResizerNearest.
  • Replace AVX-512F optimizations to AVX-512BW for class ResizerFloatBilinear.
  • Replace AVX-512F optimizations to AVX-512BW for function SquaredDifferenceSum32f.
  • Replace AVX-512F optimizations to AVX-512BW for function SquaredDifferenceKahanSum32f.
  • Replace AVX-512F optimizations to AVX-512BW for function NeuralConvolutionForward.
  • Replace AVX-512F optimizations to AVX-512BW for function NeuralAddConvolution2x2Forward.
  • Replace AVX-512F optimizations to AVX-512BW for function NeuralAddConvolution2x2Backward.
  • Replace AVX-512F optimizations to AVX-512BW for function NeuralAddConvolution2x2Sum.
  • Replace AVX-512F optimizations to AVX-512BW for function NeuralAddConvolution3x3Forward.
  • Replace AVX-512F optimizations to AVX-512BW for function NeuralAddConvolution3x3Backward.
  • Replace AVX-512F optimizations to AVX-512BW for function NeuralAddConvolution3x3Sum.
  • Replace AVX-512F optimizations to AVX-512BW for function NeuralAddConvolution4x4Forward.
  • Replace AVX-512F optimizations to AVX-512BW for function NeuralAddConvolution4x4Backward.
  • Replace AVX-512F optimizations to AVX-512BW for function NeuralAddConvolution4x4Sum.
  • Replace AVX-512F optimizations to AVX-512BW for function NeuralAddConvolution5x5Forward.
  • Replace AVX-512F optimizations to AVX-512BW for function NeuralAddConvolution5x5Backward.
  • Replace AVX-512F optimizations to AVX-512BW for function NeuralAddConvolution5x5Sum.
  • Replace AVX-512F optimizations to AVX-512BW for function NeuralProductSum.
  • Replace AVX-512F optimizations to AVX-512BW for function NeuralAdaptiveGradientUpdate.
  • Replace AVX-512F optimizations to AVX-512BW for function NeuralPooling1x1Max3x3.
  • Replace AVX-512F optimizations to AVX-512BW for function NeuralPooling2x2Max2x2.
  • Replace AVX-512F optimizations to AVX-512BW for function NeuralPooling2x2Max3x3.
  • Replace AVX-512F optimizations to AVX-512BW for function NeuralUpdateWeights.
  • Replace AVX-512F optimizations to AVX-512BW for function NeuralAddValue.
  • Replace AVX-512F optimizations to AVX-512BW for function NeuralAddVector.
  • Replace AVX-512F optimizations to AVX-512BW for function NeuralAddVectorMultipliedByValue.
  • Replace AVX-512F optimizations to AVX-512BW for function NeuralRoughSigmoid.
  • Replace AVX-512F optimizations to AVX-512BW for function NeuralRoughSigmoid2.
  • Replace AVX-512F optimizations to AVX-512BW for function NeuralDerivativeSigmoid.
  • Replace AVX-512F optimizations to AVX-512BW for function NeuralRoughTanh.
  • Replace AVX-512F optimizations to AVX-512BW for function NeuralDerivativeTanh.
  • Replace AVX-512F optimizations to AVX-512BW for function NeuralDerivativeRelu.
  • Replace AVX-512F optimizations to AVX-512BW for function NeuralPow.
  • Replace AVX-512F optimizations to AVX-512BW for class SynetConvolution32fGemmNN.
  • Replace AVX-512F optimizations to AVX-512BW for class SynetConvolution32fGemmNT.
  • Replace AVX-512F optimizations to AVX-512BW for class SynetConvolution32fWinograd.
  • Replace AVX-512F optimizations to AVX-512BW for class SynetDeconvolution32fGemmNN.
  • Replace AVX-512F optimizations to AVX-512BW for class SynetDeconvolution32fNhwcDirect2x2.
  • Replace AVX-512F optimizations to AVX-512BW for function SynetDeconvolution32fInit.
  • Replace AVX-512F optimizations to AVX-512BW for class SynetInnerProduct32fGemm.
  • Replace AVX-512F optimizations to AVX-512BW for class SynetInnerProduct32fProd.
  • Replace AVX-512F optimizations to AVX-512BW for function SynetInnerProduct32fInit.
  • Replace AVX-512F optimizations to AVX-512BW for function ConvolutionBiasAndActivation.
  • Replace AVX-512F optimizations to AVX-512BW for function SynetReorderImage.
  • Replace AVX-512F optimizations to AVX-512BW for function SynetReorderFilter.
  • Replace AVX-512F optimizations to AVX-512BW for function Gemm32fNN.
  • Replace AVX-512F optimizations to AVX-512BW for function Gemm32fNT.
  • Replace AVX-512F optimizations to AVX-512BW for function SynetFusedLayerForward0.
  • Replace AVX-512F optimizations to AVX-512BW for function SynetFusedLayerForward1.
  • Replace AVX-512F optimizations to AVX-512BW for function SynetFusedLayerForward2.
  • Replace AVX-512F optimizations to AVX-512BW for function SynetFusedLayerForward3.
  • Replace AVX-512F optimizations to AVX-512BW for function SynetFusedLayerForward4.
  • Replace AVX-512F optimizations to AVX-512BW for function SynetFusedLayerForward8.
  • Replace AVX-512F optimizations to AVX-512BW for function SynetFusedLayerForward9.
  • Replace AVX-512F optimizations to AVX-512BW for function WinogradKernel1x3Block1x4SetFilter.
  • Replace AVX-512F optimizations to AVX-512BW for function WinogradKernel1x3Block1x4SetInput.
  • Replace AVX-512F optimizations to AVX-512BW for function WinogradKernel1x3Block1x4SetOutput.
  • Replace AVX-512F optimizations to AVX-512BW for function WinogradKernel1x5Block1x4SetFilter.
  • Replace AVX-512F optimizations to AVX-512BW for function WinogradKernel1x5Block1x4SetInput.
  • Replace AVX-512F optimizations to AVX-512BW for function WinogradKernel1x5Block1x4SetOutput.
  • Replace AVX-512F optimizations to AVX-512BW for function WinogradKernel2x2Block2x2SetFilter.
  • Replace AVX-512F optimizations to AVX-512BW for function WinogradKernel2x2Block2x2SetInput.
  • Replace AVX-512F optimizations to AVX-512BW for function WinogradKernel2x2Block2x2SetOutput.
  • Replace AVX-512F optimizations to AVX-512BW for function WinogradKernel2x2Block4x4SetFilter.
  • Replace AVX-512F optimizations to AVX-512BW for function WinogradKernel2x2Block4x4SetInput.
  • Replace AVX-512F optimizations to AVX-512BW for function WinogradKernel2x2Block4x4SetOutput.
  • Replace AVX-512F optimizations to AVX-512BW for function WinogradKernel3x3Block2x2SetFilter.
  • Replace AVX-512F optimizations to AVX-512BW for function WinogradKernel3x3Block2x2SetInput.
  • Replace AVX-512F optimizations to AVX-512BW for function WinogradKernel3x3Block2x2SetOutput.
  • Replace AVX-512F optimizations to AVX-512BW for function WinogradKernel3x3Block3x3SetFilter.
  • Replace AVX-512F optimizations to AVX-512BW for function WinogradKernel3x3Block3x3SetInput.
  • Replace AVX-512F optimizations to AVX-512BW for function WinogradKernel3x3Block3x3SetOutput.
  • Replace AVX-512F optimizations to AVX-512BW for function WinogradKernel3x3Block4x4SetFilter.
  • Replace AVX-512F optimizations to AVX-512BW for function WinogradKernel3x3Block4x4SetInput.
  • Replace AVX-512F optimizations to AVX-512BW for function WinogradKernel3x3Block4x4SetOutput.
  • Replace AVX-512F optimizations to AVX-512BW for function SynetElu32f.
  • Replace AVX-512F optimizations to AVX-512BW for function SynetHardSigmoid32f.
  • Replace AVX-512F optimizations to AVX-512BW for function SynetHswish32f.
  • Replace AVX-512F optimizations to AVX-512BW for function SynetMish32f.
  • Replace AVX-512F optimizations to AVX-512BW for function SynetPreluLayerForward.
  • Replace AVX-512F optimizations to AVX-512BW for function SynetRelu32f.
  • Replace AVX-512F optimizations to AVX-512BW for function SynetRestrictRange32f.
  • Replace AVX-512F optimizations to AVX-512BW for function SynetSigmoid32f.
  • Replace AVX-512F optimizations to AVX-512BW for function SynetSoftplus32f.
  • Replace AVX-512F optimizations to AVX-512BW for function SynetSwish32f.
  • Replace AVX-512F optimizations to AVX-512BW for function SynetTanh32f.
  • Replace AVX-512F optimizations to AVX-512BW for function SynetScaleLayerForward.
  • Replace AVX-512F optimizations to AVX-512BW for function SynetPoolingAverage.
  • Replace AVX-512F optimizations to AVX-512BW for function SynetAddBias.
  • Replace AVX-512F optimizations to AVX-512BW for function SynetEltwiseLayerForward.
  • Replace AVX-512F optimizations to AVX-512BW for function SynetInnerProductLayerForward.
  • Replace AVX-512F optimizations to AVX-512BW for function SynetLrnLayerCrossChannels.

Simd v4.10.114

01 Jun 08:21

Algorithms

New features
  • Base implementation, SSE4.1, AVX2, AVX-512BW, NEON optimizations of function Yuv420pToUyvy422.
  • AVX-512BW, NEON optimizations of function Uyvy422ToYuv420p.
  • AVX-512BW, NEON optimizations of function Uyvy422ToBgr.
  • Base implementation, SSE4.1, AVX2, AVX-512BW optimizations of function Float32ToBFloat16.
  • Base implementation, SSE4.1, AVX2, AVX-512BW optimizations of function BFloat16ToFloat32.
  • Base implementation of class SynetConvolution32fBf16Gemm.
  • Base implementation, SSE4.1, AVX2, AVX-512BW optimizations of class SynetConvolution32fBf16Nhwc.
  • Base implementation of class SynetMergedConvolution32fBf16.
Removing
  • Remove external GEMM function parameter from function SynetConvolution32fInit.
  • Remove external GEMM function parameter from function SynetDeconvolution32fInit.
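
For readers unfamiliar with the format behind Float32ToBFloat16 and BFloat16ToFloat32: BF16 keeps the sign, the 8 exponent bits, and the top 7 mantissa bits of an FP32 value. The sketch below is a minimal scalar illustration that assumes round-to-nearest-even; the rounding mode used by the library is not stated in these notes, and the function names are hypothetical, not the Simd API.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// FP32 -> BF16: keep the upper 16 bits, rounding to nearest even (assumed mode).
inline uint16_t Float32ToBFloat16Sketch(float value)
{
    uint32_t bits;
    std::memcpy(&bits, &value, sizeof(bits));
    if ((bits & 0x7FFFFFFFu) > 0x7F800000u)       // NaN: force a quiet NaN
        return uint16_t((bits >> 16) | 0x0040u);
    uint32_t roundingBias = 0x7FFFu + ((bits >> 16) & 1u);
    return uint16_t((bits + roundingBias) >> 16);
}

// BF16 -> FP32: place the 16 bits in the upper half; the conversion is exact.
inline float BFloat16ToFloat32Sketch(uint16_t value)
{
    uint32_t bits = uint32_t(value) << 16;
    float result;
    std::memcpy(&result, &bits, sizeof(result));
    return result;
}

// Small self-check showing the expected bit patterns (illustrative only).
inline void BFloat16SketchDemo()
{
    assert(Float32ToBFloat16Sketch(1.0f) == 0x3F80);
    assert(BFloat16ToFloat32Sketch(0x3F80) == 1.0f);
    assert(Float32ToBFloat16Sketch(1.00390625f) == 0x3F80); // tie, rounds to even
}
```
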

Test framework

New features
  • Tests for verifying functionality of function Yuv420pToUyvy422.
  • Tests for verifying functionality of function Float32ToBFloat16.
  • Tests for verifying functionality of function BFloat16ToFloat32.

Infrastructure

New features
  • Project files for Microsoft Visual Studio 2022.

Simd v4.9.113

04 May 11:56

Algorithms

New features
  • SSE4.1, AVX2, AVX-512BW optimizations of class ResizerByteArea2x2.
Improving
  • Base implementation of class ResizerByteArea1x1.
Bug fixing
  • Error in Base implementation of class ResizerByteArea2x2.
  • Error in AVX optimizations of class SynetConvolution32fDirectNchw.
Removing
  • SimdSynetCompatibilityFloatZero flag.

Infrastructure

New features
  • Git commit ID info in function SimdVersion.
  • Git branch name in function SimdVersion.

Simd v4.9.112

01 Apr 09:29

Algorithms

New features
  • NEON optimizations of function Base64Encode.
  • NEON optimizations of ImageJpegSaver class.
  • NEON optimizations of function Yuv420pSaveAsJpegToMemory.
  • NEON optimizations of function Nv12SaveAsJpegToMemory.
  • Owner method in View structure.
  • Owner method in Frame structure.
  • Capture method in View structure.
  • Capture method in Frame structure.
  • Base implementation of class ResizerByteAreaReduced2x2.
Bug fixing
  • MSVS compiler error in AVX-512BW optimizations of function Yuv420pToBgraV2.
  • Error in AVX2 optimizations of function BgraToRgb.
  • Error (aligned reading of unaligned memory) in SSE4.1, AVX2, AVX-512BW optimizations of function InterleaveBgra.
  • Error in function View::ToOcv.
  • Error in View copy constructor (from OpenCV Mat).

Test framework

Bug fixing
  • Wrong default ROOT_PATH for Linux.
  • Error in test SynetConvert32fTo8uAutoTest.
  • Special test ResizeYuv420pSpecialTest.

Simd v4.9.111

03 Mar 12:58

Algorithms

New features
  • AVX2, AVX-512BW optimizations of ResizerByteBicubic class.
  • SSE4.1, AVX2, AVX-512BW, NEON optimizations of function Base64Decode.
  • NEON optimizations of function SynetSwish32f.
  • Swish activation function to NEON optimizations of SynetConvolution32f framework.
  • Swish activation function to NEON optimizations of SynetDeconvolution32f framework.
  • Swish activation function to NEON optimizations of SynetMergedConvolution32f framework.
  • Swish activation function to NEON optimizations of SynetConvolution8i framework.
  • Swish activation function to NEON optimizations of SynetMergedConvolution8i framework.
  • NEON optimizations of function Yuv444pToBgraV2.
  • SSE2, AVX2, AVX-512BW, NEON optimizations of function Yuv420pToBgraV2.
Improving
  • SSE4.1 optimizations of ResizerByteBicubic class.
Bug fixing
  • Compiler error in NEON optimizations of function AlphaUnpremultiply.
  • MSVS compiler warnings in SSE4.1, AVX2, AVX-512BW optimizations of function TransformImage.

Simd v4.9.110

03 Mar 12:53

Algorithms

New features
  • Base implementation, SSE4.1 optimizations of ResizerByteBicubic class.
  • Base implementation of function BgraToYuv444pV2.
  • Base implementation, SSE4.1, AVX2, AVX-512BW optimizations of function Nv12SaveAsJpegToMemory.
  • Base implementation, SSE4.1, AVX2, AVX-512BW optimizations of function Yuv420pSaveAsJpegToMemory.
  • Base implementation of function BgraToYuv420pV2.
Bug fixing
  • Error in SSE4.1, AVX2, AVX-512BW optimizations of function BgraToRgba.
  • Error in SSE4.1, AVX2 optimizations of function BgraToBgr.
  • Error in SSE4.1, AVX2 optimizations of function BgraToRgb.
  • Error in Base implementation, SSE4.1, AVX2, AVX-512BW, NEON optimizations of function AlphaUnpremultiply.

Test framework

New features
  • Tests for verifying functionality of function BgraToYuv444pV2.
  • Tests for verifying functionality of function Nv12SaveAsJpegToMemory.
  • Tests for verifying functionality of function Yuv420pSaveAsJpegToMemory.
  • Tests for verifying functionality of function BgraToYuv420pV2.

Simd v4.9.109

03 Jan 07:51

Algorithms

New features
  • Parameter Uyvy422ToBgr to function.
  • SSE4.1, AVX2 optimizations of function Uyvy422ToBgr.
  • Base implementation, SSE4.1, AVX2 optimizations of function Uyvy422ToYuv420p.
  • Base implementation, SSE4.1, AVX2, AVX-512BW optimizations of function Base64Encode.
  • Base implementation of function Base64Decode.
Improving
  • AVX2 optimizations of class ResizerNearest for Bgr24, Uv16.
Renaming
  • Function UyvyToBgr to Uyvy422ToBgr.

Test framework

New features
  • Tests for verifying functionality of function Uyvy422ToYuv420p.
  • Tests for verifying functionality of function Base64Encode.
  • Tests for verifying functionality of function Base64Decode.

Documentation

Changes
  • Update developers list.

Simd v4.9.108

01 Dec 11:28

Algorithms

New features
  • SSE4.1, AVX2, AVX-512F, AVX-512BW optimizations of class ResizerNearest.
  • Add SimdResizeMethodNearestPytorch to SimdResizeMethodType enumeration.
  • Add parameter BackgroundStatUpdateTime to Motion Detector.
  • MotionDetector performance optimization (case of falling star).
  • 16-bit UYVY image format in View.
  • Base implementation of function UyvyToBgr.
  • Base implementation, SSE2, AVX2, AVX-512F optimizations of function SynetSwish32f.
  • SimdConvolutionActivationSwish item of SimdConvolutionActivationType enumeration.
  • Swish activation function to Base implementation, SSE2, AVX2, AVX-512F optimizations of SynetConvolution32f framework.
  • Swish activation function to Base implementation, SSE2, AVX2, AVX-512F optimizations of SynetDeconvolution32f framework.
  • Swish activation function to Base implementation, SSE2, AVX2, AVX-512F optimizations of SynetMergedConvolution32f framework.
  • Swish activation function to Base implementation, SSE4.1, AVX2, AVX-512BW, AVX-512VNNI optimizations of SynetConvolution8i framework.
  • Swish activation function to Base implementation, SSE4.1, AVX2, AVX-512BW, AVX-512VNNI optimizations of SynetMergedConvolution8i framework.
  • SimdYuvType enumeration.
  • Base implementation, SSE2, AVX2, AVX-512BW optimizations of function Yuv444pToBgraV2.
  • Function Simd::Resize supports images with 16-bit channel size.
  • Base implementation of function Yuv420pToBgraV2.
Improving
  • Refactoring of SimdResizeMethodType enumeration.
Bug fixing
  • Stack corruption in function Simd::Avx2::JpegWriteBlockSubs.
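
As context for the Swish entries above, the activation is commonly defined as x * sigmoid(slope * x). The sketch below is a plain scalar reference under that assumption; the name and signature are illustrative and should not be read as the library's SynetSwish32f interface.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>

// Scalar reference for Swish: dst[i] = src[i] * sigmoid(slope * src[i]).
// Hypothetical helper, not the Simd API.
inline void SwishSketch(const float* src, size_t size, float slope, float* dst)
{
    assert(src != nullptr && dst != nullptr);
    for (size_t i = 0; i < size; ++i)
        dst[i] = src[i] / (1.0f + std::exp(-slope * src[i]));
}
```

A scalar reference like this is mainly useful as a baseline for checking vectorized variants against.
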

Test framework

New features
  • Tests for verifying functionality of function UyvyToBgr.
  • Tests for verifying functionality of function SynetSwish32f.
  • Tests for verifying functionality of function Yuv444pToBgraV2.
  • Tests for verifying functionality of function Yuv420pToBgraV2.

Infrastructure

Bug fixing
  • Wrong compiler options in CMake.

Simd v4.9.107

01 Nov 08:33

Algorithms

New features
  • Internal class Holder to replace std::unique_ptr for old compilers without support of C++11 standard.
  • SimdBayerLayoutType enumeration.
  • Base implementation of class ResizerNearest.
Bug fixing
  • Compiler error when macro SIMD_SSE2_DISABLE is defined.
  • Compiler error when macro SIMD_NEON_DISABLE is defined.

Infrastructure

New features
  • SIMD_ROOT CMake parameter.