
5 Commits

Author SHA1 Message Date
Vadim Yanitskiy cfb1eaacf4 core/conv/viterbi.c: fix possible NULL-pointer reference
Change-Id: I36012d4443d97470050cdf9638a9d4cf67ea3b40
2017-06-13 20:20:40 +07:00
Vadim Yanitskiy 0d49f47aeb core/conv: do not mix up AVX and SSE code
According to GCC's wiki:

If you specify command-line switches such as -msse, the compiler
could use the extended instruction sets even if the built-ins are
not used explicitly in the program. For this reason, applications
that perform run-time CPU detection must compile separate files
for each supported architecture, using the appropriate flags. In
particular, the file containing the CPU detection code should be
compiled without these options.

So, this change introduces a separate Viterbi implementation,
which is almost the same as the previous one, but is compiled
with -mavx2. This implementation will only be used on CPUs that
support both SSE3 and AVX2:

SSE3 and AVX2: viterbi_sse_avx.c
SSE3 only: viterbi_sse.c
Generic: viterbi_generic.c

Change-Id: I042cc76258df7e4c6c90a73af3d0a6e75999b2b0
2017-05-29 14:07:25 +00:00
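The implementation-selection rule listed in the commit above can be sketched as a function-pointer dispatcher. This is a minimal sketch, not libosmocore's code: the decode_* functions are hypothetical stand-ins that only return an identifier, and the sketch assumes the two feature flags come from run-time detection (for example __builtin_cpu_supports) performed in a file compiled without -msse or -mavx2, as the GCC wiki advice requires.

```c
/* Hypothetical stand-ins for the three decoders. In a real build each
 * would live in its own file (viterbi_generic.c, viterbi_sse.c,
 * viterbi_sse_avx.c) compiled with its own -m flags. Each returns an
 * identifier so the selection can be observed. */
static int decode_generic(void) { return 0; }
static int decode_sse(void)     { return 1; }
static int decode_sse_avx(void) { return 2; }

typedef int (*decode_fn)(void);

/* Pick an implementation from run-time CPU feature flags. This function
 * contains no SIMD code itself, so it can be compiled without -msse or
 * -mavx2 and is safe to run on any x86 CPU. */
static decode_fn select_decoder(int has_sse3, int has_avx2)
{
	if (has_sse3 && has_avx2)
		return decode_sse_avx;  /* SSE3 and AVX2 */
	if (has_sse3)
		return decode_sse;      /* SSE3 only */
	return decode_generic;          /* everything else */
}
```

Note that a CPU reporting AVX2 without SSE3 falls back to the generic path, which matches the three cases listed.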
Tom Tsou 34e228a9bc core/conv: add x86 SSE support for Viterbi decoder
Fast convolutional decoding is provided through x86 intrinsic-based
SSE operations. SSE3, found on virtually all modern x86 processors,
is the minimal requirement. SSE4.1 and AVX2 are used if available.

Also, the original code was extended with runtime SIMD detection,
so only the extensions supported by the target CPU are used. This
makes the library more portable, which is very important for
distributing binary packages. Runtime SIMD detection is currently
implemented through the __builtin_cpu_supports call.

Change-Id: I1da6d71ed0564f1d684f3a836e998d09de5f0351
2017-05-24 22:04:53 +00:00
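Runtime SIMD detection through __builtin_cpu_supports, as described in the commit above, might look roughly like this. It is a sketch assuming GCC or Clang on x86; on other compilers or targets it simply reports no extensions. The struct simd_caps type and detect_simd() function are hypothetical names, not part of libosmocore.

```c
#include <stdbool.h>

/* Hypothetical summary of the SIMD extensions the decoder cares about. */
struct simd_caps {
	bool sse3;   /* minimal requirement for the SSE decoder */
	bool sse41;  /* used if available */
	bool avx2;   /* used if available */
};

static struct simd_caps detect_simd(void)
{
	struct simd_caps caps = { false, false, false };

#if defined(__GNUC__) && (defined(__x86_64__) || defined(__i386__))
	/* Only strictly required when called before constructors run;
	 * calling it here is harmless. */
	__builtin_cpu_init();
	caps.sse3  = __builtin_cpu_supports("sse3") != 0;
	caps.sse41 = __builtin_cpu_supports("sse4.1") != 0;
	caps.avx2  = __builtin_cpu_supports("avx2") != 0;
#endif
	return caps;
}
```

Because the result depends on the CPU that executes the binary rather than on the build host, a single binary package can still pick the fastest supported path.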
Vadim Yanitskiy e604ee39cf core/conv: strip unused memalign() call
Aligned memory allocation is only required for SSE, which is
currently unsupported. Moreover, it's better to use the dedicated
_mm_malloc() and _mm_free() from xmmintrin.h instead, which were
introduced by Intel specifically for SIMD computations.

Change-Id: Ide764d1c643527323334ef14335be7f8915f7622
2017-05-07 23:10:51 +07:00
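A minimal sketch of the _mm_malloc()/_mm_free() pairing mentioned in the commit above, assuming an x86 target with GCC or Clang, where xmmintrin.h provides both functions. The int16_t path-metric buffer and the helper names are hypothetical; the 16-byte alignment is what aligned 128-bit SSE loads such as _mm_load_si128() require (aligned 256-bit AVX2 loads would need 32).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <xmmintrin.h>  /* _mm_malloc(), _mm_free() */

#define SSE_ALIGN 16    /* alignment for 128-bit aligned loads/stores */

/* Allocate n 16-bit path metrics suitably aligned for SSE. */
static int16_t *alloc_metrics(size_t n)
{
	int16_t *p = _mm_malloc(n * sizeof(int16_t), SSE_ALIGN);

	/* Debug check: a successful allocation must be 16-byte aligned. */
	assert(p == NULL || ((uintptr_t)p % SSE_ALIGN) == 0);
	return p;
}

/* Memory from _mm_malloc() must be released with _mm_free(), not free(). */
static void free_metrics(int16_t *p)
{
	_mm_free(p);
}
```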
Tom Tsou 35536807ab core/conv: implement optimized Viterbi decoder
Add a separate, faster convolutional decoding implementation for rate
1/N codes with N up to 4 and constraint lengths of K=5 and K=7, which
covers most of the codes used in GSM. The decoding algorithm exploits
the symmetric structure of the Viterbi add-compare-select (ACS)
operation, commonly known as the ACS butterfly. This shift-register
optimization can be found in the well-known text by Dave Forney.

Forney, G.D., "The Viterbi Algorithm," Proc. of the IEEE, March 1973.

The implementation is architecture-independent and improves performance
on x86 as well as ARM processors. The existing API is unchanged; the
optimized code is called internally for supported codes.

The original code was relicensed under GPLv2-or-later with the
permission of the copyright holder, Tom Tsou.

Change-Id: I74d355274b4176a7d924f91ef3c96912ce338fb2
2017-04-11 00:36:08 +00:00
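The ACS butterfly described in the commit above can be illustrated with a small hard-decision decoder. This is a minimal sketch, not the committed implementation: it assumes a rate 1/2, K=5 code with the polynomials GSM uses for e.g. xCCH, G0 = 1 + D^3 + D^4 and G1 = 1 + D + D^3 + D^4, Hamming-distance branch metrics, and an encoder that starts in state 0 and is flushed with K-1 zero tail bits. Because both polynomials have their first and last taps set, the four branches from old states j and j+8 into new states 2j and 2j+1 carry only two output patterns, o and its complement, so a single branch metric serves the whole butterfly. All names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define K       5                /* constraint length */
#define NS      (1u << (K - 1))  /* 16 trellis states */
#define HALF    (NS / 2)
#define G0      0x19u            /* 1 + D^3 + D^4, bit i = tap D^i */
#define G1      0x1bu            /* 1 + D + D^3 + D^4 */
#define MAX_LEN 256

static unsigned parity(unsigned x)
{
	unsigned p = 0;
	while (x) {
		p ^= x & 1u;
		x >>= 1;
	}
	return p;
}

/* Two output bits (G0 in bit 1, G1 in bit 0) for register contents r,
 * where bit 0 holds the newest input bit. */
static unsigned branch_out(unsigned r)
{
	return (parity(r & G0) << 1) | parity(r & G1);
}

/* Rate 1/2 encoder: writes 2*n hard bits to out. */
static void encode(const uint8_t *in, int n, uint8_t *out)
{
	unsigned s = 0;
	for (int i = 0; i < n; i++) {
		unsigned r = (s << 1) | in[i];
		unsigned o = branch_out(r);
		out[2 * i] = (uint8_t)(o >> 1);
		out[2 * i + 1] = (uint8_t)(o & 1u);
		s = r & (NS - 1);
	}
}

/* Hard-decision Viterbi decoder built from ACS butterflies. */
static void viterbi_decode(const uint8_t *rx, int n, uint8_t *out)
{
	unsigned pm[NS], next[NS];
	uint8_t dec[MAX_LEN][NS];  /* 1 = survivor came from j + HALF */

	assert(n > 0 && n <= MAX_LEN);
	for (unsigned s = 0; s < NS; s++)
		pm[s] = s ? 1000u : 0u;  /* encoder starts in state 0 */

	for (int i = 0; i < n; i++) {
		unsigned r = ((unsigned)rx[2 * i] << 1) | rx[2 * i + 1];

		for (unsigned j = 0; j < HALF; j++) {
			/* One branch metric per butterfly: o on j->2j, ~o otherwise. */
			unsigned x = branch_out(j << 1) ^ r;
			unsigned bm = (x & 1u) + (x >> 1);  /* distance to o */
			unsigned bmc = 2 - bm;              /* distance to ~o */
			unsigned a = pm[j] + bm, b = pm[j + HALF] + bmc;
			unsigned c = pm[j] + bmc, d = pm[j + HALF] + bm;

			next[2 * j] = b < a ? b : a;        /* add-compare-select */
			dec[i][2 * j] = b < a;
			next[2 * j + 1] = d < c ? d : c;
			dec[i][2 * j + 1] = d < c;
		}
		for (unsigned s = 0; s < NS; s++)
			pm[s] = next[s];
	}

	/* Trace back from state 0, where the tail bits leave the encoder. */
	unsigned s = 0;
	for (int i = n - 1; i >= 0; i--) {
		out[i] = (uint8_t)(s & 1u);
		s = (s >> 1) | ((unsigned)dec[i][s] << (K - 2));
	}
}
```

The sketch uses hard decisions for brevity; with N output bits per branch the complementary metric would be N - bm, which relies on every generator having its first and last taps set.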