Currently we find SSE3 and SSE4.1 code mixed togehter along with
generic code in one file. This introduces the risk that the
compiler exidantly mixes SSE4.1 instructions into an SSE3, or
even worse into a generic code path.
This commit splits the SSE3 and SSE4.1 code into separate files
and compiles them with the matching target options.
Change-Id: I846e190e92f1258cd412d1b2d79b539e204e04b3
The ARM and the X86 implementation of the conversion functions share
the same, non cpu specific implementation in separate files.
This commit removes the code duplication by putting the generic
implementation into a convert_base.c, similar to to convolve_base.c
Change-Id: Ic8d8534a343e27cde79ddc85be4998ebd0cb6e5c
The compiler option -march=native instructs the compiler to auto-optimize
the code for the current build architecture. This is fine for building
and using locally, but contraproductive when generating binary packages.
This commit replaces -march=native with $(SIMD_FLAGS), which contains a
collection of supported SIMD options, so we won't loose the SSE support.
Change-Id: I3df4b8db9692016115edbe2247beeec090715687
When you build from an external path, compiler can't find convert.h
include, because it was specified relative to the current directory.
Change this to specify the include dit relative to the Makefile
location.
Signed-off-by: Tom Tsou <tom.tsou@ettus.com>
Similar to the existing Intel SSE cases, add support for NEON vector
floating point SIMD processing. In this case, use ARM assembly
directly as the NEON intrinsics do not generate preferential code
output.
Currently support NEON vectorized convolution and floating point
integer conversions.
Signed-off-by: Thomas Tsou <tom@tsou.cc>
Move x86 specific files into their own directory as this
area is about to get crowded with the addition of ARM
support.
Signed-off-by: Thomas Tsou <tom@tsou.cc>