The current implementation can select the SSE support level at
compile time only.
This commit adds runtime detection of the SSE level supported by
the CPU and automatically switches to a different implementation
if the CPU does not support the required SSE level.
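
For illustration only, runtime selection of this kind could be sketched
as below. This is not the code from this commit: the names convert_fn,
convert_generic, convert_sse_placeholder, cpu_has_sse3 and
select_convert are hypothetical, the "SSE" routine is a stand-in that
reuses the scalar loop, and the check assumes the GCC/Clang builtin
__builtin_cpu_supports() on x86 (on other targets the sketch always
falls back to the generic path).

```c
#include <stddef.h>

/* Hypothetical signature for a float to 16-bit integer conversion. */
typedef void (*convert_fn)(short *out, const float *in, size_t len);

/* Portable fallback used when the required SIMD level is missing. */
static void convert_generic(short *out, const float *in, size_t len)
{
	for (size_t i = 0; i < len; i++)
		out[i] = (short)in[i];
}

/* Stand-in for an SSE routine. A real one would use SSE instructions;
 * this placeholder only reuses the scalar loop. */
static void convert_sse_placeholder(short *out, const float *in, size_t len)
{
	convert_generic(out, in, len);
}

/* Returns 1 if the running CPU reports SSE3, 0 otherwise. */
static int cpu_has_sse3(void)
{
#if (defined(__x86_64__) || defined(__i386__)) && \
    (defined(__GNUC__) || defined(__clang__))
	__builtin_cpu_init();
	return __builtin_cpu_supports("sse3") ? 1 : 0;
#else
	return 0; /* assumption: no SSE outside x86 builds */
#endif
}

/* Pick the implementation once, at run time. */
static convert_fn select_convert(void)
{
	return cpu_has_sse3() ? convert_sse_placeholder : convert_generic;
}
```

A caller would store the result of select_convert() once at startup and
invoke the conversion through that pointer afterwards.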
Change-Id: Iba74f8a6e4e921ff31e4bd9f0c7c881fe547423a
Similar to the existing Intel SSE cases, add support for NEON vector
floating point SIMD processing. In this case, use ARM assembly
directly, because the NEON intrinsics do not generate satisfactory
code.
Currently supported are NEON vectorized convolution and
floating point/integer conversions.
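
As a rough scalar reference for the kind of convolution being
vectorized, the sketch below filters interleaved complex (I/Q) samples
with real taps. It is an assumption for illustration, not the NEON
assembly from this commit: the name convolve_real_ref, the interleaved
layout, and the convention that taps are stored already reversed are
all hypothetical.

```c
#include <stddef.h>

/*
 * x:       interleaved complex input (I0, Q0, I1, Q1, ...), holding at
 *          least (out_len + h_len - 1) complex samples
 * h:       real filter taps, assumed to be stored already reversed
 * y:       interleaved complex output, out_len complex samples
 */
static void convolve_real_ref(const float *x, const float *h, size_t h_len,
			      float *y, size_t out_len)
{
	for (size_t n = 0; n < out_len; n++) {
		float acc_i = 0.0f;
		float acc_q = 0.0f;

		/* Multiply-accumulate over the taps; a SIMD version would
		 * process several of these products per instruction. */
		for (size_t k = 0; k < h_len; k++) {
			acc_i += x[2 * (n + k)] * h[k];
			acc_q += x[2 * (n + k) + 1] * h[k];
		}

		y[2 * n] = acc_i;
		y[2 * n + 1] = acc_q;
	}
}
```

A scalar routine like this is also useful as a check against which a
vectorized version can be compared.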
Signed-off-by: Thomas Tsou <tom@tsou.cc>