
The GNU C Library’s tanh and other hyperbolic functions are now up to 14~17% faster on modern Intel and AMD CPUs that support FMA (fused multiply-add) instructions.
FMA instructions have been available on both Intel and AMD processors for roughly the past decade. Only this week, thanks to the work of Intel engineer Sunil K Pandey, did glibc gain an FMA-optimized tanh function along with atanh and sinh functions.
Testing of the FMA’ed tanh on an Intel Skylake CPU shows a maximum improvement of around 14%, while the minimum and mean improvements come in at around 4% over the prior code. Those interested can find the FMA-optimized tanh function via this Glibc commit.
The sinh optimization is 4~17% faster. The atanh optimization is around 2% faster.
Not bad for something that’s been commonplace among Intel/AMD x86_64 CPUs for years, though it is surprising it took this long for the optimization to land. In any event, Intel continues to deserve kudos for all of its open-source toolchain optimizations over the years, especially when it comes to tuning the GNU C Library (glibc) for new x86_64 instruction set capabilities.
These improvements will be found in the GNU C Library 2.42 release due out around August.