avx512bf16

Support for the AVX-512 bfloat16 (BF16) floating-point instructions.
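
For readers unfamiliar with the format: bfloat16 keeps the sign bit, the 8 exponent bits, and the top 7 mantissa bits of an IEEE 754 binary32 value. The sketch below emulates the conversion in portable C, assuming round-to-nearest-even. The helper names `f32_to_bf16` and `bf16_to_f32` are hypothetical and not part of any library, and this is not a bit-exact model of the hardware instructions, whose handling of denormals may differ.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: narrow a binary32 value to bfloat16 bits
 * using round-to-nearest-even (software emulation only). */
static uint16_t f32_to_bf16(float f) {
    uint32_t bits;
    memcpy(&bits, &f, sizeof bits);      /* reinterpret the float's bits */

    if ((bits & 0x7fffffffu) > 0x7f800000u) {
        /* NaN: keep the sign and force the quiet bit so that
         * truncation cannot turn the value into an infinity. */
        return (uint16_t)((bits >> 16) | 0x0040u);
    }

    /* Add 0x7fff plus the lowest kept bit: ties round to even. */
    uint32_t rounding_bias = 0x7fffu + ((bits >> 16) & 1u);
    return (uint16_t)((bits + rounding_bias) >> 16);
}

/* Hypothetical helper: widen bfloat16 bits back to binary32.
 * This direction is exact: the low 16 mantissa bits become zero. */
static float bf16_to_f32(uint16_t h) {
    uint32_t bits = (uint32_t)h << 16;
    float f;
    memcpy(&f, &bits, sizeof f);
    return f;
}
```

Because bfloat16 shares binary32's exponent width, the round trip preserves range and loses only precision, which is why widening is a plain 16-bit shift.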

CPUs: