fast-variable-crosslane-shuffle

Cross-lane shuffles with variable masks are fast.
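
To make the terms concrete, here is a minimal scalar sketch of what "cross-lane" and "variable mask" mean. It is an illustration only, not how any compiler or CPU implements the operation. It assumes an 8-element vector split into two 4-element lanes (the layout of 32-bit elements in a 256-bit register with 128-bit lanes). The function names `cross_lane_permute` and `in_lane_permute` are hypothetical.

```python
def cross_lane_permute(src, idx):
    """Scalar model of an 8-element cross-lane shuffle with a variable mask.

    Each output element may come from any position in the whole vector,
    and the indices are runtime data rather than compile-time constants.
    """
    assert len(src) == 8 and len(idx) == 8
    return [src[i & 7] for i in idx]


def in_lane_permute(src, idx, lane=4):
    """Scalar model of an in-lane shuffle: each output element can only
    be picked from the lane it already sits in (assumed lane size: 4)."""
    assert len(src) == 8 and len(idx) == 8
    out = []
    for pos, i in enumerate(idx):
        base = (pos // lane) * lane  # first index of this element's lane
        out.append(src[base + (i % lane)])
    return out


src = [10, 11, 12, 13, 14, 15, 16, 17]
reverse = [7, 6, 5, 4, 3, 2, 1, 0]  # mask held in a variable, not a constant

crossed = cross_lane_permute(src, reverse)   # elements move between lanes
confined = in_lane_permute(src, reverse)     # elements stay inside their lane
```

With the cross-lane model the whole vector is reversed. With the in-lane model each half is reversed separately, so no element leaves its lane. A target that has this feature is one where the first kind of operation, driven by a runtime mask, is cheap enough to prefer.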

CPUs: