Intel® Intrinsics Guide

Release Notes

3.6.9 Release

07/12/2024
  • Refined parameter type '__tile' into 'constexpr int'.
  • Refined the description of _may_i_use_cpu_feature_str to illustrate supported string literals.
  • Corrected the CPUID of the SHA512, SM3, and SM4 intrinsics.

3.6.8 Release

04/12/2024
  • Added intrinsics for USER_MSR.
  • Corrected description and operation of _mm_*_fixupimm_*_s*.
  • Corrected description of *_nearbyint_p[h|s], *_logb_ph, *_floor_ph, *_round_ph, *_trunc_ph, and *_ceil_ph.
  • Added more supported flags in _may_i_use_cpu_feature_ext.

3.6.7 Release

12/07/2023
  • Removed const from first argument of _pconfig_u32.
  • Corrected related description of size_t argument.
  • Added 4 intrinsics for AVX512F: _mm(256)_rsqrt14_p*.
  • Removed extended gather/scatter intrinsics.
  • Removed more KNC intrinsics.
  • Added 2 general support intrinsics: _may_i_use_cpu_feature_ext and _may_i_use_cpu_feature_str.
  • Added intrinsics for AVX_VNNI_INT16.
  • Added intrinsics for SHA512.
  • Added intrinsics for SM3.
  • Added intrinsics for SM4.
  • Aligned stream series intrinsics to use void *, as the 512-bit variants do.

3.6.6 Release

05/10/2023
  • Added intrinsic for PRFCHW.
  • Added explanations for prefetch hint.
  • Added intrinsics for AMX-COMPLEX.
  • Moved F16C into AVX_Family.
  • Removed all KNC intrinsics.
  • Corrected description of *_range_* series intrinsics.
  • Corrected CPUID of FP16C to F16C.
  • Corrected typo "heirarchy".

3.6.5 Release

01/24/2023
  • Added 64 reduce_*_ep* series intrinsics.
  • Corrected parameter order of _mm_test_all_zeros and _mm_test_mix_ones_zeros.
  • Added new throughput and latency data for Sapphire Rapids.

3.6.4 Release

12/14/2022
  • Added SVML intrinsics for AVX512_FP16.
  • Added intrinsics for:
    • AMX-FP16
    • AVX_IFMA
    • AVX_NE_CONVERT
    • AVX_VNNI_INT8
    • CMPCCXADD
    • PREFETCHI
    • RAO_INT
  • Added 9 new AMX intrinsics: __tile_*.
  • Refined description of _mm_cvtsbh_ss and *_cvtpbh_ps.
  • Fixed instruction of _mm_mask3_fmadd_(round_)sch.
  • Corrected _mm256_cvtps_ph/_mm_cvtps_ph's parameter name and description.
  • Extended _mm512_(mask/z)_cvt_roundps_ph, _mm512_(mask/z)_cvtps_ph's rounding and SAE control.
  • Corrected _tile_load/storeconfig's operation.
  • Corrected stride type of AMX intrinsics.

3.6.3 Release

08/10/2022
  • Removed legacy throughput and latency data for Knights Landing, Ivy Bridge, Haswell, and Broadwell.
  • Added new throughput and latency data for Icelake Intel Core, Icelake Xeon, and Alderlake.
  • Updated the header information for CPUID FP16C from emmintrin.h to immintrin.h.
  • Corrected the operation example for _mm_ucomineq_sh and _mm_comineq_sh.

3.6.2 Release

04/22/2022
  • Added 1 intrinsic for BMI: _tzcnt_u16.
  • Added a new tag <supported> to mark intrinsics' compiler support.
  • Added helper enum for ternary intrinsics.
  • Changed CPUID of _mm_(loadu/storeu)_si(16/64) from SSE to SSE2.
  • Added new CPUID CRC32 for _mm_crc32_u(8/16/32/64) intrinsics.
  • Corrected formatting error for _mm_div_ps and _mm_div_ss performance latency data (Ivy Bridge).
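
For readers unfamiliar with the BMI trailing-zero count added in this release, the following is a minimal scalar sketch of what _tzcnt_u16 is understood to compute: the number of trailing zero bits in a 16-bit value, with 16 returned for a zero input. The names tzcnt_u16_model and tzcnt_u16_model_selfcheck are hypothetical and exist only for illustration. This is a portable reference model, not the intrinsic itself and not the guide's operation pseudocode.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical reference model (assumption: not part of any Intel header).
 * Counts trailing zero bits of a 16-bit value; yields 16 when a == 0. */
static unsigned tzcnt_u16_model(uint16_t a)
{
    unsigned count = 0;
    /* Stop at the first set bit, or after all 16 bits have been checked. */
    while (count < 16 && ((a >> count) & 1u) == 0) {
        count++;
    }
    return count;
}

/* Small self-check that exercises the edge cases of the model. */
static void tzcnt_u16_model_selfcheck(void)
{
    assert(tzcnt_u16_model(0x0001) == 0);   /* lowest bit set */
    assert(tzcnt_u16_model(0x0008) == 3);   /* bit 3 set */
    assert(tzcnt_u16_model(0x8000) == 15);  /* only the top bit set */
    assert(tzcnt_u16_model(0x0000) == 16);  /* zero input: operand width */
}
```

Calling tzcnt_u16_model_selfcheck() from a test program is enough to exercise the model. On hardware with BMI support, the same inputs could be compared against the real intrinsic.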

3.6.1 Release

12/06/2021
  • Corrected operation description for _mm_encodekey256_u32: changed the handle size of encodekey256 from 384 to 512.
  • Added seven intrinsics for AVX512_FP16: _mm_mask3_fcmadd_(round_)sch, _mm_mask3_fmadd_(round_)sch, _mm(256, 512)_set1_pch.
  • Fixed the CPUID flag for some KNCNI intrinsics.
  • Corrected description for _mm256_hsub_ps.
  • Corrected categories for *_shuffle_epi8.
  • Improved description of AVX512_FP16 *_min/max_* intrinsics.
  • Improved operation description of AVX512_FP16 *_comi_* intrinsics.
  • Added 11 intrinsics for AVX512_BF16: _mm(256/512)_(mask/maskz)_cvtpbh_ps, _mm_cvtsbh_ss, _mm_cvtness_sbh.
  • Added 36 intrinsic aliases for AVX512_FP16: *_mul_*ch as aliases for *_f*mul_*ch.

3.6.0 Release

06/30/2021
  • Added intrinsics for AVX512_FP16, ENQCMD, AVX_VNNI, HRESET, KEYLOCKER, KEYLOCKER_WIDE and UINTR.
  • Added two intrinsics for AVX512F: _mm512_(mask_)i32loscatter_epi64.
  • Improved description for WAITPKG intrinsics.
  • Improved operation for *_reduce_* intrinsics.
  • Improved description for *_min/max_* float intrinsics.
  • Improved operation for *_comi_* intrinsics.
  • Updated XML format: removed <type> tag; reordered the tags and attributes.
  • Shared CPUID intrinsics are now listed as two separate items.
  • Updated notices and disclaimers.
  • Updated introduction with links to supporting documentation.
  • Updated fonts to use web-safe font stack.

3.5.4 Release

10/19/2020
  • Fixed a recurring typo ("Seqeuence" --> "Sequence").

3.5.3 Release

06/30/2020
  • Added intrinsics for AMX-TILE, AMX-BF16, and AMX-INT8.

3.5.2 Release

06/05/2020
  • Added intrinsics for SERIALIZE and TSXLDTRK.
  • Improved indication of intrinsics that are sequences of instructions.
  • Improved presentation of long lines in operations.

3.5.1 Release

05/22/2020
  • Updated latency and throughput data for Skylake.
  • Resolved an issue preventing intrinsics from loading.

3.5.0 Release

03/16/2020
  • Added latency and throughput data for 10th gen Intel® Core™ processor family (Icelake).
  • Updated XML format: added a <return> tag for the return type and removed the rettype attribute; added several attributes to <return> and <parameter>; added XED iform attribute to <instruction>.
  • Added instruction forms for many AVX512 intrinsics.
  • Clarified embedded-rounding vs suppress-all-exceptions distinction for many intrinsics.
  • Clarified masking operands in instruction forms.
  • Corrected operation for all AVX512_VNNI and GFNI intrinsics.
  • Clarified immediate bit-range access in many operations.
  • Corrected destination zeroing for many store intrinsics.
  • Clarified operator precedence for many intrinsic operations.
  • Clarified indication of signed integers in many descriptions.
  • Corrected scalar upper pass through variable for many intrinsics.
  • Clarified memory accesses for many intrinsics.
  • Added CPUID for CET_SS intrinsics.
  • Corrected operations for many scatter and gather intrinsics.
  • Corrected descriptions and operations for SVML intrinsics involving complex numbers.
  • Corrected operation for _mm512_set_epi8.
  • Corrected description for several *_set_epi* intrinsics.
  • Corrected instruction for some *_xor_* intrinsics.
  • Corrected description and operation for *_ktestc_* and *_kortestc_* intrinsics.
  • Corrected operation for several *_permutex2var_* intrinsics.
  • Clarified imm8 bit ranges for _alignr_ intrinsics.
  • Corrected write mask pass through for some broadcast_i* and broadcast_f* intrinsics.
  • Corrected imm8 bit range for some shuffle_i* intrinsics.
  • Corrected operation for _kor_* intrinsics.
  • Corrected operations for unmasked _movedup_ intrinsics.
  • Corrected operations for *_set4_* and *_setr4_* intrinsics.
  • Clarified imm8 bit range for _kshiftri_mask16.
  • Corrected parameter types for *_4fmadd_ps intrinsics.
  • Corrected operation for _mm256_mpsadbw_epu8.
  • Corrected operations for *_maskz_mulhrs_epi16 intrinsics.
  • Corrected operations for *_hadds_* intrinsics.
  • Corrected saturation function for *_cvtusepi* intrinsics.
  • Corrected shift direction for some *_srai_* intrinsics.
  • Corrected operation for _mm256_mask_srlv_epi16.
  • Corrected operation for _mm512_reduce_max_epi64.
  • Corrected operations for _mm512_mask_permute4f128_* intrinsics.
  • Corrected operations for _mm_tzcnti_* intrinsics.
  • Corrected operations for several *_cvtt_* intrinsics.
  • Corrected operations for several scalar conversion intrinsics.
  • Corrected operations for several *_exp2a23_* intrinsics.
  • Corrected operations for several intrinsics with single-precision reciprocals.
  • Corrected operations for *_fmadd_ps intrinsics.
  • Corrected operation for _mm_cvttss_si64.
  • Corrected comparison operator for several cmpgt and cmplt intrinsics.
  • Corrected operations for several right rotation intrinsics.
  • Corrected operation for _mm_ceil_pd.
  • Corrected operation for _mm512_mask_floor_ps.
  • Corrected operations for several loadu intrinsics.
  • Corrected src parameter for _mask_exp2a23_ intrinsics.

3.4.7 Release

07/15/2019
  • Corrected operations for add/sub and sub/add intrinsics.
  • Corrected operations for GFNI affine intrinsics.
  • Corrected operations for dpbf16 intrinsics.

3.4.6 Release

07/15/2019
  • Added intrinsics for AVX512_VP2INTERSECT.

3.4.5 Release

05/30/2019
  • Added additional latency & throughput data for AVX-512 instructions.
  • Improved operations for most intrinsics.
  • Corrected operations for many intrinsics, including AVX512_BITALG and AVX512_VNNI.

3.4.4 Release

04/17/2019
  • Added BF16 intrinsics.
  • Corrected operations for *_reduce_ps/pd intrinsics.
  • Corrected operations for non-mask sllv_epi16 intrinsics.

3.4.3 Release

04/08/2019
  • Improved operation syntax for many intrinsics.
  • Corrected parameters for several cmp intrinsics.
  • Corrected details of _mm512[_mask]_log2_ps.
  • Corrected operations for: _mm_cvtpd_ps, _m_pmaddwd, *_madd_epi16, *_maskz_sub_epi8, *_maskz_sub_epi16.
  • Corrected instructions for _mm_comi_round_ss.
  • Corrected description and operation for _lrot*.
  • Corrected instructions, descriptions, and operations for _tpause and _umwait.

3.4.2 Release

10/05/2018
  • Added intrinsics for AVX512_BITALG, AVX512_VBMI2, AVX512_VNNI, GFNI, VAES, VPCLMULQDQ, and others.
  • Corrected CPUID and description for load/storebe intrinsics.
  • Corrected operation for _mm_mask_4fmadd_ss.
  • Corrected description for _mm_srli_epi32.
  • Corrected description for _mm_alignr_pi8.
  • Corrected description and operation for _xtest.

3.4.1 Release

04/26/2018
  • Added ptwrite and encl* intrinsics.
  • Corrected description and operation for blend intrinsics with an immediate control.
  • Corrected CPUID for _mm512_4fmadd_ps.
  • Corrected description for masked compare intrinsics.
  • Clarified that includes should use <header.h> rather than "header.h".
  • Corrected operations for _mm512_kunpack* intrinsics.
  • Further clarified up and down conversion for intrinsics supported on both KNC and AVX-512.
  • Corrected return type for 4dpwssd* intrinsics.
  • Corrected description and operation for scalar mask3 intrinsics.
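
As an illustration of the include clarification in this release, the sketch below pulls in an intrinsics header with angle brackets, the form used for system headers. It is a minimal example under stated assumptions: double_lane_demo and double_lane_demo_selfcheck are hypothetical names, and the SSE2 path is taken only when the compiler defines __SSE2__, with a scalar fallback otherwise.

```c
#include <assert.h>
#include <stdint.h>

#ifdef __SSE2__
#include <emmintrin.h>  /* system header: angle brackets, not "emmintrin.h" */
#endif

/* Hypothetical helper: doubles x, using SSE2 when the compiler enables it. */
static int32_t double_lane_demo(int32_t x)
{
#ifdef __SSE2__
    __m128i v = _mm_set1_epi32(x);      /* broadcast x to four 32-bit lanes */
    __m128i sum = _mm_add_epi32(v, v);  /* lane-wise x + x */
    return _mm_cvtsi128_si32(sum);      /* extract the lowest lane */
#else
    return x + x;                       /* scalar fallback without SSE2 */
#endif
}

/* Small self-check covering a positive and a negative input. */
static void double_lane_demo_selfcheck(void)
{
    assert(double_lane_demo(21) == 42);
    assert(double_lane_demo(-3) == -6);
}
```

The angle-bracket form asks the compiler to search its system include paths, which is where the intrinsics headers are normally installed.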

3.4 Release

09/07/2017
  • Added intrinsics for Knights Mill (KNM).
  • Corrected references to "_MM_CMPINT_NE".
  • Corrected instruction for _mm512_stream_si512.
  • Corrected CPUID for _mm256_extract_epi8, _mm256_extract_epi16, _mm[256,512]_cvt[ss,sd,si256,si512].
  • Improved description of _xbegin.
  • Clarified up and down conversion for intrinsics supported on both KNC and AVX-512.
  • Corrected description of _mm_countbits_64.

3.3.16 Release

01/26/2017
  • Added additional latency & throughput data up through 6th generation Intel® Core™ processor family (Skylake) and 2nd Generation Intel® Xeon Phi™ (Knights Landing).
  • Added documentation on 36 intrinsics.
  • Corrected operations for vsqrtss intrinsics.
  • Corrected movd CPUID bits.

3.3.15 Release

09/16/2016
  • Corrected operation for _mm_mpsadbw_epu8.
  • Corrected latency and throughput data for FP16C intrinsics.
  • Corrected operations for *_madd_epi16 intrinsics.
  • Corrected operations for _subborrow_u* intrinsics.
  • Corrected instruction forms for *_i32gather_* and *_i64gather_* intrinsics.
  • Corrected operations for *_i64gather_* intrinsics.
  • Corrected operation for _mm256_shuffle_epi8.
  • Corrected operations for _mm512_shuffle_epi8, _mm512_mask_shuffle_epi8, _mm512_maskz_shuffle_epi8, _mm256_mask_shuffle_epi8, _mm256_maskz_shuffle_epi8.
  • Corrected descriptions and operations for *_sqrt_sd and *_sqrt_round_sd intrinsics.
  • Corrected operations for _mm512_mask2int intrinsics.
  • Corrected CPUIDs for _mm_undefined_* intrinsics.

3.3.14 Release

01/12/2016
  • Corrected operations for *_cmpunord_* intrinsics.

3.3.13 Release

12/02/2015
  • Clarified descriptions and operations for all intrinsics utilizing a bitwise AND NOT.
  • Corrected description and operation for _mm512_fmadd233_ps and _mm512_mask_fmadd233_ps.
  • Added AVX-512 versions of _mm512_mask2int and _mm512_int2mask.

3.3.12 Release

09/28/2015
  • Corrected operations for multi-shift intrinsics.
  • Corrected description and operation of _mm_test_all_ones.
  • Corrected description for _mm512_sad_epu8.
  • Corrected operation for _mm256_mpsadbw_epu8.

3.3.11 Release

07/27/2015
  • Corrected operations for fmsubadd intrinsics.

3.3.10 Release

06/24/2015
  • Corrected ADC, ADCX, and SBB intrinsics.

3.3.9 Release

06/15/2015
  • Corrected operations for variable shift intrinsics.

3.3.8 Release

05/26/2015
  • Corrected headers for all AVX-512 intrinsics.

3.3.7 Release

02/20/2015
  • Corrected operations for _mm_sha1rnds4_epu32, _mm_sha1nexte_epu32, and _mm_sha1msg2_epu32.

3.3.6 Release

02/02/2015
  • Corrected operations for floating-point hsub intrinsics.

3.3.5 Release

01/29/2015
  • Added missing _BitScan*, _bittest*, _mm256_extract_epi*, _mm256_insert_epi*, and _mm_clflushopt intrinsics.

3.3.4 Release

01/13/2015
  • Corrected parameter names, descriptions, and operations for intrinsics with immediate parameters, to clarify they are always 8 bits.
  • Corrected description and operation for all pmaddwd intrinsics.

3.3.3 Release

12/18/2014
  • Corrected description for _mm_maskmoveu_si128 and _mm_maskmove_si64.
  • Corrected operation for _mm512_[mask_]extload_* intrinsics.

3.3.2 Release

12/10/2014
  • Corrected description for vpmaskmov intrinsics.
  • Corrected description for (v)pmaddwd intrinsics.

3.3.1 Release

10/17/2014
  • Corrected category for *_mask_cmp_*_mask, *_test_*_mask, and *_testn_*_mask intrinsics.
  • Corrected description and operation for *_testn_*_mask intrinsics.
  • Added description and operation for _mm_testn_epi64_mask and _mm_test_epi64_mask.
  • Corrected typos in description of several getexp, packs, div, and broadcast intrinsics.
  • Corrected instructions for *_mask_permutex2var_* and *_mask2_permutex2var_* intrinsics.
  • Corrected CPUIDs for 128-bit and 256-bit lzcnt and conflict intrinsics.
  • Corrected description for all broadcastw and fpclass intrinsics.

3.3.0 Release

09/29/2014
  • Added AVX-512IFMA52 and AVX-512VBMI intrinsics.

3.2.2 Release

09/03/2014
  • Corrected _mm512_mask_permute4f128_epi32 operation.

3.2.1 Release

07/24/2014
  • Corrected CPUIDs for _mm_broadcastmb_epi64, _mm256_broadcastmb_epi64, _mm_broadcastmw_epi32, _mm256_broadcastmw_epi32, and _mm_movm_epi8.

3.2.0 Release

07/18/2014
  • Added AVX-512VL, AVX-512BW, and AVX-512DQ intrinsics.

3.1.9 Release

06/18/2014
  • Re-classified all *_reduce_* intrinsics (except gmin and gmax) as both AVX-512 and KNC, which were previously classified as only AVX-512.

3.1.8 Release

06/17/2014
  • Corrected instructions for _mm512_mask2int, _mm512_int2mask, _mm512_cvtfxpnt_round_adjustepi32_ps, _mm512_mask_mulhi_epu32, _mm512_mask_i32loscatter_epi64, _mm512_mask_subsetb_epi32, _mm_tzcnti_32, _mm_tzcnti_64, _mm_prefetch, _mm512_extload_*, and _mm512_mask_extload_*.
  • Re-classified 58 intrinsics as just AVX-512, which were previously classified as both AVX-512 and KNC (primarily broadcast intrinsics, and those containing __m128 or __m256 types).
  • Re-classified _mm512_log2_ps and _mm512_mask_log2_ps as KNC, which were previously classified as SVML.
  • Corrected parameters for _mm512_mask_prefetch_i32gather_ps, and re-classified as both AVX-512 and KNC.
  • Corrected description of _mm_sad_epu8.
  • Added _mm512_mask_prefetch_i32extgather_ps.
  • Added prefetchwt1 form of _mm_prefetch.

3.1.7 Release

05/30/2014
  • Corrected instructions for _mm512_permute_pd, _mm512_permutevar_pd, and _mm512_permutevar_ps, which all use vpermilpx instructions rather than vpermpx.
  • Corrected descriptions for _mm*_xor_si* intrinsics.
  • Added missing _mm512_mask_prefetch_i32gather_ps intrinsic.
  • Re-classified _mm512_abs_pd, _mm512_abs_ps, _mm512_mask_abs_pd, and _mm512_mask_abs_ps intrinsics as both AVX-512 and KNC, which were previously classified as only KNC.
  • Switched to HTTPS.

3.1.6 Release

03/21/2014
  • Corrected instructions for: _mm_sub_epi16, _mm512_cvtfxpnt_round_adjustps_epi32, _mm512_extpackstorelo_epi64, _mm512_mask_extpackstorelo_epi64, _mm512_extpackstorelo_pd, _mm512_mask_extpackstorelo_pd.
  • Added xsavec, xsaves, and xrstors intrinsics.

3.1.5 Release

03/18/2014
  • Re-classified 342 intrinsics as both AVX-512 and KNC, which were previously classified as only AVX-512.
  • Added 18 missing KNC intrinsics, which previously existed in AVX-512 but have different intrinsic signatures or instruction encodings for KNC.
  • Corrected fmadd233 intrinsic descriptions and operations.
  • Corrected CPUID for VPTESTN* intrinsics to AVX512F.

3.1.4 Release

02/12/2014
  • Corrected "hint" parameter description for prefetch scatter/gather intrinsics.
  • Corrected _mm512_prefetch_i32extgather_ps, _mm512_mask_prefetch_i32extscatter_ps, and _mm512_prefetch_i32extscatter_ps intrinsic descriptions and operations.

3.1.3 Release

02/06/2014
  • Added AVX512F version of kortestw intrinsics.

3.1.2 Release

01/28/2014
  • Updated throughput on Haswell for vblendpd/ps, vblendvpd/ps, and vmulpd/ps.
  • Added performance data for _mm_alignr_epi8.
  • Resolved issues in Internet Explorer 8 and 9.
  • Added intro message.

3.1.1 Release

12/19/2013
  • Fixed VEX-equivalent warning message.

3.1 Release

12/18/2013
  • Added Knights Corner (KNC) intrinsics.
  • Added 512-bit SVML intrinsics.
  • Added instruction parameters for many intrinsics, including all AVX-512 intrinsics.
  • Added instruction latencies for many intrinsics.
  • Updated AVX-512 CPUID names.
  • Added feature flags for AVX-512(F,ER,PF,CD), SHA, and MPX.
  • Corrected descriptions, operations, parameters, and instructions for many intrinsics.

3.0.1 Release

07/23/2013
  • Corrected operations for set and setr intrinsics.

3.0 Release

07/17/2013
  • Added intrinsics for Intel® AVX-512, Intel® MPX, RDSEED, and ADX.
  • Added additional latency & throughput data up through 4th generation Intel® Core™ processor family.
  • Added 148 missing intrinsics, and corrected information for 96 intrinsics.

2.8.1 Release

05/13/2013
  • Fixed description for _mm_move_ss.
  • Fixed parameters for _mm_max_epu32.
  • Replaced references of __int datatype with int.

2.8 Release

02/06/2013
  • Updated descriptions and operations for all intrinsics.
  • Added additional latency & throughput data up through 3rd generation Intel® Core™ processor family.

2.7 Release

11/28/2012
  • Added intrinsics for SVML, BMI1, BMI2, FXSR, INVPCID, LZCNT, POPCNT, RDRAND, RDTSCP, RTM, TSC, XSAVE, XSAVEOPT.
  • Added header and CPUID feature flag information for each intrinsic.