Release Notes
3.6.9 Release
07/12/2024
- Refined parameter type '__tile' into 'constexpr int'.
- Refined the description of _may_i_use_cpu_feature_str to illustrate supported string literals.
- Corrected the CPUID of the SHA512, SM3, and SM4 intrinsics.
3.6.8 Release
04/12/2024
- Added intrinsics for USER_MSR.
- Corrected description and operation of _mm_*_fixupimm_*_s*.
- Corrected description of *_nearbyint_p[h|s], *_logb_ph, *_floor_ph, *_round_ph, *_trunc_ph, and *_ceil_ph.
- Added more supported flags in _may_i_use_cpu_feature_ext.
3.6.7 Release
12/07/2023
- Removed const from first argument of _pconfig_u32.
- Corrected related description of size_t argument.
- Added 4 intrinsics for AVX512F: _mm(256)_rsqrt14_p*.
- Removed extended gather/scatter intrinsics.
- Removed more KNC intrinsics.
- Added 2 general support intrinsics: _may_i_use_cpu_feature_ext and _may_i_use_cpu_feature_str.
- Added intrinsics for AVX_VNNI_INT16.
- Added intrinsics for SHA512.
- Added intrinsics for SM3.
- Added intrinsics for SM4.
- Aligned the stream series intrinsics to use void *, matching the 512-bit variants.
3.6.6 Release
05/10/2023
- Added intrinsic for PRFCHW.
- Added explanations for prefetch hint.
- Added intrinsics for AMX-COMPLEX.
- Moved F16C into AVX_Family.
- Removed all KNC intrinsics.
- Corrected description of *_range_* series intrinsics.
- Corrected CPUID of FP16C to F16C.
- Corrected typo "heirarchy".
3.6.5 Release
01/24/2023
- Added 64 reduce_*_ep* series intrinsics.
- Corrected parameter order of _mm_test_all_zeros and _mm_test_mix_ones_zeros.
- Added new throughput and latency data for Sapphire Rapids.
3.6.4 Release
12/14/2022
- Added SVML intrinsics for AVX512_FP16.
- Added intrinsics for AMX-FP16, AVX_IFMA, AVX_NE_CONVERT, AVX_VNNI_INT8, CMPCCXADD, PREFETCHI, and RAO_INT.
- Added 9 new AMX intrinsics: __tile_*.
- Refined description of _mm_cvtsbh_ss and *_cvtpbh_ps.
- Fixed instruction of _mm_mask3_fmadd_(round_)sch.
- Corrected _mm256_cvtps_ph/_mm_cvtps_ph's parameter name and description.
- Extended _mm512_(mask/z)_cvt_roundps_ph, _mm512_(mask/z)_cvtps_ph's rounding and SAE control.
- Corrected _tile_load/storeconfig's operation.
- Corrected stride type of AMX intrinsics.
3.6.3 Release
08/10/2022
- Removed legacy throughput and latency data for Knights Landing, Ivy Bridge, Haswell, and Broadwell.
- Added new throughput and latency data for Icelake Intel Core, Icelake Xeon, and Alderlake.
- Updated the header information for CPUID FP16C from emmintrin.h to immintrin.h.
- Corrected the operation example for _mm_ucomineq_sh and _mm_comineq_sh.
3.6.2 Release
04/22/2022
- Added 1 intrinsic for BMI: _tzcnt_u16.
- Added a new tag <supported> to mark intrinsics' compiler support.
- Added helper enum for ternary intrinsics.
- Changed CPUID of _mm_(loadu/storeu)_si(16/64) from SSE to SSE2.
- Added new CPUID CRC32 for _mm_crc32_u(8/16/32/64) intrinsics.
- Corrected formatting error for _mm_div_ps and _mm_div_ss performance latency data (Ivy Bridge).
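To illustrate the _tzcnt_u16 entry in this release, here is a minimal portable sketch of a trailing-zero count on a 16-bit value. The helper name tzcnt16_model is hypothetical and this is a plain C reference model, not the intrinsic itself; it assumes the usual TZCNT convention that a zero input yields the operand width (16).

```c
#include <stdint.h>

/* Hypothetical reference model, not the intrinsic: counts the trailing
   zero bits of a 16-bit value. A zero input returns 16, following the
   TZCNT convention of returning the operand size. */
unsigned tzcnt16_model(uint16_t x)
{
    unsigned count = 0;

    if (x == 0)
        return 16;

    /* Shift right until the lowest set bit reaches bit 0. */
    while ((x & 1u) == 0) {
        x >>= 1;
        ++count;
    }
    return count;
}
```

For example, tzcnt16_model(0x0048) examines the bit pattern 0100 1000 and returns 3.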
3.6.1 Release
12/06/2021
- Corrected operation description for _mm_encodekey256_u32: changed the handle size of encodekey256 from 384 to 512 bits.
- Added seven intrinsics for AVX512_FP16: _mm_mask3_fcmadd_(round_)sch, _mm_mask3_fmadd_(round_)sch, _mm(256, 512)_set1_pch.
- Fixed the CPUID flag for some KNCNI intrinsics.
- Corrected description for _mm256_hsub_ps.
- Corrected categories for *_shuffle_epi8.
- Improved description of AVX512_FP16 *_min/max_* intrinsics.
- Improved operation description of AVX512_FP16 *_comi_* intrinsics.
- Added 11 intrinsics for AVX512_BF16: _mm(256/512)_(mask/maskz)_cvtpbh_ps, _mm_cvtsbh_ss, _mm_cvtness_sbh.
- Added 36 intrinsic aliases for AVX512_FP16: *_mul_*ch as aliases for *_f*mul_*ch.
3.6.0 Release
06/30/2021
- Added intrinsics for AVX512_FP16, ENQCMD, AVX_VNNI, HRESET, KEYLOCKER, KEYLOCKER_WIDE and UINTR.
- Added two intrinsics for AVX512F: _mm512_(mask_)i32loscatter_epi64.
- Improved description for WAITPKG intrinsics.
- Improved operation for *_reduce_* intrinsics.
- Improved description for *_min/max_* float intrinsics.
- Improved operation for *_comi_* intrinsics.
- Updated XML format: removed <type> tag; reordered the tags and attributes.
- Intrinsics with shared CPUIDs are now listed as two separate items.
- Updated notices and disclaimers.
- Updated introduction with links to supporting documentation.
- Updated fonts to use web-safe font stack.
3.5.4 Release
10/19/2020
- Fixed a recurring typo ("Seqeuence" --> "Sequence").
3.5.3 Release
06/30/2020
- Added intrinsics for AMX-TILE, AMX-BF16, and AMX-INT8.
3.5.2 Release
06/05/2020
- Added intrinsics for SERIALIZE and TSXLDTRK.
- Improved indication of intrinsics that are sequences of instructions.
- Improved presentation of long lines in operations.
3.5.1 Release
05/22/2020
- Updated latency and throughput data for Skylake.
- Resolved an issue preventing intrinsics from loading.
3.5.0 Release
03/16/2020
- Added latency and throughput data for 10th gen Intel® Core™ processor family (Icelake).
- Updated XML format: added a <return> tag for the return type and removed the rettype attribute; added several attributes to <return> and <parameter>; added XED iform attribute to <instruction>.
- Added instruction forms for many AVX512 intrinsics.
- Clarified embedded-rounding vs suppress-all-exceptions distinction for many intrinsics.
- Clarified masking operands in instruction forms.
- Corrected operation for all AVX512_VNNI and GFNI intrinsics.
- Clarified immediate bit-range access in many operations.
- Corrected destination zeroing for many store intrinsics.
- Clarified operator precedence for many intrinsic operations.
- Clarified indication of signed integers in many descriptions.
- Corrected scalar upper pass through variable for many intrinsics.
- Clarified memory accesses for many intrinsics.
- Added CPUID for CET_SS intrinsics.
- Corrected operations for many scatter and gather intrinsics.
- Corrected descriptions and operations for SVML intrinsics involving complex numbers.
- Corrected operation for _mm512_set_epi8.
- Corrected description for several *_set_epi* intrinsics.
- Corrected instruction for some *_xor_* intrinsics.
- Corrected description and operation for *_ktestc_* and *_kortestc_* intrinsics.
- Corrected operation for several *_permutex2var_* intrinsics.
- Clarified imm8 bit ranges for _alignr_ intrinsics.
- Corrected write mask pass through for some broadcast_i* and broadcast_f* intrinsics.
- Corrected imm8 bit range for some shuffle_i* intrinsics.
- Corrected operation for _kor_* intrinsics.
- Corrected operations for unmasked _movedup_ intrinsics.
- Corrected operations for *_set4_* and *_setr4_* intrinsics.
- Clarified imm8 bit range for _kshiftri_mask16.
- Corrected parameter types for *_4fmadd_ps intrinsics.
- Corrected operation for _mm256_mpsadbw_epu8.
- Corrected operations for *_maskz_mulhrs_epi16 intrinsics.
- Corrected operations for *_hadds_* intrinsics.
- Corrected saturation function for *_cvtusepi* intrinsics.
- Corrected shift direction for some *_srai_* intrinsics.
- Corrected operation for _mm256_mask_srlv_epi16.
- Corrected operation for _mm512_reduce_max_epi64.
- Corrected operations for _mm512_mask_permute4f128_* intrinsics.
- Corrected operations for _mm_tzcnti_* intrinsics.
- Corrected operations for several *_cvtt_* intrinsics.
- Corrected operations for several scalar conversion intrinsics.
- Corrected operations for several *_exp2a23_* intrinsics.
- Corrected operations for several intrinsics with single-precision reciprocals.
- Corrected operations for *_fmadd_ps intrinsics.
- Corrected operation for _mm_cvttss_si64.
- Corrected comparison operator for several cmpgt and cmplt intrinsics.
- Corrected operations for several right rotation intrinsics.
- Corrected operation for _mm_ceil_pd.
- Corrected operation for _mm512_mask_floor_ps.
- Corrected operations for several loadu intrinsics.
- Corrected src parameter for _mask_exp2a23_ intrinsics.
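As an illustration of the *_cvtusepi* saturation entry in this release, the following sketch shows unsigned saturation when narrowing a 32-bit element to 16 bits. The helper name saturate_u32_to_u16 is hypothetical, and the code is a scalar model of one element under that assumption, not the guide's operation pseudocode.

```c
#include <stdint.h>

/* Hypothetical scalar model of unsigned saturation from 32 to 16 bits:
   values above the largest 16-bit unsigned value clamp to that value
   instead of being truncated. */
uint16_t saturate_u32_to_u16(uint32_t x)
{
    return (x > UINT16_MAX) ? (uint16_t)UINT16_MAX : (uint16_t)x;
}
```

Plain truncation would map 65536 to 0, whereas saturation maps it to 65535.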
3.4.7 Release
07/15/2019
- Corrected operations for add/sub and sub/add intrinsics.
- Corrected operations for GFNI affine intrinsics.
- Corrected operations for dpbf16 intrinsics.
3.4.6 Release
07/15/2019
- Added intrinsics for AVX512_VP2INTERSECT.
3.4.5 Release
05/30/2019
- Added additional latency & throughput data for AVX-512 instructions.
- Improved operations for most intrinsics.
- Corrected operations for many intrinsics, including AVX512_BITALG and AVX512_VNNI.
3.4.4 Release
04/17/2019
- Added BF16 intrinsics.
- Corrected operations for *_reduce_ps/pd intrinsics.
- Corrected operations for non-mask sllv_epi16 intrinsics.
3.4.3 Release
04/08/2019
- Improved operation syntax for many intrinsics.
- Corrected parameters for several cmp intrinsics.
- Corrected details of _mm512[_mask]_log2_ps.
- Corrected operations for: _mm_cvtpd_ps, _m_pmaddwd, *_madd_epi16, *_maskz_sub_epi8, *_maskz_sub_epi16.
- Corrected instructions for _mm_comi_round_ss.
- Corrected description and operation for _lrot*.
- Corrected instructions, descriptions, and operations for _tpause and _umwait.
3.4.2 Release
10/05/2018
- Added intrinsics for AVX512_BITALG, AVX512_VBMI2, AVX512_VNNI, GFNI, VAES, VPCLMULQDQ, and others.
- Corrected CPUID and description for load/storebe intrinsics.
- Corrected operation for _mm_mask_4fmadd_ss.
- Corrected description for _mm_srli_epi32.
- Corrected description for _mm_alignr_pi8.
- Corrected description and operation for _xtest.
3.4.1 Release
04/26/2018
- Added ptwrite and encl* intrinsics.
- Corrected description and operation for blend intrinsics with an immediate control.
- Corrected CPUID for _mm512_4fmadd_ps.
- Corrected description for masked compare intrinsics.
- Clarified that includes should use <header.h> rather than "header.h".
- Corrected operations for _mm512_kunpack* intrinsics.
- Further clarified up and down conversion for intrinsics supported on both KNC and AVX-512.
- Corrected return type for 4dpwssd* intrinsics.
- Corrected description and operation for scalar mask3 intrinsics.
3.4 Release
09/07/2017
- Added intrinsics for Knights Mill (KNM).
- Corrected references to "_MM_CMPINT_NE".
- Corrected instruction for _mm512_stream_si512.
- Corrected CPUID for _mm256_extract_epi8, _mm256_extract_epi16, _mm[256,512]_cvt[ss,sd,si256,si512].
- Improved description of _xbegin.
- Clarified up and down conversion for intrinsics supported on both KNC and AVX-512.
- Corrected description of _mm_countbits_64.
3.3.16 Release
01/26/2017
- Added additional latency & throughput data up through 6th generation Intel® Core™ processor family (Skylake) and 2nd Generation Intel® Xeon Phi™ (Knights Landing).
- Added documentation on 36 intrinsics.
- Corrected operations for vsqrtss intrinsics.
- Corrected movd CPUID bits.
3.3.15 Release
09/16/2016
- Corrected operation for _mm_mpsadbw_epu8.
- Corrected latency and throughput data for FP16C intrinsics.
- Corrected operations for *_madd_epi16 intrinsics.
- Corrected operations for _subborrow_u* intrinsics.
- Corrected instruction forms for *_i32gather_* and *_i64gather_* intrinsics.
- Corrected operations for *_i64gather_* intrinsics.
- Corrected operation for _mm256_shuffle_epi8.
- Corrected operations for _mm512_shuffle_epi8, _mm512_mask_shuffle_epi8, _mm512_maskz_shuffle_epi8, _mm256_mask_shuffle_epi8, _mm256_maskz_shuffle_epi8.
- Corrected descriptions and operations for *_sqrt_sd and *_sqrt_round_sd intrinsics.
- Corrected operations for _mm512_mask2int intrinsics.
- Corrected CPUIDs for _mm_undefined_* intrinsics.
3.3.14 Release
01/12/2016
- Corrected operations for *_cmpunord_* intrinsics.
3.3.13 Release
12/02/2015
- Clarified descriptions and operations for all intrinsics utilizing a bitwise AND NOT.
- Corrected description and operation for _mm512_fmadd233_ps and _mm512_mask_fmadd233_ps.
- Added AVX-512 versions of _mm512_mask2int and _mm512_int2mask.
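The AND NOT entry in this release concerns an operand convention that is easy to misread, so a small scalar sketch may help. It assumes the convention used by intrinsics such as _mm_andnot_si128(a, b), where the first operand is the one inverted. The helper name andnot32_model is hypothetical, and the release note does not state exactly what wording was clarified.

```c
#include <stdint.h>

/* Hypothetical scalar model of the AND NOT convention: the FIRST
   operand is inverted, then ANDed with the second, i.e. (~a) & b. */
uint32_t andnot32_model(uint32_t a, uint32_t b)
{
    return ~a & b;
}
```

With a = 0xFFFF0000 acting as a mask of bits to clear, andnot32_model(a, 0x12345678) keeps only the low half and yields 0x00005678.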
3.3.12 Release
09/28/2015
- Corrected operations for multi-shift intrinsics.
- Corrected description and operation of _mm_test_all_ones.
- Corrected description for _mm512_sad_epu8.
- Corrected operation for _mm256_mpsadbw_epu8.
3.3.11 Release
07/27/2015
- Corrected operations for fmsubadd intrinsics.
3.3.10 Release
06/24/2015
- Corrected ADC, ADCX, and SBB intrinsics.
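For readers unfamiliar with the add-with-carry intrinsics named in this release, the sketch below models the pattern in portable C. The function names addcarry32_model and add64_via_limbs are hypothetical; the model assumes the _addcarry_u32 style of interface (carry in, two addends, pointer to the sum, carry out as the return value) and is not the intrinsic itself.

```c
#include <stdint.h>

/* Hypothetical scalar model of add-with-carry: computes
   a + b + carry_in, stores the low 32 bits in *out, and returns
   the carry out (0 or 1). Any nonzero carry_in counts as 1. */
unsigned char addcarry32_model(unsigned char carry_in, uint32_t a,
                               uint32_t b, uint32_t *out)
{
    uint64_t wide = (uint64_t)a + b + (carry_in ? 1u : 0u);

    *out = (uint32_t)wide;
    return (unsigned char)(wide >> 32);
}

/* Chains two 32-bit limbs to add 64-bit values, which is the typical
   use of a carry chain in multi-precision arithmetic. */
uint64_t add64_via_limbs(uint64_t x, uint64_t y)
{
    uint32_t lo, hi;
    unsigned char c = addcarry32_model(0, (uint32_t)x, (uint32_t)y, &lo);

    /* The carry out of the top limb is discarded (wraps modulo 2^64). */
    (void)addcarry32_model(c, (uint32_t)(x >> 32), (uint32_t)(y >> 32), &hi);
    return ((uint64_t)hi << 32) | lo;
}
```

The carry returned by the low-limb addition feeds the high-limb addition, so the chain behaves like a single wider add.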
3.3.9 Release
06/15/2015
- Corrected operations for variable shift intrinsics.
3.3.8 Release
05/26/2015
- Corrected headers for all AVX-512 intrinsics.
3.3.7 Release
02/20/2015
- Corrected operations for _mm_sha1rnds4_epu32, _mm_sha1nexte_epu32, and _mm_sha1msg2_epu32.
3.3.6 Release
02/02/2015
- Corrected operations for floating-point hsub intrinsics.
3.3.5 Release
01/29/2015
- Added missing _BitScan*, _bittest*, _mm256_extract_epi*, _mm256_insert_epi*, and _mm_clflushopt intrinsics.
3.3.4 Release
01/13/2015
- Corrected parameter names, descriptions, and operations for intrinsics with immediate parameters, to clarify they are always 8-bits.
- Corrected description and operation for all pmaddwd intrinsics.
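As background for the pmaddwd entry in this release, the following sketch models one 32-bit output lane of a multiply-and-add on signed 16-bit pairs, as performed by intrinsics such as _mm_madd_epi16. The helper name madd_lane_model is hypothetical; it is a scalar illustration under that assumption, not the guide's operation pseudocode.

```c
#include <stdint.h>

/* Hypothetical scalar model of one pmaddwd output lane: multiply two
   pairs of signed 16-bit values and add the two 32-bit products. */
int32_t madd_lane_model(int16_t a0, int16_t a1, int16_t b0, int16_t b1)
{
    int64_t sum = (int64_t)a0 * b0 + (int64_t)a1 * b1;

    /* The only sum that does not fit in 32 bits is 2^31, reached when
       all four inputs are -32768; the instruction wraps it to
       0x80000000. */
    if (sum > INT32_MAX)
        return INT32_MIN;
    return (int32_t)sum;
}
```

For inputs (1, 2) and (3, 4) the lane result is 1*3 + 2*4 = 11.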
3.3.3 Release
12/18/2014
- Corrected description for _mm_maskmoveu_si128 and _mm_maskmove_si64.
- Corrected operation for _mm512_[mask_]extload_* intrinsics.
3.3.2 Release
12/10/2014
- Corrected description for vpmaskmov intrinsics.
- Corrected description for (v)pmaddwd intrinsics.
3.3.1 Release
10/17/2014
- Corrected category for *_mask_cmp_*_mask, *_test_*_mask, and *_testn_*_mask intrinsics.
- Corrected description and operation for *_testn_*_mask intrinsics.
- Added description and operation for _mm_testn_epi64_mask and _mm_test_epi64_mask.
- Corrected typos in description of several getexp, packs, div, and broadcast intrinsics.
- Corrected instructions for *_mask_permutex2var_* and *_mask2_permutex2var_* intrinsics.
- Corrected CPUIDs for 128-bit and 256-bit lzcnt and conflict intrinsics.
- Corrected description for all broadcastw and fpclass intrinsics.
3.3.0 Release
09/29/2014
- Added AVX-512IFMA52 and AVX-512VBMI intrinsics.
3.2.2 Release
09/03/2014
- Corrected _mm512_mask_permute4f128_epi32 operation.
3.2.1 Release
07/24/2014
- Corrected CPUIDs for _mm_broadcastmb_epi64, _mm256_broadcastmb_epi64, _mm_broadcastmw_epi32, _mm256_broadcastmw_epi32, and _mm_movm_epi8.
3.2.0 Release
07/18/2014
- Added AVX-512VL, AVX-512BW, and AVX-512DQ intrinsics.
3.1.9 Release
06/18/2014
- Re-classified all *_reduce_* intrinsics (except gmin and gmax) as both AVX-512 and KNC, which were previously classified as only AVX-512.
3.1.8 Release
06/17/2014
- Corrected instructions for _mm512_mask2int, _mm512_int2mask, _mm512_cvtfxpnt_round_adjustepi32_ps, _mm512_mask_mulhi_epu32, _mm512_mask_i32loscatter_epi64, _mm512_mask_subsetb_epi32, _mm_tzcnti_32, _mm_tzcnti_64, _mm_prefetch, _mm512_extload_*, and _mm512_mask_extload_*.
- Re-classified 58 intrinsics as just AVX-512, which were previously classified as both AVX-512 and KNC (primarily broadcast intrinsics, and those containing __m128 or __m256 types).
- Re-classified _mm512_log2_ps and _mm512_mask_log2_ps as KNC, which were previously classified as SVML.
- Corrected parameters for _mm512_mask_prefetch_i32gather_ps, and re-classified as both AVX-512 and KNC.
- Corrected description of _mm_sad_epu8.
- Added _mm512_mask_prefetch_i32extgather_ps.
- Added prefetchwt1 form of _mm_prefetch.
3.1.7 Release
05/30/2014
- Corrected instructions for _mm512_permute_pd, _mm512_permutevar_pd, and _mm512_permutevar_ps, which all use vpermilpx instructions rather than vpermpx.
- Corrected descriptions for _mm*_xor_si* intrinsics.
- Added missing _mm512_mask_prefetch_i32gather_ps intrinsic.
- Re-classified _mm512_abs_pd, _mm512_abs_ps, _mm512_mask_abs_pd, and _mm512_mask_abs_ps intrinsics as both AVX-512 and KNC, which were previously classified as only KNC.
- Switched to HTTPS.
3.1.6 Release
03/21/2014
- Corrected instructions for: _mm_sub_epi16, _mm512_cvtfxpnt_round_adjustps_epi32, _mm512_extpackstorelo_epi64, _mm512_mask_extpackstorelo_epi64, _mm512_extpackstorelo_pd, _mm512_mask_extpackstorelo_pd.
- Added xsavec, xsaves, and xrstors intrinsics.
3.1.5 Release
03/18/2014
- Re-classified 342 intrinsics as both AVX-512 and KNC, which were previously classified as only AVX-512.
- Added 18 missing KNC intrinsics, which previously existed in AVX-512 but have different intrinsic signatures or instruction encodings for KNC.
- Corrected fmadd233 intrinsic descriptions and operations.
- Corrected CPUID for VPTESTN* intrinsics to AVX512F.
3.1.4 Release
02/12/2014
- Corrected "hint" parameter description for prefetch scatter/gather intrinsics.
- Corrected _mm512_prefetch_i32extgather_ps, _mm512_mask_prefetch_i32extscatter_ps, and _mm512_prefetch_i32extscatter_ps intrinsic descriptions and operations.
3.1.3 Release
02/06/2014
- Added AVX512F version of kortestw intrinsics.
3.1.2 Release
01/28/2014
- Updated throughput on Haswell for vblendpd/ps, vblendvpd/ps, and vmulpd/ps.
- Added performance data for _mm_alignr_epi8.
- Resolved issues in Internet Explorer 8 and 9.
- Added intro message.
3.1.1 Release
12/19/2013
- Fixed VEX-equivalent warning message.
3.1 Release
12/18/2013
- Added Knights Corner (KNC) intrinsics.
- Added 512-bit SVML intrinsics.
- Added instruction parameters for many intrinsics, including all AVX-512 intrinsics.
- Added instruction latencies for many intrinsics.
- Updated AVX-512 CPUID names.
- Added feature flags for AVX-512(F,ER,PF,CD), SHA, and MPX.
- Corrected descriptions, operations, parameters, and instructions for many intrinsics.
3.0.1 Release
07/23/2013
- Corrected operations for set and setr intrinsics.
3.0 Release
07/17/2013
- Added intrinsics for Intel® AVX-512, Intel® MPX, RDSEED, and ADX.
- Added additional latency & throughput data up through 4th generation Intel® Core™ processor family.
- Added 148 missing intrinsics, and corrected information for 96 intrinsics.
2.8.1 Release
05/13/2013
- Fixed description for _mm_move_ss.
- Fixed parameters for _mm_max_epu32.
- Replaced references of __int datatype with int.
2.8 Release
02/06/2013
- Updated descriptions and operations for all intrinsics.
- Added additional latency & throughput data up through 3rd generation Intel® Core™ processor family.
2.7 Release
11/28/2012
- Added intrinsics for SVML, BMI1, BMI2, FXSR, INVPCID, LZCNT, POPCNT, RDRAND, RDTSCP, RTM, TSC, XSAVE, XSAVEOPT.
- Added header and CPUID feature flag information for each intrinsic.