AMD's newest security vulnerability.

Need Stronger Linear Hash functions?

AMD's Processors from 2011 through 2019 have recently been exploited like intel CPU's to be vulnerable to L1D cache attack vulnerablities. So, whats new?

This attack reads data memory via a 'side channel' without needing any kernel type access or shared memory. Shared memory/cache attacks use what are termed "Flush+Reload", "Flush+Flush", and "Evict+Reload". These are based on using the 'cflush' instruction to flush shared data or accessing congruent memory address. These are applicable to all CPU's if shared memory is avaiable accros all platforms, even in virtual environments. Normally this is disabled.

A "Prime+Probe" attack is used against the CPU using 'cflush' and the cache invalidations are propagated throught all physical processors on the same system. It uses 'timing' differences between the remote processors and DRAM(Dynamic Ramdom Access Memory).

I take it that these 'timing' attacks are using the refressh cycles of the DRAM as sort of like heartbeats to trace memory locations. DRAM is based on capacitors that leak and must be continually charged to hold the values of 1's in its matrix.

Unlike DRAM, SRAM(Static Random Access Memory) does not need to be continually refreshed as it is based on logic latched gates. Its faster than DRAM. SRAM however needs more space on chip and it is more expensive.

But, refreshing takes time, it slows down a system and makes it easier to exploit. Timing is related to the user having access to the 'rdtsc' instruction which gives unpriviledged access to registers returning CPU clock cycles. The sames cycles used to refresh DRAM. Newer AMD CPU's do not give access to these functions outside of kernel mode. So a technique of 'counting threads', which is constantly incrementing a shared variable. This was used to get a higher resolution better than nanoseconds using JavaScript and Chrome V8e and Firefox.

All this resulted in the reverse engineering of AMD's 'Way Prediction' alogarithm, essentially a linear hash function used to look up hashed cache addresses in a cache table. This allowed them to follow a so called 'uTag', a virtual address referencing a real cache bucket.

All linear hash fuctions use 'binary', not ternary or quaternary etc., either 0 or 1. AMD used an 8 bit hasing resolution scheme drawing between all 0's and all 1's equaling 256 different combinations.

The fiirst bit in all binary numbering is a '1'. If a system has only two real cache buckets, then the hashing index will always have a trailing '1' or '0' in the virtual address pointing to the actual bucket. But, the alogarithm can decide when to increase bucket size when needed and this grows a linear hash by n buckets. A common way to do this is by "if the average occupancy per bucket is > than some threshold, then increase bucket size". This splits one bucket into two. These are called overflow buckets. The function needs to have the inception bucket number (n - 1). It will also need the number of bits used to represent a bucket index i, which must be in binary and which is a log of n, the first bit in the binary number is always a '1'. Now any number x written as i bits in binary will always have the begining first bit as '1'. Since there are only n buckets is use, any hashed key number > (n - 1) will lead to non-existent buckets. These non-existent buckets will also have a begining '1'. When you change the '1' to a '0', you identify a real bucket. because the last bucket starts with a '1'. Use all this with 'average occupancy', the number of buckets, the number of search keys and the address block size, address space Layout randomization(ASLR)/kernel address space Layout randomization(KASLR) will typically make tracing/mapping easier.

A new 'side channel' attack on the 'uTag' virtual address is called "Collide+Probe". This supposedly exploits the timing of 'uTag' collisions in AMD's L1D way predictor upon the cache tables. If the address is accessed and the way predictor entry is updated, a subsequent access to a different address with the sames 'uTag' is made, a collision is made and data has to be loaded from L2D cache. This increased timing difference is used to exploit the system.

Just like Intel's unfixable CPU flaw. These schemes in some form or another may be used on all exisitng CPU's because of their inherent design. Cost factors push us to use less secure methods of storing and accessing information, and random number generation, entropy within systems are not as good as can be, and global variables are accessible to unpriviledged users. Besides, most systems in use today are based on binary logic. Everything can be hacked, and binary probably the easiest. But if it truly takes a lot of time to actually crack something, is it worth it? Or rather, will it be of any value afer it's cracked? So the best method of securing data is still 'Time' and 'Randomness', which i do not think anything like quantum computers will relinquish anytime soon, also a OTP(One Time Pad) is still 99.9999999999999999......% uncrackable without the key!

For a more thorough explanation see Daniel Gruss's Take A Way.


Peace be unto you. Thank you for visiting!