Controlled-Channel Attacks - Deterministic Side Channels for Untrusted Operating Systems

THURSDAY, JANUARY 14, 2021

OS

This paper examines a popular use case: treating SGX as trusted hardware that prevents an untrusted OS from accessing user data, thereby excluding the OS from the TCB while still letting the application use the OS’s rich set of functionality. Both Ryoan and Haven are demonstrations of SGX used as such an abstraction. However, even in shielded systems, the OS remains responsible for operations like memory management, storage, and networking. This opens up the possibility of side-channel attacks by the OS, owing to the control the OS retains over the application; the authors dub these “controlled-channel attacks”. In traditional side channels, the attacker has no direct access to the victim and must rely on “noisy” channels such as cache behavior or network traffic to make inferences. The OS, however, controls context switches, TLB flushes, exceptions, and page faults; from these noiseless, deterministic channels, definitive inferences about the executing program can be made. This is an important vulnerability because it points out a glaring issue in the SGX hardware that a significant amount of previous research has been built on.

Side-channel information is extracted primarily through input-dependent memory accesses. For instance, suppose an input feeds a conditional executing on a particular page; depending on how the conditional evaluates, execution proceeds along one of two paths (say, two distinct function calls). If one function resides on Page 1 and the other on Page 2, then by supplying malicious inputs and observing which page-fault sequence arises, the OS can infer the outcome of the conditional. Beyond conditionals, input-dependent array indexing can likewise reveal information about data spread across multiple pages. The authors use Hunspell, a popular spell checker built around a hash-table lookup, as a practical manifestation of these ideas. In the Hunspell case study, lookups that traverse collision chains spanning multiple pages produce page faults; because the hash function is deterministic (and designed to keep collisions rare to avoid ambiguity), a given word yields the same fault sequence every time, which lets the attacker readily distinguish fault patterns and identify the original inputs they correspond to.
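To make the conditional case concrete, here is a minimal, hypothetical C sketch (not taken from the paper). It assumes handle_a and handle_b land on distinct code pages, forced here with a GCC alignment attribute; an OS that unmaps both pages and waits for a fault learns which branch was taken, and therefore the secret bit, on every run.

```c
#include <stdio.h>

/* Hypothetical handlers, assumed (via a GCC alignment attribute) to start on
 * separate 4 KiB code pages. */
__attribute__((aligned(4096))) static void handle_a(void) { puts("path A"); }
__attribute__((aligned(4096))) static void handle_b(void) { puts("path B"); }

static void process(int secret_bit) {
    /* The branch itself is invisible to the OS, but the code page touched
     * next is not: a fault on handle_a's page vs. handle_b's page reveals
     * secret_bit on every execution, with no noise. */
    if (secret_bit)
        handle_a();
    else
        handle_b();
}

int main(void) {
    process(1);  /* an OS that has unmapped both code pages learns the bit here */
    return 0;
}
```

The same reasoning applies to data pages: an array index derived from the input determines which data page faults, so each observation is a deterministic function of the secret rather than a noisy hint.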

The performance results were quite impressive, and the recovered images were particularly compelling evidence that this attack is not just theoretical but readily practical. That said, there is considerable overhead from handling page faults, since each fault is an expensive operation. While it’s impressive that entire text inputs and images could be recovered, I’d imagine that for data sources larger than the kilobyte-scale inputs used with Hunspell and FreeType, the attack would still execute properly but would take much longer to complete. The recovered texts were also mostly confined to alphabetic characters and punctuation marks; it would be interesting to see whether a document using a more expansive character set could still be reconstructed effectively. In short, I think this attack, while impressive, may become too costly for recovering data that is much larger in scale or constantly changing. The attack targets a static program and data, in the sense that the document is not being continually updated and memory is not being altered. It would be interesting to see whether such an attack still works in a public cloud service where multiple users may be running more than one program over a variety of data sets (e.g., on Haven). The authors also mention that some of the run time may be consumed by side-channel noise.

In the conclusion, the authors discuss how this vulnerability might be mitigated. While no complete solution exists, they offer two potential approaches. The first is to rewrite applications to disguise their memory access patterns; this refactoring could be performed at the source level, either manually or by the compiler. Such a solution could work for future programs but would be an extremely laborious undertaking for complex legacy code, incurring significant engineering effort. The second is for shielding systems to restrict the OS even further and cut down on the channels the OS can observe; in this particular case, restricting the OS’s role in memory management would prevent it from observing page faults. This is much easier said than done, since further restricting the OS may break functionality, such as system calls, that programs depend on. In short, this approach may sacrifice functionality for privacy, which is not a desirable tradeoff.
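As one illustration of the first mitigation, here is a minimal, hypothetical C sketch of page-level access disguising (my own example, not from the paper): the lookup touches both candidate tables on every call and selects the result with a branchless mask, so the page-fault pattern the OS observes no longer depends on the secret. The table names, the alignment attributes, and the assumption that each table occupies its own page are illustrative.

```c
#include <stdio.h>

#define PAGE_SIZE 4096

/* Two data tables, assumed to occupy separate pages. */
static char table_a[PAGE_SIZE] __attribute__((aligned(PAGE_SIZE)));
static char table_b[PAGE_SIZE] __attribute__((aligned(PAGE_SIZE)));

/* Branchless, page-oblivious lookup: both tables are read on every call, so
 * the set of pages touched (and hence the page-fault pattern) does not
 * depend on secret_bit. */
static char lookup(int secret_bit, int index) {
    int a = table_a[index];
    int b = table_b[index];
    int mask = -secret_bit;            /* all ones if secret_bit is 1, all zeros if 0 */
    return (char)((a & mask) | (b & ~mask));
}

int main(void) {
    table_a[0] = 'A';
    table_b[0] = 'B';
    /* Either way, an observing OS sees faults on both table pages. */
    printf("%c\n", lookup(1, 0));      /* prints 'A' */
    printf("%c\n", lookup(0, 0));      /* prints 'B' */
    return 0;
}
```

The obvious cost of this style of rewrite is that every secret-dependent access turns into accesses to all candidate pages, which hints at why the authors consider it a laborious and expensive fix for legacy code.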