Constants Count - Practical Improvements to Oblivious RAM

SATURDAY, JANUARY 16, 2021

One of the specific ways side channel attacks can be carried out is by reverse engineering data from memory access patterns. Access patterns can reveal a great deal about encrypted files or private user data. Simply put, encryption by itself is not enough to protect users: even when the contents are ciphertext, the pattern of accesses can betray behaviors such as full file scans, binary searches, or which data is touched at different stages of a program, and deterministic ciphertext makes repeated accesses to the same data linkable. As in the context of SGX enclaves, a client may want to save data or execute programs on an untrusted server or storage service such as Amazon S3. The client and the ORAM client logic are the trusted entities. The client may issue a command that results in a sequence of accesses, each detailing the operation, address, and data being acted upon. The goal of ORAM is that the memory access pattern visible to the server is an obfuscated sequence of addresses and ciphertext that cannot be traced back to the original operations; the idea is that any two access patterns of the same length should be computationally indistinguishable. This paper is important because it provides a strong defense against a popular side channel vulnerability with many known exploits, one that remains an issue on public cloud services.
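The trust boundary described above can be illustrated with a toy sketch (hypothetical names; a real ORAM does much more). The client issues logical accesses as (op, addr, data) tuples, while the server only ever observes a permuted physical address and fresh-looking ciphertext:

```python
import os
import random

# Toy sketch (hypothetical, not a real ORAM): the trusted client maps each
# logical access (op, addr, data) to a physical access at a secretly
# permuted address, with random bytes standing in for re-encrypted data.
# The untrusted server observes only (physical_addr, blob) pairs.
class ToyObliviousClient:
    def __init__(self, n_blocks):
        # Secret permutation: logical address -> physical address.
        self.perm = list(range(n_blocks))
        random.shuffle(self.perm)
        self.server_trace = []  # what the untrusted server sees

    def access(self, op, addr, data=None):
        phys = self.perm[addr]
        blob = os.urandom(16)  # stand-in for fresh IND-CPA ciphertext
        self.server_trace.append((phys, blob))

client = ToyObliviousClient(8)
client.access("read", 3)
client.access("write", 3, b"secret")
# Two accesses to the same logical block produce distinct blobs, so the
# server cannot link them by ciphertext. Note, however, that the physical
# address repeats -- without remapping blocks after each access, repeated
# addresses still correlate, which is exactly the gap full ORAM closes.
```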

The main contributions of this paper involve a new system design for executing encrypted programs. The core tenets are a physically shielded (a.k.a. tamper-proof) CPU, an encrypted program, and a CPU equipped with the ability to fetch, decrypt, and execute instructions. What’s notable is that the RAM is not protected in this setup and is, in fact, fully controlled by the adversary. As a result, the design must hide both the values stored in memory and the sequence of memory accesses. To solve the first problem, the CPU encrypts values with an IND-CPA scheme (indistinguishability under chosen-plaintext attack). The encryption is designed so that the ciphertext WriteBlock produces after each ReadBlock reveals nothing about the underlying value “v” — two writes are indistinguishable regardless of whether “v” changed. The randomization of the memory access patterns is solved in a similar manner: instead of encrypting a particular value, a permutation of the access pattern is generated and returned instead. A large part of the paper dives into the explanation and optimization of the randomization algorithm, which achieves a faster runtime than the naive O(n^2) solution, mainly with the help of a binary tree data structure.
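The re-encryption property can be sketched with a minimal randomized cipher. This is purely illustrative (a SHA-256 keystream with a fresh nonce, standing in for the paper's WriteBlock re-encryption); it is not a vetted cipher, and a real system would use something like AES-CTR or AES-GCM:

```python
import os
import hashlib

# Toy randomized encryption: a fresh random nonce per write makes two
# encryptions of the same value produce unrelated-looking ciphertexts,
# which is the IND-CPA-style property the design relies on.
KEY = os.urandom(32)

def keystream(key, nonce, n):
    # Derive n pseudorandom bytes from (key, nonce) via hashed counters.
    out = b""
    counter = 0
    while len(out) < n:
        out += hashlib.sha256(key + nonce + counter.to_bytes(8, "big")).digest()
        counter += 1
    return out[:n]

def write_block(value: bytes) -> bytes:
    nonce = os.urandom(16)  # fresh randomness on every write
    ct = bytes(a ^ b for a, b in zip(value, keystream(KEY, nonce, len(value))))
    return nonce + ct

def read_block(blob: bytes) -> bytes:
    nonce, ct = blob[:16], blob[16:]
    return bytes(a ^ b for a, b in zip(ct, keystream(KEY, nonce, len(ct))))

c1 = write_block(b"same value")
c2 = write_block(b"same value")
# c1 != c2 even though both encrypt the same plaintext, so the server
# cannot tell whether a block's contents changed between writes.
```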

Similar to the previous papers, the tradeoff between security and overhead/performance comes up again. The bandwidth overhead per request depends on both the number of blocks N and the bucket size Z — on the order of Z · log N blocks transferred per access — while server storage scales linearly with the number of blocks and the bucket size. I would say one weakness is the possibility of a stash overflow, which would require a ground-up rebuild of the tree data structure. The authors mention that the probability is negligible if the stash is of a particular size, but the specific likelihood and cost of such an overflow aren’t stated in a particularly upfront or clear manner. With that being said, the authors do prove that Path ORAM enforces obliviousness quite well, with a formal proof demonstrating that the probability of any single server-visible sequence decays exponentially in the length of the sequence and the number of possible positions.
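To make the tree-based access and the stash concrete, here is a heavily simplified, hypothetical sketch of a Path ORAM-style access. It keeps the three essential pieces — a binary tree of buckets, a client-side position map, and a stash — while omitting encryption, dummy blocks, and the recursive position map the real protocol uses:

```python
import random

# Toy Path ORAM sketch (hypothetical code): every access reads one
# root-to-leaf path into the stash, remaps the block to a fresh random
# leaf, and writes the path back. The server thus only ever sees
# uniformly random paths, independent of the logical access pattern.
L = 3                                  # tree height; 2**L leaves
Z = 4                                  # bucket capacity
tree = [[] for _ in range(2 ** (L + 1) - 1)]  # server-side buckets
pos = {}                               # client-side map: block -> leaf
stash = {}                             # client-side stash: block -> data

def node_on_path(leaf, depth):
    # Index of the bucket at `depth` on the path from root to `leaf`.
    return (2 ** depth - 1) + (leaf >> (L - depth))

def access(op, block, new_data=None):
    leaf = pos.get(block, random.randrange(2 ** L))
    pos[block] = random.randrange(2 ** L)   # remap to a fresh leaf
    # Read the whole path into the stash, emptying its buckets.
    for d in range(L + 1):
        node = node_on_path(leaf, d)
        for blk_id, data in tree[node]:
            stash[blk_id] = data
        tree[node] = []
    result = stash.get(block)
    if op == "write":
        stash[block] = new_data
    # Write back, pushing blocks as deep as their assigned leaf allows.
    for d in range(L, -1, -1):
        node = node_on_path(leaf, d)
        fits = [b for b in stash if node_on_path(pos[b], d) == node]
        for b in fits[:Z]:
            tree[node].append((b, stash.pop(b)))
    return result

access("write", 5, "hello")
access("write", 9, "world")
```

Blocks that cannot be placed back on the path linger in the stash, which is exactly where the overflow risk discussed above comes from: the proof must bound how large `stash` can grow.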

The three main applications of oblivious RAM at the time of the paper are cloud storage, secure processors, and secure multi-party computation. Cloud storage applies to public cloud services like AWS or Google Cloud. The availability of oblivious RAM makes it possible to exclude main memory from the Trusted Computing Base (TCB) while keeping the CPU inside it. Multi-party computation is particularly interesting, as untrusting collaborators can now entertain the possibility of safely performing computations together on a shared platform without the risk of revealing sensitive information about the computational process each party is using.