Unobservable Communication over Fully Untrusted Infrastructure
Pung is one of a series of papers that attempts to alleviate the traditionally high cost of computational private information retrieval (PIR) techniques. As explored in the Riposte paper, metadata is often times a vertical for performing effective exploits even when encryption is employed, motivating metadata free communication without trusted parties. Like the Riposte authors, the Pung authors also discuss how Onion Routing and Mix Networks may meet this objective but have low tolerance for compromises or the number of incorrect servers. PIR crypto has been a proven technique for making such security primitives possible. The primary goal of Pung is to make these systems more computationally affordable and efficient. The main contribution is that through bucketing and batch coding techniques, users can receive multiple messages with sublinear costs most of the time. What’s more is that even if all infrastructure is compromised, Pung can provably hide metadata. Pung also supports point to point and group communication, while being able to process a 100K+ messages with just 4 servers. It also scales linearly with the number of servers.
Pung outlines its approach to hiding metadata by looking at the specific information related to a client-server request that should be hidden, including the participants of a conversation, the message’s size, the time of sending + delivering, and communication frequency. Each of these values are obfuscated in different ways. One of the first techniques is severing the association between put and get requests handling the same key value pair. To solve this, while the put request remains the same, the get request sends the encoded key, rather than the raw value itself, and receives the encrypted message back. They can no longer be related because they do not share anything that is distinguishable from one another. Servers can also answer queries obliviously with PIR techniques that hide the access pattern through cryptographic operations over every entry. Last but not least, rather than handle single requests, batching multiple queries into a single request allows the authors to take advantage of amortizing the cost by splitting messages into buckets of size k. This means, after a certain number of queries and buckets, partitioning flattens out the originally linear relationship.
While the benchmarks and evaluation do demonstrate improvements, Pung still relies on multiple rounds of retrieval per epoch, which is the same as most private messaging schemas. This is inefficient in the sense that even if the participant doesn’t need to send or receive a message, they still need to participate, a practice that probably wouldn’t sit well with current systems. For instance, for mobile clients who experience intermittent connectivity and incur high costs with LTE and cellular connections, the requirements for participation would be both costly and hard to maintain. In addition, while batching is helpful, there are still high network costs associated with it. In addition, users must know a shared secret, and delivering it safely to multiple users could be difficult. DOS attacks also still seem potent.
I think two challenges to making Pung work could be interesting research directions. One would be what was mentioned above, which is distributing a shared secret among multiple users in the Pung context. Another would be devising an efficient dialing protocol. I think for such an investment, the evaluation for pung could’ve been a bit more elaborate. It’s compared against previous systems discussed in past research conferences, but it could be interesting to see how it would perform in a production setting with real data. A service like a public bulletin board might be a good application for benchmarking Pung’s performance.