John Y

Pretzel - Email encryption and provider-supplied functions are compatible

SUNDAY, JANUARY 24, 2021

Privacy

Systems

Existing mail service providers often argue that there is a fundamental three-way tradeoff between privacy, functionality, and performance: given the state of today's security primitives and technology, only two of the three can be chosen. Under this view, leaving emails as plaintext on clients and, more worryingly, on the mail servers themselves is a serious security vulnerability, but one that has been accepted as a necessary risk to deliver a better user experience. This argument has not aged well in recent years, as a growing number of attacks have targeted mail servers holding millions of unprotected emails. The main contribution of this paper is to demonstrate that end-to-end encryption can coexist with important provider functions such as spam filtering and topic extraction, without the extreme overheads that would otherwise make the combination infeasible. The authors state that their objective is not to claim that their solution is the new benchmark, but rather to show, through a working system, that the long-standing tradeoff of performance and functionality versus privacy is not as much of a stalemate as it has been made out to be.

The main cryptographic technique used to make end-to-end encryption possible alongside important provider functionality (with a particular focus on classification tasks) is secure two-party computation (2PC). The authors build a mail system that uses linear classifiers (Naive Bayes, logistic regression, and linear SVMs) for tasks such as spam filtering and topic extraction. The end-to-end encryption (e2e) module is responsible for encrypting and decrypting email, while function modules perform computations over email content and encapsulate the aforementioned classification tasks. The e2e module runs only on clients, whereas each function module has both a client-side and a provider-side component. Computing a joint result without revealing either party's input to the other is made possible by Yao's garbled circuits, a classic cryptographic building block for 2PC. However, because garbled circuits are expensive, Yao is used only selectively, for non-linear steps such as the final comparison. The bulk of the work, a dot product between the email's feature vector and the provider's encrypted model (which Pretzel represents as a matrix), is handled with cheaper additively homomorphic encryption.
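To make the garbled-circuit idea above more concrete, here is a minimal sketch of a single garbled AND gate. It is a toy illustration, not Pretzel's implementation: the function names (`garble_and_gate`, `evaluate`), the use of SHA-256 to derive a one-time pad, and the all-zero tag used to recognize the correct row are all assumptions made for this example. It also omits oblivious transfer, which a real protocol needs so that the evaluator can obtain the label for its own input bit without revealing that bit to the garbler.

```python
import hashlib
import secrets

LABEL_LEN = 16   # bytes per wire label (illustrative choice)
TAG = bytes(16)  # all-zero tag that marks a correctly decrypted row


def _pad(label_a: bytes, label_b: bytes) -> bytes:
    """Derive a 32-byte one-time pad from a pair of input labels."""
    return hashlib.sha256(label_a + label_b).digest()


def _xor(x: bytes, y: bytes) -> bytes:
    """Byte-wise XOR of two equal-length byte strings."""
    return bytes(p ^ q for p, q in zip(x, y))


def garble_and_gate():
    """Garbler side: pick two random labels per wire and build the table.

    labels[w][bit] is the random label that stands for `bit` on wire w.
    """
    labels = {
        w: (secrets.token_bytes(LABEL_LEN), secrets.token_bytes(LABEL_LEN))
        for w in ("a", "b", "out")
    }
    table = []
    for bit_a in (0, 1):
        for bit_b in (0, 1):
            out_label = labels["out"][bit_a & bit_b]
            pad = _pad(labels["a"][bit_a], labels["b"][bit_b])
            # Encrypt the output label (plus tag) under the two input labels.
            table.append(_xor(pad, out_label + TAG))
    # Shuffle so that row position does not reveal the input bits.
    secrets.SystemRandom().shuffle(table)
    return labels, table


def evaluate(table, label_a: bytes, label_b: bytes) -> bytes:
    """Evaluator side: holds one label per input wire, learns one output label."""
    pad = _pad(label_a, label_b)
    for row in table:
        plain = _xor(pad, row)
        if plain[LABEL_LEN:] == TAG:  # only the matching row yields the tag
            return plain[:LABEL_LEN]
    raise ValueError("no row decrypted correctly")
```

Calling `labels, table = garble_and_gate()` and then `evaluate(table, labels["a"][1], labels["b"][0])` returns `labels["out"][0]`, the label for AND(1, 0). The evaluator sees only random-looking labels, never the underlying bits, until the garbler reveals what the output labels mean. Every gate costs several hash computations and ciphertexts, which gives a feel for why a system like Pretzel would want to keep the garbled portion of its computation small.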

The most conspicuous limitation of Pretzel is that, because it is a prototype meant to test the theory rather than to be deployed immediately in production, it features only spam filtering, topic extraction, and basic keyword search. Many other functions, such as virus scanning, could be explored to extend Pretzel's functionality. Pretzel's high overhead in extra cryptographic material and its exposure of metadata suggest two further directions. One is reducing the amount of cryptographic information that these protocols generate. The other is hiding metadata that could presumably be exploited in side-channel attacks. Finally, the authors close by discussing how Pretzel cannot achieve ideal, perfect privacy, in part because users and providers must agree on the algorithms to run, which also constrains functionality. While ruling out leaks entirely seems like a stretch, bounding them and offering opt-outs for concerned users could be a worthwhile compromise to pursue.

I think Pretzel's limitations double as possible extensions. Building out the rest of the platform to incorporate more of the features expected of an email service provider would not only bring it closer to production level, but also allow more exploration of the kinds of cryptographic functions needed for features that span machine learning and systems applications. For instance, incorporating virus scanning could require a new set of models beyond linear classifiers (maybe neural networks and Delphi?). It would be interesting to examine how adding new features affects the balance between functionality, performance, and security. I also think that scale matters a lot and differs between services. For a private institution with its own mail servers, a variant of Pretzel could be built to provide more security at the cost of flexibility, provided that cost is tolerable. For large public services like Gmail or Outlook, however, the scale is so large that perfect security may be out of the question; even so, refactoring existing components with a more Pretzel-oriented approach to security would certainly bolster confidence in the privacy-preserving capabilities of such platforms.