John Y

Conclave - secure multi-party computation on big data

Thursday, February 04, 2021

DBMS

Privacy

Security

Its [Conclave's] goal is to execute as many operators as possible outside of MPC, and to reduce data volume processed under MPC where possible, while maintaining MPC’s security guarantees.

Continuing from yesterday's post on SMCQL, Conclave takes on the challenge of improving on SMCQL's performance. It does this in two main ways: first, by rethinking how operations are split between MPC and cleartext (i.e., plaintext) execution in the hybrid protocol; and second, by making joins and aggregations run faster through a partial-trust construct, the selectively-trusted party (STP).

In the multi-party setting that SMCQL and Conclave target, the system should scale with both (1) the number of participants (data providers) and (2) the size of the dataset each provider contributes. Figures 8 and 9 of the SMCQL paper show that as the input size (number of records in the dataset) increases, the runtime of secure execution (operations handled by MPC) grows faster than the runtime of plaintext execution. What's more, the tested input sizes are not very large (100, 200, 400, Full). Conclave points this out. In the motivation section, the authors state:

These applications [Credit card regulation, Market concentration] all compute on hundreds to thousands of records, but many useful computations on large data that might benefit from MPC are currently infeasible.

Before diving into the technical nitty-gritty, it's worth noting that Conclave's threat model, security guarantees, and a couple of foundational concepts differ from SMCQL's. First, the threat model considers only passive (semi-honest) adversaries, i.e., ones that learn what they can by observing things like network traffic and side channels, as opposed to active adversaries that take purposefully malicious actions. Conclave also introduces the concept of a selectively-trusted party (STP), which is similar to the honest broker in SMCQL. However, instead of the SMCQL system itself being the honest broker, the STP role is assigned at the discretion of the Conclave parties. Conclave's trust annotations echo the per-column access policies that SMCQL provides. However, instead of marking a sensitivity level per column, Conclave lets parties specify a trust set of one or more parties for any column. This more granular form of access control allows any party in the trust set to act as an STP that runs functions on the annotated columns in plaintext, rather than under slower MPC.
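To make the trust set idea concrete, here's a minimal sketch in plain Python. It is not Conclave's actual API: the `Column` class, the `cleartext_parties` helper, and the party names are all hypothetical, and I'm assuming that a party may work on a group of columns in cleartext only if it owns or is trusted with every one of them.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Column:
    """Hypothetical column annotation: who owns it and who else may see it."""
    name: str
    owner: str                           # party that contributes the column
    trust_set: frozenset = frozenset()   # other parties allowed to see it in cleartext


def cleartext_parties(columns):
    """Return the parties that may see *every* given column in cleartext.

    Assumption: a column is visible to its owner plus its trust set, and an
    operator over several columns needs a party that can see all of them.
    """
    allowed = None
    for col in columns:
        visible_to = col.trust_set | {col.owner}
        allowed = visible_to if allowed is None else allowed & visible_to
    return allowed or frozenset()


ssn = Column("ssn", owner="bank_a", trust_set=frozenset({"regulator"}))
zip_code = Column("zip", owner="bank_b", trust_set=frozenset({"regulator", "bank_a"}))
balance = Column("balance", owner="bank_b")  # empty trust set: owner only

print(sorted(cleartext_parties([ssn, zip_code])))  # ['bank_a', 'regulator']
print(sorted(cleartext_parties([ssn, balance])))   # []
```

In this toy model, a join on `ssn` and `zip` could be handed to `regulator` (or `bank_a`) in cleartext, while anything touching `balance` together with `ssn` has no eligible party and would have to stay under MPC.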

To me, what felt slightly odd about the trust set concept is that it seems underutilized: according to the authors, only one STP can exist in a Conclave execution. So even if multiple parties are listed in a trust set, Conclave can only reduce runtime via the one STP it has running. And without an STP, the hybrid operators are unavailable, so those joins and aggregations fall back to full MPC. Since the STP sounds like an establishment of trust, and not really a coordinator or honest broker like in SMCQL, why not have multiple STPs? (Comments on this would be super appreciated 😄).

Figure 2 is a great visualization of Conclave's revised query compilation and planning scheme. In a nutshell, Conclave starts by converting the SQL-like query to a DAG via traditional compilation, then writes a query plan under the assumption that everything is carried out in a single, large MPC. Next, Conclave identifies which parts of the query plan can run outside MPC, using several techniques:

1. Rewrite the original query into an equivalent query with fewer operators.
2. Propagate trust annotations and the STP through the DAG (Section 5.1).
3. Split the monolithic MPC into smaller MPCs plus local steps via hybrid protocol operators (Section 5.3).
4. Replace expensive operations with cheaper equivalents, or move them into local processing if possible (Section 5.4).
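The "split the monolithic MPC into local steps" idea can be illustrated with a toy sketch. This is my own simplification, not the paper's algorithm: the `Op` class and the `holders` and `placement` functions are hypothetical, and the only rule modeled is that an operator whose upstream data all comes from one party can run locally at that party, while anything mixing data from several parties stays under MPC. It ignores trust annotations, the STP, and the operator rewrites from the other steps.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Op:
    """Hypothetical DAG node: an input relation or a relational operator."""
    name: str
    inputs: List["Op"]            # upstream operators; empty for input relations
    owner: Optional[str] = None   # set only for input relations


def holders(op):
    """Set of parties whose data flows into this operator."""
    if not op.inputs:
        return {op.owner}
    parties = set()
    for parent in op.inputs:
        parties |= holders(parent)
    return parties


def placement(op):
    """'local' if a single party holds all upstream data, otherwise 'mpc'."""
    return "local" if len(holders(op)) == 1 else "mpc"


# Two parties each scan and pre-process their own relation, then join and sum.
scan_a = Op("scan_a", [], owner="A")
scan_b = Op("scan_b", [], owner="B")
project_a = Op("project_a", [scan_a])
filter_b = Op("filter_b", [scan_b])
join = Op("join", [project_a, filter_b])
total = Op("sum", [join])

plan = {op.name: placement(op)
        for op in (scan_a, scan_b, project_a, filter_b, join, total)}
print(plan)
```

Here the scans, the projection, and the filter come out as `local`, and only the join and the final sum land in `mpc`, which is the general shape of the saving: less data and fewer operators reach the expensive part of the plan.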

The upshot of this shows in the evaluation. Conclave does, in fact, scale much better than predecessors like Sharemind and SMCQL, pushing the number of records per party to 6-8 orders of magnitude.

Cool! Conclave demonstrates a tremendous performance gain, which is awesome! I really enjoyed reading this paper, and one of the big takeaways for me as a (hopefully) incoming graduate student is that the authors did a great job with the evaluation, having taken the time to set up and run SMCQL and Sharemind for accurate comparison numbers. My only raised eyebrow is how Conclave's relatively weaker threat model, with a semi-honest rather than a malicious attacker, affects whether it can be used in practice. Especially for collaborations on highly sensitive data, like government intelligence, I think it's reasonable to assume that there are actively malicious actors looking to compromise data confidentiality. As mentioned at the end of Section 3.2...

Conclave makes no guarantees against an adversary who compromises both regular parties and the STP.

This shifts part of the basis for the security guarantees onto the parties and the STP themselves, which weakens Conclave's security properties. When the authors say "regular parties" in the quotation above, they mean as few as one party. By comparison, Senate allows for up to n-1 of n parties to be compromised, requiring only one honest, uncompromised party for the security guarantees to hold.

Another concern is that while Conclave increases the number of records each party's database can contribute, its evaluation ran with only two to three parties.

In addition, the hybrid join schemes can only be carried out on non-sensitive attributes. For datasets with extremely sensitive columns, Conclave's query planning and optimizations will likely have little effect, and performance will regress to that of pure MPC frameworks like Sharemind.

With that said, this paper is a great continuation of SMCQL (I learned a lot from how the authors referenced and built upon previous work), and it has attracted quite a bit of attention from subsequent research despite being published a little less than two years ago.