Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Availability of OPRF? #26

Open
matthiasgeihs opened this issue Oct 10, 2024 · 21 comments
Open

Availability of OPRF? #26

matthiasgeihs opened this issue Oct 10, 2024 · 21 comments

Comments

@matthiasgeihs
Copy link

OPRFs can be useful primitives in privacy-preserving protocols.

Does this library contain any OPRF implementations? If not, are there plans to include some in the future?

In particular, I'm looking for a verifiable threshold OPRF with committed inputs that can be composed with other primitives. For example, I'd like to be able to prove that for a given value pid, I posses a credential for a value id (i.e., a signature sig = SIGN(id) from a credential issuer) and that OPRF(id) = pid, without revealing id.

@lovesh
Copy link
Member

lovesh commented Oct 11, 2024

Hi,

Does this library contain any OPRF implementations? If not, are there plans to include some in the future?

No.

I'd like to be able to prove that for a given value pid, I posses a credential for a value id (i.e., a signature sig = SIGN(id) from a credential issuer) and that OPRF(id) = pid, without revealing id.

If this is the exact requirement, then pid is what's called a pseudonym and you can use SyRA for this. The user's sig is what SyRA calls "user secret key" which contains user's unique attribute id and the user generates a pseudonym using his sig and a value Z given by the verifier as pseudo(sig, Z) and a proof that the inputs to pseudo were infact sig and Z without revealing sig or id. Note that pseudo is not randomized and depends only on sig and Z so the user cant create a different pseudonym from the same sig and Z.
Technically pseudo is not an OPRF, since the user learns Z but doesn't look like that's your requirement. But you can make it a OPRF by verifier first blinding Z by random r and then unblinding the output of pseudo by multiplying by 1/r.

You can combine (that was our intention) this construction with schemes like BBS+ to get anonymous credentials with attributes where the BBS+ credential has other user attributes and the issuer, in addition to BBS+ signature, gives a SyRA signature.

You said threshold OPRF. Which part of would you want to thresholdize, SIGN?

@matthiasgeihs
Copy link
Author

Hi, thx for the reply, will definitely have a look at SyRA.

Threshold OPRF because of the following. (Also wanted to ask whether this is a problem with respect to SyRA.)

So in my setting, the user id id certified in the credential is low entropy (you can consider this a passport number, which is essentially guessable). Still we want to hide it when proving eligibility (the application I have in mind is voting). Furthermore, the mapping from id to pid should be deterministic (there should be exactly one pid for a corresponding id), so that we can check if somebody voted twice. Still pid should not leak id. An OPRF can generate pid from id such that this holds as long as the OPRF key is not leaked. If the OPRF is evaluated by just a single party, then this party knows the key and could identify id from pid simply by evaluating the OPRF itself. Hence, I'd thought the solution to this problem would be to thresholdize the OPRF, so that no single party actually knows the OPRF key.

Does that make sense?

@matthiasgeihs
Copy link
Author

Had a look at SyRA. This looks very promising for my use case.

One differentiation though, I think, is that in SyRA the issuer plays a central role. If I understand correctly, credentials issued by different issuers cannot be mixed.

For example, I could not allow voters who received their credential from different issuers, to vote on the same poll.

Is my understanding correct?

@lovesh
Copy link
Member

lovesh commented Oct 11, 2024

So in my setting, the user id id certified in the credential is low entropy (you can consider this a passport number, which is essentially guessable). Still we want to hide it when proving eligibility (the application I have in mind is voting). Furthermore, the mapping from id to pid should be deterministic (there should be exactly one pid for a corresponding id), so that we can check if somebody voted twice. Still pid should not leak id.

This is achieved by SyRA.

An OPRF can generate pid from id such that this holds as long as the OPRF key is not leaked.

I should have mentioned before but when using SyRA as (O)PRF, id is the PRF key. But if the user want's to secret share id (seems impractical as all share holders have to be invoked on each usage of the credential), the current proving protocol won't work.

I could not allow voters who received their credential from different issuers, to vote on the same poll.

You mean if a single voter got multiple credentials, one from each issuer, then correct. Their pseudonyms (or PRF output) on same id will be different thus counting as different votes.
If you can share more details about your use-case, like what conditions must be met to get a credential, can issuers collaborate, etc, I can think a bit more

@matthiasgeihs
Copy link
Author

Read further into the paper. Another noticeable difference: In SyRA, the user private key is chosen by the issuer. Hence, if the issuer is compromised, it could generate signatures on behalf of the user.
(For that reason, thresholdizing the issuer seems absolutely necessary. As far as I can see, the thresholdized issuer protocol has not been implemented yet, correct?)

In my imagined design, the user secret key is chosen by the user and not revealed to anyone. Generating a pseudonym is a separate step. Compromising the pseudonym issuer would only compromise anonymity, but not allow voting on behalf of the user.

I'll write up a few more details below on what I have in mind. (Note that I have not been able to come up with a concrete and efficient realization of the protocol so far, but I think it's theoretically possible.)

@matthiasgeihs
Copy link
Author

matthiasgeihs commented Oct 11, 2024

Target application: anonymous voting
Involved parties: User/Voter (U), Credential Issuer (CI), Poll Organizer (PO), Append-only Public Bulletin Board (B)

Step 1: U proves identity and obtains credential from CI
U convinces CI of identity id
U chooses secret key sk
U requests credential cred for (id, sk) from CI, which is a partially blind signature for on (id, sk) where sk is blinded. (This can be realized using BBS.)

Step 2: U obtains poll id from PO
U and PO evaluate pid = OPRF(id), where PO holds OPRF key.
U obtains a proof pi of correct evaluation.
(Something like this can be realized using BLS, but there is a catch, see below.)

Step 3: U publishes vote on B
U creates a signature sig on vote v using sk.
U then creates a proof pi_vote for the following:

  • VerifyCred(cred, (id, sk)) = 1
  • VerifyPid(id, pid) = 1
  • VerifyVote(sk, sig, v) = 1

U publishes (pi_vote, v, pid) on B.
Anybody can verify that the voter is eligible by checking pi_vote and detect double-voting by checking for duplicate pid.
(Generating a proof of knowledge of a signature that has been created using a secret key contained in a credential shouldn't be too hard using known techniques. What I currently don't know is how to connect the OPRF proof with the rest of the two statements. Afaik, BLS can used as a verifiable OPRF (and is easy to thresholdize), but correct evaluation can only be verified if the input is revealed. I don't know how to make this work for committed inputs to the OPRF. I.e., we want to prove that for some commitment, the result of evaluating the OPRF on the committed value is correct, and there are certain relations of that committed input value to other inputs of the protocol (like there is a credential for that input).)

@lovesh
Copy link
Member

lovesh commented Oct 11, 2024

(For that reason, thresholdizing the issuer seems absolutely necessary. As far as I can see, the thresholdized issuer protocol has not been implemented yet, correct?)

Correct, it hasn't been. But I discussed with one of the SyRA authors to add it but the implementation is going to be different than the one in the paper.

Step 1: U proves identity and obtains credential from CI
U convinces CI of identity id
U chooses secret key sk
U requests credential cred for (id, sk) from CI

Do you assume CI can't be malicious, i.e. issue cred with different id to same U. And I don't think you need sk (see below)

Step 2: U obtains poll id from PO
U and PO evaluate pid = OPRF(id), where PO holds OPRF key.

Before PO generates pid, it needs to check if U has a cred with id.
And in your first comment, were you meaning to thresholdize PO?

Step 3: U publishes vote on B
U creates a signature sig on vote v using sk.
U then creates a proof pi_vote for the following:

sk isn't needed as pi_vote can just be the proof of knowledge of signature in cred with the same id as used in generation of pid and v is just included in the challenge thus forming a signature over v. The same idea of getting a Schnorr signature from Schnorr's identification protocol.
Also note that a malicious PO cannot vote as it does not have cred but it surely can break anonymity of voter if using SyRA as it is.

Finally, I need to look more into the OPRF with desired property. Will get back to you on this.

@lovesh
Copy link
Member

lovesh commented Oct 13, 2024

I found an OPRF with committed inputs in this paper which allows oblivious evaluation of DY PRF on a committed input. Its described in steps 1 and 2 of the protocol in Fig 4 of the paper. m in that figure would correspond to id of the cred. The challenge is that OPRF input (m) in this case belongs to a group of composite order where id from group belongs to a prime order group likely of different sizes so I don’t know of a protocol to prove equality among them. This paper might help but haven’t looked enough. Once U has the PRF output, creating a signature of knowledge of the OPRF input on vote v can be done using known techniques.

Also in practice, PO might organize several polls, each with a new poll id and its permission to U to vote, which you call pid and represent with an OPRF. It should be comprised of both the poll id poll_id and user id id. So PO's OPRF should be a Partially Oblivious PRF (POPRF) where the POPRF's public and private inputs will be poll_id and id respectively. I considered this OPRF, with the construction described in section 4 and tried making it accept committed inputs by treating hash function H1 as scalar multiplication with a public parameter (PRF output is H(id)*{1/(sk + poll_id)} = G*id*{1/(sk + poll_id)}) but proving the equality of PRF’s private input with cred while voting didn’t work for low entropy values (can provide more details if interested). Also I am not sure if constructing H1 as scalar multiplication is fine since its collision resistant but the output is malleable.
So ideally you are looking at a POPRF with committed inputs.

@matthiasgeihs
Copy link
Author

matthiasgeihs commented Oct 14, 2024

Correct, it hasn't been. But I discussed with one of the SyRA authors to add it but the implementation is going to be different than the one in the paper.

Nice to hear. Looking forward to that. Mind outlining what's the differences?

Do you assume CI can't be malicious, i.e. issue cred with different id to same U. And I don't think you need sk (see below)

I now realize that there is a strong trust assumption on my CI as well. (I've created the design a while ago, so have to remind myself of the details...) My use case idea revolves around usage of the active authentication feature of digital passports. Basically, the goal is to turn a regular passport into a versatile digital credential. And the CI's job is to create that binding. Obviously, if the CI is not trustworthy, it could just issue credentials for any identity of its choice. Ideally, the CI is therefore also thresholdized or running in a TEE. Nevertheless, I wanted to ensure that the CI does not learn user secrets in any case. User secret keys of honest users should never leave the user device.

Before PO generates pid, it needs to check if U has a cred with id. And in your first comment, were you meaning to thresholdize PO?

You are right. I missed that point. Otherwise, anyone could check whether pid = OPRF(id) for other id's and thereby reveal identities.

sk isn't needed as pi_vote can just be the proof of knowledge of signature in cred with the same id as used in generation of pid and v is just included in the challenge thus forming a signature over v. The same idea of getting a Schnorr signature from Schnorr's identification protocol. Also note that a malicious PO cannot vote as it does not have cred but it surely can break anonymity of voter if using SyRA as it is.

You are right. Technically sk isn't needed if we keep the credential secret. However, as outlined above, I wanted that the CI does not learn user secrets in the honest case. If we want this property (that the CI doesn't learn the credential), regular BBS does not suffice, right? We would have to use some form of BBS blind signatures, correct?
Moreover, i thought it might have practical benefits if the user secret is a "regular" secret key instead of the full credential.

@matthiasgeihs
Copy link
Author

I found an OPRF with committed inputs in this paper which allows oblivious evaluation of DY PRF on a committed input. Its described in steps 1 and 2 of the protocol in Fig 4 of the paper. m in that figure would correspond to id of the cred. The challenge is that OPRF input (m) in this case belongs to a group of composite order where id from group belongs to a prime order group likely of different sizes so I don’t know of a protocol to prove equality among them. This paper might help but haven’t looked enough. Once U has the PRF output, creating a signature of knowledge of the OPRF input on vote v can be done using known techniques.

Thanks for digging out that paper. It looks like an interesting approach.

Also in practice, PO might organize several polls, each with a new poll id and its permission to U to vote, which you call pid and represent with an OPRF. It should be comprised of both the poll id poll_id and user id id. So PO's OPRF should be a Partially Oblivious PRF (POPRF) where the POPRF's public and private inputs will be poll_id and id respectively.

The way I imagined it is that the PO uses a different OPRF key for each poll. This way, the pseudonyms are automatically scoped. The corresponding verification key can be considered as the unique global poll id.

@matthiasgeihs
Copy link
Author

Overall, I need to reevaluate my design in comparison to SyRA, which I haven't been aware of.

There seem to be the following core differences:

  • SyRA
    • One third party component: CI
    • User pseudonyms are derived from real id, CI key, public poll context: pid = Derive(id, k_CI, poll_ctx).
    • The CI generates the user secret.
  • OPRF approach
    • Two third party components: CI, PO
    • User pseudonyms are derived from real id, PO poll key: pid = Derive(id, k_PO).
    • The user secret never leaves the user device.

Because the pid is not bound to the CI with the OPRF approach, users who obtained their credentials from different CIs can still vote on the same poll. In a way, the credential issuance is more decoupled from the application layer.

@lovesh
Copy link
Member

lovesh commented Oct 14, 2024

Nice to hear. Looking forward to that. Mind outlining what's the differences?

It will be based on OT based multiplication, likely this one since we already do that for threshold BBS+ and SyRA signature has similar structure.

If we want this property (that the CI doesn't learn the credential), regular BBS does not suffice, right? We would have to use some form of BBS blind signatures, correct?
Moreover, i thought it might have practical benefits if the user secret is a "regular" secret key instead of the full credential.

It doesn't matter that the CI learns the credential since the any usage of cred doesn't reveal the signature and CI couldn't learn it even after interacting with PO or any verifier. But I see that you might have other uses of sk, like creating a pseudonym based on sk where the use-case permits.

The way I imagined it is that the PO uses a different OPRF key for each poll. This way, the pseudonyms are automatically scoped. The corresponding verification key can be considered as the unique global poll id.

This works in a very simple deployment where the verification key can just be published on bulletin board (BB).
If the BB has a significant publishing cost, you pay that for each poll. And if you have multiple concurrent polls, you need the BB to store multiple keys and "know" the correct one for verification so a bit more complex logic. And if the PO role is thresholdized, then they need to run a DKG for each poll.

SyRA
One third party component: CI
User pseudonyms are derived from real id, CI key, public poll context: pid = Derive(id, k_CI, poll_ctx).
The CI generates the user secret.

By user secret, you mean id? And you are implying the CI can know how user voted? And thresholdizing CI solves this issue?
Is the role PO not mandatory in your use-case?

Because the pid is not bound to the CI with the OPRF approach, users who obtained their credentials from different CIs can still vote on the same poll.

That would be true for SyRA as well if there are multiple CIs and they issue without checking if others issued as well.

@matthiasgeihs
Copy link
Author

It doesn't matter that the CI learns the credential since the any usage of cred doesn't reveal the signature and CI couldn't learn it even after interacting with PO or any verifier.

Perhaps I don't fully understand your proposed construction. What signature are you referring to here? I was assuming it is the signature created by the CI? For generating vote signatures, the user certainly needs to hold some secret. What is that secret comprised of in your proposal?

This works in a very simple deployment where the verification key can just be published on bulletin board (BB).
If the BB has a significant publishing cost, you pay that for each poll. And if you have multiple concurrent polls, you need the BB to store multiple keys and "know" the correct one for verification so a bit more complex logic. And if the PO role is thresholdized, then they need to run a DKG for each poll.

I agree the setup is a bit more complex because there is a corresponding secret PO key. For the public key, I think of it similar as a context string. I wouldn't think it needs to be particularly more costly to publish it in comparison to publishing a context string when using SyRA?

By user secret, you mean id?

By user secret, i refer to usk in the paper. The user secret key.

And you are implying the CI can know how user voted? And thresholdizing CI solves this issue?

No. The CI should not know how the user voted. In the OPRF approach this is ensured because the OPRF outputs look random to anybody not knowing the OPRF key. The main purpose of the CI is to certify the real user id and tie it to some secret so that later things can be signed / authenticated in relation to that id.

Is the role PO not mandatory in your use-case?

The poll organizer (PO) would be mandatory. It can be thought of as the party / committee hosting the poll.

Because the pid is not bound to the CI with the OPRF approach, users who obtained their credentials from different CIs can still vote on the same poll.

That would be true for SyRA as well if there are multiple CIs and they issue without checking if others issued as well.

I meant that the other way around. In SyRA, one cannot check whether two pseudonyms from different CIs belong to the same real world user. With the OPRF approach, the (with high probability) 1-to-1 mapping between real identity and pseudonym holds across CIs.

@lovesh
Copy link
Member

lovesh commented Oct 15, 2024

What signature are you referring to here?

I was referring to the SyRA signature (what SyRA calls user secret key) and optionally BBS signature (when you have other attributes besides id). During voting, user proves knowledge of these 2 signatures in zero knowledge.

For generating vote signatures, the user certainly needs to hold some secret. What is that secret comprised of in your proposal?

The above 2 signatures.

I wouldn't think it needs to be particularly more costly to publish it in comparison to publishing a context string when using SyRA?

SyRA public keys are certainly bigger than a context string but depending on your exact workflow it might not matter.

The main purpose of the CI is to certify the real user id and tie it to some secret so that later things can be signed / authenticated in relation to that id.

Got it.

The poll organizer (PO) would be mandatory. It can be thought of as the party / committee hosting the poll.

And its acceptable for CI to play that role as well (when using SyRA)?

I meant that the other way around. In SyRA, one cannot check whether two pseudonyms from different CIs belong to the same real world user.

I meant the same thing about SyRA. All CIs have to keep a shared state of user ids they have issued to.

With the OPRF approach, the (with high probability) 1-to-1 mapping between real identity and pseudonym holds across CIs.

Because there is only 1 PO?

I think we CI was a threshold type role, above problems are solved as there is only a single secret key and no single CI learns which user-id requested? Does the user need to prove something to CI before getting a credential?

@matthiasgeihs
Copy link
Author

The poll organizer (PO) would be mandatory. It can be thought of as the party / committee hosting the poll.

And its acceptable for CI to play that role as well (when using SyRA)?

I think it might work as well. The setup would change a bit. Like every application using the anonymous voting service would have to run their own CI (instead of their own PO). The downside is that real user identification is then at the application level, and users have to trust the application that they verify theirs users correctly.

I meant that the other way around. In SyRA, one cannot check whether two pseudonyms from different CIs belong to the same real world user.

I meant the same thing about SyRA. All CIs have to keep a shared state of user ids they have issued to.

Hm, why do CI's have to keep track of their users? I thought the SyRA CI is stateless? When I receive usk1 from CI1, and usk2, and generate sig1 and sig2, respectively, there is no way to check whether sig1 and sig2 belong to the same real user (i.e., are a duplicate vote)?

With the OPRF approach, the (with high probability) 1-to-1 mapping between real identity and pseudonym holds across CIs.

Because there is only 1 PO?

I meant per PO / poll. The point is that it doesn't matter if I use cred1 from CI1 or cred2 from CI2. The resulting pid (w.r.t. to a given poll) will be the same.

I think we CI was a threshold type role, above problems are solved as there is only a single secret key and no single CI learns which user-id requested? Does the user need to prove something to CI before getting a credential?

User should prove his real identity to the CI (e.g., using the passport's Active Authentication protocol), so that the CI can create a digital, verifiable credential for it.

@matthiasgeihs
Copy link
Author

matthiasgeihs commented Oct 15, 2024

What signature are you referring to here?

I was referring to the SyRA signature (what SyRA calls user secret key) and optionally BBS signature (when you have other attributes besides id). During voting, user proves knowledge of these 2 signatures in zero knowledge.

For generating vote signatures, the user certainly needs to hold some secret. What is that secret comprised of in your proposal?

The above 2 signatures.

Not sure if I'm getting it. Are you suggesting to combine my approach with SyRA?

In any case, thanks a lot for your plenty comments so far. Appreciate it a lot!
I realize that it's not that easy to follow what I write since it's now scattered across several posts. I have started writing up my design in a LaTex document. Might share that for a more fruitful discussion, once it's ready.

@lovesh
Copy link
Member

lovesh commented Oct 15, 2024

I think it might work as well. The setup would change a bit. Like every application using the anonymous voting service would have to run their own CI (instead of their own PO).

Can the POs trust the CI to issue "valid" credentials?
This is how I imagine it.

  • Every user has a unique user id like passport number in your case.
  • There are a bunch of trusted CIs to issue "valid" credentials, i.e. they do the background check and don't issue to same user id twice. Hence the shared (among CIs) database where they store the user ids to which they have issued.
    PS: There might be a way to avoid storing the database of issued user ids in plain by using the Oblivious accumulators which are accumulators that hide the members from everyone. And since they describe a trapdoorless setting, you can imagine the bulletin board enforcing that only CIs can write to it. Before requesting a credential, user proves non-membership of its id in the accumulator and after issuing, CI updates the accumulator without any other CI learning the user-id. But I haven't looked at that construction and generally accumulators are expensive, with non-membership checks being more.
  • The credential doesn't have to be voting specific, they can have broader usage and thus have other attributes like name, city, etc.
  • The credentials have a SyRA signature (user secret key) on the unique user id but can contain a BBS signature if the credential contains other attributes than just user id. So C1 and C2 are 2 valid credentials: C1 = (id, sig_syra) and C2 = (id, name, city, ..., sig_syra, sig_bbs) where sig_bbs is a BBS sig on (id, name, city, ...)
  • PO creates a poll with a unique context and broadcasts it on the bulletin board.
  • To vote, users create a pseudonym using SyRA signature (and optionally BBS) and corresponding proof and send these to the bulletin board.

Hm, why do CI's have to keep track of their users? I thought the SyRA CI is stateless? When I receive usk1 from CI1, and usk2, and generate sig1 and sig2, respectively, there is no way to check whether sig1 and sig2 belong to the same real user (i.e., are a duplicate vote)?

I explained above but its to mitigate the duplicate voting problem you mentioned.. Sorry I mixed it up. You're correct, there is no need of shared state.

I meant per PO / poll.

Ok. That's what i was looking for.

Not sure if I'm getting it. Are you suggesting to combine my approach with SyRA?

Yes but only if needed. I explained above.

In any case, thanks a lot for your plenty comments so far. Appreciate it a lot!

Thank you. Its been a very helpful discussion. Thanks for introducing me to OPRFs :)

I realize that it's not that easy to follow what I write since it's now scattered across several posts. I have started writing up my design in a LaTex document. Might share that for a more fruitful discussion, once it's ready.

Sure.

@lovesh
Copy link
Member

lovesh commented Nov 5, 2024

Hi @matthiasgeihs . Sorry but I forgot to share that the threshold SyRA implementation has been added here.

@matthiasgeihs
Copy link
Author

@lovesh Very cool, thx for letting me know. Had other priorities in the meantime, but this motivates me to pick this up again 👍

@lovesh
Copy link
Member

lovesh commented Nov 5, 2024

@matthiasgeihs Please check this. Here a share of the message (id in your case) is passed but not the whole message. In practice the user would still like to convince the signers that this message comes from a commitment (an existing signature like BBS+). This can be done using the protocol described here. I have not implemented these yet as the author (of "VSS from Distributed .." paper) told me that they will soon have a more efficient scheme.

@matthiasgeihs
Copy link
Author

matthiasgeihs commented Jan 2, 2025

@lovesh Happy new year 2025! Just found some time to look a bit more into the SyRA threshold issuance, downloaded the code, and ran the tests. What's the reason that threshold issuance is so slow?

test threshold_issuance::tests::issue_with_known_user_id has been running for over 60 seconds
test threshold_issuance::tests::issue_with_shared_user_id has been running for over 60 seconds

nvm, looked a bit closer, ran the tests with logging output (--nocapture) and realized that this is running several instances each and includes setup procedures like OTE setup, which is probably the reason why this takes much longer than the single party tests.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants