Leveraging Zero-Knowledge Proofs in Machine Learning and LLMs: Enhancing Privacy and Security
Published 09/20/2024
Written by Ken Huang, CEO of DistributedApps.ai and VP of Research at CSA GCR.
I recently attended the Cloud Security Alliance's AI Controls Working Group face-to-face meetings in Seattle. One participant raised an interesting question: are zero-knowledge proofs (ZKPs) used in machine learning at all? This blog post answers that question and explores the potential applications of this technology in the rapidly evolving fields of machine learning (ML) and large language models (LLMs).
What are Zero-Knowledge Proofs?
Zero-knowledge proofs are cryptographic protocols that allow one party (the prover) to prove to another party (the verifier) that a statement is true without revealing any information beyond the validity of the statement itself. In the context of ML and LLMs, ZKPs can be used to verify the integrity and correctness of models or computations without exposing sensitive data or model architecture.
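To make the prover and verifier roles concrete, here is a minimal sketch of a Schnorr proof of knowledge of a discrete logarithm, the textbook ZKP, made non-interactive via the Fiat-Shamir transform. The tiny parameters are purely illustrative and insecure:

```python
# Minimal non-interactive Schnorr proof (Fiat-Shamir): the prover shows it
# knows a secret x such that y = g^x mod p, without revealing x.
# Toy parameters for illustration only -- never use numbers this small.
import hashlib
import secrets

p, q, g = 23, 11, 4           # g generates a subgroup of prime order q=11 in Z_23*

def prove(x: int) -> tuple[int, int, int]:
    """Prover: commit, derive the challenge by hashing, respond."""
    y = pow(g, x, p)          # public statement: "I know x with g^x = y"
    k = secrets.randbelow(q)  # fresh random nonce
    t = pow(g, k, p)          # commitment
    c = int.from_bytes(hashlib.sha256(f"{g}{y}{t}".encode()).digest(), "big") % q
    s = (k + c * x) % q       # response; reveals nothing about x on its own
    return y, t, s

def verify(y: int, t: int, s: int) -> bool:
    """Verifier: recompute the challenge and check g^s == t * y^c (mod p)."""
    c = int.from_bytes(hashlib.sha256(f"{g}{y}{t}".encode()).digest(), "big") % q
    return pow(g, s, p) == (t * pow(y, c, p)) % p

secret_x = 7                       # the prover's secret
assert verify(*prove(secret_x))    # verifier learns only that the statement holds
```

The verifier's check works because g^s = g^(k + c·x) = t · y^c, yet the response s is masked by the random nonce k, so the secret x never leaks.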
Applications in Machine Learning
1. Privacy-Preserving Model Training
ZKPs can be used to train ML models on sensitive data without exposing the underlying information. For example:
- Healthcare: Hospitals can collaboratively train diagnostic models without sharing patient data.
- Finance: Banks can develop fraud detection systems using aggregated transaction data while maintaining client confidentiality.
2. Secure Model Verification
ZKPs allow for the verification of ML model properties without revealing the model itself (a sketch of one such auditable statement follows this list):
- Fairness Audits: Proving that a hiring algorithm doesn't discriminate based on protected attributes.
- Regulatory Compliance: Demonstrating that a financial model adheres to specific risk management criteria.
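As a concrete illustration of the fairness-audit case, the plain-Python predicate below is the kind of statement a ZK circuit could prove over a committed batch of decisions. Demographic parity and the 5% threshold are assumptions chosen for this sketch, not a prescribed standard:

```python
# Sketch of the *statement* a ZK fairness audit could prove: over a committed
# batch of hiring decisions, the selection-rate gap between two groups stays
# under a threshold. In a real system this predicate would be compiled into an
# arithmetic circuit and proven without revealing the individual decisions.
# (Hypothetical data and threshold; demographic parity chosen for illustration.)

def demographic_parity_ok(decisions: list[int], groups: list[int],
                          max_gap: float = 0.05) -> bool:
    """decisions[i] in {0,1} (hire/reject); groups[i] in {0,1} (protected attribute)."""
    def rate(grp: int) -> float:
        members = [d for d, g in zip(decisions, groups) if g == grp]
        return sum(members) / max(1, len(members))
    return abs(rate(0) - rate(1)) <= max_gap

# The auditor verifies a proof of this predicate against a public commitment
# to `decisions`, never seeing the raw records.
print(demographic_parity_ok([1, 0, 1, 0], [0, 0, 1, 1]))  # gap = 0.0 -> True
```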
3. Federated Learning with Enhanced Privacy
ZKPs can strengthen privacy guarantees in federated learning setups (see the sketch after this list):
- Edge Computing: Ensuring that updates from edge devices are valid without exposing local data.
- Cross-Organization Collaboration: Enabling multiple organizations to jointly train models while keeping their data private.
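A minimal sketch of the edge-device case, assuming the statement proven is a bounded-norm check on the client's update. Here `zk_prove` and `zk_verify` are hypothetical placeholders standing in for a real proving backend; the norm predicate is the point:

```python
# Sketch: in federated learning, each client could attach a ZK proof that its
# model update is well-formed (here: bounded L2 norm) without revealing the
# update's values to the coordinator. `zk_prove` / `zk_verify` are hypothetical
# stand-ins for a real proving system; the check below is the statement proven.
import math

MAX_NORM = 10.0  # clipping bound agreed across participants (illustrative)

def update_is_valid(update: list[float]) -> bool:
    """The statement a client proves about its private gradient update."""
    return math.sqrt(sum(w * w for w in update)) <= MAX_NORM

def zk_prove(statement_holds: bool) -> bytes:        # hypothetical placeholder
    return b"proof" if statement_holds else b""

def zk_verify(proof: bytes) -> bool:                 # hypothetical placeholder
    return proof == b"proof"

client_update = [0.3, -1.2, 0.8]                     # stays on the device
proof = zk_prove(update_is_valid(client_update))
assert zk_verify(proof)   # server accepts the update without ever seeing it
```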
Applications in Large Language Models
The use of ZKPs in LLMs is an active research area. For example, in the paper titled "zkLLM: Zero Knowledge Proofs for Large Language Models," the authors address a challenge arising from AI legislation: the need to establish the authenticity of outputs generated by LLMs.
To tackle this issue, they present zkLLM, a zero-knowledge proof system for LLMs. Addressing the challenge of non-arithmetic operations in deep learning, they introduce tlookup, a lookup argument for tensor operations, offering a solution with no asymptotic overhead. Furthermore, building on the foundation of tlookup, they introduce zkAttn, a zero-knowledge proof tailored to the attention mechanism, carefully balancing the considerations of running time, memory usage, and accuracy.
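For context, zkAttn targets the standard scaled dot-product attention at the heart of transformer LLMs, whose softmax is non-arithmetic and therefore awkward to express directly in proof circuits:

```latex
\mathrm{Attention}(Q, K, V) = \mathrm{softmax}\!\left(\frac{QK^{\top}}{\sqrt{d_k}}\right)V
```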
Empowered by their fully parallelized CUDA implementation, zkLLM emerges as a significant stride towards achieving verifiable computations over LLMs. For LLMs with 13 billion parameters, their approach enables the generation of a proof for the entire inference process in under 15 minutes. The resulting proof, sized at less than 200 kB, is designed to uphold the privacy of the model parameters, ensuring no leakage. The following are some other examples.
1. Data Privacy in Fine-Tuning
ZKPs can be employed when fine-tuning LLMs on sensitive data:
- Legal Tech: Law firms fine-tuning models on confidential case data without exposing client information.
- Proprietary Knowledge: Companies adapting LLMs to their domain-specific knowledge without revealing trade secrets.
2. Verifiable AI-Generated Content
ZKPs can help authenticate AI-generated content (a provenance sketch follows this list):
- Media Integrity: Proving that an image or text was generated by a specific AI model without human intervention.
- Academic Integrity: Verifying that a student's work was not generated by an LLM.
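A simplified provenance sketch for the media-integrity case: the model operator signs a hash of each output, here using the `cryptography` package's Ed25519 keys. A ZKP layer on top could then prove that a valid signature from an approved model exists without revealing which model signed; the plain signature flow below is the building block such a circuit would verify:

```python
# Provenance building block: the model operator signs a hash of each output.
# A ZK layer could prove "a valid signature from an approved model exists"
# without revealing the model's identity; the signature itself is the core.
# Requires the `cryptography` package.
import hashlib
from cryptography.hazmat.primitives.asymmetric.ed25519 import Ed25519PrivateKey

model_key = Ed25519PrivateKey.generate()   # held by the model operator
output = b"Generated article text..."      # illustrative LLM output

digest = hashlib.sha256(output).digest()
attestation = model_key.sign(digest)       # binds this output to this model

# Anyone holding the public key can check the attestation; it raises
# InvalidSignature if the output or the attestation was tampered with.
model_key.public_key().verify(attestation, digest)
print("output attested by the model key")
```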
3. Secure Model Serving
ZKPs can enhance the security of LLM inference (a conceptual proving flow follows this list):
- Confidential Computing: Proving that an LLM inference was performed correctly without revealing the input or the model parameters.
- Auditable AI Decisions: Demonstrating that an AI-driven decision followed a specific process without exposing the underlying logic.
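A conceptual flow for the confidential-computing case. The function names below are hypothetical illustrations, not a real library API; open-source projects such as ezkl implement this prove-then-verify pattern for models exported to ONNX:

```python
# Conceptual flow for verifiable LLM serving. `prove_inference` and
# `verify_inference` are *hypothetical* names sketching the pattern, not a
# real library; in practice a proving system compiles the model to a circuit.

def prove_inference(model_commitment: bytes, public_output: str) -> bytes:
    """Server side: run the model on the (private) input and emit a proof
    that `public_output` came from the committed model. Hypothetical."""
    return b"proof-bytes"

def verify_inference(model_commitment: bytes, public_output: str,
                     proof: bytes) -> bool:
    """Client side: check the proof against the commitment and the output,
    without access to the weights or the raw input. Hypothetical."""
    return proof == b"proof-bytes"

commitment = b"sha256-of-weights"     # published once by the model provider
answer = "approved"                   # the decision the client received
proof = prove_inference(commitment, answer)
assert verify_inference(commitment, answer, proof)
```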
Companies Pioneering ZKPs in Machine Learning
Several companies are at the forefront of integrating ZKPs into ML and AI:
- Inpher: Inpher leverages ZKPs to enhance the privacy and security of their machine learning solutions. By integrating ZKPs, Inpher ensures that sensitive data used in machine learning models remains confidential. This allows organizations to perform data analysis and build predictive models without exposing the underlying data.
- Zama.ai: While Zama primarily focuses on fully homomorphic encryption (FHE), they also explore the use of ZKPs to further secure machine learning processes. ZKPs can be used to verify computations on encrypted data, ensuring that the results are accurate without revealing the data itself.
- OpenMined: OpenMined is an open-source community that actively develops privacy-preserving AI technologies, including ZKPs. They use ZKPs to enable secure data analysis and machine learning on private data. This ensures that sensitive information remains protected while still allowing for advanced data processing and model training.
Why Use ZKPs as Security Controls in ML?
- Data Privacy: ZKPs allow for computations on sensitive data without exposing the data itself, crucial for compliance with regulations like GDPR and HIPAA.
- Model Protection: Intellectual property in ML models can be protected while still allowing for public verification of model properties.
- Trust and Transparency: ZKPs enable auditable AI systems, fostering trust in automated decision-making processes.
- Collaborative Innovation: Secure multi-party computation becomes feasible, allowing for collaborative model development across organizations.
- Resistance to Adversarial Attacks: By limiting information exposure, ZKPs can make it harder for attackers to craft adversarial examples or reverse-engineer models.
Reference
Haochen Sun, Jason Li, and Hongyang Zhang. "zkLLM: Zero Knowledge Proofs for Large Language Models." ACM CCS 2024. arXiv:2404.16109.