[Shared by Zhang Bingsheng of Zhejiang University] Multi-party privacy function evaluation under the RAM model

IEEE x ATEC

IEEE x ATEC Technology Thinking Conference is a technology salon jointly sponsored by the professional technical society IEEE and the cutting-edge technology exploration community ATEC. Invite industry experts and scholars to share cutting-edge exploration and technical practice to help digital development.

In the era of big data, where everything is connected, data links all aspects of our lives. On the one hand, big data greatly facilitates our work and life; on the other hand, the massive amount of data also increases the risk and challenge of many private information leakage. This issue of Science and Technology Sharing Conference has invited four heavyweight guests to jointly focus on the theme of “Frontier Technology and Application of Privacy Protection”, from different levels such as machine learning algorithm, communication protocol, APP and operating system, to discuss privacy security risks and Discussion on the application of technological innovation.

The following is a speech given by Researcher Zhang Bingsheng from Zhejiang University, “Multi-party Privacy Function Evaluation under the RAM Model”.

Speaker | Zhang Bingsheng

Researcher of the Hundred Talents Program of Zhejiang University, winner of the National Youth Talent Project, Chief Scientist of the Major Scientific Research Project of the Ministry of Science and Technology

“Multi-party privacy function evaluation under the RAM model”

Hello everyone, I am Zhang Bingsheng from Zhejiang University. Today, I would like to share with you the latest collaborative research result of our group and Ant Moss, “Multi-party Privacy Function Evaluation under the RAM Model”. What is RAM model and what is privacy function evaluation? In this talk, I will introduce it to you slowly.

Purpose and Scenario

The first is the purpose and the context. The problem of privacy breaches is becoming more and more serious, and it has also received more and more attention, including the state, government and local authorities at all levels. Most of my country’s Internet companies have more or less areas that need to be rectified, and they have also received rectification regulations and constraints from the state. The above figure lists some related cases of privacy leakage, and other cases will not be listed one by one here.

Recently, our country has accelerated the implementation of relevant legislation on data security and privacy protection. From 2012 to the present, our country has successively promulgated a series of laws and regulations such as the Cybersecurity Law, the Personal Information Protection Law, and the Data Security Law. At present, these laws and regulations are only relatively high-level guidance. For example, it is mentioned in the Personal Security Law that data containing sensitive personal information is not protected by this law if the original data cannot be reversed or reversed through processing. That is, it provides a constraint on privacy computing and data exchange from the side. The data cannot be reversed into original data after processing. These laws and regulations are still relatively high-level, and the specific evaluation criteria are still being formulated. But we believe that in the near future, our evaluation criteria will become more and more clear, at least now there are relevant criteria for de-identification.

Last year Gartner gave a forecast, according to the forecast report: by the end of 2023, the personal data of 75% of the world’s population will be protected by privacy laws. By the end of 2023, more than 80% of companies worldwide will face at least one privacy-focused data protection regulation. Global privacy-driven data protection and compliance technologies will top $15 billion by 2024. By 2025, 60% of large organizations and enterprises will implement data analysis, cloud computing, modeling, and intelligent data processing through at least one or more privacy-enhancing technologies.

There are many technologies for privacy protection and data security, and our work mainly belongs to the category of secure multi-party computing. So what is secure multi-party computation? Its English name is Secure Multiparty Computation, which is an important research branch of cryptography. If you go to the Internet to search what is secure multi-party computing, you will generally find the following definition: To solve the problem of collaborative computing between a group of distrusting parties in the premise of protecting private information and without a trusted third party The proposed cryptographic protocol and theoretical framework.

If you are an insider studying cryptography or cryptographic protocols or privacy computing, then we have a narrow definition. In the narrow definition, secure multi-party computing can generally be divided into two main implementation forms: one is Yao’s obfuscation circuit, which is mainly used for two-party computing, of course, can also be used for multi-party computing. The better feature of Yao’s obfuscation circuit is that it supports Boolean circuits very well, which is also secure multi-party computation. Professor Yao Qizhi proposed the idea when he proposed secure multi-party computation in 1982. The current Yao’s obfuscation circuit has been optimized and improved for 40 years, and its efficiency has been very, very high. The second is for Boolean circuits or algebraic circuits. We implement the agreement between two or more parties in the form of secret sharing. What are the benefits of this implementation? Its communication volume will be smaller than the overall communication volume of Yao’s obfuscation circuit, but its communication rounds will be more, which is more suitable for application scenarios with relatively small bandwidth and relatively low delay. If the delay is relatively high, it is recommended to use Yao’s obfuscation circuit. Of course, the secret sharing support for algebraic circuits is definitely better than Yao’s obfuscation circuit.

Of course there are some mixed protocols now, where you have to solve both Boolean circuits and algebraic circuits in the same function or in the same computational task, such as the ABY series. Broadly speaking, as far as I understand it, secure multi-party computation can include some primitives of cryptography, such as fully homomorphic encryption.

Some people ask, what is the relationship between fully homomorphic encryption and secure multi-party computation? Are they two different technologies?

In my opinion they are an unequal comparison. Because fully homomorphic encryption is just an encryption primitive, this is equivalent to whether I can apply AES encryption for example in secure multi-party computation, you say yes. In my opinion, AES encryption and fully homomorphic encryption are both an encryption algorithm, and secure multi-party computation is a protocol and a higher-dimensional thing. If the protocol for secure multi-party computation uses, for example, a signature algorithm and an encryption algorithm, the protocol is still a secure multi-party computation protocol. So in my personal understanding, even if the secure multi-party computation protocol uses homomorphic encryption, semi-homomorphism, and full homomorphism, it is still a secure multi-party computation protocol. In the industry, because of some requirements, it calls the secure multi-party computing based on trusted hardware Confidential Computing (confidential computing), and separates federated learning from secure multi-party computing. In fact, when federated learning was proposed, it did not have a definition of security. In my opinion, federated learning and secure multi-party computation are actually incomparable concepts. Because in federated learning, secure multi-party computing technology can be applied to do some privacy protection.

What is the purpose of our research on this result?

Our main purpose is to solve the problem that two or more parties need to protect computing tasks and specific algorithms when, for example, cloud computing is used. What does that mean? Traditional secure multi-party computation means that all secure multi-party computation participants will know what task you want to calculate when you are computing, that is to say, my algorithm is open to all secure multi-party computation participants. And what we want to protect is just the data, just the input, just the input to this calculation. But for some application scenarios, its algorithm is very important, such as other people’s intellectual property rights, such as DNA precision, targeted pharmaceuticals. For example, I have an algorithm. My algorithm needs to be applied to your DNA to produce the formula of medicine, but my algorithm cannot be disclosed to you. Although your DNA is also privacy-protected, when I do two-party calculation with you, you do not can get my algorithm. That is to say, I have a need to protect the algorithm, and you have a need to protect the input. This involves a branch. This branch is generally called Private Function Evaluation in the Community. I translate it into Private Function Evaluation, which means that while protecting the input, we also need to protect what we calculate. function, the task we want to compute. We cannot let the computing party know what task is being calculated, that is, protect the IP of your computing algorithm.

Obviously, this is a branch of secure multi-party computation, which means that it still belongs to the field of secure multi-party computation. As far as I know, the current traditional secure multi-party computation, all commercial secure multi-party computation are based on circuit format. What does Sort format mean? That is to say, you can only execute sequentially, and you cannot support the RAM computing model in secure multi-party computing. What is RAM computing? RAM is Random Access Machine. Our current conventional computer is a von Neumann structure, which can access random access or jump Random Access in memory. For example, if you do a Binary Search, binary search, you don’t need to scan the entire Memory, you can jump to the location you specify to read, write, compare, and then do the next step. However, due to some technical limitations, we cannot do this kind of similar operations in secure multi-party computation, and we can only execute them sequentially. And this kind of sequential execution will greatly limit the scenarios that we can use for secure multi-party computation. For example, there are some algorithms that we need to jump or jump a lot, which cannot be efficiently implemented by secure multi-party computation.

One of the work we want to do this time is to implement a Private Function Evaluation in the RAM model. That is to say, our secure multi-party computing system can no longer be compiled into a circuit to support the computing structure of RAM. If you have an infinite loop, you have a While Loop, and even the Condition in your While Loop is an indeterminate Condition. For example, in an infinite loop, you have a condition for privacy protection data based on private data, then our model can also support it, and our model will protect your algorithm. Generally speaking, our approach is to compile some high-level languages (such as C++ language or Java, etc.) with a compiler (such as a compiler using LLVM) into the instruction set of TinyRAM. Why is TinyRAM? Because we need a reduced instruction set, if the instruction set is too complex, the overhead of our entire system will be very large. So we chose a reduced instruction set, TinyRAM. Of course, RISC-V is also possible, and we protect the privacy of programs and inputs. Specifically, we all share secrets, and I will explain the specific operations to you one by one later.

We perform safe execution on the premise of program secret sharing and data secret sharing, and we can execute in secret.

MPC Distributed ORAM

Let’s talk about a premise of our construction, that is, we want to build a distributed ORAM.

What is distributed ORAM?

The traditional ORAM is that you have one server or multiple servers, and then there is a Client. This Client reads and writes on the server. It cannot let the server know which position you are reading and writing. Distributed ORAM refers to the fact that I no longer have such a Client. I want to read and write the location of which grid I want to read and write during secure multi-party computing. In fact, the data I want to read and write is also protected by privacy. For example, it is based on secrets. It is protected in the form of sharing, and then read and written on the server. It is also a distributed structure. Specifically, we used 4 server architectures. If you want to read only, we can also make 3PC format. We will talk about the format of this 2PC later. This is a classic PIR, but it is not a standard ORAM. It is just an implementation of Private Information Retrieval, which is mainly based on DPF (Distributed Point Function), which is FSS (Functioning Secret Sharing). ).

The distribution we built has several characteristics: The first characteristic is that it can read and write any number of times, and the conversion between reading and writing does not require Refresh or Eviction. That is, for some common operations we have seen, it may be very fast to read, but it takes a lot of time to read and write, or its reading and writing are relatively fast, but it suddenly needs to be read and written once after several times of reading and writing. Refresh or perform an Image, which greatly limits the use of these ORAMs in our scenario. Specifically, our O-READ, its communication volume is 12log2(n) bits, and this N is the approximate size of your Memory. Then you fetch a number in the Memory, so that the server does not know which number you fetch. The traffic we need is 12 times log2(n), and then you want to oblivious to rewrite the memory, that is, you are in a memory of N size, and you want to write a number in a certain entry. For example, if you write a number in i, the traffic of All Right here is 24 times log2(n) plus 12t bits. T is actually the length of the data you want to write. The specific operation is divided into Open Delta, Cyclic-shift, Scale, Re-randomize, I will introduce them one by one later.

Let me briefly introduce DPF first. DPF is a Distributed Point Function. Its most classic structure is a two-party DPF, that is, there is a Generate, which will generate two Ks, one for the left, one for the right, and two servers S1 and S2. These two servers do Evaluate for this K respectively, and add up the two things from β0 to β3 in the last line, which is actually a Unit vector. What is a Unit vector? Only one bit of it is non-zero, and the none-zero bit is usually taken as 1 in this application scenario, and every other bit is 0, which means that this is actually a Compression process, giving a Unit vector to compress into a T of log2(n) Size. Due to time problems, I will not talk about the specific methods here. This is a relatively classic technique. It uses a Tree’s Structure, and there will be a Correction in each layer. That is to say, it has a total of log2(n) layers, so its K Size will have log2(n). When the amount of data is relatively large, the expansion time of this tree will be relatively long; but when the amount of data is relatively small, its operation will be very fast.

2PC will have some disadvantages, this FSS of 2PC, the traditional FSS-based PIR:

First, it cannot be used directly here, because it requires two servers and two servers. Seeing the same inscription, it is actually a multi-Server PIR. The two servers above the server must have exactly the same data, and then you select the exact same PIR location in the data, and then go to your own local for synthesis.

The second is that the Client knows which number it wants to take. Then we need to do this distributed RAM. First, the server cannot know what the inscription of this data is. Secondly, in our application scenario, the server can’t even know which number it wants to take. That is to say, both must be shared secretly and must be protected.

So we will have a 4Server structure. The general idea is that a DPF requires two servers with the same value. In the case of dual servers, both servers must store clear text. But if we have 4 servers, we can actually share this data secretly in two. For example, S3 and S4 take the same share of secretly shared shares respectively, and S1 and S2 take the same share of secretly shared shares, that is, Replicated Shares. These two take the same share, and the other two take the same share, so that we can let S1 generate a DPF’s K and distribute it to S3 and S4. S3 and S4 can do the previous PIR evaluation for their shared shares. In the same way, S1 and S2 can also do PIR Evaluation for their shared Share.

There will also be a problem here, that is, the point we want to take cannot be disclosed to others. The point to be taken is not disclosed to others. Our approach is to make a DPF for a random point in the offline stage. For example, i is any number from 0 to N minus one. When the value you actually want to access comes out, you subtract the value you actually want to access from the value that we randomly selected when we prepared before. , the number that comes out is actually an offset, which is a correction amount. We let all the servers shift their own data, the Cyclic shift, which is the shift offset of the loop, so that the DPF we do is equivalent to the DPF for an unknown secret shared target index. I won’t go into details here.

After the completion, each of the 4 parties will do a linear function, which means that they can get the number to be taken in the form of secret sharing without communication. For example, if you want to take the i-th number, then this number is called Xi. According to the formula above, you can actually figure out what this Xi is, that is, Xi also exists in the form of secret sharing.

How to write it? Writing is a little more troublesome than reading, because if it is read-only, 3-way is enough. Our algorithm has a 3-way version. If you want to write, you must get 4 squares. Write, we actually have a limit. The general principle of our writing method is that you need to know what the original data is, you need to know what the data you wrote in is, and then you subtract the original data from the data written in, and you will have a delta, that is There is a modifier and an offset. How much do you need to increase the original data to become the data you want it to be now.

Because in our application scenario, you must have read the value of that grid before writing, so we will not add any extra overhead here. That is, we assume that you need to know what the original data is? What is the modified data now? This is the default assumption in our application scenario. You will use DPF to add an offset to each value in your memory, which is the difference between the latest value you just calculated and the previous value. Of course, when you increase this, you will need to multiply it by a Point Function, which means that most of the offsets you increase in all but one position are 0. And then to that position, the offset you add is Δv, which is 0 in all other parts. After adding and updating, your output results are still shared secretly, so you can continue to go down without needing to do read and write conversions.

This is a result of one of our specific Benchmarks. We compare some common or current state-of-the-art Distributed ORAMs. For example, I use red, yellow and green to represent FLORAM ORAM, SQRT ORAM, CIRCUIT ORAM, and I use blue to represent our own. Because we can do the separation of offline and online, I will draw two lines in the figure, one is the running time of direct online, and the other is the running time of combining online and offline. Then our initialization time is significantly better than any other job and we can say we hardly ever initialize. In the practice of reading, we are better than any other work in most cases, only when the amount of data is relatively small, for example, when the amount of data is less than 20 times, then SQRT ORAM will be faster than ours A little bit. This is only under certain scenarios, such as under the WAN scenario. In terms of writing, our overall efficiency is not much different from FLORAM ORAM. But because we can separate online and offline, if you only look at the online efficiency, when the database is relatively large and exceeds 220 times, our efficiency is significantly higher than all known Distributed ORAMs.

MPC emulates CPU

With this very efficient Distributed ORAM, we can start simulating the CPU of this RAM structure. The idea of our simulation is to manage the storage structure in the form of RAM, including all storage structures (we use the von Neumann structure), including your pointer PC value and some Flag values, some register values when the overall CPU is running , some memory values, some instructions, everything is taken over by the RAM form. Specifically, we compile ordinary code, that is, any code in C language or high-level language, into the instruction set of TinyRAM with a conventional compiler. We will parse the instructions in the TinyRAM instruction set. After parsing, we will implement privacy protection. We’ll talk about how to do it later. It is executed in a dense state and then selected in a dense state, updated at a loss, and the whole and all processes are Oblivious and memory-protected.

There are about 25 specific instruction sets. The first half is the boolean instruction set, that is, bit operations, such as AND, OR, XOR, NOT, Shift, and so on. In the middle are some conventional operations of addition, subtraction, multiplication and division, and the third category in the back is some various comparisons. Its Flag will be different, and then there will be some mov operations, access operations and jump instructions.

The entire Oblivious Cycle is shown on the left in the above figure. You can imagine that it is actually a TinyRAM machine. All the states of this machine are all stored in the form of a secret shared secret state, which has a Private PC (Program Counter). Every time you come to a command, where is the command read from? For instructions, we have a dedicated memory for memory, and our structure and memory for memory can not distinguish between the instruction area and the data area. That is to say, if your code is written in a way that dynamically generates code and new code during the execution of your code, we can support this form. That is, you don’t need to customize the code in this Memory, that part of the code is read-only, and the memory of the data can be read and written, we don’t need this. Code and data can be mixed together, and you can dynamically modify your code. Some viruses like it very much and can turn the data in its own data area into code and then run it.

I am not talking about this from the perspective of program security, but from the perspective of functionality. In fact, we can meet such functional requirements. Our program will fetch an instruction from the memory and privacy protection for parsing. After parsing, the instruction will have operands, what operations it does itself, and what registers to use. For example, what registers will we use, register addressing or immediate addressing. We then go to the memory to fetch the relevant numbers. After fetching the relevant numbers, we will do the operation. After the operation is completed, we will make a selection, which is your actual result. We will save the result back, and then according to your Flag, we will update the Program Counter, which is the next instruction we want to run. We basically go to this Memory according to this Program Counter and then fetch the next instruction.

Specifically, in fact, the biggest overhead of our Oblivious Cycle is that we need to support more than 20 TinyRAM instruction sets. We want to do all the instruction sets in the same step. The general idea is how to prevent you from knowing which instruction I am doing, we will do all the instructions, of course not do each instruction individually. Looking at the left part of the above figure, you will find that we have organically combined all the instruction sets.

There are also predecessors who have compiled several instruction sets with Yao-style obfuscation circuits in their work. The Yao-style obfuscation circuit will be relatively large. We use the ABY structure here. Some are made with Yao-style confusion circuits, some are made with Boolean circuits, and some are made with algebraic circuits, so ours is very efficient. For all instruction sets, we can complete in three rounds, including Compare, some comparisons, some jumps, and various instructions are completed in three rounds. We have some instructions that will be faster, some instructions will be slower, and some instructions share some intermediate results, so we organically combine them.

The general running structure of our overall MPC Private Function Evaluation is shown in the figure above: the first round requires Fetch, and one instruction is read. The second round of Decode this instruction, by the way, reads the relevant operand. The third round is Execute, because we need to cover all the instructions in the instruction set, so Execute actually requires three rounds. The fourth round is write, because write and fetch can be executed at the same time, so they can overlap in the same pipeline. Basically our program loops like this.

In order to show the efficiency of this program, we have done some evaluations of the more common functions in the related RAM structure. Such as Binary Search, such as Set Intersection, such as Quick Sort. Specifically, we measured the time of Offline, Fetch, Decode, Evaluation, and Write separately, including Total time. Note that since we are protecting the function, the time is indeed a bit slower compared to some secure multi-party computation that does not protect the function.

Some people say that Binary Search feels similar to Linear Scan. Note that we are protecting functions, you don’t actually know I’m doing Binary Search. Why this Binary Search will be taken out to do it alone? Because if this is not the structure of the RAM model, it is very difficult to do Binary Search, and the entire Memory scan must be done once. We basically only need to do log2(n) steps now, which means you only need to do log2(n) comparisons and you can get this result.

Because of time constraints, we will share it here today. If you have any questions, welcome to Email, my email is [email protected], thank you, goodbye.

This article is reprinted from: https://www.leiphone.com/category/industrynews/Lw965LWvOUHo3Mnj.html
This site is for inclusion only, and the copyright belongs to the original author.

IEEE x ATEC

Speaker | Zhang Bingsheng

Researcher of the Hundred Talents Program of Zhejiang University, winner of the National Youth Talent Project, Chief Scientist of the Major Scientific Research Project of the Ministry of Science and Technology

“Multi-party privacy function evaluation under the RAM model”

MPC Distributed ORAM

Leave a Comment Cancel Reply