Compiled by Zhang Takiling, Yang Liu
Edited by Victor
In January this year, Professor Stefan Feuerriegel of ETH Zurich published the article “Artificial Intelligence Across Company Borders” in the journal “Communications of the ACM”. In the article, he pointed out a common challenge in the industrial deployment of artificial intelligence (AI): how can companies collaborate across company boundaries?
According to the professor, constructing large-scale cross-company datasets through data sharing is one approach, but it carries risks of confidentiality breaches and privacy leakage, and it is restricted by privacy-related laws.
Federated learning, a privacy-preserving distributed machine learning framework, keeps data on local premises and addresses these pain points.
However, traditional federated learning currently cannot provide formal privacy guarantees, and its settings are vulnerable to causative (poisoning) attacks.
Therefore, the professor argues that combining federated learning with domain adaptation lets partner companies benefit as much as possible from collaborative AI models while keeping the original training data local.
The following is Professor Stefan Feuerriegel’s introduction to federated learning with domain adaptation, translated by Zhang Takiling and Yang Liu, senior algorithm engineers at Nebula Clustar.
In recent years, digital technologies centered on AI have been driving economic and social development. Estimates suggest that by 2030, AI will increase economic activity in the global industrial sector by $13 trillion.
However, the potential of this technology remains largely untapped because cross-company data cannot be obtained or used effectively. AI benefits from large amounts of representative data, which usually must come from multiple companies. This is especially true in practical industrial scenarios, where getting AI models to perform well on rare unexpected events or critical system states is extremely challenging.
A straightforward way to implement cross-company AI is to construct large-scale cross-company datasets through data sharing. But most companies are reluctant to share data directly because of confidentiality and privacy risks, and in most cases data sharing is restricted by privacy-related laws. Therefore, federated learning with domain adaptation is the key to solving cross-company AI problems. On the one hand, federated learning enables model training and inference without leaking any company’s private data; on the other hand, domain adaptation allows each company to customize the federated model to its own application scenarios and conditions.
1
Barriers to AI Collaboration
There are two main barriers to cross-company AI:
The first is data privacy across companies. Direct sharing of raw data may expose proprietary information about a company’s operational processes or intellectual property to rival companies. This hurdle often arises when companies seek to collaborate on AI with suppliers, customers, or rival companies.
For example, data from a manufacturing plant can reveal parameter settings, product composition, yield, output, routing, and machine uptime. If such data were leaked, customers could misuse it in negotiations, or competitors could use it to increase productivity and improve their products. Beyond intellectual property, deeper constraints can also reduce companies’ willingness to share data, such as the level of trust between companies, ethical constraints, laws and regulations protecting user privacy, and cybersecurity risks. Therefore, we need a solution that protects data privacy, that is, one that enables model inference without exposing each company’s source data.
Second, cross-company cooperation must account for domain shift. Domain shift is a mismatch between the distributions of data collected at different companies, for example because their machines have different configurations or operating systems. Machine data collected at one company may therefore not be representative of another company whose data collection conditions differ. Domain shift is a barrier to inference: a model trained on one company’s data may perform poorly when deployed at another company with a significantly different data distribution.
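The effect of domain shift can be illustrated with a toy example. The sketch below is not from the article: the sensor readings, the +3.0 offset, and the function names are all hypothetical. It fits a simple threshold classifier on one company’s data and evaluates it on a second company whose sensors report the same behavior on a shifted scale.

```python
from statistics import mean


def fit_threshold(healthy, faulty):
    """Place the decision threshold midway between the two class means."""
    return (mean(healthy) + mean(faulty)) / 2.0


def accuracy(threshold, healthy, faulty):
    """Fraction of readings classified correctly (faulty if reading > threshold)."""
    correct = sum(x <= threshold for x in healthy)
    correct += sum(x > threshold for x in faulty)
    return correct / (len(healthy) + len(faulty))


# Hypothetical vibration readings from company A.
company_a_healthy = [1.0, 1.2, 0.9, 1.1]
company_a_faulty = [3.0, 3.2, 2.9, 3.1]

# Company B runs the same machines, but its sensors add a constant offset
# (an assumed, deliberately simple form of domain shift).
offset = 3.0
company_b_healthy = [x + offset for x in company_a_healthy]
company_b_faulty = [x + offset for x in company_a_faulty]

# Train on A only, then evaluate on both companies.
threshold_a = fit_threshold(company_a_healthy, company_a_faulty)
acc_on_a = accuracy(threshold_a, company_a_healthy, company_a_faulty)
acc_on_b = accuracy(threshold_a, company_b_healthy, company_b_faulty)

print(f"threshold={threshold_a:.2f} acc_on_A={acc_on_a:.2f} acc_on_B={acc_on_b:.2f}")
```

In this toy setting the model separates company A’s data perfectly, yet on company B it labels every reading as faulty, so half of its predictions are wrong even though the underlying machine behavior is identical.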
2
Cross-company AI
Recent advances in AI research promise to overcome these two challenges. Federated learning is a privacy-preserving distributed machine learning framework that allows multiple edge devices or servers to jointly train a machine learning model by sharing local model parameters (gradients or weights) rather than data samples.
Across companies, vertical federated learning can train a machine learning model jointly on the data of all participating companies (e.g., from multiple factories, rolling stock plants, or power plants) by sharing model parameters (gradients or weights) between them.
To achieve this, vertical federated learning across companies decouples model training from access to raw training data: companies align their common data through cryptographic techniques without exposing their respective raw data. Each participant trains on its local data and returns intermediate results to the coordinator. The coordinator aggregates the participants’ intermediate results and builds a collaborative model with better overall performance. During this process, no company has direct access to the raw training data of other companies.
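The train-locally, aggregate-centrally loop described above can be sketched in a few lines. This is a minimal illustration in the style of federated averaging, not the article’s implementation: it assumes a one-parameter linear model, omits the cryptographic data alignment step entirely, and uses hypothetical datasets and function names.

```python
def local_update(weights, data, lr=0.1):
    """One gradient step of least-squares regression y ~ w * x on local data.

    Returns the updated weights and the local sample count; the raw data
    never leaves this function.
    """
    w = weights[0]
    grad = sum(2 * (w * x - y) * x for x, y in data) / len(data)
    return [w - lr * grad], len(data)


def aggregate(updates):
    """Coordinator step: average local weights, weighted by sample count."""
    total = sum(n for _, n in updates)
    dim = len(updates[0][0])
    return [sum(w[i] * n for w, n in updates) / total for i in range(dim)]


def federated_training(company_datasets, rounds=50):
    """Alternate local training and central aggregation for a fixed number of rounds."""
    global_weights = [0.0]
    for _ in range(rounds):
        updates = [local_update(global_weights, data) for data in company_datasets]
        global_weights = aggregate(updates)
    return global_weights


# Hypothetical private datasets of (x, y) pairs, both following y = 2x.
company_a = [(1.0, 2.0), (2.0, 4.0)]
company_b = [(3.0, 6.0), (0.5, 1.0), (1.5, 3.0)]

model = federated_training([company_a, company_b])
print(model)
```

Only the updated weights and sample counts reach the coordinator. In this toy setting the collaborative model converges to the slope shared by both datasets.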
Cross-company cooperation also faces the problem of domain shift: the data distributions of different companies usually overlap only slightly, that is, the target domain and the source domain differ to some extent. To address this, we introduce domain adaptation. The goal is to learn invariants that do not depend on the specific operating conditions of partner firms, thereby mitigating the loss of model performance across firms caused by domain shift.
Specifically, domain adaptation learns a common feature representation of the source domain and the target domain. In this common feature space, the distributions of the source and target domains should be as similar as possible, so that the marginal distributions are aligned in the feature space.
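One common way to formalize this alignment is to add a distribution-distance penalty to the task loss. The article does not specify an objective, so the notation here is an assumption: $\phi$ is the shared feature extractor, $h$ the predictor, $\ell$ the task loss, $\lambda \ge 0$ a trade-off weight, and the distance is a linear-kernel maximum mean discrepancy between source features and target features.

```latex
\min_{\phi,\,h}\;
  \frac{1}{n_S}\sum_{i=1}^{n_S} \ell\bigl(h(\phi(x_i^S)),\, y_i^S\bigr)
  \;+\; \lambda\, d\bigl(P_S^{\phi}, P_T^{\phi}\bigr),
\qquad
d\bigl(P_S^{\phi}, P_T^{\phi}\bigr)
  = \Bigl\lVert \frac{1}{n_S}\sum_{i=1}^{n_S}\phi(x_i^S)
        - \frac{1}{n_T}\sum_{j=1}^{n_T}\phi(x_j^T) \Bigr\rVert_2^2 .
```

The first term uses only labeled source samples $(x_i^S, y_i^S)$, while the second needs only unlabeled features from both domains. With $\lambda = 0$ the objective reduces to ordinary training on the source domain.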
Cross-company AI collaboration can thus address both barriers: federated learning removes the privacy barrier to direct data sharing, and domain adaptation removes the barrier of domain shift. This combination is often referred to as federated transfer learning.
Two types of transfer learning settings are commonly encountered in industrial ecosystems, where failures are usually treated as labels but are imbalanced, since failures are uncommon in the system. Often labels are present in the source domain but not in the target domain (called unsupervised domain adaptation), or labels are absent in both the source and target domains (called unsupervised transfer learning).
3
Cross-company AI implementation
Companies can combine federated learning and domain adaptation to enable collaborative AI in industrial ecosystems. Once deployed, this combination allows partner companies to benefit from collaborative AI models while keeping the original training data local. The collaborative model is trained so that it generalizes well to each company’s data. At no time is proprietary data shared across firm boundaries; only intermediate results of the model (e.g., gradients) are shared between firms. Furthermore, the collaborative model accounts for heterogeneity between firms by learning invariants that are, for example, independent of company-specific operating conditions. Each participating company can thus extend its own operating experience with the experience of its partner companies.
In traditional federated learning, the training process is usually coordinated by a central server. For industrial ecosystems, this has two drawbacks. On the one hand, the central server is a bottleneck and therefore a potential vulnerability. On the other hand, this centralized architecture is currently applied only to the common scenario of bilateral cooperation.
Implementing cross-company AI collaboration in a decentralized manner has great potential and value, so a decentralized learning setup is introduced. In decentralized federated learning, communication with a central server is replaced by peer-to-peer communication. This is critical for cross-company collaboration within sub-networks that form dynamically according to the similarity of applications or operating conditions and that evolve with specific use cases and operating conditions. Distributed ledger technology could also be used to take over the tasks of the traditional central server. Finally, the choice between these methods should be based on practical experience across enterprises, so that companies can decide whether they prefer a centralized or a decentralized approach to federated learning.
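To make the peer-to-peer idea concrete, the sketch below replaces the coordinator with repeated neighbor averaging, often called gossip averaging. This is a swapped-in illustration, not a method named in the article: the ring topology, company names, and starting models are hypothetical, and the local training that would normally happen between communication rounds is left out.

```python
def gossip_round(models, neighbors):
    """Each peer replaces its model with the average of its own and its neighbors' models."""
    new_models = {}
    for peer, weights in models.items():
        group = [weights] + [models[n] for n in neighbors[peer]]
        new_models[peer] = [
            sum(ws[i] for ws in group) / len(group) for i in range(len(weights))
        ]
    return new_models


# Hypothetical ring of four companies; each talks only to its two neighbors.
neighbors = {
    "A": ["B", "D"],
    "B": ["A", "C"],
    "C": ["B", "D"],
    "D": ["C", "A"],
}

# Hypothetical locally trained one-parameter models.
models = {"A": [1.0], "B": [2.0], "C": [3.0], "D": [6.0]}

for _ in range(30):
    models = gossip_round(models, neighbors)

print(models)
```

Because every peer averages over the same number of neighbors in this symmetric ring, all four models approach the network-wide mean of the initial parameters (3.0 here) without any central server ever seeing them.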
Although federated learning offers significant privacy protection and encourages collaboration across company boundaries, traditional federated learning cannot currently provide formal privacy guarantees: semi-honest participants may be able to infer some information from gradient updates and previous model parameters. Furthermore, traditional federated learning settings are vulnerable to causative (poisoning) attacks, i.e., a trained model may be corrupted by wrong model updates from participants. It is very important for companies to prevent such attacks, and one proposed solution is to use additional privacy-preserving techniques, such as differential privacy or cryptography.
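As a rough illustration of how differential privacy is typically applied to model updates, the sketch below clips each update to a maximum L2 norm and then adds Gaussian noise before the update leaves the company. The function names and parameter values are hypothetical, and the noise level is not calibrated to any particular privacy budget, so this shows the mechanics only and carries no formal guarantee.

```python
import math
import random


def clip_update(update, max_norm):
    """Scale the update down so that its L2 norm is at most max_norm."""
    norm = math.sqrt(sum(u * u for u in update))
    if norm <= max_norm:
        return list(update)
    scale = max_norm / norm
    return [u * scale for u in update]


def privatize_update(update, max_norm, noise_std, rng):
    """Clip the update, then add independent Gaussian noise to each coordinate."""
    clipped = clip_update(update, max_norm)
    return [u + rng.gauss(0.0, noise_std) for u in clipped]


rng = random.Random(0)   # fixed seed so the example is repeatable
raw_update = [3.0, 4.0]  # hypothetical gradient with L2 norm 5.0

clipped = clip_update(raw_update, max_norm=1.0)
noisy = privatize_update(raw_update, max_norm=1.0, noise_std=0.1, rng=rng)

print(clipped, noisy)
```

Clipping bounds how much any single update can move the shared model, which also limits the influence of one corrupted update. The added noise makes it harder to infer individual training records from what is shared.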
4
Combining Federated Learning and Domain Adaptation to Unleash the Power of AI in a Cross-Company Environment
For practitioners, bringing cross-company AI collaboration into industrial ecosystems will require a set of design principles to guide the implementation. For example, if there is no significant domain shift between the data distributions of two companies’ applications, federated learning can be applied directly, without combining it with domain adaptation.
Furthermore, implementations of cross-company AI collaboration must meet further practical demands, which may require extensions such as continual learning and solutions for data heterogeneity. For example, for highly heterogeneous systems, the chosen model implementation must be robust enough to be portable (e.g., across different product models, different sensor combinations, or different manufacturers). As the industry matures, it should also work toward a set of standards and norms for cross-company cooperation to further unleash the power of AI.
5
Direction of development
Combining federated learning with domain adaptation can unleash the power of AI in cross-company collaboration. This collaboration can extend beyond traditional supply chains or domains, for example by creating a large ecosystem of cooperative rating organizations. While this vision may be realized in the near future, companies can already start learning and using this new technology with trusted partners. At the same time, it is still necessary to develop a model for the distribution of fairness indicators, which reflects the microeconomic implications of cross-company AI cooperation. Industry managers should identify data partners who can help optimize their performance more holistically, in line with systems thinking.
Cross-company AI can also inspire new business models, such as AI-as-a-service or data backed by third-party companies. Small and medium-sized companies in particular will benefit from leveraging the data resources of other companies. In this regard, service systems engineering can help formulate principles for designing and developing service system networks based on cross-company AI. A first step in this direction is to systematically understand patterns of value co-creation between stakeholders and resources.
Cross-company collaboration that leverages AI will benefit from ongoing research. Current research is making new attempts to advance federated learning by improving its scalability, robustness, and effectiveness while enhancing privacy protection and model performance. Federated learning with such domain adaptation capabilities can facilitate exponential growth in the use of AI collaboration across corporate boundaries.
Reference link: https://ift.tt/sHQKeY1
This article is reprinted from: https://www.leiphone.com/category/academic/xCTMAWPbJZHWOays.html