Author | Li Mei
Editor | Chen Caixian
“Let’s dig out some more features.”
“See if there is any chance of overtaking.”
“Finally found the hidden bug.”
“Submitted successfully!”
During this 60-hour extreme challenge, 24 programmers alternated between staring at their screens with their heads in their hands and typing furiously at their keyboards. They cheered their teammates on and worked together, using trusted AI technology to complete tasks such as classifying fraud reports and identifying fraudulent transactions.
This was the offline final of last year’s ATEC Science and Technology Elite Competition. Every moment of the contestants’ excitement and anxiety was recorded on camera and became “Burn! Genius Programmer”, a popular reality show devoted to programmers.
Caption: A still from “Burn! Genius Programmer 2”
This year, under the guidance of the Chinese Association for Artificial Intelligence, the ATEC Frontier Science and Technology Exploration Community launched the third ATEC competition. Tsinghua University, Xi’an Jiaotong University, Zhejiang University, Shanghai Jiao Tong University, and Ant Group jointly set the problems and organized the event. Under the theme “Science and Technology Help Reality”, the competition simulates real enterprise digitalization scenarios and has two tracks: digital operation and digital security. By solving real problems, technologists will bring AI out of the ivory tower and put technology to work for good. The champion team that emerges from this hard-fought competition will receive a prize of 1 million yuan.
–1–
The Power of Technology
Cutting-edge, high-value technology is the first hallmark of the ATEC competition.
Industrial digitalization has become an irreversible trend, and using digital technology to transform and upgrade is now a major challenge for enterprises. But “technology” is a very broad term. In the tide of industrial digitalization, which technologies are the most powerful?
There is no doubt that privacy computing, graph intelligence, and intelligent recommendation have become indispensable tools for enterprise digitalization. “Whether for Ant or for the many small and medium-sized enterprises, these are very important technical problems, and everyone feels that keenly,” Zhang Zhiqiang told AI Technology Review. Zhang is a senior algorithm expert at Ant Group, technical director of graph learning, and head of the competition problem team for this year’s ATEC 2022 Science and Technology Elite Competition.
This year’s ATEC Science and Technology Elite Competition therefore targets two specific application scenarios: coupon distribution and risky merchant identification.
Distributing consumer coupons or shopping vouchers is an important way for small and medium-sized enterprises to improve profitability and efficiency in their digital operations.
In the spring of 2020, just after the first wave of the epidemic in China had passed its peak, the government spent heavily on distributing consumer coupons to users through Alipay and other platforms to stimulate economic recovery. Merchants offered discounts of varying sizes, and the Alipay platform used big data to predict user preferences and target coupons accurately. Ant still runs this project today, and its algorithms are continually being iterated.
Such a scenario is first of all a traffic distribution problem, so it calls for recommendation technology. Recommendation systems, search engines, and advertising are regarded as the three largest technical directions for AI in industry, and recommendation undoubtedly plays a major role in industrial digitalization.
At the same time, coupon distribution generates a large amount of unstructured graph data. For example, when a user claims or redeems a coupon, a relationship is formed between the user and the coupon, and this relationship can be represented as a graph. A user’s historical behavior sequence and the relationships between users can also be expressed as graph data. What makes graph data distinctive is that its samples are not independent and identically distributed. Graph learning techniques such as graph neural networks are needed to represent, understand, or abstract the graph data, and intelligent modeling is then built on that basis.
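To make the idea of user-coupon graph data concrete, here is a minimal sketch in Python. The event log, the identifiers, and the function names are all hypothetical and are not taken from the competition data. It only illustrates how interaction records become a bipartite graph, and why two samples that share a coupon are no longer independent.

```python
from collections import defaultdict

# Hypothetical event log: (user_id, coupon_id, action). All values are made up.
events = [
    ("u1", "c1", "claim"),
    ("u1", "c1", "redeem"),
    ("u2", "c1", "claim"),
    ("u2", "c2", "claim"),
    ("u3", "c2", "claim"),
]

def build_bipartite_graph(events):
    """Return adjacency maps: user -> coupons and coupon -> users."""
    user_to_coupons = defaultdict(set)
    coupon_to_users = defaultdict(set)
    for user, coupon, _action in events:
        user_to_coupons[user].add(coupon)
        coupon_to_users[coupon].add(user)
    return user_to_coupons, coupon_to_users

def two_hop_users(user, user_to_coupons, coupon_to_users):
    """Users who share at least one coupon with `user` (a two-hop neighborhood)."""
    neighbors = set()
    for coupon in user_to_coupons[user]:
        neighbors |= coupon_to_users[coupon]
    neighbors.discard(user)
    return neighbors

user_to_coupons, coupon_to_users = build_bipartite_graph(events)
print(sorted(two_hop_users("u2", user_to_coupons, coupon_to_users)))  # ['u1', 'u3']
```

A graph neural network would aggregate information over neighborhoods like this one. The sketch stops at building the structure.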
In recommendation scenarios, graph learning can therefore improve the efficiency and accuracy of coupon distribution, and ultimately the digital operation capabilities of small and medium-sized businesses. Graph learning is in fact rarely introduced in recommendation competitions. Conventional recommendation competitions mainly provide user features, coupon features, and exposure and click features, whereas this ATEC competition adds two graph data sources on top of these.
“Graph learning has great potential for real-world deployment. We hope that contestants can get more practice with it.” This was the starting point for Zhao Qian, an algorithm engineer in Ant Group’s graph learning technology department, when he helped set the problem. Cutting-edge technologies come from industry and ultimately return to industry. Only rich and diverse application scenarios can open up more room for graph learning to be deployed, and the coupon distribution problem in this competition is one such typical scenario.
Beyond recommendation, risk control is another important application scenario for graph learning.
When enterprises enter the digital world, they need to minimize security risks as well as improve their digital operation capabilities. Fraud, black- and gray-market merchants, and the credit and business risks of small and micro enterprises in the financial sector can all endanger the entire digital ecosystem. This year’s ATEC Science and Technology Elite Competition therefore also includes a risky merchant identification track. This problem draws on extensive practical experience from MYbank, Ant’s online bank for merchants.
MYbank provides financial services to small and micro businesses. For example, a Taobao merchant may have to wait up to 14 days after shipping a product to receive the buyer’s payment, which can put a small or micro enterprise at risk of a cash-flow break. MYbank therefore launched a delivery loan: as long as the merchant has actually shipped the goods, the loan, based on the true value of the goods, promptly relieves the cash-flow pressure the merchant faces during the transaction.
However, some unscrupulous merchants exploit loopholes, using fake orders, false shipments, and forged transactions to obtain loans fraudulently. To protect the rights and interests of legitimate merchants and keep the whole digital ecosystem healthy, risky merchants need to be identified and removed.
This is where graph learning comes into play. Transactions between merchants and users can be represented as graph data. If a merchant carries out fraudulent transactions, anomalies can be found in the patterns of the transaction graph and then dealt with.
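As a rough illustration of what an anomalous transaction graph pattern might look like, the Python sketch below flags merchants whose orders come mostly from a single buyer. This heuristic, the 0.7 threshold, and the sample data are assumptions made purely for illustration. The article does not describe the patterns or models actually used at MYbank or in the competition.

```python
from collections import Counter, defaultdict

# Hypothetical (merchant_id, buyer_id) transaction edges; all values are invented.
transactions = [
    ("m1", "b1"), ("m1", "b2"), ("m1", "b3"), ("m1", "b4"),
    ("m2", "b9"), ("m2", "b9"), ("m2", "b9"), ("m2", "b8"),
]

def top_buyer_share(transactions):
    """For each merchant, the fraction of its orders placed by its most frequent buyer."""
    buyers_by_merchant = defaultdict(Counter)
    for merchant, buyer in transactions:
        buyers_by_merchant[merchant][buyer] += 1
    shares = {}
    for merchant, counts in buyers_by_merchant.items():
        total = sum(counts.values())
        shares[merchant] = max(counts.values()) / total
    return shares

def flag_suspicious(transactions, threshold=0.7):
    """Flag merchants whose top-buyer share reaches `threshold` (an illustrative cutoff)."""
    shares = top_buyer_share(transactions)
    return sorted(m for m, share in shares.items() if share >= threshold)

print(flag_suspicious(transactions))  # ['m2']
```

Here merchant m2 receives three of its four orders from one buyer, which is the kind of concentrated structure that could warrant a closer look. A real system would combine many such signals or learn them with a graph model.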
In the task of identifying black- and gray-market merchants, privacy computing is another powerful technical tool.
Behind privacy protection lies a tension over data. On the one hand, as AI research and development becomes data-centric, people are increasingly aware of the value of data as a factor of production, and releasing that value fully requires data to be used jointly. On the other hand, no data-producing party wants to expose its private data in the course of data collaboration.
Zhang Zhiqiang gave an example from risky merchant identification: a black- and gray-market gang often operates across multiple platforms, and its data may be of different types and scattered across different computing nodes. Efficient multi-party collaboration is needed to combine data from these parties, describe the gang more completely, and so maximize identification efficiency. In this process, federated learning is used to improve the accuracy of risky merchant identification by combining transaction information from multiple parties while protecting the privacy of merchant data.
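The core idea of federated learning can be sketched with federated averaging: each party trains on its own data, and only model weights, never raw records, are sent for aggregation. The Python sketch below assumes a horizontal setting (all parties hold the same feature columns), a tiny logistic regression model, and invented data. The article does not say which federated protocol the competition uses, and production systems typically add protections such as secure aggregation on top of this.

```python
import math

def local_update(weights, features, labels, lr=0.1, epochs=20):
    """One party trains logistic regression locally; only the weights leave the party."""
    w = list(weights)
    for _ in range(epochs):
        grad = [0.0] * len(w)
        for x, y in zip(features, labels):
            z = sum(wi * xi for wi, xi in zip(w, x))
            p = 1.0 / (1.0 + math.exp(-z))
            for j, xj in enumerate(x):
                grad[j] += (p - y) * xj
        w = [wi - lr * g / len(features) for wi, g in zip(w, grad)]
    return w

def federated_average(updates):
    """Average the parties' weights, weighted by each party's sample count."""
    total = sum(n for _, n in updates)
    dim = len(updates[0][0])
    return [sum(w[j] * n for w, n in updates) / total for j in range(dim)]

def predict(w, x):
    """Predicted probability that a merchant is risky."""
    return 1.0 / (1.0 + math.exp(-sum(wi * xi for wi, xi in zip(w, x))))

# Hypothetical data: each row is [bias, risk_signal]; label 1 means a risky merchant.
party_a = ([[1.0, 0.1], [1.0, 0.9], [1.0, 0.8]], [0, 1, 1])
party_b = ([[1.0, 0.2], [1.0, 0.3], [1.0, 0.7], [1.0, 0.95]], [0, 0, 1, 1])

global_w = [0.0, 0.0]
for _round in range(50):
    updates = []
    for features, labels in (party_a, party_b):
        updates.append((local_update(global_w, features, labels), len(labels)))
    global_w = federated_average(updates)

# A merchant with a high risk signal should score above one with a low signal.
print(predict(global_w, [1.0, 0.95]) > predict(global_w, [1.0, 0.1]))
```

The aggregation step sees only weight vectors and sample counts, which is the property that lets parties cooperate without exchanging their transaction records.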
Essentially, the technologies behind both tracks revolve around data capabilities. “Graph intelligence characterizes and models a specific type of data, while privacy computing solves the problem of data silos. The two data capabilities are complementary,” Zhang Zhiqiang explained.
Through the ATEC competition, contestants therefore gain not only the experience of solving two competition problems, but also a chance to explore some of the most cutting-edge technology in industry today.
–2–
Problems Closer to Real Industrial Scenarios
The real strength of AI practitioners shows only when they go deep into the jungle of industrial scenarios.
“Without large-scale, influential real-world scenarios, it is hard to produce heavyweight technology.” Zhang Zhiqiang firmly believes that real scenarios push technologists to think about which technical directions are more valuable. Reproducing real industrial scenarios as faithfully as possible is therefore a principle that runs through the problem design of this ATEC competition.
In the view of the competition problem team, this is exactly what many computing competitions lack. Take graph learning as an example: most existing competitions target academic settings and use public or structured datasets, such as the huge graphs formed by academic literature networks, and rarely bring out the technical value of graph learning on real industrial problems.
For AI to truly leave the laboratory, both scenarios and talent are indispensable.
This is why ATEC is exploring a model that integrates industry, academia, and research. A large share of each year’s contestants are university students, most of whom lack a platform for putting research theory into practice. Privacy computing, for example, has become a hot topic in recent years. Although many universities now offer related courses, there are few related competitions. Published academic papers are students’ main window onto the technology, but there is usually a large gap between the datasets and evaluation methods used in papers and real application scenarios.
Knowledge gained from paper alone is always shallow, as Liu Yu, a recent graduate, knows well. He explained that, unlike public datasets in academic settings, the test set in an industrial scenario is completely invisible. Models can only be developed on the training set, and the test set is then fed to the model for evaluation. Students unused to this difference may not understand why temporal features from the test set are unavailable, and may overlook problems such as features that cross the time boundary or feature leakage, mistakenly adding leaked features to the model. In practice, such features cannot be used in real scenarios.
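The leakage problem described here can be shown with a small Python sketch. The event log and function names are hypothetical. It contrasts a feature computed only from events before the prediction time with one that also counts future events, and it splits the data by time so that the test period stays unseen during development.

```python
# Hypothetical event log: (user_id, day, clicked). All values are made up.
events = [
    ("u1", 1, 1), ("u1", 2, 0), ("u1", 5, 1),
    ("u2", 3, 0), ("u2", 6, 1),
]

def time_split(events, cutoff_day):
    """Train on events before the cutoff day, test on events on or after it."""
    train = [e for e in events if e[1] < cutoff_day]
    test = [e for e in events if e[1] >= cutoff_day]
    return train, test

def past_click_count(events, user, day):
    """Leak-free feature: clicks by `user` strictly before `day`."""
    return sum(c for u, d, c in events if u == user and d < day)

def total_click_count(events, user):
    """Leaky feature: counts every click, including those after the prediction time."""
    return sum(c for u, d, c in events if u == user)

train, test = time_split(events, cutoff_day=5)
print(len(train), len(test))  # 3 2
print(past_click_count(events, "u1", 5), total_click_count(events, "u1"))  # 1 2
```

For user u1 on day 5, the leaky count already includes the very click the model is supposed to predict, so it would look strong offline and be unavailable in production.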
In addition, internships at companies have a high bar and are scarce, and nationwide there are few jobs in directions such as privacy computing. Ant has a large number of technical application scenarios internally. Drawing on these resources, ATEC offers a proving ground as close as possible to real scenarios and opens its doors to everyone who is passionate about technology. This is why ATEC has attracted thousands of young technologists over the past two years.
The first ATEC Science and Technology Elite Competition, in 2020, set a real environmental protection problem: participants trained AI models to identify endangered wild species and played a digital attack-and-defense game against poachers. One of the problems in the second year, “Internet Fraud Transaction Identification”, came from a real Alipay business scenario. Contestants used a modified and desensitized dataset about digital currency to explore joint computation and analysis under data protection, a technical exercise in privacy protection.
So what makes this year’s ATEC Science and Technology Elite Competition realistic?
First, the problems are designed to reflect the issues enterprises actually face in the process of digitalization.
Take Track 1 as an example. Real coupon distribution differs from a general recommendation task. Most general recommendation tasks care only about an overall ranking metric, or about ranking metrics centered on users. Real coupon distribution, however, involves multiple parties: the platform, merchants, and users, so merchant-side metrics must also be considered. Long-tail small and micro enterprises, which are at a disadvantage and have less exposure and fewer customers, also hope users will buy their products with their coupons. This requires improving the accuracy of click ranking prediction for each coupon, so that both top merchants and long-tail merchants achieve good click-through and redemption rates, and small and micro enterprises can also board the express train of the digital era to cut costs and improve efficiency.
When designing their recommendation systems, contestants must therefore improve both indicators at once, preserving the user experience while distributing the coupons of small and medium-sized businesses more accurately, in order to rank well on the leaderboard.
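The difference between an overall ranking metric and a per-coupon one can be illustrated with AUC. The article does not name the competition's actual metric, so the use of AUC, the sample rows, and the function names below are assumptions for illustration only. The Python sketch computes one AUC over all rows and one averaged across coupons, so that a low-traffic coupon counts as much as a high-traffic one.

```python
from collections import defaultdict

def auc(labels, scores):
    """Probability that a random positive outscores a random negative (ties count 0.5)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        return None  # AUC is undefined without both classes
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def per_coupon_auc(rows):
    """Average AUC over coupons, so each coupon counts equally regardless of traffic."""
    by_coupon = defaultdict(lambda: ([], []))
    for coupon, label, score in rows:
        by_coupon[coupon][0].append(label)
        by_coupon[coupon][1].append(score)
    values = [auc(labels, scores) for labels, scores in by_coupon.values()]
    values = [v for v in values if v is not None]
    return sum(values) / len(values) if values else None

# Hypothetical rows: (coupon_id, clicked, predicted_score).
rows = [
    ("head", 1, 0.9), ("head", 1, 0.8), ("head", 0, 0.3), ("head", 0, 0.2),
    ("head", 1, 0.7), ("head", 0, 0.1),
    ("tail", 1, 0.4), ("tail", 0, 0.6),
]
overall = auc([y for _, y, _ in rows], [s for _, _, s in rows])
print(overall, per_coupon_auc(rows))  # 0.9375 0.5
```

In this toy data the model ranks the high-traffic coupon perfectly and the long-tail coupon wrongly. The overall score hides the problem, while the per-coupon average exposes it.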
To let contestants exercise their technical imagination more freely, ATEC also provides an underlying data environment that is very close to industrial scenarios while ensuring data security.
For example, in the coupon distribution scenario, ATEC opens up real data such as user behavior, user relationships, and a coupon knowledge graph to contestants, after strict desensitization and with data privacy ensured.
“If I were a contestant, I would hope the data sources for a problem were not fixed too rigidly, and were as close to their original state as possible.” Zhang Zhiqiang has won awards in many competitions and is also an experienced problem setter. He understands contestants well: they hope a competition has no fixed ceiling, only room to aim higher.
Under the premise of data security, the competition therefore provides desensitized real transaction records of users and merchants, rather than only highly abstracted hand-crafted features. If experts did the feature extraction behind the scenes in advance, contestants would only need to combine models. ATEC instead hopes contestants will spend enough effort understanding the data itself, and do information extraction and modeling in a simulated industrial environment.
Still, the ATEC competition is a technical contest, so in narrowing the gap between the problems and real industrial problems the problem team also had to face a difficulty: simplifying the complex real environment to a reasonable degree. This was one of the most deliberated and debated issues within the team. The team members all work on the front line of technology application and know what strong models and baselines are currently available in industry. They spent a lot of time testing different models and used the results to set a difficulty level that spreads out the scores. In selecting each data source, they had to make sure contestants’ scores would be differentiated on the leaderboard.
In short, on the road to using technology to solve problems in the digital economy, young technologists can understand the logic behind the technology and improve their skills only by working in real industrial scenarios. Before they enter real combat, ATEC is undoubtedly a rare opportunity to rehearse.
–3–
The Birth of a Tech Icon
From “Wildlife Protection” in 2020 to “Technology Anti-Fraud” in 2021, a group of young programmers who firmly believe technology can change the world have written line after line of code in competition to tackle pressing social problems. We have also seen this group’s distinctive ethos: loyalty to logic, an obsession with efficiency, and a love of solving puzzles.
This year, under the theme “Science and Technology Help Reality”, another group of young people is heading to the technology arena. This year’s ATEC Science and Technology Elite Competition has two stages: an online competition and an offline final.
The online competition is currently under way. Contestants train their models and evaluate them on a platform provided by Alipay. Scoring uses a test leaderboard (A list) plus a final leaderboard (B list); the final ranking and awards for each track are based on results on the B list dataset. During the competition, the leaderboard shows A list results, and each team can submit up to 3 times per calendar day.
In the end, teams that rank in the top 20% of the online competition and score above the score line will share a prize pool of 300,000 yuan equally. In the post-competition defense, the top 8 contestants in each track will be selected to share a prize pool of 160,000 yuan.
The offline final will be held from March 3 to 5, 2023. Within a 48-hour time limit, contestants will compete over multiple intense rounds, working through a number of public and hidden levels in a simulated real-world scenario, for the championship and the million-yuan prize.
It is worth mentioning that this year’s competition has also set up two special “women’s awards” (10,000 yuan prize), in the hope that more female programmers will appear on this stage.
Making the power of technology visible and programmers’ voices heard is the original purpose of the ATEC competition. This year, the offline competition will therefore again be presented as a reality show, showing the public the many different sides of programmers’ interesting personalities.
Caption: The champion team of the 2nd ATEC Science and Technology Elite Competition
As the first programmer reality show in China, the previous two seasons of “Burn! Genius Programmer” reached well beyond the tech community after they aired. On camera, we witnessed the rise of many technical stars, such as Xiao Dao, Belly Hei, Guo Daya, Li Jinchao, and Zeng Zhaoyang. Their passion, wisdom, and humility have inspired countless people with technological ideals.
ATEC has prepared the most authentic technical test, very attractive prizes, and a unique opportunity to be “seen”. It is waiting for those with a technical soul to set out and ride the waves. (Public account: Leifeng.com)
ATEC competition website: https://ift.tt/gQSsZlP
This article is reposted from: https://www.leiphone.com/category/ai/JNJhPL17DVbq0wMH.html