Human-machine voice interaction service provider “Yizhi Intelligence” completes Series B financing of over 100 million yuan


36Kr has learned that human-machine voice interaction service provider “Yizhi Intelligence” has completed a Series B financing round of over 100 million yuan, jointly invested by Kaitai Capital, Yealink Kaitai and CITIC Securities Investment. The proceeds will reportedly be used for algorithm development, product upgrades, team building and new business development.

Yizhi Intelligence, which 36Kr has reported on before, was established in 2017 by a founding team from the Artificial Intelligence Research Institute of Zhejiang University. It is a SaaS service provider focusing on human-computer voice interaction technology that helps enterprises reach users and run refined, intelligent operations. With three self-developed core human-computer interaction algorithms (speech recognition, semantic understanding and speech synthesis), the company provides scenario-based AI services for retail, life services and other pan-consumer industries, as well as for public services. According to Yizhi Intelligence, the company has been recognized as a Zhejiang Provincial High-tech Enterprise Research and Development Center and a Hangzhou Leading Innovation Team, and is a member unit of the Speech Dialogue and Hearing Professional Committee of the Chinese Association for Artificial Intelligence.

In the recent wave of AI commercialization, intelligent voice interaction has faced a high threshold for real-world deployment: it is interdisciplinary and combines ASR (automatic speech recognition), NLP (natural language processing), TTS (text-to-speech) and many other difficult AI technologies. On the demand side, tasks such as personnel screening, information collection, targeted notifications and government affairs consultation, carried mainly over WeChat and the telephone, are often the most time-consuming and labor-intensive parts of commercial activities and government work. From something as large as the government’s epidemic prevention reminder calls to something as small as a consumer brand’s new promotion notice, investing in intelligent voice interaction technology has become one of the trends for reducing costs and increasing efficiency in government and commercial services. How to adapt quickly to call scenarios and improve the AI voice conversation experience has therefore become the primary test for human-computer interaction service providers as they optimize their technology.

Chen Zheqian, founder and CEO of Yizhi Intelligence, described the team’s choice of track as a research-driven entrepreneurial path of “finding a nail with a hammer”. At the time, the founding team, then studying at the Artificial Intelligence Research Institute of Zhejiang University, had already produced substantial research results in intelligent human-computer interaction and had won many international NLP competitions on behalf of Zhejiang University. Like other AI teams with research roots, Yizhi Intelligence started from strong technology and looked for a deployment scenario with business prospects. After nearly three years of investigating industries where machines can replace human work, the team settled on the telephone, a high-frequency carrier of human-computer interaction, and applied its accumulated AI voice technology to intelligent outbound calling, which set the commercialization direction for the company’s core technology. Supported by its core algorithms, computing power and data, the company then anchored its business in consumer advertising and public services, fields that require more two-way interaction, and entered a development stage of “opening roads through mountains and building bridges over water”.

Compared with Baiying Technology, NetEase Qiyu, Wisdom Tooth Customer Service and other companies that also focus on AI voice robots and intelligent voice, Chen Zheqian believes Yizhi Intelligence differs mainly in positioning itself as a technology research and development service provider that, through continuous iteration of human-machine dialogue technology, provides intelligent services for members of Shenzhen consumer brands. Starting from this position, Yizhi Intelligence has made the following core technology upgrades in recent years:

·ASR: For speech recognition and extraction, Yizhi Intelligence builds on the WebRTC NS (noise suppression) code and optimizes in two directions, ambient noise reduction and human voice enhancement, running the two modes in parallel. The company also uses an MFCC+resCNN feature extraction scheme to refine the granularity with which the robot perceives gender, age and emotion.
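To make the MFCC half of the “MFCC+resCNN” scheme concrete, here is a minimal sketch of how MFCC features are typically computed from raw audio. It is a generic textbook pipeline written with NumPy, not Yizhi Intelligence’s implementation; the function names, the 8 kHz telephone sample rate, the frame sizes and the filter counts are all assumptions chosen for illustration. The resulting matrix is the kind of input a CNN classifier for gender, age or emotion could consume.

```python
import numpy as np

def hz_to_mel(hz):
    return 2595.0 * np.log10(1.0 + hz / 700.0)

def mel_to_hz(mel):
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)

def mel_filterbank(n_filters, n_fft, sample_rate):
    """Triangular filters spaced evenly on the mel scale."""
    mel_points = np.linspace(hz_to_mel(0.0), hz_to_mel(sample_rate / 2.0), n_filters + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mel_points) / sample_rate).astype(int)
    fbank = np.zeros((n_filters, n_fft // 2 + 1))
    for i in range(1, n_filters + 1):
        left, center, right = bins[i - 1], bins[i], bins[i + 1]
        for k in range(left, center):      # rising edge of the triangle
            fbank[i - 1, k] = (k - left) / (center - left)
        for k in range(center, right):     # falling edge of the triangle
            fbank[i - 1, k] = (right - k) / (right - center)
    return fbank

def mfcc(signal, sample_rate=8000, frame_len=0.025, frame_step=0.010,
         n_fft=512, n_filters=26, n_coeffs=13):
    """Return an (n_frames, n_coeffs) MFCC matrix. All defaults are assumed values."""
    frame_size = int(round(frame_len * sample_rate))
    step = int(round(frame_step * sample_rate))
    n_frames = 1 + (len(signal) - frame_size) // step
    # Slice the signal into overlapping frames and apply a Hamming window.
    idx = np.arange(frame_size)[None, :] + step * np.arange(n_frames)[:, None]
    frames = signal[idx] * np.hamming(frame_size)
    # Power spectrum, then log mel filterbank energies.
    power = np.abs(np.fft.rfft(frames, n_fft)) ** 2 / n_fft
    energies = np.log(power @ mel_filterbank(n_filters, n_fft, sample_rate).T + 1e-10)
    # DCT-II (unnormalized) decorrelates the filterbank energies.
    n = np.arange(n_filters)
    basis = np.cos(np.pi / n_filters * (n[None, :] + 0.5) * np.arange(n_coeffs)[:, None])
    return energies @ basis.T

# Usage: one second of a synthetic 440 Hz tone at an assumed 8 kHz sample rate.
rate = 8000
t = np.arange(rate) / rate
tone = np.sin(2 * np.pi * 440.0 * t)
features = mfcc(tone, sample_rate=rate)
print(features.shape)  # (98, 13)
```

Each row is one 25 ms frame and each column one cepstral coefficient, so the matrix can be treated like a single-channel image by a convolutional network.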

For the E2E (end-to-end) model used in general scene recognition, Yizhi Intelligence has made a scene-based upgrade to the traditional AED (Attention-based Encoder-Decoder) speech recognition architecture. Its Context-Aware Encoder algorithm, built on a module that enhances the model with customized context text, feeds supplementary scene text into model training as reinforcement information, so that the model builds a reinforced architecture for specific inputs and improves the recognition rate in specific vertical scenes.
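The article does not describe the internals of the Context-Aware Encoder. One common way to inject scene text into an attention-based recognizer is to let each acoustic frame attend over embeddings of scene phrases and add the result back to the frame, and the sketch below illustrates that general idea only. Every name, shape and weight here (`context_bias`, the random projections, 4 scene phrases, 16 dimensions) is hypothetical and stands in for values a real model would learn.

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)  # subtract the max for numerical stability
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def context_bias(acoustic, context, w_q, w_k, w_v):
    """Fuse scene-text embeddings into acoustic encoder states.

    acoustic: (T, d) encoder states, one per audio frame
    context:  (N, d) embeddings of N scene phrases
    Returns the fused (T, d) states and the (T, N) attention weights.
    """
    q = acoustic @ w_q
    k = context @ w_k
    v = context @ w_v
    # Scaled dot-product attention from frames (queries) to phrases (keys).
    weights = softmax(q @ k.T / np.sqrt(q.shape[-1]))
    # Residual connection: the original acoustic information is preserved.
    return acoustic + weights @ v, weights

# Usage with random stand-in values (a trained model would supply these).
rng = np.random.default_rng(0)
T, N, d = 50, 4, 16
acoustic = rng.normal(size=(T, d))
context = rng.normal(size=(N, d))
w_q, w_k, w_v = (rng.normal(scale=d ** -0.5, size=(d, d)) for _ in range(3))
fused, weights = context_bias(acoustic, context, w_q, w_k, w_v)
print(fused.shape, weights.shape)  # (50, 16) (50, 4)
```

Because the phrase list is an input rather than part of the weights, a design like this lets the same model be pointed at different vertical scenes by swapping the supplementary text.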

In application, this technology filters out environmental noise during a call and lets the robot quickly identify the user’s age, gender and emotion, so that it can select the most suitable dialogue content for the situation.

·NLP: Yizhi Intelligence has recently upgraded to a new generation of dialogue architecture, NLP2.0. In addition to common NLP tasks such as intent recognition, emotion recognition, question-and-answer recognition, task dialogue, intelligent error correction and knowledge graphs, the new architecture introduces “EAZI”, a large-scale pre-trained language model specialized for the pan-consumption field. Built on the Transformer architecture and drawing on linguistic knowledge and a large amount of vertical data, it improves the model architecture and pre-training strategy and is specially trained on the consumption-scenario dialogues the company has accumulated and on a large amount of consumption-domain information.

In application, this innovation addresses the difficulty of extracting entity information such as times, addresses and organization names in outbound calls such as questionnaires and user satisfaction surveys, and the robot can also quickly find the corresponding answer.

·FastSpeech series of speech synthesis algorithms: The Zhejiang University Yizhi Artificial Intelligence Joint Research Center and Microsoft jointly launched two generations of algorithms, FastSpeech1 and FastSpeech2, which integrate speech synthesis, emotion synthesis and voice cloning algorithms, as well as a corresponding algorithm for generating replies with modal particles.

According to Yizhi Intelligence, at equivalent quality this algorithm series is 38 times and 260 times faster than Google’s two generations of Tacotron algorithms. In application, the robot can realistically simulate human emotion in a conversation and respond promptly after recognizing the other party’s emotion.


Yizhi Intelligence’s accumulated industry know-how

Regarding the business model and applications, Yizhi Intelligence CFO Zhang Lei said the company’s service scenarios mainly include member activation, invitations to add a brand’s WeChat private-domain account, key-moment outreach for birthdays and member festivals, event notifications for major promotions, public security anti-telephone-fraud calls, epidemic follow-up notices and bank overdue reminders. The company has so far provided AI voice services to more than 300 consumer brands and more than 100 municipal public security bureaus. In the pan-consumption field, it works with consumer brands such as Estee Lauder, Winona, Dr. Cheese and By-Health, and its main buyers are the e-commerce, user growth and marketing departments of those brands. The core products follow advertising logic: customers top up quarterly or yearly, settlement is by CPA, and charges apply only to successful contacts. Annual contracts with medium and large customers range from 500,000 to 1,000,000 yuan.
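The prepaid, CPA-settled billing described above can be sketched as a small calculation. This is an illustrative model only: the `CallOutcome` type, the `settle_cpa` function, the per-contact price of 1.5 yuan and the call data are all hypothetical, and the article does not disclose actual unit prices. Amounts are kept in fen (1 yuan = 100 fen) so the arithmetic stays in integers.

```python
from dataclasses import dataclass

@dataclass
class CallOutcome:
    campaign: str
    successful_contact: bool  # hypothetical flag: the billable action happened

def settle_cpa(outcomes, price_per_contact_fen, prepaid_balance_fen):
    """Charge only for successful contacts; return (charge, remaining balance)."""
    billable = sum(1 for o in outcomes if o.successful_contact)
    charge = billable * price_per_contact_fen
    if charge > prepaid_balance_fen:
        raise ValueError("prepaid balance is too low to settle this batch")
    return charge, prepaid_balance_fen - charge

# Usage: a customer has topped up 500,000 yuan and three calls were placed.
outcomes = [
    CallOutcome("member_activation", True),
    CallOutcome("member_activation", False),  # unanswered call, not billed
    CallOutcome("promotion_notice", True),
]
charge, remaining = settle_cpa(outcomes, price_per_contact_fen=150,
                               prepaid_balance_fen=50_000_000)
print(charge, remaining)  # 300 49999700
```

Under this model the customer carries no cost for calls that fail to reach anyone, which is what distinguishes CPA settlement from charging per call placed.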

In terms of team, Yizhi Intelligence currently has more than 200 employees, and the core founding team is from the Institute of Artificial Intelligence of Zhejiang University. In February 2019 the company established the Zhejiang University Yizhi Artificial Intelligence Joint Research Center, which focuses on multi-modal human-computer interaction and combines industry, education and research through joint research and development and the corresponding engineering implementation.

After this round of financing, the company will expand its overseas business, advance in-house research and commercialization of cross-language human-machine dialogue technology, and build a globally oriented intelligent voice interaction SaaS platform.

From a market perspective, according to data from Whale Zhun Research Institute, the domestic call center market is currently about 10 billion in size, but the existing market that intelligent voice players mainly enter is one where opportunities and challenges coexist. On the one hand, as the intelligent transformation of consumer and public life services deepens, human-machine voice interaction has become a powerful tool for consumer enterprises and government departments to improve efficiency. On the other hand, strict regulation under data privacy and security frameworks has narrowed the market. Besides vertical service providers like Yizhi Intelligence whose core business is AI voice call algorithms, the track now includes large Internet companies deploying intelligent cloud customer service systems, such as NetEase Qiyu and JD Yanxi, and a growing number of private-domain marketing companies trying out new human-computer interaction technology. Smart outbound call companies built on the same underlying technology and open source architectures need to find their differentiation at both the technology and customer-scenario levels. With the domestic market gradually becoming saturated, serving overseas markets has become one of the new opportunities. In addition, while continuing to expand the boundaries of service scenarios, companies in the track must think carefully in their strategy about how to address the accompanying information privacy and security risks.
