56. The Present and Future of the Metaverse / Software and Hardware

Original link: https://www.lidingzeyu.com/metaverse-ruofei/

In this episode, I chat with Ruofei Du of Google Research about a range of topics related to the Metaverse.

For example: what are the current hardware differences among the major manufacturers, how will Apple’s M1/M2 series of chips help their XR devices, and what are the shortcomings of Qualcomm’s XR2 chip?

With digital humans so popular right now, what are the challenges? They are both technical and ethical.

Meta/Facebook appears to be all in on the Metaverse, and even changed its company name for that reason. Will the pioneers of this technology turn out to be the ultimate winners in the market?

Ruofei and I also discussed the difference in attitudes toward games between game makers (such as Nintendo, maker of the Switch) and Internet companies (such as Meta and Apple).

This episode covers a wide range of topics. If you would like a follow-up discussion, please leave a comment.

Youtube: https://www.youtube.com/watch?v=cflXi-NxyAU

Bilibili: https://www.bilibili.com/video/BV1Pa411D7HY/

Text version: scroll down

Audio version: search “Li Ding Chat Room” in your podcast app, for example:

– Apple Podcasts https://liding.page/apple

– Pocket Casts: https://liding.page/pocketcasts

– Overcast: https://liding.page/overcast

– Spotify: https://liding.page/spotify

Guest: Ruofei Du @ Google https://duruofei.com/

MakeItTalk, my SIGGRAPH paper with Zhou Yang and other collaborators on generating talking-head animation from a single image: https://people.umass.edu/~yangzhou/MakeItTalk/ GitHub: https://github.com/yzhou359/MakeItTalk

Zhou Yang’s homepage: https://people.umass.edu/~yangzhou/

Li Ding’s job search list: https://www.liding.page/jobs

Chat with me for half an hour: https://www.liding.page/1on1

Production team

* Li Ding Zeyu

* Pengdi Zhang

* Guo Yu

* Lin Xuhui

* Zhang Tianjin

Full transcript

Li Ding: Hello everyone, welcome to a new episode of Li Ding Chat Room. Today we have invited Du Ruofei to join us. Hello Ruofei!

Ruofei Du: Hello Li Ding! My name is Ruofei Du. I graduated from the ACM class at Shanghai Jiao Tong University, and then did graphics and HCI research with Amitabh Varshney at the University of Maryland. One of my graduate projects was Geollery: by crawling Google Street View and OpenStreetMap, we reconstructed the world as a mirrored world in which digital avatars could chat and socialize interactively. You could also access social media in that mirrored world — for example, see nearby tweets, or Yelp recommendations for food and entertainment. Now I do research on the Google AR team. My focus is making virtual and augmented reality more interactive and applicable to daily life, rather than just staying in papers; this is also a big difference between industrial and academic research. Over the last two years my research has focused more on building useful metaverse technologies. A representative work is DepthLab: depth perception from a single camera, which can now run on more than 5 million ARCore phones. We built a series of interactions on top of these depth maps, and they are now used in apps such as Douyin, Snapchat, and TeamViewer. I would also like to stress that this discussion represents only my personal views; the content is limited to my company-approved public papers and presentations, and my personal views have nothing to do with my employer’s position.

Li Ding: That introduction was very detailed. Let’s start with one point: you mentioned that you joined Google to build a useful metaverse. Can you say more? What counts as useful?

Ruofei Du: By useful, I mean this: if you look at the papers from the annual virtual and augmented reality conferences, many are very forward-looking, often targeting the next 30 or even 50 years. I hope my research can be used by everyone within about five years. For example, in 2022 our group has been developing real-time translation on AR glasses. It is just a small demo, but we found it can greatly improve communication between people. In a multilingual family, some people speak only Chinese and the children may speak only English, so the older and younger generations cannot talk to each other. Our director handed prototypes to such families and found they really helped grandparents and grandchildren communicate and understand each other better. We deeply feel that useful augmented and virtual reality can give people the power to strengthen communication.

Li Ding: So as I understand it, you are more interested in technology that can be applied in the Metaverse within five years. It is really a trade-off between short-term and long-term research: what you want to do now is short-term technology that can be deployed quickly and really help people, right?

Ruofei Du: That’s right. Of course, we are also generally willing to hire interns to do more basic research, such as how to render digital twins, or rendering with NeRF. These studies may not be deployed in the short term, but we believe this kind of long-term investment will shine in the future, perhaps five years from now.

Li Ding: Splitting computation asynchronously between a phone and AR glasses — this paradigm is actually very common now. It is similar to cloud gaming, where a less powerful local machine offloads the heavier computation to the cloud. In AR, the glasses have relatively weak compute and a limited power budget, while the phone is more capable. This kind of hybrid computation really does feel like an interesting future model to me. What effect do you think it has on the metaverse?

Ruofei Du: I think it has a very big impact. One of the main reasons people feel dizzy in the Metaverse is latency. For virtual reality devices in particular, if you want to render a truly immersive 3D scene, the computing power of today’s ordinary hardware is insufficient. For example, the mainstream AR/VR chip in 2022 is the Qualcomm Snapdragon XR2, whose compute is roughly on the level of a 2008 desktop AMD graphics card, so its rendering capability is very limited. So people try to transmit the rendered signal via cloud rendering, wireless rendering over WiFi, or a physical cable. But all three solutions have drawbacks. For example, cloud rendering latency can reach 100 milliseconds or even several hundred, while people can basically perceive any latency above about 20 milliseconds — you feel something is off, or you can see the flaws. Video transmission has a further problem: even if the chip only has to decode, at extremely high resolutions such as 8K or even 16K it cannot decode within 20 milliseconds and push the video buffer to the display, which again causes latency. So I think the future solution is for hardware companies to develop their own chips.
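The numbers above can be put into a rough motion-to-photon budget. In the sketch below, the 20 ms perception threshold and the ~100 ms cloud round-trip are the figures quoted in the discussion; the individual per-stage costs are purely illustrative assumptions, not measurements of any real headset.

```python
# Rough motion-to-photon latency budget for three rendering setups.
# The 20 ms threshold and ~100 ms cloud RTT come from the discussion above;
# individual stage costs are illustrative guesses.

PERCEPTION_THRESHOLD_MS = 20

def total_latency(stages):
    """Sum per-stage latencies (ms) and check against the threshold."""
    total = sum(stages.values())
    verdict = "OK" if total <= PERCEPTION_THRESHOLD_MS else "perceptible lag"
    return total, verdict

on_device = {"tracking": 2, "render": 11, "display_scanout": 5}   # ~90 Hz frame
tethered  = {"tracking": 2, "render": 8, "cable_tx": 4, "display_scanout": 5}
cloud     = {"tracking": 2, "network_rtt": 100, "encode_decode": 15,
             "display_scanout": 5}

for name, stages in [("on-device", on_device), ("tethered", tethered),
                     ("cloud", cloud)]:
    ms, verdict = total_latency(stages)
    print(f"{name:10s} {ms:4d} ms  -> {verdict}")
```

Even with generous guesses for the other stages, the network round-trip alone puts cloud rendering far past the perceptual budget, which is the point Ruofei makes.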

Li Ding: The chip you mentioned was Qualcomm’s XR2, right? Many of our listeners are in the tech industry and know that in the past two years Apple released the M1 and then the M2, essentially rebuilding the mobile chip landscape and setting a new benchmark for everyone. I have no deep knowledge of Qualcomm’s XR2, and no inside information from Apple, but hypothetically: if Apple later puts the M2 in their mixed reality device, can the M2 overcome the limitations you just said the XR2 cannot?

Ruofei Du: My expectation is that the M2 will solve this problem to a large extent. After using the M1, I found its neural network compute to be even much faster than my desktop computer — faster even than a 1080 Ti.

Li Ding: I saw something on Twitter before: someone ran PyTorch on a 1080 Ti, and at the same time ran a custom build of PyTorch on the Mac M1, and they still found the desktop graphics card faster. So I’m curious about your use case — in what specific situation is the M1 faster?

Ruofei Du: I wasn’t referring to PyTorch. As far as I know, PyTorch has not been fully optimized for the M1 — at least in the first half of this year its support was not very good. But if we run TensorFlow, the M1 is clearly better than a desktop 1080. I haven’t tried a 3080.

Li Ding: So in your experience, TensorFlow’s optimization for the Mac M1 is better.

Ruofei Du: Much better — several times faster.
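Claims like “several times faster” are easy to sanity-check with a small timing harness. The sketch below uses a NumPy matrix multiply as a stand-in workload — actual TensorFlow-on-M1 versus 1080 results depend on framework versions and drivers, so this only shows the methodology: warm up first, then average over repeated runs.

```python
import time
import numpy as np

def bench(fn, warmup=2, runs=5):
    """Time fn() after warm-up, returning mean seconds over `runs` calls."""
    for _ in range(warmup):        # warm-up: caches, lazy init, first-touch
        fn()
    times = []
    for _ in range(runs):
        t0 = time.perf_counter()
        fn()
        times.append(time.perf_counter() - t0)
    return sum(times) / len(times)

# Stand-in workload, roughly like one dense-layer forward pass.
a = np.random.rand(512, 512).astype(np.float32)
b = np.random.rand(512, 512).astype(np.float32)

mean_s = bench(lambda: a @ b)
print(f"mean matmul time: {mean_s * 1e3:.3f} ms")
```

Running the same harness with a TensorFlow or PyTorch op on both machines would make the “several times” comparison concrete.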

Li Ding: I see. Staying on the M1 versus XR2 topic: it sounds like if an M2-based mixed reality device appears next year or the year after, many of the latency issues you mentioned would be automatically resolved. What type of problem do you think everyone will swarm to next?

Ruofei Du: I don’t think the latency problem will be fully solved in the short term. What I believe the M2 will largely solve is the communication between the GPU and the CPU. Many harder problems remain, such as widening the field of view. Today’s Oculus headsets, for example, have a field of view far below the roughly 190 degrees the human eye can see — perhaps 120 to 130 degrees. This produces a tunnel effect: wear the headset for a long time and you feel you are not really in that world, and you may feel dizzy or nauseated. There is also the resolution problem. Even on current consumer virtual reality devices, I can still clearly see individual pixels — I have been doing graphics for a long time and may be picky about pixels. Personally, I think solving this may require 16K resolution or even higher in the future; even 8K is not enough.

Li Ding: Understood. So it sounds like in the short term, no matter how powerful Apple’s chip is, it can only partially solve the problem, and technical issues like optical lenses, displays, and field of view have no short-term solution.

Ruofei Du: Yes. I am actually most interested in the field of view, because human vision is very wide, and peripheral vision conveys a lot of information. If you lean heavily on foveated rendering, you can feel poorly placed in the environment. In a meeting, people often glance at others out of the corner of their eyes to convey things they cannot say directly, and that kind of information is still hard to carry in current VR.

Li Ding: We have talked a lot about the hardware side of the metaverse — chips, and display technologies like lenses that we may be less familiar with. Next, let’s discuss the software side. I feel you have done a lot of research and productization there — what would you like to expand on?

Ruofei Du: Sure. I have always felt that the metaverse is not a new concept; at the conceptual level it is more like a commercial hype of the past two years, or a definition of the next-generation Internet. When people talk about the metaverse, they generally want to use the concept to break the boundary between the virtual and the real. Personally, I think it currently bundles together many popular ideas: virtual reality, augmented reality, mixed reality, blockchain, decentralization, tokenization, cryptocurrency, NFTs (non-fungible tokens), digital twins, the mirror world — some even define it as Web 3.0. To me, this kind of social interaction that brings everyone together to blur virtual and real can be traced back to the BBSes of the 1980s: many people had nicknames, and avatars were first composed of ASCII characters, then personal user portraits, then picture avatars, then even animated avatars and QQ Show. I ran forums myself early on — my first forum, called Xinghai Bikong, had more than 5,000 users at its peak — so from elementary school through high school I knew forums very well. At that time my class also played a web game called Jianghu, which can be regarded as a small metaverse: everyone renamed themselves as various characters, the game even had its own virtual currency system and martial arts sects, people formed small circles and even carried those circles back into the classroom.

To a certain extent that already broke the boundary between virtual and real, but because graphics rendering was not so advanced at the time, the text of such a web game could not be made concrete in 3D — although in a child’s mind, a world that blends virtual and real may already have taken shape. The lineage then continues to MMORPGs, which truly reached an immersive virtual world. There are papers dedicated to studying MMORPG players who live in those worlds four or even eight hours a day; their lives really do blur the boundary between virtual and real, more and more like the cyberpunk world described in Snow Crash. The reason the word Metaverse became so hot in the market is mainly Roblox, which mentioned the word many times in its IPO prospectus and suddenly made the concept exciting. But if you actually immerse yourself in Roblox, you find it is a combination of our childhood creativity — building Lego, or building a map in Red Alert or Age of Empires, a single-player creation mode — with traditional MMORPGs such as World of Warcraft, a massively multiplayer mode. Mixing the two yields a mode where children can create by themselves, socialize, and play a virtual character. That it attracted tens of millions of users I did not expect, but it is also an opportunity for the metaverse to flourish. More recently, Facebook — now Meta — has gone all in on the metaverse and proposed Horizon Workrooms: a social world in the metaverse where people can not only socialize but even work.

Personally, I have always been skeptical about working in the metaverse — or rather, I look forward to it in the future, but given current technological limitations I don’t know how long it will take. For example, I recently read a paper on arXiv with an interesting title, “Quantifying the Effects of Working in VR for One Week.” They really had users work in virtual reality for a week, and concluded that it is significantly worse than working with a computer, mouse, and keyboard. That is expected: current latency, users’ unfamiliarity with virtual reality, and the normal adjustment process of accepting a new thing mean VR is not yet something we can work in every day. But I think it is a very interesting future research direction, and an opportunity for everyone working on making the metaverse useful.

Li Ding: I have several comments; let me go through them and discuss with you. First, I strongly agree with your point: under current software and hardware, working entirely in VR is unrealistic, especially given the arXiv paper you mentioned — working in the virtual world for a week is painful, and efficiency drops; that’s the negative result. But if you had proposed remote work 30 years ago, efficiency would also have dropped, because there was no Zoom, no Google Docs, no remote code check-in, no whole ecosystem. Twenty or thirty years later, the pandemic came and COVID kept us at home, but it happened that we had these technologies, so the remote experience was quite good — which is why many companies are now fully remote, saving on office rent, with its own pros and cons. My other comment: I do feel this technology is not very mature, but on the other hand, whether it is Meta, Apple, other companies, your company, or Roblox, they are indeed recruiting the so-called industry elites, right? The PhDs, masters, and undergraduates working there are the most capable group of people doing this. And we often say that what exactly you work on is not what matters most — as long as you work with capable people, you can always make something. If you follow that logic, will reliable application scenarios in VR/AR always eventually emerge?

Ruofei Du: Yes, this is one of the directions our company and Meta have both been pushing recently. But when it comes to recruiting very capable people to do very capable things, they are not necessarily hired to build products — a large number do research, as you can see at this year’s SIGGRAPH. For example, a paper recently published by Meta scans a face with a phone, renders a neural radiance field, and places the real you into a virtual reality scene. That work is a good illustration of where research is heading: creating digital humans, along with digital twins and mirror worlds.

Li Ding: That’s true. Many people at FAIR are not working on products, but many others go in and work on secret projects for years — some of which became Quest 1, 2, or 3, I forget which; anyway, they built the Quest and shipped it. Others are still in secret projects. There are all kinds.

Ruofei Du: I still hope everyone brings product prototypes to market as soon as possible, so the market can verify whether they are mature enough, instead of letting them die in the laboratory — which, as a user, I would find a bit of a pity.

Li Ding: That’s very interesting, because I’ve long held a view: the first company to commercialize a technology may not be the one that succeeds; often the second or third succeeds instead. I can’t predict the future, but for example, Facebook has shipped several Quests and gotten real market feedback; yet maybe Apple — which may simply be better at hardware — comes along three or four years later, launches its first so-called mixed reality device, and it’s a blockbuster. It will feel blazingly fast: no Qualcomm XR2, just an M-series chip — M3 or M4 by then, who knows. In that case the first mover doesn’t get the advantage it deserved, and once Apple starts poaching people from Meta, it’s completely over.

Ruofei Du: I agree with your point, but personally I don’t see that as wasted effort — I think these pioneers actually pave the way for those who come later.

Li Ding: Yes, their efforts are not in vain — it is all necessary market exploration. You won’t know until you try.

Ruofei Du: But from my personal point of view, I do think that if the chip, the operating system, and the product can be coupled together — rather than building on Android and assembling an ecosystem from other vendors’ components — the resulting product will arrive faster, or offer a better experience. That’s a personal opinion.

Li Ding: On to the next topic. You mentioned digital twins, the mirror world, and digital humans several times just now. These are all hot topics recently — including Jiaming from two episodes ago, whose startup is, in a sense, doing this kind of digital human. I’d like to hear your thoughts on this area.

Ruofei Du: I have been following the digital human field recently. In my own Geollery system, I brought in many cartoon-style digital humans — avatars — and let them wander around the digital city. That approach does not actually trigger the uncanny valley effect. Right now some of the best digital humans are on Bilibili: search for “lab 3d” and you can see many lifelike cartoon characters who become uploaders and even attract millions of viewers to watch their virtual performances. I think this path is feasible, and to a certain extent it creates a different self, though it does not represent how you appear in the real world. The other direction is rendering photorealistic digital humans. The main difficulty there is that a digital human that is just slightly off from a real person looks scary to everyone — the so-called uncanny valley effect. Moreover, portrait-based 3D reconstruction raises ethical issues. For example, the dataset may be biased or cover too few groups of people, or may include people whose data should not have been used, and the digital humans you generate will then lean toward a certain race. A recent work criticized a GAN paper that downsampled Obama’s portrait to a mosaic and then restored it — and the skin color changed. So I hope the photo-based digital human direction can become more and more complete. Another way to build a digital human is through voice. I really like Li Ding’s recent paper, MakeItTalk — a SIGGRAPH paper, I believe — which drives a digital human from audio. I think the future is this: when everyone wears VR glasses, you have no way to accurately capture facial expressions, and at that point reconstructing a digital human from sound will be a very promising direction.

Li Ding: Yes, our group did MakeItTalk, with my intern Zhou Yang, who has since joined Adobe full time. If any listeners want to do doctoral research on audio-driven animation and digital humans, feel free to contact Zhou Yang — I believe he is still doing research in this direction. Going back to the uncanny valley you mentioned: I don’t know how closely the audience follows these metaverse platforms, such as Roblox and Horizon. Roblox avatars are basically a pile of blocks — nothing photorealistic, not real at all. Horizon is similar: a bunch of bobbleheads. It doesn’t look like a photo, precisely because photorealism, as you said, is so hard. Our brains are very keen at detecting whether something is a human face; if it is even slightly off, it feels wrong. Take the recently popular DALL-E: OpenAI initially required early testers not to post generated images containing faces online. Then — just yesterday or the day before, I think this past Friday; we are recording at the end of June — they lifted that rule, saying that even DALL·E 2 images containing faces can be shared. I guess the OpenAI team had some internal legal discussion and decided that since the images would get out anyway, it was better to let early testers collect more feedback. To sum up, I agree with your vision: photorealistic digital humans would be great, but there is still a long way to go.

Ruofei Du: Yes. My own research over the past two years has also focused on enhancing user interaction and communication in augmented and virtual reality. A good colleague of mine also works on digital humans, including the relighting and light transport work Google published a few years ago, and recent work in the digital human direction. If you are interested, feel free to contact me — we hire interns every year.

Li Ding: We just discussed digital humans. I saw you also listed interaction and some other topics in the outline — would you like to talk about those?

Ruofei Du: Sure. Interaction with metaverse devices is an enduring topic in HCI. Virtual reality hardware is now in its third or even fourth wave since the last century. The earliest VR equipment goes back to room-sized electronics, where the user’s head was fixed into a helmet-like headset to peer into the future. By the 1990s there were some lightweight prototypes. The previous wave was phone-based VR — Google Cardboard, the earliest consumer virtual reality device — and this year we have Meta’s consumer-grade headset. Interaction has progressed bit by bit: the earliest helmets had no visual sensing and no sound sensing, while Meta’s headset can now track hands through visual sensors alone, without HTC-style base stations. Although hand tracking still has some inaccuracies, it already enables many interesting applications, such as fitness or games. In the future I think interaction needs further improvement, because cameras have a problem: when the hand is outside the camera’s view, it cannot be tracked, so other sensors are needed to capture hand movement elsewhere. Another question is whether we can combine more remote sensors with virtual reality in the future — eye tracking, EEG, various body sensors — to read your intent directly, and even let you draw or write in virtual reality with your thoughts. Last year or the year before, people used brain-computer interfaces to help people with disabilities write, so this may be achievable in the future.

Li Ding: On cameras capturing hand movements — that is indeed a flaw: a camera has a line of sight, and with occlusions and obstructions, some things simply cannot be seen. Sometimes when chatting with colleagues and friends, I think about plausible projects I probably won’t do myself but enjoy discussing. For example, with an Apple Watch or other wearables, you could obtain some of these signals — I don’t know how accurate they are — but if you just want to fill short-term camera occlusions, maybe use them for interpolation or prediction; aren’t the two complementary? And that path leads back to Apple: it already has so many devices. A phone in your pocket can track your lower-body movement, you have a watch on your wrist, and an Apple device on your eyes — together they can track a great deal. This is something I feel other companies may have to catch up on, and I think it will be somewhat difficult.

Ruofei Du: I also hope this kind of multi-sensor fusion can make hand sensing more robust in the future. A few days ago I read an interesting paper: by mounting a camera on a watch and observing the movement of the back of the hand, the system can infer the words the user is typing. Combining such work, future fusion may give us very accurate hand perception from all directions.
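The camera-plus-wearable idea discussed above can be sketched as a simple fallback/blend: trust the camera while its tracking confidence is high, and lean on an IMU-style prediction from the wearable when the hand is occluded. Everything here — the confidence threshold, the linear blend — is a hypothetical illustration, not any product’s actual fusion pipeline.

```python
def fuse_hand_position(camera_pos, camera_conf, imu_pred, conf_floor=0.3):
    """Blend a camera hand estimate with a wearable IMU prediction.

    camera_pos / imu_pred: (x, y, z) tuples in meters.
    camera_conf: tracker confidence in [0, 1]; below conf_floor the
    camera view is treated as occluded and the IMU prediction is used.
    """
    if camera_conf < conf_floor:           # occluded: dead-reckon from IMU
        return imu_pred
    w = camera_conf                        # simple confidence-weighted blend
    return tuple(w * c + (1 - w) * i for c, i in zip(camera_pos, imu_pred))

# Hand visible: mostly trust the camera.
print(fuse_hand_position((0.10, 0.20, 0.30), 0.9, (0.12, 0.21, 0.30)))
# Hand behind the back (confidence near 0): fall back to the wearable.
print(fuse_hand_position((0.0, 0.0, 0.0), 0.05, (0.40, -0.10, 0.25)))
```

A real system would use a Kalman-style filter with motion models rather than a linear blend, but the division of labor — camera when visible, wearable when not — is the same.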

Li Ding: I’m imagining it: a camera on the watch, a camera on the glasses — will everyone in the future be a walking dash cam? If one company could collect all that data, it could build a better global street view.

Ruofei Du: Your vision is optimistic — you look at the positive side. Right after you finished, I immediately thought of some downsides. For example, Meta recently handed out augmented reality glasses to some employees at headquarters and to collaborators at Carnegie Mellon University, asking them to wear the glasses and collect data. If there is a camera on your watch, a camera on your headset, and a body camera on your torso, the combination of that data could violate privacy in all sorts of ways.

Li Ding: Yes, yes.

Ruofei Du: For example, nobody wants to be monitored going to the bathroom, or to have anyone know their exact location while walking down the street. And if your camera is connected to the Internet, that becomes a security and privacy problem. Before such devices become widespread, I think this urgently needs to be solved.

Li Ding: This is definitely a big concern, including mass surveillance. I think it is a root problem: if a few centralized companies hold this kind of data, it will inevitably be used by governments — there have already been scandals, large and small, in every major country. Okay, so we’ve covered that. Did you want to talk about ray tracing or AI next?

Ruofei Du: Let’s talk about AI. There isn’t much to say about ray tracing: I think desktop-level ray tracing is doing well now, but it is largely constrained by hardware development, and ray tracing has already begun to use AI for filtering.

Li Ding: Yes — render just four samples per pixel and let AI predict the rest.
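The “few samples plus AI cleanup” pattern Li Ding describes can be illustrated with a toy Monte Carlo example: estimate each pixel from only four noisy samples, then denoise. Here a plain box filter stands in for the learned denoisers that production ray tracers actually use; the 1-D “image” and uniform noise model are purely illustrative.

```python
import random

random.seed(0)

WIDTH, TRUE_VALUE, SPP = 64, 0.5, 4   # 1-D "image", ground truth, samples/pixel

def render_pixel():
    """Monte Carlo estimate: average SPP noisy samples of the true radiance."""
    return sum(random.uniform(0.0, 2 * TRUE_VALUE) for _ in range(SPP)) / SPP

noisy = [render_pixel() for _ in range(WIDTH)]

def box_filter(img, radius=2):
    """Stand-in denoiser: average each pixel with its neighbors."""
    out = []
    for i in range(len(img)):
        window = img[max(0, i - radius): i + radius + 1]
        out.append(sum(window) / len(window))
    return out

denoised = box_filter(noisy)

def mse(img):
    """Mean squared error against the known ground-truth radiance."""
    return sum((v - TRUE_VALUE) ** 2 for v in img) / len(img)

print(f"MSE noisy:    {mse(noisy):.5f}")
print(f"MSE denoised: {mse(denoised):.5f}")
```

A learned denoiser does far better than a box filter because it preserves edges while removing noise, but the economics are the same: four samples plus a filter instead of hundreds of samples per pixel.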

Ruofei Du: So there isn’t much room left to discuss making ray tracing faster; AI can already fill in what’s missing well enough. As for AI and the metaverse, I’ve always felt that AI is a catalyst for the metaverse, and the two complement each other. On the one hand, AI helps people get information in the metaverse: even there you may still want to watch short videos, and AI can recommend them; you may need to look up information; or you may want AI to draw for you, a large language model to generate text, or an image-generation model like DALL·E 2 to help you create. On the other hand, AI development can also benefit from the metaverse. For example, the autonomous-driving industry currently relies on human drivers to collect data, but that coverage is hard to extend to edge cases — extreme events like suddenly facing a white wall, or an elderly person staggering across the road — which are small-probability events. In the metaverse you can simulate virtual humans and help the self-driving AI learn to perceive these rare events. That is one thing I hope the future metaverse can do.
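The edge-case argument — that simulation lets you oversample events that are vanishingly rare on real roads — can be illustrated with a toy data-collection sketch. The event names and probabilities here are made up for illustration:

```python
import random

RARE_EVENTS = ["white_wall", "staggering_pedestrian", "jaywalker"]
COMMON_EVENTS = ["clear_road", "normal_traffic"]

def real_world_sample():
    """On real roads, rare events show up with tiny probability."""
    return random.choices(RARE_EVENTS + COMMON_EVENTS,
                          weights=[0.001] * 3 + [0.7, 0.297])[0]

def simulated_sample(rare_fraction=0.5):
    """In simulation, oversample rare events to balance the training data."""
    if random.random() < rare_fraction:
        return random.choice(RARE_EVENTS)
    return random.choice(COMMON_EVENTS)

random.seed(1)
sim_batch = [simulated_sample() for _ in range(1000)]
rare_share = sum(e in RARE_EVENTS for e in sim_batch) / len(sim_batch)
print(f"rare-event share in simulated batch: {rare_share:.2f}")
```

A perception model trained on the simulated batch sees rare events thousands of times more often than a fleet driving real roads would, which is the point Ruofei is making.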

Li Ding: Your point that AI is a catalyst for the metaverse is very interesting. It reminds me of something. Whether it was students coming to me for internships or colleagues, people often asked why I stopped doing follow-up work on MakeItTalk. I’ve switched to other directions now, and part of the reason is that generating facial animation from audio basically has an established path to a solution: train on more data and you get better quality, so every year it turns into leaderboard-chasing. I feel the harder problem is deciding what to say. Suppose you really enter such a world and you’re dealing with a so-called AI engine — right now I don’t see a single AI engine on the market that can hold a decent conversation with me. All the so-called automated chat systems are terrible; they’re simply not smart. Even Apple’s Siri or Hey Google are pretty bad. So the step before MakeItTalk is maybe NLP — actually even before NLP, it’s conversation: how do you train something to really talk like a person? I thought about it for a long time and concluded the bottleneck wasn’t in my animation part but in the brain part. And since I don’t do research in that area, specialists will need years of work on it. By the time that bottleneck is solved, either I’ll come back to the animation side, or animation will long since be a solved problem anyway.

Ruofei Du: On your NLP point — I’m not an NLP expert either, but working at the company these past two years I’ve been exposed to many large language models, and I’ve been quite shocked by how fast the field is moving. Take GPT-3 from the past couple of years: it can already converse with people and even fool some of them, effectively passing a kind of Turing test. That’s a bit frightening — it even makes me wonder whether, when we humans speak, it’s our own reason and consciousness doing the talking, or whether we’re just probabilistically arranging words. That said, I don’t think the AI produced by today’s large language models is conscious. For example, ask a large language model “are you conscious?” and “are you not conscious?” — pose the two contradictory framings to the same model — and the conclusions it reaches are often inconsistent, with no contextual coherence between them.

Li Ding: Recently wasn’t there something from Google, or some other company, called LaMDA or something like that?


Li Ding: Right, it’s from your company. So now there are GPT-3 and demos like LaMDA. Then why are the chat apps I use online still so terrible? Why hasn’t this technology reached every household? You said it can supposedly pass the Turing test, which basically means it can fool a portion of people — if it fools half of them, that already counts as success. So my question is: why are the engines I use every day still so bad?

Ruofei Du: My guess is that they’re too dangerous. Leaving my own company aside, take OpenAI: if you test GPT-3, you’ll find there is a small probability it leaks made-up names or addresses, and sometimes its answers are complete non sequiturs, or carry political biases. After all, the data comes from the vast sea of the Internet. So how to filter information so that a bot doesn’t say things it shouldn’t is, I think, an age-old hard problem.

Li Ding: On that point — just today on Hacker News there was a story along the lines of “GPT-3 leaked my real name.” The author had tried to hide their real name on the Internet by using a pseudonym, but at some point, on some small site, they had linked the username to their real name — and unfortunately GPT-3 learned it. So when someone asked “who is [username],” GPT-3 answered “this person is so-and-so,” and the author was shocked. It sparked a long discussion thread on Hacker News. So yes, exactly as you said: you can’t control what the model sees, or stop it from saying things it shouldn’t.

Ruofei Du: Right, and as for how to solve this in the future — I suspect it can’t be fully solved. What I hope is that future AI plays a service role rather than being endowed with consciousness. For example, it could help me with everyday tasks, or take over the tedious parts of human work: autocompleting boilerplate code, or remembering my coffee preferences, ordering automatically, and having the coffee delivered to my desk. That kind of service-oriented AI is, I think, a safer direction for the future. Another direction that’s hot in NLP right now is taking a general-purpose model and, through few-shot prompting, specializing it into an AI that serves one specific scenario. That way its answers can be constrained: there are only so many kinds of coffee, and it won’t step outside that scope. That may be a safer approach than handing a general-purpose AI directly to every household, where it might really leak someone’s private information.
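The scope-limiting idea described here — constraining a general model’s replies to a small fixed menu rather than letting it say anything — can be sketched as a simple output filter around the model. The menu, wording, and fallback message are all hypothetical:

```python
COFFEE_MENU = {"latte", "americano", "cappuccino", "mocha"}
FALLBACK = "Sorry, I can only take coffee orders from the menu."

def constrained_reply(model_output: str) -> str:
    """Keep the assistant inside a fixed scope: map free-form model text
    onto exactly one allowed menu item, or refuse rather than pass through
    arbitrary (possibly privacy-leaking) text."""
    text = model_output.lower()
    matches = [item for item in COFFEE_MENU if item in text]
    if len(matches) == 1:
        return f"One {matches[0]}, coming right up."
    return FALLBACK  # ambiguous or out-of-scope -> safe refusal

print(constrained_reply("I'd like a latte please"))   # → One latte, coming right up.
print(constrained_reply("Tell me someone's address"))  # → the fallback refusal
```

The key design choice is that the model’s raw text never reaches the user; only a reply drawn from a closed, pre-approved set does, which is what makes the narrow service scenario easier to pass compliance review than a general chatbot.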

Li Ding: That really makes me think of Skynet — you have to put all kinds of limits on it, or it goes rogue and can’t be controlled.

Ruofei Du: There’s precedent. Many years ago Microsoft released a chatbot on Twitter that had a built-in learning feature — you could teach the bot what to answer when users asked certain questions. I think it was called Tay. The bot was later shut down, and Microsoft probably never dared to release it again.

Li Ding: Because it was abused — as you said, people fed it lots of awful questions and awful answers, and there wasn’t enough moderation. I feel moderation in general is an extremely hard problem. I’ve been discussing it recently with our listener group and our production team: once you reach a certain scale, how do you moderate — I don’t even know the Chinese translation of “moderate” — how do you manage a whole community without the policing feeling heavy-handed? Managing a community is very hard, and managing what an AI thinks feels even harder. AI today is a pile of black boxes, which are even harder to control. So what you said — giving the model a few multiple-choice options to pick from, rather than deploying a general AI — probably is, from a deployment standpoint, the short-term path that’s legal, compliant, and able to pass a company’s compliance review. Okay, in the next part let’s talk about the uses of the metaverse. We’ve touched on some already, but let’s pull them together and discuss.

Ruofei Du: Sure. At the moment I think virtual-reality devices are still mainly used for games, where they give players a more immersive experience. So when people ask me which VR device to buy, I usually don’t recommend high-end gear like the HTC Vive; I still recommend Meta’s Quest 2, because its ecosystem is largely established, there are plenty of games to play and experience, and the platform is visibly improving and adding new features. I’ve always felt that building an ecosystem is an indispensable part of the metaverse. You mentioned earlier that Apple may use its late-mover advantage to build better hardware, but I still admire Meta for moving first on the ecosystem — getting people using and playing on it, with developers continuously contributing to the community. Another topic is metaverse social, which is also interesting. I recently had the chance to join a few lab socials of Yang Zhang’s HiLab at UCLA, which we attended remotely, in VR.


Ruofei Du: Right — the students, the professor, and external collaborators like us all gather in an environment called Rec Room, and everyone plays paintball; you can also chat and interact. After an hour of playing, you genuinely feel it has brought everyone closer, and people can casually discuss things. But honestly, after staying in there for an hour, even as someone who plays VR often and doesn’t get motion sick in 3D, I still felt very tired.

Li Ding: Were you playing this on, say, a Quest? And inside, do you walk around physically, or sit in a chair?



Ruofei Du: Right, my room isn’t that big. Besides, if you really tried to walk around physically while playing paintball, you probably wouldn’t get far before being shot.



Li Ding: On games — I actually thought you would mention PlayStation VR, because I once had a chance to play PS4 VR. It has some VR games; the one I played was a Batman title, where you walk across narrow bridges high up on skyscrapers and keep nearly falling off. I remember genuinely feeling afraid, because the effect was really well done — that section of the game was great. So for a traditional games-plus-hardware maker like Sony, compared with Oculus, which do you think — what’s your take?

Ruofei Du: That’s a good question. I didn’t mention PS VR mainly because I haven’t played it — it’s one of the few VR systems I haven’t tried. Its ecosystem is probably friendlier to hardcore gamers than Oculus’s, since many big titles are PlayStation exclusives. My worry is that game makers, compared with Internet companies, may not invest as deeply in new technology and interaction — hand tracking, say, or mixed-reality interaction; they may not pour resources into researching those. Instead they focus on enhancing gameplay, letting more players feel the appeal of PS VR, and using the existing console market to expand into their own niche. That, I think, is a distinctive strength of game makers. So I’d actually love to see multiple strong players coexist, with users in different vertical niches choosing the platform they like.

Li Ding: Well summarized. Game makers may not be particularly good at hardcore interaction development, but they are brilliant at gameplay. Take the Switch: technically it’s really not that advanced, yet so many of its games are wonderfully playable, whether single-player or party multiplayer — I play a lot of Switch and it feels great. It also produced many breakout hits during the pandemic, which is rarer on game platforms run by Internet companies. Okay, next let’s talk about working in the metaverse — after all, as you said, platforms like Horizon also want to become a workplace.

Ruofei Du: I’d actually like to experience metaverse work myself and see what it’s like. Honestly, I’ve never worn a VR headset for more than an hour — the lab social was probably my longest session. People usually watch a demo for half an hour at most and then take the headset off, feeling it was cool and flashy. But for serious work — brainstorming meetings, say, or big group meetings where you really need to stay in for a long time — I personally feel the current devices and rendering fidelity still can’t match traditional video conferencing: the low latency, the ability to catch subtle changes in facial expression. There’s a long way to go. One thing I like to mention is Google’s Starline system, announced in a promo video at I/O about a year ago: you interact remotely, face to face, with a person rendered on a high-resolution 3D display. I think that’s another direction for metaverse work — no headsets, though limited to one-on-one conversations. They published a paper on Project Starline at SIGGRAPH Asia last year. I’ve experienced the system myself, and in it I could talk for an hour without feeling tired, because it feels like a life-size person is right in front of you, on equal footing; as you move your body left and right, eye-tracking cameras capture your viewpoint and render the correct 3D image for you. I really hope devices like this can become at least somewhat more widespread, with the cost brought down, and perhaps reach ordinary households.




Ruofei Du: Two other areas I’m interested in are metaverse education and metaverse tourism. On education: I’ve recently been collaborating with professors at other universities, and I’d love to see people genuinely teach with virtual reality, because it’s an embodied form of teaching — you can actually try to match your own body to the teacher’s posture in the metaverse, for example in sports training or yoga. In scenarios like that, I think metaverse teaching is quite promising. A teacher I admire is Ken Perlin at NYU, an Academy Award winner. He’s great at having every student in his class use the metaverse directly — he really does teach inside it. He also built a system called ChalkTalk: with a pen fitted with an LED bulb you can draw in front of a screen — draw a fish and it becomes an animated fish that swims off; draw a four-dimensional cube, set it rotating, and use it to explain to everyone what “four-dimensional” means. These, I think, are applications unique to the metaverse: on a traditional screen it’s hard to grasp what a 4D cube is, but in VR you can actually watch the tesseract rotate — and then, being a 3D person looking at a 4D cube, that’s an interesting feeling. The other one, metaverse tourism, also really interests me. The last thing I built during my PhD was a digital-city system, and my dream then was a kind of free, seamless travel inside Google Street View. The biggest pain point of today’s Street View, and of house-viewing apps like Matterport, is that users have to teleport from one spot to another instead of moving through the space seamlessly. My personal vision for future technology is to truly connect the world through neural radiance fields or other state-of-the-art rendering techniques, so everyone can fly freely through one virtual world.
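The rotating 4D cube demo described above can be sketched in a few lines: generate the tesseract’s 16 vertices, rotate them in one 4D plane, then perspective-project into 3D for display. This is a minimal math sketch, not ChalkTalk’s actual implementation:

```python
import itertools
import math

def tesseract_vertices():
    """The 16 vertices of a 4D hypercube: all points in {-1, +1}^4."""
    return [list(v) for v in itertools.product((-1.0, 1.0), repeat=4)]

def rotate_xw(v, theta):
    """Rotate a 4D point in the x-w plane; y and z are unchanged."""
    x, y, z, w = v
    c, s = math.cos(theta), math.sin(theta)
    return [c * x - s * w, y, z, s * x + c * w]

def project_to_3d(v, eye_w=3.0):
    """Perspective-project 4D -> 3D by dividing by distance along w,
    exactly as a 3D scene is projected onto a 2D screen."""
    x, y, z, w = v
    f = 1.0 / (eye_w - w)
    return [x * f, y * f, z * f]

# One animation frame: rotate every vertex, then project for rendering.
verts = tesseract_vertices()
frame = [project_to_3d(rotate_xw(v, math.pi / 6)) for v in verts]
print(len(frame))  # → 16
```

Sweeping `theta` over time and drawing edges between projected vertices yields the rotating-tesseract animation; the apparent “impossible” motion of the inner cube is just the w-axis rotation made visible.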

Li Ding: Speaking of metaverse tourism — years ago, when Snapchat was still hot, I downloaded the app and tried it. It had a map feature. When the young people were all heading to some local event, I’d be deciding whether to go: the subway felt like a hassle, Uber was surge pricing and I couldn’t even get one. So I just opened the Snapchat map, tapped, and watched lots of real-time photos and videos. It wasn’t the so-called metaverse, but it was as good as it gets: I had audio, I had video — apart from smell, I could basically sense everything. And if even that level of sensory input felt sufficient to me, then the next level up — the more immersive metaverse tourism you describe — certainly has a market, especially under today’s pandemic restrictions at home and abroad, since this kind of travel doesn’t require you to physically go anywhere.

Ruofei Du: Right. Actually, the earliest version of that photos-on-a-map feature came from a company called Panoramio, which commercialized it more than a decade ago. After Google acquired it, it faded away. I find that a pity — so many good ideas end up falling by the wayside.




Ruofei Du: Thanks to Li Ding for inviting me to his chat room — I’m delighted to share my views on the metaverse with the listeners. In our group, I personally work on human-computer interaction, that is, building interactive systems. I’m always recruiting students for AR- and communication-related projects, and those slots are fairly plentiful; if someone is interested in depth or 3D, I usually also have one slot a year for side projects. I also frequently collaborate with researchers who publish at CVPR and SIGGRAPH, mostly working on digital humans or hardcore neural rendering, and I can refer you to them. Feel free to contact me.

Li Ding: I’ll put Ruofei’s contact information below so you can reach him. Many thanks to Ruofei for sharing his views on the metaverse with us this episode. See you next time — bye!


