Three technical difficulties with Apple Vision Pro

Original link: https://www.skyue.com/23060822.html

I have never used VR or AR hardware, and I don’t have a deep understanding of the underlying technologies; my experience of immersion stops at Shanghai Disney’s Soaring Over the Horizon. But after reading the publicity and analysis of Apple Vision Pro online over the past few days, I genuinely feel that the future has arrived.

The three problems below are a summary of what I have read recently. Working through them, I also came to appreciate just how hard this is; no wonder virtual reality has gone through several waves of hype without ever truly becoming mainstream. I hope Apple can lift the whole industry off the ground.

1. How to provide physical feedback for spatial interaction?

Sci-fi movies often show “spatial gesture interaction” with holographic projections (as opposed to the “flat gestures” of a phone screen), but think about it for a moment: gesturing in empty space gives no physical feedback at all (no touch, resistance, vibration, and so on), which is counter-intuitive and makes for a poor experience. The Nintendo Switch is not a 3D device, but games that call for physical motion give vibration feedback through the controller, and anyone who has played them knows how much that feedback matters to the experience. The same logic applies to interaction in 3D space.

Apple Vision Pro does not solve the feedback problem of spatial gesture interaction. Instead it takes another route: minimize gesture interaction, and make the few gestures that remain as intuitive as possible.

At present, many of the scenes Apple Vision Pro demonstrates are still flat interactions, such as browsing the web, viewing photos, and FaceTime calls, much like on an iPhone, except that the flat panel floats in virtual space.

Flat interaction boils down to selecting, clicking, and dragging. Apple Vision Pro hands selection entirely to the eyes: whatever you look at is selected, with no gesture required. Clicking and dragging do not require raising your hands toward the interface either; slight finger movements are enough. Pinching two fingers together confirms a click, and flicking left/right or up/down scrolls, just as you would on a phone screen. The position of your hands does not matter for these finger movements; if you are sitting, you can simply rest them on your knees. That is what makes the finger interactions intuitive.
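
From a developer’s point of view, a nice consequence of this design is that, as I understand visionOS, ordinary UI code does not need to know anything about eyes or fingers: looking at a control and pinching is delivered to the app as the same tap interaction SwiftUI already understands. Here is a minimal sketch of that idea (the view, its content, and the state names are made up for illustration):

```swift
import SwiftUI

// Hypothetical visionOS view. The system translates "look at it and pinch"
// into ordinary tap interactions, so no gaze- or hand-specific code appears here.
struct PhotoCardView: View {
    @State private var isLiked = false

    var body: some View {
        VStack(spacing: 16) {
            Image(systemName: "photo")          // placeholder content
                .font(.system(size: 120))
                .onTapGesture {                  // fires when the user looks here and pinches
                    print("Photo tapped")
                }

            // A standard button: gaze highlights it, a pinch activates it.
            Button(isLiked ? "Liked" : "Like") {
                isLiked.toggle()
            }
        }
        .padding(40)
    }
}
```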

The principle behind the finger interaction is not complicated: cameras arranged all around the glasses capture the user’s hand movements. Because these cameras cover a very wide combined field of view, your hands can be almost anywhere.
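
If an app needs the raw hand data, for example to implement its own pinch detection on top of those camera feeds, visionOS exposes it through ARKit. The following is a rough sketch of that approach based on my reading of the hand-tracking API; treat the class name, the 1.5 cm threshold, and the helper extension as my own assumptions rather than a verified implementation.

```swift
import ARKit
import simd

// Rough sketch: stream hand-tracking anchors and flag a thumb-index pinch.
final class PinchDetector {
    private let session = ARKitSession()
    private let handTracking = HandTrackingProvider()

    func start() async throws {
        try await session.run([handTracking])   // prompts for hand-tracking permission

        for await update in handTracking.anchorUpdates {
            let hand = update.anchor
            guard hand.isTracked, let skeleton = hand.handSkeleton else { continue }

            // Joint transforms are relative to the hand anchor; lift them into world space.
            let thumbTip = hand.originFromAnchorTransform *
                           skeleton.joint(.thumbTip).anchorFromJointTransform
            let indexTip = hand.originFromAnchorTransform *
                           skeleton.joint(.indexFingerTip).anchorFromJointTransform

            let gap = simd_distance(thumbTip.columns.3.xyz, indexTip.columns.3.xyz)
            if gap < 0.015 {                    // fingertips within ~1.5 cm: call it a pinch
                print("\(hand.chirality) hand pinched")
            }
        }
    }
}

// Helper: translation component of a 4x4 transform.
private extension simd_float4 {
    var xyz: SIMD3<Float> { SIMD3(x, y, z) }
}
```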

Of course, if you need to enter text without using voice, you can only tap in mid-air on a virtual keyboard. A virtual keyboard might provide audible and visual feedback (keys lighting up, for example) when you tap it, but it certainly won’t provide the tactile feel of a physical keyboard.

If complex interaction with a 3D interface is involved, such as assembling 3D building blocks, you probably still have to raise your hands and manipulate thin air. There is no way around it.

In the long run, it is not impossible for people to fully adapt to interaction without feedback, but that is not the best form of virtual reality. After all, in the real world the sense of touch matters a great deal and should not be lost.

A more likely development is that, like the game in The Three-Body Problem, peripherals such as gloves and suits will appear to simulate tactile feedback.

2. How to model the user himself?

Apple Vision Pro is fully enclosed, with two 4K screens inside. The external environment you see is captured by the cameras and rendered on those screens in real time; because the resolution is high enough (4K) and the latency is low enough (reportedly 12 ms), it looks as if your naked eyes were seeing the outside world directly. (For reference, a 90 Hz display refreshes roughly every 11 ms, so 12 ms is on the order of a single frame.)

Because the cameras are mainly concentrated around the glasses, they can only model the environment in front of the user; they cannot capture the user’s own body or the space behind them. For a single person using the device alone, this is not a problem, because the field of view of the glasses is close to that of the naked eye, which matches intuition.

But if two users in different places put on the glasses and enter the same virtual space, problems appear, because the two of them need to see each other’s whole bodies. Imagine a scene: you are trying on clothes and want a friend in another city to judge how they look. Apple Vision Pro does not seem able to do anything about that.

At Apple’s launch event, FaceTime calls were demonstrated. They are still flat video, just projected into space, which is no different from flat video on an iPhone. In fact it is arguably worse: the iPhone’s video is shot directly by the front camera, while Apple Vision Pro cannot film the wearer’s face, so it synthesizes a virtual likeness of the user through AI modeling. Apple calls this virtual likeness a “Persona” (EyeSight, by contrast, is the feature that shows the wearer’s eyes on the headset’s outer display).

In my opinion, because Apple Vision Pro cannot model the user themselves, especially the user’s facial expressions, any scene that requires face-to-face interaction between multiple users cannot be realistically reproduced, which greatly limits the imaginable application scenarios.

Similar to FaceTime, another example is meetings: colleagues who work remotely gather in a virtual meeting room that can contain a shared whiteboard and a round table. If the avatars were realistic enough, especially the facial expressions, it would not be hard to get an experience close to an in-person, face-to-face meeting.

Of course, for 3D scenes that do not involve other people, the device is fully capable. The simplest example is “watching” 3D movies; more complex, interactive examples include:

  • Lego should immediately develop a building-block game for it, one that delivers the same experience as playing with Lego in the real world.
  • All kinds of 3D design software should catch up quickly. Architects would no longer need to drag the viewing angle around on a flat display; they could design directly inside the 3D scene.

3. Volume and weight?

There is not much to elaborate on here. Apple Vision Pro’s choice is to move the battery into a separate, tethered pack. Less elegant, but there is no better way for now.

Of course, there are other technical difficulties, such as real-time, high-fidelity modeling and rendering of the external environment. Thanks to Apple’s excellent hardware strength (especially in chip development), its integrated hardware and software approach, and a cost-no-object bill of materials, this one has been handled well, or at least better than on any other virtual reality device.

Judging by the functionality already shown, if it were light enough and cheap enough it would probably sell very well, which suggests the current experience has reached consumer level. As the technology advances, the size, weight, and price will gradually come down, so the day this becomes a mass consumer product may not be far off.

