ChatGPT’s desktop app has been updated to work directly with other apps. Is this some new direction? Hello everyone and welcome to the Old Van Storytelling YouTube channel. Today we’re going to talk about an update to the desktop version of ChatGPT released on November 15th. Note that this update is for the desktop app, not the website, which means you need Windows or macOS to use it.
The macOS version will always be ahead of the Windows version here, because macOS is a relatively simple target. Why? Apple’s hardware lineup is simple, so there are fewer compatibility issues and the whole operating system environment is fairly homogeneous. Windows is catching up, but every feature of the ChatGPT desktop app lags a little behind on Windows.
On Windows there is already an app, and you can upload files, take photos, take screenshots, and use Advanced Voice. On macOS all of these features have been there for a while. So what’s new this time? The new feature is called working with apps. Which apps? Not every application is supported; the main focus is programming tools.
There’s Xcode, Apple’s own development tool, and then plain text editors. A lot of programmers don’t bother with an IDE, an integrated development environment, at all; they just open a text editor and get to work. Then there’s VS Code, the most widely used editor, made by Microsoft. There’s Terminal: on Linux or macOS a lot of work happens in the terminal, so ChatGPT can connect to it directly too. And there’s a more advanced terminal program called iTerm2, which is also supported.
Right now it supports just those five apps and nothing else. So is this going after GitHub Copilot? Do we still need to spend $10 a month on a GitHub Copilot subscription? I went to test it right away. The first step is to check your proxy, because the website itself is actually fairly forgiving about this.
But with the desktop client, you sometimes need special proxy settings. If that isn’t a problem for you, skip this part. After that, you update the desktop client, the ChatGPT app on macOS. Oddly, this update has to be done manually, which is a small gripe. Why does it matter? Back when we built apps, one key metric for each new release was whether users actually updated. A new version often means a change of advertisers; if users don’t update, you can’t monetize the release. If your advertisers change and your users don’t update, the whole thing falls apart, especially when the previous advertiser’s service agreement has expired and needs renewing. So update uptake is a very important indicator.
But ChatGPT has to be updated manually; it doesn’t pop up and say, “Hey, I found a new version, let’s update.” At least it never prompted me, so I updated by hand. After updating, open the settings in the bottom-left corner of the app, where there’s a little avatar icon. Click it and enable the option for working with apps; the label is something like “Allow Apps.” I only see it in Chinese because I’ve switched my desktop ChatGPT to Chinese.
Then there’s the authorization step. Any program that wants to manipulate your computer is tightly controlled by every operating system, because if nobody cared, a virus could come along and do real damage. What is the authorization? In System Settings there’s a section called Accessibility, where you allow specific applications to control your computer. Find ChatGPT in that list, turn it on, and you’re ready to go.
After that you manage which applications you want to use. I didn’t test Xcode because I don’t use it myself. For Terminal, as long as you’ve granted the authorization above, once a terminal window is open you’ll see an icon below the input box in the ChatGPT app: a small square with a pointer in the middle. I won’t bother with a screenshot. Click that icon and you can select your Terminal window.
It will then answer questions based on what’s in your Terminal window. It reads the last 200 lines of the window and responds based on that information. That’s how the terminal connection works. And as you know, we often use the terminal to SSH into remote cloud servers or into a NAS, and that still works here. I’m doing all sorts of things on a cloud server, I hit an error, and I can just ask ChatGPT, “What’s this error, what should I change?” and it will tell you how to fix it.
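For context, here is a rough do-it-yourself approximation of what that workflow amounts to, written in Python with the official OpenAI SDK; the `session.log` file and the prompt wording are my own assumptions for illustration, since the real desktop app reads the window contents itself.

```python
# Rough DIY sketch of "ask the model about my last 200 lines of terminal output".
# Assumes the official OpenAI Python SDK (pip install openai) and OPENAI_API_KEY set;
# session.log is a hypothetical capture of your terminal session (e.g. made with `script`).
from openai import OpenAI

client = OpenAI()

with open("session.log", encoding="utf-8", errors="replace") as f:
    last_200 = "".join(f.readlines()[-200:])  # mimic the app's 200-line window

response = client.chat.completions.create(
    model="gpt-4o",
    messages=[
        {"role": "system", "content": "You help debug shell sessions."},
        {
            "role": "user",
            "content": "Here is my recent terminal output:\n"
            + last_200
            + "\nWhat does the error mean and how should I fix it?",
        },
    ],
)
print(response.choices[0].message.content)
```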
The text editor works as well. I tried it: open the text editor with some content in it, and ChatGPT will reply based on that content. VS Code is my main IDE. To use it with VS Code, you first need to install a plugin: you download a VSIX file from OpenAI’s website, install it in VS Code, and then it works. Note that this plugin is not in the VS Code extension marketplace, which looks like a rushed, cut-corners move. Logically it should have been published to the marketplace, but it wasn’t. Maybe they were in too much of a hurry; or maybe, since they want to grab GitHub Copilot’s lunch and VS Code belongs to Microsoft, publishing it would have meant Microsoft nitpicking the review, saying “No, this isn’t safe enough, go back and fix it,” and dragging things out. So they simply shipped the feature and built their own plugin for you to sideload.
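If you prefer to script the install, a minimal sketch looks like the following; the VSIX file name is made up, but the `code --install-extension` flag is part of the standard VS Code command-line interface.

```python
# Minimal sketch: install a downloaded VSIX into VS Code from a script.
# The file name below is hypothetical; replace it with the file you downloaded
# from OpenAI's site. Requires the `code` command to be on your PATH.
import subprocess

subprocess.run(
    ["code", "--install-extension", "chatgpt-vscode.vsix"],
    check=True,  # raise if VS Code reports an installation error
)
```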
One catch: it only works locally. What do I mean? In VS Code you can edit local code, or you can edit code on a remote cloud server over SSH. But editing remote code means installing the plugin on the remote side too, and this one can’t do that, so it only works locally. It can also only see the code in the current window; VS Code can have many windows open, and it only reads the active one.
If you select some code in the current window, it will prioritize the highlighted selection when analyzing. As for the code it generates, say I ask where my bug is or what’s wrong with the code, it will produce plenty of code to explain and solve the problem. But all of that code has to be copied and pasted by hand: it won’t apply the changes to your program directly, and it won’t send commands straight to the Terminal, the command-line terminal, to execute. You have to manually paste its suggested code or commands from the chat window into the right place and hit Enter. That’s one of the things that feels bad.
So does this replace GitHub Copilot? After a few tries I switched back to GitHub Copilot. Why? First, ChatGPT can’t modify the code directly; every time you have to copy, paste, and make the change yourself, and it’s easy to get it wrong. Then you have to ask it again whether the change was right, which makes the workflow very clumsy. That’s the first reason. Second, it can’t work with remote code, which is unacceptable to me: I often need to work with code on a cloud server, and it only works locally. Third, it can’t use different models. With Cursor or GitHub Copilot you can now choose among different models, but OpenAI’s ChatGPT only offers OpenAI’s own models. Want Anthropic’s Claude? Google’s Gemini? Nope, not an option, and that makes it much less enjoyable.
The other really annoying thing is that it can only deal with the current file, not the whole workspace. A real program has a complex directory structure, what we call the workspace, with many files in it. If it can’t base its suggestions on the entire workspace and only looks at the file I currently have open, the suggestions are close to meaningless; the more you change the program that way, the messier it gets.
We push for each source file to be as short as possible, with the pieces working together, rather than one exceptionally long file, because very long files are hard to maintain. If ChatGPT’s app only pulls the code in the current window into the prompt, it’s going to do a poor job. My Cursor was out of date, so I didn’t go back and test it. My main programming tool now is GitHub Copilot, and in terms of user experience Cursor is probably the best, followed by GitHub Copilot, with the ChatGPT integration we’re discussing today bringing up the rear.
Why do you think ChatGPT chose this path, going head-to-head with GitHub Copilot and Cursor despite clearly doing a worse job? In fact, this is what many of the big-model companies have been aiming for: the all-in-one assistant. What does that mean? It means you don’t need to open GitHub Copilot when you’re programming, or Office Copilot, or a Copilot in your browser, or an OS-level Copilot; one assistant covers everything. That’s what these big-model companies, whether Anthropic or OpenAI, want to do.
So look at the app: it can take screenshots, take photos, upload files, hear your voice, and now lift code straight out of the IDE window. The message is, “I’ve covered it all, you don’t need anyone else.” But there’s a dilemma: it can’t operate the computer directly. Didn’t Anthropic demonstrate operating a computer directly? At an event recently someone asked me whether it was scary that Claude could just operate the computer. I said it’s not that scary. First, what Anthropic demonstrated is a lab version; they don’t dare hand that kind of thing to users directly. Because once you give it to users, never mind malice, what happens when it breaks something? How do you fix that? You can’t.
Every IT person in a company dreads the call: “My computer isn’t working, I can’t find this program, that thing won’t start.” You have to go on site, see what the environment looks like right then, and it’s a painful process. That’s why Claude isn’t allowed to operate the computer directly. What users get today is Claude reading things off the computer while you do all the copying and pasting yourself. ChatGPT works the same way, so it can’t operate the computer either.
So what exactly is user experience here, or rather, what is the user experience of a big-model application? There are two key points. The first is finding the right content, and the second is operating within predictable limits.
Let’s start with the first one: finding the right content. What does that mean? It means that to solve a problem, the model needs everything relevant to that problem. As we just said, if you want to change a program, you need all the code in the workspace before you can change it sensibly. What else do you need? A lot of environment information. Is this machine an Ubuntu host on Oracle Cloud, an Oracle Linux host, or something else? What CPU and how much RAM does it have? Is it an ARM CPU or an Intel CPU? You have to tell the big model all of that, otherwise much of its advice will be wrong, so you need all of that information.
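As a concrete illustration of the kind of environment summary you’d want to hand the model up front, here is a small sketch using only the Python standard library; the exact fields are my own choice, not anything the app actually collects.

```python
# Sketch: gather basic host facts to prepend to a prompt, so the model's advice
# matches the actual OS, CPU architecture, and core count. Standard library only;
# the field names are illustrative.
import os
import platform

env_summary = {
    "os": platform.system(),          # e.g. "Linux" or "Darwin"
    "os_release": platform.release(),
    "cpu_arch": platform.machine(),   # e.g. "x86_64" or "arm64"
    "cpu_count": os.cpu_count(),
    "python": platform.python_version(),
}

prompt_preamble = "My environment: " + ", ".join(f"{k}={v}" for k, v in env_summary.items())
print(prompt_preamble)
```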
The second point: the model can’t be distracted by useless information. What does that mean? Let me tell you a little story. Back when antivirus software scanned a hard disk, do you know what it feared most? Other antivirus programs. Why? Because antivirus works by comparing what it reads against a virus signature database: read some code, compare it against the signatures, decide whether it’s a virus. But when it ran into another antivirus program installed on the same disk, that program had its own signature database. Compare one signature: identical to mine. Compare the next: also identical. Maybe the last few bytes differ, so it’s not a complete virus, let it go.
Then the next one, the same thing again, because that database contains signatures for every known virus. The scanner crawls through it and concludes, “Wonderful, your virus database is full of viruses.” That’s being interfered with by useless information. Our hard disks are the same: they hold all kinds of information. When the big model needs to answer a question for me, it can’t just grab everything on my disk and use it. Some things I want it to see, some things I don’t, and some things I just don’t want used as reference this time around. That has to be clear.
One more thing, and it’s critical: don’t scare the user. If the big model says, “Now that I’m here, I’m going to search everything on your hard disk and solve your problems for you,” the user says, “Forget it, never mind.” Everyone has some “study materials,” or folders they’d rather not have opened and interpreted, maybe a private stash, the odd little movie and so on. The model has to let the user know it’s working within the bounds of what the user has permitted, and not frighten them.
That’s finding the right content. What about the second point, operating within predictable limits? What does that mean? It starts with getting things right, and that’s actually hard. Why? Whether it’s today’s GitHub Copilot, ChatGPT, Claude, or a product like Cursor, the answers they give are rarely right on the first try. The reason is simple: the information they receive isn’t complete enough, the environment information is incomplete, and the rest of the code hasn’t been examined.
Even if you tell it to work from all of my code, it can’t really do that, because big models have context windows. You can’t just cram an entire program’s code into the prompt; that’s extremely laborious. Even if the context window were large enough, you’d still pay a lot of overhead in network transfer, which is another headache.
So how do today’s big-model tools deal with a whole workspace? They do embedding. They take all your code and run it through an embedding process, and then, based on your instructions, they look up the relevant pieces among the embedded code and answer from those. That’s the only practical way to do it right now, so that’s how they produce the answer.
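Here is a deliberately simplified sketch of that embed-and-retrieve pattern, using the OpenAI Python SDK; the chunking, the toy “workspace,” and the plain cosine scoring are my own simplifications, not how any particular vendor implements it.

```python
# Simplified sketch of embedding-based retrieval over a workspace: embed each code
# chunk once, embed the question, then pick the most similar chunk to include in
# the prompt. Illustrates the general pattern only.
from openai import OpenAI

client = OpenAI()

def embed(texts):
    resp = client.embeddings.create(model="text-embedding-3-small", input=texts)
    return [item.embedding for item in resp.data]

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = (sum(x * x for x in a) ** 0.5) * (sum(y * y for y in b) ** 0.5)
    return dot / norm

# Toy "workspace": in practice these would be real files split into chunks.
chunks = [
    "def load_config(path): ...",
    "def connect_db(url): ...",
    "def parse_args(argv): ...",
]
chunk_vecs = embed(chunks)

question = "Where is the database connection set up?"
q_vec = embed([question])[0]

best = max(range(len(chunks)), key=lambda i: cosine(q_vec, chunk_vecs[i]))
print("Most relevant chunk:", chunks[best])
```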
And honestly, the answer is usually wrong. When you get it, you have to judge it yourself and tweak it until it’s right. So getting it right the first time is hard. The second part? You still can’t frighten the user. Take a command like `rm -rf *`: it recursively deletes everything under the current directory with no confirmation, and run in the wrong place it can wipe out nearly everything. Executing commands carries a lot of risk like that. That’s why neither ChatGPT nor Claude dares to execute commands directly these days: they’re afraid of scaring users. And if the model really were allowed to do the work, having no remedy when something goes wrong would also be a huge cost.
Why? Look at what the operating manuals for regular engineers or network engineers say: before any upgrade, before any operation, make a backup. Only once the backup is done do you perform the operation, so you can restore if something goes wrong. But the overhead of doing that is very high. So for now, neither of them dares to let the model operate directly.
So, is this a pure vision approach? Let’s think a bit further: what do I mean by a pure vision approach? When Tesla pushed Autopilot, it pushed a pure vision solution. While everyone else was still researching millimeter-wave radar and lidar, Musk said no, I want pure vision, I’m not playing that game. What does pure vision stand against? Against all kinds of radar, and against vehicle-road collaboration. Radar is the relatively simple play: put millimeter-wave radar and lidar on the car, that’s straightforward. What is vehicle-road collaboration? It’s adding all kinds of sensors, processing terminals, and markings to the road itself, so your car can glide along a smart highway unimpeded.
However, that road doesn’t seem to be the mainstream direction now. Even China’s new EV makers haven’t gone down the vehicle-road collaboration path; everyone is still trying to solve the problem entirely inside the car. So what does this have to do with the ChatGPT macOS client we’re talking about today? By analogy, what do desktop clients like ChatGPT and Claude do? They bolt eyes, ears, a mouth, arms, and legs onto a desktop application. Meaning what? It started out as a chat tool, but now we want the chat tool itself to see the screen on your computer.
It can take screenshots, see your code, hear you and talk to you, and even perform certain actions. In this analogy, the pure vision approach is exactly what ChatGPT and Claude are using right now. What does that mean? It means: we look at what the user can see and forget about the parts the user can’t see. With the camera, screenshots, and IDE integration, no other modifications are needed; do the simplest integration and it’s usable. Or even just grant a permission and it’s usable, with no deep integration with the operating system vendors. If you want to do genuinely complex low-level operations, the OS vendor has to grant you permission first; otherwise you simply can’t.
Ultimately, the goal is to work the way a human does. What does that mean? Why does Musk push pure vision? Two reasons, I think. The first is that you’ll never sort things out if you have to negotiate with highway authorities in every country to add markings, signs, or sensors for you. What happens when you drive onto a highway that hasn’t been retrofitted with sensors, does your smart driving just stop working? That’s one reason: we don’t want to deal with you, we’ll settle everything inside the car and be done with it.
What’s the second reason? He wants the car to work like a human, because the big model is still iterating and upgrading, and you need a direction to upgrade toward. If I upgrade a model built around vehicle-road collaboration, lidar, and millimeter-wave radar, is that upgrade even the right move? It’s hard to measure. But if I say, let’s move toward the human: how do people drive? How do I drive? People don’t have millimeter-wave radar or lidar or any vehicle-road collaboration. Wherever we go, whether I recognize the road or not, I can still drive it. Train toward that and you’re done. That’s the core underlying logic of a pure vision system: take the human as the standard and keep moving in that direction.
So that’s the pure vision approach. What does the vehicle-road collaboration approach look like? It’s the route GitHub Copilot took, along with the various operating systems: the AI PCs, the AI phones, the new Windows 11 with Copilot built in.
macOS and the iPhone 16 with Apple Intelligence, and the various Android phones in China that claim to ship with a built-in large model: that’s the classic vehicle-road collaboration setup. In that approach you have to modify the infrastructure. Microsoft is fine with it: “I own the operating system, I’ll just change the operating system.” Apple says, “I’ll change the operating system too.” The Android phone makers, Xiaomi, Huawei, and the rest, say, “We’ll also modify things directly at the bottom of the operating system so the user experience is the best.” And that’s how it has played out.
Here the OS vendors and the IDE vendors have natural advantages. If you’re a big-model vendor you’re stuck, because you can’t see how the OS works inside. And even if you could figure it out, the vendor won’t grant you permission, and anything that goes beyond its permissions gets treated as a virus. That’s how things stand.
So do we need this pure vision approach on the computer? Is it the right path? You could say that when Musk built FSD on pure vision, everyone came to see it as the right path, so shouldn’t we do the same on computers? Beyond how cooperative the surrounding environment is, the key test is the direction the model needs to develop in. Autonomous driving is actually the relatively simple case, and pure vision can handle it: you just aim in one direction, getting the self-driving car up to the level of a human driver. It doesn’t need to reach Formula 1 level, or the level of a master mechanic; the level of an ordinary driver is enough.
So autonomous driving can go pure vision, but on the computer it won’t work. Why? Because on the computer we need an expert mode: we need help with plenty of problems we can’t solve ourselves. That’s different from the driving case. For driving, the model only needs to become an ordinary driver; for assisting us on the computer, the model needs to reach a level higher than ours. If that’s the goal, pure vision isn’t the right fit.
So what dilemma do OpenAI and Anthropic face now? What do they need to overcome? The operating system vendors’ unwillingness to cooperate with them. They’re saying, “I want to put more features in users’ hands, I can obviously do more, I can do better.”
But you can’t do it without the OS vendor’s cooperation. User numbers go up, but users can’t use the full feature set; they’re mostly just chatting with you, so willingness to pay stays low, and the chance of these two companies reaching profitability is basically zero. That’s the problem they’ve run into. The original assumption was, “Microsoft, just pay OpenAI the licensing fees like a good partner and I’ll let you use the models.” Instead, GitHub Copilot can now use not only OpenAI’s GPT-4 but also Anthropic’s Claude and Google’s Gemini.
So OpenAI said fine: you won’t hang yourselves from one tree, so I won’t hang myself from one tree either; we’ll each hedge our bets. You’re not loyal, we’re not loyal, and that’s how it is. In the end, the big-model vendors are still on the road to solving user needs. Whether their business problems get solved, whether they ever make money, that’s not really our concern. But if you have no way to meet user needs, the road is destined to be a dead end.
Right now there are two roads, as we just said: vehicle-road collaboration, and pure vision. The pure vision road is, I close the door and handle everything myself; you don’t have to worry about me or cooperate with me, I can take care of it on my own. The vehicle-road collaboration road is, I change the infrastructure from the bottom up. Today that road means the AI PCs and AI phones.
On the pure vision side, OpenAI’s ChatGPT and Anthropic’s Claude are trying to go this independent route: I don’t need you to cooperate, I’ll handle it myself; I’ll see what the user can see and solve problems within that view, and whatever the user can’t see, I can’t see either. Which way will win out? We’ll see.
Well, that’s the story for today, thanks for listening. Please help out by liking the video, hitting the little bell, and joining the Discord discussion group. Anyone interested and able is also welcome to join our paid channel. See you soon.