>UI stuff just has an input problem. But it is not that hard to think that ChatGPT could place widgets once it can consume images and has a way to move a mouse.
I remember one of the OpenAI guys on the Lex Fridman podcast talking about how one of the early things they tried and failed at was training a model that could use websites, and he alluded to maybe giving it another go once the tech had matured a bit.
I think with GPT-4 being multi-modal, it's potentially very close to being able to do this with the right architecture wrapped around it. I can imagine an agent built with LangChain that you feed a series of screenshots, and it feeds back a series of coordinates for where the mouse should go and what action to take (e.g. click). Alternatively, the model itself could be updated to produce those outputs directly somehow.
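To make that concrete, here's a rough sketch of what a single step of such an agent could look like in Python. Everything here is an assumption on my part: the model name, the prompt, the JSON reply format, and using pyautogui for the screenshot and click are just one way you might wire it up, not how OpenAI or LangChain actually do it.

    # Rough sketch only: model name, prompt, and the assumption that the
    # model replies with clean JSON are all mine.
    import base64
    import io
    import json

    import pyautogui                # screenshots + mouse control
    from openai import OpenAI

    client = OpenAI()

    def agent_step(goal: str) -> None:
        # Grab the current screen and encode it for the vision model
        shot = pyautogui.screenshot()
        buf = io.BytesIO()
        shot.save(buf, format="PNG")
        image_b64 = base64.b64encode(buf.getvalue()).decode()

        # Ask the model where the mouse should go next
        resp = client.chat.completions.create(
            model="gpt-4o",         # any vision-capable model; name is a placeholder
            messages=[{
                "role": "user",
                "content": [
                    {"type": "text",
                     "text": f"Goal: {goal}. Reply with JSON like "
                             '{"x": 100, "y": 200, "action": "click"}.'},
                    {"type": "image_url",
                     "image_url": {"url": f"data:image/png;base64,{image_b64}"}},
                ],
            }],
        )

        # Act on the model's answer
        step = json.loads(resp.choices[0].message.content)
        if step["action"] == "click":
            pyautogui.click(step["x"], step["y"])

In practice you'd run this in a loop, feed the model the history of previous steps, and validate the JSON before acting on it, but the basic screenshot-in / coordinates-out shape is the same.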
Either way, I think that's going to happen.