OpenAI has released its latest AI model, GPT-4, which can understand both text and images. The model is available to OpenAI’s paying users via ChatGPT Plus, and developers can join a waitlist for API access. GPT-4 is priced at $0.03 per 1,000 prompt tokens and $0.06 per 1,000 completion tokens. Companies such as Stripe, Duolingo, Morgan Stanley, and Khan Academy are already using the model for various applications.
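At those rates, the cost of a request is easy to estimate from its token counts. The sketch below is purely illustrative (the `estimate_cost` helper is not part of any OpenAI library), using the per-token prices quoted above:

```python
def estimate_cost(prompt_tokens: int, completion_tokens: int) -> float:
    """Estimate the USD cost of a single GPT-4 request.

    Rates are those quoted in this article: $0.03 per 1,000 prompt
    tokens and $0.06 per 1,000 completion tokens.
    """
    PROMPT_RATE = 0.03 / 1000       # USD per prompt token
    COMPLETION_RATE = 0.06 / 1000   # USD per completion token
    return prompt_tokens * PROMPT_RATE + completion_tokens * COMPLETION_RATE

# Example: a request with 1,500 prompt tokens and 500 completion tokens
print(f"${estimate_cost(1500, 500):.3f}")
```

Since completion tokens cost twice as much as prompt tokens, long generated outputs dominate the bill for verbose responses.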
GPT-4 can generate text and accepts both image and text inputs, an improvement over its predecessor GPT-3.5, which accepted only text. It performs at a “human level” on various professional and academic benchmarks. OpenAI spent six months “iteratively aligning” GPT-4 using lessons from an internal adversarial testing program as well as from ChatGPT, resulting in “best-ever results” on factuality, steerability, and refusing to go outside of guardrails.
GPT-4’s ability to understand images is one of its most interesting aspects. It can caption and interpret relatively complex images, such as identifying a Lightning Cable adapter from a picture of a plugged-in iPhone. The image understanding capability is currently being tested with a single partner, Be My Eyes. Powered by GPT-4, Be My Eyes’ new Virtual Volunteer feature can answer questions about images sent to it. For example, if a user sends a picture of the inside of their refrigerator, the Virtual Volunteer will not only correctly identify what’s in it but also extrapolate and analyze what can be prepared with those ingredients.
Another improvement in GPT-4 is its steerability tooling. With GPT-4, OpenAI is introducing a new API capability, “system” messages, which lets developers prescribe the model’s style and task by giving it specific directions. System messages are essentially instructions that set the tone and establish boundaries for the AI’s subsequent interactions.
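To illustrate, the snippet below builds a chat request payload in the shape the API expects, with a system message constraining the assistant before the user’s turn. It only constructs the payload (no network call is made), and the Socratic-tutor prompt is a hypothetical example, not from the article:

```python
import json

# A "system" message sets the tone and boundaries for the conversation;
# later "user" messages are interpreted within those constraints.
payload = {
    "model": "gpt-4",
    "messages": [
        {
            "role": "system",
            "content": (
                "You are a Socratic tutor. Never give the student the "
                "answer directly; respond only with guiding questions."
            ),
        },
        {"role": "user", "content": "How do I solve 3x + 5 = 14?"},
    ],
}

print(json.dumps(payload, indent=2))
```

Because the system message sits outside the user’s turn, it gives developers a consistent lever on behavior without having to repeat instructions in every prompt.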
Despite GPT-4’s improvements, OpenAI acknowledges that the model is far from perfect. It still “hallucinates” facts and makes reasoning errors, sometimes with great confidence. GPT-4 generally lacks knowledge of events that occurred after the vast majority of its training data cuts off (September 2021) and does not learn from experience. It can make simple reasoning errors that seem inconsistent with its competence across so many domains, or be overly gullible, accepting obviously false statements from a user. It can also fail at hard problems the same way humans do, such as introducing security vulnerabilities into the code it produces.
In conclusion, GPT-4 is a significant improvement over its predecessor, with its ability to understand both text and images and its steerability tooling. However, it is far from perfect and still has limitations, such as its lack of knowledge of recent events and its tendency to make reasoning errors.