OpenAI has announced GPT-4, their latest deep learning model, which accepts image and text inputs and emits text outputs. While less capable than humans in many real-world scenarios, GPT-4 exhibits human-level performance on various academic and professional benchmarks. After six months of iteratively aligning the model using lessons from adversarial testing and from ChatGPT, GPT-4 has demonstrated improved factuality, steerability, and adherence to guardrails.
Over the past two years, OpenAI has rebuilt their entire deep learning stack and co-designed a supercomputer with Azure specifically for their workload. The GPT-4 training run was unprecedentedly stable, making it their first large model with accurately predictable training performance. OpenAI aims to continue scaling reliably and hone their methodology to predict and prepare for future capabilities well in advance for safety purposes.
GPT-4’s text input capability is available through ChatGPT and the API with a waitlist, and OpenAI is collaborating closely with a single partner to prepare the image input capability for wider availability. OpenAI is also open-sourcing OpenAI Evals, their framework for automated evaluation of AI model performance, to allow users to report shortcomings and guide further improvements.
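For a sense of what API access looks like in practice, the sketch below assembles a chat-style request body for a GPT-4 call. The text-only shape follows the public chat format; the multimodal message structure (a `text` part plus an `image_url` part) is an assumption for illustration, since image input was limited to a single partner at announcement time.

```python
from typing import Optional

def build_request(prompt: str, image_url: Optional[str] = None) -> dict:
    """Assemble a chat-style request body for a hypothetical GPT-4 call.

    The multimodal content-part layout below is an assumption, not a
    documented format from the announcement.
    """
    if image_url is None:
        content = prompt  # plain text message
    else:
        # Hypothetical multimodal message: a text part plus an image part.
        content = [
            {"type": "text", "text": prompt},
            {"type": "image_url", "image_url": {"url": image_url}},
        ]
    return {
        "model": "gpt-4",
        "messages": [{"role": "user", "content": content}],
    }

text_req = build_request("Summarize the attached report.")
multi_req = build_request("What is in this image?", "https://example.com/cat.png")
```

Only the request shape is shown here; actually sending it requires API access granted through the waitlist.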
GPT-4 is more reliable, more creative, and able to handle more nuanced instructions than its predecessor, GPT-3.5, particularly on complex tasks. OpenAI tested GPT-4 on a range of benchmarks, including simulated exams originally designed for humans, and found it more capable than GPT-3.5. A minority of the exam problems were seen by the model during training, but OpenAI believes the results are representative.
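The benchmark-style testing described above can be illustrated with a minimal exact-match grader in the spirit of the open-sourced OpenAI Evals framework. This is a simplified sketch, not the actual Evals API; the function names, sample format, and toy model are all assumptions for illustration.

```python
# Simplified illustration of an automated evaluation: compare a model's
# completion against an ideal answer for each sample and report accuracy.

def exact_match_accuracy(samples, complete_fn):
    """Return the fraction of samples whose completion matches the
    ideal answer after trimming surrounding whitespace."""
    correct = 0
    for sample in samples:
        output = complete_fn(sample["input"])
        if output.strip() == sample["ideal"].strip():
            correct += 1
    return correct / len(samples)

# Toy stand-in for a model, so the harness can be exercised offline.
def toy_model(prompt: str) -> str:
    return {"2+2=": "4", "Capital of France?": "Paris"}.get(prompt, "")

samples = [
    {"input": "2+2=", "ideal": "4"},
    {"input": "Capital of France?", "ideal": "Paris"},
    {"input": "Largest planet?", "ideal": "Jupiter"},
]
score = exact_match_accuracy(samples, toy_model)  # 2 of 3 correct
```

Real evals in the framework are more elaborate (model-graded rubrics, templated prompts), but the core loop is the same: run the model over a dataset and score each output, which is exactly how users can report shortcomings reproducibly.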