AI · 5 min read · 20 August 2019

OpenAI Released GPT-2 and the Text It Generated Was Unsettling

When OpenAI released the 774-million-parameter version of GPT-2 in August 2019, the latest step in a months-long staged rollout, it prompted a genuine reckoning with what capable language models actually meant.

OpenAI · GPT-2 · AI · Language Models · Machine Learning


OpenAI had announced GPT-2 in February with an unusual accompanying decision: they would not be releasing the full model. The stated reason was concern about misuse. A model capable of generating coherent, convincing text at scale could be used to produce misinformation, spam, or synthetic content designed to manipulate. The organisation wanted to study the effects of staged release before putting the full capability into the world.

That decision itself became a significant story. Some researchers criticised it as unnecessary and paternalistic. Others argued it was a responsible approach to a genuinely novel risk. The field had never before had to debate, at this level of concreteness, how AI capabilities should be disclosed and released: who should get access, when, and under what conditions.

By August, after releasing progressively larger versions of the model and observing the response, OpenAI released the 774-million-parameter version, the largest made public to date. The concerns about immediate catastrophic misuse had not materialised at the scales they had feared. And the model was good enough that limiting access was itself giving a distorted picture of what was already available to well-resourced actors, who could train their own versions.

Reading the output of GPT-2 that August was a strange experience. Ask it to continue a news article and it would produce text that was grammatically fluent and topically coherent for several paragraphs. It did not understand what it was writing. It was pattern matching at enormous scale. But the output looked like understanding, and at normal reading pace it was hard to immediately identify as machine-generated.
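That "pattern matching at enormous scale" can be made concrete with a deliberately tiny sketch: a bigram model that counts which word follows which in a corpus and samples continuations from those counts. This is not OpenAI's code and is many orders of magnitude simpler than GPT-2's transformer, but it shows the same underlying idea of next-token prediction from learned statistics (all function names here are illustrative):

```python
import random
from collections import Counter, defaultdict

def train_bigram(text):
    """Count, for each word, which words follow it in the corpus."""
    words = text.split()
    model = defaultdict(Counter)
    for prev, nxt in zip(words, words[1:]):
        model[prev][nxt] += 1
    return model

def generate(model, start, length=10, seed=0):
    """Extend a prompt word by sampling each next word in
    proportion to how often it followed the previous one."""
    rng = random.Random(seed)
    out = [start]
    for _ in range(length):
        counts = model.get(out[-1])
        if not counts:  # dead end: the last word never had a successor
            break
        words, weights = zip(*counts.items())
        out.append(rng.choices(words, weights=weights)[0])
    return " ".join(out)
```

Scale the vocabulary from words to subword tokens, replace the count table with a transformer conditioned on a long context window, and train on tens of gigabytes of web text, and you get output that looks like understanding without containing any.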

The practical implications were genuinely uncertain. The model required significant computational resources to run at speed and scale. But the trajectory was clear. Models would get larger and cheaper to run. The text generation capability demonstrated in GPT-2 was not the ceiling. It was closer to the floor of what was coming. That was the thing I kept thinking about in August 2019.
