Introducing an AI Model That Can Make Creative Connections Puzzles

Every day, millions of players engage with Connections, a popular category-matching game from The New York Times. It launched in mid-2023 and accumulated 2.3 billion plays within its first six months. The game challenges players to sort 16 words into four themed groups while making no more than four mistakes, offering a blend of simplicity and intrigue.

Players enjoy the game because it calls on abstract reasoning and semantic understanding to find connections among words. Creating these puzzles, however, is a complex task. Researchers from New York University recently evaluated OpenAI’s GPT-4 large language model (LLM) for its ability to generate engaging and inventive puzzles. Their study, released as a preprint on arXiv in July, found that although LLMs struggle with metacognition, that is, with predicting how players will reason, they can still produce puzzles comparable in quality to those from The New York Times when given careful prompting and narrowly scoped tasks.

Lead author Timothy Merino, a Ph.D. student at NYU’s Game Innovation Lab, noted that LLMs don’t fully grasp how humans think, which limits their ability to gauge puzzle difficulty. Nonetheless, these models have a strong command of language because they are trained on vast amounts of text.

The researchers first explored the fundamental mechanics and appeal of the game. While some word groups might be familiar, the real challenge lies in dealing with misleading words that create ambiguous categories. These “red herrings” are integral to the game’s complexity.
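To make the red-herring idea concrete, here is a minimal sketch in Python. It is not taken from the NYU paper: the function name `find_red_herrings`, the two-category board, and the hand-written "plausible" word pools are all hypothetical, chosen only to illustrate how a word can fit more than one category.

```python
from typing import Dict, List, Set


def find_red_herrings(
    board: Dict[str, List[str]],
    plausible: Dict[str, Set[str]],
) -> Dict[str, List[str]]:
    """Map each board word to the other categories it could plausibly fit.

    board:     category name -> the words assigned to it on the board
    plausible: category name -> every word a player might think fits it
               (hypothetical pools, written by hand for this example)
    """
    herrings: Dict[str, List[str]] = {}
    for category, words in board.items():
        for word in words:
            # A word is a decoy if it also appears in another category's pool.
            others = sorted(
                name
                for name, pool in plausible.items()
                if name != category and word in pool
            )
            if others:
                herrings[word] = others
    return herrings


# Illustrative data only: two of the four categories a full puzzle would have.
board = {
    "Fish": ["BASS", "SOLE", "PIKE", "PERCH"],
    "Shoe parts": ["HEEL", "TONGUE", "LACE", "EYELET"],
}
plausible = {
    "Fish": {"BASS", "SOLE", "PIKE", "PERCH"},
    "Shoe parts": {"HEEL", "TONGUE", "LACE", "EYELET", "SOLE"},
}

print(find_red_herrings(board, plausible))  # {'SOLE': ['Shoe parts']}
```

Here SOLE belongs to the fish group but could just as easily be read as a shoe part, which is the kind of overlap that makes a board harder to solve.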

The study mirrored the approach of Connections creator and editor Wyna Liu, who incorporates decoys to enhance puzzle difficulty. Senior puzzle editor Joel Fagliano emphasized that recognizing these decoys is a challenging skill, with more overlap making puzzles tougher.

The NYU paper identifies three key difficulty factors from Liu’s approach: word familiarity, category ambiguity, and wordplay variety, presenting a unique challenge for LLM systems.

To test GPT-4, the researchers provided game rules and examples, then asked the model to create new puzzles. They found it challenging to develop a comprehensive ruleset for GPT-4 to follow consistently. Adding more rules did not improve results, as the model often overlooked them.

Instead, success came from breaking the task into smaller segments. One LLM generated puzzles through iterative prompting, another edited the categories, and a human evaluator selected the best results. This approach allowed each LLM to focus on specific rules without needing a full understanding of the game.
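The division of labor described above can be sketched roughly as follows. This is an illustration under stated assumptions, not the researchers' code: the `LLM` type stands in for any function that takes a prompt and returns text, the prompt wording is invented, and `generate_candidates` and `select_best` are hypothetical names.

```python
from typing import Callable, List

# Assumption: an "LLM" here is any callable that maps a prompt to a reply.
LLM = Callable[[str], str]


def generate_candidates(generator: LLM, editor: LLM, rules: str, n: int) -> List[str]:
    """Produce n candidate puzzles: one model drafts, a second model edits."""
    candidates: List[str] = []
    for i in range(n):
        # Step 1: the generator sees only the rules relevant to drafting.
        draft = generator(f"{rules}\nDraft puzzle {i + 1}: four categories of four words.")
        # Step 2: the editor sees only the draft and its own narrow instruction.
        revised = editor(f"Revise the category names in this draft:\n{draft}")
        candidates.append(revised)
    return candidates


def select_best(candidates: List[str], choose: Callable[[List[str]], int]) -> str:
    """Step 3: a human (represented by the `choose` callback) picks one candidate."""
    return candidates[choose(candidates)]


# Stub models so the sketch runs without any external service.
def stub_generator(prompt: str) -> str:
    return "draft"


def stub_editor(prompt: str) -> str:
    return prompt.upper()


puzzles = generate_candidates(stub_generator, stub_editor, "Rules go here.", 2)
best = select_best(puzzles, lambda options: 0)
```

Keeping each stage behind its own prompt means neither model has to hold the full ruleset at once, which matches the article's point that piling on rules did not help.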

To assess the model’s effectiveness, the researchers collected feedback from 52 players who compared GPT-4-generated puzzles with real Connections puzzles. The results indicated that GPT-4 could produce puzzles that players found at least as difficult and engaging as the originals.

Greg Durrett, a computer science professor at the University of Texas at Austin, praised the study as a valuable benchmark for future work on semantic groupings and puzzle creation. While LLMs might not match human creativity, they can assist puzzle designers by providing expansive word pools and unique insights.

Julian Togelius, Director of NYU’s Game Innovation Lab, suggested that the methods used could be applied to other games like Codenames, which also involves finding commonalities between words. Although LLMs may not fully replicate human creativity, Merino believes they will be valuable tools for puzzle creators, offering quick and diverse word suggestions.
