AI Learns to Draw Like a Human: The SketchAgent Revolution from MIT and Stanford

6 minute read

Artificial intelligence is no longer content with generating spectacular images: it is now tackling the art of sketching, that spontaneous and universal mode of expression that lets ideas be communicated in just a few strokes. MIT and Stanford University have just taken a major step forward with SketchAgent, a system capable of drawing sequentially, “stroke by stroke,” much as a human would. Here is a look at this breakthrough and the prospects it opens up for human-machine collaboration, creativity, and education.


Why teach AI to sketch like we do?

Sketching is much more than a simple drawing: it is a tool for reflection, communication, and problem-solving. In everyday life, we sketch out a plan, diagram a circuit, or doodle to explain an abstract idea. Yet until now, AIs have mainly excelled at generating finished images, whether realistic or stylized, while remaining incapable of reproducing the iterative, progressive creative process of human sketching (MIT News).


SketchAgent: An AI that draws “stroke by stroke”

Developed by MIT CSAIL and Stanford, SketchAgent uses a multimodal language model (such as Claude 3.5 Sonnet) to transform natural language instructions into a succession of strokes on a virtual grid. The major innovation: the AI is not trained on massive databases of human drawings, but learns to “think” of drawing as a sequence of actions, with each stroke identified and described according to its function (door, window, etc.) (MIT News, arXiv).
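
To make this idea concrete, here is a minimal, purely illustrative Python sketch of a drawing represented as an ordered sequence of labeled strokes on a virtual grid. The `Stroke` structure, labels, and coordinates below are assumptions chosen for illustration and do not reflect SketchAgent’s actual sketching language.

```python
# Illustrative only: a hypothetical data structure for "stroke by stroke" drawing,
# not SketchAgent's internal format.
from dataclasses import dataclass

@dataclass
class Stroke:
    label: str                     # what the stroke represents, e.g. "door", "window"
    points: list[tuple[int, int]]  # cell coordinates on a virtual grid, in drawing order

# A house described as an ordered sequence of labeled strokes.
house = [
    Stroke("walls",  [(2, 2), (8, 2), (8, 7), (2, 7), (2, 2)]),
    Stroke("roof",   [(2, 7), (5, 10), (8, 7)]),
    Stroke("door",   [(4, 2), (4, 5), (6, 5), (6, 2)]),
    Stroke("window", [(7, 5), (7, 6), (8, 6), (8, 5), (7, 5)]),
]

for stroke in house:
    print(f"{stroke.label}: {stroke.points}")
```

The key point is the ordering: instead of emitting a finished bitmap in one shot, the system commits to one meaningful stroke at a time, much as a person would.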

This sketching language allows SketchAgent to collaborate with a human: the user can ask the AI to add part of the drawing, or draw directly on the canvas themselves, with the AI then taking over to complete or correct the work. This collaborative mode opens the door to interactive educational tools, creative games, or assistants for rapidly designing complex diagrams.
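
Such a turn-taking exchange could be sketched roughly as in the snippet below. `request_strokes` and `human_turn` are hypothetical placeholders standing in for the model call and the user’s input; they are not functions from the SketchAgent codebase.

```python
# Hypothetical collaboration loop: the human and the model alternately add
# strokes to a shared canvas. These helpers are NOT a real SketchAgent API.
def request_strokes(instruction: str, canvas: list) -> list:
    """Placeholder: would ask a multimodal model for the next strokes, given the canvas so far."""
    return [{"label": instruction, "points": [(0, 0), (1, 1)]}]  # dummy output

def human_turn(canvas: list) -> None:
    """Placeholder: in a real tool this would capture strokes drawn by the user."""
    canvas.append({"label": "user stroke", "points": [(3, 3), (4, 4)]})

canvas: list = []
for instruction in ["draw the sailboat hull", "add a mast and sail"]:
    canvas.extend(request_strokes(instruction, canvas))  # AI turn
    human_turn(canvas)                                   # human adjusts or adds
print(f"{len(canvas)} strokes on the shared canvas")
```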


Performance that surpasses traditional models

The team tested SketchAgent with several multimodal models: Claude 3.5 Sonnet proved the most effective at generating smooth, legible sketches, outperforming GPT-4o and Claude 3 Opus. The strength of the approach lies in its ability to generalize: the AI can draw objects or concepts it has never encountered, based solely on text descriptions (MIT News, HuggingFace).

Nevertheless, SketchAgent cannot yet rival a professional illustrator: it excels at abstract or schematic sketches, but struggles with complex shapes, detailed logos, and precise human figures. Collaborative instructions can also be misinterpreted, producing unexpected results such as a two-headed rabbit when the AI and the human misunderstand each other’s share of the task.


A step further toward augmented creativity

This advance is part of a broader trend: the rise of multimodal models, capable of processing text, image, sound, and other data simultaneously to understand and generate increasingly rich and nuanced content. In 2025, these multimodal AIs are becoming the new standard, enabling more natural and intuitive human-machine interactions, whether for artistic creation, education, or engineering (BytePlus, SketchAgent MIT).

Other teams, such as those from the University of Surrey and Stanford, have shown that AI can now recognize and understand sketches made by non-artists, identifying objects and scenes with accuracy close to that of humans. This ability to “read” and “write” the language of sketching opens the door to visual search tools, assisted design, or universal communication, regardless of the user’s artistic skills (ToolPilot).


Human-machine collaboration: toward new uses

Research on artistic co-creation between humans and AI, such as the work of Sougwen Chung or the CollabDraw and DuetDraw projects, shows that the boundary between machine and creator is becoming increasingly porous. AI is no longer just a tool, but a partner in visual dialogue, capable of proposing, adjusting, and even inspiring the creative approach (YouTube – Sougwen Chung, Google Research – CollabDraw, Frontiers in Robotics and AI).


Perspectives and limitations

While SketchAgent marks a key milestone, challenges remain: improving spatial understanding, refining collaboration management, and enabling more detailed or expressive sketches. Researchers are considering enriching learning with synthetic data from diffusion models, or integrating more intuitive interfaces to facilitate graphical dialogue.

Ultimately, these advances could transform the way we communicate, teach, and design: imagine an assistant that helps you sketch out your ideas during a meeting, a tutor that guides students through visual problem-solving, or a creative brainstorming tool accessible to everyone, regardless of artistic skill.


