Probing Compositional Understanding of ChatGPT with SVG

Foundational models can generate realistic images from prompts, but do these models *understand* their own drawings? Generating SVG (Scalable Vector Graphics) gives us a unique opportunity to ask this question. SVG is programmatic, consisting of primitives such as circles, rectangles, and lines. To draw an object, the model must schematically decompose it into meaningful parts, approximate each part with simple shapes, and then arrange the parts together in a coherent way.

Compared to generating a drawing of a bicycle with a pixel-based representation (e.g. a diffusion model), generating SVG forces the model to explain what it is drawing in code (i.e. *symbolically*) first.

![](https://miro.medium.com/v2/resize:fit:700/1*XnKKQvOUKfvHOlXAZ9cJoQ.png)

**tl;dr**: I asked ChatGPT to draw 100+ objects in SVG with explanations. Browse them all [at this url](https://evanthebouncy.github.io/chatgpt-svg/). Raw data [.tsv here](https://github.com/evanthebouncy/chatgpt-svg/blob/master/data.tsv). Just look at them! Aside from being fun, they tell us quite a lot about LLMs.

[**Visit Now**](https://evanthebouncy.medium.com/probing-compositional-understanding-of-chatgpt-with-svg-74ec9ca106b4)
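To make the compositional framing concrete, here is a minimal hand-written SVG sketch of a bicycle. This is an illustrative example of the decomposition described above, not one of ChatGPT's actual outputs: the wheels become circles, the frame becomes lines, and the seat becomes a rectangle.

```xml
<!-- Hand-written illustrative sketch: each visual part of the bicycle maps to one SVG primitive. -->
<svg xmlns="http://www.w3.org/2000/svg" width="200" height="120" viewBox="0 0 200 120">
  <!-- wheels: two circles -->
  <circle cx="50" cy="90" r="25" fill="none" stroke="black" stroke-width="3"/>
  <circle cx="150" cy="90" r="25" fill="none" stroke="black" stroke-width="3"/>
  <!-- frame: lines joining the wheels, the seat post, and the handlebar post -->
  <line x1="50" y1="90" x2="95" y2="90" stroke="black" stroke-width="3"/>
  <line x1="50" y1="90" x2="75" y2="45" stroke="black" stroke-width="3"/>
  <line x1="75" y1="45" x2="120" y2="45" stroke="black" stroke-width="3"/>
  <line x1="95" y1="90" x2="120" y2="45" stroke="black" stroke-width="3"/>
  <line x1="120" y1="45" x2="150" y2="90" stroke="black" stroke-width="3"/>
  <line x1="75" y1="45" x2="70" y2="30" stroke="black" stroke-width="3"/>
  <line x1="120" y1="45" x2="130" y2="28" stroke="black" stroke-width="3"/>
  <!-- seat: a small rectangle on top of the seat post -->
  <rect x="58" y="26" width="24" height="6" fill="black"/>
  <!-- handlebars: a short horizontal line -->
  <line x1="122" y1="28" x2="140" y2="28" stroke="black" stroke-width="3"/>
</svg>
```

Every shape here has a nameable role, which is exactly the kind of part-by-part explanation the prompts in this experiment ask ChatGPT to produce alongside its drawings.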
Tags: ChatGPT SVG