AI-Pastiche is a carefully curated dataset of 953 AI-generated paintings in well-known artistic styles. It was built from 73 manually crafted text prompts used to test 12 modern image generators, with one or more images generated by each of the selected models. The dataset includes comprehensive metadata describing the details of the generation process.
Prompt: Generate a detailed coastal landscape painting in the Impressionist style movement of the 19th century. The painting depicts a coastal scene with a small boat near the water's edge and distant buildings on the shore. The skyline is dominated by a cloudy sky, suggesting an overcast day. The colors are muted and earthy, conveying a serene yet slightly melancholic atmosphere. The technique used appears to be Impressionism, characterized by short, visible brushstrokes that capture the essence of the scene rather than detailed realism. The focus is on the play of light and color, creating a sense of movement and the transient nature of the landscape.
Metadata Description
The metadata comprise the following columns:
generative model: the model used to generate the image
prompt: the prompt passed as input to the generator
subject: a collection of comma separated tags describing the content (as described in the prompt)
style: the style to be imitated
period: the period the generated image should belong to
generated image: the name of the generated image in the generated_images dataset
Three additional columns provide human metrics collected through extensive surveys. All values lie in the range [0,1].
defects: presence of notable defects and artifacts (0=no evident defects, 1=major problems)
authenticity: perceived authenticity of the sample (see the article for details about this metric)
adherence: adherence of the sample to the prompt request
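The three human metrics make it easy to slice the dataset, e.g. to keep only low-defect, convincingly authentic samples or to compare generators. A minimal sketch with pandas, using the column names described above (the rows and the column name `generative model` with its values here are illustrative, not taken from the actual dataset):

```python
import pandas as pd

# Illustrative rows mimicking the AI-Pastiche metadata schema.
rows = [
    {"generative model": "model_a", "style": "Impressionism",
     "defects": 0.1, "authenticity": 0.8, "adherence": 0.9},
    {"generative model": "model_b", "style": "Cubism",
     "defects": 0.7, "authenticity": 0.3, "adherence": 0.6},
    {"generative model": "model_a", "style": "Impressionism",
     "defects": 0.3, "authenticity": 0.6, "adherence": 0.7},
]
df = pd.DataFrame(rows)

# Keep samples with few visible defects and high perceived authenticity.
clean = df[(df["defects"] < 0.5) & (df["authenticity"] > 0.5)]

# Average human scores per generator.
per_model = df.groupby("generative model")[["defects", "authenticity"]].mean()
print(len(clean))
print(per_model)
```

The same filters apply unchanged if the metadata are loaded from the dataset's own CSV/parquet files instead of constructed in memory.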
Do LLMs perceive Art the same way we do?
Does CLIP perceive art the same way we do? Andrea Asperti, Leonardo Dessì, Maria Chiara Tonetti, Nico Wu. Proceedings of IEEE International Conference on Content-Based Multimedia Indexing (IEEE CBMI 2025), Dublin, Ireland, 22-24 October 2025.
3D UMAP projection of image embeddings of National Gallery of Art Dataset extracted from the CLIP ViT-L/14 model. Each point represents a painting from the National Gallery of Art dataset, colored by artistic style. The visualization reveals the semantic organization of the artworks in the latent space, where similar styles tend to cluster together.
abstract: CLIP has emerged as a powerful multimodal model capable of connecting images and text through joint embeddings, but to what extent does it 'see' the same way humans do - especially when interpreting artworks? In this paper, we investigate CLIP's ability to extract high-level semantic and stylistic information from paintings, including both human-created and AI-generated imagery. We evaluate its perception across multiple dimensions: content, scene understanding, artistic style, historical period, and the presence of visual deformations or artifacts. By designing targeted probing tasks and comparing CLIP's responses to human annotations and expert benchmarks, we explore its alignment with human perceptual and contextual understanding. Our findings reveal both strengths and limitations in CLIP's visual representations, particularly in relation to aesthetic cues and artistic intent. We further discuss the implications of these insights for using CLIP as a guidance mechanism during generative processes, such as style transfer or prompt-based image synthesis. Our work highlights the need for deeper interpretability in multimodal systems, especially when applied to creative domains where nuance and subjectivity play a central role.
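One of the probing tasks the abstract alludes to — matching a painting to candidate style labels — reduces to a nearest-neighbor search in CLIP's joint embedding space: embed the image and each text label, then pick the label with the highest cosine similarity. A minimal sketch of that logic, assuming embeddings have already been extracted (the 3-d vectors below are illustrative stand-ins; real CLIP ViT-L/14 embeddings are 768-dimensional):

```python
import numpy as np

def cosine_scores(image_emb, text_embs):
    # Cosine similarity between one image vector and a matrix of text vectors.
    img = image_emb / np.linalg.norm(image_emb)
    txt = text_embs / np.linalg.norm(text_embs, axis=1, keepdims=True)
    return txt @ img

styles = ["Impressionism", "Cubism", "Baroque"]
# Stand-ins for embeddings of prompts like "a painting in the style of <X>".
text_embs = np.array([
    [0.9, 0.1, 0.0],
    [0.1, 0.9, 0.1],
    [0.0, 0.2, 0.9],
])
image_emb = np.array([0.8, 0.2, 0.1])  # stand-in for a query painting

scores = cosine_scores(image_emb, text_embs)
predicted = styles[int(np.argmax(scores))]
print(predicted)  # → Impressionism
```

Comparing these argmax predictions against the human style annotations is what yields the alignment measurements the paper reports.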