Food-101 – Automatic point-of-interest image cropping via ensembled convolutionalization

Andrea Asperti, Pietro Battilana

[Figure: sample images labeled beef tartare, beignets, caprese salad, ceviche]

Convolutionalization of discriminative neural networks, introduced by J. Long et al. for segmentation purposes, is a simple technique for generating heat-maps that locate a given object within a larger image. In this article, we apply this technique to automatically crop images at their actual point of interest, fine-tuning them with the final aim of improving the quality of a dataset. The use of an ensemble of fully convolutional nets appreciably reduces the risk of overfitting, resulting in reasonably accurate croppings. The methodology has been tested on a well-known dataset, particularly renowned for containing badly centered and noisy images: the Food-101 dataset, composed of 101K images spread over 101 food categories. The quality of the croppings is confirmed by a significant and uniform improvement (3 to 5%) in the accuracy of classifiers, including ones external to the ensemble.
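The idea can be illustrated with a minimal NumPy sketch. It is not the networks used in the article: each "classifier" here is assumed to be a single dense layer over a fixed k×k window, and the names `heatmap` and `ensemble_crop`, the toy image, and the weights are all hypothetical. Sliding the dense weights over a larger image is what convolutionalization amounts to in this simplified setting; the heat-maps of the ensemble are averaged and the crop is centered on the strongest response.

```python
import numpy as np

def heatmap(image, weights, bias):
    """Apply a dense classifier (weights shaped like its k x k input window)
    at every position of a larger image, i.e. use it as a convolution."""
    k = weights.shape[0]
    # windows has shape (H-k+1, W-k+1, k, k): one k x k patch per position
    windows = np.lib.stride_tricks.sliding_window_view(image, (k, k))
    logits = np.einsum("ijkl,kl->ij", windows, weights) + bias
    return 1.0 / (1.0 + np.exp(-logits))  # sigmoid score per position

def ensemble_crop(image, classifiers, size):
    """Average the heat-maps of all classifiers (assumed to share the same
    window size) and crop a size x size square around the peak."""
    mean_map = np.mean([heatmap(image, w, b) for w, b in classifiers], axis=0)
    i, j = np.unravel_index(np.argmax(mean_map), mean_map.shape)
    k = classifiers[0][0].shape[0]
    ci, cj = i + k // 2, j + k // 2  # center of the best-scoring window
    # keep the crop inside the image borders
    top = int(np.clip(ci - size // 2, 0, image.shape[0] - size))
    left = int(np.clip(cj - size // 2, 0, image.shape[1] - size))
    return (top, left), image[top:top + size, left:left + size], mean_map

# Toy data (assumption): a bright 8 x 8 "object" off-center in a 32 x 32 image
image = np.zeros((32, 32))
image[10:18, 20:28] = 1.0
classifiers = [
    (np.full((8, 8), 0.10), -3.2),
    (np.full((8, 8), 0.05), -1.6),
    (np.full((8, 8), 0.20), -6.4),
]
(top, left), crop, mean_map = ensemble_crop(image, classifiers, size=12)
print(top, left, crop.shape)
```

Averaging the maps before taking the peak means a single classifier with a spurious response has less influence on where the crop lands, which is the motivation for using an ensemble.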

Errors in the Food-101 Data Set

It is also possible to use our technique to find misclassified samples in the dataset. If, after convolving at several different scales, no classifier is able to find the object, it is extremely likely that the sample has a wrong label. Some examples are given below. We are collecting all these mistakes for the Food-101 dataset in a blacklist (please help us improve it!).

[Figure: samples with suspect labels: carrot cake (?), chicken quesadilla (?), fried rice (?), grilled salmon (?), pho (?), tiramisu (?), mussels (?), steak (?)]
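The check for suspect labels described above can be sketched as follows. This is a simplified illustration under stated assumptions, not the procedure used to build the blacklist: classifiers are again single dense layers over a k×k window, rescaling is nearest-neighbor, and the names `resize_nearest`, `peak_response`, `is_suspect`, the scale set, and the 0.5 threshold are all hypothetical choices.

```python
import numpy as np

def resize_nearest(image, scale):
    """Nearest-neighbor rescaling of a 2-D image (kept dependency-free)."""
    h, w = image.shape
    new_h, new_w = max(1, round(h * scale)), max(1, round(w * scale))
    rows = np.minimum((np.arange(new_h) / scale).astype(int), h - 1)
    cols = np.minimum((np.arange(new_w) / scale).astype(int), w - 1)
    return image[np.ix_(rows, cols)]

def peak_response(image, weights, bias):
    """Highest heat-map score of one convolutionalized classifier."""
    k = weights.shape[0]
    if min(image.shape) < k:
        return 0.0  # image smaller than the classifier window: no response
    windows = np.lib.stride_tricks.sliding_window_view(image, (k, k))
    logits = np.einsum("ijkl,kl->ij", windows, weights) + bias
    return float((1.0 / (1.0 + np.exp(-logits))).max())

def is_suspect(image, classifiers, scales=(0.5, 1.0, 2.0), threshold=0.5):
    """True when no classifier, at any scale, responds above the threshold."""
    return all(
        peak_response(resize_nearest(image, s), w, b) < threshold
        for s in scales
        for (w, b) in classifiers
    )

# Toy data (assumption): one image containing the object, one without it
classifiers = [(np.full((8, 8), 0.10), -3.2), (np.full((8, 8), 0.20), -6.4)]
with_object = np.zeros((32, 32))
with_object[10:18, 20:28] = 1.0
without_object = np.zeros((32, 32))

flag_with = is_suspect(with_object, classifiers)
flag_without = is_suspect(without_object, classifiers)
print(flag_with, flag_without)
```

A sample flagged this way is only a candidate for the blacklist; the threshold trades missed label errors against false alarms, so flagged images would still need a manual look.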


      title = {Automatic point-of-interest image cropping via ensembled convolutionalization},
      author = {Asperti, Andrea and Battilana, Pietro},
      journal = {International Journal of Neural Networks and Advanced Applications},
      volume = {5},
      year = {2018},
      pages = {17-24},
      url = {},