Geolocalized Modeling for Dish Recognition

Abstract
Food-related photos have become increasingly popular, due to social networks, food recommendation and
dietary assessment systems. Reliable annotation is essential in those systems, but unconstrained automatic food
recognition is still not accurate enough. Most works focus on exploiting only the visual content while ignoring the
context. To address this limitation, in this paper we explore leveraging geolocation and external information about
restaurants to simplify the classification problem. We propose a framework incorporating discriminative classification
in geolocalized settings and introduce the concept of geolocalized models, which in our scenario are trained locally
at each restaurant location. In particular, we propose two strategies to implement this framework: geolocalized voting
and combinations of bundled classifiers. Both models show promising performance, and the latter is particularly
efficient and scalable. We collected a restaurant-oriented food dataset with food images, dish tags and restaurant-level
information, such as the menu and geolocation. Experiments on this dataset show that exploiting geolocation
improves recognition performance by around 30%, and that geolocalized models contribute an additional 3-8%
absolute gain while being up to five times faster to train.
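
The idea of using geolocation and menus to simplify classification can be illustrated with a small sketch. This is not the authors' implementation: the 0.5 km radius, the restaurant records, the dish names and the classifier scores below are all hypothetical, and the helper names (`haversine_km`, `candidate_dishes`, `geo_restricted_prediction`) are invented for illustration. The sketch only shows how a shortlist of dishes from nearby restaurants might narrow the label space of a visual classifier.

```python
import math

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two (lat, lon) points in kilometres."""
    earth_radius_km = 6371.0
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * earth_radius_km * math.asin(math.sqrt(a))

def candidate_dishes(query, restaurants, radius_km=0.5):
    """Union of the menus of all restaurants within radius_km of the query location."""
    dishes = set()
    for r in restaurants:
        if haversine_km(query[0], query[1], r["lat"], r["lon"]) <= radius_km:
            dishes.update(r["menu"])
    return dishes

def geo_restricted_prediction(scores, candidates):
    """Pick the top-scoring dish among the candidates.

    Falls back to the unrestricted scores if no candidate dish was scored.
    """
    restricted = {d: s for d, s in scores.items() if d in candidates}
    if not restricted:
        restricted = scores
    return max(restricted, key=restricted.get)

# Hypothetical restaurants with menus and coordinates (illustrative data only).
restaurants = [
    {"name": "A", "lat": 39.9900, "lon": 116.3200, "menu": ["kung pao chicken", "mapo tofu"]},
    {"name": "B", "lat": 39.9903, "lon": 116.3204, "menu": ["mapo tofu", "fried rice"]},
    {"name": "C", "lat": 40.1000, "lon": 116.5000, "menu": ["pizza"]},
]

# Hypothetical scores from a purely visual classifier over all dishes.
scores = {"pizza": 0.40, "mapo tofu": 0.35, "fried rice": 0.15, "kung pao chicken": 0.10}

query = (39.9901, 116.3201)  # where the photo was taken
candidates = candidate_dishes(query, restaurants)
prediction = geo_restricted_prediction(scores, candidates)
```

In this toy setup the visual classifier alone would pick "pizza", but no restaurant near the query location serves it, so the restricted prediction is taken from the dishes on nearby menus instead.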

Food recognition
Previous works on dish recognition are mainly based on analyzing the visual appearance. Some works address
food recognition using conventional visual features trying to capture the global appearance of the food. Joutou
and Yanai [6] proposed an automatic food image recognition system based on multiple kernel learning (MKL),
which integrates several kinds of image features (e.g. color, texture, SIFT) to learn an optimal linear combination
of feature-specific kernels. Hoashi et al. [7] extended the system proposed in [6] with more image features and
food classes. Maruyama et al. [8] improved the recognition accuracy by incrementally updating the classifier based
on a Bayesian network. Zong et al. [2] proposed to exploit the structure of the food object, which is represented as
the spatial distribution of the local textural structures and encoded using shape context. Kawano et al. [9] computed
Fisher vectors over HOG patches to develop a real-time mobile food recognition system. Recently, they extended
the system to 256 food categories [10].
Other works consider food as a certain combination of different components (ingredients). Yang et al. [1] proposed an American fast food recognition system using pairwise local features, which effectively capture important shape characteristics and spatial relationships between food ingredients. Dietcam [3] analyzes a meal by taking several images (or a short video), estimating the volume of each food item and finally estimating the caloric
intake. The recognition accuracy is increased by modeling the geometric locations of the food and using a joint probability model. Zhang [17] proposed to classify plates of food into the correct cuisine using attribute-based classification,
