Geo localized Modeling for Dish Recognition
Abstract:
Food-related photos have become
increasingly popular, due to social networks, food recommendation and dietary
assessment systems. Reliable annotation is essential in those systems, but
unconstrained automatic food recognition is still not accurate enough. Most
works focus on exploiting only the visual content while ignoring the context.
To address this limitation, in this paper we explore leveraging geo location
and external information about restaurants to simplify the classification
problem. We propose a framework incorporating discriminative classification
in geo localized settings and introduce
the concept of geo localized models, which in our scenario are trained locally
at each restaurant location. In particular, we propose two strategies to
implement this framework: geo localized
voting and combinations of bundled classifiers. Both models show
promising performance, and the latter is particularly efficient and scalable.
We collected a restaurant-oriented food dataset with food images, dish tags and
restaurant level information, such as the menu and geo location. Experiments on
this dataset show that exploiting geo location improves recognition
performance by around 30%, and geo localized models contribute an additional
3-8% absolute gain, while being trainable up to five times faster.
Architecture Diagram:
Existing System:
Eating is essential for human life, both
from personal and socio-cultural perspectives. Thus, food is connected to many
aspects and activities in daily life, including health, culture, leisure and
social events. For instance, one new trend is sharing dining-out experiences on
photo-enabled social networks. In fact, people are increasingly interested in
discovering and sharing new cuisines, and knowing more about different aspects
of the food they consume. Another popular application is keeping a personal log
of daily meals and food intake. Food photos are popular, but in general users
annotate them poorly, with rather useless tags (e.g. “today’s lunch”,
“delicious”), inaccurate or generic tags (e.g. “Italian food”, “yellow rice”)
or even wrong tags. This is not surprising, as accurate photo
annotation requires specific domain knowledge, and manual textual input is
time-consuming and prone to typos. Automatic annotation from a photo
taken with a smartphone is therefore much more convenient for the user, and
automatic tags are more accurate and useful for retrieval applications.
Disadvantage:
Unconstrained automatic food recognition is still not accurate enough for
these systems. Reliable automatic food recognition would
enable countless functionalities in them. Examples include automatic
photo tagging, image-based retrieval (e.g. recipe, dietary properties) and
recommendation (e.g. food, recipe, restaurant).
Proposed System:
In order to address complex recognition
problems, humans leverage prior and contextual knowledge. From the perspective
of neuroscience, cells in the superior colliculus can associate information
from multiple sensory modalities (multisensory integration) and make it more
useful than that from a single modality. Similarly, automatic systems can
incorporate multiple contexts and external knowledge so that they solve a
simpler problem. For
instance, a tagging system can exploit internal and external context by
using the personal tagging history and other users’ tags in similar
photos. Personal interests and social circles are also helpful contexts.
Mobile phones can capture rich contextual information; in
particular, they can estimate their geographic location (i.e. geo location) from
GPS and mobile networks via their location services. The paradigmatic
application is tourist or urban landmark recognition with smartphones.
Typically, image retrieval techniques are used to retrieve the most similar
images, considering only images whose geo location is in the geographic
neighborhood of the query. Classifiers can be used instead, with the candidate
classes limited to a few candidates based on the geo location (i.e. shortlist
approach), although the nature of landmarks and buildings (rigid objects, and
intrinsically invariant) usually makes retrieval-based approaches with
geometric verification more effective. The advantage of exploiting contextual
information is twofold: it simplifies the problem, making it easier to solve,
and it reduces the computational complexity. Similarly, we could exploit
context for food recognition.
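The shortlist approach described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the system's actual implementation: the function names, the landmark coordinates, the classifier scores and the search radius are all hypothetical, and the scores stand in for the output of any pre-trained visual classifier.

```python
import math

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two points, in kilometres."""
    r = 6371.0  # mean Earth radius in km
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2)
    return 2 * r * math.asin(math.sqrt(a))

def shortlist_classes(query_loc, class_locs, radius_km):
    """Return the classes whose geo location lies within radius_km of the query."""
    lat, lon = query_loc
    return [c for c, (clat, clon) in class_locs.items()
            if haversine_km(lat, lon, clat, clon) <= radius_km]

def classify_with_shortlist(scores, query_loc, class_locs, radius_km=0.5):
    """Pick the best-scoring class among the shortlisted candidates only."""
    candidates = shortlist_classes(query_loc, class_locs, radius_km)
    if not candidates:
        return None  # no class is close enough to the query location
    return max(candidates, key=lambda c: scores[c])

# Hypothetical classes with approximate coordinates (assumed for illustration).
class_locs = {
    "eiffel_tower": (48.8584, 2.2945),
    "arc_de_triomphe": (48.8738, 2.2950),
    "big_ben": (51.5007, -0.1246),
}
# Hypothetical visual scores from a global classifier.
scores = {"eiffel_tower": 0.40, "arc_de_triomphe": 0.35, "big_ben": 0.90}
query_loc = (48.8580, 2.2950)

print(classify_with_shortlist(scores, query_loc, class_locs, radius_km=1.0))
```

In this toy query, the class with the highest raw visual score is geographically distant and never enters the candidate set, which is how the context both simplifies the decision and reduces the number of classes that need to be evaluated.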
Advantage:
The proposed system focuses on dish recognition in restaurants (we use
the term dish to emphasize that it is related to restaurants, and also more
specific than the term food). This scenario has contextual information we can
exploit, since the ingredients, cooking style and presentation of dishes, and
which dishes are offered (i.e. the menu), are very restaurant-specific, and
restaurants are also naturally linked to a geo location.
Implementation Modules:
1. Food recognition
2. Image recognition exploiting geo location
3. Dish recognition in restaurants
4. Test-training mismatch in geo localized settings
Food recognition:
Food recognition approaches are mainly based on
analyzing the visual appearance. Some works address food recognition using
conventional visual features that try to capture the global appearance of the
food. One work proposed an automatic food image recognition system based on
multiple kernel learning (MKL), which integrates several kinds of image
features (e.g. color, texture, SIFT) to learn an optimal linear combination of
feature-specific kernels. A later work extended the system proposed in [6]
with more image features and food classes. Maruyama et al. improved the
recognition accuracy by incrementally updating the classifier based on a
Bayesian network. Another work proposed to exploit the structure of the food
object, which is represented as the spatial distribution of the local textural
structures and encoded using shape context.
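The idea of a linear combination of feature-specific kernels can be illustrated with a small sketch. It only shows how a combined kernel K = sum_f beta_f K_f is evaluated: the weights are fixed by hand here, whereas MKL would learn them, and the simple mean-similarity scoring rule is a stand-in for the kernel classifier used in the cited system. The feature names, vectors, labels and function names are hypothetical.

```python
import math

def rbf_kernel(x, y, gamma=1.0):
    """RBF kernel between two feature vectors of equal length."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-gamma * sq_dist)

def combined_kernel(sample_a, sample_b, weights, gammas):
    """Weighted sum of per-feature kernels: K = sum_f weights[f] * K_f."""
    return sum(weights[f] * rbf_kernel(sample_a[f], sample_b[f], gammas[f])
               for f in weights)

def predict(query, train, weights, gammas):
    """Score each class by its mean combined-kernel similarity to the query.

    This is a simplified stand-in for a kernel classifier, not MKL itself.
    """
    totals, counts = {}, {}
    for sample, label in train:
        k = combined_kernel(query, sample, weights, gammas)
        totals[label] = totals.get(label, 0.0) + k
        counts[label] = counts.get(label, 0) + 1
    return max(totals, key=lambda c: totals[c] / counts[c])

# Hypothetical training samples, each with two kinds of features.
train = [
    ({"color": [0.9, 0.1], "texture": [0.2]}, "curry"),
    ({"color": [0.8, 0.2], "texture": [0.3]}, "curry"),
    ({"color": [0.1, 0.9], "texture": [0.8]}, "salad"),
    ({"color": [0.2, 0.8], "texture": [0.7]}, "salad"),
]
weights = {"color": 0.7, "texture": 0.3}  # hand-picked; MKL would learn these
gammas = {"color": 1.0, "texture": 1.0}   # assumed kernel bandwidths

query = {"color": [0.85, 0.15], "texture": [0.25]}
print(predict(query, train, weights, gammas))
```

Keeping one kernel per feature type lets each feature use its own similarity measure, while the weights decide how much each feature contributes to the final decision.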
Image recognition exploiting geo location:
Previous works exploiting geographical
information to help visual recognition mainly target landmarks, in tasks such
as (1) geographical location recognition, (2) landmark mining, (3) tourism
recommendation and (4) 3D scene modeling. In recognition, content classifiers
are typically trained offline and context is used to shortlist several
candidate landmarks, then content analysis is performed for recognition (for
convenience we refer to this approach as shortlist). Chen et al. score the
images in the database using a vocabulary tree trained on SIFT descriptors, and
geographically distant landmarks are excluded using the GPS coordinates
associated with the query image; approximate nearest neighbor (ANN) search is
then applied to find the nearest feature vectors within the candidates.
Another work addresses photo recognition by including two types of
geographical information: raw values of latitude and longitude, and visual
features extracted from aerial photos around the geo tagged location. These
two kinds of features are then combined. Other works mined and modeled
worldwide landmarks by using agglomerative hierarchical clustering on the geo
tag coordinates. More related works can be found in recent surveys. Not only
can geo location be used to recognize images, but images themselves can be
useful to estimate the geo location: the unknown geo location of an image can
be estimated by searching for visually similar images in a large set of geo
tagged images.
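Estimating the geo location of an image from visually similar geo tagged images can be sketched as a nearest-neighbor search. The example below is a simplified assumption, not the method of the cited work: it uses plain Euclidean distance on hypothetical feature vectors and takes the geo tag of the single most similar image as the estimate.

```python
import math

def euclidean(u, v):
    """Euclidean distance between two feature vectors of equal length."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

def nearest_geotagged(query_feat, database, k=3):
    """Return the k database entries that are visually closest to the query.

    database is a list of (feature_vector, (lat, lon)) pairs.
    """
    ranked = sorted(database, key=lambda item: euclidean(query_feat, item[0]))
    return ranked[:k]

def estimate_location(query_feat, database):
    """Use the geo tag of the most similar image as the location estimate."""
    if not database:
        return None
    return nearest_geotagged(query_feat, database, k=1)[0][1]

# Hypothetical geo tagged images: (visual feature vector, (lat, lon)).
database = [
    ([0.10, 0.20], (48.85, 2.29)),
    ([0.90, 0.80], (40.71, -74.00)),
    ([0.15, 0.25], (48.86, 2.35)),
]

print(estimate_location([0.12, 0.21], database))
```

A real system would rely on much richer descriptors and an approximate index over a large collection, but the flow is the same: rank by visual similarity, then read off the geo tags of the top matches.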
Dish recognition in restaurants:
In contrast to generic food recognition,
dish recognition in restaurants emphasizes two elements. First, the problem is
localized, that is, we assume the user (and consequently the photo) is located
inside a restaurant. Second, we use the more specific term dish rather than
food to emphasize the relation with the menu of a restaurant. In general,
accurate dish recognition is very challenging, since the combined number of
classes can be very large. Variations in the ingredients and the different
cooking styles and presentations used in different restaurants can cause a
large visual variability for the same dish, while accidental similarity between
unrelated dishes causes inter-class similarity. These problems become more
significant in larger datasets with more restaurants
and dishes.
Test-training mismatch in geo localized settings:
The shortlist approach is an example of
classification in geo localized settings, in which the classification process
is modified by geo location information at query time. A global model
may fail to discriminate properly between the
candidate classes because it has been trained to discriminate between the
original classes. We refer to this problem as test-training mismatch, because
the classes and data involved during training are different from the classes
and data involved during test after adapting the classifier to the query. Even
ignoring classifiers from classes that are not included in the candidate set
(i.e. shortlist) does not guarantee that the classifier will discriminate
properly between the remaining classes, since the remaining classifiers were
also trained with negative samples from the discarded classes. Thus, those
negative samples from the discarded classes introduce certain training noise
and bias the models.
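The difference between global and geo localized training can be made concrete by looking at which samples end up as negatives for a one-vs-rest dish classifier. The sketch below only builds the training sets and does not train any model; the sample tuples, dish names, restaurant identifiers and function names are hypothetical.

```python
def global_training_set(samples, dish):
    """One-vs-rest set for `dish` built from all restaurants.

    Negatives include dishes that may never appear in the candidate set
    at query time, which is the source of the test-training mismatch.
    """
    positives = [img for img, d, _ in samples if d == dish]
    negatives = [img for img, d, _ in samples if d != dish]
    return positives, negatives

def localized_training_set(samples, restaurant, dish):
    """One-vs-rest set for `dish` restricted to a single restaurant.

    Positives and negatives come only from that restaurant, so training
    involves the same classes as the query-time candidates.
    """
    local = [(img, d) for img, d, r in samples if r == restaurant]
    positives = [img for img, d in local if d == dish]
    negatives = [img for img, d in local if d != dish]
    return positives, negatives

# Hypothetical samples: (image id, dish tag, restaurant id).
samples = [
    ("img1", "ramen", "A"),
    ("img2", "gyoza", "A"),
    ("img3", "ramen", "B"),
    ("img4", "sushi", "B"),
    ("img5", "pizza", "C"),
]

print(global_training_set(samples, "ramen"))
print(localized_training_set(samples, "A", "ramen"))
```

For a query taken in restaurant A, the global set contains negatives from dishes served only elsewhere, while the localized set contains just the dishes on that restaurant's menu and is also smaller.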
Configuration:-
H/W System Configuration:-
Processor - Pentium III
Speed - 1.1 GHz
RAM - 256 MB (min)
Hard Disk - 20 GB
Floppy Drive - 1.44 MB
Keyboard - Standard Windows Keyboard
Mouse - Two or Three Button Mouse
Monitor - SVGA
S/W System Configuration:-
Operating System : Windows XP/7
Application Server : Tomcat 5.0/6.x
Front End : HTML, Java, JSP
Scripts : JavaScript
Server side Script : Java Server Pages
Database : MySQL 5.0
Database Connectivity : JDBC