Geo localized Modeling for Dish Recognition

Abstract:

Food-related photos have become increasingly popular due to social networks, food recommendation and dietary assessment systems. Reliable annotation is essential in those systems, but unconstrained automatic food recognition is still not accurate enough. Most works exploit only the visual content and ignore the context. To address this limitation, in this paper we explore leveraging geo location and external information about restaurants to simplify the classification problem. We propose a framework incorporating discriminative classification in geo localized settings and introduce the concept of geo localized models, which in our scenario are trained locally at each restaurant location. In particular, we propose two strategies to implement this framework: geo localized voting and combinations of bundled classifiers. Both models show promising performance, and the latter is particularly efficient and scalable. We collected a restaurant-oriented food dataset with food images, dish tags and restaurant-level information, such as the menu and geo location. Experiments on this dataset show that exploiting geo location improves recognition performance by around 30%, and that geo localized models contribute an additional 3~8% absolute gain, while they can be trained up to five times faster.

Architecture Diagram:

Existing System:

Eating is essential for human life, both from personal and socio-cultural perspectives. Thus, food is connected to many aspects and activities of daily life, including health, culture, leisure and social events. For instance, one new trend is sharing dining-out experiences on photo-enabled social networks. In fact, people are increasingly interested in discovering and sharing new cuisines, and in knowing more about different aspects of the food they consume. Another popular application is keeping a personal log of daily meals and food intake. Food photos are popular, but in general users annotate them poorly, with rather useless tags (e.g. “today’s lunch”, “delicious”), inaccurate or generic tags (e.g. “Italian food”, “yellow rice”) or even wrong tags. This is not surprising, as accurate photo annotation requires specific domain knowledge, and manual textual input is time-consuming and prone to typos. Thus, automatic annotation of a photo taken with a smart phone is much more convenient for the user, and automatic tags are more accurate and more useful for retrieval applications.

Disadvantage:

Reliable automatic food recognition could enable countless functionalities in these systems. Examples include automatic photo tagging, image-based retrieval (e.g. recipe, dietary properties) and recommendation (e.g. food, recipe, restaurant). Existing approaches, however, exploit only the visual content and are still not accurate enough to deliver them.

Proposed System:

In order to address complex recognition problems, humans leverage prior and contextual knowledge. From the perspective of neuroscience, cells in the superior colliculus can associate information from multiple sensory modalities (multisensory integration) and make it more useful than information from a single modality. Similarly, automatic systems can incorporate multiple kinds of context and external knowledge so that they solve a simpler problem. For instance, a tagging system can exploit internal and external context by using the personal tagging history and other users’ tags on similar photos. Personal interests and social circles are also helpful contexts. In particular, mobile phones can capture rich contextual information: they can estimate their geographic location (i.e. geo location) from GPS and mobile networks via their location services. The paradigmatic application is tourist or urban landmark recognition with smart phones. Typically, image retrieval techniques are used to retrieve the most similar images, considering only images whose geo location is in the geographic neighborhood of the query. Classifiers can be used instead, with the candidate classes limited to a few candidates based on the geo location (i.e. the shortlist approach), although the nature of landmarks and buildings (rigid and intrinsically invariant objects) usually makes retrieval-based approaches with geometric verification more effective. The advantage of exploiting contextual information is twofold: it simplifies the problem, making it easier to solve, and it reduces the computational complexity. Similarly, we could exploit context for food recognition.
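The shortlist idea described above can be illustrated with a minimal Python sketch. It assumes that a content classifier has already produced one score per class, and it uses the haversine formula to decide which places are close to the query. The place records, the class scores, the radius and all function names are hypothetical and chosen only for illustration.

```python
import math

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two (lat, lon) points."""
    radius = 6371.0  # mean Earth radius in km
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * radius * math.asin(math.sqrt(a))

def shortlist_classes(query, places, radius_km):
    """Union of the classes of every place within radius_km of the query."""
    candidates = set()
    for place in places:
        if haversine_km(query[0], query[1], place["lat"], place["lon"]) <= radius_km:
            candidates.update(place["classes"])
    return candidates

def classify_with_shortlist(scores, candidates):
    """Return the best-scoring class among the candidates, or None if there are none."""
    allowed = {c: s for c, s in scores.items() if c in candidates}
    if not allowed:
        return None
    return max(allowed, key=allowed.get)

# Hypothetical places and hypothetical scores from a global classifier.
PLACES = [
    {"name": "place_a", "lat": 39.9042, "lon": 116.4074, "classes": {"noodles", "dumplings"}},
    {"name": "place_b", "lat": 31.2304, "lon": 121.4737, "classes": {"fried rice", "soup"}},
]
SCORES = {"fried rice": 0.9, "noodles": 0.6, "dumplings": 0.3, "soup": 0.1}

query_location = (39.9050, 116.4080)  # roughly 100 m from place_a
candidates = shortlist_classes(query_location, PLACES, radius_km=0.5)
prediction = classify_with_shortlist(SCORES, candidates)
print(candidates, prediction)
```

In this toy data the globally best class ("fried rice") belongs to a distant place, so it is removed from the candidates and the prediction falls back to the best class available near the query.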

Advantage:

We focus on dish recognition in restaurants (we use the term dish to emphasize that it is related to restaurants, and that it is more specific than the term food). This scenario has contextual information we can exploit, since the ingredients, cooking style and presentation of dishes, as well as which dishes are offered (i.e. the menu), are very restaurant specific, and restaurants are also naturally linked to a geo location.

Implementation Modules:

1. Food recognition

2. Image recognition exploiting geo location

3. Dish recognition in restaurants

4. Test-training mismatch in geo localized settings

Food recognition:

Food recognition is mainly based on analyzing the visual appearance. Some works address food recognition using conventional visual features that try to capture the global appearance of the food. One automatic food image recognition system is based on multiple kernel learning (MKL), which integrates several kinds of image features (e.g. color, texture, SIFT) to learn an optimal linear combination of feature-specific kernels. The system proposed in [6] was later extended with more image features and food classes. Maruyama et al. improved the recognition accuracy by incrementally updating the classifier based on a Bayesian network. Another work exploits the structure of the food object, which is represented as the spatial distribution of the local textural structures and encoded using shape context.
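The linear combination of feature-specific kernels mentioned above can be sketched as follows. This is only an illustration of the combined kernel K = sum_m beta_m * K_m: the weights are fixed by hand here, whereas MKL would learn them, and the RBF kernel, the toy color and texture features and all function names are assumptions rather than details taken from the cited systems.

```python
import math

def rbf_kernel(x, y, gamma):
    """RBF kernel between two feature vectors (an assumed choice of kernel)."""
    squared = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-gamma * squared)

def gram_matrix(vectors, gamma):
    """Gram matrix of one feature type over all images."""
    return [[rbf_kernel(u, v, gamma) for v in vectors] for u in vectors]

def combine_kernels(grams, weights):
    """K = sum_m beta_m * K_m with non-negative weights that sum to 1."""
    if len(grams) != len(weights):
        raise ValueError("one weight per kernel is required")
    if any(w < 0 for w in weights) or abs(sum(weights) - 1.0) > 1e-9:
        raise ValueError("weights must be non-negative and sum to 1")
    n = len(grams[0])
    return [
        [sum(w * g[i][j] for g, w in zip(grams, weights)) for j in range(n)]
        for i in range(n)
    ]

# Hypothetical features for three images: a 2-D color feature and a 1-D texture feature.
COLOR = [[0.9, 0.1], [0.8, 0.2], [0.1, 0.9]]
TEXTURE = [[0.2], [0.9], [0.3]]

k_color = gram_matrix(COLOR, gamma=1.0)
k_texture = gram_matrix(TEXTURE, gamma=1.0)

# Hand-picked weights; an MKL solver would estimate these from training data.
combined = combine_kernels([k_color, k_texture], [0.75, 0.25])
for row in combined:
    print([round(value, 3) for value in row])
```

A combined Gram matrix of this kind could then be passed to a kernel classifier that accepts precomputed kernels, for example an SVM.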

Image recognition exploiting geo location:

Previous works exploiting geographical information to help visual recognition mainly target landmarks, in tasks such as (1) geographical location recognition, (2) landmark mining, (3) tourism recommendation and (4) 3D scene modeling. In recognition, content classifiers are trained offline and context is used to shortlist several candidate landmarks; content analysis is then performed for recognition (for convenience we refer to this approach as shortlist). Chen et al. score the images in the database using a vocabulary tree trained on SIFT descriptors, exclude geographically distant landmarks using the GPS coordinates associated with the query image, and then apply approximate nearest neighbors (ANN) to find the nearest feature vectors within the candidates. Another work improves photo recognition by including two types of geographical information: raw values of latitude and longitude, and visual features extracted from aerial photos around the geo tagged location; these two kinds of features are then combined. Worldwide landmarks have also been mined and modeled by using agglomerative hierarchical clustering on the geo tag coordinates. More related works can be found in recent surveys. Not only can the geo location be used to recognize images, but images themselves can be useful to estimate the geo location, for example by searching for visually similar images in a large set of geo tagged images.
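The last idea above, estimating the geo location of an image from visually similar geo tagged images, can be sketched in a few lines of Python. The source does not say how the location is derived from the retrieved images, so averaging the coordinates of the k nearest neighbors is an assumption made here; the descriptors, coordinates and function names are also hypothetical.

```python
import math

def euclidean(u, v):
    """Euclidean distance between two visual descriptors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

def estimate_location(query_descriptor, geotagged, k=3):
    """Mean (lat, lon) of the k visually nearest geo tagged images.

    Averaging raw coordinates is an assumed simplification; it is only
    reasonable when the nearest images are geographically close together.
    """
    if not geotagged:
        raise ValueError("at least one geo tagged image is required")
    nearest = sorted(
        geotagged,
        key=lambda item: euclidean(query_descriptor, item["descriptor"]),
    )[:k]
    lat = sum(item["lat"] for item in nearest) / len(nearest)
    lon = sum(item["lon"] for item in nearest) / len(nearest)
    return lat, lon

# Hypothetical geo tagged database with 2-D descriptors.
DATABASE = [
    {"descriptor": [0.1, 0.9], "lat": 40.0, "lon": 116.0},
    {"descriptor": [0.2, 0.8], "lat": 40.2, "lon": 116.4},
    {"descriptor": [0.9, 0.1], "lat": 31.0, "lon": 121.0},
]

estimated = estimate_location([0.15, 0.85], DATABASE, k=2)
print(estimated)
```

With k=2 the visually distant third image is ignored, and the estimate lies between the two similar images.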

Dish recognition in restaurants:

In contrast to generic food recognition, dish recognition in restaurants emphasizes two elements. First, the problem is localized, that is, we assume the user (and consequently the photo) is located inside a restaurant. Second, we use the more specific term dish rather than food to emphasize the relation with the menu of a restaurant. In general, accurate dish recognition is very challenging, since the combined number of classes can be very large. Variations in the ingredients, cooking and presentation used in different restaurants can cause large visual variability for the same dish (intra-class variability), while accidental similarity between unrelated dishes causes inter-class similarity. These problems become more significant in larger datasets with more restaurants and dishes.

Test-training mismatch in geo localized settings:

The shortlist approach is an example of classification in geo localized settings, in which the classification process is modified by geo location information at query time. However, a global model may not discriminate properly between the candidate classes, because it has been trained to discriminate between all the original classes. We refer to this problem as test-training mismatch, because the classes and data involved during training are different from the classes and data involved during test, after adapting the classifier to the query. Even ignoring the classifiers of classes that are not included in the candidate set (i.e. the shortlist) does not guarantee that the remaining classifiers will discriminate properly between the remaining classes, since they were also trained with negative samples from the discarded classes. Thus, those negative samples introduce training noise and bias the models.
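The mismatch can be made concrete by comparing the one-vs-rest training sets of a global model with those of a model trained only on the candidate classes. The sketch below only builds the training sets and does not train any classifier; the sample identifiers, dish labels, menu and function name are hypothetical.

```python
def one_vs_rest_sets(samples, classes):
    """Positive and negative training samples for each class in `classes`.

    Only samples whose label belongs to `classes` are used, so restricting
    `classes` to a restaurant menu yields a geo localized training set.
    """
    classes = set(classes)
    kept = [(x, y) for x, y in samples if y in classes]
    return {
        c: {
            "pos": [x for x, y in kept if y == c],
            "neg": [x for x, y in kept if y != c],
        }
        for c in sorted(classes)
    }

# Hypothetical labelled images from several restaurants.
SAMPLES = [
    ("img1", "noodles"),
    ("img2", "dumplings"),
    ("img3", "pizza"),
    ("img4", "pasta"),
    ("img5", "noodles"),
]
ALL_CLASSES = {label for _, label in SAMPLES}
MENU = {"noodles", "dumplings"}  # candidate classes at the query location

global_sets = one_vs_rest_sets(SAMPLES, ALL_CLASSES)
local_sets = one_vs_rest_sets(SAMPLES, MENU)

# The global "noodles" classifier sees negatives from pizza and pasta,
# which are never candidates at this location; the localized one does not.
print(global_sets["noodles"]["neg"])
print(local_sets["noodles"]["neg"])
```

The localized training sets are also smaller, which is consistent with the faster training reported in the abstract, although this sketch does not measure training time.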

Configuration:-

H/W System Configuration:-

Processor - Pentium III

Speed - 1.1 GHz

RAM - 256 MB (min)

Hard Disk - 20 GB

Floppy Drive - 1.44 MB

Key Board - Standard Windows Keyboard

Mouse - Two or Three Button Mouse

Monitor - SVGA

S/W System Configuration:-

Operating System : Windows XP/7

Application Server : Tomcat 5.0/6.x

Front End : HTML, Java, JSP

Scripts : JavaScript

Server side Script : Java Server Pages

Database : MySQL 5.0

Database Connectivity : JDBC
