A CROWDSOURCING
WORKER QUALITY EVALUATION ALGORITHM ON MAPREDUCE FOR BIG DATA APPLICATIONS
ABSTRACT
Crowdsourcing is an emerging distributed computing and business model that has grown with the blossoming of the Internet. With the development of crowdsourcing systems, the data volume of crowdsourcers, contractors and tasks grows rapidly. Worker quality evaluation based on big data analysis technology has therefore become a critical challenge. This paper first proposes a general worker quality evaluation algorithm that can be applied to critical tasks such as tagging, matching, filtering, categorization and many other emerging applications, without wasting resources. Second, we realize the evaluation algorithm on the Hadoop platform using the MapReduce parallel programming model. Finally, to verify the accuracy and effectiveness of the algorithm in a wide variety of big data scenarios, we conduct a series of experiments. The experimental results demonstrate that the proposed algorithm is accurate and effective, offers high computing performance and horizontal scalability, and is suitable for large-scale worker quality evaluation in a big data environment.
CHAPTER 1
INTRODUCTION
1.1 OVERVIEW
Crowdsourcing is a distributed problem-solving and production model. In this distributed computing model, enterprises distribute tasks through the Internet and recruit suitable workers to participate in them and solve technical difficulties. Nowadays, more and more businesses and enterprises have begun to use the crowdsourcing model. For enterprises, the crowdsourcing model can reduce production costs and promote their technology and creativity. The crowdsourcing model is oriented to the public: every Internet user can choose to participate in the crowdsourcing tasks that interest them and provide solutions for enterprises. However, a single task may involve a large number of workers, each providing a solution. Crowdsourcers can be overwhelmed when faced with such a huge number of solutions, and it is difficult for them to make a final choice.
Moreover, not every person is qualified to serve enterprises, because workers differ in background and personal quality. There may even be malicious workers on a crowdsourcing platform. Therefore, worker quality control has gradually become an important challenge for the crowdsourcing model. It is of great importance to mine information about worker quality from the large volume of worker data, to give crowdsourcers a useful reference.
This
paper mainly studies the core problem of
worker quality control:
worker quality evaluation.
The worker quality evaluation will help enterprises recruit high-quality workers who can provide them high-quality solutions.
It is of great significance to both the quality of the tasks and the
environment of the crowdsourcing platform.
1.2 CLOUD COMPUTING
Cloud computing is a recently evolved computing term, or metaphor, based on the utility and consumption of computing resources. It involves deploying groups of remote servers and software networks that allow centralized data storage and online access to computer services or resources. Clouds can be classified as public, private or hybrid.
Cloud computing relies on sharing of resources to
achieve coherence and economies of scale,
similar to a utility (like the electricity grid) over a network. At the foundation of cloud computing
is the broader concept of converged
infrastructure and shared services.
Cloud computing, or in
simpler shorthand just "the cloud", also focuses on maximizing the
effectiveness of the shared resources. Cloud resources are usually not only
shared by multiple users but are also dynamically reallocated on demand, which improves resource allocation. For example, a cloud computing facility
that serves European users during European business hours with a specific
application (e.g., email) may reallocate the same resources to serve North
American users during North America's business hours with a different
application (e.g., a web server). This approach should maximize the use of
computing power thus reducing environmental damage as well since less power,
air conditioning, rack space, etc. are required for a variety of functions.
With cloud computing, multiple users can access a single server to retrieve and
update their data without purchasing licenses for different applications.
1.3 MODELS
The term "moving to
cloud" also refers to an organization moving away from a traditional CAPEX model (buy the dedicated hardware and
depreciate it over a period of time) to the OPEX model
(use a shared cloud infrastructure and pay as one uses it).
Proponents claim that
cloud computing allows companies to avoid upfront infrastructure costs, and
focus on projects that differentiate their businesses instead of on
infrastructure. Proponents also claim that cloud computing allows
enterprises to get their applications up and running faster, with improved
manageability and less maintenance, and enables IT to more rapidly adjust
resources to meet fluctuating and unpredictable business demand. Cloud
providers typically use a "pay as you go" model. This can lead to
unexpectedly high charges if administrators do not adapt to the cloud pricing
model.
The present availability
of high-capacity networks, low-cost computers and storage devices as well as
the widespread adoption of hardware
virtualization, service-oriented
architecture, and autonomic and utility computing have led to a
growth in cloud computing. Cloud storage offers an on-demand data outsourcing
service model, and is gaining popularity due to its elasticity and low
maintenance cost. However, security concerns arise when data storage is
outsourced to third-party cloud storage providers. It is desirable to enable
cloud clients to verify the integrity of their outsourced data, in case their data
have been accidentally corrupted or maliciously compromised by insider/outsider
attacks.
1.4 MAJOR
USE OF CLOUD STORAGE
One major use of cloud storage is long-term archival, which represents a workload that is written once and rarely read. Although the stored data are rarely read, it remains necessary to ensure their integrity for disaster recovery or compliance with legal requirements. Since archived data are typically huge, whole-file checking becomes prohibitive. Proof of retrievability (POR) and proof of data possession (PDP) have thus been proposed to verify the integrity of a large file by spot-checking only a fraction of it via various cryptographic primitives.
This system continues to use
random masking to support data privacy during public auditing, and leverage
index hash tables to support fully dynamic operations on shared data. A dynamic
operation indicates an insert, delete or update operation on a single block in
shared data.
Cloud computing has been considered a new model of enterprise
IT infrastructure, which can organize huge resource of computing, storage and
applications, and enable users to enjoy ubiquitous, convenient
and on-demand network access to a shared pool of configurable
computing resources with great efficiency and minimal economic overhead.
Attracted by these appealing features, both individuals and enterprises are
motivated to outsource their data to the cloud, instead of purchasing software
and hardware to manage the data themselves.
1.5 ADVANTAGES
Despite the various advantages of cloud services, outsourcing sensitive information
(such as e-mails, personal health records, company finance data, government
documents, etc.) to remote servers brings privacy concerns. The cloud service
providers (CSPs) that keep the data for users may access users’ sensitive
information without authorization. A general approach to protect the data
confidentiality is to encrypt the data before outsourcing. However, this will
cause a huge cost in terms of data usability.
Cloud
computing,
also known as 'on-demand computing', is a kind of Internet-based computing,
where shared resources, data and information are provided to computers and
other devices on-demand. It is a model for enabling ubiquitous, on-demand
access to a shared pool of configurable computing resources. Cloud
computing and storage solutions provide users and enterprises with various
capabilities to store and process their data in third-party data centers.
Cloud computing is a
model for enabling ubiquitous, convenient, on-demand network access to a shared
pool of configurable computing resources (e.g., networks, servers, storage,
applications and services) that can be rapidly provisioned and released with
minimal management effort.
Cloud computing poses privacy concerns because
the service provider can access the data that is in the cloud at any time. It
could accidentally or deliberately alter or even delete information. Many
cloud providers can share information with third parties if necessary for
purposes of law and order even without a warrant. That is permitted in their
privacy policies which users have to agree to before they start using cloud
services. Solutions to privacy include policy and legislation as well as end
users' choices for how data is stored. Users can encrypt data that
is processed or stored within the cloud to prevent unauthorized access.
The shared data in cloud servers, however,
usually contains users’ sensitive information (e.g., personal profile,
financial data, health records, etc.) and needs to be well protected. As the
ownership of the data is separated from its administration, the cloud servers may migrate users’ data to other cloud servers in outsourcing or share it in cloud searching. Therefore, it becomes a big challenge to protect the
privacy of those shared data in cloud, especially in cross-cloud and big data
environment. In order to meet this challenge, it is necessary to design a
comprehensive solution to support user-defined authorization period and to
provide fine-grained access control during this period.
CHAPTER 2
SYSTEM ANALYSIS
In this phase a detailed appraisal of the existing
system is explained. This appraisal includes how the system works and what it
does. It also includes finding out, in more detail, the problems with the existing system and what the user requires from the new system or any change to it. The output of this phase is a detailed model of the system, describing the system functions, data and information flow. The phase also produces a detailed set of user requirements, which are used to set objectives for the new system.
2.1 CURRENT SYSTEM:
Crowdsourcers release tasks at almost all times on a large-scale crowdsourcing platform, and a large number of workers participate in these tasks. Therefore, the platform generates a large amount of data every moment, including crowdsourcing tasks, worker behaviours, and task solutions. This volume of data places new demands on the computational performance of the crowdsourcing platform. Using big data technology to process these massive data is a key issue that the crowdsourcing platform needs to consider.
2.2 SHORTCOMINGS OF THE CURRENT SYSTEM:
·         Most of these crowdsourcing systems rely on offline or manual worker quality control and evaluation, or simply ignore quality control issues.
·         High computational cost.
·         Low evaluation accuracy.
2.3 PROPOSED SYSTEM:
Therefore, to evaluate the quality of workers in a crowdsourcing platform accurately, we first propose a general worker quality evaluation algorithm. This algorithm evaluates multiple workers over multiple problem types with no pre-developed answer, and it has stronger scalability and practicality than the existing algorithm. Second, we propose to use the MapReduce programming model to realize large-scale parallel computing of worker quality, and we implement the proposed algorithm on the Hadoop platform. Finally, we conduct a series of experiments to analyse and evaluate the performance of the worker quality evaluation algorithm.
2.4 ADVANTAGE OF PROPOSED SYSTEM:
·         The proposed algorithm is effective and has high performance.
·         It can meet the needs of parallel evaluation of large-scale workers in a crowdsourcing platform.
CHAPTER 3
LITERATURE SURVEY
3.1
OVERVIEW:
A
literature review is an account of what has been published on a topic by
accredited scholars and researchers. Occasionally you will be asked to write
one as a separate assignment, but more often it is part of the introduction to
an essay, research report, or thesis. In writing the literature review, your
purpose is to convey to your reader what knowledge and ideas have been
established on a topic, and what their strengths and weaknesses are. As a piece
of writing, the literature review must be defined by a guiding concept (e.g.,
your research objective, the problem or issue you are discussing or your
argumentative thesis). It is not just a descriptive list of the material available, or a set of summaries.
Besides
enlarging your knowledge about the topic, writing a literature review lets you gain and demonstrate skills in two areas:
1.      INFORMATION SEEKING: the ability to scan the literature efficiently, using manual or computerized methods, to identify a set of useful articles and books.
2.      CRITICAL APPRAISAL: the ability to apply principles of analysis to identify unbiased and valid studies.
3.2 Using Crowdsourcing
and Active Learning to Track Sentiment in Online Media
Abstract
Tracking sentiment in the popular media has long
been of interest to media analysts and pundits. With the availability of news
content via online syndicated feeds, it is now possible to automate some
aspects of this process. There is also great potential to crowdsource much of the annotation work that is required to train a machine learning system to perform sentiment scoring. (Crowdsourcing is a term, sometimes associated with Web 2.0 technologies, that describes the outsourcing of tasks to a large, often anonymous community.) We describe such a system for tracking economic
sentiment in online media that has been deployed since August 2009. It uses
annotations provided by a cohort of non-expert annotators to train a learning
system to classify a large body of news items. We report on the design
challenges addressed in managing the effort of the annotators and in making
annotation an interesting experience.
3.3 Parallel rough set based knowledge acquisition using MapReduce
from big data
Author – Junbo Zhang, Yi Pan
Abstract
Nowadays, with the volume of data growing at an
unprecedented rate, big data mining and knowledge discovery have become a new
challenge. Rough set theory for knowledge acquisition has been successfully
applied in data mining. The recently introduced MapReduce technique has received
much attention from both scientific community and industry for its
applicability in big data analysis. To mine knowledge from big data, we present
parallel rough set based methods for knowledge acquisition using MapReduce in
this paper. Comprehensive experimental evaluation on large data sets shows that
the proposed parallel methods can effectively process big data.
3.4 Dryad: distributed data-parallel programs from sequential
building blocks
Author - M. Isard, M. Budiu
Abstract
Dryad is a general-purpose
distributed execution engine for coarse-grain data-parallel applications. A
Dryad application combines computational "vertices" with
communication "channels" to form a dataflow graph. Dryad runs the
application by executing the vertices of this graph on a set of available
computers, communicating as appropriate through files, TCP pipes, and
shared-memory FIFOs. The vertices provided by the application developer are
quite simple and are usually written as sequential programs with no thread
creation or locking. Concurrency arises from Dryad scheduling vertices to run
simultaneously on multiple computers, or on multiple CPU cores within a
computer. The application can discover the size and placement of data at run
time, and modify the graph as the computation progresses to make efficient use
of the available resources.
Dryad is designed to
scale from powerful multi-core single computers, through small clusters of
computers, to data centers with thousands of computers. The Dryad execution
engine handles all the difficult problems of creating a large distributed,
concurrent application: scheduling the use of computers and their CPUs,
recovering from communication or computer failures, and transporting data
between vertices.
3.5 CrowdER: crowdsourcing entity resolution
Author
- J. Wang, T. Kraska
Abstract
Entity resolution is central to data integration
and data cleaning. Algorithmic approaches have been improving in quality, but
remain far from perfect. Crowdsourcing platforms offer a more accurate but
expensive (and slow) way to bring human insight into the process. Previous work
has proposed batching verification tasks for presentation to human workers but
even with batching, a human-only approach is infeasible for data sets of even
moderate size, due to the large numbers of matches to be tested. Instead, we
propose a hybrid human-machine approach in which machines are used to do an
initial, coarse pass over all the data, and people are used to verify only the
most likely matching pairs. We show that for such a hybrid system, generating
the minimum number of verification tasks of a given size is NP-Hard, but we
develop a novel two-tiered heuristic approach for creating batched tasks. We
describe this method, and present the results of extensive experiments on real
data sets using a popular crowdsourcing platform. The experiments show that our
hybrid approach achieves both good efficiency and high accuracy compared to
machine-only or human-only alternatives.
3.6 Robust Trajectory Estimation for
Crowdsourcing-Based Mobile Applications
Author - Yunhao Liu ; Kai Xing
Abstract:
Crowdsourcing-based
mobile applications are becoming more and more prevalent in recent years, as
smartphones equipped with various built-in sensors are proliferating rapidly.
The large quantity of crowdsourced sensing data stimulates researchers to
accomplish some tasks that used to be costly or impossible, yet the quality of
the crowdsourced data, which is of great importance, has not received
sufficient attention. In reality, the low-quality crowdsourced data are prone
to containing outliers that may severely impair the crowdsourcing applications.
Thus, in this work, we conduct a pioneering investigation of crowdsourced data quality. Specifically, we focus on estimating user motion trajectory
information, which plays an essential role in multiple crowdsourcing
applications, such as indoor localization, context recognition, indoor
navigation, etc. We resort to the family of robust statistics and design a robust trajectory estimation scheme, named TrMCD, which is capable of
alleviating the negative influence of abnormal crowdsourced user trajectories,
differentiating normal users from abnormal users, and overcoming the challenge
brought by spatial unbalance of crowdsourced trajectories. Two real field
experiments are conducted and the results show that TrMCD is robust and
effective in estimating user motion trajectories and mapping fingerprints to
physical locations.
CHAPTER 4
IMPLEMENTATION
Implementation is the stage of the project when the theoretical design is turned into a working system. Thus it can be considered the most critical stage in achieving a successful new system and in giving the user confidence that the new system will work and be effective.
The implementation stage involves careful planning, investigation of the existing system and its constraints on implementation, design of methods to achieve the changeover, and evaluation of the changeover methods.
4.1 MODULES:
A module is a part of a program. Programs are
composed of one or more independently developed modules that are not combined
until the program is linked. A single module can contain one or
several routines.
Our project modules are given below:
1. Task One
In task one, Map-1 first pre-processes the initial data by Ptype to obtain the data set that can be processed by the multi-worker evaluation scheme of the M-1 algorithm, and then groups the workers who are involved in the same task.
Reduce-1 receives the output of Map-1 as input and groups the workers who share the same task id. To keep worker grouping consistent across different reduce tasks, we sort the workers with the same task id before grouping and then adopt a sliding window to group every three workers as a unit.
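The grouping in Reduce-1 can be sketched in Java as follows. This is an illustrative sketch only: the class and method names are hypothetical, and the circular window (which wraps around so that every worker lands in exactly three groups, assuming at least three workers per task) is our reading of the sliding-window description above.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

public class WorkerGrouping {
    // Group the sorted workers of one task into overlapping triples using a
    // circular sliding window; with n >= 3 workers this yields n groups and
    // each worker appears in exactly three of them.
    static List<List<String>> slideIntoTriples(List<String> workers) {
        List<String> sorted = new ArrayList<>(workers);
        Collections.sort(sorted); // sort first, for consistency across reduce tasks
        List<List<String>> groups = new ArrayList<>();
        int n = sorted.size();
        for (int i = 0; i < n; i++) {
            // the window wraps around so the last workers are covered three times too
            groups.add(List.of(sorted.get(i), sorted.get((i + 1) % n), sorted.get((i + 2) % n)));
        }
        return groups;
    }

    public static void main(String[] args) {
        // prints [[w1, w2, w3], [w2, w3, w4], [w3, w4, w1], [w4, w1, w2]]
        System.out.println(slideIntoTriples(List.of("w3", "w1", "w4", "w2")));
    }
}
```

Sorting before windowing is what makes the grouping deterministic no matter which reduce task processes the key.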
2. Task Two
The second MapReduce task receives the output of Reduce-1 as input and calculates the accuracy of each worker. Map-2 processes the output data of Reduce-1. To assign the three workers who participate in the same task and belong to the same sub-group to the same Reducer, Map-2 takes Tid and the workers' combined ID Wi+Wj+Wk as the output key of the map task.
Reduce-2 receives the output of Map-2 as input. For all values that share the same key, we group them according to Ptype and use the proposed M-1 algorithm to calculate each worker's partial accuracy on each Ptype.
Then, we calculate each worker's overall accuracy from these partial accuracies. Finally, the output is in the form <Wi+Wj+Wk+Tid, Ai+Aj+Ak>.
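One plausible way to combine the per-Ptype partial accuracies into a worker's overall accuracy is a weighted average over the number of problems of each type. The exact combination rule is not given in the text, so the weighting below is an assumption, and the class and method names are hypothetical:

```java
import java.util.Map;

public class AccuracyCombiner {
    // Combine partial accuracies per problem type (Ptype) into one overall
    // accuracy, weighting each type by how many problems it contributed.
    static double combine(Map<String, Double> partialAccuracy, Map<String, Integer> problemCount) {
        double weighted = 0.0;
        int total = 0;
        for (Map.Entry<String, Double> e : partialAccuracy.entrySet()) {
            int n = problemCount.getOrDefault(e.getKey(), 0);
            weighted += e.getValue() * n; // accuracy on this Ptype, weighted by its size
            total += n;
        }
        return total == 0 ? 0.0 : weighted / total;
    }

    public static void main(String[] args) {
        // e.g. 8 single-choice problems at 0.75 accuracy, 2 multiple-choice at 0.5
        double a = combine(Map.of("single", 0.75, "multi", 0.5), Map.of("single", 8, "multi", 2));
        System.out.println(a); // (0.75*8 + 0.5*2) / 10 = 0.7
    }
}
```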
3. Task Three
The algorithm adopts the sliding-window scheme to calculate worker accuracy, so each worker's accuracy is calculated three times. We take the average of the three accuracies as the indicator of worker quality, to avoid the evaluation bias that a single calculation could introduce.
Map-3 takes <Wid+Tid> as the key to shuffle, and assigns a worker's three accuracies for one task to the same Reducer. Reduce-3 receives the output of Map-3 as input and calculates each worker's average accuracy. The output is in the form <Wid+Tid, avgAid>, where avgAid is the final result.
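The averaging step of Reduce-3 is straightforward; a minimal sketch with hypothetical names:

```java
import java.util.List;

public class AccuracyAverager {
    // Reduce-3 in sketch form: average a worker's three accuracies for one
    // task; the result avgAid is the final quality indicator for key Wid+Tid.
    static double averageAccuracy(List<Double> accuracies) {
        double sum = 0.0;
        for (double a : accuracies) sum += a;
        return sum / accuracies.size();
    }

    public static void main(String[] args) {
        // e.g. worker w1 on task t42, measured in three sliding-window groups
        double avg = averageAccuracy(List.of(0.8, 0.9, 0.7));
        System.out.println("<w1+t42, " + avg + ">");
    }
}
```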
CHAPTER 5
5.1 METHODOLOGY
WORKER QUALITY EVALUATION ALGORITHM
M-1 Algorithm
The idea of the M-1 algorithm is as follows. Suppose all of the provided problems are of the same type (single choice) and have no pre-developed answer. Three workers w1, w2 and w3 answer these problems independently at the same time; the number of problems is N. We then calculate each worker's accuracy on these problems according to the similarities in their responses.
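The text does not spell out the M-1 formula, but a standard way to estimate three workers' accuracies from response similarity alone uses pairwise agreement rates: under independence, and neglecting agreement by chance on wrong answers, s12 ≈ a1·a2, s13 ≈ a1·a3 and s23 ≈ a2·a3, which gives a1 ≈ sqrt(s12·s13/s23). The sketch below illustrates this estimator; it is an assumption, not necessarily the paper's exact M-1 derivation, and the names are hypothetical:

```java
public class M1Sketch {
    // Fraction of problems on which two workers gave the same answer.
    static double agreement(int[] a, int[] b) {
        int same = 0;
        for (int i = 0; i < a.length; i++) if (a[i] == b[i]) same++;
        return (double) same / a.length;
    }

    // Estimate worker 1's accuracy from the three pairwise agreement rates,
    // assuming independent workers and neglecting chance agreement:
    // s12 ~ a1*a2, s13 ~ a1*a3, s23 ~ a2*a3  =>  a1 ~ sqrt(s12*s13/s23).
    static double accuracyOfFirst(int[] w1, int[] w2, int[] w3) {
        double s12 = agreement(w1, w2), s13 = agreement(w1, w3), s23 = agreement(w2, w3);
        return Math.sqrt(s12 * s13 / s23);
    }

    public static void main(String[] args) {
        // Ten single-choice answers; each worker makes one mistake at a
        // different position, so each true accuracy is 0.9.
        int[] w1 = {0, 1, 2, 0, 1, 2, 0, 1, 2, 3};
        int[] w2 = {0, 1, 2, 0, 1, 2, 0, 1, 3, 0};
        int[] w3 = {0, 1, 2, 0, 1, 2, 0, 3, 2, 0};
        System.out.println(accuracyOfFirst(w1, w2, w3)); // ~0.894
    }
}
```

Note that neglecting chance agreement biases the estimate, which is one reason a refined formula (like the paper's M-1) is needed in practice.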
Multi-worker Evaluation
Scheme Based on M-1 Algorithm
The M-1 algorithm solves the problem of three-worker quality evaluation. However, in an actual crowdsourcing environment, multiple workers may be involved in the same task at the same time, and evaluating the quality of multiple workers is a more practical problem that remains to be solved. Therefore, we propose a multi-worker evaluation scheme based on the M-1 algorithm, which uses the idea of a sliding window.
M-X Algorithm
Compared to single choice, multiple choice is a more general problem type; indeed, single choice is a special form of multiple choice. For example, for some labeling issues we only need to assign one label to each object, but in most cases we need to assign multiple labels to each object.
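One natural way to score agreement on multiple-choice (multi-label) answers is a set similarity such as the Jaccard index. The similarity actually used by M-X is not specified here, so this is an illustrative choice, with hypothetical names:

```java
import java.util.HashSet;
import java.util.Set;

public class LabelSimilarity {
    // Jaccard similarity |A ∩ B| / |A ∪ B| between two label sets.
    // A single-choice answer is the special case of one-element sets,
    // where the similarity is 1 for a match and 0 otherwise.
    static double jaccard(Set<String> a, Set<String> b) {
        if (a.isEmpty() && b.isEmpty()) return 1.0;
        Set<String> inter = new HashSet<>(a);
        inter.retainAll(b);  // intersection
        Set<String> union = new HashSet<>(a);
        union.addAll(b);     // union
        return (double) inter.size() / union.size();
    }

    public static void main(String[] args) {
        System.out.println(jaccard(Set.of("cat", "dog"), Set.of("cat", "bird"))); // 1/3
    }
}
```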
5.2 OBJECTIVE AND MOTIVATION
OBJECTIVE
We first propose a general worker quality evaluation algorithm that can be applied to critical tasks such as tagging, matching, filtering, categorization and many other emerging applications, without wasting resources. Second, we realize the evaluation algorithm on the Hadoop platform using the MapReduce parallel programming model. Finally, to verify the accuracy and effectiveness of the algorithm in a wide variety of big data scenarios, we conduct a series of experiments.
MOTIVATION
To evaluate the quality of workers in a crowdsourcing platform accurately, we first propose a general worker quality evaluation algorithm. This algorithm evaluates multiple workers over multiple problem types with no pre-developed answer, and it has stronger scalability and practicality than the algorithm presented in the reference. Second, we propose to use the MapReduce programming model to realize large-scale parallel computing of worker quality, and we implement the proposed algorithm on the Hadoop platform. Finally, we conduct a series of experiments to analyse and evaluate the performance of the worker quality evaluation algorithm.
CHAPTER 6
SYSTEM SPECIFICATION
The purpose of system requirement
specification is to produce the specification analysis of the task and also to
establish complete information about the requirement, behavior and other
constraints such as functional performance and so on. The goal of system
requirement specification is to completely specify the technical requirements
for the product in a concise and unambiguous manner.
6.1 HARDWARE REQUIREMENTS
•         Processor      -    Pentium III
•         Speed          -    1.1 GHz
•         RAM            -    256 MB (min)
•         Hard Disk      -    20 GB
•         Floppy Drive   -    1.44 MB
•         Keyboard       -    Standard Windows Keyboard
•         Mouse          -    Two or Three Button Mouse
•         Monitor        -    SVGA
6.2 SOFTWARE REQUIREMENTS
• Operating System  :  Windows 8
• Front End         :  Java
• Database          :  MySQL
CHAPTER 7
SOFTWARE ENVIRONMENT
JAVA:
Java is a programming language created by James Gosling at Sun Microsystems (Sun) in 1991. The goal of Java is to write a program once and then run it on multiple operating systems. The first publicly available version of Java (Java 1.0) was released in 1995. Sun Microsystems was acquired by the Oracle Corporation in 2010, and Oracle now has stewardship of Java. In 2006, Sun started to make Java available under the GNU General Public License (GPL); Oracle continues this project, called OpenJDK.
PLATFORM INDEPENDENCE
Unlike many other programming languages, including C and C++, Java is not compiled into platform-specific machine code but into platform-independent byte code. This byte code is distributed over the web and interpreted by the Java Virtual Machine (JVM) on whichever platform it is run.
JAVA VIRTUAL MACHINE
Java
was designed with a concept of ‘write once and run everywhere’. Java Virtual
Machine plays the central role in this concept. The JVM is the environment in
which Java programs execute. It is a software that is implemented on top of
real hardware and operating system. When the source code (.java files) is
compiled, it is translated into byte codes and then placed into (.class) files.
The JVM executes these bytecodes. So Java byte codes can be thought of as the
machine language of the JVM. A JVM can either interpret the bytecode one
instruction at a time or the bytecode can be compiled further for the real
microprocessor using what is called a just-in-time compiler. The JVM
must be implemented on a particular platform before compiled programs can run
on that platform.
JAVA DEVELOPMENT KIT
The Java Development Kit (JDK) is a Sun product aimed at Java developers. Since the introduction of Java, it has
been by far the most widely used Java software
development kit (SDK). It contains a Java
compiler, a full copy of the Java
Runtime Environment (JRE), and many other important development tools.
TOOLS
You will need a Pentium 200-MHz computer with a minimum of 64 MB of RAM (128 MB of RAM recommended). You will also need the following software:
·         Linux 7.1 or Windows XP/7/8 operating system
·         Java JDK 8
·         Microsoft Notepad or any other text editor
FEATURES
·         Reusability of code
·         Emphasis on data rather than procedure
·         Data is hidden and cannot be accessed by external functions
·         Objects can communicate with each other through functions
·         New data and functions can be easily added
What is a Java Web Application?
A Java web application generates interactive web pages containing various types of markup language (HTML, XML, and so on) and dynamic content. It is typically composed of web components such as JavaServer Pages (JSP), servlets and JavaBeans, which modify and temporarily store data, interact with databases and web services, and render content in response to client requests.
Because many of the tasks involved in
web application development can be repetitive or require a surplus of
boilerplate code, web frameworks can be applied to alleviate the overhead
associated with common activities. For example, many frameworks, such as
JavaServer Faces, provide libraries for templating pages and session
management, and often promote code reuse.
What is Java EE?
Java EE (Enterprise Edition) is a widely
used platform containing a set of coordinated technologies that significantly
reduce the cost and complexity of developing, deploying, and managing
multi-tier, server-centric applications. Java EE builds upon the Java SE
platform and provides a set of APIs (application programming interfaces) for
developing and running portable, robust, scalable, reliable and secure
server-side applications.
Some of the fundamental components of
Java EE include:
- Enterprise JavaBeans (EJB): a managed, server-side component architecture used to encapsulate the business logic of an application. EJB technology enables rapid and simplified development of distributed, transactional, secure and portable applications based on Java technology.
- Java Persistence API (JPA): a framework that allows developers to manage data using object-relational mapping (ORM) in applications built on the Java Platform.
JavaScript and Ajax Development
JavaScript is an object-oriented
scripting language primarily used in client-side interfaces for web
applications. Ajax (Asynchronous JavaScript and XML) is a Web 2.0 technique
that allows changes to occur in a web page without the need to perform a page
refresh. JavaScript toolkits can be leveraged to implement Ajax-enabled
components and functionality in web pages.
Web Server and Client
A web server is software that processes client requests and sends responses back to the client. For example, Apache is one of the most widely used web servers. A web server runs on a physical machine and listens for client requests on a specific port.
A web client is software that communicates with the server. Some of the most widely used web clients are Firefox, Google Chrome, Safari, etc. When we request something from a server (through a URL), the web client takes care of creating the request, sending it to the server, parsing the server's response and presenting it to the user.
HTML and HTTP
The web server and web client are two separate pieces of software, so they need a common language for communication: HTML (HyperText Markup Language) is that common language between server and client. They also need a common communication protocol: HTTP (HyperText Transfer Protocol), which runs on top of the TCP/IP communication protocol.
Some of the important parts of HTTP
Request are:
- HTTP Method – the action to be performed, usually GET, POST, PUT, etc.
- URL – the page to access
- Form Parameters – similar to arguments in a Java method, for example the user and password details from a login page.
Sample HTTP Request:

GET /FirstServletProject/jsps/hello.jsp HTTP/1.1
Host: localhost:8080
Cache-Control: no-cache
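A request line like the one above can be taken apart programmatically; a minimal sketch (the class and method names are hypothetical):

```java
public class RequestLine {
    // Split an HTTP request line such as
    // "GET /FirstServletProject/jsps/hello.jsp HTTP/1.1"
    // into its three parts: method, resource and protocol version.
    static String[] parse(String requestLine) {
        String[] parts = requestLine.split(" ");
        if (parts.length != 3) throw new IllegalArgumentException("malformed request line");
        return parts;
    }

    public static void main(String[] args) {
        String[] p = parse("GET /FirstServletProject/jsps/hello.jsp HTTP/1.1");
        System.out.println(p[0] + " -> " + p[1]); // GET -> /FirstServletProject/jsps/hello.jsp
    }
}
```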
Some of the important parts of HTTP
Response are:
- Status Code – an integer indicating whether the request was successful. Some well-known status codes are 200 for success, 404 for Not Found and 403 for Access Forbidden.
- Content Type – text, html, image, pdf etc.; also known as the MIME type.
- Content – the actual data that is rendered by the client and shown to the user.
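The well-known codes above also exist as constants in the JDK's java.net.HttpURLConnection. A small sketch mapping them to the categories described (the describe helper is illustrative):

```java
import java.net.HttpURLConnection;

public class StatusCodes {
    // Map a status code to the categories described above.
    static String describe(int code) {
        if (code == HttpURLConnection.HTTP_OK)        return "success";   // 200
        if (code == HttpURLConnection.HTTP_FORBIDDEN) return "forbidden"; // 403
        if (code == HttpURLConnection.HTTP_NOT_FOUND) return "not found"; // 404
        if (code >= 500) return "server error";
        return "other";
    }

    public static void main(String[] args) {
        System.out.println(200 + " -> " + describe(200)); // 200 -> success
        System.out.println(404 + " -> " + describe(404)); // 404 -> not found
    }
}
```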
MIME Type or Content Type: An HTTP
response header contains a “Content-Type” tag. It is also called the MIME type, and the
server sends it to the client to indicate the kind of data being sent, which
helps the client render the data for the user. Some of the most commonly used MIME types
are text/html, text/xml and application/xml.
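The JDK can guess a MIME type from a file name's extension via java.net.URLConnection, for example:

```java
import java.net.URLConnection;

public class MimeGuess {
    public static void main(String[] args) {
        // Guess the MIME (content) type from a file name's extension.
        System.out.println(URLConnection.guessContentTypeFromName("hello.html")); // text/html
        System.out.println(URLConnection.guessContentTypeFromName("notes.txt"));  // text/plain
    }
}
```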
Understanding URL
URL is an acronym for Uniform Resource
Locator; it is used to locate the server and the resource. Every resource on the
web has its own unique address. Let us look at the parts of a URL with an example.
http://localhost:8080/FirstServletProject/jsps/hello.jsp
http:// – the first part of the URL; it
specifies the communication protocol to be used in server-client communication.
localhost – the unique address of the server;
most of the time it is the hostname of the server, which maps to a unique IP
address. Sometimes multiple hostnames point to the same IP address, and the web
server's virtual hosts take care of routing the request to the particular server instance.
8080 – the port on which the server
is listening. It is optional; if we omit it from the URL, the request goes
to the default port of the protocol. Port numbers 0 to 1023 are reserved
for well-known services, for example 80 for HTTP, 443 for HTTPS and 21 for FTP.
FirstServletProject/jsps/hello.jsp – the resource requested from the server. It
can be static HTML, a PDF, a JSP, a servlet, PHP etc.
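These parts can be extracted with java.net.URL. A short sketch (the portOf helper is illustrative):

```java
import java.net.URL;

public class UrlParts {
    // Effective port: the explicit port if present, else the protocol default.
    static int portOf(String spec) throws Exception {
        URL u = new URL(spec);
        return u.getPort() != -1 ? u.getPort() : u.getDefaultPort();
    }

    public static void main(String[] args) throws Exception {
        URL url = new URL("http://localhost:8080/FirstServletProject/jsps/hello.jsp");
        System.out.println(url.getProtocol()); // http
        System.out.println(url.getHost());     // localhost
        System.out.println(url.getPort());     // 8080
        System.out.println(url.getPath());     // /FirstServletProject/jsps/hello.jsp
        System.out.println(portOf("http://localhost/index.html")); // 80 (default for http)
    }
}
```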
Why do we need Servlets and JSPs?
Web servers are good at serving static
HTML pages, but they do not know how to generate dynamic content or how
to save data into databases, so we need another technology to
generate dynamic content. There are several options for dynamic
content, such as PHP, Python, Ruby on Rails, and Java Servlets and JSPs.
Java Servlets and JSPs are server-side
technologies that extend the capability of web servers by providing support for
dynamic responses and data persistence.
Web Container
Tomcat is a web container. When a
request is made from a client to the web server, the server passes the request
to the web container, and it is the web container's job to find the correct
resource (servlet or JSP) to handle the request, use that resource's output to
generate the response, and hand it to the web server. The web server then sends
the response back to the client.
When the web container receives a request
for a servlet, it creates two objects, HttpServletRequest
and HttpServletResponse. It then finds the correct servlet based on the URL and
creates a thread for the request. Next it invokes the servlet's service() method,
which, based on the HTTP method, invokes doGet() or doPost().
The servlet methods generate the dynamic page and write it to the response.
Once the servlet thread completes, the container converts the output into an HTTP
response and sends it back to the client.
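The dispatch performed by service() can be sketched in plain Java. The class below is illustrative and does not use the real javax.servlet API:

```java
// Plain-Java sketch of service() routing by HTTP method, mirroring the
// container behaviour described above (not the real javax.servlet API).
public class MiniServlet {
    String doGet()  { return "GET handled";  }
    String doPost() { return "POST handled"; }

    // service() inspects the HTTP method and delegates, as HttpServlet does.
    String service(String httpMethod) {
        if ("GET".equalsIgnoreCase(httpMethod))  return doGet();
        if ("POST".equalsIgnoreCase(httpMethod)) return doPost();
        return "405 Method Not Allowed";
    }

    public static void main(String[] args) {
        MiniServlet servlet = new MiniServlet();
        System.out.println(servlet.service("GET"));  // GET handled
        System.out.println(servlet.service("POST")); // POST handled
    }
}
```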
Some of the important work done by the
web container:
- Communication Support
– the container provides an easy way of communication between the web server
and the servlets and JSPs. Because of the container, we do not need to build a
server socket to listen for requests from the web server, parse requests and
generate responses. All these important and complex tasks are done by the
container, and we can focus on the business logic of our applications.
- Lifecycle and Resource
Management – the container manages the life cycle
of a servlet: loading servlets into memory, initializing them, invoking
their methods and destroying them. The container also provides utilities
such as JNDI for resource pooling and management.
- Multithreading Support
– the container creates a new thread for every request to a servlet, and
the thread dies once the request is processed. Servlets are not re-initialized
for each request, which saves time and memory.
- JSP Support
– JSPs do not look like normal Java classes, so the web container provides
support for them. Every JSP in the application is compiled by the container,
converted to a servlet, and then managed like any other servlet.
- Miscellaneous Tasks
– the web container manages resource pools, performs memory optimizations, runs
the garbage collector, provides security configurations, and supports multiple
applications and hot deployment, among other behind-the-scenes tasks that
make our lives easier.
CHAPTER 8
SYSTEM DESIGN
8.1
USE CASE DIAGRAM:
To model a
system, the most important aspect is to capture its dynamic behaviour, that is,
the behaviour of the system when it is running. Static behaviour
alone is not sufficient to model a system; dynamic behaviour is more important.
In UML
there are five diagrams available to model dynamic behaviour, and the use case
diagram is one of them. Since the use case diagram is dynamic in nature, there
must be some internal or external factors that produce the interaction. These
internal and external agents are known as actors. Use case diagrams therefore
consist of actors, use cases and their relationships.
The diagram
is used to model the system/subsystem of an application. A single use case
diagram captures a particular functionality of a system, so a number of use
case diagrams are used to model the entire system. A use case diagram, at its
simplest, is a representation of a user's interaction with the system,
depicting the specifications of a use case. It can portray the different types
of users of a system and is often accompanied by other types of diagrams.
8.2
CLASS DIAGRAM:
In software engineering, a class diagram
in the Unified Modeling Language (UML) is a type of static structure diagram
that describes the structure of a system by showing the system's classes, their
attributes, operations (or methods), and the relationships among the classes.
It shows which class contains which information.
8.3
SEQUENCE DIAGRAM:
A sequence
diagram in Unified Modeling Language (UML) is a kind of interaction diagram
that shows how processes operate with one another and in what order. It is a
construct of a Message Sequence Chart. Sequence diagrams are sometimes called
event diagrams, event scenarios, and timing diagrams.
8.4 COLLABORATION DIAGRAM:
8.5
ACTIVITY DIAGRAM:
Activity diagrams are graphical
representations of workflows of stepwise activities and actions with support
for choice, iteration and concurrency. In the Unified Modeling Language,
activity diagrams can be used to describe the business and operational
step-by-step workflows of components in a system. An activity diagram shows the
overall flow of control.
TABLE DESIGN:
Register
Upload
Transaction
Request
Cloud Register
Attacker
CHAPTER 9
INPUT DESIGN AND OUTPUT
DESIGN
INPUT DESIGN
The input design is the link between the
information system and the user. It comprises developing specifications and
procedures for data preparation, and the steps necessary to put transaction
data into a usable form for processing. This can be achieved by having the
computer read data from a written or printed document, or by having people key
the data directly into the system. The design of input focuses on controlling
the amount of input required, controlling errors, avoiding delay, avoiding
extra steps and keeping the process simple. The input is designed to provide
security and ease of use while retaining privacy. Input design considered the
following things:
- What data should be given as input?
- How should the data be arranged or coded?
- The dialog to guide the operating personnel in providing input.
- Methods for preparing input validations, and the steps to follow when errors occur.
OBJECTIVES
1. Input design is the process of converting a user-oriented description of the
input into a computer-based system. This design is important to avoid errors in
the data input process and to show the management the correct direction for
getting correct information from the computerized system.
2. It is achieved by creating user-friendly screens for data entry that can
handle large volumes of data. The goal of designing input is to make data entry
easier and error-free. The data entry screen is designed so that all data
manipulations can be performed. It also provides record viewing facilities.
3. When data is entered, it is checked for validity. Data can be entered with
the help of screens, and appropriate messages are provided as needed so that
the user is never left in doubt. Thus the objective of input design is to
create an input layout that is easy to follow.
OUTPUT DESIGN
A quality output is one which meets the
requirements of the end user and presents the information clearly. In any
system, the results of processing are communicated to the users and to other
systems through outputs. In output design it is determined how the information
is to be displayed for immediate need, and also the hard copy output. It is the
most important and direct source of information to the user. Efficient and
intelligent output design improves the system's relationship with the user and
supports decision-making.
1. Designing computer output should proceed in an organized, well-thought-out
manner; the right output must be developed while ensuring that each output
element is designed so that people will find the system easy and effective to
use. When analysts design computer output, they should identify the specific
output that is needed to meet the requirements.
2. Select methods for presenting information.
3. Create documents, reports, or other formats that contain information
produced by the system.
The output form of an information system
should accomplish one or more of the following objectives:
- Convey information about past activities, current status or projections of the future.
- Signal important events, opportunities, problems, or warnings.
- Trigger an action.
- Confirm an action.
CHAPTER 10
SYSTEM STUDY
FEASIBILITY STUDY:
The feasibility of the project is
analyzed in this phase, and a business proposal is put forth with a very general
plan for the project and some cost estimates. During system analysis the
feasibility study of the proposed system is carried out, to ensure that the
proposed system is not a burden to the company. For feasibility analysis, some
understanding of the major requirements for the system is essential.
The three key considerations involved in the feasibility analysis are:
- Economic feasibility
- Technical feasibility
- Social feasibility
ECONOMIC FEASIBILITY:
This study is carried out to check the
economic impact that the system will have on the organization. The amount of
funds that the company can pour into the research and development of the system
is limited, so the expenditures must be justified. The developed system is well
within the budget, and this was achieved because most of the technologies used
are freely available; only the customized products had to be purchased.
TECHNICAL FEASIBILITY:
This study is carried out to check the technical feasibility, that is,
the technical requirements of the system. Any system developed must not place a
high demand on the available technical resources, as this would in turn place
high demands on the client. The developed system has modest requirements, as
only minimal or no changes are required to implement it.
SOCIAL
FEASIBILITY:
This aspect of the study checks the
level of acceptance of the system by the users. This includes the process of
training the users to use the system efficiently. Users must not feel
threatened by the system; instead they must accept it as a necessity. The level
of acceptance by the users depends solely on the methods employed to educate
users about the system and make them familiar with it. Their confidence must be
raised so that they can offer constructive criticism, which is welcomed, as
they are the final users of the system.
CHAPTER 11
SYSTEM TESTING
The purpose of testing is to
discover errors. Testing is the process of trying to discover every conceivable
fault or weakness in a work product. It provides a way to check the
functionality of components, subassemblies, assemblies and/or a finished
product. It is the process of exercising software with the intent of ensuring
that the software system meets its requirements and user expectations and does
not fail in an unacceptable manner. There are various types of tests; each test
type addresses a specific testing requirement.
TYPES OF TESTS:
The different types of testing are described below:
UNIT TESTING:
Unit testing involves the design of test cases that validate that the internal
program logic is functioning properly and that program inputs produce valid
outputs. All decision branches and internal code flow should be validated. It
is the testing of individual software units of the application; it is done
after the completion of an individual unit and before integration.
This is structural testing that relies on knowledge of the unit's construction
and is invasive. Unit tests perform basic tests at the component level and test
a specific business process, application, and/or system configuration. Unit
tests ensure that each unique path of a business process performs accurately to
the documented specifications and contains clearly defined inputs and expected
results.
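As an illustration, a unit test for one small unit, say a username-format check like the one a registration page needs, can be written with plain assertions. The rule and names below are hypothetical, not taken from the project code:

```java
// Unit-test sketch with plain assertions; the username rule is a
// hypothetical example, not taken from the project code.
public class UsernameValidatorTest {
    // Unit under test: 3-20 characters, letters, digits or underscore.
    static boolean isValidUsername(String name) {
        return name != null && name.matches("[A-Za-z0-9_]{3,20}");
    }

    public static void main(String[] args) {
        // Valid input must be accepted.
        if (!isValidUsername("alice_01")) throw new AssertionError("valid name rejected");
        // Invalid inputs must be rejected.
        if (isValidUsername("ab")) throw new AssertionError("too-short name accepted");
        if (isValidUsername(null)) throw new AssertionError("null accepted");
        System.out.println("all unit tests passed");
    }
}
```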
INTEGRATION TESTING:
Integration tests are designed to
test integrated software components to determine whether they actually run as
one program. Testing is event driven and is more concerned with the basic
outcome of screens or fields. Integration tests demonstrate that although the
components were individually satisfactory, as shown by successful unit testing,
the combination of components is correct and consistent. Integration testing is
specifically aimed at exposing the problems that arise from the combination of
components.
FUNCTIONAL TEST:
Functional tests provide systematic demonstrations that functions tested
are available as specified by the business and technical requirements, system
documentation, and user manuals.
Functional testing is centered on the
following items:
Valid Input        : identified classes of valid input must be accepted.
Invalid Input      : identified classes of invalid input must be rejected.
Functions          : identified functions must be exercised.
Output             : identified classes of application outputs must be exercised.
Systems/Procedures : interfacing systems or procedures must be invoked.
Organization and preparation of functional
tests is focused on requirements, key functions, or special test cases. In
addition, systematic coverage pertaining to identified business process flows,
data fields, predefined processes, and successive processes must be considered
for testing. Before functional testing is complete, additional tests are
identified and the effective value of current tests is determined.
SYSTEM TEST:
System
testing ensures that the entire integrated software system meets requirements.
It tests a configuration to ensure known and predictable results. An example of
system testing is the configuration oriented system integration test. System
testing is based on process descriptions and flows, emphasizing pre-driven
process links and integration points.
WHITE BOX TESTING:
White box testing is testing in which the software tester has knowledge of
the inner workings, structure and language of the software, or at least its
purpose. It is used to test areas that cannot be reached from a black box
level.
BLACK BOX TESTING:
Black box testing is testing the
software without any knowledge of the inner workings, structure or language of
the module being tested. Black box tests, like most other kinds of tests, must
be written from a definitive source document, such as a specification or
requirements document. It is testing in which the software under test is
treated as a black box: you cannot “see” into it. The test provides inputs and
responds to outputs without considering how the software works.
UNIT TESTING:
Unit
testing is usually conducted as part of a combined code and unit test phase of
the software lifecycle, although it is not uncommon for coding and unit testing
to be conducted as two distinct phases.
Test strategy and approach
Field
testing will be performed manually and functional tests will be written in
detail.
Test objectives
- All field entries must work properly.
- Pages must be activated from the identified link.
- The entry screen, messages and responses must not be delayed.
Features to be tested
- Verify that the entries are of the correct format.
- No duplicate entries should be allowed.
- All links should take the user to the correct page.
INTEGRATION TESTING:
Software integration testing is the incremental integration testing of two or
more integrated software components on a single platform, to expose failures
caused by interface defects.
The task of the integration test is to check that components or software
applications, e.g. components in a software system or, one step up, software
applications at the company level, interact without error.
Test Results: All
the test cases mentioned above passed successfully. No defects encountered.
ACCEPTANCE TESTING:
User
Acceptance Testing is a critical phase of any project and requires significant
participation by the end user. It also ensures that the system meets the
functional requirements.
Test Results: All
the test cases mentioned above passed successfully. No defects were encountered.
CHAPTER 12
FUTURE
WORK
In our future studies,
we will further consider other factors that affect worker quality, such as
answer time and task difficulty. These factors will help realize a
comprehensive evaluation of worker quality and adapt the worker quality
evaluation to different situations of the crowdsourcing model in a big data
environment.
CHAPTER 13
SAMPLE SOURCE CODE
Register
<%@page import="com.oreilly.servlet.*,java.sql.*,java.lang.*,java.text.SimpleDateFormat,java.util.*,java.io.*,javax.servlet.*,javax.servlet.http.*"
%>
<%@ page import="java.sql.*"%>
<%@ include file="connect.jsp" %>
<%@ page import="java.util.Date" %>
<title>User Register</title>
<%
ArrayList list = new ArrayList();
ServletContext context = getServletContext();
String dirName
=context.getRealPath("Gallery/");
String paramname = null;
String uname = "", pass = null, email =
null, mobile = null, address = null;
String dob = null, gender = null, pincode = null,
location = null, image = null;
File file1 = null;
FileInputStream fs = null, fs1 = null;
try {
MultipartRequest multi = new
MultipartRequest(request, dirName, 10 *
1024 * 1024); // 10MB
Enumeration params = multi.getParameterNames();
while (params.hasMoreElements()) {
paramname = (String) params.nextElement();
if (paramname.equalsIgnoreCase("userid"))
{
uname = multi.getParameter(paramname);
}
if (paramname.equalsIgnoreCase("pass")) {
pass = multi.getParameter(paramname);
}
if (paramname.equalsIgnoreCase("email")) {
email = multi.getParameter(paramname);
}
if (paramname.equalsIgnoreCase("mobile"))
{
mobile = multi.getParameter(paramname);
}
if (paramname.equalsIgnoreCase("address"))
{
address = multi.getParameter(paramname);
}
if (paramname.equalsIgnoreCase("dob")) {
dob = multi.getParameter(paramname);
}
if (paramname.equalsIgnoreCase("gender"))
{
gender = multi.getParameter(paramname);
}
if (paramname.equalsIgnoreCase("pin")) {
pincode = multi.getParameter(paramname);
}
if (paramname.equalsIgnoreCase("loc")) {
location = multi.getParameter(paramname);
}
if (paramname.equalsIgnoreCase("pic")) {
image = multi.getParameter(paramname);
}
}
int f = 0;
Enumeration files = multi.getFileNames();
while (files.hasMoreElements()) {
paramname = (String) files.nextElement();
if (paramname.equals("d1")) {
paramname = null;
}
if (paramname != null) {
f = 1;
image = multi.getFilesystemName(paramname);
String fPath =
context.getRealPath("Gallery\\" + image);
file1 = new File(fPath);
fs = new FileInputStream(file1);
list.add(fs);
String query1 = "SELECT * FROM reg WHERE
name='"+ uname + "' ";
Statement st1 = connection.createStatement();
ResultSet rs1 = st1.executeQuery(query1);
if (rs1.next()) {
out.print("UserName Already Exists");
%>
<p><a
href="RegisterS.jsp">Back</a> <a
href="index.jsp">Home</a> </p>
<%
} else {
PreparedStatement ps = connection
.prepareStatement("INSERT INTO
reg(name,pass,email,mobile,addr,dob,gender,pin,location,image,st)
values(?,?,?,?,?,?,?,?,?,?,?) ");
ps.setString(1, uname);
ps.setString(2, pass);
ps.setString(3, email);
ps.setString(4, mobile);
ps.setString(5, address);
ps.setString(6, dob);
ps.setString(7, gender);
ps.setString(8, pincode);
ps.setString(9, location);
ps.setString(11,"Waiting");
if (f == 0)
ps.setObject(10, null);
else if (f == 1) {
fs1 = (FileInputStream) list.get(0);
ps.setBinaryStream(10, fs1, fs1.available());
}
int x = ps.executeUpdate();
if (x > 0) {
out.print("Registered Successfully!!!!");
String suc="Registered Successfully!!!!";
application.setAttribute("msg",suc);
response.sendRedirect("owner.jsp");
%>
<p><a
href="ownerreg.jsp">Back</a> <a
href="index.jsp">Home</a></p>
<%
}
}
}}}
catch
(Exception e) {
e.printStackTrace();
out.print(e.getMessage());
}
%>
Login
<%@ page isThreadSafe="false" %>
<title>Authentication Page</title>
<%@ page language="java"
contentType="text/html; charset=ISO-8859-1"
pageEncoding="ISO-8859-1"%>
<%@page import="java.util.*"%>
<%@ include file="connect.jsp"%>
<%
String name =
request.getParameter("userid");
String pass =
request.getParameter("pass");
try {
String aut = "Authorized";
String sql = "SELECT * FROM reg where
name='" + name
+ "' and pass='" + pass + "' and
st='" + aut + "' ";
Statement stmt = connection.createStatement();
ResultSet rs = stmt.executeQuery(sql);
String utype = "";
if (rs.next()) {
String id=rs.getString(1);
application.setAttribute("uename", name);
application.setAttribute("id", id);
session.setAttribute("name",name);
String email=rs.getString(4);
session.setAttribute("email",email);
System.out.println(name);
response.sendRedirect("ownerhome.jsp");
} else {
response.sendRedirect("wronglogin.html");
}
} catch (Exception e) {
out.print(e);
e.printStackTrace();
}
%>
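A caveat worth noting: the login query above is assembled by string concatenation, so crafted input can alter the SQL itself. The minimal sketch below mirrors the query text from the listing; the attack string is illustrative:

```java
// The login listing builds its SQL by concatenation; input such as
// ' OR '1'='1 then rewrites the WHERE clause.
public class InjectionDemo {
    static String concatQuery(String name, String pass) {
        return "SELECT * FROM reg WHERE name='" + name + "' and pass='" + pass + "'";
    }

    public static void main(String[] args) {
        // The clause now matches every row, regardless of the real password.
        System.out.println(concatQuery("admin", "' OR '1'='1"));
        // With JDBC the fix is a PreparedStatement with ? placeholders:
        //   PreparedStatement ps = connection.prepareStatement(
        //       "SELECT * FROM reg WHERE name=? and pass=?");
        //   ps.setString(1, name); ps.setString(2, pass);
    }
}
```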
Upload
<%@ page import="java.sql.*"%>
<%@ page import="databaseconnection.*"%>
<%@ page import="java.io.*,java.util.*,
javax.servlet.*" %>
<%@ page import="javax.servlet.http.*"
%>
<%@ page
import="org.apache.commons.fileupload.*" %>
<%@ page
import="org.apache.commons.fileupload.disk.*" %>
<%@ page import="org.apache.commons.fileupload.servlet.*"
%>
<%@ page
import="org.apache.commons.io.output.*" %>
<%
String
memory=null,used=null,free=null,upload_kbs=null,status1=null;
double kilobytes=0, size=0, rem=0;
int free1=0,oo2=0;
double kb=0;
boolean isMultipart = ServletFileUpload.isMultipartContent(request);
Connection conn = databasecon.getconnection();
Statement stt = conn.createStatement();
String
email=(String)session.getAttribute("email");
String
domain_key=request.getParameter("domain_key");
String site=request.getParameter("Site");
Connection con2 = databasecon.getconnection();
Statement st2 = con2.createStatement();
String sss2 = "update domain set used='0 kb'
where customer_mail='"+email+"' and
domain_name='"+site+"'";
int rs2=st2.executeUpdate(sss2);
if(rs2>0)
System.out.println("ready to update
domain");
String ssss = "select Status from domain where
customer_mail='"+email+"' ";
ResultSet rss=stt.executeQuery(ssss);
if(rss.next())
{
status1=rss.getString(1);
}
if (!status1.equals("Proccessing"))
{
// Create a factory for disk-based file items
FileItemFactory factory = new DiskFileItemFactory();
// Create a new file upload handler
ServletFileUpload upload = new
ServletFileUpload(factory);
try {
// Parse the request
List items = upload.parseRequest(request);
Iterator iterator = items.iterator();
while (iterator.hasNext())
{
FileItem item = (FileItem) iterator.next();
if (!item.isFormField())
{
String fileName = item.getName();
String root =
getServletContext().getRealPath("/");
//File path = new File(root + "/uploads");
File path = new
File("D://PROJECTS-2015-2016//profit_netbeans//source_code//profit_net//web//uploads");
if (!path.exists())
{
boolean status = path.mkdirs();
}
File uploadedFile = new File(path + "/" +
fileName);
System.out.println("root:"+root);
System.out.println("fileName:"+fileName);
item.write(uploadedFile);
//File file =new File("D:\\ssss.html");
double bytes = uploadedFile.length();
size = (bytes / 1024);
double megabytes = (kilobytes / 1024);
System.out.println("bytes : " + bytes);
System.out.println("kilobytes : " + size);
try
{
Connection con = databasecon.getconnection();
Statement st = con.createStatement();
String sss = "select memory,used,Status from
domain where customer_mail='"+email+"' ";
ResultSet rs=st.executeQuery(sss);
if(rs.next())
{
memory=rs.getString(1);
used=rs.getString(2);
}
free1=Integer.parseInt(memory);
/*
Vector v1=new Vector();
StringTokenizer str= new StringTokenizer(memory);
while(str.hasMoreElements())
{
v1.add(str.nextElement());
}
String o1=v1.get(0).toString();
int oo1=Integer.parseInt(o1);
Vector v2=new Vector();
StringTokenizer st2= new StringTokenizer(used);
while(st2.hasMoreElements())
{
v2.add(st2.nextElement());
}
String o2=v2.get(0).toString();
oo2=Integer.parseInt(o2);
free1=oo1-oo2;
free=Integer.toString(free1);
*/
}
catch(Exception e)
{
System.out.println("eeeeeeeeee"+e);
}
double d = (double) free1;
System.out.println("memory "+d);
System.out.println("size "+size);
if(d>size)
{
System.out.println(uploadedFile.getAbsolutePath());
item.write(uploadedFile);
rem=d-size;
System.out.println("rem "+rem);
}
else
{
response.sendRedirect("create_site.jsp?msg=There
is no free memory!");
}
String
name=(String)session.getAttribute("name");
String
site_name=(String)session.getAttribute("site_nm");
String status=null,uid=null;
try{
Connection con = databasecon.getconnection();
Statement st = con.createStatement();
String sss = "select d.Status,u.id from domain
d,reg u where d.customer_mail= u.email &&
d.customer_mail='"+email+"' ";
ResultSet rs=st.executeQuery(sss);
if(rs.next())
{
status=rs.getString(1);
uid=rs.getString(2);
}
String sss1 = "update site set
file='"+fileName+"',email='"+email+"',username='"+name+"',status='"+status+"',id='"+uid+"'
where sitename='"+site_name+"'";
int rs1=st.executeUpdate(sss1);
if(rs1<=0)
System.out.println("update site");
Double kk=oo2+kb;
String kkk=Double.toString(kk);
String sss11 = "update domain set
used='"+size+"',CONTROL='SITE ACTIVATED' where
customer_mail='"+email+"'";
int rs11=st.executeUpdate(sss11);
if(rs11<=0)
System.out.println("update domain");
}
catch(Exception e)
{
System.out.println("upd :"+e);
}
response.sendRedirect("create_site.jsp?msg=Domain
Successfully Uploaded");
}
}
}
catch (FileUploadException e)
{
System.out.println("ERR 1 "+e);
}
catch (Exception e)
{
System.out.println("ERR 2 "+e);
}
}
else{
response.sendRedirect("create_site.jsp?msg=Admin_not_allocated_the
_memory");
}
%>
View Files
<%@ page
import="java.text.SimpleDateFormat,java.util.*,java.io.*,javax.servlet.*,
javax.servlet.http.*" %>
<%@ page import = "java.util.Date,java.text.SimpleDateFormat,java.text.ParseException"%>
<%@ page
import="java.sql.*,databaseconnection.*"%>
<%
String
s2="",s3="",s4="",s5="",s6="",s7="",s8="";
int i=0,j=0;
try{
Connection con = databasecon.getconnection();
Statement st = con.createStatement();
String name = (String)
session.getAttribute("name");
System.out.println("cloudhome:" +name);
String sql="SELECT * FROM upload where cloud=
'"+name+"' ";
ResultSet rs=st.executeQuery(sql);
while(rs.next())
{
s2=rs.getString("username");
s3=rs.getString("filename");
s4=rs.getString("filetype");
s5=rs.getString("cloud");
session.setAttribute("user",s3);
System.out.println("call:" +s3);
s6=rs.getString("Key");
s7=rs.getString("date");
s8=rs.getString("count");
%>
<%
}
}
catch(Exception e)
{
out.println(e.getMessage());
}
%>
Search
<%@ page
import="java.sql.*"
import="databaseconnection.*"%>
<%@ page import="java.io.*,java.util.*,
javax.servlet.*" %>
<%@ page import="javax.servlet.http.*"
%>
<%
String fname=
request.getParameter("filename");
//session.setAttribute("key_word",
filename);
String
username=null,date=null,document=null,keyword=null,cloud=null;
try
{
Connection con4 = databasecon.getconnection();
Statement st4 = con4.createStatement();
String sss4 = "select * from upload where
filename='"+fname+"' ";
ResultSet rs4=st4.executeQuery(sss4);
if(rs4.next())
{
try
{
Connection con = databasecon.getconnection();
Statement st = con.createStatement();
String sss = "select * from upload where
filename='"+fname+"' ";
ResultSet rs=st.executeQuery(sss);
while(rs.next())
{
username=rs.getString("username");
session.setAttribute("username",
username);
fname=rs.getString("filename");
//document=rs.getString("document");
keyword=rs.getString("keyword");
date=rs.getString("date");
cloud =rs.getString("cloud");
session.setAttribute("cloud",cloud);
session.setAttribute("fname", fname);
%>
<%
}
}
catch(Exception e)
{
System.out.println(e);
}
}
else
{
out.println("<script>alert('NO SUCH
KEYWORD
MATCH..!')</script>");
//response.sendRedirect("search.jsp");
}
}
catch(Exception e4)
{
System.out.println(e4);
}%>
Download
<%@page
import="java.sql.ResultSet"%>
<%@page
import="java.sql.Statement"%>
<%@page
import="java.sql.Connection"%>
<%@ page
import="java.sql.*,java.io.*"
%>
<%@page import="com.oreilly.servlet.*,java.sql.*,java.lang.*,databaseconnection.*,java.text.SimpleDateFormat,java.util.*"
%>
<%@ page import =
"java.util.Date,java.text.SimpleDateFormat,java.text.ParseException"%>
<%@page
import="java.io.OutputStream"%>
<%
String fname = (String) session.getAttribute("file");
System.out.println(fname);
Blob b=null;
String getFile = request.getQueryString();
Connection con = databasecon.getconnection();
Statement st = con.createStatement();
ResultSet rs = st.executeQuery("select * from
upload where filename = '" + fname + "'");
if (rs.next())
{
b = rs.getBlob(1);
byte[] ba = b.getBytes(1, (int)b.length());
response.setContentType("application/octet-stream"); // generic download type; "application/txt" is not a valid MIME type
response.setHeader("Content-Disposition",
"attachment; filename="+rs.getString(3));
OutputStream os = response.getOutputStream();
os.write(ba);
os.close();
ba = null;
fname=rs.getString("filename");
try{
Class.forName("com.mysql.jdbc.Driver");
st=con.createStatement();
String sql1="select * from upload where
filename='"+fname+"'";
rs=st.executeQuery(sql1);
while(rs.next())
{
int count=0;
try{
Class.forName("com.mysql.jdbc.Driver");
Connection con2 =
DriverManager.getConnection("jdbc:mysql://localhost:3306/search","root","root");
PreparedStatement
ps=con.prepareStatement("Update upload set count=count+1 where
filename='"+fname+"' ");
//ps.setInt(1,hit);
int x=ps.executeUpdate();
}
catch (Exception ex)
{
out.println(ex.getMessage());
}}}
catch (Exception e){
out.println(e.getMessage());}}%>
SCREENSHOTS
HOME PAGE:
LOGIN:
VIEW JOB:
CLOUD LOGIN:
VIEW COMPANY DETAILS:
SEARCH JOB:
SEARCH:
DOWNLOAD FILE:
CHAPTER 14
CONCLUSION
In this paper, we first
proposed a general worker quality evaluation algorithm that can be applied to
any critical crowdsourcing task without pre-developed answers. Then, to satisfy
the demand for parallel evaluation of a multitude of workers in a big data
environment, we implemented the proposed algorithm on the Hadoop platform using
the MapReduce programming model. The experimental results show that the
algorithm is accurate and achieves high efficiency and performance in a big
data environment.
REFERENCES
[1] D.C. Brabham, "Crowdsourcing as a Model for Problem Solving: An Introduction and Cases," Convergence: The International Journal of Research into New Media Technologies, vol. 14, no. 1, pp. 75-90, 2008.
[2] M. Allahbakhsh, B. Benatallah, A. Ignjatovic, et al., "Quality Control in Crowdsourcing Systems: Issues and Directions," IEEE Internet Computing, vol. 17, no. 2, pp. 76-81, 2013.
[3] A. Doan, R. Ramakrishnan, and A.Y. Halevy, "Crowdsourcing Systems on the World-Wide Web," Communications of the ACM, vol. 54, no. 4, pp. 86-96, 2011.
[4] P. Clough, M. Sanderson, J. Tang, et al., "Examining the Limits of Crowdsourcing for Relevance Assessment," IEEE Internet Computing, vol. 17, no. 4, pp. 32-38, 2013.
[5] B. Carpenter, "Multilevel Bayesian Models of Categorical Data Annotation," unpublished, 2008.
[6] A. Brew, D. Greene, and P. Cunningham, "Using crowdsourcing and active learning to track sentiment in online media," in Proceedings of the 6th Conference on Prestigious Applications of Intelligent Systems, 2010.
[7] J. Howe, "The Rise of Crowdsourcing," Wired Magazine, vol. 14, no. 14, pp. 176-183, 2006.
[8] V.C. Raykar, S. Yu, L.H. Zhao, et al., "Learning From Crowds," Journal of Machine Learning Research, vol. 11, no. 2, pp. 1297-1322, 2010.
[9] J. Manyika, M. Chui, B. Brown, et al., "Big Data: The next frontier for innovation, competition, and productivity," 2011.
[10] S.C.H. Hoi, J. Wang, P. Zhao, et al., "Online feature selection for mining big data," BigMine, pp. 93-100, 2012.
[11] K. Michael and K.W. Miller, "Big Data: New Opportunities and New Challenges," Computer, vol. 46, no. 6, pp. 22-24, 2013.
[12] C. Lynch, "Big Data: How do your data grow?," Nature, vol. 455, no. 7209, pp. 28-29, 2008.
[13] F. Chang, J. Dean, S. Ghemawat, et al., "Bigtable: A distributed storage system for structured data," ACM Transactions on Computer Systems, vol. 26, no. 4, 2008.
[14] M. Joglekar, H. Garcia-Molina, and A. Parameswaran, "Evaluating the crowd with confidence," in Proceedings of the 19th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 686-694, 2013.
[15] J. Zhang, T. Li, and Y. Pan, "Parallel rough set based knowledge acquisition using MapReduce from big data," BigMine, pp. 20-27, 2012.
[16] J. Dean and S. Ghemawat, "MapReduce: Simplified data processing on large clusters," Communications of the ACM, vol. 51, no. 1, pp. 107-113, 2008.
[17] D. Hastorun, M. Jampani, G. Kakulapati, et al., "Dynamo: Amazon's highly available key-value store," in Proceedings of the 21st ACM Symposium on Operating Systems Principles, pp. 205-220, 2007.
[18] M. Isard, M. Budiu, Y. Yu, et al., "Dryad: Distributed data-parallel programs from sequential building blocks," European Conference on Computer Systems, pp. 59-72, 2007.
[19] J. Wang, T. Kraska, M.J. Franklin, et al., "CrowdER: crowdsourcing entity resolution," Proceedings of the VLDB Endowment, vol. 5, no. 11, pp. 1483-1494, 2012.
[20] N. Maisonneuve and B. Chopard, "Crowdsourcing Satellite Imagery Analysis: Study of Parallel and Iterative Models," GIScience, pp. 116-131, 2012.