A CROWDSOURCING
WORKER QUALITY EVALUATION ALGORITHM ON MAPREDUCE FOR BIG DATA APPLICATIONS
ABSTRACT
Crowdsourcing is an emerging distributed computing and business model that has grown with the blossoming of the Internet. With the development of crowdsourcing systems, the data volume of crowdsourcers, contractors and tasks grows rapidly. Worker quality evaluation based on big data analysis technology has therefore become a critical challenge. This paper first proposes a general worker quality evaluation algorithm that can be applied to critical tasks such as tagging, matching, filtering, categorization and many other emerging applications, without wasting resources. Second, we realize the evaluation algorithm on the Hadoop platform using the MapReduce parallel programming model. Finally, to verify the accuracy and effectiveness of the algorithm in a wide variety of big data scenarios, we conduct a series of experiments. The experimental results demonstrate that the proposed algorithm is accurate and effective, offers high computing performance and horizontal scalability, and is suitable for large-scale worker quality evaluation in a big data environment.
CHAPTER 1
INTRODUCTION
1.1 OVERVIEW
Crowdsourcing is a distributed problem-solving and production model. In this distributed computing model, enterprises distribute tasks through the Internet and recruit suitable workers to participate in them and solve technical difficulties. Nowadays, more and more businesses and enterprises have begun to use the crowdsourcing model. For enterprises, the crowdsourcing model can reduce production costs and promote their technology and creativity. The crowdsourcing model is oriented to the public: every Internet user can choose to participate in the crowdsourcing tasks that interest them and provide solutions for enterprises. However, a single task may involve a large number of workers, each providing a solution. Crowdsourcers can be overwhelmed when faced with such a huge number of solutions, and it is difficult for them to make a final choice.
Moreover, not every person is qualified to serve enterprises, because workers differ in background and personal quality. There may even be malicious workers on a crowdsourcing platform. Therefore, worker quality control has gradually become an important challenge for the crowdsourcing model. It is of great importance to mine information about worker quality from the large volume of worker data, to give crowdsourcers a useful reference.
This
paper mainly studies the core problem of
worker quality control:
worker quality evaluation.
The worker quality evaluation will help enterprises recruit high-quality workers who can provide them high-quality solutions.
It is of great significance to both the quality of the tasks and the
environment of the crowdsourcing platform.
1.2 CLOUD COMPUTING
Cloud computing is a recently evolved computing term, or metaphor, based on the utility and consumption of computing resources. It involves deploying groups of remote servers and software networks that allow centralized data storage and online access to computer services or resources. Clouds can be classified as public, private or hybrid.
Cloud computing relies on sharing of resources to
achieve coherence and economies of scale,
similar to a utility (like the electricity grid) over a network. At the foundation of cloud computing
is the broader concept of converged
infrastructure and shared services.
Cloud computing, or in
simpler shorthand just "the cloud", also focuses on maximizing the
effectiveness of the shared resources. Cloud resources are usually not only
shared by multiple users but are also dynamically reallocated on demand, which improves resource allocation. For example, a cloud computing facility
that serves European users during European business hours with a specific
application (e.g., email) may reallocate the same resources to serve North
American users during North America's business hours with a different
application (e.g., a web server). This approach should maximize the use of
computing power thus reducing environmental damage as well since less power,
air conditioning, rack space, etc. are required for a variety of functions.
With cloud computing, multiple users can access a single server to retrieve and
update their data without purchasing licenses for different applications.
1.3 MODELS
The term "moving to
cloud" also refers to an organization moving away from a traditional CAPEX model (buy the dedicated hardware and
depreciate it over a period of time) to the OPEX model
(use a shared cloud infrastructure and pay as one uses it).
Proponents claim that
cloud computing allows companies to avoid upfront infrastructure costs, and
focus on projects that differentiate their businesses instead of on
infrastructure. Proponents also claim that cloud computing allows
enterprises to get their applications up and running faster, with improved
manageability and less maintenance, and enables IT to more rapidly adjust
resources to meet fluctuating and unpredictable business demand. Cloud
providers typically use a "pay as you go" model. This can lead to
unexpectedly high charges if administrators do not adapt to the cloud pricing
model.
The present availability
of high-capacity networks, low-cost computers and storage devices as well as
the widespread adoption of hardware
virtualization, service-oriented
architecture, and autonomic and utility computing have led to a
growth in cloud computing. Cloud storage offers an on-demand data outsourcing
service model, and is gaining popularity due to its elasticity and low
maintenance cost. However, security concerns arise when data storage is
outsourced to third-party cloud storage providers. It is desirable to enable
cloud clients to verify the integrity of their outsourced data, in case their data
have been accidentally corrupted or maliciously compromised by insider/outsider
attacks.
1.4 MAJOR
USE OF CLOUD STORAGE
One major use of cloud storage is long-term archival, which represents a workload that is written once and rarely read. Although the stored data are rarely read, it remains necessary to ensure their integrity for disaster recovery or compliance with legal requirements. Since archived data are typically huge, whole-file checking becomes prohibitive. Proof of retrievability (POR) and proof of data possession (PDP) have thus been proposed to verify the integrity of a large file by spot-checking only a fraction of it via various cryptographic primitives.
This system continues to use
random masking to support data privacy during public auditing, and leverage
index hash tables to support fully dynamic operations on shared data. A dynamic
operation indicates an insert, delete or update operation on a single block in
shared data.
Cloud computing has been considered a new model of enterprise
IT infrastructure, which can organize huge resource of computing, storage and
applications, and enable users to enjoy ubiquitous, convenient
and on-demand network access to a shared pool of configurable
computing resources with great efficiency and minimal economic overhead.
Attracted by these appealing features, both individuals and enterprises are
motivated to outsource their data to the cloud, instead of purchasing software
and hardware to manage the data themselves.
1.5 ADVANTAGES
Despite the various advantages of cloud services, outsourcing sensitive information
(such as e-mails, personal health records, company finance data, government
documents, etc.) to remote servers brings privacy concerns. The cloud service
providers (CSPs) that keep the data for users may access users’ sensitive
information without authorization. A general approach to protect the data
confidentiality is to encrypt the data before outsourcing. However, this will
cause a huge cost in terms of data usability.
Cloud
computing,
also known as 'on-demand computing', is a kind of Internet-based computing,
where shared resources, data and information are provided to computers and
other devices on-demand. It is a model for enabling ubiquitous, on-demand
access to a shared pool of configurable computing resources. Cloud
computing and storage solutions provide users and enterprises with various
capabilities to store and process their data in third-party data centers.
Cloud computing is a
model for enabling ubiquitous, convenient, on-demand network access to a shared
pool of configurable computing resources (e.g., networks, servers, storage,
applications and services) that can be rapidly provisioned and released with
minimal management effort.
Cloud computing poses privacy concerns because
the service provider can access the data that is in the cloud at any time. It
could accidentally or deliberately alter or even delete information. Many
cloud providers can share information with third parties if necessary for
purposes of law and order even without a warrant. That is permitted in their
privacy policies which users have to agree to before they start using cloud
services. Solutions to privacy include policy and legislation as well as end
users' choices for how data is stored. Users can encrypt data that
is processed or stored within the cloud to prevent unauthorized access.
The shared data in cloud servers, however,
usually contains users’ sensitive information (e.g., personal profile,
financial data, health records, etc.) and needs to be well protected. As the
ownership of the data is separated from its administration, the cloud servers may migrate users’ data to other cloud servers in outsourcing or share it in cloud searching. Therefore, it becomes a big challenge to protect the
privacy of those shared data in cloud, especially in cross-cloud and big data
environment. In order to meet this challenge, it is necessary to design a
comprehensive solution to support user-defined authorization period and to
provide fine-grained access control during this period.
CHAPTER 2
SYSTEM ANALYSIS
In this phase a detailed appraisal of the existing
system is explained. This appraisal includes how the system works and what it
does. It also includes finding out, in more detail, the problems with the existing system and what the user requires from the new system or any change to it. The output of this phase is a detailed model of the system, describing the system functions, data and information flow. The phase also produces a detailed set of user requirements, which are used to set objectives for the new system.
2.1 CURRENT SYSTEM:
Crowdsourcers release tasks at almost all times on a large-scale crowdsourcing platform, and a large number of workers participate in these tasks. Therefore, the platform generates a large amount of data every moment, including crowdsourcing tasks, worker behaviours, and task solutions. This volume of data places new demands on the computational performance of the crowdsourcing platform. Using big data technology to process these massive data is a key issue that the crowdsourcing platform needs to consider.
2.2 SHORTCOMINGS OF THE CURRENT SYSTEM:
·         Most of these crowdsourcing systems rely on offline or manual worker quality control and evaluation, or simply ignore quality control issues.
·         High computational cost.
·         Low evaluation accuracy.
2.3 PROPOSED SYSTEM:
Therefore, to evaluate the quality of workers in a crowdsourcing platform accurately, we first propose a general worker quality evaluation algorithm. This algorithm evaluates multiple workers over multiple problem types with no pre-developed answer, and it has stronger scalability and practicality than the existing algorithm. Second, we propose to use the MapReduce programming model to realize large-scale parallel computing of worker quality, and we implement the proposed algorithm on the Hadoop platform. Finally, we conduct a series of experiments to analyse and evaluate the performance of the worker quality evaluation algorithm.
2.4 ADVANTAGE OF PROPOSED SYSTEM:
·         The proposed algorithm is effective and has high performance.
·         It can meet the needs of parallel evaluation of large-scale workers in a crowdsourcing platform.
CHAPTER 3
LITERATURE SURVEY
3.1
OVERVIEW:
A
literature review is an account of what has been published on a topic by
accredited scholars and researchers. Occasionally you will be asked to write
one as a separate assignment, but more often it is part of the introduction to
an essay, research report, or thesis. In writing the literature review, your
purpose is to convey to your reader what knowledge and ideas have been
established on a topic, and what their strengths and weaknesses are. As a piece
of writing, the literature review must be defined by a guiding concept (e.g.,
your research objective, the problem or issue you are discussing or your
argumentative thesis). It is not just a descriptive list of the material available, or a set of summaries.
Besides
enlarging your knowledge about the topic, writing a literature review lets you gain and demonstrate skills in two areas:
1.      INFORMATION SEEKING: the ability to scan the literature efficiently, using manual or computerized methods, to identify a set of useful articles and books.
2.      CRITICAL APPRAISAL: the ability to apply principles of analysis to identify unbiased and valid studies.
3.2 Using Crowdsourcing
and Active Learning to Track Sentiment in Online Media
Abstract
Tracking sentiment in the popular media has long
been of interest to media analysts and pundits. With the availability of news
content via online syndicated feeds, it is now possible to automate some
aspects of this process. There is also great potential to crowdsource much of the annotation work that is required to train a machine learning system to perform sentiment scoring. (Crowdsourcing is a term, sometimes associated with Web 2.0 technologies, that describes the outsourcing of tasks to a large, often anonymous community.) We describe such a system for tracking economic
sentiment in online media that has been deployed since August 2009. It uses
annotations provided by a cohort of non-expert annotators to train a learning
system to classify a large body of news items. We report on the design
challenges addressed in managing the effort of the annotators and in making
annotation an interesting experience.
3.3 Parallel rough set based knowledge acquisition using MapReduce
from big data
Author – Junbo Zhang, Yi Pan
Abstract
Nowadays, with the volume of data growing at an
unprecedented rate, big data mining and knowledge discovery have become a new
challenge. Rough set theory for knowledge acquisition has been successfully
applied in data mining. The recently introduced MapReduce technique has received
much attention from both scientific community and industry for its
applicability in big data analysis. To mine knowledge from big data, we present
parallel rough set based methods for knowledge acquisition using MapReduce in
this paper. Comprehensive experimental evaluation on large data sets shows that
the proposed parallel methods can effectively process big data.
3.4 Dryad: distributed data-parallel programs from sequential
building blocks
Author - M. Isard, M. Budiu
Abstract
Dryad is a general-purpose
distributed execution engine for coarse-grain data-parallel applications. A
Dryad application combines computational "vertices" with
communication "channels" to form a dataflow graph. Dryad runs the
application by executing the vertices of this graph on a set of available
computers, communicating as appropriate through files, TCP pipes, and
shared-memory FIFOs. The vertices provided by the application developer are
quite simple and are usually written as sequential programs with no thread
creation or locking. Concurrency arises from Dryad scheduling vertices to run
simultaneously on multiple computers, or on multiple CPU cores within a
computer. The application can discover the size and placement of data at run
time, and modify the graph as the computation progresses to make efficient use
of the available resources.
Dryad is designed to
scale from powerful multi-core single computers, through small clusters of
computers, to data centers with thousands of computers. The Dryad execution
engine handles all the difficult problems of creating a large distributed,
concurrent application: scheduling the use of computers and their CPUs,
recovering from communication or computer failures, and transporting data
between vertices.
3.5 CrowdER: crowdsourcing entity resolution
Author
- J. Wang, T. Kraska
Abstract
Entity resolution is central to data integration
and data cleaning. Algorithmic approaches have been improving in quality, but
remain far from perfect. Crowdsourcing platforms offer a more accurate but
expensive (and slow) way to bring human insight into the process. Previous work
has proposed batching verification tasks for presentation to human workers but
even with batching, a human-only approach is infeasible for data sets of even
moderate size, due to the large numbers of matches to be tested. Instead, we
propose a hybrid human-machine approach in which machines are used to do an
initial, coarse pass over all the data, and people are used to verify only the
most likely matching pairs. We show that for such a hybrid system, generating
the minimum number of verification tasks of a given size is NP-Hard, but we
develop a novel two-tiered heuristic approach for creating batched tasks. We
describe this method, and present the results of extensive experiments on real
data sets using a popular crowdsourcing platform. The experiments show that our
hybrid approach achieves both good efficiency and high accuracy compared to
machine-only or human-only alternatives.
3.6 Robust Trajectory Estimation for
Crowdsourcing-Based Mobile Applications
Author - Yunhao Liu ; Kai Xing
Abstract:
Crowdsourcing-based
mobile applications are becoming more and more prevalent in recent years, as
smartphones equipped with various built-in sensors are proliferating rapidly.
The large quantity of crowdsourced sensing data stimulates researchers to
accomplish some tasks that used to be costly or impossible, yet the quality of
the crowdsourced data, which is of great importance, has not received
sufficient attention. In reality, the low-quality crowdsourced data are prone
to containing outliers that may severely impair the crowdsourcing applications.
Thus, in this work, we conduct a pioneering investigation of crowdsourced data quality. Specifically, we focus on estimating user motion trajectory
information, which plays an essential role in multiple crowdsourcing
applications, such as indoor localization, context recognition, indoor
navigation, etc. We resort to the family of robust statistics and design a robust trajectory estimation scheme, named TrMCD, which is capable of
alleviating the negative influence of abnormal crowdsourced user trajectories,
differentiating normal users from abnormal users, and overcoming the challenge
brought by spatial unbalance of crowdsourced trajectories. Two real field
experiments are conducted and the results show that TrMCD is robust and
effective in estimating user motion trajectories and mapping fingerprints to
physical locations.
CHAPTER 4
IMPLEMENTATION
Implementation is the stage of the project when the theoretical design is turned into a working system. Thus it can be considered the most critical stage in achieving a successful new system and in giving the user confidence that the new system will work and be effective.
The implementation stage involves careful planning, investigation of the existing system and its constraints on implementation, design of methods to achieve the changeover, and evaluation of the changeover methods.
4.1 MODULES:
A module is a part of a program. Programs are
composed of one or more independently developed modules that are not combined
until the program is linked. A single module can contain one or
several routines.
Our project modules are given below:
1. Task One
In task one, Map-1 first pre-processes the initial data by Ptype to obtain the data set that can be processed by the multi-worker evaluation scheme of the M-1 algorithm, and then groups the workers who are involved in the same task.
Reduce-1 receives the output of Map-1 as input and groups the workers who share the same task id. To keep worker grouping consistent across different reduce tasks, we sort the workers with the same task id before grouping and then adopt a sliding window to group every three workers as a unit.
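The grouping in Reduce-1 can be sketched in Java as follows. This is an illustrative sketch only: the class and method names are hypothetical, and the circular window (which wraps around so that every worker lands in exactly three groups, assuming at least three workers per task) is our reading of the sliding-window description above.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

public class WorkerGrouping {
    // Group the sorted workers of one task into overlapping triples using a
    // circular sliding window; with n >= 3 workers this yields n groups and
    // each worker appears in exactly three of them.
    static List<List<String>> slideIntoTriples(List<String> workers) {
        List<String> sorted = new ArrayList<>(workers);
        Collections.sort(sorted); // sort first, for consistency across reduce tasks
        List<List<String>> groups = new ArrayList<>();
        int n = sorted.size();
        for (int i = 0; i < n; i++) {
            // the window wraps around so the last workers are covered three times too
            groups.add(List.of(sorted.get(i), sorted.get((i + 1) % n), sorted.get((i + 2) % n)));
        }
        return groups;
    }

    public static void main(String[] args) {
        // prints [[w1, w2, w3], [w2, w3, w4], [w3, w4, w1], [w4, w1, w2]]
        System.out.println(slideIntoTriples(List.of("w3", "w1", "w4", "w2")));
    }
}
```

Sorting before windowing is what makes the grouping deterministic no matter which reduce task processes the key.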
2. Task Two
The second MapReduce task receives the output of Reduce-1 as input and calculates the accuracy of each worker. Map-2 processes the output data of Reduce-1. To assign the three workers who participate in the same task and belong to the same sub-group to the same Reducer, Map-2 takes Tid and the workers' combined ID Wi+Wj+Wk as the output key of the map task.
Reduce-2 receives the output of Map-2 as input. For all values that share the same key, we group them according to Ptype and use the proposed M-1 algorithm to calculate each worker's partial accuracy on each Ptype.
Then, we calculate each worker's overall accuracy from these partial accuracies. Finally, the output is in the form <Wi+Wj+Wk+Tid, Ai+Aj+Ak>.
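One plausible way to combine the per-Ptype partial accuracies into a worker's overall accuracy is a weighted average over the number of problems of each type. The exact combination rule is not given in the text, so the weighting below is an assumption, and the class and method names are hypothetical:

```java
import java.util.Map;

public class AccuracyCombiner {
    // Combine partial accuracies per problem type (Ptype) into one overall
    // accuracy, weighting each type by how many problems it contributed.
    static double combine(Map<String, Double> partialAccuracy, Map<String, Integer> problemCount) {
        double weighted = 0.0;
        int total = 0;
        for (Map.Entry<String, Double> e : partialAccuracy.entrySet()) {
            int n = problemCount.getOrDefault(e.getKey(), 0);
            weighted += e.getValue() * n; // accuracy on this Ptype, weighted by its size
            total += n;
        }
        return total == 0 ? 0.0 : weighted / total;
    }

    public static void main(String[] args) {
        // e.g. 8 single-choice problems at 0.75 accuracy, 2 multiple-choice at 0.5
        double a = combine(Map.of("single", 0.75, "multi", 0.5), Map.of("single", 8, "multi", 2));
        System.out.println(a); // (0.75*8 + 0.5*2) / 10 = 0.7
    }
}
```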
3. Task Three
The algorithm adopts the sliding-window scheme to calculate worker accuracy, so each worker's accuracy is calculated three times. We take the average of the three accuracies as the indicator of worker quality, to avoid the evaluation bias that a single calculation could introduce.
Map-3 takes <Wid+Tid> as the key to shuffle, and assigns a worker's three accuracies for one task to the same Reducer. Reduce-3 receives the output of Map-3 as input and calculates each worker's average accuracy. The output is in the form <Wid+Tid, avgAid>, where avgAid is the final result.
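The averaging step of Reduce-3 is straightforward; a minimal sketch with hypothetical names:

```java
import java.util.List;

public class AccuracyAverager {
    // Reduce-3 in sketch form: average a worker's three accuracies for one
    // task; the result avgAid is the final quality indicator for key Wid+Tid.
    static double averageAccuracy(List<Double> accuracies) {
        double sum = 0.0;
        for (double a : accuracies) sum += a;
        return sum / accuracies.size();
    }

    public static void main(String[] args) {
        // e.g. worker w1 on task t42, measured in three sliding-window groups
        double avg = averageAccuracy(List.of(0.8, 0.9, 0.7));
        System.out.println("<w1+t42, " + avg + ">");
    }
}
```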
CHAPTER 5
5.1 METHODOLOGY
WORKER QUALITY EVALUATION ALGORITHM
M-1 Algorithm
The idea of the M-1 algorithm is as follows. Suppose all of the provided problems are of the same type (single choice) and have no pre-developed answer. Three workers w1, w2 and w3 answer these problems independently at the same time; the number of problems is N. We then calculate each worker's accuracy on these problems according to the similarities in their responses.
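The text does not spell out the M-1 formula, but a standard way to estimate three workers' accuracies from response similarity alone uses pairwise agreement rates: under independence, and neglecting agreement by chance on wrong answers, s12 ≈ a1·a2, s13 ≈ a1·a3 and s23 ≈ a2·a3, which gives a1 ≈ sqrt(s12·s13/s23). The sketch below illustrates this estimator; it is an assumption, not necessarily the paper's exact M-1 derivation, and the names are hypothetical:

```java
public class M1Sketch {
    // Fraction of problems on which two workers gave the same answer.
    static double agreement(int[] a, int[] b) {
        int same = 0;
        for (int i = 0; i < a.length; i++) if (a[i] == b[i]) same++;
        return (double) same / a.length;
    }

    // Estimate worker 1's accuracy from the three pairwise agreement rates,
    // assuming independent workers and neglecting chance agreement:
    // s12 ~ a1*a2, s13 ~ a1*a3, s23 ~ a2*a3  =>  a1 ~ sqrt(s12*s13/s23).
    static double accuracyOfFirst(int[] w1, int[] w2, int[] w3) {
        double s12 = agreement(w1, w2), s13 = agreement(w1, w3), s23 = agreement(w2, w3);
        return Math.sqrt(s12 * s13 / s23);
    }

    public static void main(String[] args) {
        // Ten single-choice answers; each worker makes one mistake at a
        // different position, so each true accuracy is 0.9.
        int[] w1 = {0, 1, 2, 0, 1, 2, 0, 1, 2, 3};
        int[] w2 = {0, 1, 2, 0, 1, 2, 0, 1, 3, 0};
        int[] w3 = {0, 1, 2, 0, 1, 2, 0, 3, 2, 0};
        System.out.println(accuracyOfFirst(w1, w2, w3)); // ~0.894
    }
}
```

Note that neglecting chance agreement biases the estimate, which is one reason a refined formula (like the paper's M-1) is needed in practice.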
Multi-worker Evaluation
Scheme Based on M-1 Algorithm
The M-1 algorithm solves the problem of three-worker quality evaluation. However, in an actual crowdsourcing environment, multiple workers may be involved in the same task at the same time, and evaluating the quality of multiple workers is a more practical problem that remains to be solved. Therefore, we propose a multi-worker evaluation scheme based on the M-1 algorithm, which uses the idea of a sliding window.
M-X Algorithm
Compared to single choice, multiple choice is a more general problem type; indeed, single choice is a special form of multiple choice. For example, for some labeling issues we only need to assign one label to each object, but in most cases we need to assign multiple labels to each object.
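One natural way to score agreement on multiple-choice (multi-label) answers is a set similarity such as the Jaccard index. The similarity actually used by M-X is not specified here, so this is an illustrative choice, with hypothetical names:

```java
import java.util.HashSet;
import java.util.Set;

public class LabelSimilarity {
    // Jaccard similarity |A ∩ B| / |A ∪ B| between two label sets.
    // A single-choice answer is the special case of one-element sets,
    // where the similarity is 1 for a match and 0 otherwise.
    static double jaccard(Set<String> a, Set<String> b) {
        if (a.isEmpty() && b.isEmpty()) return 1.0;
        Set<String> inter = new HashSet<>(a);
        inter.retainAll(b);  // intersection
        Set<String> union = new HashSet<>(a);
        union.addAll(b);     // union
        return (double) inter.size() / union.size();
    }

    public static void main(String[] args) {
        System.out.println(jaccard(Set.of("cat", "dog"), Set.of("cat", "bird"))); // 1/3
    }
}
```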
5.2 OBJECTIVE AND MOTIVATION
OBJECTIVE
We first propose a general worker quality evaluation algorithm that can be applied to critical tasks such as tagging, matching, filtering, categorization and many other emerging applications, without wasting resources. Second, we realize the evaluation algorithm on the Hadoop platform using the MapReduce parallel programming model. Finally, to verify the accuracy and effectiveness of the algorithm in a wide variety of big data scenarios, we conduct a series of experiments.
MOTIVATION
To evaluate the quality of workers in a crowdsourcing platform accurately, we first propose a general worker quality evaluation algorithm. This algorithm evaluates multiple workers over multiple problem types with no pre-developed answer, and it has stronger scalability and practicality than the algorithm presented in the reference. Second, we propose to use the MapReduce programming model to realize large-scale parallel computing of worker quality, and we implement the proposed algorithm on the Hadoop platform. Finally, we conduct a series of experiments to analyse and evaluate the performance of the worker quality evaluation algorithm.
CHAPTER 6
SYSTEM SPECIFICATION
The purpose of system requirement
specification is to produce the specification analysis of the task and also to
establish complete information about the requirement, behavior and other
constraints such as functional performance and so on. The goal of system
requirement specification is to completely specify the technical requirements
for the product in a concise and unambiguous manner.
6.1 HARDWARE REQUIREMENTS
•         Processor      -    Pentium III
•         Speed          -    1.1 GHz
•         RAM            -    256 MB (min)
•         Hard Disk      -    20 GB
•         Floppy Drive   -    1.44 MB
•         Keyboard       -    Standard Windows Keyboard
•         Mouse          -    Two or Three Button Mouse
•         Monitor        -    SVGA
6.2 SOFTWARE REQUIREMENTS
• Operating System  :  Windows 8
• Front End         :  Java
• Database          :  MySQL
CHAPTER 7
SOFTWARE ENVIRONMENT
JAVA:
Java is a programming language created by James Gosling at Sun Microsystems (Sun) in 1991. The goal of Java is to write a program once and then run it on multiple operating systems. The first publicly available version of Java (Java 1.0) was released in 1995. Sun Microsystems was acquired by the Oracle Corporation in 2010, and Oracle now has stewardship of Java. In 2006, Sun started to make Java available under the GNU General Public License (GPL); Oracle continues this project, called OpenJDK.
PLATFORM INDEPENDENCE
Unlike many other programming languages, including C and C++, Java is not compiled into platform-specific machine code but into platform-independent byte code. This byte code is distributed over the web and interpreted by the Java Virtual Machine (JVM) on whichever platform it is run.
JAVA VIRTUAL MACHINE
Java
was designed with a concept of ‘write once and run everywhere’. Java Virtual
Machine plays the central role in this concept. The JVM is the environment in
which Java programs execute. It is a software that is implemented on top of
real hardware and operating system. When the source code (.java files) is
compiled, it is translated into byte codes and then placed into (.class) files.
The JVM executes these bytecodes. So Java byte codes can be thought of as the
machine language of the JVM. A JVM can either interpret the bytecode one
instruction at a time or the bytecode can be compiled further for the real
microprocessor using what is called a just-in-time compiler. The JVM
must be implemented on a particular platform before compiled programs can run
on that platform.
JAVA DEVELOPMENT KIT
The Java Development Kit (JDK) is a Sun product aimed at Java developers. Since the introduction of Java, it has
been by far the most widely used Java software
development kit (SDK). It contains a Java
compiler, a full copy of the Java
Runtime Environment (JRE), and many other important development tools.
TOOLS
You will need a Pentium 200-MHz computer with a minimum of 64 MB of RAM (128 MB of RAM recommended). You will also need the following software:
·         Linux 7.1 or Windows XP/7/8 operating system
·         Java JDK 8
·         Microsoft Notepad or any other text editor
FEATURES
·         Reusability of code
·         Emphasis on data rather than procedure
·         Data is hidden and cannot be accessed by external functions
·         Objects can communicate with each other through functions
·         New data and functions can be easily added
What is a Java Web Application?
A Java web application generates interactive web pages containing various types of markup language (HTML, XML, and so on) and dynamic content. It is typically composed of web components such as JavaServer Pages (JSP), servlets and JavaBeans, which modify and temporarily store data, interact with databases and web services, and render content in response to client requests.
Because many of the tasks involved in
web application development can be repetitive or require a surplus of
boilerplate code, web frameworks can be applied to alleviate the overhead
associated with common activities. For example, many frameworks, such as
JavaServer Faces, provide libraries for templating pages and session
management, and often promote code reuse.
What is Java EE?
Java EE (Enterprise Edition) is a widely
used platform containing a set of coordinated technologies that significantly
reduce the cost and complexity of developing, deploying, and managing
multi-tier, server-centric applications. Java EE builds upon the Java SE
platform and provides a set of APIs (application programming interfaces) for
developing and running portable, robust, scalable, reliable and secure
server-side applications.
Some of the fundamental components of
Java EE include:
- Enterprise JavaBeans (EJB): a managed, server-side component architecture used to encapsulate the business logic of an application. EJB technology enables rapid and simplified development of distributed, transactional, secure and portable applications based on Java technology.
- Java Persistence API (JPA): a framework that allows developers to manage data using object-relational mapping (ORM) in applications built on the Java Platform.
JavaScript and Ajax Development
JavaScript is an object-oriented
scripting language primarily used in client-side interfaces for web
applications. Ajax (Asynchronous JavaScript and XML) is a Web 2.0 technique
that allows changes to occur in a web page without the need to perform a page
refresh. JavaScript toolkits can be leveraged to implement Ajax-enabled
components and functionality in web pages.
Web Server and Client
A web server is software that processes client requests and sends responses back to the client. For example, Apache is one of the most widely used web servers. A web server runs on a physical machine and listens for client requests on a specific port.
A web client is software that communicates with the server. Some of the most widely used web clients are Firefox, Google Chrome, Safari, etc. When we request something from a server (through a URL), the web client takes care of creating the request, sending it to the server, parsing the server's response and presenting it to the user.
HTML and HTTP
The web server and web client are two separate pieces of software, so they need a common language for communication: HTML (HyperText Markup Language) is that common language between server and client. They also need a common communication protocol: HTTP (HyperText Transfer Protocol), which runs on top of the TCP/IP communication protocol.
Some of the important parts of HTTP
Request are:
- HTTP Method – the action to be performed, usually GET, POST, PUT, etc.
- URL – the page to access
- Form Parameters – similar to arguments in a Java method, for example the user and password details from a login page.
Sample HTTP Request:

GET /FirstServletProject/jsps/hello.jsp HTTP/1.1
Host: localhost:8080
Cache-Control: no-cache
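A request line like the one above can be taken apart programmatically; a minimal sketch (the class and method names are hypothetical):

```java
public class RequestLine {
    // Split an HTTP request line such as
    // "GET /FirstServletProject/jsps/hello.jsp HTTP/1.1"
    // into its three parts: method, resource and protocol version.
    static String[] parse(String requestLine) {
        String[] parts = requestLine.split(" ");
        if (parts.length != 3) throw new IllegalArgumentException("malformed request line");
        return parts;
    }

    public static void main(String[] args) {
        String[] p = parse("GET /FirstServletProject/jsps/hello.jsp HTTP/1.1");
        System.out.println(p[0] + " -> " + p[1]); // GET -> /FirstServletProject/jsps/hello.jsp
    }
}
```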
Some of the important parts of HTTP
Response are:
- Status Code – an integer indicating whether the request was successful. Some well-known status codes are 200 for success, 404 for Not Found and 403 for Access Forbidden.
- Content Type – text, html, image, pdf etc.; also known as the MIME type.
- Content – the actual data that is rendered by the client and shown to the user.
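The well-known codes above also exist as constants in the JDK's java.net.HttpURLConnection. A small sketch mapping them to the categories described (the describe helper is illustrative):

```java
import java.net.HttpURLConnection;

public class StatusCodes {
    // Map a status code to the categories described above.
    static String describe(int code) {
        if (code == HttpURLConnection.HTTP_OK)        return "success";   // 200
        if (code == HttpURLConnection.HTTP_FORBIDDEN) return "forbidden"; // 403
        if (code == HttpURLConnection.HTTP_NOT_FOUND) return "not found"; // 404
        if (code >= 500) return "server error";
        return "other";
    }

    public static void main(String[] args) {
        System.out.println(200 + " -> " + describe(200)); // 200 -> success
        System.out.println(404 + " -> " + describe(404)); // 404 -> not found
    }
}
```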
MIME Type or Content Type: An HTTP
response header contains a “Content-Type” tag. It is also called the MIME type, and the
server sends it to the client to indicate the kind of data being sent, which
helps the client render the data for the user. Some of the most commonly used MIME types
are text/html, text/xml and application/xml.
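The JDK can guess a MIME type from a file name's extension via java.net.URLConnection, for example:

```java
import java.net.URLConnection;

public class MimeGuess {
    public static void main(String[] args) {
        // Guess the MIME (content) type from a file name's extension.
        System.out.println(URLConnection.guessContentTypeFromName("hello.html")); // text/html
        System.out.println(URLConnection.guessContentTypeFromName("notes.txt"));  // text/plain
    }
}
```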
Understanding URL
URL is an acronym for Uniform Resource
Locator; it is used to locate the server and the resource. Every resource on the
web has its own unique address. Let us look at the parts of a URL with an example.
http://localhost:8080/FirstServletProject/jsps/hello.jsp
http:// – the first part of the URL; it
specifies the communication protocol to be used in server-client communication.
localhost – the unique address of the server;
most of the time it is the hostname of the server, which maps to a unique IP
address. Sometimes multiple hostnames point to the same IP address, and the web
server's virtual hosts take care of routing the request to the particular server instance.
8080 – the port on which the server
is listening. It is optional; if we omit it from the URL, the request goes
to the default port of the protocol. Port numbers 0 to 1023 are reserved
for well-known services, for example 80 for HTTP, 443 for HTTPS and 21 for FTP.
FirstServletProject/jsps/hello.jsp – the resource requested from the server. It
can be static HTML, a PDF, a JSP, a servlet, PHP etc.
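These parts can be extracted with java.net.URL. A short sketch (the portOf helper is illustrative):

```java
import java.net.URL;

public class UrlParts {
    // Effective port: the explicit port if present, else the protocol default.
    static int portOf(String spec) throws Exception {
        URL u = new URL(spec);
        return u.getPort() != -1 ? u.getPort() : u.getDefaultPort();
    }

    public static void main(String[] args) throws Exception {
        URL url = new URL("http://localhost:8080/FirstServletProject/jsps/hello.jsp");
        System.out.println(url.getProtocol()); // http
        System.out.println(url.getHost());     // localhost
        System.out.println(url.getPort());     // 8080
        System.out.println(url.getPath());     // /FirstServletProject/jsps/hello.jsp
        System.out.println(portOf("http://localhost/index.html")); // 80 (default for http)
    }
}
```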
Why do we need Servlets and JSPs?
Web servers are good at serving static
HTML pages, but they do not know how to generate dynamic content or how
to save data into databases, so we need another technology to
generate dynamic content. There are several options for dynamic
content, such as PHP, Python, Ruby on Rails, and Java Servlets and JSPs.
Java Servlets and JSPs are server-side
technologies that extend the capability of web servers by providing support for
dynamic responses and data persistence.
Web Container
Tomcat is a web container. When a
request is made from a client to the web server, the server passes the request
to the web container, and it is the web container's job to find the correct
resource (servlet or JSP) to handle the request, use that resource's output to
generate the response, and hand it to the web server. The web server then sends
the response back to the client.
When the web container receives a request
for a servlet, it creates two objects, HttpServletRequest
and HttpServletResponse. It then finds the correct servlet based on the URL and
creates a thread for the request. Next it invokes the servlet's service() method,
which, based on the HTTP method, invokes doGet() or doPost().
The servlet methods generate the dynamic page and write it to the response.
Once the servlet thread completes, the container converts the output into an HTTP
response and sends it back to the client.
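The dispatch performed by service() can be sketched in plain Java. The class below is illustrative and does not use the real javax.servlet API:

```java
// Plain-Java sketch of service() routing by HTTP method, mirroring the
// container behaviour described above (not the real javax.servlet API).
public class MiniServlet {
    String doGet()  { return "GET handled";  }
    String doPost() { return "POST handled"; }

    // service() inspects the HTTP method and delegates, as HttpServlet does.
    String service(String httpMethod) {
        if ("GET".equalsIgnoreCase(httpMethod))  return doGet();
        if ("POST".equalsIgnoreCase(httpMethod)) return doPost();
        return "405 Method Not Allowed";
    }

    public static void main(String[] args) {
        MiniServlet servlet = new MiniServlet();
        System.out.println(servlet.service("GET"));  // GET handled
        System.out.println(servlet.service("POST")); // POST handled
    }
}
```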
Some of the important work done by the
web container:
- Communication Support
– the container provides an easy way of communication between the web server
and the servlets and JSPs. Because of the container, we do not need to build a
server socket to listen for requests from the web server, parse requests and
generate responses. All these important and complex tasks are done by the
container, and we can focus on the business logic of our applications.
- Lifecycle and Resource
Management – the container manages the life cycle
of a servlet: loading servlets into memory, initializing them, invoking
their methods and destroying them. The container also provides utilities
such as JNDI for resource pooling and management.
- Multithreading Support
– the container creates a new thread for every request to a servlet, and
the thread dies once the request is processed. Servlets are not re-initialized
for each request, which saves time and memory.
- JSP Support
– JSPs do not look like normal Java classes, so the web container provides
support for them. Every JSP in the application is compiled by the container,
converted to a servlet, and then managed like any other servlet.
- Miscellaneous Tasks
– the web container manages resource pools, performs memory optimizations, runs
the garbage collector, provides security configurations, and supports multiple
applications and hot deployment, among other behind-the-scenes tasks that
make our lives easier.
CHAPTER 8
SYSTEM DESIGN
8.1
USE CASE DIAGRAM:
To model a
system, the most important aspect is to capture its dynamic behaviour, that is,
the behaviour of the system when it is running. Static behaviour
alone is not sufficient to model a system; dynamic behaviour is more important.
In UML
there are five diagrams available to model dynamic behaviour, and the use case
diagram is one of them. Since the use case diagram is dynamic in nature, there
must be some internal or external factors that produce the interaction. These
internal and external agents are known as actors. Use case diagrams therefore
consist of actors, use cases and their relationships.
The diagram
is used to model the system/subsystem of an application. A single use case
diagram captures a particular functionality of a system, so a number of use
case diagrams are used to model the entire system. A use case diagram, at its
simplest, is a representation of a user's interaction with the system,
depicting the specifications of a use case. It can portray the different types
of users of a system and is often accompanied by other types of diagrams.
8.2
CLASS DIAGRAM:
In software engineering, a class diagram
in the Unified Modeling Language (UML) is a type of static structure diagram
that describes the structure of a system by showing the system's classes, their
attributes, operations (or methods), and the relationships among the classes.
It shows which class contains which information.
8.3
SEQUENCE DIAGRAM:
A sequence
diagram in Unified Modeling Language (UML) is a kind of interaction diagram
that shows how processes operate with one another and in what order. It is a
construct of a Message Sequence Chart. Sequence diagrams are sometimes called
event diagrams, event scenarios, and timing diagrams.
8.4 COLLABORATION DIAGRAM:
8.5
ACTIVITY DIAGRAM:
Activity diagrams are graphical
representations of workflows of stepwise activities and actions with support
for choice, iteration and concurrency. In the Unified Modeling Language,
activity diagrams can be used to describe the business and operational
step-by-step workflows of components in a system. An activity diagram shows the
overall flow of control.
TABLE DESIGN:
Register
Upload
Transaction
Request
Cloud Register
Attacker
CHAPTER 9
INPUT DESIGN AND OUTPUT
DESIGN
INPUT DESIGN
The input design is the link between the
information system and the user. It comprises developing specifications and
procedures for data preparation, and the steps necessary to put transaction
data into a usable form for processing. This can be achieved by having the
computer read data from a written or printed document, or by having people key
the data directly into the system. The design of input focuses on controlling
the amount of input required, controlling errors, avoiding delay, avoiding
extra steps and keeping the process simple. The input is designed to provide
security and ease of use while retaining privacy. Input design considered the
following things:
- What data should be given as input?
- How should the data be arranged or coded?
- The dialog to guide the operating personnel in providing input.
- Methods for preparing input validations, and the steps to follow when errors occur.
OBJECTIVES
1. Input design is the process of converting a user-oriented description of the
input into a computer-based system. This design is important to avoid errors in
the data input process and to show the management the correct direction for
getting correct information from the computerized system.
2. It is achieved by creating user-friendly screens for data entry that can
handle large volumes of data. The goal of designing input is to make data entry
easier and error-free. The data entry screen is designed so that all data
manipulations can be performed. It also provides record viewing facilities.
3. When data is entered, it is checked for validity. Data can be entered with
the help of screens, and appropriate messages are provided as needed so that
the user is never left in doubt. Thus the objective of input design is to
create an input layout that is easy to follow.
OUTPUT DESIGN
A quality output is one which meets the
requirements of the end user and presents the information clearly. In any
system, the results of processing are communicated to the users and to other
systems through outputs. In output design it is determined how the information
is to be displayed for immediate need, and also the hard copy output. It is the
most important and direct source of information to the user. Efficient and
intelligent output design improves the system's relationship with the user and
supports decision-making.
1. Designing computer output should proceed in an organized, well-thought-out
manner; the right output must be developed while ensuring that each output
element is designed so that people will find the system easy and effective to
use. When analysts design computer output, they should identify the specific
output that is needed to meet the requirements.
2. Select methods for presenting information.
3. Create documents, reports, or other formats that contain information
produced by the system.
The output form of an information system
should accomplish one or more of the following objectives:
- Convey information about past activities, current status or projections of the future.
- Signal important events, opportunities, problems, or warnings.
- Trigger an action.
- Confirm an action.
CHAPTER 10
SYSTEM STUDY
FEASIBILITY STUDY:
The feasibility of the project is
analyzed in this phase, and a business proposal is put forth with a very general
plan for the project and some cost estimates. During system analysis the
feasibility study of the proposed system is carried out, to ensure that the
proposed system is not a burden to the company. For feasibility analysis, some
understanding of the major requirements for the system is essential.
The three key considerations involved in the feasibility analysis are:
- Economic feasibility
- Technical feasibility
- Social feasibility
ECONOMIC FEASIBILITY:
This study is carried out to check the
economic impact that the system will have on the organization. The amount of
funds that the company can pour into the research and development of the system
is limited, so the expenditures must be justified. The developed system is well
within the budget, and this was achieved because most of the technologies used
are freely available; only the customized products had to be purchased.
TECHNICAL FEASIBILITY:
This study is carried out to check the technical feasibility, that is,
the technical requirements of the system. Any system developed must not place a
high demand on the available technical resources, as this would in turn place
high demands on the client. The developed system has modest requirements, as
only minimal or no changes are required to implement it.
SOCIAL
FEASIBILITY:
This aspect of the study checks the
level of acceptance of the system by the users. This includes the process of
training the users to use the system efficiently. Users must not feel
threatened by the system; instead they must accept it as a necessity. The level
of acceptance by the users depends solely on the methods employed to educate
users about the system and make them familiar with it. Their confidence must be
raised so that they can offer constructive criticism, which is welcomed, as
they are the final users of the system.
CHAPTER 11
SYSTEM TESTING
The purpose of testing is to
discover errors. Testing is the process of trying to discover every conceivable
fault or weakness in a work product. It provides a way to check the
functionality of components, subassemblies, assemblies and/or a finished
product. It is the process of exercising software with the intent of ensuring
that the software system meets its requirements and user expectations and does
not fail in an unacceptable manner. There are various types of tests; each test
type addresses a specific testing requirement.
TYPES OF TESTS:
The different types of testing are described below:
UNIT TESTING:
Unit testing involves the design of test cases that validate that the internal
program logic is functioning properly and that program inputs produce valid
outputs. All decision branches and internal code flow should be validated. It
is the testing of individual software units of the application; it is done
after the completion of an individual unit and before integration.
This is structural testing that relies on knowledge of the unit's construction
and is invasive. Unit tests perform basic tests at the component level and test
a specific business process, application, and/or system configuration. Unit
tests ensure that each unique path of a business process performs accurately to
the documented specifications and contains clearly defined inputs and expected
results.
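As an illustration, a unit test for one small unit, say a username-format check like the one a registration page needs, can be written with plain assertions. The rule and names below are hypothetical, not taken from the project code:

```java
// Unit-test sketch with plain assertions; the username rule is a
// hypothetical example, not taken from the project code.
public class UsernameValidatorTest {
    // Unit under test: 3-20 characters, letters, digits or underscore.
    static boolean isValidUsername(String name) {
        return name != null && name.matches("[A-Za-z0-9_]{3,20}");
    }

    public static void main(String[] args) {
        // Valid input must be accepted.
        if (!isValidUsername("alice_01")) throw new AssertionError("valid name rejected");
        // Invalid inputs must be rejected.
        if (isValidUsername("ab")) throw new AssertionError("too-short name accepted");
        if (isValidUsername(null)) throw new AssertionError("null accepted");
        System.out.println("all unit tests passed");
    }
}
```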
INTEGRATION TESTING:
Integration tests are designed to
test integrated software components to determine whether they actually run as
one program. Testing is event driven and is more concerned with the basic
outcome of screens or fields. Integration tests demonstrate that although the
components were individually satisfactory, as shown by successful unit testing,
the combination of components is correct and consistent. Integration testing is
specifically aimed at exposing the problems that arise from the combination of
components.
FUNCTIONAL TEST:
Functional tests provide systematic demonstrations that functions tested
are available as specified by the business and technical requirements, system
documentation, and user manuals.
Functional testing is centered on the
following items:
Valid Input        : identified classes of valid input must be accepted.
Invalid Input      : identified classes of invalid input must be rejected.
Functions          : identified functions must be exercised.
Output             : identified classes of application outputs must be exercised.
Systems/Procedures : interfacing systems or procedures must be invoked.
Organization and preparation of functional
tests is focused on requirements, key functions, or special test cases. In
addition, systematic coverage pertaining to identified business process flows,
data fields, predefined processes, and successive processes must be considered
for testing. Before functional testing is complete, additional tests are
identified and the effective value of current tests is determined.
SYSTEM TEST:
System
testing ensures that the entire integrated software system meets requirements.
It tests a configuration to ensure known and predictable results. An example of
system testing is the configuration oriented system integration test. System
testing is based on process descriptions and flows, emphasizing pre-driven
process links and integration points.
WHITE BOX TESTING:
White box testing is testing in which the software tester has knowledge of
the inner workings, structure and language of the software, or at least its
purpose. It is used to test areas that cannot be reached from a black box
level.
BLACK BOX TESTING:
Black box testing is testing the
software without any knowledge of the inner workings, structure or language of
the module being tested. Black box tests, like most other kinds of tests, must
be written from a definitive source document, such as a specification or
requirements document. It is testing in which the software under test is
treated as a black box: you cannot “see” into it. The test provides inputs and
responds to outputs without considering how the software works.
UNIT TESTING:
Unit
testing is usually conducted as part of a combined code and unit test phase of
the software lifecycle, although it is not uncommon for coding and unit testing
to be conducted as two distinct phases.
Test strategy and approach
Field
testing will be performed manually and functional tests will be written in
detail.
Test objectives
- All field entries must work properly.
- Pages must be activated from the identified link.
- The entry screen, messages and responses must not be delayed.
Features to be tested
- Verify that the entries are of the correct format.
- No duplicate entries should be allowed.
- All links should take the user to the correct page.
INTEGRATION TESTING:
Software integration testing is the incremental integration testing of two or
more integrated software components on a single platform, to expose failures
caused by interface defects.
The task of the integration test is to check that components or software
applications, e.g. components in a software system or, one step up, software
applications at the company level, interact without error.
Test Results: All
the test cases mentioned above passed successfully. No defects encountered.
ACCEPTANCE TESTING:
User
Acceptance Testing is a critical phase of any project and requires significant
participation by the end user. It also ensures that the system meets the
functional requirements.
Test Results: All
the test cases mentioned above passed successfully. No defects were encountered.
CHAPTER 12
FUTURE
WORK
In our future studies,
we will further consider other factors that affect worker quality, such as
answer time and task difficulty. These factors will help realize a
comprehensive evaluation of worker quality and adapt the worker quality
evaluation to different situations of the crowdsourcing model in a big data
environment.
CHAPTER 13
SAMPLE SOURCE CODE
Register
<%@page import="com.oreilly.servlet.*,java.sql.*,java.lang.*,java.text.SimpleDateFormat,java.util.*,java.io.*,javax.servlet.*,javax.servlet.http.*"
%>
<%@ page import="java.sql.*"%>
<%@ include file="connect.jsp" %>
<%@ page import="java.util.Date" %>
<title>User Register</title>
<%
ArrayList list = new ArrayList();
ServletContext context = getServletContext();
String dirName
=context.getRealPath("Gallery/");
String paramname = null;
String uname = "", pass = null, email =
null, mobile = null, address = null;
String dob = null, gender = null, pincode = null,
location = null, image = null;
File file1 = null;
FileInputStream fs = null, fs1 = null;
try {
MultipartRequest multi = new
MultipartRequest(request, dirName, 10 *
1024 * 1024); // 10MB
Enumeration params = multi.getParameterNames();
while (params.hasMoreElements()) {
paramname = (String) params.nextElement();
if (paramname.equalsIgnoreCase("userid"))
{
uname = multi.getParameter(paramname);
}
if (paramname.equalsIgnoreCase("pass")) {
pass = multi.getParameter(paramname);
}
if (paramname.equalsIgnoreCase("email")) {
email = multi.getParameter(paramname);
}
if (paramname.equalsIgnoreCase("mobile"))
{
mobile = multi.getParameter(paramname);
}
if (paramname.equalsIgnoreCase("address"))
{
address = multi.getParameter(paramname);
}
if (paramname.equalsIgnoreCase("dob")) {
dob = multi.getParameter(paramname);
}
if (paramname.equalsIgnoreCase("gender"))
{
gender = multi.getParameter(paramname);
}
if (paramname.equalsIgnoreCase("pin")) {
pincode = multi.getParameter(paramname);
}
if (paramname.equalsIgnoreCase("loc")) {
location = multi.getParameter(paramname);
}
if (paramname.equalsIgnoreCase("pic")) {
image = multi.getParameter(paramname);
}
}
int f = 0;
Enumeration files = multi.getFileNames();
while (files.hasMoreElements()) {
paramname = (String) files.nextElement();
if (paramname.equals("d1")) {
paramname = null;
}
if (paramname != null) {
f = 1;
image = multi.getFilesystemName(paramname);
String fPath =
context.getRealPath("Gallery\\" + image);
file1 = new File(fPath);
fs = new FileInputStream(file1);
list.add(fs);
String query1 = "SELECT * FROM reg WHERE
name='"+ uname + "' ";
Statement st1 = connection.createStatement();
ResultSet rs1 = st1.executeQuery(query1);
if (rs1.next()) {
out.print("UserName Already Exists");
%>
<p><a
href="RegisterS.jsp">Back</a> <a
href="index.jsp">Home</a> </p>
<%
} else {
PreparedStatement ps = connection
.prepareStatement("INSERT INTO
reg(name,pass,email,mobile,addr,dob,gender,pin,location,image,st)
values(?,?,?,?,?,?,?,?,?,?,?) ");
ps.setString(1, uname);
ps.setString(2, pass);
ps.setString(3, email);
ps.setString(4, mobile);
ps.setString(5, address);
ps.setString(6, dob);
ps.setString(7, gender);
ps.setString(8, pincode);
ps.setString(9, location);
ps.setString(11,"Waiting");
if (f == 0)
ps.setObject(10, null);
else if (f == 1) {
fs1 = (FileInputStream) list.get(0);
ps.setBinaryStream(10, fs1, fs1.available());
}
int x = ps.executeUpdate();
if (x > 0) {
out.print("Registered Successfully!!!!");
String suc="Registered Successfully!!!!";
application.setAttribute("msg",suc);
response.sendRedirect("owner.jsp");
%>
<p><a
href="ownerreg.jsp">Back</a> <a
href="index.jsp">Home</a></p>
<%
}
}
}}}
catch
(Exception e) {
e.printStackTrace();
out.print(e.getMessage());
}
%>
Login
<%@ page isThreadSafe="false" %>
<title>Authentication Page</title>
<%@ page language="java"
contentType="text/html; charset=ISO-8859-1"
pageEncoding="ISO-8859-1"%>
<%@page import="java.util.*"%>
<%@ include file="connect.jsp"%>
<%
String name =
request.getParameter("userid");
String pass =
request.getParameter("pass");
try {
String aut = "Authorized";
String sql = "SELECT * FROM reg where
name='" + name
+ "' and pass='" + pass + "' and
st='" + aut + "' ";
Statement stmt = connection.createStatement();
ResultSet rs = stmt.executeQuery(sql);
String utype = "";
if (rs.next()) {
String id=rs.getString(1);
application.setAttribute("uename", name);
application.setAttribute("id", id);
session.setAttribute("name",name);
String email=rs.getString(4);
session.setAttribute("email",email);
System.out.println(name);
response.sendRedirect("ownerhome.jsp");
} else {
response.sendRedirect("wronglogin.html");
}
} catch (Exception e) {
out.print(e);
e.printStackTrace();
}
%>
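A caveat worth noting: the login query above is assembled by string concatenation, so crafted input can alter the SQL itself. The minimal sketch below mirrors the query text from the listing; the attack string is illustrative:

```java
// The login listing builds its SQL by concatenation; input such as
// ' OR '1'='1 then rewrites the WHERE clause.
public class InjectionDemo {
    static String concatQuery(String name, String pass) {
        return "SELECT * FROM reg WHERE name='" + name + "' and pass='" + pass + "'";
    }

    public static void main(String[] args) {
        // The clause now matches every row, regardless of the real password.
        System.out.println(concatQuery("admin", "' OR '1'='1"));
        // With JDBC the fix is a PreparedStatement with ? placeholders:
        //   PreparedStatement ps = connection.prepareStatement(
        //       "SELECT * FROM reg WHERE name=? and pass=?");
        //   ps.setString(1, name); ps.setString(2, pass);
    }
}
```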
Upload
<%@ page import="java.sql.*"%>
<%@ page import="databaseconnection.*"%>
<%@ page import="java.io.*,java.util.*,
javax.servlet.*" %>
<%@ page import="javax.servlet.http.*"
%>
<%@ page
import="org.apache.commons.fileupload.*" %>
<%@ page
import="org.apache.commons.fileupload.disk.*" %>
<%@ page import="org.apache.commons.fileupload.servlet.*"
%>
<%@ page
import="org.apache.commons.io.output.*" %>
<%
String
memory=null,used=null,free=null,upload_kbs=null,status1=null;
double kilobytes=0, size=0, rem=0;
int free1=0,oo2=0;
double kb=0;
boolean isMultipart = ServletFileUpload.isMultipartContent(request);
Connection conn = databasecon.getconnection();
Statement stt = conn.createStatement();
String
email=(String)session.getAttribute("email");
String
domain_key=request.getParameter("domain_key");
String site=request.getParameter("Site");
Connection con2 = databasecon.getconnection();
Statement st2 = con2.createStatement();
String sss2 = "update domain set used='0 kb'
where customer_mail='"+email+"' and
domain_name='"+site+"'";
int rs2=st2.executeUpdate(sss2);
if(rs2>0)
System.out.println("ready to update
domain");
String ssss = "select Status from domain where
customer_mail='"+email+"' ";
ResultSet rss=stt.executeQuery(ssss);
if(rss.next())
{
status1=rss.getString(1);
}
if (!status1.equals("Proccessing"))
{
// Create a factory for disk-based file items
FileItemFactory factory = new DiskFileItemFactory();
// Create a new file upload handler
ServletFileUpload upload = new
ServletFileUpload(factory);
try {
// Parse the request
List items = upload.parseRequest(request);
Iterator iterator = items.iterator();
while (iterator.hasNext())
{
FileItem item = (FileItem) iterator.next();
if (!item.isFormField())
{
String fileName = item.getName();
String root =
getServletContext().getRealPath("/");
//File path = new File(root + "/uploads");
File path = new
File("D://PROJECTS-2015-2016//profit_netbeans//source_code//profit_net//web//uploads");
if (!path.exists())
{
boolean status = path.mkdirs();
}
File uploadedFile = new File(path + "/" +
fileName);
System.out.println("root:"+root);
System.out.println("fileName:"+fileName);
item.write(uploadedFile);
//File file =new File("D:\\ssss.html");
double bytes = uploadedFile.length();
size = (bytes / 1024);
double megabytes = (kilobytes / 1024);
System.out.println("bytes : " + bytes);
System.out.println("kilobytes : " + size);
try
{
Connection con = databasecon.getconnection();
Statement st = con.createStatement();
String sss = "select memory,used,Status from
domain where customer_mail='"+email+"' ";
ResultSet rs=st.executeQuery(sss);
if(rs.next())
{
memory=rs.getString(1);
used=rs.getString(2);
}
free1=Integer.parseInt(memory);
/*
Vector v1=new Vector();
StringTokenizer str= new StringTokenizer(memory);
while(str.hasMoreElements())
{
v1.add(str.nextElement());
}
String o1=v1.get(0).toString();
int oo1=Integer.parseInt(o1);
Vector v2=new Vector();
StringTokenizer st2= new StringTokenizer(used);
while(st2.hasMoreElements())
{
v2.add(st2.nextElement());
}
String o2=v2.get(0).toString();
oo2=Integer.parseInt(o2);
free1=oo1-oo2;
free=Integer.toString(free1);
*/
}
catch(Exception e)
{
System.out.println("eeeeeeeeee"+e);
}
double d = (double) free1;
System.out.println("memory "+d);
System.out.println("size "+size);
if(d>size)
{
System.out.println(uploadedFile.getAbsolutePath());
item.write(uploadedFile);
rem=d-size;
System.out.println("rem "+rem);
}
else
{
response.sendRedirect("create_site.jsp?msg=There
is no free memory!");
}
String
name=(String)session.getAttribute("name");
String
site_name=(String)session.getAttribute("site_nm");
String status=null,uid=null;
try{
Connection con = databasecon.getconnection();
Statement st = con.createStatement();
String sss = "select d.Status,u.id from domain
d,reg u where d.customer_mail= u.email &&
d.customer_mail='"+email+"' ";
ResultSet rs=st.executeQuery(sss);
if(rs.next())
{
status=rs.getString(1);
uid=rs.getString(2);
}
String sss1 = "update site set
file='"+fileName+"',email='"+email+"',username='"+name+"',status='"+status+"',id='"+uid+"'
where sitename='"+site_name+"'";
int rs1=st.executeUpdate(sss1);
if(rs1<=0)
System.out.println("update site");
Double kk=oo2+kb;
String kkk=Double.toString(kk);
String sss11 = "update domain set
used='"+size+"',CONTROL='SITE ACTIVATED' where
customer_mail='"+email+"'";
int rs11=st.executeUpdate(sss11);
if(rs11<=0)
System.out.println("update domain");
}
catch(Exception e)
{
System.out.println("upd :"+e);
}
response.sendRedirect("create_site.jsp?msg=Domain
Successfully Uploaded");
}
}
}
catch (FileUploadException e)
{
System.out.println("ERR 1 "+e);
}
catch (Exception e)
{
System.out.println("ERR 2 "+e);
}
}
else{
response.sendRedirect("create_site.jsp?msg=Admin_not_allocated_the
_memory");
}
%>
View Files
<%@ page
import="java.text.SimpleDateFormat,java.util.*,java.io.*,javax.servlet.*,
javax.servlet.http.*" %>
<%@ page import = "java.util.Date,java.text.SimpleDateFormat,java.text.ParseException"%>
<%@ page
import="java.sql.*,databaseconnection.*"%>
<%
String
s2="",s3="",s4="",s5="",s6="",s7="",s8="";
int i=0,j=0;
try{
Connection con = databasecon.getconnection();
Statement st = con.createStatement();
String name = (String)
session.getAttribute("name");
System.out.println("cloudhome:" +name);
String sql="SELECT * FROM upload where cloud=
'"+name+"' ";
ResultSet rs=st.executeQuery(sql);
while(rs.next())
{
s2=rs.getString("username");
s3=rs.getString("filename");
s4=rs.getString("filetype");
s5=rs.getString("cloud");
session.setAttribute("user",s3);
System.out.println("call:" +s3);
s6=rs.getString("Key");
s7=rs.getString("date");
s8=rs.getString("count");
%>
<%
}
}
catch(Exception e)
{
out.println(e.getMessage());
}
%>
Search
<%@ page
import="java.sql.*"
import="databaseconnection.*"%>
<%@ page import="java.io.*,java.util.*,
javax.servlet.*" %>
<%@ page import="javax.servlet.http.*"
%>
<%
String fname=
request.getParameter("filename");
//session.setAttribute("key_word",
filename);
String
username=null,date=null,document=null,keyword=null,cloud=null;
try
{
Connection con4 = databasecon.getconnection();
Statement st4 = con4.createStatement();
String sss4 = "select * from upload where
filename='"+fname+"' ";
ResultSet rs4=st4.executeQuery(sss4);
if(rs4.next())
{
try
{
Connection con = databasecon.getconnection();
Statement st = con.createStatement();
String sss = "select * from upload where
filename='"+fname+"' ";
ResultSet rs=st.executeQuery(sss);
while(rs.next())
{
username=rs.getString("username");
session.setAttribute("username",
username);
fname=rs.getString("filename");
//document=rs.getString("document");
keyword=rs.getString("keyword");
date=rs.getString("date");
cloud =rs.getString("cloud");
session.setAttribute("cloud",cloud);
session.setAttribute("fname", fname);
%>
<%
}
}
catch(Exception e)
{
System.out.println(e);
}
}
else
{
out.println("<script>alert('NO SUCH
KEYWORD
MATCH..!')</script>");
//response.sendRedirect("search.jsp");
}
}
catch(Exception e4)
{
System.out.println(e4);
}%>
Download
<%@page
import="java.sql.ResultSet"%>
<%@page
import="java.sql.Statement"%>
<%@page
import="java.sql.Connection"%>
<%@ page
import="java.sql.*,java.io.*"
%>
<%@page import="com.oreilly.servlet.*,java.sql.*,java.lang.*,databaseconnection.*,java.text.SimpleDateFormat,java.util.*"
%>
<%@ page import =
"java.util.Date,java.text.SimpleDateFormat,java.text.ParseException"%>
<%@page
import="java.io.OutputStream"%>
<%
String fname = (String) session.getAttribute("file");
System.out.println(fname);
Blob b=null;
String getFile = request.getQueryString();
Connection con = databasecon.getconnection();
Statement st = con.createStatement();
ResultSet rs = st.executeQuery("select * from
upload where filename = '" + fname + "'");
if (rs.next())
{
b = rs.getBlob(1);
byte[] ba = b.getBytes(1, (int)b.length());
response.setContentType("application/octet-stream"); // generic download type; "application/txt" is not a valid MIME type
response.setHeader("Content-Disposition",
"attachment; filename="+rs.getString(3));
OutputStream os = response.getOutputStream();
os.write(ba);
os.close();
ba = null;
fname=rs.getString("filename");
try{
Class.forName("com.mysql.jdbc.Driver");
st=con.createStatement();
String sql1="select * from upload where
filename='"+fname+"'";
rs=st.executeQuery(sql1);
while(rs.next())
{
int count=0;
try{
Class.forName("com.mysql.jdbc.Driver");
Connection con2 =
DriverManager.getConnection("jdbc:mysql://localhost:3306/search","root","root");
PreparedStatement
ps=con.prepareStatement("Update upload set count=count+1 where
filename='"+fname+"' ");
//ps.setInt(1,hit);
int x=ps.executeUpdate();
}
catch (Exception ex)
{
out.println(ex.getMessage());
}}}
catch (Exception e){
out.println(e.getMessage());}}%>
SCREENSHOTS
HOME PAGE:
LOGIN:
VIEW JOB:
CLOUD LOGIN:
VIEW COMPANY DETAILS:
SEARCH JOB:
SEARCH:
DOWNLOAD FILE:
CHAPTER 14
CONCLUSION
In this paper, we first
proposed a general worker quality evaluation algorithm that can be applied to
any critical crowdsourcing task without pre-developed answers. Then, to satisfy
the demand for parallel evaluation of a multitude of workers in a big data
environment, we implemented the proposed algorithm on the Hadoop platform using
the MapReduce programming model. The experimental results show that the
algorithm is accurate and achieves high efficiency and performance in a big
data environment.
REFERENCES
[1] D.C. Brabham, "Crowdsourcing as a Model for Problem Solving: An Introduction and Cases," Convergence: The International Journal of Research into New Media Technologies, vol. 14, no. 1, pp. 75-90, 2008.
[2] M. Allahbakhsh, B. Benatallah, A. Ignjatovic, et al., "Quality Control in Crowdsourcing Systems: Issues and Directions," IEEE Internet Computing, vol. 17, no. 2, pp. 76-81, 2013.
[3] A. Doan, R. Ramakrishnan, and A.Y. Halevy, "Crowdsourcing Systems on the World-Wide Web," Communications of the ACM, vol. 54, no. 4, pp. 86-96, 2011.
[4] P. Clough, M. Sanderson, J. Tang, et al., "Examining the Limits of Crowdsourcing for Relevance Assessment," IEEE Internet Computing, vol. 17, no. 4, pp. 32-38, 2013.
[5] B. Carpenter, "Multilevel Bayesian Models of Categorical Data Annotation," unpublished, 2008.
[6] A. Brew, D. Greene, and P. Cunningham, "Using crowdsourcing and active learning to track sentiment in online media," in Proceedings of the 6th Conference on Prestigious Applications of Intelligent Systems, 2010.
[7] J. Howe, "The Rise of Crowdsourcing," Wired Magazine, vol. 14, no. 14, pp. 176-183, 2006.
[8] V.C. Raykar, S. Yu, L.H. Zhao, et al., "Learning From Crowds," Journal of Machine Learning Research, vol. 11, no. 2, pp. 1297-1322, 2010.
[9] J. Manyika, M. Chui, B. Brown, et al., "Big Data: The next frontier for innovation, competition, and productivity," 2011.
[10] S.C.H. Hoi, J. Wang, P. Zhao, et al., "Online feature selection for mining big data," BigMine, pp. 93-100, 2012.
[11] K. Michael and K.W. Miller, "Big Data: New Opportunities and New Challenges," Computer, vol. 46, no. 6, pp. 22-24, 2013.
[12] C. Lynch, "Big Data: How do your data grow?," Nature, vol. 455, no. 7209, pp. 28-29, 2008.
[13] F. Chang, J. Dean, S. Ghemawat, et al., "Bigtable: A distributed storage system for structured data," ACM Transactions on Computer Systems, vol. 26, no. 4, 2008.
[14] M. Joglekar, H. Garcia-Molina, and A. Parameswaran, "Evaluating the crowd with confidence," in Proceedings of the 19th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 686-694, 2013.
[15] J. Zhang, T. Li, and Y. Pan, "Parallel rough set based knowledge acquisition using MapReduce from big data," BigMine, pp. 20-27, 2012.
[16] J. Dean and S. Ghemawat, "MapReduce: Simplified data processing on large clusters," Communications of the ACM, vol. 51, no. 1, pp. 107-113, 2008.
[17] D. Hastorun, M. Jampani, G. Kakulapati, et al., "Dynamo: Amazon's highly available key-value store," in Proceedings of the 21st ACM Symposium on Operating Systems Principles, pp. 205-220, 2007.
[18] M. Isard, M. Budiu, Y. Yu, et al., "Dryad: Distributed data-parallel programs from sequential building blocks," European Conference on Computer Systems, pp. 59-72, 2007.
[19] J. Wang, T. Kraska, M.J. Franklin, et al., "CrowdER: crowdsourcing entity resolution," Proceedings of the VLDB Endowment, vol. 5, no. 11, pp. 1483-1494, 2012.
[20] N. Maisonneuve and B. Chopard, "Crowdsourcing Satellite Imagery Analysis: Study of Parallel and Iterative Models," GIScience, pp. 116-131, 2012.