Abstract—With the increasing volume of images users share through social sites, maintaining privacy has become a major problem,
as demonstrated by a recent wave of publicized incidents where users inadvertently shared personal information. In light of these
incidents, the need for tools to help users control access to their shared content is apparent. Toward addressing this need, we propose
an Adaptive Privacy Policy Prediction (A3P) system to help users compose privacy settings for their images. We examine the role of
social context, image content, and metadata as possible indicators of users’ privacy preferences. We propose a two-level framework
which, according to the user’s available history on the site, determines the best available privacy policy for the user’s images being
uploaded. Our solution relies on an image classification framework for image categories which may be associated with similar policies,
and on a policy prediction algorithm to automatically generate a policy for each newly uploaded image, also according to users’ social
features. Over time, the generated policies will follow the evolution of users’ privacy attitude. We provide the results of our extensive
evaluation over 5,000 policies, which demonstrate the effectiveness of our system, with prediction accuracies over 90 percent.
1 INTRODUCTION
IMAGES are now one of the key enablers of users’ connectivity.
Sharing takes place both among previously
established groups of known people or social circles (e.g.,
Google+, Flickr or Picasa), and also increasingly with
people outside the user’s social circles, for purposes of
social discovery, to help them identify new peers and
learn about peers’ interests and social surroundings.
However, semantically rich images may reveal content-sensitive
information []. Consider a photo of a student’s
2012 graduation ceremony, for example. It could be
shared within a Google+ circle or Flickr group, but may
unnecessarily expose the student’s family members
and other friends. Sharing images within online content
sharing sites, therefore, may quickly lead to unwanted disclosure
and privacy violations [3], [24]. Further, the persistent
nature of online media makes it possible for
other users to collect rich aggregated information about
the owner of the published content and the subjects in
the published content [3], [20], [24]. The aggregated
information can result in unexpected exposure of one’s
social environment and lead to abuse of one’s personal
information.
Corresponding to the aforementioned two criteria, the
proposed A3P system comprises two main building
blocks (as shown in Fig. 1): A3P-social and A3P-core. The
A3P-core focuses on analyzing each individual user’s own
images and metadata, while the A3P-social offers a community
perspective of privacy setting recommendations for a
user’s potential privacy improvement. We design the interaction
flows between the two building blocks to balance the
benefits of meeting personal characteristics and obtaining
community advice.
To assess the practical value of our approach, we built a
system prototype and performed an extensive experimental
evaluation. We collected and tested over 5,500 real policies
generated by more than 160 users. Our experimental results
demonstrate both efficiency and high prediction accuracy of
our system.
A preliminary discussion of the A3P-core was presented
in [32]. In this work, we present an overhauled version of
A3P, which includes an extended policy prediction
algorithm in A3P-core (that is now parameterized based on
user groups and also factors in possible outliers), and a new
A3P-social module that develops the notion of social context
to refine and extend the prediction power of our system. We
also conduct additional experiments with a new data set
comprising over 1,400 images and corresponding policies,
and we extend our analysis of the empirical results to reveal
more insights into our system’s performance.
The rest of the paper is organized as follows. Section 2
reviews related work. Section 3 introduces preliminary
notions. Section 4 introduces the A3P-core and Section 5
introduces the A3P-social. Section 6 reports the experimental
evaluation. Finally, Section 7 concludes the paper.
2 RELATED WORK
Our work is related to works on privacy setting configuration
in social sites, recommendation systems, and privacy
analysis of online images.
2.1 Privacy Setting Configuration
Several recent works have studied how to automate the task
of privacy settings (e.g., [7], [15], [20], [22], [27], [28]).
Bonneau et al. [7] proposed the concept of privacy suites
which recommend to users a suite of privacy settings that
“expert” users or other trusted friends have already set, so
that normal users can either directly choose a setting or only
need to make minor modifications. Similarly, Danezis [8] proposed
a machine-learning based approach to automatically
extract privacy settings from the social context within
which the data is produced. Parallel to the work of Danezis,
Adu-Oppong et al. [15] developed privacy settings based on a
concept of “Social Circles” which consist of clusters of
friends formed by partitioning users’ friend lists. Ravichandran
et al. [30] studied how to predict a user’s privacy preferences
for location-based data (i.e., share her location or
not) based on location and time of day. Fang et al. [28] proposed
a privacy wizard to help users grant privileges to
their friends. The wizard asks users to first assign privacy
labels to selected friends, and then uses these labels as input to construct
a classifier that classifies friends based on their profiles
and automatically assigns privacy labels to the
unlabeled friends. More recently, Klemperer et al. [20] studied
whether the keywords and captions with which users
tag their photos can be used to help users more intuitively
create and maintain access-control policies. Their findings
are in line with our approach: tags created for organizational
purposes can be repurposed to help create reasonably accurate
access-control rules.
The aforementioned approaches focus on deriving policy
settings only for traits, so they mainly consider social context
such as one’s friend list. While interesting, they may
not be sufficient to address the challenges posed by image
files, for which privacy may vary substantially not just
because of social context but also because of the actual image
content. As for images, the authors of [41] have presented an
expressive language for images uploaded to social sites.
This work is complementary to ours, as we do not deal with
policy expressiveness but rely on common forms of policy
specification for our predictive algorithm.
In addition, there is a large body of work on image content
analysis, for classification and interpretation (e.g., [14],
[37], [46]), retrieval ([12], [13] are some examples), and
photo ranking [35], [40], also in the context of online photo
sharing sites, such as Flickr [10], [29], [36]. Of these works,
Zerr’s work [43] is probably the closest to ours. Zerr
explores privacy-aware image classification using a mixed
set of features, both content and metadata. This is, however,
a binary classification (private versus public), so the classification
task is very different from ours. Also, the authors do
not deal with the cold-start problem.
2.2 Recommendation Systems
Our work is related to some existing recommendation systems
which employ machine learning techniques.
Chen et al. [9] proposed a system named SheepDog to
automatically insert photos into appropriate groups and
recommend suitable tags for users on Flickr. They adopt
concept detection to predict relevant concepts (tags) of a
photo. Choudhury et al. [10] proposed a recommendation
framework to connect image content with communities
in online social media. They characterize images
through three types of features: visual features, user-generated
text tags, and social interaction, from which they
recommend the most likely groups for a given image.
Similarly, Yu et al. [42] proposed an automated recommendation
system for a user’s images to suggest suitable
photo-sharing groups.
There is also a large body of work on the customization
and personalization of tag-based information retrieval (e.g.,
[21], [23], [45]), which utilizes techniques such as association
rule mining. For example, [45] proposes an interesting
experimental evaluation of several collaborative filtering
algorithms to recommend groups for Flickr users. These
approaches have a different goal from ours, as
they focus on sharing rather than protecting the content.
3 A3P FRAMEWORK
3.1 Preliminary Notions
Users can express their preferences about disclosing content
to their socially connected users via privacy policies. We define privacy policies
according to Definition 1. Our policies are inspired by popular
content sharing sites (e.g., Facebook, Picasa, Flickr),
although the actual implementation depends on the specific
content-management site structure and implementation.
Definition 1. A privacy policy P of user u consists of the following
components:
Subject (S): A set of users socially connected to u.
Data (D): A set of data items shared by u.
Action (A): A set of actions granted by u to S on D.
Condition (C): A Boolean expression which must be
satisfied in order to perform the granted actions.
In the definition, users in S can be represented by their
identities, roles (e.g., family, friend, coworkers), or organizations
(e.g., non-profit organization, profit organization).
D will be the set of images in the user’s profile. Each image
has a unique ID along with some associated metadata, such as the
tags “vacation” and “birthday”. Images can be further grouped
into albums. As for A, we consider four common types of
actions: {view, comment, tag, download}. Last, the condition
component C specifies when the granted action is effective.
C is a Boolean expression on the grantees’ attributes like
time, location, and age. For better understanding, an example
policy is given below.
Example 1. Alice would like to allow her friends and coworkers
to comment and tag images in the album named
“vacation album” and the image named “summer.jpg”
before year 2012. Her privacy preferences can be
expressed by the following policy:
P: [{friend, coworker}, {vacation_album, summer.jpg},
{comment, tag}, (date < 2012)].
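To make the four components of Definition 1 concrete, the following is a minimal Python sketch of one possible encoding of a policy, instantiated with Example 1. The class name, field names, and the `permits` helper are illustrative assumptions and not part of the A3P system; the condition C is simplified here to a predicate on the request date.

```python
from dataclasses import dataclass
from datetime import date
from typing import Callable, FrozenSet

# The four action types named in Section 3.1.
ACTIONS = frozenset({"view", "comment", "tag", "download"})


@dataclass(frozen=True)
class Policy:
    """Illustrative encoding of Definition 1 (field names are assumptions)."""
    subjects: FrozenSet[str]           # S: identities, roles, or organizations
    data: FrozenSet[str]               # D: image or album identifiers
    actions: FrozenSet[str]            # A: subset of ACTIONS
    condition: Callable[[date], bool]  # C: simplified to a test on the request date

    def __post_init__(self) -> None:
        # Reject actions outside the four types considered in the text.
        unknown = self.actions - ACTIONS
        if unknown:
            raise ValueError(f"unsupported actions: {sorted(unknown)}")

    def permits(self, role: str, item: str, action: str, when: date) -> bool:
        """Return True if `role` may perform `action` on `item` at time `when`."""
        return (
            role in self.subjects
            and item in self.data
            and action in self.actions
            and self.condition(when)
        )


# Example 1: friends and coworkers may comment on and tag the listed items before 2012.
alice_policy = Policy(
    subjects=frozenset({"friend", "coworker"}),
    data=frozenset({"vacation_album", "summer.jpg"}),
    actions=frozenset({"comment", "tag"}),
    condition=lambda when: when.year < 2012,
)
```

Under this encoding, a request such as `alice_policy.permits("friend", "summer.jpg", "tag", date(2011, 6, 1))` is granted, while the same request with the action `"view"` or a date in 2012 is denied.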
3.2 System Overview
The A3P system consists of two main components: A3P-core
and A3P-social. The overall data flow is the following.
When a user uploads an image, the image will be first sent
to the A3P-core. The A3P-core classifies the image and
determines whether there is a need to invoke the A3P-social.
In most cases, the A3P-core predicts policies for the users
directly based on their historical behavior. If either of the following
two cases holds, the A3P-core will invoke the A3P-social:
(i) the user does not have enough data for the type
of the uploaded image to conduct policy prediction; (ii) the
A3P-core detects recent major changes in the privacy practices of the user’s
community, along with an increase in the user’s
social networking activities (addition of new
friends, new posts on one’s profile, etc.). In the above cases, it
would be beneficial to report to the user the latest privacy
practice of social communities that have similar background
as the user. The A3P-social groups users into social communities
with similar social context and privacy preferences,
and continuously monitors the social groups. When the
A3P-social is invoked, it automatically identifies the social
group for the user and sends back the information about the
group to the A3P-core for policy prediction. At the end, the
predicted policy will be displayed to the user. If the user is
fully satisfied by the predicted policy, he or she can just
accept it. Otherwise, the user can choose to revise the policy.
The actual policy will be stored in the policy repository of
the system for the policy prediction of future uploads.
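The decision of when the A3P-core hands over to the A3P-social can be sketched as a simple predicate over the two cases listed above. This is only an illustration under stated assumptions: the field names, the `needs_social` function, and the `MIN_HISTORY` threshold are hypothetical, since the text does not specify how much history counts as enough or how community changes are detected. Case (ii) is read here as requiring both a community shift and an increase in the user's activity.

```python
from dataclasses import dataclass

# Hypothetical threshold: the text does not state how much history is "enough".
MIN_HISTORY = 5


@dataclass
class UploadContext:
    """Signals referred to in the data flow above (all field names are assumptions)."""
    policies_for_category: int    # past policies the user set for this image type
    community_policy_shift: bool  # recent major change in the community's practices
    user_activity_increase: bool  # e.g., new friends or new posts on the profile


def needs_social(ctx: UploadContext, min_history: int = MIN_HISTORY) -> bool:
    """Return True when the A3P-core should invoke the A3P-social."""
    # Case (i): not enough data for this type of image.
    too_little_history = ctx.policies_for_category < min_history
    # Case (ii): community practices changed and the user became more active.
    community_changed = ctx.community_policy_shift and ctx.user_activity_increase
    return too_little_history or community_changed
```

For instance, a user with no prior policies for the uploaded image type triggers case (i), so `needs_social(UploadContext(0, False, False))` evaluates to true.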
4 A3P-CORE
There are two major components in A3P-core: (i) Image classification
and (ii) Adaptive policy prediction. For each user,
his/her images are first classified based on content and
metadata. Then, privacy policies of each category of images
are analyzed for the policy prediction.
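The two steps above can be illustrated with a short Python sketch. It is not the A3P-core algorithm: the classifier is passed in as a parameter because the classification method is not described at this point, and the second stage uses a deliberately simple placeholder (the most frequent past policy in the same category) in place of adaptive policy prediction. All names are hypothetical.

```python
from collections import Counter, defaultdict
from typing import Callable, Dict, Hashable, Iterable, List, Optional, Tuple

Image = Dict[str, List[str]]         # hypothetical image record, e.g. {"tags": [...]}
Classifier = Callable[[Image], str]  # stage 1: maps an image to a category label


def predict_policy(
    history: Iterable[Tuple[Image, Hashable]],
    new_image: Image,
    classify: Classifier,
) -> Optional[Hashable]:
    """Two-stage sketch: classify, then pick a policy from the same category."""
    # Stage 1: group the user's past policies by image category.
    by_category: Dict[str, Counter] = defaultdict(Counter)
    for image, policy in history:
        by_category[classify(image)][policy] += 1

    # Stage 2 (placeholder): recommend the most frequent policy in that category.
    candidates = by_category.get(classify(new_image))
    if not candidates:
        return None  # no history for this category
    return candidates.most_common(1)[0][0]


def first_tag(image: Image) -> str:
    """Toy classifier used only for the usage example below."""
    tags = image.get("tags") or ["untagged"]
    return tags[0]


history = [
    ({"tags": ["beach", "vacation"]}, "friends"),
    ({"tags": ["beach"]}, "friends"),
    ({"tags": ["beach"]}, "family"),
    ({"tags": ["party"]}, "public"),
]
```

With this toy history, `predict_policy(history, {"tags": ["beach", "sunset"]}, first_tag)` returns `"friends"`, and an image whose category has no history yields `None`, which corresponds to the situation in which the A3P-social would be consulted.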
Adopting a two-stage approach is more suitable for policy
recommendation than applying the common one-stage
data mining approaches to mine both image features and
policies together. Recall that when a user uploads a new
image, the user is waiting for a recommended policy. The
two-stage approach allows the system to employ the first
stage to classify the new image and find the candidate sets
of images for the subsequent policy recommendation. As
for the one-stage mining approach, it would not be able to