Sunday, August 5, 2012

2012, Java IEEE Project Abstracts - Part 2

JAVA IEEE 2012 PROJECT ABSTRACTS

DOMAIN - KNOWLEDGE AND DATA ENGINEERING
PROTECTING LOCATION PRIVACY AGAINST LOCATION-DEPENDENT ATTACKS IN MOBILE SERVICES
Privacy protection has recently received considerable attention in location-based services. A large number of location cloaking algorithms have been proposed for protecting the location privacy of mobile users. In this paper, we consider the scenario where different location-based query requests are continuously issued by mobile users while they are moving.
We show that most of the existing k-anonymity location cloaking algorithms are concerned with snapshot user locations only and cannot effectively prevent location-dependent attacks when users’ locations are continuously updated. Therefore, adopting both the location k-anonymity and cloaking granularity as privacy metrics, we propose a new incremental clique-based cloaking algorithm, called ICliqueCloak, to defend against location-dependent attacks.
The main idea is to incrementally maintain maximal cliques needed for location cloaking in an undirected graph that takes into consideration the effect of continuous location updates. Thus, a qualified clique can be quickly identified and used to generate the cloaked region when a new request arrives.
The efficiency and effectiveness of the proposed ICliqueCloak algorithm are validated by a series of carefully designed experiments. The experimental results also show that the price paid for defending against location-dependent attacks is small.
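
For readers new to clique-based cloaking, the following minimal Java sketch illustrates the core idea under simplified assumptions: requests are points with square maximum cloaking areas, and a greedy clique search stands in for the paper's incremental maximal-clique maintenance. The Request class and compatible() predicate are hypothetical illustrations, not the authors' API.

    import java.util.*;

    // Minimal sketch of clique-based location cloaking (simplified; not the authors' code).
    // Two requests are "compatible" (a graph edge) when each lies inside the other's
    // maximum acceptable cloaking area; a k-clique of compatible requests can then
    // share one cloaked region, giving each member location k-anonymity.
    class Request {
        double x, y;        // reported location
        double maxArea;     // half-width of the largest acceptable cloaking square
        Request(double x, double y, double maxArea) { this.x = x; this.y = y; this.maxArea = maxArea; }
    }

    public class CliqueCloakSketch {
        // Edge predicate: each request tolerates a cloaking box containing the other.
        static boolean compatible(Request a, Request b) {
            return Math.abs(a.x - b.x) <= Math.min(a.maxArea, b.maxArea)
                && Math.abs(a.y - b.y) <= Math.min(a.maxArea, b.maxArea);
        }

        // Greedy search for a clique of size k containing the new request.
        static List<Request> findClique(Request newReq, List<Request> pending, int k) {
            List<Request> clique = new ArrayList<>();
            clique.add(newReq);
            for (Request cand : pending) {
                boolean adjacentToAll = true;
                for (Request member : clique)
                    if (!compatible(cand, member)) { adjacentToAll = false; break; }
                if (adjacentToAll) clique.add(cand);
                if (clique.size() == k) return clique;   // k-anonymous group found
            }
            return null; // not yet cloakable; request stays pending
        }

        // The cloaked region is the bounding box of the clique members' locations.
        static double[] cloakedBox(List<Request> clique) {
            double minX = Double.MAX_VALUE, minY = Double.MAX_VALUE;
            double maxX = -Double.MAX_VALUE, maxY = -Double.MAX_VALUE;
            for (Request r : clique) {
                minX = Math.min(minX, r.x); maxX = Math.max(maxX, r.x);
                minY = Math.min(minY, r.y); maxY = Math.max(maxY, r.y);
            }
            return new double[] { minX, minY, maxX, maxY };
        }
    }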


*------------*------------*------------*------------*------------*------------*

DOMAIN - KNOWLEDGE AND DATA ENGINEERING
OPTIMIZING THE CALCULATION OF CONDITIONAL PROBABILITY TABLES IN HYBRID BAYESIAN NETWORKS USING BINARY FACTORIZATION
Reducing the computational complexity of inference in Bayesian Networks (BNs) is a key challenge. Current algorithms for inference convert a BN to a junction tree structure made up of clusters of the BN nodes and the resulting complexity is time exponential in the size of a cluster.
The need to reduce the complexity is especially acute where the BN contains continuous nodes. We propose a new method for optimizing the calculation of Conditional Probability Tables (CPTs) involving continuous nodes, approximated in Hybrid Bayesian Networks (HBNs), using an approximation algorithm called dynamic discretization.
We present an optimized solution to this problem involving binary factorization of the arithmetical expressions declared to generate the CPTs for continuous nodes for deterministic functions and statistical distributions.
The proposed algorithm is implemented and tested in a commercial Hybrid Bayesian Network software package and the results of the empirical evaluation show significant performance improvement over unfactorized models.
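
As a rough illustration of binary factorization, the sketch below rewrites an n-ary sum such as X = A + B + C + D into a chain of binary intermediate nodes (T1 = A + B, T2 = T1 + C, X = T2 + D), so no generated CPT ever has more than two parents. The node names are illustrative only; with n parents each discretized into s states, a single CPT costs O(s^(n+1)) entries while the binary chain costs (n-1) tables of O(s^3).

    import java.util.*;

    // Sketch: factor an n-ary arithmetic expression into binary steps so that
    // each generated CPT has at most two parents (illustrative, not the paper's code).
    public class BinaryFactorization {
        // parents must contain at least two names; target is the original node.
        public static List<String> factorSum(List<String> parents, String target) {
            List<String> steps = new ArrayList<>();
            String acc = parents.get(0);
            for (int i = 1; i < parents.size(); i++) {
                // the last step writes the target node itself
                String node = (i == parents.size() - 1) ? target : "T" + i;
                steps.add(node + " = " + acc + " + " + parents.get(i));
                acc = node;
            }
            return steps;
        }

        public static void main(String[] args) {
            // X = A + B + C + D becomes three binary CPTs instead of one 4-parent CPT.
            factorSum(Arrays.asList("A", "B", "C", "D"), "X").forEach(System.out::println);
        }
    }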


*------------*------------*------------*------------*------------*------------*
 
DOMAIN - KNOWLEDGE AND DATA ENGINEERING
MAXIMUM AMBIGUITY-BASED SAMPLE SELECTION IN FUZZY DECISION TREE INDUCTION
Sample selection aims to select a number of representative samples from a large database so that a learning algorithm can achieve a reduced computational cost and an improved learning accuracy. This paper gives a new sample selection mechanism, i.e., the maximum ambiguity-based sample selection in fuzzy decision tree induction.
Compared with the existing sample selection methods, this mechanism selects the samples based on the principle of maximal classification ambiguity. The major advantage of this mechanism is that the adjustment of the fuzzy decision tree is minimized when adding selected samples to the training set.
This advantage is confirmed via the theoretical analysis of the leaf-nodes’ frequency in the decision trees. The decision tree generated from the selected samples usually has a better performance than that from the original database.
Furthermore, experimental results show that the generalization ability of the tree based on our selection mechanism is far superior to that based on a random selection mechanism.
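
A toy version of the selection rule: pick the unlabeled sample whose class-membership vector is least decisive under the current tree. The ambiguity measure below, a possibility-distribution measure that scores crisp vectors 0 and uniform vectors ln(n), is a stand-in chosen for illustration and may differ in detail from the paper's definition.

    import java.util.*;

    // Sketch of maximum ambiguity-based selection (simplified illustration).
    // A sample's class-membership vector is treated as a possibility distribution;
    // crisp vectors score 0, uniform vectors score ln(n), i.e., maximal ambiguity.
    public class AmbiguitySelection {
        static double ambiguity(double[] memberships) {
            double[] pi = memberships.clone();
            Arrays.sort(pi);                                    // ascending
            for (int i = 0; i < pi.length / 2; i++) {           // reverse -> descending
                double t = pi[i]; pi[i] = pi[pi.length - 1 - i]; pi[pi.length - 1 - i] = t;
            }
            double max = pi[0];
            if (max > 0) for (int i = 0; i < pi.length; i++) pi[i] /= max; // normalize: max = 1
            double a = 0.0;
            for (int i = 0; i < pi.length; i++) {
                double next = (i + 1 < pi.length) ? pi[i + 1] : 0.0;
                a += (pi[i] - next) * Math.log(i + 1);
            }
            return a;
        }

        // Select the sample the current fuzzy decision tree is least certain about.
        static int selectMostAmbiguous(double[][] membershipVectors) {
            int best = 0;
            for (int i = 1; i < membershipVectors.length; i++)
                if (ambiguity(membershipVectors[i]) > ambiguity(membershipVectors[best])) best = i;
            return best;
        }
    }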


*------------*------------*------------*------------*------------*------------*
 
DOMAIN - KNOWLEDGE AND DATA ENGINEERING
MODEL-BASED METHOD FOR PROJECTIVE CLUSTERING
Clustering high-dimensional data is a major challenge due to the curse of dimensionality. To solve this problem, projective clustering has been defined as an extension to traditional clustering that attempts to find projected clusters in subsets of the dimensions of a data space.
In this paper, a probability model is first proposed to describe projected clusters in high-dimensional data space. Then, we present a model-based algorithm for fuzzy projective clustering that discovers clusters with overlapping boundaries in various projected subspaces.
The suitability of the proposal is demonstrated in an empirical study conducted with synthetic data sets and some widely used real-world data sets.


*------------*------------*------------*------------*------------*------------*
 
DOMAIN - KNOWLEDGE AND DATA ENGINEERING
EXTENDING RECOMMENDER SYSTEMS FOR DISJOINT USER/ITEM SETS: THE CONFERENCE RECOMMENDATION PROBLEM
In this paper, we describe the problem of recommending conference sessions to attendees and show how novel extensions to traditional model-based recommender systems, as suggested in Adomavicius and Tuzhilin [1], can address this problem.
We introduce Recommendation Engine by CONjoint Decomposition of ITems and USers (RECONDITUS), a technique that extends preference-based recommender systems to recommend items from a new disjoint set to users from a new disjoint set. The assumption is that preferences exhibited by users with known usage behavior (e.g., past conference session attendance), which can be abstracted by projections of user and item matrices, will be similar to those of new (different) users where the basic environment and item domain are the same (e.g., a new conference). RECONDITUS requires no item ratings, but operates on observed user behavior such as past conference session attendance.
The RECONDITUS paradigm consists of projections of both user and item data, and the learning of relationships in projected space. Once established, the relationships enable predicting new relationships and provide associated recommendations.
The approach can encompass several traditional data mining problems where both clustering and prediction are necessary.


*------------*------------*------------*------------*------------*------------*

DOMAIN - KNOWLEDGE AND DATA ENGINEERING
ENERGY-EFFICIENT REVERSE SKYLINE QUERY PROCESSING OVER WIRELESS SENSOR NETWORKS
Reverse skyline query plays an important role in many sensing applications, such as environmental monitoring, habitat monitoring, and battlefield monitoring. Due to the limited power supplies of wireless sensor nodes, the existing centralized approaches, which do not consider energy efficiency, cannot be directly applied to the distributed sensor environment.
In this paper, we investigate how to process reverse skyline queries energy efficiently in wireless sensor networks. Initially, we theoretically analyzed the properties of reverse skyline query and proposed a skyband-based approach to tackle the problem of reverse skyline query answering over wireless sensor networks.
Then, an energy-efficient approach is proposed to minimize the communication cost among sensor nodes when evaluating range reverse skyline queries. Moreover, optimization mechanisms to improve the performance of multiple reverse skylines are also discussed. Extensive experiments on both real-world data and synthetic data have demonstrated the efficiency and effectiveness of our proposed approaches with various experimental settings.
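
The dominance primitive underlying (reverse) skyline queries can be shown in a few lines of Java. This is a generic brute-force sketch, not the paper's distributed skyband algorithm: point a dominates point b with respect to query q when a is at least as close to q on every dimension and strictly closer on at least one; a point p then has q in its reverse skyline when q survives the dynamic skyline test taken with respect to p.

    // Sketch of the dominance primitive behind (reverse) skyline queries.
    public class ReverseSkylineSketch {
        // a dominates b w.r.t. q: at least as close on every dimension, closer on one.
        static boolean dominates(double[] a, double[] b, double[] q) {
            boolean strictlyBetter = false;
            for (int d = 0; d < q.length; d++) {
                double da = Math.abs(a[d] - q[d]), db = Math.abs(b[d] - q[d]);
                if (da > db) return false;
                if (da < db) strictlyBetter = true;
            }
            return strictlyBetter;
        }

        // Brute-force dynamic skyline membership test for point p w.r.t. query q:
        // p survives if no other data point dominates it with respect to q.
        static boolean inDynamicSkyline(double[] p, double[][] data, double[] q) {
            for (double[] other : data)
                if (other != p && dominates(other, p, q)) return false;
            return true;
        }
    }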


*------------*------------*------------*------------*------------*------------*
 
DOMAIN - KNOWLEDGE AND DATA ENGINEERING
ENERGY EFFICIENT SCHEMES FOR ACCURACY-GUARANTEED SENSOR DATA AGGREGATION USING SCALABLE COUNTING
Sensor networks have received considerable attention in recent years, and are employed in many applications. In these applications, statistical aggregates such as Sum over the readings of a group of sensor nodes are often needed. One challenge for computing sensor data aggregates comes from the communication failures, which are common in sensor networks.
To enhance the robustness of the aggregate computation, multipath-based aggregation is often used. However, the multipath-based aggregation suffers from the problem of overcounting sensor readings. The approaches using the multipath-based aggregation therefore need to incorporate techniques that avoid overcounting sensor readings. In this paper, we present a novel technique named scalable counting for efficiently avoiding the overcounting problem.
We focus on having an (ε, δ) accuracy guarantee for computing an aggregate, which ensures that the error in computing the aggregate is within a factor of ε with probability (1 − δ). Our schemes using the scalable counting technique efficiently compute the aggregates under a given accuracy guarantee.
We provide theoretical analyses that show the advantages of the scalable counting technique over previously proposed techniques. Furthermore, extensive experiments are made to validate the theoretical results and manifest the advantages of using the scalable counting technique for sensor data aggregation.
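
The abstract does not spell out the scalable counting construction itself. A widely used duplicate-insensitive alternative with a comparable (ε, δ) flavor is the Flajolet-Martin sketch, shown below as a hedged stand-in: because merging two sketches is a bitwise OR, a reading that reaches the sink over several paths is still counted exactly once, which is precisely the multipath overcounting problem. Averaging many sketches with independent seeds tightens both ε and δ.

    // Hedged stand-in: Flajolet-Martin (FM) sketch for duplicate-insensitive
    // counting, a classic fix for multipath overcounting (not the paper's own
    // "scalable counting" scheme). Merging is a bitwise OR, so a sensor reading
    // forwarded along several paths contributes exactly once.
    public class FmSketch {
        private int bitmap = 0;              // bit i set => some id hashed to rank i
        private final long seed;
        public FmSketch(long seed) { this.seed = seed; }

        public void add(long readingId) {
            long h = hash(readingId);
            int rank = Long.numberOfTrailingZeros(h | (1L << 62)); // geometric rank
            bitmap |= (1 << Math.min(rank, 31));
        }

        public void merge(FmSketch other) { bitmap |= other.bitmap; } // idempotent

        public double estimate() {
            int r = Integer.numberOfTrailingZeros(~bitmap); // lowest unset bit
            return Math.pow(2, r) / 0.77351;                // FM correction factor
        }

        private long hash(long x) {          // simple 64-bit mixer (illustrative)
            x ^= seed; x *= 0x9E3779B97F4A7C15L; x ^= (x >>> 32);
            return x;
        }
    }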


*------------*------------*------------*------------*------------*------------*

DOMAIN - KNOWLEDGE AND DATA ENGINEERING
EFFICIENT AND PROGRESSIVE ALGORITHMS FOR DISTRIBUTED SKYLINE QUERIES OVER UNCERTAIN DATA
The skyline operator has received considerable attention from the database community, due to its importance in many applications including multicriteria decision making, preference answering, and so forth. In many applications, uncertain data inherently exist; for instance, data collected from different sources in distributed locations usually carry imprecise measurements and thus exhibit a degree of uncertainty.
Taking into account the network delay and economic cost associated with sharing and communicating large amounts of distributed data over the Internet, an important problem in this scenario is to retrieve the global skyline tuples from all the distributed local sites with minimum communication cost.
Based on the well-known notion of the probabilistic skyline query over centralized uncertain data, in this paper we propose the notion of distributed skyline queries over uncertain data. Furthermore, two communication- and computation-efficient algorithms are proposed to retrieve the qualified skylines from distributed local sites.
Extensive experiments have been conducted to verify the efficiency, the effectiveness and the progressiveness of our algorithms with both synthetic and real data sets.


*------------*------------*------------*------------*------------*------------*

DOMAIN - KNOWLEDGE AND DATA ENGINEERING
EDISC: A CLASS-TAILORED DISCRETIZATION TECHNIQUE FOR RULE-BASED CLASSIFICATION
Discretization is a critical component of data mining whereby continuous attributes of a data set are converted into discrete ones by creating intervals either before or during learning. There are many good reasons for preprocessing discretization, such as increased learning efficiency and classification accuracy, comprehensibility of data mining results, as well as the inherent limitation of a great majority of learning algorithms to handle only discrete data.
Many preprocessing discretization techniques have been proposed to date, of which the Entropy-MDLP discretization has been accepted as by far the most effective in the context of both decision tree learning and rule induction algorithms. This paper presents a new discretization technique EDISC which utilizes the entropy-based principle but takes a class-tailored approach to discretization.
The technique is applicable in general to any covering algorithm, including those that use the class-per-class rule induction methodology such as CN2 as well as those that use a seed example during the learning phase, such as the RULES family.
Experimental evaluation has proved the efficiency and effectiveness of the technique as a preprocessing discretization procedure for CN2 as well as RULES-7, the latest algorithm among the RULES family of inductive learning algorithms.
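
The entropy principle that EDISC builds on fits in a short sketch: for each candidate cut on a continuous attribute, compute the class entropy of the induced binary split and keep the cut with the lowest size-weighted entropy. This generic version omits the MDLP stopping criterion and EDISC's class-tailored refinements.

    // Generic entropy-based cut-point selection (the principle behind
    // Entropy-MDLP and EDISC; the MDLP stopping rule and class tailoring
    // of EDISC are omitted from this sketch).
    public class EntropyCut {
        static double entropy(int[] classCounts, int total) {
            double h = 0;
            for (int c : classCounts)
                if (c > 0) { double p = (double) c / total; h -= p * Math.log(p) / Math.log(2); }
            return h;
        }

        // values[i] is the attribute value of example i, labels[i] its class (0..k-1).
        // Returns the cut minimizing the size-weighted entropy of the two sides.
        static double bestCut(double[] values, int[] labels, int numClasses) {
            double best = Double.NaN, bestScore = Double.MAX_VALUE;
            for (int i = 0; i < values.length; i++) {
                double cut = values[i];
                int[] left = new int[numClasses], right = new int[numClasses];
                int nl = 0, nr = 0;
                for (int j = 0; j < values.length; j++) {
                    if (values[j] <= cut) { left[labels[j]]++; nl++; }
                    else { right[labels[j]]++; nr++; }
                }
                if (nl == 0 || nr == 0) continue;   // degenerate split
                double score = (nl * entropy(left, nl) + nr * entropy(right, nr)) / values.length;
                if (score < bestScore) { bestScore = score; best = cut; }
            }
            return best;
        }
    }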


*------------*------------*------------*------------*------------*------------*

DOMAIN - KNOWLEDGE AND DATA ENGINEERING
DISCRIMINATIVE FEATURE SELECTION BY NONPARAMETRIC BAYES ERROR MINIMIZATION
Feature selection is fundamental to knowledge discovery from massive amounts of high-dimensional data. In an effort to establish theoretical justification for feature selection algorithms, this paper presents a theoretically optimal criterion, namely, the discriminative optimal criterion (DoC) for feature selection.
Compared with the existing representative optimal criterion (RoC, [21]) which retains maximum information for modeling the relationship between input and output variables, DoC is pragmatically advantageous because it attempts to directly maximize the classification accuracy and naturally reflects the Bayes error in the objective.
To make DoC computationally tractable for practical tasks, we propose an algorithmic framework, which selects a subset of features by minimizing the Bayes error rate estimated by a nonparametric estimator.
A set of existing algorithms as well as new ones can be derived naturally from this framework. As an example, we show that the Relief algorithm [20] greedily attempts to minimize the Bayes error estimated by the k-Nearest-Neighbor (kNN) method.
This new interpretation insightfully reveals the secret behind the family of margin-based feature selection algorithms [28], [14] and also offers a principled way to establish new alternatives for performance improvement.
In particular, by exploiting the proposed framework, we establish the Parzen-Relief (P-Relief) algorithm based on the Parzen window estimator, and the MAP-Relief (M-Relief) which integrates the label distribution into the max-margin objective to effectively handle imbalanced and multiclass data. Experiments on various benchmark data sets demonstrate the effectiveness of the proposed algorithms.
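
Since the paper reinterprets Relief [20] as greedy kNN Bayes-error minimization, the basic Relief update is worth seeing in code. This is the textbook two-class algorithm with 1-NN hits and misses, not the authors' P-Relief or M-Relief variants: a feature is rewarded when it separates a sample from its nearest miss and penalized when it separates it from its nearest hit.

    import java.util.*;

    // Textbook two-class Relief (illustrative; assumes both classes are present).
    public class Relief {
        static double[] relief(double[][] x, int[] y, int iterations, Random rnd) {
            int nFeatures = x[0].length;
            double[] w = new double[nFeatures];
            for (int t = 0; t < iterations; t++) {
                int i = rnd.nextInt(x.length);
                int hit = nearest(x, y, i, true), miss = nearest(x, y, i, false);
                for (int f = 0; f < nFeatures; f++)
                    w[f] += Math.abs(x[i][f] - x[miss][f]) - Math.abs(x[i][f] - x[hit][f]);
            }
            return w;   // larger weight => more discriminative feature
        }

        // Nearest neighbor of sample i within (sameClass) or outside its class.
        static int nearest(double[][] x, int[] y, int i, boolean sameClass) {
            int best = -1; double bestDist = Double.MAX_VALUE;
            for (int j = 0; j < x.length; j++) {
                if (j == i || (y[j] == y[i]) != sameClass) continue;
                double d = 0;
                for (int f = 0; f < x[0].length; f++) d += (x[i][f] - x[j][f]) * (x[i][f] - x[j][f]);
                if (d < bestDist) { bestDist = d; best = j; }
            }
            return best;
        }
    }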


*------------*------------*------------*------------*------------*------------*
 
DOMAIN - KNOWLEDGE AND DATA ENGINEERING
DISCOVERY OF DELTA CLOSED PATTERNS AND NONINDUCED PATTERNS FROM SEQUENCES
Discovering patterns from sequence data has significant impact in many aspects of science and society, especially in genomics and proteomics. Here we consider multiple strings as input sequence data and substrings as patterns. In the real world, usually a large set of patterns could be discovered yet many of them are redundant, thus degrading the output quality.
This paper improves the output quality by removing two types of redundant patterns. First, the notion of delta tolerance closed itemset is employed to remove redundant patterns that are not delta closed. Second, the concept of statistically induced patterns is proposed to capture redundant patterns which seem to be statistically significant yet their significance is induced by their strong significant subpatterns.
It is computationally intense to mine these nonredundant patterns (delta closed patterns and noninduced patterns). To efficiently discover these patterns in very large sequence data, two efficient algorithms have been developed through innovative use of suffix tree.
Three sets of experiments were conducted to evaluate their performance. They render excellent results when applied to genomics. The experiments confirm that the proposed algorithms are efficient and that they produce a relatively small set of patterns which reveal interesting information in the sequences.


*------------*------------*------------*------------*------------*------------*

DOMAIN - KNOWLEDGE AND DATA ENGINEERING
DATA MINING FOR XML QUERY-ANSWERING SUPPORT
Extracting information from semistructured documents is a very hard task, and is going to become more and more critical as the amount of digital information available on the Internet grows. Indeed, documents are often so large that the data set returned as answer to a query may be too big to convey interpretable knowledge.
In this paper, we describe an approach based on Tree-Based Association Rules (TARs): mined rules, which provide approximate, intensional information on both the structure and the contents of Extensible Markup Language (XML) documents, and can be stored in XML format as well.
This mined knowledge is later used to provide:
1)    a concise idea—the gist—of both the structure and the content of the XML document and
2)    quick, approximate answers to queries. In this paper, we focus on the second feature. A prototype system and experimental results demonstrate the effectiveness of the approach.


*------------*------------*------------*------------*------------*------------*
 
DOMAIN - KNOWLEDGE AND DATA ENGINEERING
CLUSTERING WITH MULTIVIEWPOINT-BASED SIMILARITY MEASURE
All clustering methods have to assume some cluster relationship among the data objects that they are applied on. Similarity between a pair of objects can be defined either explicitly or implicitly. In this paper, we introduce a novel multiviewpoint-based similarity measure and two related clustering methods.
The major difference between a traditional dissimilarity/similarity measure and ours is that the former uses only a single viewpoint, which is the origin, while the latter utilizes many different viewpoints, which are objects assumed to not be in the same cluster with the two objects being measured. Using multiple viewpoints, more informative assessment of similarity could be achieved. Theoretical analysis and empirical study are conducted to support this claim.
Two criterion functions for document clustering are proposed based on this new measure. We compare them with several well-known clustering algorithms that use other popular similarity measures on various document collections to verify the advantages of our proposal.
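
The measure itself is compact enough to sketch. Under the simplifying assumption that documents are plain vectors and the viewpoints are objects outside the cluster of the two documents being compared, a multiviewpoint similarity can be written as the average inner product of the two documents after shifting the origin to each viewpoint in turn; the names below are illustrative.

    // Sketch of a multiviewpoint similarity: instead of measuring a and b from
    // the origin only, average the inner product of (a - h) and (b - h) over
    // many viewpoints h assumed to lie outside the cluster of a and b.
    public class MultiViewpointSimilarity {
        static double mvs(double[] a, double[] b, double[][] viewpoints) {
            double sum = 0;
            for (double[] h : viewpoints) {
                double s = 0;
                for (int i = 0; i < a.length; i++) s += (a[i] - h[i]) * (b[i] - h[i]);
                sum += s;   // one viewpoint's assessment of the pair
            }
            return sum / viewpoints.length;
        }
    }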


*------------*------------*------------*------------*------------*------------*
 
DOMAIN - KNOWLEDGE AND DATA ENGINEERING
CONTINUOUS DETOUR QUERIES IN SPATIAL NETWORKS
We study the problem of finding the shortest route between two locations that includes a stopover of a given type. An example scenario of this problem is given as follows: “On the way to Bob’s place, Alice searches for a nearby take-away Italian restaurant to buy a pizza.” Assuming that Alice is interested in minimizing the total trip distance, this scenario can be modeled as a query where Alice’s current location (start) and Bob’s place (destination) function as query points.
Based on these two query points, we find the minimum detour object (MDO), i.e., a stopover that minimizes the sum of the distances: 1) from the start to the stopover, and 2) from the stopover to the destination. In a realistic location-based application environment, a user can be indecisive about committing to a particular detour option.
The user may wish to browse multiple (k) MDOs before making a decision. Furthermore, when a user moves, the k MDO results at one location may become obsolete. We propose a method for continuous detour query (CDQ) processing based on incremental construction of a shortest path tree.
We conducted experimental studies to compare the performance of our proposed method against two methods derived from existing k-nearest neighbor querying techniques using real road-network data sets. Experimental results show that our proposed method significantly outperforms the two competitive techniques.
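
The MDO definition itself is easy to state in code. The Euclidean simplification below ranks candidate stopovers by d(start, o) + d(o, dest) and returns the k best; the paper instead works on road networks, where these distances come from incrementally maintained shortest path trees.

    import java.util.*;

    // Euclidean simplification of the minimum detour object (MDO) query; the
    // paper uses road-network shortest-path distances rather than straight lines.
    public class DetourQuerySketch {
        static double dist(double[] p, double[] q) {
            return Math.hypot(p[0] - q[0], p[1] - q[1]);
        }

        // Return the k candidate stopovers minimizing d(start, o) + d(o, dest).
        // Note: sorts the caller's list in place (acceptable for a sketch).
        static List<double[]> topKDetours(double[] start, double[] dest,
                                          List<double[]> objects, int k) {
            objects.sort(Comparator.comparingDouble(o -> dist(start, o) + dist(o, dest)));
            return objects.subList(0, Math.min(k, objects.size()));
        }
    }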


*------------*------------*------------*------------*------------*------------*
 
DOMAIN - KNOWLEDGE AND DATA ENGINEERING
ANOMALY DETECTION FOR DISCRETE SEQUENCES: A SURVEY
This survey attempts to provide a comprehensive and structured overview of the existing research for the problem of detecting anomalies in discrete/symbolic sequences. The objective is to provide a global understanding of the sequence anomaly detection problem and how existing techniques relate to each other.
The key contribution of this survey is the classification of the existing research into three distinct categories, based on the problem formulation that they are trying to solve. These problem formulations are: 1) identifying anomalous sequences with respect to a database of normal sequences; 2) identifying an anomalous subsequence within a long sequence; and 3) identifying a pattern in a sequence whose frequency of occurrence is anomalous.
We show how each of these problem formulations is characteristically distinct from each other and discuss their relevance in various application domains. We review techniques from many disparate and disconnected application domains that address each of these formulations.
Within each problem formulation, we group techniques into categories based on the nature of the underlying algorithm. For each category, we provide a basic anomaly detection technique, and show how the existing techniques are variants of the basic technique.
This approach shows how different techniques within a category are related or different from each other. Our categorization reveals new variants and combinations that have not been investigated before for anomaly detection.
We also provide a discussion of relative strengths and weaknesses of different techniques. We show how techniques developed for one problem formulation can be adapted to solve a different formulation, thereby providing several novel adaptations to solve the different problem formulations. We also highlight the applicability of the techniques that handle discrete sequences to other related areas such as online anomaly detection and time series anomaly detection.


*------------*------------*------------*------------*------------*------------*
 
DOMAIN - KNOWLEDGE AND DATA ENGINEERING
ADDING TEMPORAL CONSTRAINTS TO XML SCHEMA
If past versions of XML documents are retained, what of the various integrity constraints defined in XML Schema on those documents?
This paper describes how to interpret such constraints as sequenced constraints, applicable at each point in time. We also consider how to add new variants that apply across time, so-called nonsequenced constraints.
Our approach supports temporal documents that vary over both valid and transaction time, whose schema can vary over transaction time. We do this by replacing the schema with a (possibly time-varying) temporal schema and replacing the document with a temporal document, both of which are upward compatible with conventional XML and with conventional tools like XMLLINT, which we have extended to support the temporal constraints introduced here.


*------------*------------*------------*------------*------------*------------*
 
DOMAIN - KNOWLEDGE AND DATA ENGINEERING
A UNIFIED PROBABILISTIC FRAMEWORK FOR NAME DISAMBIGUATION IN DIGITAL LIBRARY
Despite years of research, the name ambiguity problem remains largely unresolved. Outstanding issues include how to capture all information for name disambiguation in a unified approach, and how to determine the number of people K in the disambiguation process.
In this paper, we formalize the problem in a unified probabilistic framework, which incorporates both attributes and relationships. Specifically, we define a disambiguation objective function for the problem and propose a two-step parameter estimation algorithm.
We also investigate a dynamic approach for estimating the number of people K. Experiments show that our proposed framework significantly outperforms four baseline methods using clustering algorithms and two other previous methods. Experiments also indicate that the number K automatically found by our method is close to the actual number.


*------------*------------*------------*------------*------------*------------*
 
DOMAIN - KNOWLEDGE AND DATA ENGINEERING
A PERFORMANCE ANOMALY DETECTION AND ANALYSIS FRAMEWORK FOR DBMS DEVELOPMENT
Detecting performance anomalies and finding their root causes are tedious tasks requiring much manual work. Functionality enhancements in DBMS development as in most software development often introduce performance problems in addition to bugs.
To detect the problems as soon as they are introduced, which often happens during the early phases of a development cycle, we adopt performance regression testing early in the process. In this paper, we describe a framework that we developed to manage performance anomalies after establishing a set of conditions for a problem to be considered an anomaly.
The framework uses Statistical Process Control (SPC) charts to detect performance anomalies and differential profiling to identify their root causes. By automating the tasks within the framework we were able to remove most of the manual overhead in detecting anomalies and reduce the analysis time for identifying the root causes by about 90 percent in most cases.
The tools developed and deployed based on the framework allow continuous, automated daily monitoring of performance in addition to the usual functionality monitoring in our DBMS development.
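
The SPC idea reduces to a few lines: establish control limits from a stable baseline of runs, then flag any new measurement outside them. The Shewhart-style 3-sigma check below is a generic sketch of that principle, not the framework's actual charting code.

    // Sketch of a Shewhart-style SPC check: a nightly performance run is
    // flagged as an anomaly when it falls outside mean +/- 3 sigma of a
    // stable baseline (needs at least two baseline runs).
    public class SpcCheck {
        static boolean isAnomaly(double[] baselineRuns, double newMeasurement) {
            double mean = 0;
            for (double v : baselineRuns) mean += v;
            mean /= baselineRuns.length;
            double var = 0;
            for (double v : baselineRuns) var += (v - mean) * (v - mean);
            double sigma = Math.sqrt(var / (baselineRuns.length - 1));
            return Math.abs(newMeasurement - mean) > 3 * sigma;  // outside control limits
        }
    }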


*------------*------------*------------*------------*------------*------------*
 
DOMAIN - KNOWLEDGE AND DATA ENGINEERING
A LOOK-AHEAD APPROACH TO SECURE MULTIPARTY PROTOCOLS
Secure multiparty protocols have been proposed to enable noncolluding parties to cooperate without a trusted server. Even though such protocols prevent information disclosure other than the objective function, they are quite costly in computation and communication.
The high overhead motivates parties to estimate the utility that can be achieved as a result of the protocol beforehand. In this paper, we propose a look-ahead approach, specifically for secure multiparty protocols to achieve distributed k-anonymity, which helps parties to decide if the utility benefit from the protocol is within an acceptable range before initiating the protocol.
The look-ahead operation is highly localized and its accuracy depends on the amount of information the parties are willing to share. Experimental results show the effectiveness of the proposed methods.


*------------*------------*------------*------------*------------*------------*

DOMAIN - KNOWLEDGE AND DATA ENGINEERING
A KNOWLEDGE-DRIVEN APPROACH TO ACTIVITY RECOGNITION IN SMART HOMES
This paper introduces a knowledge-driven approach to real-time, continuous activity recognition based on multisensor data streams in smart homes. The approach goes beyond the traditional data-centric methods for activity recognition in three ways.
First, it makes extensive use of domain knowledge in the life cycle of activity recognition. Second, it uses ontologies for explicit context and activity modeling and representation. Third and finally, it exploits semantic reasoning and classification for activity inferencing, thus enabling both coarse-grained and fine-grained activity recognition.
In this paper, we analyze the characteristics of smart homes and Activities of Daily Living (ADL) upon which we built both context and ADL ontologies. We present a generic system architecture for the proposed knowledge-driven approach and describe the underlying ontology-based recognition process. Special emphasis is placed on semantic subsumption reasoning algorithms for activity recognition.
The proposed approach has been implemented in a function-rich software system, which was deployed in a smart home research laboratory. We evaluated the proposed approach and the developed system through extensive experiments involving a number of various ADL use scenarios. An average activity recognition rate of 94.44 percent was achieved and the average recognition runtime per recognition operation was measured as 2.5 seconds.


*------------*------------*------------*------------*------------*------------*

DOMAIN - KNOWLEDGE AND DATA ENGINEERING
A FRAMEWORK FOR PERSONAL MOBILE COMMERCE PATTERN MINING AND PREDICTION
Due to a wide range of potential applications, research on mobile commerce has received a lot of interest from both industry and academia. Among them, one of the active topic areas is the mining and prediction of users’ mobile commerce behaviors such as their movements and purchase transactions.
In this paper, we propose a novel framework, called Mobile Commerce Explorer (MCE), for mining and prediction of mobile users’ movements and purchase transactions under the context of mobile commerce.
The MCE framework consists of three major components: 1) Similarity Inference Model (SIM) for measuring the similarities among stores and items, which are two basic mobile commerce entities considered in this paper; 2) Personal Mobile Commerce Pattern Mine (PMCP-Mine) algorithm for efficient discovery of mobile users’ Personal Mobile Commerce Patterns (PMCPs); and 3) Mobile Commerce Behavior Predictor (MCBP) for prediction of possible mobile user behaviors.
To the best of our knowledge, this is the first work that facilitates mining and prediction of mobile users’ commerce behaviors in order to recommend stores and items previously unknown to a user. We perform an extensive experimental evaluation by simulation and show that our proposals produce excellent results.


*------------*------------*------------*------------*------------*------------*
 
DOMAIN - KNOWLEDGE AND DATA ENGINEERING
A DECISION-THEORETIC FRAMEWORK FOR NUMERICAL ATTRIBUTE VALUE RECONCILIATION
One of the major challenges of data integration is to resolve conflicting numerical attribute values caused by data heterogeneity.
In addressing this problem, existing approaches proposed in prior literature often ignore such data inconsistencies or resolve them in an ad hoc manner. In this study, we propose a decision-theoretical framework that resolves numerical value conflicts in a systematic manner.
The framework takes into consideration the consequences of incorrect numerical values and selects the value that minimizes the expected cost of errors for all data application problems under consideration. Experimental results show that significant savings can be achieved by adopting the proposed framework instead of ad hoc approaches.
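
The decision rule itself is simple to sketch: among the conflicting candidate values, report the one with the lowest expected error cost under a probability estimate over which candidate is true. The squared-error cost below is an illustrative placeholder; in the framework the cost function is application-specific.

    // Sketch of decision-theoretic value reconciliation: prob[j] is the
    // estimated probability that candidates[j] is the true value, and
    // cost(v, u) prices reporting v when the truth is u.
    public class ValueReconciliation {
        static double cost(double reported, double truth) {
            double d = reported - truth;
            return d * d;   // illustrative; application-specific in the framework
        }

        static double reconcile(double[] candidates, double[] prob) {
            double best = candidates[0], bestExp = Double.MAX_VALUE;
            for (double v : candidates) {
                double exp = 0;   // expected cost of reporting v
                for (int j = 0; j < candidates.length; j++) exp += prob[j] * cost(v, candidates[j]);
                if (exp < bestExp) { bestExp = exp; best = v; }
            }
            return best;
        }
    }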


*------------*------------*------------*------------*------------*------------*

DOMAIN - KNOWLEDGE AND DATA ENGINEERING
VISUAL ROLE MINING: A PICTURE IS WORTH A THOUSAND ROLES
This paper offers a new role engineering approach to Role-Based Access Control (RBAC), referred to as visual role mining. The key idea is to graphically represent user-permission assignments to enable quick analysis and elicitation of meaningful roles.
First, we formally define the problem by introducing a metric for the quality of the visualization. Then, we prove that finding the best representation according to the defined metric is an NP-hard problem. In turn, we propose two algorithms: ADVISER and EXTRACT.
The former is a heuristic used to best represent the user-permission assignments of a given set of roles. The latter is a fast probabilistic algorithm that, when used in conjunction with ADVISER, allows for a visual elicitation of roles even in the absence of predefined roles.
Besides being rooted in sound theory, our proposal is supported by extensive simulations run over real data. Results confirm the quality of the proposal and demonstrate its viability in supporting role engineering decisions.


*------------*------------*------------*------------*------------*------------*
 
DOMAIN - KNOWLEDGE AND DATA ENGINEERING
SEGMENTATION AND SAMPLING OF MOVING OBJECT TRAJECTORIES BASED ON REPRESENTATIVENESS
Moving Object Databases (MOD), although ubiquitous, still call for methods that will be able to understand, search, analyze, and browse their spatiotemporal content. In this paper, we propose a method for trajectory segmentation and sampling based on the representativeness of the (sub)trajectories in the MOD.
In order to find the most representative subtrajectories, the following methodology is proposed. First, a novel global voting algorithm is performed, based on local density and trajectory similarity information. This method is applied for each segment of the trajectory, forming a local trajectory descriptor that represents line segment representativeness.
The sequence of this descriptor over a trajectory gives the voting signal of the trajectory, where high values correspond to the most representative parts. Then, a novel segmentation algorithm is applied on this signal that automatically estimates the number of partitions and the partition borders, identifying homogenous partitions concerning their representativeness.
Finally, a sampling method over the resulting segments yields the most representative subtrajectories in the MOD. Our experimental results on synthetic and real MODs verify the effectiveness of the proposed scheme, also in comparison with other sampling techniques.


*------------*------------*------------*------------*------------*------------*

DOMAIN - KNOWLEDGE AND DATA ENGINEERING
SATURN: RANGE QUERIES, LOAD BALANCING AND FAULT TOLERANCE IN DHT DATA SYSTEMS
In this paper, we present Saturn, an overlay architecture for large-scale data networks maintained over Distributed Hash Tables (DHTs) that efficiently processes range queries and ensures access load balancing and fault-tolerance.
Placing consecutive data values in neighboring peers is desirable in DHTs since it accelerates range query processing; however, such a placement is highly susceptible to load imbalances.
At the same time, DHTs may be susceptible to node departures/failures and high data availability and fault tolerance are significant issues. Saturn deals effectively with these problems through the introduction of a novel multiple ring, order-preserving architecture. The use of a novel order-preserving hash function ensures fast range query processing.
Replication across and within data rings (termed vertical and horizontal replication) forms the foundation over which our mechanisms are developed, ensuring query load balancing and fault tolerance, respectively.
Our detailed experimentation study shows strong gains in range query processing efficiency, access load balancing, and fault tolerance, with low replication overheads.
The significance of Saturn is not only that it effectively tackles all three issues together—i.e., supporting range queries, ensuring load balancing, and providing fault tolerance over DHTs—but also that it can be applied on top of any order-preserving DHT enabling it to dynamically handle replication and, thus, to trade off replication costs for fair load distribution and fault tolerance.

2012, Java IEEE Project Abstracts - Part 1

JAVA IEEE 2012 PROJECT ABSTRACTS

DOMAIN – CLOUD COMPUTING
SLA-BASED OPTIMIZATION OF POWER AND MIGRATION COST IN CLOUD COMPUTING
Cloud computing systems (or hosting datacenters) have attracted a lot of attention in recent years. Utility computing, reliable data storage, and infrastructure-independent computing are example applications of such systems. Electrical energy cost of a cloud computing system is a strong function of the consolidation and migration techniques used to assign incoming clients to existing servers.
Moreover, each client typically has a service level agreement (SLA), which specifies constraints on performance and/or quality of service that it receives from the system. These constraints result in a basic trade-off between the total energy cost and client satisfaction in the system. In this paper, a resource allocation problem is considered that aims to minimize the total energy cost of cloud computing system while meeting the specified client-level SLAs in a probabilistic sense.
The cloud computing system pays a penalty for the percentage of a client’s requests that do not meet a specified upper bound on their service time. An efficient heuristic algorithm based on convex optimization and dynamic programming is presented to solve the aforesaid resource allocation problem. Simulation results demonstrate the effectiveness of the proposed algorithm compared to previous work.
 

*------------*------------*------------*------------*------------*------------*

DOMAIN – CLOUD COMPUTING
SCALABLE JOIN QUERIES IN CLOUD DATA STORES
Cloud data stores provide scalability and high availability properties for Web applications, but do not support complex queries such as joins. Web application developers must therefore design their programs according to the peculiarities of NoSQL data stores rather than established software engineering practice.
This results in complex and error-prone code, especially with respect to subtle issues such as data consistency under concurrent read/write queries. We present join query support in CloudTPS, a middleware layer which stands between a Web application and its data store.
The system enforces strong data consistency and scales linearly under a demanding workload composed of join queries and read-write transactions. In large-scale deployments, CloudTPS outperforms replicated PostgreSQL up to three times.


*------------*------------*------------*------------*------------*------------*

DOMAIN – CLOUD COMPUTING
REVENUE MANAGEMENT FOR CLOUD PROVIDERS - A POLICY-BASED APPROACH UNDER STOCHASTIC DEMAND
Competition on global markets forces enterprises to make use of new applications, reduce process times and cut the costs of their IT infrastructure. To achieve this, commercial users harness the benefits of Cloud computing, as they can outsource data storage and computation facilities while saving on the overall cost of IT ownership.
Cloud services can be accessed on demand at any time in a pay-as-you-go manner. However, it is this flexibility of customers that results in great challenges for Cloud service providers. They need to maximize their revenue in the presence of limited fixed resources and uncertainty regarding upcoming service requests, while contemporaneously considering their SLAs.
To address this challenge we introduce models that can predict the revenue and utilization achieved with admission-control-policy-based revenue management under stochastic demand. This allows providers to significantly increase revenue by choosing the optimum policy.


*------------*------------*------------*------------*------------*------------*
 
DOMAIN – CLOUD COMPUTING
PERFORMANCE EVALUATION OF MULTIMEDIA SERVICES OVER WIRELESS LAN USING ROUTING PROTOCOL OPTIMIZED LINK STATE ROUTING (OLSR)
Multimedia use grows from year to year with the increasing support of computer networks, such as wireless LANs (WLANs), as intermediary media. One delivery method is streaming multimedia over the Internet from a server to clients in response to client requests for video and audio content held in a computer network. A key factor that affects streaming is bandwidth.
Insufficient bandwidth often disrupts the streaming process, resulting in loss and delay during delivery. To reduce loss and delay, a routing protocol is needed that can support the quality of multimedia service packets carried over wireless LAN networks.
In this paper, the performance of multimedia services over WLAN is evaluated with the help of the routing protocol Optimized Link State Routing (OLSR).


 *------------*------------*------------*------------*------------*------------*
 
DOMAIN – CLOUD COMPUTING
PERFORMANCE ANALYSIS OF CLOUD COMPUTING CENTERS USING M/G/m/m+r QUEUING SYSTEMS
Successful development of cloud computing paradigm necessitates accurate performance evaluation of cloud data centers.
As exact modeling of cloud centers is not feasible due to the nature of cloud centers and diversity of user requests, we describe a novel approximate analytical model for performance evaluation of cloud server farms and solve it to obtain accurate estimation of the complete probability distribution of the request response time and other important performance indicators.
The model allows cloud operators to determine the relationship between the number of servers and input buffer size, on one side, and the performance indicators such as mean number of tasks in the system, blocking probability, and probability that a task will obtain immediate service, on the other.
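
To make the queueing model concrete, the sketch below solves the simpler Markovian M/M/m/m+r system (exponential service times) rather than the paper's general-service M/G/m/m+r model: m servers, r buffer slots, arrival rate lambda, per-server service rate mu. The stationary distribution of this birth-death chain follows from detailed balance, and its last entry is the blocking probability. The parameters in main() are illustrative only.

    // Markovian simplification (M/M/m/m+r) of the paper's M/G/m/m+r model.
    public class MMmQueue {
        static double[] stationary(double lambda, double mu, int m, int r) {
            double[] p = new double[m + r + 1];
            p[0] = 1.0;
            for (int n = 1; n <= m + r; n++) {
                double rate = (n <= m) ? n * mu : m * mu;  // service rate in state n
                p[n] = p[n - 1] * lambda / rate;           // detailed balance
            }
            double sum = 0;
            for (double v : p) sum += v;
            for (int n = 0; n < p.length; n++) p[n] /= sum; // normalize
            return p;   // p[m + r] is the blocking probability
        }

        public static void main(String[] args) {
            double[] p = stationary(90.0, 1.0, 100, 20);   // illustrative parameters
            System.out.println("P(block) = " + p[p.length - 1]);
        }
    }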

 
*------------*------------*------------*------------*------------*------------*
 
DOMAIN – CLOUD COMPUTING
MINING CONCEPT DRIFTING NETWORK TRAFFIC IN CLOUD COMPUTING ENVIRONMENTS
Anomaly-based network Intrusion Detection Systems (IDS) model patterns of normal activity and detect novel network attacks. However, these systems depend on the availability of the system’s normal traffic-pattern profile. But the statistical fingerprint of the normal traffic pattern can change and shift over a period of time due to changes in operational or user activity at the networked site or even system updates.
The changes in normal traffic patterns over time lead to concept drift. Some changes can be temporal or cyclical, and can be short-lived or last for longer periods of time. Depending on a number of factors, the speed at which the change in traffic patterns occurs can also be variable, ranging from near instantaneous to the change occurring over the span of numerous months.
These changes in traffic patterns are a cause of concern for IDSs, as they can lead to a significant increase in false positive rates, thereby reducing the overall system performance. In order to improve the reliability of the IDS, there is a need for an automated mechanism to detect valid traffic changes and avoid inappropriate ad hoc responses.
ROC curves have historically been used to evaluate the accuracy of IDSs. ROC curves generated using fixed, time-invariant classification thresholds do not characterize the best accuracy that an IDS can achieve in the presence of concept-drifting network traffic.
In this paper, we present an integrated supervised machine learning and control-theoretic model (especially for clouds) for detecting concept drift in network traffic patterns.
The model comprises an online support vector machine-based classifier (incremental anomaly-based detection), a Kullback-Leibler divergence-based relative entropy measurement scheme (quantifying concept drift), and a feedback control engine (adapting ROC thresholding). In our proposed system, any intrusion activity will cause significant variations, thereby causing a large error, while a minor aberration in the variations (concept drift) will not be immediately reported as an alert.
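
The drift-quantification step can be illustrated directly. The sketch below computes the Kullback-Leibler divergence between a baseline traffic histogram p and a current-window histogram q (both assumed normalized to probability vectors), with additive smoothing so that empty bins do not blow up; a divergence that climbs across successive windows is the kind of signal the detector quantifies. This is a generic illustration, not the paper's exact measurement scheme.

    // Sketch: KL divergence between a baseline traffic distribution p and a
    // current-window distribution q (same bins, both summing to ~1), with
    // additive smoothing to tolerate zero-count bins.
    public class KlDivergence {
        static double kl(double[] p, double[] q) {
            double eps = 1e-9, kl = 0;
            for (int i = 0; i < p.length; i++) {
                double pi = p[i] + eps, qi = q[i] + eps;
                kl += pi * Math.log(pi / qi);
            }
            return kl;   // 0 when distributions match; grows as traffic drifts
        }
    }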



*------------*------------*------------*------------*------------*------------*
 
DOMAIN – CLOUD COMPUTING
FRAMEWORK ON LARGE PUBLIC SECTOR IMPLEMENTATION OF CLOUD COMPUTING
Cloud computing enables IT systems to be scalable and elastic. One significant advantage is that users no longer need to determine their exact computing resource requirements upfront. Instead, they request computing resources as required, on demand.
This paper introduces a framework specific to large public sector entities on how to migrate to cloud computing. It can also serve as a reference for organizations to overcome their limitations and to convince their stakeholders to further implement various types of Cloud Computing service models.
 

*------------*------------*------------*------------*------------*------------*

DOMAIN – CLOUD COMPUTING
ENHANCED ENERGY-EFFICIENT SCHEDULING FOR PARALLEL APPLICATIONS IN CLOUD
Energy consumption has become a major concern to the widespread deployment of cloud data centers. The growing importance for parallel applications in the cloud introduces significant challenges in reducing the power consumption drawn by the hosted servers.
In this paper, we propose an enhanced energy-efficient scheduling (EES) algorithm to reduce energy consumption while meeting the performance-based service level agreement (SLA). Since slacking non-critical jobs can achieve significant power saving, we exploit the slack room and allocate them in a global manner in our schedule.
Using randomly generated and real-life application workflows, our results demonstrate that EES is able to reduce considerable energy consumption while still meeting SLAs.


*------------*------------*------------*------------*------------*------------*
 
DOMAIN – CLOUD COMPUTING
CLOUD CHAMBER: A SELF-ORGANIZING FACILITY TO CREATE, EXERCISE, AND EXAMINE SOFTWARE AS A SERVICE TENANTS
Cloud Chamber is a testbed for understanding how web services behave as tenants in a Software as a Service (SaaS) environment.  This work describes the Cloud Chamber testbed to investigate autonomic resource management of web services in a cloud environment. 
Cloud Chamber is a virtualized environment which provides web servers as services, facilities to apply loads to the tenant services, algorithms for autonomic organization and reconfiguration of service assignments as demand changes, and sensors to capture resource consumption and performance metrics.
The testbed inserts sensors into web servers to collect the resource utilization of CPU cycles, memory consumption, and bandwidth consumption of the individual web services, the web server, and the operating system.  This high resolution performance data generates profiles of the resource usage of each web service and the resource availability of each server. 
The testbed, as described in this work, utilizes these profiles to efficiently place services on servers, thus balancing resource consumption, service performance, and service availability. Once services have been placed, the testbed monitors changes such as traffic levels, server churn, and the introduction of new services.
The information gathered is used to calculate configurations of service placement which better meet the changing requirements of the environment.


*------------*------------*------------*------------*------------*------------*
 
DOMAIN – CLOUD COMPUTING
A TIME-SERIES PATTERN BASED NOISE GENERATION STRATEGY FOR PRIVACY PROTECTION IN CLOUD COMPUTING
Cloud computing promises an open environment where customers can deploy IT services in a pay-as-you-go fashion while saving huge capital investment in their own IT infrastructure. Due to the openness, various malicious service providers may exist. Such service providers may record service information in a service process from a customer and then collectively deduce the customer’s private information.
Therefore, from the perspective of cloud computing security, there is a need to take special actions to protect privacy at client sides. Noise obfuscation is an effective approach in this regard by utilising noise data.
For instance, it generates and injects noise service requests into real customer service requests so that service providers would not be able to distinguish which requests are real ones if their occurrence probabilities are about the same. However, existing typical noise generation strategies mainly focus on the entire service usage period to achieve about the same final occurrence probabilities of service requests.
In fact, such probabilities can fluctuate in a time interval, such as three months, and may significantly differ from those in other time intervals. In this case, service providers may still be able to deduce the customers’ privacy from a specific time interval although unlikely from the overall period.
That is to say, the existing typical noise generation strategies could fail to protect customers’ privacy for local time intervals. To address this problem, we develop a novel time-series pattern based noise generation strategy.
Firstly, we analyse previous probability fluctuations and propose a group of time-series patterns for predicting future fluctuated probabilities. Then, based on these patterns, we present our strategy by forecasting future occurrence probabilities of real service requests and generating noise requests to reach about the same final probabilities in the next time interval.
The simulation evaluation demonstrates that our strategy can cope with these fluctuations to significantly improve the effectiveness of customers’ privacy protection.
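
The baseline idea that the paper refines can be sketched in a few lines: within a time interval, inject enough noise requests of each service type that all types end up with about the same occurrence count, so a provider cannot tell real requests apart by frequency. The paper's actual contribution, forecasting the next interval's probabilities from time-series patterns before generating noise, is deliberately omitted from this toy version.

    // Sketch of occurrence-probability equalization via noise injection.
    // realCounts[i] = number of real requests of service type i in this interval.
    public class NoiseGeneration {
        // Returns how many noise requests of each type to inject so that,
        // after injection, every type occurs the same number of times.
        static int[] noiseToInject(int[] realCounts) {
            int max = 0;
            for (int c : realCounts) max = Math.max(max, c);
            int[] noise = new int[realCounts.length];
            for (int i = 0; i < realCounts.length; i++) noise[i] = max - realCounts[i];
            return noise;
        }
    }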


*------------*------------*------------*------------*------------*------------*
 
DOMAIN – CLOUD COMPUTING
A PRICE- AND-TIME-SLOT-NEGOTIATION MECHANISM FOR CLOUD SERVICE RESERVATIONS
When making reservations for Cloud services, consumers and providers need to establish service-level agreements through negotiation. Whereas it is essential for both a consumer and a provider to reach an agreement on the price of a service and when to use the service, to date, there is little or no negotiation support for both price and time-slot negotiations (PTNs) for Cloud service reservations.
This paper presents a multi-issue negotiation mechanism to facilitate the following: 1) PTNs between Cloud agents and 2) tradeoffs between price and time-slot utilities. Unlike many existing negotiation mechanisms in which a negotiation agent can only make one proposal at a time, agents in this work are designed to concurrently make multiple proposals in a negotiation round that generate the same aggregated utility, differing only in terms of individual price and time-slot utilities.
Another novelty of this work is formulating a novel time-slot utility function that characterizes preferences for different time slots. These ideas are implemented in an agent-based Cloud testbed.
Using the testbed, experiments were carried out to compare this work with related approaches. Empirical results show that PTN agents reach faster agreements and achieve higher utilities than other related approaches. A case study was carried out to demonstrate the application of the PTN mechanism for pricing Cloud resources.


*------------*------------*------------*------------*------------*------------*

DOMAIN – CLOUD COMPUTING
A CLOUD INFRASTRUCTURE FOR OPTIMIZATION OF A MASSIVE PARALLEL SEQUENCING WORKFLOW
Massive Parallel Sequencing is a term used to describe several revolutionary approaches to DNA sequencing, the so-called Next Generation Sequencing technologies.
These technologies generate millions of short sequence fragments in a single run and can be used to measure levels of gene expression and to identify novel splice variants of genes, allowing more accurate analysis. The proposed solution provides novelty in two fields: first, an optimization of the read mapping algorithm has been designed in order to parallelize processes; second, an architecture has been implemented that consists of a Grid platform, composed of physical nodes, a Virtual platform, composed of virtual nodes set up on demand, and a scheduler that allows integration of the two platforms.


*------------*------------*------------*------------*------------*------------*
 
DOMAIN – CLOUD COMPUTING
TIME AND COST SENSITIVE DATA-INTENSIVE COMPUTING ON HYBRID CLOUDS
Purpose-built clusters permeate many of today’s organizations, providing both large-scale data storage and computing. Within local clusters, competition for resources complicates applications with deadlines. However, given the emergence of the cloud’s pay-as-you-go model, users are increasingly storing portions of their data remotely and allocating compute nodes on-demand to meet deadlines.
This scenario gives rise to a hybrid cloud, where data stored across local and cloud resources may be processed over both environments. While a hybrid execution environment may be used to meet time constraints, users must now attend to the costs associated with data storage, data transfer, and node allocation time on the cloud.
In this paper, we describe a modeling-driven resource allocation framework to support both time and cost sensitive execution for data-intensive applications executed in a hybrid cloud setting. We evaluate our framework using two data-intensive applications and a number of time and cost constraints.
Our experimental results show that our system is capable of meeting execution deadlines within a 3.6% margin of error. Similarly, cost constraints are met within a 1.2% margin of error, while minimizing the application’s execution time.


*------------*------------*------------*------------*------------*------------*
 
DOMAIN - DEPENDABLE AND SECURE COMPUTING
LARGE MARGIN GAUSSIAN MIXTURE MODELS WITH DIFFERENTIAL PRIVACY
As increasing amounts of sensitive personal information are aggregated into data repositories, it has become important to develop mechanisms for processing the data without revealing information about individual data instances.
The differential privacy model provides a framework for the development and theoretical analysis of such mechanisms. In this paper, we propose an algorithm for learning a discriminatively trained multiclass Gaussian mixture model-based classifier that preserves differential privacy using a large margin loss function with a perturbed regularization term.
We present a theoretical upper bound on the excess risk of the classifier introduced by the perturbation.
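
For intuition about the privacy mechanism, the sketch below shows generic Laplace output perturbation: noise scaled to sensitivity/epsilon is added to a released value so that the output is epsilon-differentially private. Note this is the textbook mechanism, used here only to convey the principle; the paper instead perturbs the regularization term of the large-margin training objective.

    import java.util.Random;

    // Generic Laplace-mechanism sketch (output perturbation), not the paper's
    // perturbed-regularizer construction.
    public class LaplaceMechanism {
        // Inverse-CDF sampling of Laplace(0, scale).
        static double laplace(double scale, Random rnd) {
            double u = rnd.nextDouble() - 0.5;                   // uniform(-0.5, 0.5)
            return -scale * Math.signum(u) * Math.log(1 - 2 * Math.abs(u));
        }

        // value: exact query answer; sensitivity: max change one individual
        // can cause in that answer; epsilon: privacy budget.
        static double privatize(double value, double sensitivity, double epsilon, Random rnd) {
            return value + laplace(sensitivity / epsilon, rnd);
        }
    }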


*------------*------------*------------*------------*------------*------------*
 
DOMAIN - DEPENDABLE AND SECURE COMPUTING
ITERATIVE TRUST AND REPUTATION MANAGEMENT USING BELIEF PROPAGATION
In this paper, we introduce the first application of the belief propagation algorithm in the design and evaluation of trust and reputation management systems. We approach the reputation management problem as an inference problem and describe it as computing marginal likelihood distributions from complicated global functions of many variables.
However, we observe that computing the marginal probability functions is computationally prohibitive for large-scale reputation systems. Therefore, we propose to utilize the belief propagation algorithm to efficiently (in linear complexity) compute these marginal probability distributions, resulting in a fully iterative, probabilistic, belief propagation-based approach (referred to as BP-ITRM). BP-ITRM models the reputation system on a factor graph. By using a factor graph, we obtain a qualitative representation of how the consumers (buyers) and service providers (sellers) are related on a graphical structure.
Further, by using such a factor graph, the global functions factor into products of simpler local functions, each of which depends on a subset of the variables. Then, we compute the marginal probability distribution functions of the variables representing the reputation values (of the service providers) by message passing between nodes in the graph. We show that BP-ITRM is reliable in filtering out malicious/unreliable reports.
We provide a detailed evaluation of BP-ITRM via analysis and computer simulations. We prove that BP-ITRM iteratively reduces the error in the reputation values of service providers due to the malicious raters with a high probability. Further, we observe that this probability drops suddenly if a particular fraction of malicious raters is exceeded, which introduces a threshold property to the scheme.
Furthermore, comparison of BP-ITRM with some well-known and commonly used reputation management techniques (e.g., Averaging Scheme, Bayesian Approach, and Cluster Filtering) indicates the superiority of the proposed scheme in terms of robustness against attacks (e.g., ballot stuffing, bad mouthing). Finally, BP-ITRM introduces a linear complexity in the number of service providers and consumers, far exceeding the efficiency of other schemes.


*------------*------------*------------*------------*------------*------------*
 
DOMAIN - DEPENDABLE AND SECURE COMPUTING
INCENTIVE COMPATIBLE PRIVACY-PRESERVING DISTRIBUTED CLASSIFICATION
In this paper, we propose game-theoretic mechanisms to encourage truthful data sharing for distributed data mining. One proposed mechanism uses the classic Vickrey-Clarke-Groves (VCG) mechanism, and the other relies on the Shapley value.
Neither relies on the ability to verify the data of the parties participating in the distributed data mining protocol. Instead, we incentivize truth telling based solely on the data mining result. This is especially useful for situations where privacy concerns prevent verification of the data.
Under reasonable assumptions, we prove that these mechanisms are incentive compatible for distributed data mining. In addition, through extensive experimentation, we show that they are applicable in practice.


*------------*------------*------------*------------*------------*------------*
 
DOMAIN - DEPENDABLE AND SECURE COMPUTING
ENSURING DISTRIBUTED ACCOUNTABILITY FOR DATA SHARING IN THE CLOUD
Cloud computing enables highly scalable services to be easily consumed over the Internet on an as-needed basis. A major feature of cloud services is that users’ data are usually processed remotely on machines that users do not own or operate.
While enjoying the convenience brought by this new emerging technology, users’ fears of losing control of their own data (particularly, financial and health data) can become a significant barrier to the wide adoption of cloud services.
To address this problem, in this paper, we propose a novel highly decentralized information accountability framework to keep track of the actual usage of the users’ data in the cloud. In particular, we propose an object-centered approach that enables enclosing our logging mechanism together with users’ data and policies.
We leverage the programmable capabilities of JARs to both create a dynamic and traveling object and to ensure that any access to users’ data will trigger authentication and automated logging local to the JARs. To strengthen users’ control, we also provide distributed auditing mechanisms. We provide extensive experimental studies that demonstrate the efficiency and effectiveness of the proposed approaches.
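
The object-centered idea can be pictured with a small Java sketch in which the protected data is reachable only through a wrapper that authenticates the caller and appends a log record. In the paper this logic travels with the data inside a signed JAR; the class below is a hypothetical, stripped-down stand-in.

import java.util.ArrayList;
import java.util.Collections;
import java.util.Date;
import java.util.List;

/**
 * Stripped-down stand-in for the object-centered approach: the payload is
 * reachable only through a wrapper that authenticates the caller and logs
 * the access locally.
 */
public class AccountableData {

    private final byte[] payload;                      // the protected user data
    private final List<String> log = new ArrayList<>();

    public AccountableData(byte[] payload) {
        this.payload = payload.clone();
    }

    public byte[] access(String userId, String credential, String purpose) {
        if (!authenticate(userId, credential)) {       // policy check (placeholder)
            throw new SecurityException("authentication failed for " + userId);
        }
        log.add(new Date() + " " + userId + " accessed data for: " + purpose);
        return payload.clone();
    }

    /** Distributed auditing would periodically push copies of this log. */
    public List<String> auditLog() {
        return Collections.unmodifiableList(new ArrayList<>(log));
    }

    private boolean authenticate(String userId, String credential) {
        return credential != null && !credential.isEmpty(); // placeholder policy
    }
}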


*------------*------------*------------*------------*------------*------------*
 
DOMAIN - DEPENDABLE AND SECURE COMPUTING
ENHANCED PRIVACY ID: A DIRECT ANONYMOUS ATTESTATION SCHEME WITH ENHANCED REVOCATION CAPABILITIES
Direct Anonymous Attestation (DAA) is a scheme that enables the remote authentication of a Trusted Platform Module (TPM) while preserving the user’s privacy. A TPM can prove to a remote party that it is a valid TPM without revealing its identity and without linkability.
In the DAA scheme, a TPM can be revoked only if the DAA private key in the hardware has been extracted and published widely so that verifiers obtain the corrupted private key. If the unlinkability requirement is relaxed, a TPM suspected of being compromised can be revoked even if the private key is not known.
However, with the full unlinkability requirement intact, if a TPM has been compromised but its private key has not been distributed to verifiers, the TPM cannot be revoked. Furthermore, a TPM cannot be revoked from the issuer, if the TPM is found to be compromised after the DAA issuing has occurred. In this paper, we present a new DAA scheme called Enhanced Privacy ID (EPID) scheme that addresses the above limitations.
While still providing unlinkability, our scheme provides a method to revoke a TPM even if the TPM private key is unknown. This expanded revocation property makes the scheme useful for other applications, such as driver’s licenses. Our EPID scheme is efficient and provably secure in the same security model as DAA, i.e., in the random oracle model under the strong RSA assumption and the decisional Diffie-Hellman assumption.
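
The key revocation insight, that a member can be revoked from a signature it previously produced even when its private key is unknown, can be shown at the interface level. The sketch below is structural only: the zero-knowledge proofs are placeholders, and every name is hypothetical.

import java.util.List;

/**
 * Structural sketch of EPID-style verification with signature-based
 * revocation. Cryptographic details are elided; both proof methods are
 * placeholders for zero-knowledge protocols.
 */
public class EpidVerifier {

    public interface GroupSignature {
        /** ZK proof: "I am a group member" (without revealing which one). */
        boolean provesMembership(byte[] message, byte[] groupPublicKey);
        /** ZK proof: "I did not produce this previously seen, revoked signature." */
        boolean provesNotSignerOf(byte[] revokedSignature);
    }

    public boolean verify(byte[] message, GroupSignature sig,
                          byte[] groupPublicKey, List<byte[]> sigRevocationList) {
        if (!sig.provesMembership(message, groupPublicKey)) {
            return false;
        }
        // The key point: revocation needs only a signature previously produced
        // by the compromised TPM, never its private key.
        for (byte[] revoked : sigRevocationList) {
            if (!sig.provesNotSignerOf(revoked)) {
                return false;
            }
        }
        return true;
    }
}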


*------------*------------*------------*------------*------------*------------*

DOMAIN - DEPENDABLE AND SECURE COMPUTING
ENFORCING MANDATORY ACCESS CONTROL IN COMMODITY OS TO DISABLE MALWARE
Enforcing practical Mandatory Access Control (MAC) in a commercial operating system to tackle the malware problem is a grand challenge but also a promising approach. The firmest barriers to applying MAC to defeat malware are the incompatibility and poor usability of existing MAC systems.
To address these issues, we manually analyze 2,600 malware samples one by one, along with two types of MAC-enforced operating systems, and then design a novel MAC enforcement approach, named Tracer, which incorporates intrusion detection and tracing in a commercial operating system. The approach conceptually consists of three actions: detecting, tracing, and restricting suspected intruders.
One novelty is that it leverages light-weight intrusion detection and tracing techniques to automate security label configuration that is widely acknowledged as a tough issue when applying a MAC system in practice. The other is that, rather than restricting information flow as a traditional MAC does, it traces intruders and restricts only their critical malware behaviors, where intruders represent processes and executables that are potential agents of a remote attacker.
Our prototyping and experiments on Windows show that Tracer can effectively defeat all malware samples tested via blocking malware behaviors while not causing a significant compatibility problem
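
The detect-trace-restrict loop can be sketched as a small state machine over processes and files. This is a hypothetical model of the idea, not the Windows prototype.

import java.util.HashSet;
import java.util.Set;

/**
 * Hypothetical model of detect-trace-restrict: suspicion starts at network
 * input, is traced to spawned processes and written executables, and only
 * critical behaviors of suspected processes are denied.
 */
public class Tracer {

    private final Set<Integer> suspectedPids = new HashSet<>();
    private final Set<String> suspectedFiles = new HashSet<>();

    // Detect: a process that consumes remote input becomes a suspect.
    public void onNetworkReceive(int pid) {
        suspectedPids.add(pid);
    }

    // Trace: suspicion follows process creation and written executables.
    public void onSpawn(int parentPid, int childPid) {
        if (suspectedPids.contains(parentPid)) suspectedPids.add(childPid);
    }

    public void onFileWrite(int pid, String path) {
        if (suspectedPids.contains(pid)) suspectedFiles.add(path);
    }

    public void onExecute(int pid, String imagePath) {
        if (suspectedFiles.contains(imagePath)) suspectedPids.add(pid);
    }

    // Restrict: deny only critical malware behaviors and keep everything else
    // working, which is what preserves compatibility.
    public boolean allow(int pid, String operation) {
        boolean critical = operation.equals("MODIFY_SYSTEM_FILE")
                || operation.equals("SET_AUTOSTART")
                || operation.equals("INJECT_PROCESS");
        return !(critical && suspectedPids.contains(pid));
    }
}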


*------------*------------*------------*------------*------------*------------*

DOMAIN - DEPENDABLE AND SECURE COMPUTING
DOUBLEGUARD: DETECTING INTRUSIONS IN MULTITIER WEB APPLICATIONS
Internet services and applications have become an inextricable part of daily life, enabling communication and the management of personal information from anywhere. To accommodate this increase in application and data complexity, web services have moved to a multitiered design wherein the webserver runs the application front-end logic and data are outsourced to a database or file server.
In this paper, we present DoubleGuard, an intrusion detection system (IDS) that models the network behavior of user sessions across both the front-end webserver and the back-end database. By monitoring both web and subsequent database requests, we are able to ferret out attacks that an independent IDS would not be able to identify.
Furthermore, we quantify the limitations of any multitier IDS in terms of training sessions and functionality coverage. We implemented DoubleGuard using an Apache webserver with MySQL and lightweight virtualization. We then collected and processed real-world traffic over a 15-day period of system deployment in both dynamic and static web applications.
Finally, using DoubleGuard, we were able to expose a wide range of attacks with 100 percent accuracy while maintaining 0 percent false positives for static web services and 0.6 percent false positives for dynamic web services
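
The core of the session model is learnable from training traffic: which SQL templates may each web request legitimately cause? A minimal Java sketch of that mapping, with hypothetical names and a toy normalization:

import java.util.Collection;
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

/**
 * Hypothetical simplification of the session-mapping model: training learns
 * which SQL templates each (normalized) web request may cause; at detection
 * time, a database query with no matching web request raises an alert.
 */
public class RequestQueryModel {

    private final Map<String, Set<String>> mapping = new HashMap<>();

    public void train(String webRequest, Collection<String> sqlTemplates) {
        String key = normalize(webRequest);
        Set<String> allowed = mapping.get(key);
        if (allowed == null) {
            allowed = new HashSet<>();
            mapping.put(key, allowed);
        }
        allowed.addAll(sqlTemplates);
    }

    /** True iff every observed query is explained by the web request. */
    public boolean isNormal(String webRequest, Collection<String> observedQueries) {
        Set<String> allowed = mapping.get(normalize(webRequest));
        if (allowed == null) allowed = Collections.emptySet();
        return allowed.containsAll(observedQueries);
    }

    /** Toy normalization: strip query-string values so requests form templates. */
    private String normalize(String request) {
        return request.replaceAll("=[^&]*", "=?");
    }
}

For example, a SQL-injection attempt issues query templates never observed for that request during training, so containsAll fails and the session is flagged.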


*------------*------------*------------*------------*------------*------------*

DOMAIN - DEPENDABLE AND SECURE COMPUTING
DETECTING ANOMALOUS INSIDERS IN COLLABORATIVE INFORMATION SYSTEMS
Collaborative information systems (CISs) are deployed within a diverse array of environments that manage sensitive information. Current security mechanisms detect insider threats, but they are ill-suited to monitor systems in which users function in dynamic teams.
In this paper, we introduce the community anomaly detection system (CADS), an unsupervised learning framework to detect insider threats based on the access logs of collaborative environments.
The framework is based on the observation that typical CIS users tend to form community structures based on the subjects accessed (e.g., patients’ records viewed by healthcare providers). CADS consists of two components: 1) relational pattern extraction, which derives community structures and 2) anomaly prediction, which leverages a statistical model to determine when users have sufficiently deviated from communities.
We further extend CADS into MetaCADS to account for the semantics of subjects (e.g., patients’ diagnoses). To empirically evaluate the framework, we perform an assessment with three months of access logs from a real electronic health record (EHR) system in a large medical center.
The results illustrate that our models exhibit significant performance gains over state-of-the-art competitors. When the number of illicit users is low, MetaCADS is the best model, but as the number grows, commonly accessed semantics lead to hiding in a crowd, such that CADS is more prudent.
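
The community intuition can be approximated with a nearest-neighbor deviation score over users' accessed-subject sets. The sketch below uses a simple Jaccard-based score, not the paper's statistical model.

import java.util.ArrayList;
import java.util.Collections;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/**
 * Jaccard-based nearest-neighbor deviation score (hypothetical scoring):
 * users whose accessed-subject sets are far from even their closest peers
 * score high, i.e., they fit no community.
 */
public class CommunityAnomaly {

    public static double score(Set<String> user, List<Set<String>> others, int k) {
        List<Double> dists = new ArrayList<>();
        for (Set<String> other : others) {
            dists.add(1.0 - jaccard(user, other));
        }
        Collections.sort(dists);                 // closest neighbors first
        int n = Math.min(k, dists.size());
        double sum = 0;
        for (int i = 0; i < n; i++) sum += dists.get(i);
        return (n > 0) ? sum / n : 0.0;          // high = deviates from every community
    }

    private static double jaccard(Set<String> a, Set<String> b) {
        if (a.isEmpty() && b.isEmpty()) return 1.0;
        Set<String> inter = new HashSet<>(a);
        inter.retainAll(b);
        Set<String> union = new HashSet<>(a);
        union.addAll(b);
        return (double) inter.size() / union.size();
    }
}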


*------------*------------*------------*------------*------------*------------*

DOMAIN - DEPENDABLE AND SECURE COMPUTING
AUTOMATED SECURITY TEST GENERATION WITH FORMAL THREAT MODELS
Security attacks typically result from unintended behaviors or invalid inputs. Security testing is labor intensive because a real-world program usually has too many invalid inputs. It is highly desirable to automate or partially automate the security-testing process.
This paper presents an approach to the automated generation of security tests by using formal threat models represented as Predicate/Transition nets. It generates all attack paths, i.e., security tests, from a threat model and converts them into executable test code according to the given Model-Implementation Mapping (MIM) specification.
We have applied this approach to two real-world systems, Magento (a web-based shopping system being used by many online stores) and FileZilla Server (a popular FTP server implementation in C++). Threat models are built systematically by examining all potential STRIDE (spoofing identity, tampering with data, repudiation, information disclosure, denial of service, and elevation of privilege) threats to system functions. The security tests generated from these models have found multiple security risks in each system.
The test code for most of the security tests can be generated and executed automatically. To further evaluate the vulnerability detection capability of the testing approach, the security tests have been applied to a number of security mutants where vulnerabilities are injected deliberately.
The mutants are created according to the common vulnerabilities in C++ and web applications. Our experiments show that the security tests have killed the majority of the mutants
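
The path-to-test conversion can be sketched as composing MIM entries along an attack path. Types and names below are hypothetical; real MIM entries would issue HTTP or FTP requests against the system under test.

import java.util.HashMap;
import java.util.List;
import java.util.Map;

/**
 * Hypothetical types for turning a model-level attack path into an
 * executable test via a Model-Implementation Mapping (MIM): each transition
 * name is bound to an implementation-level step.
 */
public class SecurityTestGenerator {

    private final Map<String, Runnable> mim = new HashMap<>();

    /** Bind one model-level transition to its implementation step. */
    public void map(String transition, Runnable implementationStep) {
        mim.put(transition, implementationStep);
    }

    /** Compose an executable security test from one attack path. */
    public Runnable toTest(final List<String> attackPath) {
        return new Runnable() {
            public void run() {
                for (String transition : attackPath) {
                    Runnable step = mim.get(transition);
                    if (step == null) {
                        throw new IllegalStateException("no MIM entry for " + transition);
                    }
                    step.run();  // e.g., send a tampered HTTP request
                }
            }
        };
    }
}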


*------------*------------*------------*------------*------------*------------*

DOMAIN - DEPENDABLE AND SECURE COMPUTING
A LEARNING-BASED APPROACH TO REACTIVE SECURITY
Despite the conventional wisdom that proactive security is superior to reactive security, we show that reactive security can be competitive with proactive security as long as the reactive defender learns from past attacks instead of myopically overreacting to the last attack.
Our game-theoretic model follows common practice in the security literature by making worst case assumptions about the attacker: we grant the attacker complete knowledge of the defender’s strategy and do not require the attacker to act rationally.
In this model, we bound the competitive ratio between a reactive defense algorithm (which is inspired by online learning theory) and the best fixed proactive defense. Additionally, we show that, unlike proactive defenses, this reactive strategy is robust to a lack of information about the attacker’s incentives and knowledge
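
A defender "inspired by online learning theory" is commonly realized with multiplicative weights. The sketch below illustrates that flavor, allocating a unit defense budget in proportion to weights that grow on attacked surfaces; it is an illustration of the learning style, not the paper's exact algorithm.

import java.util.Arrays;

/**
 * Multiplicative-weights flavor of a learning reactive defender: a unit
 * budget is split in proportion to weights, and weights grow on the attack
 * surfaces the attacker actually hit.
 */
public class ReactiveDefender {

    private final double[] weights;
    private final double learningRate;   // e.g., 0.1

    public ReactiveDefender(int surfaces, double learningRate) {
        this.weights = new double[surfaces];
        Arrays.fill(this.weights, 1.0);
        this.learningRate = learningRate;
    }

    /** Current defense allocation: proportional share of a unit budget. */
    public double[] allocation() {
        double total = 0;
        for (double w : weights) total += w;
        double[] alloc = new double[weights.length];
        for (int i = 0; i < weights.length; i++) alloc[i] = weights[i] / total;
        return alloc;
    }

    /** Learn from an observed attack; loss[i] in [0,1] per attack surface. */
    public void observeAttack(double[] loss) {
        for (int i = 0; i < weights.length; i++) {
            weights[i] *= Math.pow(1.0 + learningRate, loss[i]);
        }
    }
}

Because weights grow smoothly with observed losses rather than jumping to the last attack, the defender learns from history instead of myopically overreacting.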


*------------*------------*------------*------------*------------*------------*

DOMAIN - DEPENDABLE AND SECURE COMPUTING
SECURE FAILURE DETECTION AND CONSENSUS IN TRUSTEDPALS
We present a modular redesign of TrustedPals, a smart card-based security framework for solving Secure Multiparty Computation (SMC). Originally, TrustedPals assumed a synchronous network setting and reduced SMC to the problem of fault-tolerant consensus among smart cards.
We explore how to make TrustedPals applicable in environments with less synchrony and show how it can be used to solve asynchronous SMC. Within the redesign, we investigate the problem of solving consensus in a general omission failure model augmented with failure detectors.
To this end, we give novel definitions of both consensus and the class ◇P of failure detectors in the omission model, which we call ◇P(om), and show how to implement ◇P(om) and solve consensus in such a system with very weak synchrony assumptions.
The integration of failure detection and consensus into the TrustedPals framework uses tools from privacy-enhancing techniques, such as message padding and dummy traffic.
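
For background, the textbook implementation of an eventually perfect failure detector uses heartbeats with adaptive timeouts; the paper adapts this class to the omission model as ◇P(om). A minimal sketch of the classic idea:

import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

/**
 * Textbook heartbeat sketch of an eventually perfect failure detector:
 * suspect a process that misses its deadline; on a late heartbeat,
 * unsuspect it and widen its timeout so suspicions eventually stabilize.
 */
public class EventuallyPerfectDetector {

    private static final long INITIAL_TIMEOUT_MS = 500;

    private final Map<Integer, Long> lastHeartbeat = new HashMap<>();
    private final Map<Integer, Long> timeoutMs = new HashMap<>();
    private final Set<Integer> suspected = new HashSet<>();

    public void onHeartbeat(int process, long nowMs) {
        lastHeartbeat.put(process, nowMs);
        if (suspected.remove(process)) {
            // False suspicion: the process was alive, so widen its deadline.
            Long current = timeoutMs.get(process);
            long base = (current == null) ? INITIAL_TIMEOUT_MS : current;
            timeoutMs.put(process, base + INITIAL_TIMEOUT_MS);
        }
    }

    public Set<Integer> check(long nowMs) {
        for (Map.Entry<Integer, Long> e : lastHeartbeat.entrySet()) {
            Long t = timeoutMs.get(e.getKey());
            long timeout = (t == null) ? INITIAL_TIMEOUT_MS : t;
            if (nowMs > e.getValue() + timeout) {
                suspected.add(e.getKey());
            }
        }
        return Collections.unmodifiableSet(new HashSet<>(suspected));
    }
}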


*------------*------------*------------*------------*------------*------------*
 
DOMAIN - DEPENDABLE AND SECURE COMPUTING
RESILIENT AUTHENTICATED EXECUTION OF CRITICAL APPLICATIONS IN UNTRUSTED ENVIRONMENTS
Modern computer systems are built on a foundation of software components from a variety of vendors. While critical applications may undergo extensive testing and evaluation procedures, the heterogeneity of software sources threatens the integrity of the execution environment for these trusted programs.
For instance, if an attacker can combine an application exploit with a privilege escalation vulnerability, the operating system (OS) can become corrupted. Alternatively, a malicious or faulty device driver running with kernel privileges could threaten the application. While the importance of ensuring application integrity has been studied in prior work, proposed solutions immediately terminate the application once corruption is detected.
Although this approach is sufficient for some cases, it is undesirable for many critical applications. To overcome this shortcoming, we have explored techniques for leveraging a trusted virtual machine monitor (VMM) to observe the application and potentially repair damage that occurs.
In this paper, we describe our system design, which leverages efficient coding and authentication schemes, and we present the details of our prototype implementation to quantify the overhead of our approach. Our work shows that it is feasible to build a resilient execution environment, even in the presence of a corrupted OS kernel, with a reasonable amount of storage and performance overhead.


*------------*------------*------------*------------*------------*------------*

DOMAIN - DEPENDABLE AND SECURE COMPUTING
ON PRIVACY OF ENCRYPTED SPEECH COMMUNICATIONS
Silence suppression, an essential feature of speech communications over the Internet, saves bandwidth by disabling voice packet transmissions when silence is detected. However, silence suppression enables an adversary to recover talk patterns from packet timing.
In this paper, we investigate privacy leakage through the silence suppression feature. More specifically, we propose a new class of traffic analysis attacks on encrypted speech communications with the goal of detecting the speakers. These attacks are based on packet timing information only, and they can detect speakers of speech communications made with different codecs.
We evaluate the proposed attacks with extensive experiments over different types of networks, including commercial anonymity networks and campus networks. The experiments show that the proposed traffic analysis attacks can detect speakers of encrypted speech communications with high accuracy, based on traces that are 15 minutes long on average.
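
The timing-only feature is easy to picture: with silence suppression, packets flow only while the user talks, so binning packet timestamps yields an on/off talk pattern. A minimal sketch, where the bin size and similarity measure are hypothetical choices:

/**
 * Timing-only talk-pattern extraction: binned packet timestamps become an
 * on/off pattern that can be compared across calls.
 */
public class TalkPattern {

    /** pattern[i] = 1 if at least one packet fell into time bin i, else 0. */
    public static int[] extract(long[] packetTimesMs, long binMs, long durationMs) {
        int bins = (int) (durationMs / binMs);
        int[] pattern = new int[bins];
        for (long t : packetTimesMs) {
            int bin = (int) (t / binMs);
            if (bin >= 0 && bin < bins) pattern[bin] = 1;
        }
        return pattern;
    }

    /** Fraction of bins on which two equal-length patterns agree. */
    public static double similarity(int[] a, int[] b) {
        int agree = 0;
        for (int i = 0; i < a.length; i++) {
            if (a[i] == b[i]) agree++;
        }
        return (double) agree / a.length;
    }
}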


*------------*------------*------------*------------*------------*------------*
 
DOMAIN - DEPENDABLE AND SECURE COMPUTING
MITIGATING DISTRIBUTED DENIAL OF SERVICE ATTACKS IN MULTIPARTY APPLICATIONS IN THE PRESENCE OF CLOCK DRIFTS
Network-based applications commonly open some known communication port(s), making themselves easy targets for (distributed) Denial of Service (DoS) attacks. Earlier solutions for this problem are based on port-hopping between pairs of processes which are synchronous or exchange acknowledgments.
However, acknowledgments, if lost, can cause a port to stay open for a longer time and thus be vulnerable, while time servers can become targets of DoS attacks themselves. Here, we extend port-hopping to support multiparty applications by proposing the BIGWHEEL algorithm, which allows each application server to communicate with multiple clients in a port-hopping manner without the need for group synchronization.
Furthermore, we present an adaptive algorithm, HOPERAA, for enabling hopping in the presence of bounded asynchrony, namely, when the communicating parties have clocks with clock drifts. The solutions are simple, based on each client interacting with the server independently of the other clients, without the need of acknowledgments or time server(s).
Further, they do not rely on the application having a fixed port open in the beginning, nor do they require the clients to get a “first-contact” port from a third party. We show analytically the properties of the algorithms and also study their success rates experimentally, confirming the relation with the analytical bounds.
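
The underlying port-hopping idea: client and server derive the current contact port from a shared secret and the time slot, and the server also opens neighboring slots to tolerate client clock drift. The sketch below is schematic only; the seeded Random stands in for a real keyed PRF, all parameters are hypothetical, and BIGWHEEL and HOPERAA add per-client hopping and drift adaptation on top of this.

import java.util.Arrays;
import java.util.HashSet;
import java.util.Random;
import java.util.Set;

/**
 * Schematic port-hopping sketch: the contact port for each time slot is
 * derived from a shared secret, and the server covers neighboring slots
 * to tolerate bounded clock drift.
 */
public class PortHopper {

    private static final int BASE_PORT = 20000;   // hypothetical range
    private static final int PORT_RANGE = 10000;

    private final byte[] secret;
    private final long slotMs;

    public PortHopper(byte[] sharedSecret, long slotMs) {
        this.secret = sharedSecret.clone();
        this.slotMs = slotMs;
    }

    /** Port in use during the time slot containing timeMs. */
    public int portAt(long timeMs) {
        long slot = timeMs / slotMs;
        Random prf = new Random(31L * Arrays.hashCode(secret) + slot); // PRF stand-in
        return BASE_PORT + prf.nextInt(PORT_RANGE);
    }

    /** Server side: also listen on neighboring slots to absorb clock drift. */
    public Set<Integer> openPorts(long serverTimeMs, int driftSlots) {
        Set<Integer> ports = new HashSet<>();
        for (int d = -driftSlots; d <= driftSlots; d++) {
            ports.add(portAt(serverTimeMs + d * slotMs));
        }
        return ports;
    }
}

In this picture, HOPERAA's contribution is estimating driftSlots adaptively from the client's measured clock drift instead of fixing it in advance.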


*------------*------------*------------*------------*------------*------------*