Events 2010

The Hadoop++ Project

Jens Dittrich Department of Computer Science, Saarland University
10 Dec 2010, 11:00 am - 12:00 pm
Saarbrücken building E1 5, room 5th floor
simultaneous videocast to Kaiserslautern building G26, room 206
SWS Colloquium
MapReduce is a computing paradigm that has gained a lot of attention in recent years from industry and research. Unlike parallel DBMSs, MapReduce allows non-expert users to run complex analytical tasks over very large data sets on very large clusters and clouds. However, this comes at a price: MapReduce processes tasks in a scan-oriented fashion. Hence, the performance of Hadoop --- an open-source implementation of MapReduce --- often does not match that of a well-configured parallel DBMS. We propose a new type of system named Hadoop++: it boosts task performance without changing the Hadoop framework at all. To reach this goal, rather than changing a working system (Hadoop), we inject our technology at the right places through UDFs only and affect Hadoop from inside. This has three important consequences: First, Hadoop++ significantly outperforms Hadoop. Second, any future changes to Hadoop can be used directly with Hadoop++ without rewriting any glue code. Third, Hadoop++ does not need to change the Hadoop interface. Our experiments show the superiority of Hadoop++ over both Hadoop and HadoopDB for tasks related to indexing and join processing. In this talk I will present results from a VLDB 2010 paper as well as more recent work.

Link: http://infosys.cs.uni-saarland.de/hadoop++.php
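
For readers who want a concrete picture of how an injected index can replace scan-oriented processing, here is a minimal sketch in Python (illustrative only; Hadoop++ injects its indexes through Hadoop's Java UDFs, and none of the names below come from that system):

    import bisect

    def scan_split(records, key):
        # Baseline, scan-oriented processing: every record in the split is read.
        return [r for r in records if r[0] == key]

    def indexed_read(records, index_keys, key):
        # With a small per-split index over the sort key, a reader can seek
        # directly to the first qualifying record and stop at the first miss.
        i = bisect.bisect_left(index_keys, key)
        out = []
        while i < len(records) and records[i][0] == key:
            out.append(records[i])
            i += 1
        return out

    # Example: a split sorted on the selection/join key.
    split = [(k, "payload-%d" % k) for k in range(0, 1000, 2)]
    keys = [k for k, _ in split]
    assert scan_split(split, 500) == indexed_read(split, keys, 500)

The point of the sketch is only that the indexed reader touches a fraction of the split while the surrounding framework stays unchanged.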

Logic, Policy, and Federation in the Cloud

Yuri Gurevich Microsoft Research, Redmond
29 Nov 2010, 11:00 am - 12:00 pm
Saarbrücken building E1 4, room 024
simultaneous videocast to Kaiserslautern building G26, room 206
SWS Distinguished Lecture Series
Imagine that you manage a public cloud. You want to attract lucrative customers, but they worry that their data will not be secure in your cloud. Of course they can encode their data before putting it in the cloud and decode it upon removal, but that doesn't buy much for them (or for you, because your cloud is used just as a glorified blob store). How can you add value? Cryptographers have many tricks, but few of them are feasible at this point; most notably, searching on encrypted data with a single keyword is being considered.

But maybe we shouldn't reinvent the wheel. How do enterprises interact in the real world? Consider commerce, for example. Buyers and sellers from very different (in geography, culture, political system) countries succeed in making mutually beneficial deals. The sellers get paid, and the buyers get their goods. How does it work? Well, there is an involved support system that developed from centuries of experience: banks issuing letters of credit, insurance companies that underwrite the transactions and transportation, etc. And numerous policies are enforced. Similarly, there is an involved support system that allows Big Pharma to conduct clinical trials that straddle multiple countries. And so on. Can we lift such support systems to the cloud scale and make them more efficient in the process? We believe that the answer is YES.

An important ingredient of the desired solution is a high-level language for writing policies. As we mentioned above, numerous policies need to be enforced. They also need to be stated formally to allow automation, and they need to be high-level to allow comprehension and reasoning. Cryptography is indispensable in enforcing policies, but first we need a language to formulate policies succinctly and to exchange them among autonomous parties. The Distributed Knowledge Authorization Language (DKAL) was created for such purposes. It required foundational logic investigation, and it is in the process of tech transfer. This lecture is a popular introduction to DKAL and its applications to doing business via public clouds.

SMOOTHIE: Scalable transaction processing in the cloud

Nitin Gupta Cornell University
23 Nov 2010, 11:00 am - 12:00 pm
Saarbrücken building E1 5, room 5th floor
simultaneous videocast to Kaiserslautern building G26, room 206
SWS Colloquium
In this talk, I will introduce SMOOTHIE, a new framework for transaction processing. SMOOTHIE is based on optimistic concurrency control, separated into a control flow of components that can each scale independently. In addition, each of these components is nearly stateless and thus can easily be scaled up or down on demand. I will present the architecture of SMOOTHIE, explain how it achieves scalability, present thoughts on how to avoid bottlenecks through heat-based data placement, and discuss tradeoffs between different implementations of the components of SMOOTHIE.
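
For readers unfamiliar with optimistic concurrency control, the following single-process Python sketch shows its validate-then-commit step; it is a generic textbook illustration, not SMOOTHIE's design, and the class and function names are invented for this example.

    class Transaction:
        def __init__(self, store):
            self.store = store    # key -> (version, value)
            self.reads = {}       # key -> version observed at read time
            self.writes = {}      # key -> new value

        def read(self, key):
            version, value = self.store.get(key, (0, None))
            self.reads[key] = version
            return self.writes.get(key, value)

        def write(self, key, value):
            self.writes[key] = value

    def validate_and_commit(txn):
        # Optimistic concurrency control: re-check read versions at commit time;
        # abort on any conflict, otherwise install writes with bumped versions.
        for key, seen in txn.reads.items():
            current, _ = txn.store.get(key, (0, None))
            if current != seen:
                return False              # a concurrent commit intervened: abort
        for key, value in txn.writes.items():
            version, _ = txn.store.get(key, (0, None))
            txn.store[key] = (version + 1, value)
        return True

    # Two transactions racing on the same key: the second one fails validation.
    db = {"x": (0, 10)}
    t1, t2 = Transaction(db), Transaction(db)
    t1.write("x", t1.read("x") + 1)
    t2.write("x", t2.read("x") + 5)
    assert validate_and_commit(t1) and not validate_and_commit(t2)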

Entangled queries: an abstraction for declarative data-driven coordination

Lucja Kot Cornell University
12 Nov 2010, 11:00 am - 12:30 pm
Kaiserslautern building G26, room 206
simultaneous videocast to Saarbrücken building E1 5, room 5th floor
SWS Colloquium
Today's Web 2.0 users increasingly want to perform data-driven tasks that involve collaboration and coordination. Friends want to make joint travel plans, students want to enroll in courses together, and busy professionals want to coordinate their schedules. These tasks are surprisingly difficult to support using existing database system abstractions, as the coordination required is in direct conflict with isolation -- a key property of ACID transactions.

In the Youtopia project at Cornell, we believe that data-driven coordination is so pervasive that it deserves dedicated support through a clean, declarative abstraction. It is time to move beyond isolation and support declarative, data-driven coordination (D3C) as a fundamental mode of data management. In this talk, I will introduce entangled queries - a simple yet powerful abstraction and mechanism to enable D3C. These queries allow users to specify the coordination required, such as "I want to travel on the same flight as my friend". At runtime, the system performs the coordination, ensuring that the specifications are met. I will discuss the syntax, semantics and evaluation of these queries, as well as introduce a range of broader research challenges associated with designing for D3C.
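
The following toy sketch (Python invented for this listing, not Youtopia's syntax or evaluation algorithm) conveys the flavor of coordinated answering: two users' flight queries are only answered with a common choice.

    flights = {"LH101", "LH202", "UA900"}
    acceptable = {"alice": {"LH101", "UA900"}, "bob": {"LH202", "UA900"}}

    def coordinate_same_flight(users):
        # Answer every user's "book me a flight" query with one shared flight,
        # mimicking an entangled constraint such as "same flight as my friend".
        options = set(flights)
        for user in users:
            options &= acceptable[user]
        if not options:
            return None                      # no coordinated answer exists
        choice = min(options)                # any deterministic pick will do
        return {user: choice for user in users}

    print(coordinate_same_flight(["alice", "bob"]))   # {'alice': 'UA900', 'bob': 'UA900'}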

Making Broadband Access Networks Transparent to Researchers, Developers, and Users

Marcel Dischinger Max Planck Institute for Software Systems
27 Oct 2010, 4:30 pm - 5:30 pm
Saarbrücken building E1 4, room 024
SWS Student Defense Talks - Thesis Defense
Broadband networks are used by hundreds of millions of users to connect to the Internet today. However, most ISPs are hesitant to reveal details about their network deployments, and as a result the characteristics of broadband networks are often not known to users, developers, and researchers. In this thesis, we make progress towards mitigating this lack of transparency in broadband access networks in two ways.

First, using novel measurement tools we performed the first large-scale study of the characteristics of broadband networks. We found that broadband networks have very different characteristics than academic networks. We also developed Glasnost, a system that enables users to test their Internet access links for traffic differentiation. Glasnost has been used by more than 350,000 users worldwide and allowed us to study ISPs' traffic management practices. We found that ISPs increasingly throttle or even block traffic from popular applications such as BitTorrent.

Second, we developed two new approaches to enable realistic evaluation of networked systems in broadband networks. We developed Monarch, a tool that enables researchers to study and compare the performance of new and existing transport protocols at large scale in broadband environments. Furthermore, we designed SatelliteLab, a novel testbed that can easily add arbitrary end nodes, including broadband nodes and even smartphones, to existing testbeds like PlanetLab.
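
A rough sketch of the measurement idea behind differentiation tests like Glasnost (mentioned above), in Python and illustrative only; Glasnost's actual flow construction, thresholds, and statistics differ: compare an application flow against a control flow of random bytes on the same path and flag large gaps.

    def differentiation_suspected(app_kbps, control_kbps, threshold=0.5):
        # Flag possible throttling when the application flow (e.g., BitTorrent-like
        # traffic) achieves a much lower median throughput than a control flow of
        # random bytes measured back-to-back on the same path.
        median = lambda xs: sorted(xs)[len(xs) // 2]
        return median(app_kbps) < threshold * median(control_kbps)

    print(differentiation_suspected([300, 280, 310], [900, 950, 880]))   # True
    print(differentiation_suspected([870, 900, 860], [900, 950, 880]))   # False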

Practical memory safety for C

Periklis Akritidis University of Cambridge
06 Sep 2010, 11:00 am - 12:00 pm
Kaiserslautern building G26, room 206
simultaneous videocast to Saarbrücken building E1 5, room 5th floor
SWS Colloquium
C facilitates high performance execution and low-level systems programming, but the lack of memory safety undermines security and reliability; for example, memory corruption bugs can breach security, and faults in kernel extensions can bring down the entire operating system. Memory safe languages, however, are unlikely to displace C in the near future, and solutions currently in use offer inadequate protection. Comprehensive proposals, on the other hand, are either too slow for practical use, or break backwards compatibility by requiring source code porting or generating incompatible binary code. My talk will present backwards-compatible solutions to prevent dangerous memory corruption in C programs at a low cost.

Effective scheduling techniques for high-level parallel-programming languages

Mike Rainey University of Chicago
17 Aug 2010, 11:00 am - 12:00 pm
Kaiserslautern building G26, room 206
simultaneous videocast to Saarbrücken building E1 5, room 5th floor
SWS Colloquium
In the not-so-distant past, parallel programming was mostly the concern of programmers specializing in high-performance computing. Nowadays, on the other hand, many of today's desktop and laptop computers come equipped with a species of shared-memory multiprocessor called a multicore processor, making parallel programming a concern for a much broader range of programmers. High-level parallel languages, such as Parallel ML (PML) and Haskell, seek to reduce the complexity of programming multicore processors by giving programmers abstract execution models, such as implicit threading, where programmers annotate their programs to suggest the parallel decomposition. Implicitly-threaded programs, however, do not specify the actual decomposition of computations or mapping from computations to processors. The annotations act simply as hints that can be ignored and safely replaced with sequential counterparts. The parallel decomposition itself is the responsibility of the language implementation and, more specifically, of the scheduling system.

Threads can take arbitrarily different amounts of time to execute, and these times are difficult to predict. Implicit threading encourages the programmer to divide the program into threads that are as small as possible, because doing so increases the flexibility of the scheduler in its duty to distribute work evenly across processors. The downside of such fine-grain parallelism is that if the total scheduling cost is too large, then parallelism is not worthwhile. This problem is the focus of this talk.

The starting point of this talk is work stealing, a scheduling policy well known for its scalable parallel performance, and the work-first principle, which serves as a guide for building efficient implementations of work stealing. In this talk, I present two techniques, Lazy Promotion and Lazy Tree Splitting, for implementing work stealing. Both techniques derive their efficiency from adhering to the work-first principle. Lazy Promotion is a strategy that improves the performance, in terms of execution time, of a work-stealing scheduler by reducing the amount of load the scheduler places on the garbage collector. Lazy Tree Splitting is a technique for automatically scheduling the execution of parallel operations over trees to yield scalable performance and eliminate the need for per-application tuning. I use Manticore, PML's compiler and runtime system, and a sixteen-core NUMA machine as a testbed for these techniques.

In addition, I present two empirical studies. In the first study, I evaluate Lazy Promotion over six PML benchmarks. The results demonstrate that Lazy Promotion either outperforms or performs the same as an alternative scheme based on Eager Promotion. This study also evaluates the design of the Manticore runtime system, in particular, the split-heap memory manager, by comparing the system to an alternative system based on a unified-heap memory manager, and showing that the unified version has limited scalability due to poor locality. In the second study, I evaluate Lazy Tree Splitting over seven PML benchmarks by comparing Lazy Tree Splitting to its alternative, Eager Tree Splitting. The results show that, although the two techniques offer similar scalability, only Lazy Tree Splitting is suitable for building an effective language implementation.
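
For readers new to work stealing, here is a minimal single-threaded simulation of the policy in Python (owners push and pop at one end of their deque, thieves steal from the other); it illustrates only the scheduling idea and none of the lock-free machinery, laziness, or PML specifics discussed above.

    from collections import deque
    import random

    class Worker:
        def __init__(self):
            self.tasks = deque()                 # owner works on the right end

        def pop(self):
            return self.tasks.pop() if self.tasks else None

        def steal_from(self, victim):
            # Thieves take from the opposite end: the oldest, and typically
            # largest, piece of remaining work.
            return victim.tasks.popleft() if victim.tasks else None

    workers = [Worker() for _ in range(4)]
    workers[0].tasks.extend(range(16))           # all work starts on one processor

    done = 0
    while done < 16:
        for w in workers:
            task = w.pop()
            if task is None:                     # local deque empty: try to steal
                victim = random.choice([v for v in workers if v is not w])
                task = w.steal_from(victim)
            if task is not None:
                done += 1                        # "execute" the task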

Solvers for Software Reliability and Security

Vijay Ganesh MIT
22 Jul 2010, 1:00 pm - 2:30 pm
Kaiserslautern building G26, room 206
simultaneous videocast to Saarbrücken building E1 5, room 5th floor
SWS Colloquium
The task of building reliable and secure software remains one of the most important and challenging issues in computer science. In recent years, there has been rapid progress in the scalability and effectiveness of software reliability tools. A key reason for this success is the dramatic improvement in the speed of constraint solvers over the last decade. Constraint solvers are essential components of most software reliability tools, whether they are based on formal methods, program analysis, testing or synthesis. My research on constraint solvers has directly contributed to this trend of increasing solver efficiency and expressive power, thus advancing the state of the art in software reliability research.

In this talk, I will present two solvers that I have designed and implemented, namely, STP and HAMPI. I will talk about the techniques that enable STP and HAMPI to scale, and also some theoretical results. I will also talk about the contexts and applications where each solver is best suited.

STP is a solver for the theory of bit-vectors and arrays. STP was one of the first constraint solvers to enable an exciting new testing technique called Dynamic Systematic Testing (aka Concolic Testing). STP-enabled concolic testers have been used to find hundreds of previously unknown bugs in Unix utilities, OS kernels, media players, and commercial software, some with approximately a million lines of code.
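
To make the role of the solver concrete, here is a compressed sketch of a concolic-testing loop in Python; the brute-force solve function is a stand-in for a real bit-vector solver such as STP, and the toy program and helper names are invented for this example.

    def program(x):
        # Toy program under test; records the branch decisions it takes.
        path = [("x > 100", x > 100)]
        if x > 100:
            path.append(("x * 2 == 286", x * 2 == 286))
            if x * 2 == 286:
                raise AssertionError("bug reached")
        return path

    def solve(constraints):
        # Stand-in for a constraint solver: brute-force search over a small
        # integer domain instead of deciding bit-vector formulas.
        for candidate in range(1000):
            if all(eval(expr, {"x": candidate}) == want for expr, want in constraints):
                return candidate
        return None

    # Concolic loop: execute concretely, negate the last branch taken, and ask
    # the solver for an input that drives execution down the unexplored side.
    x = 0
    for _ in range(10):
        try:
            path = program(x)
        except AssertionError:
            print("bug-triggering input found:", x)
            break
        x = solve(path[:-1] + [(path[-1][0], not path[-1][1])])
        if x is None:
            break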

Next, I will describe HAMPI, a solver for a rich theory of equality over bounded string variables, bounded regular expressions, and context-free grammars. Constraints in this theory are generated by analysis of string-manipulating programs. HAMPI has been used to find many unknown SQL injection vulnerabilities in applications with more than 100,000 lines of PHP code using static and dynamic analysis.

Finally, I will conclude my talk with two future research programs. First, I will discuss how faster solvers can enable qualitatively novel approaches to software reliability. Second, I will discuss how we can go from specific solver techniques to solver design paradigms for rich logics.

Timing Analysis of Mixed Time / Event-Triggered Multi-Mode Systems

Linh Phan University of Pennsylvania
19 Jul 2010, 1:00 pm - 2:30 pm
Kaiserslautern building G26, room 206
simultaneous videocast to Saarbrücken building E1 5, room 5th floor
SWS Colloquium
Many embedded systems operate in multiple modes, where mode switches can be both time- as well as event-triggered. While timing and schedulability analysis of a system operating in a single mode has been well studied, it is difficult to piece together the results from different modes in order to deduce the timing properties of a multi-mode system. In this talk, I will present a model and associated analysis techniques to describe embedded systems that process multiple bursty/complex event/data streams and in which mode changes are both time- and event-triggered. Compared to previous studies, our model is very general and can capture a wide variety of real-life systems. Our analysis techniques can be used to determine different performance metrics, such as the maximum fill-levels of different buffers and the delays suffered by the streams being processed by the system. The main novelty in our analysis lies in how we piece together results from the different modes in order to obtain performance metrics for the full system.
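
As background on the kind of metrics mentioned above, the standard single-mode bounds from real-time/network calculus relate an arrival curve \alpha and a service curve \beta (these are the textbook formulas, not the talk's multi-mode results, whose contribution lies in composing such analyses across time- and event-triggered mode switches):

    B \;\le\; \sup_{\Delta \ge 0} \bigl( \alpha(\Delta) - \beta(\Delta) \bigr)
    \qquad
    D \;\le\; \sup_{\Delta \ge 0} \; \inf \{\, \tau \ge 0 : \alpha(\Delta) \le \beta(\Delta + \tau) \,\}

Here B bounds the buffer fill-level and D bounds the delay experienced by the stream.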

Approximate Inference Algorithms in Markov Random Fields with Their Applications

Kyomin Jung KAIST
19 Jul 2010, 11:00 am - 12:30 pm
Saarbrücken building E1 5, room 5th floor
simultaneous videocast to Kaiserslautern building G26, room 206
SWS Colloquium
Markov Random Fields (MRFs) provide an elegant and succinct abstraction to capture inter-dependency between a large number of random variables - applications abound in communication networks, signal processing, statistical physics, combinatorics, biology, etc. In most of these applications, the key task is inferring the most likely assignment (aka MAP). This problem is computationally hard in general. Therefore, various approximations have been developed that utilize the graph structure of the MRF for efficient computation. Popular approaches like Belief Propagation (and its variants) work well when the underlying graph has large girth, e.g., sparse graphical codes like LDPC codes.

In many applications of interest, graphs do have lots of short cycles, but they naturally possess some "geometry". We develop a new class of approximation algorithms that utilize this geometry of the underlying graph, which we call polynomially growing, to obtain efficient approximations with provable guarantees for the inference problem.

In this talk, I will describe the main idea of Belief Propagation and our new algorithm based on simple local updates. I will also describe their applications to wireless networks and image processing.
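
For readers who want the concrete recursion behind Belief Propagation, here is a minimal max-product sketch on a three-node chain MRF in Python (on a chain this reduces to Viterbi; the potentials and graph are invented for illustration, and the talk's new geometry-based algorithm is not shown):

    import itertools

    node_pot = [[1.0, 2.0], [1.0, 1.0], [3.0, 1.0]]          # phi_i(x_i), binary variables
    edge_pot = lambda a, b: 2.0 if a == b else 1.0           # psi(x_i, x_j) favors agreement

    def max_product_chain(node_pot):
        # Forward messages: m_i(b) = max_a phi_{i-1}(a) * psi(a, b) * m_{i-1}(a).
        n = len(node_pot)
        msgs = [[1.0, 1.0] for _ in range(n)]
        for i in range(1, n):
            for b in (0, 1):
                msgs[i][b] = max(node_pot[i - 1][a] * edge_pot(a, b) * msgs[i - 1][a]
                                 for a in (0, 1))
        # Belief at the last node; its maximum equals the MAP score of the chain.
        return max(node_pot[n - 1][b] * msgs[n - 1][b] for b in (0, 1))

    def brute_force_map_score():
        best = 0.0
        for xs in itertools.product((0, 1), repeat=3):
            s = node_pot[0][xs[0]] * node_pot[1][xs[1]] * node_pot[2][xs[2]]
            s *= edge_pot(xs[0], xs[1]) * edge_pot(xs[1], xs[2])
            best = max(best, s)
        return best

    assert max_product_chain(node_pot) == brute_force_map_score()   # both 12.0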

Spoken Networks: Analyzing face-to-face conversations and how they shape our social connections

Tanzeem Choudhury Dartmouth College
25 Jun 2010, 1:30 pm - 2:30 pm
Kaiserslautern building G26, room 206
SWS Colloquium
With the proliferation of sensor-rich mobile devices, it is now possible to collect data that continuously capture the real-world social interactions of entire groups of people. These new data sets provide opportunities to study the social networks of people as they are observed "in the wild." However, traditional methods often model social networks as static and binary, which are inadequate for continuous behavioral data. Networks derived from behavioral data are almost always temporal, are often non-stationary, and have finer grained observations about interactions as opposed to simple binary indicators. Thus, new techniques are needed that can take into account the variable tie intensities and the dynamics of a network as it evolves in time. In this talk, I will provide an overview of the computational framework we have developed for modeling the micro-level dynamics of human interactions as well as the macro-level network structure and its dynamics from local, noisy sensor observations. Furthermore, by studying the micro and macro levels simultaneously we are able to link dyad-level interaction dynamics (local behavior) to network-level prominence (a global property). I will conclude by providing some specific examples of how the methods we have developed can be applied more broadly to better understand and enhance the lives and health of people.

Based on joint work with Danny Wyatt (University of Washington), James Kitts (Columbia), Jeff Bilmes (University of Washington), Andrew Campbell (Dartmouth), and Ethan Berke (Dartmouth Medical School)

Privacy and forensics in federated distributed systems

Andreas Haeberlen University of Pennsylvania
15 Jun 2010, 11:00 am - 12:00 pm
Saarbrücken building E1 5, room 5th floor
simultaneous videocast to Kaiserslautern building G26, room 206
SWS Colloquium
In this talk, I will describe two of my ongoing projects, one on collaborative security and the other on secure network provenance.

Collaborative security is an approach to fighting security threats that span many different administrative domains. For example, it is often difficult for Internet service providers to detect botnets because each domain can locally observe only a small part of the botnet; however, exchanging information across domains is challenging because of privacy concerns. We are working on a system that enables domains to share information in a controlled fashion. Each domain establishes a 'privacy budget' and allows other domains to ask queries in a special language; the 'privacy cost' of each query is inferred automatically and deducted from the budget. This is accomplished through a combination of type systems and differential privacy.
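
A toy sketch of the 'privacy budget' idea in Python (illustrative only; the system described above infers each query's cost automatically through a type system and differential-privacy machinery that this fragment does not model):

    import random

    class PrivacyBudget:
        def __init__(self, total_epsilon):
            self.remaining = total_epsilon

        def answer(self, true_value, sensitivity, epsilon):
            # Laplace mechanism: refuse queries that would exceed the budget;
            # otherwise add noise with scale sensitivity/epsilon and deduct epsilon.
            if epsilon > self.remaining:
                raise RuntimeError("privacy budget exhausted")
            self.remaining -= epsilon
            rate = epsilon / sensitivity          # Laplace(b) = Exp(1/b) - Exp(1/b)
            noise = random.expovariate(rate) - random.expovariate(rate)
            return true_value + noise

    # A domain answers two counting queries (sensitivity 1) under a total budget
    # of 1.0; a third query of the same cost would be refused.
    budget = PrivacyBudget(total_epsilon=1.0)
    print(budget.answer(true_value=42, sensitivity=1.0, epsilon=0.5))
    print(budget.answer(true_value=17, sensitivity=1.0, epsilon=0.5))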

Secure network provenance (SNP) is a technique for answering forensic questions about distributed systems in which some nodes have been compromised by an adversary. Each node maintains a record of past states and the (local or remote) causes of state transitions. Thus, if the symptoms of an intrusion are observed, it is possible to trace them back to their root causes or, conversely, to determine how an intrusion on one node has affected other nodes.

Reverse Traceroute

Arvind Krishnamurthy University of Washington
14 Jun 2010, 10:15 am - 11:15 am
Saarbrücken building E1 5, room 5th floor
simultaneous videocast to Kaiserslautern building G26, room 206
SWS Colloquium
Traceroute is the most widely used Internet diagnostic tool today. Network operators use it to help identify routing failures, poor performance, and router misconfigurations. Researchers use it to map the Internet, predict performance, geolocate routers, and classify the performance of ISPs. However, traceroute has a fundamental limitation that affects all these applications: it does not provide reverse path information. Although various public traceroute servers across the Internet provide some visibility, no general method exists for determining a reverse path from an arbitrary destination.

We address this longstanding limitation by building a reverse traceroute tool. Our tool provides the same information as traceroute, but for the reverse path, and it works in the same case as traceroute, when the user may lack control of the destination. Our approach combines a number of ideas: source spoofing, IP timestamp and record route options, and multiple vantage points. In the median case our tool finds 87% of the hops seen in a directly measured traceroute along the same path, versus only 38% if one simply assumes the path is symmetric, a common fallback given the lack of available tools. We then use our reverse traceroute tool to study previously unmeasurable aspects of the Internet: we uncover more than a thousand peer-to-peer AS links invisible to current topology mapping efforts, we examine a case study of how a content provider could use our tool to troubleshoot poor path performance, and we measure the latency of individual backbone links with, on average, sub-millisecond precision.

Exploiting Language Abstraction to Optimize Memory Efficiency

Jennifer B. Sartor University of Texas, Austin
12 May 2010, 2:00 pm
Saarbrücken building E1 5, room 5th floor
simultaneous videocast to Kaiserslautern building G26, room 206
SWS Colloquium
Memory continues to be a bottleneck in modern systems as application complexity and the number of cores have increased, generating more traffic. These complex applications are written largely in high-level managed languages, which offer an opportunity to dynamically optimize memory performance because they virtualize memory management. We explored memory inefficiencies in a Java virtual machine, performing a study of data compression techniques. I will present the results of that study, showing arrays are a dominant source of heap bloat. Focusing on arrays, we found the traditional contiguous layout precludes space optimizations and does not offer memory management time and space bounds. I show how to exploit the opportunities afforded by managed languages to implement an efficient discontiguous array layout with tunable optimization parameters that improve space usage. Having attacked memory performance on the software side, I will then describe new work that takes a cooperative software-hardware approach. I show how a memory manager can communicate regions of dead data to the architecture, allowing it to eliminate useless writes, substantially reducing memory write traffic. My research combines the flexibility and productivity of high-level managed languages with improved memory efficiency that is critical to current and future hardware.
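
A tiny Python sketch of the indexing arithmetic behind a discontiguous array layout, purely to illustrate the idea of trading one extra indirection for flexible space management (the chunk size, class, and names are invented; the actual design and virtual-machine integration in the talk differ):

    CHUNK = 4   # chunk size; a real VM picks a power of two tuned to the heap

    class DiscontiguousArray:
        # Store an array as fixed-size chunks so the heap never needs one large
        # contiguous block; each element access pays one extra indirection.
        def __init__(self, length, fill=0):
            self.length = length
            self.chunks = [[fill] * CHUNK for _ in range((length + CHUNK - 1) // CHUNK)]

        def __getitem__(self, i):
            if not 0 <= i < self.length:
                raise IndexError(i)
            return self.chunks[i // CHUNK][i % CHUNK]

        def __setitem__(self, i, value):
            if not 0 <= i < self.length:
                raise IndexError(i)
            self.chunks[i // CHUNK][i % CHUNK] = value

    a = DiscontiguousArray(10)
    a[7] = 99
    assert a[7] == 99 and a[0] == 0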

Programmable Self-Adjusting Computation

Ruy Ley-Wild Carnegie Mellon University
27 Apr 2010, 2:00 pm - 3:00 pm
Saarbrücken building E1 5, room 5th floor
simultaneous videocast to Kaiserslautern building G26, room 206
SWS Colloquium
In the self-adjusting computation model, programs can respond automatically and efficiently to input changes by tracking the dynamic data dependencies of the computation and incrementally updating the output as needed. After a run from scratch, the input can be changed and the output can be updated via change-propagation, a mechanism for re-executing the portions of the computation affected by the changes while reusing the unaffected parts. Previous research shows that self-adjusting computation can be effective at achieving near-optimal update bounds for various applications. We address the question of how to write and reason about self-adjusting programs.

We propose a language-based technique for annotating ordinary programs and compiling them into equivalent self-adjusting versions. We also provide a cost semantics and a concept of trace distance that enables reasoning about the effectiveness of self-adjusting computation at the source level. To facilitate asymptotic analysis, we propose techniques for composing and generalizing concrete distances via trace contexts (traces with holes). The translation preserves the extensional semantics of the source programs and the intensional cost of from-scratch runs, and it ensures that change-propagation between two evaluations takes time bounded by their relative distance.
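
A very small Python sketch of dependency tracking and change propagation (a generic illustration only; it captures none of the traces, memoization, cost semantics, or distance results described above):

    class Cell:
        # A mutable input whose readers are tracked so changes can be propagated.
        def __init__(self, value):
            self.value = value
            self.readers = set()

        def read(self, computation):
            self.readers.add(computation)
            return self.value

        def write(self, value):
            if value != self.value:
                self.value = value
                for comp in list(self.readers):
                    comp.rerun()         # re-execute only the affected computations

    class Computation:
        def __init__(self, fn):
            self.fn = fn
            self.rerun()

        def rerun(self):
            self.result = self.fn(self)

    a, b, c = Cell(1), Cell(2), Cell(100)
    out = Computation(lambda comp: a.read(comp) + b.read(comp))
    a.write(10)        # only 'out' re-runs
    c.write(0)         # 'c' has no readers, so nothing re-runs
    assert out.result == 12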

Accurate Analysis of Large Private Datasets

Vibhor Rastogi University of Washington
15 Apr 2010, 10:00 am - 11:30 am
Saarbrücken building E1 5, room 5th floor
simultaneous videocast to Kaiserslautern building G26, room 206
SWS Colloquium
Today, no individual has full control over access to his personal information. Private data collected by various hospitals and universities, and also by websites like Google and Facebook, contain valuable statistical facts that can be mined for research and analysis, e.g., to analyze outbreaks of diseases, detect traffic patterns on the road, or understand browsing trends on the web. But concerns about individual privacy severely restrict its use; e.g., privacy attacks recently led AOL to withdraw its published search-log data.

To remedy this, much recent work focuses on data analysis with formal privacy guarantees. This has given rise to differential privacy, considered by many as the gold standard of privacy. However, few practical techniques satisfying differential privacy exist for complex analysis tasks (e.g., analysis involving complex query operators), or new data models (e.g., data having temporal correlations or uncertainty). In this talk, I will discuss techniques that fill this void. I will first discuss a query answering algorithm that can handle joins (previously, no private technique could accurately answer join queries arising in many analysis tasks). This algorithm makes several privacy-preserving analyses over social network graphs possible for the first time. Then I will discuss a query-answering technique over time-series data, which enables private analysis of GPS traces and other temporally-correlated data. Third, I will discuss an access control mechanism for uncertain data, which enables enforcing security policies on RFID-based location data. Finally, I will conclude by discussing some privacy and security problems in building next-generation computing systems based on new models for data (e.g., uncertain data), computing (e.g., cloud computing), and human computer interaction (e.g., ubiquitous systems).
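
For reference, the guarantee referred to above is the standard definition of \epsilon-differential privacy: a randomized algorithm K satisfies it if, for all databases D and D' differing in a single record and for every set S of outputs,

    \Pr[\, K(D) \in S \,] \;\le\; e^{\epsilon} \cdot \Pr[\, K(D') \in S \,].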

Binary Program Analysis and Model Extraction for Security Applications

Juan Caballero Carnegie Mellon University
01 Apr 2010, 11:00 am - 12:30 pm
Saarbrücken building E1 5, room 5th floor
simultaneous videocast to Kaiserslautern building G26, room 206
SWS Colloquium
In this talk I present a platform to extract models of security-relevant functionality from program binaries, enabling multiple security applications such as active botnet infiltration, finding deviations between implementations of the same functionality, vulnerability signature generation, and finding content-sniffing cross-site scripting (XSS) attacks. In this talk, I present two applications: active botnet infiltration and finding content-sniffing XSS attacks.

Botnets, large networks of infected computers under control of an attacker, are one of the dominant threats in the Internet, enabling fraudulent activities such as spamming, phishing, and distributed denial-of-service attacks. To build strong botnet defenses, defenders need information about the botnet's capabilities and the attacker's actions. One effective way to obtain that information is through active botnet infiltration, but such infiltration is challenging due to the encrypted and proprietary protocols that botnets use to communicate. In this talk, I describe techniques for reverse-engineering such protocols and present how we use this information to infiltrate a prevalent, previously unanalyzed spam botnet.

Cross-site scripting attacks are the most prevalent class of attacks nowadays. One subtle, often overlooked class of XSS attacks is content-sniffing XSS attacks. In this talk, I present model extraction techniques and how they enable finding content-sniffing XSS attacks. We use those models to find attacks against popular web sites and browsers, such as attacks on Wikipedia when accessed using Internet Explorer 7. I describe our defenses against these attacks and how our proposals have been adopted by widely used browsers such as Google Chrome and IE8, as well as by standardization groups.
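
To make 'content-sniffing XSS' concrete, here is a toy Python sketch of the mismatch being exploited (the sniffing rule below is a deliberately naive stand-in, not a model extracted from any real browser or filter):

    def site_upload_filter(data, declared_type):
        # The site accepts the upload because it starts with a GIF signature.
        return declared_type == "image/gif" and data.startswith(b"GIF89a")

    def naive_browser_sniffer(data, declared_type):
        # Simplified sniffing rule: if the payload looks like HTML, render it as
        # HTML regardless of the declared Content-Type.
        return "text/html" if b"<html" in data.lower() else declared_type

    chameleon = b"GIF89a" + b"<html><script>alert('xss')</script></html>"
    assert site_upload_filter(chameleon, "image/gif")                     # filter: harmless image
    assert naive_browser_sniffer(chameleon, "image/gif") == "text/html"   # browser: runs it as HTML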

Taming the Malicious Web: Avoiding and Detecting Web-based Attacks

Marco Cova University of California, Santa Barbara
29 Mar 2010, 11:00 am - 12:30 pm
Saarbrücken building E1 5, room 5th floor
simultaneous videocast to Kaiserslautern building G26, room 206
SWS Colloquium
The World Wide Web is an essential part of our infrastructure and a predominant means for people to interact, do business, and participate in democratic processes. Unfortunately, in recent years, the web has also become a more dangerous place. In fact, web-based attacks are now a prevalent and serious threat. These attacks target both web applications, which store sensitive data (such as financial and personal records) and are trusted by large user bases, and web clients, which, after a compromise, can be mined for private data or used as drones of a botnet.

In this talk, we will present an overview of our techniques to detect, analyze, and mitigate malicious activity on the web. In particular, I will present a system, called Wepawet, which targets the problem of detecting web pages that launch drive-by-download attacks against their visitors. Wepawet visits web pages with an instrumented browser and records events that occur during the interpretation of their HTML and JavaScript code. This observed activity is analyzed using anomaly detection techniques to classify web pages as benign or malicious. We made our tool available as an online service, which is currently used by several thousand users every month.

We will also discuss techniques to automatically detect vulnerabilities and attacks against web applications. In particular, we will focus on static analysis techniques to identify ineffective sanitization routines and to detect vulnerabilities stemming from the interaction of multiple modules of a web application. These techniques found tens of vulnerabilities in several real-world web applications.

Formal Program Verification Through Characteristic Formulae

Arthur Chargueraud INRIA
25 Mar 2010, 11:00 am - 12:30 pm
Saarbrücken building E1 5, room 5th floor
simultaneous videocast to Kaiserslautern building G26, room 206
SWS Colloquium
The characteristic formula of a program is a logical formula that implies any valid post-condition for that program. In this talk, I will explain how to build, in a systematic manner, the characteristic formula of a given purely functional Caml program, and then explain how one can use this formula to verify the program interactively, using the Coq proof assistant. This new, sound and complete approach to the verification of total correctness properties has been applied to the formalization of a number of data structures taken from Chris Okasaki's reference book. My presentation will include demos based on those case studies.

Program equivalence and compositional compiler correctness

Chung-Kil Hur Laboratoire PPS
22 Mar 2010, 11:00 am - 12:00 pm
Saarbrücken building E1 5, room 5th floor
simultaneous videocast to Kaiserslautern building G26, room 206
SWS Colloquium
We introduce a notion of program equivalence between different languages and talk about what kind of mathematical techniques are used, how it gives a compositional notion of compiler correctness, and, more ambitiously, how it may be seen as a technique to prove the correctness of programs. We also briefly discuss how these ideas can be formalized and verified in the formal proof assistant Coq.

Towards full verification of concurrent libraries

Viktor Vafeiadis University of Cambridge
18 Mar 2010, 11:00 am - 12:00 pm
Saarbrücken building E1 5, room 5th floor
simultaneous videocast to Kaiserslautern building G26, room 206
SWS Colloquium
Modern programming platforms, such as Microsoft's .NET, provide libraries of efficient concurrent data structures, which are used in a wide range of applications.  In this talk, I will discuss some of the pitfalls in implementing such concurrent data structures, what correctness of these libraries means, how one can formally prove that a given library is correct, and the extent to which these proofs can be carried out automatically.

Achieving Reliability in Deployed Software Systems

Michael Bond University of Texas, Austin
15 Mar 2010, 11:00 am - 12:30 pm
Saarbrücken building E1 5, room 5th floor
simultaneous videocast to Kaiserslautern building G26, room 206
SWS Colloquium
Software is becoming more complex and concurrent due to demand for features and hardware trends that are leading to more, rather than faster, cores. In the face of these challenges, developers have trouble writing large, correct, scalable programs. Deployed software inevitably contains bugs, even if it has been thoroughly tested, because it is infeasible to analyze and test all possible inputs, environments, and thread schedules. My research focuses on improving reliability while production software runs, to help prevent, diagnose, and tolerate errors that actually manifest in deployment.

This talk first presents Pacer, a deployable, scalable approach for detecting data races, which are a common and serious type of concurrency bug. Second, I describe techniques for efficiently reporting the calling context (stack trace) of concurrency and other bugs -- essential information for understanding the behavior of complex, modern programs. I conclude with my future plans for developing new analyses and frameworks that make concurrent software reliable.
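
As background on what a data race is, here is a minimal happens-before check using vector clocks, in Python (a generic textbook formulation; Pacer's sampling-based detector and the calling-context techniques above are not represented here):

    def happens_before(vc1, vc2):
        # vc1 happens before vc2 if it is componentwise <= and not equal.
        return all(a <= b for a, b in zip(vc1, vc2)) and vc1 != vc2

    def is_race(access1, access2):
        (is_write1, vc1), (is_write2, vc2) = access1, access2
        if not (is_write1 or is_write2):
            return False                     # two reads never race
        return not happens_before(vc1, vc2) and not happens_before(vc2, vc1)

    # Clocks indexed [t0, t1]: causally unordered writes race; synchronized ones do not.
    assert is_race((True, (1, 0)), (True, (0, 1)))
    assert not is_race((True, (1, 0)), (True, (2, 1)))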

Improving the Interface between Systems and Cryptography

Thomas Ristenpart UC San Diego
11 Mar 2010, 11:00 am - 12:30 pm
Saarbrücken building E1 5, room 5th floor
simultaneous videocast to Kaiserslautern building G26, room 206
SWS Colloquium
Modern cryptography provides a rigorous mathematical framework for proving the security of cryptographic algorithms. To be effective, however, these mathematical models must accurately reflect the realities of cryptography's use in systems. In this talk, I will address mismatches between theory and use, giving several examples from my work: the problem of credit card number encryption, dealing with bad (cryptographic) randomness, the increasingly diverse applications of cryptographic hash functions, and privacy-preserving device tracking. In each example, problems arise because of gaps between what cryptography offers and what security and privacy demand. To fix these issues, I take an application-oriented approach. This involves modifying cryptography in a theoretically sound way so that it works better for systems, as well as understanding cryptography's role in broader system security mechanisms.

Looking forward, I will discuss my recent work on new attacks in the setting of cloud computing and my future plans for securing next-generation cloud computing services.

Byzantine fault tolerance for cluster services

Allen Clement University of Texas
10 Mar 2010, 4:00 pm - 5:00 pm
Saarbrücken building E1 5, room 5th floor
simultaneous videocast to Kaiserslautern building G26, room 206
SWS Colloquium
Experiences with computers and computer systems indicate an inconvenient truth: computers fail, and they fail for a wide range of reasons including power outage, disc and memory corruption, misconfiguration, NIC malfunction, user error, and many others. The impact of computer failures can be dramatic, ranging from unavailability of email to canceled flights and stranded airline passengers to closed companies.

Byzantine fault tolerant (BFT) state machine replication is posited as an approach for masking individual computer faults and deploying robust services that continue to function despite failures. However, the lack of deployed systems that utilize BFT techniques indicates that the current state of the art falls short of the needs of practical deployments. In this talk I will discuss our efforts to make BFT a practical and attractive option for practitioners. These efforts are centered around developing BFT techniques that (a) are fast and efficient, (b) tolerate Byzantine faults, and (c) can be easily incorporated into legacy applications.

This work is in collaboration with Amit Aiyer, Lorenzo Alvisi, Mike Dahlin, Manos Kapritsos, Yang Wang, and Edmund Wong (UT-Austin), Rama Kotla (currently MSR-SVC), and Mirco Marchetti (currently University of Modena and Reggio Emilia).

Compositional Shape Analysis by means of Bi-Abduction

Cristiano Calcagno Imperial College, London
08 Mar 2010, 11:00 am - 12:30 pm
Saarbrücken building E1 5, room 5th floor
simultaneous videocast to Kaiserslautern building G26, room 206
SWS Colloquium
We describe a compositional shape analysis, where each procedure is analyzed independently of its callers. The analysis uses an abstract domain based on a restricted fragment of separation logic, and assigns a collection of Hoare triples to each procedure; the triples provide an over-approximation of data structure usage. Compositionality brings its usual benefits - increased potential to scale, ability to deal with unknown calling contexts, graceful way to deal with imprecision - to shape analysis, for the first time. The analysis rests on a generalized form of abduction (inference of explanatory hypotheses) which we call bi-abduction. Bi-abduction displays abduction as a kind of inverse to the frame problem: it jointly infers anti-frames (missing portions of state) and frames (portions of state not touched by an operation), and is the basis of a new interprocedural analysis algorithm. We have implemented our analysis algorithm and we report case studies on smaller programs to evaluate the quality of discovered specifications, and larger programs (e.g., an entire Linux distribution) to test scalability and graceful imprecision.
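
In separation-logic notation, the bi-abduction question solved at each step can be written as below, where H is the analyzer's current symbolic state, H' is the precondition of the operation being applied, the anti-frame ?A is the missing state that must be hypothesized, and the frame ?F is the state left untouched (notation chosen for this summary, following the description above):

    H \;*\; ?A \;\vdash\; H' \;*\; ?F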

Algorithms for Parallel Cache Hierarchies

Guy Blelloch Carnegie Mellon University
05 Mar 2010, 3:00 pm - 4:00 pm
Saarbrücken building E1 5, room 5th floor
simultaneous videocast to Saarbrücken building E1 4, room 24
SWS Colloquium
Cache hierarchies in multicore computers are quite complex, consisting of many levels of both shared and private caches. Designing algorithms and applications for such caches can be very complicated, although it is often necessary to get good performance. We discuss approaches for capturing the locality of a parallel algorithm at a high level, requiring little work by the algorithm designer or programmer, and then using this along with an appropriate thread scheduler to get good cache performance on a variety of parallel cache configurations. I will describe results for private caches, shared caches and some new results on multiple-level cache hierarchies. In all cases the same algorithms can be used, but the scheduler needs to be changed. The approach makes use of ideas on cache-oblivious algorithms.

Verifying Functional Programs with Type Refinements

Joshua Dunfield McGill University
03 Mar 2010, 2:00 pm - 3:00 pm
Saarbrücken building MPI SWS, room 5th floor
simultaneous videocast to Kaiserslautern building G26, room 206
SWS Colloquium
Types express properties of programs; typechecking is specification checking. But the specifications expressed by conventional type systems are imprecise. Type refinements enable programmers to express more precise properties, while keeping typechecking decidable.

I present a system that unifies and extends past work on datasort and index refinements. Intersection and union types permit a powerful type system that requires no user input besides type annotations. Instead of seeing type annotations as a burden or just as a shield against undecidability, I see them as a desirable form of machine-checked documentation. Accordingly, I embrace the technique of bidirectional typechecking for everything from dimension types to first-class polymorphism.
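
As a flavor of what index refinements can express, here is a classic example written in type-theoretic notation (illustrative, not the system's concrete syntax): a list type indexed by its length lets append's type record the length of its result.

    \mathtt{append} : \forall m, n.\; \mathtt{list}(\alpha, m) \to \mathtt{list}(\alpha, n) \to \mathtt{list}(\alpha, m + n)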

My implementation of this type system, for a subset of Standard ML, found several bugs in the SML/NJ data structure libraries.

Partial Replication in Large Networks

Nicolas Schiper University of Lugano
16 Feb 2010, 2:00 pm - 3:00 pm
Saarbrücken building E1 5, room 5th floor
simultaneous videocast to Kaiserslautern building G26, room 206
SWS Colloquium
The popularity of Internet applications such as e-banking, on-line stores, and social networks has increased tremendously in the past years. As a consequence, our daily lives depend on computers more and more each day. These applications typically run on many machines spread across several geographical sites, or groups, to provide low response times to clients. In these complex systems, hardware and software failures are common. Providing high availability without sacrificing performance is thus of prime importance. In this talk, we present protocols to achieve high availability through data replication. We proceed in two steps. We first devise fault-tolerant atomic multicast algorithms that offer message ordering guarantees and allow messages to be addressed to any subset of groups. We then build a partial replication protocol that relies on this multicast service. The four atomic multicast protocols presented differ in which properties they ensure, namely disaster tolerance and genuineness. With the former property, entire groups may contain faulty computers; with the latter, to deliver a message m, protocols only involve the groups addressed by m. Performance at the multicast layer is obtained by minimizing the latency to deliver multicast messages. At the replication layer, data is partitioned among groups. This allows for more scalability than found in traditional replicated systems, since sites only handle a fraction of the workload. We show how convoy effects can appear in partially replicated systems and propose techniques to reduce these effects. Experimental evaluations compare the proposed solutions.
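
For intuition about the vocabulary above, the following toy, single-process Python simulation mimics a Skeen-style genuine atomic multicast: only the groups a message is addressed to propose timestamps, the final timestamp is their maximum, and each group delivers its messages in final-timestamp order. It is only a sketch of the ordering and genuineness notions; it is not fault-tolerant and is not the protocol family presented in the talk. The class and function names are invented here.

# Toy, single-process simulation of a Skeen-style genuine atomic multicast.
# Not fault-tolerant, not disaster-tolerant, and not the talk's protocols.

class Group:
    def __init__(self, name):
        self.name = name
        self.clock = 0          # local logical clock
        self.delivered = []     # messages in delivery order

    def propose(self):
        self.clock += 1
        return self.clock

def multicast(messages, groups):
    """messages: list of (msg, [group names]); delivers at each addressed group."""
    final_ts = {}
    for msg, dests in messages:
        # Genuineness: only the addressed groups propose a timestamp.
        proposals = [groups[g].propose() for g in dests]
        ts = max(proposals)
        final_ts[msg] = ts
        # Every addressed group adopts the final timestamp to keep clocks monotone.
        for g in dests:
            groups[g].clock = max(groups[g].clock, ts)
    # Delivery: each group delivers its messages in final-timestamp order
    # (ties broken by message id), so groups agree on the relative order.
    for name, grp in groups.items():
        mine = [m for m, dests in messages if name in dests]
        grp.delivered = sorted(mine, key=lambda m: (final_ts[m], m))

if __name__ == "__main__":
    groups = {g: Group(g) for g in ("g1", "g2", "g3")}
    multicast([("m1", ["g1", "g2"]), ("m2", ["g2", "g3"]), ("m3", ["g1"])], groups)
    for g in groups.values():
        print(g.name, g.delivered)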

Fences in Weak Memory Models

Jade Alglave Inria
08 Feb 2010, 2:00 pm - 3:00 pm
Saarbrücken building E1 5, room 5th floor
simultaneous videocast to Kaiserslautern building G26, room 206
SWS Colloquium
We present an axiomatic framework, implemented in Coq, to define weak memory models with respect to several parameters: local reorderings of reads and writes, and the visibility of inter- and intra-processor communications through memory, including full relaxation of store atomicity. From this we derive a formal hierarchy of weak memory models, within which we formally study the action and placement of the fences needed to restore a stronger model, such as Sequential Consistency, from a weaker one. Finally, we provide a tool, diy, that tests a given machine to determine the architecture it exhibits. We detail the results of our experiments on Power and the model we extract from it. This work identified an implementation error in Power 5 memory barriers (for which IBM is providing a software workaround); our results also suggest that Power 6 does not suffer from this problem.
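
As a small illustration of the litmus-test reasoning behind tools such as diy, the Python sketch below enumerates every sequentially consistent (SC) interleaving of the classic store-buffering test and confirms that the outcome r0 = r1 = 0 never arises under SC, even though machines with store buffers can exhibit it unless a fence separates each store from the following load. The event encoding is invented for this sketch and is unrelated to the Coq framework.

# Enumerate SC interleavings of the store-buffering litmus test.
from itertools import combinations

# Per-thread programs: ("W", var, value) or ("R", var, register)
T0 = [("W", "x", 1), ("R", "y", "r0")]
T1 = [("W", "y", 1), ("R", "x", "r1")]

def interleavings(a, b):
    """All merges of sequences a and b that preserve each sequence's order."""
    n = len(a) + len(b)
    for positions in combinations(range(n), len(a)):
        pos, out, ia, ib = set(positions), [], iter(a), iter(b)
        for i in range(n):
            out.append(next(ia) if i in pos else next(ib))
        yield out

def run(schedule):
    """Execute one interleaving against shared memory; return (r0, r1)."""
    mem, regs = {"x": 0, "y": 0}, {}
    for op, var, arg in schedule:
        if op == "W":
            mem[var] = arg
        else:
            regs[arg] = mem[var]
    return regs["r0"], regs["r1"]

if __name__ == "__main__":
    outcomes = {run(s) for s in interleavings(T0, T1)}
    print("SC outcomes:", sorted(outcomes))
    print("(0, 0) possible under SC?", (0, 0) in outcomes)   # False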

Distributed Key Generation and its Applications

Aniket Kate University of Waterloo
03 Feb 2010, 2:00 pm - 3:00 pm
Saarbrücken building E1 5, room 5th floor
SWS Colloquium
Although distributed key generation (DKG) has been studied for some time, it has never been examined outside of the synchronous communication setting. In this talk, I will present the first practical and provably secure asynchronous DKG protocol and its implementation for use over the Internet. I will also discuss cryptographic properties such as uniform randomness of the shared secret, and will provide proactive security and group modification primitives. Notably, this asynchronous DKG protocol requires a set agreement protocol, and implements it using a leader-based Byzantine agreement scheme.

In the second half of the talk, I will describe applications of the DKG protocol in designing distributed private-key generators (PKGs) for identity-based cryptography (IBC), a pairing-based onion routing (PB-OR) circuit construction, and two robust communication protocols in distributed hash tables. Looking in detail at PB-OR, I will describe a provably secure, privacy-preserving key agreement scheme in the IBC setting with a distributed PKG and use it to design an efficient and compact onion routing circuit construction that is secure in the universal composability framework.
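
DKG protocols are typically built on top of secret sharing; as background only, here is a toy Python sketch of Shamir sharing over a prime field, with reconstruction from any t+1 shares. The actual protocol in the talk is distributed, verifiable, and asynchronous, none of which appears here; the prime and parameters below are illustrative choices.

# Toy Shamir secret sharing: a building block underlying DKG, not the DKG
# protocol itself. Prime, threshold, and share count are arbitrary examples.
import random

P = 2**31 - 1   # a Mersenne prime, large enough for a toy field

def share(secret, t, n):
    """Split `secret` into n shares; any t+1 of them reconstruct it."""
    coeffs = [secret] + [random.randrange(P) for _ in range(t)]
    def f(x):
        y = 0
        for c in reversed(coeffs):   # Horner evaluation of the polynomial
            y = (y * x + c) % P
        return y
    return [(i, f(i)) for i in range(1, n + 1)]

def reconstruct(shares):
    """Lagrange interpolation at x = 0 over the field Z_P."""
    secret = 0
    for i, (xi, yi) in enumerate(shares):
        num, den = 1, 1
        for j, (xj, _) in enumerate(shares):
            if i != j:
                num = (num * (-xj)) % P
                den = (den * (xi - xj)) % P
        secret = (secret + yi * num * pow(den, P - 2, P)) % P
    return secret

if __name__ == "__main__":
    shares = share(123456789, t=2, n=5)
    print(reconstruct(random.sample(shares, 3)))   # 123456789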

A Sensor-Based Framework for Kinetic Data

Sorelle Friedler University of Maryland
02 Feb 2010, 2:00 pm - 3:00 pm
Saarbrücken building E1 5, room 5th floor
simultaneous videocast to Kaiserslautern building G26, room 206
SWS Colloquium
We introduce a framework for storing and processing kinetic data observed by sensor networks, such as highway traffic or migratory birds. These sensor networks generate vast quantities of data, which motivates a significant need for data compression. We are given a set of sensors, each of which continuously monitors some region of space. We are interested in the kinetic data generated by a finite set of objects moving through space, as observed by these sensors. Our model relies purely on sensor observations; it allows points to move freely and requires no advance notification of motion plans. Sensor outputs are represented as random processes, where nearby sensors may be statistically dependent. We model the local nature of sensor networks by assuming that two sensor outputs are statistically dependent only if the two sensors are among the k nearest neighbors of each other. We present an algorithm for the lossless compression of the data produced by the network. We show that, under the statistical dependence and locality assumptions of our framework, this compression algorithm asymptotically encodes the data to within a constant factor of the information-theoretic lower bound dictated by the joint entropy of the system.

We also present an efficient algorithm for answering spatio-temporal range queries. Our algorithm operates on a compressed representation of the data, without the need to decompress it. We analyze the efficiency of our algorithm in terms of two natural measures of information content, the statistical and empirical joint entropies of the sensor outputs. We show that with space roughly equal to entropy, queries can be answered in time that is roughly logarithmic in entropy. These results represent the first solution to range searching problems over compressed kinetic sensor data and set the stage for future statistical analysis.

This is joint work with David Mount.
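
The compression guarantee above is stated relative to the joint entropy of the sensor outputs. The short Python sketch below illustrates that yardstick on synthetic data only: it generates a highly correlated stream of readings, estimates its zeroth-order entropy and a first-order (Markov) entropy rate, and compares both with the size achieved by an off-the-shelf compressor (zlib). The data model and the use of zlib are assumptions of this sketch, not the paper's algorithm.

# Synthetic illustration of entropy as a compression yardstick; not the
# framework's compression scheme.
import math
import random
import zlib
from collections import Counter

random.seed(0)

# Synthetic "sensor" stream with strong temporal correlation: each reading
# usually repeats the previous one and occasionally switches.
symbols = ["n", "s", "e", "w"]
stream, cur = [], "n"
for _ in range(10_000):
    if random.random() < 0.05:
        cur = random.choice(symbols)
    stream.append(cur)

def zeroth_order_bits(seq):
    """Bits implied by symbol frequencies alone, ignoring all correlation."""
    n = len(seq)
    return -sum(c / n * math.log2(c / n) for c in Counter(seq).values()) * n

def first_order_bits(seq):
    """Bits implied by a first-order Markov estimate: H(next | previous) * length."""
    pairs = Counter(zip(seq, seq[1:]))
    prev = Counter(seq[:-1])
    total = len(seq) - 1
    rate = -sum((c / total) * math.log2(c / prev[p]) for (p, _), c in pairs.items())
    return rate * len(seq)

if __name__ == "__main__":
    raw = "".join(stream).encode()
    print(f"zeroth-order estimate : {zeroth_order_bits(stream) / 8:7.0f} bytes")
    print(f"first-order estimate  : {first_order_bits(stream) / 8:7.0f} bytes")
    print(f"zlib (level 9)        : {len(zlib.compress(raw, 9)):7d} bytes")
    print(f"uncompressed          : {len(raw):7d} bytes")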

Fault-tolerant partial replication at large-scale

Marc Shapiro and Pierre Sutra LIP6
11 Jan 2010, 3:00 pm - 4:30 pm
Saarbrücken building E1 5, room 5th floor
simultaneous videocast to Kaiserslautern building G26, room 206
SWS Colloquium
Experience with many applications (e.g., web caching, P2P sharing, and grid and cloud computing) shows that data replication is a fundamental feature of large systems. Replication improves performance, availability and dependability.

Replication algorithms based on state machine replication are attractive because they maintain a simple sequential semantics. However, we believe they will not scale to massive cloud and peer-to-peer systems. To be successful, future algorithms must: (1) support multi-object transactions across distant data centres, (2) leverage the semantics of data accesses, and (3) support partial replication.

In the first part of this talk, we describe two variants of Generalized Paxos, a solution to consensus that leverages the commutativity of operations. Our algorithms reduce message delay when a collision occurs between non-commuting operations. In the second part, we present a new approach to partial replication of database systems at large scale. Previous protocols either re-execute transactions entirely and/or compute a total order of transactions. In contrast, ours applies update values and generates a partial order between mutually conflicting transactions only.
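
To illustrate the final point, the small Python sketch below computes which transactions actually conflict (read/write or write/write on a common item): only those pairs need to be ordered, while independent transactions remain unordered and can be applied in either order at different replicas. The transactions and their read/write sets are invented for the example; this is the conflict relation only, not the replication protocol of the talk.

# Conflict relation between transactions; only conflicting pairs must be ordered.
from itertools import combinations

txns = {
    "t1": {"reads": {"x"}, "writes": {"x"}},
    "t2": {"reads": {"y"}, "writes": {"y"}},
    "t3": {"reads": {"x", "y"}, "writes": {"z"}},
}

def conflicts(a, b):
    """True iff a and b access a common item and at least one of them writes it."""
    return bool(
        a["writes"] & (b["reads"] | b["writes"])
        or b["writes"] & (a["reads"] | a["writes"])
    )

if __name__ == "__main__":
    ordered_pairs = [(p, q) for p, q in combinations(txns, 2)
                     if conflicts(txns[p], txns[q])]
    print("pairs that must be ordered:", ordered_pairs)   # t1/t3 and t2/t3
    print("t1 and t2 commute; replicas may apply them in either order.")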