Events 2009

Reinventing The Desktop

Brad Chen Google Inc.
15 Dec 2009, 2:00 pm - 3:00 pm
Saarbrücken building E1 5, room 5th floor
simultaneous videocast to Saarbrücken building E1 4, room 007
SWS Colloquium
Desktop software, in the form of web browsers, browser features, and OS distributions, is a growing area of engineering activity at Google. This talk will give an overview of this work, looking in detail at Native Client as an example project in the space. Native Client is an open-source technology for running untrusted native code in web applications, with the goal of maintaining the browser neutrality, OS portability, and safety that people expect from web apps. It supports performance-oriented features generally absent from web application programming environments, such as thread support, instruction set extensions such as SSE, and use of compiler intrinsics and hand-coded assembler. We combine these properties in an open architecture designed to leverage existing web standards, and to encourage community review and 3rd-party tools. Overall, Google's desktop efforts seek to enable new Web applications, improve end-user experience, and enable a more flexible balance between client and server computing. Google has open sourced many of our desktop efforts, in part to encourage collaboration and independent innovation.

Characterization of an Online Social Aggregation Service

Anirban Mahanti NICTA
07 Dec 2009, 2:00 pm - 3:00 pm
Saarbrücken building E1 5, room 5th floor
simultaneous videocast to Kaiserslautern building G26, room 206
SWS Colloquium
Many Web users have accounts with multiple different social networking services. This has prompted development of services that aggregate information available through various services. This talk will consider one such aggregation service, FriendFeed. The first part of the talk will consider questions such as what types of services users aggregate content from, the relative popularity of services, who follows the aggregated feeds, and why. The second part of the talk will focus on factors such as proximity, common interests, time spent in the system, and combinations thereof, and their influence in link formation. Results based on data collected from FriendFeed between September 2008 and May 2009 will be presented. This talk is based on joint work with Martin Arlitt (HP Labs/University of Calgary), Niklas Carlsson (University of Calgary), Sanchit Garg (IIT Delhi), and Trinabh Gupta (IIT Delhi).

Improving the Privacy of Online Social Networks and Cloud Computing

Stefan Saroiu Microsoft Research
04 Dec 2009, 2:00 pm - 3:00 pm
Saarbrücken building E1 5, room 5th floor
SWS Colloquium
This talk has two parts. In the first part, I will present Lockr, a system that improves the privacy of online social networks (OSNs). Lockr offers three significant privacy benefits to OSN users. First, it separates social networking content from all other functionality that OSNs provide. This decoupling puts users in control of their own social information: they decide which OSN providers should store it, which third parties should have access to it, or they can even choose to manage it themselves. Second, Lockr ensures that digitally signed social relationships needed to access social data cannot be re-used by an OSN site for unintended purposes. This feature drastically reduces the value of the social content that users entrust to their OSN providers. Finally, Lockr enables message encryption using a social relationship key. This key lets two strangers with a common friend verify their relationship without exposing it to others, a common privacy threat when sharing data in a decentralized scenario.

In the second part, I will present our ongoing work in improving privacy when users run their code in the infrastructure. This is becoming an increasingly common scenario as cloud computing and mobile computing are becoming more popular. I will start by discussing these scenarios, after which I will present our current work in minimizing the attack surface exposed to malicious users and operators in VM-based cloud environments.

Lockr is joint work with Alec Wolman (MSR), Amin Tootoonchian, and Yashar Ganjali (U. of Toronto). For more information or to download our Lockr implementations for Flickr and for BitTorrent, please visit http://www.lockr.org. The second part is work-in-progress that is jointly done with Alec Wolman (MSR) and Shravan Rayanchu (U. of Wisconsin).

Sierra: a power-proportional, distributed storage system

Eno Thereska Microsoft Research
02 Dec 2009, 11:00 am - 12:00 pm
Saarbrücken building E1 5, room 5th floor
simultaneous videocast to Kaiserslautern building G26, room 206
SWS Colloquium
I'll present the design, implementation, and evaluation of Sierra: a power-proportional, distributed storage system. I/O workloads in data centers show significant diurnal variation, with peak and trough periods. Sierra powers down storage servers during the troughs. The challenge is to ensure that data is available for reads and writes at all times, including power-down periods. Consistency and fault-tolerance of the data, as well as good performance, must also be maintained. Sierra achieves all these through a set of techniques including power-aware layout, predictive gear scheduling, and a replicated short-term versioned store. Replaying live server traces from a large e-mail service (Hotmail) shows power savings of at least 23%, and analysis of load from a small enterprise shows that power savings of up to 60% are possible.
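
To illustrate the gear-scheduling idea in the abstract (a minimal sketch of my own, not Sierra's actual algorithm), suppose data is r-way replicated and servers are grouped so that any single group holds a complete copy of the data. The scheduler can then pick, per period, how many groups ("gears") to keep powered on from a predicted load curve, never dropping below one group so that reads and writes stay available:

    import math

    # Hypothetical sketch: pick the number of replica groups ("gear") to keep
    # powered on, from a load prediction expressed as a fraction of peak.
    def choose_gear(predicted_load, peak_capacity, replicas=3):
        fraction = predicted_load / peak_capacity      # expected share of peak load
        gear = math.ceil(fraction * replicas)          # enough groups to serve it
        return max(1, min(replicas, gear))             # >= 1 for availability, <= r total

    # Example: a diurnal trace, one predicted value per hour (as a fraction of peak).
    hourly_load = [0.2, 0.15, 0.1, 0.1, 0.2, 0.4, 0.7, 0.9, 1.0, 0.8, 0.6, 0.3]
    print([choose_gear(load, 1.0) for load in hourly_load])
    # -> [1, 1, 1, 1, 1, 2, 3, 3, 3, 3, 2, 1]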

BorgCube: Rethinking the data center

Ant Rowstron Microsoft Research
30 Nov 2009, 11:00 am - 12:00 pm
Saarbrücken building E1 5, room 5th floor
simultaneous videocast to Kaiserslautern building G26, room 206
SWS Colloquium
The BorgCube project explores how to combine ideas from high performance computing, networking and distributed systems to create future data centers. A BorgCube has a tight integration between servers, networking and services, achieved by not using any switches or routers. Instead, servers are directly connected to a small set of other servers. The BorgCube uses a k-ary 3-cube (or 3D Torus) physical topology, providing good failure resilience with reduced cost.

More importantly, we believe this also creates a platform on which it is easier to build the large-scale core distributed services, e.g. GFS, BigTable and Dynamo, which underpin many of the applications that run in large-scale data centers. Each server has a coordinate creating a namespace, similar to the namespace provided by a structured overlay. However, unusually for an overlay, the physical network topology and virtual topology are the same. This namespace can be exploited by the services, and all functionality is implemented on top of a link-orientated low-level API. We have implemented many services on our prototype BorgCube, including a bootstrap service, several multi-hop routing protocols, and a service supporting unmodified TCP/IP applications allowing them to run on the BorgCube.
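
As a small, hypothetical illustration of the coordinate namespace described above (my own sketch, not BorgCube code), servers in a k-ary 3-cube can be named by coordinates (x, y, z); the six directly wired neighbours and a simple multi-hop route then follow from arithmetic modulo k:

    # Sketch of a k-ary 3-cube (3D torus) namespace; illustrative only.
    def neighbours(coord, k):
        """The six servers directly wired to `coord` in a k-ary 3-cube."""
        x, y, z = coord
        return [((x + dx) % k, (y + dy) % k, (z + dz) % k)
                for dx, dy, dz in [(1, 0, 0), (-1, 0, 0), (0, 1, 0),
                                   (0, -1, 0), (0, 0, 1), (0, 0, -1)]]

    def greedy_route(src, dst, k):
        """One natural multi-hop route: fix each axis in turn, wrapping around the torus."""
        path, cur = [src], list(src)
        for axis in range(3):
            while cur[axis] != dst[axis]:
                diff = (dst[axis] - cur[axis]) % k
                cur[axis] = (cur[axis] + (1 if diff <= k // 2 else -1)) % k
                path.append(tuple(cur))
        return path

    print(neighbours((0, 0, 0), k=4))                  # the six physical neighbours
    print(greedy_route((0, 0, 0), (3, 1, 2), k=4))     # a 4-hop route across the torus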

In this talk I will describe the ideas behind BorgCube, and explain some of our early experiences with writing services for the BorgCube. This is joint work with Paolo Costa and Greg O'Shea.

Differential attacks on PIN Processing APIs

Graham Steel ENS-Cachan
08 Oct 2009, 11:00 am - 12:00 pm
Saarbrücken building E1 5, room 5th floor
SWS Colloquium
International standards dictate that all processing of customer Personal Identification Numbers (PINs) in the international cash machine network must take place inside special tamper resistant Hardware Security Modules (HSMs). These HSMs have a restricted API designed so that even if an attacker is able to make arbitrary command calls to the device, no customer PINs can be obtained. However, in recent years, a number of attacks have been found on these APIs. Some of them are so-called differential attacks, whereby an attacker makes repeated calls to the API with slightly different parameters, and from the pattern of error messages received, he is able to deduce the value of a PIN. In this talk, I will present some of these attacks, and talk about efforts to analyse them formally. This will include techniques for proving the absence of such attacks in patched APIs, and a practical proposal for improving the security of the network without making large-scale changes to the current infrastructure.

Bruce Maggs Duke University
18 Sep 2009, 11:00 am - 12:00 pm
Saarbrücken building E1 5, room 5th floor
SWS Colloquium
This talk shows how operators of Internet-scale distributed systems, such as Google, Microsoft, and Akamai, can reduce electricity costs (but not necessarily energy consumption) by dynamically allocating work among data centers in response to fluctuating energy prices. The approach applies to systems consisting of fully replicated clusters of servers installed in diverse geographical locations where energy can be purchased through spot markets. Using historical energy prices for major energy markets in the United States, as well as usage data from Akamai's content delivery network, we show how much can be saved now, and what might be saved in the future given server technology trends.
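
A toy version of the allocation idea (my own sketch with hypothetical numbers; the talk's formulation also accounts for latency, bandwidth, and how energy-proportional the servers are) simply sends load to the data centers whose current spot price is cheapest, subject to capacity:

    # Toy price-aware request routing; data center names and numbers are made up.
    def allocate(load, capacities, prices):
        """Greedily place load on the data centers with the cheapest current spot price."""
        plan = {dc: 0 for dc in capacities}
        remaining = load
        for dc in sorted(prices, key=prices.get):      # cheapest electricity first
            share = min(capacities[dc], remaining)
            plan[dc] = share
            remaining -= share
            if remaining <= 0:
                break
        return plan

    capacities = {"virginia": 400, "texas": 300, "california": 300}   # requests/s
    prices = {"virginia": 41.0, "texas": 28.5, "california": 55.0}    # $/MWh right now
    print(allocate(700, capacities, prices))
    # -> {'virginia': 400, 'texas': 300, 'california': 0}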

Joint work with Asfandyar Qureshi, Rick Weber, Hari Balakrishnan, and John Guttag.

A complete characterization of observational equivalence in polymorphic lambda-calculus with general references

Eijiro Sumii Tohoku University, Sendai, Japan
10 Sep 2009, 2:00 pm - 3:30 pm
Saarbrücken building E1 5, room 5th floor
simultaneous videocast to Saarbrücken building E1 4, room 019
SWS Colloquium
We give the first sound and complete proof method for observational equivalence in full polymorphic lambda-calculus with existential types and first-class, higher-order references. Our method is syntactic and elementary in the sense that it only employs simple structures such as relations on terms. It is nevertheless powerful enough to prove many interesting equivalences that can and cannot be proved by previous approaches such as logical relations.

Looking over the evolution of Internet workloads

Virgilio Almeida Federal University of Minas Gerais, Brazil
04 Sep 2009, 11:00 am - 12:00 pm
Saarbrücken building E1 5, room 5th floor
simultaneous videocast to Kaiserslautern building G26, room 206
SWS Colloquium
The Internet has a number of popular applications and services experiencing workloads with very different, non-trivial and unique properties. The emergence of new applications and computing models (e.g., online social networking, streaming video, games and cloud computing), and the explosive growth in popularity of others (e.g., search, peer-to-peer, e-business, malware), most of which with workloads with not fully understood fundamental properties, make this a research topic of timely relevance. Real workload characterization and modeling provide key insights into the cost-effective design, operation, and management of Internet-based services and systems. This talk looks over the evolution of Internet workloads, by presenting an overview of a variety of real Internet workloads and how they have evolved through the years; from system workload to social workload. It shows the main properties of these workloads and discusses the invariants across different types of workloads. It outlines methodologies and techniques used in workload characterization and modeling. Constructing a model involves tradeoffs between usefulness and accuracy. The talk shows how characterization techniques have been used to capture the most relevant aspects of Internet workloads while keeping the model as simple as possible. The talk concludes showing some examples of how workload models have been used to design efficient Web systems and services.

Ratul Mahajan Microsoft Research, Redmond
13 Jul 2009, 2:00 pm - 3:00 pm
Saarbrücken building E1 5, room 5th floor
simultaneous videocast to Kaiserslautern building G26, room 206
SWS Colloquium
Through extensive measurements of wireless connectivity from moving vehicles, I find that packet loss creates significant performance issues for interactive applications. This poor performance exists for both WLAN technologies (e.g., WiFi) and WWAN technologies (e.g., 3G and WiMax). Unlike wired networks, in wireless networks priority-based queuing is not sufficient to reduce packet loss for loss-sensitive applications. I propose that losses should instead be masked through aggressive but controlled use of available redundancy, and I describe two such systems. The first system, called ViFi, targets the use of WiFi from moving vehicles. Current WiFi handoff methods, in which clients communicate with one base station at a time, lead to frequent disruptions in connectivity. ViFi leverages the presence of redundant in-range base stations to reduce disruptions and improve application performance. The second system, called PluriBus, targets the use of 3G or WiMax from moving vehicles. PluriBus leverages the spare capacity in the wireless channel using a novel erasure coding method. In my experiments, each system improves the performance of interactive applications by at least a factor of 2.
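
The redundancy idea can be illustrated with the simplest possible erasure code (a sketch of the principle only; PluriBus's actual coding method is more elaborate): send one XOR parity packet per batch, so the receiver can rebuild any single lost packet without waiting for a retransmission:

    def xor_bytes(a, b):
        return bytes(x ^ y for x, y in zip(a, b))

    def encode(batch):
        """Append a parity packet equal to the XOR of every packet in the batch."""
        parity = batch[0]
        for pkt in batch[1:]:
            parity = xor_bytes(parity, pkt)
        return batch + [parity]

    def recover(received, lost_index):
        """Rebuild the single missing packet from the survivors plus the parity."""
        survivors = [p for i, p in enumerate(received) if i != lost_index]
        rebuilt = survivors[0]
        for pkt in survivors[1:]:
            rebuilt = xor_bytes(rebuilt, pkt)
        return rebuilt

    batch = [b"pkt-aaaa", b"pkt-bbbb", b"pkt-cccc"]    # equal-sized toy packets
    coded = encode(batch)
    coded[1] = None                                    # packet 1 lost on the wireless link
    print(recover(coded, 1))                           # -> b'pkt-bbbb'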

Rethinking Storage Space Management in High-Performance Computing Centers

Ali R. Butt Virginia Tech
08 Jun 2009, 4:00 pm - 5:30 pm
Saarbrücken building E1 5, room 5th floor
simultaneous videocast to Kaiserslautern building G26, room 206
SWS Colloquium
Modern scientific applications, such as computer models for analyzing data from particle colliders or space observatories, process data that is exponentially growing in size. High-performance computing (HPC) centers that support such applications are now faced with a data deluge, which can no longer be managed using the ad hoc approaches in use today. Consequently, a fundamental reevaluation of the data management tools and techniques is required. In this talk, I will describe a fresh approach to HPC storage space management, especially for the center scratch space --- high-speed storage used for servicing currently running and soon-to-run applications --- which effectively treats the storage as a tiered cache and provides comprehensive, integrated storage management. I will discuss how the caching model is achieved, and how its mechanisms are supported through just-in-time staging and timely offloading of data. Finally, I will show how this approach can also mitigate the effects of center storage failures. The overall goal is to improve HPC center serviceability and resource utilization.
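
To make the cache analogy concrete (a hypothetical sketch, not the speaker's system), staging and offloading can be driven directly by the job schedule: inputs are staged into scratch shortly before a job's scheduled start, and outputs are offloaded to archival storage once the job completes, freeing scratch space:

    from dataclasses import dataclass

    @dataclass
    class Job:
        name: str
        start: int      # scheduled start time
        end: int        # expected completion time
        inputs: list    # datasets to stage in from the archive
        outputs: list   # datasets to offload after completion

    def scratch_actions(jobs, now, stage_lead=2):
        """Schedule-driven decisions: what to stage in or offload at time `now`."""
        actions = []
        for job in jobs:
            if 0 <= job.start - now <= stage_lead:     # about to run: stage its inputs
                actions += [("stage", d) for d in job.inputs]
            if now > job.end:                          # finished: offload its outputs
                actions += [("offload", d) for d in job.outputs]
        return actions

    jobs = [Job("climate-run", 10, 20, ["forcing.nc"], ["out.nc"]),
            Job("collider-fit", 14, 30, ["events.h5"], ["fit.h5"])]
    print(scratch_actions(jobs, now=9))     # [('stage', 'forcing.nc')]
    print(scratch_actions(jobs, now=21))    # [('offload', 'out.nc')]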

Online Social Networks and Applications: a Measurement Perspective

Ben Y. Zhao UC Santa Barbara
28 May 2009, 1:00 pm - 2:00 pm
Saarbrücken building E1 5, room 5th floor
SWS Colloquium
With more than half a billion users worldwide, online social networks such as Facebook are popular platforms for interaction, communication and collaboration between friends. Researchers have recently proposed an emerging class of Internet applications that integrate relationships from social networks to improve security and performance. But can these applications be effective in real life? And if so, how can we predict their effectiveness when they are deployed on real social networks?

In this talk, we will describe recent research that tries to answer these questions using measurement-based studies of online social networks and applications. Using measurements of a socially-enhanced web auction site, we show how social networks can actually reduce fraud in online transactions. We then discuss the evaluation of social network applications, and argue that existing methods using social graphs can produce misleading results. We use results from a large-scale study of the Facebook network to show that social graphs are insufficient models of user activity, and propose the use of "interaction graphs" as a more accurate model. We construct interaction graphs from our Facebook datasets, and use both types of graphs to validate two well-known social-based applications (Reliable Email and SybilGuard). Our results reveal new insights into both systems and confirm our hypothesis that choosing the right graph model significantly impacts predictions of application performance.
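
The social-graph versus interaction-graph distinction can be sketched in a few lines (hypothetical toy data; the actual study uses large Facebook crawls): the interaction graph keeps only those friendship edges over which some minimum number of interactions was actually observed:

    from collections import Counter

    def interaction_graph(friend_edges, interactions, min_interactions=1):
        """Keep only friendship edges backed by observed interactions (e.g. wall posts)."""
        counts = Counter(frozenset(pair) for pair in interactions)
        return [e for e in friend_edges if counts[frozenset(e)] >= min_interactions]

    friends = [("alice", "bob"), ("alice", "carol"), ("bob", "dave")]   # declared friendships
    events = [("alice", "bob"), ("bob", "alice"), ("bob", "dave")]      # observed interactions
    print(interaction_graph(friends, events, min_interactions=2))
    # -> [('alice', 'bob')] -- the only friendship with repeated interaction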

Self-Stabilizing Autonomic Recoverers

Olga Brukman Ben-Gurion University
20 May 2009, 3:00 pm - 4:30 pm
Saarbrücken building E1 5, room 5th floor
simultaneous videocast to Saarbrücken building E1 4, room 019
SWS Colloquium
In this talk I will cover research conducted towards my PhD dissertation. This dissertation introduces theoretical foundations for system architectures and algorithms for creating truly robust autonomic systems -- systems that are able to recover automatically from unexpected failures. Our approaches complement each other starting with the case of given black-box systems, continuing with the process of developing new systems, and concluding with the case of automatic creation of recovery-oriented software.

In the first part we consider software packages to be black boxes. We propose modeling software package flaws (bugs) by assuming eventual Byzantine behavior of the package. A general, yet practical, framework and paradigm for the monitoring and recovery of systems called autonomic recoverer is proposed. The framework receives task-specific requirements in the form of safety and liveness predicates and recovery actions. The autonomic recoverer uses a new scheme for liveness assurance via on-line monitoring that complements known schemes for on-line safety assurance.

In the second part we consider a software package to be a transparent box and introduce recovery-oriented programming: programs will include important safety and liveness properties and recovery actions as an integral part of the program. We design a pre-compiler that produces augmented code for monitoring the properties and executing the recovery actions upon a property violation. Assuming the restartability property of a given program, the resulting code is able to overcome safety and liveness violations. We provide a correctness proof scheme for proving that the code produced by the pre-compiler from the program code combined with the important properties and recovery actions fulfills its specifications when started in an arbitrary state.
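
A hand-written miniature of the pattern that such a pre-compiler would generate (an illustration only, far simpler than the actual framework): wrap each program step with a check of a safety predicate and run the supplied recovery action when the predicate is violated:

    def with_recovery(step, safety, recover, state):
        """Run one program step; if the safety predicate fails afterwards, recover."""
        state = step(state)
        return state if safety(state) else recover(state)

    # Toy program: a counter that must stay non-negative (the safety property).
    buggy_step = lambda s: s - 3 if s == 2 else s + 1    # injected bug at s == 2
    safety = lambda s: s >= 0
    recover = lambda s: 0                                # restart from a safe state

    state = 0
    for _ in range(6):
        state = with_recovery(buggy_step, safety, recover, state)
        print(state)                                     # 1, 2, 0, 1, 2, 0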

Finally, in the third part we consider a highly dynamic environment, which typically implies that there are no realizable specifications for the environment, i.e., there does not exist a program that respects the specifications for every given environment. In such cases the predefined recovery action may not suffice and a dramatic change in the program is required. We suggest searching for a program at run time by trying all possible programs on plant replicas in parallel, where the plant is the relevant part of the environment. We present control search algorithms for various plant state settings (reflection and the ability to set the plant to a certain state).

Traveling to Rome: a retrospective on the journey

John Wilkes Google, Mountain View California
15 May 2009, 3:00 pm - 4:00 pm
Saarbrücken building E1 5, room Wartburg, 5th floor
SWS Colloquium
Starting in 1994/5, the Storage Systems Program at HP Labs embarked on a decade-long journey to automate the management of enterprise storage systems by means of a technique we initially called attribute-managed storage. The key idea was to provide declarative specifications of workloads and their needs, and of storage devices and their capabilities, and to automate the mapping of one to the other. One of many outcomes of the project was a specification language we called Rome - hence the title of this talk, which offers a retrospective on the overall approach and some of the lessons we learned along the way.

Balachander Krishnamurthy AT&T Labs - Research
15 May 2009, 11:00 am - 12:00 pm
Saarbrücken building E1 5, room Wartburg, 5th floor
SWS Colloquium
For the last few years we have been examining the leakage of privacy on the Internet from one specific angle: how information related to individual users is aggregated as they browse seemingly unrelated Web sites. Thousands of Web sites across numerous categories, countries, and languages were studied to generate a privacy "footprint". This talk reports on our longitudinal study consisting of multiple snapshots of our examination of such diffusion over four years. We examine the various technical ways by which third-party aggregators acquire data and the depth of user-related information acquired. We study techniques for protecting against privacy diffusion as well as limitations of such techniques. We introduce the concept of secondary privacy damage.

Our results show increasing aggregation of user-related data by a steadily decreasing number of entities. A handful of companies are able to track users' movement across almost all of the popular Web sites. Virtually all the protection techniques have significant limitations highlighting the seriousness of the problem and the need for alternate solutions.

I will also talk about a recent discovery of large-scale leakage of personally identifiable information (PII) via Online Social Networks (OSN). Third-parties can link PII with user actions both within OSN sites and elsewhere on non-OSN sites.

Formal Verification of Realistic Compilers

Zaynah Dargaye ENSIIE, Evry
13 May 2009, 3:00 pm - 4:30 pm
Saarbrücken building E1 5, room Wartburg, 5th floor
simultaneous videocast to Saarbrücken building E1 4, room 019
SWS Colloquium
It is generally expected that the compiler is semantically transparent, that is, produces executable code that behaves as prescribed by the semantics of the source code. However, compilers -- and especially optimizing compilers, which attempt to increase the efficiency of generated code through program analyses and transformations -- are highly complex software systems that perform delicate symbolic computations. Bugs in the compiler are therefore unavoidable: they can cause wrong code to be produced from correct source code.

The CompCert project investigates the formal verification of realistic compilers usable for critical embedded software.  Such verified compilers come with a mathematical, machine-checked proof that the generated executable code behaves exactly as prescribed by the semantics of the source program.  By ruling out the possibility of compiler-introduced bugs, verified compilers strengthen the guarantees that can be obtained by applying formal methods to source programs.
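
The guarantee sketched above can be written, informally, as a refinement of observable behaviours (a paraphrase, not the exact Coq statement proved in CompCert):

    \forall S\ C,\quad \mathit{compile}(S) = \mathtt{OK}(C)\ \Longrightarrow\ \mathit{behaviors}(C) \subseteq \mathit{behaviors}(S)

That is, every observable behaviour of the generated code is a behaviour permitted by the semantics of the source program.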

This talk describes the formal verification of compilers and focuses on two experiments: the CompCert C compiler and the MLCompCert compiler. MLCompCert is a mechanically verified compiler for the purely functional fragment of ML that I developed and verified in Coq in my Ph.D. thesis.

Fingerprinting performance crises in the datacenter

Moises Goldszmidt Microsoft Research, Silicon Valley
11 May 2009, 4:00 pm - 5:00 pm
Saarbrücken building E1 5, room 5th floor
simultaneous videocast to Kaiserslautern building G26, room 206
SWS Colloquium
We propose a method for significantly reducing troubleshooting and diagnosis time in the datacenter by automatically generating fingerprints of performance crises, enabling fast classification and recognition of recurring instances. We evaluated the approach on data from a production datacenter with hundreds of machines running a 24x7 enterprise-class user-facing application, verifying each identification result with the operators of the datacenter (and the corresponding troubleshooting tickets). The approach has 80% identification accuracy in the operations-online setting with time to identification below 15 minutes (on average) after the start of a crisis (operators stipulated a deadline of 60 minutes). In an offline setting, where some parameters can be fitted optimally, the accuracy is in the 95%-98% range. After explaining the fingerprinting method and the results, I will end the talk with a discussion on the possibility of predicting crises, and on extending this work to model the operator's repair actions for learning models of automated decision making. Joint work with Peter Bodik and Armando Fox from UC Berkeley, and Hans Andersen from Microsoft.
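
To convey the flavour of a fingerprint (a simplified sketch of my own; the actual method selects the relevant metrics and thresholds statistically), summarize each metric across the fleet into a quantized hot/normal/cold value and match new fingerprints against labeled past crises:

    import statistics

    def fingerprint(metrics, low, high):
        """metrics: {name: [per-server values]} -> tuple of -1/0/+1 per metric."""
        fp = []
        for name in sorted(metrics):
            m = statistics.median(metrics[name])
            fp.append(-1 if m < low[name] else (1 if m > high[name] else 0))
        return tuple(fp)

    def identify(fp, known):
        """Label of the closest previously seen fingerprint (Hamming distance)."""
        return min(known, key=lambda label: sum(a != b for a, b in zip(fp, known[label])))

    # Hypothetical metrics and thresholds, ordered cpu, latency_ms, queue.
    low = {"cpu": 20, "latency_ms": 50, "queue": 5}
    high = {"cpu": 80, "latency_ms": 300, "queue": 50}
    known = {"overload": (1, 1, 1), "db_failure": (-1, 1, 1)}

    crisis = {"cpu": [5, 10, 8], "latency_ms": [900, 750, 820], "queue": [200, 180, 210]}
    print(identify(fingerprint(crisis, low, high), known))    # -> db_failure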

JavaScript Isolation and Web Security

John C. Mitchell Stanford University
04 May 2009, 4:00 pm - 5:00 pm
Saarbrücken building E1 4, room 019
SWS Colloquium
Web sites that incorporate untrusted content may use browser-or language-based methods to keep such content from maliciously altering pages, stealing sensitive information, or causing other harm. We use accepted methods from the study of programming languages to investigate language-based methods for filtering and rewriting JavaScript code, using Facebook's FBJS as a motivating example.

We explain the core problems by describing previously unknown vulnerabilities and shortcomings, provide JavaScript code that enforces provable isolation properties at run-time, and develop a foundation for improved solutions based on an operational semantics of the full ECMA262 language. We also compare our results with the techniques used in FBJS.

Joint work with Sergio Maffeis and Ankur Taly

Theory Plus Practice in Computer Security: Radio Frequency Identification and Whitebox Fuzzing

David Molnar University of California at Berkeley
29 Apr 2009, 4:00 pm - 5:00 pm
Saarbrücken building E1 4, room 019
simultaneous videocast to Kaiserslautern building G26, room 206
SWS Colloquium
I will describe two areas in computer security that demonstrate the wide range of techniques, from both theory and practice, we need to make impact. First, I treat privacy and security in Radio Frequency Identification (RFID). RFID refers to a range of technologies where a small device with an antenna, or "tag" is attached to an item and can be queried later wirelessly by a reader. While proponents of RFID promise security and efficiency benefits, the technology also raises serious security concerns. I will describe my work on practical security analysis of RFID in library books and the United States e-passport deployments. These deployments in turn uncover a new theoretical problem, that of "scalable private authentication." I will describe the first solution to this problem that scales sub-linearly in the number of RFID tags.

Second, I describe recent work in "whitebox fuzz testing," a new approach to finding security bugs. Security bugs cost millions of dollars to patch after the fact, so we want to find and fix them as early in the deployment cycle as possible. I review previous fuzz testing work, how fuzzing has been responsible for serious security bugs, and classic fuzz testing's inability to deal with "unlikely" code paths. I then show how marrying the idea of dynamic test generation with fuzz testing overcomes these shortcomings, but raises significant scaling problems. Two recent tools, SAGE at Microsoft Research, and SmartFuzz at Berkeley, overcome these scaling problems; I present results on the effectiveness of these tools on commodity Windows and Linux media playing software. Finally, I close with directions for leveraging cloud computing to improve developers' testing and debugging experience.

The talk describes joint work with Ari Juels and David Wagner (RFID), and with Patrice Godefroid, Michael Y. Levin, Xue Cong Li, and David Wagner (Whitebox Fuzzing).

Sachin Katti University of California at Berkeley
27 Apr 2009, 4:00 pm - 5:00 pm
Kaiserslautern building G26, room 204
simultaneous videocast to Saarbrücken building E1 5, room Wartburg OG 5
SWS Colloquium
Wireless is becoming the preferred mode of network access. The performance of wireless networks in practice, however, is hampered due to the harsh characteristics of the wireless medium: its shared broadcast nature, interference, and high error rate. Traditionally, network designers have viewed these characteristics as problematic, and tried to work around them. In this talk, I will show how we can turn these challenges into opportunities that we exploit to significantly improve performance.

To do so, we use a simple yet fundamental shift in network design. We allow routers to "mix" (i.e., code) packets' content before forwarding them. We built three systems, COPE, ANC and MIXIT, which exploit this network coding functionality via novel algorithms to provide large practical gains. In this talk, I will discuss COPE and ANC; COPE exploits wireless broadcast, while ANC exploits strategic interference to improve throughput.

This work bridges and contributes to two unrelated areas: network coding and wireless mesh network design. It lays down the algorithmic framework for using network coding in modern wireless networks, by designing algorithms which work with the common case of unicast flows in dynamic and unknown environments. It also provides the first implementation, deployment and experimental evaluation of network coding. For wireless mesh networks, it shows how the framework of network coding allows us to profitably harness the inherent wireless characteristics. This union ultimately allows us to deliver a several-fold increase in wireless throughput.
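
The simplest instance of the packet-mixing idea is the textbook two-flow example rather than the real COPE implementation: a relay carrying traffic in both directions between Alice and Bob broadcasts one XOR-coded packet instead of forwarding two, and each side decodes using the packet it sent and still holds:

    def xor(a, b):
        return bytes(x ^ y for x, y in zip(a, b))

    pkt_from_alice = b"hello bob!"
    pkt_from_bob = b"hi, alice!"                  # equal length for the toy example

    coded = xor(pkt_from_alice, pkt_from_bob)     # the relay's single broadcast

    print(xor(coded, pkt_from_bob))               # Bob recovers b'hello bob!'
    print(xor(coded, pkt_from_alice))             # Alice recovers b'hi, alice!'

One broadcast replaces two forwarded transmissions, which is where the throughput gain in this scenario comes from.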

Neelakantan R. Krishnaswami CMU Computer Science Department at Pittsburgh
20 Apr 2009, 1:30 pm - 2:30 pm
Saarbrücken building E1 5, room Wartburg OG 5
simultaneous videocast to Saarbrücken building E1 4, room 019
SWS Colloquium
O'Hearn and Reynolds' separation logic has proven to be a very successful attempt at taming many of the difficulties associated with reasoning about aliased, mutable data structures. Using it, researchers have given correctness proofs of even quite intricate low-level imperative programs such as garbage collectors and device drivers.

However, high level languages such as ML and Haskell also give programmers access to mutable, aliased data, and when those features are used, programmers are still prone to all the troubles state is heir to. In fact, many problems become more complex, since these languages encourage the use of an abstract, higher-order style, and support the design of libraries that rely on higher-order functions as well as callbacks (ie, references to functions in the heap).

In this talk, I'll describe work I've done (in collaboration with my PhD supervisors) designing a version of separation logic suitable for use in languages such as ML, and describe an application of this logic to formally verifying the correctness of a small library for writing event-driven programs in a lazy dataflow style. This then allows an efficient imperative implementation of a functional reactive programming library.

Self-Adjusting Computation

Umut A. Acar Toyota Technological Institute at Chicago
16 Apr 2009, 4:00 pm - 5:00 pm
Saarbrücken building E1 4, room 019
simultaneous videocast to Kaiserslautern building G26, room 204/206
SWS Colloquium
Many application domains require computations to interact with data sets that change slowly or incrementally over time. For example, software systems that interact with the physically changing world, e.g., databases, graphical systems, robotic software systems, program-development environments, scientific-computing applications, must respond efficiently and correctly to changes as they occur in the world. Since incremental modifications to data are often small, they can be processed asymptotically faster than re-computing from scratch, often generating orders of magnitude speedups in practice. Realizing this potential using traditional techniques, however, often requires complex algorithmic and software design and implementation, ultimately limiting the range of problems that can effectively be addressed in practice.

In this talk, I present an overview of advances on self-adjusting computation: an approach to developing software systems that interact with changing data. I start by presenting the principal ideas behind a mechanism for propagating a change through a computation, and then describe the design of programming-language constructs for enabling computations to respond automatically and efficiently to modifications to their data. I show that these language constructs are realistic by describing how to extend existing languages with them and how to compile the extended languages into provably efficient executables, whose performance properties can be analyzed via cost semantics. To evaluate the effectiveness of the approach, I consider a number of benchmarks as well as more sophisticated applications from diverse areas such as computational geometry, scientific computing, machine learning, and computational biology. Our results show that self-adjusting computation can be broadly effective in achieving efficient implementations, solving open problems, and pointing to new research directions. In practice, our measurements show that self-adjusting programs often respond to incremental modifications by a linear factor faster than recomputing from scratch, resulting in orders of magnitude speedups.
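
A tiny dependency-tracking sketch conveys the flavour of change propagation (my illustration, not the actual self-adjusting-computation machinery, which uses dynamic dependence graphs and memoization in a compiled language): derived cells record what they read, and changing an input re-runs only the cells that transitively depend on it:

    class Cell:
        def __init__(self):
            self.readers = []
        def notify(self):
            for r in self.readers:
                r.recompute()

    class Input(Cell):
        def __init__(self, value):
            super().__init__()
            self.value = value
        def set(self, value):               # change an input, then propagate
            self.value = value
            self.notify()

    class Derived(Cell):
        runs = 0                            # count recomputations for the demo
        def __init__(self, fn, *deps):
            super().__init__()
            self.fn, self.deps = fn, deps
            for d in deps:
                d.readers.append(self)
            self.recompute()
        def recompute(self):
            Derived.runs += 1
            self.value = self.fn(*(d.value for d in self.deps))
            self.notify()

    a, b, c, d = (Input(v) for v in (3, 1, 4, 1))
    left = Derived(lambda x, y: x + y, a, b)
    right = Derived(lambda x, y: x + y, c, d)
    total = Derived(lambda x, y: x + y, left, right)
    Derived.runs = 0
    c.set(10)                               # a small change to one input
    print(total.value, Derived.runs)        # 15 2 -- only `right` and `total` re-ran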

Programming with Hoare Type Theory

Aleksandar Nanevski Microsoft Research, Cambridge, UK
14 Apr 2009, 4:00 pm - 5:00 pm
Saarbrücken building E1 4, room 019
simultaneous videocast to Kaiserslautern building G26, room 204/206
SWS Colloquium
Two main properties make type systems an effective and scalable formal method. First, important classes of programming errors are eliminated by statically enforcing the correct use of values. Second, types facilitate modular software development by serving as specifications of program components, while hiding the component's actual implementation. Implementations with the same type can be interchanged, thus facilitating software reuse and evolution.

Mainstream type systems focus on specifying relatively simple properties that admit type inference and checking with little or no input from the programmer. Unfortunately, this leaves a number of properties, including data structure invariants and API protocols outside of their reach, and also restricts the practical programming features that can be safely supported. For example, most simply-typed languages cannot safely allow low-level operations such as pointer arithmetic or explicit memory management.

In this talk, I will describe Hoare Type Theory (HTT) which combines dependent types of a system like Coq with features for specification and verification of low-level stateful operations in the style of Hoare and Separation Logic.

Such a combination is desirable for several reasons. On the type-theoretic side, it makes it possible to integrate stateful behaviour into dependent type theories that have so far been purely functional. On the Hoare Logic side, it makes it possible to use the higher-order data abstraction and information hiding mechanisms of type theory, which are essential for scaling the verification effort.

I will discuss the implementation of HTT, verification of various examples that I have carried out, as well as the possibilities for extending HTT to support further programming features such as concurrency.

Functional Programming Perspectives on Concurrency and Parallelism

Matthew Fluet Toyota Technological Institute at Chicago
23 Mar 2009, 4:00 pm - 5:00 pm
Saarbrücken building E1 4, room 019
simultaneous videocast to Kaiserslautern building G26, room 206
SWS Colloquium
The trend in microprocessor design toward multicore processors has sparked renewed interest in programming languages and language features for harnessing concurrency and parallelism in commodity applications. Past research efforts demonstrated that functional programming provides a good semantic base for concurrent- and parallel-language designs, but slowed in the absence of widely available multiprocessor machines. I will describe new functional programming approaches towards concurrency and parallelism, grounded in more recent functional programming research.

To frame the discussion, I will introduce the Manticore project, an effort to design and implement a new functional language for parallel programming. Unlike some earlier parallel language proposals, Manticore is a heterogeneous language that supports parallelism at multiple levels. In this talk, I will describe a number of Manticore's notable features, including implicitly-parallel programming constructs (inspired by common functional programming idioms) and a flexible runtime model that supports multiple scheduling disciplines. I will also take a deeper and more technical look at transactional events, a novel concurrency abstraction that combines first-class synchronous message-passing events with all-or-nothing transactions. This combination enables elegant solutions to interesting problems in concurrent programming. Transactional events have a rich compositional structure, inspired by the use of monads for describing effectful computations in functional programming. I will conclude with future research directions in the Manticore project, aiming to combine static and dynamic information for the implementation and optimization of parallel constructs.

Serge Egelman Carnegie Mellon
19 Mar 2009, 4:00 pm - 5:00 pm
Saarbrücken building E1 5, room 019
simultaneous videocast to Kaiserslautern building G26, room 206
SWS Colloquium
In a world where making an incorrect online trust decision can mean the difference between checking your account balance and transferring it to criminals, Internet users need effective security warnings to help them identify risky situations. In a perfect world, software could automatically detect all security threats and then block access to high risk websites. Because there are many threats that we cannot detect with 100% accuracy and false positives are all too frequent, web browser vendors generally opt to warn users about security threats. In this talk I cover the common pitfalls of web browser security warnings and draw parallels with literature in the warning sciences. I describe the results of two laboratory phishing studies I performed in order to examine users' mental models, risk perceptions, and comprehension of current security warnings. Finally, I show how I used these findings to iteratively design and test a more usable SSL warning that clearly conveys risk and uses context to minimize habituation effects.

Logical Relations: A Step Towards More Secure and Reliable Software

Amal Ahmed Toyota Technological Institute at Chicago
10 Mar 2009, 4:00 pm - 5:00 pm
Saarbrücken building E1 4, room 019
simultaneous videocast to Kaiserslautern building G26, room 206
SWS Colloquium
Logical relations are a powerful proof technique for establishing many important properties of programs, programming languages, and language implementations. In particular, they provide a convenient method for proving behavioral equivalence of programs. Hence, they may be used to show that one implementation of an abstract data type (ADT) can be replaced by another without affecting the behavior of the rest of the program; to prove that security-typed languages satisfy noninterference, which requires that confidential data not affect the publicly observable behavior of a program; and to establish the correctness of compiler optimizations.

Yet, despite three decades of research and much agreement about their potential benefits, logical relations are still primarily used to reason about languages that are not even Turing complete. The method has not scaled well to many features found in practical programming languages: support for recursive types (lists, objects) and mutable memory (as in Java or ML) requires sophisticated mathematical machinery (e.g., domain theory, category theory), which makes the resulting logical relations cumbersome to use and hard to extend. Mutable memory is particularly problematic, especially in combination with features like generics (parametric polymorphism) and ADTs.

In this talk, I will describe *step-indexed* logical relations which support all of the language features mentioned above and yet permit simple proofs based on operational reasoning, without the need for complicated mathematics. To illustrate the effectiveness of step-indexed logical relations, I will discuss three applications. The first is a secure multi-language system where we show that code written in different languages may interoperate without sacrificing the abstraction guarantees provided by each language in isolation. The second is imperative self-adjusting computation, a system for efficiently updating the results of a computation in response to changes to some of its inputs; we show that our update mechanism is consistent with a from-scratch run. The third is security-preserving compilation, which ensures that compiled code is no more vulnerable to attacks launched at the level of the target language than the original code is to attacks launched at the level of the source language; we show that the typed closure conversion phase of a compiler has this property.
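
To give the flavour of the step-indexing idea for readers who have not seen it (my paraphrase of the standard unary definitions for a simple functional fragment, not the exact relations used in these developments): the index k bounds how many evaluation steps the relation is guaranteed to be meaningful for, which is what lets recursive types and higher-order state be handled by induction on k. In LaTeX notation:

    \mathcal{V}_k[\![\tau_1 \to \tau_2]\!] \;=\; \{\, \lambda x.\,e \;\mid\; \forall j \le k,\ \forall v \in \mathcal{V}_j[\![\tau_1]\!],\ e[v/x] \in \mathcal{E}_j[\![\tau_2]\!] \,\}

    \mathcal{E}_k[\![\tau]\!] \;=\; \{\, e \;\mid\; \forall j < k,\ e \longmapsto^{j} v \ \Longrightarrow\ v \in \mathcal{V}_{k-j}[\![\tau]\!] \,\}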

Bruce Allen Max-Planck-Institut für Gravitationsphysik
25 Feb 2009, 11:00 am - 12:00 pm
Saarbrücken building E1 5, room conf.room 5th floor
simultaneous videocast to Kaiserslautern building G26, room 204/206
SWS Colloquium
My research group at the MPI for Gravitational Physics, Hannover, operates a large computer cluster used for data analysis. In principle, the most cost-effective and highest-bandwidth data storage available is the disks local to the compute nodes. In the case of Atlas, there are 1680 disks with an aggregate capacity of 840 TB. With a suitable file system the array of these disks could form a highly reliable storage system, however at the moment there appears to be no open-source distributed file system with the necessary RAID-like properties. In this talk I present a list and description of the properties that such a file-system should have, and arguments to support the case that it should be possible to achieve this in the real world. If such a file system were available, I believe that thousands of large computer clusters around the world would employ and benefit from it.

Stochastic Games in Synthesis and Verification

Krishnendu Chatterjee University of California at Berkeley
24 Feb 2009, 4:00 pm - 5:00 pm
Saarbrücken building E1 4, room 019
simultaneous videocast to Kaiserslautern building G26, room 206
SWS Colloquium
Dynamic games played on game graphs with winning conditions specified as automata provide the theoretical framework for the study of controller synthesis and many problems related to formal verification. Besides synthesis and verification, these games have been used in several other contexts such as checking interface compatibility, modular reasoning, and checking receptiveness. In this talk we first present different game models that are suited to different applications, and the canonical winning conditions that can be specified as automata. We begin with the strictly competitive (zero-sum) game formulation, which is appropriate for controller synthesis. We present a brief overview of the field, summarizing the classical results, and then present our results that significantly improve the complexity for several classes of games. We also present practical algorithms for analysis of several classes of such games.

We then consider the problem of multi-process verification and argue that the zero-sum formulation is too strong for multi-process verification. This is because the environment for a process is typically other processes with their own specifications. On the other hand, the notion of Nash equilibria, which captures the notion of rational behavior in the absence of external criteria, is too weak for multi-process verification. We will present a new notion of equilibrium, which we call secure equilibrium. We will show how the new notion of equilibrium is more appropriate for multi-process verification, and discuss the existence and computation of such equilibria for graph games.

Robots, Molecules and Physical Computing

Lydia E. Kavraki Rice University
02 Feb 2009, 3:00 pm - 4:00 pm
Saarbrücken building E1 5, room 019
simultaneous videocast to Kaiserslautern building G26, room 204/206
SWS Distinguished Lecture Series
The field of computing is increasingly expected to solve complex geometric problems arising in the physical world. Such problems can be found in applications ranging from robotics planning for industrial automation to molecular modeling for studying biological processes. This talk will first describe the development of a set of algorithmic tools for robot motion planning which are often grouped under the name sampling-based algorithms.

Emphasis will be placed on recent results for systems with increased physical realism and complex dynamics. The talk will then discuss how the experience gained through sampling-based methods in robotics has led to algorithms for characterizing the flexibility of biomolecules for drug discovery. A new trend in Computer Science is presented in this talk. It concerns the development of algorithmic frameworks for addressing complex high-dimensional geometric problems arising, at different scales, in the physical world. The challenges of physical computing will be highlighted as well as the opportunities to impact molecular biology and medicine.
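
A bare-bones RRT-style sketch shows the sampling-based planning idea in a few lines (no obstacles or dynamics, unlike the planners discussed in the talk): repeatedly sample a random configuration and grow the tree a small step from its nearest node toward the sample, until the goal is reached:

    import math, random

    def rrt(start, goal, step=0.5, iters=2000, tol=0.5):
        """Grow a random tree from `start`; return a path when a node lands near `goal`."""
        nodes, parent = [start], {start: None}
        for _ in range(iters):
            sample = (random.uniform(0, 10), random.uniform(0, 10))
            near = min(nodes, key=lambda n: math.dist(n, sample))
            d = math.dist(near, sample)
            if d == 0:
                continue
            s = min(step, d)                             # step toward the sample
            new = (near[0] + s * (sample[0] - near[0]) / d,
                   near[1] + s * (sample[1] - near[1]) / d)
            nodes.append(new)
            parent[new] = near
            if math.dist(new, goal) < tol:               # close enough: read off the path
                path = [new]
                while parent[path[-1]] is not None:
                    path.append(parent[path[-1]])
                return list(reversed(path))
        return None

    print(rrt(start=(1.0, 1.0), goal=(9.0, 9.0)))        # usually a list of waypoints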