Events 2024

Expanding the Horizons of Finite-Precision Analysis

Debasmita Lohar Max Planck Institute for Software Systems
27 Mar 2024, 2:45 pm - 3:45 pm
Saarbrücken building E1 5, room 029
SWS Student Defense Talks - Thesis Defense
Finite-precision programs, prevalent in embedded systems, scientific computing, and machine learning, inherently introduce numerical uncertainties stemming from noises in the inputs and finite-precision errors. Furthermore, implementing these programs on hardware necessitates a trade-off between accuracy and efficiency. Therefore, it is crucial to ensure that numerical uncertainties remain acceptably small and to optimize implementations for accurate results tailored to specific applications. Existing analysis and optimization techniques for finite-precision programs face challenges in scalability and applicability to real-world scenarios. ...
Finite-precision programs, prevalent in embedded systems, scientific computing, and machine learning, inherently introduce numerical uncertainties stemming from noises in the inputs and finite-precision errors. Furthermore, implementing these programs on hardware necessitates a trade-off between accuracy and efficiency. Therefore, it is crucial to ensure that numerical uncertainties remain acceptably small and to optimize implementations for accurate results tailored to specific applications. Existing analysis and optimization techniques for finite-precision programs face challenges in scalability and applicability to real-world scenarios. In this work, we expand the individual capabilities of these techniques by capturing the impact of uncertain inputs on discrete decisions and roundoff errors, by scaling floating-point verification for larger programs, and by specializing optimization for feed-forward deep neural networks.
Read more

Digital Safety and Security for Survivors of Technology-Mediated Harms

Emily Tseng Cornell University
11 Mar 2024, 10:00 am - 11:00 am
Saarbrücken building E1 5, room 002
CIS@MPG Colloquium
Platforms, devices, and algorithms are increasingly weaponized to control and harass the most vulnerable among us. Some of these harms occur at the individual and interpersonal level: for example, abusers in intimate partner violence (IPV) use smartphones and social media to surveil and stalk their victims. Others are more subtle, at the level of social structure: for example, in organizations, workplace technologies can inadvertently scaffold exploitative labor practices. This talk will discuss my research (1) investigating these harms via online measurement studies, ...
Platforms, devices, and algorithms are increasingly weaponized to control and harass the most vulnerable among us. Some of these harms occur at the individual and interpersonal level: for example, abusers in intimate partner violence (IPV) use smartphones and social media to surveil and stalk their victims. Others are more subtle, at the level of social structure: for example, in organizations, workplace technologies can inadvertently scaffold exploitative labor practices. This talk will discuss my research (1) investigating these harms via online measurement studies, (2) building interventions to directly assist survivors with their security and privacy; and (3) instrumenting these interventions, to enable scientific research into new types of harms as attackers and technologies evolve. I will close by sharing my vision for centering inclusion and equity in digital safety, security and privacy, towards brighter technological futures for us all.
Read more

Efficient Request Isolation in Function-as-a-Service

Mohamed Alzayat Max Planck Institute for Software Systems
08 Mar 2024, 2:00 pm - 3:00 pm
Saarbrücken building E1 5, room 002
SWS Student Defense Talks - Thesis Defense
As cloud applications become increasingly event-driven, Function-as-a-Service (FaaS) is emerging as an important abstraction. FaaS allows tenants to state their application logic as stateless functions without managing the underlying infrastructure that runs and scales their applications.

FaaS providers ensure the confidentiality of tenants’ data, to a limited extent, by isolating function instances from one another. However, for performance considerations, the same degree of isolation does not apply to sequential requests activating the same function instance. ...
As cloud applications become increasingly event-driven, Function-as-a-Service (FaaS) is emerging as an important abstraction. FaaS allows tenants to state their application logic as stateless functions without managing the underlying infrastructure that runs and scales their applications.

FaaS providers ensure the confidentiality of tenants’ data, to a limited extent, by isolating function instances from one another. However, for performance considerations, the same degree of isolation does not apply to sequential requests activating the same function instance. This compromise can lead to confidentiality breaches since bugs in a function implementation or its dependencies may retain state and leak data across activations. Moreover, platform optimizations that assume function statelessness may introduce unexpected behavior if the function retains state, jeopardizing correctness.

This dissertation presents two complementary systems: Groundhog and CtxTainter. Groundhog is a black-box and programming-language-agnostic solution that enforces confidentiality by efficiently rolling back changes to a function’s state after each function activation, effectively enforcing statelessness by breaking all data flows at the request boundary. CtxTainter is a development-phase dynamic data flow analysis tool that detects data flows that violate the statelessness assumption and reports them to the developer for reviewing and fixing.
Read more

Designing for Autonomy in Data-Driven AI Systems

Ge Tiffany Wang University of Oxford
07 Mar 2024, 10:00 am - 11:00 am
Saarbrücken building E1 5, room 002
CIS@MPG Colloquium
As ubiquitous AI becomes increasingly integrated into the smart devices that users use daily, the rise of extensive datafication, data surveillance, and monetized behavioral engineering is becoming increasingly noticeable. Sophisticated algorithms are utilized to perform in-depth analyses of people's data, dissecting it to evaluate personal characteristics, thereby making significant and impactful algorithmic decisions for them. In this evolving digital environment, smart devices are no longer just functional tools; they have become active agents in shaping experiences, ...
As ubiquitous AI becomes increasingly integrated into the smart devices that users use daily, the rise of extensive datafication, data surveillance, and monetized behavioral engineering is becoming increasingly noticeable. Sophisticated algorithms are utilized to perform in-depth analyses of people's data, dissecting it to evaluate personal characteristics, thereby making significant and impactful algorithmic decisions for them. In this evolving digital environment, smart devices are no longer just functional tools; they have become active agents in shaping experiences, transforming lives as algorithmic decisions are etching pathways for people's futures. This trend is particularly concerning for vulnerable groups such as children, young people, and other marginalized communities, who may be disproportionately affected by these technological advancements. My research in human-computer interaction focuses on reimagining these data-driven AI systems to better support user autonomy. To address these challenges, I develop tools and systems that empower users and communities, especially those most vulnerable, to control their own experiences and information directly. These include: 1) human-AI interaction tools that enhance user decision-making power, 2) AI literacy tools for a deeper, critical understanding of data-driven systems, and 3) the development of actionable strategies and frameworks for policymakers and industry leaders to ensure the ethical development and use of AI technologies.
Read more

Shaping Experience and Expression by Designing Sensorimotor Contingencies

Paul Strohmeier MPI-INF - D4
06 Mar 2024, 12:15 pm - 1:15 pm
Saarbrücken building E1 5, room 002
Joint Lecture Series
Our experience of the world is mediated by a continuous feedback loop between motor activity and sensory stimuli. Manipulating these sensorimotor loops offers a powerful means to influence user experience. Such loops, updating in milliseconds, demand devices with higher temporal resolution than those typically used in conventional human-computer interaction (HCI). In this talk, I will show how tapping into sensorimotor loops with tactile feedback allows us to alter our experience of the physical world and our own bodily sensations. ...
Our experience of the world is mediated by a continuous feedback loop between motor activity and sensory stimuli. Manipulating these sensorimotor loops offers a powerful means to influence user experience. Such loops, updating in milliseconds, demand devices with higher temporal resolution than those typically used in conventional human-computer interaction (HCI). In this talk, I will show how tapping into sensorimotor loops with tactile feedback allows us to alter our experience of the physical world and our own bodily sensations. Additionally, I will discuss how we can go beyond simply augmenting experience to enhancing human expression through sensorimotor augmentation.
Read more

Hardware and Software Codesign

Marcus Pirron Max Planck Institute for Software Systems
05 Mar 2024, 2:00 pm - 3:00 pm
Kaiserslautern building G26, room 111
SWS Student Defense Talks - Thesis Defense
Modern robotic applications consist of a variety of robotic systems that work together to achieve complex tasks. Programming these applications draws from multiple fields of knowledge and typically involves low-level imperative programming languages that provide little to no support for abstraction or reasoning. We present a unifying programming model, ranging from automated controller synthesis for individual robots to a compositional reasoning framework for inter-robot coordination. We provide novel methods on the topics of control and planning of modular robots, ...
Modern robotic applications consist of a variety of robotic systems that work together to achieve complex tasks. Programming these applications draws from multiple fields of knowledge and typically involves low-level imperative programming languages that provide little to no support for abstraction or reasoning. We present a unifying programming model, ranging from automated controller synthesis for individual robots to a compositional reasoning framework for inter-robot coordination. We provide novel methods on the topics of control and planning of modular robots, making contributions in three main areas: controller synthesis, concurrent systems, and verification. Our method synthesizes control code for serial and parallel manipulators and leverages physical properties to synthesize sensing abilities. This allows us to determine parts of the system's state that previously remained unmeasured. Our synthesized controllers are robust; we are able to detect and isolate faulty parts of the system, find alternatives, and ensure continued operation. On the concurrent systems side, we deal with dynamic controllers affecting the physical state, geometric constraints on components, and synchronization between processes. We provide a programming model for robotics applications that consists of assemblies of robotic components together with a run-time and a verifier. Our model combines message-passing concurrent processes with motion primitives and explicit modeling of geometric frame shifts, allowing us to create composite robotic systems for performing tasks that are unachievable for individual systems. We provide a verification algorithm based on model checking and SMT solvers that statically verifies concurrency-related properties (e.g. absence of deadlocks) and geometric invariants (e.g. collision-free motions). Our method ensures that jointly executed actions at end-points are communication-safe and deadlock-free, providing a compositional verification methodology for assemblies of robotic components with respect to concurrency and dynamic invariants. Our results indicate the efficiency of our novel approach and provide the foundation of compositional reasoning of robotic systems.
Read more

Beyond the Search Bar in Human-AI Interactions: Augmenting Discovery, Synthesis, and Creativity With User-Generated Context

Srishti Palani University of California, San Diego
05 Mar 2024, 10:00 am - 11:00 am
Kaiserslautern building G26, room 111
CIS@MPG Colloquium
Searching and exploring information online is integral to everyday life, shaping how we learn, work, and create. As the Web paradigm evolves to include foundational AI models and beyond, we are experiencing a shift in how we search and work. With this transformation in human-AI interaction, it is important to investigate how we might present the user with the right information in the right context, the right representation, and at the right time. In this talk, ...
Searching and exploring information online is integral to everyday life, shaping how we learn, work, and create. As the Web paradigm evolves to include foundational AI models and beyond, we are experiencing a shift in how we search and work. With this transformation in human-AI interaction, it is important to investigate how we might present the user with the right information in the right context, the right representation, and at the right time. In this talk, I will share how I have explored these questions in the context of complex critical information work (such as knowledge discovery, synthesis, and creativity). I present insights about user behaviors and challenges from mixed-method studies observing how people conduct this work using today’s tools. Then, I present novel AI-powered tools and techniques that augment these cognitive processes by mining rich contextual signals from unstructured user-generated artifacts. By deepening our understanding of human cognition and behavior and building tools that understand user contexts more meaningfully, I envision a future where human-AI interactions are more personalized, context-aware, cognitively-convivial, and truly collaborative.
Read more

Data Privacy in the Decentralized Era

Amrita Roy Chowdhury University of California-San Diego
01 Mar 2024, 10:00 am - 11:00 am
Saarbrücken building E1 5, room 002
CIS@MPG Colloquium
Data is today generated on smart devices at the edge, shaping a decentralized data ecosystem comprising multiple data owners (clients) and a service provider (server). Clients interact with the server with their personal data for specific services, while the server performs analysis on the joint dataset. However, the sensitive nature of the involved data, coupled with inherent misalignment of incentives between clients and the server, breeds mutual distrust. Consequently, a key question arises: How to facilitate private data analytics within a decentralized data ecosystem, ...
Data is today generated on smart devices at the edge, shaping a decentralized data ecosystem comprising multiple data owners (clients) and a service provider (server). Clients interact with the server with their personal data for specific services, while the server performs analysis on the joint dataset. However, the sensitive nature of the involved data, coupled with inherent misalignment of incentives between clients and the server, breeds mutual distrust. Consequently, a key question arises: How to facilitate private data analytics within a decentralized data ecosystem, comprising multiple distrusting parties?

My research shows a way forward by designing systems that offer strong and provable privacy guarantees while preserving data functionality. I accomplish this by systematically exploring the synergy between cryptography and differential privacy, exposing their rich interconnections in both theory and practice. In this talk, I will focus on two systems, CryptE and EIFFeL, which enable privacy-preserving query analytics and machine learning, respectively.
Read more

Methods for Financial Stability Analysis

Christoph Siebenbrunner WU Vienna
28 Feb 2024, 3:00 pm - 4:00 pm
Saarbrücken building E1 5, room 002
simultaneous videocast to Kaiserslautern building G26, room 113
SWS Colloquium
We present a number of methods used by central banks and regulatory authorities to assess the stability of the financial system, including stress tests, network analysis of the interbank market and models of interbank contagion and fire sales, aiming to capture the dynamics similar to those observed during the 2008 financial crisis. We will discuss the key role of banks in the money creation process, how this relates to monetary aggregates and what the introduction of central bank digital currencies and their different implementation options may mean in this context. ...
We present a number of methods used by central banks and regulatory authorities to assess the stability of the financial system, including stress tests, network analysis of the interbank market and models of interbank contagion and fire sales, aiming to capture the dynamics similar to those observed during the 2008 financial crisis. We will discuss the key role of banks in the money creation process, how this relates to monetary aggregates and what the introduction of central bank digital currencies and their different implementation options may mean in this context.

--

Please contact the office team for link information
Read more

Computational Approaches to Narrative Analysis

Maria Antoniak Allen Institute
26 Feb 2024, 10:00 am - 11:00 am
Kaiserslautern building G26, room 111
CIS@MPG Colloquium
People use storytelling to persuade, to entertain, to inform, and to make sense of their experiences as a community—and as a form of self-disclosure, personal storytelling can strengthen social bonds, build trust, and support the storyteller’s health. But computational analysis of narratives faces challenges, including difficulty in defining stories, lack of annotated datasets, and need to generalize across diverse settings. In this talk, I’ll present work addressing these challenges that uses methods from natural language processing (NLP) to measure storytelling both across online communities and within specific social contexts. ...
People use storytelling to persuade, to entertain, to inform, and to make sense of their experiences as a community—and as a form of self-disclosure, personal storytelling can strengthen social bonds, build trust, and support the storyteller’s health. But computational analysis of narratives faces challenges, including difficulty in defining stories, lack of annotated datasets, and need to generalize across diverse settings. In this talk, I’ll present work addressing these challenges that uses methods from natural language processing (NLP) to measure storytelling both across online communities and within specific social contexts. This work has implications for NLP methods for story detection, the literary field of narratology, cooperative work in online communities, and better healthcare support. As one part of my research in NLP and cultural analytics, it also highlights how NLP methods can be used creatively and reliably to study human experiences.
Read more

Global Investigation of Network Connection Tampering

Ramakrishnan Sundara Raman University of Michigan
23 Feb 2024, 10:00 am - 11:00 am
Bochum building MPI-SP
CIS@MPG Colloquium
As the Internet's user base and criticality of online services continue to expand daily, powerful adversaries like Internet censors are increasingly monitoring and restricting Internet traffic. These adversaries, powered by advanced network technology, perform large-scale connection tampering attacks seeking to prevent users from accessing specific online content, compromising Internet availability and integrity. In recent years, we have witnessed recurring censorship events affecting Internet users globally, with far-reaching social, financial, and psychological consequences, making them important to study. ...
As the Internet's user base and criticality of online services continue to expand daily, powerful adversaries like Internet censors are increasingly monitoring and restricting Internet traffic. These adversaries, powered by advanced network technology, perform large-scale connection tampering attacks seeking to prevent users from accessing specific online content, compromising Internet availability and integrity. In recent years, we have witnessed recurring censorship events affecting Internet users globally, with far-reaching social, financial, and psychological consequences, making them important to study. However, characterizing tampering attacks at the global scale is an extremely challenging problem, given intentionally opaque practices by adversaries, varying tampering mechanisms and policies across networks, evolving environments, sparse ground truth, and safety risks in collecting data. In this talk, I will describe my research on building empirical methods to characterize connection tampering globally and investigate the network technology enabling tampering. First, I will introduce novel network measurement methods for locating and examining network devices that perform censorship. Next, I will describe a modular design for the Censored Planet Observatory that enables it to remotely and sustainably measure Internet censorship longitudinally in more than 200 countries. I will introduce time series analysis methods to detect key censorship events in longitudinal Censored Planet data, and reveal global censorship trends. Finally, I will describe exciting ongoing and future research directions, such as building intelligent measurement platforms.
Read more

High-stakes decisions from low-quality data: AI decision-making for planetary health

Lily Xu Harvard University
21 Feb 2024, 10:00 am - 11:00 am
Kaiserslautern building G26, room 111
CIS@MPG Colloquium
Planetary health is an emerging field which recognizes the inextricable link between human health and the health of our planet. Our planet’s growing crises include biodiversity loss, with animal population sizes declining by an average of 70% since 1970, and maternal mortality, with 1 in 49 girls in low-income countries dying from complications in pregnancy or birth. Underlying these global challenges is the urgent need to effectively allocate scarce resources. My research develops data-driven AI decision-making methods to do so, ...
Planetary health is an emerging field which recognizes the inextricable link between human health and the health of our planet. Our planet’s growing crises include biodiversity loss, with animal population sizes declining by an average of 70% since 1970, and maternal mortality, with 1 in 49 girls in low-income countries dying from complications in pregnancy or birth. Underlying these global challenges is the urgent need to effectively allocate scarce resources. My research develops data-driven AI decision-making methods to do so, overcoming the messy data ubiquitous in these settings. Here, I’ll present technical advances in stochastic bandits, robust reinforcement learning, and restless bandits, addressing research questions that emerge from my close collaboration with the public sector. I’ll also discuss bridging the gap from research and practice, including anti-poaching field tests in Cambodia, field visits in Belize and Uganda, and large-scale deployment with SMART conservation software.
Read more

Paths to AI Accountability

Sarah Cen Massachusetts Institute of Technology
19 Feb 2024, 10:00 am - 11:00 am
Kaiserslautern building G26, room 111
CIS@MPG Colloquium
In the past decade, we have begun grappling with difficult questions related to the rise of AI, including: What rights do individuals have in the age of AI? When should we regulate AI and when should we abstain? What degree of transparency is needed to monitor AI systems? These questions are all concerned with AI accountability: determining who owes responsibility and to whom in the age of AI. In this talk, I will discuss the two main components of AI accountability, ...
In the past decade, we have begun grappling with difficult questions related to the rise of AI, including: What rights do individuals have in the age of AI? When should we regulate AI and when should we abstain? What degree of transparency is needed to monitor AI systems? These questions are all concerned with AI accountability: determining who owes responsibility and to whom in the age of AI. In this talk, I will discuss the two main components of AI accountability, then illustrate them through a case study on social media. Within the context of social media, I will focus on how social media platforms filter (or curate) the content that users see. I will review several methods for auditing social media, drawing from concepts and tools in hypothesis testing, causal inference, and LLMs.
Read more

Programming Theory in Security Analysis: A Tripartite Framework for Vulnerability Specification

Yinxi Liu Chinese University of Hong Kong
15 Feb 2024, 10:00 am - 11:00 am
Saarbrücken building E1 5, room 002
CIS@MPG Colloquium
Living in a computer-reliant era, we’re balancing the power of computer systems with the challenges of ensuring their functional correctness and security. Program analysis has proven successful in addressing these issues by predicting the behavior of a system when executed. However, the complexity of conducting program analysis significantly arises as modern applications employ advanced, high-level programming languages and progressively embody the structure of a composite of independent modules that interact in sophisticated manners. In this talk, ...
Living in a computer-reliant era, we’re balancing the power of computer systems with the challenges of ensuring their functional correctness and security. Program analysis has proven successful in addressing these issues by predicting the behavior of a system when executed. However, the complexity of conducting program analysis significantly arises as modern applications employ advanced, high-level programming languages and progressively embody the structure of a composite of independent modules that interact in sophisticated manners. In this talk, I will detail how to apply programming language theory to construct refined vulnerability specifications and reduce the complexity of program analysis across computational, conformational, and compositional aspects: - My primary focus will be on introducing some formal specifications that I have developed for modeling the common exponential worst-case computational complexity inherent in modern programming languages. These specifications have guided the first worst-case polynomial solution for detecting performance bugs in regexes. - I will also briefly discuss why generating inputs with complex conformation to target deep-seated bugs is a significant obstacle for existing techniques, and how I devised strategies to generate more sound input by intentionally satisfying previously unrecognized forms of dependencies. - Finally, as part of a vision to enhance security analysis in modern distributed systems, where different operations can be composed in a complex way and may interleave with each other, I will briefly discuss my efforts to establish new security notions to identify non-atomic operations in smart contracts and deter any potential attacks that might exploit their interactions.
Read more

Formal Reasoning about Relational Properties in Large-Scale Systems

Jana Hofmann Azure Research
14 Feb 2024, 10:00 am - 11:00 am
Bochum building MPI-SP
CIS@MPG Colloquium
Establishing strong guarantees for security-critical systems has never been more challenging. On the one hand, systems become increasingly complex and intricate. On the other hand, many security requirements are relational, i.e., they compare several execution traces. Examples are noninterference properties, program indistinguishability, and even algorithmic fairness. Due to their relational nature, formal reasoning about such properties quickly blows up in complexity. In this talk, I discuss the necessity to scale relational reasoning to larger software and black-box systems and present two of our recent contributions to tackle this challenge. ...
Establishing strong guarantees for security-critical systems has never been more challenging. On the one hand, systems become increasingly complex and intricate. On the other hand, many security requirements are relational, i.e., they compare several execution traces. Examples are noninterference properties, program indistinguishability, and even algorithmic fairness. Due to their relational nature, formal reasoning about such properties quickly blows up in complexity. In this talk, I discuss the necessity to scale relational reasoning to larger software and black-box systems and present two of our recent contributions to tackle this challenge. In the first part, I focus on formal algorithms for white-box systems and show how to combine relational reasoning with non-relational specifications to enable the synthesis of smart contract control flows. In the second part, I focus on relational testing of black-box systems and illustrate its use in modeling and detecting microarchitectural information leakage in modern CPUs.
Read more

Algorithmic Verification of Linear Dynamical Systems

Toghrul Karimov MPI-SWS
08 Feb 2024, 4:00 pm - 5:00 pm
Saarbrücken building E1 5, room 029
SWS Student Defense Talks - Thesis Defense
Linear dynamical systems (LDS) are mathematical models widely used in engineering and science to describe systems that evolve over time. In this thesis, we study algorithms for various decision problems of discrete-time linear dynamical systems. Our main focus is the Model-Checking Problem, which is to decide, given a linear dynamical system and an ω-regular specification, whether the trajectory of the LDS satisfies the specification. Using tools from various mathematical disciplines, most notably algebraic number theory, Diophantine approximation, ...
Linear dynamical systems (LDS) are mathematical models widely used in engineering and science to describe systems that evolve over time. In this thesis, we study algorithms for various decision problems of discrete-time linear dynamical systems. Our main focus is the Model-Checking Problem, which is to decide, given a linear dynamical system and an ω-regular specification, whether the trajectory of the LDS satisfies the specification. Using tools from various mathematical disciplines, most notably algebraic number theory, Diophantine approximation, automata theory, and combinatorics on words, we prove decidability of the Model-Checking Problem for large classes of linear dynamical systems and ω regular properties. We further exploit deep connections between linear dynamical systems and contemporary number theory to show that improving any of our decidability results would amount to major mathematical breakthroughs. Our results delineate the boundaries of decision problems of linear dynamical systems that, at the present time, can be solved algorithmically.
Read more

Reliable Measurement for Machine Learning at Scale

A. Feder Cooper Cornell University
08 Feb 2024, 10:00 am - 11:00 am
Bochum building MPI-SP
CIS@MPG Colloquium
We need reliable measurement in order to develop rigorous knowledge about ML models and the systems in which they are embedded. But reliable measurement is a really hard problem, touching on issues of reproducibility, scalability, uncertainty quantification, epistemology, and more. In this talk, I will discuss the criteria needed to take reliability seriously — criteria for designing meaningful metrics, and for methodologies that ensure that we can dependably and efficiently measure these metrics at scale and in practice. ...
We need reliable measurement in order to develop rigorous knowledge about ML models and the systems in which they are embedded. But reliable measurement is a really hard problem, touching on issues of reproducibility, scalability, uncertainty quantification, epistemology, and more. In this talk, I will discuss the criteria needed to take reliability seriously — criteria for designing meaningful metrics, and for methodologies that ensure that we can dependably and efficiently measure these metrics at scale and in practice. I will give two examples of my research that put these criteria into practice: (1) large-scale evaluation of training-data memorization in large language models, and (2) evaluating latent arbitrariness in algorithmic fairness binary classification contexts. Throughout this discussion, I will emphasize how important it is to make metrics understandable for other stakeholders in order to facilitate public governance. For this reason, my work aims to design metrics that are legally cognizable — a goal that frames the both my ML and legal scholarship. I will draw on important connections that I have uncovered between ML and law: connections between (1) the generative-AI supply chain and US copyright law, and (2) ML arbitrariness and arbitrariness in legal rules. This talk reflects joint work with collaborators at The GenLaw Center, Cornell CS, Cornell Law School, Google DeepMind, and Microsoft Research.
Read more

New Tools for Text Indexing and Beyond: Substring Complexity and String Synchronizing Sets

Tomasz Kociumaka MPI-INF - D1
07 Feb 2024, 12:15 pm - 1:15 pm
Saarbrücken building E1 5, room 002
Joint Lecture Series
In recent decades, the volume of sequential data processed in genomics and other disciplines has exploded. This trend escalates the challenge of designing text indexes – concise data structures that allow answering various queries efficiently. For example, pattern-matching queries ask if the text (a long string modeling the dataset) contains an occurrence of a given pattern (a short string provided at query time). The suffix array is a classic index for pattern matching and many other queries, ...
In recent decades, the volume of sequential data processed in genomics and other disciplines has exploded. This trend escalates the challenge of designing text indexes – concise data structures that allow answering various queries efficiently. For example, pattern-matching queries ask if the text (a long string modeling the dataset) contains an occurrence of a given pattern (a short string provided at query time). The suffix array is a classic index for pattern matching and many other queries, but modern applications demand space-efficient alternatives like the FM index. Still, even these data structures struggle with massive highly repetitive datasets, such as human genome collections. This scenario requires indexes whose size is comparable to the best text compression methods.

The talk will introduce two tools originating from my text indexing research. Substring complexity is a well-behaved measure of text repetitiveness that aids in comparing the performance of compression algorithms. String synchronizing sets enable consistent selection of small subsets of text substrings. Recently, these tools became the core of the asymptotically smallest compressed text index with suffix-array functionality. Furthermore, they find applications beyond text indexing: in combinatorics on words, quantum string algorithms, and more.
Read more

Causal Inference for Robust, Reliable, and Responsible NLP

Zhijing Jin MPI-IS & ETH
06 Feb 2024, 10:00 am - 11:00 am
Saarbrücken building E1 5, room 002
CIS@MPG Colloquium
Despite the remarkable progress in large language models (LLMs), it is well-known that natural language processing (NLP) models tend to fit for spurious correlations, which can lead to unstable behavior under domain shifts or adversarial attacks. In my research, I develop a causal framework for robust and fair NLP, which investigates the alignment of the causality of human decision-making and model decision-making mechanisms. Under this framework, I develop a suite of stress tests for NLP models across various tasks, ...
Despite the remarkable progress in large language models (LLMs), it is well-known that natural language processing (NLP) models tend to fit for spurious correlations, which can lead to unstable behavior under domain shifts or adversarial attacks. In my research, I develop a causal framework for robust and fair NLP, which investigates the alignment of the causality of human decision-making and model decision-making mechanisms. Under this framework, I develop a suite of stress tests for NLP models across various tasks, such as text classification, natural language inference, and math reasoning; and I propose to enhance robustness by aligning model learning direction with the underlying data generating direction. Using this causal inference framework, I also test the validity of causal and logical reasoning in models, with implications for fighting misinformation, and also extend the impact of NLP by applying it to analyze the causality behind social phenomena important for our society, such as causal analysis of policies, and measuring gender bias in our society. Together, I develop a roadmap towards socially responsible NLP by ensuring the reliability of models, and broadcasting its impact to various social applications.
Read more

Towards Ethical and Democratic Design of AI

Tanusree Sharma University of Illinois at Urbana Champaign
05 Feb 2024, 10:00 am - 11:00 am
Bochum building MPI-SP
CIS@MPG Colloquium
Advancements in Artificial Intelligence (AI) are impacting our lives, raising concerns related to data collection, and social alignment to the resilience of AI models. A major criticism of AI development is the lack of transparency in design and decision-making about AI behavior, potentially leading to adverse outcomes such as discrimination, lack of inclusivity and representation, breaching legal rules, and privacy and security risks. Underserved populations, in particular, can be disproportionately affected by these design decisions. Conventional approaches in soliciting people’s input, ...
Advancements in Artificial Intelligence (AI) are impacting our lives, raising concerns related to data collection, and social alignment to the resilience of AI models. A major criticism of AI development is the lack of transparency in design and decision-making about AI behavior, potentially leading to adverse outcomes such as discrimination, lack of inclusivity and representation, breaching legal rules, and privacy and security risks. Underserved populations, in particular, can be disproportionately affected by these design decisions. Conventional approaches in soliciting people’s input, such as interviews, surveys, and focus groups, have limitations, such as often lacking consensus, coordination, and regular engagement. In this talk, I will present two examples of sociotechnical interventions for democratic and ethical AI. First, to address the need for ethical dataset creation for AI development, I will present a novel method "BivPriv," drawing ideas from accessible computing and computer vision in creating an inclusive private visual dataset with blind users as contributors. Then I will discuss my more recent work on "Inclusive.AI" funded by OpenAI to address concerns of social alignment by facilitating a democratic platform with decentralized governance mechanisms for scalable user interaction and integrity in the decision-making processes related to AI.
Read more

Theoretically Sound Cryptography for Key Exchange and Advanced Applications

Doreen Riepel UC San Diego
01 Feb 2024, 10:00 am - 11:00 am
Bochum building MPI-SP
CIS@MPG Colloquium
Today, nearly all internet connections are established using a cryptographic key exchange protocol. Fo r these protocols, we want to guarantee security even if an adversary can control the protocol’s execution or secr ets are leaked. A security analysis takes this into account and provides a mathematical proof relying on computati onal problems that are believed to be hard to solve. In this talk, I will first give an overview of the security p roperties of authenticated key exchange protocols and how to achieve them using cryptographic building blocks. ...
Today, nearly all internet connections are established using a cryptographic key exchange protocol. Fo r these protocols, we want to guarantee security even if an adversary can control the protocol’s execution or secr ets are leaked. A security analysis takes this into account and provides a mathematical proof relying on computati onal problems that are believed to be hard to solve. In this talk, I will first give an overview of the security p roperties of authenticated key exchange protocols and how to achieve them using cryptographic building blocks. I w ill talk about tight security and the role of idealized models such as the generic group model. In addition to cla ssical Diffie-Hellman based key exchange, I will also present recent results on isogeny-based key exchange, a prom ising candidate for post-quantum secure cryptography. Finally, I will touch upon examples of advanced cryptographi c primitives like ratcheted key exchange for secure messaging.
Read more

New Algorithmic Tools for Rigorous Machine Learning Security Analysis

Teodora Baluta National University of Singapore
30 Jan 2024, 10:00 am - 11:00 am
Bochum building MPI-SP
CIS@MPG Colloquium
Machine learning security is an emerging area with many open questions lacking systematic analysis. In this talk, I will present three new algorithmic tools to address this gap: (1) algebraic proofs; (2) causal reasoning; and (3) sound statistical verification. Algebraic proofs provide the first conceptual mechanism to resolve intellectual property disputes over training data. I show that stochastic gradient descent, the de-facto training procedure for modern neural networks, is a collision-resistant computation under precise definitions. These results open up connections to lattices, ...
Machine learning security is an emerging area with many open questions lacking systematic analysis. In this talk, I will present three new algorithmic tools to address this gap: (1) algebraic proofs; (2) causal reasoning; and (3) sound statistical verification. Algebraic proofs provide the first conceptual mechanism to resolve intellectual property disputes over training data. I show that stochastic gradient descent, the de-facto training procedure for modern neural networks, is a collision-resistant computation under precise definitions. These results open up connections to lattices, which are mathematical tools used for cryptography presently. I will also briefly mention my efforts to analyze causes of empirical privacy attacks and defenses using causal models, and to devise statistical verification procedures with ‘probably approximately correct’ (PAC)-style soundness guarantees.
Read more

Oblivious Algorithms for Privacy-Preserving Computations

Sajin Sasy University of Waterloo
25 Jan 2024, 10:00 am - 11:00 am
Bochum building MPI-SP
CIS@MPG Colloquium
People around the world use data-driven online services every day. However, data center attacks and data breaches have become a regular and rising phenomenon. How, then, can one reap the benefits of data-driven statistical insights without compromising the privacy of individuals' data? In this talk, I will first present an overview of three disparate approaches towards privacy-preserving computations today, namely homomorphic cryptography, distributed trust, and secure hardware. These ostensibly unconnected approaches have one unifying element: oblivious algorithms. ...
People around the world use data-driven online services every day. However, data center attacks and data breaches have become a regular and rising phenomenon. How, then, can one reap the benefits of data-driven statistical insights without compromising the privacy of individuals' data? In this talk, I will first present an overview of three disparate approaches towards privacy-preserving computations today, namely homomorphic cryptography, distributed trust, and secure hardware. These ostensibly unconnected approaches have one unifying element: oblivious algorithms. I will discuss the relevance and pervasiveness of oblivious algorithms in all the different models for privacy-preserving computations. Finally, I highlight the performance and security challenges in deploying such privacy-preserving solutions, and present three of my works that mitigate these obstacles through the design of novel efficient oblivious algorithms.
Read more

A comprehensive analysis of PII leakage-based web tracking

Ha Dao MPI-INF - INET
10 Jan 2024, 12:15 pm - 1:15 pm
Saarbrücken building E1 5, room 002
Joint Lecture Series
Third-party web tracking has been used to collect and analyze user browsing behavior, which has raised concerns about privacy among Internet users. In this talk, I will describe the mechanisms behind third-party web tracking and offer suggestions for user protection. My discussion will extend to our recent work on Personally Identifiable Information (PII) leakage-based tracking across various scenarios. I will conclude the talk by suggesting potential research directions to enhance transparency on the World Wide Web.
Third-party web tracking has been used to collect and analyze user browsing behavior, which has raised concerns about privacy among Internet users. In this talk, I will describe the mechanisms behind third-party web tracking and offer suggestions for user protection. My discussion will extend to our recent work on Personally Identifiable Information (PII) leakage-based tracking across various scenarios. I will conclude the talk by suggesting potential research directions to enhance transparency on the World Wide Web.