Research B – Autumn Semester 2013/14


Recent news


Basic information


Deadlines and Meetings

Deadlines
All deadlines are at 15:00, CET (Amsterdam time).
Lectures
Progress Meetings

Topics

Please also take a look at the thesis projects; some of them may also be suitable as a Research B project.
Topics marked with a red frame are already taken.

Title: Programming with infinite circular data
Supervisor: Alexandra Silva <alexandra@cs.ru.nl>
Description: Many programs and algorithms operate on circular data, but language support for this is still very limited. In this project we will look at extending functional languages with new constructs for operating on circular data structures, such as trees and graphs.
Literature:
Type of work: Theory and programming
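
To get a feel for the problem, here is a minimal sketch in Python (a language without the lazy recursive bindings the project would add to a functional language): in e.g. Haskell a cycle can be created declaratively with `ones = 1 : ones`, while below we must tie the knot with mutable references and guard every traversal with explicit cycle detection. All names are our own.

```python
# Sketch: operating on circular data with explicit cycle detection.
class Node:
    def __init__(self, value):
        self.value = value
        self.next = None

def cycle_of(values):
    """Build a circular linked list from a non-empty list of values."""
    nodes = [Node(v) for v in values]
    for a, b in zip(nodes, nodes[1:]):
        a.next = b
    nodes[-1].next = nodes[0]    # close the cycle
    return nodes[0]

def map_circular(f, start):
    """Apply f to every node of a circular list exactly once."""
    seen = set()                 # identities of visited nodes
    node = start
    while id(node) not in seen:
        seen.add(id(node))
        node.value = f(node.value)
        node = node.next
    return start

ring = cycle_of([1, 2, 3])
map_circular(lambda x: x * 10, ring)
print(ring.value, ring.next.value, ring.next.next.value)  # 10 20 30
print(ring.next.next.next is ring)                        # True
```

The boilerplate (the `seen` set, the `id()` bookkeeping) is exactly what dedicated language constructs for circular data would make unnecessary.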

Title: Formal languages in theorem provers
Supervisor: Alexandra Silva <alexandra@cs.ru.nl>
Description: The goal of this project is to develop a library, in a functional language such as Haskell or in a theorem prover such as Coq or Circ, in which regular expressions and regular languages are formalized. The next step is to formalize proofs of known theorems, such as the Myhill-Nerode theorem.
Literature:
Type of work: Theory and programming
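
As a concrete illustration of what would be formalized, here is an executable (unverified) Python sketch of the Myhill-Nerode relation, computed as DFA state equivalence by partition refinement; the example automaton is our own invention.

```python
# Sketch: Myhill-Nerode equivalence classes of DFA states.
def minimize(states, alphabet, delta, accepting):
    """Return the partition of states into Myhill-Nerode classes."""
    # Start with the accepting / non-accepting split.
    partition = [set(accepting), set(states) - set(accepting)]
    partition = [p for p in partition if p]
    changed = True
    while changed:
        changed = False
        new_partition = []
        for block in partition:
            # Two states stay together iff every letter sends them
            # to the same block of the current partition.
            groups = {}
            for s in block:
                key = tuple(
                    next(i for i, b in enumerate(partition)
                         if delta[s, c] in b)
                    for c in alphabet)
                groups.setdefault(key, set()).add(s)
            new_partition.extend(groups.values())
            if len(groups) > 1:
                changed = True
        partition = new_partition
    return partition

# DFA accepting strings with an even number of 'a's, with a redundant state.
states = {0, 1, 2}
delta = {(0, 'a'): 1, (0, 'b'): 0,
         (1, 'a'): 2, (1, 'b'): 1,
         (2, 'a'): 1, (2, 'b'): 2}
classes = minimize(states, ['a', 'b'], delta, accepting={0, 2})
print(sorted(sorted(c) for c in classes))  # [[0, 2], [1]]
```

In the project, the interesting part is proving (not just computing) that this refinement terminates and yields the coarsest right-invariant refinement, which is the content of the Myhill-Nerode theorem.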

Title: Pairing based cryptography in SmartGrids
Supervisor: Bárbara Vieira <b.vieira@cs.ru.nl>
Description: Development and implementation (in C++) of a pairing-based protocol, and evaluation of the applicability of pairing-based cryptography in the SmartGrids context. The protocol will be used in the specific context of C-DAX, a European project which proposes a cyber-secure data and control cloud for future power distribution networks.
Literature:
Type of work: Theory (40%): includes studying pairing-based cryptography (and all related concepts: elliptic curves, pairings, etc.) and information-centric networks (communication model), with a special focus on the design of a pairing-based protocol for the specific communication model supported by information-centric networks (considering the particular case of C-DAX).
Programming (40%): the implementation is expected to correspond to 40% of the work, and requires at least knowledge of the C programming language. Note that the purpose of this research project is not to implement a pairing library from scratch: efficient pairing implementations are already available online, and the aim is to build on those libraries to implement the protocols.
Experiments (20%): C-DAX targets three different use cases (SmartGrid use cases) with very different characteristics (speed, data volume, etc.). It will be necessary to evaluate in which use case the protocol can be implemented, by evaluating the underlying constraints (doing experiments considering the different characteristics). There may be the possibility of testing the protocol at the testbed of Alliander (a C-DAX partner), one of the current DNOs in the Netherlands.

Title: Reconstructing causal protein signaling networks from data
Supervisor: Joris Mooij <j.mooij@cs.ru.nl>
Description: Proteins play a vital role in the human body. In this project, the student will analyze protein concentration data measured in cells of the human immune system [1] in order to build a causal model of how various proteins interact in these cells. This has potentially important repercussions as malfunctioning of this particular protein signaling system is known to cause cancer. Understanding the complex dynamics of these proteins is traditionally done by performing many costly and labor-intensive experiments. The idea of this project is to investigate how modern causal discovery methods perform in reconstructing the protein signaling network from data gathered under various experimental conditions (see also Figure 1 in the pdf description).
Literature:
  • Karen Sachs, Omar Perez, Dana Pe'er, Douglas A. Lauffenburger, and Garry P. Nolan: Causal protein-signaling networks derived from multiparameter single-cell data. Science, 308(5721):523–529, 2005.
  • Joris M. Mooij and Tom Heskes: Cyclic causal discovery from continuous equilibrium data. In Ann Nicholson and Padhraic Smyth, editors, Proceedings of the 29th Annual Conference on Uncertainty in Artificial Intelligence (UAI-13), pages 431–439. AUAI Press, 2013.
Type of work: The work to be performed is a combination of theory (30%), programming (40%) and experiments (30%). The student first familiarizes him-/herself with the current state-of-the-art causal discovery method [2] for this type of data, implements an improved version of the method in C++, runs it on the biological data and analyzes the results.
Possible improvements of the current method include optimization (the current method is quite slow), taking into account censoring (the measurement data are cut off at some threshold which is currently ignored), using more background knowledge (about the "sign" of the interventions, i.e., whether it is an activator or inhibitor), and implementing a method to compare the results with previous results obtained by other methods.
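
To illustrate the core idea (this is a toy example of ours, not the method of [2]): in a simulated two-variable "signaling" system X → Y, intervening on the cause shifts the effect, but not vice versa, and that asymmetry is what causal discovery from interventional data exploits.

```python
# Toy illustration: detecting causal direction from interventions.
import random

random.seed(0)

def sample(n, do_x=None, do_y=None):
    """Linear model X -> Y with noise; do_* clamps a variable."""
    data = []
    for _ in range(n):
        x = do_x if do_x is not None else random.gauss(0, 1)
        y = do_y if do_y is not None else 2.0 * x + random.gauss(0, 0.1)
        data.append((x, y))
    return data

def mean(values):
    return sum(values) / len(values)

obs   = sample(2000)
int_x = sample(2000, do_x=3.0)   # clamp X (think: an activator of X)
int_y = sample(2000, do_y=3.0)   # clamp Y

# Intervening on X moves Y (X is a cause of Y) ...
shift_y = abs(mean([y for _, y in int_x]) - mean([y for _, y in obs]))
# ... but intervening on Y leaves X untouched (Y is not a cause of X).
shift_x = abs(mean([x for x, _ in int_y]) - mean([x for x, _ in obs]))
print(shift_y > 1.0, shift_x < 0.5)  # True True
```

The real data set has eleven proteins, cycles, and imperfect interventions, which is why the method of [2] is substantially more involved than this two-variable contrast.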

Title: iTasks in Haskell
Supervisor: prof. dr. Rinus Plasmeijer <rinus@cs.ru.nl> and drs. Jurriën Stutterheim <j.stutterheim@cs.ru.nl>
Description: The iTask system (iTasks) is a task-oriented programming toolkit for programming workflow support applications in Clean. With this toolkit, workflows can be specified using combinators in a very high-level, declarative, monadic style. Workflows consist of typed tasks that produce results that can be passed as parameters to other tasks. Tasks are constructed by combining single steps sequentially or in parallel. From iTask specifications, executable workflow support systems are generated automatically.
Currently, iTasks is only available for Clean due to its reliance on Clean's advanced language features. A very similar, more widely spread purely functional programming language is Haskell, which is implemented by the Glorious Haskell Compiler (GHC), amongst others. While Haskell and Clean are very similar, there are differences in the advanced language features they support. For instance, Clean's dynamics are significantly more powerful than Haskell's, while Haskell offers more advanced type-level features, such as GADTs and type families.
Since Haskell and Clean share many similarities, one can imagine that a large portion of iTasks can easily be translated to Haskell. This may be desirable, because it would allow an even larger audience to work with iTasks. However, the parts of iTasks that rely on advanced language features in Clean may not be so easy to translate. GHC may offer alternative solutions to the same problems, or may not be able to solve some of them at all.
This proposal is for a 3-month (6 EC) Research B project in which a student will research the differences and similarities between Clean and Haskell in the context of iTasks. We are looking to answer the following research questions:
  • Which parts of iTasks cannot be implemented in GHC and why?
  • Which parts of iTasks can be implemented in GHC with an alternative approach and how?
Literature:

Title: In Apps we trust: using your smart phone for strong authentication
Supervisor: Roland van Rijswijk-Deij <rijswijk@cs.ru.nl>
Description: Smart phones are seemingly ideal devices to use for strong authentication: they are ubiquitous, easy to program, and, unlike the dedicated security tokens commonly used by banks, people are highly unlikely to forget to carry them. Many traditional vendors of security tokens have realised this and have come up with a host of apps that replicate the functionality of their tokens. Other organisations have also introduced token-like apps, such as the quite popular Google Authenticator and the open source solution "tiqr".
In essence, smart phones are general purpose computing platforms with all the security caveats and woes that come with that. There is already a staggering amount of malware for mobile platforms, especially Android. Presumably, this might make using your smart phone as a security token a bad idea. Nevertheless, app developers can make a risk assessment and—based on a sound threat model—can take quite effective measures to ensure that the token app is as-secure-as-possible.
The goal of this research project is to investigate existing security token apps on different mobile platforms and to learn what kind of approaches the developers have used to ensure the security of their app, what threat model underpins their design, and how realistic that threat model is. Apps that can be readily investigated are Google Authenticator and tiqr, both of which are open source, but we will also strive to get access to apps from commercial vendors like Vasco and SafeNet.
The project has two end results: a set of "design patterns" for securing sensitive credential data in mobile token apps, and one or more formalised threat/attacker models that apply to these kinds of apps.
Alternative avenues of research include looking into using security features of some platforms like trusted execution environments and secure elements and security analysis of the client/server protocols used in these token apps.
Literature:
Type of work: 30% researching state-of-the-art
40% experimenting with and investigating apps
30% documenting your results
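
To make the threat concrete: Google Authenticator implements the TOTP algorithm of RFC 6238, sketched below. The only secret a token app must protect is the shared key, so the whole security question reduces to how that key is stored and used on a general-purpose device. The key and expected code are the RFC 6238 test vector.

```python
# Sketch of TOTP (RFC 6238), the algorithm behind Google Authenticator.
import hashlib
import hmac
import struct

def totp(key: bytes, unix_time: int, step: int = 30, digits: int = 8) -> str:
    counter = unix_time // step                   # time-based counter
    msg = struct.pack(">Q", counter)              # 8-byte big-endian
    mac = hmac.new(key, msg, hashlib.sha1).digest()
    offset = mac[-1] & 0x0F                       # dynamic truncation
    code = struct.unpack(">I", mac[offset:offset + 4])[0] & 0x7FFFFFFF
    return str(code % 10 ** digits).zfill(digits)

# RFC 6238 test vector: key "12345678901234567890", T = 59 -> 94287082
print(totp(b"12345678901234567890", 59))  # 94287082
```

An attacker who extracts `key` from the phone can generate all future codes, which is why the storage of that one value dominates the threat models this project will examine.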

Title: Cluster analysis for side-channel analysis
Supervisor: Lejla Batina <lejla@cs.ru.nl> and Elena Marchiori <lenam@cs.ru.nl>
Description: Most implementations of public key cryptography employ exponentiation algorithms. Side-channel attacks on secret exponents are typically bound to the leakage of single executions because of cryptographic protocols or side-channel countermeasures. The recent paper "Clustering Algorithms for Non-Profiled Single-Execution Attacks on Exponentiations" proposes to use clustering to attack cryptographic exponentiations and recover secret exponents. The algorithm exploits single-execution leakage and recovers secret exponents without any prior profiling or heuristic leakage models. The choice of a clustering algorithm depends on the assumed shape of the clusters. The paper uses k-means clustering. In this project you will perform the following tasks:
  1. implement the method introduced in this paper,
  2. develop and implement variants of the method obtained by considering different types of clustering methods, and
  3. perform a comparative experimental analysis of the resulting algorithms in order to assess their performance.
Data for performing the experimental analysis will be provided by us.
Literature:
  • Johann Heyszl, Andreas Ibing, Stefan Mangard, Fabrizio De Santis, and Georg Sigl: Clustering Algorithms for Non-Profiled Single-Execution Attacks on Exponentiations. http://eprint.iacr.org/2013/438
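
The core idea of the paper can be sketched in a few lines (a drastically simplified toy of ours, not the paper's method: the leakage model, noise level, and all parameters are invented): per-bit leakage samples from a single execution are clustered into two groups, which, up to a label swap, reveal the secret exponent bits.

```python
# Toy single-execution clustering attack on an exponentiation.
import random

random.seed(42)

secret = [1, 0, 1, 1, 0, 0, 1, 0, 1, 1, 1, 0]
# One leakage sample per exponent bit: a square-and-multiply step
# (bit 1) leaks around 1.0, a square-only step (bit 0) around 0.0.
leakage = [random.gauss(float(b), 0.1) for b in secret]

def kmeans_1d(xs, iters=20):
    """Plain 2-means on scalars; returns a 0/1 label per sample."""
    centers = [min(xs), max(xs)]
    labels = [0] * len(xs)
    for _ in range(iters):
        labels = [0 if abs(x - centers[0]) <= abs(x - centers[1]) else 1
                  for x in xs]
        for k in (0, 1):
            members = [x for x, lab in zip(xs, labels) if lab == k]
            if members:
                centers[k] = sum(members) / len(members)
    return labels

guess = kmeans_1d(leakage)
# Cluster labels are arbitrary, so accept either labelling.
recovered = guess if guess[0] == secret[0] else [1 - g for g in guess]
print(recovered == secret)  # True
```

Task 2 of the project amounts to replacing `kmeans_1d` with other clustering methods (whose cluster-shape assumptions may fit real traces better) and task 3 to comparing them on the provided measurement data.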

Title: Strength of a causal relation: finding things that matter most
Supervisor: Tom Claassen, <tomc@cs.ru.nl>
Description: Causality lies at the heart of most scientific research today. Huge amounts of data are produced in the search for cause-effect relations in all kinds of areas, ranging from food supplements that are beneficial to health and key factors behind global warming, to behavioural patterns that increase the chance of developing ADHD. In each case we look for factors that we can actively influence (causes) in order to obtain some desired result (effect).
When proper 'double-blind' experiments are impossible, e.g. because they are too expensive, unethical, or simply too difficult, we have to rely on (sometimes very large) databases of available observations. Recent work by our group has shown how you can not only identify possible causal relations from this type of data, but also indicate how good (reliable) that prediction really is.
The next important step is to indicate how strong each of these relations is. For example, as a consultant on public health policy it is not enough to know that there is some causal link between, say, cannabis and mental health problems: you need to know whether the total effect of a ban on cannabis use will result in a decrease or increase in psychological disorders, and whether this overall impact is really significant or hardly noticeable. However, strangely enough, there is no generally used or accepted measure for quantifying the strength of a causal relation.
This project aims to come up with a good measure of 'causal strength' that not only can be computed efficiently from 'Big Data' sets, but also conveys those aspects that are most relevant/informative to researchers in various application areas. It involves initial background research in the available literature to see what has been tried so far, followed by a specification of the requirements for our measure. We then select a number of possible candidates (including some of our own), and test and compare them on several representative data sets in order to formulate a final recommendation.
Literature: TBD
Type of work: starting off with theory / literature survey, (hopefully) leading to some form of experimental validation (incl. programming).
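
One classical candidate measure the literature survey would certainly encounter is the average causal effect, ACE = E[Y | do(X=1)] − E[Y | do(X=0)]. The sketch below estimates it by directly simulating interventions in a hypothetical structural model with a confounder Z (all coefficients are invented for illustration):

```python
# Estimating the average causal effect by simulated intervention.
import random

random.seed(1)

def simulate(n, do_x=None):
    """Mean of Y in a linear model Z -> X, Z -> Y, X -> Y."""
    ys = []
    for _ in range(n):
        z = random.gauss(0, 1)                    # confounder
        x = do_x if do_x is not None else (z + random.gauss(0, 1) > 0)
        y = 1.5 * x + 2.0 * z + random.gauss(0, 1)
        ys.append(y)
    return sum(ys) / len(ys)

ace = simulate(50000, do_x=1) - simulate(50000, do_x=0)
# The naive observational contrast E[Y|X=1]-E[Y|X=0] would be biased
# by Z; the interventional contrast recovers the true coefficient 1.5.
print(round(ace, 1))  # 1.5
```

Whether ACE (or something like it) satisfies the project's requirements, e.g. efficient computability from observational 'Big Data' without simulation access, is exactly the kind of question the comparison phase would settle.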

Title: User navigation behaviour on the www.ru.nl domain
Supervisor: Suzan Verberne <s.verberne@cs.ru.nl>
Description: The ultimate goal of this project is to suggest relevant pages to any ru.nl visitor based on their navigation behaviour.
Web servers register a log entry for every single access they receive, recording information such as the URL requested, the IP address from which the request originated, and a timestamp. We have a large access log of the www.ru.nl website.
Using this data we aim to (1) create a path x1 → x2 → ... → xn for each time a user visited the www.ru.nl website, (2) cluster these paths in such a manner that visitor types (e.g. high school students, RU researchers, journalists) are revealed, and (3) predict the next page a user will visit based on the behaviour of other users.
Literature:
  • Shahabi et al. 1997: Knowledge discovery from users web-page navigation
  • Morzy et al. 2000: Web users clustering
  • Cadez et al. 2003: Model-based clustering and visualization of navigation patterns on a website
  • Xing et al. 2004: Efficient data mining for web navigation patterns
  • Verberne et al. 2010: How does the Library Searcher behave?
  • Alhindi et al. 2013: Site Search Using Profile-Based Document Summarisation
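
A natural baseline for step (3) is a first-order Markov model over the click paths: predict the most frequent successor of the current page. A minimal sketch (the paths and page names below are invented, not taken from the ru.nl log):

```python
# First-order Markov baseline for next-page prediction.
from collections import Counter, defaultdict

paths = [
    ["home", "study", "bachelor", "admission"],
    ["home", "study", "master"],
    ["home", "news", "study", "bachelor"],
    ["home", "study", "bachelor", "contact"],
]

# transitions[p] counts which page follows p across all sessions.
transitions = defaultdict(Counter)
for path in paths:
    for a, b in zip(path, path[1:]):
        transitions[a][b] += 1

def predict_next(page):
    """Most frequent successor of `page` in the training paths."""
    if page not in transitions:
        return None
    return transitions[page].most_common(1)[0][0]

print(predict_next("home"))   # study
print(predict_next("study"))  # bachelor
```

Conditioning these transition counts on the visitor clusters found in step (2) is one way the clustering and the prediction task can reinforce each other.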

Title: Reachability for xMAS
Supervisor: Julien Schmaltz <Julien.Schmaltz@ou.nl>
Description: xMAS is a graphical language used at Intel's research labs to model communication networks in multi-core processors. These models are used in design and validation, and their correctness is of the utmost importance. Our group is developing an environment for the design and formal validation of xMAS networks. In this project, you will develop a reachability analyser for xMAS. The goal is to build a tool that answers questions like "is it possible to reach a state where message a is in queue 1 and message b is in queue 2?".
Literature:
Type of work: The work is a good mix of theory, experiments, and programming (C/C++). You will need to work out the theory of reachability for xMAS networks, and design and implement an algorithm that solves the reachability problem. Of course, you will also use experimental data to assess the scalability of your solution.
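
The flavour of such an analyser can be sketched as explicit-state breadth-first search. The tiny network below (one source feeding a queue, a switch forwarding to a second queue, a sink) is a made-up stand-in for a real xMAS model, and a real tool would need symbolic techniques to scale:

```python
# Sketch: explicit-state reachability over a network of bounded queues.
from collections import deque

CAP = 2  # capacity of each queue

def successors(state):
    """State = (q1, q2) as tuples of messages; yield one-step moves."""
    q1, q2 = state
    # The source may inject either message type into q1.
    if len(q1) < CAP:
        for msg in ("a", "b"):
            yield (q1 + (msg,), q2)
    # The head of q1 may be forwarded into q2.
    if q1 and len(q2) < CAP:
        yield (q1[1:], q2 + (q1[0],))
    # The sink may consume the head of q2.
    if q2:
        yield (q1, q2[1:])

def reachable(goal_test, initial=((), ())):
    """Breadth-first search over the (finite) state space."""
    seen, frontier = {initial}, deque([initial])
    while frontier:
        state = frontier.popleft()
        if goal_test(state):
            return True
        for nxt in successors(state):
            if nxt not in seen:
                seen.add(nxt)
                frontier.append(nxt)
    return False

# Is a state with an "a" in queue 1 and a "b" in queue 2 reachable?
print(reachable(lambda s: "a" in s[0] and "b" in s[1]))  # True
```

The project question is precisely how far beyond such enumeration one must go (partial-order reduction, symbolic encodings, etc.) for realistic xMAS networks.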

Title: Formal proofs of hardware security
Supervisor: Julien Schmaltz <Julien.Schmaltz@ou.nl>
Description: The evaluation of secure devices is a challenge. In this project, you will look at how formal methods can help in guiding and improving the evaluation of secure devices for their resistance against side-channel attacks. Your main task will be to combine a formal model of the AVR-8 controller with formal power models to obtain symbolic expressions for the power consumption of an algorithm. You will then study how these symbolic expressions can be used to guide and improve side-channel analysis techniques, such as differential power analysis.
Literature: TBD
Type of work: The work first consists of a partial formalisation of the AVR controller, together with a power model (say, Hamming weight), using the ACL2 theorem proving system. There will then be a practical part about studying how these formal models can be used in side-channel analysis.
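
A concrete (and heavily simplified, invented-by-us) instance of the Hamming-weight power model: the power drawn when a register is written is modelled as the Hamming weight of the new value, so a short instruction sequence yields a predictable leakage trace per key byte.

```python
# Hamming-weight leakage of a three-instruction AVR-style snippet.
def hamming_weight(byte):
    return bin(byte & 0xFF).count("1")

def leakage_trace(key_byte, plaintext_byte):
    """HW leakage of: LDI r0, pt ; LDI r1, key ; EOR r0, r1."""
    r0 = plaintext_byte
    r1 = key_byte
    trace = [hamming_weight(r0), hamming_weight(r1)]
    r0 ^= r1                      # EOR writes r0 := r0 xor r1
    trace.append(hamming_weight(r0))
    return trace

# Different key bytes give distinguishable, symbolically predictable
# leakage for the same plaintext -- the hook for DPA-style analysis.
print(leakage_trace(0x0F, 0xA5))  # [4, 4, 4]
print(leakage_trace(0xFF, 0xA5))  # [4, 8, 4]
```

In the project, ACL2 would derive such traces symbolically (as expressions over an unknown key) from the formal AVR model, rather than by concrete execution as here.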

Title: Extracting programs from proofs
Supervisor: Herman Geuvers <herman@cs.ru.nl>
Description: The Curry-Howard formulas-as-types isomorphism relates proofs in constructive logic to typed lambda terms, which can be viewed as functional programs. This makes it possible to extract programs from constructive proofs. Extending the isomorphism to classical logic, we can interpret classical proofs as "functional programs with jumps", which can be modelled with escape or call-cc constructs. There has been a lot of work on extending program extraction to classical logic, giving rise to new presentations of the logical systems, new typed lambda calculi, new reduction rules and new reduction strategies. Many questions remain open. A concrete research question is: given a functional program, how do we obtain the associated proof?
Literature: TBD
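
The simplest instance of the correspondence: the constructive proof of (A ∧ B) → (B ∧ A) *is* the program that swaps the components of a pair. Python is untyped, so in this sketch of ours the "types" live only in the comments; in Coq or Haskell the type checker would enforce them.

```python
# Formulas-as-types in miniature: a pair is a proof of a conjunction.
def swap(p):
    """Proof term for A /\\ B -> B /\\ A."""
    a, b = p          # and-elimination: project both components
    return (b, a)     # and-introduction: rebuild in swapped order

print(swap((1, "x")))  # ('x', 1)
```

The research question above asks for the reverse direction: starting from a program like `swap`, reconstruct the proof (and, for programs using jumps, the classical proof) it corresponds to.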

Title: Extensions of the Turing machine or lambda calculus paradigm with "interaction", "reactivity", "advice" or "oracles".
Supervisor: Herman Geuvers <herman@cs.ru.nl>
Description: There is (a lot of) work on this by, for example, van Leeuwen and Wiedermann, Wegner, Goldin, Smolka, and many others. Many questions on the universality of these models and their relative strength remain. A concrete recent approach is the paper "Reactive Turing Machines" by Baeten, Luttik and van Tilburg, which unifies a model of computability with a process model. A concrete question is what the lambda calculus equivalent would be: a "Reactive Lambda Calculus".
Literature: TBD


Slides and Course Material