Christian Steinruecken


I am a researcher in machine learning / artificial intelligence at the University of Cambridge. I work in the CBL Laboratory at the Department of Engineering. I am interested in information theory, data compression, Bayesian inference, probabilistic programming, sampling methods, language design, automated question answering and other topics.

I am a member of King's College.

Contact

I can best be reached by email:
tcs27 [at] cam . ac . uk

Some recent work

Improving PPM with dynamic parameter updates

Pre-print
This article makes several improvements to the classic PPM algorithm, resulting in a new algorithm with superior compression effectiveness on human text. The key differences of our algorithm to classic PPM are that: Ⓐ rather than the original escape mechanism, we use a generalised blending method with explicit hyper-parameters that control the way symbol counts are combined to form predictions; Ⓑ different hyper-parameters are used for classes of different contexts; and Ⓒ these hyper-parameters are updated dynamically using gradient information. The resulting algorithm (PPM-DP) compresses human text better than all currently published variants of PPM, CTW, DMC, LZ, CSE and BWT, with runtime only slightly slower than classic PPM.
(To be presented at the Data Compression Conference, DCC 2015.)

Compressing Sets and Multisets of Sequences

http://arxiv.org/abs/1401.6410
This article describes lossless compression algorithms for multisets of sequences, taking advantage of the multiset's unordered structure. Multisets are a generalisation of sets where members are allowed to occur multiple times. A multiset can be encoded naïvely by simply storing its elements in some sequential order, but then information is wasted on the ordering. We propose a technique that transforms the multiset into an order-invariant tree representation, and derive an arithmetic code that optimally compresses the tree. Our method achieves compression even if the sequences in the multiset are individually incompressible (such as cryptographic hash sums). The algorithm is demonstrated practically by compressing collections of SHA-1 hash sums, and multisets of arbitrary, individually encodable objects.
(To appear in IEEE Transactions on Information Theory.)

Lossless Data Compression

PhD thesis (pre-print)
This thesis makes several contributions to the field of data compression. Compression algorithms are derived for a variety of applications, employing probabilistic modelling, Bayesian inference, and arithmetic coding; and making the underlying probability distributions explicit throughout. A general compression toolbox is described, consisting of practical algorithms for compressing data distributed by various fundamental probability distributions, and mechanisms for combining these algorithms in a principled way. New mathematical theory is introduced for compressing objects with an underlying combinatorial structure, such as permutations, combinations, and multisets. For text compression, a novel unifying construction is developed for a family of context-sensitive compression algorithms, special cases of which include the PPM algorithm and the Sequence Memoizer. The work concludes with experimental results, example applications, and a brief discussion on cost-sensitive compression and adversarial sequences.
(Final version to appear soon.)

SAT-solving: Performance analysis of Survey Propagation and DPLL

Technical report
The Boolean Satisfiability Problem (SAT) belongs to the class of NP-complete problems, meaning that there is no known deterministic algorithm that can solve an arbitrary problem instance in less than exponential time (parametrized on the length of the input). There is great industrial demand for solving SAT, motivating the need for algorithms which perform well. I present a comparison of two approaches for solving SAT instances: DPLL (an exact algorithm from classical computer science) and Survey Propagation (a probabilistic algorithm from statistical physics). The two algorithms were compared on randomly generated 3-SAT problems with varying clause to variable ratios.
(Cavendish Laboratory, University of Cambridge)

Teaching

I have taught various undergraduate courses at Cambridge University, including: Probability Theory, Types, Artificial Intelligence, Computation Theory, Functional Programming, Quantum Computation, ML, Prolog, Java, and Foundations of Computer Science.

I also supervised a few undergraduate B.A. dissertations.



Miscellaneous


Computer Science Miscellania