I graduated with distinction from University College London with an MSc in Machine Learning, supervised by Sebastian Riedel and Tim Rocktäschel. Since October 2017, I have been a Research Scientist at Synerise, where I focus on building effective personalization products and high-performance machine learning algorithms. I am broadly interested in developing and studying machine learning models that can reason about the rich structure of multimodal web-scale data. This includes graph representation learning, recommendation systems, behavioral user representations, and NLP.
I am a winner of recent machine learning competitions:
MSc in Machine Learning, 2016
University College London
MSc in Computer Science, 2015
Warsaw University of Technology
Faculty of Science Summer Scholarship, 2014
The University of Auckland
BSc in Computer Science, 2013
Warsaw University of Technology
We describe our 3rd place solution to the KDD Cup 2021 Open Benchmark Challenge. We tackle the task of academic paper classification within a heterogeneous graph containing paper, author and institution nodes. We present an efficient model based on our previously introduced algorithms: EMDE and Cleora, on top of a simplistic feed-forward neural network.
In this paper we present our 2nd place solution to the Booking.com Data Challenge, which focused on recommending the best next destination of a user's trip, based on a dataset with millions of real anonymized accommodation reservations.
Recently, the Efficient Manifold Density Estimator (EMDE) model has been introduced. The model exploits Locality-Sensitive Hashing and Count-Min Sketch algorithms, combining them with a neural network to achieve state-of-the-art results on multiple recommender datasets. However, this model ingests a compressed joint representation of all input items for each user/session, so calculating attributions for separate items via gradient-based methods does not seem applicable. We prove that interpreting this model in a white-box setting is possible thanks to the properties of the EMDE item retrieval method. By exploiting the multimodal flexibility of this model, we obtain meaningful results showing the influence of multiple modalities (text, categorical features, and images) on movie recommendation output.
Many unsupervised representation learning methods belong to the class of similarity learning models. While various modality-specific approaches exist for different types of data, a core property of many methods is that representations of similar inputs are close under some similarity function. We propose EMDE (Efficient Manifold Density Estimator) - a framework utilizing arbitrary vector representations with the property of local similarity to succinctly represent smooth probability densities on Riemannian manifolds. Our approximate representation has the desirable properties of being fixed-size and having simple additive compositionality, thus being especially amenable to treatment with neural networks - both as input and output format, producing efficient conditional estimators. We generalize and reformulate the problem of multi-modal recommendations as conditional, weighted density estimation on manifolds. Our approach allows for trivial inclusion of multiple interaction types, modalities of data as well as interaction strengths for any recommendation setting. Applying EMDE to both top-k and session-based recommendation settings, we establish new state-of-the-art results on multiple open datasets in both uni-modal and multi-modal settings.
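The fixed-size, additive representation described above can be illustrated with a small sketch. This is not the EMDE implementation: random-hyperplane hashing stands in as one simple locality-sensitive partitioning, and every name here (`make_hyperplanes`, `bucket_id`, `sketch`, `add_sketches`) is hypothetical. Each item embedding is mapped to one region per independent partitioning, and a set of items is represented by the weighted counts per region.

```python
import random

def make_hyperplanes(dim, n_sketches, n_bits, seed=0):
    """Draw n_sketches independent partitionings, each made of n_bits random hyperplanes."""
    rng = random.Random(seed)
    return [[[rng.gauss(0.0, 1.0) for _ in range(dim)]
             for _ in range(n_bits)]
            for _ in range(n_sketches)]

def bucket_id(vector, planes):
    """Index of the region that `vector` falls into for one partitioning."""
    idx = 0
    for bit, plane in enumerate(planes):
        # The sign of the projection onto each hyperplane contributes one bit.
        if sum(p * x for p, x in zip(plane, vector)) > 0:
            idx |= 1 << bit
    return idx

def sketch(vectors, hyperplanes, weights=None):
    """Weighted region counts for a set of vectors; output size is independent of the set size."""
    n_bits = len(hyperplanes[0])
    out = [[0.0] * (2 ** n_bits) for _ in hyperplanes]
    if weights is None:
        weights = [1.0] * len(vectors)
    for vec, w in zip(vectors, weights):
        for s, planes in enumerate(hyperplanes):
            out[s][bucket_id(vec, planes)] += w
    return out

def add_sketches(a, b):
    """Element-wise sum: the sketch of a union of two item sets."""
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]
```

Under these assumptions, nearby embeddings tend to share regions, the sketch of a user's history is the sum of the sketches of its parts, and interaction strengths enter through `weights`.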
Neural language models predict the next token using a latent representation of the immediate token history. Recently, various methods for augmenting neural language models with an attention mechanism over a differentiable memory have been proposed. For predicting the next token, these models query information from a memory of the recent history which can facilitate learning mid- and long-range dependencies. However, conventional attention mechanisms used in memory-augmented neural language models produce a single output vector per time step. This vector is used both for predicting the next token as well as for the key and value of a differentiable memory of a token history. In this paper, we propose a neural language model with a key-value attention mechanism that outputs separate representations for the key and value of a differentiable memory, as well as for encoding the next-word distribution. This model outperforms existing memory-augmented neural language models on two corpora. Yet, we found that our method mainly utilizes a memory of the five most recent output representations. This led to the unexpected main finding that a much simpler model based only on the concatenation of recent output representations from previous time steps is on par with more sophisticated memory-augmented neural language models.
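The idea of separating the key, value, and prediction roles of an output vector can be sketched as follows. This is a simplified illustration rather than the model from the paper: scaled dot-product scoring, the absence of learned projections, and combining by plain concatenation are all assumptions made for brevity, and the function names are hypothetical.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a non-empty list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def split_kvp(h):
    """Split one output vector into equal key, value and predict parts."""
    d = len(h) // 3
    return h[:d], h[d:2 * d], h[2 * d:3 * d]

def key_value_predict_step(outputs, window=5):
    """Attend over the last `window` past steps.

    `outputs` is a list of output vectors, oldest first, with the current
    step last; it must contain at least two vectors.
    """
    cur_key, _, cur_predict = split_kvp(outputs[-1])
    past = [split_kvp(h) for h in outputs[-window - 1:-1]]
    d = len(cur_key)
    # Keys are used only for scoring (dot-product scoring is an assumption).
    scores = [sum(a * b for a, b in zip(k, cur_key)) / math.sqrt(d)
              for k, _, _ in past]
    attn = softmax(scores)
    # Values are used only to build the retrieved context vector.
    context = [sum(w * v[i] for w, (_, v, _) in zip(attn, past))
               for i in range(d)]
    # The predict part of the current step is passed on next to the context.
    return context + cur_predict, attn
```

The returned vector would then feed the next-word softmax layer. The default `window=5` mirrors the finding above that the model mainly uses the five most recent output representations.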
LiveScan3D is a free, open-source system for live 3D data acquisition using multiple Kinect v2 sensors. It allows the user to place any number of sensors in any physical configuration and start gathering data in real time. The freedom to place the sensors in any configuration allows for many acquisition scenarios, such as capturing a single object from many viewpoints or creating 3D panoramas with multiple devices located close to each other. Thanks to the off-the-shelf Kinect v2 sensor, the system is both accurate and inexpensive, making 3D acquisition accessible to more users. In the paper we describe our system and the algorithms it uses, and show its effectiveness in multiple scenarios, including head shape reconstruction and 3D reconstruction of dynamic scenes.
This paper describes a novel system for building morphable 3D head models. In contrast to most previous approaches, which need several seconds to capture each scan, we acquire the data using a matrix of calibrated RGBD cameras, enabling real-time face scanning. We localize the face and its 68 characteristic points on an orthogonal projection image, and use the detected points to align multiple scans. We use a Delaunay triangulation of the 68 characteristic points to obtain dense head shapes with point-to-point correspondence across all 3D head shapes. In the last step we create a morphable model in a way that is similar to the original procedure by Blanz and Vetter. We demonstrate the functionality of our model, created from just five people, in a real-time application. The novelty of this article lies mostly in the method of defining correspondences of the characteristic points in 3D, which leads to a realistic three-dimensional model and blendshapes.