I am a final-year PhD student at the University of Tübingen and MPI Tübingen, studying machine learning and massively parallel computing with applications in computer graphics and computer vision. I review papers for TPAMI, NIPS, CVPR, ICCV, ACCV, SIGGRAPH and SIGGRAPH Asia.

Conference Papers

Flex-Convolution (Deep Learning Beyond Grid-Worlds)
Asian Conference on Computer Vision (ACCV) 2018

Fabian Groh*, Patrick Wieschollek*, Hendrik P.A. Lensch
Traditional convolution layers are specifically designed to exploit the natural data representation of images: a fixed and regular grid. However, unstructured data such as 3D point clouds, which contain irregular neighborhoods, constantly breaks the grid-based data assumption. Therefore, applying best practices and design choices from 2D-image learning methods to point-cloud processing is not readily possible. In this work, we introduce flex-convolution, a natural generalization of the conventional convolution layer, along with an efficient GPU implementation. We demonstrate competitive performance on rather small benchmark sets using fewer parameters and lower memory consumption, and obtain significant improvements on a million-scale real-world dataset. Ours is the first approach that can efficiently process 7 million points concurrently.
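To make the idea of a convolution on irregular neighborhoods concrete, here is a minimal sketch. It is not taken from the abstract above: it assumes, purely for illustration, that the weight applied to a neighbor is an affine function of its position relative to the center point, and it handles a single scalar feature channel on the CPU. The names `flex_conv`, `theta` and `bias` are hypothetical.

```python
def flex_conv(features, positions, neighbors, theta, bias):
    """Single-channel convolution over irregular neighborhoods (illustrative).

    features[i]  : scalar feature of point i
    positions[i] : coordinate of point i (list of floats)
    neighbors[i] : indices of the points in the neighborhood of point i
    theta, bias  : assumed learnable parameters; the weight of neighbor j
                   seen from point i is <theta, p_i - p_j> + bias
    """
    out = []
    for i, nbrs in enumerate(neighbors):
        acc = 0.0
        for j in nbrs:
            # Relative position replaces the fixed grid offset of an image filter.
            rel = [pi - pj for pi, pj in zip(positions[i], positions[j])]
            weight = sum(t * r for t, r in zip(theta, rel)) + bias
            acc += weight * features[j]
        out.append(acc)
    return out


# Toy point cloud with three points and hand-written neighborhoods.
positions = [[0.0, 0.0, 0.0], [1.0, 0.0, 0.0], [0.0, 2.0, 0.0]]
features = [1.0, 2.0, 3.0]
neighbors = [[0, 1, 2], [0, 1], [2]]

summed = flex_conv(features, positions, neighbors, [0.0, 0.0, 0.0], 1.0)
x_sensitive = flex_conv(features, positions, neighbors, [1.0, 0.0, 0.0], 0.0)
```

With `theta` set to zero the layer reduces to a plain sum over each neighborhood; a non-zero `theta` makes the response depend on where each neighbor lies, which is the role fixed filter taps play on a regular grid.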

Separating Reflection and Transmission Images in the Wild
European Conference on Computer Vision (ECCV) 2018

Patrick Wieschollek, Orazio Gallo, Jinwei Gu, Jan Kautz
The reflections caused by common semi-reflectors, such as glass windows, can severely impact the performance of computer vision algorithms. State-of-the-art works can successfully remove reflections on synthetic data and in controlled scenarios. However, they are based on strong assumptions and fail to generalize to real-world images, even when they leverage polarization. We present a deep learning approach to separate the reflected and the transmitted components of the recorded irradiance. Key to our approach is our synthetic data generation, which accurately simulates reflections, including those generated by curved and non-ideal surfaces, and non-static scenes. We extensively validate our method against a number of related works on a new dataset of images captured in the wild.
@inproceedings{eccv2017/Wieschollek, author = {Patrick Wieschollek and Orazio Gallo and Jinwei Gu and Jan Kautz}, title = {Separating Reflection and Transmission Images in the Wild}, booktitle = {European Conference on Computer Vision (ECCV)}, month = {September}, year = {2018} }

Will People Like Your Image?
IEEE Winter Conf. on Applications of Computer Vision (WACV) 2018

Katharina Schwarz, Patrick Wieschollek, Hendrik P.A. Lensch
Rating how aesthetically pleasing an image appears is a highly complex matter and depends on a large number of different visual factors. Previous work has tackled the aesthetic rating problem by ranking on a 1-dimensional rating scale, e.g., incorporating handcrafted attributes. In this paper, we propose a rather general approach to automatically map aesthetic pleasingness with all its complexity into an “aesthetic space” to allow for a highly fine-grained resolution. In detail, making use of deep learning, our method directly learns an encoding of a given image into this high-dimensional feature space resembling visual aesthetics. In addition to the mentioned visual factors, differences in personal judgments have a substantial impact on the likeableness of a photograph. Nowadays, online platforms allow users to “like” or favor particular content with a single click. To incorporate a vast diversity of people, we make use of such multi-user agreements and assemble an extensive data set of 380K images (AROD) with associated meta information, and derive a score to rate how visually pleasing a given photo is. We validate our derived model of aesthetics in a user study. Further, without any extra data labeling or handcrafted features, we achieve state-of-the-art accuracy on the AVA benchmark data set. Finally, as our approach is able to predict the aesthetic quality of any arbitrary image or video, we demonstrate our results on applications for re-sorting photo collections, capturing the best shot on mobile devices, and aesthetic key-frame extraction from videos.
@inproceedings{wacv2018/Schwarz, author = {Katharina Schwarz and Patrick Wieschollek and Hendrik P.A. Lensch}, title = {Will People Like Your Image?}, booktitle = {IEEE Winter Conf. on Applications of Computer Vision (WACV)}, month = {March}, year = {2018} }

Learning Blind Motion Deblurring
IEEE International Conference on Computer Vision (ICCV) 2017

Patrick Wieschollek, Michael Hirsch, Bernhard Schölkopf, Hendrik P.A. Lensch
As handheld video cameras are now commonplace and available in every smartphone, images and videos can be recorded almost everywhere at any time. However, taking a quick shot frequently yields a blurry result due to unwanted camera shake during recording or moving objects in the scene. Removing these artifacts from the blurry recordings is a highly ill-posed problem as neither the sharp image nor the motion blur kernel is known. Propagating information between multiple consecutive blurry observations can help restore the desired sharp image or video. Solutions for blind deconvolution based on neural networks rely on a massive amount of ground-truth data, which is hard to acquire. In this work, we propose an efficient approach to produce a significant amount of realistic training data and introduce a novel recurrent network architecture to deblur frames taking temporal information into account, which can efficiently handle arbitrary spatial and temporal input sizes. We demonstrate the versatility of our approach in a comprehensive comparison on a number of challenging real-world examples.

Interactive comparison to FBA: a single blurry shot from the input sequence, the Fourier-Burst-Accumulation deblurring result (Delbracio et al., CVPR 2015), and the RDN deblurring result (ours, ICCV 2017).
@inproceedings{iccv2017/Wieschollek, author = {Patrick Wieschollek and Michael Hirsch and Bernhard Sch{\"{o}}lkopf and Hendrik P. A. Lensch}, title = {Learning Blind Motion Deblurring}, booktitle = {International Conference on Computer Vision (ICCV)}, month = {October}, year = {2017} }

Learning Robust Video Synchronization without Annotations
IEEE International Conference On Machine Learning And Applications (ICMLA) 2017 [oral presentation]

Patrick Wieschollek, Ido Freeman, Hendrik P.A. Lensch
Aligning video sequences is a fundamental yet still unsolved component of a broad range of applications in computer graphics and vision. Most classical image processing methods cannot be directly applied to related video problems because of the large amount of underlying data and their limitation to small changes in appearance. We present a scalable and robust method for computing a non-linear temporal video alignment. The approach autonomously manages its training data for learning a meaningful representation in an iterative procedure, increasing its own knowledge each time. It leverages the nature of the videos themselves to remove the need for manually created labels. While previous alignment methods similarly consider weather conditions, season and illumination, our approach is able to align videos from data recorded months apart.
@inproceedings{icmla2017/Wieschollek, author = {Patrick Wieschollek and Ido Freeman and Hendrik P. A. Lensch}, title = {Learning Robust Video Synchronization without Annotations}, booktitle = {International Conference On Machine Learning And Applications (ICMLA)}, month = {December}, year = {2017} }

End-to-End Learning for Image Burst Deblurring
Asian Conference on Computer Vision (ACCV) 2016 [oral presentation]

Patrick Wieschollek, Bernhard Schölkopf, Hendrik P.A. Lensch, Michael Hirsch
We present a neural network approach for multi-frame blind deconvolution. The discriminative approach adapts and combines two recent techniques for image deblurring into a single neural network architecture. Our proposed hybrid architecture combines the explicit prediction of a deconvolution filter and non-trivial averaging of Fourier coefficients in the frequency domain. To make full use of the information contained in all images in one burst, the proposed network embeds smaller networks, which explicitly allow the model to transfer information between images in early layers. Our system is trained end-to-end using standard backpropagation on a set of artificially generated training examples, enabling competitive performance in multi-frame blind deconvolution in both quality and runtime.
One random shot out of 5 blurry input images, and the result of our deblurring approach.
@inproceedings{accv2016/Wieschollek, author = {Patrick Wieschollek and Bernhard Sch{\"{o}}lkopf and Hendrik P. A. Lensch and Michael Hirsch}, title = {End-to-End Learning for Image Burst Deblurring}, booktitle = {Asian Conference on Computer Vision (ACCV)}, month = {November}, year = {2016} }
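The "averaging of Fourier coefficients in the frequency domain" mentioned in this abstract can be illustrated with a hand-crafted weighted average in the spirit of Fourier Burst Accumulation. This is only a sketch of that classical baseline idea, not the learned network of the paper: it works on 1-D toy signals with a naive DFT, the exponent `p` is a hypothetical parameter choice, and refinements of the published method are omitted.

```python
import cmath


def dft(signal):
    """Naive discrete Fourier transform of a real or complex sequence."""
    n = len(signal)
    return [sum(signal[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
            for k in range(n)]


def idft(spectrum):
    """Naive inverse discrete Fourier transform."""
    n = len(spectrum)
    return [sum(spectrum[k] * cmath.exp(2j * cmath.pi * k * t / n) for k in range(n)) / n
            for t in range(n)]


def fourier_burst_accumulation(burst, p=11):
    """Fuse a burst by weighting each Fourier coefficient by |F_i(k)|^p.

    Frames whose coefficient at frequency k has a larger magnitude (i.e. was
    attenuated less by blur) dominate the average at that frequency.
    """
    spectra = [dft(frame) for frame in burst]
    fused = []
    for k in range(len(burst[0])):
        mags = [abs(s[k]) ** p for s in spectra]
        total = sum(mags)
        if total == 0.0:
            fused.append(0j)  # no frame carries energy at this frequency
        else:
            fused.append(sum(m / total * s[k] for m, s in zip(mags, spectra)))
    return [x.real for x in idft(fused)]


# Toy burst: one sharp impulse and one circularly blurred copy of it.
sharp = [0.0, 1.0, 0.0, 0.0]
blurred = [0.25, 0.5, 0.25, 0.0]
fused_frame = fourier_burst_accumulation([sharp, blurred])
```

In this toy burst the fused signal stays close to the sharp frame, because the blurred frame contributes little at the frequencies it attenuated.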

Efficient Large-scale Approximate Nearest Neighbor Search on the GPU
IEEE Conference on Computer Vision and Pattern Recognition (CVPR) 2016

Patrick Wieschollek, Oliver Wang, Alexander Sorkine-Hornung, Hendrik P.A. Lensch
We present a new approach for efficient approximate nearest neighbor (ANN) search in high-dimensional spaces, extending the idea of Product Quantization. We propose a two-level product and vector quantization tree that reduces the number of vector comparisons required during tree traversal. Our approach also includes a novel, highly parallelizable re-ranking method for candidate vectors that efficiently reuses already computed intermediate values. Due to its small memory footprint during traversal, the method lends itself to an efficient, parallel GPU implementation. This Product Quantization Tree (PQT) approach significantly outperforms recent state-of-the-art methods for high-dimensional nearest neighbor queries on standard reference datasets. Ours is the first work that demonstrates GPU performance superior to CPU performance on high-dimensional, large-scale ANN problems in time-critical real-world applications, like loop-closing in videos.
@inproceedings{cvpr2016/Wieschollek, author = {Patrick Wieschollek and Oliver Wang and Alexander Sorkine-Hornung and Hendrik P.A. Lensch}, title = {Efficient Large-scale Approximate Nearest Neighbor Search on the GPU}, booktitle = {IEEE Conference on Computer Vision and Pattern Recognition (CVPR)}, month = {June}, year = {2016} }
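For readers unfamiliar with Product Quantization, which this paper extends, the basic idea can be sketched as follows. This is plain PQ with lookup-table distances, not the PQT method itself, and the codebooks are hand-picked toy values rather than centroids learned by k-means. All function names are hypothetical.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two equally long vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def split(vec, num_parts):
    """Cut a vector into num_parts contiguous sub-vectors of equal size."""
    size = len(vec) // num_parts
    return [vec[i * size:(i + 1) * size] for i in range(num_parts)]


def encode(vec, codebooks):
    """Replace each sub-vector by the index of its nearest centroid."""
    parts = split(vec, len(codebooks))
    return [min(range(len(cb)), key=lambda k: sq_dist(p, cb[k]))
            for p, cb in zip(parts, codebooks)]


def distance_tables(query, codebooks):
    """Per sub-space, precompute the distance from the query to every centroid."""
    parts = split(query, len(codebooks))
    return [[sq_dist(p, c) for c in cb] for p, cb in zip(parts, codebooks)]


def approx_dist(code, tables):
    """Approximate query-to-vector distance using table lookups only."""
    return sum(t[k] for t, k in zip(tables, code))


# Toy setup (assumed values): 4-D vectors, 2 sub-spaces, 2 centroids each.
CODEBOOKS = [
    [[0.0, 0.0], [1.0, 1.0]],
    [[0.0, 1.0], [1.0, 0.0]],
]
database = [[0.1, 0.0, 0.9, 0.1], [0.9, 1.0, 0.1, 1.0], [1.0, 0.9, 1.0, 0.0]]
codes = [encode(v, CODEBOOKS) for v in database]

query = [1.0, 1.0, 0.0, 1.0]
tables = distance_tables(query, CODEBOOKS)
best = min(range(len(codes)), key=lambda i: approx_dist(codes[i], tables))
```

Each database vector is stored as a short list of centroid indices, and a query is compared against it with one table lookup per sub-space instead of a full vector comparison.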

Robust and Efficient Kernel Hyperparameter Paths with Guarantees
International Conference on Machine Learning (ICML) 2014

Joachim Giesen, Soeren Laue, Patrick Wieschollek
We present a general framework for computing approximate solution paths for parameterized optimization problems. The framework can be used not only to compute regularization paths but also to compute the entire kernel hyperparameter solution path for support vector machines and robust kernel regression. We prove a combinatorial complexity of the ε-approximate solution path of O(1/ε), which is independent of the number of data points.
@inproceedings{icml2014/Wieschollek, author = {Joachim Giesen and Soeren Laue and Patrick Wieschollek}, title = {Robust and Efficient Kernel Hyperparameter Paths with Guarantees}, booktitle = {Proceedings of the 31st International Conference on Machine Learning, {ICML} 2014, Beijing, China, 21-26 June 2014}, series = {{JMLR} Workshop and Conference Proceedings}, volume = {32}, publisher = {JMLR.org}, year = {2014}, url = {http://jmlr.org/proceedings/papers/v32/} }

Pre-Prints

Backpropagation Training for Fisher Vectors within Neural Networks
pre-print (2016)

Patrick Wieschollek, Fabian Groh, Hendrik P.A. Lensch
Fisher Vectors (FV) encode higher-order statistics of a set of multiple local descriptors like SIFT features. They already show good performance in combination with shallow learning architectures on visual recognition tasks. Current methods using FV as a feature descriptor in deep architectures assume that all original input features are static. We propose a framework to jointly learn the representation of original features, FV parameters and parameters of the classifier in the style of traditional neural networks. Our proof-of-concept implementation improves the performance of FV on the PASCAL VOC 2007 challenge in a multi-GPU setting in comparison to a default SVM setting. We demonstrate that FV can be embedded into neural networks at arbitrary positions, allowing end-to-end training with back-propagation.

Transfer Learning for Material Classification using Convolutional Networks
pre-print (2015)

Patrick Wieschollek, Hendrik P.A. Lensch
Material classification in natural settings is a challenge due to the complex interplay of geometry, reflectance properties, and illumination. Previous work on material classification relies strongly on hand-engineered features of visual samples. In this work, we use a Convolutional Neural Network (convnet) that learns descriptive features for the specific task of material recognition. Specifically, transfer learning from the task of object recognition is exploited to more effectively train good features for material classification. The approach of transfer learning using convnets yields significantly higher recognition rates when compared to previous state-of-the-art approaches. We then analyze the relative contribution of reflectance and shading information by a decomposition of the image into its intrinsic components. The use of convnets for material classification was hindered by the strong demand for sufficient and diverse training data, even with transfer learning approaches. Therefore, we present a new data set containing approximately 10k images divided into 10 material categories.