# VISAPP 2017 Abstracts

## Area 1 - Image Formation and Preprocessing

Full Papers
Paper Nr: 38
Title:

### Specialization of a Generic Pedestrian Detector to a Specific Traffic Scene by the Sequential Monte-Carlo Filter and the Faster R-CNN

Authors:

#### Ala Mhalla, Thierry Chateau, Sami Gazzah and Najoua Essoukri Ben Amara

Abstract: The performance of a generic pedestrian detector decreases significantly when it is applied to a specific scene, due to the large variation between the source dataset used to train the generic detector and the samples in the target scene. In this paper, we suggest a new approach to automatically specialize a scene-specific pedestrian detector, starting from a generic detector for video surveillance, without manually labeling any further samples, under a novel transfer learning framework. The main idea is to consider a deep detector as a function that generates realizations from the probability distribution of the pedestrians to be detected in the target scene. Our contribution is to approximate this target probability distribution with a set of samples and an associated specialized deep detector, estimated in a sequential Monte Carlo filter framework. The effectiveness of the proposed framework is demonstrated through experiments on two public surveillance datasets. Compared with a generic pedestrian detector and state-of-the-art methods, our proposed framework presents encouraging results.

Paper Nr: 49
Title:

### Fast Scalable Coding based on a 3D Low Bit Rate Fractal Video Encoder

Authors:

#### Vitor de Lima, Thierry Moreira, Helio Pedrini and William Robson Schwartz

Abstract: Video transmissions usually occur at a fixed or at a small number of predefined bit rates. This can lead to several problems in communication channels whose bandwidth can vary along time (e.g. wireless devices). This work proposes a video encoding method for solving such problems through a fine rate control that can be dynamically adjusted with low overhead. The encoder uses fractal compression and a simple rate distortion heuristic to preprocess the content in order to speed up the process of switching between different bit rates. Experimental results show that the proposed approach can accurately transcode a preprocessed video sequence into a large range of bit rates with a small computational overhead.

Paper Nr: 62
Title:

### A Robust Chessboard Detector for Geometric Camera Calibration

Authors:

#### Mathis Hoffmann, Andreas Ernst, Tobias Bergen, Sebastian Hettenkofer and Jens-Uwe Garbas

Abstract: We introduce an algorithm that detects chessboard patterns in images precisely and robustly for application in camera calibration. Because of the low requirements on the calibration images, our solution is particularly suited for endoscopic camera calibration. It successfully copes with strong lens distortions, partially occluded patterns, image blur, and image noise. Our detector initially uses a sparse sampling method to find some connected squares of the chessboard pattern in the image. A pattern-growing strategy iteratively locates adjacent chessboard corners with a region-based corner detector. The corner detector examines entire image regions with the help of the integral image to handle poor image quality. We show that it outperforms recent solutions in terms of detection rates and performs at least equally well in terms of accuracy.
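The region-based corner detector above relies on the integral image to examine entire image regions cheaply. As a generic illustration (not the authors' code), an integral image turns any rectangular sum into four array lookups:

```python
import numpy as np

def integral_image(img):
    """Integral image with a padded zero row/column, so that any
    rectangular sum can be read off with four lookups."""
    ii = np.zeros((img.shape[0] + 1, img.shape[1] + 1))
    ii[1:, 1:] = img.cumsum(axis=0).cumsum(axis=1)
    return ii

def region_sum(ii, r0, c0, r1, c1):
    """Sum of img[r0:r1, c0:c1] in O(1), independent of region size."""
    return ii[r1, c1] - ii[r0, c1] - ii[r1, c0] + ii[r0, c0]
```

This constant-time lookup is what makes it affordable to score many candidate regions even in blurred or noisy images.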

Paper Nr: 75
Title:

### Hierarchical Techniques to Improve Hybrid Point Cloud Registration

Authors:

#### Ferran Roure, Xavier Lladó, Joaquim Salvi, Tomislav Pribanić and Yago Diez

Abstract: Reconstructing 3D objects by gathering information from multiple spatial viewpoints is a fundamental problem in a variety of applications, ranging from heritage reconstruction to industrial image processing. A central issue is the "point set registration" (or matching) problem, where the two sets being considered are to be rigidly aligned. This is a complex problem with a huge search space that suffers from high computational costs or requires expensive and bulky hardware to be added to the scanning system. To address these issues, a hybrid hardware-software approach was presented in (Pribanic et al., 2016), allowing fast software registration by using commonly available (smartphone) sensors. In this paper we present hierarchical techniques to improve the performance of this algorithm. Additionally, we compare the performance of our algorithm against other approaches. Experimental results using real data show how the presented algorithm greatly improves the running time of the previous algorithm and performs best among all studied algorithms.

Paper Nr: 81
Title:

### Development of Real-time HDTV-to-8K TV Upconverter

Authors:

#### Seiichi Gohshi, Shinichiro Nakamura and Hiroyuki Tabata

Abstract: Recent reports show that 4K and 8K TV systems are expected to replace HDTV in the near future. 4K TV broadcasting has begun commercially, and 8K TV broadcasting is projected to begin by 2018. However, the availability of content for 8K TV is still insufficient, a situation similar to that of HDTV in the 1990s. Upconverting analogue content to HDTV content was important then to supplement the insufficient HDTV content. This upconverted content was also important for news coverage, as HDTV equipment was heavy and bulky. The current situation for 4K and 8K TV is similar, in that covering news with 8K TV equipment is very difficult because this equipment is much heavier and bulkier than that required for HDTV in the 1990s. Sufficient HDTV content is now available, and the equipment has also evolved to facilitate news coverage; therefore, an HDTV-to-8K TV upconverter can be a solution to the problems described above. However, upconversion from interlaced HDTV to 8K TV results in an enlargement of the images by a factor of 32, thus making the upconverted images very blurry. In this study, we propose an upconverter with super resolution to fix this issue.

Paper Nr: 119
Title:

### Fast Intra Prediction Algorithm with Enhanced Sampling Decision for H.265/HEVC

Authors:

#### Sio-Kei Im, Mohammad Mahdi Ghandi and Ka-Hou Chan

Abstract: H.265/HEVC is the latest video coding standard, which offers superior compression performance over H.264/AVC at the cost of greater complexity in its encoding process. In the intra coding of HEVC, a Coding Unit (CU) is recursively divided into a quad-tree-based structure from the Largest Coding Unit (LCU). At each level, up to 35 potential intra modes must be checked. However, examining all these modes is very time-consuming. In this paper, an intra mode decision algorithm is proposed that reduces the required computations while having a negligible effect on Rate-Distortion (RD) performance. A rough mode decision method based on image component sampling is proposed to reduce the number of candidate modes for rough mode decision and RD optimization. To balance quality and performance, the decision to reduce the full search is made with a threshold that is dynamically updated based on the Quantization Parameter (QP) and CU size of each recursive step. Experiments show that our algorithm can achieve a reasonable trade-off between encoding quality and efficiency. The saving in encoding time is between 30.0% and 45.0%, while the BD-RATE may increase by up to 0.5%, for the H.265/HEVC reference software HM 16.9 under the all-intra configuration.

Paper Nr: 120
Title:

### Pushing the Limits for View Prediction in Video Coding

Authors:

#### Jens Ogniewski and Per-Erik Forssén

Abstract: More and more devices have depth sensors, making RGB+D video data increasingly common. Depth images have also been considered for 3D and free-viewpoint video coding. This depth data can be used to render a given scene from different viewpoints, thus making it a useful asset in e.g. view prediction for video coding. In this paper we evaluate a multitude of algorithms for scattered data interpolation, in order to optimize the performance of frame prediction for video coding. Our evaluation uses the depth extension of the Sintel datasets. Using ground-truth sequences is crucial for such an optimization, as it ensures that all errors and artifacts are caused by the prediction itself rather than noisy or erroneous data. We also present a comparison with the commonly used mesh-based projection.
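The core operation behind depth-based view prediction is forward warping: backproject a pixel with its depth, apply the relative camera motion, and reproject. A minimal pinhole-model sketch (purely illustrative; the intrinsics and pose in the example are made up):

```python
import numpy as np

def reproject(u, v, depth, K, R, t):
    """Warp pixel (u, v) with known depth from a source view into a
    target view: backproject, apply the rigid motion (R, t), reproject."""
    p = np.linalg.inv(K) @ np.array([u, v, 1.0]) * depth  # 3D point in source frame
    q = K @ (R @ p + t)                                   # homogeneous target pixel
    return q[:2] / q[2]
```

Scattered data interpolation, the topic of the paper, is then needed because the warped pixels land at non-integer positions in the target image.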

Paper Nr: 125
Title:

### Dehazing using Non-local Regularization with Iso-depth Neighbor-Fields

Authors:

#### Incheol Kim and Min H. Kim

Abstract: Removing haze from a single image is a severely ill-posed problem due to the lack of scene information. General dehazing algorithms estimate the airlight initially using natural image statistics and then propagate the incompletely estimated airlight to build a dense transmission map, yielding a haze-free image. Propagating haze is different from other regularization problems, as haze is strongly correlated with depth according to the physics of light transport in participating media. However, since there is no depth information available in single-image dehazing, traditional regularization methods with a common grid random field often suffer from haze isolation artifacts caused by abrupt changes in scene depth. In this paper, to overcome the haze isolation problem, we propose a non-local regularization method that combines Markov random fields (MRFs) with nearest-neighbor fields (NNFs), based on our insightful observation that the NNFs searched in a hazy image associate patches at similar depths, as local haze in the atmosphere is proportional to its depth. We validate that the proposed method can regularize haze effectively to restore a variety of natural landscape images, as demonstrated in the results. The proposed regularization method can be used separately with any other dehazing algorithm to enhance haze regularization.

Paper Nr: 134
Title:

### Combining Different Reconstruction Kernel Responses as Preprocessing Step for Airway Tree Extraction in CT Scan

Authors:

#### Samah Bouzidi, Fabien Baldacci, Chokri ben Amar and Pascal Desbarats

Abstract: In this paper, we propose a new preprocessing procedure that combines the responses of different Computed Tomography (CT) reconstruction kernels in order to improve the segmentation of the airway tree. These filters are available in all commercial CT scanners. A broad range of preprocessing techniques has been proposed, but all of them operate on images reconstructed using a single reconstruction filter. In this work, the new preprocessing approach is based on a fusion of images reconstructed using different reconstruction kernels and can be included as a preprocessing stage in any segmentation pipeline. Our approach has been applied to various CT scans, and an experimental comparison of state-of-the-art segmentation approaches on processed and unprocessed data has been carried out. Results show that the fusion process improves segmentation results and removes false positives.

Paper Nr: 186
Title:

### Specularity, Shadow, and Occlusion Removal for Planar Objects in Stereo Case

Authors:

#### Irina Nurutdinova, Ronny Hänsch, Vincent Mühler, Stavroula Bourou, Alexandra I. Papadaki and Olaf Hellwich

Abstract: Specularities, shadows, and occlusions are phenomena that commonly occur in images and cause a loss of information. This paper addresses the task to detect and remove all these phenomena simultaneously in order to obtain a corrected image with all information visible and recognizable. The proposed (semi-)automatic algorithm utilizes two input images that depict a planar object. The images can be acquired without special equipment (such as flash systems) or restrictions on the spatial camera layout. Experiments were performed for various combinations of objects, phenomena occurring, and capturing conditions. The algorithm perfectly detects and removes specularities in all examined cases. Shadows and occlusions are satisfactorily detected and removed with minimal user intervention in the majority of the performed experiments.

Paper Nr: 200
Title:

### CUDA Accelerated Visual Egomotion Estimation for Robotic Navigation

Authors:

#### Safa Ouerghi, Remi Boutteau, Xavier Savatier and Fethi Tlili

Abstract: Egomotion estimation is a fundamental issue in structure from motion and autonomous navigation for mobile robots. Several camera motion estimation methods from sets of a variable number of image correspondences have been proposed. Five-point methods, which use the minimal number of correspondences required to estimate the essential matrix, have raised special interest for their application in a hypothesize-and-test framework. This approach allows relative pose recovery at the expense of a much higher computational time when dealing with higher ratios of outliers. To solve this problem with a certain amount of speedup, we propose in this work a CUDA-based solution for essential matrix estimation, performed using the Gröbner basis version of the 5-point algorithm and complemented with robust estimation. The hardware-specific implementation considerations as well as the parallelization methods employed are described in detail. A performance analysis against an existing CPU implementation is also given, showing a speedup of 4 over the CPU for an outlier ratio e = 0.5, which is common for essential matrix estimation from automatically computed point correspondences. Even larger speedups were observed when dealing with higher outlier ratios.
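The appeal of minimal (5-point) solvers in a hypothesize-and-test framework comes from the textbook RANSAC iteration count N = log(1-p) / log(1-(1-e)^s), which grows quickly with the sample size s. A small illustration of that formula (not the authors' implementation):

```python
import math

def ransac_iterations(p, e, s):
    """Number of RANSAC samples needed to draw at least one all-inlier
    minimal sample of size s with confidence p, at outlier ratio e."""
    return math.ceil(math.log(1 - p) / math.log(1 - (1 - e) ** s))
```

At e = 0.5 (the outlier ratio quoted above) and 99% confidence, a 5-point solver needs 146 samples while an 8-point solver would need 1177, which is why minimal solvers pair so well with a GPU evaluating one hypothesis per thread.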

Paper Nr: 213
Title:

### Automatic Separation of Basal Cell Carcinoma from Benign Lesions in Dermoscopy Images with Border Thresholding Techniques

Authors:

#### Nabin K. Mishra, Ravneet Kaur, Reda Kasmi, Serkan Kefel, Pelin Guvenc, Justin G. Cole, Jason R. Hagerty, Hemanth Y. Aradhyula, Robert LeAnder, R. Joe Stanley, Randy H. Moss and William V. Stoecker

Abstract: Basal cell carcinoma (BCC), with an incidence in the US exceeding 2.7 million cases/year, exacts a significant toll in morbidity and financial costs. Earlier BCC detection via automatic analysis of dermoscopy images could reduce the need for advanced surgery. In this paper, automatic diagnostic algorithms are applied to images segmented by five thresholding segmentation routines. Experimental results for five new thresholding routines are compared to expert-determined borders. Logistic regression analysis shows that thresholding segmentation techniques yield diagnostic accuracy that is comparable to that obtained with manual borders. The experimental results obtained with algorithms applied to automatically segmented lesions demonstrate significant potential for the new machine vision techniques.

Paper Nr: 217
Title:

### Global Patch Search Boosts Video Denoising

Authors:

#### Thibaud Ehret, Pablo Arias and Jean-Michel Morel

Abstract: With the increasing popularity of mobile imaging devices and the emergence of HDR video surveillance, the need for fast and accurate denoising algorithms has also increased. Patch-based methods, which are currently state-of-the-art in image and video denoising, search for similar patches in the signal. This search is generally performed locally around each target patch, for obvious complexity reasons. We propose here a new and efficient approximate patch search algorithm. It makes it possible, for the first time, to evaluate the impact of a global search on video denoising performance. A global search is particularly justified in video denoising, where strong temporal redundancy is often available. We first verify that the patches found by our new approximate search are far more concentrated than those obtained by exact local search, and are obtained in comparable time. To demonstrate the potential of the global search in video denoising, we take two patch-based image denoising algorithms and apply them to video. While their performance is poor with a classical local search, with the proposed global search they even improve on the latest state-of-the-art video denoising methods.
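For reference, the exhaustive global search that the paper approximates can be written directly: score every patch in every frame against the reference patch. A naive O(T·H·W) sketch (illustrative only; the paper's contribution is a fast approximation of this):

```python
import numpy as np

def best_global_match(video, t, i, j, p=8):
    """Exhaustive global search for the patch most similar (in SSD) to
    the p-by-p patch at (t, i, j) of a grayscale video volume (T, H, W).
    Returns (distance, (frame, row, col)) of the best non-reference patch."""
    ref = video[t, i:i + p, j:j + p]
    T, H, W = video.shape
    best = (np.inf, None)
    for tt in range(T):
        for ii in range(H - p + 1):
            for jj in range(W - p + 1):
                if (tt, ii, jj) == (t, i, j):
                    continue  # skip the reference patch itself
                d = np.sum((video[tt, ii:ii + p, jj:jj + p] - ref) ** 2)
                if d < best[0]:
                    best = (d, (tt, ii, jj))
    return best
```

When a frame recurs later in the sequence (strong temporal redundancy), the global search finds an exact repeat that a local window would miss.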

Paper Nr: 235
Title:

### Color Edge Detection using Quaternion Convolution and Vector Gradient

Authors:

#### Nadia BenYoussef and Aicha Bouzid

Abstract: In this paper, a quaternion-based method is proposed for color image edge detection. A pair of quaternion masks is used for horizontal and vertical filtering, since quaternion convolution is not commutative. The detection procedure consists of two steps: quaternion convolution for edge detection, and a vector gradient to enhance edge structures. Experimental results demonstrate its capabilities on natural color images.

Short Papers
Paper Nr: 57
Title:

### Denoising of Noisy and Compressed Video Sequences

Authors:

#### A. Buades and J. L. Lisani

Abstract: A novel denoising algorithm is presented for video sequences. The proposed approach takes advantage of the self-similarity and redundancy of adjacent frames. The algorithm automatically estimates a signal-dependent noise model for each level of a multi-scale pyramid. A variance stabilization transform is applied at each scale, and a novel sequence denoising algorithm is used. Experiments show that the new algorithm is able to correctly remove highly correlated noise from dark and compressed movie sequences. In particular, we illustrate the performance on indoor and low-light scenes acquired with mobile phones.
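A classic example of a variance stabilization transform of the kind mentioned above is the Anscombe transform for Poisson (signal-dependent) noise: after the transform, the noise variance is approximately 1 regardless of signal level. Shown as a generic illustration only; the paper estimates its own noise model per pyramid scale:

```python
import numpy as np

def anscombe(x):
    """Anscombe transform: maps Poisson-distributed data to data with
    approximately unit variance (accurate for intensities above ~4)."""
    return 2.0 * np.sqrt(np.asarray(x, dtype=float) + 3.0 / 8.0)

def inverse_anscombe(y):
    """Algebraic inverse (the unbiased inverse differs slightly)."""
    return (y / 2.0) ** 2 - 3.0 / 8.0
```

After stabilization, an off-the-shelf Gaussian-noise denoiser can be applied at each scale, then the inverse transform maps the result back.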

Paper Nr: 88
Title:

### A Novel 2.5D Feature Descriptor Compensating for Depth Rotation

Authors:

#### Frederik Hagelskjær, Norbert Krüger and Anders Glent Buch

Abstract: We introduce a novel type of local image descriptor based on Gabor filter responses. Our method operates on RGB-D images. We use the depth information to compensate for perspective distortions caused by out-of-plane rotations. The descriptor contains the responses of a multi-resolution Gabor bank. Contrary to existing methods that rely on a dominant orientation estimate to achieve rotation invariance, we utilize the orientation information in the Gabor bank to achieve rotation invariance during the matching stage. Compared to SIFT and to a recent descriptor for RGB-D data that also compensates for projective distortions, our method achieves a significant increase in accuracy when tested on a wide-baseline RGB-D matching dataset.

Paper Nr: 102
Title:

### Image Resolution Enhancement based on Curvelet Transform

Authors:

Abstract: We present an image resolution enhancement method based on the Curvelet transform. This transform is used to decompose the input image into different subbands. After this decomposition, a nonlinear function is applied to the Curvelet coefficients in order to enhance the content of the different frequency subbands. These enhanced frequency subbands are then interpolated. We improve the enhancement results by fusing the obtained data with the interpolated input image. An image database is used for the experiments. The visual results show the superiority of the proposed technique compared to two state-of-the-art image resolution enhancement techniques. These results are confirmed by quantitative image quality metrics.

Paper Nr: 109
Title:

### Edge based Blind Single Image Deblurring with Sparse Priors

Authors:

#### Khouloud Guemri, Fadoua Drira, Rim Walha, Adel M. Alimi and Frank LeBourgeois

Abstract: Blind image deblurring is the estimation of the blur kernel and the latent sharp image from a blurry image. This makes it a significantly ill-posed problem, with various investigations looking for adequate solutions. Recent approaches have turned to image priors to improve final results, and some of the most interesting results are based on data priors. This has been the starting point for the proposed blind image deblurring system. In particular, this study explores the potential of sparse representation, widely known for its efficiency in several reconstruction tasks. In fact, we propose a sparse-representation-based iterative deblurring method that exploits sparse constraints on edge-based image patches. This process includes the K-SVD algorithm, useful for the dictionary definition. Our main contributions are (1) the application of a shock filter as a pre-processing step, followed by filter sub-band applications for effective contour detection, (2) the use of online training datasets with elementary patterns to describe edge-based information, and (3) the recourse to adaptive dictionary training. The experimental study illustrates promising results of the proposed deblurring method compared to well-known state-of-the-art methods.

Paper Nr: 146
Title:

### Nuclei Segmentation using a Level Set Active Contour Method and Spatial Fuzzy C-means Clustering

Authors:

#### Ravali Edulapuram, R. Joe Stanley, Rodney Long, Sameer Antani, George Thoma, Rosemary Zuna, William V. Stoecker and Jason Hagerty

Abstract: Digitized histology images are analyzed by expert pathologists in one of several approaches to assess pre-cervical cancer conditions such as cervical intraepithelial neoplasia (CIN). Many image analysis studies focus on the detection of nuclei features to classify the epithelium into the CIN grades. The current study focuses on nuclei segmentation based on level set active contour segmentation and fuzzy c-means clustering methods. Logical operations applied to morphological post-processing operations are used to smooth the image and to remove non-nuclei objects. On a 71-image dataset of digitized histology images (where the ground truth is the epithelial mask, which helps in eliminating the non-epithelial regions), the algorithm achieved an overall nuclei segmentation accuracy of 96.47%. We propose a simplified fuzzy spatial cost function that may be generally applicable to any n-class clustering problem of spatially distributed objects.
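Standard fuzzy c-means, the base of the clustering step above, alternates a membership update u_ik = 1 / Σ_j (d_ik/d_jk)^(2/(m-1)) with a membership-weighted mean update of the centers. A minimal NumPy sketch on 1-D data, without the spatial term the paper adds:

```python
import numpy as np

def fuzzy_cmeans_1d(x, centers, m=2.0, iters=50):
    """Plain fuzzy c-means on 1-D data. `centers` is the initial guess;
    memberships u have shape (clusters, points) and sum to 1 per point."""
    centers = np.asarray(centers, dtype=float)
    for _ in range(iters):
        d = np.abs(x[None, :] - centers[:, None]) + 1e-12      # (c, n) distances
        # u[i, k] = 1 / sum_j (d[i, k] / d[j, k]) ** (2 / (m - 1))
        u = 1.0 / ((d[:, None, :] / d[None, :, :]) ** (2.0 / (m - 1.0))).sum(axis=1)
        um = u ** m
        centers = (um * x[None, :]).sum(axis=1) / um.sum(axis=1)
    return centers, u
```

The spatial variant used in the paper additionally mixes each pixel's membership with those of its neighbors before the center update.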

Paper Nr: 163
Title:

### 3D Video Multiple Description Coding Considering Region of Interest

Authors:

#### Ehsan Rahimi and Chris Joslin

Abstract: 3D video is becoming increasingly popular and is attracting researchers' attention to provide robust streaming methods, since packet failure has always been an inseparable characteristic of wired and wireless networks. This paper aims to provide a new multiple description coding (MDC) scheme for 3D video that takes into account the objects present in the scene. To this end, a low-complexity algorithm for identifying objects in the 3D scene is provided, and then a non-identical decimation method with respect to those objects is used to produce the descriptions of the MDC approach. Also, for the depth map image, a new non-identical MDC algorithm is introduced to stream the depth map while saving bandwidth without affecting the quality of the decoded video on the receiver side.

Paper Nr: 168
Title:

### Application of LSD-SLAM for Visualization Temperature in Wide-area Environment

Authors:

#### Masahiro Yamaguchi, Hideo Saito and Shoji Yachida

Abstract: In this paper, we propose a method to generate a three-dimensional (3D) thermal map by overlaying thermal images onto a 3D surface reconstructed by a monocular RGB camera. In this method, we capture the target scene while moving an RGB camera and a thermal camera mounted on the same rig. From the RGB image sequence, we reconstruct the 3D structure of the scene using Large-Scale Direct Monocular Simultaneous Localization and Mapping (LSD-SLAM), onto which the temperature distribution captured by the thermal camera is overlaid, thus generating a 3D thermal map. The geometrical relationship between the cameras is calibrated beforehand using a calibration board that can be detected by both cameras. Since we do not use depth cameras such as the Kinect, the depth of the target scene is not limited by the measurement range of a depth camera; any depth range can be captured. To demonstrate this technique, we show synthesized 3D thermal maps for both indoor and outdoor scenes.

Paper Nr: 172
Title:

### Single Image Dehazing based on Dark Channel Prior with Different Atmospheric Light

Authors:

#### Sheng Zhang and Wencang Bai

Abstract: Single image dehazing based on the dark channel prior can recover a high-quality haze-free image from a non-sky image. However, it does not perform well in bright regions such as the sky. This paper proposes a novel method for single image dehazing which jointly considers the atmospheric light of sky regions and of the land surface. In this proposal, we divide an image containing sky regions into a bright image (such as sky regions and artificial lights) and a dark image (such as natural outdoor scenery and buildings) according to the image saturation, the intensity of pixels, and Rayleigh scattering theory. In the recovery process, the bright image and the dark image are recovered separately with different atmospheric light parameters. The experimental results show that the proposed scheme can obtain a high-quality haze-free image for images that contain sky.
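The dark channel prior itself is simple to state: a per-pixel minimum over color channels followed by a local minimum filter; haze-free outdoor patches have dark channels near zero, while haze lifts them. A small sketch of just that statistic (illustrative; the full pipeline with transmission estimation and recovery is not shown):

```python
import numpy as np

def dark_channel(img, patch=3):
    """Dark channel of an H x W x 3 image in [0, 1]: channel-wise
    minimum, then a patch x patch minimum filter (naive loops for clarity)."""
    mins = img.min(axis=2)
    r = patch // 2
    padded = np.pad(mins, r, mode='edge')
    out = np.empty_like(mins)
    for i in range(mins.shape[0]):
        for j in range(mins.shape[1]):
            out[i, j] = padded[i:i + patch, j:j + patch].min()
    return out
```

The atmospheric light is then typically estimated from the brightest pixels of the dark channel; the paper's point is that a single such estimate fails where the prior breaks down, as in sky regions.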

Paper Nr: 183
Title:

### Automatic Calibration of the Optical System in Passive Component Inspection

Authors:

#### Sungho Suh and Moonjoo Kim

Abstract: A passive component inspection machine obtains an image of a passive component using specific lighting and cameras, and detects defects in the image of the component. It inspects all aspects of the component based on images captured using the lightings and cameras. The number of lightings and cameras is proportional to the number of component aspects. To detect the defects of the component effectively, the difference in image quality between cameras should be minimized. Even if the light conditions are calibrated automatically, the average intensities of the images differ because of the influence of the Bayer filter used in the CCD cameras of the inspection machine. Moreover, there is a further problem: the range of light intensity cannot cover the range of component reflectance. Sometimes the gain value and white balance ratios of a camera need to be calibrated manually. To solve these problems, we propose an automatic calibration method for the optical system in a passive component inspection machine. The proposed method minimizes the influence of the Bayer filter, does not use any initial camera calibration, and finds the optimal values for the overall gain and the white balance ratios of the red, green, and blue colors automatically. To reduce the influence of the Bayer filter, we iteratively find the optimal color balance ratios and formulate a relation between the overall gain and the white balance ratios to control all parameters automatically. The proposed method is simple, and the experimental results show that it is faster and more precise than the previous method.
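As a much-simplified stand-in for the gain and white-balance search described above, the classic gray-world correction computes per-channel gains that equalize the channel means. This is purely illustrative; the paper's method instead iterates against the Bayer-filter response of the actual cameras:

```python
import numpy as np

def gray_world_gains(img):
    """Per-channel gains under the gray-world assumption: after applying
    them, the R, G and B channel means of an H x W x 3 image are equal."""
    means = img.reshape(-1, 3).mean(axis=0)
    return means.mean() / means

def apply_gains(img, gains):
    return img * gains  # broadcasts over the last (channel) axis
```

Equalizing channel statistics like this is the basic mechanism that makes images from different cameras directly comparable for defect detection.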

Paper Nr: 216
Title:

### Rolling Shutter Camera Synchronization with Sub-millisecond Accuracy

Authors:

#### Matěj Šmíd and Jiri Matas

Abstract: A simple method for synchronization of video streams with a precision better than one millisecond is proposed. The method is applicable to any number of rolling shutter cameras and when a few photographic flashes or other abrupt lighting changes are present in the video. The approach exploits the rolling shutter sensor property that every sensor row starts its exposure with a small delay after the onset of the previous row. The cameras may have different frame rates and resolutions, and need not have overlapping fields of view. The method was validated on five minutes of four streams from an ice hockey match. The found transformation maps events visible in all cameras to a reference time with a standard deviation of the temporal error in the range of 0.3 to 0.5 milliseconds. The quality of the synchronization is demonstrated on temporally and spatially overlapping images of a fast moving puck observed in two cameras.
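The sub-millisecond resolution comes from the rolling-shutter row delay: each sensor row starts exposing a fixed small delay after the previous one, so the first row in which a flash is visible timestamps the event far more finely than the frame period. A toy linear timing model (parameter names are illustrative, not from the paper):

```python
def row_time(frame_index, row, fps, num_rows, readout_fraction=1.0):
    """Start-of-exposure time of a sensor row under a linear
    rolling-shutter model: rows are read out evenly across the fraction
    of the frame period spent reading the sensor."""
    frame_period = 1.0 / fps
    row_delay = readout_fraction * frame_period / num_rows
    return frame_index * frame_period + row * row_delay
```

At 25 fps and 1080 rows the row delay is about 37 microseconds, so localizing a flash onset to within a few rows already gives well under a millisecond of timing error.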

Paper Nr: 230
Title:

### Action Recognition using the Rf Transform on Optical Flow Images

Authors:

#### Josep Maria Carmona and Joan Climent

Abstract: The objective of this paper is the automatic recognition of human actions in video sequences. The use of spatio-temporal features for action recognition has become very popular in recent literature. Instead of extracting the spatio-temporal features from the raw video sequence, some authors propose to project the sequence to a single template first. As a contribution, we propose the use of several variants of the R transform for projecting the image sequences to templates. The R transform projects the whole sequence to a single image, retaining information concerning movement direction and magnitude. Spatio-temporal features are extracted from the template, combined using a bag-of-words paradigm, and finally fed to an SVM for action classification. The method presented is shown to improve on state-of-the-art results on the standard Weizmann action dataset.

Paper Nr: 256
Title:

### Image Super Resolution from Alignment Errors of Image Sensors and Spatial Light Modulators

Authors:

#### Masaki Hashimoto, Fumihiko Sakaue and Jun Sato

Abstract: In this paper, we propose a novel method for obtaining super resolution images by using the alignment errors between an image sensor and a spatial light modulator, such as an LCoS device, in coded imaging systems. Recently, coded imaging systems are often used for obtaining high dynamic range (HDR) images and for removing depth and motion blur. To obtain accurate HDR images and unblurred images, it is very important to set up the spatial light modulators with the cameras accurately, so that one-to-one correspondences hold between light modulator pixels and camera image pixels. However, accurate alignment of the light modulator and the image sensor is very difficult in reality. In this paper, we do not adjust the light modulators and image sensors accurately. Instead, we use the alignment errors between the light modulators and the image sensors to obtain high resolution images from low resolution observations on the image sensors.

Paper Nr: 270
Title:

### Segmentation of the LV Wall with Trabeculations

Authors:

#### Clément Beitone, Christophe Tilmant and Frédéric Chausse

Abstract: The evaluation of cardiac functional parameters for heart disease diagnosis requires an accurate segmentation result. We propose a method to efficiently and reliably segment both the endocardial and the epicardial borders of the left ventricle. We use MR short axis images acquired in SSFP mode. Our framework combines a threshold-based approach, which produces an estimate of the shape of the cardiac wall, with a level set approach that refines it. We assessed our method on two databases built for two MICCAI challenges. Our results would have positioned us at third place in the 2009 challenges.

Paper Nr: 274
Title:

### Medical Image Processing in the Age of Deep Learning - Is There Still Room for Conventional Medical Image Processing Techniques?

Authors:

#### Jason Hagerty, R. Joe Stanley and William V. Stoecker

Abstract: Deep learning, in particular convolutional neural networks, has increasingly been applied to medical images. Advances in hardware coupled with the availability of increasingly large datasets have fueled this rise. Results have shattered expectations. But it would be premature to cast aside conventional machine learning and image processing techniques. All that deep learning offers comes at a cost: the need for very large datasets. We discuss the role of conventional, manually tuned features combined with deep learning. This process of fusing conventional image processing techniques with deep learning can yield results that are superior to those obtained by either learning method in isolation. In this article, we review the rise of deep learning in medical imaging and the recent onset of fusion of learning methods. We discuss the supervision equilibrium point and the factors that favor the role of fusion methods for histopathology and quasi-histopathology modalities.

Posters
Paper Nr: 8
Title:

### Oil Portrait Snapshot Classification on Mobile

Authors:

#### Yan Sun and Xiaomu Niu

Abstract: In recent years, several art museums have developed smartphone applications as e-guides. However, few of them provide instant retrieval and identification for a painting snapshot taken with a mobile phone. Therefore, in this work we design and implement an oil portrait classification application for smartphones. Recognition accuracy suffers greatly from aberration, blur, geometric deformation and shrinking due to the unprofessional quality of the snapshots. Low-megapixel phone cameras are another factor degrading classification performance. After carefully studying the nature of such photos, we adopt the SIPH algorithm (Scale-Invariant feature transform based Image Perceptual Hashing) to extract image features and generate image information digests. Instead of the popular conventional Hamming method, we apply an effective method to calculate the perceptual distance. Test results show that the proposed method achieves satisfactory robustness and discriminability in portrait snapshot identification and feature indexing.

Paper Nr: 143
Title:

### Real-world Pill Segmentation based on Superpixel Merge using Region Adjacency Graph

Authors:

#### Sudhir Sornapudi, R. Joe Stanley, Jason Hagerty and William V. Stoecker

Abstract: Misidentified or unidentified prescription pills are an increasing challenge for all caregivers, both families and professionals. Errors in pill identification may lead to serious or fatal adverse events. To respond to this challenge, a fast and reliable automated pill identification technique is needed. The first and most critical step in pill identification is segmentation of the pill from the background. The goals of segmentation are to eliminate both false detection of background area and false omission of pill area. Introduction of either type of error can cause errors in color or shape analysis and can lead to pill misidentification. The real-world consumer images used in this research provide significant segmentation challenges due to varied backgrounds and lighting conditions. This paper proposes a color image segmentation algorithm that generates superpixels using the Simple Linear Iterative Clustering (SLIC) algorithm and merges the superpixels by thresholding the region adjacency graphs. Post-processing steps are applied to produce an accurate pill segmentation. The segmentation accuracy is evaluated by comparing the consumer-quality pill image segmentation masks to the high-quality reference pill image masks.
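
The superpixel-merging step described above can be sketched as follows. This is our own minimal illustration, not the authors' code: adjacent superpixels whose mean-color difference falls below a threshold are merged via a region adjacency graph and union-find; the labels, colors and threshold are hypothetical stand-ins for the SLIC output.

```python
def find(parent, x):
    """Union-find root lookup with path compression."""
    while parent[x] != x:
        parent[x] = parent[parent[x]]
        x = parent[x]
    return x

def merge_superpixels(mean_color, adjacency, threshold):
    """mean_color: {label: (r, g, b)}; adjacency: set of (a, b) edges
    of the region adjacency graph. Merges similar adjacent regions."""
    parent = {label: label for label in mean_color}
    for a, b in adjacency:
        ca, cb = mean_color[a], mean_color[b]
        dist = sum((x - y) ** 2 for x, y in zip(ca, cb)) ** 0.5
        if dist < threshold:                 # similar regions merge
            parent[find(parent, a)] = find(parent, b)
    return {label: find(parent, label) for label in mean_color}

# Toy example: regions 0 and 1 are similar (pill), region 2 is background.
colors = {0: (200, 190, 180), 1: (205, 188, 182), 2: (20, 25, 30)}
edges = {(0, 1), (1, 2)}
labels = merge_superpixels(colors, edges, threshold=30.0)
```

After merging, regions 0 and 1 share a label while the dark background region keeps its own, which is the behavior the thresholded RAG merge relies on.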

Paper Nr: 144
Title:

### Color Feature-based Pillbox Image Color Recognition

Authors:

#### Peng Guo, Ronald J. Stanley, Justin G. Cole, Jason Hagerty and William V. Stoecker

Abstract: Patients, their families and caregivers routinely examine pills for medication identification. Key pill information includes color, shape, size and pill imprint. The pill can then be identified using an online pill database. This process is time-consuming and error prone, leading researchers to develop techniques for automatic pill identification. Pill color may be the pill feature that contributes most to automatic pill identification. In this research, we investigate features from two color spaces: red, green and blue (RGB), and hue, saturation and value (HSV), as well as chromaticity and brightness features. Color-based classification is explored using MATLAB on 2140 National Library of Medicine (NLM) Pillbox reference images using 20 feature descriptors. The pill region is extracted using image processing techniques including erosion, dilation and thresholding. Using a leave-one-image-out approach for classifier training/testing, a support vector machine (SVM) classifier yielded an average accuracy over 12 categories as high as 97.90%.
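
A minimal sketch of the kind of color descriptors mentioned above (RGB and HSV means plus chromaticity and brightness); the feature names and toy pixel list are our own illustration, not the authors' 20-descriptor set.

```python
import colorsys

def color_features(pixels):
    """pixels: list of (r, g, b) tuples in [0, 255] from the pill region."""
    n = len(pixels)
    mr = sum(p[0] for p in pixels) / n
    mg = sum(p[1] for p in pixels) / n
    mb = sum(p[2] for p in pixels) / n
    # HSV of the mean color (colorsys works on [0, 1] floats)
    h, s, v = colorsys.rgb_to_hsv(mr / 255, mg / 255, mb / 255)
    total = mr + mg + mb or 1.0
    return {
        "mean_rgb": (mr, mg, mb),
        "mean_hsv": (h, s, v),
        # chromaticity: intensity-normalised red/green components
        "chromaticity": (mr / total, mg / total),
        "brightness": total / 3.0,
    }

feats = color_features([(250, 10, 10), (240, 20, 20)])  # a reddish pill
```

Such a feature vector would then feed the SVM classifier; the leave-one-image-out protocol simply retrains with each image held out in turn.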

Paper Nr: 147
Title:

### Multi-view ToF Fusion for Object Detection in Industrial Applications

Authors:

#### Inge Coudron and Toon Goedemé

Abstract: The use of time-of-flight (ToF) cameras in industrial applications has become increasingly popular due to the camera’s reduced cost and its ability to provide real-time depth information. Still, one of the main drawbacks of these cameras has been their limited field of view. We therefore propose a technique to fuse the views of multiple ToF cameras. By mounting two cameras side by side and pointing them away from each other, the horizontal field of view can be artificially extended. The combined views can then be used for object detection. The main advantage of our technique is that the calibration is fully automatic and only one shot of the calibration target is needed. Furthermore, no overlap between the views is required.

Paper Nr: 219
Title:

### High-speed Motion Detection using Event-based Sensing

Authors:

#### Jose A. Boluda, Fernando Pardo and Francisco Vegara

Abstract: Event-based vision emerges as an alternative to conventional full-frame image processing. In event-based systems there is a vision sensor which delivers visual events asynchronously, typically illumination level changes. The asynchronous nature of these sensors makes it difficult to process the corresponding data stream. There might be few events to process if there are minor changes in the scene, or conversely, an untreatable explosion of events if the whole scene is changing quickly. A Selective Change-Driven (SCD) sensing system is a special event-based sensor which only delivers, in a synchronous manner and ordered by the magnitude of their change, those pixels that have changed most since the last time they were read out. To prove this concept, a processing architecture for high-speed motion analysis, based on the processing of the SCD pixel stream, has been developed and implemented in a Field Programmable Gate Array (FPGA). The system measures average distances using a laser line projected onto moving objects. The acquisition, processing and delivery of a distance measurement takes less than 2 µs. To obtain a similar result with a conventional frame-based camera, a device working at more than 500 kfps would be required, which is not practical in embedded, resource-limited systems. The implemented system is small enough to be mounted on an autonomous platform.
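
The SCD read-out described above can be illustrated conceptually (our own sketch, not the sensor's implementation): from two frames, deliver only the N pixels that changed most, ordered by the magnitude of their change.

```python
def scd_readout(prev, curr, n_events):
    """prev/curr: dicts {(x, y): intensity}; returns the n_events pixels
    with the largest absolute change, largest change first."""
    changes = [(abs(curr[p] - prev[p]), p, curr[p]) for p in curr]
    changes.sort(key=lambda t: t[0], reverse=True)
    return [(p, value) for magnitude, p, value in changes[:n_events]]

prev = {(0, 0): 10, (1, 0): 10, (2, 0): 10}
curr = {(0, 0): 11, (1, 0): 80, (2, 0): 40}   # pixel (1, 0) changed most
events = scd_readout(prev, curr, n_events=2)
```

Bounding the number of delivered events per cycle is what keeps the downstream FPGA pipeline's load predictable, in contrast to purely asynchronous event sensors.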

Paper Nr: 220
Title:

### Detecting Non-lambertian Materials in Video

Authors:

#### Seyed Mahdi Javadi, Yongmin Li and Xiaohui Liu

Abstract: This paper describes a novel method to automatically identify and distinguish shiny and glossy materials in videos. The proposed solution works by analyzing the logarithm of the chromaticity of sample pixels from various materials over a period of time to differentiate between shiny and matte textures. Lambertian materials have a different reflectance model, and the distribution of their chromaticity is not the same as that of non-Lambertian textures; we use this to detect shiny materials. This system has many applications in texture and object recognition and in water leakage and oil spillage detection systems.

Paper Nr: 227
Title:

### Subjective Assessment Method for Multiple Displays with and without Super Resolution

Authors:

#### Chinatsu Mori and Seiichi Gohshi

Abstract: At present, although 4K TV sets are available in the market, the provision of 4K TV content is still not sufficient. Almost all TV content is high-definition television (HDTV) broadcasting, and images/videos with insufficient resolution are up-converted to the resolution of the display. Thus, almost all 4K TV sets are equipped with super-resolution (SR) technology to improve the resolution of the content. However, the performance of SR on TV sets is not guaranteed. Although the capability of SR needs to be assessed, there has been no standard method for such an assessment. In this paper, a subjective assessment method for multiple displays is proposed. Subjective assessment experiments on displays with and without SR are conducted to confirm the ability of an SR method. Statistical analysis proves the superiority of the SR method in resolution quality, with significant differences indicating reproducible results. As reproducible results are obtainable, the proposed method is useful for assessing multiple displays. In this paper, the methodology of the proposed assessment method is described and the experimental results are presented.

Paper Nr: 247
Title:

### Exploratory Multimodal Data Analysis with Standard Multimedia Player - Multimedia Containers: A Feasible Solution to Make Multimodal Research Data Accessible to the Broad Audience

Authors:

#### Julius Schöning, Anna L. Gert, Alper Açık, Tim C. Kietzmann, Gunther Heidemann and Peter König

Abstract: The analysis of multimodal data comprised of images, videos and additional recordings, such as gaze trajectories, EEG, emotional states, and heart rate is presently only feasible with custom applications. Even exploring such data requires compilation of specific applications that suit a specific dataset only. This need for specific applications arises since all corresponding data are stored in separate files in custom-made distinct data formats. Thus, accessing such datasets is cumbersome and time-consuming for experts and virtually impossible for non-experts. To make multimodal research data easily shareable and accessible to a broad audience, like researchers from diverse disciplines and all other interested people, we show how multimedia containers can support the visualization and sonification of scientific data. The use of a container format allows explorative multimodal data analyses with any multimedia player as well as streaming the data via the Internet. We prototyped this approach on two datasets, both with visualization of gaze data and one with additional sonification of EEG data. In a user study, we asked expert and non-expert users about their experience during an explorative investigation of the data. Based on their statements, our prototype implementation, and the datasets, we discuss the benefit of storing multimodal data, including the corresponding videos or images, in a single multimedia container. In conclusion, we summarize what is necessary for having multimedia containers as a standard for storing multimodal data and give an outlook on how artificial neural networks can be trained on such standardized containers.

Paper Nr: 250
Title:

### Single Image Marine Snow Removal based on a Supervised Median Filtering Scheme

Authors:

Abstract: Underwater image processing has attracted a lot of attention due to the special difficulty of capturing clean, high-quality images in this medium. Blur, haze, low contrast and color cast are the main degradations. Noise in an underwater image is mostly treated as additive noise (e.g. sensor noise), although the visibility of underwater scenes is also distorted by another source, termed marine snow. This signal disturbs image processing methods such as enhancement and segmentation. Removing marine snow can therefore improve image visibility while helping advanced image processing approaches such as background subtraction to yield better results. In this article, we propose a simple but effective filter to eliminate these particles from single underwater images. It consists of several steps that adapt the filter to best fit the characteristics of marine snow. Our experimental results show that our algorithm outperforms existing approaches, effectively removing this phenomenon while preserving edges as much as possible.
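
A simplified single-image marine-snow suppressor in the spirit of the supervised median filtering described above (our sketch, not the paper's exact multi-step scheme): a pixel is replaced by the local median only when it is implausibly brighter than its 3x3 neighbourhood, which preserves edges while removing small bright particles. The `gap` margin is a hypothetical parameter.

```python
from statistics import median

def remove_marine_snow(img, gap=50):
    """img: 2D list of grey levels; gap: brightness margin above which a
    pixel is treated as a marine-snow particle and median-filtered."""
    h, w = len(img), len(img[0])
    out = [row[:] for row in img]
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            neigh = [img[y + dy][x + dx]
                     for dy in (-1, 0, 1) for dx in (-1, 0, 1)
                     if not (dy == 0 and dx == 0)]
            m = median(neigh)
            if img[y][x] - m > gap:        # isolated bright speck
                out[y][x] = m
    return out

noisy = [[10, 10, 10],
         [10, 200, 10],     # a marine-snow particle
         [10, 10, 10]]
clean = remove_marine_snow(noisy)
```

Applying the median conditionally, rather than everywhere, is what distinguishes this kind of supervised scheme from a plain median filter that would also blur genuine edges.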

Paper Nr: 263
Title:

### Calibration of a Different Field-of-view Stereo Camera System using an Embedded Checkerboard Pattern

Authors:

#### Pathum Rathnayaka, Seung-Hae Baek and Soon-Yong Park

Abstract: Knowing the correct relative pose between cameras is the first and most important step in a stereo camera system and has been of interest in many computer vision experiments. Much work has been introduced for stereo systems with largely overlapping fields of view, whereas only a few advanced feature-point-based methods have been presented for systems with partially overlapping fields of view. In this paper, we propose a new yet simple method to calibrate a partially overlapping field-of-view heterogeneous stereo camera system using a specially designed embedded planar checkerboard pattern. The embedded pattern is a combination of two differently colored planar patterns with different checker sizes. The heterogeneous camera system comprises a lower-focal-length wide-angle camera and a higher-focal-length conventional narrow-angle camera. The relative pose between the cameras is calculated by multiplying transformation matrices. Our proposed method is a decent alternative to many advanced feature-based techniques. We show the robustness of our method through re-projection errors and by comparing point differences along the Y axis in image rectification results.

## Area 2 - Image and Video Analysis

Full Papers
Paper Nr: 25
Title:

### Segmentation-based Multi-scale Edge Extraction to Measure the Persistence of Features in Unorganized Point Clouds

Authors:

#### Dena Bazazian, Josep R. Casas and Javier Ruiz-Hidalgo

Abstract: Edge extraction has attracted a lot of attention in computer vision. The accuracy of edge extraction in point clouds can be a significant asset in a variety of engineering scenarios. To this end, we propose a segmentation-based multi-scale edge extraction technique. In this approach, different regions of a point cloud are segmented by a global analysis according to the geodesic distance. Afterwards, a multi-scale operator is defined over local neighborhoods. By applying this operator at multiple scales of the point cloud, the persistence of features is determined. We illustrate the proposed method by computing a feature weight that measures the likelihood of a point being an edge, then detect edge points based on that value at both global and local scales. Moreover, we evaluate our method quantitatively and qualitatively. Experimental results show that the proposed approach achieves superior accuracy. Furthermore, we demonstrate the robustness of our approach on noisier real-world datasets.

Paper Nr: 30
Title:

### Remote Respiration Rate Determination in Video Data - Vital Parameter Extraction based on Optical Flow and Principal Component Analysis

Authors:

#### Christian Wiede, Julia Richter, Manu Manuel and Gangolf Hirtz

Abstract: Due to the steadily ageing society, the determination of vital parameters, such as the respiration rate, has come into the focus of research in recent years. The respiration rate is an essential parameter for monitoring a person’s health status. This study presents a robust method to remotely determine a person’s respiration rate with an RGB camera. In our approach, we detected four subregions on a person’s chest, tracked features over time with optical flow, and applied a principal component analysis (PCA) and several frequency determination techniques. Furthermore, this method was evaluated on various recorded scenarios. Overall, the results show that this method is applicable in the field of Ambient Assisted Living (AAL).
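
The final frequency-determination step can be illustrated as follows (our own sketch): once optical flow and PCA have reduced chest motion to a 1-D signal, the respiration rate is the dominant frequency of that signal. A naive DFT finds it here; the 0.25 Hz test signal and frame rate are synthetic assumptions.

```python
import math

def dominant_frequency(signal, fps):
    """Return the frequency (Hz) of the strongest non-DC DFT bin."""
    n = len(signal)
    mean = sum(signal) / n
    centred = [s - mean for s in signal]
    best_k, best_power = 1, -1.0
    for k in range(1, n // 2):
        re = sum(c * math.cos(2 * math.pi * k * i / n)
                 for i, c in enumerate(centred))
        im = sum(-c * math.sin(2 * math.pi * k * i / n)
                 for i, c in enumerate(centred))
        power = re * re + im * im
        if power > best_power:
            best_k, best_power = k, power
    return best_k * fps / n

fps = 10.0  # hypothetical camera frame rate
breaths = [math.sin(2 * math.pi * 0.25 * i / fps) for i in range(200)]
rate_hz = dominant_frequency(breaths, fps)   # 0.25 Hz = 15 breaths/min
```

In practice an FFT (and interpolation between bins) would replace this naive DFT, but the principle of picking the strongest spectral peak is the same.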

Paper Nr: 37
Title:

### Towards a Diminished Reality System that Preserves Structures and Works in Real-time

Authors:

#### Hugo Álvarez, Jon Arrieta and David Oyarzun

Abstract: This paper presents a Diminished Reality system that is able to propagate textures as well as structures with a low computational cost, almost in real-time. An existing inpainting algorithm is optimized to reduce its high computational cost by applying several Computer Vision techniques. Although some of the presented optimizations can be applied directly to a single static image, the global system is mainly oriented to video sequences, where temporal coherence ideas can be applied. Given that, a novel pipeline is proposed to maintain the visual quality of the reconstructed image area without recalculating everything, despite slow camera motions. To the best of our knowledge, the prototype presented in this paper is the only Diminished Reality system focused on structure propagation that works in near real-time. Apart from the technical description, this paper presents an extensive experimental study of the system, which evaluates the optimizations in terms of time and quality.

Paper Nr: 68
Title:

### Color-based and Rotation Invariant Self-similarities

Authors:

#### Xiaohu Song, Damien Muselet and Alain Tremeau

Abstract: One big challenge in computer vision is to extract robust and discriminative local descriptors. For many applications, such as object tracking, image classification or image matching, appearance-based descriptors such as SIFT or learned CNN features provide very good results. But for some other applications, such as multimodal image comparison (infra-red versus color, color versus depth, ...), these descriptors fail and people resort to using the spatial distribution of self-similarities. The idea is to capture the similarities between local regions in an image rather than the appearances of these regions at the pixel level. Nevertheless, classical self-similarities are not invariant to rotation in image space, so that two rotated versions of a local patch are not considered similar, and we think that much discriminative information is lost because of this weakness. In this paper, we present a method to extract rotation-invariant self-similarities. To this end, we propose to compare color descriptors of the local regions rather than the local regions themselves. Furthermore, since this comparison informs us about the relative orientations of the two local regions, we incorporate this information into the final image descriptor in order to increase the discriminative power of the system. We show that the self-similarities extracted in this way are very discriminative.

Paper Nr: 80
Title:

### Towards a Videobronchoscopy Localization System from Airway Centre Tracking

Authors:

#### Carles Sánchez, Antonio Esteban Lansaque, Agnès Borràs, Marta Diez-Ferrer, Antoni Rosell and Debora Gil

Abstract: Bronchoscopists use fluoroscopy to guide flexible bronchoscopy to the lesion to be biopsied without any kind of incision. Since fluoroscopy is an imaging technique based on X-rays, the risk of developmental problems and cancer is increased in subjects exposed to it, so minimizing radiation is crucial. Alternative guiding systems such as electromagnetic navigation require specific equipment, increase the cost of the clinical procedure and still require fluoroscopy. In this paper we propose an image-based guiding system built on the extraction of airway centres from intra-operative videos. These anatomical landmarks are matched to the airway centreline extracted from a pre-planned CT to indicate the best path to the nodule. We present a feasibility study of our navigation system using simulated bronchoscopic videos and a multi-expert validation of landmark extraction in 3 intra-operative ultrathin explorations.

Paper Nr: 95
Title:

### Face Presentation Attack Detection using Biologically-inspired Features

Authors:

#### Aristeidis Tsitiridis, Cristina Conde, Isaac Martín De Diego and Enrique Cabello

Abstract: A person intentionally concealing or faking their identity from biometric security systems is known to perform a ‘presentation attack’. Efficient presentation attack detection poses a challenging problem in modern biometric security systems. Sophisticated presentation attacks may successfully spoof a person’s face and therefore, disrupt accurate biometric authentication in controlled areas. In this work, a presentation attack detection technique which processes biologically-inspired facial features is introduced. The main goal of the proposed method is to provide an alternative foundation for biometric detection systems. In addition, such a system can be used for future generation biometric systems capable of carrying out rapid facial perception tasks in complex and dynamic situations. The newly-developed model was tested against two different databases and classifiers. Presentation attack detection results have shown promise, exceeding 94% detection accuracy on average for the investigated databases. The proposed model can be enriched with future enhancements that can further improve its effectiveness and complexity in more diverse situations and sophisticated attacks in the real world.

Paper Nr: 138
Title:

### Artery/vein Classification of Blood Vessel Tree in Retinal Imaging

Authors:

#### Joaquim de Moura, Jorge Novo, Marcos Ortega, Noelia Barreira and Pablo Charlón

Abstract: Alterations in the retinal microcirculation are signs of relevant diseases such as hypertension, arteriosclerosis, or diabetes. Specifically, arterial constriction and narrowing have been associated with early stages of hypertension. Moreover, retinal vasculature abnormalities may be useful indicators for cerebrovascular and cardiovascular diseases. The Arterio-Venous Ratio (AVR), which measures the relation between arteries and veins, is one of the most referenced ways of quantifying changes in the retinal vessel tree. Since these alterations affect arteries and veins differently, a precise characterization of both types of vessels is a key issue in the development of automatic diagnosis systems. In this work, we propose a methodology for the automatic classification of vessels into arteries and veins in eye fundus images. The proposal was tested and validated on 19 near-infrared reflectance retinographies. The methodology provided satisfactory results in a domain as complex as retinal vessel tree identification and classification.

Paper Nr: 149
Title:

Authors:

Paper Nr: 50
Title:

### Cost Adaptive Window for Local Stereo Matching

Authors:

#### J. Navarro and A. Buades

Abstract: We present a novel stereo block-matching algorithm that uses adaptive windows. The shape of the window is selected to minimize the matching cost. Such a window is likely the one least distorted by the disparity function and thus optimal for matching. Moreover, we introduce a coarse-to-fine strategy to limit the number of ambiguous matches and reduce the computational cost. The proposed approach performs on par with state-of-the-art local matching methods.
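
A toy illustration (our own, not the authors' implementation) of the cost-adaptive-window idea: for each candidate disparity, several window shapes are evaluated (left-, right- and centre-anchored 1-D windows here) and the shape with the lowest matching cost is kept, so windows straddling a depth discontinuity are avoided.

```python
def sad(left, right, xl, xr, offsets):
    """Sum of absolute differences over a set of window offsets."""
    return sum(abs(left[xl + o] - right[xr + o]) for o in offsets)

def best_disparity(left, right, x, max_disp):
    """Pick the disparity whose best-shaped window has the lowest cost."""
    shapes = [(-2, -1, 0), (0, 1, 2), (-1, 0, 1)]   # candidate windows
    best = None
    for d in range(max_disp + 1):
        xr = x - d
        if xr - 2 < 0 or xr + 2 >= len(right) or x - 2 < 0 or x + 2 >= len(left):
            continue
        cost = min(sad(left, right, x, xr, s) for s in shapes)
        if best is None or cost < best[0]:
            best = (cost, d)
    return best[1]

left = [5, 5, 9, 7, 3, 5, 5, 5]
right = [5, 9, 7, 3, 5, 5, 5, 5]   # left row shifted by one pixel
d = best_disparity(left, right, x=3, max_disp=2)
```

A full implementation would use 2-D windows and the coarse-to-fine scheme the abstract mentions; the shape-selection principle is what this sketch shows.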

Paper Nr: 55
Title:

### Simultaneous Estimation of Optical Flow and Its Boundaries based on the Dynamical System Model

Authors:

#### Yuya Michishita, Noboru Sebe, Shuichi Enokida and Eitaku Nobuyama

Abstract: Optical flow is a velocity vector field that represents the motion of objects in video images. Optical flow estimation is difficult in the neighborhood of flow boundaries. To address this problem, Sasagawa (2014) proposed a modified dynamical system model which assumes that, in the neighborhood of flow boundaries, the brightness flows in the perpendicular direction, and considers the resulting corrections to the brightness constancy constraint. However, in that model, the correction occurs even in places where the flow is continuous. We propose a new model that switches between the conventional model and the model of Sasagawa (2014). As a result, we expect improved estimation accuracy in places where the flow is continuous. We conduct numerical experiments to investigate the improvements in optical flow estimation accuracy that the proposed model yields.

Paper Nr: 69
Title:

### 3D Reconstruction of Indoor Scenes using a Single RGB-D Image

Authors:

#### Panagiotis-Alexandros Bokaris, Damien Muselet and Alain Trémeau

Abstract: The three-dimensional reconstruction of a scene is essential for the interpretation of an environment. In this paper, a novel and robust method for the 3D reconstruction of an indoor scene using a single RGB-D image is proposed. First, the layout of the scene is identified and then, a new approach for isolating the objects in the scene is presented. Its fundamental idea is the segmentation of the whole image in planar surfaces and the merging of the ones that belong to the same object. Finally, a cuboid is fitted to each segmented object by a new RANSAC-based technique. The method is applied to various scenes and is able to provide a meaningful interpretation of these scenes even in cases with strong clutter and occlusion. In addition, a new ground truth dataset, on which the proposed method is further tested, was created. The results imply that the present work outperforms recent state-of-the-art approaches not only in accuracy but also in robustness and time complexity.

Paper Nr: 74
Title:

### Real-time Stereo Vision System at Tunnel

Authors:

#### Yuquan Xu, Seiichi Mita, Hossein Tehrani and Kazuhisa Ishimaru

Abstract: Although stereo vision has made great progress in recent years, few works estimate disparity for challenging scenes such as tunnels. In such scenes, owing to the low-light conditions and fast camera movement, the images are severely degraded by motion blur. These degraded images limit the performance of standard stereo vision algorithms. To address this issue, we combine stereo vision with image deblurring algorithms to improve the disparity result. The proposed algorithm consists of three phases: PSF estimation, image restoration and stereo vision. In the PSF estimation phase, we introduce three methods to estimate the blur kernel: an optical-flow-based algorithm, a cepstrum-based algorithm and a simple constant-kernel algorithm. In the image restoration phase, we propose a fast non-blind image deblurring algorithm to recover the latent image. In the last phase, we propose a multi-scale multi-path Viterbi algorithm to compute the disparity from the deblurred images. The advantages of the proposed algorithm are demonstrated by experiments on data sequences acquired in a tunnel.

Paper Nr: 87
Title:

### Matching of Line Segment for Stereo Computation

Authors:

#### O. Martorell, A. Buades and B. Coll

Abstract: A stereo algorithm based on the matching of line segments between two images is proposed. We extract several characteristics of the segments which permit their matching across the two images. A depth ordering computed from the line segments of the reference image allows us to attribute the matched disparity to the correct pixels. This depth sketch is computed by joining close line segments and identifying T-junctions and convexity points. The disparity computed for segments is then extrapolated to the rest of the image by means of a diffusion process. The performance of the proposed algorithm is illustrated by applying the procedure to synthetic stereo pairs.

Paper Nr: 93
Title:

### LiDAR-based 2D Localization and Mapping System using Elliptical Distance Correction Models for UAV Wind Turbine Blade Inspection

Authors:

#### Ivan Nikolov and Claus Madsen

Abstract: The wind energy sector faces a constant need for annual inspections of wind turbine blades for damage, erosion and cracks. These inspections are an important part of the wind turbine life cycle and can be very costly and hazardous for specialists. This has led to the use of automated drone inspections and the need for accurate, robust and inexpensive systems for localizing drones relative to the blade. Due to the lack of visual and geometrical features on the wind turbine blade, conventional SLAM algorithms are of limited use. We propose a cost-effective system, easy to implement and extend, for on-site outdoor localization and mapping in low-feature environments using the inexpensive RPLIDAR and a 9-DOF IMU. Our algorithm geometrically simplifies the 2D cross-section of the wind turbine blade to an elliptical model and uses it for distance and shape correction. We show that the proposed algorithm gives a localization error between 1 and 20 cm, depending on the position of the LiDAR relative to the blade, and a maximum mapping error of 4 cm at distances between 1.5 and 3 meters from the blade. These results are satisfactory for positioning and capturing the overall shape of the blade.
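
Our simplified take on the elliptical cross-section model: approximating the 2D blade cross-section by an ellipse centred at the origin, the expected LiDAR return at a scan angle is the ellipse's polar radius, which can be compared against raw range readings for correction. The semi-axis values are hypothetical, not the paper's.

```python
import math

def ellipse_radius(a, b, theta):
    """Distance from the ellipse centre to its boundary at angle theta
    (polar form of an axis-aligned ellipse with semi-axes a and b)."""
    return (a * b) / math.sqrt((b * math.cos(theta)) ** 2
                               + (a * math.sin(theta)) ** 2)

def range_correction(measured, a, b, theta):
    """Difference between a raw range reading and the model prediction."""
    return measured - ellipse_radius(a, b, theta)

a, b = 1.5, 0.1               # metres: a wide, thin blade cross-section
r0 = ellipse_radius(a, b, 0.0)            # along the major axis
r90 = ellipse_radius(a, b, math.pi / 2)   # along the minor axis
```

In the paper's setting the residual between measured and predicted ranges is what drives the distance and shape correction; this sketch only shows the geometric model.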

Paper Nr: 94
Title:

### Gait Recognition with Compact Lidar Sensors

Authors:

#### Bence Gálai and Csaba Benedek

Abstract: In this paper, we present a comparative study on gait and activity analysis using LiDAR scanners with different resolutions. Previous studies showed that gait recognition methods based on the point clouds of a Velodyne HDL-64E Rotating Multi-Beam LiDAR can be used for people re-identification in outdoor surveillance scenarios. However, the high cost and weight of that sensor are a bottleneck for its wide application in surveillance systems. The contribution of this paper is to show that the proposed Lidar-based Gait Energy Image descriptor can be efficiently adapted to the measurements of the compact and significantly cheaper Velodyne VLP-16 LiDAR scanner, which produces point clouds with a nearly four times lower vertical resolution than the HDL-64. On the other hand, due to the sparsity of the data, the VLP-16 sensor proves to be less efficient for activity recognition if the events are mainly characterized by fine hand movements. The evaluation is performed on five test scenarios with multiple walking pedestrians, recorded by both sensors in parallel.

Paper Nr: 97
Title:

### Explicit Image Quality Detection Rules for Functional Safety in Computer Vision

Authors:

#### Johann Thor Mogensen Ingibergsson, Dirk Kraft and Ulrik Pagh Schultz

Abstract: Computer vision has applications in a wide range of areas, from surveillance to safety-critical control of autonomous robots. Despite the potentially critical nature of the applications and continuous progress, the focus on safety in relation to compliance with standards has been limited. As an example, field robots are typically dependent on a reliable perception system to sense and react to a highly dynamic environment. The perception system thus introduces significant complexity into the safety-critical path of the robotic system. This complexity is often argued to increase safety by improving performance; however, the safety claims are not supported by compliance with any standards. In this paper, we present rules that enable low-level detection of quality problems in images and demonstrate their applicability on an agricultural image database. We hypothesise that low-level, primitive image analysis driven by explicit rules facilitates compliance with safety standards, which improves the real-world applicability of existing proposed solutions. The rules are simple, independent image analysis operations focused on determining the quality and usability of an image.

Paper Nr: 99
Title:

### Simultaneous Camera Calibration and Temporal Alignment of 2D and 3D Trajectories

Authors:

#### Joni Herttuainen, Tuomas Eerola, Lasse Lensu and Heikki Kälviäinen

Abstract: In this paper, we present an automatic method that, given 2D and 3D motion trajectories recorded with a camera and a 3D sensor, calibrates the camera with respect to the 3D sensor coordinates and aligns the trajectories in time. The method utilizes a modified Random Sample Consensus (RANSAC) procedure that iteratively selects two points from both trajectories, uses them to calculate the scale and translation parameters for the temporal alignment, computes point correspondences, and estimates the camera matrix. We demonstrate the approach with a setup consisting of a standard web camera and a Leap Motion sensor. We further propose the object tracking and trajectory filtering procedures necessary to produce proper trajectories with this setup. The results showed that the proposed method achieves over a 96% success rate on a test set of complex trajectories.
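
The temporal-alignment part of the RANSAC loop can be sketched as follows (our minimal illustration): two corresponding timestamp pairs fix the scale and translation of a linear time model, and the model with the most inliers wins. The data, tolerance and seed are hypothetical, not the paper's setup.

```python
import random

def ransac_time_alignment(t_cam, t_3d, iters=200, tol=0.05, seed=0):
    """Fit t_3d = a * t_cam + b robustly from paired event timestamps."""
    rng = random.Random(seed)
    best = (0, (1.0, 0.0))
    n = len(t_cam)
    for _ in range(iters):
        i, j = rng.sample(range(n), 2)
        if t_cam[i] == t_cam[j]:
            continue
        a = (t_3d[i] - t_3d[j]) / (t_cam[i] - t_cam[j])   # time scale
        b = t_3d[i] - a * t_cam[i]                        # time offset
        inliers = sum(abs(a * tc + b - t3) < tol
                      for tc, t3 in zip(t_cam, t_3d))
        if inliers > best[0]:
            best = (inliers, (a, b))
    return best[1]

t_cam = [0.0, 1.0, 2.0, 3.0, 4.0]
t_3d = [0.5, 2.5, 4.5, 6.5, 9.9]   # t_3d = 2 * t_cam + 0.5, one outlier
a, b = ransac_time_alignment(t_cam, t_3d)
```

In the full method, each hypothesised (a, b) additionally yields point correspondences from which the camera matrix is estimated; this sketch covers only the time model.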

Paper Nr: 122
Title:

### Optical Flow Refinement using Reliable Flow Propagation

Authors:

#### Tan Khoa Mai, Michèle Gouiffes and Samia Bouchafa

Abstract: This paper shows how to improve optical flow estimation by considering a neighborhood consensus strategy along with a reliable flow propagation method. Propagation takes advantage of reliability measures that are available from local low-level image features. In this paper, we focus on color, but our method could easily be generalized by also considering texture or gradient features. We investigate the conditions for estimating accurate optical flow and correctly managing flow discontinuities by proposing a variant of the well-known Kanade-Lucas-Tomasi (KLT) approach. Starting from this classical approach, a consensual flow is estimated locally, while two additional criteria are proposed to evaluate its reliability. Propagation of reliable flow throughout the image is then performed using a specific distance criterion based on color and proximity. Experiments are conducted on the Middlebury database and show better results than the classic KLT and even global methods such as the well-known Horn-Schunck or Black-Anandan approaches.
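To make the KLT building block concrete, the sketch below (our own illustrative code, not the paper's method) runs single-window Lucas-Kanade in pure Python: it solves the 2x2 normal equations for the flow in one window and reports the minimum eigenvalue of the structure tensor, a classic reliability measure of the kind such propagation schemes build on:

```python
import math

def lk_flow_window(I0, I1, x0, y0, half=7):
    """Single-window Lucas-Kanade: solve G v = -b for the flow (u, v)
    and report the minimum eigenvalue of G as a reliability score."""
    gxx = gxy = gyy = bx = by = 0.0
    for y in range(y0 - half, y0 + half + 1):
        for x in range(x0 - half, x0 + half + 1):
            ix = 0.5 * (I0[y][x + 1] - I0[y][x - 1])   # central differences
            iy = 0.5 * (I0[y + 1][x] - I0[y - 1][x])
            it = I1[y][x] - I0[y][x]                    # temporal difference
            gxx += ix * ix; gxy += ix * iy; gyy += iy * iy
            bx += ix * it;  by += iy * it
    det = gxx * gyy - gxy * gxy
    tr = gxx + gyy
    lam_min = 0.5 * (tr - math.sqrt(tr * tr - 4.0 * det))
    u = (-gyy * bx + gxy * by) / det                    # v = -G^{-1} b
    v = ( gxy * bx - gxx * by) / det
    return u, v, lam_min

# synthetic pair: I1 is I0 translated by (1, 0) pixels
N = 64
I0 = [[math.sin(0.2 * x) + math.cos(0.2 * y) for x in range(N)] for y in range(N)]
I1 = [[math.sin(0.2 * (x - 1)) + math.cos(0.2 * y) for x in range(N)] for y in range(N)]
u, v, rel = lk_flow_window(I0, I1, 32, 32)
```

Windows with a small minimum eigenvalue (untextured or edge-only regions) are exactly the ones whose flow a propagation scheme should overwrite from more reliable neighbours.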

Paper Nr: 151
Title:

### Multiple Target, Multiple Type Visual Tracking using a Tri-GM-PHD Filter

Authors:

#### Nathanael L. Baisa and Andrew Wallace

Abstract: We propose a new framework that extends the standard Probability Hypothesis Density (PHD) filter for multiple targets having three different types, taking into account not only background false positives (clutter), but also confusion between detections of different target types, which are in general different in character from background clutter. Our framework extends the existing Gaussian Mixture (GM) implementation of the PHD filter to create a tri-GM-PHD filter based on Random Finite Set (RFS) theory. The methodology is applied to real video sequences containing three types of multiple targets in the same scene, two football teams and a referee, using separate detections. Subsequently, Munkres’s variant of the Hungarian assignment algorithm is used to associate tracked target identities between frames. This approach is evaluated and compared to both raw detections and independent GM-PHD filters using the Optimal Sub-pattern Assignment (OSPA) metric and discrimination rate. This shows the improved performance of our strategy on real video sequences.
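The frame-to-frame identity association step mentioned in the abstract can be illustrated independently of the PHD filtering. The sketch below is our own: a brute-force minimum-cost assignment stands in for Munkres's variant of the Hungarian algorithm (which solves the same problem in polynomial time for larger target sets), matching tracked positions to new detections by squared distance:

```python
from itertools import permutations

def associate(prev, curr):
    """Minimum-cost one-to-one association between tracked positions
    (frame t) and detections (frame t+1), brute force over permutations."""
    def cost(a, b):
        return (a[0] - b[0]) ** 2 + (a[1] - b[1]) ** 2  # squared distance
    best_perm, best_cost = None, float("inf")
    for perm in permutations(range(len(curr))):
        c = sum(cost(prev[i], curr[j]) for i, j in enumerate(perm))
        if c < best_cost:
            best_perm, best_cost = perm, c
    return best_perm  # best_perm[i] = detection index matched to track i

prev = [(0.0, 0.0), (10.0, 0.0), (5.0, 8.0)]   # tracked positions, frame t
curr = [(10.5, 0.2), (0.3, 0.1), (5.1, 7.9)]   # detections, frame t+1
match = associate(prev, curr)
```

Here track 0 is matched to detection 1, track 1 to detection 0, and track 2 to detection 2; the brute-force search is only viable for a handful of targets, which is why Hungarian-style solvers are used in practice.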

Paper Nr: 214
Title:

### Moving Object Detection by Connected Component Labeling of Point Cloud Registration Outliers on the GPU

Authors:

#### Michael Korn, Daniel Sanders and Josef Pauli

Abstract: Using a depth camera, the KinectFusion with Moving Objects Tracking (KinFu MOT) algorithm permits tracking the camera poses and building a dense 3D reconstruction of the environment which can also contain moving objects. The GPU processing pipeline allows this simultaneously and in real-time. During the reconstruction, moving objects that are not yet tracked are detected and new models are initialized. The original approach to detecting unknown moving objects is not very precise and may include wrong vertices. This paper describes an improvement of the detection based on connected component labeling (CCL) on the GPU. To achieve this, three CCL algorithms are compared. Afterwards, the migration into KinFu MOT is described. It incorporates the 3D structure of the scene, and three plausibility criteria refine the detection. In addition, potential benefits to the CCL runtime of CUDA Dynamic Parallelism and of skipping termination condition checks are investigated. Finally, the enhancement of the detection performance and the reduction of response time and computational effort are shown.
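As background for the CCL comparison above, the classic two-pass connected component labeling with union-find can be sketched on the CPU (illustrative only; the paper's contribution concerns GPU variants of CCL and their integration into KinFu MOT):

```python
def label_components(mask):
    """Two-pass 4-connected component labeling with union-find.

    Returns (label image, number of components) for a binary mask
    given as a list of rows of 0/1 values.
    """
    h, w = len(mask), len(mask[0])
    parent = [0]                       # parent[0] is an unused placeholder

    def find(a):                       # union-find root with path halving
        while parent[a] != a:
            parent[a] = parent[parent[a]]
            a = parent[a]
        return a

    labels = [[0] * w for _ in range(h)]
    nxt = 1
    for y in range(h):                 # first pass: provisional labels
        for x in range(w):
            if not mask[y][x]:
                continue
            up = labels[y - 1][x] if y else 0
            left = labels[y][x - 1] if x else 0
            if not up and not left:
                parent.append(nxt)     # new provisional label
                labels[y][x] = nxt
                nxt += 1
            else:
                cand = [l for l in (up, left) if l]
                labels[y][x] = min(cand)
                if up and left:        # record equivalence of the two labels
                    ru, rl = find(up), find(left)
                    if ru != rl:
                        parent[max(ru, rl)] = min(ru, rl)
    # second pass: resolve equivalences and renumber labels compactly
    remap = {}
    for y in range(h):
        for x in range(w):
            if labels[y][x]:
                r = find(labels[y][x])
                labels[y][x] = remap.setdefault(r, len(remap) + 1)
    return labels, len(remap)

mask = [[1, 1, 0, 0, 1],
        [0, 1, 0, 1, 1],
        [0, 0, 0, 0, 0],
        [1, 0, 1, 1, 0]]
_, n = label_components(mask)
```

GPU formulations replace the sequential raster scan with iterative label propagation or merging over thread blocks, which is where the paper's runtime comparison applies.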

Paper Nr: 257
Title:

### Recovering 3D Structure of Multilayer Transparent Objects from Multi-view Ray Tracing

Authors:

#### Atsunori Maeda, Fumihiko Sakaue and Jun Sato

Abstract: 3D reconstruction of object shape is one of the most important problems in the field of computer vision. Although many methods have been proposed up to now, the 3D reconstruction of transparent objects is still a very difficult unsolved problem. In particular, if the transparent objects have multiple layers with different refraction properties, the recovery of the 3D structure of transparent objects is quite difficult. In this paper, we propose a method for recovering the 3D structure of multilayer transparent objects. For this objective, we introduce a new representation of 3D space using voxels with refraction properties, and recover the refraction properties of each voxel by ray tracing. The effectiveness of the proposed method is shown by some preliminary experiments.

Posters
Paper Nr: 10
Title:

### Practical Scheduling of Computer Vision Functions

Authors:

#### Adrien Chan-Hon-Tong and Stephane Herbin

Abstract: A plug-and-play scheduler adapted to the computer vision context could boost the development of robotic platforms embedding a large variety of computer vision functions. In this paper, we take a step toward such a scheduler by offering a framework particularly adapted to time-constrained image classification. The relevance of our framework is established by experiments on real-life computer vision datasets and scenarios.

Paper Nr: 39
Title:

### Collaborative Contributions for Better Annotations

Authors:

#### Priyam Bakliwal, Guruprasad M. Hegde and C. V. Jawahar

Abstract: We propose an active learning based solution for efficient, scalable and accurate annotation of objects in video sequences. Recent computer vision solutions use machine learning, and their effectiveness relies on the availability of large amounts of accurately annotated data. In this paper, we focus on reducing the human annotation effort while simultaneously increasing tracking accuracy to get precise, tight bounding boxes around an object of interest. We use a novel combination of two different tracking algorithms to track an object in the whole video sequence. We propose a sampling strategy to select the most informative frame, which is given for human annotation. This newly annotated frame is used to update the previous annotations. Thus, by the collaborative efforts of both the human and the system, we obtain accurate annotations with minimal effort. Using the proposed method, user effort can be reduced by half without compromising annotation accuracy. We have quantitatively and qualitatively validated the results on eight different datasets.

Paper Nr: 48
Title:

### Regularised Energy Model for Robust Monocular Ego-motion Estimation

Authors:

#### Hsiang-Jen Chien and Reinhard Klette

Abstract: For two decades, ego-motion estimation has been an actively developing topic in computer vision and robotics. Existing motion estimation techniques rely on the minimisation of an energy function based on re-projection errors. In this paper we augment such an energy function by introducing an epipolar-geometry-derived regularisation term. The experiments show that, by taking soft constraints into account, a more reliable motion estimation is achieved. They also show that the implementation presented in this paper achieves accuracy comparable to stereo vision approaches, with the overall drift maintained under 2% over hundreds of metres.
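In general terms, a re-projection energy augmented with an epipolar regulariser of the kind the abstract describes can be written as follows (the notation is ours, not necessarily the paper's):

```latex
E(\xi) = \sum_{i} \left\| \mathbf{x}'_i - \pi(\mathbf{K}, \xi, \mathbf{X}_i) \right\|^2
       + \lambda \sum_{i} d\!\left(\mathbf{x}'_i,\; \mathbf{F}(\xi)\,\mathbf{x}_i\right)^2
```

where \(\pi\) projects the 3D point \(\mathbf{X}_i\) into the second view under pose \(\xi\) and intrinsics \(\mathbf{K}\), \(\mathbf{F}(\xi)\) is the fundamental matrix induced by the pose, \(d(\cdot,\cdot)\) is the point-to-epipolar-line distance, and \(\lambda\) weights the soft epipolar constraint against the re-projection term.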

Paper Nr: 67
Title:

### Parallelized Flight Path Prediction using a Graphics Processing Unit

Authors:

#### Maximilian Götzinger, Martin Pongratz, Amir M. Rahmani and Axel Jantsch

Abstract: Summarized under the term Transport-by-Throwing, robotic arms throwing objects to each other are a visionary system intended to complement the conventional, static conveyor belt. Despite much research and many novel approaches, no fully satisfactory solution to catching a ball with a robotic arm has been developed so far. A new approach based on memorized trajectories is currently being researched. This paper presents an algorithm for real-time image processing and flight path prediction. Object detection and flight path prediction can be done fast enough for visual input data at a frame rate of 130 FPS (frames per second). Our experiments show that the average execution time for all necessary calculations on an NVidia GTX 560 TI platform is less than 7.7 ms. The maximum times of up to 11.7 ms require a small buffer for frame rates over 85 FPS. The results demonstrate that the use of a GPU (Graphics Processing Unit) considerably accelerates the entire procedure and can lead to execution 3.5 to 7.2 times faster than on a CPU. Prediction, which was the main focus of this research, is accelerated by a factor of 9.5 by executing the devised parallel algorithm on a GPU. Based on these results, further research could be carried out to examine the prediction system’s reliability and limitations (compare (Pongratz, 2016)).

Paper Nr: 133
Title:

### A Multi Patch Warping Approach for Improved Stereo Block Matching

Authors:

#### Mircea Paul Muresan, Sergiu Nedevschi and Radu Danescu

Abstract: Stereo cameras are a suitable solution for reconstructing the 3D information of observed scenes, and, because of their low price and ease of setup and operation, they can be used in a wide range of applications, from autonomous driving to advanced driver assistance systems and robotics. Due to the high quality of their results, energy-based reconstruction methods like semi-global matching have gained a lot of popularity in recent years. The disadvantages of semi-global matching are its large memory footprint and high computational complexity. In contrast, window-based matching methods have a lower complexity and are leaner with respect to memory consumption. The downside of block matching methods is that they are more error prone, especially on surfaces which are not parallel to the image plane. In this paper we present a novel block matching scheme that improves the quality of local stereo correspondence algorithms. The first contribution of the paper is an original method for reliably reconstructing the environment on slanted surfaces. The second is a set of local constraints that filter out possible outlier disparity values. The third and final contribution is a refinement technique which improves the resulting disparity map. The proposed stereo correspondence approach has been validated on the KITTI stereo dataset.

Paper Nr: 156
Title:

### Sampling Density Criterion for Circular Structured Light 3D Imaging

Authors:

#### Deokwoo Lee and Hamid Krim

Abstract: In computer vision, 3D reconstruction work has chiefly focused on the accuracy of reconstruction results, and efficient functional 3D camera systems have also been of interest in the field of mobile cameras. The optimal sampling density, referred to as the minimum sampling rate for 3D or high-dimensional signal reconstruction, is proposed in this paper. There have been many research activities in signal processing to develop adaptive sampling theorems beyond the Shannon-Nyquist Sampling Theorem, but a sampling theorem for 3D imaging or reconstruction is an open, challenging topic and a crucial part of our contribution in this paper. We hence propose an approach to determining the sampling rate (lower / upper bound) to recover 3D objects (surfaces) represented by a set of circular light patterns, and the criterion for a sampling rate is formulated using geometric characteristics of the light patterns overlaid on the surface. The proposed method is in a sense a foundation for a sampling theorem applied to 3D image processing, establishing a relationship between frequency components and geometric information of a surface.

Paper Nr: 158
Title:

### InLiDa: A 3D Lidar Dataset for People Detection and Tracking in Indoor Environments

Authors:

#### Cristina Romero-González, Álvaro Villena, Daniel González-Medina, Jesus Martínez-Gómez, Luis Rodríguez-Ruiz and Ismael García-Varea

Abstract: The objective evaluation of people detectors and trackers is essential to develop high-performance and general-purpose solutions to these problems. This evaluation can be easily done thanks to the use of annotated datasets, but there are some combinations of sensors and scopes that have not been extensively explored. Namely, the application of long-range 3D sensors in indoor environments for people detection purposes has been sparsely studied. To fill this gap, we propose InLiDa, a dataset that consists of six different sequences acquired in two different large indoor environments. The dataset is released with a set of tools suitable for its use as a benchmark for people detection and tracking proposals. Also, baseline results obtained with state-of-the-art techniques for people detection and tracking are presented.

Paper Nr: 175
Title:

### Multi Target Tracking by Linking Tracklets with a Convolutional Neural Network

Authors:

#### Yosra Dorai, Frederic Chausse, Sami Gazzah and Najoua Essoukri Ben Amara

Abstract: The computer vision community has developed many multi-object tracking methods for various fields. Here the focus is put on traffic scenes and video-surveillance applications, where tracking object features is challenging. Indeed, in these particular applications, objects can be partially or totally occluded and can appear differently. Usual detection methods generally fail to overcome these limitations. To deal with this, a framework for multi-object tracking based on the linking of tracklets (mini-trajectories) is proposed. Despite the number of errors (false positives or missed detections) made by the Faster R-CNN detector, short-term Faster R-CNN detection similarities are tracked. The goal is to obtain tracklets over a given number of frames. We propose to associate tracklets and apply an update function to correct the trajectories. The experiments show that, on the one hand, our approach outperforms the detector in finding undetected objects, and on the other hand, the developed method eliminates false positives, demonstrating the effectiveness of the tracking.

Paper Nr: 228
Title:

### Pedestrian Tracking using a Generalized Potential Field Approach

Authors:

#### Florian Particke, Lucila Patiño-Studencki, Jörn Thielecke and Christian Feist

Abstract: Mobile robots and autonomous cars operate in an environment shared with pedestrians. In order to avoid accidents, it is important to track and predict human trajectories in an optimal way. In this paper, a generalized potential field approach for characterizing pedestrian movements is proposed which goes beyond the well-known social force model. Its goal is to provide a generalized architecture for improving the tracking accuracy of pedestrians in surveillance situations. In comparison to other fusion approaches, the number of proposed parameters is reduced and the parameters can be intuitively understood. For a simple scenario in a forum, the trajectories of pedestrians are predicted for a configured parameter set using the proposed model. The predicted trajectories are compared to the real trajectories of the pedestrians. First results regarding the accuracy of the approach are presented.

Paper Nr: 253
Title:

### Quantitative Comparison of Affine Invariant Feature Matching

Authors:

#### Zoltán Pusztai and Levente Hajder

Abstract: Applying accurate feature matchers between images is a key problem in computer vision, and thus the comparison of such matchers is essential. There are several survey papers in the field; this study extends one of them: the aim of this paper is to compare competing techniques on ground truth (GT) data generated by our structured-light 3D scanner with a rotating table. The discussed quantitative comparison is based on real images of six rotating 3D objects. The rival detectors in the comparison are as follows: Harris-Laplace, Hessian-Laplace, Harris-Affine, Hessian-Affine, IBR, EBR, SURF, and MSER.