## Séminaire Vision artificielle / Équipe Willow

Organisé par : Jean Ponce (ENS)

*Building local part models for category-level recognition* (October 6, 2005) — **Cordelia Schmid**

This talk addresses the problem of building semi-local part models for category-level recognition. In the context of category recognition, it is no longer sufficient to use individual local features, and it becomes necessary to model intra-class variations, to select discriminative features, and to model spatial relations. This leads to a part-based approach to category-level recognition that I will illustrate with two examples. The first one represents images as distributions of local parts and learns a Support Vector Machine classifier with kernels based on two effective measures for comparing distributions, the Earth Mover’s Distance and the Chi-square distance. The second one represents object classes with a dictionary of composite semi-local parts, i.e., groups of neighboring keypoints with stable and distinctive appearance and geometric layout. A discriminative maximum entropy framework is used to learn the posterior distribution of the class label given the occurrences of parts from the dictionary in the training set.
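As a concrete illustration of the first example’s distribution comparison, here is a minimal sketch of the chi-square distance and the exponentiated kernel it induces; the `gamma` bandwidth and function names are illustrative assumptions, not the authors’ code.

```python
import numpy as np

def chi2_distance(h1, h2, eps=1e-10):
    """Chi-square distance between two histograms of local parts."""
    h1, h2 = np.asarray(h1, float), np.asarray(h2, float)
    return 0.5 * np.sum((h1 - h2) ** 2 / (h1 + h2 + eps))

def chi2_kernel(H1, H2, gamma=1.0):
    """Exponentiated chi-square kernel matrix between two sets of histograms."""
    K = np.zeros((len(H1), len(H2)))
    for i, a in enumerate(H1):
        for j, b in enumerate(H2):
            K[i, j] = np.exp(-gamma * chi2_distance(a, b))
    return K
```

Plugging such a precomputed kernel matrix into any SVM implementation that accepts custom kernels reproduces the general recipe described above.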

This is joint work with S. Lazebnik, M. Marszalek, J. Zhang and J. Ponce.

*Three-Dimensional Computer Vision: Challenges and Opportunities* (October 12, 2005) — **Jean Ponce**

This talk addresses two of the main challenges of computer vision: automatically identifying three-dimensional (3D) objects in photographs despite arbitrary viewpoint variations, occlusion, and clutter; and recovering accurate models of 3D shapes observed in multiple images. I will first present a new approach to object recognition that combines local invariants with global geometric constraints to construct 3D object models from multiple images and/or stereo views and effectively identify them in heavily cluttered photographs taken from unknown viewpoints. I will then discuss a novel algorithm that uses the geometric and photometric constraints associated with multiple calibrated photographs to construct high-quality solid models of complex 3D shapes in the form of carved visual hulls. I will conclude with a brief discussion of exciting new application domains and wide open research issues.

Joint work with Yasutaka Furukawa, Akash Kushal, Svetlana Lazebnik, Fred Rothganger, and Cordelia Schmid.

*3D modeling of dynamic scenes from multiple views* (November 16, 2005) — **Edmond Boyer**

In this talk, I will present some of the work carried out in the MOVI team at INRIA Rhône-Alpes on acquiring dynamic models from video streams. I will focus in particular on results obtained with the Grimage experimental platform, a multi-camera environment for motion and shape capture aimed at interactive applications. This environment consists of an acquisition space surrounded by cameras, a high-resolution display screen, and a PC cluster. It can extract and visualize, in real time, 3D information about the scene observed by the cameras. I will discuss issues related to the various components of the platform, from image acquisition to motion modeling and recognition.

*Detecting people in images and videos and reconstructing their movements* (November 16, 2005) — **Bill Triggs**

Detecting humans in images is a challenging task owing to their variable appearance and the wide range of poses that they can adopt. I will present detectors for upright humans in static images and in videos. The detectors use a linear SVM classifier over a robust visual feature set based on well normalized local histograms of image gradient orientations. The video detector also incorporates oriented histograms of differential optical flow to capture cues for human motion despite moving cameras and backgrounds.
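A minimal sketch of the kind of feature the detector builds on, per-cell histograms of gradient orientations; the real descriptor also normalizes over overlapping blocks and scans a dense window pyramid, and the cell size and bin count here are assumed values, not taken from the talk.

```python
import numpy as np

def hog_cells(img, cell=8, nbins=9):
    """Per-cell histograms of unsigned gradient orientations (0..180 deg).

    A simplified sketch of the feature family the talk describes; the full
    detector adds block-level contrast normalization over groups of cells.
    """
    img = np.asarray(img, float)
    gx = np.zeros_like(img)
    gy = np.zeros_like(img)
    gx[:, 1:-1] = img[:, 2:] - img[:, :-2]   # centered [-1, 0, 1] derivative
    gy[1:-1, :] = img[2:, :] - img[:-2, :]
    mag = np.hypot(gx, gy)
    ang = np.rad2deg(np.arctan2(gy, gx)) % 180.0
    H, W = img.shape
    ch, cw = H // cell, W // cell
    hist = np.zeros((ch, cw, nbins))
    binw = 180.0 / nbins
    for i in range(ch * cell):
        for j in range(cw * cell):
            b = int(ang[i, j] // binw) % nbins
            hist[i // cell, j // cell, b] += mag[i, j]  # magnitude-weighted vote
    # Simple per-cell L2 normalization (the detector normalizes over blocks)
    norm = np.linalg.norm(hist, axis=2, keepdims=True)
    return hist / (norm + 1e-6)
```

The flattened cell histograms, concatenated over a detection window, are exactly the kind of vector that the linear SVM mentioned above classifies.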

In the second part of the talk, I will give an overview of some of our work on reconstructing human body motions from monocular image sequences. We avoid using an explicit 3-D body model, instead taking a learning based approach that directly regresses 3-D pose (joint angles) from robust shape descriptors extracted from image silhouettes. A kernelized Relevance Vector Machine is used for regression. Ambiguities in the silhouette representation cause occasional failures and we present two methods to correct this: incorporating a learned dynamical model, and using multi-valued regression to generate several reconstruction hypotheses along with their associated probabilities of being correct.
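The regression step above can be sketched as follows, with kernel ridge regression standing in for the kernelized Relevance Vector Machine (which additionally yields a sparse solution); the descriptors and joint-angle targets are stand-in arrays, not real silhouette data.

```python
import numpy as np

def rbf_kernel(A, B, gamma=1.0):
    """Gaussian RBF kernel matrix between rows of A and rows of B."""
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-gamma * d2)

def fit_kernel_regression(X, Y, gamma=1.0, lam=1e-3):
    """Solve (K + lam*I) alpha = Y; a dense stand-in for RVM training."""
    K = rbf_kernel(X, X, gamma)
    return np.linalg.solve(K + lam * np.eye(len(X)), Y)

def predict(Xstar, X, alpha, gamma=1.0):
    """Regress pose targets for new descriptors Xstar."""
    return rbf_kernel(Xstar, X, gamma) @ alpha
```

In the setting the talk describes, `X` would hold robust shape descriptors of silhouettes and `Y` the corresponding joint angles; the multi-valued extension keeps several such regressors to represent ambiguous silhouettes.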

Work done with my students Navneet Dalal and Ankur Agarwal.

*Using Context in Scene Analysis and Object Detection* (November 16, 2005) — **Martial Hebert**

This talk will include a review of some of our current activities in the general area of object recognition and scene understanding. In particular, I’ll review some recent ideas for incorporating representations of context into object recognition and scene understanding approaches. Context may include geometric relations between object parts, relations between objects, relations between regions in images, and geometric cues. These ideas are applied to object detection and scene interpretation. If time allows, ideas for extensions to the temporal domain for video analysis will be discussed.

*Object and Scene Recognition in Large Datasets* (November 16, 2005) — **David Lowe**

Many real-world applications of computer vision require recognizing small objects within large datasets containing thousands of images. This talk will describe some new algorithms for efficient indexing within large datasets, including randomized tree algorithms for fast nearest-neighbour matching and probabilistic methods that determine the minimal number of matches needed for reliable object detection. Some applications of these methods will be described for panorama recognition, location recognition for augmented reality, and a system that can identify any product in a supermarket from a partial image.

*Séminaire vision artificielle* (November 23, 2005) — **Renaud Keriven**

*Image, Texture, Video & "Structural" Completion: from LEGOs to Combinatorial Optimization* (November 30, 2005) — **Nikos Paragios**

Image completion (often called inpainting) has emerged as a high-level task of low-level vision. It consists of filling in missing content in images. A central idea in such approaches is the principle of good continuation, which consists of adding content using information from the borders of the area to be inpainted. While such methods can be quite efficient for smooth content, they fail to account for texture, and their extension to completing missing content in video and in 3D is not straightforward. In this talk we propose a novel technique that addresses image renaissance, video inpainting and structure completion through a "multi-level" graph-based matching process. To this end, numerous patches that present similarities with the local content around the missing part are considered. These patches are selected with a particle filter method that addresses the task of hypothesis evaluation. They are positioned on top of the missing segment, ordered by their similarity weight, and form in some fashion a multi-layered graph over time. Markov Random Fields are used to formalize inpainting as a labelling estimation problem, while a combinatorial approach is used to recover the optimal combination of patches to complete the missing structure. The min-cut/max-flow algorithm within the α-expansion process is used to determine the optimal cut that, in an implicit fashion, completes the missing image structure. Promising results in image and texture completion demonstrate the potential of the proposed method.
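The labelling step can be illustrated on a toy chain of missing blocks; exhaustive enumeration stands in for the min-cut / alpha-expansion solver, which is what makes the approach tractable at real image sizes. All costs and sizes here are invented for illustration.

```python
import itertools
import numpy as np

def best_patch_labeling(data_cost, smooth_cost):
    """Exhaustive MRF labelling over a 1-D chain of missing blocks.

    data_cost[i, l]  : how well candidate patch l fits missing block i.
    smooth_cost[a, b]: seam penalty between patches a, b in adjacent blocks.
    Brute force over all labelings; only practical for toy problem sizes.
    """
    n, L = data_cost.shape
    best, best_e = None, np.inf
    for labels in itertools.product(range(L), repeat=n):
        e = sum(data_cost[i, labels[i]] for i in range(n))
        e += sum(smooth_cost[labels[i], labels[i + 1]] for i in range(n - 1))
        if e < best_e:
            best, best_e = labels, e
    return list(best), best_e
```

With a strong seam penalty, the minimizer prefers a consistent choice of patches across neighboring blocks even at some cost in local fit, which is the trade-off the graph-cut formulation encodes.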

Joint work with Cedric Allene

*Computer Vision and the Art of Special Effects* (December 2, 2005) — **Steve Sullivan**

*Distribution of the talk’s illustrations was not authorized by Industrial Light and Magic*

Computer vision techniques are now quite common in visual effects production. Camera matchmove, object tracking, motion capture, and image-based modeling have been used in hundreds of films and TV shows and are no longer considered exotic. In practice, however, they are far from robust or automatic, and the next generation of production technologies will demand major advancements in reliability, user interface, and real-time performance.

In this talk, I’ll discuss how computer vision techniques are changing the way movies are made, then cover a few technologies which promise major advances in the near future. Particular attention will be paid to virtual production and the need for interactive data acquisition to bring directors onto the virtual set.

*Video Google - Faces* (December 7, 2005) — **Andrew Zisserman**

Matching people based on their imaged face is hard because of the well known problems of pose, size and expression variation. Indeed these variations can exceed those due to identity. Fortunately, videos of people have the happy benefit of containing multiple exemplars of each person in a form that can easily be associated automatically using straightforward visual tracking.

We describe progress in harnessing these multiple exemplars in order to retrieve humans automatically in videos, given a query face in a shot. There are three areas of interest: (i) the matching of sets of exemplars provided by "tubes" of the spatial-temporal volume; (ii) the description of the face using a spatial orientation field; and (iii) the structuring of the problem so that retrieval is immediate at run time.
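A minimal sketch of set-to-set matching between face tracks, one simple way to exploit the multiple exemplars a "tube" provides; the minimum pairwise descriptor distance used here is an assumption for illustration, not necessarily the measure used in the talk.

```python
import numpy as np

def tube_distance(tube_a, tube_b):
    """Distance between two face tracks ('tubes'), each given as an array
    of per-frame face descriptors: the smallest pairwise descriptor
    distance, so one good exemplar pair suffices for a match."""
    d2 = ((tube_a[:, None, :] - tube_b[None, :, :]) ** 2).sum(-1)
    return float(np.sqrt(d2.min()))

def rank_shots(query_tube, shot_tubes):
    """Rank shots by the distance of their best-matching tube to the query."""
    scores = [min(tube_distance(query_tube, t) for t in tubes)
              for tubes in shot_tubes]
    return np.argsort(scores)
```

Precomputing such tube-level scores offline is one way to obtain the immediate run-time retrieval the third point refers to.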

The result is a preliminary "Video Google - Faces", able to retrieve a ranked list of shots containing a particular person in the manner of Google. The method will be demonstrated on several feature length films.

Joint work with Josef Sivic and Mark Everingham.

*People Tracking with a Multi-Camera Setup* (January 11, 2006) — **François Fleuret**

In this talk, I will show that in a multi-camera context, we can effectively track and estimate the locations of an a priori unknown number of individuals with good accuracy, despite complex occlusions.

Our algorithm initially estimates for each isolated frame a conditional probability of occupancy for every location on the ground plane, given binary images produced by a simple background subtraction procedure. We show that a simple Bayesian formulation leads to a large system of equations whose variables are the conditional marginal probabilities of occupancy at each location. This system can be solved iteratively at a reasonable speed (10 frames per second with two cameras and a 25cm accuracy). Despite the absence of temporal consistency and the poor quality of the input data, this procedure by itself provides accurate detection of individuals on isolated frames.
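A much-simplified sketch of per-location occupancy inference from background-subtraction evidence; it treats cameras as independent and ignores the inter-person occlusions that the full iterative solver models, and all probabilities here are invented for illustration.

```python
import numpy as np

def occupancy_posterior(fg_fraction, prior=0.1, p_fg_occ=0.8, p_fg_emp=0.2):
    """Posterior probability of occupancy for each ground-plane location.

    fg_fraction[c, k] is the fraction of foreground pixels, in camera c,
    inside the rectangle where a person standing at location k would
    project. Cameras are combined as independent Bernoulli-style evidence
    in the log-odds domain; the full algorithm instead solves a coupled
    system of marginal occupancy probabilities.
    """
    f = np.asarray(fg_fraction, float)            # (n_cameras, n_locations)
    log_lr = (f * np.log(p_fg_occ / p_fg_emp)
              + (1 - f) * np.log((1 - p_fg_occ) / (1 - p_fg_emp)))
    logit = np.log(prior / (1 - prior)) + log_lr.sum(axis=0)
    return 1.0 / (1.0 + np.exp(-logit))          # sigmoid of total log-odds
```

A location whose projected rectangles are mostly foreground in every camera ends up with high posterior occupancy, which is the per-frame signal the HMM stage then links over time.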

The results can be improved by combining these estimates obtained on a few tens of isolated frames into a classical HMM, taking into account both the color consistency and a simple motion model.

We demonstrate the quality of our results on several sequences. The full algorithm performs reliably on these test sequences, with no false negative or false positive, and an error of less than 30cm for more than 90% of the predicted locations.

If there is time left after this main subject, I will briefly introduce a more prospective topic: learning the appearance of an object from a single example. Instead of using a large number of pictures of the object to be recognized, we use a labeled reference database of pictures of other objects to learn high-level invariance. We propose building hundreds of random binary splits of the training set, chosen to keep the images of any given object together, and combining those splits with a Bayesian rule into a posterior probability of similarity.

Joint work with J. Berclaz, R. Lengagne and P. Fua.

*Collaboration between Computer Vision and Computer Graphics - Applications* (January 11, 2006) — **André Gagalowicz**

This talk is in a sense an illustration of Steve Sullivan’s December 2 presentation. I will first explain what post-production is, along with 3D rotoscopy, the most important technique in post-production applications. Then I will discuss the computer vision / computer graphics strategy used to perform this task. I will first describe the case of rigid objects, where the strategy appears most clearly, then proceed to articulated objects and especially full human body tracking (when subjects wear rather tight garments). Some results on tracking professional golfers’ swings will be discussed. Finally, I will give some results on 3D face tracking, a case of deformable objects. I will conclude with a presentation of other possible applications of the research done at the MIRAGES laboratory at INRIA Rocquencourt.

*Toward a Geometrically Coherent Image Interpretation* (September 26, 2006) — **Alexei Efros**

Image interpretation, the ability to see and understand the three-dimensional world behind a two-dimensional image, goes to the very heart of the computer vision problem. The ultimate objective is, given an image, to automatically produce a coherent interpretation of the depicted scene. This requires not only recognizing specific objects (e.g. people, houses, cars, trees), but understanding the underlying structure of the 3D scene where these objects reside.

In this talk I will describe some of our recent efforts toward this lofty goal. I will present an approach for estimating the coarse geometric properties of a scene by learning appearance-based models of geometric classes. Geometric classes describe the 3D orientation of image regions with respect to the camera. This geometric information is then combined with camera viewpoint estimation and local object detection producing a prototype for a coherent image-interpretation framework.

Joint work with Derek Hoiem and Martial Hebert at CMU.

*Color Space* (November 13, 2007) — **Jan Koenderink**

Structure of the space of colors as related to the space of radiant power spectra

*Image Space* (November 13, 2007) — **Jan Koenderink**

Structure of images, image transformations, etc.

*Image Texture and the "Flow of Light"* (November 14, 2007) — **Jan Koenderink**

Light field, light flow over surfaces, novel SFS algorithms

*Pictorial Space* (November 14, 2007) — **Jan Koenderink**

Psychophysics, nature of the geometry
