This article is part of the supplement: Proceedings of the 11th European Congress on Telepathology and 5th International Congress on Virtual Microscopy
Out-of-sample extension of diffusion maps in a computer aided diagnosis system. Application to breast cancer virtual slide images
1 BioTICLA-HIQ EA 4656, Université de Caen Basse-Normandie, Caen, France
2 BioTICLA-HIQ EA 4656, CLCC François Baclesse, Caen, France
Diagnostic Pathology 2013, 8(Suppl 1):S9 doi:10.1186/1746-1596-8-S1-S9
Published: 30 September 2013
First paragraph (this article has no abstract)
While the number of pathologists is dropping sharply, the number of pathological cases to be examined is rising steeply, mainly because of early screening campaigns; automated systems would therefore be useful to help pathologists in their daily work. As Virtual Microscopy (VM) is increasingly introduced in pathology departments, where it holds immense potential despite the large amounts of data to be managed, its combination with image processing techniques makes it possible to find objective criteria for differential diagnosis or to quantify prognostic markers. Thus, many works aim to develop computer-aided diagnosis systems (CADS) based on image retrieval and classification [2,3]. The first step consists in building a knowledge database of many features extracted from a set of well-known images; it is an 'off-line' procedure conducted once. These features are represented by vectors of non-linear data acting as a signature for the original images. In a second step, signatures are obtained from new unknown images to be analyzed and are compared with the database; it is an 'on-line' procedure. Because of tumor heterogeneity, it is essential to build knowledge databases containing representative features of the multiple morphological types of lesions before implementing a CADS. However, as it is almost impossible for a pathologist to manually segment large virtual slide images (VSI), the usual practice consists in manually selecting some 'representative areas'. This choice is necessarily subjective and therefore introduces a bias into the process. It is thus necessary to find better solutions leading to an unbiased collection of these 'representative areas' (later called 'patches').
In a previous work, we proposed an original strategy: starting from a collection of breast cancer VSI, and taking advantage of stereological sampling methods and diffusion maps, a knowledge database is obtained from a reduced number of patches that are representative of given histological types. The sampling tools offered by stereology are well suited to this context. Systematic sampling, which starts from a random point and proceeds with a fixed periodic interval, reduces the area to be analyzed while preserving the collection of distinctive regions encountered in a tumor. However, even if the working area becomes smaller, the number of selected patches can still be very large and may include many redundant elements. A data reduction step must therefore be conducted. Among the available methods, the diffusion maps technique [6,7] has been retained since it provides a very attractive framework for processing and visualizing large volumes of non-linear data. Diffusion maps belong to the family of unsupervised learning algorithms based on a spectral analysis of non-linear data; they provide a clustering only for the given training points, with no straightforward extension to out-of-sample cases. The work presented here focuses on a way to get around this problem and explains how unknown VSI can be classified by viewing diffusion maps as learning the eigenfunctions of a data-dependent kernel. It makes use of the Nyström formula to estimate the diffusion coordinates of new data. An application to histological types of breast cancer is presented with VSI of Invasive Ductal Carcinoma and Mastosis.
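The out-of-sample idea described above can be illustrated with a minimal sketch. It is not the authors' implementation: the Gaussian kernel, the bandwidth `eps`, the diffusion time `t`, the function names and the synthetic feature vectors are all assumptions made for illustration. The sketch fits a diffusion map on training signatures, then uses the Nyström formula to estimate the diffusion coordinates of new signatures as a kernel-weighted average of the training eigenvectors divided by the corresponding eigenvalue.

```python
import numpy as np

def gaussian_kernel(A, B, eps):
    """Pairwise Gaussian affinities exp(-||a - b||^2 / eps) (assumed kernel)."""
    sq = (np.sum(A**2, axis=1)[:, None]
          + np.sum(B**2, axis=1)[None, :]
          - 2.0 * A @ B.T)
    return np.exp(-np.maximum(sq, 0.0) / eps)

def fit_diffusion_map(X, eps, n_components=2, t=1):
    """Diffusion map of the training signatures X (n_samples x n_features)."""
    K = gaussian_kernel(X, X, eps)
    d = K.sum(axis=1)                        # degree of each training point
    S = K / np.sqrt(np.outer(d, d))          # symmetric conjugate of P = D^-1 K
    eigvals, eigvecs = np.linalg.eigh(S)
    order = np.argsort(eigvals)[::-1]        # sort eigenpairs, largest first
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    psi = eigvecs / np.sqrt(d)[:, None]      # right eigenvectors of P
    lam = eigvals[1:n_components + 1]        # skip the trivial eigenvalue 1
    psi = psi[:, 1:n_components + 1]
    return {"X": X, "eps": eps, "t": t, "lam": lam, "psi": psi,
            "coords": psi * lam**t}          # diffusion coordinates

def nystrom_extend(model, X_new):
    """Nystrom estimate of the diffusion coordinates of unseen signatures."""
    K_new = gaussian_kernel(X_new, model["X"], model["eps"])
    P_new = K_new / K_new.sum(axis=1, keepdims=True)   # row-normalised affinities
    psi_new = (P_new @ model["psi"]) / model["lam"]    # psi_k(x) = sum_j p(x, x_j) psi_k(x_j) / lam_k
    return psi_new * model["lam"] ** model["t"]

# Synthetic stand-ins for patch signatures of two histological types (hypothetical data).
rng = np.random.default_rng(0)
train = np.vstack([rng.normal(0.0, 1.0, (40, 5)),
                   rng.normal(4.0, 1.0, (40, 5))])
model = fit_diffusion_map(train, eps=10.0)
unknown = rng.normal(0.0, 1.0, (5, 5))       # new, unlabeled signatures
unknown_coords = nystrom_extend(model, unknown)
```

Applying `nystrom_extend` to the training points themselves returns their original diffusion coordinates, which is a convenient sanity check. New points can then be compared with the training points in the diffusion space, for example with a nearest-neighbor rule, without recomputing the eigendecomposition.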