<?xml version='1.0'?>
<!DOCTYPE art SYSTEM 'http://www.biomedcentral.com/xml/article.dtd'>
<art>
   <ui>1746-1596-6-S1-S3</ui>
   <ji>1746-1596</ji>
   <fm>
      <dochead>Proceedings</dochead>
      <bibl>
         <title>
            <p>Towards a computer aided diagnosis system dedicated to virtual microscopy based on stereology sampling and diffusion maps</p>
         </title>
         <aug>
            <au ca="yes" id="A1"><snm>Belhomme</snm><fnm>Philippe</fnm><insr iid="I1"/><email>philippe.belhomme@unicaen.fr</email></au>
            <au id="A2"><snm>Oger</snm><fnm>Myriam</fnm><insr iid="I1"/><email>m.oger@baclesse.fr</email></au>
            <au id="A3"><snm>Michels</snm><fnm>Jean-Jaques</fnm><insr iid="I1"/><email>jj.michels@baclesse.fr</email></au>
            <au id="A4"><snm>Plancoulaine</snm><fnm>Benoit</fnm><insr iid="I1"/><email>benoit.plancoulaine@unicaen.fr</email></au>
            <au id="A5"><snm>Herlin</snm><fnm>Paulette</fnm><insr iid="I1"/><email>p.herlin@baclesse.fr</email></au>
         </aug>
         <insg>
            <ins id="I1"><p>GRECAN EA 1772, IFR ICORE 146, Universit&#233; de Caen, France</p></ins>
         </insg>
         <source>Diagnostic Pathology</source>
         
         
         <supplement><title><p>Proceedings of the 10th European Congress on Telepathology and 4th International Congress on Virtual Microscopy</p></title><editor>Klaus Kayser, Arvydas Laurinavicius and Gian Kayser</editor><sponsor><note>Sponsorship for publication of these proceedings has been provided by the International Academy of Telepathology (IAT), COST-Action IC0604, EU Verein F&#246;rderung des Biol.-Techn. Fortschritts in der Medizin, e.V., Heidelberg, DiagnomX GmbH.</note></sponsor><note>Proceedings</note></supplement><conference><title><p>The 10th European Congress on Telepathology and 4th International Congress on Virtual Microscopy</p></title><location>Vilnius, Lithuania</location><date-range>1-3 July 2010</date-range><url>http://www.telepathology2010.com</url></conference><issn>1746-1596</issn>
         <pubdate>2011</pubdate>
         <volume>6</volume>
         <issue>Suppl 1</issue>
         <fpage>S3</fpage>
         <url>http://www.diagnosticpathology.org/content/6/S1/S3</url>
         <xrefbib><pubidlist><pubid idtype="pmpid">21489198</pubid><pubid idtype="doi">10.1186/1746-1596-6-S1-S3</pubid></pubidlist></xrefbib>
      </bibl>
      <history><pub><date><day>30</day><month>3</month><year>2011</year></date></pub></history>
      <cpyrt><year>2011</year><collab>Belhomme et al; licensee BioMed Central Ltd.</collab><note>This is an open access article distributed under the terms of the Creative Commons Attribution License (<url>http://creativecommons.org/licenses/by/2.0</url>), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.</note></cpyrt>
      <abs>
         <sec>
            <st>
               <p>Abstract</p>
            </st>
            <p>An original strategy is presented that combines stereological sampling methods based on test grids with data reduction methods based on diffusion maps, in order to build a knowledge image database free of the bias introduced by a subjective choice of exploration areas. The methodology is applied to virtual slides of breast tumors.</p>
         </sec>
      </abs>
   </fm>
   <bdy>
      <sec>
         <st>
            <p>Introduction</p>
         </st>
         <p>While the number of pathologists is dropping sharply, the number of pathological cases to examine is increasing steadily (mainly because of new screening campaigns). Fully automated image processing may provide a solution to this problem: it can help pathologists in their daily practice by providing objective criteria for differential diagnosis or by quantifying prognostic markers.</p>
         <p>Recently marketed digitizers make it possible to visualize an entire histological slide at high resolution, while limiting the time expense and the artifacts previously encountered with image tiling methods <abbrgrp><abbr bid="B1">1</abbr></abbrgrp>. These systems are increasingly common in pathology departments, but they generate very large images that frequently exceed several gigabytes. Because of tumor heterogeneity, it is essential to build image knowledge databases containing representative features of the various morphological types of lesions before implementing computer-aided diagnosis systems <abbrgrp><abbr bid="B2">2</abbr></abbrgrp>. However, as it is almost impossible for a pathologist to manually segment such a large image, and a fortiori many of them (the estimated time being a hundred hours), the current practice consists in manually selecting some 'representative areas'. This choice is subjective and therefore introduces a bias into the process. Better solutions are needed to obtain an unbiased collection of images. The sampling tools offered by stereology can be of great help in this context <abbrgrp><abbr bid="B3">3</abbr><abbr bid="B4">4</abbr></abbrgrp>.</p>
         <p>Systematic sampling, which starts from a random point and proceeds at a fixed periodic interval, reduces the area to be analyzed while preserving the variety of characteristic regions encountered in a virtual slide (VS) of a tumor. However, even if the working area is smaller, the number of selected regions can be very high and can include many redundant elements. A data reduction step must then be carried out in order to keep an appropriate number of representative elements. Among the available reduction methods, diffusion maps <abbrgrp><abbr bid="B5">5</abbr><abbr bid="B6">6</abbr></abbrgrp> provide a very attractive framework for processing and visualizing large non-linear data sets.</p>
         <p>This work belongs to the field of medical image processing and retrieval, with the goal of developing a functional computer-aided diagnosis system based on a knowledge database. The original strategy presented in this paper starts from a collection of VS and takes advantage of stereological sampling methods and diffusion maps to build a knowledge image database containing a small number of image patches that are representative of a given histological type or subtype. The practical application illustrating this framework uses VS of breast tumors.</p>
      </sec>
      <sec>
         <st>
            <p>Materials</p>
         </st>
         <p>Images used for illustrating the strategy are VS of histological sections of breast tumors, stained in the same laboratory according to the Hematoxylin-Eosin-Saffron protocol and acquired with the same digital scanner. The main goal here is to collect a useful number of image patches corresponding to a given histological type. The ability of the approach to be embedded into a computer-aided diagnosis system (CADS) is illustrated by building an unbiased image database containing representative patches of a benign tumor (Fibroadenoma) and by testing the discrimination between a benign tumor and a malignant tumor (Fibroadenoma vs Comedo carcinoma). Images were acquired at 20x magnification (0.5 &#181;m per pixel) using a digital slide scanner (ScanScope CS from Aperio Technologies, Inc) and then stored in the TIFF 6.0 image file format with 30% JPEG compression <abbrgrp><abbr bid="B7">7</abbr></abbrgrp>. Their mean size is about 65000x43000 pixels and each occupies about 350 MB on disk.</p>
         <p>The tools needed for this study were developed in the Python language (<url>http://www.python.org</url>) with the help of specialized modules (PIL, the Python Imaging Library, and SciPy: <url>http://www.scipy.org</url>).</p>
      </sec>
      <sec>
         <st>
            <p>Methods</p>
         </st>
         <sec>
            <st>
               <p>Stereology</p>
            </st>
            <p>In order to reduce the expert's workload, a stereological test grid for point counting is superimposed onto the VS in the ImageScope viewer (Aperio Technologies, Inc) <abbrgrp><abbr bid="B8">8</abbr></abbrgrp>. This kind of probe is usually used to estimate area and volume fractions in a tissue compartment <abbrgrp><abbr bid="B4">4</abbr></abbrgrp>. In our application, the grid step was set to 1000x1000 pixels, i.e. about 2800 points in an image with an average size of 65000x43000 pixels. The pathologist has to assign the area centered on each grid point to a histological class; 30 classes are available (breast tumor histological types and subtypes) through the annotation tool embedded in Aperio ImageScope. The pathologist is only asked to draw, on each point, a simple line in the overlay layer whose name corresponds to the chosen class. Each area is then extracted by dedicated software at full resolution and stored as an uncompressed TIFF image file in order to enrich the future knowledge database. These areas (hereafter called 'patches') are squares of 400x400 pixels. The patch size was chosen according to the mean size of representative structures encountered in the various histological types of breast tumors <abbrgrp><abbr bid="B9">9</abbr></abbrgrp>. With one 400x400 patch per 1000x1000 grid cell, pathologists have to assess only 16% of the whole VS. All patches are then analyzed and sorted in order to store only the most representative ones. The original image name, the histological type of the patch and its coordinates in the test grid are stored in each filename, so that they can later be used in SQL queries to the database.</p>
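            <p>The sampling arithmetic above can be checked with a short sketch. It is only an illustration under stated assumptions: the function names <it>grid_patches</it> and <it>sampled_fraction</it> are hypothetical, the random grid origin follows the usual systematic sampling scheme, and discarding patches that cross the slide border is a choice made here, not a documented step of the original software.</p>
            <p><![CDATA[
```python
import random


def grid_patches(width, height, step=1000, patch=400, seed=None):
    """Top-left corners of square patches centred on a systematic grid.

    The grid period is `step`; its origin is drawn at random inside the
    first grid cell (random start, fixed periodic interval).
    Assumption: patches crossing the image border are discarded.
    """
    rng = random.Random(seed)
    x0 = rng.randrange(step)
    y0 = rng.randrange(step)
    half = patch // 2
    corners = []
    for cy in range(y0, height, step):
        for cx in range(x0, width, step):
            inside_x = cx - half >= 0 and cx + half <= width
            inside_y = cy - half >= 0 and cy + half <= height
            if inside_x and inside_y:
                corners.append((cx - half, cy - half))
    return corners


def sampled_fraction(step=1000, patch=400):
    """Fraction of the slide area covered by the patches."""
    return (patch * patch) / (step * step)


# Number of grid points for an average 65000x43000 slide: 65 * 43
n_points = (65000 // 1000) * (43000 // 1000)

if __name__ == "__main__":
    corners = grid_patches(65000, 43000, seed=0)
    print(n_points, len(corners), sampled_fraction())
```
]]></p>
            <p>With the values quoted in the text, the grid holds 65x43=2795 points (rounded to 2800 above) and the patches cover 400x400/(1000x1000)=16% of the slide.</p>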
         </sec>
         <sec>
            <st>
               <p>Patch characterization</p>
            </st>
            <p>For each patch, statistical features are computed and embedded in a vector signature. All these signatures will be used in a later image retrieval process. At this stage of the study, none of the features results from segmentation. All are obtained from global measurements on patches computed on <it>I</it><sub>1</sub><it>I</it><sub>2</sub><it>I</it><sub>3</sub> and <it>YCh</it><sub>1</sub><it>Ch</it><sub>2</sub> color components which are derived from the <it>RGB</it> color system according to the following formulas proposed by Ohta <abbrgrp><abbr bid="B10">10</abbr></abbrgrp> and Carron <abbrgrp><abbr bid="B11">11</abbr></abbrgrp>:</p>
            <p><display-formula><graphic file="1746-1596-6-S1-S3-i1.gif"/></display-formula>.
</p>
            <p>These color components have been computed from the <it>RGB</it> histograms previously reduced to 64 values.</p>
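            <p>As an illustration, the Ohta components can be computed per pixel as sketched below. The formula of this article is only available as an image, so the sketch assumes the commonly quoted form of the Ohta transform; the function name <it>ohta</it> is hypothetical and the <it>YCh</it><sub>1</sub><it>Ch</it><sub>2</sub> transform of Carron is deliberately not reproduced here.</p>
            <p><![CDATA[
```python
def ohta(r, g, b):
    """Ohta I1, I2, I3 components of one RGB pixel.

    Assumption: commonly quoted form of the transform,
    I1 = (R + G + B) / 3, I2 = (R - B) / 2, I3 = (2G - R - B) / 4.
    """
    i1 = (r + g + b) / 3.0
    i2 = (r - b) / 2.0
    i3 = (2.0 * g - r - b) / 4.0
    return i1, i2, i3


if __name__ == "__main__":
    # A pure red pixel and a neutral grey pixel
    print(ohta(255, 0, 0))
    print(ohta(128, 128, 128))
```
]]></p>
            <p>For a neutral grey pixel, the two chromatic components <it>I</it><sub>2</sub> and <it>I</it><sub>3</sub> vanish and only the intensity <it>I</it><sub>1</sub> remains.</p>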
            <p>For a given color component whose histogram is called <it>H</it>, the computed features are: <it>H</it>, <it>H</it> reverse sorted, cumulative <it>H</it>, the 20%, 40%, 60% and 80% quantiles of cumulative <it>H</it>, meanH, medianH, modeH, SkewnessH, KurtosisH and PearsonModeSkewnessH, i.e. a total of 13 features. Three of them are themselves vectors of 64 values, but each provides a single feature after distance measurement between two signatures. Definitions of these statistical features can be found in <abbrgrp><abbr bid="B12">12</abbr></abbrgrp>. With the resulting 5 effective color components (as <it>Y</it>=<it>I</it><sub>1</sub>), 5x13=65 distance measures are taken into account, whereas 5x(3x64+10)=1010 values are stored in the signature vector of each patch. Considering the sparse numerical range of features in signatures, the symmetric Kullback-Leibler distance was retained for its ability to handle such values, while remaining simple and fast to implement (compared to the Mahalanobis or earth mover's distance, for example). The symmetric Kullback-Leibler distance between two vectors <it>p</it><sub>1</sub>,<it>p</it><sub>2</sub> of length n is defined by:</p>
            <p><display-formula><graphic file="1746-1596-6-S1-S3-i2.gif"/></display-formula>.
</p>
            <p>The computation time can be reduced using:</p>
            <p><display-formula><graphic file="1746-1596-6-S1-S3-i3.gif"/></display-formula>.
</p>
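            <p>The two expressions can be compared numerically with the sketch below. Since the formulas of this article are only available as images, the code assumes the standard definition of the symmetric Kullback-Leibler distance (sum of the two directed divergences) and its usual single-logarithm simplification; the function names and the small smoothing constant <it>eps</it>, added to avoid taking the logarithm of zero, are assumptions of this sketch.</p>
            <p><![CDATA[
```python
import math


def sym_kl(p1, p2, eps=1e-12):
    """Symmetric KL distance as the sum of the two directed divergences."""
    total = 0.0
    for a, b in zip(p1, p2):
        a, b = a + eps, b + eps  # assumption: smoothing against empty bins
        total += a * math.log(a / b) + b * math.log(b / a)
    return total


def sym_kl_fast(p1, p2, eps=1e-12):
    """Same quantity with a single logarithm per term:
    a*log(a/b) + b*log(b/a) = (a - b)*log(a/b)."""
    total = 0.0
    for a, b in zip(p1, p2):
        a, b = a + eps, b + eps
        total += (a - b) * math.log(a / b)
    return total


if __name__ == "__main__":
    p = [0.1, 0.4, 0.5, 0.0]
    q = [0.3, 0.3, 0.2, 0.2]
    print(sym_kl(p, q), sym_kl_fast(p, q))
```
]]></p>
            <p>Both functions return the same value up to rounding; the second evaluates one logarithm per term instead of two.</p>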
            <p>In order to give the same weight to histogram features (<it>h<sub>i</sub></it>) and scalar features (<it>x<sub>i</sub></it>), the Kullback-Leibler distance is divided by the number of histogram values when comparing two histograms <it>h<sub>1</sub></it> and <it>h<sub>2</sub></it>. Because of the symmetry of <it>D<sub>KL</sub></it>, and with <it>N</it> images to process, the computation time is proportional to <inline-formula><graphic file="1746-1596-6-S1-S3-i4.gif"/></inline-formula>; the computation is parallelized on multi-core/multi-processor computers.</p>
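            <p>A minimal sketch of this step is given below. It assumes that the averaged histogram distance is the symmetric Kullback-Leibler sum divided by the histogram length, and that symmetry is exploited by evaluating each unordered pair once, i.e. N(N-1)/2 evaluations for N patches. The function names are hypothetical and the treatment of scalar features is left out, as it is not detailed in the text.</p>
            <p><![CDATA[
```python
import math


def sym_kl(p1, p2, eps=1e-12):
    """Symmetric KL distance (eps is an assumed smoothing constant)."""
    return sum(((a + eps) - (b + eps)) * math.log((a + eps) / (b + eps))
               for a, b in zip(p1, p2))


def hist_distance(h1, h2):
    """Symmetric KL distance averaged over the number of histogram values."""
    return sym_kl(h1, h2) / len(h1)


def distance_matrix(items, dist):
    """Full symmetric matrix, evaluating `dist` once per unordered pair."""
    n = len(items)
    matrix = [[0.0] * n for _ in range(n)]
    evaluations = 0
    for i in range(n):
        for j in range(i + 1, n):
            matrix[i][j] = matrix[j][i] = dist(items[i], items[j])
            evaluations += 1
    return matrix, evaluations


if __name__ == "__main__":
    hists = [[0.7, 0.2, 0.1], [0.1, 0.2, 0.7], [0.3, 0.4, 0.3], [0.6, 0.3, 0.1]]
    matrix, evaluations = distance_matrix(hists, hist_distance)
    print(evaluations)  # 4 * 3 / 2 pairs
```
]]></p>
            <p>Because the rows of the matrix are independent, they can be distributed over several processes, for instance with the standard <it>multiprocessing</it> module; this is one possible way to parallelize, not necessarily the one used in the study.</p>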
         </sec>
         <sec>
            <st>
               <p>Data reduction</p>
            </st>
            <p>The ultimate goal of this study is to contribute to the development of a computer-aided diagnosis system (CADS), one component of which should be a visualization tool dedicated to knowledge image databases. Such a tool is useful for pathologists only if results can be visualized in a 2D or possibly 3D space. It is therefore necessary to reduce the dimensionality from n (65 dimensions in our example) to 2 or 3. The patch signatures do not necessarily lie on a linear structure, so a principal component analysis (PCA) is not appropriate. Belkin <abbrgrp><abbr bid="B5">5</abbr></abbrgrp> and Coifman <abbrgrp><abbr bid="B6">6</abbr></abbrgrp> have shown that methods based on diffusion maps, involving eigenvalues and eigenvectors of a normalized graph Laplacian, are well suited to non-linear data.</p>
            <p>Let <it>X</it>={<it>x</it><sub>1</sub>,<it>x</it><sub>2</sub>,...,<it>x<sub>N</sub></it>} be a set of <it>N</it> patches. An (<it>N</it>x<it>N</it>) kernel <it>P</it> is obtained whose coefficients are:</p>
            <p><inline-formula><graphic file="1746-1596-6-S1-S3-i5.gif"/></inline-formula> where <inline-formula><graphic file="1746-1596-6-S1-S3-i6.gif"/></inline-formula> and <inline-formula><graphic file="1746-1596-6-S1-S3-i7.gif"/></inline-formula>.</p>
            <p>The eigenvectors &#966;<sub>k</sub> of <it>P</it>, ordered by decreasing eigenvalues, give the axes of the new observation space. Note that &#966;<sub>0</sub> is never used, since it is associated with the eigenvalue &#955;=1 (i.e. the mean of the data set). The projection is then done in (&#966;<sub>1</sub>, &#966;<sub>2</sub>) for a 2D space or (&#966;<sub>1</sub>, &#966;<sub>2</sub>, &#966;<sub>3</sub>) for 3D. The choice of &#949; is empirical but should allow a moderate decrease of the exponential; the median of the <it>D<sub>KL</sub></it> distances is usually chosen <abbrgrp><abbr bid="B13">13</abbr></abbrgrp>.</p>
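            <p>The projection can be sketched with NumPy as follows. The kernel formulas of this article are only available as images, so the sketch assumes a common construction: weights w = exp(-D/&#949;) computed from the pairwise distances, row normalization to obtain <it>P</it>, and &#949; set to the median distance. The function name <it>diffusion_map</it> is hypothetical, and the eigenvectors are obtained through the symmetric matrix conjugate to <it>P</it> for numerical convenience.</p>
            <p><![CDATA[
```python
import numpy as np


def diffusion_map(dist, epsilon=None):
    """Eigenvalues and right eigenvectors of the row-normalised kernel P.

    `dist` is a symmetric (N x N) matrix of pairwise distances.
    Assumption: w = exp(-dist / epsilon) and P = D^-1 W, with D the
    diagonal matrix of the row sums of W.
    Returns (eigenvalues, eigenvectors) sorted by decreasing eigenvalue;
    column 0 is the trivial constant eigenvector (eigenvalue 1).
    """
    dist = np.asarray(dist, dtype=float)
    if epsilon is None:
        upper = np.triu_indices_from(dist, k=1)
        epsilon = np.median(dist[upper])  # median of pairwise distances
    w = np.exp(-dist / epsilon)
    d = w.sum(axis=1)
    # S = D^-1/2 W D^-1/2 is symmetric and has the same eigenvalues as P
    s = w / np.sqrt(np.outer(d, d))
    vals, vecs = np.linalg.eigh(s)
    order = np.argsort(vals)[::-1]
    vals, vecs = vals[order], vecs[:, order]
    # Right eigenvectors of P are D^-1/2 times the eigenvectors of S
    phi = vecs / np.sqrt(d)[:, None]
    return vals, phi


if __name__ == "__main__":
    # Toy data: two groups of points on a line
    pts = np.array([0.0, 0.1, 0.2, 5.0, 5.1, 5.2])
    dist = np.abs(pts[:, None] - pts[None, :])
    vals, phi = diffusion_map(dist)
    coords_2d = phi[:, 1:3]  # skip phi_0, keep (phi_1, phi_2)
    print(vals[:3])
    print(coords_2d)
```
]]></p>
            <p>On this toy example, the first eigenvalue equals 1 with a constant eigenvector, and the sign of the &#966;<sub>1</sub> coordinate separates the two groups of points.</p>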
         </sec>
      </sec>
      <sec>
         <st>
            <p>Results</p>
         </st>
         <p>To illustrate the data reduction algorithm, 4 VS coming from different pathological cases were selected; their storage requires 1.5 GB. A total of 2967 patches, classified as Fibroadenoma by a pathologist, were extracted using the stereological test grid. Figure <figr fid="F1">1</figr> shows their projection in a 2D space. In this reduced space, a classical Euclidean distance can be applied to estimate the similarity between two different patches.</p>
         <fig id="F1"><title><p>Figure 1</p></title><caption><p>2D projection of Fibroadenoma patches</p></caption><text>
   <p>2D projection of Fibroadenoma patches</p>
</text><graphic file="1746-1596-6-S1-S3-1"/></fig>
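         <p>As a small illustration of the similarity measure in the reduced space, the sketch below finds the closest reference patch to a given point using the Euclidean distance. The function name <it>nearest</it> and the sample coordinates are hypothetical, and the points are assumed to be already expressed in the same reduced space.</p>
         <p><![CDATA[
```python
import math


def nearest(point, references):
    """Index and Euclidean distance of the reference closest to `point`."""
    distances = [math.dist(point, ref) for ref in references]
    best = min(range(len(references)), key=distances.__getitem__)
    return best, distances[best]


if __name__ == "__main__":
    # Hypothetical 2D coordinates (phi_1, phi_2) of three reference patches
    references = [(0.0, 0.0), (1.0, 1.0), (3.0, 0.0)]
    print(nearest((0.9, 1.2), references))
```
]]></p>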
         <p>By keeping only 100 representative elements through a regular decimation along the original curve, a new set of patches is obtained for storage in the knowledge database. Their 2D projection is illustrated in Figure <figr fid="F2">2</figr>. These patches represent the unbiased reference, to which new 400x400 areas, extracted from unknown VS, should be compared.</p>
         <fig id="F2"><title><p>Figure 2</p></title><caption><p>Selected Fibroadenoma patches</p></caption><text>
   <p>Selected Fibroadenoma patches</p>
</text><graphic file="1746-1596-6-S1-S3-2"/></fig>
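         <p>One possible reading of the regular decimation is sketched below. It assumes that the projected points can be ordered along the curve by their first coordinate, which is a simplification introduced here and not a documented step of the study; the function name <it>decimate</it> is hypothetical.</p>
         <p><![CDATA[
```python
def decimate(points, keep=100):
    """Indices of `keep` points regularly spaced along the curve.

    Assumption: the position along the curve is given by the rank of
    the first coordinate of each projected point.
    """
    if keep < 2:
        raise ValueError("keep must be at least 2")
    order = sorted(range(len(points)), key=lambda i: points[i][0])
    n = len(order)
    if keep >= n:
        return order
    # Evenly spaced ranks, from the first to the last point of the curve
    ranks = [round(k * (n - 1) / (keep - 1)) for k in range(keep)]
    return [order[r] for r in ranks]


if __name__ == "__main__":
    # Hypothetical projected points; 2967 is the patch count in the text
    points = [(i * 0.001, (i * 0.001) ** 2) for i in range(2967)]
    selected = decimate(points, keep=100)
    print(len(selected), selected[:3], selected[-1])
```
]]></p>
         <p>The two end points of the curve are always kept, and the remaining elements are spread evenly between them.</p>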
         <p>To illustrate the comparison between a benign and a malignant tumor, these 100 patches obtained from the Fibroadenoma class were compared to 64 patches extracted from a Comedo carcinoma. Figure <figr fid="F3">3</figr> shows the 2D projection of all 164 patches. The two families are clearly discriminated, especially on either side of the axis &#966;<sub>1</sub>=0, which usually provides the sharpest separation between object classes <abbrgrp><abbr bid="B13">13</abbr></abbrgrp>.</p>
         <fig id="F3"><title><p>Figure 3</p></title><caption><p>2D projection of Fibroadenoma vs Comedo carcinoma (164 patches)</p></caption><text>
   <p>2D projection of Fibroadenoma vs Comedo carcinoma (164 patches)</p>
</text><graphic file="1746-1596-6-S1-S3-3"/></fig>
      </sec>
      <sec>
         <st>
            <p>Conclusion</p>
         </st>
         <p>This work relies on an original strategy that starts from VS and leads to an unbiased knowledge image database containing reference patches of breast tumors. We have shown that combining stereological sampling and data reduction based on diffusion maps offers an interesting general framework for this purpose. Once the sequence of procedures has been implemented, the only parameters to be tuned are the image features used for patch signatures, the size of the patches and the number of patches kept in the knowledge database. At this stage of the study, none of the features comes from segmentation or from texture measurement. Once all steps have been validated, it will be time to consider the advantages of adding new parameters or introducing other color components. Using patches acquired at a lower resolution should also be worth investigating, depending on the histological type or subtype to be studied. It should be noted that, up to now, we have not tried to adjust parameters in order to be independent of the acquisition conditions, since all images come from the same source. However, as the final goal is the development of a CADS for several laboratories, it will be necessary to take this into account by computing International Color Consortium profiles for each device used along the process, from histological staining to image acquisition <abbrgrp><abbr bid="B14">14</abbr></abbrgrp>.</p>
         <p>The results presented in this study are preliminary. Up to now, 400 high resolution VS of breast tumors are available. The benign and malignant tumors were classified into 30 histological types and subtypes. We now plan to project these 30 classes into the same 3D space in order to analyze their scattering. This work is in progress in our laboratory.</p>
      </sec>
      <sec>
         <st>
            <p>Competing interests</p>
         </st>
         <p>The authors declare that they have no competing interests.</p>
      </sec>
   </bdy>
   <bm>
      <ack>
         <sec>
            <st>
               <p>Acknowledgements</p>
            </st>
            <p>This article has been published as part of <it>Diagnostic Pathology</it> Volume 6 Supplement 1, 2011: Proceedings of the 10th European Congress on Telepathology and 4th International Congress on Virtual Microscopy. The full contents of the supplement are available online at <url>http://www.diagnosticpathology.org/supplements/6/S1</url></p>
         </sec>
      </ack>
      <refgrp><bibl id="B1"><title><p>Virtual Slide Telepathology Systems with JPEG2000, EMBS 2007</p></title><aug><au><snm>Ortiz</snm><fnm>JPG</fnm></au><au><snm>Ruiz</snm><fnm>V</fnm></au><au><snm>Garcia</snm><fnm>I</fnm></au></aug><source>29th Annual International Conference of the IEEE</source><pubdate>2007</pubdate><fpage>880</fpage><lpage>883</lpage></bibl><bibl id="B2"><title><p>Towards an automated virtual slide screening: theoretical considerations and practical experiences of automated tissue-based virtual diagnosis to be implemented in the Internet</p></title><aug><au><snm>Kayser</snm><fnm>K</fnm></au><au><snm>Radziszowski</snm><fnm>D</fnm></au><au><snm>Bzdyl</snm><fnm>P</fnm></au><au><snm>Sommer</snm><fnm>R</fnm></au><au><snm>Kayser</snm><fnm>G</fnm></au></aug><source>Diagnostic Pathology</source><pubdate>2006</pubdate><volume>1</volume><fpage>10</fpage><note>doi: 10.1186/1746-1596</note><xrefbib><pubidlist><pubid idtype="doi">10.1186/1746-1596-1-10</pubid><pubid idtype="pmcid">1524814</pubid><pubid idtype="pmpid">16764733</pubid></pubidlist></xrefbib></bibl><bibl id="B3"><title><p>Stereology</p></title><aug><au><snm>Elias</snm><fnm>H</fnm></au></aug><source>Proceedings of the Second International Congress for Stereology</source><publisher>Chicago, New York: Springer-Verlag</publisher><pubdate>1967</pubdate></bibl><bibl id="B4"><title><p>Stereology for Statisticians</p></title><aug><au><snm>Baddeley</snm><fnm>A</fnm></au><au><snm>Jensen</snm><fnm>EB</fnm></au></aug><source>Chapman and Hall/CRC</source><pubdate>2005</pubdate></bibl><bibl id="B5"><title><p>Laplacian eigenmaps for dimensionality reduction and data representation</p></title><aug><au><snm>Belkin</snm><fnm>M</fnm></au><au><snm>Niyogi</snm><fnm>P</fnm></au></aug><source>Neural Computation</source><pubdate>2003</pubdate><issue>15</issue><fpage>1373</fpage><lpage>1396</lpage><xrefbib><pubid idtype="doi">10.1162/089976603321780317</pubid></xrefbib></bibl><bibl id="B6"><title><p>Geometric 
diffusions as a tool for harmonics analysis and structure definition of data: Diffusion maps</p></title><aug><au><snm>Coifman</snm><fnm>RR</fnm></au><au><snm>Lafon</snm><fnm>S</fnm></au><au><snm>Lee</snm><fnm>AB</fnm></au><au><snm>Maggioni</snm><fnm>M</fnm></au><au><snm>Nadler</snm><fnm>B</fnm></au><au><snm>Warner</snm><fnm>F</fnm></au><au><snm>Zucker</snm><fnm>S</fnm></au></aug><source>Proceedings of the National Academy of Sciences</source><pubdate>2005</pubdate><volume>102</volume><issue>21</issue><fpage>7426</fpage><lpage>7431</lpage><xrefbib><pubid idtype="doi">10.1073/pnas.0500334102</pubid></xrefbib></bibl><bibl id="B7"><url>http://partners.adobe.com/public/developer/en/tiff/TIFF6.pdf</url></bibl><bibl id="B8"><title><p>Computer-Assisted Stereology for Pathology Applications</p></title><aug><au><snm>Herlin</snm><fnm>P</fnm></au></aug><source>Science Webinar series</source><pubdate>2009</pubdate><url>http://www.aperio.com</url></bibl><bibl id="B9"><title><p>Automated region of interest retrieval and classification using spectral analysis</p></title><aug><au><snm>Oger</snm><fnm>M</fnm></au><au><snm>Belhomme</snm><fnm>P</fnm></au><au><snm>Klossa</snm><fnm>J</fnm></au><au><snm>Michels</snm><fnm>JJ</fnm></au><au><snm>Elmoataz</snm><fnm>A</fnm></au></aug><source>Proceedings of 3rd International Congress on Virtual Microscopy</source><publisher>Toledo, Spain</publisher><pubdate>2008</pubdate></bibl><bibl id="B10"><title><p>Color Information for Region Segmentation</p></title><aug><au><snm>Ohta</snm><fnm>Y-I</fnm></au><au><snm>Kanade</snm><fnm>T</fnm></au><au><snm>Sakai</snm><fnm>T</fnm></au></aug><source>Computer Graphics and Image Processing</source><pubdate>1980</pubdate><volume>13</volume><issue>3</issue><fpage>222</fpage><lpage>241</lpage><xrefbib><pubid idtype="doi">10.1016/0146-664X(80)90047-7</pubid></xrefbib></bibl><bibl id="B11"><title><p>Segmentation d&#8217;images couleur dans la base Teinte-Luminance-Saturation : approche num&#233;rique et 
symbolique</p></title><aug><au><snm>Carron</snm><fnm>T</fnm></au></aug><publisher>Th&#232;se de doctorat, Universit&#233; de Savoie</publisher><pubdate>1995</pubdate></bibl><bibl id="B12"><url>http://mathworld.wolfram.com/topics/Moments.html</url></bibl><bibl id="B13"><title><p>Indexation automatique d&#8217;images num&#233;riques : application aux images histopathologiques du cancer du sein et h&#233;matologiques de leuc&#233;mies lympho&#239;des chroniques</p></title><aug><au><snm>Oger</snm><fnm>M</fnm></au></aug><publisher>Th&#232;se de doctorat, Universit&#233; de Caen Basse-Normandie</publisher><pubdate>2008</pubdate></bibl><bibl id="B14"><title><p>Image standardization in tissue&#8211;based diagnosis</p></title><aug><au><snm>Kayser</snm><fnm>K</fnm></au><au><snm>Borkenfeld</snm><fnm>S</fnm></au><au><snm>Gortler</snm><fnm>J</fnm></au><au><snm>Kayser</snm><fnm>G</fnm></au></aug><source>Diagnostic Pathology</source><pubdate>2010</pubdate><issue>S13</issue><note>doi:10.1186/1746-1596-5-S1-S13</note><xrefbib><pubidlist><pubid idtype="pmcid">3012022</pubid><pubid idtype="pmpid">21162719</pubid></pubidlist></xrefbib></bibl></refgrp>
   </bm>
</art>