FGFR3-TACC3 supplementary methods

These are the supplementary methods of this article:

Clinical, molecular and radiomic profile of gliomas with FGFR3-TACC3 fusions

Anna Luisa Di Stefano1,2,3, Alberto Picca4,5, Edouard Saragoussi6, Franck Bielle7, Francois Ducray8,9, Chiara Villa10, Marica Eoli11, Rosina Paterra11, Luisa Bellu2, Bertrand Mathon12, Laurent Capelle12, Véronique Bourg13, Arnaud Gloaguen14,15, Cathy Philippe15, Vincent Frouin15, Yohann Schmitt1, Julie Lerond1, Julie Leclerc7, Anna Lasorella16,17,18, Antonio Iavarone16,17,19, Karima Mokhtari7, Julien Savatovsky6, Agusti Alentorn1,2, Marc Sanson1,2,20; TARGET study group.

1. Inserm U 1127, CNRS UMR 7225, Sorbonne Université, UPMC Univ. Paris 06 UMR S 1127, Institut du Cerveau et de la Moelle épinière, ICM, F-75013, Paris, France. Equipe labellisée LNCC. Site de Recherche Intégré sur le Cancer (SIRIC) CURAMUS.

2. AP-HP, Hôpital de la Pitié-Salpêtrière, Service de Neurologie 2, F-75013 Paris, France

3. Department of Neurology, Foch Hospital, Suresnes, F-92151 Paris, France

4. IRCCS C. Mondino Foundation, Pavia, Italy

5. Department of Brain and Behavioral Sciences, University of Pavia, Pavia, Italy

6. Departments of Radiology, Fondation Ophtalmologique Adolphe de Rothschild, Paris, France

7. Neuropathology, AP-HP, Hôpitaux Universitaires Pitié Salpêtrière-Charles Foix, Paris, France

8. Service de Neuro-Oncologie, Hospices Civils de Lyon, Université Claude Bernard Lyon 1, Department of Cancer Cell Plasticity, Cancer Research Center of Lyon, INSERM U1052, CNRS UMR5286, Lyon, France

9. POLA Network

10. Department of Pathology, Foch Hospital, Suresnes, France

11. Unit of Molecular Neuro-Oncology, Fondazione IRCCS Istituto Neurologico Carlo Besta, 20133 Milan, Italy

12. AP-HP, Hôpital de la Pitié-Salpêtrière, Service de Neurochirurgie, F-75013 Paris, France

13. Department of Neurology, Pasteur 2 Hospital, Nice Côte D'Azur University, Nice, France

14. Laboratoire des Signaux et Systèmes (L2S), CentraleSupélec, Université Paris-Saclay, Gif-sur-Yvette, France

15. Université Paris-Saclay, CEA, Neurospin, 91191, Gif-sur-Yvette, France

16. Institute for Cancer Genetics, Columbia University, New York, USA

17. Department of Pathology and Cell Biology, Columbia University, New York, New York, USA

18. Department of Pediatrics, Columbia University, New York, New York, USA

19. Department of Neurology, Columbia University, New York, New York, USA

20. OncoNeuroTek, Institut du Cerveau et de la Moelle épinière, Paris, France

Non-author contributors:

Paule Augerau, Antoine Carpentier, Isabelle Catry-Thomas, Olivier Chinot, Caroline Dehais, Jean-Yves Delattre, Dominique Figarella-Branger, Stephan Gaillard, David Guyon, Khe Hoang-Xuan, Caroline Houillier, Ahmed Idbaih, Florence Laigle-Donadey, Emilie Le Rhun, David Meyronet, Elisabeth Moyal, Dimitri Psimaras, Luc Taillandier, Mehdi Touat, Nadia Younan.

Corresponding Author:

Marc Sanson, Service de Neurologie 2, GH Pitié-Salpêtrière, 47 bd de l’Hopital, 75013 Paris, France. marc.sanson@aphp.fr

Radiological characteristics

In this study, MRI was acquired on different scanners (GE Healthcare, Siemens and Philips, at 1.5T and 3T). We used a post-contrast-enhanced T1-weighted 3D magnetization-prepared rapid acquisition gradient echo (MPRAGE) sequence (subsequently referred to as T1 enhanced, T1e) as well as a fluid-attenuated inversion recovery (FLAIR) sequence. The acquisition parameters were as follows: median repetition time (TR) 8750 ms for FLAIR and 600 ms for T1e; median echo time (TE) 139 ms for FLAIR and 11 ms for T1e. Overall, the median slice thickness was 1.2 mm.

Non-enhanced tumor was defined as regions of T2-weighted hyperintensity (lower than the signal intensity of cerebrospinal fluid and edema) associated with mass effect and architectural distortion, including blurring of the gray-white interface.

Necrosis was defined as a region with irregular margins within the enhancing lesion that did not enhance or that showed absence of central enhancement. Edema was defined on FLAIR or T2-weighted images as signal intensity greater than that of non-enhanced tumor but lower on T2 than that of cerebrospinal fluid.

For the ordinal tumor composition features (proportion contrast-enhanced tumor, proportion non-enhanced tumor, proportion necrosis, and proportion edema), the consensus value was equal to the category the neuroradiologists selected most frequently. For the quantitative lesion size measurements, the consensus value was equal to the median of the neuroradiologists’ measurements.
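The consensus rule above can be illustrated with a minimal Python sketch. The reader ratings, category labels and function names are hypothetical, and the tie-breaking rule (first category encountered) is an assumption, since the text does not state how ties were resolved.

```python
from collections import Counter
from statistics import median


def consensus_category(ratings):
    """Return the category selected most often by the readers.

    Ties go to the category seen first; this tie rule is an assumption.
    """
    return Counter(ratings).most_common(1)[0][0]


def consensus_measurement(measurements):
    """Return the median of the readers' quantitative measurements."""
    return median(measurements)


# Hypothetical ratings from three neuroradiologists for one tumor.
necrosis_ratings = ["<5%", "6-33%", "6-33%"]
diameters_mm = [41.0, 44.5, 43.0]

necrosis_consensus = consensus_category(necrosis_ratings)
diameter_consensus = consensus_measurement(diameters_mm)
```

With these illustrative inputs, the ordinal consensus is the category chosen by two of the three readers, and the size consensus is the middle measurement.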

Imaging post-processing and radiomic feature extraction

Once image registration had been performed as detailed in the manuscript, we applied N4 bias field correction [1] and standardized MRI intensity for each feature using the WhiteStripe R package [2]. Then, we used PyRadiomics to apply a discrete, undecimated wavelet transformation along the three spatial dimensions to the post-contrast T1e and FLAIR tumor masks, generating eight additional transformed images.

Wavelet transformation enables a multi-scale representation of imaging data into low (L) and high (H) spatial frequency regions.
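To make the count of eight transformed images concrete, the following Python sketch enumerates the low/high-pass combinations over three axes and applies one level of an undecimated transform to a 1D profile. The Haar filter, the periodic boundary handling and the intensity profile are assumptions chosen for brevity; the text does not specify which wavelet was used.

```python
from itertools import product


def haar_undecimated(signal):
    """One level of an undecimated Haar transform of a 1D signal.

    Returns (low, high), each the same length as the input (no
    downsampling). The signal is treated as periodic at the boundary.
    """
    n = len(signal)
    low = [(signal[i] + signal[(i + 1) % n]) / 2 for i in range(n)]
    high = [(signal[i] - signal[(i + 1) % n]) / 2 for i in range(n)]
    return low, high


# Choosing L or H independently along each of three axes gives 2**3 = 8 sub-bands.
subbands = ["".join(combo) for combo in product("LH", repeat=3)]

profile = [10, 10, 10, 50, 50, 50]  # hypothetical intensity profile
low, high = haar_undecimated(profile)
```

The low-pass output is a smoothed copy of the profile, while the high-pass output is non-zero only where the intensity changes, which is the multi-scale separation described above.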

Finally, binary masks were transferred to the MNI152 space using a series of linear and non-linear registrations in ANTs [3]. The tumor distribution of the binary masks was computed (FGFR3-TACC3 positive vs FGFR3-TACC3 negative) using Sparse Canonical Correlation Analysis for Neuroimaging (SCCAN) [4]. SCCAN is a multivariate optimization procedure that gradually defines the weights to apply to each voxel such that the overall relationship with the brain tumor distribution is maximized while taking other constraints into account.

We used the PyRadiomics pipeline to extract all the radiomic features [5].

Within each volume of interest, we extracted 2616 radiomic features: (a) first-order features, (b) volume and shape features, and (c) texture features.

Volume and shape features depend on the binary information of the segmentation mask only, while first-order and texture features reflect the intensity of normalized imaging sequences and the respective wavelet transformations.

First-order features represent the voxel intensity values according to the first-order statistics, including means, standard deviation, kurtosis, skewness, uniformity, energy and entropy. Volume and shape features characterize the shape and volume of interest according to metrics such as compactness, maximal three-dimensional diameter, spherical disproportion, surface area, volume and surface-to-volume ratio.

Textural features were based on both co-occurrence and run-length matrices. Co-occurrence features were calculated on the basis of gray-level co-occurrence matrices (GLCM) and included Haralick features, while gray-level run-length matrix (GLRLM) features described the structure of an image region in terms of runs of consecutive voxels with the same gray level. In addition, we included other texture analyses: gray level size zone matrix (GLSZM), neighboring gray tone difference matrix (NGTDM) and gray level dependence matrix (GLDM). GLDM quantifies gray level dependencies in an image, and NGTDM quantifies the difference between a gray value and the average gray value of its neighbors within distance δ.
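As an illustration of the co-occurrence approach, the sketch below builds a symmetric, normalized GLCM for a small 2D patch and derives the Haralick contrast from it. It is a simplified, pure-Python example: the patch, the single horizontal offset and the function names are hypothetical, and it is not the PyRadiomics implementation, which works in 3D over several directions.

```python
def glcm(image, levels, offset=(0, 1)):
    """Symmetric, normalized gray-level co-occurrence matrix of a 2D image.

    image  : list of rows holding integer gray levels in range(levels)
    offset : (row, column) displacement between the two paired pixels
    """
    counts = [[0] * levels for _ in range(levels)]
    rows, cols = len(image), len(image[0])
    dr, dc = offset
    for r in range(rows):
        for c in range(cols):
            r2, c2 = r + dr, c + dc
            if 0 <= r2 < rows and 0 <= c2 < cols:
                i, j = image[r][c], image[r2][c2]
                counts[i][j] += 1
                counts[j][i] += 1  # count each pair in both directions
    total = sum(map(sum, counts))
    return [[v / total for v in row] for row in counts]


def contrast(p):
    """Haralick contrast: (i - j)^2 weighted by co-occurrence probability."""
    n = len(p)
    return sum((i - j) ** 2 * p[i][j] for i in range(n) for j in range(n))


# Hypothetical 3x3 patch with three gray levels.
patch = [[0, 0, 1],
         [0, 1, 1],
         [2, 2, 2]]
p = glcm(patch, levels=3)
patch_contrast = contrast(p)
```

Only the two horizontal 0-1 transitions contribute to the contrast here; the uniform bottom row adds weight to the diagonal of the matrix and therefore lowers it.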

Furthermore, we also applied different transformations of the aforementioned features:

1. The Laplacian of Gaussian (LoG) filter, which emphasizes areas of gray level change, where sigma defines how coarse the emphasized texture should be. A low sigma emphasizes fine textures (gray level change over a short distance), whereas a high sigma emphasizes coarse textures (gray level change over a large distance).

2. Logarithm transformation (i.e. the logarithm of the absolute intensity + 1).

3. Square transformation, which takes the square of the image intensities and linearly scales them back to the original range.

4. Square root transformation, which takes the square root of the absolute image intensities and scales them back to the original range.

5. The exponential transformation.

6. The wavelet transformation (that is already detailed in the manuscript).
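The intensity transformations in items 2 to 5 can be sketched in Python as follows. The scaling constants are assumptions chosen so that the largest absolute intensity is preserved, in the spirit of "scaled back to the original range"; they are not taken from the text, and the sign handling for negative intensities is likewise an assumption. The sketch requires at least one non-zero intensity.

```python
import math


def _max_abs(values):
    """Largest absolute intensity, used as the rescaling reference."""
    return max(abs(v) for v in values)


def square(values):
    """Square the intensities, rescaled so the maximum magnitude is kept."""
    m = _max_abs(values)
    return [v * v / m for v in values]


def square_root(values):
    """Signed square root of |intensity|, rescaled to the original range."""
    m = _max_abs(values)
    return [math.copysign(math.sqrt(m * abs(v)), v) for v in values]


def logarithm(values):
    """Signed log(|intensity| + 1), rescaled to the original range."""
    m = _max_abs(values)
    c = m / math.log(m + 1)
    return [math.copysign(c * math.log(abs(v) + 1), v) for v in values]


def exponential(values):
    """exp(c * intensity), with c chosen so the maximum maps to itself."""
    m = _max_abs(values)
    c = math.log(m) / m
    return [math.exp(c * v) for v in values]


intensities = [-4.0, 0.0, 1.0, 9.0]  # hypothetical voxel intensities
squared = square(intensities)
rooted = square_root(intensities)
logged = logarithm(intensities)
exped = exponential(intensities)
```

In each case the largest intensity (9.0) maps back to 9.0, so the transformed images remain on a scale comparable to the original.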

Further details on the radiomics features are provided in [6].

Statistical analysis

The ability of radiomic features to classify F3T3-positive gliomas was assessed using a random forest with the R caret package (v.6.0-80). To better represent the real proportion of F3T3-positive samples among high-grade gliomas (~3%), we weighted the classification model by this proportion to correct the imbalance between F3T3 groups.
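One possible reading of this weighting is sketched below in Python: per-class case weights are chosen so that the weighted share of F3T3-positive samples equals the target prevalence. The exact weighting scheme passed to the random forest is not described in the text, so the formula, the function name and the cohort counts are all assumptions for illustration.

```python
def prevalence_weights(labels, target_prevalence=0.03):
    """Per-class case weights so the weighted share of positives hits the target.

    labels : iterable of 0/1, where 1 marks an F3T3-positive sample.
    Negatives keep weight 1; positives are rescaled.
    """
    labels = list(labels)
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    w_pos = target_prevalence * n_neg / ((1 - target_prevalence) * n_pos)
    return {0: 1.0, 1: w_pos}


# Hypothetical cohort: 10 fusion-positive and 90 fusion-negative samples.
labels = [1] * 10 + [0] * 90
weights = prevalence_weights(labels)

weighted_pos = weights[1] * 10
weighted_share = weighted_pos / (weighted_pos + weights[0] * 90)
```

In this hypothetical cohort the positives are over-represented (10%) relative to the ~3% prevalence, so their weight falls below 1.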

We used the plsRcox R package [7], which provides a partial least squares (PLS) regression approach for fitting Cox models in high-dimensional settings. We assessed the performance of the different Cox proportional hazards models with Harrell’s concordance index (C-index) [8].
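For readers unfamiliar with the C-index, the following Python sketch computes it directly from its pairwise definition for right-censored data. The follow-up times, event indicators and risk scores are hypothetical, and this is an illustrative implementation rather than the routine used in the analysis.

```python
def concordance_index(times, events, risk_scores):
    """Harrell's C-index for right-censored survival data.

    times       : observed follow-up times
    events      : 1 if the event was observed, 0 if censored
    risk_scores : model output, where a higher score means higher risk
    """
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        for j in range(n):
            # Comparable pair: subject i had the event strictly before time j.
            if events[i] == 1 and times[i] < times[j]:
                comparable += 1
                if risk_scores[i] > risk_scores[j]:
                    concordant += 1.0
                elif risk_scores[i] == risk_scores[j]:
                    concordant += 0.5  # tied predictions count as half
    return concordant / comparable


# Hypothetical data for four patients (third patient censored).
times = [5, 8, 12, 20]
events = [1, 1, 0, 1]
risks = [0.9, 0.4, 0.6, 0.1]
c_index = concordance_index(times, events, risks)
```

A value of 0.5 corresponds to random ordering and 1.0 to perfect ranking of patients by risk; here four of the five comparable pairs are ordered correctly.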

The clinical model included the following variables: age, Karnofsky Performance Status, surgery and WHO grade.

The genetics model was built using the following genetic alterations: FGFR3-TACC3 fusion status, chromosome 7p copy number status (normal, loss or gained), chromosome 10 copy number status (normal, loss or gained), EGFR copy number (normal, gained or amplified), MDM2 copy number (normal, gained or amplified), CDK4 copy number (normal, gained or amplified), MGMT methylation status (unmethylated or methylated), TERT promoter mutation status (mutated or wild-type).

We also used all the radiomics features (2,616 features) obtained with PyRadiomics in the radiomics model. All radiomics features were normalized by transforming the data into new scores with a mean of 0 and a standard deviation of 1 (z-scores). Each block of data (i.e. clinics, genetics and radiomics) was divided by the square root of the number of variables in the specific block to achieve a block scaling. This means that each block starts with the same variance irrespective of its size, as has been previously proposed [9].
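The effect of z-scoring followed by block scaling can be checked with a short Python sketch. The two blocks and their values are hypothetical; the point is that, after each standardized block is divided by the square root of its number of variables, both blocks carry the same total variance regardless of size.

```python
from math import sqrt
from statistics import mean, stdev


def zscore_columns(block):
    """Standardize each column (variable) of a block to mean 0 and SD 1."""
    scaled_cols = []
    for col in zip(*block):
        m, s = mean(col), stdev(col)
        scaled_cols.append([(x - m) / s for x in col])
    return [list(row) for row in zip(*scaled_cols)]


def block_scale(block):
    """Divide a standardized block by the square root of its variable count."""
    factor = sqrt(len(block[0]))
    return [[x / factor for x in row] for row in block]


def total_variance(block):
    """Sum of the per-variable sample variances of a block."""
    return sum(stdev(col) ** 2 for col in zip(*block))


# Hypothetical blocks: rows are patients, columns are variables.
clinical = [[61, 80], [45, 90], [70, 70], [52, 100]]
radiomic = [[0.2, 1.5, 3.1, 0.7],
            [0.4, 1.1, 2.9, 0.9],
            [0.1, 1.9, 3.6, 0.2],
            [0.5, 1.2, 3.0, 0.6]]

clinical_scaled = block_scale(zscore_columns(clinical))
radiomic_scaled = block_scale(zscore_columns(radiomic))
```

Each z-scored variable has variance 1, so a block of p variables has total variance p; dividing by the square root of p brings every block to a total variance of 1.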

Finally, we also used the different combinations of these three individual Cox models (i.e. clinics, genetics and radiomics).

To explore the relationships across the three high-dimensional data blocks (clinical, genetics and radiomics), we used the grimon R package [10]. This algorithm gives an optimized edge layout by rotating each layer to minimize the sum of the angles or lengths of the edges when represented in three-dimensional space, using simulated annealing, which stochastically approximates global optimization even in a large search space.

In addition, we used t-distributed stochastic neighbor embedding (t-SNE) [11], which performs a nonlinear dimensionality reduction for visualizing high-dimensional data in a low-dimensional space, via the Rtsne R package (v. 0.13), with 5,000 iterations, a perplexity of 4 and an exaggeration factor of 5.

Regarding the radiogenomic analysis, the mean values of ordinal variables such as tumor volumes corresponding to patients with FGFR3-TACC3 fusions were compared to the mean values of the FGFR3-TACC3 negative cohort. Significant differences between groups with and without FGFR3-TACC3 fusions were tested with a two-sided Student’s t test. Correlations between radiological and molecular features were assessed using Spearman correlation in both the discovery and validation cohorts, using a significance threshold of p<0.05.
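As a reminder of what the Spearman coefficient measures, the sketch below computes it in Python as the Pearson correlation of average ranks. The tumor volumes and molecular scores are hypothetical, and the p-value computation used for the p<0.05 threshold is omitted.

```python
def average_ranks(values):
    """Ranks starting at 1; tied values share the average of their ranks."""
    order = sorted(range(len(values)), key=lambda k: values[k])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the group while the next sorted value is tied with this one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        shared = (start + end) / 2 + 1
        for k in order[start:end + 1]:
            ranks[k] = shared
        start = end + 1
    return ranks


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / (vx * vy) ** 0.5


def spearman(x, y):
    """Spearman correlation: Pearson correlation computed on the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical tumor volumes (cm3) and an ordinal molecular score.
volumes = [12.0, 30.5, 8.2, 45.0, 30.5]
scores = [1, 3, 1, 4, 2]
rho = spearman(volumes, scores)
```

Because only ranks enter the calculation, the coefficient is insensitive to monotonic rescaling of either variable, which suits radiological measurements on heterogeneous scales.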

Radiomics analysis

To represent the different blocks of data (i.e. clinical, genetic and radiomic), we used the grimon R package (v1.0.0), which enables smooth exploration of relationships across multi-layered high-dimensional data visualized in 3D [12].

We represented the multi-layered high-dimensional relationships between the different blocks of data according to F3T3 status (Figure 1 Methods). The analysis showed lower variability within the F3T3 group when analyzing the genetics or radiomics blocks (Levene’s test p = 0.8 and p = 0.6, respectively) but higher variability in the clinical block (p = 0.04; Figure 1 Methods). We also assessed the clustering of radiomics features using the t-SNE approach [11], which showed that most F3T3 samples clustered together (Figure 2 Methods).

Figure 1 Methods

Multi-layered high-dimensional data visualized in 3D using clinics, genetics and radiomics data, showing a higher degree of variability within the clinical data set.

Figure 2 Methods

Dot plot of the two dimensions of t-distributed stochastic neighbor embedding (t-SNE) using radiomics data, showing that samples with F3T3 fusion (red dots) cluster together relative to those without fusion (blue dots).