Supporting Information for "Human BioMolecular Atlas Program (HuBMAP): 3D Human Reference Atlas Construction and Usage"
Katy Börner1,2*, Philip D. Blood3, Jonathan C. Silverstein4, Matthew Ruffalo5, Rahul Satija6, Sarah A. Teichmann2,7,8, Gloria Pryhuber9, Ravi Misra9, Jeffrey Purkerson9, Jean Fan10, John W. Hickey11, Gesmira Molla6, Chuan Xu8, Yun Zhang12, Griffin Weber13, Yashvardhan Jain1, Danial Qaurooni1, Yongxin Kong1, HRA Team, Andreas Bueckle1*, Bruce W. Herr II1*
1 Department of Intelligent Systems Engineering, Luddy School of Informatics, Computing, and Engineering, Indiana University, Bloomington, IN, USA 2 CIFAR MacMillan Multiscale Human program, CIFAR, Toronto, Canada
3 Pittsburgh Supercomputing Center, Carnegie Mellon University, Pittsburgh, PA, USA 4 Department of Biomedical Informatics, University of Pittsburgh School of Medicine, Pittsburgh, PA, USA 5 Ray and Stephanie Lane Computational Biology Department, Carnegie Mellon University, Pittsburgh, PA, USA
6 New York Genome Center, New York, NY, USA 7 Cambridge Stem Cell Institute, Jeffrey Cheah Biomedical Centre, Cambridge Biomedical Campus, University of Cambridge, Cambridge, UK 8 Department of Medicine, University of Cambridge, Cambridge, UK 9 University of Rochester Medical Center, Rochester, NY, USA 10 Department of Biomedical Engineering, Johns Hopkins University, Baltimore, MD, USA 11 Department of Biomedical Engineering, Duke University, Durham, NC, USA 12 J. Craig Venter Institute, La Jolla, CA, USA 13 Department of Biomedical Informatics, Harvard Medical School, Boston, MA, USA
The LOD server supports SPARQL queries. For easy access to data of general utility, pre-made SPARQL queries are provided as web API endpoints via grlc. For example, HRApop users might want to examine the biomarker expression values for one cell type across HRApop datasets for specific anatomical structures (SI Fig. 1), or explore the similarity of the 553 datasets used in HRApop construction based on shared cell type populations (SI Fig. 2) or on shared anatomical structures determined by mesh-level collision detection (SI Fig. 3).
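Because the pre-made queries are plain HTTP endpoints, they can be called from any scripting language. The Python sketch below composes a request URL for a grlc-style endpoint and flattens a result document in the W3C SPARQL 1.1 Query Results JSON format into simple rows. The base URL `GRLC_BASE` and the `celltype` parameter name are hypothetical placeholders, not the documented HRA API, and it is an assumption that the endpoint returns standard SPARQL JSON results.

```python
import json
from urllib.parse import urlencode

# Hypothetical base URL: replace with the actual grlc API address of the LOD server.
GRLC_BASE = "https://example.org/hra-api"


def build_query_url(base: str, endpoint: str, params: dict[str, str]) -> str:
    """Compose a GET URL for a pre-made grlc query endpoint such as /datasets-with-ct."""
    query = urlencode(params)
    url = f"{base.rstrip('/')}/{endpoint.lstrip('/')}"
    return f"{url}?{query}" if query else url


def bindings_to_rows(results_json: str) -> list[dict[str, str]]:
    """Flatten a SPARQL 1.1 JSON result document into plain dicts.

    Variables left unbound in a solution are simply absent from that row.
    """
    doc = json.loads(results_json)
    return [
        {var: term["value"] for var, term in binding.items()}
        for binding in doc["results"]["bindings"]
    ]
```

Fetching the URL (for example with `urllib.request.urlopen`) is left out so the sketch runs offline; `bindings_to_rows` can be applied to whatever JSON body the endpoint returns.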
SI Figure 1. Dot plot for biomarker expression of one cell type across HRApop datasets. Use the /datasets-with-ct SPARQL query to retrieve all atlas datasets with a given cell type. For the cell type ‘adipocyte’, the query returns 109 datasets containing that cell type (all annotated by Azimuth; no other cell type annotation tool assigns an adipocyte cell type), with a total of 420 biomarkers characterizing that cell type. The query is documented here. The Jupyter Notebook to render the visualization is here.
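Before plotting, query results of this kind usually have to be reshaped from long format (one row per dataset and biomarker) into a dataset × biomarker grid. The sketch below assumes, hypothetically, that each row is a `(dataset, biomarker, value)` tuple with a single expression value; the actual columns returned by /datasets-with-ct and the encodings used in the notebook's dot plot may differ.

```python
from collections import defaultdict


def to_dot_plot_matrix(rows):
    """Pivot (dataset, biomarker, value) rows into sorted axes and a value grid.

    Missing dataset/biomarker combinations are left as None so that a
    plotting library can skip drawing a dot for them.
    """
    values = defaultdict(dict)
    biomarkers = set()
    for dataset, biomarker, value in rows:
        values[dataset][biomarker] = value
        biomarkers.add(biomarker)
    datasets = sorted(values)
    markers = sorted(biomarkers)
    # Rows follow the dataset order, columns follow the biomarker order.
    matrix = [[values[d].get(m) for m in markers] for d in datasets]
    return datasets, markers, matrix
```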
SI Figure 2. Heatmaps for prevalence of cell types across organs in HRApop datasets. a. Azimuth can be run on four organs. b. CellTypist is available for six organs. c. popV was run for 10 organs. Each heatmap shows, for one annotation tool, a scaled mean value (z-score) of the percentage of cells identified as each cell type in the datasets registered to each organ. The percentage values are scaled using R’s scale() function, which centers the values of each variable on their mean and divides them by their standard deviation, yielding a z-score. A z-score of 0 means a value equals the variable’s mean, and scores near 0 are close to it. A color corresponding to a score of 1 indicates that the cell type percentage value is 1 standard deviation higher than the mean for that cell type, a score of 2 indicates 2 standard deviations higher, and so on; negative scores indicate values below the mean. Full versions of all three plots are provided here (Azimuth), here (CellTypist), and here (popV). The R Markdown document to generate these visualizations is here.
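The scaling step computes z = (x − x̄) / s for each value x of a variable, where x̄ is the variable’s mean and s its standard deviation. The Python sketch below mirrors R’s `scale(x, center = TRUE, scale = TRUE)` for one table. It assumes, based on the caption, that each variable is a cell type and each observation is an organ; the input layout is illustrative, not the R Markdown document’s actual data structure.

```python
from statistics import mean, stdev


def scale_columns(table):
    """Column-wise z-scores, mirroring R's scale(x, center = TRUE, scale = TRUE).

    `table` maps a variable (assumed here: a cell type) to its values across
    observations (assumed here: organs). R divides by the sample standard
    deviation (denominator n - 1), which statistics.stdev also uses.
    """
    scaled = {}
    for variable, values in table.items():
        mu = mean(values)
        sd = stdev(values)  # needs >= 2 values; a constant column raises ZeroDivisionError below (R yields NaN)
        scaled[variable] = [(v - mu) / sd for v in values]
    return scaled
```

For example, percentages of 1, 2, and 3 for one cell type across three organs scale to −1, 0, and 1.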
SI Figure 3. UMAP plot of dataset similarity based on shared anatomical structures. a. The similarity of the 553 atlas-level datasets is plotted based on the percentage of shared anatomical structures, computed using mesh-level collision detection. Weighted cosine similarity is used here and in User Story 2 (US#2), available via the HRA Portal at https://humanatlas.io/user-story/2. Datasets cluster by organ (see legend on right). b. A UMAP zoom into four subclusters for the small intestine reveals the four major extraction sites. Full versions of the UMAP plots are provided here.
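A minimal sketch of cosine similarity between two datasets follows. It assumes that each dataset is represented as a sparse vector keyed by anatomical structure, with the percentage of its tissue block colliding with that structure as the entry, so that structures with larger overlap weigh more. This interpretation of “weighted” is an assumption; the exact weighting used in US#2 is not specified here.

```python
import math


def weighted_cosine(a, b):
    """Cosine similarity of two sparse vectors keyed by anatomical structure.

    Values are assumed to be collision percentages, so structures with a
    larger overlap contribute more. Returns 0.0 if either vector has zero length.
    """
    shared = a.keys() & b.keys()
    dot = sum(a[s] * b[s] for s in shared)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)
```

A pairwise matrix built with this function over all datasets could then serve as input to a dimensionality reduction such as UMAP.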
Atlas use case preview: Perivascular immune cells in lung
This use case applies the Vitessce visualization tool to compare similar locations in healthy adult lung and pediatric lung with bronchopulmonary dysplasia (BPD), demonstrating how multiple cell types can be assessed relative to the nearest endothelial cell nuclei using single-cell spatial protein biomarker data.
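The core measurement, the distance from each cell to its nearest endothelial nucleus, can be sketched as below. The input layout (a list of `(cell_id, cell_type, x, y)` tuples with nucleus centroids in one coordinate unit) and the type label `"endothelial"` are hypothetical; the actual analysis and its data format may differ, and Vitessce itself is used for visualization rather than for this computation.

```python
import math


def distance_to_nearest(cells, target_type="endothelial"):
    """For every non-target cell, find the distance to the nearest target nucleus.

    `cells` is a list of (cell_id, cell_type, x, y) tuples with nucleus
    centroids in a common unit (e.g. micrometers). Brute force, O(n * m);
    a k-d tree would be preferable for whole-slide data.
    """
    targets = [(x, y) for _, ctype, x, y in cells if ctype == target_type]
    if not targets:
        raise ValueError(f"no cells of type {target_type!r}")
    result = {}
    for cell_id, ctype, x, y in cells:
        if ctype == target_type:
            continue
        result[cell_id] = (ctype, min(math.dist((x, y), t) for t in targets))
    return result
```

Grouping the returned distances by cell type (for example, taking the median per type) gives a per-type summary that could be compared between the healthy adult and BPD pediatric samples.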
*Disclaimer: The datasets for this analysis are still in preparation for upload to the HuBMAP Portal. As a result, the Tissue Datasets field in the Exploration User Interface linked below will show 0. We provide these two datasets via Google Drive for the time being. Once the datasets are on the HuBMAP Portal, this field will be updated.*
Atlas use case preview: Hierarchical cell type populations within FTUs
This use case provides a code template for hierarchical cell neighborhood analysis. The code was developed to analyze cell type neighborhoods across scales, some of which we have named cellular “neighborhoods,” “communities,” and “functional tissue units.” Identifying similar cellular neighborhoods, communities, and tissue units across scales is analogous to how people might be thought to form neighborhoods, cities, and states.
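One common way to define cellular neighborhoods, used here for illustration and not necessarily the template’s exact procedure, is to describe each cell by the cell type composition of its k nearest neighbors and then cluster those composition vectors. The sketch below covers only the first step; the input layout `(cell_type, x, y)` and the default `k=10` are assumptions, and the clustering step (for example, k-means) is omitted.

```python
import math
from collections import Counter


def window_compositions(cells, k=10):
    """Cell type composition of each cell's k nearest neighbors (including itself).

    `cells` is a list of (cell_type, x, y). Returns one dict per cell mapping
    cell type -> fraction of the window. Brute force, for illustration only;
    ties in distance are broken by input order.
    """
    compositions = []
    for _, x, y in cells:
        nearest = sorted(cells, key=lambda c: math.dist((x, y), (c[1], c[2])))[:k]
        counts = Counter(c[0] for c in nearest)
        compositions.append({ctype: n / len(nearest) for ctype, n in counts.items()})
    return compositions
```

Under this approach, the hierarchy arises by repetition: once each cell carries a neighborhood label from clustering, the same function can be rerun with neighborhood labels in place of cell types, and typically a larger window, to obtain community-level compositions.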
The cell type predictions for the same dataset, generated with the current version of the Van Valen lab's cell type model, are also available at https://drive.google.com/drive/folders/1W0MVcc4Zx1pPHmshSohhzYIcvFxyFBDi.
We compare the original STELLAR predictions (SI Fig. 4, left) with the predictions from the development version of the cell type model (SI Fig. 4, right) for one dataset. We also show a confusion matrix of the cell type categories for the same dataset (SI Fig. 5).
SI Figure 4. Comparison between cell type predictions from STELLAR (left) and the development version of the cell type model (right).
SI Figure 5. Confusion matrix for the B009_Trans_CL_reg001 dataset.
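A confusion matrix such as the one in SI Fig. 5 counts, for every pair of labels, how many cells received one label from the first model and the other from the second. The sketch below assumes two per-cell label lists in the same cell order and returns raw counts with rows for the reference labels (e.g. STELLAR) and columns for the predicted labels; the orientation and any normalization used in SI Fig. 5 are not stated and may differ.

```python
from collections import Counter


def confusion_matrix(reference, predicted):
    """Count how often each reference label co-occurs with each predicted label.

    `reference` and `predicted` are per-cell labels in the same cell order.
    Returns (labels, matrix) with rows = reference and columns = predicted.
    """
    if len(reference) != len(predicted):
        raise ValueError("label lists must have the same length")
    pairs = Counter(zip(reference, predicted))
    labels = sorted(set(reference) | set(predicted))
    matrix = [[pairs[(r, p)] for p in labels] for r in labels]
    return labels, matrix
```

Dividing each row by its sum would convert the counts into the fraction of each reference category assigned to each predicted category.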