
Supporting Information for "Human BioMolecular Atlas Program (HuBMAP): 3D Human Reference Atlas Construction and Usage"

Katy Börner¹,²*, Philip D. Blood³, Jonathan C. Silverstein⁴, Matthew Ruffalo⁵, Rahul Satija⁶, Sarah A. Teichmann²,⁷,⁸, Gloria Pryhuber⁹, Ravi Misra⁹, Jeffrey Purkerson⁹, Jean Fan¹⁰, John W. Hickey¹¹, Gesmira Molla⁶, Chuan Xu⁸, Yun Zhang¹², Griffin Weber¹³, Yashvardhan Jain¹, Danial Qaurooni¹, Yongxin Kong¹, HRA Team, Andreas Bueckle¹*, Bruce W. Herr II¹*

¹ Department of Intelligent Systems Engineering, Luddy School of Informatics, Computing, and Engineering, Indiana University, Bloomington, IN, USA
² CIFAR MacMillan Multiscale Human program, CIFAR, Toronto, Canada
³ Pittsburgh Supercomputing Center, Carnegie Mellon University, Pittsburgh, PA, USA
⁴ Department of Biomedical Informatics, University of Pittsburgh School of Medicine, Pittsburgh, PA, USA
⁵ Ray and Stephanie Lane Computational Biology Department, Carnegie Mellon University, Pittsburgh, PA, USA
⁶ New York Genome Center, New York, NY, USA
⁷ Cambridge Stem Cell Institute, Jeffrey Cheah Biomedical Centre, Cambridge Biomedical Campus, University of Cambridge, Cambridge, UK
⁸ Department of Medicine, University of Cambridge, Cambridge, UK
⁹ University of Rochester Medical Center, Rochester, NY, USA
¹⁰ Department of Biomedical Engineering, Johns Hopkins University, Baltimore, MD, USA
¹¹ Department of Biomedical Engineering, Duke University, Durham, NC, USA
¹² J. Craig Venter Institute, La Jolla, CA, USA
¹³ Department of Biomedical Informatics, Harvard Medical School, Boston, MA, USA

* Corresponding authors:
Katy Börner, katy@iu.edu
Andreas Bueckle, abueckle@iu.edu
Bruce W. Herr II, bherr@iu.edu

Links

Link to paper
Link to high-resolution Supporting Figures
Link to Supporting Tables
Link to Paper Publication Page
Link to GitHub Repository

Link to HuBMAP Consortium Website
Link to HuBMAP Portal
Link to HRA Portal

HuBMAP Portal, HuBMAP Data Portal, and HRA Portal

The HuBMAP Portal (https://hubmapconsortium.org) introduces goals and links to experimental and atlas data, tools, and training materials. The Data Portal (https://portal.hubmapconsortium.org) serves experimental datasets and supports data processing, search, filtering, and visualization. The Human Reference Atlas (HRA) Portal (https://humanatlas.io) provides open access to atlas data, code, procedures, and instructional materials.

Flexible hybrid cloud microservices architecture

Click on architecture components to explore resources, APIs, and applications.

Selected counts from the paper for HRA v2.0

Count Needed	Query Response	Link to Query/Source
Number of ASCT+B tables	33	https://humanatlas.io/assets/table-data/asctb_release6.csv
Number of 3D Reference Objects for organs	65	https://api.triplydb.com/s/vS1axJ1as
Unique Uberon IDs for 65 3D Reference Objects	37	https://api.triplydb.com/s/Rg93GoKzZ
Number of unique Uberon IDs for 65 organs	516	https://api.triplydb.com/s/puMfazNpK
Number of all 3D anatomical structures with an Uberon ID	1,192	https://api.triplydb.com/s/vS1axJ1as
Number of all anatomical structures with an Uberon ID including organs themselves	1,257	https://api.triplydb.com/s/vS1axJ1as
Number of unique 2D FTUs, their unique cell types (CTs), and the total number of cells selectable in FTU illustrations	22 FTUs; 116 CTs; 3,742 cells	https://cdn.humanatlas.io/hra-releases/v2.0/2d-ftu/asct-b-2d-models-crosswalk.csv

Atlas construction and publication

Crosswalk tables for 3D Reference Objects:

Anatomical Structures, Cell Types and Biomarkers (ASCT+B) Tables to 3D Reference Object Library Mapping: https://doi.org/10.48539/HBM224.QJKZ.987

Crosswalk tables for cell type annotation tools:

Azimuth: https://doi.org/10.48539/HBM587.ZSWT.783
CellTypist: https://doi.org/10.48539/HBM478.ZWDH.384
popV: https://doi.org/10.48539/HBM978.STZB.569

Atlas use case preview: Facilitating atlas construction by aligning new tissue blocks with existing data

User stories US#1-2 have been partially implemented and can be explored online via the HRA Portal at https://humanatlas.io/overview-use-the-hra

HRApop workflows: https://github.com/hubmapconsortium/hra-workflows
HRApop workflow runner: https://github.com/hubmapconsortium/hra-workflows-runner
HRApop enriched dataset graph: https://lod.humanatlas.io/graph/hra-pop/latest

This paper uses the HRApop v0.10.2 run. All data are available via:

Linked Open Data (LOD) server: https://lod.humanatlas.io/ds-graph/hra-pop-full/latest
GitHub: https://github.com/x-atlas-consortia/hra-pop/tree/main/output-data/v0.10.2

The LOD server supports SPARQL queries. For easy access to data of general utility, pre-made SPARQL queries are provided as web API endpoints via grlc. For example, HRApop users can examine the biomarker expression values for one cell type across HRApop datasets for specific anatomical structures (SI Fig. 1). They can also explore the similarity of the 553 datasets used in HRApop construction, based either on shared cell type populations (SI Fig. 2) or on shared anatomical structures determined by mesh-level collision detection (SI Fig. 3).
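As a minimal sketch of how a SPARQL query could be sent to the LOD server from Python, the snippet below builds a request URL following the SPARQL 1.1 Protocol and defines a helper that fetches JSON results. The endpoint address `https://lod.humanatlas.io/sparql`, the helper names, and the generic example query are assumptions made for illustration; consult the LOD server documentation for the actual endpoint and the pre-made grlc queries.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request, urlopen

# Assumed endpoint address; not confirmed by this document.
SPARQL_ENDPOINT = "https://lod.humanatlas.io/sparql"


def build_sparql_url(query: str, endpoint: str = SPARQL_ENDPOINT) -> str:
    """Return a GET URL that submits `query` as the `query` parameter."""
    return f"{endpoint}?{urlencode({'query': query})}"


def run_sparql(query: str, endpoint: str = SPARQL_ENDPOINT) -> list:
    """Send the query and return the list of result bindings (needs network)."""
    request = Request(
        build_sparql_url(query, endpoint),
        headers={"Accept": "application/sparql-results+json"},
    )
    with urlopen(request, timeout=60) as response:
        payload = json.load(response)
    return payload["results"]["bindings"]


# Generic illustrative query: list a handful of triples.
QUERY = "SELECT ?s ?p ?o WHERE { ?s ?p ?o } LIMIT 5"

# Only the URL is built here; call run_sparql(QUERY) to execute it.
print(build_sparql_url(QUERY))
```

Keeping URL construction separate from the network call makes the query easy to inspect or paste into a browser before running it.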

SI Figure 1. Dot plot for biomarker expression of one cell type across HRApop datasets. Use the /datasets-with-ct SPARQL query to retrieve all atlas datasets with a given cell type. For cell type ‘adipocyte’, the query returns 109 datasets with that cell type and a total of 420 biomarkers characterizing it. All 109 datasets were annotated by Azimuth; no other cell type annotation tool assigns an adipocyte cell type. The query is documented here. The Jupyter Notebook to render the visualization is here.

SI Figure 2. Heatmaps for prevalence of cell types across organs in HRApop datasets. a. Azimuth can be run over four organs. b. CellTypist is available for six organs. c. popV was run for 10 organs. Each heatmap shows a scaled mean value (z-score) for the percentage of cells identified in each dataset registered into an organ, by tool. The percentage values are scaled using R’s scale() function: values for a given variable are centered on the mean and then divided by the standard deviation, yielding a z-score. A z-score of 0 means the value lies at the variable’s mean. A color corresponding to a score of 1 indicates that the cell type percentage is 1 standard deviation higher than the mean for that cell type, a score of 2 indicates 2 standard deviations higher, etc. Full versions of all three plots are provided here (Azimuth), here (CellTypist), and here (popV). The R Markdown document to generate these visualizations is here.
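The scaling described in the caption can be sketched in Python as follows. This is an illustration of what R's scale() does with its default settings (center on the mean, divide by the sample standard deviation with an n - 1 denominator), not the R Markdown code used for the figure. The input percentages are hypothetical.

```python
from statistics import mean, stdev


def scale(values):
    """Z-score a list of numbers the way R's scale() does by default."""
    center = mean(values)
    spread = stdev(values)  # sample standard deviation (n - 1 denominator)
    return [(v - center) / spread for v in values]


# Hypothetical percentages of one cell type across four datasets.
percentages = [2.0, 4.0, 6.0, 8.0]
z_scores = scale(percentages)
print(z_scores)
```

After scaling, the values have mean 0 and sample standard deviation 1, which is what makes colors comparable across cell types in the heatmaps.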

SI Figure 3. UMAP plot of dataset similarity based on shared anatomical structures. a. The similarity of the 553 atlas-level datasets is plotted based on the percentage of shared anatomical structures, computed using mesh-level collision detection. Weighted cosine similarity is used here and in US#2, available via the HRA Portal at https://humanatlas.io/user-story/2. Datasets cluster by organ (see legend on right). b. UMAP zoom into four subclusters for the small intestine reveals the four major extraction sites. Full versions of the UMAP plots are provided here.
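To make the similarity measure concrete, here is a minimal sketch of cosine similarity between two datasets, each described by the percentage of its tissue block that collides with each anatomical structure. Treating those percentages as the weights is an assumption made for this illustration; the exact weighting used in HRApop may differ. The structure names and values are hypothetical.

```python
from math import sqrt


def cosine_similarity(a: dict, b: dict) -> float:
    """Cosine similarity of two sparse vectors keyed by anatomical structure."""
    shared = a.keys() & b.keys()
    dot = sum(a[key] * b[key] for key in shared)
    norm_a = sqrt(sum(value * value for value in a.values()))
    norm_b = sqrt(sum(value * value for value in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty dataset is treated as dissimilar to everything
    return dot / (norm_a * norm_b)


# Hypothetical collision percentages per anatomical structure.
dataset_1 = {"structure_A": 0.7, "structure_B": 0.3}
dataset_2 = {"structure_A": 0.6, "structure_C": 0.4}

print(cosine_similarity(dataset_1, dataset_2))
```

Datasets that share no anatomical structures score 0, and datasets with identical proportions score 1, so nearby points in the UMAP correspond to similar registration locations.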

Atlas use case preview: Perivascular immune cells in lung

This use case applies the Vitessce visualization tool to compare similar locations in healthy adult lung and pediatric lung with bronchopulmonary dysplasia (BPD). Using single-cell spatial protein biomarker data, it demonstrates how to assess multiple cell types relative to their nearest endothelial cell nuclei.
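The core measurement in this use case, the distance from each cell to its nearest endothelial cell nucleus, can be sketched as below. This is a brute-force illustration with hypothetical coordinates and cell type labels, not the code in the linked repository; function names and the `(cell_type, x, y)` tuple layout are assumptions.

```python
from collections import defaultdict
from math import dist
from statistics import mean


def nearest_distance(cell_xy, endothelial_nuclei):
    """Euclidean distance from one cell to the closest endothelial nucleus."""
    return min(dist(cell_xy, nucleus) for nucleus in endothelial_nuclei)


def mean_distance_by_cell_type(cells, endothelial_nuclei):
    """Average nearest-endothelial distance for each cell type."""
    distances = defaultdict(list)
    for cell_type, x, y in cells:
        distances[cell_type].append(nearest_distance((x, y), endothelial_nuclei))
    return {cell_type: mean(values) for cell_type, values in distances.items()}


# Hypothetical 2D positions (arbitrary units).
endothelial = [(0.0, 0.0), (10.0, 10.0)]
cells = [("T cell", 3.0, 4.0), ("T cell", 10.0, 13.0), ("macrophage", 0.0, 6.0)]

print(mean_distance_by_cell_type(cells, endothelial))
```

For large images, a spatial index (for example a k-d tree) would replace the brute-force `min`, but the quantity being computed stays the same.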

Link to data on Google Drive: https://drive.google.com/drive/folders/1LX4PHzohrK5l_2G5szZEdxz8iTvIztx2

*Disclaimer: The datasets for this analysis are still in preparation for upload to the HuBMAP Portal. As a result, the Tissue Datasets field in the Exploration User Interface linked below will show 0. We provide these two datasets via Google Drive for the time being. Once the datasets are on the HuBMAP Portal, this field will be updated.*

Link to code on GitHub: https://github.com/cns-iu/hra-construction-usage-supporting-information/tree/main/perivascular-immune-cells-in-lung

Atlas use case preview: Hierarchical cell type populations within FTUs

This use case features a code template for hierarchical cell neighborhood analysis. The code was developed for analyzing cell type neighborhoods across scales, and we have named some of these scales: cellular “neighborhoods”, “communities”, and “functional tissue units.” Identifying similar cellular neighborhoods, communities, and tissue units across scales is analogous to the way people form neighborhoods, cities, and states.
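As a rough sketch of the first step in such an analysis, the snippet below computes the cell type composition of the k nearest cells around a given cell. It assumes that neighborhoods are derived from these composition vectors (for example by clustering them) and that the same step can be repeated at the next scale with neighborhood labels in place of cell types. The function name, the `(label, x, y)` tuple layout, and the data are hypothetical and are not taken from the linked worksheet.

```python
from collections import Counter
from math import dist


def neighborhood_composition(cells, index, k):
    """Fraction of each label among the k cells nearest to cells[index].

    The cell itself is included in its own window.
    """
    origin = cells[index][1:]
    nearest = sorted(cells, key=lambda cell: dist(cell[1:], origin))[:k]
    counts = Counter(cell[0] for cell in nearest)
    return {label: count / len(nearest) for label, count in counts.items()}


# Hypothetical cells as (label, x, y).
cells = [("A", 0, 0), ("A", 1, 0), ("B", 2, 0), ("B", 10, 0)]

print(neighborhood_composition(cells, index=0, k=3))
```

Replacing the cell type labels with assigned neighborhood labels and enlarging k is one way to move up a level in the hierarchy, in the spirit of the neighborhoods, cities, and states analogy.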

Link to Nature paper and data: https://portal.hubmapconsortium.org/browse/publication/77ab35880329b5932380104aa58795a4
Link to worksheet on GitHub: https://github.com/HickeyLab/Hierarchical-Tissue-Unit-Annotation

Cell type predictions for the same dataset, generated with the current version of the cell type model by the Van Valen lab, are also available at https://drive.google.com/drive/folders/1W0MVcc4Zx1pPHmshSohhzYIcvFxyFBDi. For one dataset, we compare the original STELLAR predictions (SI Fig. 4, left) with the predictions from the development version of the cell type model (SI Fig. 4, right). We also show a confusion matrix of the cell type categories for the same dataset (SI Fig. 5).

SI Figure 4. Comparison between cell type predictions from STELLAR (left) and development version of cell type model (right).

SI Figure 5. Confusion matrix for B009_Trans_CL_reg001 dataset.
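For readers who want to reproduce this kind of comparison, a confusion matrix between two sets of per-cell labels can be tallied as in the minimal sketch below. The label lists are hypothetical stand-ins for the STELLAR and cell type model predictions, and the function is an illustration rather than the code used for SI Fig. 5.

```python
from collections import Counter


def confusion_matrix(reference, predicted):
    """Count label pairs; rows follow `reference`, columns follow `predicted`."""
    if len(reference) != len(predicted):
        raise ValueError("both label lists must cover the same cells")
    labels = sorted(set(reference) | set(predicted))
    counts = Counter(zip(reference, predicted))
    matrix = [[counts[(row, col)] for col in labels] for row in labels]
    return labels, matrix


# Hypothetical per-cell labels from two annotation methods.
reference = ["B cell", "B cell", "T cell", "T cell"]
predicted = ["B cell", "T cell", "T cell", "T cell"]

labels, matrix = confusion_matrix(reference, predicted)
print(labels)
for row in matrix:
    print(row)
```

Diagonal entries count cells on which the two methods agree; off-diagonal entries show which categories are most often exchanged.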