HTAN Phase 2 Data Model

Documentation for the HTAN Phase 2 Data Model.

All HTAN Centers are required to encode their data and metadata in the common HTAN Data Model. The model is developed through a community Request for Comment (RFC) process with participation from all HTAN Centers, and it covers clinical, biospecimen, genomic, transcriptomic, proteomic, imaging, and spatial profiling data.

This documentation describes each module in the HTAN Phase 2 Data Model, including the required metadata attributes for each assay type. Each module page lists every attribute you need to fill out, including attributes inherited from base modules.

To annotate metadata or submit data, use Curator.

Getting Started

First, determine whether your data is record-based or file-based, then pick the matching module below.

Record-Based Data

If you have clinical or biospecimen data (patient records, sample metadata), use:

  • Clinical - Clinical and demographic data

  • Biospecimen - Biospecimen metadata and classification

File-Based Data

If you have sequencing, imaging, or other file-based data, use one of the following modules. Each module page shows all required attributes, including the core attributes it inherits:

  • WES - Bulk Whole Exome Sequencing (includes Core File + Base Sequencing + WES attributes)

  • scRNA-seq - Single-cell RNA sequencing (includes Core File + Base Sequencing + scRNA-seq attributes)

  • Digital Pathology - Whole-slide imaging (includes Core File + Base Imaging + Digital Pathology attributes)

  • Multiplex Microscopy - Multiplexed tissue imaging (includes Core File + Base Imaging + Multiplex Microscopy attributes)

  • SpatialOmics - Spatial omics assays (includes Core File + SpatialOmics attributes)

Each module page is self-contained and lists all attributes you need to fill out, so you don’t need to navigate between multiple pages.
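To make the inheritance described above concrete, here is a minimal Python sketch of how a module page's attribute list could be assembled from its layers. The layer names are taken from the module list above; the function names and every attribute name (`example_core_attr` and so on) are hypothetical placeholders, not real HTAN attributes, and the sketch does not reflect how the data model or Curator is actually implemented.

```python
# Layer composition for each file-based module, copied from the list above.
MODULE_LAYERS = {
    "WES": ["Core File", "Base Sequencing", "WES"],
    "scRNA-seq": ["Core File", "Base Sequencing", "scRNA-seq"],
    "Digital Pathology": ["Core File", "Base Imaging", "Digital Pathology"],
    "Multiplex Microscopy": ["Core File", "Base Imaging", "Multiplex Microscopy"],
    "SpatialOmics": ["Core File", "SpatialOmics"],
}

# Record-based modules named in this documentation.
RECORD_BASED_MODULES = {"Clinical", "Biospecimen"}


def is_file_based(module):
    """Return True for file-based modules, False for record-based ones."""
    if module in MODULE_LAYERS:
        return True
    if module in RECORD_BASED_MODULES:
        return False
    raise KeyError(f"Unknown module: {module}")


def flatten_attributes(module, layer_attributes):
    """List a module's attributes in layer order (base layers first).

    layer_attributes maps a layer name to its attribute names.
    Attributes repeated across layers are listed only once.
    """
    flattened = []
    for layer in MODULE_LAYERS[module]:
        for attribute in layer_attributes.get(layer, []):
            if attribute not in flattened:
                flattened.append(attribute)
    return flattened


# Hypothetical placeholder attributes, for illustration only.
EXAMPLE_LAYER_ATTRIBUTES = {
    "Core File": ["example_core_attr"],
    "Base Sequencing": ["example_sequencing_attr"],
    "WES": ["example_wes_attr"],
}

print(flatten_attributes("WES", EXAMPLE_LAYER_ATTRIBUTES))
# prints ['example_core_attr', 'example_sequencing_attr', 'example_wes_attr']
```

The flattened list mirrors what a self-contained module page presents: core attributes first, then base attributes, then the module's own.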

Reference

Modules