SpatialOmics
HTAN Spatial Omics Data Model Schema for Phase 2 - All Levels
CoreFileAttributes
Universal attributes that apply to all file-based data in HTAN

| Attribute | Type | Required | Description |
|---|---|---|---|
|  | string | Yes | Name of the file |
|  | string | Yes | Format of the file (e.g., fastq, bam, vcf, h5ad) |
|  | string | Yes | HTAN Data File ID (Primary Key) |
|  | string | Yes | HTAN Parent ID(s) - Foreign key(s) to parent entity (B for Biospecimen, D for data file). One or more IDs; for aggregated files provide multiple. Each ID must have B or D suffix. Supports HTA200-229 for phase 2. |

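
The parent ID rule can be sketched as a pattern check. This is a minimal sketch, not the HTAN validator: the schema text only fixes the HTA200-229 range and the B/D marker, so the overall ID shape (`HTA<center>_<participant>_<B|D><number>`) and the function name `invalid_parent_ids` are assumptions made for illustration.

```python
import re

# Assumed ID shape: HTA<center>_<participant>_<B|D><number>, e.g. HTA201_1_B1.
# Only the HTA200-229 range and the B/D marker come from the schema text;
# the rest of the pattern is a guess and may need adjusting.
PARENT_ID_PATTERN = re.compile(r"^HTA2[0-2][0-9]_\d+_[BD]\d+$")


def invalid_parent_ids(parent_ids):
    """Return the IDs that do not match the assumed parent ID pattern."""
    return [pid for pid in parent_ids if not PARENT_ID_PATTERN.match(pid)]


if __name__ == "__main__":
    # An aggregated file lists several parents; each one is checked on its own.
    print(invalid_parent_ids(["HTA201_1_B1", "HTA229_12_D3", "HTA230_1_B1"]))
```

Checking each ID separately keeps the error report specific when an aggregated file lists many parents.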
SpatialLevel1
Level 1 raw spatial data bundle (optional) - Contains raw sequencing data, images, and registration files

| Attribute | Type | Required | Description |
|---|---|---|---|
|  |  | Yes | High-level package format of the bundle |
|  | string | Yes | Name of the file. Must end with an extension matching the FILE_FORMAT (.tar for tar; .tar.gz for tar.gz; .zip for zip) |
|  |  | Yes | Name of the platform used to generate the data |
|  |  | Yes | Broad assay class (drives downstream conditionals) |
|  | string | Yes | List of expected files or folders in this bundle (relative paths within the archive) |
|  | boolean | No | If raw/aligned sequencing data is included |
| SEQUENCING_FILE_TYPE |  | Conditional: SEQUENCING_FILE_TYPE is required when HAS_SEQUENCING is true | Sequencing file type |
|  | boolean | Yes | Whether any image files (e.g., TIFFs) are included |
| IMAGE_TYPES |  | Conditional: IMAGE_TYPES is required when HAS_IMAGES is true | Types of images provided |
| HAS_PROBE_SET | boolean | Conditional: HAS_PROBE_SET is required when ASSAY_TYPE is molecular barcoding | Whether a targeted probe/gene panel is included |
|  | boolean | Yes | Whether any spatial registration transform files are included |
|  | string | Yes | HTAN Data File ID (Primary Key) |
|  | string | Yes | HTAN Parent ID(s) - Foreign key(s) to parent entity (B for Biospecimen, D for data file). One or more IDs; for aggregated files provide multiple. Each ID must have B or D suffix. Supports HTA200-229 for phase 2. |

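
The Level 1 filename rule and conditional requirements can be expressed as a handful of explicit checks. The sketch below assumes a record arrives as a plain dict keyed by attribute name; `FILENAME` is an assumed key (the schema text names `FILE_FORMAT`, `HAS_SEQUENCING`, `SEQUENCING_FILE_TYPE`, `HAS_IMAGES`, `IMAGE_TYPES`, `ASSAY_TYPE` and `HAS_PROBE_SET`, but not the file name attribute), and `check_level1` is a hypothetical helper, not part of any HTAN tooling.

```python
# Extension expected for each FILE_FORMAT value, as stated in the table.
EXTENSION_FOR_FORMAT = {"tar": ".tar", "tar.gz": ".tar.gz", "zip": ".zip"}


def check_level1(record):
    """Return a list of problems found in one Level 1 metadata record (a dict)."""
    problems = []

    # FILENAME is an assumed key name; the schema only says the file name
    # must end with the extension matching FILE_FORMAT.
    extension = EXTENSION_FOR_FORMAT.get(record.get("FILE_FORMAT"))
    if extension is None:
        problems.append("FILE_FORMAT must be one of tar, tar.gz, zip")
    elif not str(record.get("FILENAME", "")).endswith(extension):
        problems.append(f"FILENAME must end with {extension}")

    # Conditional requirements copied from the Required column.
    if record.get("HAS_SEQUENCING") is True and not record.get("SEQUENCING_FILE_TYPE"):
        problems.append("SEQUENCING_FILE_TYPE is required when HAS_SEQUENCING is true")
    if record.get("HAS_IMAGES") is True and not record.get("IMAGE_TYPES"):
        problems.append("IMAGE_TYPES is required when HAS_IMAGES is true")
    # HAS_PROBE_SET is a boolean, so False is a valid answer; only absence fails.
    if record.get("ASSAY_TYPE") == "molecular barcoding" and record.get("HAS_PROBE_SET") is None:
        problems.append("HAS_PROBE_SET is required when ASSAY_TYPE is molecular barcoding")

    return problems
```

Returning a list of messages rather than raising on the first failure lets a submitter fix every problem in one pass.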
SpatialLevel3
Level 3 processed spatial assay output bundle - Contains platform-specific output files, segmentation, matrices, and QC metrics

| Attribute | Type | Required | Description |
|---|---|---|---|
|  |  | Yes | Name of the platform used to generate the data |
|  |  | No | Type of spatial assay (in situ or capture-based) |
|  | string | Yes | Assay chemistry version (e.g., v1, v2) |
|  | string | No | Software/tools used for processing |
|  | string | No | URL to protocol documentation |
|  | boolean | Yes | Whether RNA was measured |
|  | boolean | Yes | Whether protein was measured |
| TRANSCRIPTOME_TYPE |  | Conditional: TRANSCRIPTOME_TYPE is required when RNA_MEASURED is true | Molecular targets measured using panels |
|  | integer | Yes | Total number of targets in the panel |
| PANEL_NAME | string | Conditional: PANEL_NAME is required when TRANSCRIPTOME_TYPE is Targeted OR PROTEIN_MEASURED is true | Name of the panel used in this experiment |
| HTAN_PANEL_ID | string | Conditional: HTAN_PANEL_ID is required when TRANSCRIPTOME_TYPE is Targeted OR PROTEIN_MEASURED is true | Unique HTAN identifier for the panel used in this experiment. Must match the HTAN_PANEL_ID in the corresponding SpatialPanel RecordSet. Follows the HTAN identifier format with a P-prefix segment (e.g., HTA201_1_P1). |
|  | string | No | HTAN ID of data file that represents same section imaging |
|  |  | No | Was same section imaging performed |
| SAME_SECTION_IMAGING_CHANNELS | string | Conditional: SAME_SECTION_IMAGING_CHANNELS is required when SAME_SECTION_IMAGING_MODALITY is fluorescence | Antigens targeted in same section fluorescence imaging |
|  | float | Yes | Capture area in µm² |
|  | string | Yes | List of expected files or folders in this bundle (relative paths within the archive) |
|  | string | No | Relative path of HTML preview in bundle if present |
|  | boolean | Yes | Indicates presence of cell segmentation data |
| CELL_SEGMENTATION_METHOD | string | Conditional: CELL_SEGMENTATION_METHOD is required when HAS_CELL_SEGMENTATION is true | Description of segmentation method |
| CELL_SEGMENTED_OBJECT_TYPE |  | Conditional: CELL_SEGMENTED_OBJECT_TYPE is required when HAS_CELL_SEGMENTATION is true | Level of segmentation |
| NUMBER_OF_SEGMENTED_CELLS | integer | Conditional: NUMBER_OF_SEGMENTED_CELLS is required when HAS_CELL_SEGMENTATION is true | Total number of segmented cells |
|  | boolean | No | Indicates presence of dimensionally reduced data |
| DIMENSIONALITY_REDUCTION_METHOD |  | Conditional: DIMENSIONALITY_REDUCTION_METHOD is required when HAS_DIMENSIONALITY_REDUCTION is true | Method used for dimensionality reduction |
|  | boolean | Yes | Indicates if clustering was performed |
| CLUSTERING_METHOD | string | Conditional: CLUSTERING_METHOD is required when HAS_CLUSTERING is true | Method used to define clusters |
| NUMBER_OF_CLUSTERS | integer | Conditional: NUMBER_OF_CLUSTERS is required when HAS_CLUSTERING is true | Number of clusters identified |
| SLIDE_SERIAL_NUMBER | string | Conditional: SLIDE_SERIAL_NUMBER is required when PLATFORM is Visium or Visium HD or Xenium | Slide serial number |
| CAPTURE_AREA |  | Conditional: CAPTURE_AREA is required when PLATFORM is Visium or Visium HD | Area (or Capture Area) - One of the two or four active regions where tissue can be placed on a Visium slide |
|  | string | No | A unique identifier for this individual run (typically associated with a single slide) of the spatial transcriptomic processing workflow |
| CYTASSIST_USED | boolean | Conditional: CYTASSIST_USED is required when PLATFORM is Visium or Visium HD | Whether CytAssist was used |
| GENOMIC_REFERENCE | string | Conditional: GENOMIC_REFERENCE is required when PLATFORM is Visium or Visium HD | Reference genome used |
| SEQUENCING_INSTRUMENT | string | Conditional: SEQUENCING_INSTRUMENT is required when SPATIAL_ASSAY_TYPE is capture-based | Sequencer used |
| SEQUENCING_CONFIGURATION | string | Conditional: SEQUENCING_CONFIGURATION is required when SPATIAL_ASSAY_TYPE is capture-based | Read and index setup |
| SEQUENCING_DEPTH | string | Conditional: SEQUENCING_DEPTH is required when SPATIAL_ASSAY_TYPE is capture-based | Sequencing depth |
|  |  | Yes | Type of spatial unit |
|  | integer | Yes | Features (e.g. spots or bins) under tissue |
|  | float | Yes | Mean reads per feature |
|  | integer | Yes | Total genes detected |
|  | integer | Yes | Total number of reads |
|  | string | Yes | Name of the bundle file. Must end with .tar.gz or .gz |
|  | string | Yes | Format of the bundle file (tar.gz or gz) |
|  | string | Yes | HTAN Data File ID (Primary Key) |
|  | string | Yes | HTAN Parent ID(s) - Foreign key(s) to parent entity (B for Biospecimen, D for data file). One or more IDs; for aggregated files provide multiple. Each ID must have B or D suffix. Supports HTA200-229 for phase 2. |

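
Several Level 3 attributes become required depending on the platform, the assay type, or what was measured. One way to keep those rules readable is a table of (field, predicate) pairs, sketched below. The field names come from the conditional text in the schema; mapping "Visium", "Visium HD" and "Xenium" to the full `PlatformLevel3` values (for example `10x Genomics Visium`) is an assumption, and `missing_level3_fields` is a hypothetical helper. The segmentation, dimensionality-reduction and clustering conditionals are omitted for brevity.

```python
# Assumption: the conditionals' "Visium", "Visium HD" and "Xenium" refer to
# these PlatformLevel3 enum values.
VISIUM = {"10x Genomics Visium", "10x Genomics Visium HD"}
SLIDE_PLATFORMS = VISIUM | {"10x Genomics Xenium"}


def is_visium(r):
    return r.get("PLATFORM") in VISIUM


def is_capture_based(r):
    return r.get("SPATIAL_ASSAY_TYPE") == "capture-based"


def needs_panel(r):
    return r.get("TRANSCRIPTOME_TYPE") == "Targeted" or r.get("PROTEIN_MEASURED") is True


# Each entry: (conditionally required field, predicate deciding when it applies).
CONDITIONAL_RULES = [
    ("TRANSCRIPTOME_TYPE", lambda r: r.get("RNA_MEASURED") is True),
    ("PANEL_NAME", needs_panel),
    ("HTAN_PANEL_ID", needs_panel),
    ("SLIDE_SERIAL_NUMBER", lambda r: r.get("PLATFORM") in SLIDE_PLATFORMS),
    ("CAPTURE_AREA", is_visium),
    ("CYTASSIST_USED", is_visium),
    ("GENOMIC_REFERENCE", is_visium),
    ("SEQUENCING_INSTRUMENT", is_capture_based),
    ("SEQUENCING_CONFIGURATION", is_capture_based),
    ("SEQUENCING_DEPTH", is_capture_based),
]


def missing_level3_fields(record):
    """Return conditionally required fields that apply to the record but are absent."""
    return [
        field
        for field, applies in CONDITIONAL_RULES
        if applies(record) and record.get(field) is None
    ]
```

Keeping the rules in one list means a schema change touches a single line rather than a chain of `if` statements.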
SpatialLevel4
Level 4 interoperable spatial omics file (optional) - Harmonized h5ad, RDS, or Zarr file for downstream analysis

| Attribute | Type | Required | Description |
|---|---|---|---|
|  |  | Yes | File format of the data file |
|  | string | Yes | Name of the file. Must end with an extension matching the FILE_FORMAT (.h5ad for h5ad; .rds for rds; .zarr for zarr) |
|  |  | No | Tools or libraries compatible with this file |
|  | integer | Yes | Number of features (e.g. transcripts) |
|  | integer | Yes | Number of objects (e.g. cells) |
|  | boolean | Yes | Indicates presence of dimensionally reduced data |
| DIMENSIONALITY_REDUCTION_METHOD |  | Conditional: DIMENSIONALITY_REDUCTION_METHOD is required when HAS_DIMENSIONALITY_REDUCTION is true | Method used for dimensionality reduction |
|  | boolean | Yes | Indicates if clustering was performed |
| CLUSTERING_METHOD | string | Conditional: CLUSTERING_METHOD is required when HAS_CLUSTERING is true | Method used to define clusters |
| NUMBER_OF_CLUSTERS | integer | Conditional: NUMBER_OF_CLUSTERS is required when HAS_CLUSTERING is true | Number of clusters identified |
|  | boolean | Yes | Indicates presence of cell type annotations |
| CELL_TYPE_CALLING_METHOD | string | Conditional: CELL_TYPE_CALLING_METHOD is required when HAS_CELL_TYPE_CALLING is true | Method used for cell type annotation |
| CELL_TYPES | string | Conditional: CELL_TYPES is required when HAS_CELL_TYPE_CALLING is true | List of cell types present in the data |
|  | boolean | Yes | Indicates presence of normalized array |
| NORMALISATION_METHOD |  | Conditional: NORMALISATION_METHOD is required when HAS_NORMALISED_ARRAY is true | Method used for normalizing the array data |
|  | boolean | Yes | Indicates presence of raw expression array |
|  | boolean | Yes | Indicates presence of associated image data |
| IMAGE_TYPE |  | Conditional: IMAGE_TYPE is required when HAS_IMAGE is true | Type of image associated with the data file |
|  | string | Yes | HTAN Data File ID (Primary Key) |
|  | string | Yes | HTAN Parent ID(s) - Foreign key(s) to parent entity (B for Biospecimen, D for data file). One or more IDs; for aggregated files provide multiple. Each ID must have B or D suffix. Supports HTA200-229 for phase 2. |

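
Every Level 4 conditional has the same shape: a boolean flag that, when true, makes one or more companion fields required. That regularity lends itself to a flag-to-fields mapping, sketched here. All attribute names are taken from the conditional text in the schema; the dict-based record and the helper name `check_level4_flags` are assumptions for illustration only.

```python
# Flag attribute -> fields that become required when the flag is true.
REQUIRED_WHEN_TRUE = {
    "HAS_DIMENSIONALITY_REDUCTION": ["DIMENSIONALITY_REDUCTION_METHOD"],
    "HAS_CLUSTERING": ["CLUSTERING_METHOD", "NUMBER_OF_CLUSTERS"],
    "HAS_CELL_TYPE_CALLING": ["CELL_TYPE_CALLING_METHOD", "CELL_TYPES"],
    "HAS_NORMALISED_ARRAY": ["NORMALISATION_METHOD"],
    "HAS_IMAGE": ["IMAGE_TYPE"],
}


def check_level4_flags(record):
    """Return one message per companion field missing for a flag set to true."""
    problems = []
    for flag, fields in REQUIRED_WHEN_TRUE.items():
        if record.get(flag) is not True:
            continue  # flag is false or absent, so nothing extra is required
        for field in fields:
            # Treat both a missing key and an empty string as "not provided".
            if record.get(field) in (None, ""):
                problems.append(f"{field} is required when {flag} is true")
    return problems
```

A record with `HAS_CLUSTERING` set to true and only `CLUSTERING_METHOD` filled in would yield a single message naming `NUMBER_OF_CLUSTERS`.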
SpatialPanel
Spatial omics panel information for targeted sequencing or protein panels

| Attribute | Type | Required | Description |
|---|---|---|---|
|  | string | Yes | Unique identifier for the panel |
|  |  | Yes | Type of probe target. Determines which identifier fields are required. |
|  | string | Yes | Name of the probe target. For human genes use the HGNC-approved gene symbol (e.g., MYC, PIK3CA); for all other target types use the most appropriate available name (e.g., HPV16-E6 for a viral target) |
|  | string | Conditional | Stable Ensembl identifier for the target. Use ENSG-prefixed IDs when TARGET_TYPE is Human Gene (e.g., ENSG00000136997 or ENSG00000136997.20 for MYC); use ENST-prefixed IDs when TARGET_TYPE is Human Transcript (e.g., ENST00000621592 or ENST00000621592.7). Required when TARGET_TYPE is Human Gene or Human Transcript |
|  | string | Conditional | Version of the HGNC used for gene naming, indicated with the date of the HGNC reference (e.g., 2025-08-01). Required when TARGET_TYPE is Human Gene |
|  | string | Conditional | Free-text description of the target. Required when TARGET_TYPE is Other (e.g., microbiome species, synthetic spike-in) |

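
How TARGET_TYPE drives the identifier fields can be illustrated with a per-row check. This is a sketch under stated assumptions: the key names `ENSEMBL_ID`, `HGNC_VERSION` and `TARGET_DESCRIPTION` are hypothetical stand-ins (only `TARGET_TYPE` is named in the schema text), and `check_panel_row` is not an HTAN function. The patterns accept an optional version suffix, matching the examples ENSG00000136997.20 and ENST00000621592.7.

```python
import re
from datetime import date

# Ensembl stable ID patterns per target type, with an optional ".<version>" suffix.
ENSEMBL_PATTERNS = {
    "Human Gene": re.compile(r"^ENSG\d{11}(\.\d+)?$"),
    "Human Transcript": re.compile(r"^ENST\d{11}(\.\d+)?$"),
}


def check_panel_row(row):
    """Return a list of problems for one SpatialPanel row (a dict).

    ENSEMBL_ID, HGNC_VERSION and TARGET_DESCRIPTION are assumed key names.
    """
    problems = []
    target_type = row.get("TARGET_TYPE")

    pattern = ENSEMBL_PATTERNS.get(target_type)
    if pattern is not None and not pattern.match(row.get("ENSEMBL_ID") or ""):
        problems.append(f"ENSEMBL_ID is missing or has the wrong prefix for {target_type}")

    if target_type == "Human Gene":
        try:
            # The schema gives the HGNC version as a date, e.g. 2025-08-01.
            date.fromisoformat(row.get("HGNC_VERSION") or "")
        except ValueError:
            problems.append("HGNC_VERSION must be a date such as 2025-08-01")

    if target_type == "Other" and not row.get("TARGET_DESCRIPTION"):
        problems.append("TARGET_DESCRIPTION is required when TARGET_TYPE is Other")

    return problems
```

Target types without an entry in `ENSEMBL_PATTERNS` (for example Viral or Control Probe) pass through without an Ensembl check, which mirrors the conditions listed in the table.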
Enums
AssayType

| Value | Description |
|---|---|
| in situ sequencing | In situ sequencing assay type |
| molecular barcoding | Molecular barcoding assay type |
| multi-omic sequencing | Multi-omic sequencing assay type |
| spot-based sequencing | Spot-based sequencing assay type |

CaptureArea

| Value | Description |
|---|---|
| A | Capture area A (CytAssist slides with 11 mm Capture Area) |
| A1 | Capture area A1 (Visium slides v1 with 6.5 mm Capture Area, or CytAssist/Gateway slides with 6.5 mm Capture Area) |
| B | Capture area B (CytAssist slides with 11 mm Capture Area) |
| B1 | Capture area B1 (Visium slides v1 with 6.5 mm Capture Area) |
| C1 | Capture area C1 (Visium slides v1 with 6.5 mm Capture Area) |
| D1 | Capture area D1 (Visium slides v1 with 6.5 mm Capture Area, or CytAssist/Gateway slides with 6.5 mm Capture Area) |

CellSegmentedObjectType

| Value | Description |
|---|---|
| cytoplasm | Cytoplasm segmentation object type |
| nucleus | Nucleus segmentation object type |
| Whole cell | Whole cell segmentation object type |

DimensionalityReductionMethod

| Value | Description |
|---|---|
| PCA | Principal Component Analysis |
| t-SNE | t-Distributed Stochastic Neighbor Embedding |
| UMAP | Uniform Manifold Approximation and Projection |
| other | Other dimensionality reduction method |

DimensionalityReductionMethodLevel4

| Value | Description |
|---|---|
| PCA | Principal Component Analysis |
| t-SNE | t-Distributed Stochastic Neighbor Embedding |
| UMAP | Uniform Manifold Approximation and Projection |
| other | Other dimensionality reduction method |

FileFormatLevel1

| Value | Description |
|---|---|
| tar | TAR archive format |
| tar.gz | TAR GZIP compressed archive format |
| zip | ZIP compressed archive format |

FileFormatLevel4

| Value | Description |
|---|---|
| h5ad | AnnData HDF5 format (Python) |
| rds | RDS format (R) |
| zarr | Zarr format |

ImageType

| Value | Description |
|---|---|
| DAPI | DAPI (4',6-diamidino-2-phenylindole) image type |
| H&E | Hematoxylin and Eosin image type |
| MIF | Multiplex Immunofluorescence image type |
| Other | Other image type |

ImageTypeLevel4

| Value | Description |
|---|---|
| jpeg | JPEG image format |
| other | Other image format |
| png | PNG image format |
| tiff | TIFF image format |

NormalisationMethod

| Value | Description |
|---|---|
| CPM | Counts Per Million normalization |
| log normalization | Log normalization |
| SCTransform | SCTransform normalization |
| TPM | Transcripts Per Million normalization |
| other | Other normalization method |

Platform

| Value | Description |
|---|---|
| 10x Genomics Visium | 10x Genomics Visium platform |
| 10x Genomics Visium HD | 10x Genomics Visium HD platform |
| 10x Genomics Xenium | 10x Genomics Xenium platform |
| Nanostring CosMX | Nanostring CosMX platform |
| STOmics Stereo-CITE | STOmics Stereo-CITE platform |
| STOmics Stereo-seq | STOmics Stereo-seq platform |

PlatformLevel3

| Value | Description |
|---|---|
| 10x Genomics Visium | 10x Genomics Visium platform |
| 10x Genomics Visium HD | 10x Genomics Visium HD platform |
| 10x Genomics Xenium | 10x Genomics Xenium platform |
| DBiT-seq | DBiT-seq platform |
| Nanostring CosMX | Nanostring CosMX platform |
| SeqFISH | SeqFISH platform |
| STOmics Stereo-CITE | STOmics Stereo-CITE platform |
| STOmics Stereo-seq | STOmics Stereo-seq platform |

QCSpatialUnit

| Value | Description |
|---|---|
| 100um area | 100 micrometer area spatial unit |
| 8um bin | 8 micrometer bin spatial unit |
| cell | Cell spatial unit |
| spot | Spot spatial unit |

SameSectionImagingModality

| Value | Description |
|---|---|
| fluorescence | Fluorescence imaging modality |
| H&E | Hematoxylin and Eosin imaging modality |

SequencingFileType

| Value | Description |
|---|---|
| BAM | BAM alignment file format |
| FASTQ | FASTQ sequencing file format |

SpatialAssayType

| Value | Description |
|---|---|
| capture-based | Capture-based spatial assay type |
| In situ | In situ spatial assay type |

TargetTypeEnum

| Value | Description |
|---|---|
| Bacterial | A probe targeting a bacterial gene or sequence |
| Control Probe | A control probe used for normalization or quality control |
| Human Gene | A probe targeting a human gene |
| Human Protein | A probe targeting a human protein |
| Human Transcript | A probe targeting a human transcript |
| Other | A probe targeting a target not covered by other categories |
| Viral | A probe targeting a viral gene or sequence |

ToolCompatibility

| Value | Description |
|---|---|
| anndata | AnnData library compatibility |
| seurat | Seurat library compatibility |
| spatialdata | SpatialData library compatibility |

TranscriptomeType

| Value | Description |
|---|---|
| Protein coding | Protein coding transcriptome type |
| Targeted | Targeted transcriptome type |
| Whole transcriptome | Whole transcriptome type |