# SpatialOmics

HTAN Spatial Omics Data Model Schema for Phase 2 - All Levels

## CoreFileAttributes

**Universal attributes that apply to all file-based data in HTAN**

| Attribute | Type | Required | Description |
|-----------|------|----------|-------------|
| `FILENAME` | string | Yes | Name of the file |
| `FILE_FORMAT` | string | Yes | Format of the file (e.g., fastq, bam, vcf, h5ad) |
| `HTAN_DATA_FILE_ID` | string | Yes | HTAN Data File ID (Primary Key) |
| `HTAN_PARENT_ID` | string | Yes | HTAN Parent ID(s) - Foreign key(s) to parent entity (B for Biospecimen, D for data file). One or more IDs; for aggregated files provide multiple. Each ID must have B or D suffix. Supports HTA200-229 for phase 2. |

## SpatialLevel1

**Level 1 raw spatial data bundle (optional) - Contains raw sequencing data, images, and registration files**

| Attribute | Type | Required | Description |
|-----------|------|----------|-------------|
| `FILE_FORMAT` | [FileFormatLevel1](#fileformatlevel1) | Yes | High-level package format of the bundle |
| `FILENAME` | string | Yes | Name of the file. Must end with an extension matching the FILE_FORMAT (.tar for tar; .tar.gz for tar.gz; .zip for zip) |
| `PLATFORM` | [Platform](#platform) | Yes | Name of the platform used to generate the data |
| `ASSAY_TYPE` | [AssayType](#assaytype) | Yes | Broad assay class (drives downstream conditionals) |
| `BUNDLE_CONTENTS` | string | Yes | List of expected files or folders in this bundle (relative paths within the archive) |
| `HAS_SEQUENCING` | boolean | No | Whether raw or aligned sequencing data is included |
| `SEQUENCING_FILE_TYPE` | [SequencingFileType](#sequencingfiletype) | Conditional: SEQUENCING_FILE_TYPE is required when HAS_SEQUENCING is true | Sequencing file type |
| `HAS_IMAGES` | boolean | Yes | Whether any image files (e.g., TIFFs) are included |
| `IMAGE_TYPES` | [ImageType](#imagetype) | Conditional: IMAGE_TYPES is required when HAS_IMAGES is true | Types of images provided |
| `HAS_PROBE_SET` | boolean | Conditional: HAS_PROBE_SET is required when ASSAY_TYPE is molecular barcoding | Whether a targeted probe/gene panel is included |
| `HAS_REGISTRATION_FILES` | boolean | Yes | Whether any spatial registration transform files are included |
| `HTAN_DATA_FILE_ID` | string | Yes | HTAN Data File ID (Primary Key) |
| `HTAN_PARENT_ID` | string | Yes | HTAN Parent ID(s) - Foreign key(s) to parent entity (B for Biospecimen, D for data file). One or more IDs; for aggregated files provide multiple. Each ID must have B or D suffix. Supports HTA200-229 for phase 2. |

## SpatialLevel3

**Level 3 processed spatial assay output bundle - Contains platform-specific output files, segmentation, matrices, and QC metrics**

| Attribute | Type | Required | Description |
|-----------|------|----------|-------------|
| `PLATFORM` | [PlatformLevel3](#platformlevel3) | Yes | Name of the platform used to generate the data |
| `SPATIAL_ASSAY_TYPE` | [SpatialAssayType](#spatialassaytype) | No | Type of spatial assay (in situ or capture-based) |
| `ASSAY_CHEMISTRY_VERSION` | string | Yes | Assay chemistry version (e.g., v1, v2) |
| `SOFTWARE_AND_VERSION` | string | No | Software/tools used for processing |
| `PROTOCOL_LINK` | string | No | URL to protocol documentation |
| `RNA_MEASURED` | boolean | Yes | Whether RNA was measured |
| `PROTEIN_MEASURED` | boolean | Yes | Whether protein was measured |
| `TRANSCRIPTOME_TYPE` | [TranscriptomeType](#transcriptometype) | Conditional: TRANSCRIPTOME_TYPE is required when RNA_MEASURED is true | Molecular targets measured using panels |
| `PANEL_SIZE_TOTAL_TARGETS` | integer | Yes | Total number of targets in the panel |
| `PANEL_NAME` | string | Conditional: PANEL_NAME is required when TRANSCRIPTOME_TYPE is Targeted OR PROTEIN_MEASURED is true | Name of the panel used in this experiment |
| `HTAN_PANEL_ID` | string | Conditional: HTAN_PANEL_ID is required when TRANSCRIPTOME_TYPE is Targeted OR PROTEIN_MEASURED is true | Unique HTAN identifier for the panel used in this experiment. Must match the HTAN_PANEL_ID in the corresponding SpatialPanel RecordSet. Follows the HTAN identifier format with a P-prefix segment (e.g., HTA201_1_P1). |
| `SAME_SECTION_IMAGING_ID` | string | No | HTAN ID of the data file that represents same-section imaging |
| `SAME_SECTION_IMAGING_MODALITY` | [SameSectionImagingModality](#samesectionimagingmodality) | No | Modality of same-section imaging, if performed |
| `SAME_SECTION_IMAGING_CHANNELS` | string | Conditional: SAME_SECTION_IMAGING_CHANNELS is required when SAME_SECTION_IMAGING_MODALITY is fluorescence | Antigens targeted in same-section fluorescence imaging |
| `REGION_AREA` | float | Yes | Capture area in µm² |
| `BUNDLE_CONTENTS` | string | Yes | List of expected files or folders in this bundle (relative paths within the archive) |
| `PORTAL_PREVIEW_FILE` | string | No | Relative path of HTML preview in bundle if present |
| `HAS_CELL_SEGMENTATION` | boolean | Yes | Indicates presence of cell segmentation data |
| `CELL_SEGMENTATION_METHOD` | string | Conditional: CELL_SEGMENTATION_METHOD is required when HAS_CELL_SEGMENTATION is true | Description of segmentation method |
| `CELL_SEGMENTED_OBJECT_TYPE` | [CellSegmentedObjectType](#cellsegmentedobjecttype) | Conditional: CELL_SEGMENTED_OBJECT_TYPE is required when HAS_CELL_SEGMENTATION is true | Level of segmentation |
| `NUMBER_OF_SEGMENTED_CELLS` | integer | Conditional: NUMBER_OF_SEGMENTED_CELLS is required when HAS_CELL_SEGMENTATION is true | Total number of segmented cells |
| `HAS_DIMENSIONALITY_REDUCTION` | boolean | No | Indicates presence of dimensionality-reduced data |
| `DIMENSIONALITY_REDUCTION_METHOD` | [DimensionalityReductionMethod](#dimensionalityreductionmethod) | Conditional: DIMENSIONALITY_REDUCTION_METHOD is required when HAS_DIMENSIONALITY_REDUCTION is true | Method used for dimensionality reduction |
| `HAS_CLUSTERING` | boolean | Yes | Indicates if clustering was performed |
| `CLUSTERING_METHOD` | string | Conditional: CLUSTERING_METHOD is required when HAS_CLUSTERING is true | Method used to define clusters |
| `NUMBER_OF_CLUSTERS` | integer | Conditional: NUMBER_OF_CLUSTERS is required when HAS_CLUSTERING is true | Number of clusters identified |
| `SLIDE_SERIAL_NUMBER` | string | Conditional: SLIDE_SERIAL_NUMBER is required when PLATFORM is Visium or Visium HD or Xenium | Slide serial number |
| `CAPTURE_AREA` | [CaptureArea](#capturearea) | Conditional: CAPTURE_AREA is required when PLATFORM is Visium or Visium HD | Area (or Capture Area) - One of the four or two active regions where tissue can be placed on a Visium slide |
| `RUN_ID` | string | No | A unique identifier for this individual run (typically associated with a single slide) of the spatial transcriptomic processing workflow |
| `CYTASSIST_USED` | boolean | Conditional: CYTASSIST_USED is required when PLATFORM is Visium or Visium HD | Whether CytAssist was used |
| `GENOMIC_REFERENCE` | string | Conditional: GENOMIC_REFERENCE is required when PLATFORM is Visium or Visium HD | Reference genome used |
| `SEQUENCING_INSTRUMENT` | string | Conditional: SEQUENCING_INSTRUMENT is required when SPATIAL_ASSAY_TYPE is capture-based | Sequencer used |
| `SEQUENCING_CONFIGURATION` | string | Conditional: SEQUENCING_CONFIGURATION is required when SPATIAL_ASSAY_TYPE is capture-based | Read and index setup |
| `SEQUENCING_DEPTH` | string | Conditional: SEQUENCING_DEPTH is required when SPATIAL_ASSAY_TYPE is capture-based | Sequencing depth |
| `QC_SPATIAL_UNIT` | [QCSpatialUnit](#qcspatialunit) | Yes | Type of spatial unit |
| `QC_FEATURE_NUMBER` | integer | Yes | Features (e.g. spots or bins) under tissue |
| `QC_MEAN_READS_PER_FEATURE` | float | Yes | Mean reads per feature |
| `QC_TOTAL_GENES_DETECTED` | integer | Yes | Total genes detected |
| `QC_TOTAL_NUMBER_OF_READS` | integer | Yes | Total number of reads |
| `FILENAME` | string | Yes | Name of the bundle file. Must end with .tar.gz or .gz |
| `FILE_FORMAT` | string | Yes | Format of the bundle file (tar.gz or gz) |
| `HTAN_DATA_FILE_ID` | string | Yes | HTAN Data File ID (Primary Key) |
| `HTAN_PARENT_ID` | string | Yes | HTAN Parent ID(s) - Foreign key(s) to parent entity (B for Biospecimen, D for data file). One or more IDs; for aggregated files provide multiple. Each ID must have B or D suffix. Supports HTA200-229 for phase 2. |

## SpatialLevel4

**Level 4 interoperable spatial omics file (optional) - Harmonized h5ad, RDS, or Zarr file for downstream analysis**

| Attribute | Type | Required | Description |
|-----------|------|----------|-------------|
| `FILE_FORMAT` | [FileFormatLevel4](#fileformatlevel4) | Yes | File format of the data file |
| `FILENAME` | string | Yes | Name of the file. Must end with an extension matching the FILE_FORMAT (.h5ad for h5ad; .rds for rds; .zarr for zarr) |
| `TOOL_COMPATIBILITY` | [ToolCompatibility](#toolcompatibility) | No | Tools or libraries compatible with this file |
| `NUMBER_OF_FEATURES` | integer | Yes | Number of features (e.g. transcripts) |
| `NUMBER_OF_OBJECTS` | integer | Yes | Number of objects (e.g. cells) |
| `HAS_DIMENSIONALITY_REDUCTION` | boolean | Yes | Indicates presence of dimensionality-reduced data |
| `DIMENSIONALITY_REDUCTION_METHOD` | [DimensionalityReductionMethodLevel4](#dimensionalityreductionmethodlevel4) | Conditional: DIMENSIONALITY_REDUCTION_METHOD is required when HAS_DIMENSIONALITY_REDUCTION is true | Method used for dimensionality reduction |
| `HAS_CLUSTERING` | boolean | Yes | Indicates if clustering was performed |
| `CLUSTERING_METHOD` | string | Conditional: CLUSTERING_METHOD is required when HAS_CLUSTERING is true | Method used to define clusters |
| `NUMBER_OF_CLUSTERS` | integer | Conditional: NUMBER_OF_CLUSTERS is required when HAS_CLUSTERING is true | Number of clusters identified |
| `HAS_CELL_TYPE_CALLING` | boolean | Yes | Indicates presence of cell type annotations |
| `CELL_TYPE_CALLING_METHOD` | string | Conditional: CELL_TYPE_CALLING_METHOD is required when HAS_CELL_TYPE_CALLING is true | Method used for cell type annotation |
| `CELL_TYPES` | string | Conditional: CELL_TYPES is required when HAS_CELL_TYPE_CALLING is true | List of cell types present in the data |
| `HAS_NORMALISED_ARRAY` | boolean | Yes | Indicates presence of normalized array |
| `NORMALISATION_METHOD` | [NormalisationMethod](#normalisationmethod) | Conditional: NORMALISATION_METHOD is required when HAS_NORMALISED_ARRAY is true | Method used for normalizing the array data |
| `HAS_RAW_ARRAY` | boolean | Yes | Indicates presence of raw expression array |
| `HAS_IMAGE` | boolean | Yes | Indicates presence of associated image data |
| `IMAGE_TYPE` | [ImageTypeLevel4](#imagetypelevel4) | Conditional: IMAGE_TYPE is required when HAS_IMAGE is true | Type of image associated with the data file |
| `HTAN_DATA_FILE_ID` | string | Yes | HTAN Data File ID (Primary Key) |
| `HTAN_PARENT_ID` | string | Yes | HTAN Parent ID(s) - Foreign key(s) to parent entity (B for Biospecimen, D for data file). One or more IDs; for aggregated files provide multiple. Each ID must have B or D suffix. Supports HTA200-229 for phase 2. |

## SpatialPanel

**Spatial omics panel information for targeted sequencing or protein panels**

| Attribute | Type | Required | Description |
|-----------|------|----------|-------------|
| `HTAN_PANEL_ID` | string | Yes | Unique identifier for the panel |
| `TARGET_TYPE` | [TargetTypeEnum](#targettypeenum) | Yes | Type of probe target. Determines which identifier fields are required. |
| `TARGET_NAME` | string | Yes | Name of the probe target. For human genes use the HGNC-approved gene symbol (e.g., MYC, PIK3CA); for all other target types use the most appropriate available name (e.g., HPV16-E6 for a viral target) |
| `ENSEMBL_ID` | string | Conditional: ENSEMBL_ID is required when TARGET_TYPE is Human Gene or Human Transcript | Stable Ensembl identifier for the target. Use ENSG-prefixed IDs when TARGET_TYPE is Human Gene (e.g., ENSG00000136997 or ENSG00000136997.20 for MYC); use ENST-prefixed IDs when TARGET_TYPE is Human Transcript (e.g., ENST00000621592 or ENST00000621592.7) |
| `HGNC_VERSION` | string | Conditional: HGNC_VERSION is required when TARGET_TYPE is Human Gene | Version of the HGNC used for gene naming, indicated with the date of the HGNC reference (e.g., 2025-08-01) |
| `OTHER_TARGET_DESCRIPTION` | string | Conditional: OTHER_TARGET_DESCRIPTION is required when TARGET_TYPE is Other | Free-text description of the target (e.g., microbiome species, synthetic spike-in) |

## Enums

### AssayType

| Value | Description |
|-------|-------------|
| in situ sequencing | In situ sequencing assay type |
| molecular barcoding | Molecular barcoding assay type |
| multi-omic sequencing | Multi-omic sequencing assay type |
| spot-based sequencing | Spot-based sequencing assay type |

### CaptureArea

| Value | Description |
|-------|-------------|
| A | Capture area A (CytAssist slides with 11 mm Capture Area) |
| A1 | Capture area A1 (Visium slides v1 with 6.5 mm Capture Area, or CytAssist/Gateway slides with 6.5 mm Capture Area) |
| B | Capture area B (CytAssist slides with 11 mm Capture Area) |
| B1 | Capture area B1 (Visium slides v1 with 6.5 mm Capture Area) |
| C1 | Capture area C1 (Visium slides v1 with 6.5 mm Capture Area) |
| D1 | Capture area D1 (Visium slides v1 with 6.5 mm Capture Area, or CytAssist/Gateway slides with 6.5 mm Capture Area) |

### CellSegmentedObjectType

| Value | Description |
|-------|-------------|
| cytoplasm | Cytoplasm segmentation object type |
| nucleus | Nucleus segmentation object type |
| Whole cell | Whole cell segmentation object type |

### DimensionalityReductionMethod

| Value | Description |
|-------|-------------|
| PCA | Principal Component Analysis |
| t-SNE | t-Distributed Stochastic Neighbor Embedding |
| UMAP | Uniform Manifold Approximation and Projection |
| other | Other dimensionality reduction method |

### DimensionalityReductionMethodLevel4

| Value | Description |
|-------|-------------|
| PCA | Principal Component Analysis |
| t-SNE | t-Distributed Stochastic Neighbor Embedding |
| UMAP | Uniform Manifold Approximation and Projection |
| other | Other dimensionality reduction method |

### FileFormatLevel1

| Value | Description |
|-------|-------------|
| tar | TAR archive format |
| tar.gz | TAR GZIP compressed archive format |
| zip | ZIP compressed archive format |

### FileFormatLevel4

| Value | Description |
|-------|-------------|
| h5ad | AnnData HDF5 format (Python) |
| rds | RDS format (R) |
| zarr | Zarr format |

### ImageType

| Value | Description |
|-------|-------------|
| DAPI | DAPI (4',6-diamidino-2-phenylindole) image type |
| H&E | Hematoxylin and Eosin image type |
| MIF | Multiplex Immunofluorescence image type |
| Other | Other image type |

### ImageTypeLevel4

| Value | Description |
|-------|-------------|
| jpeg | JPEG image format |
| other | Other image format |
| png | PNG image format |
| tiff | TIFF image format |

### NormalisationMethod

| Value | Description |
|-------|-------------|
| CPM | Counts Per Million normalization |
| log normalization | Log normalization |
| SCTransform | SCTransform normalization |
| TPM | Transcripts Per Million normalization |
| other | Other normalization method |

### Platform

| Value | Description |
|-------|-------------|
| 10x Genomics Visium | 10x Genomics Visium platform |
| 10x Genomics Visium HD | 10x Genomics Visium HD platform |
| 10x Genomics Xenium | 10x Genomics Xenium platform |
| Nanostring CosMX | Nanostring CosMX platform |
| STOmics Stereo-CITE | STOmics Stereo-CITE platform |
| STOmics Stereo-seq | STOmics Stereo-seq platform |

### PlatformLevel3

| Value | Description |
|-------|-------------|
| 10x Genomics Visium | 10x Genomics Visium platform |
| 10x Genomics Visium HD | 10x Genomics Visium HD platform |
| 10x Genomics Xenium | 10x Genomics Xenium platform |
| DBiT-seq | DBiT-seq platform |
| Nanostring CosMX | Nanostring CosMX platform |
| SeqFISH | SeqFISH platform |
| STOmics Stereo-CITE | STOmics Stereo-CITE platform |
| STOmics Stereo-seq | STOmics Stereo-seq platform |

### QCSpatialUnit

| Value | Description |
|-------|-------------|
| 100um area | 100 micrometer area spatial unit |
| 8um bin | 8 micrometer bin spatial unit |
| cell | Cell spatial unit |
| spot | Spot spatial unit |

### SameSectionImagingModality

| Value | Description |
|-------|-------------|
| fluorescence | Fluorescence imaging modality |
| H&E | Hematoxylin and Eosin imaging modality |

### SequencingFileType

| Value | Description |
|-------|-------------|
| BAM | BAM alignment file format |
| FASTQ | FASTQ sequencing file format |

### SpatialAssayType

| Value | Description |
|-------|-------------|
| capture-based | Capture-based spatial assay type |
| In situ | In situ spatial assay type |

### TargetTypeEnum

| Value | Description |
|-------|-------------|
| Bacterial | A probe targeting a bacterial gene or sequence |
| Control Probe | A control probe used for normalization or quality control |
| Human Gene | A probe targeting a human gene |
| Human Protein | A probe targeting a human protein |
| Human Transcript | A probe targeting a human transcript |
| Other | A probe targeting a target not covered by other categories |
| Viral | A probe targeting a viral gene or sequence |

### ToolCompatibility

| Value | Description |
|-------|-------------|
| anndata | AnnData library compatibility |
| seurat | Seurat library compatibility |
| spatialdata | SpatialData library compatibility |

### TranscriptomeType

| Value | Description |
|-------|-------------|
| Protein coding | Protein coding transcriptome type |
| Targeted | Targeted transcriptome type |
| Whole transcriptome | Whole transcriptome type |
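
## Example: checking a SpatialLevel1 record

The conditional requirements and filename rules listed for SpatialLevel1 can be checked mechanically. The following is a minimal sketch, not an official HTAN validator: the function name `validate_level1`, the dict-based record representation, the comma as separator for multiple parent IDs, and the regular expression for `HTAN_PARENT_ID` are all assumptions made for illustration. In particular, the schema only states that each parent ID must have a B or D suffix and that centers HTA200-229 are supported, so the exact ID layout used here is modeled on the `HTA201_1_P1` panel ID example and may not match the real format.

```python
import re

# Conditional rules transcribed from the SpatialLevel1 table:
# (dependent attribute, controlling attribute, value that triggers the requirement)
LEVEL1_CONDITIONALS = [
    ("SEQUENCING_FILE_TYPE", "HAS_SEQUENCING", True),
    ("IMAGE_TYPES", "HAS_IMAGES", True),
    ("HAS_PROBE_SET", "ASSAY_TYPE", "molecular barcoding"),
]

# FILE_FORMAT value -> extension FILENAME must end with (FileFormatLevel1 enum)
LEVEL1_EXTENSIONS = {"tar": ".tar", "tar.gz": ".tar.gz", "zip": ".zip"}

# ASSUMPTION: ID layout HTA<200-229>_<participant>_<B|D><number>, modeled on
# the HTA201_1_P1 example; the schema itself only requires a B or D suffix.
PARENT_ID_PATTERN = re.compile(r"^HTA2[0-2][0-9]_\d+_[BD]\d+$")


def is_missing(value):
    """Treat None and empty strings as missing; False is a real boolean value."""
    return value is None or value == ""


def validate_level1(record):
    """Return a list of problems found in one SpatialLevel1 record (a dict)."""
    errors = []

    # Conditional requirements: the dependent field must be present
    # whenever the controlling field holds the triggering value.
    for dependent, controller, trigger in LEVEL1_CONDITIONALS:
        if record.get(controller) == trigger and is_missing(record.get(dependent)):
            errors.append(f"{dependent} is required when {controller} is {trigger}")

    # FILENAME must end with the extension that matches FILE_FORMAT.
    file_format = record.get("FILE_FORMAT")
    filename = record.get("FILENAME") or ""
    extension = LEVEL1_EXTENSIONS.get(file_format)
    if extension is None:
        errors.append(f"FILE_FORMAT {file_format!r} is not in FileFormatLevel1")
    elif not filename.endswith(extension):
        errors.append(f"FILENAME must end with {extension} for FILE_FORMAT {file_format}")

    # ASSUMPTION: multiple parent IDs are given as a comma-separated string.
    raw_parents = record.get("HTAN_PARENT_ID") or ""
    for parent_id in (part.strip() for part in raw_parents.split(",")):
        if not PARENT_ID_PATTERN.match(parent_id):
            errors.append(f"HTAN_PARENT_ID {parent_id!r} does not match the assumed pattern")

    return errors


# Hypothetical record: HAS_SEQUENCING is true but SEQUENCING_FILE_TYPE is absent.
example_record = {
    "FILENAME": "sample_bundle.tar.gz",
    "FILE_FORMAT": "tar.gz",
    "ASSAY_TYPE": "spot-based sequencing",
    "HAS_SEQUENCING": True,
    "HAS_IMAGES": True,
    "IMAGE_TYPES": "H&E",
    "HTAN_PARENT_ID": "HTA201_1_B1",
}
print(validate_level1(example_record))
```

Keeping the rules in a plain list of tuples means the same loop could be reused for the SpatialLevel3 and SpatialLevel4 conditionals by supplying a different rule table. Rules that combine two conditions with OR (such as `PANEL_NAME`) would need a richer representation than this sketch provides.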