DigitalPathology

HTAN Digital Pathology Data Model Schema for Phase 2


If you are submitting Digital Pathology files, fill out the attributes listed below:

DigitalPathologyData

Container for digital pathology imaging data

Core File Attributes

These attributes are inherited from CoreFileAttributes and apply to all file-based data.

Attribute

Type

Required

Description

HTAN_DATA_FILE_ID

string, pattern: ^(?=.{1,50}$)(HTA2[0-2][0-9])(0000|EXT[0-9]{1,18}|[0-9]{1,21})(D[0-9]{1,20})$

Yes

HTAN Data File ID (Primary Key)

HTAN_PARENT_ID

string, pattern: ^(?=.{1,50}$)(HTA2[0-2][0-9])(0000|EXT[0-9]{1,18}|[0-9]{1,21})([BD][0-9]{1,20})$

Yes

HTAN Parent ID - Foreign Key to parent entity (B for Biospecimen, D for data file). Must have B or D suffix. Supports HTA200-229 for phase 2.
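The two ID patterns above can be checked before submission. The following is a minimal sketch, assuming the patterns are used exactly as printed in this table; the function names are hypothetical and not part of any HTAN tooling. Verify the patterns against the canonical schema before relying on them.

```python
import re

# Patterns copied as printed in the Core File Attributes table
# (assumption: they match the canonical schema character for character).
DATA_FILE_ID = re.compile(
    r"^(?=.{1,50}$)(HTA2[0-2][0-9])(0000|EXT[0-9]{1,18}|[0-9]{1,21})(D[0-9]{1,20})$"
)
PARENT_ID = re.compile(
    r"^(?=.{1,50}$)(HTA2[0-2][0-9])(0000|EXT[0-9]{1,18}|[0-9]{1,21})([BD][0-9]{1,20})$"
)


def is_valid_data_file_id(value: str) -> bool:
    """True if value matches the HTAN_DATA_FILE_ID pattern (D suffix only)."""
    return DATA_FILE_ID.fullmatch(value) is not None


def is_valid_parent_id(value: str) -> bool:
    """True if value matches the HTAN_PARENT_ID pattern (B or D suffix)."""
    return PARENT_ID.fullmatch(value) is not None
```

Note that a parent ID may carry either suffix, while a data file ID must end in a D suffix, so a biospecimen-style ID is rejected by `is_valid_data_file_id` but accepted by `is_valid_parent_id`.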

Base Imaging Attributes

These attributes are inherited from BaseImagingAttributes.

Attribute

Type

Required

Description

CITATION_OR_DOI

string, pattern: ^(?:(?:https?)://)(?:\S+(?::\S*)?@)?(?:(?!(?:10|127)(?:\.\d{1,3}){3})(?!(?:169\.254|192\.168)(?:\.\d{1,3}){2})(?!172\.(?:1[6-9]|2\d|3[0-1])(?:\.\d{1,3}){2})(?:[1-9]\d?|1\d\d|2[01]\d|22[0-3])(?:\.(?:1?\d{1,2}|2[0-4]\d|25[0-5])){2}(?:\.(?:[1-9]\d?|1\d\d|2[0-4]\d|25[0-4]))|(?:(?:[a-z\u00a1-\uffff0-9]-*)*[a-z\u00a1-\uffff0-9]+)(?:\.(?:[a-z\u00a1-\uffff0-9]-*)*[a-z\u00a1-\uffff0-9]+)*(?:\.(?:[a-z\u00a1-\uffff]{2,}))\.?)(?::\d{2,5})?(?:[/?#]\S*)?$

Yes

Raw Data Protocol or Digital Object Identifier Text; Publication and/or digital object identifier of the publication for open access studies. Must be a valid URL (http or https).

DE_IDENTIFICATION_METHOD_DESCRIPTION

string

No

Description of the process of removing potentially identifying data or data elements to render data into a form that does not identify individuals and where identification is not likely to take place.

DE_IDENTIFICATION_METHOD_TYPE

DeIdentificationMethodType

Yes

De-identification Method Type

DE_IDENTIFICATION_SOFTWARE

string

No

Software that was used to de-identify the images (if used)

DE_IDENTIFIED

boolean

Yes

Confirm that any HIPAA identifiers are redacted, masked, or not present in the slide label, and that any dates or strings present in internal metadata do not represent PHI

EXPERIMENTAL_STRATEGY_AND_DATA_SUBTYPES

ExperimentalStrategyAndDataSubtypes

Yes

What is the experimental strategy used for the study (or what type of data subtypes exist in the study)? Per RFC, the only valid value for imaging data types is "Pathological".

HAS_SLIDE_LABEL

boolean

Yes

Does the image contain a slide label?

IMAGE_MODALITY

ImageModality

Yes

The method in which the images are generated.

IMAGING_EQUIPMENT_MANUFACTURER

string

Yes

Producer of the imaging equipment that was used to generate the digital image

IMAGING_EQUIPMENT_MODEL

string

No

The words used to describe the specific model of the instrument used to carry out an imaging experiment

IMAGING_PROTOCOL

string, pattern: ^(?:(?:https?)://)(?:\S+(?::\S*)?@)?(?:(?!(?:10|127)(?:\.\d{1,3}){3})(?!(?:169\.254|192\.168)(?:\.\d{1,3}){2})(?!172\.(?:1[6-9]|2\d|3[0-1])(?:\.\d{1,3}){2})(?:[1-9]\d?|1\d\d|2[01]\d|22[0-3])(?:\.(?:1?\d{1,2}|2[0-4]\d|25[0-5])){2}(?:\.(?:[1-9]\d?|1\d\d|2[0-4]\d|25[0-4]))|(?:(?:[a-z\u00a1-\uffff0-9]-*)*[a-z\u00a1-\uffff0-9]+)(?:\.(?:[a-z\u00a1-\uffff0-9]-*)*[a-z\u00a1-\uffff0-9]+)*(?:\.(?:[a-z\u00a1-\uffff]{2,}))\.?)(?::\d{2,5})?(?:[/?#]\S*)?$

No

A rule which guides how an activity should be performed. Protocols.io ID or DOI link to a free/open protocol resource describing in detail the assay protocol. Must be a valid URL (http or https).

IMAGING_SOFTWARE

string

No

The name of the software package that was used to capture, generate, and process the image

IMMERSION

ImmersionMedium

No

Immersion medium. Each objective is designed for a specific immersion medium, which is marked on the objective. The main types of immersion media are air, oil, and water.

LENS_NUMERICAL_APERTURE

float

No

The numerical aperture of the lens. Floating point value > 0.

LICENSE

License

Yes

Official or legal permission to do or own a specified thing. Per RFC, the only valid value is "CC BY 4.0".

NOMINAL_MAGNIFICATION

integer

Yes

The magnification of the lens as specified by the manufacturer, e.g. '60' for a 60X lens. Integer value >= 0 (no units)

OBJECTIVE

string

Yes

The manufacturer and/or model number of the optical element that gathers light from an object being observed and focuses the light rays from it to produce a real image of the object

PASSED_QC

boolean

Yes

Confirm that the image has passed internal quality control checks

QC_COMMENT

string

Yes

Comments related to quality control checks

SLIDE_LABEL_REDACTED

boolean

No

Have identifiers, including dates, been masked in the label image?

SPECIES

Species

Yes

NCBI Taxonomy ID. Per RFC, the only valid value is "9606 (Homo sapiens)".

STAINING_METHOD

StainingMethod

Yes

Any of the various methods that use a dye, reagent, or other material for producing coloration in tissues or microorganisms for microscopic examination
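The Required column and the numeric and URL constraints in the table above lend themselves to a simple pre-submission check. The sketch below is illustrative only: the dict-based record, `check_base_imaging`, and `is_http_url` are assumptions, not HTAN tooling, and the URL test is a deliberately loose stand-in (http or https scheme plus a host) for the full schema pattern.

```python
from urllib.parse import urlparse

# Attributes marked "Yes" in the Required column of the table above.
REQUIRED_BASE_IMAGING = (
    "CITATION_OR_DOI", "DE_IDENTIFICATION_METHOD_TYPE", "DE_IDENTIFIED",
    "EXPERIMENTAL_STRATEGY_AND_DATA_SUBTYPES", "HAS_SLIDE_LABEL",
    "IMAGE_MODALITY", "IMAGING_EQUIPMENT_MANUFACTURER", "LICENSE",
    "NOMINAL_MAGNIFICATION", "OBJECTIVE", "PASSED_QC", "QC_COMMENT",
    "SPECIES", "STAINING_METHOD",
)


def is_http_url(value) -> bool:
    """Loose stand-in for the schema URL pattern: http(s) scheme plus a host."""
    if not isinstance(value, str):
        return False
    try:
        parts = urlparse(value)
    except ValueError:
        return False
    return parts.scheme in ("http", "https") and bool(parts.netloc)


def check_base_imaging(record: dict) -> list:
    """Return a list of problems found; an empty list means none were found."""
    errors = []
    for name in REQUIRED_BASE_IMAGING:
        if record.get(name) in (None, ""):
            errors.append(f"{name}: required attribute is missing")
    for name in ("CITATION_OR_DOI", "IMAGING_PROTOCOL"):
        value = record.get(name)
        if value not in (None, "") and not is_http_url(value):
            errors.append(f"{name}: must be an http or https URL")
    # bool is a subclass of int in Python, so exclude it explicitly.
    mag = record.get("NOMINAL_MAGNIFICATION")
    if mag not in (None, "") and (
        isinstance(mag, bool) or not isinstance(mag, int) or mag < 0
    ):
        errors.append("NOMINAL_MAGNIFICATION: must be an integer >= 0")
    na = record.get("LENS_NUMERICAL_APERTURE")
    if na not in (None, "") and (
        isinstance(na, bool) or not isinstance(na, (int, float)) or na <= 0
    ):
        errors.append("LENS_NUMERICAL_APERTURE: must be a number > 0")
    return errors
```

Boolean attributes such as DE_IDENTIFIED are treated as present when set to `False`; only `None` and the empty string count as missing in this sketch.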

Module-Specific Attributes

Attribute

Type

Required

Description

ANNOTATION_TYPE

AnnotationType

Required IF HAS_ANNOTATIONS = true

What types of annotation are contained in the image?

FILENAME

string, pattern: ^.+\.(ome\.(tif|tiff|tf2|tf8|btf)|tiff?|qptiff|svs)$

Yes

Name of the file. Must end with an extension matching the FILE_FORMAT (.ome.tif, .ome.tiff, .ome.tf2, .ome.tf8, .ome.btf for ome-tiff; .tiff or .tif for tiff; .qptiff for qptiff; .svs for svs)

FILE_FORMAT

string, pattern: ^(ome-tiff|tiff|qptiff|svs)$

Yes

Format of the imaging file. Must be compatible with Bio-Formats or OpenSlide Python. OME-TIFF files use extensions .ome.tif, .ome.tiff, .ome.tf2, .ome.tf8, or .ome.btf

HAS_ANNOTATIONS

boolean

Yes

Does the image contain annotations?
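Two rules in this table span more than one attribute: the FILENAME extension must agree with FILE_FORMAT, and ANNOTATION_TYPE is required only when HAS_ANNOTATIONS is true. A minimal sketch of both checks follows. The function names and the dict-based record are hypothetical, the extension match is case-sensitive like the schema pattern, and treating a `.ome.tif` name as ome-tiff rather than tiff is an assumption drawn from the FILENAME description.

```python
# Extensions per FILE_FORMAT, taken from the FILENAME description above.
EXTENSIONS = {
    "ome-tiff": (".ome.tif", ".ome.tiff", ".ome.tf2", ".ome.tf8", ".ome.btf"),
    "tiff": (".tif", ".tiff"),
    "qptiff": (".qptiff",),
    "svs": (".svs",),
}


def infer_format(filename: str):
    """Return the FILE_FORMAT implied by the extension, or None if unknown."""
    # ome-tiff is tested first so "x.ome.tif" is not classified as plain tiff.
    for fmt in ("ome-tiff", "tiff", "qptiff", "svs"):
        for ext in EXTENSIONS[fmt]:
            # Require at least one character before the extension.
            if filename.endswith(ext) and len(filename) > len(ext):
                return fmt
    return None


def check_module_attributes(record: dict) -> list:
    """Return a list of problems found; an empty list means none were found."""
    errors = []
    fmt = record.get("FILE_FORMAT")
    filename = str(record.get("FILENAME", ""))
    if fmt not in EXTENSIONS:
        errors.append("FILE_FORMAT: must be one of ome-tiff, tiff, qptiff, svs")
    elif infer_format(filename) != fmt:
        errors.append(f"FILENAME: extension does not match FILE_FORMAT '{fmt}'")
    has_annotations = record.get("HAS_ANNOTATIONS")
    if not isinstance(has_annotations, bool):
        errors.append("HAS_ANNOTATIONS: required boolean is missing")
    elif has_annotations and not record.get("ANNOTATION_TYPE"):
        errors.append("ANNOTATION_TYPE: required when HAS_ANNOTATIONS is true")
    return errors
```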

Enums

AnnotationType

Value

Description

Artifact

Artifact annotation

Cell

Cell annotation

Nucleus

Nucleus annotation

ROI

Region of Interest annotation

Tissue

Tissue annotation

DeIdentificationMethodType

Value

Description

Automatic

Automatic de-identification method

Manual

Manual de-identification method

Not Applicable

De-identification not applicable

Semiautomatic

Semi-automatic de-identification method

ExperimentalStrategyAndDataSubtypes

Value

Description

Pathological

Pathological experimental strategy and data subtype

ImageModality

Value

Description

SM

Slide Microscopy

ImmersionMedium

Value

Description

Air

Air immersion medium

Glycerol

Glycerol immersion medium

Oil

Oil immersion medium

Other

Other immersion medium

Water

Water immersion medium

License

Value

Description

CC BY 4.0

Creative Commons Attribution 4.0 International License

Species

Value

Description

9606 (Homo sapiens)

NCBI Taxonomy ID for Homo sapiens

StainingMethod

Value

Description

CODEX

CODEX staining method

CyCIF

Cyclic Immunofluorescence staining method

ExSeq

Expansion Sequencing staining method

GeoMX-DSP

GeoMX Digital Spatial Profiling staining method

H&E

Hematoxylin and Eosin staining method

IHC

Immunohistochemistry staining method

IMC

Imaging Mass Cytometry staining method

MERFISH

Multiplexed Error-Robust Fluorescence In Situ Hybridization staining method

MIBI

Multiplexed Ion Beam Imaging staining method

MxIF

Multiplexed Immunofluorescence staining method

Not Applicable

Staining not applicable

SABER

Signal Amplification By Exchange Reaction staining method

mIHC

Multiplexed Immunohistochemistry staining method

t-CyCIF

Tissue Cyclic Immunofluorescence staining method
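The enumerations above can be encoded as lookup sets keyed by the attribute that uses them. This is a sketch under stated assumptions: the `ENUMS` mapping and `check_enums` are hypothetical names, each attribute is assumed to hold a single string value, and matching is exact and case-sensitive.

```python
# Allowed values per attribute, transcribed from the Enums section above.
ENUMS = {
    "ANNOTATION_TYPE": {"Artifact", "Cell", "Nucleus", "ROI", "Tissue"},
    "DE_IDENTIFICATION_METHOD_TYPE": {
        "Automatic", "Manual", "Not Applicable", "Semiautomatic",
    },
    "EXPERIMENTAL_STRATEGY_AND_DATA_SUBTYPES": {"Pathological"},
    "IMAGE_MODALITY": {"SM"},
    "IMMERSION": {"Air", "Glycerol", "Oil", "Other", "Water"},
    "LICENSE": {"CC BY 4.0"},
    "SPECIES": {"9606 (Homo sapiens)"},
    "STAINING_METHOD": {
        "CODEX", "CyCIF", "ExSeq", "GeoMX-DSP", "H&E", "IHC", "IMC",
        "MERFISH", "MIBI", "MxIF", "Not Applicable", "SABER", "mIHC",
        "t-CyCIF",
    },
}


def check_enums(record: dict) -> list:
    """Report enum-typed attributes whose value is not an allowed string."""
    errors = []
    for attribute, allowed in ENUMS.items():
        value = record.get(attribute)
        if value is None:
            continue  # presence is a separate, required-attribute concern
        if not isinstance(value, str) or value not in allowed:
            errors.append(
                f"{attribute}: {value!r} is not one of {sorted(allowed)}"
            )
    return errors
```

Because matching is exact, a value such as `h&e` or `Semi-automatic` is rejected even though it resembles an allowed value.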