Mutation Annotation Format (MAF) is a tab-delimited text file that aggregates mutation information from VCF files and is generated at the project level. MAF files are produced through the Somatic Aggregation Workflow. The GDC produces MAF files at two permission levels: protected and somatic (or open-access). One MAF file is produced per variant calling pipeline per GDC project. MAFs are produced by aggregating the GDC annotated VCF files generated from one pipeline for one project.

Annotated VCF files often have variants reported on multiple transcripts, whereas the MAF files generated from the VCFs (*protected.maf) only report the most critically affected one. Somatic MAFs (*somatic.maf), which are also known as Masked Somatic Mutation files, are further processed to remove lower quality and potential germline variants. For tumor samples that contain variants from multiple combinations of tumor-normal aliquot pairs, only one pair is selected in the Somatic MAF based on their sample type. Somatic MAFs are publicly available and can be freely distributed within the boundaries of the GDC Data Access Policies.

The GDC MAF file format is based on the TCGA Mutation Annotation Format specifications, with additional columns included.

Note: The criteria for allowing mutations into open-access are purposefully implemented to overcompensate and filter out germline variants. If omission of true-positive somatic mutations is a concern, the GDC recommends using protected MAFs.

The process for modifying a protected MAF into a somatic MAF is as follows:

- Aliquot Selection: only one tumor-normal pair is selected for each tumor sample based on the plate number, sample type, analyte type and other features extracted from the tumor TCGA aliquot barcode.
- Low quality variant filtering and germline masking:
  1. Variants with Mutation_Status != 'Somatic' or GDC_FILTER = 'Gapfiller', 'ContEst', 'multiallelic', 'nonselectedaliquot', 'BCR_Duplicate' or 'BadSeq' are removed.
  2. Remaining variants with GDC_Valid_Somatic = True are included in the Somatic MAF.
  3. Remaining variants with FILTER != 'panel_of_normals' or PASS are removed. Note that the FILTER != panel_of_normals value is only relevant for the variants generated from the MuTect2 pipeline.
  4. Remaining variants with MC3_Overlap = True are included in the Somatic MAF.
  5. Remaining variants with GDC_FILTER = 'ndp', 'NonExonic', 'bitgt', 'gdc_pon' are removed.
  6. Remaining variants with SOMATIC != null are included in the Somatic MAF.
  7. Remaining variants with dbSNP_RS = 'novel' or null are included in the Somatic MAF.
- Values are set to blank in columns that may contain information about germline genotypes.

The table below describes the columns in a protected MAF and their definitions. Note that the somatic (open-access) MAF structure is the same except for having the last six columns removed.

- HUGO symbol for the gene (HUGO symbols are always in all caps). "Unknown" is used for regions that do not correspond to a gene.
- Entrez gene ID (an integer). "0" is used for regions that do not correspond to a gene region or Ensembl ID.
- One or more genome sequencing centers reporting the variant.
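
The low quality variant filtering and germline masking rules described in this document are applied in sequence, so an earlier rule can include or remove a variant before a later rule is consulted. The following is a minimal Python sketch of that ordering, not the GDC implementation. Several points are assumptions made for illustration: the function name `somatic_decision` and its helpers are hypothetical, multi-valued fields are assumed to be separated by `;`, boolean columns are assumed to hold the text `True`, null is assumed to appear as an empty or missing value, and the FILTER rule is read as removing any variant that carries a FILTER value other than `PASS` or `panel_of_normals`. Variants that none of the listed rules resolve are returned as `None` rather than guessed at.

```python
from typing import Mapping, Optional, Set

# GDC_FILTER values named in the first removal rule.
HARD_REMOVE = {"Gapfiller", "ContEst", "multiallelic",
               "nonselectedaliquot", "BCR_Duplicate", "BadSeq"}
# GDC_FILTER values named in the later removal rule.
SOFT_REMOVE = {"ndp", "NonExonic", "bitgt", "gdc_pon"}


def _tokens(value) -> Set[str]:
    """Split a field on ';' (assumed separator); empty or missing gives an empty set."""
    if not value:
        return set()
    return {t.strip() for t in str(value).split(";") if t.strip()}


def _is_true(value) -> bool:
    """Assumption: boolean columns are stored as the text 'True'."""
    return str(value).strip().lower() == "true"


def somatic_decision(row: Mapping[str, str]) -> Optional[bool]:
    """Return True (include), False (remove), or None (not resolved by the listed rules)."""
    gdc_filter = _tokens(row.get("GDC_FILTER"))

    # Rule 1: non-somatic status or a hard GDC_FILTER flag removes the variant.
    if row.get("Mutation_Status") != "Somatic" or gdc_filter & HARD_REMOVE:
        return False
    # Rule 2: validated somatic variants are included.
    if _is_true(row.get("GDC_Valid_Somatic")):
        return True
    # Rule 3 (assumed reading): any FILTER value besides PASS or
    # panel_of_normals removes the variant.
    if _tokens(row.get("FILTER")) - {"PASS", "panel_of_normals"}:
        return False
    # Rule 4: variants overlapping MC3 are included.
    if _is_true(row.get("MC3_Overlap")):
        return True
    # Rule 5: soft GDC_FILTER flags remove the variant.
    if gdc_filter & SOFT_REMOVE:
        return False
    # Rule 6: a non-null SOMATIC annotation includes the variant.
    if row.get("SOMATIC"):
        return True
    # Rule 7: dbSNP_RS of 'novel' or null includes the variant.
    if row.get("dbSNP_RS") in (None, "", "novel"):
        return True
    return None
```

Because the rules short-circuit, a variant with GDC_Valid_Somatic = True is included even if it also carries an `ndp` flag, while the same flag removes a variant that reaches the fifth rule without having been included earlier.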