nsaradical.blogg.se - Bioedit align more than two sequences

To explore the variant calling workflow, we will be using a subset of a human WGS dataset obtained from the Genome in a Bottle Consortium (GIAB). GIAB was initiated in 2011 by the National Institute of Standards and Technology “to develop the technical infrastructure (reference standards, reference methods, and reference data) to enable translation of whole human genome sequencing to clinical practice”.

The human WGS dataset we will be using in class was completed by GIAB and is “essentially the first complete human genome to have been extensively sequenced and re-sequenced by multiple techniques, with the results weighted and analyzed to eliminate as much variation and error as possible”. To minimize bias from any specific DNA sequencing method, the dataset was sequenced separately using 14 different sequencing experiments and 5 different platforms. The dataset acts as a ‘truth set’ for variation in the human genome: a genotype reference set against which variant calls can be compared. Additionally, the DNA is available for validating new sequencing technologies and analysis methods, and ~8300 vials of DNA from a homogenized large batch of the sample cells are available for distribution to other labs. Detailed information on the data and methods has been published, and the project information, data and analyses are available on GitHub.

The source DNA, known as NA12878, was taken from a single person: the daughter in a father-mother-child ‘trio’ (she is also mother to 11 children of her own). Father-mother-child ‘trios’ are often sequenced to utilize genetic links between family members.

While the sample NA12878 was sequenced at a depth of 300x, we will only be using the subset of the dataset that aligns to chromosome 20. The chromosome 20 reference sequence and the raw FASTQ files can be copied into the working directories as follows:

$ cp /n/groups/hbctraining/ngs-data-analysis-longcourse/var-calling/reference_data/chr20.fa reference_data/
$ cp /n/groups/hbctraining/ngs-data-analysis-longcourse/var-calling/raw_fastq/*fq raw_data/
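The copy commands assume that the reference_data/ and raw_data/ directories already exist in the current directory, and that the *fq glob is attached directly to the raw_fastq/ path. A minimal sketch of the same step with those assumptions made explicit is shown here. The function name stage_inputs and its optional source-directory argument are hypothetical and not part of the course material; the default path is the one quoted in the text.

```shell
#!/usr/bin/env bash
# Sketch: stage the chr20 reference and raw FASTQ files for variant calling.
# stage_inputs is a hypothetical helper name used only for illustration.
stage_inputs() {
    # Source directory; defaults to the course path quoted in the text.
    local src="${1:-/n/groups/hbctraining/ngs-data-analysis-longcourse/var-calling}"

    # Create the destination directories if they do not exist yet.
    mkdir -p reference_data raw_data || return 1

    # Copy the chromosome 20 reference sequence.
    cp "$src/reference_data/chr20.fa" reference_data/ || return 1

    # Copy every FASTQ file; the glob must follow the slash with no space.
    cp "$src"/raw_fastq/*fq raw_data/ || return 1

    # Report what was staged: FASTA header count and FASTQ file count.
    echo "reference sequences: $(grep -c '^>' reference_data/chr20.fa)"
    echo "fastq files: $(find raw_data -maxdepth 1 -name '*fq' | wc -l | tr -d ' ')"
}
```

Calling stage_inputs with no argument uses the course path; passing a directory (for example a local mirror with the same reference_data/ and raw_fastq/ layout) stages from there instead. The header count is a quick check that the reference holds the sequences you expect before alignment begins.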