Sequence Variation Of Copy Number Variable Regions In The Human Genome
Thesis posted on 23.07.2020, 10:29, authored by Hasret Ozturk Pala
Accurate genotyping of sequence variation in repeated and copy number variable regions of genomes remains challenging because of the problems inherent in mapping short sequence reads to a reference genome. A computational pipeline was designed to resolve short-read mapping ambiguity in duplicated DNA regions by mapping short reads to a reduced reference sequence comprising a single copy of each region that is repeated in the reference genome. The RHD/RHCE, beta-defensin and low-affinity Fc gamma receptor repeat regions were chosen for the initial analyses. The reliability of mapping to a reduced reference was assessed by comparing sequence read depth with known copy number across a subset of samples from the 1000 Genomes Project and a three-generation pedigree from Illumina's Platinum Genomes Project.
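Because reads from every copy of a repeat collapse onto the single copy in the reduced reference, read depth over that copy should scale with the total number of copies in a sample. The abstract does not say how depth was converted to copy number; the sketch below assumes one common approach, normalising mean depth over the collapsed region against a single-copy control region assumed to be present in two copies. The function name and parameters are hypothetical.

```python
from statistics import mean


def estimate_copy_number(region_depths, control_depths, control_copies=2):
    """Hypothetical helper: estimate total copy number of a collapsed repeat.

    region_depths:  per-base read depths over the single-copy reduced reference
    control_depths: per-base read depths over a single-copy control region
    control_copies: copies of the control region per genome (assumed diploid)
    """
    region_mean = mean(region_depths)
    control_mean = mean(control_depths)
    if control_mean <= 0:
        raise ValueError("control region must have positive mean depth")
    # Depth ratio times the control's copy number gives the estimated total copies.
    return control_copies * region_mean / control_mean


# Twice the depth of a diploid control suggests about four copies in total.
print(estimate_copy_number([58, 60, 62], [29, 30, 31]))
```

Estimates from such a ratio are continuous and would still need rounding or a statistical model before comparison with known integer copy numbers.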
The accuracy of variant calling was analysed by comparing variant calls at the inhibitory low-affinity Fc gamma receptor gene FCGR2B with 1000 Genomes variant calls and the variant calls generated by paralogue-specific long PCR and Ion Torrent sequencing.
Neither the reduced-reference approach nor the 1000 Genomes Project variant calls recovered all variants identified by Ion Torrent sequencing, and the 1000 Genomes calls significantly underestimated variation and mis-genotyped samples. Several variants in FCGR2B were found to be in strong linkage disequilibrium (LD) with variants previously associated with complex traits by genome-wide association studies (GWAS). However, these GWAS variants were in weak LD with a gene conversion variant upstream of FCGR2B.
Given that a coding variant of FCGR2B (rs1050501) has previously been associated with protection against severe malaria and with susceptibility to systemic lupus erythematosus, the variation data were interrogated for signatures of selection across global populations. The locus showed high haplotype diversity, with 52 haplotypes; however, population genetic statistics showed no evidence of natural selection at FCGR2B.
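The abstract does not name the diversity statistic used. As an illustration only, haplotype diversity is often summarised with Nei's unbiased estimator, H = n/(n - 1) × (1 - Σ x_i²), where n is the number of sampled chromosomes and x_i is the frequency of haplotype i. A minimal sketch, with a hypothetical function name:

```python
from collections import Counter


def haplotype_diversity(haplotypes):
    """Nei's unbiased haplotype diversity for a list of phased haplotypes.

    Each element is one sampled chromosome's haplotype (e.g. a string of alleles).
    Returns 0 when all haplotypes are identical and 1 when all are distinct.
    """
    n = len(haplotypes)
    if n < 2:
        raise ValueError("need at least two haplotypes")
    counts = Counter(haplotypes)
    # Sum of squared haplotype frequencies (probability two draws match).
    sum_sq = sum((c / n) ** 2 for c in counts.values())
    return n / (n - 1) * (1 - sum_sq)


print(haplotype_diversity(["ACG", "ACG", "TCG", "TCA"]))
```

A locus carrying many haplotypes at intermediate frequencies yields values close to 1; diversity alone, however, does not indicate selection, which is why separate population genetic tests are needed.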