Characterising the loss-of-function impact of 5′ untranslated region variants in 15,708 individuals

We are excited to share our work using gnomAD to characterise 5′UTR variants that create or disrupt upstream open reading frames (uORFs), and to explore their role in disease: https://www.biorxiv.org/content/10.1101/543504v2

We show that these variants are under strong negative selection (indicating that they are deleterious), and identify a subset, those that form ORFs overlapping the coding sequence, with signals of selection equivalent to those of coding missense variants.

We find increased signals of selection when these variants occur in the 5′UTRs of curated haploinsufficient genes, LoF-intolerant genes, and known dominant LoF developmental disease genes, supporting a loss-of-function effect of these variants on translation.

We identify specific genes where uORF perturbation appears to be an important disease mechanism (e.g. NF1 and IRF6), and report a novel uORF frameshift variant in NF2 that segregates with disease in two families with neurofibromatosis.

Our approach illustrates the power of using large population databases, and of grouping non-coding bases by functional effect, to identify subsets of variants that are highly deleterious. Although the strength of selection across UTRs as a whole is equivalent to that on synonymous variants, we see a much stronger signal at these specific uORF-perturbing sites.

Finally, we have created a VEP plugin that annotates 5′UTR variants for uORF-perturbing effects:
https://github.com/ImperialCardioGenetics/uORFs/tree/master/5primeUTRannotator
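
To make "uORF-creating" concrete, here is a minimal Python sketch of the idea. It is not the plugin's code and does not reproduce its output. The function names, labels, and toy sequences are all hypothetical. It assumes a single-base substitution, a 5′UTR given 5′ to 3′ on the transcript strand, and a coding sequence that begins immediately after the last UTR base. It checks whether the substitution creates a new upstream ATG, then asks whether the resulting ORF ends inside the UTR or runs into the coding sequence.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def created_uaugs(utr, pos, alt):
    """Apply a single-base substitution at 0-based index `pos` and return
    (mutated_utr, starts), where `starts` lists the 0-based positions of
    ATGs present in the mutated 5'UTR but absent from the reference."""
    utr = utr.upper()
    mutated = utr[:pos] + alt.upper() + utr[pos + 1:]
    starts = []
    # Only the (up to three) codon windows covering `pos` can change.
    for start in range(max(0, pos - 2), min(pos, len(utr) - 3) + 1):
        if mutated[start:start + 3] == "ATG" and utr[start:start + 3] != "ATG":
            starts.append(start)
    return mutated, starts


def classify_uorf(mutated_utr, start):
    """Classify the ORF opened by an upstream ATG at `start`.
    Simplification: only the UTR sequence is scanned, so stop codons that
    straddle or follow the UTR/CDS boundary are not considered."""
    for i in range(start + 3, len(mutated_utr) - 2, 3):
        if mutated_utr[i:i + 3] in STOP_CODONS:
            return "uORF ending within the 5'UTR"
    # No in-frame stop before the coding sequence: the ORF reads into it.
    if (len(mutated_utr) - start) % 3 == 0:
        return "in frame with the CDS (N-terminal extension)"
    return "out-of-frame ORF overlapping the CDS"


# Toy example: C>T at index 3 turns ACG into ATG.
mutated, starts = created_uaugs("CCACGCCTGACC", 3, "T")
for s in starts:
    print(s, classify_uorf(mutated, s))
```

A real annotation also has to handle strand, splicing within the UTR, indels, existing uORFs whose start or stop is disrupted, and the sequence context of the start codon, none of which this sketch attempts.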

post by Nicky Whiffin