In this section, we ran summary-data–based Mendelian randomization (SMR) on the UKB vitamin D GWAS results (with no correction for BMI). The aim was to prioritize putative causal genes with expression underlying vitamin D levels. The associations identified with SMR (between gene expression and the trait of interest - in this case vitamin D) can result from one of three scenarios:

  • Causality: Where a genetic variant (Z) modulates gene expression (X), which in turn affect the trait (Y), i.e. Z–>X–>Y
  • Pleiotropy: Where Z modulates both X and Y (Z–>X + Z–>Y)
  • Linkage: Where two variants in LD (Z1 and Z2) are independently associated with X and Y (Z1–>X, Z2–>Y)

Zhu et al. 2016 proposed a method (the heterogeneity in dependent instruments [HEIDI] test) to detect cases where the association arises due to linkage - the scenario that is of less biological interest.

Here, we ran SMR + HEIDI to identify causal or pleiotropic associations between genes and vitamin D. First, we applied these methods to the largest eQTL dataset available to us (eQTLGen). Second, we validated our results using other eQTL data sets generated in relevant tissues.

Below we present results from these analyses.



# Number of variants available in dataset used as LD reference
referenceSNPs_N=length(count.fields("input/UKB_v3EURu_impQC/v3EURu_impQC.bim", sep="\t"))

Methods

We ran SMR with the following input parameters:

  • bfile: A reference dataset for LD estimation. We used the files in Q0286/UKBiobank/v3EUR_impQC/. These include 44741804 autossomal SNPs and are restricted to unrelated individuals of European ancestry

  • gwas-summary: UKB GWAS summary statistics obtained with fastGWA (in .ma format)

  • extract-snp: We restricted these analyses to SNPs that were not in the MHC region (Chr6:26056121 – Chr6:33377699) and with MAF>0.01

  • beqtl-summary: eQTL summary-level data, i.e. association results between genetic variants and gene expression levels or methylation levels. Files are in BESD format, i.e. a file set containing the following three files: an .epi file (expression probe information), an .esi file (expression SNP information), and a .besd file (binary with expression summary data). We used the following eQTL datasets:

    • 32K eQTLGen cis-eQTL: Blood eQTLs (N=31,684)
    • GTExV7 eQTL:
      1. Whole blood (N=369)
      2. Sun-exposed skin (N=369)
      3. Non-sun-exposed skin (N=335)
      4. Liver (N=153)
      5. Sixteen brain regions (80<N<154)
    • PsychENCODE: eQTLs in prefrontal cortex (N=1,866). We downloaded the data from the SMR Data Resource webpage. The file set that we downloaded (PsychENCODE_cis_eqtl_HCP100_summary.tar.gz (hg19)) included results for SNPs in a one-Mbp window around each gene and corrected for 100 hidden covariate (HCP) factors.
    • O’Brien et al. 2018: Fetal brain eQTLs (N=120)

All other settings were left as default:

  • –peqtl-smr: Maximum P-value for eQTLs used (5x10-8)
  • –diff-freq: Maximum allele frequency difference between datasets (0.2)
# #######################
# Prepare list of SNPs to include in the analyses
# #######################

geno=$medici/UKBiobank/v3EUR_impQC/ukbEUR_imp_chr{TASK_ID}_v3_impQC
filters=maf0.01_noMHC
genoDir=$WD/input/UKB_v3EUR_impQC
snpList=$genoDir/UKB_EUR_impQC_$filters

# Get list of SNPs to include in the analysis
# Make file with MHC region to be excluded from analysis
echo "6 26056121 33377699 mhc" > $WD/input/mhc_range

# Extract common (MAF>0.01) variants that are not in MHC region
cd $genoDir
tmp_command="plink --bfile $geno \
                   --maf 0.01 \
                   --exclude range $WD/input/mhc_range \
                   --write-snplist \
                   --out ${snpList}_chr{TASK_ID}"
qsubshcom "$tmp_command" 1 30G filter.$filters 10:00:00 "-array=1-22"
cat ${snpList}_chr*.snplist > $snpList.snplist
rm UKB_EUR_impQC_maf0.01_noMHC_chr*

# ------ Copy files from Delta to Tinaroo (run this step on delta)
# GWAS
cd $WD
scp results/fastGWA/vitD_fastGWA.ma $rooID:$WD/results/fastGWA
# SNPs to include in analysis
scp input/UKB_v3EUR_impQC/UKB_EUR_impQC_maf0.01_noMHC.snplist $rooID:$WD/input/UKB_v3EUR_impQC



# #######################
# Prepare QTL datasets
# #######################
# Common settings
qtlDir=$WD/input/qtls
results=$WD/results/fastGWA
tmpDir=$results/tmpDir

# ---- Create links to eQTL datasets
  cd $qtlDir
  
  # eQTLgen
  ln -s $medici/uqtqi2/eQTLGen/rug_32k_cis* $qtlDir
  for i in `ls $qtlDir/rug_32k_cis*`; do mv $i `echo $i | sed 's/rug_32k_cis/eQTLgen/g'`; done
  
  # GTEx v7
  ln -s $medici/GTExV7/Summary/besd/* $qtlDir
  
  # PsychENCODE
  #wget http://cnsgenomics.com/data/SMR/PsychENCODE_cis_eqtl_HCP100_summary.tar.gz
  #tar -xzf PsychENCODE_cis_eqtl_HCP100_summary.tar.gz
  ln -s $qtlDir/PsychENCODE_cis_eqtl_HCP100_summary/Gandal_PsychENCODE_eQTL_HCP100+gPCs20_QTLtools* $qtlDir
  for i in `ls $qtlDir/Gandal_PsychENCODE_eQTL_HCP100+gPCs20_QTLtools*`; do mv $i `echo $i | sed 's/Gandal_PsychENCODE_eQTL_HCP100+gPCs20_QTLtools/PsychENCODE/g'`; done
  
  # Fetal brain (O’Brien et al. 2018)
  ln -s $medici/uqtqi2/smr_data/fetal/Brien_eQTL* $qtlDir
  
# ---- Create links to mQTL datasets
  # McRae et al. 2018 (Whole blood)
  #wget http://cnsgenomics.com/data/SMR/LBC_BSGS_meta_lite.tar.gz
  #tar -xzf LBC_BSGS_meta_lite.tar.gz
  ln -s $qtlDir/LBC_BSGS_meta_lite/bl_mqtl_lite_chr* $qtlDir
  for i in `ls $qtlDir/bl_mqtl_lite_chr*`; do mv $i `echo $i | sed 's/bl_mqtl_lite_chr/lbc_bl_mqtl_chr/g'`; done
  
  # Qi et al. 2018 (Brain mQTLs)
  #wget http://cnsgenomics.com/data/SMR/Brain-mMeta.tar.gz
  #tar -xzf Brain-mMeta.tar.gz
  ln -s $qtlDir/Brain-mMeta/Brain-mMeta* $qtlDir
  
  # Hannon et al. 2018 (Fetal brain mQTLs)
  #wget http://cnsgenomics.com/data/SMR/Hannon_FetalBrain.zip
  #unzip Hannon_FetalBrain.zip
  mv "Hannon et al. FetalBrain" "Hannon_et_al._FetalBrain"
  ln -s $qtlDir/Hannon_et_al._FetalBrain/FB_Brain_2* $qtlDir
    # Get rs IDs  (Run this step in R)
    # -------------------------------------------------------------------------------------
    qsub -I -A UQ-IMB-CNSG -X -l nodes=1:ppn=5,mem=30GB,walltime=6:00:00
    R
    library(biomaRt)
    setwd("$WD/input/qtls")
    df=read.table("FB_Brain_2.esi", col.names=c("CHR","CHR_BP","V3","BP","A1","A2"))
    
    # Initiate storage vector    
    results=c()
    
    # Define mart to get information from
    snp_mart = useMart(biomart="ENSEMBL_MART_SNP", 
                       host="grch37.ensembl.org", 
                       path="/biomart/martservice", 
                       dataset="hsapiens_snp")
 
    for(chr in unique(df$CHR))
    {
      print(paste("Getting IDs for variants in chromosome",chr))
      
      tmp=df[df$CHR==chr,]
      minBP=min(tmp$BP)
      maxBP=max(tmp$BP)
      
      s1 <- seq(minBP, maxBP, by = 1500000)
      s2 <- c(s1[-1]-1, maxBP)
      regions <- matrix(c(rep(chr, length(s1)), s1, s2), ncol = 3)

      getBM_values <- function(values)
      {
        getBM(attributes = c("refsnp_id","chr_name","chrom_start","allele"), 
        filters = c("chr_name", "start", "end"), 
        values = list(values[1], values[2], values[3]), 
        mart = snp_mart)
      }
      
      res_list=apply(regions, 1, getBM_values)
      print(paste("Merging IDs retrieved for chromosome",chr))
      snp_ids=do.call(rbind, res_list)
      snp_ids$CHR_BP=paste(snp_ids$chr_name, snp_ids$chrom_start, sep=":")
      
      # Restrict to SNPs of interest
      snp_ids=snp_ids[snp_ids$CHR_BP %in% tmp$CHR_BP,]
      
      # Keep only positions with unique rsID
      dupSNPs=snp_ids[duplicated(snp_ids$CHR_BP),"refsnp_id"]
      snp_ids=snp_ids[!snp_ids$refsnp_id %in% dupSNPs,]
      
      # Save
      print(paste("Saving results for chromosome",chr))
      results = rbind(results, snp_ids)
      write.table(snp_ids, paste0("FB_Brain_2_rsIDs_chr",chr), quote=F, row.names=F)
    }
    
    
   
   
   
   
    
    
       # Get rsIDs
# nrow(df)=307714
    for (i in 21:nrow(df))
    {
      tmp = getBM(attributes=c('refsnp_id', 'allele', 'chrom_start', 'chrom_strand'), 
                   filters=c('chr_name', 'start', 'end'), 
                   values=list(df[i,1], df[i,4] , df[i,4]), 
                   mart=snp_mart)
      # Add chromosome and BP
      tmp[,5:6] = df[i,c(1,4)] 
      # Save
      results = rbind(results, tmp)
    }

    # -------------------------------------------------------------------------------------
  


# ---- Create links to caQTL datasets
  # Bryois et al. chromatin accessibility QTL (caQTL) data
  wget http://cnsgenomics.com/data/SMR/Bryois_caQTL_summary.tar.gz
  tar -xzf Bryois_caQTL_summary.tar.gz
  ln -s $qtlDir/Bryois_caQTL_summary/bryois_NatCommun_2018_50kb_cQTLs.* $qtlDir
  for i in `ls $qtlDir/bryois_NatCommun_2018_50kb_cQTLs*`; do mv $i `echo $i | sed 's/bryois_NatCommun_2018_50kb_cQTLs/Prefrontal_Cortex_caQTL/g'`; done


  
# ---- List QTL datasets to use
qtl_datasetList=$tmpDir/qtl_datasets_4SMR
echo "Brien_eQTL eQTLgen Liver PsychENCODE Brain_Amygdala Brain_Anterior_cingulate_cortex_BA24 Brain_Caudate_basal_ganglia Brain_Cerebellar_Hemisphere Brain_Cerebellum Brain_Cortex Brain_Frontal_Cortex_BA9 Brain_Hypothalamus Brain_Hippocampus Brain_Nucleus_accumbens_basal_ganglia Brain_Putamen_basal_ganglia Brain_Spinal_cord_cervical_c-1 Brain_Substantia_nigra Skin_Not_Sun_Exposed_Suprapubic Skin_Sun_Exposed_Lower_leg Whole_Blood Prefrontal_Cortex_caQTL" | tr ' ' '\n' > $qtl_datasetList 


# #######################
# Run SMR
# #######################
# RCC settings
smr=$bin/smr/smr_Linux
nThreads=22
account="-acct=UQ-IMB-CNSG"

# Delta settings
smr=$bin/smr/smr_v0.706
nThreads=16
account=""

# Common settings
prefix=vitD_fastGWA_withX
qtlDir=$WD/input/qtls
results=$WD/results/fastGWA
tmpDir=$results/tmpDir
ld_ref=$medici/UKBiobank/v3EUR_impQC/ukbEUR_imp_chr{TASK_ID}_v3_impQC
gwas=$results/maFormat/$prefix.ma
snps2include=$WD/input/UKB_v3EUR_impQC/UKB_EUR_impQC_maf0.01_noMHC.snplist
qtl_datasetList=$tmpDir/qtl_datasets_4SMR
output=$tmpDir/${prefix}_chr{TASK_ID}


# Run SMR
cd $results
for i in `cat $qtl_datasetList`
do
  echo $i
  qtl=$qtlDir/$i
  tmp_command="$smr --bfile $ld_ref \
                    --gwas-summary $gwas \
                    --beqtl-summary $qtl \
                    --extract-snp $snps2include \
                    --thread-num $nThreads \
                    --out $output.$i"
  
  qsubshcom "$tmp_command" $nThreads 100G $prefix.$i.smr 24:00:00 "-array=1-22 $account"
done

# Merge SMR results
for i in `cat $qtl_datasetList`
do
  echo $i
  sed 1q $tmpDir/${prefix}_chr1.$i.smr  > $results/smr/$prefix.$i.smr 
  cat $tmpDir/${prefix}_chr*.$i.smr | grep -v 'Gene' >> $results/smr/$prefix.$i.smr 
done

# #########################################
# Copy SMR results to local computer (run command from local)
# #########################################
scp $deltaID:$WD/results/fastGWA/smr/vitD_fastGWA_withX*smr $local/results/fastGWA/smr/




# #########################################
# Generate data for SMR locus plots
# #########################################

# Directories
results=$WD/results/fastGWA
tmpDir=$results/tmpDir
smrDir=$WD/results/fastGWA/smr
genoDir=$medici/UKBiobank/v3EUR_impQC
qtlDir=$WD/input/qtls

# Files and settings
prefix=vitD_fastGWA_withX
qtl_datasetList=$tmpDir/qtl_datasets_4SMR
gwas=$WD/results/fastGWA/maFormat/vitD_fastGWA_withX.ma
smr=$bin/smr/smr_v0.706
windowSize=6000

# Get list of gene locations for plotting
cd $smrDir
#wget https://www.cog-genomics.org/static/bin/plink/glist-hg19

# Loop through SMR datasets to get data to plot significant loci
for datasetName in `cat $qtl_datasetList`
do

  dataset=$prefix.$datasetName.smr
  
  # Generate file with significant results (the ones we want to plot)
  n=`sed 1d $smrDir/$dataset | wc -l`
  awk '$19<((0.05/'$n'))' $smrDir/$dataset > $smrDir/$dataset.smr_sig

  # Retreive information for each probe
  nProbes=`wc -l < $smrDir/$dataset.smr_sig `
  if test $nProbes != 1; then arrays="-array=1-$nProbes"; i="{TASK_ID}"; else i=1; arrays=; fi 
  getProbe="probe=\$(awk -v i=$i 'NR==i {print \$1}' $smrDir/$dataset.smr_sig)"
  getChr="chr=\$(awk -v i=$i 'NR==i {print \$2}' $smrDir/$dataset.smr_sig)"
  runSMR="$smr --bfile $genoDir/ukbEUR_imp_chr\${chr}_v3_impQC \
                    --gwas-summary $gwas \
                    --beqtl-summary $qtlDir/$datasetName \
                    --plot \
                    --probe \$probe  \
                    --probe-wind 6000 \
                    --gene-list $smrDir/glist-hg19 \
                    --out $smrDir/smrPlot.$datasetName.${windowSize}kb.chr\$chr"
  qsubshcom "$getProbe; $getChr; $runSMR" 1 30G getSMRplot.${windowSize}kb.$dataset 24:00:00 "$arrays"

done

# #########################################
# Copy to results to local computer (run command from local)
# #########################################
scp $deltaID:$WD/results/fastGWA/smr/plot/plot/smrPlot.eQTLgen.chr* $local/results/fastGWA/smr/plot/

Results

discovery = read.table("results/fastGWA/smr/vitD_fastGWA_withX.eQTLgen.smr", h=T, stringsAsFactors=F)
m=nrow(discovery)
pthrsh=0.05/m
discovery=discovery[discovery$p_SMR<pthrsh,]

Discovery data set

The discovery dataset for eQTLs was eQTLgen.

  • 15504 tests performed
  • 112 significant (at P = 0.05/15504 = 3.22e-06) associations
    • 24 also pass the HEIDI test (P>0.05) and can be considered pleiotropic or causal for vitamin D levels
# Significant SMR associations in eQTLgen
tmp=discovery[,c("probeID","ProbeChr","Probe_bp","Gene","topSNP","A1","b_SMR","se_SMR","p_SMR","p_HEIDI","b_GWAS","se_GWAS","p_GWAS","b_eQTL","se_eQTL","p_eQTL")]
#tmp=tmp[tmp$p_HEIDI>0.05,]
names(tmp)[grep("ProbeChr",names(tmp))]="Chr"
tmp=tmp[order(tmp$Chr,tmp$Probe_bp),]


# Format table
#options(warn=-1)
#tmp=tmp %>% mutate_at(vars(b_GWAS, se_GWAS, b_eQTL, se_eQTL, b_SMR, se_SMR), funs(round(., 2)))
#options(warn=0)
tmp$p_GWAS=format(tmp$p_GWAS, digits=3)
tmp$p_eQTL=format(tmp$p_eQTL, digits=3)
tmp$p_SMR=format(tmp$p_SMR, digits=3)
tmp$p_HEIDI=format(tmp$p_HEIDI, digits=3)

# Present table
datatable(tmp, 
          rownames=FALSE, 
          extensions=c('Buttons','FixedColumns'),
          options=list(pageLength=10,
                       dom='frtipB',
                       buttons=c('csv', 'excel'),
                       scrollX=TRUE),
          caption="List of associations that are significant in SMR analyses - eQTLgen") %>% formatStyle("p_eQTL","white-space"="nowrap") %>% formatStyle("p_SMR","white-space"="nowrap") %>% formatStyle("p_HEIDI","white-space"="nowrap")
# Source SMR R script to plot
source("scripts/plot_SMR.r") 

# Identify files with data to plot (one per probe)
plotData=list.files("results/fastGWA/smr/plot/", pattern="smrPlot.*txt")

# Change SMR plotting function to deal with cases where P=0 (change those to the largest P-value)
## Added these lines:
## pZY[pZY==Inf]=max(pZY[pZY!=Inf])
## pXY=pXY[pXY!=Inf]
SMRLocusPlot_changed=function(data=SMRData, probeNEARBY=NULL,smr_thresh=NULL, smr_thresh_plot=NULL, heidi_thresh=NULL, plotWindow=NULL,pointsize=20,max_anno_probe=16,anno_selfdef=TRUE)
{
    
    cex_coeff=3/4 * pointsize/15;
    if(length(smr_thresh)==0){
        print("ERROR: please specify the threshold of SMR test!");
        quit();
    }
    if(length(heidi_thresh)==0){
        print("ERROR: please specify the threshold of HEIDI test!");
        quit();
    }
    if(length(plotWindow)==0){
        print("ERROR: please specify the plot window size!");
        quit();
    }
    if(length(which(is.na(data$SMR[,3])))>0)
    {
        print("ERROR: Some probes' physical positon is missing!");
        quit();
    }
    idx=match(data$probeID,data$SMR[,1]);
    if(length(idx)==0){
        print("ERROR: Plot file is not generated correctly, can't find target probe!");
        quit();
    }
    if(length(smr_thresh_plot)==0){
        smr_thresh_plot=smr_thresh;
    }
    cis_start=data$SMR[idx,3]-plotWindow*1000;
    if(cis_start<0) cis_start=0
    cis_end=data$SMR[idx,3]+plotWindow*1000;
    idx=which(data$SMR[,3]>=cis_start & data$SMR[,3]<=cis_end)
    data$SMR=data$SMR[idx,]
    idx=match(data$GWAS[,1],data$SNP[,1])
    tmpsnpbp=data$SNP[idx,3]
    idx=which(tmpsnpbp>=cis_start &tmpsnpbp<=cis_end)
    data$GWAS=data$GWAS[idx,]
    idx=match(data$eQTL[,2],data$SNP[,1])
    tmpsnpbp=data$SNP[idx,3]
    idx=which(tmpsnpbp>=cis_start &tmpsnpbp<=cis_end)
    data$eQTL=data$eQTL[idx,]
    
    if(!is.null(data$Gene))
    {
        idx=which(data$Gene[,2]>=cis_start & data$Gene[,3]<=cis_end )
        data$Gene=data$Gene[idx,]
    }
    
    #start to plot
    smrindx = which(data$SMR[,8] <= smr_thresh_plot)
    #heidiindx = which((data$SMR[,8] <= smr_thresh_plot) & (data$SMR[,9] >= heidi_thresh_plot))
    smrprobes = NULL; heidiprobes = NULL;
    if(length(smrindx)>0) { smrprobes =  as.character(data$SMR[smrindx,1]) }
    #if(length(heidiindx)>0) { heidiprobes = as.character(data$SMR[heidiindx,1]) }
    
    smrindx_bonferr = which(data$SMR[,8] <= smr_thresh)
    heidiindx_strengent = which((data$SMR[,9] >= heidi_thresh))
    smrprobes_red = NA; heidiprobes_solid = NA;
    if(length(smrindx_bonferr)>0) { smrprobes_red =  as.character(data$SMR[smrindx_bonferr,1]) }
    if(length(heidiindx_strengent)>0) { heidiprobes_solid = as.character(data$SMR[heidiindx_strengent,1]) }
    
    if(length(probeNEARBY)>0)
    {
        idx=match(probeNEARBY,data$SMR[,1])
        idxx=which(is.na(idx))
        if(length(idxx)>0)
        {
            for(ii in 1:length(idxx)) {
                print(paste("WARNING: cann't find probe ",probeNEARBY[idxx[ii]], " in plot region.",sep=""))
            }
            probeNEARBY=probeNEARBY[-idxx]
        }
        
    }
    probePLOT=smrprobes #draw the eQTL of all the probes that passed smr_thresh_plot
    probePLOT=unique(c(data$probeID,probePLOT,probeNEARBY)) # draw the target probe anyway
    nprobePLOT = length(probePLOT)
    
    idx=which(is.na(data$GWAS[,2]) | is.na(data$GWAS[,3]))
    if(length(idx)>0) data$GWAS=data$GWAS[-idx,]
    pZY=-log10(pchisq((data$GWAS[,2]/data$GWAS[,3])^2,1,lower.tail=F))
  pZY[pZY==Inf]=max(pZY[pZY!=Inf])
  
    idx=match(data$probeID,data$SMR[,1]);
    if(length(idx)>0){
        chrPLOT = data$SMR[idx,2]
    }else{
        print("ERROR: Plot file is not generated correctly, please report this bug!");
        quit();
    }
    idx=which(is.na(data$SMR[,8]) )
    if(length(idx)>0) {
        probeINFO=data$SMR[-idx,];
    }else{
        probeINFO=data$SMR;
    }
    idx=which(is.na(probeINFO[,5]) | is.na(probeINFO[,6]));
    idx2=which(is.na(probeINFO[,3]));
    if(length(intersect(idx,idx2))>0)
    {
        print("ERROR: Some probes' physical positon is missing!");
        quit();
    }
    probeINFO[idx,5]=probeINFO[idx,3]-7500;
    probeINFO[idx,6]=probeINFO[idx,3]+7500;
    probeINFO[,8]=-log10(probeINFO[,8]);
    probeINFO[,3]=probeINFO[,3]/1e6;
    probeINFO[,5]=probeINFO[,5]/1e6;
    probeINFO[,6]=probeINFO[,6]/1e6;
    pXY=probeINFO[,8];
    pXY=pXY[pXY!=Inf]
    yMAX = ceiling(max(c(pZY, pXY), na.rm=T)) + 1;
    if(is.null(data$Gene))
    {
        glist=cbind(probeINFO[,2],probeINFO[,5:6],as.character(probeINFO[,4]),probeINFO[,7]);
    } else {
        glist=data$Gene;
        glist[,2]=glist[,2]/1e6;
        glist[,3]=glist[,3]/1e6;
    }
    colnames(glist)=c("CHR", "GENESTART",  "GENEEND",   "GENE", "ORIENTATION");
    idx=which(is.na(glist[,2]) | is.na(glist[,3]));
    if(length(idx>0)) glist=glist[-idx,];
    generow = GeneRowNum(glist);
    num_row = max(as.numeric(generow$ROW));
    offset_map = ceiling(yMAX);
    offset_probe = yMAX / 2.5;
    num_probe = nprobePLOT
    offset_eqtl = ceiling(yMAX / 2.5) + 0.5;
    dev_axis = 0.1*yMAX;
    if(dev_axis<1.5) dev_axis = 1.5;
    yaxis.min = -offset_map - offset_eqtl*num_probe - dev_axis*(num_probe+1);
    yaxis.max = yMAX + ceiling(offset_probe) + 1;
    # scales of x-axis
    idx=match(data$GWAS[,1],data$SNP[,1]);
    gwasBP = as.numeric(data$SNP[idx,3])/1e6;
    #min.pos = min(gwasBP);
    #max.pos = max(gwasBP);
    min.pos = cis_start/1e6
    max.pos = cis_end/1e6
    start = min(as.numeric(glist[,2]));
    end = max(as.numeric(glist[,3]));
    bp = c(min.pos, max.pos, start, end);
    xmin = min(bp, na.rm=T) - 0.001;  xmax = max(bp, na.rm=T) +0.001;
     xmax=xmax+(xmax-xmin)*0.1 #extend
    ylab = expression(paste("-", log[10], "(", italic(P), " GWAS or SMR)", sep=""));
    xlab = paste("Chromosome", chrPLOT, "Mb");
    # plot GWAS p value
    par(mar=c(5,5,3,2), xpd=TRUE)
    plot(gwasBP, pZY, yaxt="n", bty="n", ylim=c(yaxis.min,yaxis.max),
    ylab="", xlab=xlab, cex.lab=lab, cex.axis=axis,cex=0.6,
    xlim=c(xmin, xmax), pch=20, col="gray68");
    
    # y1 axis
    devbuf1 = yMAX/4
    axis(2, at=seq(0,yMAX,devbuf1), labels=round(seq(0,yMAX,devbuf1),0), las=1, cex.axis=axis);
    mtext(ylab, side=2, line=3, at=(yMAX*2/3), cex=cex_coeff);
    eqtl.lab = expression(paste("-", log[10], "(", italic(P), " eQTL)", sep=""));
    axis.start = 0; axis.down = offset_eqtl + dev_axis;
    for( k in 1 : nprobePLOT ) {
        axis.start = axis.start - axis.down
        eqtlinfobuf = data$eQTL[which(data$eQTL[,1]==probePLOT[k]),]
        if(dim(eqtlinfobuf)[1]==0) next;
        pvalbuf=-log10(pchisq((eqtlinfobuf[,3]/eqtlinfobuf[,4])^2,1,lower.tail=F));
        pvalbuf[which(is.infinite(pvalbuf))]=1e-300;
        if(length(which(smrprobes_red==probePLOT[k]))==0) {
            col_eqtl = "navy"
        } else col_eqtl = "maroon"
        eqtl.min = 0; eqtl.max = ceiling(max(pvalbuf))
        eqtl.max =ceiling(eqtl.max *1.25) #extend
        pvalbuf = pvalbuf/eqtl.max * offset_eqtl + axis.start
        idx=match(eqtlinfobuf[,2],data$SNP[,1]);
        eqtlbp = as.numeric(data$SNP[idx,3])/1e6;
        probegene = unique(as.character(data$SMR[which(data$SMR[,1]==probePLOT[k]),4]))
        par(new=TRUE)
        pchbuf = 4;
        #if(k%%2==0) pchbuf = 20;
        plot(eqtlbp, pvalbuf, yaxt="n", bty="n", ylim=c(yaxis.min,yaxis.max), xaxt="n",
        ylab="", xlab="", cex=0.8, pch=pchbuf, col=col_eqtl, xlim=c(xmin, xmax))
        # annotate the eQTLs
        text(xmin, axis.start+offset_eqtl-dev_axis/2 , label=substitute(paste(probeid, " (",italic(geneid), ")", sep=""),list(probeid=probePLOT[k], geneid=probegene)),col="black", cex=1, adj=0)
        # axis
        devbuf1 = offset_eqtl/3; devbuf2 = eqtl.max/3
        axis(2, at=seq(axis.start,(axis.start+offset_eqtl),devbuf1),
        labels=round(seq(0,eqtl.max,devbuf2),0),
        las=1, cex.axis=axis)
        # add separator line
        segments(xmin, axis.start+offset_eqtl+dev_axis/2, xmax, axis.start+offset_eqtl+dev_axis/2,
        col="dim grey", lty="24", lwd=1)
    }
    #ypos = (axis.start - dev_axis)/2
    ypos = (axis.start - dev_axis)*2/3
    mtext(eqtl.lab, side=2, at=ypos, line=3, cex=cex_coeff)
    
    # plot p value of bTG
    # all the probes
    num_gene = dim(generow)[1]
    dist = offset_map/num_row
    for( k in 1 : num_row ) {
        generowbuf = generow[which(as.numeric(generow[,5])==k),]
        xstart = as.numeric(generowbuf[,3])
        xend = as.numeric(generowbuf[,4])
        snbuf = which(xend-xstart< 1e-3)
        if(length(snbuf)>0) {
            xstart[snbuf] = xstart[snbuf] - 0.0025
            xend[snbuf] = xend[snbuf] + 0.0025
        }
        xcenter = (xstart+xend)/2
        xcenter = spread.labs(xcenter, mindiff=0.01, maxiter=1000, min = xmin, max = xmax)
        num_genebuf = dim(generowbuf)[1]
        for( l in 1 : num_genebuf ) {
            ofs=0.3
            if(l%%2==0) ofs=-0.8
            m = num_row - k
            ypos = m*dist + yaxis.min
            code = 1
            if(generowbuf[l,2]=="+") code = 2;
            arrows(xstart[l], ypos, xend[l], ypos, code=code, length=0.07, ylim=c(yaxis.min,yaxis.max),
            col=colors()[75], lwd=1)
            movebuf = as.numeric(generowbuf[l,6])*genemove
            text(xcenter[l]+movebuf, ypos,label=substitute(italic(genename), list(genename=as.character(generowbuf[l,1]))), pos=3, offset=ofs, col="black", cex=0.9)
        }
    }
    
    # plot the probes
    probeINFO=probeINFO[order(probeINFO[,8],decreasing = TRUE),];
    nprobeINFO=dim(probeINFO)[1];
    if(nprobeINFO>max_anno_probe){
        probeINFO=probeINFO[c(1:max_anno_probe),]
        nprobeINFO=dim(probeINFO)[1];
    }
    if(anno_selfdef) probeINFO=probeINFO[order(probeINFO[2],probeINFO[3]),] ####20170217
    xcenter = as.numeric(probeINFO[,3])
    xcbuf = xcenter
    ####20170217####
    if(anno_selfdef)
    {
        reginlength=(xmax-(xmax-xmin)*0.15)-xmin
        leftspot=xmin+reginlength/20
        rightspot=(xmax-(xmax-xmin)*0.15)-reginlength/20
        itvl=(rightspot-leftspot)/dim(probeINFO)[1]
        if(dim(probeINFO)[1]==1) {
            xcenter=as.numeric(probeINFO[,3])
        } else {
            xcenter=leftspot+itvl/2
            for( k in 2:dim(probeINFO)[1]) xcenter=c(xcenter,leftspot+k*itvl)
        }
        
    } else {
        xcenter = spread.labs(xcenter[1:nprobeINFO], mindiff=0.08, maxiter=1000, min = xmin, max = xmax-1)
    }
    # adjust the line position
    
    adjflag = rep(0, nprobeINFO)
    if(nprobeINFO>1) {
        dbuf = c(0, xcbuf[1:(nprobeINFO-1)])
        mflag = as.numeric(abs(xcbuf[1:(nprobeINFO)] - dbuf) < 0.01)
        adjflag = as.numeric( mflag | c(mflag[2:nprobeINFO],0) )
    }
    
    for( k in 1 : nprobeINFO)  {
         hitflag=FALSE
        if(length(which(heidiprobes_solid==probeINFO[k,1]))>0 & length(which(smrprobes_red==probeINFO[k,1]))>0) {
             hitflag=TRUE
             colplot = "maroon"; colfont=2; pchbuf=23;
        } else if(length(which(smrprobes_red==probeINFO[k,1]))>0) {
            colplot = "maroon"; colfont=2; pchbuf=5
            #} else if (length(which(heidiprobes_solid==probeINFO[k,1]))>0) {
            #hitflag=TRUE
            # colplot = "navy"; colfont=1; pchbuf=23
        } else {
            colplot = "navy"; colfont=1; pchbuf=5
        }
        if( as.numeric(probeINFO[k,8]) < 0 ) {
            colplot = "black"; colfont=1;
        }
        # plot p value of bxy
        plot_probe(probeINFO, k, colplot, xmin, xmax, yaxis.min, yaxis.max,pchbuf,hitflag)
        # annotate the probes
        if(k<=max_anno_probe)
        {
            ypos = 1.02*yMAX
            strbuf =
            text(xcenter[k], ypos,
            labels=substitute(paste(probeid, " (", italic(genename), ")", sep=""),
            list(probeid=as.character(probeINFO[k,1]),
            genename=as.character(probeINFO[k,4]))),
            ylim=c(yaxis.min, yaxis.max),
            srt=30, col=colplot, font=colfont, cex=1, adj=0)
            # plot the lines
            # 1st step
            xstart = xcbuf[k]
            ystart = as.numeric(probeINFO[k,8]); yend = yMAX*(1-1/20);
            if( nprobeINFO > 1 ) {
                if(adjflag[k]==1) {
                    xstart = (xcbuf[k] + xcenter[k])/2
                    segments(xcbuf[k], ystart, xstart, ystart, col=colplot, lwd=axis, lty=3)
                }
            }
            segments(xstart, ystart, xstart, yend, col=colplot, lwd=axis, lty=3)
            # 2nd step
            xend = xcenter[k]; ystart = yMAX*(1-1/20); yend = yMAX*1.01;
            segments(xstart, ystart, xend, yend, col=colplot, lwd=axis, lty=3)
        }
    }
    # plot the threshold
    # SMR threshold
    ybuf = -log10(as.numeric(smr_thresh)); dev_anno = yMAX/9;
    strbuf = paste("pSMR = ",smr_thresh, sep="")
    segments(xmin, ybuf, xmax, ybuf, col="maroon", lty=2, lwd=1);
    text(xmax, ybuf+dev_anno, labels=strbuf, adj=1, col="maroon", cex=axis,font=3);
}



# Loop through probes and generate SMR plots
  # Probes with problems:
  # 40 
#- Issue is that there are P=0, which result in a Inf -log(P) for plotting
  #, 44:49 - Issue is that we have no start or end position for some of the probes being plotted
for(i in plotData[-c(1:39)])
{
  print(paste(grep(i, plotData)," of ", length(plotData)))
  # Read data for this probe
  SMRData = ReadSMRData(paste0("results/fastGWA/smr/plot/",i))

  # Plot SMR results in the genomic region centred on this probe
  plot_fileName=paste0("results/fastGWA/smr/plot/",sub("txt","png",i))
  png(plot_fileName, width=15, height=8, units='in', bg="transparent", res=150)
  SMRLocusPlot_changed(data=SMRData, smr_thresh=8.4e-6, heidi_thresh=0.05, plotWindow=1000, max_anno_probe=16)
  dev.off()
  
}

# Pick an example plot and put in "plots/" to show it as an example in the Rmd fileg
getwd()

The plot below is an example of a locus identified with the SMR and HEIDI-threshold filtering methods.

img

In the top plot, grey dots represent the P-values from the vitamin D GWAS (conditioned on BMI). Probes in red font pass the SMR test. Diamonds represent the SMR P-values for gene expression probes. Filled diamonds represent probes that pass the HEIDI test. In the bottom plot, the crosses represent the P-values for the represented gene probes in the eQTLgen dataset.

This Dorpbox folder includes all locus plots for associations identified with the SMR and HEIDI-threshold filtering methods using the eQTLgen dataset.



Validation data sets

We validated the results in the remaining eQTL data sets.

# Create list of data sets used
dataset_names=data.frame(tissue=c("Blood","Sun-exposed skin","Non-sun-exposed skin","Liver","Brain - Amygdala","Brain - Anterior cingulate cortex BA24","Brain - Caudate basal ganglia","Brain - Cerebellar hemisphere","Brain - Cerebellum","Brain - Cortex","Brain - Frontal cortex BA9","Brain - Hippocampus","Brain - Hypothalamus","Brain - Nucleus accumbens basal ganglia","Brain - Putamen basal ganglia","Brain - Spinal cord cervical c-1","Brain - Substantia nigra","Brain - Prefrontal cortex","Fetal brain"),
                    type=rep("eQTL",19),
                    prefix=c("Whole_Blood","Skin_Sun_Exposed_Lower_leg","Skin_Not_Sun_Exposed_Suprapubic","Liver","Brain_Amygdala","Brain_Anterior_cingulate_cortex_BA24","Brain_Caudate_basal_ganglia","Brain_Cerebellar_Hemisphere","Brain_Cerebellum","Brain_Cortex","Brain_Frontal_Cortex_BA9","Brain_Hippocampus","Brain_Hypothalamus","Brain_Nucleus_accumbens_basal_ganglia","Brain_Putamen_basal_ganglia","Brain_Spinal_cord_cervical_c-1","Brain_Substantia_nigra","PsychENCODE","Brien_eQTL"),
                    stringsAsFactors=F)
dataset_names$fileName=paste0("vitD_fastGWA_withX.",dataset_names$prefix,".smr")


# Merge results from all validation datasets
df=c()
datasets_available=list.files("results/fastGWA/smr/", pattern="vitD_fastGWA_withX")
datasets_available=grep("eQTLgen|caQTL", invert=T, datasets_available ,value=T) # Omit results from eQTLgen or caQTLs
for(i in datasets_available)
{
  #print(grep(i,datasets_available))
  
  # Load results for this dataset
  tmp=read.table(paste0("results/fastGWA/smr/",i), h=T, stringsAsFactors=F, quote = "")
  tmp=tmp[,c("probeID","ProbeChr","Probe_bp","Gene","topSNP","A1","b_SMR","se_SMR","p_SMR","p_HEIDI","b_GWAS","se_GWAS","p_GWAS","b_eQTL","se_eQTL","p_eQTL")]
  names(tmp)[grep("ProbeChr",names(tmp))]="Chr"
  
  # Restrict to SMR significant results
  m=nrow(tmp)
  tmp=tmp[tmp$p_SMR<0.05/m,]
  
  # Add to table with results from all datasets
  if(nrow(tmp)>0)
  {
    tmp$fileName=i
    tmp$N_probes=m
    tmp$bonf_P=0.05/m
    df=rbind(df, tmp)
  }
}
df=merge(df,dataset_names, all.x=T)
df$fileName=NULL




# Table with all SMR significant results
tmp=df[df$type=="eQTL",grep("prefix|type", names(df), invert=T)]
tmp=tmp[,c("tissue",grep("tissue", names(tmp), invert=T, value=T))]
datatable(tmp, 
          rownames=FALSE, 
          extensions=c('Buttons','FixedColumns'),
          options=list(pageLength=10,
                       dom='frtipB',
                       buttons=c('csv', 'excel'),
                       scrollX=TRUE),
          caption="List of associations that are significant in SMR analyses in other eQTL datasets") %>% 
  formatStyle("tissue","white-space"="nowrap") %>%
  formatStyle("Gene","white-space"="nowrap") %>%
  formatStyle("p_eQTL","white-space"="nowrap") %>% 
  formatStyle("p_SMR","white-space"="nowrap") %>% 
  formatStyle("p_HEIDI","white-space"="nowrap")
#### mQTLs

#```{r mQTL - Discovery}
discovery = read.table("results/fastGWA/smr/vitD_BMIcond.lbc_bl_mqtl.smr", h=T, stringsAsFactors=F)
m=nrow(discovery)
pthrsh=0.05/m
discovery=discovery[discovery$p_SMR<pthrsh,]
#```


**<u>Discovery data set:</u>** McRae et al. 2018

- `r m` tests performed
- `r nrow(discovery)` significant (at P = 0.05/`r m` = `r as.character(format(pthrsh, digits=3))`) associations
    - `r nrow(discovery[discovery$p_HEIDI>0.05,])` also pass the HEIDI test (P>0.05) and can be considered pleiotropic or causal for vitamin D 


#```{r mQTL - Discovery significant results}
# Significant SMR associations in eQTLgen
tmp=discovery[,c("probeID","ProbeChr","Probe_bp","Gene","topSNP","A1","b_SMR","se_SMR","p_SMR","p_HEIDI","b_GWAS","se_GWAS","p_GWAS","b_eQTL","se_eQTL","p_eQTL")]
#tmp=tmp[tmp$p_HEIDI>0.05,]
names(tmp)[grep("ProbeChr",names(tmp))]="Chr"
tmp=tmp[order(tmp$Chr,tmp$Probe_bp),]


# Format table
tmp=tmp %>% mutate_at(vars(b_GWAS, se_GWAS, b_eQTL, se_eQTL, b_SMR, se_SMR), funs(round(., 2)))
tmp$p_GWAS=format(tmp$p_GWAS, digits=3)
tmp$p_eQTL=format(tmp$p_eQTL, digits=3)
tmp$p_SMR=format(tmp$p_SMR, digits=3)
tmp$p_HEIDI=format(tmp$p_HEIDI, digits=3)

# Present table
datatable(tmp, 
          rownames=FALSE, 
          extensions=c('Buttons','FixedColumns'),
          options=list(pageLength=10,
                       dom='frtipB',
                       buttons=c('csv', 'excel'),
                       scrollX=TRUE),
          caption="List of associations that are significant in SMR analyses - McRae et al. 2018") %>% 
  formatStyle("p_eQTL","white-space"="nowrap") %>% 
  formatStyle("p_SMR","white-space"="nowrap") %>% 
  formatStyle("p_HEIDI","white-space"="nowrap")

#```


##### Validation data sets  
We validated the results in the remaining mQTL data sets.
#```{r mQTL - Validation, eval=T}

# Table with all SMR significant results
tmp=df[df$type=="mQTL" & df$prefix!="lbc_bl_mqtl",]
tmp=tmp[,c("tissue",grep("tissue|prefix|type", names(tmp), invert=T, value=T))]
datatable(tmp, 
          rownames=FALSE, 
          extensions=c('Buttons','FixedColumns'),
          options=list(pageLength=10,
                       dom='frtipB',
                       buttons=c('csv', 'excel'),
                       scrollX=TRUE),
          caption="List of associations that are significant in SMR analyses in other mQTL datasets") %>% 
  formatStyle("tissue","white-space"="nowrap") %>%
  formatStyle("Gene","white-space"="nowrap") %>%
  formatStyle("p_eQTL","white-space"="nowrap") %>% 
  formatStyle("p_SMR","white-space"="nowrap") %>% 
  formatStyle("p_HEIDI","white-space"="nowrap")

#```