BMI has previously been reported to associate with vitamin D levels. However, to prevent collider bias, we did not include BMI as a covariate in our GWAS.

Here, we assessed the relationship between BMI and vitamin D. For this, we used GWA results for BMI presented in Xue et al. 2018 Nat Commun. First, we assessed the genetic correlation between vitamin D and BMI. Second, we used generalised summary-data-based Mendelian randomisation (GSMR) to test for putative causal effects between the two traits. Third, we performed a multi-trait-based conditional & joint analysis (mtCOJO) to condition the vitamin D GWAS on BMI and estimate its effect on vitamin D.




Methods

# Number of variants available in dataset used as LD reference
referenceSNPs_N=length(count.fields("input/UKB_v3EUR_HM3/ukbEUR_imp_allchr_v3_HM3.bim", sep="\t"))

Genetic correlation

We calculated the genetic correlation between vitamin D and BMI using LDSC regression. We restricted this analysis to a list of HapMap3 SNPs with no variants in the MHC region. The list of SNPs (w_hm3.noMHC.snplist) was downloaded from LDHub.

# Get BMI GWAS sumstats from Tian's directory
#cp $pctg/methylation/mQTL_project/28_VitD_BMI/Meta-analysis_Locke_et_al+UKBiobank_2018_UPDATED.txt $WD/input/

# Get BMI GWAS susmstats from Angli's directory
cp $pctg/a.xue/ukbEUR_BMI_common.txt $WD/input/gwas_sumstats/0_original/


# ##############################
# Run bivariate LDSC regression
# ##############################

#  ------ Using UKB BMI sumstats
module load ldsc
prefix=vitD_fastGWA
bmi_prefix=ukbBMI

# Directories
results=$WD/results/fastGWA
tmpDir=$results/tmpDir
ldscDir=$bin/ldsc

# Format BMI GWAS results
echo "SNP A1 A2 b n p" > $WD/input/gwas_sumstats/1_LDSRFormat/$bmi_prefix.ldHubFormat
awk 'NR>1 {print $2,$4,$5,$8,$7,$10}' $WD/input/gwas_sumstats/0_original/ukbEUR_BMI_common.txt >> $WD/input/gwas_sumstats/1_LDSRFormat/$bmi_prefix.ldHubFormat

# Munge (process sumstats and restrict to HapMap3 SNPs) BMI sumstats
munge_sumstats.py --sumstats $WD/input/gwas_sumstats/1_LDSRFormat/$bmi_prefix.ldHubFormat \
                  --merge-alleles $ldscDir/w_hm3.snplist \
                  --out  $WD/input/gwas_sumstats/2_LDSRMungeFormat/$bmi_prefix
                  
# Run rg
ldsc.py --rg  $results/ldscFormat/$prefix.sumstats.gz,$WD/input/gwas_sumstats/2_LDSRMungeFormat/$bmi_prefix.sumstats.gz\
        --ref-ld-chr $ldscDir/eur_w_ld_chr/ \
        --w-ld-chr $ldscDir/eur_w_ld_chr/ \
        --out $results/ldsc/$prefix.$bmi_prefix.ldsc.rg
        
# Copy results to local computer (run command in local computer)
scp $deltaID:$WD/results/fastGWA/ldsc/vitD_fastGWA.ukbBMI.ldsc.rg.log $local/results/fastGWA/ldsc/


#  ------ Using Locke et al. BMI sumstats (PMID 25673413)
module load ldsc
prefix=vitD_fastGWA
bmi_prefix=locke2015BMI

# Directories
results=$WD/results/fastGWA
tmpDir=$results/tmpDir
ldscDir=$bin/ldsc

# Format BMI GWAS results
echo "SNP A1 A2 b n p" > $WD/input/gwas_sumstats/1_LDSRFormat/$bmi_prefix.ldHubFormat
awk 'NR>1 {print $1,$2,$3,$5,$8,$7}' $WD/input/gwas_sumstats/0_original/bmi_locke2015 >> $WD/input/gwas_sumstats/1_LDSRFormat/$bmi_prefix.ldHubFormat

# Munge (process sumstats and restrict to HapMap3 SNPs) BMI sumstats
munge_sumstats.py --sumstats $WD/input/gwas_sumstats/1_LDSRFormat/$bmi_prefix.ldHubFormat \
                  --merge-alleles $ldscDir/w_hm3.snplist \
                  --out  $WD/input/gwas_sumstats/2_LDSRMungeFormat/$bmi_prefix
                  
# Run rg
ldsc.py --rg  $results/ldscFormat/$prefix.sumstats.gz,$WD/input/gwas_sumstats/2_LDSRMungeFormat/$bmi_prefix.sumstats.gz\
        --ref-ld-chr $ldscDir/eur_w_ld_chr/ \
        --w-ld-chr $ldscDir/eur_w_ld_chr/ \
        --out $results/ldsc/$prefix.$bmi_prefix.ldsc.rg
        
# Copy results to local computer (run command in local computer)
scp $deltaID:$WD/results/fastGWA/ldsc/vitD_fastGWA.locke2015BMI.ldsc.rg.log $local/results/fastGWA/ldsc/




GSMR

We performed bi-directional GSMR on BMI and vitamin D with the GCTA-implemented GSMR method. The following options were used:

  • –mbfile: We used a random subset of 20,000 unrelated individuals of European ancestry from the UKB as LD reference. The subset was generated based with the files in Q0286/UKBiobank/v3EURu_impQC/, which include 1365446 autossomal SNPs and are restricted to unrelated individuals of European ancestry
  • –gsmr-file: The two GWAS used in the analysis:
    • Vitamin D fastGWA results
    • BMI GWAS results by Xue et al. 2018 Nat Commun (PMID 30054458). Note that these do not include results for the X chromosome
  • –gsmr-direction: We ran bi-directoial GSMR analysis (coded as 2)
  • –gsmr-snp-min 10 (default): Minimum number of GW significant and near-independent SNPs required for the GSMR analysis.
  • –gwas-thresh 5e-8 (default): P-value threshold to select SNPs for clumping.
prefix=vitD_fastGWA
bmi_prefix=ukbBMI

# Directories
results=$WD/results/fastGWA
tmpDir=$results/tmpDir
gwasDir=$WD/input/gwas_sumstats
random20k_dir=$WD/input/UKB_v3EURu_impQC/random20K


# ########################
# Prepare files to run GSMR
# ########################

# --- Create file with paths to LD reference data
#     Use random set of 20K unrelated Europeans previously extracted with fastGWA variants for COJO analysis
  echo $random20k_dir/ukbEURu_imp_chr1_v3_impQC_random20k_maf0.01 > $tmpDir/ukb_ref_data.txt 
  for i in `seq 2 22`; do echo $random20k_dir/ukbEURu_imp_chr${i}_v3_impQC_random20k_maf0.01 >> $tmpDir/ukb_ref_data.txt; done


# --- Convert BMI results to .ma format, remove duplicated SNPs and create file with name and path to that file
  echo "SNP A1 A2 freq b se p N" > $gwasDir/3_maFormat/$bmi_prefix.ma
  awk 'NR>1 {print $2,$4,$5,$6,$8,$9,$10,$7}' $gwasDir/0_original/ukbEUR_BMI_common.txt | awk '!seen[$1]++'>> $gwasDir/3_maFormat/$bmi_prefix.ma
  echo "bmi $gwasDir/3_maFormat/$bmi_prefix.ma" > $tmpDir/files4gsmr.bmi
  
# --- Create file with name and path to vitamin D results
echo "vitD $results/maFormat/$prefix.ma" > $tmpDir/files4gsmr.vitD


# ########################
# Run bi-directional GSMR
# ########################

# Files/settings to use
ld_ref=$tmpDir/ukb_ref_data.txt
ids2keep=$medici/v3Samples/ukbEURu_v3_all_20K.fam
exposure=$tmpDir/files4gsmr.bmi
outcome=$tmpDir/files4gsmr.vitD

# Run GSMR without HEIDI
cd $results
gcta=$bin/gcta/gcta_1.92.3beta3
output=$results/gsmr/$prefix.$bmi_prefix.gsmr_with20kUKB_noHeidi
tmp_command="$gcta --mbfile $ld_ref  \
                   --gsmr-file $exposure $outcome \
                   --gsmr-direction 2 \
                   --heidi-thresh 0  \
                   --effect-plot \
                   --gsmr-snp-min 10 \
                   --gwas-thresh 5e-8 \
                   --threads 16  \
                   --out $output"
qsubshcom "$tmp_command" 16 30G $(basename $output) 24:00:00 ""


# Run GSMR with new HEIDI-outlier method
cd $results
gcta=$bin/gcta/gcta_1.92.3beta3
output=$results/gsmr/$prefix.$bmi_prefix.gsmr_with20kUKB_2heidi
tmp_command="$gcta --mbfile $ld_ref  \
                   --keep  $ids2keep \
                   --gsmr-file $exposure $outcome \
                   --gsmr-direction 2 \
                   --effect-plot \
                   --gsmr-snp-min 10 \
                   --gwas-thresh 5e-8 \
                   --gsmr2-beta  \
                   --heidi-thresh  0.01 0.01  \
                   --threads 16  \
                   --out $output"
qsubshcom "$tmp_command" 16 30G $(basename $output) 24:00:00 ""

# Copy results to local computer (run command in local computer)
scp $deltaID:$WD/results/fastGWA/gsmr/vitD_fastGWA.ukbBMI.gsmr_with20kUKB_noHeidi.* $local/results/fastGWA/gsmr/

scp $deltaID:$WD/results/fastGWA/gsmr/vitD_fastGWA.ukbBMI.gsmr_with20kUKB_2heidi.* $local/results/fastGWA/gsmr/



# ##############################################
# Run bi-directional GSMR with pleiotropic SNPs 
# ##############################################

# --------- Generate lists of pleiotropic SNPs to use in the analyses

# BMI pleiotropic SNPs
bmi_pleioSNPs=$tmpDir/$bmi_prefix.$prefix.pleioSNPs
awk 'NR==1' $results/gsmr/$prefix.$bmi_prefix.gsmr_with20kUKB_2heidi.pleio_snps | awk '{print $3}' | tr ',' '\n' > $bmi_pleioSNPs

# Vitamin D pleiotropic SNPs
vitD_pleioSNPs=$tmpDir/$prefix.$bmi_prefix.pleioSNPs
awk 'NR==2' $results/gsmr/$prefix.$bmi_prefix.gsmr_with20kUKB_2heidi.pleio_snps | awk '{print $3}' | tr ',' '\n' > $vitD_pleioSNPs

# -------- Run GSMR

# Files/settings to use
ld_ref=$tmpDir/ukb_ref_data.txt
exposure=$tmpDir/files4gsmr.bmi
outcome=$tmpDir/files4gsmr.vitD

# Run GSMR without HEIDI - with BMI pleiotropic SNPs
cd $results
gcta=$bin/gcta/gcta_1.92.3beta3
output=$results/gsmr/gsmr_with20kUKB_noHeidi.$prefix.$bmi_prefix.bmi_pleioSNPs
tmp_command="$gcta --mbfile $ld_ref  \
                   --gsmr-file $exposure $outcome \
                   --gsmr-direction 0 \
                   --heidi-thresh 0  \
                   --extract $bmi_pleioSNPs \
                   --effect-plot \
                   --gsmr-snp-min 10 \
                   --gwas-thresh 5e-8 \
                   --threads 16  \
                   --out $output"
qsubshcom "$tmp_command" 16 30G $(basename $output) 24:00:00 ""

# Run GSMR without HEIDI - with vitamin D pleiotropic SNPs
cd $results
gcta=$bin/gcta/gcta_1.92.3beta3
output=$results/gsmr/gsmr_with20kUKB_noHeidi.$prefix.$bmi_prefix.vitD_pleioSNPs
tmp_command="$gcta --mbfile $ld_ref  \
                   --gsmr-file $exposure $outcome \
                   --gsmr-direction 1 \
                   --heidi-thresh 0  \
                   --extract $vitD_pleioSNPs \
                   --effect-plot \
                   --gsmr-snp-min 10 \
                   --gwas-thresh 5e-8 \
                   --threads 16  \
                   --out $output"
qsubshcom "$tmp_command" 16 30G $(basename $output) 24:00:00 ""


# Copy results to local computer (run command in local computer)
scp $deltaID:$WD/results/fastGWA/gsmr/gsmr_with20kUKB_noHeidi.vitD_fastGWA.ukbBMI.bmi_pleioSNPs.gsmr $local/results/fastGWA/gsmr/

scp $deltaID:$WD/results/fastGWA/gsmr/gsmr_with20kUKB_noHeidi.vitD_fastGWA.ukbBMI.vitD_pleioSNPs.gsmr $local/results/fastGWA/gsmr/

In addition, we ran bi-directional GSMR with BMI and BMI-conditioned vitamin D obtained with mtCOJO.




mtCOJO

We conducted a mtCOJO analysis to condition the vitamin D GWAS on the BMI GWAS results.


prefix=vitD_fastGWA
bmi_prefix=ukbBMI

# Directories
results=$WD/results/fastGWA
tmpDir=$results/tmpDir
ldscDir=$bin/ldsc
gwasDir=$WD/input/gwas_sumstats/3_maFormat

# ########################
# Prepare files to run mtCOJO
# ########################

# Vitamin D conditioned on BMI
echo "vitD $results/maFormat/$prefix.ma
bmi $gwasDir/$bmi_prefix.ma" > $tmpDir/files4mtCOJO.${prefix}_condition_on_$bmi_prefix

# BMI conditioned on vitamin D
echo "bmi $gwasDir/$bmi_prefix.ma
vitD $results/maFormat/$prefix.ma" > $tmpDir/files4mtCOJO.${bmi_prefix}_condition_on_$prefix

# ########################
# Run mtCOJO
# ########################

# Files/settings to use
ld_ref=$tmpDir/ukb_ref_data.txt
ids2keep=$medici/v3Samples/ukbEURu_v3_all_20K.fam
ldscFiles=$ldscDir/eur_w_ld_chr/
analysis=${prefix}_condition_on_$bmi_prefix
#analysis=${bmi_prefix}_condition_on_$prefix
gwasDatasets=$tmpDir/files4mtCOJO.$analysis
output=$results/mtcojo.$analysis

# Run mtCOJO
cd $results
gcta=$bin/gcta/gcta_1.92.3beta3
tmp_command="$gcta --mbfile $ld_ref  \
                   --keep $ids2keep \
                   --mtcojo-file $gwasDatasets \
                   --ref-ld-chr $ldscFiles \
                   --w-ld-chr $ldscFiles \
                   --threads 16  \
                   --out $output"
qsubshcom "$tmp_command" 16 30G $(basename $output) 24:00:00 ""

# Convert results to .ma format (for COJO, mtCOJO and SMR)
  echo "SNP A1 A2 freq b se p n" > $results/maFormat/`basename $output`.ma
  awk 'NR>1 {print $1,$2,$3,$4,$9,$10,$11,$8}' $output.mtcojo.cma | awk '!seen[$1]++' >>  $results/maFormat/`basename $output`.ma
  ln -s $results/maFormat/mtcojo.vitD_fastGWA_condition_on_ukbBMI.ma $results/maFormat/vitD_BMIcond_fastGWA.ma

# Convert results to fastGWA format (to upload to FUMA - need CHR and BP)
  ## Run in R
  qsub -I -A UQ-IMB-CNSG -X -l nodes=1:ppn=5,mem=15GB,walltime=6:00:00
  R
  setwd("$WD/")
  posFile=read.table("input/UKB_v3EURu_impQC/v3EURu_impQC.bim", h=F, stringsAsFactors=F, col.names=c("CHR","SNP","MG","BP","A1","A2"))
  sumstats=read.table("results/fastGWA/mtcojo.vitD_fastGWA_condition_on_ukbBMI.mtcojo.cma", h=T, stringsAsFactors=F)
  tmp=merge(posFile[,c("CHR","SNP","BP")], sumstats, all.y=T)
  tmp=tmp[,c("CHR","SNP","BP","A1","A2","N","freq","bC","bC_se","bC_pval")]
  names(tmp)=c("CHR","SNP","POS","A1","A2","N","AF1","BETA","SE","P")
  write.table(tmp, file=gzfile("results/fastGWA/vitD_BMIcond_fastGWA.gz"), row.names=F, quote=F)
  
  

# ###############################
# Copy results to local computer
# ###############################
#  Run command in local computer
scp $deltaID:$WD/results/fastGWA/*ukbBMI*mtcojo.cma $local/results/fastGWA/

scp $deltaID:$WD/results/fastGWA/vitD_BMIcond_fastGWA.gz $local/results/fastGWA/

# Transfer BMI sumstats locally (to generate Manhattan plot)
scp $deltaID:$WD/input/gwas_sumstats/0_original/ukbEUR_BMI_common.txt $local/input/gwas_sumstats/0_original/

Then, we re-ran GSMR on the BMI-conditioned vitamin D GWAS (mtCOJO) results.


prefix=vitDmtcojo
bmi_prefix=ukbBMI

# Directories
results=$WD/results/fastGWA
tmpDir=$results/tmpDir
ldscDir=$bin/ldsc


# ########################
# Prepare files to run GSMR
# ########################

# Create a file with name and path to mtCOJO results in .ma format
echo "vitD_bmi_mtcojo $results/maFormat/mtcojo.vitD_fastGWA_condition_on_ukbBMI.ma" > $tmpDir/files4gsmr.vitDmtcojo


# ########################
# Run bi-directional GSMR
# ########################

# Files/settings to use
ld_ref=$tmpDir/ukb_ref_data.txt
exposure=$tmpDir/files4gsmr.bmi
outcome=$tmpDir/files4gsmr.vitDmtcojo

# Run GSMR without HEIDI
cd $results
gcta=$bin/gcta/gcta_1.92.3beta3
output=$results/gsmr/gsmr_with20kUKB_noHeidi_$prefix.$bmi_prefix
tmp_command="$gcta --mbfile $ld_ref  \
                   --gsmr-file $exposure $outcome \
                   --gsmr-direction 2 \
                   --heidi-thresh 0  \
                   --effect-plot \
                   --gsmr-snp-min 10 \
                   --gwas-thresh 5e-8 \
                   --threads 16  \
                   --out $output"
qsubshcom "$tmp_command" 16 30G $(basename $output) 24:00:00 ""


# Run GSMR with new HEIDI-outlier method
cd $results
gcta=$bin/gcta/gcta_1.92.3beta3
output=$results/gsmr/gsmr_with20kUKB_2heidi_$prefix.$bmi_prefix
tmp_command="$gcta --mbfile $ld_ref  \
                   --gsmr-file $exposure $outcome \
                   --gsmr-direction 2 \
                   --effect-plot \
                   --gsmr-snp-min 10 \
                   --gwas-thresh 5e-8 \
                   --gsmr2-beta  \
                   --heidi-thresh  0.01 0.01  \
                   --threads 16  \
                   --out $output"
qsubshcom "$tmp_command" 16 30G $(basename $output) 24:00:00 ""


# Copy results to local computer (run command in local computer)
scp $deltaID:$WD/results/fastGWA/gsmr/gsmr_with20kUKB_noHeidi_vitDmtcojo.ukbBMI* $local/results/fastGWA/gsmr/

scp $deltaID:$WD/results/fastGWA/gsmr/gsmr_with20kUKB_2heidi_vitDmtcojo.ukbBMI.* $local/results/fastGWA/gsmr/

Lastly, we conducted COJO analysis on the mtCOJO results to identify the set of independent variants that had a jointly significant association with vitamin D after conditioning on BMI. The following options were used to run COJO:

  • –bfile: We used a random subset of 20,000 unrelated individuals of European ancestry from the UKB as LD reference. The subset was generated based with the files in Q0286/UKBiobank/v3EURu_impQC/, which include 1365446 autossomal SNPs and are restricted to unrelated individuals of European ancestry.
  • –cojo-slct: Performs a step-wise model selection of independently associated SNPs
  • –cojo-p 5e-8 (default): P-value to declare a genome-wide significant hit
  • –cojo-wind 10,000: Distance (in KB) from which SNPs are considered in linkage equilibrium
  • –cojo-collinear 0.9 (default): maximum r2 to be considered an independent SNP, i.e. SNPs with r2>0.9 will not be considered further
  • –diff-freq 0.2 (default): maximum allele difference between the GWAS and the LD reference sample
  • extract: Restrict analysis to SNPs with MAF>0.01

# ########################
# Run COJO on mtCOJO results
# ########################

prefix=mtcojo.vitD_fastGWA_condition_on_ukbBMI

# Directories
results=$WD/results/fastGWA
tmpDir=$results/tmpDir
random20k_dir=$WD/input/UKB_v3EURu_impQC/random20K

# Files/settings
ldRef=$random20k_dir/ukbEURu_imp_chr{TASK_ID}_v3_impQC_random20k_maf0.01
inFile=$results/maFormat/$prefix.ma
filters=maf0.01
snps2include=$WD/input/UKB_v3EUR_impQC/UKB_EUR_impQC_$filters.snplist
outFile=$tmpDir/$prefix.$filters.chr{TASK_ID}

# Run COJO
cd $results
gcta=$bin/gcta/gcta_1.92.3beta3
tmp_command="$gcta --bfile $ldRef \
                   --cojo-slct \
                   --cojo-file $inFile \
                   --extract $snps2include \
                   --out $outFile"
qsubshcom "$tmp_command" 1 60G COJO.$prefix 24:00:00 "-array=1-22"

# Merge COJO results once all chr finished running
cat $tmpDir/$prefix.$filters*.jma.cojo | grep "Chr" | uniq > $results/$prefix.cojo
cat $tmpDir/$prefix.$filters*.jma.cojo | grep -v "Chr" |sed '/^$/d'  >> $results/$prefix.cojo

# Copy results to local (run from local)
scp $deltaID:$WD/results/fastGWA/mtcojo.vitD_fastGWA_condition_on_ukbBMI.cojo $local/results/fastGWA/

COJO on SNPs common to all BMI ajustment levels

The number of SNPs tested in the 25OHD GWAS with no adjustment and in the 25OHD GWAS with BMI as covariate was the same. However, when we conditioned the 25OHD GWAS on BMI, results were available for a lower number of SNPs, as the BMI UKB GWAS used in the conditional analysis had results available for less SNPs than the 25OHD GWAS.

To compare the results obtained with the three levels of BMI adjustment (no adjustment, BMI covariate and BMI-conditioned), we restricted each set of results to the set of SNPs that were available in all 3 analyses. Then, we conducted COJO analyses to assess the number of independent associations identified in each analysis.


# Restrict the three sets of results (vitD, vitD_BMIcov, and vitD_BMIcond) to overlapping SNPs and run clumping and COJO for comparison

# ########################
# Restrict results
# ########################
results=$WD/results/fastGWA
cd $results

# Run in R
  ## Load data
  vitD=read.table("maFormat/vitD_fastGWA_withX.ma",h=T,stringsAsFactors=F)
  vitD_BMIcov=read.table("maFormat/vitD_BMIcov_fastGWA_withX.ma",h=T,stringsAsFactors=F)
  vitD_BMIcond=read.table("maFormat/vitD_BMIcond_fastGWA.ma",h=T,stringsAsFactors=F)
  ## Identify overlapping SNPs
  snps=intersect(intersect(vitD$SNP, vitD_BMIcov$SNP), vitD_BMIcond$SNP)
  ## Restrict datasets to overlapping SNPs
  vitD=vitD[vitD$SNP %in% snps,]
  vitD_BMIcov=vitD_BMIcov[vitD_BMIcov$SNP %in% snps,]
  vitD_BMIcond=vitD_BMIcond[vitD_BMIcond$SNP %in% snps,]
  ## Save restricted datasets
  write.table(vitD, "bmi_corrections/vitD_overlappingSNPs.ma", quote=F, row.names=F)
  write.table(vitD_BMIcov, "bmi_corrections/vitD_BMIcov_overlappingSNPs.ma", quote=F, row.names=F)
  write.table(vitD_BMIcond, "bmi_corrections/vitD_BMIcond_overlappingSNPs.ma", quote=F, row.names=F)


# ########################
# Run COJO 
# ########################

# Directories
results=$WD/results/fastGWA/bmi_corrections
tmpDir=$results/tmpDir
random20k_dir=$WD/input/UKB_v3EURu_impQC/random20K

# Files/settings
ldRef=$random20k_dir/ukbEURu_imp_chr{TASK_ID}_v3_impQC_random20k_maf0.01
filters=maf0.01
snps2include=$WD/input/UKB_v3EUR_impQC/UKB_EUR_impQC_$filters.snplist
gcta=$bin/gcta/gcta_1.92.3beta3

# Run COJO
cd $results
for prefix in vitD vitD_BMIcov vitD_BMIcond
do
  inFile=$results/${prefix}_overlappingSNPs.ma
  outFile=$tmpDir/$prefix.$filters.chr{TASK_ID}
  
  tmp_command="$gcta --bfile $ldRef \
                     --cojo-slct \
                     --cojo-file $inFile \
                     --extract $snps2include \
                     --out $outFile"
  qsubshcom "$tmp_command" 1 60G COJO.$prefix.overlappingSNPs 24:00:00 "-array=1-22"
done

# Merge COJO results once all chr finished running
for prefix in vitD vitD_BMIcov vitD_BMIcond
do
  cat $tmpDir/$prefix.$filters*.jma.cojo | grep "Chr" | uniq > $results/$prefix.cojo
  cat $tmpDir/$prefix.$filters*.jma.cojo | grep -v "Chr" |sed '/^$/d'  >> $results/$prefix.cojo
done

scp $deltaID:$WD/results/fastGWA/bmi_corrections/*cojo $local/results/fastGWA/bmi_corrections/




BMI-EA-25OHD

Based on the results below, we see a complex relationship between BMI and 25OHD. High BMI appears to cause low 25OHD levels. However, this association could be confounded.

Here we assessed if educational attainment (EA; a cognitive measurement) was a confounder. Specifically, we:

  • Conditioned 25OHD GWAS on EA (25OHD|EA), with mtCOJO
  • Calculated the genetic correltion between:
    • 25OHD|BMI ~ EA
    • 25OHD|EA ~ BMI
  • Conducted bi-directional GSMR analyses between:
    • EA <–> 25OHD
    • EA <–> BMI
# ##########
# mtCOJO
# ##########

# ----- 25OHD|EA
prefix=vitD_fastGWA_withX
prefix2=EA2018
prefix_cond=vitD_EAcond_fastGWA

# Directories
results=$WD/results/fastGWA
tmpDir=$results/tmpDir
ldscDir=$bin/ldsc
gwasDir=$WD/input/gwas_sumstats/3_maFormat

# Files to run mtCOJO
  ## Vitamin D conditioned on EA
  cond12=$tmpDir/files4mtCOJO.${prefix}_condition_on_$prefix2
  echo "vitD $results/maFormat/$prefix.ma" > $cond12
  echo "ea $gwasDir/$prefix2.ma" >> $cond12

  ## EA conditioned on vitamin D
  cond21=$tmpDir/files4mtCOJO.${prefix2}_condition_on_$prefix
  tail -1 $cond12 > $cond21
  sed 1q $cond12 >> $cond21

# Files/settings to use
ld_ref=$tmpDir/ukb_ref_data.txt
ids2keep=$medici/v3Samples/ukbEURu_v3_all_20K.fam
ldscFiles=$ldscDir/eur_w_ld_chr/
analysis=${prefix}_condition_on_$prefix2
gwasDatasets=$tmpDir/files4mtCOJO.$analysis
output=$results/mtcojo.$analysis

# Run mtCOJO
cd $results
gcta=$bin/gcta/gcta_1.92.3beta3
tmp_command="$gcta --mbfile $ld_ref  \
                   --keep $ids2keep \
                   --mtcojo-file $gwasDatasets \
                   --ref-ld-chr $ldscFiles \
                   --w-ld-chr $ldscFiles \
                   --threads 16  \
                   --out $output"
qsubshcom "$tmp_command" 16 30G $(basename $output) 24:00:00 ""

# Convert results to .ma format (for GSMR)
echo "SNP A1 A2 freq b se p n" > $results/maFormat/`basename $output`.ma
awk 'NR>1 {print $1,$2,$3,$4,$9,$10,$11,$8}' $output.mtcojo.cma | awk '!seen[$1]++' >>  $results/maFormat/`basename $output`.ma
ln -s $results/maFormat/`basename $output`.ma $results/maFormat/vitD_EAcond_fastGWA.ma

# Format 25OHD|EA GWAS results for LDSC
echo "SNP A1 A2 b n p" > $results/ldscFormat/$prefix_cond.ldHubFormat
awk 'NR>1 {print $1,$2,$3,$9,$8,$11}' $output.mtcojo.cma >> $results/ldscFormat/$prefix_cond.ldHubFormat

# Munge (process sumstats and restrict to HapMap3 SNPs) 25OHD|EA sumstats
munge_sumstats.py --sumstats $results/ldscFormat/$prefix_cond.ldHubFormat \
                  --merge-alleles $ldscDir/w_hm3.snplist \
                  --out  $results/ldscFormat/$prefix_cond

# ##########
# rg
# ##########

# ----- 25OHD|BMI ~ EA

module load ldsc
prefix=vitD_BMIcond_fastGWA
prefix2=EA2018

# Directories
results=$WD/results/fastGWA
tmpDir=$results/tmpDir
ldscDir=$bin/ldsc
  ## Note: EA sumstats were processed in the GSMR section
  ##       vitD sumstats were processed in the GWAS section 
  
# Run rg
ldsc.py --rg  $results/ldscFormat/$prefix.sumstats.gz,$WD/input/gwas_sumstats/2_LDSRMungeFormat/$prefix2.sumstats.gz \
        --ref-ld-chr $ldscDir/eur_w_ld_chr/ \
        --w-ld-chr $ldscDir/eur_w_ld_chr/ \
        --out $results/ldsc/$prefix.$prefix2.ldsc.rg

# Copy results to local computer
scp $deltaID:$WD/results/fastGWA/ldsc/vitD_BMIcond_fastGWA.EA2018.ldsc.rg.log $local/results/fastGWA/ldsc/


# ----- 25OHD|EA ~ BMI

module load ldsc
prefix=vitD_EAcond_fastGWA
prefix2=ukbBMI

# Directories
results=$WD/results/fastGWA
tmpDir=$results/tmpDir
ldscDir=$bin/ldsc

## Note: 25OHD|EA and ukbBMI sumstats were processed above

# Run rg
ldsc.py --rg  $results/ldscFormat/$prefix.sumstats.gz,$WD/input/gwas_sumstats/2_LDSRMungeFormat/$prefix2.sumstats.gz \
        --ref-ld-chr $ldscDir/eur_w_ld_chr/ \
        --w-ld-chr $ldscDir/eur_w_ld_chr/ \
        --out $results/ldsc/$prefix.$prefix2.ldsc.rg

# Copy results to local computer
scp $deltaID:$WD/results/fastGWA/ldsc/vitD_EAcond_fastGWA.ukbBMI.ldsc.rg.log $local/results/fastGWA/ldsc/




# ##########
# GSMR
# ##########

# Directories
results=$WD/results/fastGWA
tmpDir=$results/tmpDir
gwasDir=$WD/input/gwas_sumstats
random20k_dir=$WD/input/UKB_v3EURu_impQC/random20K

# ----- EA <--> 25OHD

prefix=vitD_fastGWA_withX
prefix2=EA2018

# Paths to the files to use in the analysis
  ## LDref: $tmpDir/ukb_ref_data.txt (generated above)
  ## vitD:  $tmpDir/files4gsmr.vitD (generated above)
  ## EA
  echo "EA $gwasDir/3_maFormat/$prefix2.ma" > $tmpDir/files4gsmr.EA2018

# Files/settings to use
ld_ref=$tmpDir/ukb_ref_data.txt
ids2keep=$medici/v3Samples/ukbEURu_v3_all_20K.fam
exposure=$tmpDir/files4gsmr.EA2018
outcome=$tmpDir/files4gsmr.vitD

# Run GSMR without HEIDI
cd $results
gcta=$bin/gcta/gcta_1.92.3beta3
output=$results/gsmr/$prefix.$prefix2.gsmr_with20kUKB_noHeidi
tmp_command="$gcta --mbfile $ld_ref  \
                   --gsmr-file $exposure $outcome \
                   --gsmr-direction 2 \
                   --heidi-thresh 0  \
                   --effect-plot \
                   --gsmr-snp-min 10 \
                   --gwas-thresh 5e-8 \
                   --threads 16  \
                   --out $output"
qsubshcom "$tmp_command" 16 30G $(basename $output) 24:00:00 ""


# Run GSMR with new HEIDI-outlier method
cd $results
gcta=$bin/gcta/gcta_1.92.3beta3
output=$results/gsmr/$prefix.$prefix2.gsmr_with20kUKB_2heidi
tmp_command="$gcta --mbfile $ld_ref  \
                   --keep  $ids2keep \
                   --gsmr-file $exposure $outcome \
                   --gsmr-direction 2 \
                   --effect-plot \
                   --gsmr-snp-min 10 \
                   --gwas-thresh 5e-8 \
                   --gsmr2-beta  \
                   --heidi-thresh  0.01 0.01  \
                   --threads 16  \
                   --out $output"
qsubshcom "$tmp_command" 16 30G $(basename $output) 24:00:00 ""

# Copy results to local computer
scp $deltaID:$WD/results/fastGWA/gsmr/vitD_fastGWA_withX.EA2018.gsmr_with20kUKB_*.gsmr $local/results/fastGWA/gsmr/


# ----- EA <--> BMI

prefix=ukbBMI
prefix2=EA2018

# Paths to the files to use in the analysis
  ## LDref: $tmpDir/ukb_ref_data.txt (generated above)
  ## BMI:   $tmpDir/files4gsmr.bmi (generated above)
  ## EA:    $tmpDir/files4gsmr.EA2018 (generated above)

# Files/settings to use
ld_ref=$tmpDir/ukb_ref_data.txt
ids2keep=$medici/v3Samples/ukbEURu_v3_all_20K.fam
exposure=$tmpDir/files4gsmr.bmi
outcome=$tmpDir/files4gsmr.EA2018

# Run GSMR without HEIDI
cd $results
gcta=$bin/gcta/gcta_1.92.3beta3
output=$results/gsmr/$prefix.$prefix2.gsmr_with20kUKB_noHeidi
tmp_command="$gcta --mbfile $ld_ref  \
                   --gsmr-file $exposure $outcome \
                   --gsmr-direction 2 \
                   --heidi-thresh 0  \
                   --effect-plot \
                   --gsmr-snp-min 10 \
                   --gwas-thresh 5e-8 \
                   --threads 16  \
                   --out $output"
qsubshcom "$tmp_command" 16 30G $(basename $output) 24:00:00 ""


# Run GSMR with new HEIDI-outlier method
cd $results
gcta=$bin/gcta/gcta_1.92.3beta3
output=$results/gsmr/$prefix.$prefix2.gsmr_with20kUKB_2heidi
tmp_command="$gcta --mbfile $ld_ref  \
                   --keep  $ids2keep \
                   --gsmr-file $exposure $outcome \
                   --gsmr-direction 2 \
                   --effect-plot \
                   --gsmr-snp-min 10 \
                   --gwas-thresh 5e-8 \
                   --gsmr2-beta  \
                   --heidi-thresh  0.01 0.01  \
                   --threads 16  \
                   --out $output"
qsubshcom "$tmp_command" 16 30G $(basename $output) 24:00:00 ""


# Copy results to local computer
scp $deltaID:$WD/results/fastGWA/gsmr/ukbBMI.EA2018.gsmr_with20kUKB_*.gsmr $local/results/fastGWA/gsmr/

2SMR

In addition to GSMR, we perfomed MR analyses between 25OHD and BMI using 2-sample MR.

# #########################################
# Generate table with all combinations of exposure and outcome GWAS to run 2SMR with
# #########################################
cd $WD
R

WD=getwd()

# Details of vitamin D GWAS 
exposureList=data.frame(Trait="vitD",
                        FilePath=paste0(WD,"/results/fastGWA/maFormat/vitD_fastGWA_withX.ma"),
                        stringsAsFactors=F)

# Details of other trait GWAS
outcomeList=data.frame(Trait="ukbBMI", 
                       FilePath=paste0(WD,"/input/gwas_sumstats/3_maFormat/ukbBMI.ma"))
  
# Combine in one table
df=cbind(exposureList, outcomeList)
tmp=cbind(outcomeList, exposureList)
df=rbind(df,tmp)
names(df)=c("exposure","exposureFile","outcome","outcomeFile")
df$heidi="noHeidi"
tmp=df
tmp$heidi="withHeidi"
df=rbind(df,tmp)
df[grep("vitD", df$exposure),"direction"]="12"
df[grep("vitD", df$outcome),"direction"]="21"

# Save table
write.table(df, "results/fastGWA/twosmr/input/analyses2run_ukbBMI", quote=F, row.names=F)

# #########################################
# Copy to cluster to run 2SMR
# #########################################
scp $local/results/fastGWA/twosmr/input/analyses2run_ukbBMI $deltaID:$WD/results/fastGWA/twosmr/input/
# #########################################
# Extract instruments from GSMR analyses
# #########################################

results=$WD/results/fastGWA
tmpDir=$results/tmpDir

exposure=ukbBMI
outcome=vitD

# Get lists of instruments used in both directions - no HEIDI
heidi=noHeidi
gsmr_log=$results/gsmr/vitD_fastGWA.ukbBMI.gsmr_with20kUKB_noHeidi.eff_plot.gz
zcat $gsmr_log | tail -4 | sed 1q | tr ' ' '\n' > $tmpDir/gsmr_with20kUKB_${heidi}_$exposure.$outcome.gsmr12instruments
zcat $gsmr_log | tail -2 | sed 1q | tr ' ' '\n' > $tmpDir/gsmr_with20kUKB_${heidi}_$exposure.$outcome.gsmr21instruments
  
# Get lists of instruments used in both directions - with HEIDI
heidi=withHeidi
gsmr_log=$results/gsmr/vitD_fastGWA.ukbBMI.gsmr_with20kUKB_2heidi.eff_plot.gz
zcat $gsmr_log | tail -4 | sed 1q | tr ' ' '\n' > $tmpDir/gsmr_with20kUKB_${heidi}_$exposure.$outcome.gsmr12instruments
zcat $gsmr_log | tail -2 | sed 1q | tr ' ' '\n' > $tmpDir/gsmr_with20kUKB_${heidi}_$exposure.$outcome.gsmr21instruments



# #########################################
# Submit 2SMR jobs
# #########################################    

results=$WD/results/fastGWA

analyses2run=$results/twosmr/input/analyses2run_ukbBMI
n=`wc -l < $analyses2run`

cd $results
if test $n != 1; then arrays="-array=2-$n"; i="{TASK_ID}"; else arrays=""; i=1; fi 
get_inputParameters="exposure=\$(awk -v i=$i 'NR==i {print \$1}' $analyses2run);
                     outcome=\$(awk -v i=$i 'NR==i {print \$3}' $analyses2run);
                     exposure_GWASfile=\$(awk -v i=$i 'NR==i {print \$2}' $analyses2run);
                     outcome_GWASfile=\$(awk -v i=$i 'NR==i {print \$4}' $analyses2run);
                     direction=\$(awk -v i=$i 'NR==i {print \$6}' $analyses2run);
                     heidi=\$(awk -v i=$i 'NR==i {print \$5}' $analyses2run)"
run2smr="Rscript $WD/scripts/run2smr.R $WD 
                                       \$exposure 
                                       \$outcome \
                                       \$exposure_GWASfile 
                                       \$outcome_GWASfile \
                                       \$direction 
                                       \$heidi"
qsubshcom "$get_inputParameters; $run2smr" 1 30G  run2smr 5:00:00 "$arrays"


# Merge results
sed 1q $results/twosmr/*ukbBMI*2smr | uniq | tr ' ' '\t' > $results/twosmr/ukbBMI.2smr 
cat $results/twosmr/*ukbBMI*2smr | grep -v 'method' | tr ' ' '\t' >> $results/twosmr/ukbBMI.2smr 

# Change method names (no spaces)
sed -i 's/MR\tEgger/MR Egger/g' $results/twosmr/ukbBMI.2smr 
sed -i 's/Weighted\tmedian/Weighted median/g' $results/twosmr/ukbBMI.2smr
sed -i 's/Inverse\tvariance\tweighted/Inverse variance weighted/g' $results/twosmr/ukbBMI.2smr
sed -i 's/Simple\tmode/Simple mode/g' $results/twosmr/ukbBMI.2smr 
sed -i 's/Weighted\tmode/Weighted mode/g' $results/twosmr/ukbBMI.2smr 


# Copy results to local
scp $deltaID:$WD/results/fastGWA/twosmr/ukbBMI.2smr $local/results/fastGWA/twosmr/




Results

Genetic Correlations

# ---- 25OHD ~ BMI
tmp=read.table("results/fastGWA/ldsc/vitD_fastGWA.ukbBMI.ldsc.rg.log", skip=30, fill=T, h=F, colClasses="character", sep="_")[,1]
intercept=strsplit(grep("Intercept",tmp, value=T)[3],":")[[1]][2]
rg=strsplit(grep("Genetic Correlation:",tmp, value=T),":")[[1]][2]

# ---- 25OHD|BMI ~ EA
tmp=read.table("results/fastGWA/ldsc/vitD_BMIcond_fastGWA.EA2018.ldsc.rg.log", skip=30, fill=T, h=F, colClasses="character", sep="_")[,1]
intercept2=strsplit(grep("Intercept",tmp, value=T)[3],":")[[1]][2]
rg2=strsplit(grep("Genetic Correlation:",tmp, value=T),":")[[1]][2]

# ---- 25OHD|EA ~ BMI
tmp=read.table("results/fastGWA/ldsc/vitD_EAcond_fastGWA.ukbBMI.ldsc.rg.log", skip=30, fill=T, h=F, colClasses="character", sep="_")[,1]
intercept3=strsplit(grep("Intercept",tmp, value=T)[3],":")[[1]][2]
rg3=strsplit(grep("Genetic Correlation:",tmp, value=T),":")[[1]][2]
  • Bivariate LDSC Intercept
    • 25OHD ~ BMI: -0.1789 (0.0093)
    • 25OHD|BMI ~ EA: -0.0135 (0.008)
    • 25OHD|EA ~ BMI: -0.1849 (0.0089)
  • Genetic Correlation
    • 25OHD ~ BMI: -0.1716 (0.0266)
    • 25OHD|BMI ~ EA: -0.2189 (0.0257)
    • 25OHD|EA ~ BMI: -0.2132 (0.0302)




Mendelian Randomization

# Open scripts to plot gsmr results
source("scripts/gsmr_plot.r")

# ################################
# GSMR results before mtCOJO
# ################################

# No HEIDI
gsmr_noHeidi = read_gsmr_data("results/fastGWA/gsmr/vitD_fastGWA.ukbBMI.gsmr_with20kUKB_noheidi.eff_plot.gz")
  # Change trait names
  gsmr_noHeidi[[1]][3:4]=c("BMI","Vitamin D")
  gsmr_noHeidi[[2]]=gsub("bmi","BMI",gsmr_noHeidi[[2]])
  gsmr_noHeidi[[2]]=gsub("vitD","Vitamin D",gsmr_noHeidi[[2]])
  gsmr_noHeidi_bm=gsmr_noHeidi

# With HEIDI
gsmr_newHeidi = read_gsmr_data("results/fastGWA/gsmr/vitD_fastGWA.ukbBMI.gsmr_with20kUKB_2heidi.eff_plot.gz")
  # Change trait names
  gsmr_newHeidi[[1]][3:4]=c("BMI","Vitamin D")
  gsmr_newHeidi[[2]]=gsub("bmi","BMI",gsmr_newHeidi[[2]])
  gsmr_newHeidi[[2]]=gsub("vitD","Vitamin D",gsmr_newHeidi[[2]])
  gsmr_newHeidi_bm=gsmr_newHeidi

# Pleiotropic SNPs
gsmr_bmiPleioSNPs_bm = read.table("results/fastGWA/gsmr/gsmr_with20kUKB_noHeidi.vitD_fastGWA.ukbBMI.bmi_pleioSNPs.gsmr", h=T, stringsAsFactors=F)
gsmr_vitDPleioSNPs_bm = read.table("results/fastGWA/gsmr/gsmr_with20kUKB_noHeidi.vitD_fastGWA.ukbBMI.vitD_pleioSNPs.gsmr", h=T, stringsAsFactors=F)

  
# ################################
# GSMR results after mtCOJO
# ################################

# No HEIDI
gsmr_noHeidi = read_gsmr_data("results/fastGWA/gsmr/gsmr_with20kUKB_noHeidi_vitDmtcojo.ukbBMI.eff_plot.gz")
  # Change trait names
  gsmr_noHeidi[[1]][3:4]=c("BMI","BMI-conditioned-vitamin D")
  gsmr_noHeidi[[2]]=gsub("vitD_bmi_mtcojo","BMI-conditioned-vitamin D",gsmr_noHeidi[[2]])
  gsmr_noHeidi[[2]]=gsub("bmi","BMI",gsmr_noHeidi[[2]])
  gsmr_noHeidi_am=gsmr_noHeidi

# With HEIDI
gsmr_newHeidi = read_gsmr_data("results/fastGWA/gsmr/gsmr_with20kUKB_2heidi_vitDmtcojo.ukbBMI.eff_plot.gz")
  # Change trait names
  gsmr_newHeidi[[1]][3:4]=c("BMI","BMI-conditioned-vitamin D")
  gsmr_newHeidi[[2]]=gsub("vitD_bmi_mtcojo","BMI-conditioned-vitamin D",gsmr_newHeidi[[2]])
  gsmr_newHeidi[[2]]=gsub("bmi","BMI",gsmr_newHeidi[[2]])
  gsmr_newHeidi_am=gsmr_newHeidi

Effect of BMI on vitamin D

GSMR before mtCOJO

# Get GSMR results beofre mtCOJO
gsmr_noHeidi=gsmr_noHeidi_bm
gsmr_newHeidi=gsmr_newHeidi_bm

# Define exposure and outcome
expo_str="BMI"
outcome_str="Vitamin D"


# ##############################################
# Plot GSMR results - Effect of BMI on vitamin D
# ##############################################
par(mfrow=c(1, 4))
par(mar=c(4.5, 4.5, 4, 2))

# Summary of results
tmp=as.data.frame(rbind(gsmr_noHeidi[[2]],gsmr_newHeidi[[2]]), stringsAsFactors=F)
tmp$Outcome=rep(c("Without HEIDI-outlier filtering","With HEIDI-outlier filtering"),each=2)
tmp=tmp[tmp$Exposure==expo_str,-1]
  # Add results from pleiotropic SNPs
  names(gsmr_bmiPleioSNPs_bm)[grep("nsnp",names(gsmr_bmiPleioSNPs_bm))]="n_snps"
  gsmr_bmiPleioSNPs_bm$Outcome="Pleiotropic SNPs"
  gsmr_bmiPleioSNPs_bm$Exposure=NULL
  tmp=rbind(tmp,gsmr_bmiPleioSNPs_bm)
names(tmp)[grep("Outcome",names(tmp))]=""
tmp=kable(tmp, row.names=F)
kable_styling(tmp, full_width=F, position = "left")
bxy se p n_snps
Without HEIDI-outlier filtering -0.110623 0.00464589 2.57808e-125 1090
With HEIDI-outlier filtering -0.129899 0.00478853 4.7141e-162 1020
Pleiotropic SNPs 0.169297 0.0181766 1.23106e-20 70
# Define colors
effect_col=c("#9EC1C2","#FAC477")

# Identify pleiotropic SNPs
tmp=gsmr_noHeidi[[3]]
all_snps=tmp[tmp[,2]==1,1]
tmp=gsmr_newHeidi[[3]]
non_pleiotropic_snps=tmp[tmp[,2]==1,1]
pleiotropicSNP=factor(all_snps %in% non_pleiotropic_snps, levels=c("TRUE","FALSE"), labels=c("no","yes"))


# ---- Effect sizes in both GWAS
  # Define sumstats to plot
  resbuf = gsmr_snp_effect(gsmr_noHeidi, expo_str, outcome_str)
  bxy = resbuf$bxy
  bzx = resbuf$bzx; bzx_se = resbuf$bzx_se
  bzy = resbuf$bzy; bzy_se = resbuf$bzy_se
  tmp=gsmr_noHeidi[[2]]
  b=round(bxy,4)
  se=round(as.numeric(tmp[tmp[,1]==expo_str,4]), 4)
  p=format(as.numeric(tmp[tmp[,1]==expo_str,5]), digits=3)
  resbuf = gsmr_snp_effect(gsmr_newHeidi, expo_str, outcome_str)
  bxy2 = resbuf$bxy
  # Plot
    vals = c(bzx-bzx_se, bzx+bzx_se)
    xmin = min(vals); xmax = max(vals)
    vals = c(bzy-bzy_se, bzy+bzy_se)
    ymin = min(vals); ymax = max(vals)
    std = sd(vals)
    plot(bzx, bzy, pch=20, cex=0.8, bty="n", cex.axis=1.1, cex.lab=1.2,
         col=effect_col[pleiotropicSNP], xlim=c(xmin, xmax), ylim=c(ymin, ymax),
         xlab=substitute(paste(trait, " (", italic(b[zx]), ")", sep=""), list(trait=expo_str)),
         ylab=substitute(paste(trait, " (", italic(b[zy]), ")", sep=""), list(trait=outcome_str)))
    ## Regression line
    if(!is.na(bxy)) abline(0, bxy, lwd=1.5, lty=2, col=effect_col[2])
    if(!is.na(bxy)) abline(0, bxy2, lwd=1.5, lty=2, col=effect_col[1])
    ## Standard errors
    nsnps = length(bzx)
    for( i in 1:nsnps ) {
        # x axis
        xstart = bzx[i] - bzx_se[i]; xend = bzx[i] + bzx_se[i]
        ystart = bzy[i]; yend = bzy[i]
        segments(xstart, ystart, xend, yend, lwd=1.5, col=effect_col[pleiotropicSNP[i]])
        # y axis
        xstart = bzx[i]; xend = bzx[i] 
        ystart = bzy[i] - bzy_se[i]; yend = bzy[i] + bzy_se[i]
        segments(xstart, ystart, xend, yend, lwd=1.5, col=effect_col[pleiotropicSNP[i]])
    }
    # Add GSMR statistcis
    #text(xmin+std, ymin+5*std , "GSMR", adj = c(0,0))
    #text(xmin+std, ymin+4*std , bquote(hat(beta)[xy] == .(b)), adj = c(0,0))
    #text(xmin+std, ymin+3*std , paste("SE =",se), adj = c(0,0))
    #text(xmin+std, ymin+2*std , bquote(P[xy] == .(p)), adj = c(0,0))


# ---- P-values in both GWAS
plot_gsmr_pvalue(gsmr_noHeidi, expo_str, outcome_str, effect_col=effect_col[pleiotropicSNP])



# ---- P-value of exposure against effect size of 
plot_bxy_distribution(gsmr_noHeidi, expo_str, outcome_str, effect_col="lightcoral")
mtext('Without HEIDI-outlier filtering', side=3, line=2, outer=F, cex=.8)
plot_bxy_distribution(gsmr_newHeidi, expo_str, outcome_str, effect_col="lightcoral")
mtext('With HEIDI-outlier filtering', side=3, line=2, outer=F, cex=.8)

—– bxy obtained without HEIDI-outlier method (i.e. including pleiotropic associations)
—– bxy obtained with HEIDI-outlier method (i.e. after removing pleiotropic associations)
Variants represented in yellow are pleiotropic associations identified, and removed, with the HEIDI-outlier method.

Before mtCOJO, we see a causal effect of BMI on vitamin D, but not of vitamin D on BMI (see below).


GSMR after mtCOJO

# Get GSMR results beofre mtCOJO
gsmr_noHeidi=gsmr_noHeidi_am
gsmr_newHeidi=gsmr_newHeidi_am

# Define exposure and outcome
expo_str="BMI"
outcome_str="BMI-conditioned-vitamin D"


# ##############################################
# Plot mtCOJO GSMR results - Effect of BMI on vitamin D
# ##############################################
par(mfrow=c(1, 4))
par(mar=c(4.5, 4.5, 4, 2))

# Summary of results
tmp=as.data.frame(rbind(gsmr_noHeidi[[2]],gsmr_newHeidi[[2]]))
tmp$Outcome=rep(c("Without HEIDI-outlier filtering","With HEIDI-outlier filtering"),each=2)
names(tmp)[grep("Outcome",names(tmp))]=""
tmp=kable(tmp[tmp$Exposure==expo_str,-1], row.names=F)
kable_styling(tmp, full_width=F, position = "left")
bxy se p n_snps
Without HEIDI-outlier filtering 0.0192748 0.00454385 2.21571e-05 1090
With HEIDI-outlier filtering 0.038672 0.00483553 1.27002e-15 966
# Define colors
effect_col=c("#9EC1C2","#FAC477")

# Identify pleiotropic SNPs
tmp=gsmr_noHeidi[[3]]
all_snps=tmp[tmp[,2]==1,1]
tmp=gsmr_newHeidi[[3]]
non_pleiotropic_snps=tmp[tmp[,2]==1,1]
pleiotropicSNP=factor(all_snps %in% non_pleiotropic_snps, levels=c("TRUE","FALSE"), labels=c("no","yes"))


# ---- Effect sizes in both GWAS
  # Define sumstats to plot
  resbuf = gsmr_snp_effect(gsmr_noHeidi, expo_str, outcome_str)
  bxy = resbuf$bxy
  bzx = resbuf$bzx; bzx_se = resbuf$bzx_se
  bzy = resbuf$bzy; bzy_se = resbuf$bzy_se
  tmp=gsmr_noHeidi[[2]]
  b=round(bxy,4)
  se=round(as.numeric(tmp[tmp[,1]==expo_str,4]), 4)
  p=format(as.numeric(tmp[tmp[,1]==expo_str,5]), digits=3)
  resbuf = gsmr_snp_effect(gsmr_newHeidi, expo_str, outcome_str)
  bxy2 = resbuf$bxy
  # Plot
    vals = c(bzx-bzx_se, bzx+bzx_se)
    xmin = min(vals); xmax = max(vals)
    vals = c(bzy-bzy_se, bzy+bzy_se)
    ymin = min(vals); ymax = max(vals)
    std = sd(vals)
    plot(bzx, bzy, pch=20, cex=0.8, bty="n", cex.axis=1.1, cex.lab=1.2,
         col=effect_col[pleiotropicSNP], xlim=c(xmin, xmax), ylim=c(ymin, ymax),
         xlab=substitute(paste(trait, " (", italic(b[zx]), ")", sep=""), list(trait=expo_str)),
         ylab=substitute(paste(trait, " (", italic(b[zy]), ")", sep=""), list(trait=outcome_str)))
    ## Regression line
    if(!is.na(bxy)) abline(0, bxy, lwd=1.5, lty=2, col=effect_col[2])
    if(!is.na(bxy)) abline(0, bxy2, lwd=1.5, lty=2, col=effect_col[1])
    ## Standard errors
    nsnps = length(bzx)
    for( i in 1:nsnps ) {
        # x axis
        xstart = bzx[i] - bzx_se[i]; xend = bzx[i] + bzx_se[i]
        ystart = bzy[i]; yend = bzy[i]
        segments(xstart, ystart, xend, yend, lwd=1.5, col=effect_col[pleiotropicSNP[i]])
        # y axis
        xstart = bzx[i]; xend = bzx[i] 
        ystart = bzy[i] - bzy_se[i]; yend = bzy[i] + bzy_se[i]
        segments(xstart, ystart, xend, yend, lwd=1.5, col=effect_col[pleiotropicSNP[i]])
    }

# ---- P-values in both GWAS
plot_gsmr_pvalue(gsmr_noHeidi, expo_str, outcome_str, effect_col=effect_col[pleiotropicSNP])



# ---- P-value of exposure against effect size of 
plot_bxy_distribution(gsmr_noHeidi, expo_str, outcome_str, effect_col="lightcoral")
mtext('Without HEIDI-outlier filtering', side=3, line=2, outer=F, cex=.8)
plot_bxy_distribution(gsmr_newHeidi, expo_str, outcome_str, effect_col="lightcoral")
mtext('With HEIDI-outlier filtering', side=3, line=2, outer=F, cex=.8)

—– bxy obtained without HEIDI-outlier method (i.e. including pleiotropic associations)
—– bxy obtained with HEIDI-outlier method (i.e. after removing pleiotropic associations)
Variants represented in yellow are pleiotropic associations identified, and removed, with the HEIDI-outlier method.

After mtCOJO the effect of BMI on vitamin D levels is greatly attenuated.


2SMR

# Load and format 2SMR results
df=read.table("results/fastGWA/twosmr/ukbBMI.2smr", h=T, stringsAsFactors=F, sep="\t")
df$heidi=gsub(".*\\.","",df$exposure)
df$exposure=gsub("\\..*","",df$exposure)
df=df[,c("exposure","outcome","method","heidi","nsnp","b","se","pval")]
df=df[df$exposure=="ukbBMI",]

## Round results
  df$pval=format(df$pval, digits=3)
  cols=grep("se|b",names(df)); df[,cols]=apply(df[,cols], 2, function(x){round(x, 4)})

# Present table
datatable(df, rownames=FALSE, 
          options=list(pageLength=5, 
                       dom='frtipB', 
                       buttons=c('csv', 'excel'), 
                       scrollX=TRUE), 
          caption="2SMR results for for BMi on 25OHD",  
          extensions=c('Buttons','FixedColumns')) %>% 
  formatStyle("pval","white-space"="nowrap") 




Effect of vitamin D on BMI

GSMR before mtCOJO

# Get GSMR results beofre mtCOJO
gsmr_noHeidi=gsmr_noHeidi_bm
gsmr_newHeidi=gsmr_newHeidi_bm

# Define exposure and outcome
expo_str="Vitamin D"
outcome_str="BMI"


# ##############################################
# Plot GSMR results - Effect of vitamin D on BMI
# ##############################################
par(mfrow=c(1, 4))
par(mar=c(4.5, 4.5, 4, 2))

# Summary of results
tmp=as.data.frame(rbind(gsmr_noHeidi[[2]],gsmr_newHeidi[[2]]), stringsAsFactors=F)
tmp$Outcome=rep(c("Without HEIDI-outlier filtering","With HEIDI-outlier filtering"),each=2)
tmp=tmp[tmp$Exposure==expo_str,-1]
  # Add results from pleiotropic SNPs
  names(gsmr_vitDPleioSNPs_bm)[grep("nsnp",names(gsmr_vitDPleioSNPs_bm))]="n_snps"
  gsmr_vitDPleioSNPs_bm$Outcome="Pleiotropic SNPs"
  gsmr_vitDPleioSNPs_bm$Exposure=NULL
  tmp=rbind(tmp,gsmr_vitDPleioSNPs_bm)
names(tmp)[grep("Outcome",names(tmp))]=""
tmp=kable(tmp, row.names=F)
kable_styling(tmp, full_width=F, position = "left")
bxy se p n_snps
Without HEIDI-outlier filtering -0.016529 0.00598459 0.00574623 277
With HEIDI-outlier filtering 0.00806448 0.00631405 0.201522 210
Pleiotropic SNPs -0.144461 0.0165637 2.74477e-18 67
# Define colors
effect_col=c("#9EC1C2","#FAC477")

# Identify pleiotropic SNPs
tmp=gsmr_noHeidi[[3]]
all_snps=tmp[tmp[,3]==1,1]
tmp=gsmr_newHeidi[[3]]
non_pleiotropic_snps=tmp[tmp[,3]==1,1]
pleiotropicSNP=factor(all_snps %in% non_pleiotropic_snps, levels=c("TRUE","FALSE"), labels=c("no","yes"))

  

# ---- Effect sizes in both GWAS
  # Define sumstats to plot
  resbuf = gsmr_snp_effect(gsmr_noHeidi, expo_str, outcome_str)
  bxy = resbuf$bxy
  bzx = resbuf$bzx; bzx_se = resbuf$bzx_se
  bzy = resbuf$bzy; bzy_se = resbuf$bzy_se
  tmp=gsmr_noHeidi[[2]]
  b=round(bxy,4)
  se=round(as.numeric(tmp[tmp[,1]==expo_str,4]), 4)
  p=format(as.numeric(tmp[tmp[,1]==expo_str,5]), digits=3)
  resbuf = gsmr_snp_effect(gsmr_newHeidi, expo_str, outcome_str)
  bxy2 = resbuf$bxy
  # Plot
    vals = c(bzx-bzx_se, bzx+bzx_se)
    xmin = min(vals); xmax = max(vals)
    vals = c(bzy-bzy_se, bzy+bzy_se)
    ymin = min(vals); ymax = max(vals)
    std = sd(vals)
    plot(bzx, bzy, pch=20, cex=0.8, bty="n", cex.axis=1.1, cex.lab=1.2,
         col=effect_col[pleiotropicSNP], xlim=c(xmin, xmax), ylim=c(ymin, ymax),
         xlab=substitute(paste(trait, " (", italic(b[zx]), ")", sep=""), list(trait=expo_str)),
         ylab=substitute(paste(trait, " (", italic(b[zy]), ")", sep=""), list(trait=outcome_str)))
    ## Regression line
    if(!is.na(bxy)) abline(0, bxy, lwd=1.5, lty=2, col=effect_col[2])
    if(!is.na(bxy)) abline(0, bxy2, lwd=1.5, lty=2, col=effect_col[1])
    ## Standard errors
    nsnps = length(bzx)
    for( i in 1:nsnps ) {
        # x axis
        xstart = bzx[i] - bzx_se[i]; xend = bzx[i] + bzx_se[i]
        ystart = bzy[i]; yend = bzy[i]
        segments(xstart, ystart, xend, yend, lwd=1.5, col=effect_col[pleiotropicSNP[i]])
        # y axis
        xstart = bzx[i]; xend = bzx[i] 
        ystart = bzy[i] - bzy_se[i]; yend = bzy[i] + bzy_se[i]
        segments(xstart, ystart, xend, yend, lwd=1.5, col=effect_col[pleiotropicSNP[i]])
    }


# ---- P-values in both GWAS
plot_gsmr_pvalue(gsmr_noHeidi, expo_str, outcome_str, effect_col=effect_col[pleiotropicSNP])



# ---- P-value of exposure against effect size of 
plot_bxy_distribution(gsmr_noHeidi, expo_str, outcome_str, effect_col="lightcoral")
mtext('Without HEIDI-outlier filtering', side=3, line=2, outer=F, cex=.8)
plot_bxy_distribution(gsmr_newHeidi, expo_str, outcome_str, effect_col="lightcoral")
mtext('With HEIDI-outlier filtering', side=3, line=2, outer=F, cex=.8)

—– bxy obtained without HEIDI-outlier method (i.e. including pleiotropic associations)
—– bxy obtained with HEIDI-outlier method (i.e. after removing pleiotropic associations)
Variants represented in yellow are pleiotropic associations identified, and removed, with the HEIDI-outlier method.




GSMR after mtCOJO

# Get GSMR results beofre mtCOJO
gsmr_noHeidi=gsmr_noHeidi_am
gsmr_newHeidi=gsmr_newHeidi_am

# Define exposure and outcome
expo_str="BMI-conditioned-vitamin D"
outcome_str="BMI"


# ##############################################
# Plot mtCOJO GSMR results - Effect of vitamin D on BMI
# ##############################################
par(mfrow=c(1, 4))
par(mar=c(4.5, 4.5, 4, 2))

# Summary of results
tmp=as.data.frame(rbind(gsmr_noHeidi[[2]],gsmr_newHeidi[[2]]))
tmp$Outcome=rep(c("Without HEIDI-outlier filtering","With HEIDI-outlier filtering"),each=2)
names(tmp)[grep("Outcome",names(tmp))]=""
tmp=kable(tmp[tmp$Exposure==expo_str,-1], row.names=F)
kable_styling(tmp, full_width=F, position = "left")
bxy se p n_snps
Without HEIDI-outlier filtering 0.0214569 0.00594841 0.000309563 285
With HEIDI-outlier filtering 0.0143795 0.00624067 0.0212141 221
# Define colors
effect_col=c("#9EC1C2","#FAC477")

# Identify pleiotropic SNPs
tmp=gsmr_noHeidi[[3]]
all_snps=tmp[tmp[,3]==1,1]
tmp=gsmr_newHeidi[[3]]
non_pleiotropic_snps=tmp[tmp[,3]==1,1]
pleiotropicSNP=factor(all_snps %in% non_pleiotropic_snps, levels=c("TRUE","FALSE"), labels=c("no","yes"))

  

# ---- Effect sizes in both GWAS
  # Define sumstats to plot
  resbuf = gsmr_snp_effect(gsmr_noHeidi, expo_str, outcome_str)
  bxy = resbuf$bxy
  bzx = resbuf$bzx; bzx_se = resbuf$bzx_se
  bzy = resbuf$bzy; bzy_se = resbuf$bzy_se
  tmp=gsmr_noHeidi[[2]]
  b=round(bxy,4)
  se=round(as.numeric(tmp[tmp[,1]==expo_str,4]), 4)
  p=format(as.numeric(tmp[tmp[,1]==expo_str,5]), digits=3)
  resbuf = gsmr_snp_effect(gsmr_newHeidi, expo_str, outcome_str)
  bxy2 = resbuf$bxy
  # Plot
    vals = c(bzx-bzx_se, bzx+bzx_se)
    xmin = min(vals); xmax = max(vals)
    vals = c(bzy-bzy_se, bzy+bzy_se)
    ymin = min(vals); ymax = max(vals)
    std = sd(vals)
    plot(bzx, bzy, pch=20, cex=0.8, bty="n", cex.axis=1.1, cex.lab=1.2,
         col=effect_col[pleiotropicSNP], xlim=c(xmin, xmax), ylim=c(ymin, ymax),
         xlab=substitute(paste(trait, " (", italic(b[zx]), ")", sep=""), list(trait=expo_str)),
         ylab=substitute(paste(trait, " (", italic(b[zy]), ")", sep=""), list(trait=outcome_str)))
    ## Regression line
    if(!is.na(bxy)) abline(0, bxy, lwd=1.5, lty=2, col=effect_col[2])
    if(!is.na(bxy)) abline(0, bxy2, lwd=1.5, lty=2, col=effect_col[1])
    ## Standard errors
    nsnps = length(bzx)
    for( i in 1:nsnps ) {
        # x axis
        xstart = bzx[i] - bzx_se[i]; xend = bzx[i] + bzx_se[i]
        ystart = bzy[i]; yend = bzy[i]
        segments(xstart, ystart, xend, yend, lwd=1.5, col=effect_col[pleiotropicSNP[i]])
        # y axis
        xstart = bzx[i]; xend = bzx[i] 
        ystart = bzy[i] - bzy_se[i]; yend = bzy[i] + bzy_se[i]
        segments(xstart, ystart, xend, yend, lwd=1.5, col=effect_col[pleiotropicSNP[i]])
    }


# ---- P-values in both GWAS
plot_gsmr_pvalue(gsmr_noHeidi, expo_str, outcome_str, effect_col=effect_col[pleiotropicSNP])



# ---- P-value of exposure against effect size of 
plot_bxy_distribution(gsmr_noHeidi, expo_str, outcome_str, effect_col="lightcoral")
mtext('Without HEIDI-outlier filtering', side=3, line=2, outer=F, cex=.8)
plot_bxy_distribution(gsmr_newHeidi, expo_str, outcome_str, effect_col="lightcoral")
mtext('With HEIDI-outlier filtering', side=3, line=2, outer=F, cex=.8)

—– bxy obtained without HEIDI-outlier method (i.e. including pleiotropic associations)
—– bxy obtained with HEIDI-outlier method (i.e. after removing pleiotropic associations)
Variants represented in yellow are pleiotropic associations identified, and removed, with the HEIDI-outlier method.






2SMR

# Load and format 2SMR results
df=read.table("results/fastGWA/twosmr/ukbBMI.2smr", h=T, stringsAsFactors=F, sep="\t")
df$heidi=gsub(".*\\.","",df$exposure)
df$exposure=gsub("\\..*","",df$exposure)
df=df[,c("exposure","outcome","method","heidi","nsnp","b","se","pval")]
df=df[df$exposure=="vitD",]

## Round results
  df$pval=format(df$pval, digits=3)
  cols=grep("se|b",names(df)); df[,cols]=apply(df[,cols], 2, function(x){round(x, 4)})

# Present table
datatable(df, rownames=FALSE, 
          options=list(pageLength=5, 
                       dom='frtipB', 
                       buttons=c('csv', 'excel'), 
                       scrollX=TRUE), 
          caption="2SMR results for for BMi on 25OHD",  
          extensions=c('Buttons','FixedColumns')) %>% 
  formatStyle("pval","white-space"="nowrap") 






BMI-EA-25OHD

# GSMR between EA and 25OHD
  ## No HEIDI
  df=read.table("results/fastGWA/gsmr/vitD_fastGWA_withX.EA2018.gsmr_with20kUKB_noHeidi.gsmr", h=T, stringsAsFactors=F)
  df$heidi="No HEIDI"
  ## With HEIDI
  tmp=read.table("results/fastGWA/gsmr/vitD_fastGWA_withX.EA2018.gsmr_with20kUKB_2heidi.gsmr", h=T, stringsAsFactors=F)
  tmp$heidi="With HEIDI"
  df=rbind(df,tmp[,names(df)])
  assign("ea_vitD_gsmr", df)
  ## Present table
  datatable(df, 
          rownames=FALSE, 
          extensions=c('Buttons','FixedColumns'),
          options=list(dom='frtipB',
                       buttons=c('csv', 'excel'),
                       scrollX=TRUE),
          caption="Bi-directional GSMR associations between vitamin D (25OHD) and Educational attainment (EA)")
# GSMR between EA and BMI
  ## No HEIDI
  df=read.table("results/fastGWA/gsmr/ukbBMI.EA2018.gsmr_with20kUKB_noHeidi.gsmr", h=T, stringsAsFactors=F)
  df$heidi="No HEIDI"
  ## With HEIDI
  tmp=read.table("results/fastGWA/gsmr/ukbBMI.EA2018.gsmr_with20kUKB_2heidi.gsmr", h=T, stringsAsFactors=F)
  tmp$heidi="With HEIDI"
  df=rbind(df,tmp[,names(df)])
    ## Present table
  datatable(df, 
          rownames=FALSE, 
          extensions=c('Buttons','FixedColumns'),
          options=list(dom='frtipB',
                       buttons=c('csv', 'excel'),
                       scrollX=TRUE),
          caption="Bi-directional GSMR associations between Educational attainment (EA) and BMI")








25OHD conditioned on BMI (mtCOJO)

# ########################
# Summarise mtCOJO results
# ########################

# mtCOJO (vitamin D condition on BMI)
#mtcojo=read.table("results/fastGWA/mtcojo.vitD_fastGWA_condition_on_ukbBMI.mtcojo.cma",h=T,stringsAsFactors = F)
#mtcojo=merge(mtcojo,vitD[,c("SNP","CHR","BP")])
#saveRDS(mtcojo, file="results/fastGWA/mtcojo.vitD_fastGWA_condition_on_ukbBMI.mtcojo.RDS")
mtcojo=readRDS("results/fastGWA/mtcojo.vitD_fastGWA_condition_on_ukbBMI.mtcojo.RDS")
cojo_vitD_BMIcond=read.table("results/fastGWA/mtcojo.vitD_fastGWA_condition_on_ukbBMI.cojo", h=T, stringsAsFactors=F)

Summary of results

After conditioning the vitamin D GWAS results on BMI (with mtCOJO), we have:

  • 7250403 variants tested – none in the X chromosome, as the BMI GWAS did not have summary statistics on the X chromosome
  • 16012 genome-wide significant (GWS; P<5x10-8) hits
  • 155 independent variants (determined with COJO)
  • 147 independent variants (determined with COJO) that were GWS in the original GWAS

# ################################################
# Present list of independent associations (identified with COJO)
# ################################################
indep = cojo_vitD_BMIcond %>% arrange(Chr, bp)

# Format table
indep=indep[,c("Chr","SNP","bp","refA","freq","b","se","p","n","freq_geno","bJ","bJ_se","pJ")]
indep$p=format(indep$p, digits=3)
indep$pJ=format(indep$pJ, digits=3)
  options(warn=-1)
indep=indep %>% mutate_at(vars(se, bJ_se, b, bJ), funs(round(., 4)))
  options(warn=0)
indep=indep %>% mutate_at(vars(freq, freq_geno), funs(round(., 2)))


# Present table
datatable(indep, 
          rownames=FALSE, 
          extensions=c('Buttons','FixedColumns'),
          options=list(pageLength=5,
                       dom='frtipB',
                       buttons=c('csv', 'excel'),
                       scrollX=TRUE),
          caption="List of independent associations") %>% 
  formatStyle("p","white-space"="nowrap") %>% 
  formatStyle("pJ","white-space"="nowrap")

Columns are: Chr, chromosome; SNP, SNP rs ID; bp, physical position; refA, the effect allele; type, type of QTL; freq, frequency of the effect allele in the original data; b, se and p, effect size, standard error and p-value from the original GWAS; n, estimated effective sample size; freq_geno, frequency of the effect allele in the reference sample; bJ, bJ_se and pJ, effect size, standard error and p-value from a joint analysis of all the selected SNPs.






Comparison of GWAS with different BMI adjustments

Manhattan plots

# ########################
# Load GWAS results
# ########################
# vitamin D
#gwas=read.table("results/fastGWA/vitD_fastGWA_withX.gz", h=T, stringsAsFactors=F)
#saveRDS(gwas, file="results/fastGWA/vitD_fastGWA_withX.RDS")
vitD=readRDS("results/fastGWA/vitD_fastGWA_withX.RDS")
names(vitD)[grep("POS",names(vitD))]="BP"
  
# BMI
#bmi=read.table("input/gwas_sumstats/0_original/ukbEUR_BMI_common.txt", h=T, stringsAsFactors=F)
#saveRDS(bmi, file="input/gwas_sumstats/0_original/ukbEUR_BMI_common.RDS")
bmi=readRDS("input/gwas_sumstats/0_original/ukbEUR_BMI_common.RDS")

# Vitamin D with BMI as covariate
#vitD_BMIcov=read.table("results/fastGWA/vitD_BMIcov_fastGWA_withX.gz", h=T, stringsAsFactors=F)
#saveRDS(vitD_BMIcov, file="results/fastGWA/vitD_BMIcov_fastGWA_withX.RDS")
vitD_BMIcov=readRDS("results/fastGWA/vitD_BMIcov_fastGWA_withX.RDS")
names(vitD_BMIcov)[grep("POS",names(vitD_BMIcov))]="BP"

# ########################
# Manhattan plots
# ########################
par(mfrow=c(2, 2))

p=0.01 # P-value threhold from which to plot everything or just a sample (e.g. plot all SNPs with P<0.03 and a sample of 100K with P>0.03)

# mtCOJO
  tmp=mtcojo
  tmp$P=tmp$bC_pval
  m=nrow(tmp)
  # Remove SNPs with P=0
  tmp=tmp[tmp$P!=0,]
  # Select sample of SNPs to plot faster
  snps=c(tmp[tmp$P<p,"SNP"], sample(tmp[tmp$P>p,"SNP"],10000))
  tmp=tmp[tmp$SNP %in% snps,]
  # Plot
  manhattan(tmp, col=c("#C9D9D9","#609DA0"),ylim=c(0, 50))
  mtext('Vitamin D conditioned on BMI (mtCOJO)', side=3, line=2.5, outer=F, cex=.9)
  mtext(paste('M =',m,"SNPs"), side=3, line=1.5, outer=F, cex=.9)
  
# Vitamin D GWAS restricted to variants in the BMI GWAS
  tmp=vitD[vitD$SNP %in% mtcojo$SNP,]
  m=nrow(tmp)
  # Remove SNPs with P=0
  tmp=tmp[tmp$P!=0,]
  # Select sample of SNPs to plot faster
  snps=c(tmp[tmp$P<p,"SNP"], sample(tmp[tmp$P>p,"SNP"],50000))
  tmp=tmp[tmp$SNP %in% snps,]
  # Plot
  manhattan(tmp, col=c("#C9D9D9","#609DA0"),ylim=c(0, 50))
  mtext('Vitamin D GWAS restricted to variants in the BMI GWAS', side=3, line=2.5, outer=F, cex=.9)
  mtext(paste('M =',m,"SNPs"), side=3, line=1.5, outer=F, cex=.9)
  
# Original vitamin D GWAS
  tmp=vitD
  m=nrow(tmp)
  # Remove SNPs with P=0
  tmp=tmp[tmp$P!=0,]
  # Select sample of SNPs to plot faster
  snps=c(tmp[tmp$P<p,"SNP"], sample(tmp[tmp$P>p,"SNP"],50000))
  tmp=tmp[tmp$SNP %in% snps,]
  # Plot
  manhattan(tmp, col=c("#C9D9D9","#609DA0"),ylim=c(0, 50))
  mtext('Original vitamin D GWAS', side=3, line=2.5, outer=F, cex=.9)
  mtext(paste('M =',m,"SNPs"), side=3, line=1.5, outer=F, cex=.9)

# Vitamin D GWAS with BMI as covariate
  tmp=vitD_BMIcov
  m=nrow(tmp)
  # Remove SNPs with P=0
  tmp=tmp[tmp$P!=0,]
# Select sample of SNPs to plot faster
  snps=c(tmp[tmp$P<p,"SNP"], sample(tmp[tmp$P>p,"SNP"],50000))
  tmp=tmp[tmp$SNP %in% snps,]
  # Plot
  manhattan(tmp, col=c("#C9D9D9","#609DA0"),ylim=c(0, 50))
  mtext('Vitamin D GWAS with BMI as covariate', side=3, line=2.5, outer=F, cex=.9)
  mtext(paste('M =',m,"SNPs"), side=3, line=1.5, outer=F, cex=.9)

Note: The Y-axis on the Manhattan plots above has been restricted to P=10-50.

Comparison of P-values

# GWS SNPs in vitamin D GWAS
tmp=mtcojo[mtcojo$p!=0 & mtcojo$bC_pval!=0,]
tmp=tmp[tmp$p<5e-8,]
#p1=
  ggplot(mtcojo, aes(-log10(p), -log10(bC_pval))) +
  geom_point(color="#609DA0") +
  geom_smooth(method="lm", color="#C9D9D9") +
  labs(x=expression("Vitamin D (-log"[10]*"P)"), 
       y=expression("Vitamin D conditioned on BMI (-log"[10]*"P)"),
       title="GWS SNPs in vitamin D GWAS",
       subtitle=paste("M =",nrow(tmp),"SNPs"))
## `geom_smooth()` using formula 'y ~ x'

# GWS SNPs in mtCOJO
tmp=mtcojo[mtcojo$p!=0 & mtcojo$bC_pval!=0,]
tmp=tmp[tmp$bC_pval<5e-8,]
#p2=
  ggplot(mtcojo, aes(-log10(p), -log10(bC_pval))) +
  geom_point(color="#609DA0") +
  geom_smooth(method="lm", color="#C9D9D9") +
  labs(x=expression("Vitamin D (-log"[10]*"P)"), 
       y=expression("Vitamin D conditioned on BMI (-log"[10]*"P)"),
       title="GWS SNPs in mtCOJO",
       subtitle=paste("M =",nrow(tmp),"SNPs"))
## `geom_smooth()` using formula 'y ~ x'

#grid.arrange(p1, p2, ncol=2)



Comparison of effect sizes

# GWS SNPs in vitamin D GWAS
tmp=mtcojo[mtcojo$p!=0 & mtcojo$bC_pval!=0,]
tmp=tmp[tmp$p<5e-8,]

  # Association between effect sizes
  test=lm(tmp$b~tmp$bC)
  beta=round(summary(test)$coefficients[2,1],4)
  p=summary(test)$coefficients[2,4]
  
 
  # Plot effect sizes
  plot(tmp$b, tmp$bC, pch=20, cex=0.8, cex.axis=1.1, cex.lab=1.2, col="lightcoral",
       xlab="Viatmin D", ylab="Vitamin D conditioned on BMI",
       main="Effect size estimates (and SE) for SNPs with\n GWS association in vitamin D GWAS")
        # Add SE
        nsnps = nrow(tmp)
        for( i in 1:nsnps ) {
          # x axis
          xstart = tmp[i,"b"] - tmp[i,"se"]; xend = tmp[i,"b"] + tmp[i,"se"]
          ystart = tmp[i,"bC"]; yend = tmp[i,"bC"]
          segments(xstart, ystart, xend, yend, lwd=1.5, col="lightcoral")
          # y axis
          xstart = tmp[i,"b"]; xend = tmp[i,"b"]
          ystart = tmp[i,"bC"] - tmp[i,"bC_se"]; yend = tmp[i,"bC"] + tmp[i,"bC_se"]
          segments(xstart, ystart, xend, yend, lwd=1.5, col="lightcoral")
        }
        # Add regression line and statistics
        abline(test)
        text(min(tmp$b), max(tmp$bC), adj=c(0,1), label=paste0("b=",beta))
        text(min(tmp$b), max(tmp$bC)-sd(tmp$bC), adj=c(0,1), label=paste0("p=",p))



Comparison of GWAS sumstats

There were:

  • 18864 GWS associations in the unadjusted vitamin D GWAS
    • 1262 also significant in the BMI GWAS
  • 7250403 variants with results after mtCOJO. Of these:
    • 15150 were GWS before mtCOJO
      • 14142 remained significant after mtCOJO
      • 1008 no longer significant after mtCOJO
    • 16012 were GWS after mtCOJO
      • 1870 not significant before mtCOJO


# ###############################################
# Venn diagram - COJO with overlapping SNPs only
# ###############################################

# What COJO variants in one analysis were also COJO in the remaining two?
  ## Read all COJO results
  cojo_vitD=read.table("results/fastGWA/bmi_corrections/vitD.cojo", h=T, stringsAsFactors=F)
  cojo_vitD_BMIcov=read.table("results/fastGWA/bmi_corrections/vitD_BMIcov.cojo", h=T, stringsAsFactors=F)
  cojo_vitD_BMIcond=read.table("results/fastGWA/bmi_corrections/vitD_BMIcond.cojo", h=T, stringsAsFactors=F)
  
## Generate lists of COJO and GWS SNPs
  cojo_vitD=cojo_vitD$SNP
  cojo_vitD_BMIcov=cojo_vitD_BMIcov$SNP
  cojo_vitD_BMIcond=cojo_vitD_BMIcond$SNP

## COJO sets to loop through
  snpSet=c("cojo_vitD","cojo_vitD_BMIcov","cojo_vitD_BMIcond")
  snpSetNames=data.frame(snpSet=snpSet,
                          snpSetNames=factor(1:3,
                                              labels=c(paste0("COJO variants in\nVitamin D GWAS\n(N=",length(cojo_vitD),")"),
                                                       paste0("COJO variants in\nVitamin D GWAS with BMI cov\n(N=",length(cojo_vitD_BMIcov),")"),
                                                       paste0("COJO variants in\nVitamin D GWAS conditioned on BMI\n(N=",length(cojo_vitD_BMIcond),")"))))
## Position of Venn diagram circles
  df.venn=data.frame(x=c(-0.866, 0.866, 0),
                    y=c(-0.5, -0.5, 1),
                     labels=factor(c("cojo_vitD","cojo_vitD_BMIcov","cojo_vitD_BMIcond"),
                                   levels=c("cojo_vitD","cojo_vitD_BMIcov","cojo_vitD_BMIcond"),
                                   labels=c("Vitamin D GWAS","Vitamin D GWAS with BMI covariate","Vitamin D GWAS conditioned on BMI (mtCOJO)")),
                   snpSet=rep(snpSet, each=3))
  df.venn=merge(df.venn,snpSetNames)

## Counts for each circle in each COJO set
  df.vdc=c()
  for(i in snpSet)
  {
    cojoSNPs=get(i)
    tmp=data.frame(cojo_vitD=c(cojoSNPs %in% cojo_vitD),
                   cojo_vitD_BMIcov=c(cojoSNPs %in% cojo_vitD_BMIcov),
                   cojo_vitD_BMIcond=c(cojoSNPs %in% cojo_vitD_BMIcond))
    vdc=vennCounts(tmp)
    class(vdc)='matrix'
    vdc=as.data.frame(vdc)[-1,]
    vdc$x=c(0, 1.2, 0.8, -1.2, -0.8, 0, 0)
    vdc$y=c(1.2, -0.6, 0.5, -0.6, 0.5, -1, 0)
    vdc$snpSet=i
    df.vdc=rbind(df.vdc,vdc)
  }
  df.vdc=merge(df.vdc,snpSetNames)
  
  ggplot(df.venn) +
    geom_circle(aes(x0=x, y0=y, r=1.5, fill=labels), alpha=.3, size=1, colour='grey') +
    facet_wrap(.~snpSetNames) +
    coord_fixed() +
    theme_void() +
    theme(legend.position='bottom',
          strip.text=element_text(face="bold")) +
    scale_fill_manual(values=c('cornflowerblue', 'firebrick',  'gold'),
                      name="Number of COJO variants in:") +
    scale_colour_manual(values=c('cornflowerblue', 'firebrick', 'gold'), guide=F) +
    labs(fill=NULL) +
    geom_text(data=df.vdc, mapping=aes(x=x, y=y, label=Counts))

# ###############################################
# Venn diagram - overlap of independent associations
# ###############################################

# What COJO variants in one analysis were GWS in the remaining two?
  ## Read all COJO results
  cojo_vitD=read.table("results/fastGWA/vitD_fastGWA_maf0.01_withX.cojo", h=T, stringsAsFactors=F)
  cojo_vitD_BMIcov=read.table("results/fastGWA/vitD_BMIcov_fastGWA_maf0.01_withX.cojo", h=T, stringsAsFactors=F)
  cojo_vitD_BMIcond=read.table("results/fastGWA/mtcojo.vitD_fastGWA_condition_on_ukbBMI.cojo", h=T, stringsAsFactors=F)
  
## Generate lists of COJO and GWS SNPs
  cojo_vitD=cojo_vitD$SNP
  cojo_vitD_BMIcov=cojo_vitD_BMIcov$SNP
  cojo_vitD_BMIcond=cojo_vitD_BMIcond$SNP
  gws_vitD=vitD[vitD$P<5e-8,"SNP"]
  gws_vitD_BMIcov=vitD_BMIcov[vitD_BMIcov$P<5e-8,"SNP"]
  gws_vitD_BMIcond=mtcojo[mtcojo$bC_pval<5e-8,"SNP"]
  gws_BMI=bmi[bmi$P<5e-8,"SNP"]

## COJO sets to loop through
  snpSet=c("cojo_vitD","cojo_vitD_BMIcov","cojo_vitD_BMIcond")
  snpSetNames=data.frame(snpSet=snpSet,
                          snpSetNames=factor(1:3,
                                              labels=c(paste0("COJO variants in\nVitamin D GWAS\n(N=",length(cojo_vitD),")"),
                                                       paste0("COJO variants in\nVitamin D GWAS with BMI cov\n(N=",length(cojo_vitD_BMIcov),")"),
                                                       paste0("COJO variants in\nVitamin D GWAS conditioned on BMI\n(N=",length(cojo_vitD_BMIcond),")"))))
## Position of Venn diagram circles
  df.venn=data.frame(x=c(-0.866, 0.866, 0),
                    y=c(-0.5, -0.5, 1),
                     labels=factor(c("gws_vitD","gws_vitD_BMIcov","gws_vitD_BMIcond"),
                                   levels=c("gws_vitD","gws_vitD_BMIcov","gws_vitD_BMIcond"),
                                   labels=c("Vitamin D GWAS","Vitamin D GWAS with BMI covariate","Vitamin D GWAS conditioned on BMI (mtCOJO)")),
                   snpSet=rep(snpSet, each=3))
  df.venn=merge(df.venn,snpSetNames)

## Counts for each circle in each COJO set
  df.vdc=c()
  for(i in snpSet)
  {
    cojoSNPs=get(i)
    tmp=data.frame(gws_vitD=c(cojoSNPs %in% gws_vitD),
                   gws_vitD_BMIcov=c(cojoSNPs %in% gws_vitD_BMIcov),
                   gws_vitD_BMIcond=c(cojoSNPs %in% gws_vitD_BMIcond))
    vdc=vennCounts(tmp)
    class(vdc)='matrix'
    vdc=as.data.frame(vdc)[-1,]
    vdc$x=c(0, 1.2, 0.8, -1.2, -0.8, 0, 0)
    vdc$y=c(1.2, -0.6, 0.5, -0.6, 0.5, -1, 0)
    vdc$snpSet=i
    df.vdc=rbind(df.vdc,vdc)
  }
  df.vdc=merge(df.vdc,snpSetNames)
  
#  ggplot(df.venn) +
#    geom_circle(aes(x0=x, y0=y, r=1.5, fill=labels), alpha=.3, size=1, colour='grey') +
#    facet_wrap(.~snpSetNames) +
#    coord_fixed() +
#    theme_void() +
#    theme(legend.position='bottom',
#          strip.text=element_text(face="bold")) +
#    scale_fill_manual(values=c('cornflowerblue', 'firebrick',  'gold'),
#                      name="Number of GWS variants in:") +
#    scale_colour_manual(values=c('cornflowerblue', 'firebrick', 'gold'), guide=F) +
#    labs(fill=NULL) +
#    geom_text(data=df.vdc, mapping=aes(x=x, y=y, label=Counts))
#The plots below show the number of variants that were identified as independent and jointly significant in each analysis (identified with COJO) and how many of those were GWS in each analysis. Note that not all COJO variants are GWS, even in the analysis where they were identified.
  
# ###############################################
# Venn diagram - overlap of GWS associations
# ###############################################
  
## GWS sets to loop through
  snpSet=c("gws_vitD","gws_vitD_BMIcov","gws_vitD_BMIcond")
  snpSetNames=data.frame(snpSet=snpSet,
                          snpSetNames=factor(1:3,
                                              labels=c(paste0("GWS variants in\nVitamin D GWAS\n(N=",length(gws_vitD),")"),
                                                       paste0("GWS variants in\nVitamin D GWAS with BMI cov\n(N=",length(gws_vitD_BMIcov),")"),
                                                       paste0("GWS variants in\nVitamin D GWAS conditioned on BMI\n(N=",length(gws_vitD_BMIcond),")"))))
## Position of Venn diagram circles
  df.venn=data.frame(x=c(-0.866, 0.866, 0),
                    y=c(-0.5, -0.5, 1),
                     labels=factor(c("gws_vitD","gws_vitD_BMIcov","gws_vitD_BMIcond"),
                                   levels=c("gws_vitD","gws_vitD_BMIcov","gws_vitD_BMIcond"),
                                   labels=c("Vitamin D GWAS","Vitamin D GWAS with BMI covariate","Vitamin D GWAS conditioned on BMI (mtCOJO)")),
                   snpSet=rep(snpSet, each=3))
  df.venn=merge(df.venn,snpSetNames)

## Counts for each circle in each COJO set
  df.vdc=c()
  for(i in snpSet)
  {
    cojoSNPs=get(i)
    tmp=data.frame(gws_vitD=c(cojoSNPs %in% gws_vitD),
                   gws_vitD_BMIcov=c(cojoSNPs %in% gws_vitD_BMIcov),
                   gws_vitD_BMIcond=c(cojoSNPs %in% gws_vitD_BMIcond))
    vdc=vennCounts(tmp)
    class(vdc)='matrix'
    vdc=as.data.frame(vdc)[-1,]
    vdc$x=c(0, 1.2, 0.8, -1.2, -0.8, 0, 0)
    vdc$y=c(1.2, -0.6, 0.5, -0.6, 0.5, -1, 0)
    vdc$snpSet=i
    df.vdc=rbind(df.vdc,vdc)
  }
  df.vdc=merge(df.vdc,snpSetNames)
  
#  ggplot(df.venn) +
#    geom_circle(aes(x0=x, y0=y, r=1.5, fill=labels), alpha=.3, size=1, colour='grey') +
#    facet_wrap(.~snpSetNames) +
#    coord_fixed() +
#    theme_void() +
#    theme(legend.position='bottom',
#          strip.text=element_text(face="bold")) +
#    scale_fill_manual(values=c('cornflowerblue', 'firebrick',  'gold'),
#                      name="Number of GWS variants in:") +
#    scale_colour_manual(values=c('cornflowerblue', 'firebrick', 'gold'), guide=F) +
#    labs(fill=NULL) +
#    geom_text(data=df.vdc, mapping=aes(x=x, y=y, label=Counts))



  
# ###############################################
# Variants that were GWS before and/or after mtCOJO
# ###############################################

# Generate data frame with all GWAS results
  df=mtcojo[mtcojo$p<5e-8 | mtcojo$bC_pval<5e-8,c("SNP","CHR","BP","A1","A2","b","se","p","bC","bC_se","bC_pval")]
  names(df)=c("SNP","CHR","BP","A1","A2",paste0(c("b","se","p"),rep(c("_vitD","_vitD_BMIcond"), each=3)))

  ## Add BMI results for those SNPs
  suffix="_bmi"
  tmp=bmi[bmi$SNP %in% df$SNP,c("SNP","A1","A2","b","se","P")]
  names(tmp)=c("SNP","a1","a2",paste0(c("b","se","p"),suffix))
  df=merge(df,tmp, all=T)
  df[df$A1==df$a2,paste0("b",suffix)]=df[df$A1==df$a2,paste0("b",suffix)]*-1
  df[,c("a1","a2")]=NULL

  ## Add vitD obtained with BMI as covariate
  suffix="_vitD_BMIcov"
  tmp=vitD_BMIcov[vitD_BMIcov$SNP %in% df$SNP,c("SNP","A1","A2","BETA","SE","P")]
  names(tmp)=c("SNP","a1","a2",paste0(c("b","se","p"),suffix))
  df=merge(df,tmp)
  df[df$A1==df$a2,paste0("b",suffix)]=df[df$A1==df$a2,paste0("b",suffix)]*-1
  df[,c("a1","a2")]=NULL

# Re-arranje columns
df=df[,c("SNP","CHR","BP","A1","A2",paste0(rep(c("b","se","p"),each=4),c("_vitD","_vitD_BMIcov","_vitD_BMIcond","_bmi")))]
df=df[order(df$p_vitD, df$p_vitD_BMIcond),]
  
# Round numbers 
  # P-values
  cols=grep("p_",names(df))
  df[,cols]=apply(df[,cols], 2, function(x){format(x, digits=3)})
  # SE and beta
  cols=grep("se_|b_",names(df))
  df[,cols]=apply(df[,cols], 2, function(x){round(x, 4)})

# Present table
datatable(df, rownames=FALSE, 
          options=list(pageLength=10,
                       dom='frtipB', 
                       buttons=c('csv', 'excel'), 
                       scrollX=TRUE),
          caption="Variants that were GWS before and/or after mtCOJO",
          
          extensions=c('Buttons','FixedColumns'))