Thursday, March 18, 2010

Create annotated GWAS manhattan plots using ggplot2 in R

*** Update April 25, 2011: This code has gone through a major revision. Please see the updated code and tutorial here. ***

 A few months ago I showed you in this post how to use some code I wrote to produce manhattan plots in R using ggplot2. The qqman() function I described in the previous post actually calls another function, manhattan(), which has a few options you can set. I recently had to update this function to allow me to color code SNPs of interest, similar to the plots shown in figure 1 of Cristen Willer's 2008 Nature Genetics paper on lipids. I'll try to explain how to utilize that feature here.

The only extra thing you'll need here is a list of SNPs that you want to highlight. The only thing - that list of SNPs can't have the "rs" prefix on the rs numbers. They must be integers. E.g. if you want to highlight rs1234 and rs5678, you would create an array containing the integers 1234 and 5678. If you already have a list of SNPs, use the substr() command to perform a substring operation to extract only the digits from the rs numbers.

Once you load in your PLINK results and your array containing the rs numbers you want to highlight, simply call the manhattan() function with the option annotate=T, and SNPlist=x, where x is the name of the vector containing rs numbers.

Here's some example code:

# This requires ggplot2
require(ggplot2)

# First, load these functions from source:
source("http://dl.dropbox.com/u/66281/0_Permanent/qqman.r")

# Next, load your PLINK results file to a data frame:
mydata=read.table("plink.qassoc", header=TRUE)

# Assuming you already have a vector of rs numbers to highlight
head(ImportantSNPs)
[1] 3821815 1851665 1621816 1403694 1656922  166479

# Call the manhattan function, with annotate=T.
# The SNPlist argument takes the list of SNPs to highlight.
# Save the plot to an object
myplot=manhattan(mydata,annotate=T,SNPlist=ImportantSNPs)

# Finally, save the plot in the current directory using ggsave()
ggsave("manhattan.png",myplot,w=12,h=9,dpi=100)

If all goes well, you should have a manhattan plot with SNPs of interest highlighted. It might look something like this:

A few tips: You can use the UCSC genome browser to look up coordinates for genes, then select rs numbers based on that range, if you want to highlight certain genes. The default color is green but you can change this on line 118 of the code at the URL above.

**** UPDATE, May 15 2014 *****
The functions described here have now been wrapped into an R package. View the updated blog post or see the online package vignette for how to install and use. If you'd still like to use the old code described here, you can access this at version 0.0.0 on GitHub. The code below likely won't work.
*****************************

13 comments:

  1. You may want to pass alpha argument in order to avoid overploting!

    ReplyDelete
  2. Not sure if you've encountered Galaxy (http://usegalaxy.org) but I'm about to push a bunch of WGA/SNP tools to test including a manhattan/qq plotter for p values from WGA models - based on your code - thanks!
    I've shared a history item so you can see what it produces at http://galaxy.rgenetics.org/datasets/6116a6bb4d400758/display/?preview=True - the history that page comes from includes some more examples at http://galaxy.rgenetics.org/u/fubar/h/unnamed-history

    ReplyDelete
  3. Hmmm... This did not work for me. I think my problem is this part:
    "if you want to highlight rs1234 and rs5678, you would create an array containing the integers 1234 and 5678. If you already have a list of SNPs, use the substr()" Well, I have 4 columns: SNP, CHR, BP, P. And my SNPlist is read in as a table: specialsnps=read.table("specialsnps.txt")
    But when I try to run the command, I get:
    "Warning message: In inherits(x, "factor") : NAs introduced by coercion"
    I changed my specialsnps.txt to have no rs before the numbers. Ideas?

    ReplyDelete
  4. RxDx... Hmm, perhaps try leaving the rs in front of the number, and the substr operation will take care of it.

    ReplyDelete
  5. You know, I tried this and I may be doing it wrong. Perhaps you could just elaborate the code you use to:
    1. Enter the vector of SNPs and
    2. Do the substring operation

    I really want to make this plot.

    ReplyDelete
  6. Is there a possiblitity to arrange multiple manhattan plots in the same image window?

    ReplyDelete
  7. Yes, see this post for arranging multiple ggplot2 plots in the same window.

    ReplyDelete
  8. Thank you for the advice, Stephen. I have been trying to finish the code with the folowing:
    myplot1=manhattan(mydata1)
    myplot2=manhattan(mydata2)
    myplot<- arrange(myplot1, myplot2)

    and I got the message:
    Error in rank(x, ties.method = "min", na.last = "keep") :
    unimplemented type 'list' in 'greater'
    Do you have any clue how to solve this?

    ReplyDelete
  9. This code is now part of the gridextra package. Perhaps you'd have luck using that package, or by emailing the developer. He's left comments here before so I'm sure he'd be willing to help you. Best of luck.

    http://code.google.com/p/gridextra/

    ReplyDelete
  10. Hi,

    I've tried using your code. The bit I don't understand is:

    # Assuming you already have a vector of rs-numbers to highlight
    # In this case al 52 TC-SNPs found by Teslovich et al.
    >head(ImportantSNPs)
    >[1] 12027135 2479409 3850634 7515577

    I try to load the Teslovich et al. TC-markers (here's a couple of them). But I get two errors:
    - object ImportantSNPs not found
    - unexpected '[' in "["

    What does this mean?

    Could you help me out in this department?

    Thanks,

    Sander

    ReplyDelete
  11. Please forgive me for my sloth in responding. In reality I can't promise to find time to investigate some of these issues at the moment – I recently started a new job, and the code I post here doesn’t do much for my CV. I hope you understand – I know it's a bit unsatisfactory – I write some software for myself for a quick one-off plot or scripting job, and I put the code on here touting it as useful, but it's not always clear how maintaining it fits into my job description. If you can fix it, or at least work out exactly what's going wrong, please leave a comment. If there are any useful changes I can make to the source code I will be happy to push a new version to GitHub and put the link here.

    ReplyDelete
  12. I recently made a few bugfixes and rewrote the manhattan plot function using base graphics, which is way faster and more memory efficient. You can grab the code here: https://gist.github.com/929702

    ReplyDelete
  13. *** Update April 25, 2011: This code has gone through a major revision. Please see the updated code and tutorial here. ***

    ReplyDelete

Note: Only a member of this blog may post a comment.

Creative Commons License
Getting Genetics Done by Stephen Turner is licensed under a Creative Commons Attribution-NonCommercial 3.0 Unported License.