Populations and haplotype networks

In this tutorial I will show you a quick and dirty way of plotting pie charts on the statistical parsimony network. This feature will be added in future updates of the package. Copy and paste the following commands at the R prompt. It requires the package ‘plotrix’ to display pie charts on the existing plot.

require("haplotypes")
require("plotrix")

1-Read your DNA data in fasta format, in this example, system file ‘example.fas’ was used.

f<-system.file("example.fas",package="haplotypes")
x<-read.fas(file=f)
#the number of DNA sequences
ns<-nrow(x)

2-Define your populations, in this example it is randomly generated.

pop<-c("pop1","pop2","pop3","pop4","pop5","pop6","pop7","pop8")
#populations
pops<-sample(pop,nrow(x),replace=TRUE)

3-Infer haplotypes from DNA sequences, in this example, indels are coded separately following the simple indel coding method. They can be coded as a fifth state character or as missing.

h<-haplotypes::haplotype(x,indels="sic")

4-Create a matrix with haplotypes as rows, populations as columns and abundance as entries.

g<-grouping(h,pops)$hapmat

5-Conduct statistical parsimony analysis with %95 connection limit. Again indels are coded following the simple indel coding method.

p<-parsimnet(x,indels="sic",prob=.95)

6-Plot statistical parsimony network. Vertex positions should be interactively adjust before plotting pie charts if pies are overlapping.

#vcoor is a two-column matrix containing the vertex positions as x,y coordinates
vcoor<-plot(p,interactive=TRUE,vertex.cex=.4,
vertex.col=1, displaylabels=FALSE)

Alternatively, you can try other vertex placement algorithms.

 vcoor<-plot(p,interactive=FALSE,vertex.cex=.4,
vertex.col=1,displaylabels=FALSE,mode= "fruchtermanreingold")

7-Get the number and the vertex positions of haplotypes and give colors for the populations.

 #the number of haplotypes in the network.
nhap<-p@nhap[1]
#vertex positions of haplotypes
vcoor<-vcoor[1:nhap,]
#colors for the populations
cols<-colors()[c(30,369,552,558,538,642,142,91)]

8-Define size of the pie charts, in this example pie sizes are proportional (divided by 5) to the number of sequences per haplotype.

rads<-apply(g,1,sum)/5

9-Display pie charts on the existing network.

#create new labels for haplotypes
labs<-paste("H",rownames(g),sep="")
for(i in 1: nhap)
{
 floating.pie(vcoor[i,1], vcoor[i,2],g[i,],edges=200,radius=rads[i],col= cols[g[i,]>0],startpos=0,shadow=FALSE)
#plot new labels on pie charts
 text(vcoor[i,1], vcoor[i,2],labels=labs[i],col="white",cex=rads[i]*2)
}

10-Add legends to the plot.

legend("topright", pop, cex=1, fill=cols)

Resultant network (click to see original image):


What if you get more than one networks (unconnected subnetworks)?

11-Conduct statistical parsimony analysis with %99 connection limit, it returns six unconnected subnetworks.

p<-parsimnet(x,indels="sic",prob=.99)

12- Suppose we use the first network. Modify the matrix ‘g’ created at step-4.

#index of haplotypes in the first network.
hi<-p@rowindex[[1]]
g<-g[hi,]
#remove empty columns (if any)
rmc<-(apply(g,2,sum)>0)
g<-g[, rmc]

13-Plot the first network.

vcoor<-plot(p,net=1,interactive=FALSE,vertex.cex=.4,
vertex.col=1,displaylabels=FALSE,mode= "fruchtermanreingold")

14-Get the number and the vertex positions of haplotypes and give colors for the populations.

 #the number of haplotypes in the first network.
nhap<-p@nhap[1]
#vertex positions of haplotypes
vcoor<-vcoor[1:nhap,]
#colors for the remaining populations
cols<-colors()[c(30,369,552,558,538,642,142,91)][rmc]

15-Define size of the pie charts.

rads<-apply(g,1,sum)/5

16-Display pie charts on the existing network.

#labels for remaining haplotypes
labs<-paste("H",rownames(g),sep="")
for(i in 1: nhap)
{
 floating.pie(vcoor[i,1], vcoor[i,2],g[i,],edges=200,radius=rads[i],col= cols[g[i,]>0],startpos=0,shadow=FALSE)
#plot new labels on pie charts
 text(vcoor[i,1], vcoor[i,2],labels=labs[i],col="white",cex=rads[i]*2)
}

17-Add legends to the plot.

legend("topright", pop[rmc], cex=1, fill=cols)
Reklamlar

Populations and haplotype networks” üzerine 9 yorum

  1. Hello, how are you?
    Sorry for the kind of naive question. I have one alignment with 185 sequences that I’d like to sample (create populations) in three groups. What is the best way to do that? Is should I create a column in the alignment object (matrix)? Or can I do that using the pop <- c(…) string?
    Thank you very much
    JP


    pop <- c(…) should work well 🙂
    -Caner

  2. Hi. I am trying to use your code to draw an haplotype network and so far I’ve been unsuccessful. I modified the code, basically changed the name of the fasta file, and I dont get a network at the end.
    Also, look when I do > require(“plotrix”)
    Loading required package: plotrix
    Warning message:
    In library(package, lib.loc = lib.loc, character.only = TRUE, logical.return = TRUE, :
    there is no package called ‘plotrix’
    ………….
    I get this message, so I installed it. also I get numbered lines which is rare. Look
    > f x<-read.fas(file=f)
    1: ns<-nrow(x)
    2: <-nrow(x)
    3: pop<-sample(("pop1","pop2","pop3","pop4","pop5","pop6","pop7","pop8"),nrow(x))
    4: pops<-sample(pop,nrow(x),replace=TRUE)
    5: h<-haplotypes::haplotype(x,indels="sic")

    and so on.

  3. I would love for this package to work with my data… it is a large alignment ~ 18kb (whole mitochondrial genomes). I keep getting an error at this step, anyone seen this before?

    > p<-parsimnet(x,indels="sic",prob=.95)
    Error in jump[prevset, nexp] <- jump[prevp, prevset] + x :
    replacement has length zero


    • Unfortunately, this is a bug. I’ve already solved it. It is complicated problem, simply copy and paste the codes does not work. In the following update of the package I will correct the problem. If it is OK for you, I can run the analysis for you if you e-mail me the alignment file. How many sequences do you have?

      • I am encountering a similar bug at the moment. Using around 60 sequences. When I plot(p, prob = 0.95) i get an error “Error in jump[prevset, nexp] <- jump[prevp, prevset] + x :
        replacement has length zero"
        When i increase probability to 0.96 or above, this error disappears. Why is it limited at this value?


        I’ve already solved it. In the following update of the package I will correct the problem. I can run the analysis for you if you e-mail me the alignment file.

  4. Hello,

    Thanks for sharing this, really appreciate!

    I tried the second tutorial- plot pie chart on a haplotype network. However, whenever I proceed at this line:
    “#the number of haplotypes in the network
    nhap<-p@nhap[1]"
    The program seems to be frozen or just takes extremely long time to execute.
    Just wondering if this is a special case for me, did I do anything wrong?

    Thank you so much for your time and help!

    Wen


    That’s interesting, what is the number of haplotypes in the first network?

    print(p)
    


    -Caner

  5. Dear all,

    I find this package very interesting, especially because “haplotypes” allow to considers indels mutations for network computation. I have encounter some difficulties for plotting my haplotype network with frequencies and indicating population origin. My dataset is a sequence alignment of 653 sequences in fasta. I’ve succeeded to use haplotype function ( so I have 31 haplotypes), distance, parsimnet and even grouping by population factor.

    x <- read.fas ("PoQ 31 haplotypes 190815 v2.fas")
    h<-haplotype(x,indels="s")
    *** S4 Object of Class Haplotype ***

    $haplotype1
    [1] "REU_A1_Mad3" "REU_A1_Mad3" "REU_A1_Mad3" "REU_A1_Mad3" "REU_A1_Mad3" "REU_A1_Mad3" "REU_A1_Mad3" "REU_A1_Mad3" "REU_A1_Mad3" "REU_A1_Mad3" "REU_A1_Mad3"
    [12] …
    $haplotype2
    [1] "REU_A1_Mad15" "MDG_A1_Mad15" "MDG_A1_Mad15" "COM_A1_Mad15" "COM_A1_Mad15" "COM_A1_Mad15" "COM_A1_Mad15" "COM_A1_Mad15" "COM_A1_Mad15" "COM_A1_Mad15" "COM_A1_Mad15"

    $haplotype3
    [1] "REU_A1_Run1"….

    d <- distance(x, indels = "sic")
    p<-parsimnet(x, indels = "sic", prob=.95)

    pops <- c(rep("REU", 123), rep("MUS", 132), rep("MDG", 157), rep ("SYC", 115), rep ("COM", 90), rep ("A", 34), rep ("E", 2))

    g y class(h)
    [1] “Haplotype”
    attr(,”package”)
    [1] “haplotypes”

    Can anyone help me with this problem? I also would like to plot my network as in the tutorial with one population = one color and size of pie chart according to frequency of the haplotype.

    Thank you very much for you help,

    Cheers,

    Maéva TECHER
    Ph.D student (Université de La Réunion)

    Your code looks fine. You have 7 populations instead of 8, so you should modify example codes and give 7 colors.

    cols<-colors()[c(30,369,552,558,538,642,142)]
    

    -Caner

Bir Cevap Yazın

Aşağıya bilgilerinizi girin veya oturum açmak için bir simgeye tıklayın:

WordPress.com Logosu

WordPress.com hesabınızı kullanarak yorum yapıyorsunuz. Çıkış  Yap /  Değiştir )

Google+ fotoğrafı

Google+ hesabınızı kullanarak yorum yapıyorsunuz. Çıkış  Yap /  Değiştir )

Twitter resmi

Twitter hesabınızı kullanarak yorum yapıyorsunuz. Çıkış  Yap /  Değiştir )

Facebook fotoğrafı

Facebook hesabınızı kullanarak yorum yapıyorsunuz. Çıkış  Yap /  Değiştir )

w

Connecting to %s