Populations and haplotype networks

In this tutorial I will show you a quick and dirty way of plotting pie charts on the statistical parsimony network. This feature will be added in future updates of the package. Copy and paste the following commands at the R prompt. It requires the package ‘plotrix’ to display pie charts on the existing plot.

require("haplotypes")
require("plotrix")

1-Read your DNA data in fasta format, in this example, system file ‘example.fas’ was used.

f<-system.file("example.fas",package="haplotypes")
x<-read.fas(file=f)
#the number of DNA sequences
ns<-nrow(x)

2-Define your populations, in this example it is randomly generated.

pop<-c("pop1","pop2","pop3","pop4","pop5","pop6","pop7","pop8")
#populations
pops<-sample(pop,nrow(x),replace=TRUE)

3-Infer haplotypes from DNA sequences, in this example, indels are coded separately following the simple indel coding method. They can be coded as a fifth state character or as missing.

h<-haplotypes::haplotype(x,indels="sic")

4-Create a matrix with haplotypes as rows, populations as columns and abundance as entries.

g<-grouping(h,pops)$hapmat

5-Conduct statistical parsimony analysis with %95 connection limit. Again indels are coded following the simple indel coding method.

p<-parsimnet(x,indels="sic",prob=.95)

6-Plot statistical parsimony network. Vertex positions should be interactively adjust before plotting pie charts if pies are overlapping.

#vcoor is a two-column matrix containing the vertex positions as x,y coordinates
vcoor<-plot(p,interactive=TRUE,vertex.cex=.4,
vertex.col=1, displaylabels=FALSE)

Alternatively, you can try other vertex placement algorithms.

 vcoor<-plot(p,interactive=FALSE,vertex.cex=.4,
vertex.col=1,displaylabels=FALSE,mode= "fruchtermanreingold")

7-Get the number and the vertex positions of haplotypes and give colors for the populations.

 #the number of haplotypes in the network.
nhap<-p@nhap[1]
#vertex positions of haplotypes
vcoor<-vcoor[1:nhap,]
#colors for the populations
cols<-colors()[c(30,369,552,558,538,642,142,91)]

8-Define size of the pie charts, in this example pie sizes are proportional (divided by 5) to the number of sequences per haplotype.

rads<-apply(g,1,sum)/5

9-Display pie charts on the existing network.

#create new labels for haplotypes
labs<-paste("H",rownames(g),sep="")
for(i in 1: nhap)
{
 floating.pie(vcoor[i,1], vcoor[i,2],g[i,],edges=200,radius=rads[i],col= cols[g[i,]>0],startpos=0,shadow=FALSE)
#plot new labels on pie charts
 text(vcoor[i,1], vcoor[i,2],labels=labs[i],col="white",cex=rads[i]*2)
}

10-Add legends to the plot.

legend("topright", pop, cex=1, fill=cols)

Resultant network (click to see original image):


What if you get more than one networks (unconnected subnetworks)?

11-Conduct statistical parsimony analysis with %99 connection limit, it returns six unconnected subnetworks.

p<-parsimnet(x,indels="sic",prob=.99)

12- Suppose we use the first network. Modify the matrix ‘g’ created at step-4.

#index of haplotypes in the first network.
hi<-p@rowindex[[1]]
g<-g[hi,]
#remove empty columns (if any)
rmc<-(apply(g,2,sum)>0)
g<-g[, rmc]

13-Plot the first network.

vcoor<-plot(p,net=1,interactive=FALSE,vertex.cex=.4,
vertex.col=1,displaylabels=FALSE,mode= "fruchtermanreingold")

14-Get the number and the vertex positions of haplotypes and give colors for the populations.

 #the number of haplotypes in the first network.
nhap<-p@nhap[1]
#vertex positions of haplotypes
vcoor<-vcoor[1:nhap,]
#colors for the remaining populations
cols<-colors()[c(30,369,552,558,538,642,142,91)][rmc]

15-Define size of the pie charts.

rads<-apply(g,1,sum)/5

16-Display pie charts on the existing network.

#labels for remaining haplotypes
labs<-paste("H",rownames(g),sep="")
for(i in 1: nhap)
{
 floating.pie(vcoor[i,1], vcoor[i,2],g[i,],edges=200,radius=rads[i],col= cols[g[i,]>0],startpos=0,shadow=FALSE)
#plot new labels on pie charts
 text(vcoor[i,1], vcoor[i,2],labels=labs[i],col="white",cex=rads[i]*2)
}

17-Add legends to the plot.

legend("topright", pop[rmc], cex=1, fill=cols)

Populations and haplotype networks” üzerine 5 yorum

  1. I would love for this package to work with my data… it is a large alignment ~ 18kb (whole mitochondrial genomes). I keep getting an error at this step, anyone seen this before?

    > p<-parsimnet(x,indels="sic",prob=.95)
    Error in jump[prevset, nexp] <- jump[prevp, prevset] + x :
    replacement has length zero


    • Unfortunately, this is a bug. I’ve already solved it. It is complicated problem, simply copy and paste the codes does not work. In the following update of the package I will correct the problem. If it is OK for you, I can run the analysis for you if you e-mail me the alignment file. How many sequences do you have?

  2. Hello,

    Thanks for sharing this, really appreciate!

    I tried the second tutorial- plot pie chart on a haplotype network. However, whenever I proceed at this line:
    “#the number of haplotypes in the network
    nhap<-p@nhap[1]"
    The program seems to be frozen or just takes extremely long time to execute.
    Just wondering if this is a special case for me, did I do anything wrong?

    Thank you so much for your time and help!

    Wen


    That’s interesting, what is the number of haplotypes in the first network?

    print(p)
    


    -Caner

  3. Dear all,

    I find this package very interesting, especially because “haplotypes” allow to considers indels mutations for network computation. I have encounter some difficulties for plotting my haplotype network with frequencies and indicating population origin. My dataset is a sequence alignment of 653 sequences in fasta. I’ve succeeded to use haplotype function ( so I have 31 haplotypes), distance, parsimnet and even grouping by population factor.

    x <- read.fas ("PoQ 31 haplotypes 190815 v2.fas")
    h<-haplotype(x,indels="s")
    *** S4 Object of Class Haplotype ***

    $haplotype1
    [1] "REU_A1_Mad3" "REU_A1_Mad3" "REU_A1_Mad3" "REU_A1_Mad3" "REU_A1_Mad3" "REU_A1_Mad3" "REU_A1_Mad3" "REU_A1_Mad3" "REU_A1_Mad3" "REU_A1_Mad3" "REU_A1_Mad3"
    [12] …
    $haplotype2
    [1] "REU_A1_Mad15" "MDG_A1_Mad15" "MDG_A1_Mad15" "COM_A1_Mad15" "COM_A1_Mad15" "COM_A1_Mad15" "COM_A1_Mad15" "COM_A1_Mad15" "COM_A1_Mad15" "COM_A1_Mad15" "COM_A1_Mad15"

    $haplotype3
    [1] "REU_A1_Run1"….

    d <- distance(x, indels = "sic")
    p<-parsimnet(x, indels = "sic", prob=.95)

    pops <- c(rep("REU", 123), rep("MUS", 132), rep("MDG", 157), rep ("SYC", 115), rep ("COM", 90), rep ("A", 34), rep ("E", 2))

    g y class(h)
    [1] “Haplotype”
    attr(,”package”)
    [1] “haplotypes”

    Can anyone help me with this problem? I also would like to plot my network as in the tutorial with one population = one color and size of pie chart according to frequency of the haplotype.

    Thank you very much for you help,

    Cheers,

    Maéva TECHER
    Ph.D student (Université de La Réunion)

    Your code looks fine. You have 7 populations instead of 8, so you should modify example codes and give 7 colors.

    cols<-colors()[c(30,369,552,558,538,642,142)]
    

    -Caner

Bir Cevap Yazın

Aşağıya bilgilerinizi girin veya oturum açmak için bir simgeye tıklayın:

WordPress.com Logosu

WordPress.com hesabınızı kullanarak yorum yapıyorsunuz. Çıkış  Yap / Değiştir )

Twitter resmi

Twitter hesabınızı kullanarak yorum yapıyorsunuz. Çıkış  Yap / Değiştir )

Facebook fotoğrafı

Facebook hesabınızı kullanarak yorum yapıyorsunuz. Çıkış  Yap / Değiştir )

Google+ fotoğrafı

Google+ hesabınızı kullanarak yorum yapıyorsunuz. Çıkış  Yap / Değiştir )

Connecting to %s