Computational and Evolutionary Analysis of HIV Molecular Sequences

Each window was visually inspected for residual alignment errors and problematic sites, which were manually removed. For each window, we reconstructed a maximum likelihood (ML) phylogenetic tree using FastTree2 (Price, Dehal, and Arkin) under the default general time-reversible model. Each tree was rooted from the sample collection dates under a strict molecular clock model, using the root-to-tip function rtt in the ape package (Paradis, Claude, and Strimmer) in R.

To calculate the confidence intervals associated with the estimated tMRCA and rate, the LSD program uses a parametric bootstrap approach where the branch lengths in the tree are resampled from a Poisson distribution calibrated to the number of sites.
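The resampling step described above can be illustrated with a minimal sketch. It assumes that a branch of length b (substitutions per site) on an alignment of L sites is redrawn as a Poisson count with mean b x L and then divided by L; this is one reading of "calibrated to the number of sites", not the LSD source code. The function names and the hand-written Poisson sampler are illustrative only.

```python
import math
import random


def _knuth_poisson(lam, rng):
    """Draw one Poisson(lam) variate with Knuth's multiplication method."""
    if lam <= 0.0:
        return 0
    threshold = math.exp(-lam)
    count, product = 0, rng.random()
    while product > threshold:
        count += 1
        product *= rng.random()
    return count


def sample_poisson(lam, rng):
    """Draw Poisson(lam), splitting large means into chunks of at most 500.

    The sum of independent Poisson draws is itself Poisson, and chunking
    avoids exp(-lam) underflowing to zero for very large lam.
    """
    total = 0
    while lam > 500.0:
        total += _knuth_poisson(500.0, rng)
        lam -= 500.0
    return total + _knuth_poisson(lam, rng)


def bootstrap_branch_lengths(branch_lengths, n_sites, n_replicates, seed=0):
    """Return n_replicates resampled copies of the branch-length list.

    Assumption: each branch length b is replaced by Poisson(b * n_sites) / n_sites.
    """
    rng = random.Random(seed)
    return [
        [sample_poisson(b * n_sites, rng) / n_sites for b in branch_lengths]
        for _ in range(n_replicates)
    ]
```

Each replicate list would then be written back onto the tree and re-dated, and the confidence interval read from the spread of the resulting tMRCA and rate estimates.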

We set the sampling number to for this step. To characterize the autocorrelation in estimates of tMRCA across windows, we fit a piecewise linear regression model with two segments and zero slopes, and selected the best segment breakpoint using the Akaike information criterion (AIC). We tested for autocorrelation in the residuals of the regression using the Durbin-Watson test as implemented in the car package (Lenth) in R.
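A two-segment model with zero slopes amounts to fitting one mean on each side of a breakpoint. The sketch below is a pure-Python illustration rather than the R analysis used in the study: it scans candidate breakpoints, scores each with a Gaussian AIC (up to an additive constant), and computes the Durbin-Watson statistic on the residuals. The parameter count of four (two means, one breakpoint, one variance) and the function names are assumptions.

```python
import math


def rss_about_mean(values):
    """Return (residual sum of squares, mean) for a constant fit."""
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values), mean


def fit_two_segment_means(y, min_segment=2):
    """Find the split of y into two constant segments with the lowest AIC.

    Returns (aic, breakpoint, left_mean, right_mean), where y[:breakpoint]
    is the left segment.
    """
    n = len(y)
    best = None
    for bp in range(min_segment, n - min_segment + 1):
        rss_left, mean_left = rss_about_mean(y[:bp])
        rss_right, mean_right = rss_about_mean(y[bp:])
        rss = max(rss_left + rss_right, 1e-12)  # guard against log(0)
        aic = n * math.log(rss / n) + 2 * 4     # assumed k = 4 parameters
        if best is None or aic < best[0]:
            best = (aic, bp, mean_left, mean_right)
    return best


def durbin_watson(residuals):
    """Durbin-Watson statistic; values near 2 suggest no first-order autocorrelation."""
    numerator = sum(
        (residuals[i] - residuals[i - 1]) ** 2 for i in range(1, len(residuals))
    )
    denominator = sum(r * r for r in residuals)
    return numerator / denominator
```

Because every candidate breakpoint has the same number of parameters, choosing the lowest AIC here is the same as choosing the lowest residual sum of squares. The AIC matters more when this fit is compared against a single-segment model.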

However, RIP is a web-based program, making it difficult to perform batch processing on large numbers of full-length genome sequences. Therefore we developed an RIP-like script in Python to perform the analysis. For each query sequence, we used MAFFT to regenerate an MSA of the reference sequences with the query, and excluded all insertions in the query relative to the reference consensus. For sliding windows of nt and a step size of 5 nt, we calculated the p-distances between the query and every reference, ignoring any gaps in either sequence, and tracked which references had the shortest and next-shortest distances.
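The window scan described above can be sketched as follows. This is an illustrative reimplementation, not the authors' script: it assumes the query and references are already aligned to equal length, takes the window size as a parameter because the value is missing from the text, and breaks ties between equally distant references alphabetically. All function names are hypothetical.

```python
def p_distance(a, b, gap_chars="-"):
    """Proportion of differing sites, skipping columns with a gap in either sequence.

    Returns None when no column can be compared.
    """
    compared = differences = 0
    for x, y in zip(a.upper(), b.upper()):
        if x in gap_chars or y in gap_chars:
            continue
        compared += 1
        if x != y:
            differences += 1
    return differences / compared if compared else None


def scan_windows(query, references, window, step=5):
    """Slide a window along the alignment and rank references by p-distance.

    references maps a reference name to its aligned sequence.
    Returns a list of (window_start, closest_name, next_closest_name).
    """
    results = []
    for start in range(0, len(query) - window + 1, step):
        q = query[start:start + window]
        scored = []
        for name, seq in references.items():
            d = p_distance(q, seq[start:start + window])
            if d is not None:
                scored.append((d, name))
        scored.sort()  # ties on distance fall back to alphabetical order
        closest = scored[0][1] if scored else None
        runner_up = scored[1][1] if len(scored) > 1 else None
        results.append((start, closest, runner_up))
    return results
```

The sequence of closest reference names across windows is the vector in which a change from one name to another marks a candidate breakpoint.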

Finally, we used the change points in the resulting vector of reference names to identify potential breakpoints.

Since it is not feasible to generate a random sample from the posterior distribution of trees relating thousands of sequences, we uniformly down-sampled the sequence alignments in each window with respect to time, by selecting up to ten sequences per collection year at random without replacement. A previous simulation study (Hall, Woolhouse, and Rambaut) determined that uniform sampling of sequences over time can provide unbiased reconstructions of past dynamics.
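The time-uniform down-sampling can be sketched in a few lines. The example assumes each record is a (sequence identifier, collection year) pair; the function name and the fixed seed are illustrative choices, not details from the study.

```python
import random
from collections import defaultdict


def downsample_by_year(records, max_per_year=10, seed=0):
    """Keep at most max_per_year records per collection year.

    records is an iterable of (sequence_id, collection_year) pairs.
    Sampling within a year is random and without replacement.
    """
    rng = random.Random(seed)
    by_year = defaultdict(list)
    for seq_id, year in records:
        by_year[year].append(seq_id)

    selected = []
    for year in sorted(by_year):
        ids = by_year[year]
        k = min(max_per_year, len(ids))
        selected.extend((seq_id, year) for seq_id in rng.sample(ids, k))
    return selected
```

Years with ten or fewer sequences are kept whole, so sparsely sampled early years are not thinned further.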

This down-sampling resulted in a median of sequences (range – ), which in our experience is at the upper limit of sample size at which a chain sample in BEAST might be expected to converge to the posterior distribution in a reasonable amount of time. Each chain sample was propagated for 10^8 steps; the first 10^7 steps were discarded as burn-in and the remainder was thinned at intervals of 10^4 steps.

FUBAR fits a limited number of rate categories, each defining a combination of non-synonymous (dN) and synonymous (dS) substitution rates, to individual codon sites for a given codon alignment and phylogenetic tree.

We extracted codon alignments from the regions encoding gag, pol and env in our final sequence alignment, obtained as described in the preceding Section 2. Because the reading frames of these major genes were disrupted by our filtering of extensive indel polymorphisms from the final alignment, we limited the codon alignments to cover a portion of each gene comparable to a single window in our dated-tip analysis. The resulting codon alignments each comprised nt codons with the following consensus sequence coordinates: gag –1,; pol 2,–2,; and env 6,–6,. Next, we used FastTree2 to reconstruct phylogenetic trees for each of the codon alignments.

We retrieved 7, near full-length HIV-1 genome records from GenBank. Although it was feasible to construct an MSA from these data using a conventional program (Supplementary Fig. S2), the resulting alignment became long and sparse due to an excessive number of gaps induced by highly divergent sequence intervals or non-homologous sequence insertions (Supplementary Fig. S3, top). Supplementary Fig. S1 summarizes the increasing alignment length with progressively larger numbers of genome sequences in the alignment, starting from a mean of 10, nt with randomly selected sequences to a mean of 24, nt with 2, sequences.

We therefore constructed a consensus genome sequence from the multiple alignment of a subset of representative genomes, which we selected by a network clustering analysis of a p-spectrum kernel matrix (Shawe-Taylor and Cristianini) of the entire data set.

In brief, we calculated the inner products of hexamer frequencies for all pairs of genomes and converted the resulting matrix into a graph using a cutoff value to define edges. We used a clustering method to extract thirty-two subgraphs and identified the highest degree node in each subgraph. Finally, we screened the MSA of genomes corresponding to these central nodes for regions of low homology and generated the consensus from the remaining sites.
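The selection of representative genomes can be illustrated with a small sketch. Two parts are assumptions and not taken from the text: the kernel is normalised to a cosine similarity so that the cutoff lies between 0 and 1, and connected components stand in for the unspecified clustering method. Function names are hypothetical.

```python
import math
from collections import Counter
from itertools import combinations


def kmer_counts(seq, k=6):
    """Count overlapping k-mers (hexamers by default) in a sequence."""
    seq = seq.upper()
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def spectrum_kernel(counts_a, counts_b):
    """Inner product of two k-mer count vectors."""
    if len(counts_a) > len(counts_b):
        counts_a, counts_b = counts_b, counts_a
    return sum(c * counts_b[kmer] for kmer, c in counts_a.items())


def build_graph(genomes, cutoff, k=6):
    """Link genomes whose normalised kernel value is at least the cutoff."""
    counts = {name: kmer_counts(seq, k) for name, seq in genomes.items()}
    norms = {name: math.sqrt(spectrum_kernel(c, c)) for name, c in counts.items()}
    neighbours = {name: set() for name in genomes}
    for a, b in combinations(genomes, 2):
        if norms[a] == 0 or norms[b] == 0:
            continue
        similarity = spectrum_kernel(counts[a], counts[b]) / (norms[a] * norms[b])
        if similarity >= cutoff:
            neighbours[a].add(b)
            neighbours[b].add(a)
    return neighbours


def central_nodes(neighbours):
    """Return the highest-degree node of each connected component.

    Connected components are an assumed stand-in for the clustering step;
    ties in degree are broken by name.
    """
    seen, centres = set(), []
    for start in neighbours:
        if start in seen:
            continue
        component, stack = [], [start]
        seen.add(start)
        while stack:
            node = stack.pop()
            component.append(node)
            for nb in neighbours[node]:
                if nb not in seen:
                    seen.add(nb)
                    stack.append(nb)
        centres.append(max(component, key=lambda n: (len(neighbours[n]), n)))
    return centres
```

For thousands of genomes the pairwise loop is quadratic, so a real run would more likely store the hexamer counts in a matrix and compute all inner products in a single matrix product.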

The consensus was used to make a pairwise draft alignment of all 3, genomes; a final alignment was then made using MAFFT (see Section 2 for details). The alignment resulting from this approach consisted mainly of regions of evolutionary homology across the genome sequences Supplementary Fig.