TRACESPipe is a hybrid pipeline for the efficient reconstruction and analysis of viral and host genomes at the multi-organ level, for clinical or ancient DNA (aDNA) purposes.
About
Diogo Pratas is a researcher in interdisciplinary informatics, dedicated to developing computational methods that bridge informatics with biomedical, anthropological, and historical research. His work focuses on analyzing and interpreting complex data. Current projects include efficient compression of biological data, reconstruction and analysis of viral genomes, metagenomic studies of ancient DNA, and the application of machine learning to date and localize ancient artifacts such as metals, texts, and genetic material.
Research
Bioinformatics: Advanced genetic and structural analysis through the development of efficient computational tools. This includes genome classification, identification of patterns in biological sequences, and the reconstruction of ancient and modern genomes (mostly viral).
Computational Biology: Modeling and simulating biological systems to explore functional genomics and evolutionary dynamics. This line of research involves developing interdisciplinary methods and tools to interpret complex biological patterns across species and time scales, and includes applied solutions to environmental challenges, such as plastic biodegradation.
Computational Medicine: Development and application of innovative computational methods for virus detection in human tissues, virome analysis (including in disease and transplantation contexts), cytogenetic analysis, and identification of therapeutic targets, contributing to diagnosis, prognosis, and treatment in medicine.
Artificial Intelligence for Cultural Heritage: Application of machine learning and data mining techniques to the preservation and digital analysis of traditional knowledge. Projects involve dating and localizing ancient artifacts, such as metals, texts, and DNA.
Data Compression and Analysis: Development of high-efficiency algorithms for processing and interpreting large-scale datasets. Emphasis is placed on statistical and algorithmic techniques for data reduction, pattern discovery, and the extraction of meaningful structures embedded within complex data.
Algorithmic Information Theory: Development of methodologies for generating and analyzing pseudo-random Turing machines to identify those whose tapes show statistically complex behavior. This line also develops computational approaches to discover short Turing machines that describe exact or approximate digital objects, and to compute approximations of the logical depth of those objects.
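As a rough illustration of the tape analysis mentioned above, the following Python sketch simulates a pseudo-random Turing machine and scores its tape with an adaptive order-k finite-context (Markov) model, in bits per symbol, keeping the lowest cost over several orders. It is not the SPARK implementation: the transition-table encoding, the absence of a halting state, the step budget, and the estimator parameter `alpha` are assumptions chosen for illustration, and all function names are hypothetical.

```python
import math
import random
from collections import defaultdict


def random_tm(num_states, num_symbols, seed):
    """Hypothetical encoding of a random TM: (state, read) -> (write, move, next_state)."""
    rng = random.Random(seed)
    table = {}
    for state in range(num_states):
        for symbol in range(num_symbols):
            table[(state, symbol)] = (
                rng.randrange(num_symbols),  # symbol to write
                rng.choice((-1, 1)),         # head move: left or right
                rng.randrange(num_states),   # next state (assumption: no halting state)
            )
    return table


def run_tm(table, steps):
    """Run for a fixed step budget on a blank (all-zero) tape; return the visited segment."""
    tape = defaultdict(int)  # unvisited cells read as the blank symbol 0
    head, state = 0, 0
    for _ in range(steps):
        write, move, state = table[(state, tape[head])]
        tape[head] = write
        head += move
    if not tape:
        return []
    return [tape[i] for i in range(min(tape), max(tape) + 1)]


def fcm_bits_per_symbol(seq, k, num_symbols, alpha=1.0):
    """Adaptive order-k finite-context model: average code length in bits per symbol."""
    counts = defaultdict(lambda: [0] * num_symbols)
    total_bits = 0.0
    for i, sym in enumerate(seq):
        ctx = tuple(seq[max(0, i - k):i])  # shorter contexts for the first k symbols
        c = counts[ctx]
        p = (c[sym] + alpha) / (sum(c) + alpha * num_symbols)
        total_bits -= math.log2(p)
        c[sym] += 1  # update only after coding, so a decoder could mirror it
    return total_bits / len(seq) if seq else 0.0


if __name__ == "__main__":
    for seed in range(5):
        tape = run_tm(random_tm(num_states=4, num_symbols=2, seed=seed), steps=2000)
        # Keep the lowest cost over several orders (a crude "best order" choice).
        best = min(fcm_bits_per_symbol(tape, k, num_symbols=2) for k in range(9))
        print(f"seed={seed} tape_length={len(tape)} best_bps={best:.3f}")
```

Lower values mean the model finds the tape predictable; values near 1 bit per symbol (for a binary alphabet) mean the tape looks statistically random to the model. On very short tapes the model's learning cost can push the value slightly above 1.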
Students
PhD Student
- M. J. P. Sousa - Intelligent reconstruction and analysis of viral genomes.
Master's Students
- A. Gomes - Human virus genomics and distribution.
- B. Simões - Characterization and visualization of plant genomes using machine learning.
- D. Yamunaque - A machine learning approach for authentication of ancient DNA samples.
- D. Lourenço - Exploring microalgae-enzymes for sustainable plastic biodegradation.
- L. Marques - Automatic DNA classification of organisms contained in ancient samples.
- P. Pinto - Text dating using data compression and machine learning.
- R. Dias - Complexity analysis of music.
- R. Dias - Predicting the age of metal artifacts using machine learning.
- S. Almeida - Analysis of Relative Absent Words for Innovative Diagnosis.
Alumni
- Rita Ferrolho - Optimization of a genomic data compressor using metameric genetic algorithms (2024).
- Clara Cerqueira - Machine learning-enhanced optimization of plastic-degrading enzymes for sustainable ocean cleanup (2024).
- Dinis Lei - Study of the impact of data compression on energy consumption reduction (2024).
- Margarida Pinheiro - Genomic diversity and zoonotic potential of hepatitis E virus in European rabbits: implications for diagnostic and therapeutic approaches (2024).
- Mariana Fernandes - Designing optimal 3D enzyme computational models for efficient plastic degradation (2024).
- Rafael Vieira - Designing in-silico aptamers for potential use in marine bioremediation (2024).
- Renato Soares - Improving a database of cyanobacterial bioactive compounds that can be used for therapeutic approaches in human diseases (2023).
- Tiago Fonseca - Impact of sorting in DNA sequence compression (2023).
- Jorge M. Silva - Algorithmic information approximations in data analysis (2023).
- Alexandre Lourenço - Reconstruction and classification of unknown DNA sequences (2021).
- Milton Silva - Efficient biosequence compression using neural networks (2021).
- Morteza Hosseini - Compression models and tools for omics data (2020).
- Manuel Gaspar - Automatic system for approximate and noncontiguous DNA sequences search (2017).
Publications
Journal Articles
- M. J. P. Sousa, A. J. Pinho, D. Pratas*. JARVIS3: an efficient encoder for genomic data. Bioinformatics, 2024.
- J. M. Silva, A. J. Pinho, D. Pratas*. AltaiR: a C toolkit for alignment-free and temporal analysis of multi-FASTA data. GigaScience, 2024.
- R. Soares, L. Azevedo, V. Vasconcelos, D. Pratas, S. Sousa, J. Carneiro. Machine Learning-Driven Discovery and Database of Cyanobacteria Bioactive Compounds: A Resource for Therapeutics and Bioremediation. Journal of Chemical Information and Modeling, 2024.
- L. Pyöriä, D. Pratas, M. Toppinen, P. Simmonds, K. Hedman, A. Sajantila, M. F. Perdomo. Intra-host genomic diversity and integration landscape of human tissue-resident DNA virome. Nucleic Acids Research, 2024.
- L. Hannolainen, L. Pyöriä, D. Pratas, J. Lohi, S. Skuja, S. Rasa-Dzelzkaleja, M. Murovska, K. Hedman, T. Jahnukainen, M. F. Perdomo. Perinnöllinen herpesvirus elinsiirron kiusana [Inherited herpesvirus as a nuisance in organ transplantation]. Duodecim, 2024.
- L. Hannolainen, L. Pyöriä, D. Pratas, J. Lohi, S. Skuja, S. Rasa-Dzelzkaleja, M. Murovska, K. Hedman, T. Jahnukainen, M. F. Perdomo. Reactivation of a transplant recipient’s inherited human herpesvirus 6 and implications to the graft. The Journal of Infectious Diseases, 2024.
- J. M. Silva, W. Qi, A. J. Pinho, D. Pratas*. AlcoR: alignment-free simulation, mapping, and visualization of low-complexity regions in biological data. GigaScience, 2023.
- J. Carneiro, F. Pascoal, M. Semedo, D. Pratas, M. P. Tomasino, A. Rego, M. F. Carvalho, A. P. Mucha, C. Magalhães. Mapping human pathogens in wastewater using a metatranscriptomic approach. Environmental Research, 2023.
- J. Carneiro, R. P. Magalhães, V. M. de la Oliva Roque, M. Simões, D. Pratas, S. Sousa. TargIDe: a machine-learning workflow for target identification of molecules with antibiofilm activity against Pseudomonas aeruginosa. Journal of Computer-Aided Molecular Design, 2023.
- L. Pyöriä, D. Pratas, M. Toppinen, K. Hedman, A. Sajantila, M. F. Perdomo. Unmasking the Tissue-Resident Eukaryotic DNA Virome in Humans. Nucleic Acids Research, 2023.
- L. Pyöriä, D. Pratas, M. Toppinen, K. Hedman, A. Sajantila, M. F. Perdomo. Elimistömme on lukuisten terveyteemme vaikuttavien virusten koti [Our body is home to numerous viruses that affect our health]. Duodecim, 2023.
- M. K. Jauhiainen, U. Mohanraj, M. Lehecka, M. Niemelä, T. P. Hirvonen, D. Pratas, M. F. Perdomo, M. Söderlund-Venermo, A. A. Mäkitie, S. T. Sinkkonen. Herpesviruses, polyomaviruses, parvoviruses, papillomaviruses, and anelloviruses in vestibular schwannoma. Journal of NeuroVirology, 2023.
- J. M. Silva, D. Pratas, T. Caetano, S. Matos. The complexity landscape of viral genomes. GigaScience, 2022.
- W. Qi, Y. Lim, A. Patrignani, P. Schläpfer, A. Bratus-Neuenschwander, S. Grüter, C. Chanez, N. Rodde, E. Prat, S. Vautrin, M. Fustier, D. Pratas, R. Schlapbach, W. Gruissem. The haplotype-resolved chromosome pairs of a heterozygous diploid African cassava cultivar reveal novel pan-genome and allele-specific transcriptome features. GigaScience, 2022.
- O. I. Mielonen, D. Pratas, K. Hedman, A. Sajantila, M. F. Perdomo. Detection of Low-Copy Human Virus DNA upon Prolonged Formalin Fixation. Viruses, 2022.
- M. Toppinen, A. Sajantila, D. Pratas, K. Hedman, M. F. Perdomo. The Human Bone Marrow Is Host to the DNAs of Several Viruses. Frontiers in Cellular and Infection Microbiology, 2021.
- J. Monteiro, D. Pratas, A. Videira, F. Pereira. Revisiting the Neurospora crassa mitochondrial genome. Letters in Applied Microbiology, 2021.
- M. Silva*, D. Pratas*, A. J. Pinho. AC2: An Efficient Protein Sequence Compression Tool Using Artificial Neural Networks and Cache-Hash Models. Entropy, 2021.
- J. M. Silva, D. Pratas, R. Antunes, S. Matos, A. J. Pinho. Automatic analysis of artistic paintings using information-based measures. Pattern Recognition, 2021.
- J. R. Almeida, D. Pratas, J. L. Oliveira. A semi-automatic methodology for analysing distributed and private biobanks. Computers in Biology and Medicine, 2021.
- D. Pratas*, J. M. Silva. Persistent minimal sequences of SARS-CoV-2. Bioinformatics, 2020.
- D. Pratas*, M. Toppinen, L. Pyöriä, K. Hedman, A. Sajantila, M. F. Perdomo. A hybrid pipeline for reconstruction and analysis of viral genomes at multi-organ level. GigaScience, 2020.
- M. Silva*, D. Pratas*, A. J. Pinho. Efficient DNA sequence compression with neural networks. GigaScience, 2020.
- M. Toppinen, D. Pratas, E. Väisänen, M. Söderlund-Venermo, K. Hedman, M. F. Perdomo, A. Sajantila. The landscape of persistent human DNA viruses in femoral bone. Forensic Science International: Genetics, 2020.
- M. Hosseini, D. Pratas, B. Morgenstern, A. J. Pinho. Smash++: an alignment-free and memory-efficient tool to find genomic rearrangements. GigaScience, 2020.
- J. R. Almeida, A. J. Pinho, J. L. Oliveira, O. Fajarda, D. Pratas. GTO: A toolkit to unify pipelines in genomic and proteomic research. SoftwareX, 2020.
- J. M. Silva, E. Pinho, S. Matos, D. Pratas. Statistical Complexity Analysis of Turing Machine Tapes with Fixed Algorithmic Complexity Using the Best-Order Markov Model. Entropy, 2020.
- D. Pratas*, M. Hosseini, J. M. Silva, A. J. Pinho. A reference-free lossless compression algorithm for DNA sequences using a competitive prediction of two classes of weighted models. Entropy, 2019.
- M. Hosseini, D. Pratas, A. J. Pinho. Cryfa: a secure encryption tool for genomic data. Bioinformatics, 2019.
- M. Hosseini, D. Pratas, A. J. Pinho. AC: A Compression Tool for Amino Acid Sequences. Interdisciplinary Sciences: Computational Life Sciences, 2019.
- J. M. Carvalho, S. Brás, D. Pratas, J. Ferreira, S. C. Soares, A. J. Pinho. Extended-Alphabet Finite-Context Models. Pattern Recognition Letters, 2018.
- D. Pratas*, M. Hosseini, G. Grilo, A. J. Pinho, R. Silva, T. Caetano, J. Carneiro, F. Pereira. Metagenomic Composition Analysis of an Ancient Sequenced Polar Bear Jawbone from Svalbard. Genes, 2018.
- D. Pratas*, R. Silva, A. J. Pinho. Comparison of Compression-Based Measures with Application to the Evolution of Primate Genomes. Entropy, 2018.
- M. Hosseini, D. Pratas, A. J. Pinho. A Survey on Data Compression Methods for Biological Sequences. Information, 2016.
- D. Pratas*, R. Silva, A. J. Pinho, P. J. S. G. Ferreira. An alignment-free method to find and visualise rearrangements between pairs of DNA sequences. Scientific Reports, 2015.
- R. Silva*, D. Pratas*, L. Castro, A. J. Pinho, P. J. S. G. Ferreira. Three minimal sequences found in Ebola virus genomes and absent from human DNA. Bioinformatics, 2015.
- L. Matos, A. J. R. Neves, D. Pratas, A. J. Pinho. MAFCO: A compression tool for MAF files. PLoS ONE, 2015.
- A. J. Pinho, D. Pratas. MFCompress: a compression tool for FASTA and multi-FASTA data. Bioinformatics, 2014.
- D. Pratas*, A. J. Pinho, J. M. O. S. Rodrigues. XS: a FASTQ read simulator. BMC Research Notes, 2014.
- A. J. Pinho, S. P. Garcia, D. Pratas, P. J. S. G. Ferreira. DNA sequences at a glance. PLoS ONE, 2013.
- L. Matos, D. Pratas, A. J. Pinho. A compression model for DNA multiple sequence alignment blocks. IEEE Transactions on Information Theory, 2013.
- S. P. Garcia, J. M. O. S. Rodrigues, S. Santos, D. Pratas, V. Afreixo, C. A. C. Bastos, P. J. S. G. Ferreira, A. J. Pinho. A genomic distance for assembly comparison based on compressed maximal exact matches. IEEE/ACM Transactions on Computational Biology and Bioinformatics, 2013.
- A. J. Pinho, D. Pratas, S. P. Garcia. GReEn: a tool for efficient compression of genome resequencing data. Nucleic Acids Research, 2012.
International Conference Articles
- D. Yamunaque, A. J. Pinho, A. Sajantila, D. Pratas. A Machine Learning Method for Authentication of Human Ancient Mitochondrial DNA. IbPRIA 2025, Coimbra, Portugal, July 2025.
- D. Lei, D. Yamunaque, A. J. Pinho, D. Pratas. ECOmpress: A web tool for boosting energy efficiency through data compression. IbPRIA 2025, Coimbra, Portugal, July 2025.
- A. J. Pinho, D. Pratas. Optimization of data compression parameters using genetic algorithms. DCC 2025, Snowbird, United States, March 2025.
- L. Almeida, P. Rodrigues, D. Magalhães, A. J. Pinho, D. Pratas. AIDetx: a compression-based method for identification of machine-learning generated text. DCC 2025, Snowbird, United States, March 2025.
- M. J. P. Sousa, A. J. Pinho, D. Pratas. Improving the generation of viral consensus sequences using adaptive models. EUSIPCO 2024, Lyon, France, August 2024.
- M. J. P. Sousa, A. J. Pinho, D. Pratas*. A sensitive compression-based method for filtering targeted FASTQ sequencing reads. EUSIPCO 2024, Lyon, France, August 2024.
- T. Fonseca, M. J. P. Sousa, A. J. Pinho, D. Pratas. A sorting tool for improving FASTA data compression tools. EUSIPCO 2024, Lyon, France, August 2024.
- A. J. Pinho, D. Pratas. Copy models for protein sequence compression. DCC 2024, Snowbird, United States, March 2024.
- D. Pratas*, A. J. Pinho. An experimental sorting method for improving metagenomic data encoding. DCC 2024, Snowbird, United States, March 2024.
- M. J. P. Sousa, D. Pratas. A method for improving the generation of consensus sequences. Workshop on Informatics Engineering Research, Porto, Portugal, 2024.
- J. M. Silva, D. Pratas, S. Matos. Exploring Kolmogorov Complexity Approximations for Data Analysis: Insights and Applications. DoCEIS 2023, Caparica, Portugal, pp. 161–174, 2023.
- D. Pratas*, A. J. Pinho. JARVIS2: a data compressor for large genome sequences. DCC 2023, Snowbird, United States, March 2023.
- J. M. Silva, D. Pratas, T. Caetano, S. Matos. Feature-Based Classification of Archaeal Sequences Using Compression-Based Methods. IbPRIA 2022, Aveiro, Portugal, May 2022.
- M. Hosseini, D. Pratas, A. J. Pinho. A probabilistic method to find and visualize distinct regions in protein sequences. EUSIPCO 2019, A Coruña, Spain, September 2019.
- D. Pratas*, M. Hosseini, A. J. Pinho. GeCo2: An optimized tool for lossless compression and analysis of DNA sequences. PACBB 2019, Ávila, Spain, June 2019.
- D. Pratas*, M. Hosseini, A. J. Pinho. Visualization of similar primer and adapter sequences in assembled archaeal genomes. PACBB 2019, Ávila, Spain, June 2019.
- A. J. Pinho, D. Pratas. An application of data compression models to handwritten digit classification. ACIVS 2018, September 2018.
- D. Pratas*, A. J. Pinho. Metagenomic composition analysis of sedimentary ancient DNA from the Isle of Wight. EUSIPCO 2018, Rome, Italy, September 2018.
- D. Pratas*, A. J. Pinho. A DNA sequence corpus for compression benchmark. PACBB 2018, Toledo, Spain, June 2018.
- D. Pratas*, M. Hosseini, A. J. Pinho. Compression of amino acid sequences. PACBB 2018, Toledo, Spain, June 2018.
- M. Gaspar, D. Pratas, A. J. Pinho. NET-ASAR: a tool for DNA sequence search based on data compression. PACBB 2018, Toledo, Spain, June 2018.
- D. Pratas*, M. Hosseini, A. J. Pinho. Cryfa: a tool to compact and encrypt FASTA files. PACBB 2017, Porto, Portugal, June 2017.
- M. Hosseini, D. Pratas, A. J. Pinho. On the role of inverted repeats in DNA sequence similarity. PACBB 2017, Porto, Portugal, June 2017.
- D. Pratas*, M. Hosseini, A. J. Pinho. Substitutional tolerant Markov models for relative compression of DNA sequences. PACBB 2017, Porto, Portugal, June 2017.
- D. Pratas*, A. J. Pinho. On the approximation of the Kolmogorov complexity for DNA sequences. IbPRIA 2017, Faro, Portugal, June 2017.
- D. Pratas*, M. Hosseini, R. Silva, A. J. Pinho, P. J. S. G. Ferreira. Visualization of distinct DNA regions of the modern human relative to a Neanderthal genome. IbPRIA 2017, Faro, Portugal, June 2017.
- A. J. Pinho, D. Pratas, P. J. S. G. Ferreira. Authorship attribution using relative compression. DCC 2016, Snowbird, United States, March 2016.
- D. Pratas*, A. J. Pinho, P. J. S. G. Ferreira. Efficient compression of genomic sequences. DCC 2016, Snowbird, United States, March 2016.
- A. J. Pinho, D. Pratas, P. J. S. G. Ferreira. A new compressor for measuring distances among images. ICIAR 2014, Vilamoura, Portugal, October 2014.
- D. Pratas*, A. J. Pinho. Exploring deep Markov models in genomic data compression using sequence pre-analysis. EUSIPCO 2014, Lisbon, Portugal, September 2014.
- D. Pratas*, A. J. Pinho. A conditional compression distance that unveils insights of the genomic evolution. DCC 2014, Snowbird, United States, March 2014.
- A. J. Pinho, D. Pratas, P. J. S. G. Ferreira. Information profiles for DNA pattern discovery. DCC 2014, Snowbird, United States, March 2014.
- S. P. Garcia, J. M. O. S. Rodrigues, D. Pratas, A. J. Pinho. Comparing maximal exact repeats in human genome assemblies using a normalized compression distance. ISMB 2012, Long Beach, United States, July 2012.
- L. Matos, D. Pratas, A. J. Pinho. Compression of whole genome alignments using a mixture of finite-context models. ICIAR 2012, Aveiro, Portugal, June 2012.
- D. Pratas*, A. J. Pinho. On the Detection of Unknown Locally Repeating Patterns in Images. ICIAR 2012, Aveiro, Portugal, June 2012.
- D. Pratas*, A. J. Pinho, S. P. Garcia. Exon: A web-based software toolkit for DNA sequence analysis. PACBB 2012, Salamanca, Spain, March 2012.
- D. Pratas*, A. J. Pinho, S. P. Garcia. Computation of the normalized compression distance of DNA sequences using a mixture of finite-context models. Bioinformatics 2012, Vilamoura, Portugal, February 2012.
- A. J. Pinho, D. Pratas, S. P. Garcia. Complexity profiles of DNA sequences using finite-context models. USAB 2011, Graz, Austria, November 2011.
- A. J. Pinho, D. Pratas, P. J. S. G. Ferreira, S. P. Garcia. Symbolic to numerical conversion of DNA sequences using finite-context models. EUSIPCO 2011, Barcelona, Spain, August 2011.
- A. J. Pinho, D. Pratas, P. J. S. G. Ferreira. Bacteria DNA sequence compression using a mixture of finite-context models. SSP 2011, Nice, France, June 2011.
- D. Pratas*, C. A. C. Bastos, A. J. Pinho, A. J. R. Neves, L. Matos. DNA synthetic sequences generation using multiple competing Markov models. SSP 2011, Nice, France, June 2011.
- D. Pratas*, A. J. Pinho. Compressing the human genome using exclusively Markov models. PACBB 2011, Salamanca, Spain, April 2011.
National Conference Articles
- A. Martins, D. Pratas, A. Pinho, S. Gouveia. Finding relevant features to enhance compression of categorical time series. RecPad 2024, Covilhã, Portugal, October 2024.
- M. J. P. Sousa, D. Pratas. A method for accurate reconstruction of persistent human viral sequences. RecPad 2023, Coimbra, Portugal, October 2023.
- M. J. P. Sousa, D. Pratas. A survey on computational tools for human viral genomes reconstruction. RecPad 2022, Leiria, Portugal, October 2022.
- M. J. P. Sousa, R. Ferrolho, T. Fonseca, A. J. Pinho, D. Pratas. Improving the compression of a complete Telomere-to-Telomere (T2T) human genome sequence. RecPad 2022, Leiria, Portugal, October 2022.
- J. M. Silva, D. Pratas, T. Caetano, S. Matos. Archaea Taxonomic Classification. RecPad 2021, Évora, Portugal, November 2021.
- J. M. Silva, D. Pratas, S. Matos. Comparison and Evaluation of Information-based Measures in Images. RecPad 2020, Évora, Portugal, October 2020.
- M. Hosseini, D. Pratas, A. J. Pinho. Clustering DNA sequences by relative compression. RecPad 2019, Porto, Portugal, October 2019.
- J. M. Silva, D. Pratas, S. Matos. Evaluation of Statistical Complexity in Viral Genome Sequences. RecPad 2019, Porto, Portugal, October 2019.
- M. Hosseini, D. Pratas, A. Amorim, J. Carneiro. Improving the detection of mtDNA rearrangements using a fast and accurate algorithm. ENBE 2019, Porto, Portugal, November 2019.
- A. Teixeira, D. Pratas, A. J. Pinho, R. Silva. Evolutionary insights from the comparative analysis of hominid genomes. RecPad 2018, Coimbra, Portugal, October 2018.
- C. Figueiredo, D. Pratas, A. J. Pinho, R. Silva. Identification of antifungal targets using alignment-free methods. RecPad 2018, Coimbra, Portugal, October 2018.
- D. Pratas*, R. Silva, A. J. Pinho, P. J. S. G. Ferreira. Detection and visualization of regions of human DNA not present in other primates. RecPad 2015, Faro, Portugal, October 2015.
- D. Pratas*, R. Silva, A. J. Pinho. Large-scale inversions between human reference assemblies. RecPad 2014, Covilhã, Portugal, October 2014.
- R. Silva, L. Castro, D. Pratas, A. J. Pinho. Towards personalized medicine: ebola virus absent words in the human genome. RecPad 2014, Covilhã, Portugal, October 2014.
- D. Pratas*, A. J. Pinho. Insights into primates genomic evolution using a compression distance. RecPad 2013, Lisbon, Portugal, November 2013.
- D. Pratas*, A. J. Pinho. On the compression of FASTQ quality-scores. RecPad 2012, Coimbra, Portugal, October 2012.
- D. Pratas*, A. J. Pinho. M6: a method for compressing complete genomes using Markov models. DSIE 2012, Porto, Portugal, January 2012.
- D. Pratas*, S. P. Garcia, A. J. Pinho. Analysis of patterns in S. pombe genome through compression-based complexity profiles. RecPad 2011, Porto, Portugal, October 2011.
- D. Pratas*, A. J. Pinho. Analysis of DNA sequences using finite-context modelling and compression. RecPad 2010, Vila Real, Portugal, October 2010.
- D. Pratas*, A. J. Pinho, A. J. R. Neves, C. A. C. Bastos. DNA synthetic sequences generated by finite-context models. RecPad 2010, Vila Real, Portugal, October 2010.
Book Chapters
- A. J. Pinho, D. Pratas, S. P. Garcia. Compressing resequencing data with GReEn. In: Deep Sequencing Data Analysis, ed. Noam Shomron, Humana Press (Methods in Molecular Biology, Vol. 1038), pp. 27–37, July 2013.
Other
- M. J. P. Sousa, M. Toppinen, L. Pyöriä, K. Hedman, A. Sajantila, M. F. Perdomo, D. Pratas*. Comparative evaluation of computational methods for reconstruction of human viral genomes. bioRxiv, 2025.
- D. Pratas*, A. J. Pinho, R. M. Silva, J. M. O. S. Rodrigues, M. Hosseini, T. Caetano, P. J. S. G. Ferreira. FALCON-meta: A method to infer metagenomic composition of ancient DNA. bioRxiv, 2018.
Books
- D. Figueiredo, C. Martín-Vide, D. Pratas, M. A. Vega-Rodríguez. Algorithms for Computational Biology. Springer International Publishing, June 2017.
- D. Pratas. Compression and analysis of genomic data. PhD Thesis, University of Aveiro, Portugal, 2016.
Classes
Algorithmic Information Theory
Algorithmic Information Theory (AIT, or TAI in Portuguese) is a field at the intersection of computer science, mathematics, and information theory that studies how information can be measured, represented, and processed using algorithms. It addresses concepts such as data compression, randomness, and the limits of computation. AIT gives students fundamental tools to reason about what information is, how to model data sources, and how efficiently data can be encoded or processed. These ideas are essential in areas such as machine learning, cryptography, data science, and theoretical computer science.
The AIT course at the University of Aveiro combines theoretical and practical components. In addition to lectures, students complete three group-based practical assignments in which they apply the concepts studied in class to real problems. This structure helps students consolidate their understanding and develop collaborative problem-solving skills that are valuable in both research and industry.
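One concrete way to connect compression with measuring information is the normalized compression distance (NCD), which approximates an uncomputable, Kolmogorov-complexity-based distance by using the output sizes of a real compressor. The following is a minimal Python sketch, not course material: it uses the standard-library zlib module as the compressor, which is an assumption made for brevity, and the helper names (`compressed_size`, `ncd`, `mutate`) are hypothetical.

```python
import random
import zlib


def compressed_size(data: bytes) -> int:
    """Stand-in for Kolmogorov complexity: zlib output size in bytes (level 9)."""
    return len(zlib.compress(data, 9))


def ncd(x: bytes, y: bytes) -> float:
    """NCD(x, y) = (C(xy) - min(C(x), C(y))) / max(C(x), C(y))."""
    cx, cy = compressed_size(x), compressed_size(y)
    cxy = compressed_size(x + y)
    return (cxy - min(cx, cy)) / max(cx, cy)


def mutate(seq: bytes, rate: float, rng: random.Random) -> bytes:
    """Substitute each base with probability `rate` (the new base may equal the old one)."""
    out = bytearray(seq)
    for i in range(len(out)):
        if rng.random() < rate:
            out[i] = rng.choice(b"ACGT")
    return bytes(out)


if __name__ == "__main__":
    rng = random.Random(42)
    a = bytes(rng.choice(b"ACGT") for _ in range(5000))
    b = mutate(a, 0.01, rng)                             # close relative of a
    d = bytes(rng.choice(b"ACGT") for _ in range(5000))  # unrelated sequence
    print(f"NCD(a, b) = {ncd(a, b):.3f}")
    print(f"NCD(a, d) = {ncd(a, d):.3f}")
```

The mutated copy should score well below the unrelated sequence, because zlib can encode most of `b` as back-references into `a`. Deflate's 32 KB window is one reason a general-purpose stand-in like this breaks down on genome-scale inputs, where specialized compressors are needed.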
Datasets
Biological Datasets
Software

AltaiR is a fast and flexible C toolkit for alignment-free temporal analysis of multi-FASTA data in large genomic collections, including in potential epidemic scenarios.

AlcoR is an alignment-free toolkit for analysis of low-complexity regions in biological data, supporting mapping, masking, simulation and visualization.

JARVIS3 is an efficient lossless (de)compression tool tailored to genomic sequences, with extensions for compressing FASTA and FASTQ data.

FALCON is an ultra-fast method to infer the metagenomic composition of sequenced reads while minimizing false positives and maximizing accuracy.

SPARK is a toolkit to simulate, search, and analyze exact or approximate Turing Machines using alignment-free methods and colorized visualizations.

CRYFA is an ultrafast encryption tool designed specifically for genomic data; it can also compress FASTA and FASTQ data by a factor of three.

SMASH is a compression-based and alignment-free method to automatically find and visualise rearrangements between pairs of DNA sequences.

AC is a lossless compressor for amino acid (protein) sequences. It achieves efficient compression by combining multiple cooperating context models.

GeCo3 is an efficient lossless genomic compressor that uses a neural network for expert mixing. It supports relative and conditional compression.
Challenge
Human genome sequence compression
- Provide a lossless, reference-free data compressor that improves on the current minimal representation of a human genome sequence (T2T-CHM13 version 2.0).
- T2T human genome sequence (link).
- T2T human genome article (link).
- Current Leaderboard (link).
Contact
Address
IEETA, University of Aveiro, Campus Universitário Santiago, 3810-193 Aveiro, Portugal.
Phone
+351 234 370 506
Affiliation
IEETA/LASI, Institute of Electronics and Informatics Engineering of Aveiro, University of Aveiro.
DETI, Department of Electronics, Telecommunications and Informatics, University of Aveiro.
DV, Department of Virology, University of Helsinki.