I am a computer scientist whose main research interests are computational biology, bioinformatics, and data compression. I hold an Information and Communication Technologies Degree from the University of Aveiro (Portugal), part of which was completed at the Pontifical University of Salamanca (Spain). Afterwards, I worked in the private sector for a couple of years. I then rejoined the University of Aveiro, where I completed a Ph.D. in Informatics (2016) and a postdoc in Computer Science (2019). In 2019, I worked as a Bioinformatician at the University of Helsinki (Finland). Currently, I am an Auxiliary Scientist/Professor at the DETI/IEETA of the University of Aveiro and an Invited Scientist at the Department of Virology of the University of Helsinki. My memberships include the Super Dimension Fortress, the APRP, and the ESCV.
Current:
A. Cerqueira (MS Student)
A. Ferrolho (MS Student)
B. Simões (MS Student)
D. Yamunaque (MS Student)
L. Marques (MS Student)
M. J. Sousa (PhD Student)
M. Fernandes (MS Student)
M. Pinheiro (MS Student)
P. Pinto (MS Student)
R. Dias (MS Student)
R. Vieira (MS Student)
Alumni:
MS. D. Lei
MS. A. Lourenço
Dr. J. M. Silva
MS. M. Silva
Dr. M. Hosseini
MS. M. Gaspar
MS. R. Soares
MS. T. Fonseca
We develop new mathematical and computational models and implement them efficiently as computer programs for biomedical, anthropological, and coding applications. We address both the statistical and the algorithmic aspects of these problems, creating new data mining and machine learning methodologies. Our current projects include the development of efficient biological data compression tools, the reconstruction and analysis of ancient and extant viral genomes, the identification of specific viral signatures, the quantification of genomic variation, the classification of unknown sequences, and the metagenomic analysis of ancient DNA samples. The following word map summarizes part of our work.
To promote the development of efficient computational models for the minimal lossless representation of DNA and amino acid sequences, we maintain two benchmarks. The latest top developments involve the use of neural networks for context mixing and cache hashes in weighted stochastic repeat models. Click on the following images to download the sequence data and benchmark a new compression algorithm.
Provide a data compressor that improves on the current minimal lossless, reference-free representation of a human genome sequence (T2T-CHM13 version 2.0 [article, sequence]).
Top 5 entries:
Ranking | Bytes | Bps | Time (min) | RAM (GB) | Program | Replication | Factor |
---|---|---|---|---|---|---|---|
1 | 544,059,173 | 1.396 | 389 | 28.8 | JARVIS2 | Run51 | |
2 | 544,267,353 | 1.396 | 420 | 27.4 | JARVIS2 | Run50 | |
3 | 544,292,577 | 1.397 | 399 | 26.9 | JARVIS2 | Run49 | |
4 | 545,960,947 | 1.401 | 283 | 26.9 | JARVIS2 | Run48 | |
5 | 549,594,830 | 1.410 | 284 | 11 | JARVIS2 | Run47 | |
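
The Bytes and Bps columns are linked by simple arithmetic. The following is a minimal sketch of that relation, assuming that Bps stands for bits per symbol with one symbol per base. The function names are hypothetical and are not part of the benchmark tooling. The sequence length is not stated in the table, so the sketch derives an approximate length from the rank 1 row; because Bps is rounded to three decimals, that length is only an estimate.

```python
def bits_per_symbol(compressed_bytes: int, num_symbols: float) -> float:
    """Average number of bits spent per symbol in the compressed output."""
    if num_symbols <= 0:
        raise ValueError("num_symbols must be positive")
    return 8 * compressed_bytes / num_symbols


def implied_num_symbols(compressed_bytes: int, bps: float) -> float:
    """Approximate sequence length implied by a (Bytes, Bps) pair."""
    if bps <= 0:
        raise ValueError("bps must be positive")
    return 8 * compressed_bytes / bps


# Values copied from the table above (rank 1 and rank 5).
RANK1_BYTES, RANK1_BPS = 544_059_173, 1.396
RANK5_BYTES = 549_594_830

# Assumption: length estimated from the rounded Bps of the rank 1 entry.
approx_length = implied_num_symbols(RANK1_BYTES, RANK1_BPS)

# Recompute the rank 5 Bps from its byte count and the estimated length.
rank5_bps = bits_per_symbol(RANK5_BYTES, approx_length)

# A plain 2-bit packing of A, C, G, T costs exactly 2 bits per symbol,
# so this ratio shows how much smaller the rank 1 entry is than that baseline.
gain_vs_two_bit = 2.0 / RANK1_BPS

print(f"approximate length: {approx_length:,.0f} symbols")
print(f"rank 5 recomputed:  {rank5_bps:.3f} bits per symbol")
print(f"rank 1 vs 2-bit:    {gain_vs_two_bit:.2f}x smaller")
```

A new entry can be compared against the table in the same way: pass the size of its compressed output in bytes and the sequence length to `bits_per_symbol`.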