An efficient compressor for genomic sequences


What is GeCo?

GeCo is an efficient compressor for genomic sequences. It was published at the Data Compression Conference (DCC) in March 2016.

It answers the following questions:

How do I get GeCo?

Clone our repository and build it with CMake:
git clone
cd geco/src/
cmake .
make
Alternatively, to build without CMake, use the provided Makefile:
git clone
cd geco/src/
cp Makefile.nix Makefile
make


To see the available options, type:
./GeCo -h
This will print the following options:
Usage: GeCo [OPTION]... -r [FILE]  [FILE]:[...]
Compress and analyze a genomic sequence (by default, compress).

Non-mandatory arguments:

  -h                     give this help,
  -x                     show several running examples,
  -s                     show GeCo compression levels,
  -v                     verbose mode (more information),
  -V                     display version number,
  -f                     force overwrite of output,
  -l <level>             level of compression [1;9] (lazy -tm setup),
  -g <gamma>             mixture decayment forgetting factor. It is
                         a real value in the interval [0;1),
  -c <cache>             maximum collisions for hash cache. Memory
                         values are highly dependent on the parameter
  -e                     it creates a file with the extension ".iae"
                         with the respective information content. If
                         the file is FASTA or FASTQ it will only use
                         the "ACGT" (genomic) data,

  -r <FILE>              reference file ("-rm" are loaded here),

  -rm <c>:<d>:<i>:<m/e>  reference context model (ex:-rm 13:100:0:0/0),
  -rm <c>:<d>:<i>:<m/e>  reference context model (ex:-rm 18:1000:0:1/1000),
  -tm <c>:<d>:<i>:<m/e>  target context model (ex:-tm 4:1:0:0/0),
  -tm <c>:<d>:<i>:<m/e>  target context model (ex:-tm 18:20:1:2/10),
                         target and reference templates use <c> for
                         context-order size, <d> for alpha (1/<d>),
                         <i> (0 or 1) to set the usage of inverted
                         repeats (1 to use) and <m> to the maximum
                         allowed mutation on the context without
                         being discarded (useful in deep contexts),
                         under the estimator <e>,

Mandatory arguments:

  <FILE>                 file to compress (last argument). For more
                         files use splitting ":" characters.

Report bugs to <{pratas,ap,pjf}>.
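
The model templates in the help text pack five values into one string, which is easy to mistype. The following Python sketch is not part of GeCo. It only illustrates how a "<c>:<d>:<i>:<m/e>" string breaks down according to the description above, and how several target files are joined with ":" into the last argument. The names ModelSpec, parse_model and build_command, and the file names in the usage note, are hypothetical. The sketch also assumes that all five values are integers and that <d> is positive, which matches the examples in the help text but is not stated there.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class ModelSpec:
    """Fields of a "<c>:<d>:<i>:<m/e>" model string (field names are illustrative)."""
    order: int               # <c>: context-order size
    alpha_denominator: int   # <d>: alpha is 1/<d>
    inverted_repeats: bool   # <i>: 1 enables inverted repeats, 0 disables them
    max_mutations: int       # <m>: maximum allowed mutations on the context
    estimator: int           # <e>: estimator value paired with <m>

    @property
    def alpha(self) -> float:
        # The help text defines alpha as 1/<d>.
        return 1.0 / self.alpha_denominator


def parse_model(spec: str) -> ModelSpec:
    """Split a model string into its fields, raising ValueError if malformed."""
    parts = spec.split(":")
    if len(parts) != 4:
        raise ValueError(f"expected <c>:<d>:<i>:<m/e>, got {spec!r}")
    c, d, i, m_e = parts
    m, sep, e = m_e.partition("/")
    if not sep:
        raise ValueError(f"last field must be <m>/<e>, got {m_e!r}")
    if i not in ("0", "1"):
        raise ValueError(f"<i> must be 0 or 1, got {i!r}")
    d_val = int(d)
    if d_val <= 0:
        # Assumption: <d> must be positive so that alpha = 1/<d> is defined.
        raise ValueError(f"<d> must be positive, got {d!r}")
    return ModelSpec(int(c), d_val, i == "1", int(m), int(e))


def build_command(target_models: List[str], target_files: List[str],
                  binary: str = "./GeCo") -> List[str]:
    """Assemble an argument list: one -tm per model, then the files joined by ':'."""
    args = [binary]
    for spec in target_models:
        parse_model(spec)  # validate only; GeCo receives the original string
        args += ["-tm", spec]
    args.append(":".join(target_files))  # the file argument must come last
    return args
```

For example, parse_model("18:20:1:2/10") describes an order-18 model with alpha = 1/20, inverted repeats enabled, and up to 2 mutations under estimator value 10. build_command(["4:1:0:0/0"], ["a.seq", "b.seq"]) returns a list that can be passed to subprocess.run.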


GPL v3. For more information see LICENSE file or visit