GeCo

An efficient compressor for genomic sequences

Introduction

What is GeCo?

GeCo is an efficient compressor for genomic sequences. It was published at the Data Compression Conference (DCC) in March 2016.

It answers the following questions:

How do I get GeCo?

Clone the repository and build with CMake:
git clone https://github.com/pratas/geco.git
cd geco/src/
cmake .
make
Alternatively, to build without CMake, run:
git clone https://github.com/pratas/geco.git
cd geco/src/
cp Makefile.nix Makefile
make
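
The two build routes above can be folded into one script. This is a minimal sketch, not part of the GeCo repository: the function name choose_build and the variable BUILD_CMD are made up for illustration, and the script assumes it is run from inside the cloned geco/src/ directory. It picks CMake when the cmake executable is on the PATH and falls back to Makefile.nix otherwise.

```shell
#!/bin/sh
# Hypothetical helper: print the build command that matches this machine.
choose_build() {
  if command -v cmake >/dev/null 2>&1; then
    # CMake is installed: use the CMake route from the instructions above.
    echo "cmake . && make"
  else
    # No CMake: fall back to the bundled Makefile.nix route.
    echo "cp Makefile.nix Makefile && make"
  fi
}

BUILD_CMD=$(choose_build)
echo "Build command: $BUILD_CMD"

# Guard (an assumption of this sketch): only build when Makefile.nix is
# present, i.e. when the current directory looks like geco/src/.
if [ -f Makefile.nix ]; then
  sh -c "$BUILD_CMD"
fi
```

Outside geco/src/ the script only prints the command it would run, so it is safe to try before cloning.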

Usage

To see the available options, type
./GeCo
or
./GeCo -h
Either command prints the following options:
Usage: GeCo [OPTION]... -r [FILE]  [FILE]:[...]
Compress and analyze a genomic sequence (by default, compress).

Non-mandatory arguments:

  -h                     give this help,
  -x                     show several running examples,
  -s                     show GeCo compression levels,
  -v                     verbose mode (more information),
  -V                     display version number,
  -f                     force overwrite of output,
  -l <level>             level of compression [1;9] (lazy -tm setup),
  -g <gamma>             mixture decayment forgetting factor. It is
                         a real value in the interval [0;1),
  -c <cache>             maximum collisions for hash cache. Memory
                         values are higly dependent of the parameter
                         specification,
  -e                     it creates a file with the extension ".iae"
                         with the respective information content. If
                         the file is FASTA or FASTQ it will only use
                         the "ACGT" (genomic) data,

  -r <FILE>              reference file ("-rm" are loaded here),

  -rm <c>:<d>:<i>:<m/e>  reference context model (ex:-rm 13:100:0:0/0),
  -rm <c>:<d>:<i>:<m/e>  reference context model (ex:-rm 18:1000:0:1/1000),
  ...
  -tm <c>:<d>:<i>:<m/e>  target context model (ex:-tm 4:1:0:0/0),
  -tm <c>:<d>:<i>:<m/e>  target context model (ex:-tm 18:20:1:2/10),
  ...
                         target and reference templates use <c> for
                         context-order size, <d> for alpha (1/<d>),
                         <i> (0 or 1) to set the usage of inverted
                         repeats (1 to use) and <m> to the maximum
                         allowed mutation on the context without
                         being discarded (usefull in deep contexts),
                         under the estimator <e>,

Mandatory arguments:

  <FILE>                 file to compress (last argument). For more
                         files use splitting ":" characters.

Report bugs to <{pratas,ap,pjf}@ua.pt>.
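
The model template <c>:<d>:<i>:<m/e> described in the help text above can be hard to read at a glance. The sketch below splits a model string into its fields and then assembles a compression command from the two -tm examples shown in the help. The function name describe_model, the variable names, and the file name target.seq are hypothetical. Whether this pair of models suits a given dataset is not something this sketch can tell; run ./GeCo -x to see the running examples that ship with the tool.

```shell
#!/bin/sh
# Hypothetical helper: split <c>:<d>:<i>:<m>/<e> into named fields,
# following the template description in the help text.
describe_model() {
  spec=$1
  c=${spec%%:*};  rest=${spec#*:}   # context-order size
  d=${rest%%:*};  rest=${rest#*:}   # alpha denominator (alpha = 1/<d>)
  i=${rest%%:*};  rest=${rest#*:}   # inverted repeats (1 to use, 0 otherwise)
  m=${rest%%/*}                     # maximum allowed mutations on the context
  e=${rest#*/}                      # estimator parameter
  echo "order=$c alpha=1/$d inverted_repeats=$i max_mutations=$m estimator=$e"
}

describe_model 4:1:0:0/0
describe_model 18:20:1:2/10

# Assemble a command from flags listed in the help text.
TARGET="target.seq"   # hypothetical input file name
GECO_CMD="./GeCo -v -tm 4:1:0:0/0 -tm 18:20:1:2/10 $TARGET"
echo "$GECO_CMD"

# Run only when the binary and the input exist; $GECO_CMD is left
# unquoted on purpose so the shell splits it into separate arguments.
if [ -x ./GeCo ] && [ -f "$TARGET" ]; then
  $GECO_CMD
fi
```

For the second model, describe_model reports a context order of 18, alpha of 1/20, inverted repeats enabled, and the pair 2/10 for the mutation limit and estimator.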

License

GPL v3. For more information, see the LICENSE file or visit
http://www.gnu.org/licenses/gpl-3.0.html