Commit cd000267 authored by Marco Chierici

Updated README, runner.sh

parent dd9fed29
@@ -2,14 +2,15 @@
 **Requirements**
-Python3 with mlpy (!), numpy, scikit-learn
+Python3 with mlpy (!), numpy, scikit-learn, pandas, snakemake
-R >= 3.2.3 with cvTools, doParallel, TunePareto, igraph
+R >= 3.2.3 with argparse, cvTools, doParallel, TunePareto, igraph, lubridate, data.table
 To install R via Anaconda: [doc](https://docs.anaconda.com/anaconda/user-guide/tasks/using-r-language/)
 To install the R dependencies, run the following command from the R prompt:
-`install.packages(c("cvTools", "doParallel", "TunePareto", "igraph"))`
+`install.packages(c("argparse", "cvTools", "doParallel", "TunePareto", "igraph", "lubridate", "data.table"))`
 **Input files**
@@ -20,21 +21,29 @@ To install the R dependencies, run the following command from the R prompt:
 **Example run**
-The original pipeline was reimplemented in a Makefile, with variables that can be set runtime.
+The INF pipeline is implemented as a Snakefile.
+The following directory tree is required:
+* {datafolder}/{dataset}/{layer1}_{layer2}_{tr,ts}.txt
+* {datafolder}/{dataset}/labels_{target}_{tr,ts}.txt
+* {datafolder}/{dataset}/{layer1,layer2}_{tr,ts}.txt
+* {outfolder}/{dataset}/{target}/{juxt,rSNF,rSNFi,single}/ _(these will be created if not present)_
-An example is given in the `runner.sh` script:
+All the {variables} can be specified either in a config.yaml file or on the command line; for example:
+```{python}
+snakemake --config datafolder="data" dataset="breast" target="ER" layer1="gene" layer2="cnv"
 ```
-make -f run_INF_RF-KBest.mk \
-OUTBASE=${OUT} \
-# layer1 dataset
-DATA1=data/AG1-G_145_LIT_ALL_tr.txt \
-# layer2 dataset
-DATA2=data/CNV-G_145_LIT_ALL_tr.txt \
-# layer1 + layer2 juxtaposed dataset
-FILE=data/AG1-G_CNV-G_145_LIT_ALL_tr.txt \
-# sample labels
-LABEL=data/label_145_ALL-EFS_tr.lab
+A maximum number of cores can also be set:
+```{python}
+snakemake [--config etc.] --cores 12
 ```
+The pipeline can be "dry-run" using the `-n` flag:
+```{python}
+snakemake --cores 12 -n
+```
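The directory tree required by the new README text can be checked before launching the pipeline. The following is a minimal sketch, not part of the commit: the function names `expected_inputs` and `missing_inputs` are hypothetical, and it assumes that `{tr,ts}` and `{layer1,layer2}` are meant shell-style, i.e. one file per alternative.

```python
from pathlib import Path


def expected_inputs(datafolder, dataset, layer1, layer2, target):
    """List the input files named in the README's directory tree.

    Assumption: {a,b} is read shell-style, giving one file per alternative.
    """
    base = Path(datafolder) / dataset
    paths = []
    for split in ("tr", "ts"):
        paths.append(base / f"{layer1}_{layer2}_{split}.txt")  # juxtaposed layers
        paths.append(base / f"labels_{target}_{split}.txt")    # sample labels
        for layer in (layer1, layer2):
            paths.append(base / f"{layer}_{split}.txt")        # single layers
    return paths


def missing_inputs(**variables):
    """Return the expected input files that do not exist on disk."""
    return [p for p in expected_inputs(**variables) if not p.is_file()]


if __name__ == "__main__":
    # Values taken from the example snakemake command in the README.
    for path in missing_inputs(datafolder="data", dataset="breast",
                               layer1="gene", layer2="cnv", target="ER"):
        print(f"missing: {path}")
```

The output folders are not checked here, since the README states they are created if not present.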
 #!/bin/bash
 # Example script for the INF pipeline
-# output folder
-OUT=results_breast
-DATA_FOLDER=data/TCGA_data/Breast/INF
+THREADS=12
+OUTFOLDER=results_breast
+DATAFOLDER=data/breast
 LAYER1=gene
 LAYER2=cnv
-# prepare output tree
+TARGET=ER
 # go!
-make all \
-OUTBASE=${OUT} \
-DATA1=${DATA_FOLDER}/${LAYER1}_tr.txt \
-DATA2=${DATA_FOLDER}/${LAYER2}_tr.txt \
-FILE=${DATA_FOLDER}/${LAYER1}_${LAYER2}_tr.txt \
-LABEL=${DATA_FOLDER}/labels_ER_tr.txt \
-ENDPOINT=breast_ER_${LAYER1}_${LAYER2}
+snakemake --cores $THREADS --config datafolder=$DATAFOLDER outfolder=$OUTFOLDER target=$TARGET layer1=$LAYER1 layer2=$LAYER2
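The same invocation could also be assembled programmatically, for instance when looping over several targets or layer pairs. This is a rough sketch under stated assumptions: the helper `snakemake_command` is hypothetical and not part of the repository, and it uses only the `--cores`, `-n` and `--config key=value` options that appear in this commit.

```python
import shlex


def snakemake_command(config, cores=None, dry_run=False):
    """Build a snakemake argument list from a dict of config variables."""
    cmd = ["snakemake"]
    if cores is not None:
        cmd += ["--cores", str(cores)]
    if dry_run:
        cmd.append("-n")
    if config:
        # Every item after --config is passed as key=value.
        cmd.append("--config")
        cmd += [f"{key}={value}" for key, value in config.items()]
    return cmd


if __name__ == "__main__":
    # Same values as in the example script.
    config = {
        "datafolder": "data/breast",
        "outfolder": "results_breast",
        "target": "ER",
        "layer1": "gene",
        "layer2": "cnv",
    }
    # Print the command; pass the list to subprocess.run(cmd, check=True) to execute it.
    print(shlex.join(snakemake_command(config, cores=12, dry_run=True)))
```

Returning a list rather than a single string keeps each argument separate, so values containing spaces need no manual quoting when the list is handed to `subprocess.run`.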