Commit cd000267 authored by Marco Chierici

Updated README, runner.sh

parent dd9fed29
@@ -2,14 +2,15 @@
**Requirements**
Python3 with mlpy (!), numpy, scikit-learn
R >= 3.2.3 with cvTools, doParallel, TunePareto, igraph
Python3 with mlpy (!), numpy, scikit-learn, pandas, snakemake
R >= 3.2.3 with argparse, cvTools, doParallel, TunePareto, igraph, lubridate, data.table
To install R via Anaconda: [doc](https://docs.anaconda.com/anaconda/user-guide/tasks/using-r-language/)
To install the R dependencies, run the following command from the R prompt:
`install.packages(c("cvTools", "doParallel", "TunePareto", "igraph"))`
`install.packages(c("argparse", "cvTools", "doParallel", "TunePareto", "igraph", "lubridate", "data.table"))`
**Input files**
@@ -20,21 +21,29 @@ To install the R dependencies, run the following command from the R prompt:
**Example run**
The original pipeline was reimplemented in a Makefile, with variables that can be set at runtime.
The INF pipeline is implemented as a Snakefile.
The following directory tree is required:
* {datafolder}/{dataset}/{layer1}_{layer2}_{tr,ts}.txt
* {datafolder}/{dataset}/labels_{target}_{tr,ts}.txt
* {datafolder}/{dataset}/{layer1,layer2}_{tr,ts}.txt
* {outfolder}/{dataset}/{target}/{juxt,rSNF,rSNFi,single}/ _(these will be created if not present)_
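Before launching the pipeline it can help to confirm that the input files listed above are in place. The following is a minimal sketch, not part of the repository: `check_inputs` is a hypothetical helper, and it assumes the `{tr,ts}` and `{layer1,layer2}` braces expand to one file per combination.

```shell
# check_inputs DATAFOLDER DATASET TARGET LAYER1 LAYER2
# Hypothetical helper (not shipped with the pipeline): reports every input
# file from the directory tree above that is missing and returns non-zero
# if at least one is absent. Output folders are not checked, since the
# README says they are created on demand.
# Note: plain POSIX sh, so the variables below are global to the script.
check_inputs() {
    datafolder=$1; dataset=$2; target=$3; layer1=$4; layer2=$5
    missing=0
    for split in tr ts; do
        for f in \
            "${datafolder}/${dataset}/${layer1}_${layer2}_${split}.txt" \
            "${datafolder}/${dataset}/labels_${target}_${split}.txt" \
            "${datafolder}/${dataset}/${layer1}_${split}.txt" \
            "${datafolder}/${dataset}/${layer2}_${split}.txt"
        do
            if [ ! -f "$f" ]; then
                echo "missing: $f" >&2
                missing=$((missing + 1))
            fi
        done
    done
    [ "$missing" -eq 0 ]
}
```

For instance, `check_inputs data breast ER gene cnv && echo "inputs OK"` would print the confirmation only if all eight files exist.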
An example is given in the `runner.sh` script:
All the {variables} can be specified either in a config.yaml file or on the command line; for example:
```{python}
snakemake --config datafolder="data" dataset="breast" target="ER" layer1="gene" layer2="cnv"
```
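The same settings could instead live in a `config.yaml` file. The sketch below writes one with the keys named in this README and the example values used in it; whether the Snakefile loads `config.yaml` automatically is not stated here, so passing it explicitly with Snakemake's `--configfile` option is shown only as a commented assumption.

```shell
# Write a config.yaml that mirrors the command-line example above.
# Keys follow the {variables} named in this README; values are the
# README's example values and should be adapted to your data.
cat > config.yaml <<'EOF'
datafolder: data
outfolder: results_breast
dataset: breast
target: ER
layer1: gene
layer2: cnv
EOF

# Assumption: if the Snakefile does not pick up config.yaml by itself,
# it can be passed explicitly (dry run shown):
# snakemake --configfile config.yaml --cores 12 -n
```

Values given with `--config` on the command line take precedence over those in the file, so a single variable can still be overridden for one run.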
make -f run_INF_RF-KBest.mk \
OUTBASE=${OUT} \
# layer1 dataset
DATA1=data/AG1-G_145_LIT_ALL_tr.txt \
# layer2 dataset
DATA2=data/CNV-G_145_LIT_ALL_tr.txt \
# layer1 + layer2 juxtaposed dataset
FILE=data/AG1-G_CNV-G_145_LIT_ALL_tr.txt \
# sample labels
LABEL=data/label_145_ALL-EFS_tr.lab
A maximum number of cores can also be set:
```{python}
snakemake [--config etc.] --cores 12
```
The pipeline can be "dry-run" using the `-n` flag:
```{python}
snakemake --cores 12 -n
```
#!/bin/bash
# Example script for the INF pipeline
# output folder
OUT=results_breast
DATA_FOLDER=data/TCGA_data/Breast/INF
THREADS=12
OUTFOLDER=results_breast
DATAFOLDER=data/breast
LAYER1=gene
LAYER2=cnv
# prepare output tree
TARGET=ER
# go!
make all \
OUTBASE=${OUT} \
DATA1=${DATA_FOLDER}/${LAYER1}_tr.txt \
DATA2=${DATA_FOLDER}/${LAYER2}_tr.txt \
FILE=${DATA_FOLDER}/${LAYER1}_${LAYER2}_tr.txt \
LABEL=${DATA_FOLDER}/labels_ER_tr.txt \
ENDPOINT=breast_ER_${LAYER1}_${LAYER2}
snakemake --cores $THREADS --config datafolder=$DATAFOLDER outfolder=$OUTFOLDER target=$TARGET layer1=$LAYER1 layer2=$LAYER2