MPBA / INF
Commit cd000267, authored Nov 25, 2019 by Marco Chierici
Updated README, runner.sh
Parent: dd9fed29
Changes: 2
README.md
@@ -2,14 +2,15 @@
**Requirements**
- Python3 with mlpy (!), numpy, scikit-learn
+ Python3 with mlpy (!), numpy, scikit-learn, pandas, snakemake
- R >= 3.2.3 with cvTools, doParallel, TunePareto, igraph
+ R >= 3.2.3 with argparse, cvTools, doParallel, TunePareto, igraph, lubridate, data.table
To install R via Anaconda: [doc](https://docs.anaconda.com/anaconda/user-guide/tasks/using-r-language/)
To install the R dependencies, run the following command from the R prompt:
- `install.packages(c("cvTools", "doParallel", "TunePareto", "igraph"))`
+ `install.packages(c("argparse", "cvTools", "doParallel", "TunePareto", "igraph", "lubridate", "data.table"))`
**Input files**
@@ -20,21 +21,29 @@ To install the R dependencies, run the following command from the R prompt:
**Example run**
- The original pipeline was reimplemented in a Makefile, with variables that can be set runtime.
+ The INF pipeline is implemented as a Snakefile.
+ The following directory tree is required:
+ * {datafolder}/{dataset}/{layer1}_{layer2}_{tr,ts}.txt
+ * {datafolder}/{dataset}/labels_{target}_{tr,ts}.txt
+ * {datafolder}/{dataset}/{layer1,layer2}_{tr,ts}.txt
+ * {outfolder}/{dataset}/{target}/{juxt,rSNF,rSNFi,single}/ _(these will be created if not present)_
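The required input tree can be checked before launching the pipeline. The following is a minimal sketch, not part of the commit: the helper names `expected_inputs` and `missing_inputs` are hypothetical, and the sketch assumes the brace patterns expand the way a shell would expand them (one file per `tr`/`ts` split, one per layer).

```python
from pathlib import Path

SPLITS = ("tr", "ts")  # the two splits named by the {tr,ts} pattern


def expected_inputs(datafolder, dataset, layer1, layer2, target):
    """Return the input files listed in the required directory tree."""
    base = Path(datafolder) / dataset
    # One stem per pattern: juxtaposed layers, labels, and each single layer.
    stems = [f"{layer1}_{layer2}", f"labels_{target}", layer1, layer2]
    return [base / f"{stem}_{split}.txt" for stem in stems for split in SPLITS]


def missing_inputs(paths):
    """Return the subset of paths that are not regular files on disk."""
    return [p for p in paths if not p.is_file()]


if __name__ == "__main__":
    # Values borrowed from the README's command-line example.
    paths = expected_inputs("data", "breast", "gene", "cnv", "ER")
    for p in missing_inputs(paths):
        print(f"missing: {p}")
```

The output folders are left out of the check on purpose, since the README states they are created if not present.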
- An example is given in the `runner.sh` script:
- ```
- make -f run_INF_RF-KBest.mk \
- OUTBASE=${OUT} \
- # layer1 dataset
- DATA1=data/AG1-G_145_LIT_ALL_tr.txt \
- # layer2 dataset
- DATA2=data/CNV-G_145_LIT_ALL_tr.txt \
- # layer1 + layer2 juxtaposed dataset
- FILE=data/AG1-G_CNV-G_145_LIT_ALL_tr.txt \
- # sample labels
- LABEL=data/label_145_ALL-EFS_tr.lab
- ```
+ All the {variables} can be specified either in a config.yaml file or on the command line; for example:
+ ```{python}
+ snakemake --config datafolder="data" dataset="breast" target="ER" layer1="gene" layer2="cnv"
+ ```
+ A maximum number of cores can also be set:
+ ```{python}
+ snakemake [--config etc.] --cores 12
+ ```
+ The pipeline can be "dry-run" using the `-n` flag:
+ ```{python}
+ snakemake --cores 12 -n
+ ```
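The README mentions a config.yaml file as an alternative to `--config`, but the commit does not show one. As a hedged sketch, the same five values from the command-line example could be written out as flat `key: "value"` lines. The helper names `render_config` and `write_config` are hypothetical, and the sketch assumes the Snakefile expects flat string keys with these names; whether the file is picked up automatically or must be passed with snakemake's `--configfile` option depends on the Snakefile, which is not shown here.

```python
from pathlib import Path

# Values copied from the README's command-line example.
CONFIG = {
    "datafolder": "data",
    "dataset": "breast",
    "target": "ER",
    "layer1": "gene",
    "layer2": "cnv",
}


def render_config(config):
    """Render a flat dict of strings as simple `key: "value"` YAML lines."""
    return "".join(f'{key}: "{value}"\n' for key, value in config.items())


def write_config(config, path="config.yaml"):
    """Write the rendered config to `path` and return it as a Path."""
    target = Path(path)
    target.write_text(render_config(config))
    return target


if __name__ == "__main__":
    print(render_config(CONFIG), end="")
```

Quoting every value keeps the sketch free of a YAML library; it is only suitable for simple strings without embedded quotes.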
runner.sh
#!/bin/bash
# Example script for the INF pipeline
- # output folder
- OUT=results_breast
- DATA_FOLDER=data/TCGA_data/Breast/INF
+ THREADS=12
+ OUTFOLDER=results_breast
+ DATAFOLDER=data/breast
LAYER1=gene
LAYER2=cnv
- # prepare output tree
+ TARGET=ER
# go!
- make all \
- OUTBASE=${OUT} \
- DATA1=${DATA_FOLDER}/${LAYER1}_tr.txt \
- DATA2=${DATA_FOLDER}/${LAYER2}_tr.txt \
- FILE=${DATA_FOLDER}/${LAYER1}_${LAYER2}_tr.txt \
- LABEL=${DATA_FOLDER}/labels_ER_tr.txt \
- ENDPOINT=breast_ER_${LAYER1}_${LAYER2}
+ snakemake --cores $THREADS --config datafolder=$DATAFOLDER outfolder=$OUTFOLDER target=$TARGET layer1=$LAYER1 layer2=$LAYER2