### INF pipeline

**Setup**
```bash
git clone https://gitlab.fbk.eu/MPBA/inf_revamped
cd inf_revamped
conda env create -f env.yml -n inf
conda activate inf
```

To install the R dependencies (not available in conda channels), run the following command at the R prompt:
```r
install.packages("TunePareto")
```

To install `mlpy`, follow the instructions [here](https://gitlab.fbk.eu/MPBA/mlpy). 

To install `openslide`:
```bash
apt-get install openslide-tools
pip install openslide-python
```
or follow the instructions [here](https://openslide.org/download/).

To install `bootstrapped`:
```bash
pip install bootstrapped
```

**Input files**

* omics layer 1 data: samples x features, tab-separated, with row & column names
* omics layer 2 data: same as above (**samples must be in the same order as the first file**)
* omics layers 1+2 data: the juxtaposition (column-wise concatenation) of the two files above
* labels file: one column, just the labels, no header (**same order as the data files**)
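
The juxtaposed file can be built from the two single-layer files. The sketch below is one possible way to do it, not part of the pipeline: it assumes that "juxtaposition" means column-wise concatenation and that the header row has one cell per column, including a (possibly empty) cell above the sample names. The helper names `read_table`, `juxtapose` and `write_table` are hypothetical.

```python
import csv


def read_table(path):
    """Read a tab-separated samples x features table with row and column names.

    Assumption: the header row has one cell per column, including a
    (possibly empty) cell above the sample names.
    """
    with open(path, newline="") as handle:
        rows = list(csv.reader(handle, delimiter="\t"))
    return rows[0], rows[1:]


def juxtapose(table1, table2):
    """Concatenate two tables column-wise after checking the sample order."""
    header1, body1 = table1
    header2, body2 = table2
    samples1 = [row[0] for row in body1]
    samples2 = [row[0] for row in body2]
    if samples1 != samples2:
        raise ValueError("sample names differ or are in a different order")
    # Keep the sample-name column from the first table only.
    header = header1 + header2[1:]
    body = [row1 + row2[1:] for row1, row2 in zip(body1, body2)]
    return header, body


def write_table(path, table):
    """Write a (header, body) pair as a tab-separated file."""
    header, body = table
    with open(path, "w", newline="") as handle:
        writer = csv.writer(handle, delimiter="\t", lineterminator="\n")
        writer.writerow(header)
        writer.writerows(body)
```

For instance, with layers named `gene` and `cnv`, `write_table("gene_cnv_tr.txt", juxtapose(read_table("gene_tr.txt"), read_table("cnv_tr.txt")))` would produce the training-set juxtaposed file, and fail early if the two inputs list their samples in a different order.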

**Example run**

The INF pipeline is implemented as a Snakefile.

The following directory tree is required:

* `{datafolder}/{dataset}/{layer1}_{layer2}_{tr,ts}.txt`
* `{datafolder}/{dataset}/labels_{target}_{tr,ts}.txt`
* `{datafolder}/{dataset}/{layer1,layer2}_{tr,ts}.txt`
* `{outfolder}/{dataset}/{target}/{juxt,rSNF,rSNFi,single}/` _(these will be created if not present)_
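
Before launching the pipeline, it can help to check that the input files are in place. The following is a minimal sketch that expands the patterns above for a given set of values; the function names `expected_inputs` and `missing_inputs` are hypothetical and not part of the pipeline.

```python
from pathlib import Path


def expected_inputs(datafolder, dataset, target, layer1, layer2):
    """List the input files implied by the directory tree above."""
    base = Path(datafolder) / dataset
    paths = []
    for split in ("tr", "ts"):
        paths.append(base / f"{layer1}_{layer2}_{split}.txt")  # juxtaposed layers
        paths.append(base / f"labels_{target}_{split}.txt")    # labels
        paths.append(base / f"{layer1}_{split}.txt")           # single layer 1
        paths.append(base / f"{layer2}_{split}.txt")           # single layer 2
    return paths


def missing_inputs(datafolder, dataset, target, layer1, layer2):
    """Return the expected input files that do not exist on disk."""
    candidates = expected_inputs(datafolder, dataset, target, layer1, layer2)
    return [path for path in candidates if not path.is_file()]
```

Calling `missing_inputs("data", "breast", "ER", "gene", "cnv")` returns an empty list when all eight input files are present.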

All the `{variables}` can be specified either in a `config.yaml` file or on the command line; for example:

```bash
snakemake --config datafolder="data" dataset="breast" target="ER" layer1="gene" layer2="cnv"
```

A maximum number of cores can also be set:

```bash
snakemake [--config etc.] --cores 12
```

To perform a dry run, which lists the jobs that would be executed without running them, use the `-n` flag:

```bash
snakemake --cores 12 -n
```
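
When the pipeline is launched from a script (for example, to loop over several targets), the same command line can be assembled programmatically. This is only a sketch: the helper `snakemake_command` is hypothetical, and it uses only the `--config`, `--cores` and `-n` options shown above.

```python
def snakemake_command(config, cores=None, dry_run=False):
    """Build the argument list for a snakemake call.

    `config` maps variable names (e.g. datafolder, dataset, target,
    layer1, layer2) to their values.
    """
    cmd = ["snakemake"]
    if config:
        cmd.append("--config")
        cmd.extend(f"{key}={value}" for key, value in config.items())
    if cores is not None:
        cmd.extend(["--cores", str(cores)])
    if dry_run:
        cmd.append("-n")
    return cmd


# To actually run it (requires snakemake on the PATH):
#   import subprocess
#   subprocess.run(snakemake_command({...}, cores=12, dry_run=True), check=True)
```

Passing the arguments as a list avoids shell-quoting issues with values that contain spaces.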