# Integrative Network Fusion (INF)
![INF pipeline](figs/INF_pipeline.jpeg)

## Setup
```bash
git clone https://gitlab.fbk.eu/MPBA/INF
cd INF
conda env create -f env.yml -n inf
conda activate inf
```

### Additional dependencies

#### R dependencies
To install the R dependencies that are not available in conda channels, run the following command from the R prompt:
```R
install.packages("TunePareto")
```
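
Alternatively, the installation can be launched directly from the shell (a minimal sketch, assuming `Rscript` is available in the active environment; the CRAN mirror is just an example):
```bash
# install TunePareto non-interactively via Rscript
Rscript -e 'install.packages("TunePareto", repos="https://cloud.r-project.org")'
```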

#### MLPY
The `mlpy` package is required for some operations included in the DAP procedure. The version of `mlpy` available on PyPI is outdated and does not work on OSX platforms, so install it from the FBK GitLab repository as follows.

Let `<ANACONDA>` be your Anaconda installation path (e.g., `/home/user/anaconda3`) and `<ENV>` the name of your conda environment (e.g., `inf`).

Set the following environment variables:
```bash
export LD_LIBRARY_PATH=<ANACONDA>/envs/<ENV>/lib:${LD_LIBRARY_PATH}
export CPATH=<ANACONDA>/envs/<ENV>/include:${CPATH}
```

and then install `mlpy` from GitLab:
```bash
pip install git+https://gitlab.fbk.eu/MPBA/mlpy.git
```
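
To quickly check that the installation succeeded (just a sanity check):
```bash
python -c "import mlpy"
```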

## Usage

**Input files**

* omics layer 1 data: samples × features matrix, tab-separated, with row and column names
* omics layer 2 data: same format as above (**samples must be in the same order as in the first file**)
* omics layers 1+2 data: the juxtaposition of the two files above
* labels file: a single column with the labels only, no header (**same sample order as the data files**)
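
As an illustration, a data file for one omics layer could look like the following (hypothetical sample and feature names; actual identifiers depend on your dataset; columns are shown aligned for readability, but the separator is a tab):
```text
        gene1   gene2   gene3
sample1 0.12    3.40    1.10
sample2 0.55    2.10    0.87
```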

**Example run**

The INF pipeline is implemented with a [Snakefile](https://snakemake.readthedocs.io/en/stable/index.html).

The following directory tree is required:

* `{datafolder}/{dataset}/{target}/{split_id}/{layer}_{tr,ts,ts2}.txt`
* `{datafolder}/{dataset}/{split_id}/labels_{target}_{tr,ts,ts2}.txt`
* `{outfolder}/{dataset}/{target}/{model}/{split_id}/{juxt,rSNF,rSNFi,single}` _(these will be created if not present)_
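
As an illustration, with the example configuration used below (`datafolder=data`, `dataset=tcga_brca`, `target=ER`, `split_id=0`, layers `gene`, `cnv`, `prot`) the input files would be laid out roughly as follows (a sketch based on the path patterns above; each file also comes in `ts` and `ts2` versions):
```text
data/tcga_brca/ER/0/gene_tr.txt
data/tcga_brca/ER/0/cnv_tr.txt
data/tcga_brca/ER/0/prot_tr.txt
data/tcga_brca/0/labels_ER_tr.txt
```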

All the `{variables}` can be specified either in a `config.yaml` file or on the command line.

Example:

```bash
snakemake --config datafolder=data outfolder=results dataset=tcga_brca target=ER layer1=gene layer2=cnv layer3=prot model=randomForest random=false split_id=0 -p 
```
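
Equivalently, the same settings could be collected in a `config.yaml` (a sketch mirroring the command above; pass it to Snakemake with `--configfile config.yaml` or reference it from the Snakefile via a `configfile:` directive):
```yaml
datafolder: data
outfolder: results
dataset: tcga_brca
target: ER
layer1: gene
layer2: cnv
layer3: prot
model: randomForest
random: false
split_id: 0
```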

This example runs the pipeline on three omics layers of the BRCA-ER dataset. You can use an arbitrary number of omics layers by adding or removing `layer` arguments accordingly.

A maximum number of cores can also be set (default is 1):

```bash
snakemake [--config etc.] --cores 12
```

The pipeline can be "dry-run" using the `-n` flag:

```bash
snakemake --cores 12 -n
```

A bash script (`runner.sh`) is provided for convenience: it runs the pipeline for each split, computes the Borda of Bordas, and averages the metrics over all splits.