Commit a6335076 authored by Marco Chierici

Add resplitter.py

parent 1b6eb13f
@@ -62,12 +62,25 @@ mv tcga* data
#### Data splits generation
To recreate the 10 data splits, first run the following commands in a shell:
```bash
Rscript scripts/prepare_ACGT.R --tumor aml --suffix 03 --datadir data/original/Shamir_lab --outdir data/tcga_aml
Rscript scripts/prepare_ACGT.R --tumor kidney --suffix 01 --datadir data/original/Shamir_lab --outdir data/tcga_kirc
Rscript scripts/prepare_BRCA.R --task ER --datadir data/original --outdir data/tcga_brca
Rscript scripts/prepare_BRCA.R --task subtypes --datadir data/original --outdir data/tcga_brca
```
This creates 10 TR/TS partitions, with IDs 0 to 9. To further partition them into the 10 TR/TS/TS2 splits described in the paper, with IDs 50 to 59 (you can use any other IDs), run in a shell:
```bash
for dataset in tcga_aml tcga_kirc; do
python resplitter.py --datafolder data/$dataset --target OS --n_splits_start 0 --n_splits_end 10 --split_offset 50
done
for target in ER subtypes; do
python resplitter.py --datafolder data/tcga_breast --target $target --n_splits_start 0 --n_splits_end 10 --split_offset 50
done
```
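As a rough illustration of how the split IDs in the commands above relate to each other, the sketch below maps each input split ID to an output ID. It is not taken from `resplitter.py`: the function name `resplit_ids` is hypothetical, and it assumes that `--n_splits_end` is exclusive and that `--split_offset` is simply added to each input ID, which is consistent with IDs 0 to 9 becoming 50 to 59.

```python
# Hypothetical sketch: NOT the actual resplitter.py logic.
# Assumes the end bound is exclusive and the offset is added to each input ID.
def resplit_ids(n_splits_start, n_splits_end, split_offset):
    """Return a dict mapping each input split ID to its assumed output ID."""
    return {i: i + split_offset for i in range(n_splits_start, n_splits_end)}


# Same values as in the shell commands above.
mapping = resplit_ids(n_splits_start=0, n_splits_end=10, split_offset=50)

for old_id, new_id in mapping.items():
    print(f"TR/TS split {old_id} -> TR/TS/TS2 split {new_id}")
```

Under these assumptions, choosing a different `--split_offset` would shift the output IDs accordingly, for example an offset of 100 would give IDs 100 to 109.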
#### Input files
......