Data File Configuration Guide

Installing and configuring the data files

1. Download the database.zip file.

The archive has the following directory structure:

database
└── diamond_db
    ├── Animals.dmnd
    ├── Fungi.dmnd
    ├── ko.dmnd
    └── Plants.dmnd

2. Unzip database.zip in the software installation directory. Make sure the VGenomics_RS/database/diamond_db directory contains the following 4 files:

├── Animals.dmnd
├── Fungi.dmnd
├── ko.dmnd
└── Plants.dmnd
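
The two steps above can be sketched in Python as follows. This is only a minimal illustration, not part of the software: the helper names `extract_database` and `missing_db_files` are hypothetical, and the script assumes it is run from the directory that contains VGenomics_RS.

```python
from pathlib import Path
import zipfile

# File names taken from the listing above.
EXPECTED_DB_FILES = ("Animals.dmnd", "Fungi.dmnd", "ko.dmnd", "Plants.dmnd")


def extract_database(zip_path, install_dir):
    """Extract database.zip into install_dir (hypothetical helper)."""
    with zipfile.ZipFile(zip_path) as archive:
        archive.extractall(install_dir)


def missing_db_files(install_dir):
    """Return the expected .dmnd files that are absent, in listing order."""
    db_dir = Path(install_dir) / "database" / "diamond_db"
    return [name for name in EXPECTED_DB_FILES if not (db_dir / name).is_file()]


if __name__ == "__main__":
    # Assumption: VGenomics_RS is a subdirectory of the current directory
    # and database.zip has already been extracted into it.
    missing = missing_db_files("VGenomics_RS")
    if missing:
        print("Missing database files:", ", ".join(missing))
    else:
        print("All 4 database files are present.")
```

If any file is reported missing, the archive was most likely extracted into the wrong directory or with an extra nesting level.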

Test data details

Test data: ref-based_test_rawdata.zip

The archive contains the following files:

├── chr22_with_ERCC92.fa
├── chr22_with_ERCC92.gtf
├── gene_GO_anno.txt
├── gene_KEGG_anno.txt
├── HBR_Rep1_ERCC-Mix2_Build37-ErccTranscripts-chr22.read1.fastq
├── HBR_Rep1_ERCC-Mix2_Build37-ErccTranscripts-chr22.read2.fastq
├── HBR_Rep2_ERCC-Mix2_Build37-ErccTranscripts-chr22.read1.fastq
├── HBR_Rep2_ERCC-Mix2_Build37-ErccTranscripts-chr22.read2.fastq
├── HBR_Rep3_ERCC-Mix2_Build37-ErccTranscripts-chr22.read1.fastq
├── HBR_Rep3_ERCC-Mix2_Build37-ErccTranscripts-chr22.read2.fastq
├── UHR_Rep1_ERCC-Mix1_Build37-ErccTranscripts-chr22.read1.fastq
├── UHR_Rep1_ERCC-Mix1_Build37-ErccTranscripts-chr22.read2.fastq
├── UHR_Rep2_ERCC-Mix1_Build37-ErccTranscripts-chr22.read1.fastq
├── UHR_Rep2_ERCC-Mix1_Build37-ErccTranscripts-chr22.read2.fastq
├── UHR_Rep3_ERCC-Mix1_Build37-ErccTranscripts-chr22.read1.fastq
└── UHR_Rep3_ERCC-Mix1_Build37-ErccTranscripts-chr22.read2.fastq
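
The FASTQ files in this listing come in read1/read2 pairs, one pair per sample and replicate. The following sketch shows one way to group them by sample. The regular expression is inferred from the file names above and is an assumption, and `pair_fastq_files` is a hypothetical helper rather than something shipped with the software.

```python
import re
from collections import defaultdict

# Naming scheme inferred from the listing above (an assumption):
# <condition>_Rep<N>_ERCC-Mix<M>_<anything>.read<1|2>.fastq
FASTQ_NAME = re.compile(
    r"^(?P<sample>(?P<condition>[A-Za-z]+)_Rep(?P<replicate>\d+))"
    r"_ERCC-Mix(?P<mix>\d+)_.*\.read(?P<mate>[12])\.fastq$"
)


def pair_fastq_files(file_names):
    """Group FASTQ names into {sample: {"read1": name, "read2": name}}."""
    pairs = defaultdict(dict)
    for name in file_names:
        match = FASTQ_NAME.match(name)
        if match is None:
            continue  # skip files that do not follow the naming scheme
        pairs[match["sample"]]["read" + match["mate"]] = name
    return dict(pairs)


if __name__ == "__main__":
    names = [
        "HBR_Rep1_ERCC-Mix2_Build37-ErccTranscripts-chr22.read1.fastq",
        "HBR_Rep1_ERCC-Mix2_Build37-ErccTranscripts-chr22.read2.fastq",
        "gene_GO_anno.txt",
    ]
    for sample, reads in pair_fastq_files(names).items():
        print(sample, reads.get("read1"), reads.get("read2"))
```

A sample whose dictionary lacks either "read1" or "read2" indicates an incomplete pair, which is worth checking before starting an analysis.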

Data description:

chr22_with_ERCC92.fa

Reference sequence containing only a single chromosome (chr22) plus the ERCC spike-in sequences. The chr22 sequence is from the human GRCh38 genome from Ensembl.

chr22_with_ERCC92.gtf

Annotations obtained from Ensembl (Homo_sapiens.GRCh38.86.gtf.gz), for chromosome 22 only.

gene_GO_anno.txt

GO functional annotation file.

gene_KEGG_anno.txt

KEGG functional annotation file.

*.fastq

The test data consists of two commercially available RNA samples: Universal Human Reference (UHR) and Human Brain Reference (HBR). The UHR is total RNA isolated from a diverse set of 10 cancer cell lines. The HBR is total RNA isolated from the brains of 23 Caucasians, male and female, of varying age but mostly 60-80 years old.

In addition, a spike-in control was used. Specifically, an aliquot of the ERCC ExFold RNA Spike-In Control Mixes was added to each sample (Mix 1 for the UHR samples and Mix 2 for the HBR samples, as reflected in the file names). The spike-in consists of 92 transcripts that are present in known concentrations across a wide abundance range (from very few copies to many copies).
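
As a quick sanity check on the reference file, one could count how many sequences in chr22_with_ERCC92.fa belong to the spike-in set. The sketch below is illustrative only. It assumes that spike-in records have headers starting with "ERCC-", which is not stated in this document and should be verified against the actual file. Both function names are hypothetical.

```python
def fasta_record_names(path):
    """Return the first word of every '>' header line in a FASTA file."""
    names = []
    with open(path, encoding="utf-8") as handle:
        for line in handle:
            if line.startswith(">"):
                fields = line[1:].split()
                names.append(fields[0] if fields else "")
    return names


def count_spike_ins(names, prefix="ERCC-"):
    """Count record names that start with the assumed spike-in prefix."""
    return sum(1 for name in names if name.startswith(prefix))


if __name__ == "__main__":
    # Assumption: the FASTA file is in the current working directory.
    names = fasta_record_names("chr22_with_ERCC92.fa")
    print("Total sequences:", len(names))
    print("Spike-in sequences:", count_spike_ins(names))
```

If the prefix assumption holds, the spike-in count should match the 92 transcripts described above.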