How to use wapRNA for RNA data analysis
1.1  Choosing the sequencing platform

wapRNA supports two sequencing platforms: SOLiD and Solexa. Users should first choose the sequencing technology and the experiment method.

Figure 1: A screen shot of sequencing platforms

1.2  Uploading data files

RNA analysis starts with uploading the raw data. Raw data must be in the prescribed format: csfasta for SOLiD and fastq for Solexa. Only the sequence file (e.g. test.csfasta) and the quality file (e.g. test.qual) need to be uploaded. To reduce upload time, only compressed files (*.zip or *.tar.gz) are accepted. The total size of the raw data must be less than 2 GB. Input file formats:

(1) csfasta format

>1279_6_61_F3
T300102110122020021202223233131010323111.1213322233
>1279_6_64_F3
T201302312122203001300011300301030313112.1210132001
>1279_6_81_F3
T203102113202103300003321320032000010112.2030021200
>1279_6_97_F3
T230123100123121321002211002200201000302.3310223100
>1279_6_145_F3
T331033013131233321013201203100311303310.3010023100
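To make the colour-space format concrete, here is a minimal Python sketch that reads csfasta records and decodes them into bases. It assumes the standard SOLiD di-base code (with A=0, C=1, G=2, T=3, each base is the previous base XOR the colour), one colour line per read, and that lines starting with '#' are comments; the function names are illustrative, not part of wapRNA. Decoding stops at '.', which marks a missing colour call, so the first read above decodes to only 39 bases.

```python
# Sketch: decode SOLiD colour-space reads (csfasta) into bases.
# Assumption: standard di-base code, with A=0, C=1, G=2, T=3 and
# next_base = previous_base XOR colour.

BASES = "ACGT"


def read_csfasta(lines):
    """Yield (read_id, colour_string) pairs; assumes one colour line per read."""
    read_id = None
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):  # skip blanks and '#' comment lines
            continue
        if line.startswith(">"):
            read_id = line[1:]
        elif read_id is not None:
            yield read_id, line
            read_id = None


def decode_colours(cs):
    """Turn e.g. 'T0123' into bases, stopping at the first missing call '.'."""
    prev = BASES.index(cs[0])  # leading primer base, e.g. 'T'
    bases = []
    for colour in cs[1:]:
        if colour == ".":
            break  # cannot decode past a missing colour
        prev ^= int(colour)
        bases.append(BASES[prev])
    return "".join(bases)
```

For example, `decode_colours("T0123")` returns `"TGAT"`.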

(2) qual format

>853_6_44_F3
8 6 6 9 22 18 4 8 5 20 8 11 21 8 5 23 6 11 23 7 2 15 16 5 7 20 13 14 6 10 8 21 18 22 16 6 21 17 7 7 5 27 21 15 5 6 6 5 8 2
>853_6_159_F3
6 8 10 4 5 22 24 6 11 7 26 23 13 11 16 8 8 5 5 8 13 8 7 11 4 6 8 8 3 5 7 8 7 5 10 4 2 11 10 24 2 3 5 14 8 6 2 2 5 24
>853_6_399_F3
4 14 8 27 6 7 8 11 3 16 13 26 14 3 11 21 4 22 5 24 5 11 5 7 20 26 5 4 8 22 9 8 3 2 25 15 24 16 11 23 14 10 4 11 16 7 6 17 13 11
>853_6_606_F3
12 25 5 16 11 12 12 8 8 6 8 4 4 12 2 5 13 4 4 5 9 4 5 7 11 9 5 8 20 21 8 12 5 16 19 13 8 5 16 20 18 5 17 11 8 20 18 13 5 13
>853_6_639_F3
4 2 13 3 6 2 18 3 6 7 16 16 3 5 9 9 13 6 11 11 11 19 3 14 3 13 23 8 5 8 8 5 4 9 23 8 8 8 15 2 8 20 8 4 2 8 11 12 8 6
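Each .qual record lists one integer quality value per colour call (the leading primer base has none), so the 50 values above pair with the 50 colours of a csfasta read. A minimal sketch for reading .qual files and checking that pairing, assuming one value line per read; `read_qual` and `check_pair` are hypothetical helper names:

```python
def read_qual(lines):
    """Yield (read_id, [quality, ...]) pairs from a SOLiD .qual file."""
    read_id = None
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):  # skip blanks and comment lines
            continue
        if line.startswith(">"):
            read_id = line[1:]
        elif read_id is not None:
            yield read_id, [int(q) for q in line.split()]
            read_id = None


def check_pair(colours, quals):
    """Hypothetical check: one quality value per colour, none for the primer base."""
    return len(colours) - 1 == len(quals)
```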

(3) fastq format

@3_B2SbLmlF321/1
ACAGGTTTAAAAATGTTTTATAAAGTGCTAAAGTTGTGTTTAAA
+
hhXhhhhhabghKhh_hhhhhZQNhRh^]KPIhP[hLhKVRLDH
@5_H4SbLmlF321/1
ATACATTTCTTCAATACTAAAATCATTAACTACAAGTTGTTTAA
+
hhhhhhhhhhhhh`hhhha^h[hNfgY_SNMKSVFQKRPNIPMM
@1_z0SbLmlF321/1
GTTTCTTAGGCAGAAAAGAAGGAACGAAAGAGTTAACTCATATT
+
hhhhhhhhhhfThhh^ZhRUhaWaVhUPShHhRNGSNSKKXIVS
@7_IyRbLmlF321/1
ATCGTTTCCTAGTGCATCAATGAAAGGTTTTAAAGTAGCTACTG
+
\hbhbhhfbh]h\hffh]WOhhTOOcSM[dhMNBPFQUHRIBRV
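A FASTQ record spans four lines: `@` plus the read name, the sequence, `+`, and one quality character per base. The quality characters in the sample above all fall in the ASCII range used by the Phred+64 (Illumina 1.3+) encoding, where `h` means Phred 40. The sketch below assumes that offset, but check it against your own data, since later Illumina pipelines use an offset of 33. Function names are illustrative.

```python
def read_fastq(lines):
    """Yield (read_id, sequence, quality) from 4-line FASTQ records."""
    rows = [line.rstrip("\n") for line in lines if line.strip()]
    if len(rows) % 4:
        raise ValueError("FASTQ input must contain 4 lines per record")
    for i in range(0, len(rows), 4):
        header, seq, plus, qual = rows[i:i + 4]
        if not header.startswith("@") or not plus.startswith("+"):
            raise ValueError(f"malformed record at {header!r}")
        if len(seq) != len(qual):
            raise ValueError(f"sequence/quality lengths differ in {header!r}")
        yield header[1:], seq, qual


def phred(qual, offset=64):
    """Phred scores from a quality string; offset 64 (Phred+64) is an assumption."""
    return [ord(c) - offset for c in qual]
```

This sketch loads the whole input into memory, which is fine for small test files but not for full sequencing runs.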

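Because only compressed uploads are accepted, the sequence and quality files have to be packed first. The following sketch uses Python's standard `tarfile` and `zipfile` modules to build a .tar.gz archive and check the upload limits. It assumes that "2 GB" refers to the uncompressed raw data and means 2 GiB; the helper names are hypothetical.

```python
import os
import tarfile
import zipfile

# Assumption: "less than 2 GB" means uncompressed raw data below 2 GiB.
MAX_RAW_BYTES = 2 * 1024 ** 3


def make_tar_gz(paths, archive_path):
    """Pack the sequence and quality files into a gzip-compressed tar archive."""
    with tarfile.open(archive_path, "w:gz") as tar:
        for path in paths:
            tar.add(path, arcname=os.path.basename(path))
    return archive_path


def raw_size(archive_path):
    """Total uncompressed size, in bytes, of the files inside the archive."""
    if archive_path.endswith(".zip"):
        with zipfile.ZipFile(archive_path) as zf:
            return sum(info.file_size for info in zf.infolist())
    if archive_path.endswith(".tar.gz"):
        with tarfile.open(archive_path, "r:gz") as tar:
            return sum(m.size for m in tar.getmembers() if m.isfile())
    raise ValueError("only .zip and .tar.gz archives are accepted")


def upload_ok(archive_path):
    """True if the archive type is accepted and its raw data is under the limit."""
    try:
        return raw_size(archive_path) < MAX_RAW_BYTES
    except ValueError:
        return False  # unsupported archive type
```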

1.3  Filter parameters

Users can optionally filter out low-quality reads from the uploaded raw data by setting the filter parameters.
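The actual filter parameters are the ones shown in Figure 2. As a hypothetical illustration of how a per-read quality filter works, the sketch below drops a read when too many positions fall below a minimum quality; the thresholds `min_qual` and `max_low` are placeholders, not wapRNA's settings.

```python
def passes_filter(quals, min_qual=10, max_low=5):
    """Keep a read if at most `max_low` positions score below `min_qual`.

    Both thresholds are hypothetical placeholders, not wapRNA defaults.
    """
    low = sum(1 for q in quals if q < min_qual)
    return low <= max_low


def filter_reads(records, min_qual=10, max_low=5):
    """Yield only the (read_id, quals) records that pass the filter."""
    for read_id, quals in records:
        if passes_filter(quals, min_qual, max_low):
            yield read_id, quals
```

The same function works for SOLiD .qual values and for Phred scores decoded from FASTQ quality strings.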

Figure 2: A screen shot of the filter parameters

1.4  Choosing the reference sequence

In the current version, reference data (genome sequences and junction sequences) are available for the following 10 species, one of which can be chosen:

  • Homo sapiens (ENSEMBL 62, GRCh37)
  • Mus musculus (ENSEMBL 62, NCBIM37)
  • Rattus norvegicus (ENSEMBL 62, RGSC 3.4)
  • Caenorhabditis elegans (ENSEMBL 62, WS220)
  • Danio rerio (ENSEMBL 62, Zv9)
  • Macaca mulatta (ENSEMBL 62, MMUL_1)
  • Pan troglodytes (ENSEMBL 62, CHIMP2.1)
  • Gallus gallus (ENSEMBL 62, WASHUC2)
  • Drosophila melanogaster (ENSEMBL 62, BDGP5.25)
  • Sus scrofa (ENSEMBL 62, Sscrofa9)

Users must choose the species from which their sample data were obtained (Figure 3 shows the reference preparation for the test data).

Figure 3: A screen shot of the reference preparation for our test data

1.5  Mapping parameters

wapRNA supports two different mapping programs for the two next-generation sequencing platforms: Corona_Lite for SOLiD and BWA for Solexa.

SOLiD data

wapRNA uses the Corona_Lite_Plus_4.2.1 program to map raw reads to the references selected by the user.

Figure 4: A screen shot of the mapping page for the test data (SOLiD)

General parameters

SOLiD/Solexa raw data has a higher sequencing error rate than Sanger sequencing, but also much higher redundancy and throughput. Therefore, users should carefully choose the number of allowed errors for mapping.

-t    Tag length
Options: 25, 30, 35, 40, 45 and 50
-e    Number of mismatches
Options: 0, 1, 2, 3, 4 or 5
-z    Maximum number of hits
The default value is 10.
Notes:
(i) SOLiD and Solexa tags are currently 25, 35 or 50 bp long;
(ii) The error rate is higher in the tail of a tag than in its head, so some users may want to cut off the tail during mapping;
(iii) Some tags that fail to map in full can be recovered by mapping only the head of the tag;
(iv) Tag length and number of errors are set in pairs, and you can define multiple -t and -e values. The mapping program first maps all reads against the reference sequences using the first -t/-e pair, then aligns the still-unmatched reads against the reference again using the second pair, and so on until all defined pairs have been used.
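The multi-pass strategy in note (iv), combined with trimming tags to their head as in notes (ii) and (iii), can be sketched as follows. This is a naive, ungapped, base-space illustration, not Corona_Lite's algorithm (which works in colour space); it assumes that -z caps the number of reported hit positions per read, and all names are hypothetical.

```python
def mismatches(a, b):
    """Number of differing positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def find_hits(tag, reference, max_mismatch, max_hits=10):
    """Offsets where `tag` aligns ungapped with <= max_mismatch mismatches.

    Assumption: -z caps how many hit positions are reported per tag.
    """
    hits = []
    for pos in range(len(reference) - len(tag) + 1):
        if mismatches(tag, reference[pos:pos + len(tag)]) <= max_mismatch:
            hits.append(pos)
            if len(hits) == max_hits:
                break
    return hits


def multi_pass_map(reads, reference, passes, max_hits=10):
    """Map reads with successive (tag_length, max_mismatch) pairs, like -t/-e.

    Each pass keeps only the head `tag_length` bases of a read (dropping the
    error-prone tail) and hands the still-unmapped reads to the next pair.
    """
    mapped = {}
    remaining = dict(reads)
    for tag_length, max_mismatch in passes:
        unmapped = {}
        for read_id, seq in remaining.items():
            hits = find_hits(seq[:tag_length], reference, max_mismatch, max_hits)
            if hits:
                mapped[read_id] = (tag_length, max_mismatch, hits)
            else:
                unmapped[read_id] = seq
        remaining = unmapped
    return mapped, remaining
```

For example, a hypothetical `passes=[(50, 5), (35, 3), (25, 2)]` would first try full 50-base tags with up to 5 mismatches, then retry the leftovers with shorter heads and stricter mismatch limits.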

Solexa data

wapRNA uses the BWA program to map raw reads to the references selected by the user.

Figure 5: A screen shot of the Solexa mapping page

General parameters

-l    Seed length
-k    Maximum number of mismatches in the seed
-n    Total mismatches in the first 24 bp
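How these three limits constrain a candidate alignment can be illustrated with a simplified check. BWA itself searches a Burrows-Wheeler index and can also allow gaps; the sketch below only tests one ungapped candidate hit, reading -l, -k and -n as described above (including the 24 bp window for -n), and `accept_hit` is a hypothetical helper.

```python
def accept_hit(read, ref_window, l, k, n, region=24):
    """Simplified, ungapped test of one candidate hit (not BWA's algorithm).

    l: seed length, i.e. the first l bases of the read        (-l)
    k: maximum mismatches allowed inside the seed              (-k)
    n: maximum mismatches in the first `region` bases (24 bp)  (-n, as described above)
    """
    if len(ref_window) < len(read):
        return False  # the candidate window must cover the whole read
    diffs = [a != b for a, b in zip(read, ref_window)]
    return sum(diffs[:l]) <= k and sum(diffs[:region]) <= n
```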


1.6  Advanced options

wapRNA supports two kinds of functional analysis: GO and KEGG. Users can run these analyses by setting the optional parameters.

GO

GO (Gene Ontology) analysis is provided for downstream functional clustering and classification of genes.

wapRNA provides two modes of GO analysis: the second class, which includes all three vocabularies (cellular component, biological process and molecular function), and the third class, which uses one of these three vocabularies selected by the user.
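A minimal sketch of the two modes, assuming they amount to counting annotated genes per GO term either across all three vocabularies or within one chosen vocabulary (the exact grouping wapRNA uses may differ). The input dictionaries are hypothetical; the namespace strings follow the names used in GO's own ontology files.

```python
from collections import Counter

NAMESPACES = ("cellular_component", "biological_process", "molecular_function")


def classify(gene_terms, term_namespace, namespace=None):
    """Count annotated genes per GO term.

    gene_terms: gene id -> set of GO term ids (hypothetical input)
    term_namespace: GO term id -> one of NAMESPACES
    namespace: None keeps all three vocabularies; otherwise only that one.
    """
    if namespace is not None and namespace not in NAMESPACES:
        raise ValueError(f"unknown GO namespace: {namespace!r}")
    counts = Counter()
    for terms in gene_terms.values():
        for term in set(terms):
            if namespace is None or term_namespace.get(term) == namespace:
                counts[term] += 1
    return counts
```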

Figure 6: A screen shot of GO parameters

KEGG

KEGG PATHWAY is a collection of manually drawn pathway maps representing knowledge of molecular interaction, reaction and relation networks.


1.7  Email address

wapRNA notifies users by email when each step of a task is finished, so users should enter a valid email address here to receive the notifications.


©BIG 2010, Beijing Institute of Genomics, Chinese Academy of Sciences
No.7 Beitucheng West Road, Chaoyang District, Beijing 100029, PR China