Identification of 5’UTR Splicing Site Using Sequence and Structural Specificities Based on Combination Statistical Method with SVM

PP: 91-97

Author(s)

Lv Jun-Jie,
Wang Ke-Jun,
Feng Wei-Xing,
Wang Xin,
Xiong Xin-yan

Abstract

To identify splice sites in untranslated regions (UTRs) more accurately and efficiently, a recognition method is proposed that uses
both the splicing sequences and the secondary structures of the flanking sequences, combining a statistical method with a support
vector machine (SVM). The method consists of two stages: a statistical method in the first stage and an SVM with a polynomial kernel
in the second. The statistical method serves as a pre-processing step for the SVM and takes UTR sequences as its input. It models the
compositional features and dependencies of nucleotides around splice site regions in terms of probabilistic parameters. These
parameters are then fed into the SVM, which combines them nonlinearly to predict splice sites. In addition, the Mfold package in the
Vienna software was used to predict the most stable secondary structure of the flanking sequences, and the traditional four-letter
nucleotide alphabet was converted into an eight-letter alphabet. The resulting sequence-structure combination strings were used to
train the models, and the trained models were then used to recognize splice sites. Tested on a real 5’UTR splice site dataset of
human genes, the method showed good performance in recognizing UTR splice sites.
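
The sequence-structure encoding and the statistical pre-processing stage can be illustrated with a minimal Python sketch. The abstract does not define the eight letters or the exact probabilistic model, so everything below is an assumption made for illustration: paired bases are written in lower case and unpaired bases in upper case, the structure is given in dot-bracket notation, and the statistical stage is reduced to a per-position log-odds table that ignores the nucleotide dependencies mentioned above. The function names `encode_sequence_structure`, `position_log_odds`, and `featurize` are hypothetical.

```python
import math
from collections import Counter

# Assumed eight-letter alphabet: upper case = unpaired, lower case = paired.
ALPHABET = "ACGUacgu"


def encode_sequence_structure(seq: str, dot_bracket: str) -> str:
    """Merge a nucleotide sequence with its dot-bracket structure string."""
    if len(seq) != len(dot_bracket):
        raise ValueError("sequence and structure must have the same length")
    encoded = []
    for base, state in zip(seq.upper().replace("T", "U"), dot_bracket):
        if base not in "ACGU":
            raise ValueError(f"unexpected nucleotide: {base!r}")
        if state == ".":
            encoded.append(base)           # unpaired position
        elif state in "()":
            encoded.append(base.lower())   # base-paired position
        else:
            raise ValueError(f"unexpected structure symbol: {state!r}")
    return "".join(encoded)


def position_log_odds(positives, negatives, pseudocount=1.0):
    """Per-position log-odds of each symbol, true sites versus decoys.

    Both inputs are lists of encoded strings of one common length.
    This is a simplified, position-independent stand-in for the
    statistical stage, not the model used in the paper.
    """
    length = len(positives[0])
    k = len(ALPHABET)
    table = []
    for i in range(length):
        pos_counts = Counter(s[i] for s in positives)
        neg_counts = Counter(s[i] for s in negatives)
        row = {}
        for sym in ALPHABET:
            p = (pos_counts[sym] + pseudocount) / (len(positives) + pseudocount * k)
            q = (neg_counts[sym] + pseudocount) / (len(negatives) + pseudocount * k)
            row[sym] = math.log(p / q)
        table.append(row)
    return table


def featurize(encoded: str, table) -> list:
    """Turn one encoded window into a numeric feature vector."""
    return [table[i][sym] for i, sym in enumerate(encoded)]


if __name__ == "__main__":
    window = encode_sequence_structure("GGGAAACCC", "(((...)))")
    print(window)
```

Feature vectors produced this way could then be passed to any SVM implementation with a polynomial kernel, for example scikit-learn's `SVC(kernel="poly")`; that step is omitted here to keep the sketch free of third-party dependencies.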