ShaoPeng Wang, Yu-Hang Zhang, GuoHua Huang, Lei Chen* and Yu-Dong Cai* Pages 96-106 (11)
Background: Myristoylation is an important hydrophobic post-translational modification in which a myristoyl group is covalently bound to the amino group of Gly residues at the N-terminus of proteins. The diverse functions of myristoylated proteins, such as membrane targeting, signal pathway regulation and apoptosis, are largely due to this lipid modification, whereas abnormal or irregular myristoylation can lead to several pathological changes in the cell.
Objective: To better understand the function of myristoylation sites, this study conducted a novel computational investigation aimed at correctly identifying them in protein sequences.
Materials and Methods: A training dataset with 196 positive and 84 negative peptide segments was obtained. Four types of features derived from the peptide segments following the myristoylation sites were used to represent myristoylated and non-myristoylated sites. Then, two feature selection methods, minimum redundancy maximum relevance (mRMR) and incremental feature selection (IFS), were combined with a machine learning algorithm (extreme learning machine) to extract the optimal features and thereby build an optimal prediction model.
Results: In total, 41 key features were extracted and used to build the optimal prediction model. The effectiveness of this model was further validated by its performance on a test dataset. Furthermore, the 41 extracted features were analyzed in detail to gain insight into the mechanism of myristoylation.
Conclusion: This study provides a new computational method for identifying myristoylation sites in protein sequences. We believe that it can be a useful tool for predicting myristoylation sites from protein sequences.
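The feature selection pipeline named in the Materials and Methods can be illustrated with a minimal sketch. It is not the authors' implementation: the data are synthetic, the function names (`elm_fit`, `rank_by_relevance`, `incremental_feature_selection`) are hypothetical, the ranking step uses plain absolute correlation as a stand-in for mRMR, and the hidden layer size and fold count are arbitrary assumptions. It only shows the general idea of IFS: rank the features, evaluate a classifier on the top k features for each k, and keep the k with the best cross-validated accuracy.

```python
import numpy as np

def elm_fit(X, y, n_hidden=20, seed=0):
    """Basic extreme learning machine: random hidden layer, least-squares output weights."""
    rng = np.random.default_rng(seed)
    W = rng.normal(size=(X.shape[1], n_hidden))   # random input weights, never trained
    b = rng.normal(size=n_hidden)                 # random hidden biases
    H = np.tanh(X @ W + b)                        # hidden-layer activations
    beta = np.linalg.pinv(H) @ y                  # output weights via pseudo-inverse
    return W, b, beta

def elm_predict(model, X):
    W, b, beta = model
    return np.where(np.tanh(X @ W + b) @ beta >= 0, 1, -1)

def rank_by_relevance(X, y):
    """Stand-in ranking (NOT mRMR): sort features by absolute correlation with the label."""
    Xc = X - X.mean(axis=0)
    yc = y - y.mean()
    denom = np.sqrt((Xc ** 2).sum(axis=0) * (yc ** 2).sum()) + 1e-12
    return np.argsort(-np.abs(Xc.T @ yc) / denom)

def cv_accuracy(X, y, n_folds=5, seed=0):
    """Plain k-fold cross-validated accuracy of the ELM."""
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(len(y)), n_folds)
    correct = 0
    for i, test in enumerate(folds):
        train = np.concatenate([f for j, f in enumerate(folds) if j != i])
        model = elm_fit(X[train], y[train], seed=seed)
        correct += int((elm_predict(model, X[test]) == y[test]).sum())
    return correct / len(y)

def incremental_feature_selection(X, y, ranking):
    """Evaluate the top-k ranked features for every k and keep the best-scoring prefix."""
    scores = [cv_accuracy(X[:, ranking[:k]], y) for k in range(1, len(ranking) + 1)]
    best_k = int(np.argmax(scores)) + 1
    return ranking[:best_k], scores

# Synthetic demo data (assumption): 300 samples, 10 features, only features 0 and 1 matter.
rng = np.random.default_rng(42)
X = rng.normal(size=(300, 10))
y = np.where(X[:, 0] + X[:, 1] >= 0, 1.0, -1.0)

ranking = rank_by_relevance(X, y)
selected, scores = incremental_feature_selection(X, y, ranking)
print("selected feature indices:", selected.tolist())
print("best cross-validated accuracy:", max(scores))
```

In this sketch the ranking is computed once on the full data, so the reported accuracies are optimistic; a careful evaluation would hold out a separate test set, as the abstract describes.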
Keywords: Post-translational modification, myristoylation site prediction, modified glycine residue, extreme learning machine, minimum redundancy maximum relevance, incremental feature selection.
Affiliation: School of Life Sciences, Shanghai University, Shanghai 200444; Institute of Health Sciences, Shanghai Institutes for Biological Sciences, Chinese Academy of Sciences, Shanghai 200031; Department of Mathematics, Shaoyang University, Shaoyang, Hunan 422000; College of Information Engineering, Shanghai Maritime University, Shanghai 201306; School of Life Sciences, Shanghai University, Shanghai 200444