Subcellular Localization Prediction Service
Subcellular localization refers to the specific location of a protein or other gene expression product within a cell. Creative Proteomics has extensive proteomics experience and works with pharmaceutical companies and research institutions. Our subcellular localization prediction service predicts the subcellular location of a given protein sequence, helping you accelerate the study of the mechanisms and patterns of protein localization in cells. Predicting the subcellular localization of proteins supports the study of protein structure, properties and functions, protein-protein interactions, disease mechanisms, and the development of new drugs.
The Process of Subcellular Localization Prediction Service
- Construction of an objective, representative protein dataset
We filter the data in the SWISS-PROT database.
The selection criteria are: 1) only protein sequences from the species under study are selected; 2) only sequences with explicit subcellular localization information are selected, because the location of every sequence in the dataset must be known; 3) sequences that are too short are excluded; 4) redundancy is reduced so that the remaining sequences share low homology; 5) subcellular categories with too few samples are excluded.
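The selection criteria above can be sketched as a simple filter. This is a minimal illustration, not our production pipeline: the record layout, the function name `filter_dataset`, and both thresholds are assumptions made for the example, and criterion 4 (redundancy reduction) is left out because it requires a separate sequence-clustering step.

```python
from collections import Counter

# Hypothetical thresholds; the text above does not state concrete values.
MIN_LENGTH = 50       # criterion 3: drop very short sequences
MIN_CLASS_SIZE = 2    # criterion 5: drop locations with too few samples


def filter_dataset(records, species, min_length=MIN_LENGTH,
                   min_class_size=MIN_CLASS_SIZE):
    """Apply criteria 1, 2, 3 and 5 to a list of record dicts.

    Each record is assumed to look like
    {"id": str, "species": str, "sequence": str, "location": str or None}.
    Criterion 4 (low homology) is intentionally not handled here.
    """
    kept = [
        r for r in records
        if r["species"] == species              # 1) target species only
        and r.get("location")                   # 2) explicit localization
        and len(r["sequence"]) >= min_length    # 3) long enough
    ]
    # 5) count samples per location and drop under-represented categories
    counts = Counter(r["location"] for r in kept)
    return [r for r in kept if counts[r["location"]] >= min_class_size]


# Toy records, invented for illustration only.
example = [
    {"id": "A", "species": "human", "sequence": "M" * 60, "location": "nucleus"},
    {"id": "B", "species": "human", "sequence": "M" * 80, "location": "nucleus"},
    {"id": "C", "species": "human", "sequence": "M" * 10, "location": "nucleus"},
    {"id": "D", "species": "human", "sequence": "M" * 70, "location": None},
    {"id": "E", "species": "mouse", "sequence": "M" * 70, "location": "nucleus"},
    {"id": "F", "species": "human", "sequence": "M" * 70, "location": "lysosome"},
]
selected_ids = [r["id"] for r in filter_dataset(example, "human")]
```

In this toy run only records A and B survive: C is too short, D has no annotation, E is from another species, and F belongs to a category with a single sample.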
- Extraction of protein characteristic information
We extract characteristic information from protein sequences and then represent it in an appropriate mathematical form that correctly reflects the relationship between sequence and structure or function. The extracted feature information falls into three main categories: 1) features based on the composition and properties of amino acids; 2) features based on the N-terminal sorting signals of protein sequences; 3) features based on functional domains and gene annotations. We combine multiple types of feature information to obtain more accurate results.
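As an illustration of the first feature category, amino acid composition turns a sequence into a fixed-length numeric vector. The sketch below is a simplified example under stated assumptions: the function names and the 20-residue N-terminal window are hypothetical, and the N-terminal composition is only a crude stand-in for real sorting-signal features.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues


def amino_acid_composition(sequence):
    """Return a 20-element list with the frequency of each standard residue."""
    seq = sequence.upper()
    counts = [seq.count(aa) for aa in AMINO_ACIDS]
    total = sum(counts)  # non-standard symbols are ignored
    if total == 0:
        raise ValueError("sequence contains no standard residues")
    return [c / total for c in counts]


def combined_features(sequence, n_term_window=20):
    """Concatenate whole-sequence and N-terminal composition (40 values).

    The window size is a hypothetical choice for this example.
    """
    return (amino_acid_composition(sequence)
            + amino_acid_composition(sequence[:n_term_window]))
```

For example, `amino_acid_composition("AAC")` assigns 2/3 to alanine, 1/3 to cysteine, and 0 to every other residue.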
- Protein subcellular localization prediction algorithm
The main algorithms fall into five categories: 1) methods based on simple discrimination rules; 2) nearest neighbor methods based on distance measurement; 3) methods based on artificial neural networks; 4) methods based on Markov models; 5) methods based on support vector machines.
We offer three of the most commonly used algorithms: neural networks, support vector machines, and nearest neighbor algorithms.
- A neural network is a mathematical model for distributed, parallel information processing that imitates the behavior of biological neural networks. It has good robustness and fault tolerance.
- A support vector machine (SVM) is a classification technique based on statistical learning theory. It finds an optimal classification surface that minimizes the classification error in the high-dimensional space into which the protein feature vectors are mapped.
- The distance-based nearest neighbor method measures the similarity between samples using a chosen distance measure. The closer two samples are, the more likely they are to appear in the same organelle. This method does not require manually selected parameters. It is suitable for large-scale problems and is fast to compute.
If you are unsure which strategy is better for you, please contact us. We are glad to discuss the options with you.
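To make the nearest neighbor idea concrete, the following minimal sketch assigns a query protein the location of its closest training sample. It assumes that each protein has already been converted to a numeric feature vector and that Euclidean distance is used; the function names and toy vectors are hypothetical, and other distance measures are equally possible.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def nearest_neighbor_predict(query, train_vectors, train_labels):
    """Return the label of the training vector closest to the query."""
    best = min(range(len(train_vectors)),
               key=lambda i: euclidean(query, train_vectors[i]))
    return train_labels[best]


# Toy two-dimensional feature vectors, invented for illustration only.
train_vectors = [[0.1, 0.9], [0.2, 0.8], [0.9, 0.1]]
train_labels = ["nucleus", "nucleus", "mitochondrion"]
prediction = nearest_neighbor_predict([0.8, 0.2], train_vectors, train_labels)
```

Here the query lies closest to the third training vector, so it is assigned to the mitochondrion.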
- Validation and evaluation of predictive algorithms
Depending on the actual situation, we choose among the following methods to test the accuracy of the algorithm: the self-consistency test, the independent dataset test, and leave-one-out cross-validation. Leave-one-out cross-validation is our preferred method because its results are more rigorous and reliable.
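Leave-one-out cross-validation holds out each protein in turn, trains on all the others, and checks whether the held-out protein is predicted correctly. The sketch below is an illustration under assumptions: `leave_one_out_accuracy` accepts any prediction function with the signature shown, and `one_nn` is a hypothetical stand-in classifier included only to keep the example self-contained.

```python
def leave_one_out_accuracy(vectors, labels, predict):
    """Fraction of samples predicted correctly when each is held out once.

    predict(query, train_vectors, train_labels) must return a label.
    """
    correct = 0
    for i in range(len(vectors)):
        # Train on everything except sample i, then test on sample i.
        train_x = vectors[:i] + vectors[i + 1:]
        train_y = labels[:i] + labels[i + 1:]
        if predict(vectors[i], train_x, train_y) == labels[i]:
            correct += 1
    return correct / len(vectors)


def one_nn(query, train_x, train_y):
    """Stand-in classifier: 1-nearest neighbor on squared Euclidean distance."""
    dists = [sum((q - t) ** 2 for q, t in zip(query, x)) for x in train_x]
    return train_y[dists.index(min(dists))]


# Toy one-dimensional features, invented for illustration only.
vectors = [[0.0], [0.1], [1.0], [1.1]]
labels = ["cytoplasm", "cytoplasm", "nucleus", "nucleus"]
accuracy = leave_one_out_accuracy(vectors, labels, one_nn)
```

Because every sample is tested on a model that never saw it, the estimate is less optimistic than a self-consistency test, at the cost of training once per sample.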
Commonly used indicators for evaluating predictive algorithms are sensitivity, specificity, and the Matthews correlation coefficient.
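For a single subcellular location treated as the positive class, these three indicators can be computed from the counts of true and false positives and negatives. The following is a minimal sketch; the function name `binary_metrics` and the convention of returning 0.0 when a denominator is zero are assumptions made for this example.

```python
import math


def binary_metrics(true_labels, predicted_labels, positive):
    """Sensitivity, specificity and MCC for one location versus the rest."""
    tp = fp = tn = fn = 0
    for t, p in zip(true_labels, predicted_labels):
        if t == positive and p == positive:
            tp += 1
        elif t != positive and p == positive:
            fp += 1
        elif t == positive:          # positive sample predicted as negative
            fn += 1
        else:
            tn += 1
    sensitivity = tp / (tp + fn) if tp + fn else 0.0
    specificity = tn / (tn + fp) if tn + fp else 0.0
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {"sensitivity": sensitivity, "specificity": specificity, "mcc": mcc}


# Toy labels, invented for illustration only.
true_labels = ["nucleus", "nucleus", "cytoplasm", "cytoplasm"]
predicted = ["nucleus", "cytoplasm", "cytoplasm", "cytoplasm"]
metrics = binary_metrics(true_labels, predicted, positive="nucleus")
```

In this toy case one of the two nuclear proteins is missed, so sensitivity is 0.5 while specificity is 1.0; the Matthews correlation coefficient summarizes both kinds of error in a single value between -1 and 1.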
*For Research Use Only. Not for use in diagnostic procedures.