Prediction of Protein Secondary Structure by Incorporating Biophysical Information into Artificial Neural Networks
Abstract: This project applied artificial neural networks to the prediction of protein secondary structure. A NETtalk architecture with a window of 13 residues was used. Over-fitting was avoided by representing each amino acid with 3 real numbers, which reduced the number of adjustable weights to 840. Two alternative amino-acid representations that incorporate biophysical data were created and tested, both separately and in combination, on a standard 7-fold cross-validation set of 126 proteins. The best performance was achieved by averaging two predictions and then filtering the result. For core structures this gave a total Q3 accuracy of 61.3%, made up of per-class accuracies of 54.0%, 38.1% and 77.0% for helix, strand and coil respectively, with Matthews correlation coefficients of Ca = 0.34, Cb = 0.26 and Cc = 0.31. The average lengths of the predicted structures were 9.8, 4.9 and 11.0 residues for helix, strand and coil respectively. These results are lower than those of other single-sequence methods that use localist representations. The most likely reason is over-generalisation caused by the small number of units.
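The figure of 840 weights can be checked with simple arithmetic. The abstract does not state the hidden-layer size, so the sketch below assumes 20 hidden units, 3 output units (helix, strand, coil) and no bias weights, which is one configuration that reproduces 840. For comparison it also counts the weights for a localist (one-hot) encoding of 21 bits per residue (20 amino acids plus a spacer symbol), which is likewise an assumption.

```python
# Minimal sketch: count the adjustable weights of a fully connected
# window -> hidden -> output network (NETtalk-style), ignoring biases.
# Assumptions (not stated in the abstract): 20 hidden units, no bias
# weights, and 21 bits per residue for the localist comparison.

WINDOW = 13      # residues per input window (from the abstract)
N_OUTPUTS = 3    # helix, strand, coil
N_HIDDEN = 20    # assumed hidden-layer size


def count_weights(features_per_residue: int,
                  window: int = WINDOW,
                  hidden: int = N_HIDDEN,
                  outputs: int = N_OUTPUTS) -> int:
    """Input->hidden plus hidden->output connections, without biases."""
    n_inputs = window * features_per_residue
    return n_inputs * hidden + hidden * outputs


real_valued = count_weights(3)   # 3 real numbers per amino acid: 39 inputs
localist = count_weights(21)     # assumed one-hot code: 273 inputs

print(real_valued)  # 840
print(localist)     # 5520
```

Under these assumptions the compact encoding needs about 15% of the weights of a one-hot encoding (840 versus 5520). Fewer weights reduce over-fitting, but they also limit what the network can represent, which is consistent with the over-generalisation the abstract reports.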
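The accuracy figures follow widely used definitions: Q3 is the fraction of residues whose predicted state (H, E or C) matches the observed state, and each Matthews correlation coefficient scores one class against the other two, C = (TP·TN − FP·FN) / sqrt((TP+FP)(TP+FN)(TN+FP)(TN+FN)). The sketch below assumes the per-class accuracies are computed over observed residues (the common convention) and that structures are given as equal-length strings over 'H', 'E' and 'C'; the function names are hypothetical.

```python
import math
from typing import Dict

CLASSES = "HEC"  # helix, strand, coil


def q3(observed: str, predicted: str) -> float:
    """Fraction of residues whose predicted state matches the observed state."""
    if len(observed) != len(predicted) or not observed:
        raise ValueError("sequences must be non-empty and of equal length")
    correct = sum(o == p for o, p in zip(observed, predicted))
    return correct / len(observed)


def per_class_accuracy(observed: str, predicted: str) -> Dict[str, float]:
    """For each class, the fraction of its observed residues predicted correctly."""
    result = {}
    for c in CLASSES:
        total = sum(o == c for o in observed)
        hits = sum(o == c and p == c for o, p in zip(observed, predicted))
        result[c] = hits / total if total else float("nan")
    return result


def matthews(observed: str, predicted: str, c: str) -> float:
    """Matthews correlation coefficient for class c versus the other two."""
    pairs = list(zip(observed, predicted))
    tp = sum(o == c and p == c for o, p in pairs)
    tn = sum(o != c and p != c for o, p in pairs)
    fp = sum(o != c and p == c for o, p in pairs)
    fn = sum(o == c and p != c for o, p in pairs)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0


# Toy strings invented for illustration only (not data from the essay).
obs = "CCHHHHCCEEEC"
pred = "CCHHHCCCEECC"
print(round(q3(obs, pred), 3))             # 0.833
print(round(matthews(obs, pred, "H"), 3))  # 0.816
```

Returning 0.0 when a denominator term is zero is a convention choice; some tools report the coefficient as undefined in that case instead.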
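The best result combines two predictions and then filters them, but the abstract does not say how the filter works. The sketch below assumes each network outputs one score per class (helix, strand, coil) for every residue. It averages the two output vectors, takes the highest-scoring class at each residue, and then applies a hypothetical filter that relabels helix runs shorter than 3 residues and strand runs shorter than 2 residues as coil. The thresholds in `MIN_LENGTH` are illustrative assumptions, not the essay's values.

```python
from itertools import groupby
from typing import List, Sequence

CLASSES = "HEC"  # output order assumed: helix, strand, coil

# Hypothetical minimum segment lengths; the essay's actual filter is not described.
MIN_LENGTH = {"H": 3, "E": 2}


def average_outputs(a: Sequence[Sequence[float]],
                    b: Sequence[Sequence[float]]) -> List[List[float]]:
    """Average two networks' per-residue output vectors."""
    return [[(x + y) / 2 for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]


def to_states(outputs: Sequence[Sequence[float]]) -> str:
    """Pick the highest-scoring class at each residue."""
    return "".join(CLASSES[max(range(3), key=row.__getitem__)] for row in outputs)


def filter_short_segments(states: str) -> str:
    """Relabel helix/strand runs shorter than MIN_LENGTH as coil (assumed rule)."""
    pieces = []
    for state, run in groupby(states):
        length = len(list(run))
        if length < MIN_LENGTH.get(state, 0):
            state = "C"
        pieces.append(state * length)
    return "".join(pieces)


# Toy outputs for five residues, invented for illustration only.
net_a = [[0.8, 0.1, 0.1], [0.4, 0.1, 0.5], [0.7, 0.1, 0.2], [0.1, 0.7, 0.2], [0.2, 0.2, 0.6]]
net_b = [[0.6, 0.2, 0.2], [0.8, 0.1, 0.1], [0.5, 0.2, 0.3], [0.2, 0.5, 0.3], [0.1, 0.3, 0.6]]
raw = to_states(average_outputs(net_a, net_b))
print(raw)                         # HHHEC
print(filter_short_segments(raw))  # HHHCC
```

Averaging lets the two networks correct each other where one is uncertain, and a length filter of this kind removes isolated one-residue helices or strands, which also affects the average segment lengths reported in the abstract.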