Application of machine learning for the clustering of wheat transcription factor proteins into families and sub-families

University essay from Stockholms universitet/Institutionen för data- och systemvetenskap

Abstract: Wheat plays an important role in ensuring global food security. Salinity of soil and water poses a major threat to wheat production, impairing both the growth and the development of the plant. Wheat plants use certain molecular mechanisms to adapt to salt stress. Transcription factor proteins control the response of wheat to abiotic stresses such as salinity. There are 56 transcription factor protein families in the wheat genome; however, these families are not classified into subfamilies. The main goal of this study is to understand how a machine learning algorithm can be used to identify and cluster transcription factor proteins into subfamilies, which can help associate them with specific biological processes such as the salt stress response. In this project, the k-means clustering method is used to cluster the WRKY transcription factor family into subfamilies. The WRKY proteins are grouped into three distinct clusters. The clustering is assessed by external validation, which yields a validation score of 90%. The method can also be applied to other transcription factor families. Ultimately, this can help in producing salt-tolerant wheat varieties that withstand abiotic stresses such as salinity, and thereby improve crop yield.
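
The clustering and validation steps described in the abstract can be illustrated with a minimal sketch. The abstract does not state which sequence features or which validation measure the essay uses, so everything here is an assumption for illustration: amino acid composition stands in as the feature vector, cluster purity against reference labels stands in as the external validation score, and the six toy sequences and their subfamily labels are invented, not real WRKY proteins. The k-means routine uses a deterministic farthest-point initialisation so that the result is reproducible.

```python
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

def composition(seq):
    """Fraction of each of the 20 standard amino acids in a sequence."""
    counts = Counter(seq)
    return [counts[a] / len(seq) for a in AMINO_ACIDS]

def sq_dist(p, q):
    """Squared Euclidean distance between two feature vectors."""
    return sum((a - b) ** 2 for a, b in zip(p, q))

def kmeans(points, k, iterations=100):
    """Plain k-means (Lloyd's algorithm) with farthest-point seeding."""
    # Seed with the first point, then repeatedly add the point that is
    # farthest from all centroids chosen so far (deterministic).
    centroids = [points[0]]
    while len(centroids) < k:
        centroids.append(
            max(points, key=lambda p: min(sq_dist(p, c) for c in centroids))
        )
    labels = None
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centroid.
        new_labels = [
            min(range(k), key=lambda j: sq_dist(p, centroids[j])) for p in points
        ]
        if new_labels == labels:
            break  # assignments stopped changing, so the run has converged
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    return labels

def purity(labels, reference):
    """External validation: share of points matching their cluster's majority reference label."""
    matched = 0
    for j in set(labels):
        refs = [r for lab, r in zip(labels, reference) if lab == j]
        matched += Counter(refs).most_common(1)[0][1]
    return matched / len(labels)

# Invented toy sequences with hypothetical subfamily labels (not real data).
toy = [
    ("AAAAAAAAGA", "I"), ("AAAAAAAGAA", "I"),
    ("KKKKKKKKRK", "II"), ("KKKKKKRKKK", "II"),
    ("WWWWWWWYWW", "III"), ("WWWWYWWWWW", "III"),
]
features = [composition(seq) for seq, _ in toy]
reference = [label for _, label in toy]

labels = kmeans(features, k=3)
score = purity(labels, reference)
print("cluster labels:", labels)
print("purity:", score)
```

With real protein sequences the feature vectors would overlap far more than in this toy case, so the choice of features and of the number of clusters would need to be justified separately.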
