Title

A Machine Learning Approach to Prioritizing Functionally Active F-box Members in Arabidopsis thaliana

Document Type

Article

Publication Date

5-28-2021

Abstract

Protein degradation through the Ubiquitin (Ub)-26S Proteasome System (UPS) is a major gene expression regulatory pathway in plants. In this pathway, 76-amino acid Ub proteins are covalently linked onto a large array of UPS substrates with the help of three enzymes (E1 activating, E2 conjugating, and E3 ligating enzymes), directing the substrates for turnover in the 26S proteasome complex. The S-phase Kinase-associated Protein 1 (Skp1), CUL1, F-box (FBX) protein (SCF) complexes have been identified as the largest E3 ligase group in plants due to the dramatic expansion in the number of FBX genes in plant genomes. Since it is the FBX proteins that recognize SCF substrates and determine substrate specificity, much effort has been made to characterize their genomic, physiological, and biochemical roles in the past two decades of functional genomic studies. However, the sheer size and high sequence diversity of the FBX gene family demand new approaches to uncover unknown functions. In this work, we first identified 82 known FBX members that have been functionally characterized to date in Arabidopsis thaliana. By comparing the genomic structure, evolutionary selection, expression patterns, domain compositions, and functional activities between known and unknown FBX gene members, we developed a neural network machine learning approach to predict whether an unknown FBX member is likely functionally active in Arabidopsis, thereby facilitating its future functional characterization.
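The abstract does not specify the network architecture, input encoding, or training procedure, so the following is only a minimal sketch of the general idea: a small feed-forward neural network that maps per-gene features to a probability of being functionally active. Everything in it is an assumption made for illustration. The three feature columns (expression breadth, a selection-pressure ratio, a domain score), the `TinyNet` class, the hyperparameters, and the six data rows are all hypothetical and synthetic; none of them come from the article or from real Arabidopsis data.

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


class TinyNet:
    """Hypothetical one-hidden-layer network with a sigmoid output unit."""

    def __init__(self, n_in, n_hidden, seed=0):
        rng = random.Random(seed)  # fixed seed so runs are reproducible
        self.w1 = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)]
                   for _ in range(n_hidden)]
        self.b1 = [0.0] * n_hidden
        self.w2 = [rng.uniform(-0.5, 0.5) for _ in range(n_hidden)]
        self.b2 = 0.0

    def forward(self, x):
        # tanh hidden layer followed by a sigmoid output probability
        h = [math.tanh(sum(w * xi for w, xi in zip(row, x)) + b)
             for row, b in zip(self.w1, self.b1)]
        p = sigmoid(sum(w * hj for w, hj in zip(self.w2, h)) + self.b2)
        return h, p

    def predict(self, x):
        return self.forward(x)[1]

    def loss(self, X, y):
        # mean binary cross-entropy over the data set
        eps = 1e-12
        total = 0.0
        for x, t in zip(X, y):
            p = self.predict(x)
            total -= t * math.log(p + eps) + (1 - t) * math.log(1 - p + eps)
        return total / len(X)

    def train_step(self, X, y, lr):
        # full-batch gradient descent: accumulate gradients, then update
        n = len(X)
        gw1 = [[0.0] * len(row) for row in self.w1]
        gb1 = [0.0] * len(self.b1)
        gw2 = [0.0] * len(self.w2)
        gb2 = 0.0
        for x, t in zip(X, y):
            h, p = self.forward(x)
            d_out = p - t  # gradient of cross-entropy w.r.t. output pre-activation
            gb2 += d_out
            for j, hj in enumerate(h):
                gw2[j] += d_out * hj
                d_h = d_out * self.w2[j] * (1.0 - hj * hj)  # tanh derivative
                gb1[j] += d_h
                for i, xi in enumerate(x):
                    gw1[j][i] += d_h * xi
        for j in range(len(self.w2)):
            self.w2[j] -= lr * gw2[j] / n
            self.b1[j] -= lr * gb1[j] / n
            for i in range(len(self.w1[j])):
                self.w1[j][i] -= lr * gw1[j][i] / n
        self.b2 -= lr * gb2 / n


# Synthetic placeholder features, NOT real data:
# [expression breadth, selection-pressure ratio, domain score], scaled to 0..1
X = [[0.9, 0.2, 0.8], [0.8, 0.1, 0.9], [0.7, 0.3, 0.7],
     [0.2, 0.9, 0.1], [0.1, 0.8, 0.2], [0.3, 0.7, 0.3]]
y = [1, 1, 1, 0, 0, 0]  # 1 = labeled "functionally characterized" (made up)

net = TinyNet(n_in=3, n_hidden=4, seed=0)
initial_loss = net.loss(X, y)
for _ in range(500):
    net.train_step(X, y, lr=0.5)
final_loss = net.loss(X, y)

# Score a hypothetical uncharacterized gene
score = net.predict([0.6, 0.4, 0.6])
```

In a setting like the one the abstract describes, the labeled rows would correspond to the functionally characterized FBX members and the scored rows to uncharacterized ones; a real analysis would also need held-out evaluation, which this sketch omits.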
