Novel selenoprotein neighborhoods suggest specialized biochemical processes
2025
Daniel H. Haft | Igor Tolstoy
ABSTRACT Prokaryotic genomes encode selenoproteins sparsely, roughly one protein per 5,000. Finding novel selenoprotein families can expose unknown biological processes that are enabled, or at least enhanced, by having a selenium atom replace a sulfur atom in some cysteine residues. Here, we report the discovery of 18 novel selenoprotein families or second selenocysteine sites in previously unrecognized extensions of protein translations. Most of these families were missed previously because of confounding factors: too small a family, too few selenoproteins in the family, selenocysteine (U) too close to one end, or a skew toward understudied or uncultured lineages. Discoveries were triggered by observations during the ongoing construction of protein family models for the National Center for Biotechnology Information’s RefSeq and Prokaryotic Gene Annotation Pipeline (PGAP), or were made by targeted searches for novel selenoproteins in the vicinity of known ones, rather than by any broadly applied genome mining method. The unrelated families TsoA, TsoB, TsoC, and TsoX are adjacent in tso (three selenoprotein operon) loci in the bacterial phylum Thermodesulfobacteriota. TrsS (third radical SAM selenoprotein) occurs strictly in the context of a molybdopterin-dependent aldehyde oxidoreductase. A short carboxy-terminal motif, U-X-X-stop (UXX-star), occurs in selenoproteins with various architectures, usually providing the second U in the protein.
The multiple new selenocysteine insertion sites, selenoprotein families, and selenium-dependent operons we curated manually suggest that many more proteins and pathways remain to be discovered; once improved computational methods are applied comprehensively to the latest collections of microbial genomes and metagenomes, they may reveal surprising new biochemical processes.
IMPORTANCE Next-generation DNA sequencing and the assembly of metagenome-assembled genomes (MAGs) for uncultured species from various microbiomes add a vast “dark matter” of hard-to-decipher protein sequences. Selenoproteins, optimized by natural selection to encode selenocysteine where cysteine might have been encoded much more easily, carry a strong clue to their function: some specialized aspect of binding or catalysis. Operons with multiple adjacent, but otherwise unrelated, selenoproteins should provide even more vivid information. In this study, efforts in protein family construction and curation, aimed at improving the PGAP genome annotation pipeline, generated multiple novel selenoprotein-containing genomic contexts that may lead to the future characterization of several systems of proteins. Past observations suggest roles in the metabolic handling of trace elements (mercury, tungsten, arsenic, etc.) or of organic compounds refractory to simpler enzymatic pathways. In addition, the work significantly expands the truth set of validated selenoproteins, which should aid future, more automated genome mining efforts.
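As an illustration of the U-X-X-stop (UXX-star) motif described in the abstract, the following minimal Python sketch tests whether a translated protein sequence ends with selenocysteine followed by exactly two residues before the stop. It is not the authors' method, which relied on manual curation. The function names, the use of `U` for selenocysteine and an optional trailing `*` for the stop, and the example sequences are all assumptions made for illustration.

```python
import re

# Assumption: one-letter amino acid codes, with "U" for selenocysteine and
# an optional trailing "*" marking the stop codon.
UXX_STAR = re.compile(r"U[A-Z]{2}\*?$")


def has_uxx_star(protein: str) -> bool:
    """Return True if the sequence ends in U-X-X directly before the stop."""
    return UXX_STAR.search(protein.strip().upper()) is not None


def selenocysteine_positions(protein: str) -> list[int]:
    """Return 1-based positions of every U (selenocysteine) in the sequence."""
    return [i + 1 for i, aa in enumerate(protein.strip().upper()) if aa == "U"]


if __name__ == "__main__":
    # Made-up example sequence, not taken from any real protein.
    seq = "MKCUAGLLUGK"
    print(has_uxx_star(seq), selenocysteine_positions(seq))
```

A sequence that passes this check and reports two U positions would match the abstract's observation that the motif usually provides the second U in the protein. A real search would still need to confirm that the UGA codon is actually read as selenocysteine.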