Ana D. Simonović, Milan B. Dragićević, Milica D. Bogdanović, Milana M. Trifunović-Momčilov, Angelina R. Subotić, Slađana I. Todorović


Over 20% of all protein domains are currently annotated as “domains of unknown function” or DUFs. In a recently identified Centaurium erythraea arabinogalactan peptide, CeAGP3 (AGN92423), a conserved DUF1070 domain was found. Since identifying functions for DUFs is important in systems biology, we have analyzed the distribution and structure of the DUF1070 domain (pfam06376) using a set of bioinformatics tools. There are 271 publicly available DUF1070 members from 25 diverse families of vascular plants, and most are short sequences (50-100 aa). An N-terminal signal peptide (Nsp) was found in almost all complete sequences. In 233 sequences, at least two noncontiguous prolines were found as clustered dipeptides predicted to be hydroxylated and glycosylated with type II arabino-3,6-galactans, thus representing AG-II glycomodules. In addition, 35 sequences contained a region rich in basic residues (basic linker, BL). The N-terminal part of the DUF1070 domain comprises all or part of AG-II and/or BL, while the highly conserved C-terminus is a region of 26 aa, termed SH26. In 212 sequences, SH26 was a typical glycosylphosphatidylinositol lipid anchor signal peptide (GPIsp), but in 83 cases GPIsp was not predicted due to software constraints. In sequences where both Nsp and GPIsp were predicted, the length of the mature peptide could be calculated, and it was 10-16 aa. Our analysis suggests that DUF1070 members are arabinogalactan (AG) peptides, of which the majority are GPI-anchored. DUF1070 is the only conserved domain found in classical arabinogalactan proteins and AG peptides. The SH26 region can be used for mining and annotation of AG peptides.
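The two sequence-level calculations mentioned in the abstract (counting noncontiguous proline dipeptides and deriving mature peptide length from the two predicted signal peptides) can be sketched in Python as follows. This is an illustration under stated assumptions, not the authors' pipeline: the dipeptide set (Ala-Pro, Ser-Pro, Thr-Pro), the function names, and the toy sequence are all hypothetical, and the Nsp and GPIsp lengths are assumed to come from external predictors.

```python
import re

# ASSUMPTION: the glycomodule dipeptides are taken to be Ala-Pro, Ser-Pro and
# Thr-Pro; the abstract does not list them. The negative lookahead (?!P)
# rejects a proline that is directly followed by another proline, so only
# noncontiguous prolines are counted.
AG_DIPEPTIDE = re.compile(r"[AST]P(?!P)")


def count_ag_dipeptides(seq: str) -> int:
    """Count [A/S/T]-Pro dipeptides whose proline is not followed by Pro."""
    return len(AG_DIPEPTIDE.findall(seq.upper()))


def mature_length(total_len: int, nsp_len: int, gpisp_len: int) -> int:
    """Length left after removing the N-terminal and GPI signal peptides.

    All three lengths are inputs; in practice nsp_len and gpisp_len would be
    taken from signal-peptide and GPI-anchor prediction tools.
    """
    remaining = total_len - nsp_len - gpisp_len
    if remaining < 0:
        raise ValueError("signal peptides are longer than the sequence")
    return remaining


if __name__ == "__main__":
    toy_region = "QSPAPAPTSD"  # hypothetical fragment, not a real DUF1070 member
    print(count_ag_dipeptides(toy_region))
    # Hypothetical 58-aa precursor: 20-aa Nsp and 26-aa GPIsp (SH26 length)
    print(mature_length(58, 20, 26))
```

A sequence would count toward the AG-II group in this sketch when `count_ag_dipeptides` returns at least 2, mirroring the "at least two noncontiguous prolines" criterion; the real analysis may apply additional clustering rules that are not modelled here.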

DOI: 10.2298/ABS151120023S

Key words: AG peptides; arabinogalactan proteins; DUF1070; GPI anchor; pfam06376

Received: November 20, 2015; Revised: January 31, 2016; Accepted: February 1, 2016; Published online: March 17, 2016

How to cite this article: Simonović AD, Dragićević MB, Bogdanović MD, Trifunović-Momčilov MM, Subotić AR, Todorović SI. DUF1070 as a signature domain of a subclass of arabinogalactan peptides. Arch Biol Sci. 2016;68(4):737-46.








Creative Commons License
This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.