We have determined the complete sequence of the 637-kilodalton precursor of the proline-rich polypeptides (PRPs). The protein is encoded by one large exon of a single-copy gene. The acidic precursor of 5761 residues comprises a signal peptide and three large domains with a high proline content (11-15%). The sequence of domain A (928 residues) is unique and contains several small clusters of acidic amino acids. Domain B (830 residues) contains seven tandem repeats, four of which are strongly diverged in sequence. Domain C (3914 residues) consists of 39 tandemly repeated units, only 8 of which are degenerate. These 100-amino acid units show high structural similarity (76-92%) and contain all the PRP variants that are produced by specific proteolytic processing. The COOH-terminal part (35 residues) is basic. Two variant alleles of the PRP-precursor gene occur, differing slightly in the number of repeats in domain C. The high degree of sequence conservation within the repeat regions suggests that the gene evolved by multiple amplification and dispersion of two internal segments. In the 5097-base pair genomic region upstream of the translation start, several transcriptional control elements are recognized. A potential binding site for the Sp1 factor (GGGCGG), separated by 47 nucleotides from an initiator motif, is detected in the vicinity of the ATG codon; both are most probably elements of the promoter. Several putative androgen response elements (TGTYCT) are found in the 5' adjacent region, and two Alu type III repeats and two (CA)n repeats are located far upstream. These results provide the basis for a detailed study of the androgen-regulated and tissue-specific expression of the PRP-precursor gene.
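The consensus motifs quoted above (GGGCGG for Sp1, TGTYCT for androgen response elements, where Y denotes C or T in IUPAC notation) can be located in a nucleotide sequence by a simple pattern scan on both strands. The following is a minimal sketch of such a scan, not the procedure used in this work; the function names and the toy sequence are hypothetical and serve only as illustration.

```python
import re

# Subset of IUPAC nucleotide codes, mapped to regex character classes.
IUPAC = {"A": "A", "C": "C", "G": "G", "T": "T",
         "Y": "[CT]", "R": "[AG]", "N": "[ACGT]"}

# Complement table that also handles the ambiguity codes Y <-> R.
COMPLEMENT = str.maketrans("ACGTYRN", "TGCARYN")


def reverse_complement(motif):
    """Return the reverse complement of an IUPAC motif."""
    return motif.translate(COMPLEMENT)[::-1]


def motif_to_regex(motif):
    """Translate an IUPAC motif into a regular expression string."""
    return "".join(IUPAC[base] for base in motif)


def find_motif(seq, motif):
    """Return sorted (0-based start, strand) hits of motif on both strands.

    A lookahead is used so that overlapping occurrences are all reported.
    """
    seq = seq.upper()
    hits = set()
    for strand, m in (("+", motif), ("-", reverse_complement(motif))):
        pattern = re.compile("(?=(" + motif_to_regex(m) + "))")
        for match in pattern.finditer(seq):
            hits.add((match.start(), strand))
    return sorted(hits)


# Toy sequence (hypothetical, not taken from the PRP-precursor gene).
toy = "AAGGGCGGTTTGTTCTAAAGAACAT"
print(find_motif(toy, "GGGCGG"))  # Sp1 consensus
print(find_motif(toy, "TGTYCT"))  # androgen response element consensus
```

Scanning both strands matters because a response element may lie in either orientation; on the toy sequence, the TGTYCT scan reports one hit on the forward strand and one on the reverse strand (as AGRACA).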