Background: Large-scale international projects are underway to generate collections of knockout mouse mutants and subsequently to perform high-throughput phenotype assessments, raising new challenges for computational researchers due to the complexity and scale of the phenotype data. Phenotypes can be described using ontologies in two different ways. Traditionally, an individual phenotypic character has been defined either by a single compound term from a dedicated, species-specific phenotype ontology, or by a combinatorial annotation that uses concepts from a range of disparate ontologies to define the character as an entity with an associated quality (EQ). Both methods have their merits: the dedicated approach allows the use of community-standard terminology, whereas the combinatorial approach facilitates cross-species comparison of phenotypic statements. Previously, databases have favoured one approach over the other. The EUMODIC project will generate large amounts of mouse phenotype data by executing a set of Standard Operating Procedures (SOPs), and will implement both ontological approaches to capture these data.
Results: For all SOPs a four-tier annotation is made: a high-level description of the SOP, which broadly defines the type of data it generates; annotation of individual parameters using the EQ model; annotation of the qualitative data generated for each mouse; and annotation of mutant lines after statistical analysis. Qualitative assessments of phenodeviance are made at the point of data entry, using PATO qualities that are children of the parameter quality. To facilitate data querying by scientists more familiar with single compound phenotype terms, the mappings between the Mammalian Phenotype (MP) ontology and the EQ PATO model are exploited to allow querying via MP terms.
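The relationship between parameter-level EQ annotation, per-mouse qualitative calls, and MP-based querying can be illustrated with a minimal sketch. It is not the EUMODIC implementation: the term labels, the parameter name, the mouse identifiers, and the function names `annotate_observation` and `query_by_mp` are all hypothetical, and human-readable labels stand in for real ontology identifiers.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EQ:
    """An entity-quality statement (labels stand in for ontology IDs)."""
    entity: str
    quality: str


# Hypothetical parameter annotation: each SOP parameter maps to one EQ.
PARAMETER_EQ = {"body weight": EQ("whole body", "weight")}

# Hypothetical child -> parent relation between quality terms.
QUALITY_PARENT = {
    "increased weight": "weight",
    "decreased weight": "weight",
}

# Hypothetical mapping from compound MP-style terms to EQ statements.
MP_TO_EQ = {
    "increased body weight": EQ("whole body", "increased weight"),
    "decreased body weight": EQ("whole body", "decreased weight"),
}


def annotate_observation(mouse_id, parameter, observed_quality):
    """Record a qualitative call for one mouse.

    The observed quality must be a direct child of the quality used to
    annotate the parameter; anything else is rejected.
    """
    param = PARAMETER_EQ[parameter]
    if QUALITY_PARENT.get(observed_quality) != param.quality:
        raise ValueError(
            f"{observed_quality!r} is not a child of {param.quality!r}"
        )
    return (mouse_id, EQ(param.entity, observed_quality))


def query_by_mp(mp_term, annotations):
    """Return the mice whose EQ annotation matches the given MP-style term."""
    target = MP_TO_EQ[mp_term]
    return [mouse_id for mouse_id, eq in annotations if eq == target]


annotations = [
    annotate_observation("mouse-1", "body weight", "increased weight"),
    annotate_observation("mouse-2", "body weight", "decreased weight"),
]
matches = query_by_mp("increased body weight", annotations)
```

In this sketch the user only ever supplies a compound term, while storage and comparison operate on EQ statements, which is the division of labour the mapping-based querying is meant to provide.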
Conclusion: Well-annotated and comparable phenotype databases can be achieved through ontologically derived, comparable phenotypic statements, implemented here by means of OBO-compatible EQ annotations. The implementation we describe also lets scientists work seamlessly with ontologies, by assessing qualitative phenotypes in terms of PATO qualities and by querying the database using community-accepted compound MP terms. This work represents the first time that the combinatorial and single-dedicated approaches have both been implemented to annotate a phenotypic dataset.