An analysis of extensible modelling for functional genomics data

Andrew R Jones; Norman W Paton

doi:10.1186/1471-2105-6-235

BMC Bioinformatics. 2005 Sep 27;6:235. doi: 10.1186/1471-2105-6-235.

Authors

Andrew R Jones¹, Norman W Paton

Affiliation

¹ School of Computer Science, University of Manchester, Manchester, UK. ajones@cs.man.ac.uk

Abstract

Background: Several data formats have been developed for large scale biological experiments, using a variety of methodologies. Most data formats contain a mechanism for allowing extensions to encode unanticipated data types. Extensions to data formats are important because the experimental methodologies tend to be fairly diverse and rapidly evolving, which hinders the creation of formats that will be stable over time.

Results: In this paper we review the data formats that exist in functional genomics, some of which have become de facto or de jure standards, with a particular focus on how each domain has been modelled, and how each format allows extensions. We describe the tasks that are frequently performed over data formats and analyse how well each task is supported by a particular modelling structure.

Conclusion: From our analysis, we make recommendations as to the types of modelling structure that are most suitable for particular types of experimental annotation. There are several standards currently under development that we believe could benefit from systematically following a set of guidelines.

Publication types

Evaluation Study
Research Support, Non-U.S. Gov't

MeSH terms

Chemistry Techniques, Analytical
Computer Simulation*
Data Collection / standards*
Genomics / methods*
Genomics / standards*
Guidelines as Topic*
Mass Spectrometry
Microarray Analysis / standards*
Models, Biological*
Models, Molecular
Software
Vocabulary, Controlled