EMBOSS - A sequence analysis package
Lisa MULLAN 1 and David P. JUDGE 2
1 European Bioinformatics Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SD, England
2 Department of Genetics, University of Cambridge, Tennis Court Road, Cambridge CB2 3EH, England
Abstract. EMBOSS evolved from EGCG, a collection of programs written to
extend the GCG package, originally written by the Genetics Computer Group of
the University of Wisconsin. EMBOSS follows the general structure of GCG and sets out
to reproduce and extend the functionality of GCG in an open source package.
Currently, EMBOSS runs only on UNIX computers. The programs of EMBOSS can
be run from the UNIX command line or from behind a number of Graphical User
Interfaces (GUIs). EMBOSS offers a wide range of programs covering most aspects
of sequence analysis. In addition, a number of well established public domain
programs have been engineered to follow the conventions of EMBOSS and then
incorporated into the package. Software developers from many places across the
world have written programs for the EMBOSS package. Contributions from the user
community are encouraged, and training is offered to aspiring contributors.
1. The origins of EMBOSS
The essential structure of the EMBOSS software package for sequence analysis follows that
of the older GCG package. Indeed, EMBOSS evolved directly from the EGCG (Extended
GCG) package, which consisted of programs written by various EMBnet 1 researchers
to extend the functionality of GCG.
GCG was originally written by the Genetics Computer Group at the University of
Wisconsin, USA, as an open source bioinformatics package, and was at first available
relatively inexpensively. As the source code was accessible, algorithms could be verified
and adapted to suit the needs of individual researchers. Many new programs were written
by researchers who were not part of the GCG team using the GCG libraries. In 1988, the
best of these new “GCG extensions” were collected together and the EGCG package was
born. This was achieved by a collaboration of groups within EMBnet and elsewhere.
EGCG provided new sequence analysis software and advanced features, which were used at
approximately 150 sites, and by more than 10,000 users of EMBnet national services.
A few years ago, the GCG software was purchased by a commercial enterprise and
the source code was no longer available to users. Development slowed significantly and
GCG sometimes failed to keep pace with the demands of advances in biology. The
EGCG project was no longer viable and had reached the limits of what could be achieved
using the GCG libraries. Consequently, the former EGCG developers, and others, designed
a totally new generation of academic sequence analysis software - the suite of programs
that is now known as EMBOSS (European Molecular Biology Open Software Suite) [1].
EMBOSS was released in 2000 and has been actively developed ever since. At the
core of the EMBOSS package are the programs designed to reproduce and extend the
1 European Molecular Biology Network, http://www.embnet.org