An Overview of the Concepts Concerning the Distributed Annotation System (DAS/1)
The full specification, available at http://www.biodas.org/documents/spec-1.6.html, serves as the primary source for this document and is quoted or paraphrased without explicit notice. References to the specification use the shorthand Specs.
(Terms defined later are italicized; queries are bolded; optional portions within names are [bracketed].)
A client-server system for the sharing of Reference Sequences, a system conceptually composed of a Reference Server and Annotation Server(s).
A sequence, consisting of a set of entry points into the sequence and the length of each entry point, which possesses a reference sequence ID.
The identifier for a sequence, which corresponds to either a low-level sequence (e.g., a clone) or a high-level sequence (e.g., a contig) and which is composed of any set of printable characters, save for the colon, newline, tab, and carriage return characters.
A position defined for each genome at which the server may begin dispensing data for a sequence, for a given length (of variable size), e.g., the head of a chromosome, the beginning of a series of contigs, and the beginning of a contig. A list of entry points for a given species may be retrieved via entry_points.
A server specialized for returning lists of annotations across a certain segment of the genome.
An annotation server that, given a reference sequence ID, can also return the following data:
An entity which:
An entity selected from a list of types which have biological significance and which roughly correspond to EMBL/GenBank feature table tags, e.g., exon, intron, CDS, and splice3.
A description of how the annotated feature was discovered, possibly including a reference to a software program.
An intentionally broad functional genre that can be used to filter, group, and sort annotations, e.g., homology, variation, and transcribed. (For the sake of consistency, cf. Specs: “Feature Types and Categories” for a general list of types and categories.)
A project containing data on DAS (a list of which may be retrieved via dsn).
The server’s non-binding recommendations on formatting retrieved annotations for a given source (via stylesheet), using a General Feature Format (GFF) document. (Cf. Specs: “The Queries: Retrieving the Stylesheet” and Specs: “Glyph Types.”)
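The character rule given above for a reference sequence ID can be checked mechanically. The following is a minimal sketch, not part of the Specs: the function name is hypothetical, rejecting the empty string is an assumption, and Python's notion of "printable" (which includes non-ASCII characters) is used as a stand-in for whatever the Specs intend.

```python
# Characters the definition above excludes from a reference sequence ID.
FORBIDDEN = {":", "\n", "\t", "\r"}


def is_valid_reference_id(ref_id: str) -> bool:
    """Return True if ref_id is non-empty, printable, and has no forbidden character.

    Assumption: an empty ID is treated as invalid; the overview does not say.
    """
    if not ref_id:
        return False
    return all(ch.isprintable() and ch not in FORBIDDEN for ch in ref_id)


if __name__ == "__main__":
    for candidate in ("chr4", "contig 17", "chr4:1,100", "bad\tid"):
        print(repr(candidate), is_valid_reference_id(candidate))
```

The colon is excluded because it separates the ID from the start and stop positions in a segment such as REF:start,stop.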
A query can be made via a URL according to HTTP conventions, through either GET or (preferably, because of size) POST. The response is composed of:
PREFIX denotes the URL prefix for the DAS server, e.g., http://servlet.sanger.ac.uk:8080 is the prefix for http://servlet.sanger.ac.uk:8080/das/dsn. DSN denotes the Data Source Name for a data source.
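The query URLs in this section all follow the shape PREFIX/das/[DSN/]command, optionally followed by semicolon-separated parameters. As an illustration only, the sketch below assembles such URLs. The function name, the percent-encoding of values, and the example host and data source name are assumptions, not part of the Specs.

```python
from urllib.parse import quote


def das_url(prefix, command, dsn=None, segments=(), types=(), categories=()):
    """Build a DAS/1 query URL of the form PREFIX/das/[DSN/]command[?params].

    segments is an iterable of (reference_id, start, stop) tuples.
    Assumption: parameter values are percent-encoded; the overview is silent on this.
    """
    parts = [prefix.rstrip("/"), "das"]
    if dsn is not None:  # the dsn command itself takes no data source name
        parts.append(quote(dsn, safe=""))
    parts.append(command)
    url = "/".join(parts)

    params = [
        f"segment={quote(ref, safe='')}:{start},{stop}"
        for ref, start, stop in segments
    ]
    params += [f"type={quote(t, safe='')}" for t in types]
    params += [f"category={quote(c, safe='')}" for c in categories]
    if params:
        url += "?" + ";".join(params)  # parameters are joined with semicolons
    return url


if __name__ == "__main__":
    print(das_url("http://servlet.sanger.ac.uk:8080", "dsn"))
    # Hypothetical host and data source name, for illustration only.
    print(das_url("http://example.org", "features", dsn="demo",
                  segments=[("chr4", 1000, 2000)], types=["exon"]))
```

Because the reference sequence ID cannot contain a colon, the REF:start,stop form remains unambiguous after the URL is built.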
dsn
Command: PREFIX/das/dsn
Function: Retrieves the list of data sources available from this server
Scope: Reference and annotation servers

entry_points
Command: PREFIX/das/DSN/entry_points
Function: Retrieves the list of entry points and their respective sizes for a data source
Scope: Reference servers

dna
Command: PREFIX/das/DSN/dna?segment=RANGE[;segment=RANGE]
Function: Retrieves the DNA associated with a subsequence
Scope: Reference servers

sequence
Command: PREFIX/das/DSN/sequence?segment=RANGE[;segment=RANGE]
Function: Retrieves the sequence associated with a subsequence
Scope: Reference servers

types
Command: PREFIX/das/DSN/types[?segment=RANGE][;segment=RANGE][;type=TYPE]
Function: Retrieves the types available for a segment of a sequence
Scope: Reference and annotation servers

features
Command: PREFIX/das/DSN/features?segment=REF:start,stop[;segment=REF:start,stop][;type=TYPE][;type=TYPE][;category=CATEGORY][;category=CATEGORY]
Function: Retrieves the annotations across a segment
Scope: Reference and annotation servers

link
Command: PREFIX/das/DSN/link?field=TAG;id=ID
Function: Retrieves an HTML page describing human-readable information about an annotation
Scope: Reference servers

stylesheet
Command: PREFIX/das/DSN/stylesheet
Function: Retrieves a stylesheet for the given DSN
Scope: Annotation servers

Genome Assembly
In a client application, Genome Assembly consists of moving “up” or “down” (the nomenclature of the Specs, analogous to zooming “in” or “out”), along component children and supercomponent parent(s).[1] Genome Assembly occurs only upon Reference Servers, a necessary deduction from its definition. This data is contained within the TYPE description for a feature. (Cf. Specs: “Fetching Sequence Assemblies”.)
Thus, in describing such a paradigm, the Specs appear to convey that the client application must assemble the information for a given segment from its component children (i.e., moving down). (E.g., a requested segment of a chromosome must be assembled from several contigs.) Conversely, this paradigm lets the client application visit the supercomponent category (i.e., moving up). (E.g., a user would like to zoom out from a contig to view the entire chromosome.) However, the programmer should note well that a segment may span more than one supercomponent parent (e.g., a segment may span two contigs).
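The "moving down" step can be pictured with a small sketch. It is not taken from the Specs: the Component record, the function name, and the coordinates are hypothetical, and it assumes 1-based inclusive positions with each child placed whole and on the forward strand, so that child position 1 corresponds to parent_start.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Component:
    """Hypothetical record of where a child sits on its parent."""
    child_id: str
    parent_start: int  # 1-based inclusive position of the child's first base
    parent_stop: int   # 1-based inclusive position of the child's last base


def children_for_segment(components, start, stop):
    """Return (child_id, child_start, child_stop) for each child overlapping
    the parent range [start, stop], ordered along the parent."""
    pieces = []
    for comp in sorted(components, key=lambda c: c.parent_start):
        lo = max(start, comp.parent_start)
        hi = min(stop, comp.parent_stop)
        if lo <= hi:  # the child overlaps the requested range
            offset = comp.parent_start - 1  # shift parent to child coordinates
            pieces.append((comp.child_id, lo - offset, hi - offset))
    return pieces


if __name__ == "__main__":
    # Invented layout of two contigs on a chromosome.
    layout = [
        Component("contig18", 1001, 2500),
        Component("contig17", 1, 1000),
    ]
    print(children_for_segment(layout, 900, 1200))
    # [('contig17', 900, 1000), ('contig18', 1, 200)]
```

A client would then issue one request per returned child range and join the results in order. Handling gaps, reverse-strand children, or partially used children would need more information than this sketch models.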
Notes:
The reference DAS/1 specification has been extended with new features, including an ontology and several new commands:
For more information, see the 1.53E specification on the DasRegistry website.
[1] Following Lincoln Stein, the words component and supercomponent refer to categories alone. (E.g., the category contig is a component of the category chromosome, whereas chromosome is a supercomponent of contig.) The words children and parent(s) refer to entities of the given category. (E.g., contigs 17, 18, 19, and 20 are the children of the parent chromosome 4.)