Note: this is NOT the official version of the DAS specification.
Oct 1, 2008
This is a working document and a proposal for a reworked DAS specification. It aims to clarify the DAS spec based on how DAS is being used in the community today, and to include commands from the 1.53E spec and some of the 2.0 spec. The document has also been adjusted to reflect the shift in the use of DAS away from a solely genome-centric protocol to a more open one encompassing other reference/coordinate systems, such as protein sequences and structures. The spec also includes references to the DAS Registry, which is essential for implementing a service-oriented architecture (SOA).

Note: this is a technical document, but it should be readable and understandable by people without a deep understanding of broader technical issues and other system architectures. Note that we are proposing to replace the DTDs with XSD schemas.
The Distributed Annotation System is a network of server and client software installations distributed across the web. The DAS protocol is a standard mechanism through which clients can communicate with servers in order to obtain various types of biological data. The protocol defines:
By enforcing these constraints, DAS allows a client to integrate data from many diverse sources implementing the protocol at a scalable development cost.
The DAS network of servers comprises a registry, several reference servers and several annotation servers. Tying these together are the concepts of reference objects and coordinate systems.
Reference objects are items of data with stable identifiers that are targets for annotation. At the most abstract level a reference object might be an annotatable concept or idea (e.g. a particular gene), but usually describes a biological unit within which annotations can be positioned. For example, “P15056” refers to a protein sequence upon which annotations can be based. Similarly, “chromosome 21” refers to a DNA sequence.
Individual reference objects can in fact have several versions, and it is important to recognise that annotations based upon different versions of the same reference entity are not necessarily equivalent.
Annotations are pieces of information that are always attributed to a reference object. Annotations are usually positional, that is they refer to a specific location within a reference object. An exon within a genomic sequence is an example. Annotations can also be non-positional, in which case they can be considered as information attributed to the whole of the reference object. For example, the description of a protein or gene.
A coordinate system is a stable, logical set of reference objects. A coordinate system provides a mechanism to uniquely identify reference objects that share identifiers, such as chromosomes. For example, chromosome 21 might identify several reference objects from different species, but only one within the NCBI 36 human assembly. Thus, “human NCBI 36 chromosomes” is a coordinate system containing 25 reference objects.
Coordinate systems are formally described using four properties:
Of these, category and authority are required.
Some example coordinate systems:
| Category | Authority | Version | Species |
|---|---|---|---|
| Chromosome | NCBI | 36 | Homo sapiens |
| Scaffold | ZFISH | 7 | Danio rerio |
| Protein sequence | UniProt | - | - |
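The four properties in the table above can be modelled as a small record. This is only a sketch: the class name, field names and the `label` helper are assumptions made for illustration, not part of the DAS protocol.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class CoordinateSystem:
    """Illustrative model of a DAS coordinate system (names are assumptions)."""
    category: str                   # required, e.g. "Chromosome"
    authority: str                  # required, e.g. "NCBI"
    version: Optional[str] = None   # optional, e.g. "36"
    species: Optional[str] = None   # optional, e.g. "Homo sapiens"

    def __post_init__(self):
        # Category and authority are the two required properties.
        if not self.category or not self.authority:
            raise ValueError("category and authority are required")

    def label(self) -> str:
        """Join the properties that are present into a display string."""
        parts = [self.authority, self.version, self.category, self.species]
        return " ".join(p for p in parts if p)


human = CoordinateSystem("Chromosome", "NCBI", "36", "Homo sapiens")
uniprot = CoordinateSystem("Protein sequence", "UniProt")
```

Making the record immutable lets two sources be compared for coordinate system equality with `==`, which is one plausible way for a client to decide whether their annotations can be overlaid.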
A reference server is a DAS server that provides core data for the reference objects in a particular coordinate system. For example, the reference server for “UniProt Protein sequence” provides the actual sequence for each UniProt entry. It does this by implementing the DAS sequence command. So that clients can discover the available reference objects in a coordinate system, a reference server must also list them via the entry_points command.
Annotation servers are specialized for returning lists of annotations for the reference objects within a coordinate system. This is done by implementing the DAS features command.
In future versions of the spec (i.e. those not focussed entirely on sequence) this will be generalised. That is, reference objects won't be assumed to be sequences and annotations won't be assumed to be sequence features.

Note: The distinction between reference and annotation servers is conceptual rather than physical. That is, a single server instance can in fact play both roles by offering both sequences and annotations of those sequences.
Note: A server may support multiple coordinate systems.
The DAS registry is a special component of DAS, fulfilling the following roles:
A DAS client typically integrates data from a number of DAS servers, making use of the different data types. For example, a client might implement the following procedure for a particular sequence location:
See this example in diagrammatic form: PNG image
The DAS is web-based. Clients query the reference and annotation servers using the HTTP protocol (see RFC2616) by sending a formatted URL request to the server. Servers process the request and return a response in the form of a formatted XML document (see W3C Extensible Markup Language) according to a predefined schema.
All DAS requests take the form of a hierarchical URL. Each URL has a site-specific prefix, followed by a standardized path and query string. The standardized path begins with the string /das. This is followed by URL components containing the data source name and a command.

Editorial note: we should provide guidance on, or specify, the ability of servers to accept encoded URLs.

Editorial note: how do we get everyone to specify, say, "chromosome1" in exactly the same way, and not "chr1" etc.? Presumably by coordinate system and entry\_points. A reference server MUST implement entry\_points, regardless of the number of objects (clients should not expect the response to come back quickly). We can always add a "range" parameter later.

For example:

    http://das.sanger.ac.uk/das/ccds_mouse/features?segment=1:174405453,174408689
    ^^^^^^^^^^^^^^^^^^^^^^^ ^^^ ^^^^^^^^^^ ^^^^^^^^ ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
    site-specific prefix    das data src   command  arguments

In this case, the site-specific prefix is http://das.sanger.ac.uk. The request begins with the standardized path /das, and the data source, in this case /ccds_mouse. This is followed by the command /features, which requests a list of features, and a query string providing named arguments to the /features command.
Thus, a single DAS server hosts one or more DAS sources, allowing it to provide different types of information, and/or information in several coordinate systems. This example source provides consensus CDS transcripts for mouse chromosomes, but the same server provides a number of other sources, including a similar source for human along with sources containing very different types of data.
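As a rough illustration of the URL anatomy described above, the following sketch splits a request URL into its parts. The function name and return shape are assumptions; arguments are split on semicolons, matching the command formats given later in this document, and no URL decoding is attempted.

```python
from urllib.parse import urlsplit


def parse_das_url(url):
    """Split a DAS request URL into its four parts.

    Returns (prefix, data_source, command, arguments), where arguments is a
    list of (name, value) pairs so that repeated names are preserved.
    """
    parts = urlsplit(url)
    segments = parts.path.split("/")
    das_index = segments.index("das")  # raises ValueError if "/das" is absent
    prefix = f"{parts.scheme}://{parts.netloc}" + "/".join(segments[:das_index])
    rest = [s for s in segments[das_index + 1:] if s]
    # A server-level command has no data source component before it.
    data_source, command = (None, rest[0]) if len(rest) == 1 else (rest[0], rest[1])
    arguments = [tuple(pair.split("=", 1)) for pair in parts.query.split(";") if pair]
    return prefix, data_source, command, arguments


example = parse_das_url(
    "http://das.sanger.ac.uk/das/ccds_mouse/features?segment=1:174405453,174408689"
)
```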
More information on the format of the request and the various available commands is given below.
The query string portion of the request (everything to the right of the “?” symbol) can be POSTed to the URL following conventional HTTP standards. Since some queries can be quite large, this is the recommended way of passing arguments.
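A minimal sketch of moving the query string into a POST body, using only the Python standard library. The helper name is an assumption, and the request is only constructed here, not sent.

```python
from urllib.request import Request


def build_post_request(url):
    """Move the query string of a DAS URL into the body of a POST request."""
    base, _, query = url.partition("?")
    # The body must be bytes; DAS query strings are plain ASCII.
    return Request(base, data=query.encode("ascii"), method="POST")


request = build_post_request(
    "http://das.sanger.ac.uk/das/ccds_mouse/features?segment=1:174405453,174408689"
)
```

Passing the resulting object to `urllib.request.urlopen` would send it; that step is omitted so the sketch runs without network access.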
DELETE: The request may be replaced with a SOAP-style XML-encapsulated document in future versions of this specification.
The response from the server to the client consists of a standard HTTP header with DAS status information within that header, followed optionally by XML content that contains the answer to the query. The DAS status portion of the header consists of three lines. The first is X-DAS-Version and gives the current protocol version number, currently DAS/1.6. The second line is X-DAS-Status and contains a three-digit status code which indicates the outcome of the request. The third is X-DAS-Capabilities, which describes the parts of the spec the server implements.
Here is an example HTTP header: (provided by Web server)

    HTTP/1.1 200 OK
    Date: Sun, 12 Mar 2000 16:13:51 GMT
    Server: Apache/1.3.6 (Unix) mod_perl/1.19
    Last-Modified: Fri, 18 Feb 2000 20:57:52 GMT
    Connection: close
    Content-Type: text/plain
    X-DAS-Version: DAS/1.6
    X-DAS-Status: 200
    X-DAS-Capabilities: error-segment/1.0; unknown-segment/1.0; unknown-feature/1.0; ...

    data follows...

The defined status codes are listed in Table 1.
| Code | Meaning |
|---|---|
| 200 | OK, data follows |
| 400 | Bad command (command not recognized) |
| 401 | Bad data source (data source unknown) |
| 402 | Bad command arguments (arguments invalid) |
| 403 | Bad reference object (reference sequence unknown) |
| 404 | Bad stylesheet (requested stylesheet unknown) |
| 405 | Coordinate error (sequence coordinate is out of bounds/invalid) |
| 500 | Server error, not otherwise specified |
| 501 | Unimplemented feature |
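A client might interpret the DAS status headers along the following lines. This is a sketch only: the function name and the plain-dictionary representation of the headers are assumptions, while the code table is copied from Table 1.

```python
DAS_STATUS = {
    200: "OK, data follows",
    400: "Bad command (command not recognized)",
    401: "Bad data source (data source unknown)",
    402: "Bad command arguments (arguments invalid)",
    403: "Bad reference object (reference sequence unknown)",
    404: "Bad stylesheet (requested stylesheet unknown)",
    405: "Coordinate error (sequence coordinate is out of bounds/invalid)",
    500: "Server error, not otherwise specified",
    501: "Unimplemented feature",
}


def check_das_headers(headers):
    """Return (version, status_code, meaning) from a mapping of response headers."""
    # HTTP header names are case-insensitive, so normalise before lookup.
    lowered = {name.lower(): value.strip() for name, value in headers.items()}
    version = lowered.get("x-das-version")
    code = int(lowered["x-das-status"])
    return version, code, DAS_STATUS.get(code, "Unknown status code")


result = check_das_headers({"X-DAS-Version": "DAS/1.6", "X-DAS-Status": "200"})
```

Note that the DAS status is distinct from the HTTP status line, so a client would normally inspect both.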
The HTTP protocol allows web clients to request byte-level compression of the response by sending the Accept-Encoding request header. Web servers that are capable of it can reply with a Content-Encoding header and a compressed body. Implementors of DAS clients and servers may wish to implement this HTTP feature.
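On the client side, handling a compressed body might look like the sketch below. It assumes gzip is the compression scheme in use and that the caller passes in the encoding value announced by the server; the function name is hypothetical.

```python
import gzip


def decode_body(body, content_encoding=None):
    """Return the uncompressed response body as bytes."""
    encoding = (content_encoding or "").strip().lower()
    if encoding in ("gzip", "x-gzip"):
        return gzip.decompress(body)
    # No encoding announced (or one this sketch does not handle): pass through.
    return body
```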
The X-DAS-Capabilities header provides an extensible list of the capabilities that the server provides. This can be used by clients wishing to make use of optional components of the DAS protocol where they are supported, and by those writing experimental extensions to DAS to signal to clients that those extensions are available. Capabilities have the form CapabilityName/Version and are separated by a semicolon and a space, as in “capabilityA/1.0; capabilityB/1.4; capabilityC/1.0”. The following standard capabilities are present in the DAS/1.6 protocol:
| Capability Name | Description |
|---|---|
| dsn/1.0 | The server supports the **deprecated** *dsn* request. |
| dna/1.0 | The server supports the **deprecated** *dna* request. |
| types/1.0 | The server supports the basic types request. |
| stylesheet/1.1 | The server supports the basic stylesheet request. |
| features/1.0 | The server supports the basic features request. |
| entry_points/1.0 | The server supports the basic entry_points request. |
| error-segment/1.0 | Server will report requests for invalid segments with an `<ERRORSEGMENT>` tag. |
| unknown-segment/1.0 | Server will report requests for unknown or unannotated segments with an `<UNKNOWNSEGMENT>` tag. |
| unknown-feature/1.0 | Server will report requests for unknown features with an `<UNKNOWNFEATURE>` tag. |
| feature-by-id/1.0 | The features request will accept the CGI parameter “feature_id”, enabling the server to look up annotations based on their ID. |
| group-by-id/1.0 | The features request will accept the CGI parameter “group_id”, enabling the server to look up annotations based on the ID of a group. |
| component/1.0 | Deprecate? The features request will return components of the indicated segment when a category type of “component” is requested. |
| supercomponent/1.0 | Deprecate? The features request will return supercomponents of the indicated segment when a category type of “supercomponent” is requested. |
| sequence/1.0 | The server supports the sequence request. |
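Parsing the capabilities header into a lookup table is straightforward. The sketch below assumes the header value has already been extracted as a string; the function name is an assumption.

```python
def parse_capabilities(header_value):
    """Map capability name to version, e.g. {"features": "1.0"}."""
    capabilities = {}
    for item in header_value.split(";"):
        item = item.strip()
        if "/" not in item:
            continue  # skip empty or malformed entries
        name, version = item.rsplit("/", 1)
        capabilities[name] = version
    return capabilities


caps = parse_capabilities("error-segment/1.0; unknown-segment/1.0; stylesheet/1.1")
supports_feature_by_id = "feature-by-id" in caps
```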
The ID used by a client or server to refer to a reference object can contain any set of printable characters (including the space character), but not the colon character (“:”), which is reserved for separating reference IDs from sequence ranges (see below). The newline, tab and carriage return characters are also reserved for future use.
A data source that uses the colon character for its internal IDs must map this character to another one on the way in and on the way out. For example:

    Client request        Server's internal ID             Response to client
    gi-123456         --> gi:123456                    --> gi-123456
    gi-123456:1,1000  --> gi:123456 start=1 stop=1000  --> gi-123456:1,1000

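The mapping above could be sketched as follows. The choice of "-" as the substitute character follows the example, but the function names are assumptions, and the reverse mapping is only safe if internal IDs never contain "-" themselves.

```python
def to_client_id(internal_id):
    """Replace the reserved colon on the way out."""
    return internal_id.replace(":", "-")


def to_internal_id(client_id):
    """Restore the colon on the way in (assumes "-" never occurs in internal IDs)."""
    return client_id.replace("-", ":")


def parse_segment(segment):
    """Split "reference[:start,stop]" into (internal_id, start, stop)."""
    reference, _, bounds = segment.partition(":")
    start = stop = None
    if bounds:
        start_text, stop_text = bounds.split(",")
        start, stop = int(start_text), int(stop_text)
    return to_internal_id(reference), start, stop


def format_segment(internal_id, start=None, stop=None):
    """Build the client-facing segment string for a response."""
    reference = to_client_id(internal_id)
    return reference if start is None else f"{reference}:{start},{stop}"
```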
This section lists the queries recognized by reference and annotation servers. Each of these queries begins with some site-specific prefix, denoted here as PREFIX. The other meta-variable used in these examples is DSN, which is a symbolic data source name (as seen in the example above).
Editorial note: at present, there is no implementation of the "assembly traversal" features of DAS. In the interests of simplicity, we could opt for removing this capacity. This would involve removing:

- entry\_points: subparts and orientation attributes (orientation only makes sense if you have other information about how entry points relate to each other)
- features: reference, superparts and subparts attributes

Retrieve the list of data sources for a server.

Note: the DSN command has been deprecated in favour of this sources command.

Scope: Reference and annotation servers.
Command: sources
Format:

    PREFIX/das/sources

Description: This query returns the list of data sources that are available from this server. In particular the following information for a DAS server is important:
Arguments: none
The response to the sources command is the “DASSSOURCE” XML-formatted document:

    <?xml version='1.0' encoding='UTF-8' ?>
    <?xml-stylesheet type="text/xsl" href="das.xsl"?>
    <SOURCES>
      <SOURCE uri="URI" title="title" doc_href="URL" description="description">
        <MAINTAINER email="email address" />
        <VERSION uri="URI" created="date">
          <COORDINATES uri="uri" source="data type" authority="authority" test_range="ID">coordinate string</COORDINATES>
          <CAPABILITY type="das1:command" query_uri="URL" />
          <PROP name="key" value="value" />
        </VERSION>
      </SOURCE>
    </SOURCES>

Format:
| Element / attribute | Occurrence | Description |
|---|---|---|
| xml-stylesheet | optional | An XSL stylesheet that, for example, allows a browser to display the XML response nicely. Editorial note: it is unclear whether this should be part of the spec, as it could be confused with the stylesheet command. We should probably highlight that this stylesheet controls display in a web browser, not rendering in a DAS client. |
| SOURCES | mandatory | The main container for several DAS sources. |
| SOURCE | mandatory, one or many | The description of a DAS data source. |
| uri | mandatory | A unique URI for the DAS source. |
| title, description | mandatory | The title is the nickname under which a DAS server shall be known and displayed in a view. The description is a free-text description of the provided data. |
| doc\_href | optional | Points to a web site where more information about a DAS source can be obtained. |
| MAINTAINER, email | mandatory | The email address of the maintainer of this DAS source. |
| VERSION | mandatory | In principle this allows several versions of a DAS source (with unique URIs) to be hosted on a server, but in practice most people provide only the latest data. Different versions of the same source should be considered *equivalent*, that is, the latest version is definitive. The created attribute provides the date on which a DAS server was initially set up. For a DAS registration server this is the date on which a DAS server was published. |
| COORDINATES | mandatory, one or many | The description of the coordinate system(s) a DAS source operates on. uri: the unique URI for a DAS coordinate system. For a DAS registration server these should be resolvable and give access to more information, e.g. <http://www.dasregistry.org/dasregistry/coordsys/CS_DS6> for the UniProt,Protein Sequence coordinate system. source: the data type. This refers to the "physical dimension" of the data. Currently the following categories are available: Chromosome, Clone, Contig, Gene\_ID, NT\_Contig, Protein Sequence, Protein Structure. authority: the authority, or institution, that assigns the accession codes for this namespace; for genome assemblies, the authority that builds the assembly. version (optional): for genome assemblies, the version of the build. To learn more about coordinate systems, please see here. |
| CAPABILITY | mandatory, one or many | The supported DAS commands. type: the type of the DAS command. To distinguish DAS/1 from DAS/2 servers, das1: is used before the name of the command. query\_uri: the URL of the server location, with the command attached, e.g. <http://www.ebi.ac.uk/das-srv/uniprot/das/uniprot/features>. Note: for some DAS commands this will not resolve, since e.g. for the features command the extension /features?segment=ID needs to be attached. |
| PROP | optional, one or many | A free key-value property that allows more tags to be added to a server. |
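A client could extract the essentials of a sources response with the standard library XML parser. In the sketch below the sample document is hypothetical (its URIs, title and e-mail address are invented for illustration), and the function name and output shape are assumptions.

```python
import xml.etree.ElementTree as ET

# Hypothetical response, shaped like the template above.
SAMPLE = """
<SOURCES>
  <SOURCE uri="example_source" title="example" description="An invented source">
    <MAINTAINER email="maintainer@example.org" />
    <VERSION uri="example_source_v1" created="2008-10-01">
      <COORDINATES uri="http://www.dasregistry.org/dasregistry/coordsys/CS_DS6"
                   source="Protein Sequence" authority="UniProt"
                   test_range="P15056">UniProt,Protein Sequence</COORDINATES>
      <CAPABILITY type="das1:features" query_uri="http://example.org/das/example/features" />
      <CAPABILITY type="das1:sequence" query_uri="http://example.org/das/example/sequence" />
    </VERSION>
  </SOURCE>
</SOURCES>
"""


def summarize_sources(xml_text):
    """Return one summary dict per VERSION of each SOURCE."""
    root = ET.fromstring(xml_text)
    summaries = []
    for source in root.findall("SOURCE"):
        for version in source.findall("VERSION"):
            summaries.append({
                "uri": source.get("uri"),
                "title": source.get("title"),
                "coordinates": [c.text for c in version.findall("COORDINATES")],
                # Strip the "das1:" prefix to leave the bare command name.
                "commands": [c.get("type").split(":", 1)[1]
                             for c in version.findall("CAPABILITY")],
            })
    return summaries


summary = summarize_sources(SAMPLE)
```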
Example Responses
Retrieve the list of reference objects for a data source
Note: the entry\_points command is now mandatory for reference servers.

Scope: Reference servers.
Command: entry_points
Format:

    PREFIX/das/DSN/entry_points

Description: This query returns the list of sequence entry points available and their sizes in base pairs.
Arguments:
ref (deprecated)
If a sequence reference ID is provided in the ref argument, the
query will return the components of the sequence (its subsequences)
rather than the list of top-level entry point sequences. This argument
is DEPRECATED, and superseded by the “component” category of the
features request.
type (deprecated)
For ACEDB servers, the type parameter provides the class of the
reference sequence, Sequence by default. DEPRECATED
The response to the entry_points command is the “DASEP” XML-formatted document:
Format:

    <?xml version="1.0" standalone="no"?>
    <!DOCTYPE DASEP SYSTEM "http://www.biodas.org/dtd/dasep.dtd">

<!DOCTYPE> (required; one only)
The doctype indicates which formal DTD specification to use. For the entry_points query, the doctype DTD is “http://www.biodas.org/dtd/dasep.dtd”.
Command: sequence

Format:

    PREFIX/das/DSN/sequence?segment=RANGE[;segment=RANGE...]

Description: This query returns the sequence (nucleotide or
protein) corresponding to the indicated segment.
Arguments:
**Command:** *types*

**Format:**

    PREFIX/das/DSN/types[?segment=RANGE]
                        [;segment=RANGE]
                        [;type=TYPE]
                        [;type=TYPE]

**Description:** This query returns the types of annotation available for a
segment of sequence.
**Arguments:**
**segment** (optional)
This is the sequence range. It uses the format
*reference:start,stop*, where *reference* is the ID of the reference
sequence used to establish the coordinate system, and *start* and *stop*
are the endpoints of the region to query, inclusive.
**type** (optional)
One or more type IDs to be used for filtering
annotations on the type field. If multiple type IDs are provided, the
resulting list of features will be the logical OR of the list. For
compatibility with versions 0.997 and earlier of this protocol, servers
are allowed to treat the type ID as a regular expression, but this
feature is **deprecated** and should not be used.
Editorial note: it is not clear that this argument is really needed. Deprecate?
If one or more segment arguments are provided, the list of types
returned is restricted to the indicated segments. If no segment argument
is provided, then **all** feature types known to the source are
returned.
#### Response:
The document returned from the *types* request is an XML-formatted
"DASTYPES" document. This is a shortened form of the full features
format (see below) and is used to summarize the type and number of each
annotation. Annotation types can be grouped into segments, or be totaled
across the entire database.

    <!DOCTYPE DASTYPES SYSTEM "http://www.biodas.org/dtd/dastypes.dtd">
    <DASTYPES>
      ...
        <TYPE id="id1" method="method" category="category">Type Count 1</TYPE>
        <TYPE id="id2" method="method" category="category">Type Count 2</TYPE>
        ...
      ...
    </DASTYPES>

The **method** attribute is DEPRECATED.

<!DOCTYPE>
(required; one only)
The doctype indicates which formal DTD specification to use. For the
types query, the doctype DTD is
"<http://www.biodas.org/dtd/dastypes.dtd>".
**Command:** *features*

**Format:**

    PREFIX/das/DSN/features?segment=REF[:start,stop][;segment=REF:start,stop...]
                           [;type=TYPE]
                           [;type=TYPE]
                           [;category=CATEGORY]
                           [;category=CATEGORY]
                           [;categorize=yes|no]    (deprecated)
                           [;feature_id=ID]
                           [;group_id=ID]

**Description:** This query returns the annotations across one or more
segments of sequence.
**Arguments:**
**segment** (zero or more)
If specified, the segment argument restricts the list of annotations to
those that overlap the indicated reference object, or
a specific range within the reference object. The argument uses the
format segment=*reference* or segment=*reference:start,stop*.
Here, *reference* is the ID of the reference object used to
establish the coordinate system, and *start* and *stop* are the
endpoints of the region to query, inclusive. Multiple segments may
be specified. For example:

    features?segment=REF1:100,200;segment=REF2

**type** (zero or more)
Zero or more type IDs to be used for filtering annotations on the
type field. If multiple type IDs are provided,
the resulting list of features will be the logical OR of the list.
Editorial note (proposed for removal): For compatibility with
versions 0.997 and earlier of this protocol, servers are allowed to
treat the type ID as a regular expression, but this feature is
**deprecated** and should not be relied on.
**category** (zero or more)
Zero or more category IDs to be used for filtering annotations
by category. If multiple categories are provided, they are treated as
the logical OR. Editorial note (proposed for removal): For
compatibility with versions 0.997 and earlier of this protocol, servers
are allowed to treat the category ID as a regular expression, but this
feature is **deprecated** and should not be relied on.
**categorize** (optional)
Either "yes" or "no" (default). If "yes", then each annotation must
include its functional category. This parameter
is DEPRECATED. The category is now mandatory in the response.
**feature\_id** (zero or more; new in 1.5)
Instead of, or in addition to, **segment** arguments, you may provide
one or more **feature\_id** arguments, whose values are the identifiers
of particular features. If the server supports this operation, it will
translate each feature ID into the segment(s) that strictly enclose the feature
and return the result in the *features* response. It is possible for the
server to return multiple segments if the requested feature is present
in multiple locations.
Editorial note: at the moment, the few servers that implement this do not
just use feature\_id to identify the segment(s); they actually restrict the
response to the requested feature ID. This behaviour is arguably more
valuable, so perhaps we should specify it here. The same applies to
group\_id below.
**group\_id** (zero or more; new in 1.5)
The **group\_id** argument is similar to **feature\_id**, but retrieves
segments that contain the indicated feature group.
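To make the argument list concrete, here is a sketch of how a client might assemble a features request URL from the arguments described above. The function name and keyword names are assumptions, and values are inserted without URL encoding.

```python
def features_url(prefix, dsn, segments=(), types=(), categories=(),
                 feature_ids=(), group_ids=()):
    """Build a features request; repeated arguments are joined with ';'."""
    groups = (
        ("segment", segments),
        ("type", types),
        ("category", categories),
        ("feature_id", feature_ids),
        ("group_id", group_ids),
    )
    arguments = [f"{name}={value}" for name, values in groups for value in values]
    url = f"{prefix}/das/{dsn}/features"
    return f"{url}?{';'.join(arguments)}" if arguments else url


url = features_url(
    "http://das.sanger.ac.uk", "ccds_mouse",
    segments=["1:174405453,174408689"],
)
```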
Servers may offer the same feature on several
coordinate systems. In such a case, they will share a common feature ID
and thus can be filtered by the client.
Annotations **must** be returned using the coordinate
system in which they were requested. For example, if a contig ID was
used to specify the segment, then the annotation endpoints must use
contig coordinates.
Servers should return annotations which lie wholly or partially within
the query segment. Annotation servers are no longer allowed to return
only annotations which are completely contained within the indicated
segment. For example:

          -------------------
                 Query
    ----- --- ------- ----------- -----
      A    B     C         D        E

In the above example, the server should return
annotations B and C because they lie wholly within the query segment,
and annotation D because it lies partially within the query
segment.
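The overlap rule reduces to a single comparison on inclusive endpoints. In this sketch the coordinates are hypothetical, read off the column positions of the diagram; only the relative layout of A to E matters.

```python
def overlaps(query_start, query_stop, ann_start, ann_stop):
    """True if the annotation lies wholly or partially within the query (inclusive)."""
    return ann_start <= query_stop and ann_stop >= query_start


# Hypothetical coordinates matching the layout of the diagram.
annotations = {"A": (1, 5), "B": (7, 9), "C": (11, 17), "D": (19, 29), "E": (31, 35)}
query = (7, 25)

returned = [name for name, (start, stop) in annotations.items()
            if overlaps(query[0], query[1], start, stop)]
```

With these numbers `returned` contains B, C and D, in agreement with the example.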
If multiple segment arguments are provided and they happen to overlap,
then the annotation server may return the same annotation multiple
times, possibly using different coordinate systems. It is the
responsibility of the client to merge annotations based on the assembly.
#### Response:
The document returned from the *features* request is an XML-formatted
"DASGFF" document.
**Format:**
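
    <!DOCTYPE DASGFF SYSTEM "http://www.biodas.org/dtd/dasgff.dtd">
    <DASGFF>
      <GFF href="url">
        <SEGMENT id="id" start="start" stop="stop">
          <FEATURE id="id" label="label">
            <TYPE id="id" category="category">type label</TYPE>
            <METHOD id="id">method label</METHOD>
            <START>start</START>
            <END>end</END>
            <SCORE>[X.XX|-]</SCORE>
            <ORIENTATION>[0|-|+]</ORIENTATION>
            <PHASE>[0|1|2|-]</PHASE>
            <NOTE>note text</NOTE>
            <LINK href="url">link text</LINK>
            <TARGET id="id" start="x" stop="y">target name</TARGET>
            <GROUP id="id">
              <NOTE>note text</NOTE>
              <LINK href="url">link text</LINK>
              <TARGET id="id" start="x" stop="y">target name</TARGET>
            </GROUP>
          </FEATURE>
          ...
        </SEGMENT>
      </GFF>
    </DASGFF>

The `<TARGET>` element within `<GROUP>` is DEPRECATED.
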
<!DOCTYPE>
(required; one only)
The doctype indicates which formal DTD specification to use. For the
features query, the doctype DTD is
"<http://www.biodas.org/dtd/dasgff.dtd>".
` `
` `
` `
` ``polypeptide_domain`
` ``1090`
` ``1177`
` ``Pfam-A`
` ``1.7e-18`
` `
HMMER Version: 2.3.2
` ``C2`</LINK>
` `
` `
` `
### Exception Handling for Invalid Segments
Rewritten this section, no longer "new"
A request for the sequence or annotations of a named segment may fail
because either:
1. the requested segment is outside the bounds of the reference object.
2. the reference object is not known to the server
In both cases, an exception is indicated by issuing an
`<ERRORSEGMENT>` or `<UNKNOWNSEGMENT>` tag.
The **id** attribute (required) corresponds to the ID of the requested
segment, and **start** and **stop** (optional) correspond to the
*requested* bounds of the segment (if this was specified).
An `<UNKNOWNSEGMENT>` exception will only be raised if:
1. the reference object is not known to the server **AND**
2. the server cannot authoritatively identify that the requested
segment is erroneous
Otherwise an `<ERRORSEGMENT>` exception is raised.
For example a reference server, which is authoritative for the
coordinate system, knows that any reference object it cannot identify
must be erroneous. It will therefore raise an `<ERRORSEGMENT>`. By
contrast an annotation server, which is not required to know the
identities of all the reference objects in the coordinate system, should
respond by issuing an `<UNKNOWNSEGMENT>` tag - it does not know whether
the request is erroneous or not. All servers should issue `<ERRORSEGMENT>`
exceptions when they detect a query segment that is beyond the range of
a reference object.
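The decision described above can be sketched as a small function. The return values reuse the capability names error-segment and unknown-segment as labels; the function name, the parameters and the assumption of 1-based inclusive coordinates are illustrative only.

```python
def segment_exception(reference_length, start=None, stop=None, authoritative=False):
    """Decide how a server might report a problem with a requested segment.

    reference_length is the length of the reference object, or None when the
    server does not know the object. Returns None when the request is valid.
    """
    if reference_length is None:
        # Only an authoritative (reference) server can say the ID is wrong.
        return "error-segment" if authoritative else "unknown-segment"
    if start is not None and stop is not None:
        # Assumes 1-based, inclusive coordinates.
        if start < 1 or stop > reference_length:
            return "error-segment"
    return None
```

For instance, an annotation server asked about an object it has never seen would report unknown-segment, while a reference server would report error-segment for the same request.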
------------------------------------------------------------------------
### Link command
*Linking to a feature.*
We are proposing to remove this command, as
implementations tend to use a separate web server anyway
**Scope:** Annotation servers.
**Command:** *link*
**Format:**

    PREFIX/das/DSN/link?field=TAG;id=ID

**Description:** This query can be issued in order to retrieve further
human-readable information about an annotation. It is best to pass this
URL directly to a browser, as the type of the returned data is not
specified (it will typically be an HTML file, but any MIME format is
allowed).
**Arguments:**
**field** (required)
The field to fetch further information on. Options are:
- **feature** -- the feature itself
- **type** -- the feature type
- **method** -- the feature method
- **category** -- the feature category
- **target** -- the target, applicable to sequence similarities only
**id** (required)
The ID of the indicated annotation field.
**Response:** A web page.
------------------------------------------------------------------------
### Stylesheet Command
*Retrieve guidelines for how to render annotations offered by a server.*
**Scope:** Annotation servers.
**Command:** *stylesheet*
**Format:**

    PREFIX/das/DSN/stylesheet

**Description:** This query can be issued to an annotation server in
order to retrieve the server's recommendations on formatting annotations
retrieved from it. These recommendations are not normative. A viewer is
free to use any display format it chooses.
**Arguments:** None.
#### Response:
This document is intended to provide hints to the annotation display
client. It maps feature categories and types to a series of glyphs known
to the display client.
**Format:**

    <!DOCTYPE DASSTYLE SYSTEM "http://www.biodas.org/dtd/dasstyle.dtd">
    <DASSTYLE>
      <STYLESHEET version="X.XX">
        <CATEGORY id="default">
          <TYPE id="default">
            <GLYPH zoom="low|medium|high">
              <ID>
                <ATTR>value</ATTR>
                <ATTR>value</ATTR>
                ...
              </ID>
            </GLYPH>
            ...
          </TYPE>
          ...
        </CATEGORY>
        <CATEGORY id="category">
          <TYPE id="type">
            <GLYPH>
              <ID>
                <ATTR>value</ATTR>
                ...
              </ID>
            </GLYPH>
            ...
          </TYPE>
          ...
        </CATEGORY>
        ...
      </STYLESHEET>
    </DASSTYLE>

<!DOCTYPE>
(required; one only)
The doctype indicates which formal DTD specification to use. For the
stylesheet query, the doctype DTD is
"<http://www.biodas.org/dtd/dasstyle.dtd>".
`<DASSTYLE>` (required; one only)
The appropriate doctype and root tag is DASSTYLE.
`<STYLESHEET>` (required; one only)
There is a single `<STYLESHEET>` tag. Its **version** (required)
attribute indicates the current version of the stylesheet, and can be
used for caching purposes.
`<CATEGORY>` (required; one or more)
There are one or more `<CATEGORY>` tags, each providing information on the
display of a high-level feature category. The **id** (required) attribute
uniquely names the category. A special name is "default", which tells
the annotation viewer what format to use for categories that are not
otherwise specified in the stylesheet. Another special name is "group".
A "group" entry indicates the format to use for a particular group
of features.
`<TYPE>` (required; one or more per CATEGORY)
There are one or more `<TYPE>` tags per `<CATEGORY>`, each providing display
suggestions for one type of annotation. The **id** (required) attribute uniquely
identifies the type. A special id is "default", which, if present,
identifies a default style for the enclosing category.
`<GLYPH>` (required; one or more per TYPE)
There are one or more `<GLYPH>` tags per `<TYPE>`. Each provides information on
what glyph (graphical widget) to use to display the indicated
annotation type. The optional **zoom** attribute implements a simple
form of semantic zooming, and allows the client to select the glyph and
its attributes based on the zoom level. Possible values are "high",
"medium" and "low". If multiple `<GLYPH>` tags are present, this attribute
**must** be present in order to select among them. A "high" zoom means
that there are fewer base pairs per pixel (high magnification). A "low"
zoom shows more base pairs. "Medium" is intermediate. It is left to the
client to determine the boundaries for "high", "medium" and "low", since
this is a function of the graphics rendering.
<*ID*> (required; one per GLYPH)
The ID value refers to a recognized glyph from the glyph types list
(see below).
<*ATTR*> (optional; one or more per ID)
The recognized ATTR (attributes) are determined by which glyph ID
is specified. See the glyph types list below for
more information.
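Since the zoom boundaries are left to the client, one possible selection scheme is sketched below. The thresholds (10 and 1000 base pairs per pixel), the function names and the list-of-pairs representation of a type's glyphs are all assumptions chosen for illustration.

```python
def zoom_level(bases_per_pixel, high_max=10.0, low_min=1000.0):
    """Classify the current view; the thresholds are arbitrary client choices."""
    if bases_per_pixel <= high_max:
        return "high"    # few base pairs per pixel: high magnification
    if bases_per_pixel >= low_min:
        return "low"     # many base pairs per pixel
    return "medium"


def select_glyph(glyphs, bases_per_pixel):
    """Pick a glyph ID from (zoom, glyph_id) pairs for one annotation type.

    zoom is None when the attribute is absent, which is only meaningful
    when the type has a single glyph.
    """
    if len(glyphs) == 1:
        return glyphs[0][1]
    level = zoom_level(bases_per_pixel)
    for zoom, glyph_id in glyphs:
        if zoom == level:
            return glyph_id
    return None  # no glyph declared for this zoom level


choice = select_glyph([("high", "BOX"), ("medium", "LINE"), ("low", "HIDDEN")], 250.0)
```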
Here is a short stylesheet example:
` ...`
` `
` `
` `
` `
` ``gray`
` `
` `
` `
` `
` `
` `
` ``4`
` ``black` ` `
` ``red`
` `
` `
` `
` `
` `
` `
` ``4`
` ``black`
` ``blue`
` `
` `
` `
` `
` `
` `
` ``3`
` ``blue`
` ``green`
` `
` `
` `
` `
` `
` `
` ``4`
` ``gray`
` `
` `
` `
` `
` ...`
Editorial note: provide an ontology-updated stylesheet. A
sample stylesheet used for the WormBase DAS server can be found at
<http://www.biodas.org/documents/sample_stylesheet.xml>.
#### Glyph Types
This section describes a set of generic "glyphs" that can be used by
sequence display programs to display the position of features on a
sequence map. The annotation server may use these glyphs to send display
suggestions to the viewer via the stylesheet document.
The current set of glyph ID values are:
- ARROW
- ANCHORED\_ARROW
- BOX
- CROSS
- DOT
- EX
- HIDDEN
- LINE
- SPAN
- TEXT
- TOOMANY
- TRIANGLE
- PRIMERS
Each glyph has a set of attributes associated with it. Attribute values
come in the following flavors. Note that these are
*types* of element, not element names.
INT
An integer
FLOAT
A floating point number (not currently used)
STRING
A text string
COLOR
A color. Colors can be specified using the "\#RRGGBB" format commonly
used in HTML, or as one of the 16 IBM VGA colors recognized by Netscape
and Internet Explorer. Ensembl supports more colors than
these.
BOOL
A boolean value, either "yes" or "no".
FONT
A font. Any of the font identifiers recognized by Web browsers is
acceptable, e.g. "helvetica".
FONT\_STYLE
One of "bold", "italic", "underline".
LINE\_STYLE
One of "hat", "solid", "dashed".
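The value types above lend themselves to a simple validity check on the client side. The following Python sketch is one possible reading of them, not part of the specification: `is_valid` is a hypothetical helper, and the list of named colors assumes that the "16 IBM VGA colors" correspond to the sixteen HTML 4 color keywords.

```python
import re

FONT_STYLES = {"bold", "italic", "underline"}
LINE_STYLES = {"hat", "solid", "dashed"}

# Assumption: the 16 VGA colors are taken to be the HTML 4 color keywords.
NAMED_COLORS = {
    "aqua", "black", "blue", "fuchsia", "gray", "green", "lime", "maroon",
    "navy", "olive", "purple", "red", "silver", "teal", "white", "yellow",
}


def is_valid(value_type, value):
    """Check a stylesheet attribute value (a string) against its declared type."""
    if value_type == "INT":
        return re.fullmatch(r"[+-]?\d+", value) is not None
    if value_type == "FLOAT":
        try:
            float(value)
        except ValueError:
            return False
        return True
    if value_type == "STRING":
        return True
    if value_type == "COLOR":
        is_hex = re.fullmatch(r"#[0-9A-Fa-f]{6}", value) is not None
        return is_hex or value.lower() in NAMED_COLORS
    if value_type == "BOOL":
        return value in ("yes", "no")
    if value_type == "FONT":
        return bool(value.strip())  # any non-empty font identifier is accepted
    if value_type == "FONT_STYLE":
        return value in FONT_STYLES
    if value_type == "LINE_STYLE":
        return value in LINE_STYLES
    raise ValueError(f"unknown attribute type: {value_type}")


if __name__ == "__main__":
    for value_type, value in [("COLOR", "#FF00AA"), ("BOOL", "true"), ("LINE_STYLE", "hat")]:
        print(value_type, value, is_valid(value_type, value))
```

A client that accepts a wider range of color names, as Ensembl is said to, would simply extend `NAMED_COLORS`.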
Some attributes are shared by all glyphs. Others are glyph-specific. The
following attributes are shared in common:
HEIGHT
type: INT
The height of the glyph, in pixels. For the TEXT glyph, this is
equivalent to the FONTSIZE attribute.
FGCOLOR
type: COLOR
The foreground color of the glyph. This is the line and outline color
for graphical glyphs, and the font color for text glyphs.
BGCOLOR
type: COLOR
The background color of the glyph. For hollow glyphs, such as boxes,
this is the color of the interior of the box. For solid glyphs, such as
text, this is ignored.
LABEL
type: BOOL
Whether the glyph should be labeled with its name, as dictated by the
**label** attribute in the DASGFF document.
BUMP
type: BOOL
Whether the glyph should "bump" intersecting glyphs so that they do
not overlap.
**ARROW**
A double-headed arrow with an axis either orthogonal or parallel to the
sequence map.
Attributes:
PARALLEL
type: BOOL
Arrows run either parallel ("yes") or orthogonal ("no") to the
sequence axis.
**ANCHORED\_ARROW**
An arrow that has an arrowhead at one end, and an "anchor" (typically a
diamond or line) at the other. The arrow points in the direction
indicated by the `<ORIENTATION>` tag.
Attributes:
PARALLEL
type: BOOL
Arrows run either parallel ("yes") or orthogonal ("no") to the
sequence axis.
**BOX**
A rectangular box.
Attributes:
LINEWIDTH
type: INT
Width of the box outline.
**CROSS**
A cross "+". Commonly used for point mutations and other point-like
features.
Attributes:
(no glyph-specific attributes)
**DOT**
A dot. Commonly used for point mutations and other point-like features.
Attributes:
(no glyph-specific attributes)
**EX**
"X" marks the spot. Commonly used for point mutations and other point-like
features.
Attributes:
(no glyph-specific attributes)
**HIDDEN**
A feature that is invisible, intended to support semantic zooming
schemes in which a feature is hidden at particular zooms.
Attributes: none.
**LINE**
A line. Lines are equivalent to arrows with both the northeast
and southwest attributes set to "no".
Attributes:
STYLE
type: LINE\_STYLE
The line type. A type of "hat" draws an inverted V (commonly used
for introns). A type of "solid" draws a horizontal solid line in the
indicated color. A type of "dashed" draws a dashed horizontal line in the
indicated color.
**SPAN**
A spanning region. The recommended representation is a horizontal line
with vertical lines at each end.
Attributes:
(no glyph-specific attributes)
**TEXT**
A bit of text.
Attributes:
FONT
type: FONT
The font.
FONTSIZE
type: INT
The font size.
STRING
type: STRING
The text to render.
STYLE
type: FONT\_STYLE
The style in which to render this glyph. Multiple FONT\_STYLE attributes
may be present.
**PRIMERS**
Two inward-pointing arrows connected by a line of a different color.
Used for showing primer pairs and a PCR product. The length of the
arrows is meaningless.
There are no glyph-specific attributes, but in this context the
foreground color is the color of the arrows, and the background color is
the color of the line that connects them.
**TOOMANY**
More features than can be shown. Recommended for use in
consolidating sequence homology hits. The recommended visual
presentation is a set of overlapping boxes.
Attributes:
LINEWIDTH
type: INT
Width of the glyph.
**TRIANGLE**
A triangle. Commonly used for point mutations and other point-like
features. The triangle is always drawn in the center of its range, but
its height and width can be controlled by HEIGHT and LINEWIDTH
respectively.
Attributes:
LINEWIDTH
type: INT
Width of the glyph.
DIRECTION
One of "N", "E", "S", and "W"
------------------------------------------------------------------------
#### Glyphs and Groups
Glyphs and their attributes are typically applied to individual
features. However, they can be applied to entire groups as well (via the
**type** attribute). In this case, the glyph will apply to the
connecting regions **between** the individual
features within the group. Glyphs for groups are identified in the
stylesheet using the special category named "group".
For example, to indicate that the exons in a "transcript" group should
be drawn with a yellow box, that the UTRs should be drawn with a blue
box, and that the connections between exons should be drawn with a
hat-shaped line:
Note that these terms aren't in the BS ontology
because it is still protein-specific!
` `` `
` `
` `
` ``yellow`
` `
` `
` `
` `` `
` `
` `
` ``blue`
` `
` `
` `
` `` `
` `
` `
` ``black`
` ``hat`
` `
` `
` `
------------------------------------------------------------------------
Fetching Sequence Assemblies
----------------------------
We are proposing to deprecate this as not many people
use it.
Reference servers, but not annotation servers, must represent and serve
genome assemblies.
The components of an assembly are treated as a set of features with a
type *category* attribute of "component" and a *reference* attribute of
"yes". Intermediate components of the assembly will also have a
*subparts* attribute of "yes". Components that are the parents of the
reference sequence in the assembly have a category attribute of
"supercomponent."
### Moving Down in an Assembly
For those components that have subparts, the start and end of the
feature give the feature's position in the requested segment's
coordinate system, and the id, start and end of the `<TARGET>` element
give the feature's position in its native coordinates.
For example:
` 1 200 400 1000`
` +--------+-----------+-------------------+ `**`chr22`**
` 1 200 220 1 20 620`
` +--------+---- A --+-------------------+ `**`B`**
` 1 80 280 400`
` ------+-----------+-------- `**`C`**
` =================== `**`C.1`**
` ============= `**`C.2`**
A request for this assembly will look like the following:
[`http://www.wormbase.org/db/das/elegans/features?segment=chr22:1,1000;category=component`](http://www.wormbase.org/db/das/elegans/features?segment=chr22:1,1000;category=component)
The reference server will return the following (abbreviated) document:
` `
` ``1`
` ``1000`
` ``chr 22`
` ``chr22`
` ...`
` `
` ``1`
` ``200`
` ``a contig`
` ``Contig A`
` ...`
` `
` `
` ``400`
` ``1000`
` ``a contig`
` ``Contig B`
` ...`
` `
` `
` ``200`
` ``400`
` ``a contig`
` ``Contig C` ` ...`
` `
Notice that contig C is marked as having subparts. This is an indication
to the client that it should emit a features request that includes
segment C:80,280 in order to discover its components (C.1 and C.2).
Notice also that chr22 appears as a component of itself with the
attribute **superparts="no"** and **subparts="yes"**. This is a side
effect of providing information about the component parent.
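The walk down the assembly can be sketched in Python as follows. The coordinates come from the chr22 example above (contig C covers chr22:200-400 and corresponds to C:80-280 in its own coordinates). The function names `features_url` and `to_native` are hypothetical, and the mapping assumes the component lies in the same orientation as the requested segment.

```python
def features_url(server, segment, start, end, category):
    """Build a features request for one segment, filtered by category."""
    return f"{server}/features?segment={segment}:{start},{end};category={category}"


def to_native(pos, feat_start, feat_end, target_start, target_stop):
    """Map a position in the requested segment to the component's own coordinates.

    feat_start/feat_end locate the component on the requested segment;
    target_start/target_stop give the same span in native coordinates.
    Assumes the component is not reversed relative to the segment.
    """
    if not feat_start <= pos <= feat_end:
        raise ValueError("position lies outside the component")
    native = target_start + (pos - feat_start)
    if native > target_stop:
        raise ValueError("feature and target spans differ in length")
    return native


if __name__ == "__main__":
    server = "http://www.wormbase.org/db/das/elegans"
    # First request: the components of chr22.
    print(features_url(server, "chr22", 1, 1000, "component"))
    # Contig C has subparts, so the follow-up request uses its native range.
    c_start = to_native(200, 200, 400, 80, 280)
    c_end = to_native(400, 200, 400, 80, 280)
    print(features_url(server, "C", c_start, c_end, "component"))
```

The second URL requests segment C:80,280, which is the follow-up request described above for discovering C.1 and C.2.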
### Moving Up in an Assembly
It is also desirable for a client to fetch the **parent** of a segment,
so as to accommodate the situation in which the user enters the browser
at a contig or sequenced clone, and wants to "zoom out."
This situation is complicated by rough draft issues, in which a single
rough draft sequence segment may have multiple parents, and some
sections of the segment may not belong in the assembly at all. For
example:
` A B C D`
` `**`contig21`**`-----------> <-----------`**`contig100`**
` | | / /`
` | | / /`
` `**`Acc` `A`**` ---------------------`
` a b c d`
Here, the segment "Acc A" contains two fragments, one of which is
located on contig21 and the other on contig100.
To retrieve this information, the client requests the category
**supercomponent**. For segments that are in the middle of the assembly,
one or more assembly parents will be returned in **addition** to
subcomponents. The parent `<START>`, `<END>` and `<ORIENTATION>` tags are
presented in the coordinate system of the requested segment, as always.
The **start** and **stop** attributes of the `<TARGET>` tag denote the
corresponding segment in the coordinate system of the parent. As always,
start is less than stop, for both the feature and the target.
` `
` ``a`
` ``b`
` ``+`
` ``a contig`
` `
` `
` `
` ``c`
` ``d`
` ``-`
` ``a contig`
` `
` `
To continue following the parents upward in the assembly, the client
will issue further features requests for the target IDs, in this case
"contig21" and "contig100". In the general case, following parents will
project the requested segment onto a discontinuous set of regions,
potentially on different chromosomes. The client may wish to alert the
user and refuse to proceed further when it encounters a segment with
multiple parents.
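A minimal Python sketch of projecting a position upward onto its parents is given below. The specification uses symbolic coordinates (a, b, c, d and A, B, C, D), so all numbers here are invented for illustration. `ParentLink` and `project` are hypothetical names, and the handling of the "-" orientation (counting back from the target stop) is an assumption about how a client would interpret a reversed parent.

```python
from dataclasses import dataclass


@dataclass
class ParentLink:
    """One supercomponent feature: a span of the segment and its parent target."""
    start: int          # feature start, in the requested segment's coordinates
    end: int            # feature end, in the requested segment's coordinates
    orientation: str    # "+" or "-"
    target_id: str      # id of the parent, e.g. "contig21"
    target_start: int   # start in the parent's coordinates (always < stop)
    target_stop: int    # stop in the parent's coordinates


def project(pos, links):
    """Return (parent_id, parent_pos) for every parent that covers pos."""
    hits = []
    for link in links:
        if link.start <= pos <= link.end:
            offset = pos - link.start
            if link.orientation == "-":
                # Reversed parent: walk backwards from the target stop.
                hits.append((link.target_id, link.target_stop - offset))
            else:
                hits.append((link.target_id, link.target_start + offset))
    return hits


# Invented numbers standing in for a..b -> contig21 and c..d -> contig100.
ACC_A_PARENTS = [
    ParentLink(1, 500, "+", "contig21", 1000, 1499),
    ParentLink(600, 900, "-", "contig100", 2000, 2300),
]

if __name__ == "__main__":
    print(project(250, ACC_A_PARENTS))  # falls on contig21
    print(project(600, ACC_A_PARENTS))  # falls on contig100, reversed
    print(project(550, ACC_A_PARENTS))  # not part of the assembly
```

An empty result corresponds to a section of the segment that does not belong in the assembly. A client could treat more than one distinct `target_id` across a segment as the cue to warn the user before going further.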
------------------------------------------------------------------------
Feature Types and Categories
----------------------------
Annotations returned by the *features* command are classified by type.
Specifically, the type refers to the class of data represented by the
annotation. Previous versions of this specification gave guidelines for
the content (or semantics) of a feature's type, but as of version 1.6
DAS has formally adopted the use of ontologies for this purpose.
The **type** of an annotation is selected from one of the following
ontologies:
| Ontology name | Ontology ID |
|---------------------------------|-------------|
| Sequence Types and Features | SO |
| Protein Modifications (PSI-MOD) | MOD |
| BioSapiens Annotations | BS |
The type is further classified by **category**. The category is a
classification of how the annotation was derived, represented as
*evidence*:
| Ontology name | Ontology ID |
|----------------|-------------|
| Evidence Codes | ECO |
All ontologies can be browsed and searched using the [Ontology Lookup
Service](http://www.ebi.ac.uk/ontology-lookup/) at the European
Bioinformatics Institute.
These ontologies are applied to the DAS *features* response in the
following manner:
### Type ID
The id attribute of the `<TYPE>` element is the ontology term ID. For
example:
`<TYPE id="SO:0000114" ... > ... </TYPE>`
### Type label
The content of the `<TYPE>` element is the ontology term name. For
example:
`<TYPE id="SO:0000114" ... >methylated_C</TYPE>`
### Type category
The category attribute of the `<TYPE>` element is the evidence term name,
followed by the evidence term ID in parentheses. For example:
`<TYPE id="SO:0000114" category="inferred from experiment (ECO:0000006)">methylated_C</TYPE>`
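To show how a client might take these three pieces apart, here is a small Python sketch that parses the `<TYPE>` element from the example above. `parse_type` and the regular expression are hypothetical; in particular, the pattern assumes that the evidence term ID is always the final parenthesised token of the category attribute.

```python
import re
import xml.etree.ElementTree as ET

# Assumption: category is "<evidence term name> (<ONTOLOGY>:<digits>)".
CATEGORY_RE = re.compile(r"^(?P<name>.+?)\s*\((?P<id>[A-Za-z]+:\d+)\)$")


def parse_type(xml_fragment):
    """Split a TYPE element into ontology term and evidence term parts."""
    elem = ET.fromstring(xml_fragment)
    category = elem.get("category", "")
    match = CATEGORY_RE.match(category)
    return {
        "type_id": elem.get("id"),       # ontology term ID
        "type_label": elem.text,         # ontology term name
        "evidence_name": match.group("name") if match else category,
        "evidence_id": match.group("id") if match else None,
    }


if __name__ == "__main__":
    fragment = (
        '<TYPE id="SO:0000114" '
        'category="inferred from experiment (ECO:0000006)">methylated_C</TYPE>'
    )
    print(parse_type(fragment))
```

A category that does not follow the pattern, such as one of the deprecated free-text categories, is passed through unchanged with no evidence ID.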
The following is DEPRECATED:
This is a list of generic feature categories and specific feature types
within them. This list was derived from the features currently exported
by ACeDB/GFF and is not comprehensive. Suggestions for modifications,
additions and deletions are welcomed.
**component**
This category indicates that the feature is a child component of the
reference sequence in the current assembly. When combined with the
**reference="yes"** attribute, this indicates that the feature can be
used as a reference point to retrieve subfeatures contained within it
(including subcomponents).
**supercomponent**
This category indicates that the feature is the parent of the reference
sequence in the current assembly. When combined with the
**reference="yes"** attribute, this indicates that the feature can be
used as a reference point to retrieve features that completely contain
the selected range of the reference sequence.
**translation**
The translation category is used for features that relate to
regions of the sequence that are translated into proteins. Features that
relate to transcription are separate (see below).
Features:
- stop - position of the translation stop codon
- ATG - position of the start codon
- CDS - position of the coding region
- 5'UTR - untranslated region
- 3'UTR - untranslated region
- misc\_translated - miscellaneous
It is recommended, but not required, that the section contain
and/or
tags that provide further information on the translation feature.
**transcription**
The transcription category is used for features that relate to
regions of the sequence that are transcribed into RNA.
Features:
- exon
- intron
- tRNA
- mRNA
- ncRNA
- 5'Cap - transcriptional start site
- PolyA
- Splice5 - splice donor
- Splice3 - splice acceptor
- misc\_transcribed
It is recommended, but not required, that the section contain
and/or
tags that provide further information on the transcription feature.
**variation**
The variation category is used for features that relate to
regions of the sequence that are polymorphic.
Features:
- insertion
- deletion
- substitution
- misc\_variation
It is recommended, but not required, that the section contain
and/or
tags that provide further information on the variation.
**structural**
The structural category is used for features that relate to
mapping, sequencing and assembly, as well as for various landmarks that
carry no intrinsic biological information.
Features:
- clone
- primer\_left
- primer\_right
- oligo
- assembly\_tag
- misc\_structural
It is recommended, but not required, that the section contain
and/or
tags that provide further information on the structural feature.
**similarity**
The similarity category is used for areas that are similar to
other sequences. Similarity features should have a `<METHOD>` tag that
indicates the algorithm used for the sequence comparison, and a
`<TARGET>` tag that indicates the target of the match.
Features:
- NN -- nucleotide to nucleotide similarity (e.g. blastn)
- NP -- nucleotide to protein similarity (e.g. blastx)
- PN -- protein to nucleotide similarity (e.g. tblastn)
- PP -- protein to protein similarity (e.g. blastp)
- misc\_homology
**repeat**
The repeat category is used for areas that contain repetitive
DNA. This category is used both for low-complexity regions, such as
microsatellites, and for more biologically interesting features, such as
transposon insertion sites.
Features:
- microsatellite
- inverted
- tandem
- transposable\_element
- LINE - long repeat not definitely identified as a transposon
- misc\_repeat
It is recommended, but not required, that the section contain
and/or
tags that provide further information on the repetitive element.
**experimental**
The experimental category is a catchall used to flag areas
where there is interesting experimental data of one sort or another. It
is intended for use with high-throughput functional genomics work, such
as knockouts or insertional mutagenesis screens.
Features:
- knockout
- expression\_tag
- microarrayed
- RNAi\_result
- transgenic
- mutant - a mutant phenotype associated with region
- misc\_experimental
It is recommended, but not required, that the section contain
and/or
tags that provide further information on the nature of the experimental
data.
------------------------------------------------------------------------
Other Issues
------------
The distributed annotation system must have a mechanism for detecting
and resolving version skew across reference and annotation servers.
Although one such mechanism is currently incorporated into the
ACeDB-based prototype, it is largely untested and hence not yet a part
of the DAS standard.
Changes
-------
Last modified: 08 Oct 2008