IT6801 - Service Oriented Architecture - SOA- UNIT I
INTRODUCTION TO XML
XML document structure – Well formed and valid documents – Namespaces – DTD – XML Schema – X-Files.
1.1 XML DOCUMENT STRUCTURE
An
XML document consists of a number of discrete components or sections. Although
not all the sections of an XML document may be necessary, their use and
inclusion helps to make for a well-structured XML document that can easily be
transported between systems and devices.
The
major portions of an XML document include the following:
• The XML declaration
• The Document Type Declaration
• The element data
• The attribute data
• The character data or XML content
1.1.1 XML Declaration
The first part of an XML document is the declaration, which states exactly
what the document contains. The XML declaration has the form <?xml ... ?>; it
looks like a processing instruction but is a distinct construct that may appear
only at the very start of the document. It identifies the XML version and can
also indicate the character encoding and whether the document depends on
external markup declarations. Because a number of document formats use markup
similar to XML, the declaration is useful in establishing the document as being
compliant with a specific version of XML without any doubt or ambiguity. In
general, every XML document should use an XML declaration. The XML declaration
consists of a number of components: the version, the optional encoding, and the
optional standalone document declaration.
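The standalone document declaration states whether the document depends on
markup declarations held outside the document itself, such as an external DTD
subset. When standalone is set to “yes”, the document asserts that no external
markup declarations affect the information passed to the application. When it
is set to “no” (the value assumed when the declaration is omitted), external
declarations may supply values such as attribute defaults and entity
definitions. An internal DTD subset is optional in either case.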
Valid XML Declarations
The first declaration
defines a well-formed XML document, whereas the second defines a well-formed
and valid XML document. The third declaration shows a more complete definition
that states a typical use-case for XML.
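The declarations this passage describes can be inspected programmatically. The following is a minimal sketch, assuming Python's standard xml.parsers.expat module; the helper name read_declaration and the sample documents are illustrative and not part of the original notes.

```python
import xml.parsers.expat as expat

def read_declaration(document: bytes):
    """Return (version, encoding, standalone) from the XML declaration, or None."""
    seen = {}

    def on_decl(version, encoding, standalone):
        # expat reports standalone as 1 (yes), 0 (no) or -1 (not given)
        seen["decl"] = (version, encoding, {1: "yes", 0: "no"}.get(standalone))

    parser = expat.ParserCreate()
    parser.XmlDeclHandler = on_decl
    parser.Parse(document, True)
    return seen.get("decl")

print(read_declaration(b'<?xml version="1.0"?><shirt/>'))
print(read_declaration(b'<?xml version="1.0" encoding="UTF-8" standalone="yes"?><shirt/>'))
```

A document with no declaration at all yields None here, which shows that the declaration is optional even though the notes recommend always including it.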
1.1.2 Document Type Declaration
The Document Type
Declaration (DOCTYPE) gives a name to the XML content and provides a means to
guarantee the document’s validity, either by including or specifying a link to
a Document Type Definition (DTD). Valid XML documents must declare the document
type to which they comply, whereas well-formed XML documents can include the
DOCTYPE to simplify the task of the various tools that will be manipulating the
XML document.
A Document Type
Declaration names the document type and identifies the internal content by
specifying the root element. A DOCTYPE can identify the constraints on
the validity of the document by making a reference to an external DTD subset
and/or include the DTD internally within the document by means of an internal
DTD subset.
General Forms of the Document Type Declarations
<!DOCTYPE NAME SYSTEM "file">
<!DOCTYPE NAME [ ]>
<!DOCTYPE NAME SYSTEM "file" [ ]>
The first declaration refers to a document that uses only an externally
defined DTD subset. The second declaration uses only an internally defined
subset within the document. The final listing provides a place for an
internally defined DTD subset between the square brackets while also making use
of an external subset. In the preceding listing, the keyword NAME should be
replaced with the name of the root element of the document, and the “file”
keyword should be replaced with a path to a valid DTD. In the case of our shirt
example, the DOCTYPE is
<!DOCTYPE shirt SYSTEM "shirt.dtd">
because the first tag in the document will be the <shirt> element and our DTD
is saved to a file named shirt.dtd, which is saved in the same path as the XML
document.
The only real
difference between internally and externally defined DTD subsets is that the
DTD content itself is contained within the square brackets, in the case of
internal subsets, whereas external subsets save this content to a file for
reference, usually with a .dtd extension. The actual components of the Document
Type Declaration are listed in Table 5.1.
Table 5.1 Components of the Document Type Declaration
Component | Description
< | The start of the XML tag (in this case, the beginning of the Document Type Declaration).
!DOCTYPE | The beginning of the Document Type Declaration.
NAME | Specifies the name of the document type being defined. This must comply with XML naming rules.
SYSTEM | Specifies that the following system identifier will be read and processed.
"file" | Specifies the name of the file to be processed by the system.
[ | Starts an internal DTD subset.
] | Ends the internal DTD subset.
> | The end of the XML tag (in this case, the end of the Document Type Declaration).
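The components in the table can be read back from a document. This is a minimal sketch, assuming Python's standard xml.parsers.expat module; read_doctype is a hypothetical helper, and the parser only reports the declaration without fetching shirt.dtd.

```python
import xml.parsers.expat as expat

def read_doctype(document: str):
    """Return the pieces of the Document Type Declaration, or None if absent."""
    found = {}

    def on_doctype(name, system_id, public_id, has_internal_subset):
        found["doctype"] = {
            "name": name,                      # NAME in the table above
            "system_id": system_id,            # the "file" after SYSTEM
            "internal_subset": bool(has_internal_subset),  # [ ... ] present?
        }

    parser = expat.ParserCreate()
    parser.StartDoctypeDeclHandler = on_doctype
    parser.Parse(document, True)
    return found.get("doctype")

print(read_doctype('<!DOCTYPE shirt SYSTEM "shirt.dtd"><shirt/>'))
print(read_doctype('<!DOCTYPE shirt [<!ELEMENT shirt EMPTY>]><shirt/>'))
```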
1.1.3 Markup and Content
In addition to the XML declaration and the Document Type Declaration, XML
documents are composed of markup and content. In general, six kinds of markup
can occur in an XML document:
§ elements
§ entity references
§ comments
§ processing instructions
§ marked sections and
§ Document Type Declarations
1.1.4 Elements
Within an XML document, elements are the most common form of markup. XML
elements are either a matched pair of XML tags or single XML tags that are
“self-closing.” Matching XML tags consist of a start tag and an end tag that
carry the same element name, except that the end tag is prefixed with a forward
slash. For example, our shirt element begins with <shirt> and ends with
</shirt>. When elements do not come in pairs, the element name is suffixed by
the forward slash. For example, if we were merely stating that a shirt is on
sale, we may use <on_sale/>. In this case, there is no separate end tag for the
element. These “unmatched” elements are known as empty elements. The trailing
“/>” in the modified syntax indicates to a program processing the XML document
that the element is empty and no matching end tag should be sought.
Elements can be
arbitrarily nested within other elements ad infinitum. In essence, XML is a
hierarchical tree. This means that XML elements exist within other elements and
can branch off with various children nodes. Although these elements may be
restricted by DTDs or schema, the nature of XML is to allow for the growth of
these elements in a manner that’s as “wide” or “deep” as possible. This means
that a single XML element can contain any number of child elements, and the
depth of the XML tree can consist of any number of nodes.
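The tree structure described above can be walked directly. The following is a minimal sketch, assuming Python's standard xml.etree.ElementTree module; the sizes and size elements are hypothetical additions to the shirt example, used only to give the tree some depth.

```python
import xml.etree.ElementTree as ET

SHIRT = """<shirt>
  <model>Zippy Tee</model>
  <on_sale/>
  <sizes><size>M</size><size>L</size></sizes>
</shirt>"""

def depth(element):
    """Number of levels in the subtree rooted at element."""
    return 1 + max((depth(child) for child in element), default=0)

root = ET.fromstring(SHIRT)
child_names = [child.tag for child in root]      # direct children of <shirt>
on_sale = root.find("on_sale")
is_empty = len(on_sale) == 0 and on_sale.text is None  # empty element check

print(child_names, depth(root), is_empty)
```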
In particular, no XML element names are reserved because namespaces can be
used to avoid inadvertent conflicts. Although the hyphen (-) and period (.) are
legal within an XML element name (though not as its first character), you
should avoid them because some software applications might confuse them for
arithmetic or object operations. Element names should be descriptive and not
confusing. Also, some devices with constrained memory capabilities may not work
well with overly long XML tag names. In any case, long names are an annoyance
to developers, systems, and users.
1.1.5 Attributes
Within elements,
additional information can be communicated to XML processors that modifies the
nature of the encapsulated content. Attributes are name/value pairs contained
within the start element that can specify text strings that modify the context
of the element.
Attribute Examples
<price currency="USD">…</price>
<on_sale start_date="10-15-2001"/>
Attributes can be
required, optional, or contain a fixed value. Required or optional attributes
can either contain freeform text or contain one of a set list of enumerated
values. Fixed attributes, if present, must contain a specific value. Attributes
can specify a default value that is applied if the attribute is optional but
not present.
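Attributes are exposed to applications as name/value pairs. This is a minimal sketch, assuming Python's standard xml.etree.ElementTree module; the price value 19.95 and the end_date attribute are hypothetical. The fallback shown is supplied by the application, whereas a DTD default would be supplied by a validating parser.

```python
import xml.etree.ElementTree as ET

root = ET.fromstring(
    '<shirt><price currency="USD">19.95</price>'
    '<on_sale start_date="10-15-2001"/></shirt>'
)

price = root.find("price")
currency = price.get("currency")

# get() accepts a fallback for an attribute that is absent from the element
end_date = root.find("on_sale").get("end_date", "unknown")

print(currency, root.find("on_sale").attrib, end_date)
```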
1.1.6 Entity References
The role of the XML entity is to introduce special characters or make use of
content that is constantly repeated without having to enter it multiple times.
Entities provide a means to indicate to XML-processing applications that a
special text string is to follow that will be replaced with a different literal
value.
Each entity has a unique name that is defined as part of an entity declaration
in a DTD; a small set, such as &lt; and &amp;, is predefined by XML itself.
Entities are used by simply referring to them by name. Entity references are
delimited by an ampersand at the beginning and a semicolon at the ending. The
content contained between the delimiters is the name of the entity that will be
replaced. For example, the &lt; entity inserts the less-than sign (<) into a
document. Markup can be encoded with entity references so that it isn’t
processed as markup, in order to be used for display or embedded within other
element values. For example, the string <element> can be encoded in an XML
document as &lt;element&gt;, and it therefore will not be processed as a tag.
Sample Entity References
<description>The following says that 8 is greater than 5</description>
<equation>8 &gt; 5</equation>
<prescription>The Rx prescription symbol is &#8478; which is the same as &#x211E;</prescription>
Entities can also be used to refer to often repeated or varying text as well
as to include the content of external files. There are internal and external
entities, and they both can be general or parameter entities. Internal entities
are defined and used within the context of a document, whereas external
entities are defined in a source that is accessible via a URI. Internal
entities are largely simple string replacements, whereas external entities can
consist of entire XML documents or non-XML data, such as binary files.
Parameter entities are entities that are declared and used only within a DTD.
They allow users to create replacement text that can be used multiple times to
modularize the DTD. Parameter entities can be either internal or external, but
they cannot refer to non-XML data because you can’t have a parameter entity
with a notation.
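Entity replacement can be observed by parsing a small document. The following is a minimal sketch, assuming Python's standard xml.etree.ElementTree module; the note document and the vendor entity are hypothetical. It covers predefined entities, numeric character references, and an internal general entity declared in an internal DTD subset.

```python
import xml.etree.ElementTree as ET

DOC = """<!DOCTYPE note [
  <!ENTITY vendor "XML Shirts Inc.">
]>
<note>
  <equation>8 &gt; 5</equation>
  <tag>&lt;element&gt;</tag>
  <symbol>&#8478; and &#x211E;</symbol>
  <maker>&vendor;</maker>
</note>"""

note = ET.fromstring(DOC)
# By the time the application sees the text, every reference has been replaced.
values = {child.tag: child.text for child in note}

for name, text in values.items():
    print(name, "->", text)
```

The decimal reference 8478 and the hexadecimal reference 211E name the same Unicode character, so both produce the prescription symbol.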
1.1.7 Comments
One of the key
benefits of XML is that humans can read it. Comments are quite simple to
include in a document. The character sequence <!-- begins a comment and
--> ends the comment. Between these two delimiters, any text at all can be written,
including valid XML markup. The only restriction is that the comment delimiters
cannot be used; neither can the literal string --. Comments can be placed
anywhere in a document and are not considered to be part of the textual content
of an XML document. As a result, XML processors are not required to pass
comments along to an application.
A Sample Comment
<!-- The below element talks about an Elephant I once owned... -->
<animal>Elephant</animal>
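Whether comments reach the application depends on the processor. This is a minimal sketch, assuming Python's standard xml.etree.ElementTree and xml.dom.minidom modules; the zoo wrapper element is hypothetical and is there only to give the sample a single root.

```python
import xml.etree.ElementTree as ET
from xml.dom import minidom

DOC = """<zoo>
  <!-- The below element talks about an Elephant I once owned... -->
  <animal>Elephant</animal>
</zoo>"""

# ElementTree's default parser discards comments.
tags = [child.tag for child in ET.fromstring(DOC)]

# minidom keeps them as Comment nodes alongside the elements.
dom = minidom.parseString(DOC)
comments = [
    node.data.strip()
    for node in dom.documentElement.childNodes
    if node.nodeType == node.COMMENT_NODE
]

print(tags, comments)
```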
1.1.8 Processing Instructions
Processing instructions (PIs) resemble comments in that they are not a textual
part of an XML document, but they provide information to applications as to how
the content should be processed. Unlike comments, XML processors are required
to pass along PIs. Processing instructions have the following form:
<?instruction options?>
The instruction name,
called the PI target, is a special identifier that the processing
application is intended to understand. PI names can be formally declared as
notations (a structure for sending such information). The only restriction is
that PI names may not start with xml, which is reserved for the core XML
standards.
Example of a Processing Instruction
<?send-message "process complete"?>
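A processor hands the PI target and its data to the application separately. This is a minimal sketch, assuming Python's standard xml.dom.minidom module, with the PI placed before a shirt root element for illustration.

```python
from xml.dom import minidom

DOC = '<?xml version="1.0"?><?send-message "process complete"?><shirt/>'

dom = minidom.parseString(DOC)
# Collect (target, data) for every processing instruction at document level.
instructions = [
    (node.target, node.data)
    for node in dom.childNodes
    if node.nodeType == node.PROCESSING_INSTRUCTION_NODE
]

print(instructions)
```

The XML declaration does not show up in the list, which reflects that it is handled separately from ordinary processing instructions.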
1.1.9 Marked CDATA Sections
Some documents will contain a large number of characters and text that an XML
processor should not interpret as markup and should pass unchanged to an
application. These are known as character data (or CDATA) sections. Within an
XML document, a CDATA section instructs the parser to ignore all markup
characters except the end of the CDATA markup instruction. This allows for a
section of XML code to be “escaped” so that it doesn’t inadvertently disrupt
XML processing.
CDATA sections follow
this general form:
<![CDATA[content]]>
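The effect of a CDATA section can be checked by parsing one. This is a minimal sketch, assuming Python's standard xml.etree.ElementTree module; the example element name and its content are hypothetical.

```python
import xml.etree.ElementTree as ET

# The angle brackets and the bare ampersand would be errors outside CDATA.
DOC = "<example><![CDATA[<shirt size=\"L\"> & more]]></example>"

text = ET.fromstring(DOC).text
print(text)
```

The application receives the characters exactly as written, with no child element created for the escaped tag.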
1.1.10 Document Type Definitions
Document Type
Definitions (DTDs) provide a means for defining what XML markup can occur in an
XML document. Basically, the DTD provides a mechanism to guarantee that a given
XML document complies with a well-defined set of rules for document structure
and content. DTDs and the more recent XML Schema are the means for defining the
validity constraints on XML documents.
1.1.11 XML Content
The value of XML is greatly enhanced by the presence of content within the
elements. The content between XML tags is where most of the value lies in an
XML document. When a DTD or XML Schema is used, it fixes the markup structure,
so authors cannot change those portions of the document. Therefore, the
informational content that the metadata describes is precisely where the
variable data resides.
In fact, XML content
can consist of any data at all, including binary data, as long as it doesn’t
violate rules that would confuse the content with valid XML metadata
instructions. XML content can contain any characters, including any valid
Unicode and international characters. The content can be as long as necessary
and contain hundreds of megabytes of textual information, if required. Of
course, the size of the content is an implementation decision.
1.2 WELL FORMED AND VALID DOCUMENTS
In particular, two
specific descriptions can be applied to XML documents to describe the content
contained within them. XML documents can be well formed, and they can also be
valid.
1.2.1 Well-Formed Documents
An XML document is
well formed if it follows all the preceding syntax rules of XML. On the other
hand, if it includes inappropriate markup or characters that cannot be processed
by XML parsers, the document cannot be considered well formed. It goes without
saying that an XML document can’t be partially well formed. And, by definition,
if a document is not well formed, it is not XML. This means that there is no
such thing as an XML document that is not well formed, and XML processors are
not required to process these documents.
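Well-formedness is an all-or-nothing property, which makes it easy to test. The following is a minimal sketch, assuming Python's standard xml.etree.ElementTree module; is_well_formed is a hypothetical helper name.

```python
import xml.etree.ElementTree as ET

def is_well_formed(text: str) -> bool:
    """True when the parser accepts the text as a complete XML document."""
    try:
        ET.fromstring(text)
    except ET.ParseError:
        return False
    return True

print(is_well_formed("<shirt><model>Zippy Tee</model></shirt>"))
print(is_well_formed("<shirt><model>Zippy Tee</shirt>"))   # mismatched end tag
print(is_well_formed("<shirt size=L/>"))                   # unquoted attribute
```

The parser stops at the first violation rather than returning a partial tree, matching the statement that a document cannot be partially well formed.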
1.2.2 Valid Documents
A well-formed XML
document is considered valid only if it contains a proper Document Type
Declaration and if the document obeys the constraints of that declaration. In
most cases, the constraints of the declaration will be expressed as a DTD or an
XML Schema. Well-formed XML documents are designed for use without any
constraints, whereas valid XML documents explicitly require these constraint
mechanisms.
Although the creation
of well-formed XML is a simple process, the use of valid XML documents can
greatly improve the quality of document processes. Valid XML documents allow
users to take advantage of content management, business-to-business
transactions, enterprise integration, and other processes that require the
exchange of constrained XML documents. After all, any document can be well
formed, but only specific documents are valid when applied against a
constraining DTD or schema.
1.3 NAMESPACES
Namespaces use a colon-delimited prefix to associate elements with external
semantics that are identified via a Uniform Resource Identifier (URI). A
namespace-qualified element can then be used as if it were defined locally.
Namespace Example
<?xml version="1.0"?>
<shirt:shirt xmlns:shirt="http://xmlshirts.org/schema"
    xmlns:apparel="http://xmlapparel.org/schema">
  <shirt:model>Zippy Tee</shirt:model>
  <apparel:mfgID>KFL233562</apparel:mfgID>
  <shirt:description>This is a <b>funky</b> Tee shirt similar to the Floppy Tee shirt</shirt:description>
</shirt:shirt>
Because XML is an open
standard in which XML authors are free to create whatever elements and
attributes they wish, it’s inevitable that multiple XML developers will choose
the same element and attribute names for their standards. For instance, let’s
examine the following sample XML document:
<Customer>
  <Name>John Smith</Name>
</Customer>
This sample document
contains the root element <Customer>, which contains a child element
called <Name>. We can clearly determine that the <Name> element
contains the name of the customer referred to by the <Customer> element.
This time, however,
the XML document contains details regarding a product, as shown here:
<Product>
  <Name>Hot Dog Buns</Name>
</Product>
This document contains
a <Product> element as the root element and a <Name> element, which
contains the name of the product. The following XML document could be
constructed to indicate that a customer has placed an order for a particular
product:
<Customer>
  <Name>John Smith</Name>
  <Order>
    <Product>
      <Name>Hot Dog Buns</Name>
    </Product>
  </Order>
</Customer>
The first <Name>
element, which appears as a child of the <Customer> element, contains the
customer’s name. The second <Name> element, on the other hand, contains
the product’s name. By using namespaces, XML parsers can easily tell the
difference between the two <Name> elements. Therefore, modifying the
preceding XML document to specify the appropriate namespaces turns it into
this:
<Customer>
  <cust:Name xmlns:cust="customer-namespace-URI">John Smith</cust:Name>
  <Order>
    <Product>
      <prod:Name xmlns:prod="product-namespace-URI">Hot Dog Buns</prod:Name>
    </Product>
  </Order>
</Customer>
Now, XML parsers can easily tell the customer’s <Name> element from the
product’s <Name> element and apply the appropriate validation rules to each.
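A namespace-aware parser distinguishes the two Name elements by their namespace URI rather than by their prefix. This is a minimal sketch, assuming Python's standard xml.etree.ElementTree module; the URIs urn:example:customer and urn:example:product are placeholders standing in for the elided namespace URIs above.

```python
import xml.etree.ElementTree as ET

DOC = """<Customer>
  <cust:Name xmlns:cust="urn:example:customer">John Smith</cust:Name>
  <Order>
    <Product>
      <prod:Name xmlns:prod="urn:example:product">Hot Dog Buns</prod:Name>
    </Product>
  </Order>
</Customer>"""

root = ET.fromstring(DOC)
# The prefixes used for lookup are local to this code; only the URIs matter.
ns = {"c": "urn:example:customer", "p": "urn:example:product"}

customer_name = root.find("c:Name", ns).text
product_name = root.find(".//p:Name", ns).text
expanded_tag = root.find("c:Name", ns).tag   # ElementTree's {uri}local form

print(customer_name, product_name, expanded_tag)
```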
1.3.1 Declaring Namespaces
Within an XML document,
namespaces can be declared using one of two methods:
§ default declaration
§ explicit declaration
A default namespace declaration specifies a namespace to use for the current
element and for all of its child elements that do not have a namespace prefix
associated with them. For instance, in the following XML document, a default
declaration for the <Customer> element is defined by using the xmlns attribute
on the parent element without specifying or attaching a prefix to the
namespace:
<Customer xmlns="http://www.eps-software.com/po">
  <Name>Travis Vandersypen</Name>
  <Order>
    <Product>
      <Name>Hot Dog Buns</Name>
    </Product>
  </Order>
</Customer>
Sometimes, however, it
may be necessary and more readable to explicitly declare an element’s
namespace. This is accomplished much the same way in which a default namespace
is declared, except a prefix is associated with the xmlns attribute.
Example:
<po:Customer xmlns:po="http://www.eps-software.com/po">
  <po:Name>Travis Vandersypen</po:Name>
  <po:Order>
    <po:Product>
      <po:Name>Hot Dog Buns</po:Name>
    </po:Product>
  </po:Order>
</po:Customer>
1.3.2 Identifying the Scope of Namespaces
By default, all child elements within a parent element, unless indicated
otherwise by referencing another namespace, appear within the parent’s
namespace. This allows all child elements to “inherit” their parent element’s
namespace. However, this “inherited” namespace can be overridden by specifying
a new namespace on a particular child element.
Example:
<Customer xmlns="http://www.eps-software.com/customer">
  <Name>Travis Vandersypen</Name>
  <Order xmlns="http://www.eps-software.com/order">
    <Product>
      <Name>Hot Dog Buns</Name>
    </Product>
  </Order>
</Customer>
All elements contained within the <Customer> element that do not explicitly
qualify a namespace “inherit” the namespace declared by the <Customer> element.
However, the <Order> element also declares a default namespace. Starting at the
<Order> element, all unqualified elements within the <Order> element, and
<Order> itself, will inherit the namespace declared by the <Order> element.
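The scoping rule can be confirmed by listing the namespace each element ends up in. This is a minimal sketch, assuming Python's standard xml.etree.ElementTree module; namespace_of is a hypothetical helper that unpacks ElementTree's {uri}local tag form.

```python
import xml.etree.ElementTree as ET

DOC = """<Customer xmlns="http://www.eps-software.com/customer">
  <Name>Travis Vandersypen</Name>
  <Order xmlns="http://www.eps-software.com/order">
    <Product><Name>Hot Dog Buns</Name></Product>
  </Order>
</Customer>"""

def namespace_of(element):
    """Return the namespace URI of an element, or None if it has none."""
    return element.tag[1:].split("}")[0] if element.tag.startswith("{") else None

# Map each local name to the namespaces it appears in, in document order.
scopes = {}
for element in ET.fromstring(DOC).iter():
    local = element.tag.split("}")[-1]
    scopes.setdefault(local, []).append(namespace_of(element))

for local, uris in scopes.items():
    print(local, uris)
```

The two Name elements land in different namespaces even though neither carries a prefix, because the nearest enclosing default declaration applies.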
1.4 DTD
DTD stands for Document
Type Definition. A Document Type Definition allows the XML author to define
a set of rules for an XML document to make it valid. An XML document is
considered “well formed” if that document is syntactically correct according to
the syntax rules of XML 1.0.
The DTD will define
the elements required by an XML document, the elements that are optional, the
number of times an element should (could) occur, and the order in which
elements should be nested. DTD markup also defines the type of data that will
occur in an XML element and the attributes that may be associated with those
elements.
The hierarchical
structure of elements defined in the DTD must be maintained. The values of all
attributes will be checked to ensure that they fall within defined guidelines.
In short, every last
detail of the XML document from top to bottom will be defined and validated by
the DTD. A DTD can ensure that the structure of the XML data does not change
from organization to organization (thus rendering the data corrupt and
useless).
A DTD can be internal,
residing within the body of a single XML document. It can also be external,
referenced by the XML document. A single XML document could even have both a
portion (or subset) of its DTD that is internal and a portion that is external.
A single external DTD can be referenced by many XML documents. Because an
external DTD may be referenced by many documents, it is a good repository for
global types of definitions (definitions that apply to all documents).
Simple DTD Examples
Example 1: An Internal DTD
<?xml version="1.0"?>
<!DOCTYPE message [
  <!ELEMENT message (#PCDATA)>
]>
<message>
  Let the good times roll!
</message>
The internal DTD is
contained within the Document Type Declaration, which begins with <!DOCTYPE
and ends with ]>. The Document Type Declaration will appear between the XML
declaration and the start of the document itself (the document or root element)
and identify that section of the XML document as containing a Document Type
Definition. Following the Document Type Declaration (DOCTYPE), the root element
of the XML document is defined (in this case, message). The DTD tells us that
this document will have a single element, message, that will contain parsed
character data (#PCDATA).
Example 2: An External DTD
<?xml version="1.0"?>
<!DOCTYPE message SYSTEM "message.dtd">
<message>
  Let the good times roll!
</message>
Here, the DTD is contained in a separate file, message.dtd. The contents of
message.dtd are assumed to be the same as the contents of the DTD in the
internal DTD example. The keyword SYSTEM in the Document Type Declaration lets
us know that the DTD is going to be found in a separate file. A URL or a
relative path could have been used to define the location of the DTD. For
example, rather than message.dtd, the Document Type Declaration could have
specified something like ../DTD/message.dtd.
Validating XML with the Document Type Definition (DTD)
Both of these examples
show us a well-formed XML document. Additionally, because both XML documents
contain a single element, message, which contains only parsed character data,
both adhere to the DTD. Therefore, they are both also valid XML documents.
Example 3: Document Not Valid According to Defined DTD
<?xml version="1.0"?>
<!DOCTYPE message SYSTEM "message.dtd">
<message>
  <text>
    Let the good times roll!
  </text>
</message>
Even though this is a
well-formed XML document, it is not valid. When this document is validated
against message.dtd, a flag will be raised because message.dtd does not define
an element named text.
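Python's standard library does not include a validating parser, so the following is only a partial sketch of one validity check: that every element used in a document has been declared. It assumes the DTD is written as an internal subset (rather than in message.dtd) so that xml.parsers.expat can read the declarations; undeclared_elements is a hypothetical helper, and it does not check content models or attributes.

```python
import xml.parsers.expat as expat

def undeclared_elements(xml_text: str):
    """List elements used in the document but not declared in its internal DTD."""
    declared = set()
    used = []

    parser = expat.ParserCreate()
    # Called once per <!ELEMENT name model> declaration in the internal subset.
    parser.ElementDeclHandler = lambda name, model: declared.add(name)
    # Called once per start tag in the document body.
    parser.StartElementHandler = lambda name, attrs: used.append(name)
    parser.Parse(xml_text, True)
    return [name for name in used if name not in declared]

VALID = """<!DOCTYPE message [
  <!ELEMENT message (#PCDATA)>
]>
<message>Let the good times roll!</message>"""

INVALID = """<!DOCTYPE message [
  <!ELEMENT message (#PCDATA)>
]>
<message><text>Let the good times roll!</text></message>"""

print(undeclared_elements(VALID))
print(undeclared_elements(INVALID))
```

Both documents parse without error because both are well formed; only the extra check exposes the undeclared text element.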
1.4.1 Structure of a Document Type Definition
The structure of a DTD consists of
§ a Document Type Declaration
§ elements
§ attributes
§ entities and
§ several other minor keywords
1.4.2 The Document Type Declaration
In order to reference
a DTD from an XML document, a Document Type Declaration must be included in the
XML document. There may be one Document Type Declaration per XML document. The
syntax is as follows:
<!DOCTYPE rootelement SYSTEM | PUBLIC DTDlocation [ internalDTDelements ] >
§ The exclamation mark (!) is used to signify the beginning of the declaration.
§ DOCTYPE is the keyword used to denote this as a Document Type Declaration.
§ rootelement is the name of the root element or document element of the XML
document.
§ SYSTEM and PUBLIC are keywords used to designate that the DTD is contained in
an external document. Although the use of these keywords is optional, to
reference an external DTD you would have to use one or the other. The SYSTEM
keyword is used in tandem with a URL to locate the DTD. The PUBLIC keyword
specifies a public identifier, usually an application-specific resource
reference, and is followed by a system identifier that gives a location to fall
back on.
§ internalDTDelements are internal DTD declarations. These declarations will
always be placed within opening ([) and closing (]) brackets.
It is possible for a Document Type Declaration to contain both an external DTD subset and an internal DTD subset. In that case the internal subset is processed first and takes precedence: if both subsets declare the same entity or attribute, the internal declaration is the one used. (An element type, by contrast, may be declared only once across both subsets.) Consider the Document Type Declaration fragment shown in Example 4.
Example 4: Internal and External DTDs
<!DOCTYPE rootelement SYSTEM "http://www.myserver.com/mydtd.dtd"
[
  <!ELEMENT element1 (element2,element3)>
  <!ELEMENT element2 (#PCDATA)>
  <!ELEMENT element3 (#PCDATA)>
]>
In the above example, the Document Type Declaration references an external
DTD. There is also an internal subset of the DTD contained in the Document Type
Declaration. The internal subset is read first, so any entity or attribute-list
declarations in it take precedence over declarations of the same name in the
external DTD. Element declarations are different: a validating parser reports
an error if the same element is declared in both subsets.
DTD Elements
All elements in a
valid XML document are defined with an element declaration in the DTD. An
element declaration defines the name and all allowed contents of an element.
Element names must start with a letter or an underscore and may contain any
combination of letters, numbers, underscores, dashes, and periods. Colons
should not be used in element names because they are normally used to reference
namespaces. Each element in the DTD should be defined with the following
syntax:
<!ELEMENT elementname rule >
• ELEMENT is the tag name that specifies that this is an element definition.
• elementname is the name of the element.
• rule is the definition to which the element’s data content must conform.
In a DTD, the order in which element declarations appear does not matter to a
validating parser; what matters is the order given inside each content model. A
validating XML parser will expect the child elements in the XML document to
appear in the order listed in their parent’s content model. Listing the
declarations in the order the elements appear in a document is still good
practice because it makes the DTD easier to read. If the children of an element
do not match the order of its content model, the XML document will not be
considered valid by a validating parser.
Example 5: contactlist.dtd
<!ELEMENT contactlist (fullname, address, phone, email) >
<!ELEMENT fullname (#PCDATA)>
<!ELEMENT address (addressline1, addressline2)>
<!ELEMENT addressline1 (#PCDATA)>
<!ELEMENT addressline2 (#PCDATA)>
<!ELEMENT phone (#PCDATA)>
<!ELEMENT email (#PCDATA)>
The first element in the DTD, contactlist, is the document element. The rule
for this element is that it contains (is the parent element of) the fullname,
address, phone, and email elements. The rule for the fullname element, the
phone element, and the email element is that each contains parsed character
data (#PCDATA). This means that the elements will contain text that the XML
parser will scan and interpret (for example, expanding entity references), but
no child elements. The address element has two child elements: addressline1 and
addressline2. These two child elements contain #PCDATA. This DTD defines an XML
structure that is nested two levels deep. The root element, contactlist, has
four child elements. The address element is, in turn, parent to two more
elements. In order for an XML document that references this DTD to be valid, it
must be laid out in the same order, and it must have the same depth of nesting.
DTD Element Rules
All data contained in
an element must follow a set rule. As stated previously, the rule is the
definition to which the element’s data content must conform. There are two
basic types of rules that elements must fall into. The first type of rule deals
with content. The second type of rule deals with structure. First, we will look
at element rules that deal with content.
Content Rules
The content rules for elements
deal with the actual data that defined elements may contain. These rules
include the ANY rule, the EMPTY rule, and the #PCDATA rule.
The ANY Rule
This rule states that the element may contain other elements and/or normal
character data (just about anything as long as it is well formed). An element
using the ANY rule would appear as follows:
<!ELEMENT elementname ANY>
An XML document whose DTD defines all its elements using the ANY rule will
always be valid as long as it is well formed and uses only declared elements.
This really precludes any effective validation.
XML Fragments Using the ANY Rule
<elementname>
  This is valid content
</elementname>
<elementname>
  <anotherelement>
    This is more valid content
  </anotherelement>
  This is still valid content
</elementname>
<elementname>
  <emptyelement />
  <yetanotherelement>
    This is still valid content!
  </yetanotherelement>
  Here is more valid content
</elementname>
The EMPTY Rule
This rule is the exact opposite of the ANY rule. An element that is defined
with this rule will contain no data. However, an element with the EMPTY rule
could still contain attributes (more on attributes in a bit). The following
element is an example of the EMPTY rule:
<!ELEMENT elementname EMPTY>
The #PCDATA Rule
The #PCDATA rule indicates that parsed character data will be contained in the
element. Parsed character data is text that the XML parser scans for markup:
entity references in it are expanded, but an element declared as (#PCDATA)
alone may not contain child elements. The following element demonstrates the
#PCDATA rule:
<!ELEMENT elementname (#PCDATA)>
Structure Rules
Whereas the content rules deal
with the actual content of the data contained in defined elements, structure
rules deal with how that data may be organized. There are two types of
structure rules.
§ element only rule.
§ mixed rule.
The “Element Only” Rule
The “element only” rule specifies that only elements may appear as children of
the current element. The child element sequences should be separated by commas
and listed in the order they should appear. The following element definition
demonstrates the “element only” rule:
<!ELEMENT elementname (element1, element2, element3)>
The “Mixed” Rule
The “mixed” rule is used to help define elements that may have both character
data (#PCDATA) and child elements in the data they contain. The permitted items
are enclosed by parentheses and separated by the pipe symbol (|); #PCDATA must
come first, and the group must be followed by an asterisk. Sequential,
comma-separated lists are not allowed in mixed content. The following element
is an example of the “mixed” rule:
<!ELEMENT elementname (#PCDATA | childelement1 | childelement2)*>
The pipe symbol is used here to indicate that there is a choice between #PCDATA
and each of the child elements. The asterisk symbol (*) indicates that the
items within the parentheses may appear zero or more times, in any order.
Element Symbols
Symbol | Definition
Asterisk (*) | The data will appear zero or more times (0, 1, 2, …). Example: <!ELEMENT children (name*)> Here the element children could have zero or more occurrences of the child element name. This type of rule would be useful on a form asking a person about his or her children, since the person could have no children or many children.
Comma (,) | Provides separation of elements in a sequence. Example: <!ELEMENT address (street, city, state, zip)> Here the element address will have four child elements: street, city, state, and zip. Each of the child elements must appear in the defined order in the XML document.
Parentheses [( )] | Contain the rule for an element. Parentheses may also be used to group a sequence, subsequence, or a set of alternatives in a rule. Example: <!ELEMENT address (street, city, (state | province), zip)> Here the outer parentheses enclose a sequence, and a second set of parentheses nests a set of alternatives within it: there will be either a state or a province element in that spot in the main sequence.
Pipe (|) | Separates choices in a set of options. Example: <!ELEMENT dessert (cake | pie)> The element dessert will have one child element: either cake or pie.
Plus sign (+) | Signifies that the data must appear one or more times (1, 2, 3, …). Example: <!ELEMENT appliances (refrigerator+)> The appliances element will have one or more refrigerator child elements. This assumes that every household has at least one refrigerator.
Question mark (?) | Data will appear either zero times or one time in the element. Example: <!ELEMENT employment (company?)> The element employment will have either zero occurrences or one occurrence of the child element company.
No symbol | When no symbol is used (other than parentheses), the data must appear exactly once in the XML file. Example: <!ELEMENT contact (name)> The element contact will have one child element: name.
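The occurrence symbols above behave much like regular-expression operators, so an element-only content model can be checked with a small sketch. This is an illustration only, not how a validating parser is implemented; the helper names content_model_to_regex and children_match are made up for this example, and mixed content (#PCDATA) is not handled.

```python
import re

# An XML name as it appears in a content model (simplified to ASCII here).
NAME = re.compile(r"[A-Za-z_:][\w.:-]*")

def content_model_to_regex(model):
    """Translate an element-only DTD content model into a compiled regex."""
    compact = re.sub(r"\s+", "", model)
    # Each child name becomes a group that matches the token "name;".
    with_names = NAME.sub(lambda m: "(?:" + re.escape(m.group(0)) + ";)", compact)
    # A comma means "followed by", which is plain concatenation in a regex.
    # The symbols | * + ? ( ) already mean the same thing in both notations.
    return re.compile(with_names.replace(",", ""))

def children_match(model, child_names):
    """Return True if the ordered list of child names satisfies the model."""
    sequence = "".join(name + ";" for name in child_names)
    return content_model_to_regex(model).fullmatch(sequence) is not None

address = "(street, city, (state | province), zip)"
print(children_match(address, ["street", "city", "province", "zip"]))
print(children_match(address, ["street", "city", "zip"]))
```

The first call satisfies the address rule and the second does not, because the required state-or-province slot is missing.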
1.4.3 DTD Attributes
XML attributes are name/value pairs that are used as metadata to describe XML
elements. XML attributes are very similar to HTML attributes. In HTML, src is an
attribute of the img tag, as shown in the following example:
<img src="images/imagename.gif" width="10" height="20">
Attribute Use in XML
<image src="images/" width="10" height="20">
imagename.gif
</image>
Here src, width, and height are presented as attributes of the XML element
image. This is very similar to the way that these attributes are used in HTML.
The only difference is that the src attribute merely contains the relative path
of the image’s directory and not the actual name of the image file.
Attribute Types
Type | Definition
CDATA | Character data only. The attribute will contain no markup. Example: <!ATTLIST box height CDATA "0"> Here an attribute, height, has been defined for the element box. This attribute will contain character data and have a default value of “0”.
ENTITY | The name of an unparsed general entity that is declared in the DTD but refers to some external data (such as an image file). Example: <!ATTLIST img src ENTITY #REQUIRED> The src attribute is an ENTITY type that refers to some external image file.
ENTITIES | The same as the ENTITY type, but represents multiple values listed in sequential order, separated by whitespace. Example: <!ATTLIST imgs srcs ENTITIES #REQUIRED>
ID | An attribute that uniquely identifies the element. The value for this type of attribute must be unique within the XML document. Each element may only have a single ID attribute, and the value of the ID attribute must be a valid XML name, meaning that it may not start with a numeric digit (which precludes the use of a simple numbering system for IDs). Example: <!ATTLIST cog serial ID #REQUIRED>
IDREF | The value of an ID attribute of another element in the document. It is used to establish a relationship with other tags when there is not necessarily a parent/child relationship. Example: <!ATTLIST person cousin IDREF #IMPLIED>
IDREFS | The same as IDREF, but represents multiple values listed in sequential order, separated by whitespace. Example: <!ATTLIST person cousins IDREFS #IMPLIED>
NMTOKEN | Restricts the value of the attribute to a valid XML name token (a string of name characters with no whitespace). Example: <!ATTLIST address country NMTOKEN "usa">
NMTOKENS | The same as NMTOKEN, but represents multiple values listed in sequential order, separated by whitespace. Example: <!ATTLIST region states NMTOKENS "KS OK">
NOTATION | Refers to the name of a notation declared in the DTD (more on notations later). It is used to identify the format of non-XML data, for example to refer to an external application that will interact with the document. The allowed notation names are listed in parentheses. Example: <!ATTLIST music play NOTATION (mplayer2) "mplayer2"> Here mplayer2 must be a notation declared elsewhere in the DTD.
Enumerated | Not an actual keyword the way the other types are. It is a listing of possible values for the attribute, separated by pipe symbols (|). Example: <!ATTLIST college grad (1|0) "1">
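Default values declared in an ATTLIST are supplied by the parser when the attribute is absent from the document. The sketch below shows this with Python's expat bindings, assuming an internal DTD subset; the color attribute and the helper name attributes_seen are hypothetical additions for illustration.

```python
from xml.parsers import expat

# The box/height declaration mirrors the CDATA row above; color is an
# assumed enumerated attribute added to show an explicit value as well.
DOCUMENT = """<?xml version="1.0"?>
<!DOCTYPE box [
  <!ELEMENT box EMPTY>
  <!ATTLIST box height CDATA "0"
                color (red | blue) "red">
]>
<box color="blue"/>"""

def attributes_seen(document):
    """Collect the attributes reported for each element start tag."""
    seen = {}
    parser = expat.ParserCreate()

    def on_start(name, attributes):
        seen[name] = dict(attributes)

    parser.StartElementHandler = on_start
    parser.Parse(document, True)
    return seen

print(attributes_seen(DOCUMENT))
```

The height attribute never appears in the box tag, yet it is reported with its declared default, while color keeps the value written in the document.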
1.4.4 DTD Entities
Entities in DTDs are
storage units. They can also be considered placeholders. Entities are special
markups that contain content for insertion into the XML document. An entity’s
content could be well-formed XML, normal text, binary data, a database record,
and so on. The main purpose of an entity is to hold content, and there is
virtually no limit on the type of content an entity can hold.
The general syntax of
an entity is as follows:
<!ENTITY entityname
[SYSTEM | PUBLIC] entitycontent>
§ ENTITY is the tag name
that specifies that this definition will be for an entity.
§ entityname is the name
by which the entity will be referred in the XML document.
§ entitycontent is the
actual contents of the entity—the data for which the entity is serving as a
placeholder.
§ SYSTEM and PUBLIC are
optional keywords. Either one can be added to the definition of an entity to
indicate that the entity refers to external content.
Entities may either
point to internal data or external data. Internal entities represent data that
is contained completely within the DTD. External entities point to content in
another location via a URL. The type of data to which an external entity can
refer is virtually unlimited. An entity is referenced in an XML document by
inserting the name of the entity prefixed by & and suffixed by ;. When
referenced in this manner, the content of the entity will be placed into the
XML document when the document is parsed and validated.
Example:
<?xml version="1.0"?>
<!DOCTYPE library [
<!ENTITY cpy "Copyright 2000">
<!ELEMENT library (book+)>
<!ELEMENT book (title,author,copyright)>
<!ELEMENT title (#PCDATA)>
<!ELEMENT author (#PCDATA)>
<!ELEMENT copyright (#PCDATA)>
]>
<library>
  <book>
    <title>How to Win Friends</title>
    <author>Joe Charisma</author>
    <copyright>&cpy;</copyright>
  </book>
  <book>
    <title>Make Money Fast</title>
    <author>Jimmy QuickBuck</author>
    <copyright>&cpy;</copyright>
  </book>
</library>
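To see the substitution happen, the library document can be parsed with Python's standard ElementTree module, which expands internal entities declared in the DOCTYPE. This is a minimal sketch; the helper name copyright_notices is made up, and the author elements are left out for brevity.

```python
import xml.etree.ElementTree as ET

LIBRARY = """<?xml version="1.0"?>
<!DOCTYPE library [
  <!ENTITY cpy "Copyright 2000">
]>
<library>
  <book><title>How to Win Friends</title><copyright>&cpy;</copyright></book>
  <book><title>Make Money Fast</title><copyright>&cpy;</copyright></book>
</library>"""

def copyright_notices(document):
    """Return the text of every copyright element after entity expansion."""
    root = ET.fromstring(document)
    return [node.text for node in root.iter("copyright")]

print(copyright_notices(LIBRARY))
```

Each &cpy; reference comes back as the entity's content, so changing the single ENTITY declaration updates every book.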
Predefined Entities
There are five
predefined entities, as shown in Table below. These entities do not have to be
declared in the DTD. When an XML parser encounters these entities (unless they
are contained in a CDATA section), they will automatically be replaced with the
content they represent.
Entity | Content
&amp; | &
&lt; | <
&gt; | >
&quot; | "
&apos; | '
Example:
<icecream>
  <flavor>Cherry Garcia</flavor>
  <vendor>Ben &amp; Jerry’s</vendor>
</icecream>
In this example, the ampersand in “Ben & Jerry’s” is replaced with the
predefined entity for an ampersand (&amp;).
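In practice the escaping is usually done by a library rather than by hand. The sketch below uses the escape function from Python's xml.sax.saxutils to write the vendor name and ElementTree to read it back; the helper name vendor_element is hypothetical.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

def vendor_element(name):
    """Build a vendor element, escaping &, < and > in the text."""
    return "<vendor>" + escape(name) + "</vendor>"

markup = vendor_element("Ben & Jerry's")
parsed = ET.fromstring(markup)

print(markup)       # the serialized form carries the entity reference
print(parsed.text)  # the parser hands back the plain ampersand
```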
External Entities
External entities are
used to reference external content. XML is incredibly flexible. External
entities can contain references to almost any type of data—even other XML
documents. One well-formed XML document can contain another well-formed XML
document through the use of an external entity reference.
Example:
<?xml version="1.0"?>
<!DOCTYPE employees [
<!ENTITY bob SYSTEM "http://srvr/emps/bob.xml">
<!ENTITY nancy SYSTEM "http://srvr/emps/nancy.xml">
<!ELEMENT employees (clerk+)>
<!ELEMENT clerk (#PCDATA)>
]>
<employees>
  <clerk>&bob;</clerk>
  <clerk>&nancy;</clerk>
</employees>
Non-Text External Entities and Notations
Some external entities will contain non-text data, such as an image file.
<!ENTITY myimage SYSTEM "myimage.gif" NDATA gif>
The NDATA keyword is used to alert the parser that the entity content should be
sent unparsed to the output document.
The final part of the declaration, gif, is a reference to a notation. A
notation is a special declaration that identifies the format of non-text
external data so that the XML application will know how to handle the data. Any
time an external reference to non-text data is used, a notation identifying the
data must be included and referenced. Notations are declared in the body of the
DTD and have the following syntax:
<!NOTATION notationname [SYSTEM | PUBLIC] dataformat>
Here, notationname is the name by which the notation will be referred in the
XML document. SYSTEM is a keyword that is added to the definition of the
notation to indicate that the format of external data is being defined. You
could also use the keyword PUBLIC here instead of SYSTEM. However, using PUBLIC
requires you to provide a URL to the data format definition. dataformat is a
reference to a MIME type, ISO standard, or some other location that can provide
a definition of the data being referenced.
Example:
<!NOTATION gif SYSTEM "image/gif">
<!ENTITY employeephoto SYSTEM "images/employees/MichaelQ.gif" NDATA gif>
<!ELEMENT employee (name, sex, title, years)>
<!ATTLIST employee pic ENTITY #IMPLIED>
…
<employee pic="employeephoto">
…
</employee>
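A parser does not load the image itself; it only reports the notation and the unparsed entity to the application, which then decides what to do with them. The sketch below captures those reports with Python's expat bindings. It assumes a cut-down version of the DTD above (employee is declared EMPTY to keep the document short), and the helper name declared_media is made up.

```python
from xml.parsers import expat

DTD_DOCUMENT = """<?xml version="1.0"?>
<!DOCTYPE employee [
  <!NOTATION gif SYSTEM "image/gif">
  <!ENTITY employeephoto SYSTEM "images/employees/MichaelQ.gif" NDATA gif>
  <!ELEMENT employee EMPTY>
  <!ATTLIST employee pic ENTITY #IMPLIED>
]>
<employee pic="employeephoto"/>"""

def declared_media(document):
    """Return (notations, unparsed entities) declared in the DTD."""
    notations, unparsed = {}, {}
    parser = expat.ParserCreate()

    def on_notation(name, base, system_id, public_id):
        notations[name] = system_id

    def on_entity(name, is_parameter, value, base, system_id, public_id, notation):
        # Only NDATA (unparsed) entities carry a notation name.
        if notation is not None:
            unparsed[name] = (system_id, notation)

    parser.NotationDeclHandler = on_notation
    parser.EntityDeclHandler = on_entity
    parser.Parse(document, True)
    return notations, unparsed

print(declared_media(DTD_DOCUMENT))
```

An application would look up the entity named in the pic attribute, find its notation, and use the notation's identifier to pick a handler for the file.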
Parameter Entities
A parameter entity is very similar to an internal entity. The main difference
between an internal entity and a parameter entity is that a parameter entity
may only be referenced inside the DTD. Parameter entities are in effect
entities specifically for DTDs.
Parameter entities can be useful when you have to use a lot of repetitive or
lengthy text in a DTD. Use the following syntax for parameter entities:
<!ENTITY % entityname entitycontent>
The syntax for a parameter entity is almost identical to the syntax for a
normal, internal entity. However, notice that in the syntax, after the ENTITY
keyword, there is a space, a percent sign, and another space before entityname.
This alerts the XML parser that this is a parameter entity and will be used
only in the DTD. These types of entities, when referenced, should begin with %
and end with ;.
Example:
<!ENTITY % pc "(#PCDATA)">
<!ELEMENT name %pc;>
<!ELEMENT age %pc;>
<!ELEMENT weight %pc;>
1.4.5 More DTD Directives
Two further DTD keywords, INCLUDE and IGNORE, do just what their names suggest:
they mark pieces of markup that should either be included in the validation
process or ignored. These conditional sections may appear only in an external
DTD, not in the internal subset.
The IGNORE Keyword
The IGNORE directive marks declarations that the parser should skip, much as if
they had been commented out.
Example:
<![IGNORE[
This is the part of the DTD ignored
]]>
The INCLUDE Keyword
The INCLUDE directive marks declarations to be included in the document. Its
syntax is very similar to that of the IGNORE directive.
Example:
<![INCLUDE[
This is the part of the DTD included
]]>
The INCLUDE directive
follows the same basic rules as the IGNORE directive. It may enclose entire
declarations but not pieces of declarations. The INCLUDE directive can be
useful when you’re in the process of developing a new DTD or adding to an
existing DTD.
Comments Within a DTD
Comments can also be
added to DTDs. Comments within a DTD are just like comments in HTML and take
the following syntax:
<!--
Everything between the opening tag and closing tag is a comment -->
1.5 XML SCHEMA
The
XML Schema Definition Language solves a number of problems posed with Document
Type Definitions.
1.5.1 Creating XML Schemas
One
of the first things that comes to mind for most people when authoring an XML
schema is the level of complexity that accompanies it. Table below shows a
complete list of every element the XML Schema Definition Language supports.
Element Name | Description
all | Indicates that the contained elements may appear in any order within a parent element.
any | Indicates that any element within the specified namespace may appear within the parent element’s definition. If a type is not specifically declared, this is the default.
anyAttribute | Indicates that any attribute within the specified namespace may appear within the parent element’s definition.
annotation | Indicates an annotation to the schema.
appinfo | Indicates information that can be used by an application.
attribute | Declares an occurrence of an attribute.
attributeGroup | Defines a group of attributes that can be included within a parent element.
choice | Indicates that only one contained element or attribute may appear within a parent element.
complexContent | Defines restrictions and/or extensions to a complexType.
complexType | Defines a complex element’s construction.
documentation | Indicates information to be read by an individual.
element | Declares an occurrence of an element.
extension | Extends the contents of an element.
field | Indicates a constraint for an element using XPath.
group | Logically groups a set of elements to be included together within another element definition.
import | Identifies a namespace whose schema elements and attributes can be referenced within the current schema.
include | Indicates that the specified schema should be included in the target namespace.
key | Indicates that an attribute or element value is a key within the specified scope.
keyref | Indicates that an attribute or element value should correspond with those of the specified key or unique element.
list | Defines a simpleType element as a list of values of a specified data type.
notation | Contains a notation definition.
redefine | Indicates that simple and complex types, as well as groups and attribute groups from an external schema, can be redefined.
restriction | Defines a constraint for the specified element.
schema | Contains the schema definition.
selector | Specifies an XPath expression that selects a set of elements for an identity constraint.
sequence | Indicates that the elements within the specified group must appear in the exact order they appear within the schema.
simpleContent | Defines restrictions and/or extensions of a simpleType element.
simpleType | Defines a simple element type.
union | Defines a simpleType element as a collection of values from specified simple data types.
unique | Indicates that an attribute or element value must be unique within the specified scope.
Authoring
an XML schema consists of declaring elements and attributes as well as the “properties”
of those elements and attributes.
1.5.2 Declaring Attributes
Attributes
in an XML document are contained by elements. To indicate that a complex
element has an attribute, use the <attribute> element of the XML Schema
Definition Language.
Example: Purchase order schema
<xsd:complexType name="ProductType">
  <xsd:attribute name="Name" type="xsd:string"/>
  <xsd:attribute name="Id" type="xsd:positiveInteger"/>
  <xsd:attribute name="Price">
    <xsd:simpleType>
      <xsd:restriction base="xsd:decimal">
        <xsd:fractionDigits value="2"/>
      </xsd:restriction>
    </xsd:simpleType>
  </xsd:attribute>
  <xsd:attribute name="Quantity" type="xsd:positiveInteger"/>
</xsd:complexType>
When
declaring an attribute, its type should be specified.
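The Price attribute in the example restricts xsd:decimal with a fractionDigits facet of 2. The sketch below imitates that one check in plain Python; it is not a schema validator, the helper name satisfies_fraction_digits is made up, and it assumes that trailing zeros do not count toward the number of fraction digits.

```python
import re
from decimal import Decimal

# Lexical form of xsd:decimal: optional sign, digits, optional fraction,
# and no exponent notation.
DECIMAL_LEXICAL = re.compile(r"[+-]?([0-9]+(\.[0-9]*)?|\.[0-9]+)")

def satisfies_fraction_digits(text, fraction_digits=2):
    """Return True if text is a decimal with at most fraction_digits decimals."""
    if DECIMAL_LEXICAL.fullmatch(text) is None:
        return False
    # normalize() drops trailing zeros, so "1.500" is treated as 1.5.
    exponent = Decimal(text).normalize().as_tuple().exponent
    return max(0, -exponent) <= fraction_digits

for candidate in ["19.99", "19.999", "1.500", "abc"]:
    print(candidate, satisfies_fraction_digits(candidate))
```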
Table 1.5.a Some simple XML data types
Data Type | Description
anyURI | Represents a Uniform Resource Identifier (URI).
base64Binary | Represents Base-64-encoded binary data.
boolean | Represents Boolean values (true and false).
byte | Represents an integer ranging from -128 to 127. This type is derived from short.
date | Represents a date.
dateTime | Represents a specific time on a specific date.
decimal | Represents a variable-precision number.
double | Represents a double-precision, 64-bit, floating-point number.
duration | Represents a duration of time.
ENTITIES | Represents a set of values of the ENTITY type.
ENTITY | Represents the ENTITY attribute type in XML 1.0. This type is derived from NCName.
float | Represents a single-precision, 32-bit, floating-point number.
gDay | Represents a recurring Gregorian day of the month.
gMonth | Represents a Gregorian month.
gMonthDay | Represents a recurring Gregorian date.
gYear | Represents a Gregorian year.
gYearMonth | Represents a specific Gregorian month in a specific Gregorian year.
hexBinary | Represents hex-encoded binary data.
Some primitive data types are:
• anyURI
• base64Binary
• boolean
• date
• dateTime
• decimal
• double
• duration
1.5.3 Declaring Elements
Elements within an XML
schema can be declared using the <element> element from the XML Schema
Definition Language.
Example:
<xsd:element name="PurchaseOrder" type="PurchaseOrderType"/>
<xsd:complexType name="PurchaseOrderType">
  <xsd:all>
    <xsd:element name="ShippingInformation" type="InfoType"
                 minOccurs="1" maxOccurs="1"/>
    <xsd:element name="BillingInformation" type="InfoType"
                 minOccurs="1" maxOccurs="1"/>
    <xsd:element name="Order" type="OrderType"
                 minOccurs="1" maxOccurs="1"/>
  </xsd:all>
  <xsd:attribute name="Tax">
    <xsd:simpleType>
      <xsd:restriction base="xsd:decimal">
        <xsd:fractionDigits value="2"/>
      </xsd:restriction>
    </xsd:simpleType>
  </xsd:attribute>
  <xsd:attribute name="Total">
    <xsd:simpleType>
      <xsd:restriction base="xsd:decimal">
        <xsd:fractionDigits value="2"/>
      </xsd:restriction>
    </xsd:simpleType>
  </xsd:attribute>
</xsd:complexType>
An element’s type can
be defined with either a <complexType> element, a <simpleType>
element, a <complexContent> element, or a <simpleContent> element. The
validation requirements for the document will influence the choice for an
element’s type.
The basic construction
of an element declaration using the <element> element within the XML
Schema Definition Language is as follows:
<element name="" [type=""] [abstract=""] [block=""]
         [default=""] [final=""] [fixed=""]
         [minOccurs=""] [maxOccurs=""] [nillable=""]
         [ref=""] [substitutionGroup=""]/>
The block attribute
prevents any element with the specified derivation type from being used in place
of the element. The block attribute may contain any of the following values:
§ #all
§ extension
§ restriction
§ substitution
If the value #all is specified within the block attribute, no elements derived
from this element declaration may appear in place of this element. A value of
extension prevents any element whose definition has been derived by extension
from appearing in place of this element. If a value of restriction is assigned,
an element derived by restriction from this element declaration is prevented
from appearing in place of this element. Finally, a value of substitution
indicates that an element derived through substitution cannot be used in place
of this element.
The default attribute
may only be specified for an element based on a simpleType or whose content is
text only. The minOccurs and maxOccurs attributes specify the minimum and
maximum number of times this element may appear within a valid XML document.
The nillable attribute indicates whether an explicit null value can be assigned
to the element. If this attribute is omitted, it is assumed to be false. If it
has a value of true, the element may carry the nil attribute from the XML
Schema instance namespace (commonly written xsi:nil="true") in an instance
document to signal that it has no content.
The fixed attribute specifies that the element has a constant, predetermined
value. This attribute only applies to those elements whose type definitions are
based on simpleType or whose content is text only.
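On the instance side, a nil value shows up as an attribute in the XML Schema instance namespace. The following sketch reads that flag with ElementTree; the Order, ShipDate, and Comment elements and the helper name is_nil are hypothetical, chosen only for illustration.

```python
import xml.etree.ElementTree as ET

XSI_NS = "http://www.w3.org/2001/XMLSchema-instance"

# A hypothetical instance document in which ShipDate is explicitly nil.
ORDER = (
    '<Order xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">'
    '<ShipDate xsi:nil="true"/>'
    '<Comment>Leave at the door</Comment>'
    '</Order>'
)

def is_nil(element):
    """Return True if the element carries xsi:nil with a true value."""
    return element.get("{" + XSI_NS + "}nil") in ("true", "1")

root = ET.fromstring(ORDER)
nil_flags = {child.tag: is_nil(child) for child in root}
print(nil_flags)
```

A validator would accept the empty ShipDate only if its declaration sets nillable to true.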
1.5.4 Declaring Complex Elements
Complex elements are declared with the <complexType> element.
Example:
<xsd:complexType name="PurchaseOrderType">
  <xsd:all>
    <xsd:element name="ShippingInformation" type="InfoType"
                 minOccurs="1" maxOccurs="1"/>
    <xsd:element name="BillingInformation" type="InfoType"
                 minOccurs="1" maxOccurs="1"/>
    <xsd:element name="Order" type="OrderType"
                 minOccurs="1" maxOccurs="1"/>
  </xsd:all>
  <xsd:attribute name="Tax">
    <xsd:simpleType>
      <xsd:restriction base="xsd:decimal">
        <xsd:fractionDigits value="2"/>
      </xsd:restriction>
    </xsd:simpleType>
  </xsd:attribute>
  <xsd:attribute name="Total">
    <xsd:simpleType>
      <xsd:restriction base="xsd:decimal">
        <xsd:fractionDigits value="2"/>
      </xsd:restriction>
    </xsd:simpleType>
  </xsd:attribute>
</xsd:complexType>
The basic syntax for
the <complexType> element is as follows:
<xsd:complexType name="" [abstract=""] [base=""] [block=""]
                 [final=""] [mixed=""]/>
The abstract attribute
indicates whether an element may define its content directly from this type
definition or it must define its content from a type derived from this type
definition. If this attribute is true, an element must define its content from
a derived type definition. If this attribute is omitted or its value is false,
an element may define its content directly based on this type definition.
The
base attribute specifies the data type for the element. The block attribute indicates
what types of derivation are prevented for this element definition. This attribute can contain any of the
following values:
§
#all
§
extension
§
restriction
A value of #all
prevents all complex types derived from this type definition from being used in
place of this type definition. A value of extension prevents complex type definitions
derived through extension from being used in place of this type definition.
Assigning a value of
restriction prevents a complex type definition derived through restriction from
being used in place of this type definition. If this attribute is omitted, any
type definition derived from this type definition may be used in place of this
type definition.
The mixed attribute
indicates whether character data is permitted to appear between the child
elements of this type definition. If this attribute is false or is omitted, no
character data may appear. If the type definition contains a simpleContent type
element, this value must be false. If the complexContent element appears as a child
element, the mixed attribute on the complexContent element can override the
value specified in the current type definition.
A <complexType>
element in the XML Schema Definition Language may contain only one of the
following elements:
§
all
§
choice
§
complexContent
§
group
§
sequence
§
simpleContent
1.5.5 Declaring Simple Types
Simple type definitions support an element based on the simple XML data types
listed in Table 1.5.a or any simpleType declaration within the current schema.
Example:
<xsd:simpleType name="PaymentMethodType">
  <xsd:restriction base="xsd:string">
    <xsd:enumeration value="Check"/>
    <xsd:enumeration value="Cash"/>
    <xsd:enumeration value="Credit Card"/>
    <xsd:enumeration value="Debit Card"/>
    <xsd:enumeration value="Other"/>
  </xsd:restriction>
</xsd:simpleType>
The basic syntax for defining a simpleType element definition is as follows:
<xsd:simpleType name="">
  <xsd:restriction base=""/>
</xsd:simpleType>
The base attribute may contain any simple XML data type listed in Table 1.5.a
or any simpleType declared within the schema. The value of this attribute
determines the type of data the simple type may contain. A simpleType may only
contain a value, not other elements or attributes.
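Because a schema is itself an XML document, the values allowed by an enumeration can be read out with ordinary XML tooling. The sketch below does this for PaymentMethodType using ElementTree. It only extracts and compares the enumeration values and is not a full schema validator; the helper name allowed_values is made up.

```python
import xml.etree.ElementTree as ET

NAMESPACES = {"xsd": "http://www.w3.org/2001/XMLSchema"}

SCHEMA = """<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema">
  <xsd:simpleType name="PaymentMethodType">
    <xsd:restriction base="xsd:string">
      <xsd:enumeration value="Check"/>
      <xsd:enumeration value="Cash"/>
      <xsd:enumeration value="Credit Card"/>
      <xsd:enumeration value="Debit Card"/>
      <xsd:enumeration value="Other"/>
    </xsd:restriction>
  </xsd:simpleType>
</xsd:schema>"""

def allowed_values(schema_text, type_name):
    """List the enumeration values of a named simple type."""
    root = ET.fromstring(schema_text)
    path = ("xsd:simpleType[@name='" + type_name + "']"
            "/xsd:restriction/xsd:enumeration")
    return [node.get("value") for node in root.findall(path, NAMESPACES)]

methods = allowed_values(SCHEMA, "PaymentMethodType")
print("Cash" in methods, "Barter" in methods)
```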
Two other methods are
available to an XML schema author to “refine” a simple type definition:
<list> and <union>. The <list> element allows an element or
attribute based on the type definition to contain a list of values of a
specified simple data type. The <union> element allows you to combine two
or more simple type definitions to create a collection of values.
1.5.6 Anonymous Type Declarations
An anonymous type declaration defines a type inline, inside the element or
attribute declaration that uses it, instead of creating a separate named type
definition.
<xsd:complexType name="InfoType">
  <xsd:sequence>
    <xsd:element name="Name" minOccurs="1" maxOccurs="1">
      <xsd:simpleType>
        <xsd:restriction base="xsd:string"/>
      </xsd:simpleType>
    </xsd:element>
    <xsd:element name="Address" type="AddressType" minOccurs="1"
                 maxOccurs="1"/>
    <xsd:choice minOccurs="1" maxOccurs="1">
      <xsd:group ref="BillingInfoGroup"/>
      <xsd:group ref="ShippingInfoGroup"/>
    </xsd:choice>
  </xsd:sequence>
</xsd:complexType>
This section defines
the type definition for InfoType. If you look closely, you’ll see the
declaration of a <Name> element that does not have a type attribute
specified. Instead, the <element> element, itself, contains a
<simpleType> element without a name attribute specified. This is known as
an “anonymous” type definition.
1.5.7 Annotating Schemas
The XML Schema Definition Language defines three new elements to add
annotations to an XML schema:
• <annotation>
• <appinfo>
• <documentation>
The <annotation> element contains the <appinfo> and <documentation> elements.
In other words, you cannot use the <appinfo> and <documentation> elements by
themselves; they must be contained within an <annotation> element.
<xsd:annotation>
  <xsd:documentation>
    Purchase order schema for an online grocery store.
  </xsd:documentation>
</xsd:annotation>
In the preceding example, the <annotation> and <documentation> elements help to
identify the purpose of this particular XML schema.
For the <documentation> element, the information it contains is meant to be
read by users, whereas the information contained within an <appinfo> element
is meant to be read and utilized by applications.
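A tool that processes schemas can pull both kinds of annotation out with a couple of path queries. This sketch uses ElementTree; the application hint text hypothetical-tool-hint and the helper name read_annotations are invented for the example.

```python
import xml.etree.ElementTree as ET

NAMESPACES = {"xsd": "http://www.w3.org/2001/XMLSchema"}

# The application hint below is invented content, not part of the notes.
SCHEMA = """<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema">
  <xsd:annotation>
    <xsd:documentation>
      Purchase order schema for an online grocery store.
    </xsd:documentation>
    <xsd:appinfo>hypothetical-tool-hint</xsd:appinfo>
  </xsd:annotation>
</xsd:schema>"""

def read_annotations(schema_text):
    """Return (human-readable notes, application hints) from a schema."""
    root = ET.fromstring(schema_text)
    notes = [node.text.strip()
             for node in root.findall("xsd:annotation/xsd:documentation", NAMESPACES)]
    hints = [node.text.strip()
             for node in root.findall("xsd:annotation/xsd:appinfo", NAMESPACES)]
    return notes, hints

print(read_annotations(SCHEMA))
```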
1.5.8 Model Groups
A model group,
at least in terms of a schema definition, is a logically grouped set of
elements. A model group within the XML Schema Definition Language consists of a
“compositor” and a list of “particles” (or element declarations). A model group
can be
constructed using one
of the following XML Schema Definition elements:
§ <all>
§ <choice>
§ <sequence>
Here’s the basic syntax for the
<group> element:
<group name="" [maxOccurs=""] [minOccurs=""] [ref=""]>
.
.
.
</group>
By default, the maxOccurs and minOccurs attributes are set to 1. The ref
attribute is used after you have defined the <group> element and you wish to
reference it, as the following example shows:
<xsd:group name="exampleGroup">
  <xsd:all>
    <xsd:element name="Element1" type="xsd:string"/>
    <xsd:element name="Element2" type="xsd:string"/>
    <xsd:element name="Element3" type="xsd:string"/>
  </xsd:all>
</xsd:group>
<xsd:element name="ParentElement">
  <xsd:complexType>
    <xsd:group ref="exampleGroup"/>
  </xsd:complexType>
</xsd:element>
All Groups
The <all>
element indicates that the elements declared within it may appear in any order
within the parent element.
Sequences
The <sequence>
element in the XML Schema Definition Language requires the elements contained
within it to appear in the same order in the parent element.
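The difference between the two compositors comes down to whether order matters. The sketch below states that rule in a few lines of Python, under the simplifying assumption that every declared element has minOccurs and maxOccurs of 1; the function names matches_sequence and matches_all are made up.

```python
DECLARED = ["Element1", "Element2", "Element3"]

def matches_sequence(declared, children):
    """<sequence>: the children must appear in exactly the declared order."""
    return list(children) == list(declared)

def matches_all(declared, children):
    """<all>: the same children must appear, but in any order."""
    return sorted(children) == sorted(declared)

shuffled = ["Element3", "Element1", "Element2"]
print(matches_sequence(DECLARED, shuffled))  # order differs from the schema
print(matches_all(DECLARED, shuffled))       # same members, order ignored
```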
Attribute Groups
Just as you can
logically group a set of elements together using the <group> element
within the XML Schema Definition Language, you can create a logical group of
attributes to do the same thing. Here’s the basic syntax for the
<attributeGroup>
element:
<attributeGroup [name=""] [ref=""]>
  <attribute …/>
  <attribute …/>
  .
  .
  .
</attributeGroup>
1.5.9 Targeting Namespaces
Namespaces allow us to
distinguish element declarations and type definitions of one schema from
another. We can assign an intended namespace for an XML schema by using the targetNamespace attribute on the
<schema> element. By assigning a target namespace for the schema, we
indicate that an XML document whose elements are declared as belonging to the
schema’s namespace should be validated against the XML schema.
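The link between a schema and its instance documents is simply a matching namespace name. The sketch below compares a schema's targetNamespace with the namespace of an instance document's root element; the URI http://example.com/po and the helper names are assumptions made for illustration.

```python
import xml.etree.ElementTree as ET

# http://example.com/po is a placeholder namespace URI for this sketch.
SCHEMA = """<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema"
    targetNamespace="http://example.com/po">
  <xsd:element name="PurchaseOrder" type="xsd:string"/>
</xsd:schema>"""

INSTANCE = '<po:PurchaseOrder xmlns:po="http://example.com/po">42</po:PurchaseOrder>'

def namespace_of(element):
    """Extract the namespace URI from ElementTree's {uri}local tag form."""
    tag = element.tag
    return tag[1:tag.index("}")] if tag.startswith("{") else None

def targets_schema(schema_text, instance_text):
    """Return True if the instance root belongs to the schema's namespace."""
    target = ET.fromstring(schema_text).get("targetNamespace")
    return namespace_of(ET.fromstring(instance_text)) == target

print(targets_schema(SCHEMA, INSTANCE))
print(targets_schema(SCHEMA, "<PurchaseOrder>42</PurchaseOrder>"))
```

The second document uses the same element name but no namespace, so it would not be matched to this schema.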
1.6 X-FILES
1.6.1 XPath
The XML Path Language
(XPath) is a standard for creating expressions that can be used to find
specific pieces of information within an XML document. XPath expressions are
used by both XSLT (for which XPath provides the core functionality) and
XPointer to locate a set of nodes.
XPath expressions have
the ability to locate nodes based on the nodes’ type, name, or value or by the
relationship of the nodes to other nodes within the XML document. In addition
to being able to find nodes based on these criteria, an XPath expression can
also return any of the following:
§ A node set
§ A Boolean value
§ A string value
§ A numeric value
1.6.1.1 Operators and Special Characters
XPath expressions are composed using a set of operators and special characters,
each with its own meaning. Table 1.6.a lists the various operators and special
characters used within the XML Path Language.
Table 1.6.a Operators and Special Characters for the XML Path Language
Operators and Special Characters | Description
/ | Selects the children from the node set on the left side of this character
// | Specifies that the matching node set should be located at any level within the XML document
. | Specifies the current context should be used
* | A wildcard character that selects all elements or attributes regardless of name
@ | Selects an attribute
: | Namespace separator
() | Indicates a grouping within an XPath expression
[expression] | Indicates a filter expression
[n] | Indicates that the node with the specified index should be selected
+ | Addition operator
- | Subtraction operator
div | Division operator
* | Multiplication operator
mod | Returns the remainder of a division operation
The priority for evaluating XPath
expressions is as follows:
1. Grouping
2. Filters
3. Path operations
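Several of these operators can be tried out with Python's ElementTree module, which supports a limited subset of XPath. The sketch below is illustrative only; the library document, the magazine element, and the helper name titles are made up for the example.

```python
import xml.etree.ElementTree as ET

LIBRARY = ET.fromstring(
    "<library>"
    "<book id='b1'><title>How to Win Friends</title></book>"
    "<book id='b2'><title>Make Money Fast</title></book>"
    "<magazine><title>XML Weekly</title></magazine>"
    "</library>"
)

def titles(path):
    """Evaluate a path relative to the library element; return the text found."""
    return [node.text for node in LIBRARY.findall(path)]

print(titles("./book/title"))            # . and / : children of children
print(titles(".//title"))                # //      : any depth
print(titles("*/title"))                 # *       : any child element
print(titles("book[@id='b2']/title"))    # @       : attribute filter
print(titles("book[1]/title"))           # [n]     : positions start at 1
```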
1.6.1.2 XPath Syntax
The XML Path Language
provides a declarative notation, termed a pattern, used to select the
desired set of nodes from XML documents. Each pattern describes a set of
matching nodes to select from a hierarchical XML document. Each pattern
describes a “navigation” path to the desired set of nodes similar to the
Uniform Resource Identifier (URI) syntax.
A “location path” is
needed to locate the result nodes. These location paths select the resulting
node set relative to the current context. A location path is, itself, made up
of one or more location steps. Each step is further comprised of three pieces:
• An axis
• A node test
• A predicate
Syntax for a location step in an XPath expression:
axis::nodetest[predicate]
Axes
The axis portion of
the location step identifies the hierarchical relationship for the
desired nodes from the
current context.
TABLE 1.6.b XPath Axes for a Location Step
Axis | Description
ancestor | Specifies that the query should locate the ancestors of the current context node, which includes the parent node, the parent’s parent node, and ultimately the root node.
ancestor-or-self | Indicates that in addition to the ancestors of the current context node, the context node should also be included in the resulting node set.
attribute | Specifies that the attributes of the current context node are desired.
child | Specifies that the immediate children of the current context node are desired.
descendant | Specifies that in addition to the immediate children of the current context node, the children’s children are also desired.
descendant-or-self | Indicates that in addition to the descendants of the current context node, the current context node is also desired.
following | Specifies that nodes in the same document as the current context node that appear after the current context node, excluding its descendants, should be selected.
following-sibling | Specifies that all the following siblings of the current context node should be selected.
namespace | Specifies that the namespace nodes in scope on the current context node should be selected.
parent | Selects the parent of the current context node.
preceding | Selects the nodes within the document that appear before the current context node, excluding its ancestors.
preceding-sibling | Selects the siblings of the current context node that appear before the current context node.
self | Selects the current context node.
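A few of these axes can be imitated in plain Python to make the relationships concrete. ElementTree does not keep parent pointers, so the sketch first builds a child-to-parent map. The tiny a/b/c document and all function names are invented for the example, and this is an illustration of the axis definitions rather than an XPath engine.

```python
import xml.etree.ElementTree as ET

ROOT = ET.fromstring("<a><b><c/><d/><e/></b><f/></a>")

def build_parent_map(root):
    """Map every element to its parent (the root has no entry)."""
    return {child: parent for parent in root.iter() for child in parent}

PARENTS = build_parent_map(ROOT)

def ancestors(node):
    """ancestor axis: parent, grandparent, and so on up to the root."""
    found = []
    while node in PARENTS:
        node = PARENTS[node]
        found.append(node)
    return found

def siblings_after(node):
    """following-sibling axis, in document order."""
    parent = PARENTS.get(node)
    if parent is None:
        return []
    children = list(parent)
    return children[children.index(node) + 1:]

def siblings_before(node):
    """preceding-sibling axis, nearest sibling first (a reverse axis)."""
    parent = PARENTS.get(node)
    if parent is None:
        return []
    children = list(parent)
    return children[:children.index(node)][::-1]

d = ROOT.find("b/d")
print([n.tag for n in ancestors(d)])
print([n.tag for n in siblings_after(d)])
print([n.tag for n in siblings_before(d)])
print([n.tag for n in ROOT.find("b").iter()])  # descendant-or-self of b
```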
Node Tests
The node test portion of a location step indicates the type of node desired for
the results. Every axis has a principal node type: for the attribute axis it is
attribute, for the namespace axis it is namespace, and for every other axis it
is element.
In addition to
specifying an actual node name, other node tests are available to select the
desired nodes. Here’s a list of these node tests:
• comment()
• node()
•
processing-instruction()
• text()
Predicates
The predicate portion of a location step filters a node set on the specified
axis to create a new node set. A predicate consists of a filter condition that
is evaluated against each node on the axis, and the direction of the axis
determines how positions are counted. A forward axis contains only the context
node or nodes that follow it in document order, so positions are counted in
document order. A reverse axis contains only the context node or nodes that
precede it, so positions are counted in reverse document order.
A predicate within a
location step may contain an expression that, when evaluated, results in a
Boolean (or logical) value that can be either True or False. XPath
predicates contain a Boolean comparison as shown below:
TABLE 1.6.c Boolean Operators and Their Respective Descriptions
Boolean Operator | Description
> | Greater than
>= | Greater than or equal to
< | Less than
<= | Less than or equal to
= | Equal to
!= | Not equal to
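A predicate can be seen filtering a node set in the following minimal sketch. It assumes a hypothetical People document and Python's standard xml.etree.ElementTree module, which supports equality and position predicates only; the relational operators such as > and < are not supported by this module and would need a full XPath processor.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample document, used only for illustration.
PEOPLE = """
<People>
  <Person id="p1"><Name>Dillon Larsen</Name></Person>
  <Person id="p2"><Name>Jane Doe</Name></Person>
</People>
"""

root = ET.fromstring(PEOPLE)

# [@id='p2'] keeps only the Person elements whose id attribute equals "p2".
by_id = [p.get("id") for p in root.findall("Person[@id='p2']")]

# [Name='Dillon Larsen'] keeps Person elements that have a Name child with that text.
by_name = [p.get("id") for p in root.findall("Person[Name='Dillon Larsen']")]

# [1] is a position predicate: the first Person child in document order.
first = [p.get("id") for p in root.findall("Person[1]")]

print(by_id, by_name, first)
```

Each predicate evaluates to True or False for every node on the child axis, and only the nodes for which it is True remain in the result.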
XPath Functions
XPath functions are used within XPath expressions and fall into one of four main groups:
• Boolean
• Node set
• Number
• String
1.6.2 XPointer
The XML Pointer Language (XPointer), which reached the Candidate Recommendation stage of the W3C approval process, builds on the XPath specification. XPointer provides two more important node tests:
• point()
• range()
These two additional
node tests correspond to the new functionality added by XPointer. For this new
functionality to work correctly, the XPointer specification added the concept
of a location within an XML document. Within XPointer, a location can be an
XPath node, a point, or a range.
For an XPath
expression, the result from a location step is known as a node set; for an
XPointer expression, the result is known as a location set. Because an
XPointer expression can yield a result consisting of points or ranges, the idea
of the node set had to be extended to include these types. Therefore, to
prevent confusion, the results of an XPointer expression are referred to as location
sets.
TABLE 1.6.d Some XPointer Functions That Return Location Sets
Function | Description
id() | Selects all nodes with the specified ID
root() | Selects the root element as the only location in a location set
here() | Selects the current element location in a location set
origin() | Selects the current element location for a node using an out-of-line link
The id() function
works exactly the same as the id() function for an XPath expression. The root()
function works just like the / character—it indicates the root element of an XML
document.
The next two
functions, here() and origin(), are interesting functions in their own right.
The here() function, as indicated, refers to the current element. Because an
XPointer expression can be located in a text node or in an attribute value,
this function could be used to refer to the current element rather than simply
the current node. The origin() function works much the same as the here()
function, except that it refers to the originating element. The key idea here
is that the originating element does not need to be located within the same
document as the resulting location set.
1.6.2.1 Points
Many times a link from one XML document into another must locate a specific point within the target document. Two different types of points can be represented using XPointer points:
• Node points
• Character points
A point is identified by an origin (container) node and an index position. When the origin node can contain child nodes, the index position counts child nodes, and the location is referred to as a node point. A node point could be considered to be the gap between the child nodes of a container node. When the origin node is a text node, the index position indicates the number of characters. These location points are referred to as character points. By specifying 0 for the index position in a character point, the point is considered to be immediately before the first character in the text string. Conceptually, a character point represents the space between the characters of a text string.
1.6.2.2 Ranges
An XPointer range is defined by a start point and an endpoint. A range will contain the XML between the start point and endpoint but does not necessarily have to consist of neat subtrees of an XML document. A range can extend over multiple branches of an XML document. The only criterion is that the start point and endpoint must be valid.
A range can be specified by using the keyword "to" together with the start-point() and end-point() functions. For instance, the following expression specifies a range beginning at the first character in the <Name> element for Dillon Larsen and ending after the ninth character in the <Name> element for Dillon Larsen:
/People/Person[1]/Name/text()/start-point()[position()=0] to /People/Person[1]/Name/text()/start-point()[position()=9]
In this example, two character points are used as the starting and ending points for the range.
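The idea of a range between two character points can be modelled with ordinary string offsets. The following is a minimal sketch, not an XPointer implementation: the helper name characters_between is hypothetical, and the text is assumed to be the content of the <Name> element from the example above.

```python
# Text content of the <Name> element for Dillon Larsen (assumed for illustration).
name_text = "Dillon Larsen"


def characters_between(text, start_index, end_index):
    """Return the characters lying between two character points.

    A character point with index i sits in the gap just before character i
    (counting from 0), so index 0 is immediately before the first character.
    """
    return text[start_index:end_index]


selected = characters_between(name_text, 0, 9)
print(selected)
```

Because the points sit between characters, a range from index 0 to index 9 covers exactly the first nine characters of the text.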
TABLE 1.6.e XPointer Range Functions
Function | Description
end-point() | Selects a location set consisting of the endpoints of the desired location steps
range-inside() | Selects the range(s) covering each location in the location-set argument
range-to() | Selects a range that completely covers the locations within the location-set argument
start-point() | Selects a location set consisting of the start points of the desired location steps
In addition, the string-range() function selects ranges of matching text within a location set. The general syntax for string-range() is as follows:
string-range(location-set, string, [index, [length]])
1.6.3 XLink
The anchor element,
<a>, within HTML indicates a link to another resource on an HTML page.
This could be a location within the same document or a document located
elsewhere. In HTML terms, the anchor element creates a hyperlink to another
location. The hyperlink can either appear as straight text, a clickable image,
or a combination of both.
The XML Linking
Language creates a link to another resource through the use of attributes
specified on elements, not through the actual elements themselves. The XML
Linking Language specification supports the attributes listed in Table 1.6.f.
TABLE 1.6.f XLink Attributes
Attribute | Description
xlink:type | This attribute must be specified and indicates what type of XLink is represented or defined.
xlink:href | This attribute contains the information necessary to locate the desired resource.
xlink:role | This attribute describes the function of the link between the current resource and another.
xlink:arcrole | This attribute describes the function of the link between the current resource and another.
xlink:title | This attribute describes the meaning of the link between the resources.
xlink:show | This attribute indicates how the resource linked to should be displayed.
xlink:actuate | This attribute specifies when to load the linked resource.
xlink:label | This attribute is used to identify a name or a target resource.
xlink:from | This attribute identifies the starting resource.
xlink:to | This attribute identifies the ending resource.
The xlink:type
attribute must contain one of the following values:
• simple
• extended
• locator
• arc
• resource
• title
• none
A value of simple creates a simple link between resources. A value of extended creates an extended link. A value of locator creates a link that points to another resource. A value of arc creates an arc with multiple resources and various traversal paths. A value of resource creates a link to indicate a specific resource. A value of title creates a title link. By specifying a value of none for the xlink:type attribute, the parent element has no XLink meaning, and no other XLink-related content or attributes have any relationship to the element. For all intents and purposes, a value of none removes the ability to link to another resource from an element.
The xlink:role
attribute specifies the function of the link. This attribute may only be used
for the following XLink types:
• extended
• simple
• locator
• resource
The xlink:arcrole
attribute may only be used with two types of XLinks:
• arc
• simple
The xlink:title attribute is completely optional and is provided so that the meaning of a link can be documented in human-readable form. If the xlink:title attribute is specified, it should contain a string describing the resource.
The xlink:show attribute is an optional attribute for the simple and arc XLink types and will accept the following values:
• new
• replace
• embed
• other
• none
The xlink:actuate
attribute is used to indicate when the linked resource should be loaded. This
attribute will accept the following values:
• onLoad
• onRequest
• other
• none
The xlink:label
attribute is used to name resource and locator XLink types. This value will end
up being used as values within the xlink:from and xlink:to attributes to
indicate the starting and ending resources for an arc XLink type.
1.6.3.1 Simple Links
A simple link combines
the functionality provided by the different pieces available through an
extended link together into a shorthand notation. A simple link consists of an
xlink:type attribute with a value of simple and, optionally, an xlink:href
attribute with a specified value. A simple link may have any content, and even
no content; it is up to the application to provide some means to generate a
traversal request for the link. If no target resource is specified with the xlink:href
attribute, the link is simply considered “dead” and will not be traversable.
Simple links play multiple roles in linking documents. They link exactly two resources together: one local and one remote. Therefore, if something more complex must be handled, an extended link is necessary.
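A simple link can be recognised purely from its attributes. The following is a minimal sketch, assuming hypothetical element names and a hypothetical target URL. Only the XLink namespace URI and the attribute names come from the XLink specification. The function name describe_simple_links is also an assumption made for illustration.

```python
import xml.etree.ElementTree as ET

# Namespace URI defined by the XLink specification.
XLINK_NS = "http://www.w3.org/1999/xlink"

# Hypothetical document: element names and the target URL are assumptions.
DOC = """
<links xmlns:xlink="http://www.w3.org/1999/xlink">
  <homepage xlink:type="simple" xlink:href="http://example.com/people.xml"
            xlink:show="new" xlink:actuate="onRequest">People list</homepage>
  <draft xlink:type="simple">No target yet</draft>
</links>
"""


def describe_simple_links(xml_text):
    """Return (element name, href or None) for every simple link found."""
    found = []
    for elem in ET.fromstring(xml_text).iter():
        if elem.get(f"{{{XLINK_NS}}}type") == "simple":
            found.append((elem.tag, elem.get(f"{{{XLINK_NS}}}href")))
    return found


for tag, href in describe_simple_links(DOC):
    # A simple link without xlink:href is "dead" and cannot be traversed.
    print(tag, "->", href if href else "dead link (no xlink:href)")
```

Note that any element can act as a link: the link behaviour comes from the xlink attributes, not from the element name.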
1.6.3.2 Extended Links
Within the XML Linking
Language, extended links give you the ability to specify relationships between
an unlimited number of resources, both local and remote. Local resources are
part of the actual extended link, whereas remote resources identify external
resources to the link. An out-of-line link is created when there are no local
resources at all for a link.
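The way an extended link ties local and remote resources together can be sketched as follows. This is an illustration under stated assumptions: the element names, labels, and file names are hypothetical, and the helper traversals is not part of any library. It pairs every resource carrying the xlink:from label with every resource carrying the xlink:to label.

```python
import xml.etree.ElementTree as ET

X = "{http://www.w3.org/1999/xlink}"  # XLink namespace in ElementTree notation

# Hypothetical extended link: one local resource, two remote locators, one arc.
DOC = """
<family xmlns:xlink="http://www.w3.org/1999/xlink" xlink:type="extended">
  <person xlink:type="resource" xlink:label="parent">Dillon Larsen</person>
  <child xlink:type="locator" xlink:label="kid" xlink:href="kid1.xml"/>
  <child xlink:type="locator" xlink:label="kid" xlink:href="kid2.xml"/>
  <rel xlink:type="arc" xlink:from="parent" xlink:to="kid"/>
</family>
"""


def traversals(xml_text):
    """Return (start, end) pairs implied by the arcs of an extended link."""
    link = ET.fromstring(xml_text)
    by_label = {}
    for part in link:
        kind = part.get(X + "type")
        label = part.get(X + "label")
        if kind == "resource":   # local resource: part of the link itself
            by_label.setdefault(label, []).append("local:" + (part.text or ""))
        elif kind == "locator":  # remote resource: located through xlink:href
            by_label.setdefault(label, []).append(part.get(X + "href"))
    pairs = []
    for arc in link:
        if arc.get(X + "type") == "arc":
            for start in by_label.get(arc.get(X + "from"), []):
                for end in by_label.get(arc.get(X + "to"), []):
                    pairs.append((start, end))
    return pairs


print(traversals(DOC))
```

If the resource-type element were replaced by another locator, the link would have no local resources and would therefore be an out-of-line link.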
Two marks
1. What are the types of XML document?
· well-formed XML document
· well-formed and valid XML document
· use-case document.
2. Write about well-formed XML documents and valid XML documents.
An XML document with correct syntax is a well-formed document, whereas a valid XML document must be well formed and, in addition, must conform to a document type definition.
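The well-formedness half of this distinction can be checked with any XML parser. The following is a minimal sketch using Python's standard xml.etree.ElementTree module; the helper name is_well_formed is an assumption. Checking validity against a DTD needs a validating parser, which the Python standard library does not provide.

```python
import xml.etree.ElementTree as ET


def is_well_formed(xml_text):
    """Return True if the text parses as XML, False if the syntax is broken."""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        return False


good = "<message>Let the good times roll!</message>"
bad = "<message>Let the good times roll!</mesage>"  # mismatched end tag

print(is_well_formed(good), is_well_formed(bad))
```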
3. What are the general forms of Document Type Declaration?
<!DOCTYPE NAME SYSTEM "file">
<!DOCTYPE NAME [ ]>
<!DOCTYPE NAME SYSTEM "file" [ ]>
4. What is the use of XML Namespace?
XML Namespaces provide a method to avoid element name conflicts. Namespaces use a colon-delimited prefix to associate external semantics with elements that can be identified via a Uniform Resource Identifier (URI).
5. What are the methods used for declaring namespaces?
Within an XML document, namespaces can be declared using one of two methods:
· default declaration
· explicit declaration
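Both declaration methods can be seen side by side in the following minimal sketch. The namespace URIs and element names are hypothetical. Python's standard xml.etree.ElementTree module reports each element name together with its namespace URI in the form {uri}name.

```python
import xml.etree.ElementTree as ET

# Hypothetical namespaces: a default declaration (xmlns="...") and an
# explicit declaration with the prefix "cust" (xmlns:cust="...").
DOC = """
<order xmlns="http://example.com/orders"
       xmlns:cust="http://example.com/customers">
  <name>Widget</name>
  <cust:name>Dillon Larsen</cust:name>
</order>
"""

root = ET.fromstring(DOC)
tags = [child.tag for child in root]

# The two <name> elements no longer conflict: each carries its own URI.
print(root.tag)
print(tags)
```

The unprefixed elements fall into the default namespace, while the prefixed element is placed in the explicitly declared one.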
6. Write about internal DTD. Give example.
If the DTD is declared inside the XML file, it must be wrapped inside the <!DOCTYPE> definition.
<?xml version="1.0"?>
<!DOCTYPE message [
<!ELEMENT message (#PCDATA)>
]>
<message>
Let the good times roll!
</message>
7. Give an example for External DTD.
If the DTD is declared in an external file, the <!DOCTYPE> definition must contain a reference to the DTD file.
<?xml version="1.0"?>
<!DOCTYPE message SYSTEM "message.dtd">
<message>
Let the good times roll!
</message>
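How a parser sees the two forms of the document type declaration can be sketched with Python's standard xml.dom.minidom module. This is an illustration only: the helper name doctype_info is an assumption, and minidom records the declaration without fetching message.dtd or validating the document against it.

```python
from xml.dom import minidom

INTERNAL = """<?xml version="1.0"?>
<!DOCTYPE message [
<!ELEMENT message (#PCDATA)>
]>
<message>Let the good times roll!</message>"""

EXTERNAL = """<?xml version="1.0"?>
<!DOCTYPE message SYSTEM "message.dtd">
<message>Let the good times roll!</message>"""


def doctype_info(xml_text):
    """Return the declared root element name and the external DTD reference."""
    doctype = minidom.parseString(xml_text).doctype
    return doctype.name, doctype.systemId


print(doctype_info(INTERNAL))  # no external file is referenced
print(doctype_info(EXTERNAL))  # the SYSTEM identifier points at the DTD file
```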
8. What is a DTD element? Write the syntax.
All elements in a valid XML document are defined with an element declaration in the DTD. An element declaration defines the name and all allowed contents of an element. Each element in the DTD should be defined with the following syntax:
<!ELEMENT elementname rule>
9. Define content rule
for DTD element.
The content rules for
elements deal with the actual data that defined elements may contain. These
rules include the ANY rule, the EMPTY rule, and the #PCDATA rule.
10. Write about the ANY rule.
This rule states that the element may contain other elements and/or normal character data. An element using the ANY rule would appear as follows:
<!ELEMENT elementname ANY>
11. What is the syntax for the #PCDATA rule? What does it mean?
The #PCDATA rule indicates that parsed character data will be contained in the element. Parsed character data is data that may contain normal markup and will be interpreted and parsed by any XML parser accessing the document. The following element demonstrates the #PCDATA rule:
<!ELEMENT elementname (#PCDATA)>
12. What is meant by structure rule in DTD element? What are its types?
Structure rules deal with how the data in an element may be organized. There are two types of structure rules:
· element only rule.
· mixed rule.
13. Write about the “element only” rule and “mixed” rule.
The “element only” rule specifies that only elements may appear as children of the current element. The child element sequences should be separated by commas and listed in the order they should appear.
Syntax:
<!ELEMENT elementname (element1, element2, element3)>
The “mixed” rule is used to help define elements that may have both character data (#PCDATA) and child elements in the data they contain. A list of options or a sequential list will be enclosed by parentheses. Options will be separated by the pipe symbol (|), whereas sequential lists will be separated by commas.
Syntax:
<!ELEMENT elementname (#PCDATA | childelement1 | childelement2)*>
The asterisk symbol (*) is added here to indicate that each of the items within the parentheses may appear zero or more times.
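What the two structure rules demand of a document can be sketched with two small checks. This is an illustration under stated assumptions, not a DTD validator: the helper names and the sample elements are hypothetical, and the checks correspond to the declarations <!ELEMENT person (first, last)> and <!ELEMENT note (#PCDATA | b | i)*>.

```python
import xml.etree.ElementTree as ET


def follows_element_only_rule(element, ordered_children):
    """True if the element holds exactly the listed children, in order, and no text."""
    has_text = bool((element.text or "").strip()) or any(
        (child.tail or "").strip() for child in element
    )
    return not has_text and [child.tag for child in element] == ordered_children


def follows_mixed_rule(element, allowed_children):
    """True if every child element is an allowed name; character data is always permitted."""
    return all(child.tag in allowed_children for child in element)


person = ET.fromstring("<person><first>Dillon</first><last>Larsen</last></person>")
note = ET.fromstring("<note>Call <b>Dillon</b> before <i>noon</i>.</note>")

print(follows_element_only_rule(person, ["first", "last"]))
print(follows_mixed_rule(note, {"b", "i"}))
```

The element-only check is sensitive to order, whereas the mixed check accepts the allowed children in any order and any number, mirroring the effect of the asterisk.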
14. What is the role
of DTD entities?
Entities
in DTDs are storage units. An entity’s content could be well-formed XML, normal
text, binary data, a database record, and so on. The main purpose of an entity
is to hold content, and there is virtually no limit on the type of content an
entity can hold.
15. What is the difference between parameter entity and internal entity?
A parameter entity is very similar to an internal entity. The main difference between an internal entity and a parameter entity is that a parameter entity may only be referenced inside the DTD. Parameter entities are in effect entities specifically for DTDs.
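An internal entity acting as a storage unit can be seen in the following minimal sketch. The entity name greeting is hypothetical. The entity is declared in the internal DTD subset and referenced in the document content as &greeting;, and Python's standard xml.etree.ElementTree parser replaces the reference with the stored text.

```python
import xml.etree.ElementTree as ET

# Hypothetical entity "greeting", declared in the DTD and used in the content.
DOC = """<!DOCTYPE message [
<!ENTITY greeting "Let the good times roll!">
]>
<message>&greeting;</message>"""

message = ET.fromstring(DOC)

# The parser has already substituted the entity's content for the reference.
print(message.text)
```

A parameter entity, by contrast, is declared with a percent sign and can be referenced only within the DTD itself, so it never appears in the document content like this.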
16. What is meant by anonymous type declarations?
When an <element> contains a <simpleType> element without a name attribute specified, this is known as an “anonymous” type definition.
17. What are the three elements used to add annotations?
The XML Schema Definition Language defines three new elements to add annotations to an XML schema. They are:
• <annotation>
• <appinfo>
• <documentation>
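Annotations and an anonymous type can be seen together in a small schema. The following is a minimal sketch: the element name age and the documentation text are hypothetical, and the schema is only inspected with Python's standard xml.etree.ElementTree module, not used for validation.

```python
import xml.etree.ElementTree as ET

XS = "{http://www.w3.org/2001/XMLSchema}"  # XML Schema namespace

# Hypothetical schema with an annotated element and an anonymous simple type.
SCHEMA = """
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="age">
    <xs:annotation>
      <xs:documentation>Age of a person in whole years.</xs:documentation>
    </xs:annotation>
    <xs:simpleType>
      <xs:restriction base="xs:integer">
        <xs:minInclusive value="0"/>
      </xs:restriction>
    </xs:simpleType>
  </xs:element>
</xs:schema>
"""

schema = ET.fromstring(SCHEMA)
element = schema.find(XS + "element")

# <documentation> sits inside <annotation> and carries text for human readers.
doc_text = element.find(f"{XS}annotation/{XS}documentation").text

# The <simpleType> has no name attribute, so it is an anonymous type.
simple_type = element.find(XS + "simpleType")
is_anonymous = simple_type.get("name") is None

print(doc_text, is_anonymous)
```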
18. Define XPath.
The XML Path Language (XPath) is a standard for creating expressions that can be used to find specific pieces of information within an XML document.
19. What is meant by forward axis predicate and reverse axis predicate?
A forward axis predicate contains the current context node and nodes that follow the context node. A reverse axis predicate contains the current context node and nodes that precede the context node.
20. What are the node tests available in XML Pointer?
XPointer provides two important node tests. They are:
• point()
• range()