Welcome to my blog


Sunday, 10 July 2016

IT6801 SOA Notes



 

 IT6801 - Service Oriented Architecture - SOA- UNIT I


Here, IT6801 Service Oriented Architecture e-books are posted; students can download the notes and e-books and make use of them. Anna University 7th-semester IT6801 Service Oriented Architecture lecture notes, IT6801 SOA notes and reference books are given below.

 

INTRODUCTION TO XML

XML document structure – Well formed and valid documents – Namespaces – DTD – XML Schema – X-Files.

1.1 XML DOCUMENT STRUCTURE
An XML document consists of a number of discrete components or sections. Although not every section is required in every XML document, including them helps produce a well-structured XML document that can easily be transported between systems and devices.
The major portions of an XML document include the following:
• The XML declaration
• The Document Type Declaration
• The element data
• The attribute data
• The character data or XML content
1.1.1 XML Declaration
The first part of an XML document is the declaration. It states explicitly that the document is an XML document and which version of XML it follows. The XML declaration takes the form of a processing instruction, <?xml .....?>. In addition, the XML declaration can indicate the character encoding and whether the document relies on external markup declarations. Because a number of document formats use markup similar to XML, the declaration is useful in establishing, without any doubt or ambiguity, that the document complies with a specific version of XML. In general, every XML document should begin with an XML declaration. The XML declaration consists of up to three components: the version number (required), the encoding declaration (optional), and the standalone document declaration (optional).
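The standalone document declaration states whether the document depends on external markup declarations (such as an external DTD subset) for its content. When standalone is set to “yes”, the document has no external markup declarations that affect the information passed to the application, so only the internal DTD subset matters. When it is set to “no”, external markup declarations may be present and may need to be processed; an internal DTD subset remains optional.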
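As a rough illustration of these components, the following Python sketch reads an XML declaration with the standard library's expat binding (xml.parsers.expat), whose XmlDeclHandler receives the version, encoding, and standalone values. The helper name read_xml_declaration and the tiny <shirt/> documents are hypothetical, chosen only for illustration.

```python
import xml.parsers.expat


def read_xml_declaration(document: bytes) -> dict:
    """Collect the version, encoding and standalone values of an XML declaration.

    Hypothetical helper for illustration; returns an empty dict when the
    document has no XML declaration.
    """
    declaration = {}

    def on_xml_decl(version, encoding, standalone):
        # expat reports standalone as 1 ("yes"), 0 ("no") or -1 (omitted).
        declaration["version"] = version
        declaration["encoding"] = encoding  # None when the encoding is omitted
        declaration["standalone"] = {1: "yes", 0: "no", -1: None}[standalone]

    parser = xml.parsers.expat.ParserCreate()
    parser.XmlDeclHandler = on_xml_decl
    parser.Parse(document, True)
    return declaration


print(read_xml_declaration(b'<?xml version="1.0"?><shirt/>'))
print(read_xml_declaration(
    b'<?xml version="1.0" encoding="UTF-8" standalone="yes"?><shirt/>'))
```

Only the version is mandatory, which is why the first call reports None for both optional components.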

Valid XML Declarations
The first declaration defines a well-formed XML document, whereas the second defines a well-formed and valid XML document. The third declaration shows a more complete definition that states a typical use-case for XML.

1.1.2 Document Type Declaration
The Document Type Declaration (DOCTYPE) gives a name to the XML content and provides a means to guarantee the document’s validity, either by including or specifying a link to a Document Type Definition (DTD). Valid XML documents must declare the document type to which they comply, whereas well-formed XML documents can include the DOCTYPE to simplify the task of the various tools that will be manipulating the XML document.
A Document Type Declaration names the document type and identifies the internal content by specifying the root element. A DOCTYPE can identify the constraints on the validity of the document by making a reference to an external DTD subset and/or include the DTD internally within the document by means of an internal DTD subset.

General Forms of the Document Type Declarations


 
<!DOCTYPE NAME SYSTEM "file">
<!DOCTYPE NAME [ ]>
<!DOCTYPE NAME SYSTEM "file" [ ]>

The first DOCTYPE refers to a document that uses only an externally defined DTD subset. The second declaration allows only an internally defined subset within the document. The final form uses an external subset and also provides a place, between the square brackets, for an internally defined DTD subset. In these forms, the keyword NAME should be replaced with the actual root element of the document, and “file” should be replaced with a path to a valid DTD. In the case of our shirt example, the DOCTYPE is
<!DOCTYPE shirt SYSTEM "shirt.dtd">
because the first element in the document will be the <shirt> element and our DTD is saved in a file named shirt.dtd, which is in the same directory as the XML document.
The only real difference between internally and externally defined DTD subsets is that the DTD content itself is contained within the square brackets, in the case of internal subsets, whereas external subsets save this content to a file for reference, usually with a .dtd extension. The actual components of the Document Type Declaration are listed in Table 5.1.
Table 5.1 Components of the Document Type Declaration
Component | Description
< | The start of the XML tag (in this case, the beginning of the Document Type Declaration).
!DOCTYPE | The beginning of the Document Type Declaration.
NAME | Specifies the name of the document type being defined. This must comply with XML naming rules.
SYSTEM | Specifies that the following system identifier will be read and processed.
“file” | Specifies the name of the file to be processed by the system.
[ | Starts an internal DTD subset.
] | Ends the internal DTD subset.
> | The end of the XML tag (in this case, the end of the Document Type Declaration).

1.1.3 Markup and Content
In addition to the XML declaration and the Document Type Declaration, XML documents are composed of markup and content. In general, six kinds of markup can occur in an XML document:
§  Elements
§  entity references
§  comments
§  processing instructions
§  marked sections and
§  Document Type Declarations

1.1.4 Elements
Within an XML document, elements are the most common form of markup. XML elements are either a matched pair of XML tags or a single XML tag that is “self-closing.” Matching XML tags have the same element name, except that the end tag is prefixed with a forward slash. For example, our shirt element begins with <shirt> and ends with </shirt>. When elements do not come in pairs, the element name is followed by a forward slash. For example, if we merely wanted to state that a shirt is on sale, we could use <on_sale/>. In this case, there is no matching end tag. These “unmatched” elements are known as empty elements. The trailing “/>” in this syntax tells a program processing the XML document that the element is empty and that no matching end tag should be sought.

Elements can be nested within other elements to any depth. In essence, an XML document is a hierarchical tree: elements exist within other elements and can branch off into child nodes. Although DTDs or schemas may restrict these elements, XML itself allows the tree to grow as “wide” or “deep” as needed: a single XML element can contain any number of child elements, and the tree can be any number of levels deep.
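The tree structure described above can be seen by parsing a small document with Python's xml.etree.ElementTree and walking it depth-first. This is a minimal sketch; the shirt document (including its <colors> and <color> elements) and the describe helper are hypothetical. ElementTree would treat <on_sale></on_sale> exactly like <on_sale/>: both are empty elements.

```python
from __future__ import annotations

import xml.etree.ElementTree as ET

# Hypothetical shirt document: paired elements, nesting, and an empty element.
SHIRT_XML = """<shirt>
  <model>Zippy Tee</model>
  <colors>
    <color>red</color>
    <color>blue</color>
  </colors>
  <on_sale/>
</shirt>"""


def describe(element: ET.Element, depth: int = 0) -> list[str]:
    """Walk the element tree depth-first, marking elements with no content."""
    is_empty = len(element) == 0 and not (element.text or "").strip()
    lines = ["  " * depth + element.tag + (" (empty)" if is_empty else "")]
    for child in element:  # an element may have any number of children
        lines.extend(describe(child, depth + 1))
    return lines


print("\n".join(describe(ET.fromstring(SHIRT_XML))))
```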

Apart from names beginning with the letters xml (in any combination of upper and lower case), which are reserved for XML standards, no element names are reserved; namespaces can be used to avoid inadvertent conflicts. Although the hyphen (-) and period (.) are permitted in element names, you should avoid them because some software applications might confuse them with arithmetic or object operations. The colon is reserved for namespace prefixes. Element names should be descriptive and not confusing. Also, some devices with constrained memory capabilities may not work well with overly long XML tag names. In any case, long names are an annoyance to developers, systems, and users.

1.1.5 Attributes
Within elements, additional information that modifies the nature of the encapsulated content can be communicated to XML processors. Attributes are name/value pairs contained within the start tag that specify text strings modifying the context of the element.
Attribute Examples
<price currency="USD">…</price>
<on_sale start_date="10-15-2001"/>
Attributes can be required, optional, or contain a fixed value. Required or optional attributes can either contain freeform text or contain one of a set list of enumerated values. Fixed attributes, if present, must contain a specific value. Attributes can specify a default value that is applied if the attribute is optional but not present.
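A short Python sketch of reading attributes with xml.etree.ElementTree, using the two sample fragments above wrapped in a hypothetical <shirt> root element (the 19.95 price and the end_date attribute are made up). Because this sample has no DTD, a missing optional attribute simply comes back as absent, and the application supplies its own fallback.

```python
import xml.etree.ElementTree as ET

# The sample fragments, wrapped in a hypothetical <shirt> root element.
DOC = """<shirt>
  <price currency="USD">19.95</price>
  <on_sale start_date="10-15-2001"/>
</shirt>"""

root = ET.fromstring(DOC)
price = root.find("price")
sale = root.find("on_sale")

# attrib is a dict of the name/value pairs written in the start tag.
print(price.attrib)                   # {'currency': 'USD'}
print(sale.get("start_date"))         # 10-15-2001
# "end_date" is a hypothetical optional attribute; get() falls back to "none".
print(sale.get("end_date", "none"))   # none
```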

1.1.6 Entity References
The role of the XML entity is to introduce special characters or make use of content that is constantly repeated without having to enter it multiple times. Entities provide a means to indicate to XML-processing applications that a special text string is to follow that will be replaced with a different literal value.

Apart from the five predefined entities (&lt;, &gt;, &amp;, &apos;, and &quot;), each entity has a unique name that is defined by an entity declaration in a DTD. Entities are used simply by referring to them by name. Entity references are delimited by an ampersand at the beginning and a semicolon at the end, and the name between the delimiters identifies the entity that will be replaced. For example, the &lt; entity inserts the less-than sign (<) into a document. Markup can also be encoded with entities so that it is not processed, for example so that it can be displayed or embedded within other element values. The string <element> can be encoded in an XML document as &lt;element&gt;, and it therefore will not be processed as an element.

Sample Entity References
<description>The following says that 8 is greater than 5</description>
<equation>8 &gt; 5</equation>
<prescription>The Rx prescription symbol is &#8478;
which is the same as &#x211E;</prescription>
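To see what an application receives after parsing, this Python sketch feeds sample references to xml.etree.ElementTree. The <samples> and <escaped> wrapper elements are hypothetical; the point is that predefined entities and character references are replaced by the characters they stand for, and escaped markup stays plain text.

```python
import xml.etree.ElementTree as ET

# Sample references wrapped in hypothetical <samples> and <escaped> elements.
DOC = """<samples>
  <equation>8 &gt; 5</equation>
  <prescription>The Rx prescription symbol is &#8478; which is the same as &#x211E;</prescription>
  <escaped>&lt;element&gt;</escaped>
</samples>"""

root = ET.fromstring(DOC)
print(root.findtext("equation"))                      # 8 > 5
# Decimal 8478 and hexadecimal 211E name the same character, U+211E.
print(root.findtext("prescription").count("\u211e"))  # 2
print(root.findtext("escaped"))                       # <element>
print(len(root.find("escaped")))                      # 0: no child element created
```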

Entities can also be used to refer to often repeated or varying text as well as to include the content of external files. There are internal and external entities, and both can be general or parameter entities. Internal entities are defined and used within the context of a document, whereas external entities are defined in a source that is accessible via a URI. Internal entities are largely simple string replacements, whereas external entities can consist of entire XML documents or non-XML data, such as binary files. Parameter entities are entities that are declared and used within a DTD. They allow users to create replacement text that can be used multiple times to modularize the creation of valid documents. Parameter entities can be either internal or external, but they cannot refer to non-XML data because a parameter entity cannot have a notation.
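Internal general entities can be tried out with Python's xml.etree.ElementTree, whose expat-based parser expands entities declared in the document's internal DTD subset. This is a minimal sketch; the catalog document and the entity name maker, with its value, are hypothetical.

```python
import xml.etree.ElementTree as ET

# A hypothetical internal general entity, declared once in the internal DTD
# subset and referenced twice in the content.
DOC = """<?xml version="1.0"?>
<!DOCTYPE catalog [
  <!ENTITY maker "XML Shirts Inc.">
]>
<catalog>
  <shirt>Zippy Tee by &maker;</shirt>
  <shirt>Floppy Tee by &maker;</shirt>
</catalog>"""

root = ET.fromstring(DOC)
for shirt in root.iter("shirt"):
    print(shirt.text)  # the parser has replaced &maker; with its value
```

Changing the entity's value in one place updates every reference, which is the reuse the paragraph above describes.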

1.1.7 Comments
One of the key benefits of XML is that humans can read it, and comments make documents easier to understand. Comments are quite simple to include in a document: the character sequence <!-- begins a comment and --> ends it. Between these two delimiters, any text at all can be written, including valid XML markup. The only restriction is that the comment delimiters cannot be used inside a comment; neither can the literal string --. Comments can be placed almost anywhere in a document outside other markup (but not before the XML declaration) and are not considered part of the textual content of an XML document. As a result, XML processors are not required to pass comments along to an application.

A Sample Comment
<!-- The below element talks about an Elephant I once owned... -->
<animal>Elephant</animal>
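Whether comments reach the application depends on the processor. In Python's xml.etree.ElementTree, the default tree builder drops them, while TreeBuilder(insert_comments=True) (available since Python 3.8) keeps them as comment nodes. The <zoo> wrapper element is hypothetical.

```python
import xml.etree.ElementTree as ET

DOC = """<zoo>
  <!-- The below element talks about an Elephant I once owned... -->
  <animal>Elephant</animal>
</zoo>"""

# The default parser discards comments, as XML processors are allowed to.
default_root = ET.fromstring(DOC)
print([child.tag for child in default_root])  # ['animal']

# A tree builder told to keep comments stores them as ET.Comment nodes.
parser = ET.XMLParser(target=ET.TreeBuilder(insert_comments=True))
kept_root = ET.fromstring(DOC, parser=parser)
print([child.tag is ET.Comment for child in kept_root])  # [True, False]
```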

1.1.8 Processing Instructions
Processing instructions (PIs) are similar to comments in that they are not part of the textual content of an XML document, but they provide information to applications about how the content should be processed. Unlike comments, PIs must be passed along to the application by XML processors. Processing instructions have the following form:
<?instruction options?>
The instruction name, called the PI target, is a special identifier that the processing application is intended to understand. PI targets can be formally declared as notations (a structure for sending such information). The only restriction is that PI targets may not start with xml (in any combination of case), which is reserved for the core XML standards.
Example of a Processing Instruction
<?send-message "process complete"?>
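In contrast to comments, processing instructions are handed to the application. This Python sketch uses the standard library's expat binding, whose ProcessingInstructionHandler receives the PI target and its data; the <report> and <status> elements around the sample PI are hypothetical.

```python
import xml.parsers.expat

DOC = b"""<?xml version="1.0"?>
<report>
  <?send-message "process complete"?>
  <status>done</status>
</report>"""

instructions = []


def on_pi(target, data):
    # expat calls this for every PI; the XML declaration is reported separately.
    instructions.append((target, data))


parser = xml.parsers.expat.ParserCreate()
parser.ProcessingInstructionHandler = on_pi
parser.Parse(DOC, True)
print(instructions)  # [('send-message', '"process complete"')]
```

The target (send-message) tells the application which instruction it is; the rest of the PI arrives as an uninterpreted string.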
1.1.9 Marked CDATA Sections
Some documents will contain a large number of characters and text that an XML processor should ignore and pass to an application. These are known as character data (or CDATA) sections. Within an XML document, a CDATA section instructs the parser to ignore all markup characters except the end of the CDATA markup instruction. This allows for a section of XML code to be “escaped” so that it doesn’t inadvertently disrupt XML processing.
CDATA sections follow this general form:
<![CDATA[content]]>
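A small Python sketch with xml.etree.ElementTree shows the effect of a CDATA section: the markup-like text inside it is delivered as ordinary character data instead of being parsed into elements. The <example> element and its content are hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical <example> element whose content is escaped with CDATA.
DOC = """<example><![CDATA[<shirt size="L">Zippy & Tee</shirt>]]></example>"""

root = ET.fromstring(DOC)
print(root.text)   # <shirt size="L">Zippy & Tee</shirt>
print(len(root))   # 0: the tags inside CDATA did not become elements
```

Without the CDATA wrapper, the bare ampersand alone would make this document fail to parse.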

1.1.10 Document Type Definitions
Document Type Definitions (DTDs) provide a means for defining what XML markup can occur in an XML document. Basically, the DTD provides a mechanism to guarantee that a given XML document complies with a well-defined set of rules for document structure and content. DTDs and the more recent XML Schema are the means for defining the validity constraints on XML documents.

1.1.11 XML Content
The value of XML is greatly enhanced by the content within the elements; the content between XML tags is where most of the value of an XML document lies. When a DTD or XML Schema is used, the markup (the element and attribute names and their structure) is fixed and cannot be changed by users, so the informational content that this markup describes is precisely where the variable data resides.
XML content can consist of almost any data, including binary data encoded as text (for example, Base64), as long as it doesn't violate rules that would confuse the content with XML markup. XML content can contain any valid Unicode characters, including international characters. The content can be as long as necessary and contain hundreds of megabytes of textual information, if required. Of course, the size of the content is an implementation decision.

IT6801 SOA notes, Regulation 2013


1.2 WELL FORMED AND VALID DOCUMENTS
In particular, two specific descriptions can be applied to XML documents to describe the content contained within them. XML documents can be well formed, and they can also be valid.

1.2.1 Well-Formed Documents
An XML document is well formed if it follows all the preceding syntax rules of XML. If it includes inappropriate markup or characters that XML parsers cannot process, the document is not well formed. An XML document can't be partially well formed, and, by definition, if a document is not well formed, it is not XML. There is therefore no such thing as an XML document that is not well formed: when a processor detects a well-formedness error, it must report a fatal error and must not continue normal processing.
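Checking well-formedness only requires a non-validating parser. This Python sketch treats a document as well formed if xml.etree.ElementTree parses it without raising ParseError; the helper name is_well_formed and the sample strings are hypothetical, and no DTD validation is performed.

```python
import xml.etree.ElementTree as ET


def is_well_formed(text: str) -> bool:
    """Return True if a non-validating XML parser accepts the document."""
    try:
        ET.fromstring(text)
    except ET.ParseError:
        return False
    return True


print(is_well_formed("<shirt><model>Zippy Tee</model></shirt>"))  # True
print(is_well_formed("<shirt><model>Zippy Tee</shirt></model>"))  # False: overlapping tags
print(is_well_formed("<shirt>Tom & Jerry</shirt>"))               # False: bare ampersand
print(is_well_formed("<shirt/><shirt/>"))                         # False: two root elements
```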

1.2.2 Valid Documents
A well-formed XML document is considered valid only if it contains a proper Document Type Declaration and if the document obeys the constraints of that declaration. In most cases, the constraints of the declaration will be expressed as a DTD or an XML Schema. Well-formed XML documents are designed for use without any constraints, whereas valid XML documents explicitly require these constraint mechanisms.
Although the creation of well-formed XML is a simple process, the use of valid XML documents can greatly improve the quality of document processes. Valid XML documents allow users to take advantage of content management, business-to-business transactions, enterprise integration, and other processes that require the exchange of constrained XML documents. After all, any document can be well formed, but only specific documents are valid when applied against a constraining DTD or schema.

1.3 NAMESPACES
Namespaces use a colon-delimited prefix to associate elements with external semantics identified by a Uniform Resource Identifier (URI). A namespace-qualified element can then be used as if it were defined locally.
Namespace Example
<?xml version="1.0"?>
<shirt:shirt xmlns:shirt="http://xmlshirts.org/schema"
xmlns:apparel="http://xmlapparel.org/schema">
<shirt:model>Zippy Tee</shirt:model>
<apparel:mfgID>KFL233562</apparel:mfgID>
<shirt:description>This is a <b>funky</b> Tee shirt similar to the Floppy Tee shirt
</shirt:description>
</shirt:shirt>
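As a sketch of how a namespace-aware parser sees this document, Python's xml.etree.ElementTree replaces each prefix with its namespace URI, and lookups use a prefix map of the caller's choosing. The listing is trimmed (the <shirt:description> element is omitted), and the prefixes s and a in the lookup map are arbitrary choices.

```python
import xml.etree.ElementTree as ET

DOC = """<?xml version="1.0"?>
<shirt:shirt xmlns:shirt="http://xmlshirts.org/schema"
             xmlns:apparel="http://xmlapparel.org/schema">
  <shirt:model>Zippy Tee</shirt:model>
  <apparel:mfgID>KFL233562</apparel:mfgID>
</shirt:shirt>"""

root = ET.fromstring(DOC)
# ElementTree reports names as "{namespace-URI}local-name"; the prefix is gone.
print(root.tag)  # {http://xmlshirts.org/schema}shirt

# Lookups use our own prefixes, mapped to the same URIs.
ns = {"s": "http://xmlshirts.org/schema", "a": "http://xmlapparel.org/schema"}
print(root.findtext("s:model", namespaces=ns))  # Zippy Tee
print(root.findtext("a:mfgID", namespaces=ns))  # KFL233562
```

Only the URI matters to the parser; the prefixes are local shorthand.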

Because XML is an open standard in which XML authors are free to create whatever elements and attributes they wish, it’s inevitable that multiple XML developers will choose the same element and attribute names for their standards. For instance, let’s examine the following sample XML document:
<Customer>
<Name>John Smith</Name>
</Customer>

This sample document contains the root element <Customer>, which contains a child element called <Name>. We can clearly determine that the <Name> element contains the name of the customer referred to by the <Customer> element.
Now suppose another XML document contains details regarding a product, as shown here:
<Product>
<Name>Hot Dog Buns</Name>
</Product>

This document contains a <Product> element as the root element and a <Name> element, which contains the name of the product. The following XML document could be constructed to indicate that a customer has placed an order for a particular product:
<Customer>
<Name>John Smith</Name>
<Order>
<Product>
<Name>Hot Dog Buns</Name>
</Product>
</Order>
</Customer>

The first <Name> element, which appears as a child of the <Customer> element, contains the customer’s name. The second <Name> element, on the other hand, contains the product’s name. By using namespaces, XML parsers can easily tell the difference between the two <Name> elements. Therefore, modifying the preceding XML document to specify the appropriate namespaces turns it into this:

<Customer>
<cust:Name xmlns:cust="customer-namespace-URI">John Smith</cust:Name>
<Order>
<Product>
<prod:Name xmlns:prod="product-namespace-URI">Hot Dog Buns</prod:Name>
</Product>
</Order>
</Customer>

Now XML parsers can tell the customer's <Name> element apart from the product's <Name> element and apply the appropriate validation rules to each.

1.3.1 Declaring Namespaces
Within an XML document, namespaces can be declared using one of two methods:
§  default declaration
§  explicit declaration

A default namespace declaration specifies a namespace to use for the current element and all of its descendants that do not have a namespace prefix. For instance, in the following XML document, a default declaration is defined on the <Customer> element by using the xmlns attribute without attaching a prefix to it:
<Customer xmlns="http://www.eps-software.com/po">
<Name>Travis Vandersypen</Name>
<Order>
<Product>
<Name>Hot Dog Buns</Name>
</Product>
</Order>
</Customer>

Sometimes, however, it may be necessary and more readable to explicitly declare an element’s namespace. This is accomplished much the same way in which a default namespace is declared, except a prefix is associated with the xmlns attribute.
Example:
<po:Customer xmlns:po="http://www.eps-software.com/po">
<po:Name>Travis Vandersypen</po:Name>
<po:Order>
<po:Product>
<po:Name>Hot Dog Buns</po:Name>
</po:Product>
</po:Order>
</po:Customer>
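Both declaration styles put the elements in the same namespace. A minimal Python check with xml.etree.ElementTree, which reports each element as {namespace-URI}local-name, compares the two documents above (slightly condensed); the helper name expanded_names is hypothetical.

```python
import xml.etree.ElementTree as ET

PO = "http://www.eps-software.com/po"

default_doc = f"""<Customer xmlns="{PO}">
  <Name>Travis Vandersypen</Name>
  <Order><Product><Name>Hot Dog Buns</Name></Product></Order>
</Customer>"""

explicit_doc = f"""<po:Customer xmlns:po="{PO}">
  <po:Name>Travis Vandersypen</po:Name>
  <po:Order><po:Product><po:Name>Hot Dog Buns</po:Name></po:Product></po:Order>
</po:Customer>"""


def expanded_names(text: str) -> list:
    """List every element's {namespace-URI}local-name in document order."""
    return [element.tag for element in ET.fromstring(text).iter()]


print(expanded_names(default_doc) == expanded_names(explicit_doc))  # True
print(expanded_names(default_doc)[0])  # {http://www.eps-software.com/po}Customer
```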

1.3.2 Identifying the Scope of Namespaces
When a default namespace is declared, all unprefixed child elements within the parent element appear within the parent's namespace unless another namespace is specified. This allows child elements to “inherit” their parent element's default namespace. However, this “inherited” namespace can be overridden by declaring a new default namespace on a particular child element.
Example:
<Customer xmlns="http://www.eps-software.com/customer">
<Name>Travis Vandersypen</Name>
<Order xmlns="http://www.eps-software.com/order">
<Product>
<Name>Hot Dog Buns</Name>
</Product>
</Order>
</Customer>
All elements contained within the <Customer> element that do not explicitly specify a namespace “inherit” the namespace declared by the <Customer> element. However, the <Order> element also declares a default namespace. Starting at the <Order> element, all unqualified elements within the <Order> element inherit the namespace declared by the <Order> element.
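These inheritance rules can be confirmed by parsing the example with Python's xml.etree.ElementTree and printing each element's expanded {namespace-URI}local-name; this is a sketch using the document as given.

```python
import xml.etree.ElementTree as ET

DOC = """<Customer xmlns="http://www.eps-software.com/customer">
  <Name>Travis Vandersypen</Name>
  <Order xmlns="http://www.eps-software.com/order">
    <Product>
      <Name>Hot Dog Buns</Name>
    </Product>
  </Order>
</Customer>"""

for element in ET.fromstring(DOC).iter():
    # Each tag is reported as {namespace-URI}local-name.
    print(element.tag)
```

The first <Name> is reported in the customer namespace, while <Order>, <Product>, and the second <Name> are all in the order namespace.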




1.4 DTD
DTD stands for Document Type Definition. A Document Type Definition allows the XML author to define a set of rules for an XML document to make it valid. An XML document is considered “well formed” if that document is syntactically correct according to the syntax rules of XML 1.0.
The DTD will define the elements required by an XML document, the elements that are optional, the number of times an element should (could) occur, and the order in which elements should be nested. DTD markup also defines the type of data that will occur in an XML element and the attributes that may be associated with those elements.

The hierarchical structure of elements defined in the DTD must be maintained. The values of all attributes will be checked to ensure that they fall within defined guidelines.

In short, every last detail of the XML document from top to bottom will be defined and validated by the DTD. A DTD can ensure that the structure of the XML data does not change from organization to organization (thus rendering the data corrupt and useless).

A DTD can be internal, residing within the body of a single XML document. It can also be external, referenced by the XML document. A single XML document can even have both a portion (or subset) of its DTD that is internal and a portion that is external. A single external DTD can be referenced by many XML documents. Because an external DTD may be referenced by many documents, it is a good repository for global definitions (definitions that apply to all documents).
Simple DTD Examples
Example 1 : An Internal DTD
<?xml version="1.0"?>
<!DOCTYPE message [
<!ELEMENT message (#PCDATA)>
]>
<message>
Let the good times roll!
</message>
The internal DTD is contained within the Document Type Declaration, which begins with <!DOCTYPE and ends with ]>. The Document Type Declaration will appear between the XML declaration and the start of the document itself (the document or root element) and identify that section of the XML document as containing a Document Type Definition. Following the Document Type Declaration (DOCTYPE), the root element of the XML document is defined (in this case, message). The DTD tells us that this document will have a single element, message, that will contain parsed character data (#PCDATA).

Example2: An External DTD
<?xml version="1.0"?>
<!DOCTYPE message SYSTEM "message.dtd">
<message>
Let the good times roll!
</message>

Here, the DTD is contained in a separate file, message.dtd. The contents of message.dtd are assumed to be the same as the contents of the DTD in the internal DTD example. The keyword SYSTEM in the Document Type Declaration tells us that the DTD will be found in a separate file. A URL could also have been used to define the location of the DTD; for example, rather than message.dtd, the Document Type Declaration could have specified a relative URL such as ../DTD/message.dtd.

Validating XML with the Document Type Definition (DTD)

Both of these examples show a well-formed XML document. Additionally, because both XML documents contain a single element, message, which contains only parsed character data, both adhere to the DTD. Therefore, they are both also valid XML documents.
Example 3: Document Not Valid According to Defined DTD
<?xml version="1.0"?>
<!DOCTYPE message SYSTEM "message.dtd">
<message>
<text>
Let the good times roll!
</text>
</message>

Even though this is a well-formed XML document, it is not valid. When this document is validated against message.dtd, a flag will be raised because message.dtd does not define an element named text.

1.4.1 Structure of a Document Type Definition
The structure of a DTD consists of a
§  Document Type Declaration
§  Elements
§  Attributes
§  entities and
§  several other minor keywords

1.4.2 The Document Type Declaration
In order to reference a DTD from an XML document, a Document Type Declaration must be included in the XML document. There can be at most one Document Type Declaration per XML document, and it must appear before the root element. The syntax is as follows:
<!DOCTYPE rootelement SYSTEM | PUBLIC DTDlocation [ internalDTDelements ] >
§  The exclamation mark (!) is used to signify the beginning of the declaration.
§  DOCTYPE is the keyword used to denote this as a Document Type Declaration.
§  rootelement is the name of the root element or document element of the XML document.
§  SYSTEM and PUBLIC are keywords used to designate that the DTD is contained in an external document. They are optional, but to reference an external DTD you must use one or the other. The SYSTEM keyword is followed by a URL (the system identifier) that locates the DTD. The PUBLIC keyword is followed by a public identifier, a well-known name that an application may resolve to a local copy of the DTD, and then by a system identifier URL.
§  internalDTDelements are internal DTD declarations. These declarations will always be placed within opening ([) and closing (]) brackets.
It is possible for a Document Type Declaration to contain both an external DTD subset and an internal DTD subset. The internal subset is processed before the external subset, so if both subsets declare the same entity or the same attribute, the internal declaration is the one used. Element types are different: an element may be declared only once, so declaring the same element in both subsets makes the document invalid. Consider the Document Type Declaration fragment shown in Example 4.
Example: 4  Internal and External DTDs
<!DOCTYPE rootelement SYSTEM "http://www.myserver.com/mydtd.dtd"
[
<!ELEMENT element1 (element2,element3)>
<!ELEMENT element2 (#PCDATA)>
<!ELEMENT element3 (#PCDATA)>
]>

In the above example, the Document Type Declaration references an external DTD and also contains an internal DTD subset that declares element1, element2, and element3. Those elements must not be declared again in the external DTD; any entity or attribute declarations that appear in both subsets are taken from the internal subset.

DTD Elements
All elements in a valid XML document are defined with an element declaration in the DTD. An element declaration defines the name and all allowed contents of an element. Element names must start with a letter or an underscore and may contain any combination of letters, numbers, underscores, dashes, and periods. Colons should not be used in element names because they are normally used to reference namespaces. Each element in the DTD should be defined with the following syntax:
<!ELEMENT elementname rule >
• ELEMENT is the keyword that specifies that this is an element declaration.
• elementname is the name of the element.
• rule is the definition to which the element’s data content must conform.

The order in which elements are declared in a DTD does not matter to a validating parser. What matters is each element's rule: when a rule lists child elements as a comma-separated sequence, a validating parser expects those children to appear in the XML document in exactly that order. If the elements in an XML document do not match the order given in the rule, the document will not be considered valid by a validating parser.

Example 5: contactlist.dtd
<!ELEMENT contactlist (fullname, address, phone, email) >
<!ELEMENT fullname (#PCDATA)>
<!ELEMENT address (addressline1, addressline2)>
<!ELEMENT addressline1 (#PCDATA)>
<!ELEMENT addressline2 (#PCDATA)>
<!ELEMENT phone (#PCDATA)>
<!ELEMENT email (#PCDATA)>

The first element in the DTD, contactlist, is the document element. The rule for this element is that it contains (is the parent element of) the fullname, address, phone, and email elements. The rule for the fullname element, the phone element, and the email element is that each contains parsed character data (#PCDATA). This means that these elements will contain character data, possibly including entity references, that the XML parser will interpret. The address element has two child elements: addressline1 and addressline2. These two child elements contain #PCDATA. This DTD defines an XML structure that is nested two levels deep. The root element, contactlist, has four child elements. The address element is, in turn, parent to two more elements. In order for an XML document that references this DTD to be valid, its elements must appear in the same order and with the same depth of nesting.
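Python's standard library cannot validate against a DTD, so the following is only a hand-written sketch of what a validating parser checks for Example 5. It encodes each rule as Python data (a hypothetical representation, not a DTD parser) and compares each element's children with its declared sequence. The contact details in the sample documents are made up.

```python
import xml.etree.ElementTree as ET

# Example 5's rules as Python data (hypothetical encoding): a tuple is a
# required sequence of child elements, "#PCDATA" means text only.
CONTACTLIST_RULES = {
    "contactlist": ("fullname", "address", "phone", "email"),
    "fullname": "#PCDATA",
    "address": ("addressline1", "addressline2"),
    "addressline1": "#PCDATA",
    "addressline2": "#PCDATA",
    "phone": "#PCDATA",
    "email": "#PCDATA",
}


def check(element, rules):
    """Return a list of problems; an empty list means the element conforms."""
    rule = rules.get(element.tag)
    if rule is None:
        return [f"<{element.tag}> is not declared"]
    if rule == "#PCDATA":
        return [f"<{element.tag}> may contain only text"] if len(element) else []
    # Element-only content: whitespace between children is fine, other text is not.
    stray_text = [element.text] + [child.tail for child in element]
    if any((text or "").strip() for text in stray_text):
        return [f"<{element.tag}> may contain only child elements"]
    children = tuple(child.tag for child in element)
    if children != rule:
        return [f"<{element.tag}> has children {children}, expected {rule}"]
    problems = []
    for child in element:
        problems.extend(check(child, rules))
    return problems


GOOD = """<contactlist>
  <fullname>Jane Doe</fullname>
  <address>
    <addressline1>1 Main Street</addressline1>
    <addressline2>Springfield</addressline2>
  </address>
  <phone>555-0100</phone>
  <email>jane@example.com</email>
</contactlist>"""

OUT_OF_ORDER = ("<contactlist><email>jane@example.com</email>"
                "<fullname>Jane Doe</fullname></contactlist>")

print(check(ET.fromstring(GOOD), CONTACTLIST_RULES))  # []
print(check(ET.fromstring(OUT_OF_ORDER), CONTACTLIST_RULES))
```

Only exact sequences are modeled here; the occurrence symbols (*, +, ?) and choices (|) are not.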

DTD Element Rules
All data contained in an element must follow a set rule. As stated previously, the rule is the definition to which the element’s data content must conform. There are two basic types of rules that elements must fall into. The first type of rule deals with content. The second type of rule deals with structure. First, we will look at element rules that deal with content.

Content Rules
The content rules for elements deal with the actual data that defined elements may contain. These rules include the ANY rule, the EMPTY rule, and the #PCDATA rule.


The ANY Rule
This rule states that the element may contain other elements and/or normal character data (just about anything as long as it is well formed). An element using the ANY rule would appear as follows:
<!ELEMENT elementname ANY>
A DTD that defines all its elements using the ANY rule will always be valid as long as the XML is well formed. This really precludes any effective validation.

XML Fragments Using the ANY Rule
<elementname>
This is valid content
</elementname>

<elementname>
<anotherelement>
This is more valid content
</anotherelement>
This is still valid content
</elementname>

<elementname>
<emptyelement />
<yetanotherelement>
This is still valid content!
</yetanotherelement>
Here is more valid content
</elementname>

The EMPTY Rule
This rule is the exact opposite of the ANY rule. An element that is defined with this rule will contain no data. However, an element with the EMPTY rule could still contain attributes (more on attributes in a bit). The following element is an example of the EMPTY rule:
<!ELEMENT elementname EMPTY>
The #PCDATA Rule
The #PCDATA rule indicates that parsed character data will be contained in the element. Parsed character data is text that may contain entity and character references, which any XML parser accessing the document will interpret and replace; it may not contain child elements. The following element demonstrates the #PCDATA rule:
<!ELEMENT elementname (#PCDATA)>
Structure Rules
Whereas the content rules deal with the actual content of the data contained in defined elements, structure rules deal with how that data may be organized. There are two types of structure rules.
§  “element only” rule
§  “mixed” rule

The “Element Only” Rule
The “element only” rule specifies that only elements may appear as children of the current element. The child element sequences should be separated by commas and listed in the order they should appear. The following element definition demonstrates the “element only” rule:
<!ELEMENT elementname (element1, element2, element3)>

The “Mixed” Rule
The “mixed” rule is used to define elements that may contain both character data (#PCDATA) and child elements. The options are enclosed in parentheses, #PCDATA must come first, and the options are separated by the pipe symbol (|); commas (sequences) are not allowed in a mixed rule. The following element is an example of the “mixed” rule:
<!ELEMENT elementname (#PCDATA | childelement1 | childelement2)*>
The pipe symbol is used here to indicate that there is a choice between #PCDATA and each of the child elements. However, the asterisk symbol (*) is added here to indicate that each of the items within the parentheses may appear zero or more times.

Element Symbols
Symbol
Definition
Asterisk (*)
The data will appear zero or more times (0, 1, 2, …). Here’s an example:
<!ELEMENT children (name*)>
In this example, the element children could have zero or more occurrences of the child element name. This type of rule would be useful on a form asking a person about his or her children. It is possible that the person could have no children or many children.
Comma (,)
Provides separation of elements in a sequence. Here’s an example:
<!ELEMENT address (street, city, state, zip)>
In this example, the element address will have four child elements: street, city, state, and zip. Each of the child elements must appear in the defined order in the XML document.
Parentheses [( )]
The parentheses are used to contain the rule for an element.
Parentheses may also be used to group a sequence, subsequence, or a set of alternatives in a rule. Here’s an example:
<!ELEMENT address (street, city, (state | province), zip)>
In this example, the parentheses enclose a sequence. Additionally, a subsequence is nested within the sequence by a second set of parentheses. The subsequence indicates that there will be either a state or a province element in that spot in the main sequence.
Pipe (|)
Separates choices in a set of options. Here’s an example:
<!ELEMENT dessert (cake | pie)>
The element dessert will have one child element: either cake or pie.
Plus sign (+)
Signifies that the data must appear one or more times (1, 2, 3, …). Here’s an example:
<!ELEMENT appliances (refrigerator+)>
The appliances element will have one or more refrigerator child elements. This assumes that every household has at least one refrigerator.
Question mark (?)
Data will appear either zero times or one time in the element. Here’s an example:
<!ELEMENT employment (company?)>
The element employment will have either zero occurrences or one occurrence of the child element company.
No symbol
When no symbol is used (other than parentheses), the data must appear exactly once in the XML file.
Here’s an example:
<!ELEMENT contact (name)>
The element contact will have one child element: name.
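The symbols above behave like regular-expression operators: a comma is concatenation, | is alternation, and *, + and ? are repetition. The following Python sketch relies on that correspondence to check an ordered list of child element names against an element-only content model. It is an illustration under stated assumptions, not a DTD validator: the function names are hypothetical, and #PCDATA, EMPTY and ANY are not handled.

```python
import re

# Tokens of a DTD content model: element names or one of ( ) , | * + ?
TOKEN = re.compile(r"[A-Za-z_:][\w.:-]*|[(),|*+?]")

def content_model_to_regex(model):
    """Translate an element-only content model such as
    "(street, city, (state | province), zip)" into a compiled regex."""
    parts = []
    for tok in TOKEN.findall(model):
        if tok == "(":
            parts.append("(?:")
        elif tok == ")":
            parts.append(")")
        elif tok == ",":
            continue  # a sequence is plain concatenation in a regex
        elif tok in ("|", "*", "+", "?"):
            parts.append(tok)  # choice and occurrence symbols carry over unchanged
        else:
            # Each child name is matched with a trailing comma and wrapped in a
            # group, so an occurrence symbol applies to the whole name.
            parts.append("(?:" + re.escape(tok) + ",)")
    return re.compile("".join(parts))

def children_match(model, child_names):
    """True if the ordered child element names satisfy the content model."""
    encoded = "".join(name + "," for name in child_names)
    return content_model_to_regex(model).fullmatch(encoded) is not None

ADDRESS = "(street, city, (state | province), zip)"
print(children_match(ADDRESS, ["street", "city", "province", "zip"]))  # True
print(children_match(ADDRESS, ["street", "city", "zip"]))              # False
```

Encoding each name with a trailing comma keeps name boundaries unambiguous, so a child named ab can never be mistaken for a child named b.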

1.4.3 DTD Attributes
XML attributes are name/value pairs that are used as metadata to describe XML elements. XML attributes are very similar to HTML attributes. In HTML, src is an attribute of the img tag, as shown in the following example:
<img src="images/imagename.gif" width="10" height="20">
Attribute Use in XML
<image src="images/" width="10" height="20">
imagename.gif
</image>
src, width, and height are presented as attributes of the XML element image. This is very similar to the way that these attributes are used in HTML. The only difference is that the src attribute merely contains the relative path of the image’s directory and not the actual name of the image file.
Attribute Types
Type
Definition
CDATA
Character data only. The attribute value is plain text and contains no markup.
Example: <!ATTLIST box height CDATA "0">
In this example, an attribute, height, has been defined for the element box. This attribute will contain character data and have a default value of “0”.
ENTITY
The name of an unparsed general entity that is declared in the DTD but refers to some external data (such as an image file). Example: <!ATTLIST img src ENTITY #REQUIRED>
The src attribute is an ENTITY type that refers to some external image file.
ENTITIES
This is the same as the ENTITY type but represents multiple values listed in sequential order, separated by whitespace. Example: <!ATTLIST imgs srcs ENTITIES #REQUIRED>
ID
An attribute that uniquely identifies the element. The value for this type of attribute must be unique within the XML document. Each element may only have a single ID attribute, and the value of the ID attribute must be a valid XML name, meaning that it may not start with a numeric digit (which precludes the use of a simple numbering system for IDs). Example: <!ATTLIST cog serial ID #REQUIRED>
IDREF
This is the value of an ID attribute of another element in the document. It’s used to establish a relationship with other tags when there is not necessarily a parent/child relationship. Example: <!ATTLIST person cousin IDREF #IMPLIED>
IDREFS
This is the same as IDREF; however, it represents multiple values listed in sequential order, separated by whitespace. Example: <!ATTLIST person cousins IDREFS #IMPLIED>
NMTOKEN
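Restricts the value of the attribute to a valid XML name token: one or more name characters (letters, digits, periods, hyphens, underscores, or colons) with no whitespace. Unlike an XML name, a name token may begin with a digit. Example: <!ATTLIST address country NMTOKEN "usa">
NMTOKENS
This is the same as NMTOKEN; however, it represents multiple values listed in sequential order, separated by whitespace. Example: <!ATTLIST region states NMTOKENS "KS OK">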
NOTATION
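This type refers to the name of a notation declared in the DTD (more on notations later). It is used to identify the format of non-XML data. An example would be using the NOTATION type to refer to an external application that will interact with the document. The NOTATION keyword must be followed by a parenthesized list of notation names, each declared with <!NOTATION>. Example: <!NOTATION mplayer2 SYSTEM "mplayer2.exe"> <!ATTLIST music play NOTATION (mplayer2) #IMPLIED>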
Enumerated
This type is not an actual keyword the way the other types are. It is a list of the possible values for the attribute, enclosed in parentheses and separated by pipe symbols (|). Example: <!ATTLIST college grad (1|0) "1">
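The ID, IDREF and IDREFS rules above can be checked mechanically. The sketch below is a hypothetical helper written with Python's standard library; because ElementTree does not read ATTLIST declarations, the caller has to say which attribute names are IDs and which are references (the defaults id, cousin and cousins are assumptions modelled on the person examples). The Name and Nmtoken patterns are ASCII-only approximations of the XML 1.0 productions.

```python
import re
import xml.etree.ElementTree as ET

# ASCII-only approximations of the XML 1.0 Name and Nmtoken productions
# (the real productions also admit many non-ASCII characters).
NAME = re.compile(r"[A-Za-z_:][A-Za-z0-9._:-]*")
NMTOKEN = re.compile(r"[A-Za-z0-9._:-]+")

def check_id_refs(root, id_attr="id", idref_attrs=("cousin",), idrefs_attrs=("cousins",)):
    """Return a list of ID/IDREF/IDREFS problems found under root.
    The attribute names are hypothetical parameters, not read from a DTD."""
    problems = []
    ids = set()
    for elem in root.iter():
        value = elem.get(id_attr)
        if value is None:
            continue
        if not NAME.fullmatch(value):
            problems.append(f"invalid ID {value!r}")    # e.g. starts with a digit
        elif value in ids:
            problems.append(f"duplicate ID {value!r}")  # IDs must be unique
        ids.add(value)
    for elem in root.iter():
        refs = [elem.get(a) for a in idref_attrs if elem.get(a) is not None]
        for a in idrefs_attrs:
            refs.extend((elem.get(a) or "").split())  # IDREFS: whitespace-separated
        for ref in refs:
            if ref not in ids:
                problems.append(f"dangling IDREF {ref!r}")
    return problems

good = ET.fromstring('<family><person id="p1" cousin="p2"/>'
                     '<person id="p2" cousins="p1 p3"/><person id="p3"/></family>')
bad = ET.fromstring('<family><person id="1a"/><person id="p2"/>'
                    '<person id="p2" cousin="p9"/></family>')
print(check_id_refs(good))  # []
print(check_id_refs(bad))
```

The NMTOKEN pattern is included to show the contrast in the text: "1999" is a valid name token but not a valid ID value.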

1.4.4 DTD Entities
Entities in DTDs are storage units. They can also be considered placeholders. Entities are special markups that contain content for insertion into the XML document. An entity’s content could be well-formed XML, normal text, binary data, a database record, and so on. The main purpose of an entity is to hold content, and there is virtually no limit on the type of content an entity can hold.
The general syntax of an entity is as follows:
<!ENTITY entityname [SYSTEM | PUBLIC] entitycontent>
§  ENTITY is the tag name that specifies that this definition will be for an entity.
§  entityname is the name by which the entity will be referred in the XML document.
§  entitycontent is the actual contents of the entity—the data for which the entity is serving as a placeholder.
§  SYSTEM and PUBLIC are optional keywords. Either one can be added to the definition of an entity to indicate that the entity refers to external content.

Entities may either point to internal data or external data. Internal entities represent data that is contained completely within the DTD. External entities point to content in another location via a URL. The type of data to which an external entity can refer is virtually unlimited. An entity is referenced in an XML document by inserting the name of the entity prefixed by & and suffixed by ;. When referenced in this manner, the content of the entity will be placed into the XML document when the document is parsed and validated.
Example:
<?xml version="1.0"?>
<!DOCTYPE library [
<!ENTITY cpy "Copyright 2000">
<!ELEMENT library (book+)>
<!ELEMENT book (title,author,copyright)>
<!ELEMENT title (#PCDATA)>
<!ELEMENT author (#PCDATA)>
<!ELEMENT copyright (#PCDATA)>
]>
<library>
<book>
<title>How to Win Friends</title>
<author>Joe Charisma</author>
<copyright>&cpy;</copyright>
</book>
<book>
<title>Make Money Fast</title>
<author>Jimmy QuickBuck</author>
<copyright>&cpy;</copyright>
</book>
</library>
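Internal entity expansion can be observed with Python's xml.etree.ElementTree, whose underlying expat parser expands entities declared in the internal DTD subset. Expat is non-validating, so the ELEMENT declarations are read but not enforced. The sketch below parses a straight-quoted copy of the library document and collects the copyright text.

```python
import xml.etree.ElementTree as ET

# The library document from the text, with straight quotes
LIBRARY = """<?xml version="1.0"?>
<!DOCTYPE library [
<!ENTITY cpy "Copyright 2000">
<!ELEMENT library (book+)>
<!ELEMENT book (title,author,copyright)>
<!ELEMENT title (#PCDATA)>
<!ELEMENT author (#PCDATA)>
<!ELEMENT copyright (#PCDATA)>
]>
<library>
<book><title>How to Win Friends</title><author>Joe Charisma</author><copyright>&cpy;</copyright></book>
<book><title>Make Money Fast</title><author>Jimmy QuickBuck</author><copyright>&cpy;</copyright></book>
</library>"""

root = ET.fromstring(LIBRARY)
# Each &cpy; reference has been replaced by the entity's content
notices = [c.text for c in root.iter("copyright")]
print(notices)  # ['Copyright 2000', 'Copyright 2000']
```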

Predefined Entities
There are five predefined entities, as shown in the table below. These entities do not have to be declared in the DTD. When an XML parser encounters these entities (unless they are contained in a CDATA section), they will automatically be replaced with the content they represent.

Entity: Content
&amp; represents & (ampersand)
&lt; represents < (less-than sign)
&gt; represents > (greater-than sign)
&quot; represents " (double quotation mark)
&apos; represents ' (apostrophe)

Example:
<icecream>
<flavor>Cherry Garcia</flavor>
<vendor>Ben &amp; Jerry’s</vendor>
</icecream>
In this example, the ampersand in “Ben & Jerry’s” is replaced with the predefined entity for an ampersand (&amp;).
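In practice a serializer writes the predefined entities for you. This Python sketch uses the standard library to show both directions: ElementTree escapes & (and < and >) when writing text, the parser turns &amp; and &apos; back into characters, and xml.sax.saxutils.escape performs the same substitution on plain strings, with extra entities passed in as a dictionary.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

# Writing: the ampersand in element text is serialized as &amp;
vendor = ET.Element("vendor")
vendor.text = "Ben & Jerry's"
serialized = ET.tostring(vendor, encoding="unicode")
print(serialized)  # <vendor>Ben &amp; Jerry's</vendor>

# Reading: predefined entities are replaced by the characters they stand for
parsed = ET.fromstring("<vendor>Ben &amp; Jerry&apos;s</vendor>").text
print(parsed)  # Ben & Jerry's

# escape() handles &, < and > by default; quotes only when asked
escaped = escape("a < b & c")
quoted = escape('say "hi"', {'"': "&quot;"})
print(escaped, quoted)
```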

External Entities
External entities are used to reference external content. XML is incredibly flexible. External entities can contain references to almost any type of data—even other XML documents. One well-formed XML document can contain another well-formed XML document through the use of an external entity reference.
Example:
<?xml version="1.0"?>
<!DOCTYPE employees [
<!ENTITY bob SYSTEM "http://srvr/emps/bob.xml">
<!ENTITY nancy SYSTEM "http://srvr/emps/nancy.xml">
<!ELEMENT employees (clerk+)>
<!ELEMENT clerk (#PCDATA)>
]>
<employees>
<clerk>&bob;</clerk>
<clerk>&nancy;</clerk>
</employees>

Non-Text External Entities and Notations
Some external entities will contain non-text data, such as an image file.
<!ENTITY myimage SYSTEM "myimage.gif" NDATA gif>

The NDATA keyword marks the entity as unparsed: the parser does not read its content but passes the entity name, its system identifier, and its notation on to the application. An unparsed entity cannot be referenced with &name; in element content; instead, it is referred to by name from an attribute of type ENTITY or ENTITIES.
The final part of the declaration, gif, is a reference to a notation. A notation is a special declaration that identifies the format of non-text external data so that the XML application will know how to handle the data. Any time an unparsed (non-text) external entity is declared, a notation identifying its format must be declared and referenced. Notations are declared in the body of the DTD and have the following syntax:
<!NOTATION notationname [SYSTEM | PUBLIC ] dataformat>

Here, notationname is the name by which the notation is referred to in entity declarations and NOTATION attributes. The SYSTEM keyword introduces a system identifier that defines the format of the external data. You could also use the keyword PUBLIC instead of SYSTEM; in that case you must supply a public identifier, optionally followed by a system identifier. dataformat is a reference to a MIME type, ISO standard, or some other location that can provide a definition of the data being referenced.
Example:
<!NOTATION gif SYSTEM "image/gif">
<!ENTITY employeephoto SYSTEM "images/employees/MichaelQ.gif" NDATA gif>
<!ELEMENT employee (name, sex, title, years)>
<!ATTLIST employee pic ENTITY #IMPLIED>
<employee pic="employeephoto">
</employee>

Parameter Entities
A parameter entity is very similar to an internal entity. The main difference is that a parameter entity may only be referenced inside the DTD. Parameter entities are, in effect, entities specifically for DTDs.
Parameter entities can be useful when you have to use a lot of repetitive or lengthy text in a DTD. Use the following syntax for parameter entities:
<!ENTITY % entityname entitycontent>

The syntax for a parameter entity is almost identical to the syntax for a normal, internal entity. However, notice that after the ENTITY keyword there is a space, a percent sign, and another space before entityname. This tells the XML parser that this is a parameter entity that will be used only in the DTD. References to parameter entities begin with % and end with ;.
Example:
<!ENTITY % pc "(#PCDATA)">
<!ELEMENT name %pc;>
<!ELEMENT age %pc;>
<!ELEMENT weight %pc;>
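The substitution a DTD processor performs for %pc; can be sketched in a few lines of Python. The expander below is hypothetical and deliberately tiny: it only understands internal parameter entities whose value is a double-quoted literal, and it ignores the many other rules real DTD processing has (external parameter entities, nested references, restrictions on where references may appear).

```python
import re

# <!ENTITY % name "value"> with a double-quoted literal value (simplified)
DECL = re.compile(r'<!ENTITY\s+%\s+([\w.:-]+)\s+"([^"]*)"\s*>')
REF = re.compile(r"%([\w.:-]+);")

def expand_parameter_entities(dtd):
    """Toy expander: collect parameter entity declarations, remove them,
    and replace each %name; reference with the declared value."""
    entities = dict(DECL.findall(dtd))
    body = DECL.sub("", dtd)
    return REF.sub(lambda m: entities[m.group(1)], body)

DTD = """<!ENTITY % pc "(#PCDATA)">
<!ELEMENT name %pc;>
<!ELEMENT age %pc;>
<!ELEMENT weight %pc;>"""

expanded = expand_parameter_entities(DTD)
print(expanded.strip())
```

An unknown reference raises KeyError, which loosely mirrors a parser reporting an undeclared entity.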

1.4.5 More DTD Directives
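The DTD provides two keywords, INCLUDE and IGNORE, that do just what their names suggest: they mark sections of markup that should either be included in the validation process or ignored. These conditional sections may appear only in the external DTD subset (or in external parameter entities), not in the internal subset.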
The IGNORE Keyword
To comment out a block of declarations, use the IGNORE directive.
Example:
<![IGNORE[
This is the part of the DTD ignored
]]>


The INCLUDE Keyword
The INCLUDE directive marks declarations to be included in the document. It is very similar to the syntax for the IGNORE directive.
Example:
<![INCLUDE[
This is the part of the DTD included
]]>

The INCLUDE directive follows the same basic rules as the IGNORE directive. It may enclose entire declarations but not pieces of declarations. The INCLUDE directive can be useful when you’re in the process of developing a new DTD or adding to an existing DTD.

Comments Within a DTD
Comments can also be added to DTDs. Comments within a DTD are just like comments in HTML and take the following syntax:
<!-- Everything between the opening tag and closing tag is a comment -->


IT6801 SOA notes, Regulation 2013


1.5 XML SCHEMA
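The XML Schema Definition Language solves a number of problems posed by Document Type Definitions: a schema is itself written in XML syntax, supports a rich set of data types such as integers and dates, and is namespace-aware, none of which DTDs offer.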
1.5.1 Creating XML Schemas
One of the first things that comes to mind for most people when authoring an XML schema is the level of complexity that accompanies it. The table below lists every element of the XML Schema Definition Language except the facet elements (such as <enumeration>, <pattern>, and <fractionDigits>) that appear inside <restriction>.
Element name: Description
all: Indicates that the contained elements may appear in any order within a parent element.
any: Indicates that any element within the specified namespace may appear within the parent element’s definition. (An element declared without any type defaults to the related built-in type anyType, which also accepts any content.)
anyAttribute: Indicates that any attribute within the specified namespace may appear within the parent element’s definition.
annotation: Indicates an annotation to the schema.
appinfo: Indicates information that can be used by an application.
attribute: Declares an occurrence of an attribute.
attributeGroup: Defines a group of attributes that can be included within a parent element.
choice: Indicates that only one of the contained elements or groups may appear within a parent element.
complexContent: Defines restrictions and/or extensions to a complexType.
complexType: Defines a complex element’s construction.
documentation: Indicates information to be read by an individual.
element: Declares an occurrence of an element.
extension: Extends the contents of an element (used inside simpleContent or complexContent).
field: Specifies an XPath expression that selects a value used in an identity constraint (key, keyref, or unique).
group: Logically groups a set of elements to be included together within another element definition.
import: Identifies a namespace whose schema elements and attributes can be referenced within the current schema.
include: Indicates that the specified schema should be included in the target namespace of the current schema.
key: Indicates that an attribute or element value is a key within the specified scope.
keyref: Indicates that an attribute or element value should correspond with those of the specified key or unique element.
list: Defines a simpleType element as a list of values of a specified data type.
notation: Contains a notation definition.
redefine: Indicates that simple and complex types, as well as groups and attribute groups from an external schema, can be redefined.
restriction: Defines constraints on a base type for the specified element.
schema: Contains the schema definition.
selector: Specifies an XPath expression that selects a set of elements for an identity constraint.
sequence: Indicates that the elements within the specified group must appear in the exact order in which they appear within the schema.
simpleContent: Defines restrictions and/or extensions of a complexType whose content is text only (a simple type, optionally with attributes).
simpleType: Defines a simple element type.
union: Defines a simpleType element as a collection of values from specified simple data types.
unique: Indicates that an attribute or element value must be unique within the specified scope.

Authoring an XML schema consists of declaring elements and attributes as well as the “properties” of those elements and attributes.

1.5.2 Declaring Attributes
Attributes in an XML document are contained by elements. To indicate that a complex element has an attribute, use the <attribute> element of the XML Schema Definition Language.
Example: Purchase order schema
<xsd:complexType name="ProductType">
<xsd:attribute name="Name" type="xsd:string"/>
<xsd:attribute name="Id" type="xsd:positiveInteger"/>
<xsd:attribute name="Price">
<xsd:simpleType>
<xsd:restriction base="xsd:decimal">
<xsd:fractionDigits value="2"/>
</xsd:restriction>
</xsd:simpleType>
</xsd:attribute>
<xsd:attribute name="Quantity" type="xsd:positiveInteger"/>
</xsd:complexType>
When declaring an attribute, its type should be specified.
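What these attribute types demand of an instance document can be sketched in Python. The helpers below are hypothetical stand-ins for a schema validator, covering only the ProductType constraints: Id and Quantity must be xsd:positiveInteger values, and Price must be an xsd:decimal with at most two fraction digits. Because fractionDigits constrains the value rather than the lexical form, trailing zeros such as 12.500 are accepted.

```python
import re
from decimal import Decimal

# Lexical forms: xsd:decimal has no exponent; xsd:positiveInteger is digits only
DECIMAL_LEXICAL = re.compile(r"[+-]?(\d+(\.\d*)?|\.\d+)")
POSITIVE_INTEGER_LEXICAL = re.compile(r"\+?\d+")

def is_positive_integer(text):
    """xsd:positiveInteger: an integer value of at least 1 (leading zeros allowed)."""
    return POSITIVE_INTEGER_LEXICAL.fullmatch(text) is not None and int(text) >= 1

def is_two_place_decimal(text):
    """xsd:decimal restricted by <xsd:fractionDigits value="2"/>."""
    if DECIMAL_LEXICAL.fullmatch(text) is None:
        return False
    # normalize() strips trailing zeros, so 12.500 counts as 12.5
    exponent = Decimal(text).normalize().as_tuple().exponent
    return max(0, -exponent) <= 2

def check_product(attrs):
    """Return the names of ProductType attributes whose values are invalid."""
    checks = {"Id": is_positive_integer, "Quantity": is_positive_integer,
              "Price": is_two_place_decimal}
    return [name for name, ok in checks.items() if name in attrs and not ok(attrs[name])]

print(check_product({"Name": "Milk", "Id": "42", "Price": "3.49", "Quantity": "2"}))  # []
print(check_product({"Id": "0", "Price": "3.499", "Quantity": "2"}))  # ['Id', 'Price']
```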
Table 1.5.a Some simple XML data types
Data type: Description
anyURI: Represents a Uniform Resource Identifier (URI).
base64Binary: Represents Base64-encoded binary data.
boolean: Represents Boolean values (true, false, 1, and 0).
byte: Represents an integer ranging from -128 to 127. This type is derived from short.
date: Represents a calendar date.
dateTime: Represents a specific time on a specific date.
decimal: Represents a variable-precision decimal number.
double: Represents a double-precision, 64-bit, floating-point number.
duration: Represents a duration of time.
ENTITIES: Represents a set of values of the ENTITY type.
ENTITY: Represents the ENTITY attribute type in XML 1.0. This type is derived from NCName.
float: Represents a single-precision, 32-bit, floating-point number.
gDay: Represents a recurring Gregorian day of the month.
gMonth: Represents a Gregorian month.
gMonthDay: Represents a recurring Gregorian date (month and day).
gYear: Represents a Gregorian year.
gYearMonth: Represents a specific Gregorian month in a specific Gregorian year.
hexBinary: Represents hex-encoded binary data.

Some primitive data types are:
• anyURI
• base64Binary
• boolean
• date
• dateTime
• decimal
• double
• duration

1.5.3 Declaring Elements
Elements within an XML schema can be declared using the <element> element from the XML Schema Definition Language.
Example:
<xsd:element name="PurchaseOrder" type="PurchaseOrderType"/>
<xsd:complexType name="PurchaseOrderType">
<xsd:all>
<xsd:element name="ShippingInformation" type="InfoType"
minOccurs="1" maxOccurs="1"/>
<xsd:element name="BillingInformation" type="InfoType"
minOccurs="1" maxOccurs="1"/>
<xsd:element name="Order" type="OrderType"
minOccurs="1" maxOccurs="1"/>
</xsd:all>
<xsd:attribute name="Tax">
<xsd:simpleType>
<xsd:restriction base="xsd:decimal">
<xsd:fractionDigits value="2"/>
</xsd:restriction>
</xsd:simpleType>
</xsd:attribute>
<xsd:attribute name="Total">
<xsd:simpleType>
<xsd:restriction base="xsd:decimal">
<xsd:fractionDigits value="2"/>
</xsd:restriction>
</xsd:simpleType>
</xsd:attribute>
</xsd:complexType>

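An element’s type is either a simple type or a complex type. It can be named in the type attribute or defined inline with a <simpleType> or <complexType> child element; a <complexType> may in turn use <complexContent> or <simpleContent> to derive its content from another type. The validation requirements for the document will influence the choice for an element’s type.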

The basic construction of an element declaration using the <element> element within the XML Schema Definition Language is as follows:
<element name="" [type=""] [abstract=""] [block=""]
  [default=""] [final=""] [fixed=""] [minOccurs=""]
  [maxOccurs=""] [nillable=""] [ref=""]
  [substitutionGroup=""]/>

The block attribute prevents any element with the specified derivation type from being used in place of the element. The block attribute may contain any of the following values:
§  #all
§  extension
§  restriction
§  substitution

If the value #all is specified within the block attribute, no elements derived from this element declaration may appear in place of this element. A value of extension prevents any element whose definition has been derived by extension from appearing in place of this element. If a value of restriction is assigned, an element derived by restriction from this element declaration is prevented from appearing in place of this element. Finally, a value of substitution indicates that an element derived through substitution cannot be used in place of this element.

The default attribute may only be specified for an element based on a simpleType or whose content is text only. The minOccurs and maxOccurs attributes specify the minimum and maximum number of times this element may appear within a valid XML document; both default to 1.

The nillable attribute indicates whether an explicit nil value can be assigned to the element. If this attribute is omitted, it is assumed to be false. If it is true, an instance of the element may carry the attribute xsi:nil="true" (with the xsi prefix bound to http://www.w3.org/2001/XMLSchema-instance) to state that its value is nil, in which case the element must have no content. The fixed attribute specifies that the element has a constant, predetermined value. This attribute only applies to those elements whose type definitions are based on simpleType or whose content is text only.
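On the instance side, a nil element is marked with the xsi:nil attribute. This Python sketch uses the standard ElementTree module to detect it; the Order, DeliveryDate and Notes names are hypothetical, and DeliveryDate would have to be declared with nillable="true" for a schema validator to accept the document. The sketch only reads the attribute and does not validate anything.

```python
import xml.etree.ElementTree as ET

XSI = "http://www.w3.org/2001/XMLSchema-instance"
XSI_NIL = "{" + XSI + "}nil"  # ElementTree names namespaced attributes {uri}local

# Hypothetical instance document
doc = ('<Order xmlns:xsi="' + XSI + '">'
       '<DeliveryDate xsi:nil="true"/><Notes>Leave at door</Notes></Order>')
root = ET.fromstring(doc)

def is_nil(elem):
    # xsi:nil is an xsd:boolean, so both "true" and "1" mean nil
    return elem.get(XSI_NIL, "false").strip() in ("true", "1")

status = {child.tag: is_nil(child) for child in root}
print(status)  # {'DeliveryDate': True, 'Notes': False}
```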

1.5.4 Declaring Complex Elements
Complex types are declared with the <complexType> element. The PurchaseOrderType definition in Section 1.5.3 is an example: it uses an <all> group to declare three child elements and declares two attributes, Tax and Total, whose decimal values are restricted to two fraction digits.
The basic syntax for the <complexType> element is as follows:
<xsd:complexType name="" [abstract=""] [block=""]
[final=""] [mixed=""]/>
The abstract attribute indicates whether an element may define its content directly from this type definition or it must define its content from a type derived from this type definition. If this attribute is true, an element must define its content from a derived type definition. If this attribute is omitted or its value is false, an element may define its content directly based on this type definition.
In the XML Schema 1.0 Recommendation, <complexType> itself has no base attribute: the type that a complex type is derived from is named by the base attribute of an <extension> or <restriction> element inside <complexContent> or <simpleContent>. The block attribute indicates what types of derivation are prevented for this type definition. This attribute can contain any of the following values:
§  #all
§  extension
§  restriction
A value of #all prevents all complex types derived from this type definition from being used in place of this type definition. A value of extension prevents complex type definitions derived through extension from being used in place of this type definition.
Assigning a value of restriction prevents a complex type definition derived through restriction from being used in place of this type definition. If this attribute is omitted, any type definition derived from this type definition may be used in place of this type definition.
The mixed attribute indicates whether character data is permitted to appear between the child elements of this type definition. If this attribute is false or is omitted, no character data may appear. If the type definition contains a simpleContent element, this value must be false. If the complexContent element appears as a child element, the mixed attribute on the complexContent element can override the value specified in the current type definition.
A <complexType> element in the XML Schema Definition Language may contain only one of the following elements:
§  all
§  choice
§  complexContent
§  group
§  sequence
§  simpleContent

1.5.5 Declaring Simple Types
These element type definitions support an element based on the simple XML data types listed in Table 1.5.a or any simpleType declaration within the current schema.
Example:
<xsd:simpleType name="PaymentMethodType">
<xsd:restriction base="xsd:string">
<xsd:enumeration value="Check"/>
<xsd:enumeration value="Cash"/>
<xsd:enumeration value="Credit Card"/>
<xsd:enumeration value="Debit Card"/>
<xsd:enumeration value="Other"/>
</xsd:restriction>
</xsd:simpleType>

The basic syntax for defining a simpleType element definition is as follows:
<xsd:simpleType name="">
<xsd:restriction base=""/>
</xsd:simpleType>

The base attribute may contain any simple XML data type listed in Table 1.5.a or any simpleType declared within the schema. The value of this attribute determines the type of data the simple type may contain. A simpleType may only contain a value, not child elements or attributes.

Two other methods are available to an XML schema author to “refine” a simple type definition: <list> and <union>. The <list> element allows an element or attribute based on the type definition to contain a list of values of a specified simple data type. The <union> element allows you to combine two or more simple type definitions to create a collection of values.
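The three ways of deriving a simple type (restriction, list, and union) can be mimicked with small Python predicates. These are hypothetical stand-ins, not a schema processor: the payment methods come from PaymentMethodType above, while the integer list and the size union (an integer or one of small, medium, large) are invented for illustration.

```python
import re

# 1. Restriction by enumeration (PaymentMethodType from the text)
PAYMENT_METHODS = {"Check", "Cash", "Credit Card", "Debit Card", "Other"}

def is_payment_method(value):
    return value in PAYMENT_METHODS  # enumeration values are matched exactly

# 2. A list type such as <xsd:list itemType="xsd:integer">: items are
#    whitespace-separated, which is why "Credit Card" could not be one list item.
def parse_integer_list(value):
    return [int(item) for item in value.split()]

# 3. A hypothetical union of an integer type and a size enumeration
SIZE_WORDS = {"small", "medium", "large"}

def is_size(value):
    return re.fullmatch(r"[+-]?\d+", value) is not None or value in SIZE_WORDS

print(is_payment_method("Cash"), is_payment_method("cash"))  # True False
print(parse_integer_list(" 1  2\t3\n"))                      # [1, 2, 3]
print(is_size("42"), is_size("medium"), is_size("huge"))     # True True False
```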

1.5.6 Anonymous Type Declarations
An anonymous type declaration defines a type inline, inside the element or attribute declaration that uses it, without giving the type a name. Because it has no name, an anonymous type cannot be reused elsewhere in the schema.
<xsd:complexType name="InfoType">
<xsd:sequence>
<xsd:element name="Name" minOccurs="1" maxOccurs="1">
<xsd:simpleType>
<xsd:restriction base="xsd:string"/>
</xsd:simpleType>
</xsd:element>
<xsd:element name="Address" type="AddressType" minOccurs="1" maxOccurs="1"/>
<xsd:choice minOccurs="1" maxOccurs="1">
<xsd:group ref="BillingInfoGroup"/>
<xsd:group ref="ShippingInfoGroup"/>
</xsd:choice>
</xsd:sequence>
</xsd:complexType>

This section defines the type definition for InfoType. If you look closely, you’ll see the declaration of a <Name> element that does not have a type attribute specified. Instead, the <element> element, itself, contains a <simpleType> element without a name attribute specified. This is known as an “anonymous” type definition.

1.5.7 Annotating Schemas
The XML Schema Definition Language defines three new elements to add annotations to an XML schema:
• <annotation>
• <appinfo>
• <documentation>

The <annotation> element contains the <appinfo> and <documentation> elements. In other words, you cannot use the <appinfo> and <documentation> elements by themselves; they must be contained within an <annotation> element.
<xsd:annotation>
<xsd:documentation>
Purchase order schema for an online grocery store.
</xsd:documentation>
</xsd:annotation>

In the preceding example, the <annotation> and <documentation> elements help to identify the purpose of this particular XML schema.

For the <documentation> element, the information it contains is meant to be read by users, whereas the information contained within an <appinfo> element is meant to be read and utilized by applications.

1.5.8 Model Groups
A model group, at least in terms of a schema definition, is a logically grouped set of elements. A model group within the XML Schema Definition Language consists of a “compositor” and a list of “particles” (or element declarations). A model group can be constructed using one of the following XML Schema Definition elements:
§  <all>
§  <choice>
§  <sequence>

Here’s the basic syntax for the <group> element:
<group name="" [maxOccurs=""] [minOccurs=""] [ref=""]>
.
.
.
</group>
By default, the maxOccurs and minOccurs attributes are set to 1. The ref attribute is used after you have defined the <group> element and you wish to reference it, as the following example shows:
<xsd:group name="exampleGroup">
<xsd:all>
<xsd:element name="Element1" type="xsd:string"/>
<xsd:element name="Element2" type="xsd:string"/>
<xsd:element name="Element3" type="xsd:string"/>
</xsd:all>
</xsd:group>
<xsd:element name="ParentElement">
<xsd:complexType>
<xsd:group ref="exampleGroup"/>
</xsd:complexType>
</xsd:element>

All Groups
The <all> element indicates that the elements declared within it may appear in any order within the parent element; each of them may appear at most once.

Sequences
The <sequence> element in the XML Schema Definition Language requires the elements contained within it to appear in the parent element in the same order in which they are declared.
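The difference between the two compositors can be sketched in Python with the exampleGroup element names. The two helper functions are hypothetical and assume the default minOccurs and maxOccurs of 1 on every child: <all> accepts each declared child exactly once in any order, while <sequence> also fixes the order.

```python
import xml.etree.ElementTree as ET
from collections import Counter

GROUP = ["Element1", "Element2", "Element3"]  # names from exampleGroup

def satisfies_all(children, declared):
    # <xsd:all>: every declared element exactly once, in any order
    return Counter(children) == Counter(declared)

def satisfies_sequence(children, declared):
    # <xsd:sequence>: every declared element once, in declaration order
    return list(children) == list(declared)

parent = ET.fromstring(
    "<ParentElement><Element3>c</Element3><Element1>a</Element1>"
    "<Element2>b</Element2></ParentElement>")
names = [child.tag for child in parent]

print(satisfies_all(names, GROUP))       # True: order does not matter
print(satisfies_sequence(names, GROUP))  # False: Element3 comes first
```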

Attribute Groups
Just as you can logically group a set of elements together using the <group> element within the XML Schema Definition Language, you can create a logical group of attributes to do the same thing. Here’s the basic syntax for the <attributeGroup> element:
<attributeGroup [name=""] [ref=""]>
<attribute …/>
<attribute …/>
.
.
.
</attributeGroup>

1.5.9 Targeting Namespaces
Namespaces allow us to distinguish element declarations and type definitions of one schema from another. We can assign an intended namespace for an XML schema by using the targetNamespace attribute on the <schema> element. By assigning a target namespace for the schema, we indicate that an XML document whose elements are declared as belonging to the schema’s namespace should be validated against the XML schema.
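Once a schema has a target namespace, instance elements belong to that namespace, and software must match them by namespace URI as well as by local name. This Python sketch shows how the standard ElementTree module reports such names; the namespace URI is a hypothetical stand-in for a schema's targetNamespace. Note that the unprefixed Tax attribute is not in the default namespace, which matches the schema default of unqualified local attributes.

```python
import xml.etree.ElementTree as ET

# Hypothetical namespace URI standing in for a schema's targetNamespace
PO_NS = "http://example.com/purchase-order"

doc = f'<PurchaseOrder xmlns="{PO_NS}" Tax="1.20"><Order/></PurchaseOrder>'
root = ET.fromstring(doc)

print(root.tag)  # {http://example.com/purchase-order}PurchaseOrder

# Searches must use the namespace: a prefix map or the {uri}local form
order = root.find("po:Order", {"po": PO_NS})
missing = root.find("Order")  # None: no element named Order without a namespace

# Unprefixed attributes do not inherit the default namespace
tax = root.get("Tax")
```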




1.6 X-Files
1.6.1 XPath
The XML Path Language (XPath) is a standard for creating expressions that can be used to find specific pieces of information within an XML document. XPath expressions are used by both XSLT (for which XPath provides the core functionality) and XPointer to locate a set of nodes.

XPath expressions have the ability to locate nodes based on the nodes’ type, name, or value or by the relationship of the nodes to other nodes within the XML document. In addition to being able to find nodes based on these criteria, an XPath expression can also return any of the following:
§  A node set
§  A Boolean value
§  A string value
§  A numeric value

1.6.1.1 Operators and Special Characters
XPath expressions are composed using a set of operators and special characters, each with its own meaning. Table 1.6.a lists the various operators and special characters used within the XML Path Language.
Table 1.6.a Operators and Special Characters for the XML Path Language
Operators and Special Characters
Description
/
Selects the children from the node set on the left side of this character
//
Specifies that the matching node set should be located at any level within the XML  document
.
Specifies the current context should be used
*
A wildcard character that selects all elements or attributes regardless of name
@
Selects an attribute
:
Namespace separator
()
Indicates a grouping within an XPath expression
[expression]
Indicates a filter expression
[n]
Indicates that the node with the specified index should be selected (positions start at 1)
+
Addition operator
-
Subtraction operator
div
Division operator
*
Multiplication operator
mod
Returns the remainder of a division operation

The priority for evaluating XPath expressions is as follows:
1. Grouping
2. Filters
3. Path operations

1.6.1.2 XPath Syntax
The XML Path Language provides a declarative notation, termed a pattern, used to select the desired set of nodes from XML documents. Each pattern describes a set of matching nodes to select from a hierarchical XML document. Each pattern describes a “navigation” path to the desired set of nodes similar to the Uniform Resource Identifier (URI) syntax.

A “location path” is needed to locate the result nodes. These location paths select the resulting node set relative to the current context. A location path is, itself, made up of one or more location steps. Each step consists of three pieces:
• An axis
• A node test
• A predicate

Syntax for an XPath expression
axis::nodetest[predicate]
For example, child::book[2] selects the second book child of the context node.

Axes
The axis portion of the location step identifies the hierarchical relationship for the desired nodes from the current context.

TABLE 1.6.b XPath Axes for a Location Step
Axis: Description
ancestor: Selects the ancestors of the current context node: the parent node, the parent’s parent node, and so on up to the root node.
ancestor-or-self: Selects the ancestors of the current context node together with the context node itself.
attribute: Selects the attributes of the current context node.
child: Selects the immediate children of the current context node.
descendant: Selects all descendants of the current context node: its children, their children, and so on.
descendant-or-self: Selects the descendants of the current context node together with the context node itself.
following: Selects all nodes in the same document that come after the current context node in document order, excluding its descendants (and attribute and namespace nodes).
following-sibling: Selects all the following siblings of the current context node.
namespace: Selects the namespace nodes of the current context node, one for each namespace in scope.
parent: Selects the parent of the current context node.
preceding: Selects all nodes in the document that come before the current context node in document order, excluding its ancestors (and attribute and namespace nodes).
preceding-sibling: Selects the siblings of the current context node that appear before it.
self: Selects the current context node.
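Python's standard ElementTree module supports only a small, abbreviated subset of XPath: no explicit axis names, but child steps, // (descendant-or-self), .. (parent), attribute predicates, and 1-based positional predicates. The sketch below uses that subset on a hypothetical library document to show several of the axes above in their abbreviated form.

```python
import xml.etree.ElementTree as ET

LIBRARY = """<library>
  <book id="b1"><title>How to Win Friends</title><price>12</price></book>
  <book id="b2"><title>Make Money Fast</title><price>25</price></book>
  <magazine id="m1"><title>XML Monthly</title></magazine>
</library>"""
root = ET.fromstring(LIBRARY)  # context node: <library>

# child axis (abbreviated):            book
books = root.findall("book")
# descendant-or-self (abbreviated):    .//title
titles = [t.text for t in root.findall(".//title")]
# attribute test inside a predicate:   book[@id='b2']/title
second = root.find("book[@id='b2']/title").text
# positional predicate (1-based):      book[1]/title
first = root.find("book[1]/title").text
# parent axis (abbreviated):           .//title/..
parents = [p.tag for p in root.findall(".//title/..")]

print(len(books), titles, second, first, parents)
```

For the full XPath 1.0 language, including the named axes, a dedicated XPath engine is needed; ElementTree's subset is convenient for simple lookups only.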

Node Tests
The node test portion of a location step indicates the type of node desired for the results. Every axis has a principal node type: if an axis can contain elements, its principal node type is element; otherwise, it is the type of node the axis can contain. For instance, the principal node type of the attribute axis is attribute, and that of the namespace axis is namespace.

In addition to specifying an actual node name, other node tests are available to select the desired nodes. Here’s a list of these node tests:
• comment()
• node()
• processing-instruction()
• text()

Predicates
The predicate portion of a location step filters the node set selected by the axis and node test to create a new node set. How positions are counted inside a predicate depends on the direction of the axis. A forward axis (such as child or following) contains only the context node or nodes that follow it in document order, so positions are counted forward. A reverse axis (such as ancestor or preceding) contains only the context node or nodes that precede it, so positions are counted backward from the context node.

A predicate within a location step may contain an expression that, when evaluated, results in a Boolean (or logical) value of either true or false. Such expressions commonly use the operators listed in Table 1.6.c. A numeric predicate such as [2] is shorthand for [position()=2].

TABLE 1.6.c Boolean Operators and Their Respective Descriptions
Operator   Description
>          Greater than
>=         Greater than or equal to
<          Less than
<=         Less than or equal to
=          Equal to
!=         Not equal to
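Python's ElementTree implements only a small XPath subset whose predicates cannot use >, >=, < or <=, so the sketch below applies a predicate such as Person[Age > 18] by hand. It assumes hypothetical sample data and a simplified number conversion; the predicate and to_number helpers are illustrative names only. As in XPath 1.0, comparing a node set with a number is true if any node in the set compares true.

```python
import math
import operator
import xml.etree.ElementTree as ET

# The comparison operators of Table 1.6.c mapped to Python functions.
OPS = {
    ">": operator.gt, ">=": operator.ge,
    "<": operator.lt, "<=": operator.le,
    "=": operator.eq, "!=": operator.ne,
}

# Hypothetical sample data.
people = ET.fromstring(
    "<People>"
    "<Person><Name>Dillon Larsen</Name><Age>30</Age></Person>"
    "<Person><Name>Madi Larsen</Name><Age>6</Age></Person>"
    "</People>"
)


def to_number(text):
    # Rough stand-in for XPath number(): text that is not a number becomes NaN.
    try:
        return float(text.strip())
    except ValueError:
        return math.nan


def predicate(nodes, child, op, value):
    """Keep nodes for which child OP value holds, e.g. Person[Age > 18]."""
    compare = OPS[op]
    return [
        node for node in nodes
        # True if ANY matching child's string-value compares true.
        if any(compare(to_number("".join(c.itertext())), value)
               for c in node.findall(child))
    ]


adults = predicate(people.findall("Person"), "Age", ">", 18)
print([p.findtext("Name") for p in adults])  # ['Dillon Larsen']
```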

XPath Functions
XPath functions are used within XPath expressions and fall into four main groups:
• Boolean
• Node set
• Number
• String
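ElementTree does not implement these XPath functions, so the sketch below writes one example from each group as Python over hypothetical sample data: count() (node set), sum() (number), substring() (string) and not() (Boolean). The helper names are illustrative. The substring() helper follows the XPath 1.0 rule that character positions start at 1 and that the start and length arguments are rounded.

```python
import math
import xml.etree.ElementTree as ET

# Hypothetical sample data.
people = ET.fromstring(
    "<People>"
    "<Person><Name>Dillon Larsen</Name><Age>30</Age></Person>"
    "<Person><Name>Madi Larsen</Name><Age>6</Age></Person>"
    "</People>"
)
persons = people.findall("Person")


def string_value(element):
    # XPath string-value of an element: all descendant text, concatenated.
    return "".join(element.itertext())


def xpath_round(x):
    # XPath round(): nearest integer, with halves rounded toward positive infinity.
    return math.floor(x + 0.5)


def xpath_substring(s, start, length):
    # XPath substring() is 1-based: it keeps the characters at positions p with
    # round(start) <= p < round(start) + round(length).
    first = xpath_round(start)
    stop = first + xpath_round(length)
    return "".join(ch for p, ch in enumerate(s, start=1) if first <= p < stop)


# Node-set function: count(/People/Person)
print(len(persons))                                                  # 2
# Number function: sum(/People/Person/Age)
print(sum(float(string_value(a)) for a in people.iter("Age")))       # 36.0
# String function: substring(/People/Person[1]/Name, 1, 6)
print(xpath_substring(string_value(persons[0].find("Name")), 1, 6))  # Dillon
# Boolean function: not(count(/People/Person) > 2)
print(not len(persons) > 2)                                          # True
```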

1.6.2 XPointer
The XML Pointer Language (XPointer), a W3C specification, builds on the XPath specification. XPointer provides two additional node tests:
• point()
• range()

These two additional node tests correspond to the new functionality added by XPointer. For this new functionality to work correctly, the XPointer specification added the concept of a location within an XML document. Within XPointer, a location can be an XPath node, a point, or a range.

For an XPath expression, the result from a location step is known as a node set; for an XPointer expression, the result is known as a location set. Because an XPointer expression can yield a result consisting of points or ranges, the idea of the node set had to be extended to include these types. Therefore, to prevent confusion, the results of an XPointer expression are referred to as location sets.
TABLE 1.6.d Some XPointer Functions That Return Location Sets
Function   Description
id()       Selects all nodes with the specified ID
root()     Selects the root element as the only location in a location set
here()     Selects the current element location in a location set
origin()   Selects the current element location for a node using an out-of-line link

The id() function works exactly the same as the id() function in an XPath expression. The root() function works just like the / character: it refers to the root of an XML document.
The next two functions, here() and origin(), are interesting functions in their own right. The here() function, as indicated, refers to the current element. Because an XPointer expression can be located in a text node or in an attribute value, this function could be used to refer to the current element rather than simply the current node. The origin() function works much the same as the here() function, except that it refers to the originating element. The key idea here is that the originating element does not need to be located within the same document as the resulting location set.

1.6.2.1 Points
Many times a link from one XML document into another must locate a specific point within the target document. A node point could be considered to be the gap between the child nodes of a container node. Two different types of points can be represented using XPointer points:
• Node points
• Character points

When the container node is a text node, the index position counts characters, and such points are referred to as character points. An index of 0 places the point immediately before the first character in the text string. Conceptually, a character point represents the gap between two characters of a text string, just as a node point represents the gap between two child nodes.

1.6.2.2 Ranges
An XPointer range defines just that: a range consisting of a start point and an endpoint. A range contains the XML between the start point and endpoint, but it does not have to consist of neat subtrees of an XML document; it can extend over multiple branches. The only criteria are that the start point and endpoint must be valid points and that the start point must not come after the endpoint in document order.

A range can be specified by joining a start location and an end location with the keyword to, together with functions such as start-point() and end-point(). For instance, the following expression specifies a range beginning immediately before the first character in the <Name> element for Dillon Larsen and ending after the ninth character in that element:
/People/Person[1]/Name/text()start-point()[position()=0] to
/People/Person[1]/Name/text()start-point()[position()=9]
In this example, two character points (points inside the text node of <Name>) are used as the starting and ending points for the range.
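Because the text of a <Name> element is a single text node, the range in the example reduces to taking the characters between two character points. The sketch below models that in Python, assuming both points sit in the same text node (real XPointer ranges can span many nodes); the char_range helper is hypothetical. A character point maps naturally onto a Python slice index, since both denote the gap before a character.

```python
def char_range(text, start, end):
    """Return the text between two character points of one text node.

    A character point is a gap index: 0 sits immediately before the first
    character and len(text) sits after the last one.
    """
    for point in (start, end):
        if not 0 <= point <= len(text):
            raise ValueError(f"character point {point} is outside the text node")
    if start > end:
        raise ValueError("the start point must not come after the end point")
    return text[start:end]


# String value of /People/Person[1]/Name/text() in the hypothetical sample data.
name_text = "Dillon Larsen"
print(char_range(name_text, 0, 9))        # Dillon La
print(repr(char_range(name_text, 6, 6)))  # '' (a collapsed range)
```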
TABLE 1.6.e XPointer Range Functions
Function        Description
end-point()     Selects a location set consisting of the endpoints of the locations in the argument
range-inside()  Selects, for each location in the argument, a range covering that location's contents
range-to()      Selects, for each context location, a range from its start point to the end point of the location given as the argument
start-point()   Selects a location set consisting of the start points of the locations in the argument

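The string-range() function searches the string-value of each location in the location-set argument for the given string and returns a range for each non-overlapping match. The optional index (default 1) gives the position of the range's first character relative to the start of the match, and the optional length gives the number of characters in the range; by default the range extends to the end of the match. The general syntax for string-range() is as follows:
string-range(location-set, string, [index, [length]])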
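As an illustration of the matching rules, the sketch below applies string-range() logic to one string-value and returns each match as a pair of character points. It is a simplification under stated assumptions: a single string instead of a location set, and a non-empty search string (the empty-string case is not modelled). The string_range helper and the sample text are hypothetical.

```python
def string_range(text, search, index=1, length=None):
    """Sketch of XPointer string-range() over one string-value.

    Returns (start, end) character-point pairs, one per non-overlapping match.
    index is 1-based relative to the start of the match; without length the
    range runs to the end of the match.
    """
    if not search:
        raise ValueError("this sketch does not handle an empty search string")
    ranges = []
    pos = text.find(search)
    while pos != -1:
        start = pos + index - 1
        end = pos + len(search) if length is None else start + length
        ranges.append((start, end))
        # Continue after the whole match so that matches never overlap.
        pos = text.find(search, pos + len(search))
    return ranges


text = "Dillon Larsen and Madi Larsen"
for start, end in string_range(text, "Larsen"):
    print(start, end, repr(text[start:end]))
```

A call such as string_range("Thomas Pynchon", "Thomas Pynchon", 8, 0) gives a collapsed range just before the "P", which is how an index with a length of 0 can mark a single point.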

1.6.3 XLink
The anchor element, <a>, within HTML indicates a link to another resource on an HTML page. This could be a location within the same document or a document located elsewhere. In HTML terms, the anchor element creates a hyperlink to another location. The hyperlink can either appear as straight text, a clickable image, or a combination of both.

The XML Linking Language creates a link to another resource through the use of attributes specified on elements, not through the actual elements themselves. The XML Linking Language specification supports the attributes listed in Table 1.6.f.

Table 1.6.f. XLink Attributes
Attribute       Description
xlink:type      This attribute must be specified and indicates what type of XLink element is represented or defined.
xlink:href      Contains the information (a URI reference) needed to locate the desired resource.
xlink:role      Identifies, by URI reference, a resource that describes the function of the link or of the resource the attribute appears on.
xlink:arcrole   Identifies, by URI reference, a resource that describes the function of an arc, that is, the relationship of one resource to another.
xlink:title     Describes the meaning of the link or resource in a human-readable way.
xlink:show      Indicates how the linked resource should be displayed.
xlink:actuate   Specifies when to load the linked resource.
xlink:label     Gives a name to a resource or locator so that arcs can refer to it.
xlink:from      Identifies the starting resource (by label).
xlink:to        Identifies the ending resource (by label).

The xlink:type attribute must contain one of the following values:
• simple
• extended
• locator
• arc
• resource
• title
• none

A value of simple creates a simple link between resources. Indicating a value of extended creates an extended link. A value of locator identifies a remote resource that participates in an extended link. A value of arc defines traversal rules between the labeled resources of an extended link, which may yield several traversal paths. A value of resource identifies a local resource that participates in an extended link. A value of title provides a human-readable title for an extended link. By specifying a value of none for the xlink:type attribute, the element has no XLink meaning, and no other XLink-related content or attributes have any relationship to it. For all intents and purposes, a value of none removes the ability to link to another resource from an element.

The xlink:role attribute specifies the function of the link. This attribute may only be used for the following XLink types:
• extended
• simple
• locator
• resource

The xlink:arcrole attribute may only be used with two types of XLinks:
• arc
• simple

The xlink:title attribute is completely optional and is provided so that people can make sense of a link and document it. If the xlink:title attribute is specified, it should contain a string describing the resource.

The xlink:show attribute is an optionally specified attribute for a link for the simple and arc XLink types and will accept the following values:
• new
• replace
• embed
• other
• none

The xlink:actuate attribute is used to indicate when the linked resource should be loaded. This attribute will accept the following values:
• onLoad
• onRequest
• other
• none

The xlink:label attribute is used to name resource-type and locator-type elements. These names are then used as values of the xlink:from and xlink:to attributes to indicate the starting and ending resources for an arc.

1.6.3.1 Simple Links
A simple link combines the functionality provided by the different pieces of an extended link into a shorthand notation. A simple link consists of an xlink:type attribute with a value of simple and, optionally, an xlink:href attribute with a specified value. A simple link may have any content, or even no content; it is up to the application to provide some means of generating a traversal request for the link. If no target resource is specified with the xlink:href attribute, the link is simply considered “dead” and is not traversable.

A simple link always links exactly two resources: one local and one remote, with a single arc from the local resource to the remote one. Therefore, if something more complex must be handled, an extended link is necessary.
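A small sketch of reading simple links with Python's ElementTree, assuming the standard XLink namespace name http://www.w3.org/1999/xlink and a hypothetical document with one traversable and one dead link. The xl and simple_links helpers are illustrative; ElementTree stores a namespaced attribute under the key "{namespace}local-name", so the xlink: prefix itself never appears in the lookups.

```python
import xml.etree.ElementTree as ET

XLINK = "http://www.w3.org/1999/xlink"  # the XLink namespace name

# Hypothetical document with one traversable and one "dead" simple link.
doc = ET.fromstring(
    '<Person xmlns:xlink="http://www.w3.org/1999/xlink">'
    '<Homepage xlink:type="simple" xlink:href="http://example.com/dillon"'
    ' xlink:show="new" xlink:actuate="onRequest">Dillon Larsen</Homepage>'
    '<Draft xlink:type="simple">No target yet</Draft>'
    "</Person>"
)


def xl(element, name):
    # Look up an attribute in the XLink namespace, e.g. xlink:href.
    return element.get(f"{{{XLINK}}}{name}")


def simple_links(root):
    """Return (tag, href, show, actuate) for each simple-type element."""
    return [
        (el.tag, xl(el, "href"), xl(el, "show"), xl(el, "actuate"))
        for el in root.iter()
        if xl(el, "type") == "simple"
    ]


for tag, href, show, actuate in simple_links(doc):
    status = "dead link" if href is None else f"links to {href}"
    print(tag, status, show, actuate)
```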

1.6.3.2 Extended Links
Within the XML Linking Language, extended links give you the ability to specify relationships between an unlimited number of resources, both local and remote. Local resources are part of the actual extended link, whereas remote resources identify external resources to the link. An out-of-line link is created when there are no local resources at all for a link.
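The sketch below shows how labels and arcs combine in an extended link: it groups locator and resource elements by xlink:label, then expands each arc's xlink:from and xlink:to into concrete traversals. It assumes hypothetical element names and file names, and it treats a missing xlink:from or xlink:to as standing for every label in the link, following the XLink rule for omitted values. Because the sample has only locators, it is an out-of-line link.

```python
import xml.etree.ElementTree as ET

XLINK = "http://www.w3.org/1999/xlink"  # the XLink namespace name

# Hypothetical out-of-line extended link: every participating resource is remote.
doc = ET.fromstring(
    '<Family xmlns:xlink="http://www.w3.org/1999/xlink" xlink:type="extended">'
    '<Member xlink:type="locator" xlink:href="dillon.xml" xlink:label="parent"/>'
    '<Member xlink:type="locator" xlink:href="madi.xml" xlink:label="child"/>'
    '<Member xlink:type="locator" xlink:href="cade.xml" xlink:label="child"/>'
    '<Go xlink:type="arc" xlink:from="parent" xlink:to="child"/>'
    "</Family>"
)


def xl(element, name):
    # ElementTree stores namespaced attributes as "{namespace}local-name".
    return element.get(f"{{{XLINK}}}{name}")


def traversals(ext):
    """Return (start, end) pairs for every traversal the arcs of one extended link allow."""
    by_label = {}
    for el in ext:
        kind, label = xl(el, "type"), xl(el, "label")
        if kind in ("locator", "resource") and label is not None:
            # Locators point at remote resources; a local resource is its own content.
            target = xl(el, "href") if kind == "locator" else (el.text or "")
            by_label.setdefault(label, []).append(target)
    every = [t for targets in by_label.values() for t in targets]

    pairs = []
    for arc in ext:
        if xl(arc, "type") != "arc":
            continue
        # A missing xlink:from or xlink:to stands for all labels in the link.
        starts = by_label.get(xl(arc, "from"), []) if xl(arc, "from") else every
        ends = by_label.get(xl(arc, "to"), []) if xl(arc, "to") else every
        pairs.extend((s, e) for s in starts for e in ends)
    return pairs


print(traversals(doc))  # [('dillon.xml', 'madi.xml'), ('dillon.xml', 'cade.xml')]
```

Two locators sharing the label child let a single arc produce two traversals, which is something a simple link cannot express.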

Two marks
1. What are the types of XML document?
• well-formed XML document
• well-formed and valid XML document
• use-case document

2. Write about well-formed XML documents and valid XML documents.
An XML document that follows the syntax rules of XML is a well-formed document. A valid XML document must be well formed and, in addition, must conform to a document type definition (DTD).
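Python's standard-library parsers check well-formedness but do not validate against a DTD (validation needs a validating parser such as the third-party lxml package). A minimal sketch of a well-formedness check, with a hypothetical is_well_formed helper:

```python
import xml.etree.ElementTree as ET


def is_well_formed(text):
    """Return True if text parses as XML (well-formedness only, no DTD validation)."""
    try:
        ET.fromstring(text)
    except ET.ParseError:
        return False
    return True


print(is_well_formed("<message>Let the good times roll!</message>"))  # True
# Tag names are case-sensitive, so this end tag does not match its start tag.
print(is_well_formed("<message>Let the good times roll!</Message>"))  # False
```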

3. What are the general forms of Document Type Declaration?
<!DOCTYPE NAME SYSTEM "file">
<!DOCTYPE NAME [ ]>
<!DOCTYPE NAME SYSTEM "file" [ ]>

4. What is the use of XML Namespace?
XML Namespaces provide a method to avoid element name conflicts. Namespaces use a colon-delimited prefix to associate external semantics with elements that can be identified via a Uniform Resource Identifier (URI).

5. What are the methods used for declaring namespaces?
Within an XML document, namespaces can be declared using one of two methods:
• default declaration
• explicit declaration
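Both declaration styles can be seen in the sketch below, which uses Python's ElementTree and hypothetical namespace names (urn:example:books and urn:example:prices). The default declaration (xmlns="...") places catalog, book and title in the books namespace; the explicit declaration (xmlns:p="...") applies only to names written with the p: prefix. ElementTree reports each name as {URI}local-name, so a lookup can use any prefix as long as it maps to the same URI.

```python
import xml.etree.ElementTree as ET

doc = ET.fromstring(
    '<catalog xmlns="urn:example:books" xmlns:p="urn:example:prices">'
    "<book><title>Sample</title><p:price>10</p:price></book>"
    "</catalog>"
)

print(doc.tag)  # {urn:example:books}catalog

# Prefixes used for lookup need not match the document's; only the URIs matter.
ns = {"b": "urn:example:books", "pr": "urn:example:prices"}
print(doc.find("b:book/b:title", ns).text)   # Sample
print(doc.find("b:book/pr:price", ns).text)  # 10
```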

6. Write about internal DTD. Give example.
If the DTD is declared inside the XML file, it must be wrapped inside the <!DOCTYPE> definition.
<?xml version="1.0"?>
<!DOCTYPE message [
<!ELEMENT message (#PCDATA)>
]>
<message>
Let the good times roll!
</message>

7. Give an example for External DTD.
If the DTD is declared in an external file, the <!DOCTYPE> definition must contain a reference to the DTD file.
<?xml version="1.0"?>
<!DOCTYPE message SYSTEM "message.dtd">
<message>
Let the good times roll!
</message>
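The difference between the internal and external forms is visible in the DOCTYPE declaration itself. The sketch below reads it with Python's non-validating expat parser (xml.parsers.expat), which reports the document type name, the system identifier and whether an internal subset is present; by default expat does not fetch the external message.dtd file. The doctype_info helper is hypothetical.

```python
import xml.parsers.expat


def doctype_info(text):
    """Report the DOCTYPE name, system identifier and whether an internal subset exists."""
    info = {}
    parser = xml.parsers.expat.ParserCreate()

    def on_doctype(name, system_id, public_id, has_internal_subset):
        info.update(name=name, system_id=system_id,
                    internal_subset=bool(has_internal_subset))

    parser.StartDoctypeDeclHandler = on_doctype
    parser.Parse(text, True)
    return info


internal = """<?xml version="1.0"?>
<!DOCTYPE message [
<!ELEMENT message (#PCDATA)>
]>
<message>Let the good times roll!</message>"""

external = """<?xml version="1.0"?>
<!DOCTYPE message SYSTEM "message.dtd">
<message>Let the good times roll!</message>"""

print(doctype_info(internal))
print(doctype_info(external))
```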

8. What is a DTD element declaration? Write its syntax.
All elements in a valid XML document are defined with an element declaration in the DTD. An element declaration defines the name and all allowed contents of an element. Each element in the DTD should be defined with the following syntax:
<!ELEMENT elementname rule >

9. Define content rule for DTD element.
The content rules for elements deal with the actual data that defined elements may contain. These rules include the ANY rule, the EMPTY rule, and the #PCDATA rule.

10. Write about the ANY rule.
This rule states that the element may contain any declared elements and/or normal character data, in any order. An element using the ANY rule would appear as follows:
<!ELEMENT elementname ANY>

11. What is the syntax for the #PCDATA Rule? What it means?
The #PCDATA rule indicates that parsed character data will be contained in the element. Parsed character data is text that any XML parser accessing the document examines, so entity and character references inside it are expanded; an element declared with (#PCDATA) alone may contain text but no child elements. The following element demonstrates the #PCDATA rule:
<!ELEMENT elementname (#PCDATA)>

12. What is meant by structure rule in DTD element? What are its types?
Structure rules deal with how that data may be organized. There are two types of structure rules.
• element only rule
• mixed rule

13. Write about the “element only” rule and the “mixed” rule.
The “element only” rule specifies that only elements may appear as children of the current element. Child elements in a sequence are separated by commas and listed in the order in which they must appear.
Syntax: <!ELEMENT elementname (element1, element2, element3)>
The “mixed” rule is used to define elements that may contain both character data (#PCDATA) and child elements. The list of options is enclosed in parentheses. In general DTD content models, options are separated by the pipe symbol (|) and sequences by commas; in a mixed rule, however, only the pipe-separated form is allowed, with #PCDATA listed first.
Syntax: <!ELEMENT elementname (#PCDATA | childelement1 | childelement2)*>
The asterisk symbol (*) is added here to indicate that each of the items within the parentheses may appear zero or more times.
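The content rules above can be observed through Python's expat parser, which reports each <!ELEMENT> declaration's content model as a tuple of (type, quantifier, name, children). The sketch below classifies the declarations of a hypothetical DTD by rule; the DTD, the content_rules helper and the rule labels are illustrative. Note that expat reports (#PCDATA) on its own as a mixed model with no child elements.

```python
import xml.parsers.expat
from xml.parsers.expat import model

# Hypothetical DTD using each content rule from the answers above.
DTD_DOC = """<!DOCTYPE note [
<!ELEMENT note (to, from, body)>
<!ELEMENT to (#PCDATA)>
<!ELEMENT from ANY>
<!ELEMENT body (#PCDATA | b | i)*>
<!ELEMENT b (#PCDATA)>
<!ELEMENT i (#PCDATA)>
<!ELEMENT br EMPTY>
]>
<note><to>Ann</to><from>Bob</from><body>Hi <b>there</b></body></note>"""

KIND = {
    model.XML_CTYPE_EMPTY: "EMPTY rule",
    model.XML_CTYPE_ANY: "ANY rule",
    model.XML_CTYPE_MIXED: "#PCDATA or mixed rule",
    model.XML_CTYPE_SEQ: "element only (sequence)",
    model.XML_CTYPE_CHOICE: "element only (choice)",
}
SUFFIX = {
    model.XML_CQUANT_NONE: "", model.XML_CQUANT_OPT: "?",
    model.XML_CQUANT_REP: "*", model.XML_CQUANT_PLUS: "+",
}


def content_rules(text):
    """Map each declared element to (rule kind, quantifier, child element names)."""
    rules = {}
    parser = xml.parsers.expat.ParserCreate()

    def on_element_decl(name, content):
        # expat describes a content model as (type, quantifier, name, children).
        ctype, quant, _, children = content
        rules[name] = (KIND[ctype], SUFFIX[quant], [child[2] for child in children])

    parser.ElementDeclHandler = on_element_decl
    parser.Parse(text, True)
    return rules


for element, rule in content_rules(DTD_DOC).items():
    print(element, rule)
```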

14. What is the role of DTD entities?
Entities in DTDs are storage units. An entity’s content could be well-formed XML, normal text, binary data, a database record, and so on. The main purpose of an entity is to hold content, and there is virtually no limit on the type of content an entity can hold.

15. What is the difference between parameter entity and internal entity?
A parameter entity is very similar to an internal entity. The main difference between an internal entity and a parameter entity is that a parameter entity may only be referenced inside the DTD. Parameter entities are in effect entities specifically for DTDs: they are declared with a percent sign, as in <!ENTITY % name "value">, and referenced as %name; rather than &name;.
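The two kinds of entity can be told apart by a parser. In the sketch below, Python's expat parser reports each entity declaration along with a flag that is set only for parameter entities, and ElementTree shows the internal general entity &company; being expanded in the element content. The memo DTD, the entity names and the entity_decls helper are hypothetical; the parameter entity is only declared here, not referenced.

```python
import xml.etree.ElementTree as ET
import xml.parsers.expat

DOC = """<!DOCTYPE memo [
<!ENTITY % content "(#PCDATA)">
<!ENTITY company "Example Corp">
<!ELEMENT memo (#PCDATA)>
]>
<memo>Welcome to &company;</memo>"""


def entity_decls(text):
    """Return {name: (is_parameter_entity, value)} for each entity declaration."""
    found = {}
    parser = xml.parsers.expat.ParserCreate()

    def on_entity(name, is_parameter, value, base, system_id, public_id, notation):
        found[name] = (bool(is_parameter), value)

    parser.EntityDeclHandler = on_entity
    parser.Parse(text, True)
    return found


print(entity_decls(DOC))
# {'content': (True, '(#PCDATA)'), 'company': (False, 'Example Corp')}
print(ET.fromstring(DOC).text)  # Welcome to Example Corp
```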

16. What is meant by anonymous Type Declarations?
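When an <element> declaration contains a <simpleType> element that has no name attribute, the type is known as an “anonymous” type definition: it is defined in place and cannot be reused elsewhere in the schema. The same applies to an unnamed <complexType>.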

17. What are the three elements uses to add annotations?
The XML Schema Definition Language defines three new elements to add annotations to an XML schema. They are:
• <annotation>
• <appinfo>
• <documentation>

18. Define XPath.
The XML Path Language (XPath) is a standard for creating expressions that can be used to find specific pieces of information within an XML document.

19. What is meant by forward axis predicate and reverse axis predicate?
A forward axis contains only the current context node or nodes that follow it in document order; a reverse axis contains only the current context node or nodes that precede it. A predicate on a forward axis counts positions in document order, and a predicate on a reverse axis counts them in reverse document order.

20. What are the node tests available in XML Pointer?
XPointer provides two important node tests. They are:
• point()
• range()
