XML
Extensible Markup Language
Worzyk
FH Anhalt
Telemedizin WS 09/10
XML
• Metalanguage
– A language that describes other languages
– These languages describe formats for data exchange
Example
Hans Meyer
Lohmannstrasse 23
06366 Köthen
Dr. Else Müller
Bernburger Strasse 56
06366 Köthen
Example
<Patient>
  <Name>Hans Meyer</Name>
  <Strasse>Lohmannstrasse 23</Strasse>
  <Ort>06366 Köthen</Ort>
</Patient>
<Arzt>
  <Name>Dr. Else Müller</Name>
  <Strasse>Bernburger Strasse 56</Strasse>
  <Ort>06366 Köthen</Ort>
</Arzt>
Structure of XML documents
• Prolog
– Declaration of the document type
– DTD (Document Type Definition)
• Elements
http://www.w3schools.com/xml/default.asp
http://de.selfhtml.org/
Document Type Definition
DTD
• It describes the grammar of an XML document
• It describes the permitted elements and attributes
– their data types and ranges of values
– their nesting
• An XML document that conforms to a DTD is called valid
Example DTD
<?xml version="1.0" encoding="ISO-8859-1"?>
<!DOCTYPE Personen [
<!ELEMENT Personen (Patient)>
<!ELEMENT Patient (#PCDATA)>
]>
<Personen>
<Patient>
Hans Meyer
Lohmannstrasse 23
06366 Köthen
</Patient>
</Personen>
http://www.inf.hs-anhalt.de/~Worzyk/Telemedizin/Beispiele/Patienten1.xml
Structure of XML documents
• The DTD describes the characteristics of the elements
• An element begins with a start tag <Elementname> and ends with an end tag </Elementname>
• XML tags are case-sensitive
• Elements can contain other elements
• #PCDATA (parsed character data): the element contains character strings from the declared character set, which the parser scans for markup
Names of Elements
• Names can contain letters, digits, and other characters
• Names must not start with a digit or a punctuation character
• Names must not start with the letters xml (in any capitalization, e.g. XML or Xml)
• Names cannot contain spaces
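The naming rules above can be sketched as a small check. This is a simplified, ASCII-only approximation: the function name and the pattern are assumptions made for illustration, and the full XML specification allows many more characters (for example non-ASCII letters).

```python
import re

# Simplified, ASCII-only pattern (an assumption for this sketch):
# first character is a letter or underscore, the remaining characters
# are letters, digits, periods, hyphens, or underscores.
_NAME_PATTERN = re.compile(r"[A-Za-z_][A-Za-z0-9._-]*")


def is_valid_element_name(name: str) -> bool:
    """Apply the slide's naming rules to a candidate element name."""
    # Names must not start with "xml" in any capitalization.
    if name.lower().startswith("xml"):
        return False
    # Rejects leading digits/punctuation and any embedded spaces.
    return _NAME_PATTERN.fullmatch(name) is not None
```

For example, `is_valid_element_name("General_Practitioner")` is accepted, while `"1Patient"`, `"XmlData"`, and `"Bernburger Strasse"` are rejected.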
Sequence of Elements
In the declaration, child elements are separated by commas and enclosed in parentheses.
Example:
<?xml version="1.0" encoding="ISO-8859-1"?>
<!DOCTYPE Personen [
<!ELEMENT Personen (Patient,Arzt)>
<!ELEMENT Patient (Name,Adresse)>
<!ELEMENT Arzt (Name, Adresse)>
<!ELEMENT Name (#PCDATA)>
<!ELEMENT Adresse (#PCDATA)>
]>
http://www.inf.hs-anhalt.de/~Worzyk/Telemedizin/Beispiele/Patienten2.xml
http://www.inf.hs-anhalt.de/~Worzyk/Telemedizin/Beispiele/Patienten3.xml
Selection list
• Selection of exactly one element: the available elements are separated by |
• Example:
<!DOCTYPE Personen [
<!ELEMENT Personen (Patient|Arzt)>
<!ELEMENT Patient (Name,Adresse,Diagnose)>
<!ELEMENT Arzt (Name, Adresse,Fachgebiet)>
http://www.inf.hs-anhalt.de/~Worzyk/Telemedizin/Beispiele/Patienten4.xml
Multiple occurrence
* The element may occur zero or more times
+ The element must occur at least once and may occur arbitrarily often
? The element may occur at most once (zero or one time)
Attributes
<!ATTLIST element-name attribute-name attribute-type default-value>
Attribute types:
CDATA, (en1|en2|..), ID, IDREF, IDREFS, NMTOKEN, NMTOKENS, ENTITY, ENTITIES,
NOTATION, xml:
Default value:
value
#REQUIRED, #IMPLIED, #FIXED value
http://www.inf.hs-anhalt.de/~Worzyk/Telemedizin/Beispiele/Patienten5.xml
http://www.w3schools.com/xml/xml_attributes.asp
Comments
Comments are enclosed in <!-- and -->
<!-- This is a comment -->
Well-formed XML file
• The file starts with the XML declaration, which identifies the document as XML
• There is at least one data element
• There is exactly one root element, which contains all other data elements
• All required attributes are defined
• All elements have the right content
• The elements are nested properly
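A minimal sketch of a well-formedness check, assuming Python's standard xml.etree.ElementTree parser is available. The function name is an assumption for illustration. The parser covers only the syntactic rules (exactly one root element, matching tags, proper nesting); it neither requires an XML declaration nor checks anything against a DTD.

```python
import xml.etree.ElementTree as ET


def is_well_formed(xml_text: str) -> bool:
    """Return True if the text parses as a single, properly nested XML tree."""
    try:
        ET.fromstring(xml_text)
    except ET.ParseError:
        # Raised for mismatched tags, several root elements, empty input, etc.
        return False
    return True
```

Calling `is_well_formed("<a><b></a></b>")` fails because the elements overlap instead of nesting, and `is_well_formed("<a></a><b></b>")` fails because there are two root elements.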
Valid XML file
• The file is well-formed
• A DTD is assigned to the file
• The content of the file conforms to the assigned DTD
Parser
A parser checks whether an XML document is valid:
<html>
<body>
<script type="text/javascript">
// Works only in Internet Explorer, which provides the MSXML ActiveX parser
var xmlDoc = new ActiveXObject("Microsoft.XMLDOM");
xmlDoc.async = false;           // load synchronously
xmlDoc.validateOnParse = true;  // validate against the DTD while parsing
xmlDoc.load("Patienten5.xml");
// An error code of 0 means that parsing reported no error
document.write("<br />Error Code: ");
document.write(xmlDoc.parseError.errorCode);
document.write("<br />Error Reason: ");
document.write(xmlDoc.parseError.reason);
document.write("<br />Error Line: ");
document.write(xmlDoc.parseError.line);
</script>
</body>
</html>
http://www.inf.hs-anhalt.de/~Worzyk/Telemedizin/Beispiele/Parser.htm
DTD - Disadvantages
• Few data types
• The specification is not written in XML syntax
– The specification itself cannot be checked with an XML parser
XML - Schema
An XML Schema:
• defines elements that can appear in a document
• defines attributes that can appear in a document
• defines which elements are child elements
• defines the order of child elements
• defines the number of child elements
• defines whether an element is empty or can include text
• defines data types for elements and attributes
• defines default and fixed values for elements and attributes
http://www.w3schools.com/schema/schema_intro.asp
XML Schema
Advantages over DTD
• XML Schemas are extensible to future additions
• XML Schemas are richer and more useful than DTDs
• XML Schemas are written in XML
• XML Schemas support data types
– xs:date, xs:dateTime, xs:string
• XML Schemas support namespaces
– xmlns:xs="http://www.w3.org/2001/XMLSchema"
Dublin Core Standard
Dublin Core Metadata Initiative
A conference held in 1995 in Dublin, Ohio, defined a set of descriptive
attributes for categorizing documents on the Internet.
15 core elements are recommended in the "Dublin Core Metadata
Element Set, Version 1.1" (ISO 15836).
http://dublincore.org/documents/dces/
How to create an XML structure
• Create a tree structure of the data
• Convert that structure to a DTD
• Add data elements
• Test
Example: Quarterly billing
• One file consists of exactly one physician and at least one patient
• A physician is either a general practitioner or a dentist
• A general practitioner has an address and, optionally, a profession
• A dentist has an address
• A patient has an address and zero or more diagnoses
• An address consists of name, city, and street
• A name has a salutation, Mr. or Ms.
Example: Quarterly billing (tree structure)
Billing
  Physician
    General Practitioner | Dentist
      General Practitioner: Address, Profession ?
      Dentist: Address
  Patient +
    Address
    Diagnosis *
Address
  Name (salutation: Mr | Ms)
  City
  Street
Example - DTD
<?xml version="1.0" encoding="ISO-8859-1"?>
<!DOCTYPE Billing [
<!ELEMENT Billing (Physician, Patient+)>
<!ELEMENT Physician (General_Practitioner | Dentist)>
<!ELEMENT General_Practitioner (Address, Profession?)>
<!ELEMENT Dentist (Address)>
<!ELEMENT Patient (Address, Diagnosis*)>
<!ELEMENT Address (Name, City, Street)>
<!ELEMENT Profession (#PCDATA)>
<!ELEMENT Diagnosis (#PCDATA)>
<!ELEMENT Name (#PCDATA)>
<!ELEMENT City (#PCDATA)>
<!ELEMENT Street (#PCDATA)>
<!ATTLIST Name Salutation (Mr|Ms) "Ms">
]>
Example - Data
<Billing>
  <Physician>
    <General_Practitioner>
      <Address>
        <Name>Dr. Erpel</Name>
        <City>Entenhausen</City>
        <Street>Am Krankenhaus 1</Street>
      </Address>
      <Profession>Geriatrics</Profession>
    </General_Practitioner>
  </Physician>
  <Patient>
    <Address>
      <Name Salutation="Mr">Daniel</Name>
      <City>Entenhausen</City>
      <Street>Bahnhofstrasse 3a</Street>
    </Address>
    <Diagnosis>Bettflucht</Diagnosis>
  </Patient>
  <Patient>
    <Address>
      <Name>Daisy</Name>
      <City>Entenhausen</City>
      <Street>Am Stadtpark</Street>
    </Address>
    <Diagnosis>Sonnenbrand</Diagnosis>
    <Diagnosis>Migräne</Diagnosis>
  </Patient>
</Billing>
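The content models of the Billing DTD can be read as regular expressions over the names of the child elements: a comma is a sequence, | is a choice, and *, + and ? keep their meaning. The following is a hand-written sketch of that idea, not a DTD validator: Python's standard library does not validate against DTDs, the dictionary and function names are assumptions, and attributes such as Salutation are not checked.

```python
import re
import xml.etree.ElementTree as ET

# Content models of the Billing DTD, rewritten by hand as regular
# expressions over child names (each name is followed by one space).
CONTENT_MODELS = {
    "Billing": r"Physician (Patient )+",
    "Physician": r"(General_Practitioner |Dentist )",
    "General_Practitioner": r"Address (Profession )?",
    "Dentist": r"Address ",
    "Patient": r"Address (Diagnosis )*",
    "Address": r"Name City Street ",
}


def conforms(element) -> bool:
    """Check the child sequence of an element (recursively) against its model."""
    model = CONTENT_MODELS.get(element.tag)
    if model is None:
        # Remaining elements are #PCDATA: no child elements allowed.
        return len(element) == 0
    children = "".join(child.tag + " " for child in element)
    if re.fullmatch(model, children) is None:
        return False
    return all(conforms(child) for child in element)


ADDRESS = ("<Address><Name>Daisy</Name><City>Entenhausen</City>"
           "<Street>Am Stadtpark</Street></Address>")
valid_doc = ET.fromstring(
    "<Billing><Physician><Dentist>" + ADDRESS + "</Dentist></Physician>"
    "<Patient>" + ADDRESS + "<Diagnosis>Sonnenbrand</Diagnosis></Patient></Billing>"
)
# Violates Patient+ because no patient is present.
invalid_doc = ET.fromstring(
    "<Billing><Physician><Dentist>" + ADDRESS + "</Dentist></Physician></Billing>"
)
```

Here `conforms(valid_doc)` succeeds, while `conforms(invalid_doc)` fails because the model for Billing requires at least one Patient.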
Queries on XML files
• XPath
• XQuery
XPath
The XPath language is used to address parts of an XML document.
It was designed for use in both XSLT and XPointer.
XPath models an XML document as a tree of nodes.
http://www.informatik.hu-berlin.de/~obecker/obqo/w3c-trans/xpath-de-20010702/
Example
<?xml version="1.0" encoding="ISO-8859-1"?>
<bookstore>
<book category="COOKING">
<title lang="en">Everyday Italian</title>
<author>Giada De Laurentiis</author>
<year>2005</year>
<price>30.00</price>
</book>
<book category="CHILDREN">
<title lang="en">Harry Potter</title>
<author>J K. Rowling</author>
<year>2005</year>
<price>29.99</price>
</book>
<book category="WEB">
<title lang="en">XQuery Kick Start</title>
<author>James McGovern</author>
<author>Per Bothner</author>
<author>Kurt Cagle</author>
<author>James Linn</author>
<author>Vaidyanathan Nagarajan</author>
<year>2003</year>
<price>49.99</price>
</book>
<book category="WEB">
<title lang="en">Learning XML</title>
<author>Erik T. Ray</author>
<year>2003</year>
<price>39.95</price>
</book>
</bookstore>
Queries with XPath
Select all titles:
/bookstore/book/title
Select the title of the first book:
/bookstore/book[1]/title
Select the text of all prices:
/bookstore/book/price/text()
Select the titles of all books with a price greater than 35:
/bookstore/book[price>35]/title
http://www.w3schools.com/xpath/xpath_examples.asp
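The XPath queries can be tried out with Python's standard xml.etree.ElementTree module, as a rough sketch. That module supports only a subset of XPath: paths are relative to the root element, and numeric comparisons such as price>35 are not available, so that filter is written in plain Python here. The bookstore data is abbreviated to titles and prices, and the variable names are assumptions.

```python
import xml.etree.ElementTree as ET

# Abbreviated version of the bookstore example (titles and prices only).
BOOKSTORE = """<bookstore>
  <book category="COOKING"><title lang="en">Everyday Italian</title><price>30.00</price></book>
  <book category="CHILDREN"><title lang="en">Harry Potter</title><price>29.99</price></book>
  <book category="WEB"><title lang="en">XQuery Kick Start</title><price>49.99</price></book>
  <book category="WEB"><title lang="en">Learning XML</title><price>39.95</price></book>
</bookstore>"""

root = ET.fromstring(BOOKSTORE)  # root is the <bookstore> element

# /bookstore/book/title
all_titles = [title.text for title in root.findall("./book/title")]

# /bookstore/book[1]/title
first_title = root.find("./book[1]/title").text

# /bookstore/book/price/text()
all_prices = [price.text for price in root.findall("./book/price")]

# /bookstore/book[price>35]/title, with the comparison done in Python
expensive_titles = [
    book.findtext("title")
    for book in root.findall("./book")
    if float(book.findtext("price")) > 35
]
```

A full XPath processor would evaluate the last expression directly; the list comprehension only stands in for the predicate.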
XQuery
• Query language for XML data
• Uses XPath expressions
• Plays a role for XML data analogous to that of SQL for relational data
XQuery example
<?xml version="1.0" encoding="ISO-8859-1"?>
<bib>
<book year="1994">
<title>TCP/IP Illustrated</title>
<author><last>Stevens</last><first>W.</first></author>
<publisher>Addison-Wesley</publisher>
<price>65.95</price>
</book>
<book year="1992">
<title>Advanced Programming in the Unix environment</title>
<author><last>Stevens</last><first>W.</first></author>
<publisher>Addison-Wesley</publisher>
<price>65.95</price>
</book>
<book year="2000">
<title>Data on the Web</title>
<author><last>Abiteboul</last><first>Serge</first></author>
<author><last>Buneman</last><first>Peter</first></author>
<author><last>Suciu</last><first>Dan</first></author>
<publisher>Morgan Kaufmann Publishers</publisher>
<price>39.95</price>
</book>
<book year="1999">
<title>The Technology and Content for Digital TV</title>
<editor>
<last>Gerbarg</last><first>Darcy</first>
<affiliation>CITI</affiliation>
</editor>
<publisher>Kluwer Academic Publishers</publisher>
<price>129.95</price>
</book>
</bib>
XQuery example
Query:
doc("books.xml")/bib/book[price<50]
results:
<book year="2000">
<title>Data on the Web</title>
<author><last>Abiteboul</last><first>Serge</first></author>
<author><last>Buneman</last><first>Peter</first></author>
<author><last>Suciu</last><first>Dan</first></author>
<publisher>Morgan Kaufmann Publishers</publisher>
<price>39.95</price>
</book>
FLWOR
For, Let, Where, Order by, Return
for $x in doc("books.xml")/bib/book
where $x/price>50
order by $x/title
return $x/title
Results:
<title>Advanced Programming in the Unix environment</title>
<title>TCP/IP Illustrated</title>
<title>The Technology and Content for Digital TV</title>
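To make the roles of the FLWOR clauses concrete, the same query can be imitated step by step in Python. This is only an analogy under stated assumptions, not an XQuery implementation: the bib data is abbreviated to titles and prices, the result holds title strings instead of title elements, and sorting uses Python's default string order.

```python
import xml.etree.ElementTree as ET

# Abbreviated version of the bib example (titles and prices only).
BIB = """<bib>
  <book year="1994"><title>TCP/IP Illustrated</title><price>65.95</price></book>
  <book year="1992"><title>Advanced Programming in the Unix environment</title><price>65.95</price></book>
  <book year="2000"><title>Data on the Web</title><price>39.95</price></book>
  <book year="1999"><title>The Technology and Content for Digital TV</title><price>129.95</price></book>
</bib>"""

# for $x in doc("books.xml")/bib/book
books = ET.fromstring(BIB).findall("./book")

# where $x/price > 50
selected = [book for book in books if float(book.findtext("price")) > 50]

# order by $x/title
selected.sort(key=lambda book: book.findtext("title"))

# return $x/title (here: the text of each title)
titles = [book.findtext("title") for book in selected]
```

The list `titles` then contains the three titles from the result above in the same order.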
XML documents in databases
XML documents can be
• Focused on data
• Focused on text
• Semi-structured
Alternatives for storing XML documents
• Storage as a whole
• Storage according to the XML structure
• Transformation to database structures
Storage of XML documents as a whole
The original document is stored in a file system or as a CLOB in a
database
Full-text index
Structure index
Example
<hotel
url="http://www.hotel-huebner.de"
id="h0001"
erstellt-am="03/02/2003"
Autor="Hans Müller">
<hotelname>Hotel Hübner</hotelname>
<kategorie>4</kategorie>
<adresse>
<plz>18199</plz>
<ort>Warnemünde</ort>
<strasse>Seestraße</strasse>
</adresse>
<telefon>0381 / 5434-0</telefon>
<fax> 0381 / 5434-444</fax>
<anreisebeschreibung>Aus Richtung
Rostock kommend ...
</anreisebeschreibung>
</hotel>
Full-text index
Term          Reference
hotel         ***
Warnemünde    *
Rostock       *
ort           **
Each * represents a pointer to an occurrence in the hotel document from the previous example.
Full-text and structure index
Full-text index:
Term          Reference   Element
Warnemünde    *           *
Seestrasse    *           *
Rostock       *           *
Structure index:
Element               Reference   Order   Predecessor
hotel                 *           1
adresse               *           2       *
ort                   *           3       *
strasse               *           3       *
anreisebeschreibung   *           2       *
The indexed document is the hotel example shown earlier.
Queries
Full-text index:
hotel AND warnemünde
(hotel OR pension) AND (rostock OR warnemünde)
Full-text and structure index:
hotel.adresse.ort CONTAINS ("warnemünde") AND
hotel.freizeitmoeglichkeit CONTAINS ("swimming pool")
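How a full-text index answers the Boolean queries can be sketched with an inverted index that maps each word to the set of documents containing it. AND then becomes a set intersection and OR a set union. The function names, the abbreviated hotel document, and the second document p0002 are assumptions made for illustration; the structure index is not modeled.

```python
import re
from collections import defaultdict


def build_full_text_index(documents):
    """Map every lower-cased word to the set of document IDs containing it."""
    index = defaultdict(set)
    for doc_id, text in documents.items():
        # Tag names and text content are indexed alike.
        for term in re.findall(r"\w+", text.lower()):
            index[term].add(doc_id)
    return index


def lookup(index, term):
    """Return the IDs of all documents that contain the term."""
    return index.get(term.lower(), set())


documents = {
    # Abbreviated hotel document from the example
    "h0001": "<hotel><hotelname>Hotel Hübner</hotelname><ort>Warnemünde</ort>"
             "<anreisebeschreibung>Aus Richtung Rostock kommend"
             "</anreisebeschreibung></hotel>",
    # Hypothetical second document, not part of the slides
    "p0002": "<pension><ort>Rostock</ort></pension>",
}
index = build_full_text_index(documents)

# hotel AND warnemünde
query1 = lookup(index, "hotel") & lookup(index, "warnemünde")

# (hotel OR pension) AND (rostock OR warnemünde)
query2 = (lookup(index, "hotel") | lookup(index, "pension")) & (
    lookup(index, "rostock") | lookup(index, "warnemünde")
)
```

The first query matches only the hotel document, while the second also matches the hypothetical pension document.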
Characteristics: full-text index
Description of schema: not required
Reconstruction of document: the document remains in its original form
Queries: information retrieval, SQL
Further characteristics: the structure can be evaluated
Use: document-centered applications
Generic storage
Storage according to the XML structure
All information in the XML document is stored
– Simple generic storage
– Document Object Model
Example
Elements:
DocID   Element name   ID    Predecessor   Order   Value
h0001   hotel          101                 1
h0001   hotelname      102   101           1       Hotel Hübner
h0001   kategorie      103   101           2       4
h0001   adresse        104   101           3
h0001   plz            105   104           1       18119
h0001   ort            106   104           2       Warnemünde
...
Attributes:
DocID   Attribute name   ID    Element   Value
h0001   url              101   101       http://www.hotelhuebner.de
h0001   id               102   101       h0001
...
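The element table of simple generic storage can be produced by walking the document tree once. The following sketch assumes IDs are assigned in document order starting at 101 and that "order" means the position among siblings; the function name and the abbreviated hotel document are assumptions, and attributes are left out.

```python
import xml.etree.ElementTree as ET
from itertools import count

# Abbreviated hotel document (assumption: only the elements from the table).
HOTEL = """<hotel id="h0001">
  <hotelname>Hotel Hübner</hotelname>
  <kategorie>4</kategorie>
  <adresse>
    <plz>18119</plz>
    <ort>Warnemünde</ort>
  </adresse>
</hotel>"""


def shred_elements(doc_id, root, first_id=101):
    """Return rows (DocID, element name, ID, predecessor, order, value)."""
    ids = count(first_id)
    rows = []

    def visit(element, predecessor_id, order):
        element_id = next(ids)
        # Whitespace-only text (indentation) counts as "no value".
        value = (element.text or "").strip() or None
        rows.append((doc_id, element.tag, element_id, predecessor_id, order, value))
        for position, child in enumerate(element, start=1):
            visit(child, element_id, position)

    visit(root, None, 1)
    return rows


rows = shred_elements("h0001", ET.fromstring(HOTEL))
```

Each tuple in `rows` corresponds to one row of the element table; reconstructing the document means reassembling the tree from the predecessor and order columns.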
Document Object Model
The tree structure is mapped to a class hierarchy
Storage in object-relational or object-oriented databases
Queries
• XPath
• XQuery
• XQL
– Query language of Software AG
• SQL
Characteristics: generic storage
Description of schema: not required
Reconstruction of document: possible, but expensive
Queries: XQuery, XQL; SQL, which has to take the storage structures into account
Further characteristics: queries and updates are possible with DOM
Use: for documents that are focused on data, focused on text, or semi-structured
Transformation to database structures
A DTD or schema must be available
Automatic or user-driven procedures
Transformation to relational, object-relational, or object-oriented databases
Transformation
XML information                     Database information
Element
  Root element                      Relation
  XML element                       Attribute of a relation
  Sequence of elements              Attributes of a relation
  Alternative of elements           Attributes of a relation
  Element with qualifier ?          Attribute, null value allowed
  Element with qualifier + or *     SET or LIST
  Complex structured element        ROW
Attribute
  XML attribute                     Attribute of a relation
  #IMPLIED                          Null value allowed
  #REQUIRED                         Null value not allowed
  Default value                     Default value
Example
Relation for the hotel element:
hotelname      url       id      erstellt-am   autor         kategorie   fax    anreisebeschreibung
Hotel Hübner   http://   h0001   03/02/2003    Hans Müller   4           0381   Aus Richtung Rostock
Relation for the address:
id      plz     ort          strasse      nummer
h0001   18119   Warnemünde   Seestrasse   12
Relation for the telephone numbers:
id      telefon           Ordnung
h0001   0381 / 5434 - 0   1
Queries
• SQL with
– Joins
– Aggregate functions
– Query optimization
– Updates
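Once the document has been transformed into relations, ordinary SQL applies. The sketch below uses an in-memory SQLite database to show a join across the relations. The table names, the reduced column set, and the column types are assumptions chosen for illustration, not the mapping a particular tool would generate.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE hotel   (id TEXT PRIMARY KEY, hotelname TEXT NOT NULL, kategorie INTEGER);
CREATE TABLE adresse (id TEXT REFERENCES hotel(id), plz TEXT, ort TEXT, strasse TEXT);
CREATE TABLE telefon (id TEXT REFERENCES hotel(id), telefon TEXT, ordnung INTEGER);
""")
conn.execute("INSERT INTO hotel VALUES (?, ?, ?)", ("h0001", "Hotel Hübner", 4))
conn.execute(
    "INSERT INTO adresse VALUES (?, ?, ?, ?)",
    ("h0001", "18119", "Warnemünde", "Seestrasse"),
)
# The ordnung column keeps the original position of a repeated element.
conn.execute("INSERT INTO telefon VALUES (?, ?, ?)", ("h0001", "0381 / 5434 - 0", 1))

# Join the relations to find hotels in a given town with their phone numbers.
rows = conn.execute(
    """
    SELECT h.hotelname, a.ort, t.telefon
    FROM hotel AS h
    JOIN adresse AS a ON a.id = h.id
    JOIN telefon AS t ON t.id = h.id
    WHERE a.ort = ?
    ORDER BY t.ordnung
    """,
    ("Warnemünde",),
).fetchall()
```

The same connection could also run aggregate queries or updates, which is the main benefit of this storage alternative for data-centered documents.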
Characteristics: structures of databases
Description of schema: required
Reconstruction of document: only partly possible
Queries: SQL and XML
Further characteristics: the order of elements is preserved by means of additional attributes
Use: for data-centered applications