XML provides a means to communicate data across networks and among heterogeneous applications. In 2010, XML is a common information technology acronym, and it is supported by a wide variety of applications and software development tools. Because so many technologies have adopted XML, it is likely being used in places its designers never imagined. The resulting potential for misuse, erroneous configuration or unawareness of basic security issues is compounded by the speed and ease with which XML can be incorporated into new software systems. This paper surveys the security and privacy issues related to the use and deployment of XML technology in an information technology system.

The XML Working Group was established by the W3C in 1996; it was originally named the SGML Editorial Review Board (Eastlake, 2002). Today the W3C has ten XML-related working groups, focused on areas including the core specifications, namespaces, scripting, queries and schema, and service modeling. XML is a descendant of SGML (a simplified subset of it), and it allows the creation of entirely new, domain-specific vocabularies of elements, organized in hierarchical tree structures called documents. XML elements can represent anything related to data or behavior. An XML document can represent a customer’s contact information. It can represent instructions for formatting information on a printer or screen. It can represent the musical notes in a symphony. XML is used today for a variety of purposes, including business-to-business and business-to-consumer interactions, and for the migration of data from legacy repositories to modern database management systems. XML is also used to syndicate news and literary content, as in the Atom and RSS feeds published by web sites.

The flexibility and potential of XML in information technology received increasing attention when web services technology was introduced. Web services communicate using XML. Client programs can query them to learn their methods, parameters and return types. Because web services are self-describing, security through obscurity offers no protection once a web service is discovered running on a publicly accessible server: an attacker can ask the service for its method signatures, and it will respond with specifications of how to invoke it. This does not mean the attacker has everything needed to gain access to the web service, such as an authentication credential or another secret data element. Treese (2002) summarizes the primary security concerns involved in deploying any communications system that must transmit and receive sensitive data:

  1. Confidentiality, to ensure that only the sender and receiver can read the message
  2. Authentication, to identify the sender and receiver of a message
  3. Integrity, to ensure that the message has not been tampered with
  4. Non-repudiation, to ensure that the sender cannot later deny having sent a message
  5. Authorization, to ensure that only “the right people” are able to read a message
  6. Key management, to ensure proper creation, storage, use, and destruction of sensitive cryptographic keys
Web services are a recent technology, but they fall prey to the same kinds of attacks used against past and current Internet technologies. According to Goodin (2006), web services are vulnerable to many of the same attacks as browser-based applications. Data inside a transmitted XML document must be parsed and validated regardless of the source of the transmission. DTDs or XML schemas whose matching constraints are not strict enough can leave an open path for parsing attacks. Goodin (2006) details the primary attacks on web services as:
  1. Injection attacks that use XML to hide malicious content, such as using character encoding to hide the content of strings
  2. Buffer overflow vulnerabilities in SOAP and XML parsers running on the system providing the web service
  3. XML entity attacks, where input references an invalid external file such as a CSS stylesheet or schema, causing the parser or other parts of the application to crash in unexpected ways
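
A service can blunt the entity and encoded-injection attacks listed above at the point where it parses incoming XML. The following Python sketch shows one possible approach. It is not a method taken from Goodin (2006). It first rejects any document that contains a DOCTYPE declaration, so no entities can be declared, expanded or fetched. It then checks the decoded element text against strict patterns. The order message format, the reject_dtd and parse_order functions, and the validation patterns are all hypothetical. Buffer overflows in parsers, the second attack, are fixed by patching the parser and are not covered here.

```python
import re
import xml.etree.ElementTree as ET
import xml.parsers.expat


class UnsafeXMLError(ValueError):
    """Raised when a document uses DTD features this service does not accept."""


def reject_dtd(xml_text: str) -> None:
    # First pass with a bare expat parser: refuse any DOCTYPE, so no internal
    # or external entities can be declared, expanded, or fetched.
    parser = xml.parsers.expat.ParserCreate()

    def on_doctype(name, system_id, public_id, has_internal_subset):
        raise UnsafeXMLError(f"DOCTYPE {name!r} is not accepted")

    parser.StartDoctypeDeclHandler = on_doctype
    parser.Parse(xml_text, True)  # exceptions raised in handlers propagate


# Hypothetical message format: <order><customer>...</customer><amount>...</amount></order>
NAME_RE = re.compile(r"[A-Za-z][A-Za-z .'-]{0,63}")
AMOUNT_RE = re.compile(r"[0-9]{1,9}")


def parse_order(xml_text: str) -> dict:
    reject_dtd(xml_text)
    root = ET.fromstring(xml_text)
    # Strict structure: exactly these children, in this order, nothing else.
    if root.tag != "order" or [child.tag for child in root] != ["customer", "amount"]:
        raise ValueError("unexpected document structure")
    # Validate the *decoded* text: the parser has already turned character
    # references such as &#60; back into "<", so hidden markup is visible here.
    customer = root.findtext("customer") or ""
    amount = root.findtext("amount") or ""
    if not NAME_RE.fullmatch(customer):
        raise ValueError("invalid customer name")
    if not AMOUNT_RE.fullmatch(amount):
        raise ValueError("invalid amount")
    return {"customer": customer, "amount": int(amount)}
```

Validation happens after parsing on purpose. A check on the raw text would see only "&#60;script&#62;" and could let it through, while the parsed value is the markup itself.
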
Lawton (2007) details a similar problem with AJAX technology. AJAX stands for Asynchronous JavaScript and XML. It is not so much a specific technology as a technique for reducing the number of whole-page loads performed by a browser: an AJAX-enabled application can update portions of a browser page with data from a server. The data transmitted between browser and server in an AJAX communication is often formatted in XML. The server side of an AJAX application can be vulnerable to the same attacks described for web services above: overflows, injection of encoded data, and invalid documents. Goodin (2006) recommends that IT staff periodically scan the publicly facing systems of an enterprise for undocumented web services, and scan known web services and applications with analysis tools such as Rational’s AppScan (2009). Lawton (2007) also recommends the use of vulnerability scanners for source code and deployed systems.

A common mistake, even today, in deploying web services or web applications is failing to use HTTPS (HTTP over SSL/TLS) to secure data transmitted between client and server. All data transmitted across the Internet passes through an unknown number of routers and hosts before arriving at its destination. An XML document is plain text whose descriptive element names label the data they contain. An element named creditCardNumber, for example, announces its own contents. This makes it easy for eavesdroppers to identify sensitive data and possibly capture a copy as it passes through networking equipment. The easiest solution to this problem is to host the web service or web application over HTTPS, which encrypts the data during transmission. HTTPS does not protect data before it leaves the source or after it arrives at the destination.
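
A client that sends XML to a web service can enforce the use of HTTPS itself. The Python sketch below uses only the standard library, and the post_xml helper is hypothetical. It refuses plain-HTTP URLs. It relies on ssl.create_default_context(), which turns on certificate verification and host-name checking, so the encrypted channel is also authenticated. Setting TLS 1.2 as the minimum protocol version is an added assumption reflecting current practice.

```python
import ssl
import urllib.request
from urllib.parse import urlparse


def make_tls_context() -> ssl.SSLContext:
    # create_default_context() requires a valid certificate (CERT_REQUIRED)
    # and checks that it matches the host name being contacted.
    context = ssl.create_default_context()
    # Assumption: refuse protocol versions older than TLS 1.2.
    context.minimum_version = ssl.TLSVersion.TLSv1_2
    return context


def post_xml(url: str, xml_body: str, timeout: float = 10.0) -> bytes:
    """POST an XML document, refusing any URL that is not HTTPS."""
    if urlparse(url).scheme != "https":
        raise ValueError(f"refusing to send XML over {url!r}: HTTPS required")
    request = urllib.request.Request(
        url,
        data=xml_body.encode("utf-8"),
        headers={"Content-Type": "application/xml; charset=utf-8"},
        method="POST",
    )
    with urllib.request.urlopen(request, context=make_tls_context(), timeout=timeout) as response:
        return response.read()
```

A call such as post_xml("https://services.example.com/orders", "<order>...</order>") would go to a hypothetical endpoint. As the text notes, this protects the data only while it is in transit.
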

Long et al. (2003) discuss some of the challenges of bringing XML-encoded transactions to the financial services industry. Privacy is a primary concern for electronic financial transactions. Long states that simply using SSL to encrypt transmissions from system to system is not enough to satisfy the security needs of the financial sector. Portions of an XML document also need to be encrypted differently, so that sensitive content has different visibility depending on the system or person accessing it. The XML Encryption Syntax and Processing standard allows an entire XML document, or any portion of it, to be encrypted with a key, with the result placed within an XML document for transmission or storage. The encrypted document remains a well-formed XML document. Eastlake (2002) describes the XML Encryption Syntax and Processing (ESP) and XML Signature Syntax and Processing recommendations. Using the ESP recommendation, portions of the document can be encrypted with different keys, so that different people or applications can read only the portions of the document for which they hold keys. This approach provides a form of multi-level security within a single XML document.
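
The following sketch illustrates encrypting different portions of one document with different keys. This Python example only imitates the shape of XML Encryption. It replaces an element with an xenc:EncryptedData element that names its key in ds:KeyInfo/ds:KeyName and carries base64 ciphertext in CipherData/CipherValue. The cipher is a hypothetical toy: a SHA-256 counter-mode keystream with no integrity protection, used only because Python's standard library has no AES. A real deployment would use a conforming XML Encryption library and the algorithms the recommendation specifies. The key names, keys and payment document are also hypothetical.

```python
import base64
import hashlib
import os
import xml.etree.ElementTree as ET

XENC = "http://www.w3.org/2001/04/xmlenc#"
DS = "http://www.w3.org/2000/09/xmldsig#"
ET.register_namespace("xenc", XENC)
ET.register_namespace("ds", DS)


def _toy_cipher(key: bytes, nonce: bytes, data: bytes) -> bytes:
    """TOY stream cipher (XOR with a SHA-256 counter-mode keystream).
    Not an XML Encryption algorithm, no integrity; the same call decrypts."""
    stream = bytearray()
    counter = 0
    while len(stream) < len(data):
        stream += hashlib.sha256(key + nonce + counter.to_bytes(8, "big")).digest()
        counter += 1
    return bytes(d ^ s for d, s in zip(data, stream))


def _replace(parent: ET.Element, old: ET.Element, new: ET.Element) -> None:
    # Put `new` exactly where `old` was, keeping any text that followed it.
    new.tail = old.tail
    index = list(parent).index(old)
    parent.remove(old)
    parent.insert(index, new)


def encrypt_child(parent: ET.Element, child: ET.Element, key_name: str, key: bytes) -> None:
    """Replace child with an EncryptedData element readable only with key."""
    tail, child.tail = child.tail, None  # serialize the element without trailing text
    plaintext = ET.tostring(child, encoding="unicode").encode("utf-8")
    child.tail = tail
    nonce = os.urandom(16)
    blob = nonce + _toy_cipher(key, nonce, plaintext)
    enc = ET.Element(f"{{{XENC}}}EncryptedData", Type=XENC + "Element")
    key_info = ET.SubElement(enc, f"{{{DS}}}KeyInfo")
    ET.SubElement(key_info, f"{{{DS}}}KeyName").text = key_name
    cipher_data = ET.SubElement(enc, f"{{{XENC}}}CipherData")
    ET.SubElement(cipher_data, f"{{{XENC}}}CipherValue").text = base64.b64encode(blob).decode("ascii")
    _replace(parent, child, enc)


def decrypt_for(root: ET.Element, key_name: str, key: bytes) -> None:
    """Decrypt, in place, every EncryptedData element that names key_name."""
    parent_of = {child: parent for parent in root.iter() for child in parent}
    for enc in root.findall(f".//{{{XENC}}}EncryptedData"):
        if enc.findtext(f"{{{DS}}}KeyInfo/{{{DS}}}KeyName") != key_name:
            continue  # encrypted for someone else: leave it unreadable
        blob = base64.b64decode(enc.findtext(f"{{{XENC}}}CipherData/{{{XENC}}}CipherValue"))
        plaintext = _toy_cipher(key, blob[:16], blob[16:])
        _replace(parent_of[enc], enc, ET.fromstring(plaintext))
```

Each EncryptedData element names its own key. A card processor that holds only the card key therefore recovers the card number, while the payer's name stays encrypted. The serialized document remains well-formed XML throughout.
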

With web services comes the problem of knowing which ones to trust and use, and delegating that decision to a computer is even more difficult. Carminati, Ferrari and Hung (2005) describe the problem of automatically evaluating the privacy policies of web services in a world of data storage, cloud, banking and financial, and multi-player gaming businesses that exist entirely on the Internet. They reason that services discovered in web services directories may not follow privacy policies compatible with those required by the consumer’s organization or by local laws. They propose three solutions to this problem. The first is basic access control by a third party that evaluates and quantifies a service provider’s privacy policy. The second is cryptography in the services directory, so that the consumer can decode only the entries for compatible services. The third is a hash-based solution, which looks for flags supplied by the web service provider describing its support for specific aspects of privacy policy.
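
In its simplest form, the compatibility check these proposals automate compares the privacy properties a provider advertises with those a consumer requires. The Python sketch below shows only that comparison. It does not implement the third-party access control, directory cryptography, or hash-based mechanisms that Carminati, Ferrari and Hung propose. The ServiceEntry record, the service names and the flag names are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class ServiceEntry:
    """Hypothetical directory entry: a service and the privacy flags it advertises."""
    name: str
    policy_flags: frozenset = field(default_factory=frozenset)


def compatible_services(directory, required_flags):
    """Return names of services whose advertised flags include every required flag."""
    required = frozenset(required_flags)
    # Subset test: the provider must promise at least what the consumer requires.
    return [entry.name for entry in directory if required <= entry.policy_flags]


# Hypothetical flags a consumer's organization or local law might require.
directory = [
    ServiceEntry("quotes-a", frozenset({"no-resale", "retention-30d", "eu-storage"})),
    ServiceEntry("quotes-b", frozenset({"no-resale"})),
    ServiceEntry("quotes-c", frozenset({"retention-30d", "eu-storage"})),
]
print(compatible_services(directory, {"no-resale", "eu-storage"}))  # ['quotes-a']
```
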

Beyond the problem of transmitting sensitive XML data across the Internet unencrypted, there is the problem of authenticating the source of an XML document. How does a person or system verify the document’s originator? The XML Signature Syntax and Processing recommendation mentioned above provides a method to enclose any number of elements in a digital signature. This method uses public key cryptography to sign a portion of the document’s data. Before transmitting the data, the originator of the document provides a public key to the recipient through a secure channel (for example, on a flash drive). The originator uses their private key to sign the document data. A digest of the selected data is computed and signed, producing a new, smaller block of data called a digital signature. The signature is embedded in the XML, for example around the protected elements. The recipient uses the signature and the XML data to determine whether the data was changed in transmission. The signature is also used to verify the identity of the signer. Both checks require the recipient to have the sender’s public key.
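
The sign-then-verify steps above can be illustrated with a deliberately insecure toy. This Python sketch canonicalizes an element, hashes it with SHA-256 and signs the digest with textbook RSA. The recipient recomputes the digest and checks it against the signature using only the public values E and N. The key is built from two published Mersenne primes, so anyone can recompute the private exponent. It stands in for a real key pair only to keep the example self-contained. The SignedMessage and SigValue element names are hypothetical and do not follow the ds:Signature structure that XML Signature defines.

```python
import copy
import hashlib
import xml.etree.ElementTree as ET

# TOY, INSECURE key pair: P and Q are published Mersenne primes, so anyone can
# recompute D. Real signatures use randomly generated keys and padding schemes.
P = 2**521 - 1
Q = 2**607 - 1
N = P * Q
E = 65537
D = pow(E, -1, (P - 1) * (Q - 1))  # private exponent (needs Python 3.8+)


def digest(element: ET.Element) -> int:
    """SHA-256 of the element's canonical (C14N 2.0) form, as an integer."""
    clone = copy.copy(element)
    clone.tail = None  # text after the element is not part of the signed data
    c14n = ET.canonicalize(ET.tostring(clone, encoding="unicode"))
    return int.from_bytes(hashlib.sha256(c14n.encode("utf-8")).digest(), "big")


def sign(element: ET.Element) -> int:
    # Sender: transform the digest with the private exponent D.
    return pow(digest(element), D, N)


def verify(element: ET.Element, signature: int) -> bool:
    # Recipient: only the public values E and N are needed.
    return pow(signature, E, N) == digest(element)


def wrap_signed(element: ET.Element) -> ET.Element:
    """Hypothetical envelope: the signed element followed by its signature value."""
    message = ET.Element("SignedMessage")
    message.append(element)
    ET.SubElement(message, "SigValue").text = format(sign(element), "x")
    return message


def check_signed(message: ET.Element) -> bool:
    return verify(message[0], int(message.findtext("SigValue"), 16))
```

Any change to the signed content, such as an altered balance, changes the digest, and verification then fails. Real XML Signature processing adds padding, key identification, and the reference and transform machinery omitted here.
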

The problem of securing documents through path-based access control was addressed early in XML’s lifetime. Damiani et al. (2001) describe an access control mechanism designed specifically for XML documents. Their Access Control Processor for XML uses XPath expressions to describe the protected locations within a schema, along with the rights granted to groups or specific users of the system. Böttcher and Hartel (2009) describe the design of an auditing system that determines whether confidential information was accessed directly or indirectly, using a patient records system as an example scenario. Their system is unique in that it can analyze “[…] the problem of whether the seen data is or is not sufficient to derive the disclosed secret information.” The authors do not discuss whether their design is transferable to non-XML data sources, such as relational databases.
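
Path-based access control of this kind can be sketched by pruning a copy of the document before it is returned to a user. The Python sketch below simplifies the processor described by Damiani et al. and is not that processor. It supports only deny rules, uses ElementTree's limited XPath subset instead of full XPath, and refuses groups that have no rules. The group names, rules and patient-record document are hypothetical, loosely modeled on the patient-records scenario Böttcher and Hartel use.

```python
import copy
import xml.etree.ElementTree as ET

# Hypothetical policy: for each group, paths (ElementTree's XPath subset) it may NOT see.
DENY_RULES = {
    "physician": [],
    "nurse": [".//diagnosis"],
    "billing": [".//diagnosis", ".//notes"],
}


def view_for(document: ET.Element, group: str) -> ET.Element:
    """Return a pruned copy of document containing only what group may read."""
    if group not in DENY_RULES:
        raise PermissionError(f"no access rules for group {group!r}")
    view = copy.deepcopy(document)  # never modify the stored original
    parent_of = {child: parent for parent in view.iter() for child in parent}
    for path in DENY_RULES[group]:
        for node in view.findall(path):
            parent_of[node].remove(node)  # removes the node and its whole subtree
    return view


records = ET.fromstring(
    "<patients><patient><name>Ann Lee</name>"
    "<diagnosis>influenza</diagnosis><notes>rest</notes></patient></patients>"
)
print(ET.tostring(view_for(records, "nurse"), encoding="unicode"))
# <patients><patient><name>Ann Lee</name><notes>rest</notes></patient></patients>
```

Pruning controls only what is returned. As Böttcher and Hartel point out, a user may still infer hidden information from data that remains visible.
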

In 2010, we have technologies that can be combined with XML in several ways to secure document content during transmission and in long-term storage. SSL, together with the XML Encryption and XML Signature recommendations, provides a rich foundation for creating secure XML applications. The maturity of web servers, the availability of code analyzers and the increasing sophistication of IT security tools decrease the risk of infrastructure falling to an XML-centric attack. With the technical problems of securing XML addressed through various W3C recommendations, code libraries and tools, the new security issue in XML-related technologies is one of education, established precedent in use, and organizational standards for applying these tools. This issue recurs with many disruptive technologies, and its name is awareness. Goodin (2006) says, “[…] the security of web services depends on an increased awareness of the developers who create them, and that will require a major shift in thinking.” XML has introduced many of its own security problems and has solved many of them through the application of its own technology. It is now important for the industry to document and share its experiences and practices in deploying secure XML-based Internet applications using the technologies recommended by the W3C and elsewhere.


Böttcher, S., & Hartel, R. (2009). Information disclosure by answers to XPath queries. Journal of Computer Security, 17, 69-99.

Carminati, B., Ferrari, E., & Hung, P. C. K. (2005). Exploring privacy issues in web services discovery agencies. IEEE Security and Privacy, 2005, 14-21.

Damiani, E., Samarati, P., De Capitani di Vimercati, S., & Paraboschi, S. (2001). Controlling access to XML documents. IEEE Internet Computing, November-December 2001, 18-28.

Eastlake, D. E., III, & Niles, K. (2002). Secure XML: The new syntax for signatures and encryption. Addison-Wesley Professional, July 19, 2002. ISBN-13: 978-0-201-75605-0.

Geer, D. (2003). Taking steps to secure web services. Computer, October 2003, 14-16.

Goodin, D. (2006). Shielding web services from attack. InfoWorld.com, November 27, 2006, 27-32.

Lawton, G. (2007). Web 2.0 creates security challenges. Computer, October 2007, 13-16.

Long, J., Yuan, M. J., & Whinston, A. B. (2003). Securing a new era of financial services. IT Pro, July-August 2003, 15-21.

Naedele, M. (2003). Standards for XML and web services security. Computer, April 2003, 96-98.

Rational AppScan. (2009). IBM Rational Web application security. Retrieved 14 February 2009 from http://www-01.ibm.com/software/rational/offerings/websecurity/webappsecurity.html.

Treese, W. (2002). XML, web services, and XML. NW, Putting it together, September 2002, 9-12.