Secure Coding Technique: Processing XML data, part 1

December 10, 2017

Pieter De Cremer

Extensible Markup Language (XML) is a markup language for encoding documents in a format that is both easy for machines to process and human-readable. However, this commonly used format comes with multiple security flaws. In this first XML-related blog post, I will explain the basics of handling XML documents securely by using a schema.

OWASP divides the vulnerabilities related to XML and XML schemas into two categories: malformed XML documents and invalid XML documents.

Malformed XML documents

Malformed XML documents are documents that do not follow the W3C XML specification. Examples include removing an end tag, rearranging elements so that they no longer nest properly, or using forbidden characters. Any of these errors should result in a fatal error, and the document should not undergo any further processing.

In order to avoid vulnerabilities caused by malformed documents, you should use a well-tested XML parser that follows W3C specifications and does not take significantly longer to process malformed documents.
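As a minimal sketch of this "fatal error, no further processing" behaviour, the snippet below uses Python's standard xml.etree.ElementTree parser. The function name parse_or_reject and the sample documents are assumptions made for illustration. Note that Python's own documentation warns that its standard XML modules are not hardened against maliciously constructed input, so this only illustrates rejection of malformed documents and is not a recommendation of a specific parser.

```python
import xml.etree.ElementTree as ET


def parse_or_reject(xml_text: str) -> ET.Element:
    """Return the root element, or raise ValueError if the text is not well formed."""
    try:
        return ET.fromstring(xml_text)
    except ET.ParseError as err:
        # Treat every well-formedness error as fatal: no partial processing.
        raise ValueError(f"malformed XML rejected: {err}") from err


# Hypothetical samples: one well-formed document and three malformed ones.
samples = {
    "well formed": "<purchase><id>123</id><price>200</price></purchase>",
    "missing end tag": "<purchase><id>123</id><price>200</price>",
    "improper nesting": "<purchase><id>123<price></id>200</price></purchase>",
    "forbidden character": "<purchase><id>1 < 2</id></purchase>",
}

results = {}
for label, text in samples.items():
    try:
        parse_or_reject(text)
        results[label] = "accepted"
    except ValueError:
        results[label] = "rejected"

print(results)
```

Only the first sample is accepted; the other three fail during parsing, before any application logic sees their content.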

Invalid XML documents

Invalid XML documents are well formed but contain unexpected values. Here an attacker may take advantage of applications that do not properly define an XML schema to identify whether documents are valid. Below you can find a simple example of a document that, if not validated correctly, might have unintended consequences.

Consider a web store that stores its transactions as XML data:

<purchase>
  <id>123</id>
  <price>200</price>
</purchase>

Suppose the user only has control over the <id> value. Without the right countermeasures, an attacker can then submit an id such as 123</id><price>0</price><id>, which results in the following document:

<purchase>
  <id>123</id>
  <price>0</price>
  <id></id>
  <price>200</price>
</purchase>

If the parser that processes this document reads only the first instance of the <id> and <price> elements, the purchase with id 123 is processed at a price of 0 instead of 200.
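The effect can be reproduced with a short sketch. It assumes a hypothetical store that builds the document by pasting the user-supplied id into the markup without escaping, and a reader that takes the first matching element; the helper names build_purchase_unsafely and read_first are made up for this example.

```python
import xml.etree.ElementTree as ET


def build_purchase_unsafely(user_id: str) -> str:
    # Hypothetical vulnerable code: user input is pasted into the markup unescaped.
    return f"<purchase><id>{user_id}</id><price>200</price></purchase>"


def read_first(root: ET.Element, tag: str) -> str:
    # find() returns only the first matching child, so later duplicates are ignored.
    node = root.find(tag)
    if node is None or node.text is None:
        return ""
    return node.text


malicious_id = "123</id><price>0</price><id>"
document = build_purchase_unsafely(malicious_id)
root = ET.fromstring(document)  # still well formed, so parsing succeeds

print(document)
print("id:", read_first(root, "id"))         # id: 123
print("price:", read_first(root, "price"))   # price: 0
print("price elements:", len(root.findall("price")))  # price elements: 2
```

The injected document is well formed, so a well-formedness check alone does not catch it; only a check on the expected structure notices the duplicate elements.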

It is also possible that the schema is not restrictive enough, or that other input validation is insufficient, so that negative numbers, special decimal values (like NaN or Infinity) or excessively large values can be entered where they are not expected, leading to similar unintended behavior.
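A rough sketch of such value checks for the price field is shown below, using Python's decimal module. The function name parse_price and the upper bound MAX_PRICE are assumptions for illustration; the real limits depend on the application.

```python
from decimal import Decimal, InvalidOperation

MAX_PRICE = Decimal("100000")  # hypothetical business limit, not from the original post


def parse_price(text: str) -> Decimal:
    """Parse a price and reject the problematic values described above."""
    try:
        value = Decimal(text.strip())
    except InvalidOperation as err:
        raise ValueError(f"not a number: {text!r}") from err
    if not value.is_finite():
        # Decimal happily parses "NaN" and "Infinity", so reject them explicitly.
        raise ValueError(f"special value not allowed: {text!r}")
    if value < 0:
        raise ValueError(f"negative price not allowed: {text!r}")
    if value > MAX_PRICE:
        raise ValueError(f"price exceeds limit: {text!r}")
    return value


for candidate in ["200", "-1", "NaN", "Infinity", "1e400"]:
    try:
        print(candidate, "->", parse_price(candidate))
    except ValueError as err:
        print(candidate, "-> rejected:", err)
```

The finiteness check comes before the range checks on purpose: comparing a NaN Decimal with < or > raises an exception rather than returning False.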

To avoid vulnerabilities related to invalid XML documents, define a precise and restrictive XML Schema and validate every document against it, so that improperly structured or out-of-range data is rejected.
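Python's standard library does not include an XML Schema validator, so the sketch below is a hand-written stand-in that enforces the kind of rules a restrictive schema for the purchase document would declare: a fixed sequence of exactly one <id> followed by one <price>, no attributes or nested elements, and a pattern for each value. The function name validate_purchase and the specific patterns are assumptions; in practice the same constraints would be expressed in the schema itself and checked by a schema-aware parser.

```python
import re
import xml.etree.ElementTree as ET

# Assumed rules, analogous to a sequence with pattern restrictions in a schema.
EXPECTED_CHILDREN = ["id", "price"]  # exactly these elements, in this order
PATTERNS = {
    "id": re.compile(r"[0-9]{1,10}"),
    "price": re.compile(r"[0-9]{1,6}(\.[0-9]{1,2})?"),
}


def validate_purchase(xml_text: str) -> dict:
    """Return the validated field values, or raise ValueError if the document is invalid."""
    root = ET.fromstring(xml_text)
    if root.tag != "purchase" or root.attrib:
        raise ValueError("unexpected root element")
    if [child.tag for child in root] != EXPECTED_CHILDREN:
        raise ValueError("purchase must contain exactly one <id> followed by one <price>")
    values = {}
    for child in root:
        text = child.text or ""
        # Reject nested elements, attributes, and values that do not match the pattern.
        if len(child) or child.attrib or not PATTERNS[child.tag].fullmatch(text):
            raise ValueError(f"invalid <{child.tag}> element")
        values[child.tag] = text
    return values


valid = "<purchase><id>123</id><price>200</price></purchase>"
tampered = "<purchase><id>123</id><price>0</price><id></id><price>200</price></purchase>"

print(validate_purchase(valid))
try:
    validate_purchase(tampered)
except ValueError as err:
    print("rejected:", err)
```

With these rules in place, the tampered document from the web store example is rejected because of its duplicate elements, and values such as -5 or NaN fail the price pattern.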

In the next blog post, we will go into some more advanced attacks on XML documents, such as Jumbo Payloads and the feared number four of the OWASP Top Ten 2017, XXE.

In the meantime you can hone or challenge your skills on XML input validation on our portal.

Specifications for XML and XML schemas include multiple security flaws. At the same time, these specifications provide the tools required to protect XML applications. Even though we use XML schemas to define the security of XML documents, they can be used to perform a variety of attacks: file retrieval, server side request forgery, port scanning, or brute forcing.

https://www.owasp.org/index.php/XML_Security_Cheat_Sheet
