
Schematron Validation – What Is It And Why Is It Important?

04.05.2022

Schematron Rule-based Validation

Unlike a W3C XML schema, Schematron does not specify the document structure. Its role is to define data integrity requirements based on user-defined rules. Typical integrity validations include different kinds of sum checks, date comparisons, and conditional and format requirements. In other words, Schematron does not replace schema validation but complements it.
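To make the four kinds of rules concrete, here is a minimal sketch in plain Python rather than in Schematron itself. The invoice, its element names, and the check_invoice function are all invented for illustration. In a real Schematron schema, each check would be written as an assert with an XPath test and a plain-language message.

```python
import re
import xml.etree.ElementTree as ET
from datetime import date
from decimal import Decimal

# Hypothetical invoice; every element name here is an assumption.
INVOICE = """
<Invoice>
  <IssueDate>2022-05-04</IssueDate>
  <DueDate>2022-05-01</DueDate>
  <Currency>eur</Currency>
  <Line><Amount>10.00</Amount></Line>
  <Line><Amount>5.50</Amount></Line>
  <Total>15.50</Total>
  <PaymentMeans>CreditTransfer</PaymentMeans>
</Invoice>
"""


def check_invoice(xml_text):
    """Return a plain-language message for every violated rule."""
    inv = ET.fromstring(xml_text)
    errors = []

    # Sum check: the stated total must equal the sum of the line amounts.
    amounts = (Decimal(a.text) for a in inv.findall("Line/Amount"))
    if Decimal(inv.findtext("Total")) != sum(amounts, Decimal("0")):
        errors.append("Total must equal the sum of line amounts.")

    # Date comparison: the due date must not precede the issue date.
    issue = date.fromisoformat(inv.findtext("IssueDate"))
    due = date.fromisoformat(inv.findtext("DueDate"))
    if due < issue:
        errors.append("DueDate must not be earlier than IssueDate.")

    # Conditional requirement: a credit transfer needs an account number.
    if inv.findtext("PaymentMeans") == "CreditTransfer" and inv.find("IBAN") is None:
        errors.append("IBAN is required when PaymentMeans is CreditTransfer.")

    # Format requirement: the currency is a three-letter upper-case code.
    if not re.fullmatch(r"[A-Z]{3}", inv.findtext("Currency") or ""):
        errors.append("Currency must be three upper-case letters.")

    return errors


for message in check_invoice(INVOICE):
    print(message)
```

None of these rules can be expressed by describing the document structure alone, which is why they sit alongside schema validation rather than inside it.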

Why is Schematron Needed?

Company, industry, and country-level data requirements have traditionally been published as lengthy message implementation guidelines (MIGs). They are of course important, but each implementer can easily come up with their own interpretation of them. Everyone thinks they have done it right, yet the solutions still don’t interoperate. Sound familiar?

Using Schematron, you can narrow the gap between schema validation and the documented requirements by transforming the conditional, integrity, and content requirements into a machine-processable format. Improved validation coverage reduces varying interpretations, which in turn helps to maximize the value of standardization work for all organizations involved.

Often, the integrity and content requirements are validated only when the XML document is read into the receiving system. When errors are detected, the document ends up in error handling, where either the information is filled in manually or an error message is returned to the sender. Needless to say, this kind of processing wastes time and resources. In addition, the error feedback generated by the receiving system is often vague, such as “error code 6”. In such cases, it remains unclear which part of the document should be corrected in order to avoid the same error in the future.

The Benefits of Schematron Use

With the help of Schematron, most of the integrity and content requirements can be published in a format that can be used to test XML documents already at the sending end. In this case, the majority of invalid files never end up in the recipient's system in the first place.

So why has Schematron not been used more widely for XML document validation? Usually, it is a matter of lack of awareness. There are also objections, such as “Schematron cannot be used to validate all data requirements because some of the validations are related to master data”. So, what if they are? If in most cases more than 80% of the requirements can be expressed using Schematron, would it not be great if such a large proportion of the validations could be carried out already at the sending end? Most content issues would then not consume the recipient's resources at all. The sender would also receive immediate feedback on the errors and could correct them right away.

The Limitations of Schematron

A Schematron validation generates technical feedback, just like a schema check. The feedback is a test report in XML format that shows the validations performed and the errors found. Each error message contains a plain-language description of the requirement, an XPath to the invalid element, and the technical implementation of the validation rule, i.e., a long litany of XPath code. The challenge is that the report does not show how serious a failure is: for some checks, such as sum checks, it is often unclear whether the cause is a rounding error of one cent or a major computational problem. This shortcoming is evident, for example, in the Schematron validations published by CEN/TC 434 to implement the EU eInvoicing Directive. A Schematron error message also does not contain the line number of the invalid element, so locating it can be a shot in the dark, especially in large documents.
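As an illustration of what such a report contains, the following sketch reads a small report in SVRL (Schematron Validation Report Language), the report format defined alongside ISO Schematron. The report fragment, the rule id, and the failed_asserts helper are invented for this example. Only the SVRL namespace and the failed-assert element with its test and location attributes come from the standard.

```python
import xml.etree.ElementTree as ET

SVRL_NS = {"svrl": "http://purl.oclc.org/dsdl/svrl"}

# Hypothetical report fragment; the id, test, and location are assumptions.
REPORT = """<svrl:schematron-output xmlns:svrl="http://purl.oclc.org/dsdl/svrl">
  <svrl:fired-rule context="/Invoice"/>
  <svrl:failed-assert id="SUM-01" test="Total = sum(Line/Amount)"
                      location="/Invoice[1]/Total[1]">
    <svrl:text>Total must equal the sum of line amounts.</svrl:text>
  </svrl:failed-assert>
</svrl:schematron-output>"""


def failed_asserts(report_xml):
    """Collect the three things each error carries: message, XPath, rule."""
    root = ET.fromstring(report_xml)
    findings = []
    for failure in root.iterfind(".//svrl:failed-assert", SVRL_NS):
        text = failure.findtext("svrl:text", default="", namespaces=SVRL_NS)
        findings.append({
            "id": failure.get("id"),
            "message": text.strip(),             # plain-language requirement
            "location": failure.get("location"),  # XPath to the invalid element
            "test": failure.get("test"),          # the rule's XPath expression
        })
    return findings


for finding in failed_asserts(REPORT):
    print(finding["location"], "-", finding["message"])
```

The output gives a location as an XPath and a message, but no line number and no indication of how far off the total is, which is exactly the gap described above.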

Schematron validation is well suited for automated validation to ensure that the document meets certain requirements. However, as a tool for locating and correcting errors, Schematron as such is overly technical and too complicated.

TRUUGO + Schematron - Perfect Together

In Truugo, you can create a test profile that covers schema validation, Schematron validation, and any context-specific restrictions in a single pass. The same validations can be carried out both automatically and manually in the development and production environments. Truugo allows the tester to locate and correct errors more quickly and easily, without special tools or expertise. It produces a clear test report from which each invalid part of the XML document can easily be accessed. Even the most complex validations can be broken down into smaller parts, so that in the case of sum checks, for example, it is possible to immediately see the magnitude of the problem.
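The idea of breaking a sum check into parts can be sketched as follows. This is not Truugo's implementation, only an illustration of the principle: the explain_sum_check function, its return fields, and the one-cent tolerance are assumptions made for this example.

```python
from decimal import Decimal


def explain_sum_check(line_amounts, stated_total, tolerance=Decimal("0.01")):
    """Report the parts of a sum check instead of a bare pass/fail."""
    computed = sum((Decimal(a) for a in line_amounts), Decimal("0"))
    stated = Decimal(stated_total)
    difference = stated - computed

    if difference == 0:
        verdict = "ok"
    elif abs(difference) <= tolerance:
        # Within the assumed tolerance: likely a rounding issue.
        verdict = "rounding difference"
    else:
        verdict = "computational error"

    return {
        "computed": computed,
        "stated": stated,
        "difference": difference,
        "verdict": verdict,
    }


print(explain_sum_check(["10.00", "5.50"], "15.51"))
print(explain_sum_check(["10.00", "5.50"], "51.50"))
```

Both totals fail the same rule, but the difference shows that the first is off by one cent while the second is off by 36.00, which points to a different kind of fix.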
