
Data Protection by Design in Mendix

Stephan Wolbers

Why discipline is not a control mechanism.

Introduction

How many of you have had a copy of *Production* on your laptop over the past year? It’s not a trick question. I’m just curious. Chances are the answer isn’t zero.

Production data is incredibly useful. It contains real customers, real edge cases, and real inconsistencies, exactly the things that are always missing from a synthetic dataset. And so we copy it. Usually with the best of intentions and a healthy dose of self-deception.

That in itself is not surprising. It is efficient, and often the quickest way to understand a problem. Even in physics, few experiments manage without real measurement data.

But then a series of questions follows:

  • What kind of data is actually sensitive?
  • Which attributes contain personal data, directly or indirectly, or sensitive business information?
  • How do you anonymize or protect that data?
  • How do you handle references and derived values?
  • How do you know you’re not missing anything?
  • And if you have to do this again tomorrow, will you get exactly the same result?

In practice, these questions are rarely asked explicitly. The focus is on reproducing the bug or completing the user story, not on formalizing a systematic approach to data security.

In that case, the protection of sensitive data becomes not a feature of the model, but a manual correction applied afterward.

The Terms and Conditions

If the protection of sensitive data is to be a mature design aspect, it must meet a number of conditions:

  1. It must be transparent. Decisions regarding sensitive attributes must be an explicit and visible part of the design. They should not be implicitly hidden in scripts or in people’s minds.
  2. It must be predictable. The security logic must be verifiable and reproducible. Determinism can be a deliberate design choice, but it is not a requirement; neither is randomness. What matters is that the choice is made explicitly, not by accident.
  3. It must be consistently applicable across environments. What is implemented in one environment must be identically reproducible in another.

If there is no mechanism to facilitate or enforce these conditions, they remain dependent on discipline. And discipline is inversely proportional to project pressure.

The model

If discipline isn’t enough, structure must do the job. And in a Mendix application, that structure starts with the domain model, the formal description of reality. Entities define what exists. Attributes define which properties are relevant.

The protection of sensitive data must not be left out of the model. It must be explicitly included in it.

This means that a decision is made for each entity and for each attribute: Should the value be retained, deleted, or replaced? Does this apply in all cases, or only under certain conditions?

These decisions are not technical details, but policy decisions. They determine how an organization handles sensitive information.

This is the fundamental advantage of a model-driven approach. By mirroring the domain model at runtime using ModelReflection, you create a structured overview of all entities and attributes. Based on this, you can choose the appropriate protection for each attribute, not scattered across scripts, but centralized and transparent.
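As an illustration, such a mirror can also be built directly on the runtime metadata. Below is a minimal sketch, assuming the Mendix Runtime API's `Core.getMetaObjects()`; the actual inventory via the ModelReflection module works at the model level and may look different.

```java
import com.mendix.core.Core;
import com.mendix.systemwideinterfaces.core.meta.IMetaObject;
import com.mendix.systemwideinterfaces.core.meta.IMetaPrimitive;

// Sketch: walk every entity and attribute in the running model, so a
// protection rule can be attached to each one explicitly.
public class ModelInventory {
    public static void printInventory() {
        for (IMetaObject entity : Core.getMetaObjects()) {
            System.out.println("Entity: " + entity.getName());
            for (IMetaPrimitive attribute : entity.getMetaPrimitives()) {
                // Each attribute is a decision point: keep, clear, or replace?
                System.out.println("  " + attribute.getName()
                        + " (" + attribute.getType() + ")");
            }
        }
    }
}
```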

You can also export this configuration to JSON and apply it in other environments. This way, security becomes not just a technical implementation, but an explicit policy that is transferable and open to discussion.
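What such an export can look like: a sketch using Jackson, where the attribute names, the rule labels, and the flat map layout are illustrative assumptions, not the module's actual format.

```java
import com.fasterxml.jackson.databind.ObjectMapper;
import java.util.LinkedHashMap;
import java.util.Map;

// Sketch: serialize per-attribute decisions to JSON so the same policy
// can be imported and applied unchanged in another environment.
public class PolicyExport {
    public static String export() throws Exception {
        Map<String, String> rules = new LinkedHashMap<>();
        rules.put("Customer.FullName", "GENERATE");    // replaced by fake data
        rules.put("Customer.Email", "HASH");           // deterministic pseudonym
        rules.put("Customer.BirthDate", "GENERALIZE"); // e.g. year only
        rules.put("Order.Total", "KEEP");              // not personal data
        return new ObjectMapper().writerWithDefaultPrettyPrinter()
                                 .writeValueAsString(rules);
    }
}
```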

The result is a shift from an IT solution to a business tool. Stakeholders can see which decisions have been made: which data is retained, which is deleted, and which is generated.

Avoiding these choices simplifies the conversation, but nothing is as persistent as reality.

The implementation

The choices outlined in the model are only valuable if they are applied systematically. Protection should not be a one-off action; it must function as a mechanism.

This mechanism is built around five design principles: scope and scalability, explicit rules, determinism, controlled generation, and uniqueness.

Let’s dive into the technical details for a moment. For some readers, this is a good time to pour another cup of coffee and rejoin us at the next section.

1. Scope and scalability

The process begins with an optional XPath scope for each entity. Based on this, the total volume is first determined using an aggregated count, followed by batch processing. Offset and batch size control prevent memory spikes and make large-scale refreshes of test or acceptance data manageable.
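A sketch of this count-then-batch loop, assuming the Mendix Runtime API's `Core.retrieveXPathQueryAggregate` and `Core.retrieveXPathQuery`; the sort attribute is illustrative, and the real module's loop may differ.

```java
import com.mendix.core.Core;
import com.mendix.systemwideinterfaces.core.IContext;
import com.mendix.systemwideinterfaces.core.IMendixObject;
import java.util.List;
import java.util.Map;

// Sketch: count first, then process in fixed-size batches so a large
// refresh never loads the whole table into memory at once.
public class BatchProcessor {
    private static final int BATCH_SIZE = 1000;

    public static void process(IContext context, String xpath) {
        long total = Core.retrieveXPathQueryAggregate(
                context, "count(" + xpath + ")");
        for (int offset = 0; offset < total; offset += BATCH_SIZE) {
            List<IMendixObject> batch = Core.retrieveXPathQuery(
                    context, xpath, BATCH_SIZE, offset,
                    Map.of("CreatedDate", "ASC")); // stable order for paging;
                                                   // attribute name is illustrative
            // ... apply the configured rules to each object in `batch` ...
            Core.commit(context, batch);
        }
    }
}
```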

2. Explicit rules

The configuration in the model is translated into an execution model using RuleTypes. For each attribute, you explicitly choose a strategy, such as:

  • KEEP
  • CLEAR
  • FIXED
  • HASH
  • GENERALIZE
  • GENERATE

Each RuleType corresponds to a specific, manageable transformation. There is no implicit “best guess.” A missing or invalid configuration results in an explicit error rather than a silent deviation.
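In code, such an execution model can be as simple as an enum plus one exhaustive dispatch. A sketch with illustrative stand-in transformations, not the module's actual implementation:

```java
// Sketch: each attribute maps to exactly one explicit strategy,
// and a missing rule fails loudly instead of guessing.
enum RuleType { KEEP, CLEAR, FIXED, HASH, GENERALIZE, GENERATE }

final class RuleEngine {
    static String apply(RuleType rule, String original) {
        if (rule == null) {
            // No implicit "best guess": missing configuration is an error.
            throw new IllegalStateException("No rule configured for attribute");
        }
        if (original == null) return null; // nothing to protect
        switch (rule) {
            case KEEP:       return original;
            case CLEAR:      return null;
            case FIXED:      return "REDACTED";           // a configured constant
            case HASH:       return hash(original);       // deterministic pseudonym
            case GENERALIZE: return generalize(original); // e.g. full date -> year
            case GENERATE:   return generate(original);   // routed through Datafaker
            default:         throw new IllegalStateException("Unhandled: " + rule);
        }
    }
    // The helpers below are simplified stand-ins, not the module's code.
    static String hash(String v)       { return Integer.toHexString(v.hashCode()); }
    static String generalize(String v) { return v.substring(0, Math.min(4, v.length())); }
    static String generate(String v)   { return "generated-value"; }
}
```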

3. Determinism and reproducibility

Determinism is a deliberate design choice. If a rule is deterministic, a seed is calculated based on stable inputs, such as the attribute name and original value, combined with a salt.

The seed is cryptographically derived using SHA-256 and reduced to a long. Length prefixing is used to avoid ambiguity in composite inputs.

The result is idempotence: the same input, the same configuration, and the same salt produce the same output.
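A minimal sketch of this derivation: length-prefix each input, hash the whole with SHA-256, and fold the first eight bytes of the digest into a long. The input names and exact composition here are illustrative; the module's scheme may differ.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.util.Random;

// Sketch: derive a stable seed from (attribute name, original value, salt).
// Length-prefixing each part avoids ambiguity: ("ab","c") and ("a","bc")
// must not produce the same digest input.
final class DeterministicSeed {
    static long seed(String attribute, String value, String salt) throws Exception {
        MessageDigest sha = MessageDigest.getInstance("SHA-256");
        for (String part : new String[] { attribute, value, salt }) {
            byte[] bytes = part.getBytes(StandardCharsets.UTF_8);
            sha.update(Integer.toString(bytes.length).getBytes(StandardCharsets.UTF_8));
            sha.update((byte) ':');
            sha.update(bytes);
        }
        byte[] digest = sha.digest();
        long seed = 0L;
        for (int i = 0; i < 8; i++) {        // reduce 32 bytes to a long
            seed = (seed << 8) | (digest[i] & 0xFF);
        }
        return seed;
    }

    public static void main(String[] args) throws Exception {
        // Same input, same configuration, same salt -> same output, every run.
        Random r = new Random(seed("Customer.Email", "jan@example.com", "salt-2024"));
        System.out.println(r.nextInt(1_000_000));
    }
}
```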

4. Controlled generation

Generated values use a controlled routing layer on top of Datafaker. Both native expressions and call-style invocations are supported, but always within defined boundaries.

Provider and method names are validated, high-risk methods are blocked, and only safe return and parameter types are accepted. This prevents a configuration field from becoming a generic Java invocation engine.
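A sketch of what such a routing layer can look like on top of Datafaker's `expression(...)` API; the allowlist and the accepted expression form are illustrative assumptions, not the module's actual rules.

```java
import net.datafaker.Faker;
import java.util.Random;
import java.util.Set;

// Sketch: only expressions naming an allow-listed provider are routed
// to Datafaker; everything else is rejected before any invocation.
final class ControlledGenerator {
    private static final Set<String> ALLOWED_PROVIDERS =
            Set.of("Name", "Address", "Internet", "PhoneNumber");

    static String generate(String expression, long seed) {
        // Accept only the simple Datafaker expression form "#{Provider.method}".
        if (!expression.matches("#\\{[A-Za-z]+\\.[A-Za-z_]+}")) {
            throw new IllegalArgumentException("Unsupported expression: " + expression);
        }
        String provider = expression.substring(2, expression.indexOf('.'));
        if (!ALLOWED_PROVIDERS.contains(provider)) {
            throw new IllegalArgumentException("Provider not allowed: " + provider);
        }
        // Seeding the Faker keeps generation deterministic where configured.
        return new Faker(new Random(seed)).expression(expression);
    }

    public static void main(String[] args) {
        System.out.println(generate("#{Name.full_name}", 42L)); // accepted
        // generate("#{Reflect.anything}", 42L) -> rejected up front
    }
}
```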

5. Uniqueness

Uniqueness is not enforced blindly. For String-like attributes, a controlled strategy is used to avoid collisions within a batch, without relying on global state or external storage.

In addition, existing data is redistributed on a bucket-by-bucket basis through shuffling, which can optionally be deterministic. This preserves the statistical distribution while making individual values untraceable.
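Both mechanisms fit in a few lines. A sketch, with an illustrative collision-suffix strategy; the module's actual strategy for String-like attributes may differ.

```java
import java.util.*;

// Sketch: keep values unique within one batch by suffixing collisions,
// and redistribute existing values with an (optionally seeded) shuffle
// so the statistical distribution survives while links to individuals break.
final class BatchUniqueness {
    static List<String> makeUnique(List<String> values) {
        Set<String> seen = new HashSet<>();
        List<String> out = new ArrayList<>(values.size());
        for (String v : values) {
            String candidate = v;
            int i = 2;
            while (!seen.add(candidate)) {   // collision inside this batch
                candidate = v + "-" + i++;   // no global state or external storage
            }
            out.add(candidate);
        }
        return out;
    }

    static void shuffleBucket(List<String> bucket, Long seed) {
        // A seed makes the shuffle reproducible; null keeps it random.
        Collections.shuffle(bucket, seed == null ? new Random() : new Random(seed));
    }
}
```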

The context

At MxBlue, we’ve translated the approach described above into a Mendix module: DataProtection. The module is not a one-size-fits-all solution for all forms of data security. That would be suspicious.

It is intended for situations where data is used outside of production and where explicit, model-driven decisions are required.

This module is particularly suitable when:

  • Production data is synced to test or acceptance environments
  • Realistic but non-traceable data is needed for a demo or validation
  • Subsets of data must be shared with partners in a controlled manner
  • Pseudonymization is necessary for analysis without retaining the full production profile
  • Organizations want to provide transparency to the business and stakeholders regarding the security decisions they have made

In these situations, the module provides structure where discipline would otherwise have to do the job.

There are also situations in which this module is not the right solution. The module is not intended for use when:

  • The goal is to protect data in production from unauthorized access
  • Encryption or access control needs to be replaced
  • The focus is on the legal interpretation of regulations
  • Cryptographic anonymization is required
  • Permission management must be set up

In such cases, the solution lies elsewhere in the architecture. The module focuses on the controlled transformation of data outside a production environment.

Conclusion

The problem described here is a concrete one. Production data is being copied. Sensitive information needs to be modified. Without an explicit mechanism in place, this is done on an ad hoc basis.

The DataProtection module offers a solution to this problem. It makes protection settings visible within the model, enables them to be transferred between environments, and systematically applies them to data. This replaces a vulnerable manual step with a reproducible mechanism.

Anyone who uses production data outside of production needs a structured way to modify that data. This module provides that.
