Systems & Validation

At Jam Warehouse we had an interesting problem: Data Validation. This in itself is not all that interesting, but if you think about it, what does this really mean? According to Wikipedia:

data validation is the process of ensuring that a program operates on clean, correct and useful data…
And in the case of BrandDirector, not only do we prevent the user from saving invalid data in the database, but also ensure that we don’t do any processing with invalid data.

Ah, now we know what to do! Just put lots of checks in the system. Checks like:

  • 'if number > 100 or number < 0, then show an error'
  • 'if text is not an email address, then show an error'

But, what if the business model consists of hundreds of objects? This is going to get very cluttered and very difficult to maintain. So obviously we need some sort of structure in our code that allows us to create our models without clogging up the code files with numerous bits of check logic.

But before we start to create that awesome code, we need to know what we are working with: BrandDirector is a large client-server system with many domain objects. It is a web-based system that lets many users input data, which is then saved on the server. (This is of course a gross understatement of what BrandDirector actually does, but that is not the problem.) I need to ensure that the data from the user is both correct and clean before it is processed or saved.

Data validation, from a user's perspective, ensures that any data or information from the system is both reliable and useful for business. But I am not a user; I am the one doing the implementing. What I need is a way of adding validation to all those objects without messing up my immaculate code (at least that's what I think it is). Obviously, we want to tell the user when the data is invalid and explain how to correct it. So, what we are working to provide is: clean code, clean data and error messages.

What we want

Now, we are a C# shop and we like to use all those great features of the C# language, such as attributes and auto-properties. For those who, unlike me, aren't C# fans, here is a sample of what I want to write:

// Model
public class Ingredient
{
    [MaximumLength(50)] // <- attribute validation rule
    public string Name { get; set; } // <- auto property
}

Without the frameworks that we will see shortly, this is what I would have had to write for each model property:

// Model
public class Ingredient
{
    private string name;
    public string Name
    {
        get { return name; }
        set
        {
            name = value;
            // Guard against null, and keep the message in line with the check.
            if (name != null && name.Length > 50)
                AddError("Name", "Name must be 50 characters or fewer");
        }
    }
}

As you can see, it is way more code and just plain ugly. The first version is neat and does almost exactly the same thing: it allows the user to set the value, and the UI then displays the error message if need be. I say 'almost' because the first version does no checking (at least not yet). How are we going to get that checking into the first class? Well, there are some great frameworks out there that do the checking without us having to change our code at all.
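
To make the 'no checking yet' point concrete, here is a minimal sketch of what it takes to turn the attribute into a check by hand. It assumes that MaximumLength is a custom attribute (its declaration is not shown in this post), and AttributeChecker is a hypothetical helper that exists only for this illustration. The frameworks described in the next section take over this role.

```csharp
using System;
using System.Collections.Generic;
using System.Reflection;

// Assumed declaration: an attribute only carries metadata, it checks nothing.
[AttributeUsage(AttributeTargets.Property)]
public sealed class MaximumLengthAttribute : Attribute
{
    public MaximumLengthAttribute(int length) { Length = length; }
    public int Length { get; private set; }
}

public class Ingredient
{
    [MaximumLength(50)]
    public string Name { get; set; }
}

// Hypothetical helper: reads the attributes via reflection and applies them.
public static class AttributeChecker
{
    public static List<string> Check(object model)
    {
        var errors = new List<string>();
        foreach (PropertyInfo property in model.GetType().GetProperties())
        {
            var rule = property.GetCustomAttribute<MaximumLengthAttribute>();
            if (rule == null) continue; // no rule on this property

            // Only string properties are handled in this sketch.
            var text = property.GetValue(model) as string;
            if (text != null && text.Length > rule.Length)
                errors.Add(property.Name + " must be " + rule.Length + " characters or fewer");
        }
        return errors;
    }
}
```

Calling AttributeChecker.Check on an Ingredient whose Name is 51 characters long returns a single error for Name. Something still has to call it at the right moments, which is exactly the part we would rather not write by hand.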

Solving the problem

The two frameworks that are needed are FluentValidation and PostSharp. FluentValidation is a framework that allows us to create rules for a particular type of object and then provides a means to validate an object against those rules. This means is called a 'Validator':

using FluentValidation;

// IngredientValidator validates an object of type 'Ingredient'
public class IngredientValidator : AbstractValidator<Ingredient>
{
    public IngredientValidator()
    {
        // This neatly allows us to create a rule for 'Name'.
        RuleFor(x => x.Name).MaximumLength(50);
    }
}

Using the validator allows us to write a very neat and easy-to-read section of code:

public void SendIngredientToServer(Ingredient myIngredient)
{
    var validator = new IngredientValidator();
    var result = validator.Validate(myIngredient);

    if (result.IsValid)
        SendIngredient(myIngredient);
    else
        ShowErrorMessages(result.Errors);
}

But I don't want to have to do even this small check every time I press the save button. I want the check to run every time I change the properties as well as when I press save. And what I really want is for that save button to be disabled when the data is invalid. This is where PostSharp is really useful. It modifies the compiled assembly and inserts the checks on each property for us. (We create 'Aspects' that allow us to write the boilerplate code once and have it applied to each property.) This causes the validator to run every time a property's value changes. Now all we have to do is this:

public void SendIngredientToServer(Ingredient myIngredient)
{
    SendIngredient(myIngredient);
}

I can do this with confidence, knowing that my UI will never allow invalid data to reach the saving part. All the save buttons will be disabled and error messages will appear alongside all the controls with invalid data. And if we somehow manage to get invalid data into the actual save action, the server will also do the validation before saving to the database. But more on the server later.

On the client

PostSharp will modify all my auto-properties and add the necessary checks into the setters. If any errors are found, the UI is informed. The UI will then respond by showing the error messages and disabling the save button. But even with all of this, I still have to write the validators, which means maintaining two separate pieces of logic. I want all my related things in one file. What we currently have is the Ingredient class and the IngredientValidator. PostSharp does the work of adding the checks and the UI shows the messages, but I still need to create the validators manually.

Now, this is what I was really working on: the part that generates the validators. One attribute is far shorter than a hand-written rule. Following this reasoning, I apply the appropriate attributes to each property. A T4 Template then reads the attributes off each model, or in this case the DTO, and generates the equivalent Validator.

So I have 3 things now:

  1. The Model/DTO that I write with my properties and their attributes
  2. The Validators that are generated from reading the Model attributes
  3. The PostSharped assembly with the injected validation checks

This is all very exciting, as I have only written one part, the Model. All the other bits and pieces are then put together to create the equivalent of the big and ugly piece of code; one property and one attribute produce almost everything I need (at least on the client).
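
The generation step can be sketched in plain C#. This is not the actual T4 Template, only an illustration of the idea behind it: reflect over the model, find the attributes, and emit the source text of a validator. ValidatorGenerator and the declaration of MaximumLengthAttribute are assumptions made for this example, and only the one rule type is handled.

```csharp
using System;
using System.Reflection;
using System.Text;

// Assumed declaration of the attribute used on the models.
[AttributeUsage(AttributeTargets.Property)]
public sealed class MaximumLengthAttribute : Attribute
{
    public MaximumLengthAttribute(int length) { Length = length; }
    public int Length { get; private set; }
}

public class Ingredient
{
    [MaximumLength(50)]
    public string Name { get; set; }
}

// Hypothetical generator: emits validator source code as a string.
public static class ValidatorGenerator
{
    public static string Generate(Type modelType)
    {
        string name = modelType.Name;
        var code = new StringBuilder();
        code.AppendLine("public class " + name + "Validator : AbstractValidator<" + name + ">");
        code.AppendLine("{");
        code.AppendLine("    public " + name + "Validator()");
        code.AppendLine("    {");
        foreach (PropertyInfo property in modelType.GetProperties())
        {
            // One attribute on the model becomes one rule in the validator.
            var rule = property.GetCustomAttribute<MaximumLengthAttribute>();
            if (rule != null)
                code.AppendLine("        RuleFor(x => x." + property.Name + ").MaximumLength(" + rule.Length + ");");
        }
        code.AppendLine("    }");
        code.AppendLine("}");
        return code.ToString();
    }
}
```

ValidatorGenerator.Generate(typeof(Ingredient)) produces the text of an IngredientValidator class containing the rule for Name, which is the same shape as the hand-written validator shown earlier.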

The server

Now, as with all client-server systems, data goes across the wire. It first gets downloaded for the user to edit and then the changes are uploaded to the server. It is useless to put the error messages only on the server, as the user will never see them, and it is unwise to rely on validation only on the client, as the existence of pirated software shows. Never trust the user. We reach the conclusion that we need validation on the client, for those error messages, and also on the server, to make sure that the data is in fact valid.

This now brings in a problem of duplication. The models on the client are DTOs that are a small subset of the domain model. They have the same need for validation, as they are used by the UI. As the DTOs on the client are not the same as the models on the server, we can't reuse the code. We are going to have to rewrite it in the form that the client needs. The way I chose to solve this problem is by copying the rules. We could do the traditional copy-and-paste, but that is practically asking for disaster. Developers will, at some point, forget to update either the Model or the DTO. Or something even worse will happen, such as adding validation only to the client and not the server. This is where the T4 Template is very helpful. It can read the validation rules off one model and merge them with the ones on the model that we are actually creating the validator for.

For example, we have:

  • one Model, say Ingredient, and
  • two DTOs, IngredientNameDto and IngredientSupplierDto.
  • The Ingredient Model has, among others, two properties: Name and SupplierName.
  • And the DTOs have a property Name and SupplierName respectively.

We want to add the validation to only one model, Ingredient, and then have the validators generated for all three objects. The way I achieved this was to add a single attribute to the DTOs that specifies which type of Model to get the validation rules from, in this case Ingredient. Providing validation this way does almost everything for us. And just to show what we do in code (a super simplified model):

// Domain Model on the server
public class Ingredient
{
    [MaximumLength(100)]
    public string Name { get; set; }

    [MaximumLength(50)]
    public string SupplierName { get; set; }

    // other properties here ...
} 

// shared across the client and server
[CopyValidation("BrandDirector.Models.Ingredient")]
public class IngredientNameDto
{
    public string Name { get; set; }

    // other properties here ...
}

[CopyValidation("BrandDirector.Models.Ingredient")]
public class IngredientSupplierDto
{
    public string SupplierName { get; set; }

    // other properties here ...
}
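
As a rough sketch of how the copying could work, the helper below looks up the type named in CopyValidation and matches properties by name. RuleCopier and both attribute declarations are assumptions for this illustration. It also assumes that the DTOs and the Model live in the same assembly so the type name can be resolved, and that a rule declared directly on a DTO property takes precedence over a copied one.

```csharp
using System;
using System.Collections.Generic;
using System.Reflection;

[AttributeUsage(AttributeTargets.Property)]
public sealed class MaximumLengthAttribute : Attribute
{
    public MaximumLengthAttribute(int length) { Length = length; }
    public int Length { get; private set; }
}

[AttributeUsage(AttributeTargets.Class)]
public sealed class CopyValidationAttribute : Attribute
{
    public CopyValidationAttribute(string sourceTypeName) { SourceTypeName = sourceTypeName; }
    public string SourceTypeName { get; private set; }
}

namespace BrandDirector.Models
{
    public class Ingredient
    {
        [MaximumLength(100)] public string Name { get; set; }
        [MaximumLength(50)] public string SupplierName { get; set; }
    }
}

[CopyValidation("BrandDirector.Models.Ingredient")]
public class IngredientNameDto { public string Name { get; set; } }

[CopyValidation("BrandDirector.Models.Ingredient")]
public class IngredientSupplierDto { public string SupplierName { get; set; } }

// Hypothetical helper: returns property name -> maximum length for a type.
public static class RuleCopier
{
    public static Dictionary<string, int> MaximumLengthsFor(Type dtoType)
    {
        var rules = new Dictionary<string, int>();
        var copy = dtoType.GetCustomAttribute<CopyValidationAttribute>();
        // Assumption: the source Model is in the same assembly as the DTO.
        Type source = copy == null ? null : dtoType.Assembly.GetType(copy.SourceTypeName);

        foreach (PropertyInfo property in dtoType.GetProperties())
        {
            // A rule on the DTO itself wins; otherwise copy by property name.
            var rule = property.GetCustomAttribute<MaximumLengthAttribute>();
            if (rule == null && source != null)
            {
                PropertyInfo match = source.GetProperty(property.Name);
                if (match != null)
                    rule = match.GetCustomAttribute<MaximumLengthAttribute>();
            }
            if (rule != null) rules[property.Name] = rule.Length;
        }
        return rules;
    }
}
```

With these types, IngredientNameDto picks up the limit of 100 for Name and IngredientSupplierDto picks up the limit of 50 for SupplierName, even though neither DTO declares a rule of its own.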

What I haven't said yet is that the Validators are in a different assembly to the Models. This is because the T4 Template reads the compiled assembly in order to generate the Validators. So the order of actions is really: write model, compile model, generate validators. As you can probably see, the model is compiled before the validators are actually created, so we can't reference the validators directly from the model. What we have instead is a Registry of all the validators available to the client or server. We register the validator assembly when the app starts up and then find the validator when we need it. Here is an example of what PostSharp does for us:

// Model
public class Ingredient
{
    private string name;
    public string Name
    {
        get { return name; }
        set
        {
            name = value;
            var validators = ValidatorRegistry.FindValidatorsFor<Ingredient>();
            // Validate returns a ValidationResult, so flatten its Errors (needs System.Linq).
            var results = validators.SelectMany(v => v.Validate(this).Errors);
            // Do what needs to be done with the result
        }
    }
}

Depending on whether it is on the server or the client, the appropriate action is taken. If it is on the server, we throw an exception if the results are invalid. This is to totally prevent any invalid data from actually reaching the model itself. The exception is then sent back to the client and handled there; all processing on the server now stops. On the client, we just add an error message to the list of errors that is displayed onscreen. Because the app and the server both know where the validators are, we can register them when the server or app starts:

public void OnAppStartup()
{
    // Register the assembly that contains the generated validators.
    ValidatorRegistry.RegisterValidators(typeof(IngredientValidator).Assembly);
}
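
The registry itself is not shown in this post, so the following is only a sketch of how such a class might look. To keep it free of dependencies it uses a hypothetical IModelValidator<T> interface as a stand-in for the FluentValidation validator types, and it returns plain strings as errors. It is not thread-safe, and registering the same assembly twice would add duplicates.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Reflection;

// Hypothetical stand-in for a validator contract.
public interface IModelValidator<T>
{
    IEnumerable<string> Validate(T instance);
}

public static class ValidatorRegistry
{
    // Model type -> validator instances found for it.
    private static readonly Dictionary<Type, List<object>> validators =
        new Dictionary<Type, List<object>>();

    public static void RegisterValidators(Assembly assembly)
    {
        foreach (Type type in assembly.GetTypes())
        {
            // Only concrete, closed types with a parameterless constructor.
            if (type.IsAbstract || type.IsInterface || type.ContainsGenericParameters) continue;
            if (type.GetConstructor(Type.EmptyTypes) == null) continue;

            foreach (Type contract in type.GetInterfaces())
            {
                if (!contract.IsGenericType) continue;
                if (contract.GetGenericTypeDefinition() != typeof(IModelValidator<>)) continue;

                Type modelType = contract.GetGenericArguments()[0];
                List<object> list;
                if (!validators.TryGetValue(modelType, out list))
                    validators[modelType] = list = new List<object>();
                list.Add(Activator.CreateInstance(type));
            }
        }
    }

    public static IEnumerable<IModelValidator<T>> FindValidatorsFor<T>()
    {
        List<object> list;
        return validators.TryGetValue(typeof(T), out list)
            ? list.Cast<IModelValidator<T>>()
            : Enumerable.Empty<IModelValidator<T>>();
    }
}

public class Ingredient
{
    public string Name { get; set; }
}

public class IngredientValidator : IModelValidator<Ingredient>
{
    public IEnumerable<string> Validate(Ingredient instance)
    {
        if (instance.Name != null && instance.Name.Length > 50)
            yield return "Name must be 50 characters or fewer";
    }
}
```

After RegisterValidators has scanned the assembly that holds IngredientValidator, FindValidatorsFor<Ingredient>() returns that validator, and the model never needs a compile-time reference to it.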

So, by utilizing existing frameworks, we can reduce the amount of code that we as developers write. This enables the developers to spend more time writing the really cool bits of code and not repetitively doing the same thing.
