Leveraging C#’s Type System for Robust Code Design
Type-driven development was coined as a term in the title of a book about Idris’ powerful and unique type system. In recent years, it has come to refer more broadly to the practice of leaning on the compiler and type system, creating more accurate and expressive types that better encapsulate invariants and logic. The type system in C# is not as powerful as that of Rust, which offers features such as richer type aliasing and union types, nor that of Idris, where types can be passed into functions and new types returned. However, it is still very powerful, and developers often do not lean on it enough.
This blog post aims to share some tips on better-utilizing C#’s type system and bringing it to the forefront of your development experience so that you can use it to eliminate certain classes of errors and problems at compile-time. First, let’s start with a primer and try to reframe how we think of types in general!
What Are Types?
A type constrains a variable to a certain set of possible values. For example, in C#, an int variable can hold, at any time, a single integer value in the range -2,147,483,648 to 2,147,483,647. Compare this to a bool, which can only be one of two values: true or false. A big part of type-driven development is ensuring that, when defining variables, you use the type that most accurately constrains the set of possible states the variable can be in.
Product Types
When thinking of a type as a certain set of all possible states that the variable can exist in, then you can start to think of classes and interfaces differently. For example, consider the following record:
public record Person(string Name, int Age);
We can consider a Person a product of a string and an int, meaning it is a product type. This is analogous to the Cartesian product in set theory, in that the total number of possible states is the product of the number of possible states of its parts: the number of possible states that a Person can be in is A x B, where A is the number of possible states of string and B is the number of possible states of int.
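To make the counting concrete, here is a minimal sketch that uses component types small enough to enumerate. The Light enum and Signal record are hypothetical names invented for this illustration, and the count assumes only the named enum values are used.

```csharp
using System;
using System.Linq;

// Enumerate every value of each component type.
bool[] flags = { false, true };
Light[] lights = Enum.GetValues<Light>();

// Build every possible Signal: the Cartesian product of its parts.
var allSignals = (
    from flashing in flags
    from light in lights
    select new Signal(flashing, light)
).ToList();

Console.WriteLine(allSignals.Count); // 6, i.e. 2 states of bool x 3 states of Light

// Hypothetical types, used only for this illustration.
public enum Light { Red, Amber, Green }
public record Signal(bool IsFlashing, Light Current);
```

The same arithmetic applies to Person, only with far larger numbers for string and int.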
Sum Types
Less prevalent within C# are sum types, which are types whose value can be one of several different types. In many languages, this would be a union type; the closest thing we have within C# is an interface with a set of implementations:
public interface IShape { }
public record Circle(int Radius) : IShape;
public record Rectangle(int Width, int Height) : IShape;
We can think of sum types as analogous to the union of disjoint sets in set theory, in that the total number of possible states is the sum of the number of possible states of each of its parts: the number of possible states that an IShape can be in is A + B, where A is the number of possible states of Circle and B is the number of possible states of Rectangle. This is why, for instance, we can and should switch on an instance of an interface, to ensure we handle every concrete type that it might be at any given time. Because anyone can add another implementation of an interface, the compiler cannot prove that the cases are complete, which is why the switch ends with a fallback arm:
IShape shape = new Circle(32);
string shapeType = shape switch
{
    Circle c => $"Circle with radius {c.Radius}",
    Rectangle r => $"Rectangle with dimensions {r.Width}x{r.Height}",
    _ => "Unknown shape"
};
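An interface is open to new implementations, so it only approximates a sum type. One way to get closer, sketched below as an alternative to the interface above rather than as the article's own approach, is an abstract record with a private constructor and nested sealed cases, so that no type outside Shape can extend it. The names Shape and Describe are assumptions made for this example.

```csharp
using System;

Shape shape = new Shape.Circle(32);
Console.WriteLine(Describe(shape)); // Circle with radius 32

static string Describe(Shape shape) => shape switch
{
    Shape.Circle c => $"Circle with radius {c.Radius}",
    Shape.Rectangle r => $"Rectangle with dimensions {r.Width}x{r.Height}",
    // The compiler does not treat the hierarchy as closed,
    // so a fallback arm is still required to cover "everything else".
    _ => throw new InvalidOperationException("Unknown shape")
};

// Hypothetical closed hierarchy: the private constructor means only
// types nested inside Shape can derive from it.
public abstract record Shape
{
    private Shape() { }

    public sealed record Circle(int Radius) : Shape;
    public sealed record Rectangle(int Width, int Height) : Shape;
}
```

The set of cases is now fixed in one place, although the fallback arm remains because the compiler does not check exhaustiveness over class hierarchies.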
Invalid States Should Be Unrepresentable
Okay, now that was a lot to unpack, so let’s take our new-found perception of types and find a way to apply it to day-to-day programming! If we begin to think of a type as the range of possible states that a variable can be in, then we can start to model our domain objects with types that represent only the states they should be allowed to be in. For example, let’s take the following record:
public record User(string Id, string Username, string Email);
The User type is a product type of three strings; however, this is not an accurate representation of a user. Let’s focus on the Email field: a string is not the right type for it, because not every string is a valid email. Rather, an email is represented by a certain subset of strings in which at least one @ symbol must be present. To represent that in code, we can introduce a dedicated type:
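public readonly record struct Email
{
    public readonly string Value;

    public Email(string email)
    {
        // Reject any string that cannot be an email.
        if (!email.Contains('@'))
        {
            throw new ArgumentException("An email must contain an '@' symbol.", nameof(email));
        }

        Value = email;
    }
}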
Now, functions that deal with the Email type can rely on the email having passed this check, and furthermore, a User cannot be constructed without an Email:
public record User(string Id, string Username, Email Email);
This gives us a great deal of safety at compile time, along with certainty that operations on the Email type that rely on the @ symbol can be performed safely, as an Email built through its constructor cannot exist without one.
public string GenerateUsernameFromEmail(Email email)
{
    return email.Value.Split('@')[0];
}
This concept combats primitive obsession, which is the anti-pattern of reaching for primitives instead of smaller, focused types. Such focused types are prevalent within the .NET standard library, with types such as IPAddress, Uri, Guid and so forth.
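One caveat is worth keeping in mind with the struct-based Email shown in this section: every C# struct has a default value that is created without running any constructor. The sketch below reuses the Email definition from above, with an ArgumentException as an assumed choice of exception type, to show both paths.

```csharp
using System;

// Going through the constructor enforces the invariant.
var valid = new Email("ada@example.com");
Console.WriteLine(valid.Value.Split('@')[0]); // ada

// default(Email) skips the constructor, so Value is null here.
Email empty = default;
Console.WriteLine(empty.Value is null); // True

public readonly record struct Email
{
    public readonly string Value;

    public Email(string email)
    {
        if (!email.Contains('@'))
        {
            throw new ArgumentException("An email must contain an '@' symbol.", nameof(email));
        }

        Value = email;
    }
}
```

If that gap matters in your domain, declaring Email as a record (a reference type) is one option, since its default is null and nullable reference analysis can then flag it.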
The Result Type
The Email value type in the previous section is a good start; however, it can be improved. Currently, the constructor throws an exception if a caller attempts to create an Email without an @ in it. Exceptions should be reserved for truly exceptional cases, the ones that we have no way to handle, and a user attempting to sign up with a bad email is not that exceptional.
Furthermore, the presence of an exception within a function makes the function signature tell a lie, as it gives no warning that the function could explode in the caller’s face. We can instead lean on the type system and sum types to express this intent, prompting any consumer of the function to handle both possible states. We can do this with a Result type, which is a sum type representing both the error and success states of the variable:
using System.Diagnostics.CodeAnalysis;

public readonly record struct Result<TValue>
{
    public TValue? Value { get; }
    public string? Error { get; }

    // Tells the compiler which member is non-null for each value of IsSuccess.
    [MemberNotNullWhen(true, nameof(Value))]
    [MemberNotNullWhen(false, nameof(Error))]
    public bool IsSuccess { get; }

    private Result(string error)
    {
        Value = default;
        Error = error;
        IsSuccess = false;
    }

    private Result(TValue value)
    {
        Value = value;
        Error = null;
        IsSuccess = true;
    }

    public static Result<TValue> Success(TValue value) => new Result<TValue>(value);

    public static Result<TValue> Failure(string error) => new Result<TValue>(error);
}
public readonly record struct Email
{
    public readonly string Value;

    private Email(string email)
    {
        Value = email;
    }

    public static Result<Email> Create(string email)
    {
        if (!email.Contains('@'))
        {
            return Result<Email>.Failure("Bad email!");
        }

        var validEmail = new Email(email);
        return Result<Email>.Success(validEmail);
    }
}
With these changes, an Email must now be created via a static factory method that returns a Result, meaning any call site that attempts to create an Email has to expressly deal with both the possible error and success states.
As a bonus tip, we can use the MemberNotNullWhen attribute to assist the compiler with nullability analysis: once IsSuccess has been checked, the compiler knows at compile time which of Value or Error is not null and can be consumed safely.
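As a rough sketch of what consuming such a Result can look like, the example below adds a Match method that takes one function per state, so a caller cannot supply one branch and forget the other. Match and Describe are hypothetical helpers that are not part of the Result type above, and the constructors are condensed into one to keep the example short.

```csharp
using System;
using System.Diagnostics.CodeAnalysis;

Console.WriteLine(Describe("ada@example.com")); // Parsed ada@example.com
Console.WriteLine(Describe("not-an-email"));    // Rejected: Bad email!

// Both outcomes must be supplied, so neither can be forgotten.
static string Describe(string input) =>
    Email.Create(input).Match(
        onSuccess: email => $"Parsed {email.Value}",
        onFailure: error => $"Rejected: {error}");

public readonly record struct Result<TValue>
{
    public TValue? Value { get; }
    public string? Error { get; }

    [MemberNotNullWhen(true, nameof(Value))]
    [MemberNotNullWhen(false, nameof(Error))]
    public bool IsSuccess { get; }

    private Result(TValue? value, string? error, bool isSuccess)
    {
        Value = value;
        Error = error;
        IsSuccess = isSuccess;
    }

    public static Result<TValue> Success(TValue value) => new(value, null, true);
    public static Result<TValue> Failure(string error) => new(default, error, false);

    // Hypothetical helper: runs exactly one of the two functions,
    // depending on which state this Result is in.
    public TOut Match<TOut>(Func<TValue, TOut> onSuccess, Func<string, TOut> onFailure) =>
        IsSuccess ? onSuccess(Value) : onFailure(Error);
}

public readonly record struct Email
{
    public string Value { get; }

    private Email(string value) => Value = value;

    public static Result<Email> Create(string email) =>
        email.Contains('@')
            ? Result<Email>.Success(new Email(email))
            : Result<Email>.Failure("Bad email!");
}
```

Inside Match, the MemberNotNullWhen attributes are what allow Value and Error to be passed on without a null check or a null-forgiving operator.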
Parse, Don’t Validate
We’ve neglected to talk about how functions, at a high level, interact with the type system thus far. Functions are first-class citizens within the C# language, and as such, can be stored in variables and consumed like any other type. At a high level, we understand that functions take some input and may potentially produce some output. This intent is expressed via a function’s signature. For example, a validation function might look like the following:
public bool Validate(object obj);
Here, it’s evident that the function is taking an object and deriving a true or false output based on it. This is the premise of validation: we take some specific instance of a type and check to see if it satisfies a set of criteria. We can do this for many reasons, but commonly we do it because we need to ensure that data is valid before we perform operations on it. Does this ring any bells? Suppose a user has supplied an email address to us in a registration flow. We could validate it as soon as it comes in, but then we run into some issues:
- Every time we want to work with emails, say, within another function down the track, we need to perform the same validation
- We have no compile-time guarantee that the validation has occurred
- The validation mechanism must be known to all users and can be forgotten in spots within the code base
We can avoid this by parsing instead of validating. Parsing is the process of taking some input and deriving a more structured output from it. A common example is how a programming language’s parser works: it takes a string of code and breaks it down into meaningful symbols and constructs for the compiler to consume. At a high level, this change is a simple adjustment of the function signature above:
public Result<Email> Parse(string email);
By parsing instead of validating, and doing so as close as possible to the point where the data first enters the system, we can easily return any errors to the user in the failure case (this almost makes the Host layer a pseudo-anti-corruption layer for the domain layer). In the successful case, we now have a rich data type that functions can rely on instead of a string, removing the need to revalidate email fields, as they are now typed as Email instances that have already been parsed.
Tying It Together
Let’s tie it all together with a slightly fictitious practical example of a function that attempts to create a user within a database. Assume that the user signs up on the website, and somewhere down the line the following function is called:
public async Task<Result<User>> CreateUser(CreateUserRequest request)
{
    // Parse the raw string into an Email at the boundary.
    var email = Email.Create(request.Email);
    if (!email.IsSuccess)
    {
        return Result<User>.Failure(email.Error);
    }

    // User.Id is declared as a string, so convert the Guid.
    var id = Guid.NewGuid().ToString();
    var user = new User(id, request.Username, email.Value);
    await _repository.SaveAsync(user);
    return Result<User>.Success(user);
}
The above code exemplifies a lot of the concepts we’ve spoken about within the article, and although it is a trite example, its goal is to showcase some of the benefits of leaning on the type system without being too verbose:
- Parsing Over Validating: The Email.Create method parses the string to a more correct type, meaning that any function receiving an Email instance can be certain it’s dealing with valid data, eliminating the need for repetitive validation checks throughout your codebase.
- Explicit Error Handling: By returning a Result<User>, we’re making failure a first-class citizen in our API. This forces consumers of our method to explicitly handle both success and failure cases at compile time, leading to more robust error handling. Remember, exceptions are for exceptional cases: expected failure states should be part of your type system!
- A Focus on Separation of Concerns: The Email type encapsulates all the logic for what constitutes a valid email, keeping our CreateUser method clean and focused on its primary responsibility.
- Errors as Values: Instead of throwing exceptions for invalid emails, we’re returning errors as values within our Result type. This makes the potential for failure explicit in our method signature, and allows us to easily propagate and aggregate errors up the call stack, allowing consumers to handle them as necessary, such as exposing them to users as a BadRequest or sending an alert out to Slack or Teams.
Remember, these aren’t hard and fast rules: They’re a set of learnings to be pragmatically applied where they can be most beneficial!
Conclusion
By leveraging C#’s type system more effectively, we can create more robust, self-documenting, and error-resistant code. Throughout this post, we’ve explored several key techniques that allow us to push more of our error handling and business logic into the type system itself, catching potential issues at compile time rather than runtime. By doing so, we not only make our code safer but also more expressive and easier to reason about.
The type system is a powerful tool at our disposal, so let’s better understand it so that we can use it to its full potential.
Further Reading
Below are some resources that explain different concepts relating to type driven development in further detail: