A DataString pattern

DataString: a C# pattern

Strings are everywhere in software development.

If you're writing code, chances are you're taking user input strings, consuming strings from JSON data, creating strings to serve data, and handling strings in any number of other tasks.

In many of these cases the strings are not user text but payload used in processing, which often leads to:

  1. Validating string input format

  2. Cleaning up and normalizing strings

  3. Parsing strings for logical content / parts

Strings are the un-typed data of the developer's world.

In .NET it's common to find swathes of code that handle strings: trimming, changing case, extracting data, etc. Often the code is captured as a set of utility functions; occasionally there is a wrapper class, such as PathString.

This is my approach for wrapping a data string.

The goal is to keep it simple and aim for a straightforward drop-in that eases refactoring of existing code.

DataString: a reasonable pattern (idiom) to type a string

Wrap using a record class

A C# record class tends to be best-fit for a DataString. An argument can be made for a record struct, but I prefer a reference type.

The immutability of a record matches the style of string.

MUST use a base interface (serializer support)

Using an IDataString interface to tag a DataString makes it clearer in the code that the type implements the pattern. It also provides a convenient handle for adding serialization support.

Writing a JSON converter that makes a DataString serialize as a string is straightforward.

For write: the ToString() method should return a string value

For read: the TryParse() method can be used.

To keep things clean, use explicit interface implementations.
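A minimal sketch of such a converter, using System.Text.Json. It assumes an IDataString<TSelf> interface with a static abstract, nullable-returning TryParse (C# 11 / .NET 7) rather than explicit interface implementations; the Tag type and the converter name are hypothetical and exist only for illustration.

```csharp
using System;
using System.Text.Json;
using System.Text.Json.Serialization;

// Assumed interface shape for this sketch (static abstract needs C# 11).
public interface IDataString<TSelf> where TSelf : class, IDataString<TSelf>
{
    static abstract TSelf? TryParse(string? data);
}

// Hypothetical DataString: a trimmed, non-empty tag.
public sealed record Tag : IDataString<Tag>
{
    private readonly string _data;
    private Tag(string data) => _data = data;

    public static Tag? TryParse(string? data)
    {
        data = data?.Trim();
        return string.IsNullOrEmpty(data) ? null : new Tag(data);
    }

    public override string ToString() => _data;
}

// Reads and writes any DataString as a plain JSON string.
public sealed class DataStringJsonConverter<T> : JsonConverter<T>
    where T : class, IDataString<T>
{
    // Read: delegate to TryParse, and fail the deserialization on invalid input.
    public override T? Read(ref Utf8JsonReader reader, Type typeToConvert, JsonSerializerOptions options)
        => T.TryParse(reader.GetString())
           ?? throw new JsonException($"Invalid {typeof(T).Name} value.");

    // Write: the overridden ToString() supplies the string value.
    public override void Write(Utf8JsonWriter writer, T value, JsonSerializerOptions options)
        => writer.WriteStringValue(value.ToString());
}
```

To use it, add a DataStringJsonConverter<Tag> to JsonSerializerOptions.Converters; a Tag then round-trips as an ordinary JSON string instead of an object.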

Should have a string-only constructor

Having a string-only constructor simplifies refactoring other code to leverage it. This is particularly true with C# 9 target-typed new expressions.

If the string is invalid, the constructor MUST throw.

MUST have a no-throw parse ability

You're going to want a no-throw option to try to create a DataString.

In .NET a static TryParse method is commonly used in the core classes.

But its 'out' signature feels a bit clumsy in modern C#.

I prefer a nullable return signature:

static TDataType? TryParse(string? data);
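As a rough illustration of how the nullable signature reads at the call site, here is a sketch with a hypothetical HexColor type. The type, its validation rules, and the Demo helper are assumptions made for the example; the conventional 'out' overload is layered on top only to show that both styles can coexist.

```csharp
using System;
using System.Linq;

// Hypothetical DataString: "#rrggbb", trimmed and lower-cased.
public sealed record HexColor
{
    private readonly string _data;
    private HexColor(string data) => _data = data;

    // Nullable-return style: null means "not a valid HexColor".
    public static HexColor? TryParse(string? data)
    {
        var clean = data?.Trim().ToLowerInvariant();

        if (clean is null || clean.Length != 7 || clean[0] != '#'
            || !clean.Skip(1).All(Uri.IsHexDigit))
            return null;

        return new HexColor(clean);
    }

    // Conventional .NET 'out' style, implemented via the nullable version.
    public static bool TryParse(string? data, out HexColor? value)
        => (value = TryParse(data)) is not null;

    public override string ToString() => _data;
}

public static class Demo
{
    // The nullable return composes with pattern matching in one expression.
    public static string Describe(string? input)
        => HexColor.TryParse(input) is { } color
            ? $"valid: {color}"
            : "invalid";
}
```

With the nullable form, the result can also feed the ?? and ?. operators directly, which the 'out' form cannot do without a separate statement.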

MUST override ToString()

The DataString should return its string value with ToString().

Should have an implicit conversion to string

Once created, a DataString should still essentially be a string, so implicit cast to string makes sense.

The implicit cast means using a DataString where a string is expected is ok.

//before
void foo(string myData){...}
var data = "some value";
myObject.foo(data);

//after
void foo(string myData){...}
var data = new DataString("some value");
myObject.foo(data); // implicit conversion to string

Should have comments

I've found a comment on the DataString class that describes its constraints and clean-up super handy, with inheritdoc on the constructors.

(alt) OneOf DataString

You can also use a DataString to describe a string-enum.

For this make the string-only constructor private and provide static read-only instances for the allowed candidates.

Then TryParse reduces to checking the input string fits an existing object.
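A minimal sketch of the OneOf variant, assuming a hypothetical OrderStatus type with three allowed values. The names and the case-insensitive matching are choices made for the example, not part of the pattern.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical string-enum built as a DataString.
public sealed record OrderStatus
{
    private readonly string _data;

    // Private: callers cannot create values outside the allowed set.
    private OrderStatus(string data) => _data = data;

    public static readonly OrderStatus Pending = new("pending");
    public static readonly OrderStatus Shipped = new("shipped");
    public static readonly OrderStatus Cancelled = new("cancelled");

    // Declared after the instances so they are initialised first.
    private static readonly IReadOnlyList<OrderStatus> Candidates =
        new[] { Pending, Shipped, Cancelled };

    // TryParse only looks up an existing instance; it never creates one.
    public static OrderStatus? TryParse(string? data)
    {
        var clean = data?.Trim();
        return Candidates.FirstOrDefault(
            x => string.Equals(x._data, clean, StringComparison.OrdinalIgnoreCase));
    }

    public override string ToString() => _data;

    public static implicit operator string(OrderStatus item) => item._data;
}
```

Because TryParse hands back one of the shared instances, OrderStatus.TryParse(" SHIPPED ") returns the same object as OrderStatus.Shipped, and unknown input yields null.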

Implementation hints.

Not strictly part of the pattern.

I also create a TryRead utility to capture the string processing so it can be reused in both constructor(s) and TryParse.

    private static (string? error, other?) TryRead(string? data)

where other? is all the parameters to create the type

For the cast to string, use nullable types and the NotNullIfNotNull attribute, so that a non-null DataString is known to convert to a non-null string.

Putting it all together

I have an implementation I use in kwd.CoreUtil, but below is a simple implementation to convert words to code:

An interface (C#):

  ///<summary>
  ///<list type="bullet">
  ///<item>Should be a record type.</item>
  ///<item>MUST provide a TryParse.</item>
  ///<item>MUST override ToString().</item>
  ///<item>Should provide a string-only constructor.</item>
  ///<item>Should provide an implicit cast to string.</item>
  ///</list>
  ///</summary>
  public interface IDataString<TSelf>
      where TSelf : class, IDataString<TSelf>
  {
    // No-throw parse: returns null when the input is not valid.
    // (static abstract members need C# 11 / .NET 7.)
    static abstract TSelf? TryParse(string? data);

    // The wrapped string; relies on the implementer overriding ToString().
    string Value => ToString() ?? string.Empty;
  }

A DataString:

using System;
using System.Diagnostics.CodeAnalysis; // NotNullIfNotNull

/// <summary>
/// <list type="bullet">
/// <item>cannot be empty</item>
/// <item>length &lt; 20</item>
/// <item>trimmed</item>
/// <item>lower-cased</item>
/// </list>
/// </summary>
public record UserName : IDataString<UserName>
{
    private readonly string _data;

    // Shared validation and clean-up, used by the constructor and TryParse.
    private static (string? error, string? cleanData) TryRead(string? data)
    {
        data = data?.Trim();

        if (string.IsNullOrEmpty(data))
            return ("cannot be empty", null);

        if (data.Length >= 20)
            return ("length must be < 20", null);

        return (null, data.ToLowerInvariant());
    }

    // isChecked: true when data has already been through TryRead.
    private UserName(string data, bool isChecked)
    {
        if (!isChecked)
        {
            var (error, cleanData) = TryRead(data);

            if (cleanData is null)
                throw new ArgumentException(error, nameof(data));

            data = cleanData;
        }

        _data = data;
    }

    public static UserName? TryParse(string? data)
    {
        var (_, cleanData) = TryRead(data);
        return cleanData is null ? null : new UserName(cleanData, true);
    }

    [return: NotNullIfNotNull(nameof(item))]
    public static implicit operator string?(UserName? item)
        => item?._data;

    /// <inheritdoc cref="UserName"/>
    public UserName(string data) : this(data, false) { }

    public override string ToString() => _data;
}

Did I miss anything?

Well, that's my approach.

So far it seems to work well, but I only have my own small projects to try it out on.

I'm particularly interested in whether this is useful in other code bases.

Does it need other enhancements?

Should it be simpler?

Does it add enough value?

What are your thoughts?