Value Types and Generics

“Don’t use value types with generics unless you HAVE to!”

I found that warning scrawled across a white board my second or third week at Dots. “Have” was quadruple-underlined for good measure.

In C#, a value type is any built-in primitive like int or float, or any type declared with the struct or enum keyword. A generic class is a class like List<T> that is parameterized by one or more types, which it can then use for its fields, method parameters, and return values. Supposedly, mixing the two significantly increases the size of your application’s executable.

Is this true?

Turns out, yes. At least when building to iOS or Android from Unity, parameterizing generic code with value types bloats your binary. This is a problem for us at Dots because we’re constantly looking for ways to keep our releases under the 100 MB limit that Apple imposes on downloads without a Wi-Fi connection.

I want to try to quantify how bad the problem is, but first I think it’s worth discussing why the problem arises. C#’s syntax obscures this somewhat (you create both structs and regular classes with the new keyword), but value types and reference types are very different animals. A variable declared as int count or DateTime birthday holds the actual value in question, in this case perhaps “5” or whatever block of ones and zeroes constitutes a DateTime. A variable declared as string message holds nothing more than a reference, a pointer to the value it represents.

This is why you can use reference types but not value types polymorphically. If you declare a local reference type variable, that variable will always require the same number of bytes on the stack: usually four on 32-bit architectures and eight on 64-bit architectures. Even if the variable ends up pointing to an object of a different type, the pointer itself is the same size. If you declare a local value type variable, that variable’s size on the stack depends on its type, so there can be no runtime substitution.
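
The difference shows up as soon as you copy a variable. The following is a minimal sketch, not taken from any real codebase: PointValue, PointReference, and CopySemanticsDemo are hypothetical names made up for illustration. It is written as a library with no Main, so it can be compiled with mcs -target:library.

```csharp
using System;

// Hypothetical types for illustration only.
struct PointValue { public int X; }
class PointReference { public int X; }

static class CopySemanticsDemo
{
    // Copies a struct, mutates the copy, and returns the original's field.
    public static int MutateCopyOfStruct()
    {
        PointValue a = new PointValue { X = 1 };
        PointValue b = a;   // copies the whole value
        b.X = 2;
        return a.X;         // still 1: a and b are separate values
    }

    // Copies a reference, mutates through the copy, and returns the original's field.
    public static int MutateCopyOfClass()
    {
        PointReference a = new PointReference { X = 1 };
        PointReference b = a;   // copies only the reference
        b.X = 2;
        return a.X;             // now 2: both variables refer to one object
    }

    // The size of a value depends on its type...
    public static int SizeOfInt() { return sizeof(int); }
    public static int SizeOfDouble() { return sizeof(double); }

    // ...while a reference is pointer-sized: 4 bytes in a 32-bit process, 8 in a 64-bit one.
    public static int SizeOfReference() { return IntPtr.Size; }
}
```

Calling MutateCopyOfStruct returns 1 and MutateCopyOfClass returns 2, which is the “holds the value” versus “holds a pointer” distinction in miniature.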

Now consider how you’d implement generic classes as an author of the Common Language Runtime. It’d be easy enough to generate a concrete implementation of List<T> for every T that it gets used with. But now you’ve got several copies of what is otherwise identical code. What the CLR actually does is more sophisticated—it does not output C#, obviously, but you can think of it as creating something like the following, at least where the type parameters are reference types:

public class List
{
    private int tail = 0;
    // Every element is stored as a same-sized reference, whatever its type.
    private Object[] backingArray = new Object[10];

    // ...

    public void Add(Object element)
    {
        backingArray[tail] = element;
        tail++;
    }

    // ...
}

The class is no longer strongly typed, but that’s okay. We already did our type safety checks when our C# was compiled into byte code. The class now works with any Object though, so we can reuse the implementation across our entire codebase, whether the T we’d like to insert into our list is a string, a MemoryStream, or a ReallyDamnBigObject.

Unfortunately, the “exists right there” nature of value types precludes this kind of optimization. To take one problem that illustrates the larger issue, how much space would be allocated for backingArray if we were to share this code among value types? Ten bytes? That would work when T is byte. What if we needed to store an int array? We’d need at least 40 bytes, probably more. References are always the same size, so with reference types there is no problem. But when a generic type parameter T is a value type, the implementation of the generic class must be tailored to the type.
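
To make that concrete, here is a rough sketch of what two tailored copies might look like, in the same spirit as the shared List class above. ByteList, IntList, and the StorageBytes property are hypothetical names invented for this illustration; the runtime does not literally emit C# classes like these. Like the earlier sketch, the capacity is fixed at ten and there is no resizing logic.

```csharp
using System;

// Hypothetical stand-in for the code a runtime must generate for List<byte>.
public class ByteList
{
    private int tail = 0;
    private byte[] backingArray = new byte[10];   // ten 1-byte elements

    public void Add(byte element)
    {
        backingArray[tail] = element;   // throws once the fixed capacity is exceeded
        tail++;
    }

    // Bytes of element storage held by the backing array.
    public int StorageBytes { get { return Buffer.ByteLength(backingArray); } }
}

// Hypothetical stand-in for List<int>: the same logic, but a different layout.
public class IntList
{
    private int tail = 0;
    private int[] backingArray = new int[10];     // ten 4-byte elements

    public void Add(int element)
    {
        backingArray[tail] = element;   // throws once the fixed capacity is exceeded
        tail++;
    }

    // Bytes of element storage held by the backing array.
    public int StorageBytes { get { return Buffer.ByteLength(backingArray); } }
}
```

The two classes differ only in the element type, yet new ByteList().StorageBytes is 10 while new IntList().StorageBytes is 40, so the native code that allocates and indexes the array cannot be shared between them.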

So all those repeated implementations of generic classes take up space. Let’s find out how much. Say we have two files, Reference.cs and Value.cs:

Reference.cs

using System;
using System.Collections.Generic;

class AwesomeGame
{
    public void Run()
    {
        List<string> stringList = new List<string>();
        List<Exception> exceptionList = new List<Exception>();
    }
}

Value.cs

using System;
using System.Collections.Generic;

class AwesomeGame
{
    public void Run()
    {
        List<int> intList = new List<int>();
        List<double> doubleList = new List<double>();
    }
}

We can compile them into .dll files using Mono:

mcs -target:library Reference.cs
mcs -target:library Value.cs

At this point, the two .dll files are the same size (around 4 KB). This should be no surprise, as our C# has only been converted into IL bytecode. Indeed, if you take a peek at the IL, you can see that no concrete implementations of the generic classes have been generated yet. The generic lists are still being referred to as generic lists:

monodis Value.dll
...
IL_0000:  newobj instance void class [mscorlib]System.Collections.Generic.List`1<int32>::'.ctor'()
IL_0005:  stloc.0
IL_0006:  newobj instance void class [mscorlib]System.Collections.Generic.List`1<float64>::'.ctor'()
IL_000b:  stloc.1
...

The difference doesn’t appear until we compile our bytecode down to native code. We can also do this with Mono, using its ahead-of-time (AOT) compiler:

mono --aot Reference.dll Value.dll

On my MacBook, I end up with two .dylib files. Reference.dll.dylib is 24 KB, but Value.dll.dylib is 248 KB: roughly ten times as big.

You can add three more lists of different types to each file and see how the sizes change. I saw Reference.dll.dylib grow by about 2 KB while Value.dll.dylib grew to 533 KB. So the difference is significant: we’re already up to half a megabyte’s worth of bloat with just five different value types and a single generic class.
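
If you want to repeat the experiment, an extended Value.cs might look something like the sketch below. The three extra element types (float, long, and byte) are my own picks for illustration and are not necessarily the ones behind the numbers above, so expect your sizes to differ.

```csharp
using System;
using System.Collections.Generic;

class AwesomeGame
{
    public void Run()
    {
        // The two lists from the original Value.cs.
        List<int> intList = new List<int>();
        List<double> doubleList = new List<double>();

        // Three more value-type instantiations; the specific types are arbitrary choices.
        List<float> floatList = new List<float>();
        List<long> longList = new List<long>();
        List<byte> byteList = new List<byte>();
    }
}
```

Compile and AOT-compile it with the same mcs and mono --aot commands as before, then compare the size of the resulting .dylib against the two-list version.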

Of course, when building to certain platforms today, Unity spits out C++ instead of a binary library. The Mono backend has been deprecated; IL2CPP is the future. So is binary bloat associated with value types and generics still a concern? According to Unity, the answer is yes. IL2CPP also shares generic code where the type parameters are reference types, but doesn’t where they aren’t, suggesting that the problem is a fundamental one unlikely to disappear soon.

In the grand scheme of things, the enormous utility and convenience of generics, even when used with value types, probably outweighs the size cost. But if you, like us, need to be careful about executable size, you may find yourself scrawling hysterical warnings across the nearest white board as well.