Value Types and Generics
“Don’t use value types with generics unless you HAVE to!”
I found that warning scrawled across a whiteboard my second or third week at Dots. “Have” was quadruple-underlined for good measure.
In C#, a value type is any basic type like an int or a float, or any type definition declared using the struct keyword. A generic class is a class like List<T> that can hold fields of one or more parameterized types.
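The following minimal sketch makes the two definitions concrete. The names Point and Box<T> are hypothetical and exist only for illustration.

```csharp
// Point is a value type: it is declared with the struct keyword, so a
// variable of type Point holds its two ints directly.
// (Point is a hypothetical example type.)
public struct Point
{
    public int X;
    public int Y;

    public Point(int x, int y)
    {
        X = x;
        Y = y;
    }
}

// Box<T> is a generic class: T is a type parameter that is filled in
// wherever the class is used. (Box is a hypothetical example type.)
public class Box<T>
{
    public T Contents;

    public Box(T contents)
    {
        Contents = contents;
    }
}
```

Box<int>, Box<string>, and Box<Point> are all legal instantiations. The first and last parameterize the generic class with value types, which is the combination the whiteboard warned against.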
Supposedly, mixing the two increases the size of your application’s executable by a significant degree.
Is this true?
Turns out, yes. At least when building to iOS or Android from Unity, parameterizing generic code with value types bloats your binary. This is a problem for us at Dots because we’re constantly looking for ways to keep our releases under the 100 MB limit that Apple imposes on downloads without a Wi-Fi connection.
I want to try to quantify how bad the problem is, but first I think it’s worth discussing why the problem arises. C#’s syntax obscures this somewhat (you create both structs and regular classes with the new keyword), but value types and reference types are very different animals.
A variable declared as int count or DateTime birthday holds the actual value in question, in this case perhaps “5” or whatever block of ones and zeroes constitutes a DateTime value. A variable declared as string message holds nothing more than a reference, a pointer to the value it represents. This is why you can use reference types but not value types polymorphically. If you declare a local reference type variable, that variable will always require the same number of bytes on the stack: usually four on 32-bit architectures and eight on 64-bit architectures. Even if the variable ends up pointing to an object of a different type, the pointer itself is the same size. If you declare a local value type variable, that variable’s size on the stack will depend on its type, so there can be no runtime substitution.
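A short sketch of the difference, using two hypothetical types that differ only in whether they are declared with struct or class. SemanticsDemo is likewise a hypothetical helper.

```csharp
using System;

// Hypothetical types with identical fields: one value type, one reference type.
public struct CounterValue
{
    public int Count;
}

public class CounterReference
{
    public int Count;
}

public static class SemanticsDemo
{
    // Assigning a struct copies the value itself, so a and b are independent.
    public static int CopyValue()
    {
        var a = new CounterValue { Count = 5 };
        var b = a;
        b.Count = 10;
        return a.Count; // returns 5: a was not affected
    }

    // Assigning a class instance copies only the reference, so a and b
    // point at the same object.
    public static int CopyReference()
    {
        var a = new CounterReference { Count = 5 };
        var b = a;
        b.Count = 10;
        return a.Count; // returns 10: the change is visible through a
    }

    // A reference occupies the same number of bytes whatever it points to:
    // 4 in a 32-bit process, 8 in a 64-bit process.
    public static int ReferenceSize() => IntPtr.Size;
}
```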
Now consider how you’d implement generic classes as an author of the Common Language Runtime (CLR). It’d be easy enough to generate a concrete implementation of List<T> for every T that it gets used with. But now you’ve got several copies of what is otherwise identical code. What the CLR actually does is more sophisticated. It does not output C#, obviously, but you can think of it as creating something like the following, at least where the type parameters are reference types:
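Here is a minimal sketch of that kind of shared implementation. It assumes a list backed by a ten-slot array. The class name ObjectList and its members are hypothetical, and the CLR does not literally produce this C#.

```csharp
using System;

// Conceptual, type-erased list: every element is stored as an object
// reference, so one implementation can serve any reference type.
// (ObjectList is a hypothetical name used for illustration.)
public class ObjectList
{
    // Ten slots, each a pointer-sized reference whatever it points to.
    private object[] backingArray = new object[10];
    private int count;

    public int Count => count;

    public void Add(object item)
    {
        if (count == backingArray.Length)
        {
            // Grow by doubling, copying the existing references across.
            Array.Resize(ref backingArray, backingArray.Length * 2);
        }
        backingArray[count++] = item;
    }

    public object Get(int index)
    {
        if (index < 0 || index >= count)
        {
            throw new ArgumentOutOfRangeException(nameof(index));
        }
        return backingArray[index];
    }
}
```

In this model, a call such as (string)list.Get(0) stands in for list[0] on a List<string>. The compiler's earlier type checks guarantee that the cast succeeds.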
The class is no longer strongly typed, but that’s okay. We already did our type safety checks when our C# was compiled into byte code. The class now works with any Object, though, so we can reuse the implementation across our entire codebase, whether the T we’d like to insert into our list is a string, a MemoryStream, or a ReallyDamnBigObject.
Unfortunately, the “exists right there” nature of value types precludes this kind of optimization. To take one problem that illustrates the larger issue, how much space would be allocated for backingArray if we were to share this code among value types? Ten bytes? That would work when T is byte. What if we needed an array of ints? We’d need at least 40 bytes, probably more. References are always the same size, so with reference types there is no problem. But when a generic type parameter T is a value type, the implementation of the generic class must be tailored to the type.
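The following small sketch shows the arithmetic behind that paragraph. It computes the bytes needed for the elements of a ten-slot array and ignores the array object's own header. The class name BackingArraySizes is hypothetical, and long is an extra example chosen for illustration.

```csharp
using System;

// Element storage for a hypothetical ten-slot backing array.
public static class BackingArraySizes
{
    private const int Slots = 10;

    // sizeof on built-in value types is a compile-time constant in safe code.
    public static int ForBytes() => Slots * sizeof(byte);   // 10
    public static int ForInts() => Slots * sizeof(int);     // 40
    public static int ForLongs() => Slots * sizeof(long);   // 80

    // For any reference type each slot is a pointer, so the answer depends
    // only on the process's pointer size: 40 on 32-bit, 80 on 64-bit.
    public static int ForReferences() => Slots * IntPtr.Size;
}
```

The first three answers differ, so a single compiled body cannot serve all three element types. The fourth is the same for string, MemoryStream, or any other class, which is what makes sharing possible.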
So all those repeated implementations of generic classes take up space. Let’s find out how much. Say we have two files, Reference.cs and Value.cs:
Reference.cs
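Here is a minimal sketch of what Reference.cs might contain. It assumes two lists parameterized with reference types. The class, field, and method names are hypothetical, and compiling this exact listing will not necessarily reproduce the file sizes measured here.

```csharp
using System.Collections.Generic;
using System.Text;

// Hypothetical Reference.cs: two lists parameterized with reference types.
public class Reference
{
    public List<string> strings = new List<string>();
    public List<StringBuilder> builders = new List<StringBuilder>();

    // Touch both lists so each instantiation is actually used.
    public int Fill()
    {
        strings.Add("five");
        builders.Add(new StringBuilder("five"));
        return strings.Count + builders.Count;
    }
}
```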
Value.cs
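Here is a minimal sketch of what Value.cs might contain. It assumes two lists parameterized with value types, with int and DateTime chosen arbitrarily. The class, field, and method names are hypothetical, and compiling this exact listing will not necessarily reproduce the file sizes measured here.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical Value.cs: two lists parameterized with value types.
public class Value
{
    public List<int> ints = new List<int>();
    public List<DateTime> dates = new List<DateTime>();

    // Touch both lists so each instantiation is actually used.
    public int Fill()
    {
        ints.Add(5);
        dates.Add(new DateTime(2016, 1, 1));
        return ints.Count + dates.Count;
    }
}
```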
We can compile them into .dll files using Mono:
At this point, the two .dll files are the same size (around 4 KB). This should be no surprise, as our C# has only been converted into IL byte code. Indeed, if you take a peek at the IL, you can see that the concrete generic class implementations haven’t been generated yet. The generic lists are still being referred to as generic lists:
The difference doesn’t appear until we compile our byte code ahead of time into native code. We can also do this with Mono:
On my MacBook, I end up with two .dylib files. Reference.dll.dylib is 24 KB, but Value.dll.dylib is 248 KB. That’s about ten times as big.
You can add three more lists of different types to each file and see how the sizes change. I saw Reference.dll.dylib grow by about 2 KB while Value.dll.dylib grew to 533 KB. So the difference is significant: we’re already up to half a megabyte’s worth of bloat with just five different value types and a single generic class.
Of course, when building to certain platforms today, Unity spits out C++ instead of a binary library. The Mono backend has been deprecated; IL2CPP is the future. So is binary bloat associated with value types and generics still a concern? According to Unity, the answer is yes. IL2CPP also shares generic code where the type parameters are reference types, but not where they are value types, suggesting that the problem is a fundamental one unlikely to disappear soon.
In the grand scheme of things, the enormous utility and convenience of generics, even when used with value types, probably outweighs the size cost. But if you, like us, need to be careful about executable size, you may find yourself scrawling hysterical warnings across the nearest whiteboard as well.