C# Features in the .NET Framework 4
Since its initial release in 2002, the C# programming language has
been improved to enable programmers to write clearer, more maintainable
code. The enhancements have come from the addition of features such as
generic types, nullable value types, lambda expressions, iterator
methods, partial classes and a long list of other useful language
constructs. And, often, the changes were accompanied by giving the
Microsoft .NET Framework libraries corresponding support.
This
trend toward increased usability continues in C# 4.0. The additions make
common tasks involving generic types, legacy interop and working with
dynamic object models much simpler. This article aims to give a
high-level survey of these new features. I’ll begin with generic
variance and then look at the legacy and dynamic interop features.
Covariance and Contravariance
Covariance and contravariance are best introduced with an example, and
the best is in the framework. In System.Collections.Generic,
IEnumerable<T> and IEnumerator<T> represent, respectively,
an object that’s a sequence of T’s and the enumerator (or iterator) that
does the work of iterating the sequence. These interfaces have done a
lot of heavy lifting for a long time, because they support the
implementation of the foreach loop construct. In C# 3.0, they became
even more prominent because of their central role in LINQ and LINQ to
Objects—they’re the .NET interfaces to represent sequences.
So if
you have a class hierarchy with, say, an Employee type and a Manager
type that derives from it (managers are employees, after all), then what
would you expect the following code to do?
IEnumerable<Manager> ms = GetManagers();
IEnumerable<Employee> es = ms;
It seems as though one ought to be able to treat a sequence
of Managers as though it were a sequence of Employees. But in C# 3.0,
the assignment will fail; the compiler will tell you there’s no
conversion. After all, it has no idea what the semantics of
IEnumerable<T> are. This could be any interface, so for any
arbitrary interface IFoo<T>, why would an IFoo<Manager> be
more or less substitutable for an IFoo<Employee>?
In C#
4.0, though, the assignment works because IEnumerable<T>, along
with a few other interfaces, has changed, an alteration enabled by new
support in C# for covariance of type parameters.
IEnumerable<T> is eligible to be more special than the arbitrary
IFoo<T> because, though it’s not obvious at first glance, members
that use the type parameter T (GetEnumerator in IEnumerable<T> and
the Current property in IEnumerator<T>) actually use T only in
the position of a return value. So you only get a Manager out of the
sequence, and you never put one in.
In contrast, think of
List<T>. Making a List<Manager> substitutable for a
List<Employee> would be a disaster, because of the following:
List<Manager> ms = GetManagers();
List<Employee> es = ms; // Suppose this were possible
es.Add(new EmployeeWhoIsNotAManager()); // Uh oh
The new language feature in C# 4.0, then, is the ability to define types, such as the new IEnumerable<T>, that admit conversions among themselves when the type parameters in question bear some relationship to one another. This is what the .NET Framework developers who wrote IEnumerable<T> used, and this is what their code looks like (simplified, of course):
public interface IEnumerable<out T> { /* ... */ }
Why is this called covariance? Well, it’s easiest to see when you start to draw arrows. To be concrete, let’s use the Manager and Employee types. Because there’s an inheritance relationship between these classes, there’s an implicit reference conversion from Manager to Employee:
Manager → Employee
And now, because of the annotation of T in IEnumerable<out T>, there’s also an implicit reference conversion from IEnumerable<Manager> to IEnumerable<Employee>. That’s what the annotation provides for:
IEnumerable<Manager> → IEnumerable<Employee>
This is called covariance, because the arrows in each of the two examples point in the same direction. We started with two types, Manager and Employee. We made new types out of them, IEnumerable<Manager> and IEnumerable<Employee>. The new types convert the same way as the old ones.
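To make the arrows concrete, here is a minimal sketch of the conversion in code. The Employee and Manager classes are stand-ins for the ones discussed above, and IProducer<out T>, ManagerProducer and CovarianceDemo are hypothetical names invented for this illustration; only IEnumerable<out T> comes from the framework.

```csharp
using System;
using System.Collections.Generic;

public class Employee
{
    public string Name { get; set; }
}

public class Manager : Employee { }

// Hypothetical covariant interface: T appears only in an output position.
public interface IProducer<out T>
{
    T Produce();
}

public class ManagerProducer : IProducer<Manager>
{
    public Manager Produce() { return new Manager { Name = "Ada" }; }
}

public static class CovarianceDemo
{
    public static IEnumerable<Manager> GetManagers()
    {
        return new List<Manager>
        {
            new Manager { Name = "Ada" },
            new Manager { Name = "Grace" }
        };
    }

    // Accepts any sequence of employees, including a sequence of managers.
    public static int CountEmployees(IEnumerable<Employee> employees)
    {
        int count = 0;
        foreach (Employee e in employees) { count++; }
        return count;
    }

    public static void Main()
    {
        // Compiles in C# 4.0 and later because IEnumerable<out T> is covariant.
        IEnumerable<Employee> es = GetManagers();
        Console.WriteLine(CountEmployees(es));

        // The same kind of conversion applies to a user-defined covariant interface.
        IProducer<Employee> producer = new ManagerProducer();
        Console.WriteLine(producer.Produce().Name);
    }
}
```

If you remove the out modifier from IProducer, the last assignment should stop compiling, which is a quick way to see what the annotation buys you.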
Contravariance is when this happens backward. You might anticipate that this could happen when the type parameter, T, is used only as input, and you’d be right. For example, the System namespace contains an interface called IComparable<T>, which has a single method called CompareTo:
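public interface IComparable<in T> { int CompareTo(T other); }
Because T appears only as an input, anything that can compare itself with an arbitrary Employee can certainly compare itself with a Manager, so the following assignment is allowed:
IComparable<Employee> ec = GetEmployeeComparer();
IComparable<Manager> mc = ec;
This time the arrows point in opposite directions, which is why it’s called contravariance:
Manager → Employee
IComparable<Manager> ← IComparable<Employee>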
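The reversed arrow can be tried out with a small sketch like the following. The Salary property and the ContravarianceDemo class are assumptions made up for this example; IComparable<in T> and Action<in T> are the framework types as they ship in .NET Framework 4.

```csharp
using System;

public class Employee : IComparable<Employee>
{
    public int Salary { get; set; }

    // T is used only as an input here, which is what makes "in T" legal.
    public int CompareTo(Employee other)
    {
        return Salary.CompareTo(other.Salary);
    }
}

public class Manager : Employee { }

public static class ContravarianceDemo
{
    // Asks for something that can compare itself with a Manager.
    public static int CompareWithManager(IComparable<Manager> left, Manager right)
    {
        return left.CompareTo(right);
    }

    public static void Main()
    {
        Employee employee = new Employee { Salary = 50 };
        Manager manager = new Manager { Salary = 80 };

        // An IComparable<Employee> can stand in for an IComparable<Manager>.
        IComparable<Employee> ec = employee;
        IComparable<Manager> mc = ec;
        Console.WriteLine(mc.CompareTo(manager) < 0);

        // Action<in T> is contravariant in the same way.
        Action<Employee> printSalary = e => Console.WriteLine(e.Salary);
        Action<Manager> printManagerSalary = printSalary;
        printManagerSalary(manager);
    }
}
```

Going the other way (treating an IComparable<Manager> as an IComparable<Employee>) is rejected by the compiler, because the comparer might then be handed an Employee that isn't a Manager.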
So the language feature here is pretty simple to summarize: You can add the keyword in or out whenever you define a type parameter, and doing so gives you free extra conversions. There are some limitations, though.
First, this works with generic interfaces and delegates only. You can’t declare a generic type parameter on a class or struct in this manner. An easy way to rationalize this is that delegates are very much like interfaces that have just one method, and in any case, classes would often be ineligible for this treatment because of fields. You can think of any field on the generic class as being both an input and an output, depending on whether you write to it or read from it. If those fields involve type parameters, the parameters can be neither covariant nor contravariant.
Second, whenever you have an interface or delegate with a covariant or contravariant type parameter, you’re granted new conversions on that type only when the type arguments, in the usage of the interface (not its definition), are reference types. For instance, because int is a value type, the IEnumerator<int> doesn’t convert to IEnumerator<object>, even though it looks like it should:
IEnumerator<int> ↛ IEnumerator<object>
The reason for this behavior is that the conversion must preserve the type representation. If the int-to-object conversion were allowed, calling the Current property on the result would be impossible, because the value type int has a different representation on the stack than an object reference does. All reference types have the same representation on the stack, however, so only type arguments that are reference types yield these extra conversions.
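The reference-type restriction is easy to observe at run time. The sketch below uses a hypothetical helper class, VarianceAndValueTypes, and relies on the LINQ Cast<object>() operator as one possible workaround that boxes each element explicitly.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class VarianceAndValueTypes
{
    // True when the object can be viewed as a sequence of object references.
    public static bool IsObjectSequence(object candidate)
    {
        return candidate is IEnumerable<object>;
    }

    public static void Main()
    {
        List<string> strings = new List<string> { "a", "b" };
        List<int> ints = new List<int> { 1, 2 };

        Console.WriteLine(IsObjectSequence(strings)); // string is a reference type
        Console.WriteLine(IsObjectSequence(ints));    // int is a value type

        // IEnumerable<object> direct = ints;   // does not compile
        // Boxing each element yields a real IEnumerable<object>.
        IEnumerable<object> boxed = ints.Cast<object>();
        Console.WriteLine(boxed.Count());
    }
}
```

On .NET Framework 4 or later, the first check succeeds and the second fails, which matches the representation argument above.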
Very likely, most C# developers will happily use this new language feature—they’ll get more conversions of framework types and fewer compiler errors when using some types from the .NET Framework (IEnumerable<T>, IComparable<T>, Func<T>, Action<T>, among others). And, in fact, anyone designing a library with generic interfaces and delegates is free to use the new in and out type parameters when appropriate to make life easier for their users.
By the way, this feature does require support from the runtime, but the support has always been there. It lay dormant for several releases, however, because no language made use of it. Also, previous versions of C# allowed some limited conversions that were covariant or contravariant. Specifically, they let you make delegates out of methods that had compatible return and parameter types. In addition, array types have always been covariant. These existing features are distinct from the new ones in C# 4.0, which actually let you define your own types that are covariant and contravariant in some of their type parameters.
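For comparison, here is a rough sketch of the older forms of variance mentioned above: method-group conversions to delegates with compatible signatures, and array covariance with its run-time check. The delegate types ObjectFactory and StringHandler and the class PreExistingVariance are hypothetical names introduced only for this example.

```csharp
using System;

// Hypothetical delegate types, declared only for this illustration.
public delegate object ObjectFactory();
public delegate void StringHandler(string text);

public static class PreExistingVariance
{
    public static string MakeString() { return "made a string"; }

    public static void HandleObject(object value)
    {
        Console.WriteLine("Handled: " + value);
    }

    // Returns true when the runtime rejects a store into a covariantly converted array.
    public static bool ArrayStoreIsChecked()
    {
        object[] objects = new string[1];   // array covariance
        try
        {
            objects[0] = 42;                // an int is not a string
            return false;
        }
        catch (ArrayTypeMismatchException)
        {
            return true;
        }
    }

    public static void Main()
    {
        ObjectFactory factory = MakeString;    // return type string is compatible with object
        StringHandler handler = HandleObject;  // parameter type object accepts a string
        handler((string)factory());
        Console.WriteLine(ArrayStoreIsChecked());
    }
}
```

The array case shows why the new feature insists on in and out annotations: array covariance is allowed even though arrays can be written to, so the safety check has to happen on every store at run time.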
Dynamic Dispatch
On to the interop features in C# 4.0, starting with what is perhaps the biggest change. C# now supports dynamic late-binding. The language has always been strongly typed, and it continues to be so in version 4.0. Microsoft believes this makes C# easy to use, fast and suitable for all the work .NET programmers are putting it to. But there are times when you need to communicate with systems not based on .NET.
Traditionally, there were at least two approaches to this. The first was simply to import the foreign model directly into .NET as a proxy. COM Interop provides one example. Since the original release of the .NET Framework, it has used this strategy with a tool called TLBIMP, which creates new .NET proxy types you can use directly from C#.
LINQ-to-SQL, shipped with C# 3.0, contains a tool called SQLMETAL, which imports an existing database into C# proxy classes for use with queries. You’ll also find a tool that imports Windows Management Instrumentation (WMI) classes to C#. Many technologies allow you to write C# (often with attributes) and then perform interop using your handwritten code as basis for external actions, such as LINQ-to-SQL, Windows Communication Foundation (WCF) and serialization.
The second approach abandons the C# type system entirely—you embed strings and data in your code. This is what you do whenever you write code that, say, invokes a method on a JScript object or when you embed a SQL query in your ADO.NET application. You’re even doing this when you defer binding to run time using reflection, even though the interop in that case is with .NET itself.
The dynamic keyword in C# is a response to dealing with the hassles of these other approaches. Let’s start with a simple example—reflection. Normally, using it requires a lot of boilerplate infrastructure code, such as:
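object o = GetObject();
Type t = o.GetType();
object result = t.InvokeMember("MyMethod", BindingFlags.InvokeMethod, null, o, new object[] { });
int i = Convert.ToInt32(result);
With the dynamic keyword, you can skip the reflection calls and simply tell the compiler to treat o as dynamic, delaying all analysis until run time:
dynamic o = GetObject();
int i = o.MyMethod();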
The value of this shortened, simplified C# syntax is perhaps clearer if you look at the ScriptObject class that supports operations on a JScript object. The class has an InvokeMember method that has more and different parameters, except in Silverlight, which actually has an Invoke method (notice the difference in the name) with fewer parameters. Neither of these is the same as what you’d need to invoke a method on an IronPython or IronRuby object or on any number of non-C# objects you might come into contact with.
In addition to objects that come from dynamic languages, you’ll find a variety of data models that are inherently dynamic and have different APIs supporting them, such as HTML DOMs, the System.Xml DOM and the XLinq model for XML. COM objects are often dynamic and can benefit from the delay to run time of some compiler analysis.
Essentially, C# 4.0 offers a simplified, consistent view of dynamic operations. To take advantage of it, all you need to do is specify that a given value is dynamic, ensuring that analysis of all operations on the value will be delayed until run time.
In C# 4.0, dynamic is a built-in type, and a special pseudo-keyword signifies it. Note, however, that dynamic is different from var. Variables declared with var actually do have a strong type, but the programmer has left it up to the compiler to figure it out. When the programmer uses dynamic, the compiler doesn’t know what type is being used—the programmer leaves figuring it out up to the runtime.
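The difference between var and dynamic shows up as soon as you touch a member. The following is a small sketch, with VarVersusDynamic and Describe as made-up names; it assumes the project references the C# runtime binder (Microsoft.CSharp), which Visual Studio adds to new projects.

```csharp
using System;
using Microsoft.CSharp.RuntimeBinder;

public static class VarVersusDynamic
{
    // The member access on value is bound at run time against its actual type.
    public static string Describe(dynamic value)
    {
        try
        {
            return "Length = " + value.Length;
        }
        catch (RuntimeBinderException)
        {
            return "no Length member";
        }
    }

    public static void Main()
    {
        var text = "hello";     // compile-time type is string
        int n = text.Length;    // checked by the compiler
        // text.Foo();          // would be a compile-time error
        Console.WriteLine(n);

        Console.WriteLine(Describe("hello")); // string has a Length property
        Console.WriteLine(Describe(42));      // int does not
    }
}
```

With var, a misspelled member never gets past the compiler; with dynamic, the same mistake surfaces only when the line executes.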
Dynamic and the DLR
The infrastructure that supports these dynamic operations at run time is called the Dynamic Language Runtime (DLR). This new .NET Framework 4 library runs on the CLR, like any other managed library. It’s responsible for brokering each dynamic operation between the language that initiated it and the object it occurs on. If a dynamic operation isn’t handled by the object it occurs on, a runtime component of the C# compiler handles the bind.
[Figure: simplified and incomplete architecture diagram of the DLR]
The interesting thing about a dynamic operation, such as a dynamic method call, is that the receiver object has an opportunity to inject itself into the binding at run time and can, as a result, completely determine the semantics of any given dynamic operation. For instance, take a look at the following code:
dynamic d = new MyDynamicObject();
d.Bar("Baz", 3, d);
Here MyDynamicObject is defined so that it handles the call itself:
class MyDynamicObject : DynamicObject
{
    public override bool TryInvokeMember(InvokeMemberBinder binder, object[] args, out object result)
    {
        Console.WriteLine("Method: {0}", binder.Name);
        foreach (var arg in args)
        {
            Console.WriteLine("Argument: {0}", arg);
        }
        result = args[0];
        return true;
    }
}
Running this code prints:
Method: Bar
Argument: Baz
Argument: 3
Argument: MyDynamicObject
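Method calls aren't the only operation a receiver can intercept. As a further sketch, the hypothetical PropertyBag class below overrides TryGetMember and TrySetMember from System.Dynamic.DynamicObject so that property reads and writes are served from a dictionary; the class and member names are invented for this illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Dynamic;

// Hypothetical dynamic object that stores arbitrary properties in a dictionary.
public class PropertyBag : DynamicObject
{
    private readonly Dictionary<string, object> values =
        new Dictionary<string, object>();

    // Called for assignments such as bag.Title = "...".
    public override bool TrySetMember(SetMemberBinder binder, object value)
    {
        values[binder.Name] = value;
        return true;
    }

    // Called for reads such as bag.Title; returning false lets the
    // C# runtime binder report the member as missing.
    public override bool TryGetMember(GetMemberBinder binder, out object result)
    {
        return values.TryGetValue(binder.Name, out result);
    }
}

public static class PropertyBagDemo
{
    public static void Main()
    {
        dynamic bag = new PropertyBag();
        bag.Title = "C# 4.0";
        bag.Year = 2010;

        Console.WriteLine(bag.Title);
        Console.WriteLine(bag.Year + 1);
    }
}
```

Reading a property that was never set falls through to the C# binder, which throws a RuntimeBinderException much as it would for any other missing member.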
Now, suppose that instead of a MyDynamicObject, you used a Python object:
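dynamic d = GetPythonObject();
d.bar("Baz", 3, d);
The Python object then decides how to respond to the call, for example with a method such as:
def bar(*args):
    print "Method:", bar.__name__
    for x in args:
        print "Argument:", x
Behind the scenes, the C# compiler emits call-site code to set up each of these dynamic operations. That code, if you had to write and maintain it yourself, would be every bit as ugly as the reflection code shown earlier or the ScriptObject code or strings that contain XML queries. That’s the point of the dynamic feature in C#: you don’t have to write code like that!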
When using the dynamic keyword, your code can look pretty much the way you want: like a simple method invocation, a call to an indexer, an operator, such as +, a cast or even compounds, like += or ++. You can even use dynamic values in statements—for example, if(d) and foreach(var x in d). Short-circuiting is also supported, with code such as d && ShortCircuited or d ?? ShortCircuited.
The value of having the DLR provide a common infrastructure for these sorts of operations is that you no longer have to deal with a different API for each dynamic model you’d like to code against; there’s just a single API. And you don’t even have to use it. The C# compiler can use it for you, and that should give you more time to write the code you actually want: the less infrastructure code you have to maintain, the more productive you can be.
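Several of the operations listed above can be exercised against ordinary .NET objects, since the C# runtime binder handles them when the receiver isn't itself dynamic. This is only a sketch; DynamicOperations, Sum and HasMoreThan are hypothetical names.

```csharp
using System;
using System.Collections.Generic;

public static class DynamicOperations
{
    // foreach over a dynamic value, with a compound assignment on a dynamic total.
    public static int Sum(dynamic sequence)
    {
        dynamic total = 0;
        foreach (var item in sequence)   // item is dynamic
        {
            total += item;
        }
        return (int)total;               // explicit cast from dynamic
    }

    // A dynamic expression used as the condition of an if statement.
    public static bool HasMoreThan(dynamic list, int count)
    {
        if (list.Count > count)
        {
            return true;
        }
        return false;
    }

    public static void Main()
    {
        dynamic list = new List<int> { 1, 2, 3 };
        dynamic index = 0;
        index++;                                   // dynamic increment

        Console.WriteLine(list[index]);            // dynamic indexer call
        Console.WriteLine(Sum(list));
        Console.WriteLine(HasMoreThan(list, 2));

        dynamic nothing = null;
        Console.WriteLine(nothing ?? "fallback");  // null-coalescing on a dynamic value
    }
}
```

None of these lines mentions the DLR; the compiler generates the plumbing for each operation.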
The C# language provides no shortcuts for defining dynamic objects. Dynamic in C# is all about consuming and using dynamic objects. Consider the following:
dynamic list = GetDynamicList();
dynamic index1 = GetIndex1();
dynamic index2 = GetIndex2();
string s = list[++index1, index2 + 10].Foo();
The compile-time type of each dynamic operation is itself dynamic, and so the “dynamicness” kind of flows from computation to computation. Even if you hadn’t included dynamic expressions multiple times, there still would be a number of dynamic operations. There are still five in this one line:
string s = nonDynamicList[++index1, index2 + 10].Foo();
Notice in the previous examples that C# allows implicit conversions from any dynamic expression to any type. The conversion to string at the end is implicit and did not require an explicit cast operation. Similarly, any type can be converted to dynamic implicitly.
In this respect, dynamic is a lot like object, and the similarities don’t stop there. When the compiler emits your assembly and needs to emit a dynamic variable, it does so by using the type object and then marking it specially. In some sense, dynamic is kind of an alias for object, but it adds the extra behavior of dynamically resolving operations when you use it.
You can see this if you try to convert between generic types that differ only in dynamic and object; such conversions will always work, because at runtime, an instance of List<dynamic> actually is an instance of List<object>:
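List<dynamic> ld = new List<object>();
You can also see the similarity between dynamic and object when you override a method that’s declared with an object parameter; the compiler accepts dynamic in its place:
class C { public override bool Equals(dynamic obj) { /* ... */ } }
And a member declared with the dynamic type, such as the following method, is emitted into your assembly with the type object, marked with the attribute System.Runtime.CompilerServices.DynamicAttribute:
public dynamic GetDynamicThing() { /* ... */ }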
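The claim that dynamic is object in disguise can be checked directly. The following sketch uses a hypothetical DynamicIsObject class to compare run-time types and to show that the only difference is how the compiler binds operations on the elements.

```csharp
using System;
using System.Collections.Generic;

public static class DynamicIsObject
{
    // At run time there is no separate List<dynamic> type.
    public static bool SameRuntimeType()
    {
        return new List<dynamic>().GetType() == typeof(List<object>);
    }

    public static void Main()
    {
        // Both directions are identity conversions.
        List<dynamic> ld = new List<object>();
        List<object> lo = new List<dynamic>();

        ld.Add("text");
        int length = ld[0].Length;      // bound at run time, because the element is dynamic
        // int bad = lo[0].Length;      // would not compile: object has no Length

        Console.WriteLine(SameRuntimeType());
        Console.WriteLine(length);
        Console.WriteLine(lo.Count);
    }
}
```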
There are a lot more details about the way dynamic is treated
and dispatched, but you don’t need to know them to use the feature. The
essential idea is that you can write code that looks like C#, and if
any part of the code you write is dynamic, the compiler will leave it
alone until run time.
I want to cover one final topic concerning
dynamic: failure. Because the compiler can’t check whether the dynamic
thing you’re using really has the method called Foo, it can’t give you
an error. Of course, that doesn’t mean that your call to Foo will work
at run time. It may work, but there are a lot of objects that don’t have
a method called Foo. When your expression fails to bind at run time,
the binder makes its best attempt to give you an exception that’s more
or less exactly what the compiler would’ve told you if you hadn’t used
dynamic to begin with.
Consider the following code:
try
{
    dynamic d = "this is a string";
    d.Foo();
}
catch (Microsoft.CSharp.RuntimeBinder.RuntimeBinderException e)
{
    Console.WriteLine(e.Message);
}
The message printed is:
'string' does not contain a definition for 'Foo'
Named Arguments and Optional Parameters
In another addition to C#, methods now support optional parameters with default values so that when you call such a method you can omit those parameters. You can see this in action in this Car class:
class Car
{
    public void Accelerate(double speed, int? gear = null, bool inReverse = false) { /* ... */ }
}
You can call the method this way:
Car myCar = new Car();
myCar.Accelerate(55);
This has exactly the same effect as:
myCar.Accelerate(55, null, false);
C# 4.0 will also let you call methods by specifying some arguments by name. In this way, you can pass an argument to an optional parameter without having to also pass arguments for all the parameters that come before it.
Say you want to call Accelerate to go in reverse, but you don’t want to specify the gear parameter. Well, you can do this:
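myCar.Accelerate(55, inReverse: true);
The compiler treats this call the same as:
myCar.Accelerate(55, null, true);
Named arguments can also appear in any order, so the following two calls are equivalent:
Console.WriteLine(format: "{0:f}", arg0: 6.02214179e23);
Console.WriteLine(arg0: 6.02214179e23, format: "{0:f}");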
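The equivalences above can be verified with a variant of the Car class. In this sketch Accelerate returns a description of the arguments it received rather than void, which is an assumption made purely so the calls can be compared; NamedOptionalDemo is likewise a made-up name.

```csharp
using System;

public class Car
{
    // Variant of the article's method that reports what it was given.
    public string Accelerate(double speed, int? gear = null, bool inReverse = false)
    {
        string gearText = gear.HasValue ? gear.Value.ToString() : "auto";
        return string.Format("speed={0}, gear={1}, inReverse={2}",
            speed, gearText, inReverse);
    }
}

public static class NamedOptionalDemo
{
    public static void Main()
    {
        Car myCar = new Car();

        // Omitted optional parameters take their declared defaults.
        Console.WriteLine(myCar.Accelerate(55));
        Console.WriteLine(myCar.Accelerate(55, null, false));

        // A named argument skips over gear without spelling out its default.
        Console.WriteLine(myCar.Accelerate(55, inReverse: true));

        // Named arguments may be written in any order.
        Console.WriteLine(myCar.Accelerate(inReverse: true, gear: 2, speed: 30));
    }
}
```

The first two calls print the same line, and the third differs only in the inReverse value.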
On the surface, optional parameters and named arguments don’t look like interop features. You can use them without ever even thinking about interop. However, the motivation for these features comes from the Office APIs. Consider, for example, Word programming and something as simple as the SaveAs method on the Document interface. This method has 16 parameters, all of which are optional. With previous versions of C#, if you want to call this method you have to write code that looks like this:
Document d = new Document();
object filename = "Foo.docx";
object missing = Type.Missing;
d.SaveAs(ref filename, ref missing, ref missing, ref missing, ref missing,
    ref missing, ref missing, ref missing, ref missing, ref missing, ref missing,
    ref missing, ref missing, ref missing, ref missing, ref missing);
In C# 4.0, you can write this instead:
Document d = new Document();
d.SaveAs(FileName: "Foo.docx");
Now, when writing a .NET library and considering adding methods that have optional parameters, you’re faced with a choice. You can either add optional parameters or you can do what C# programmers have done for years: introduce overloads. In the Car.Accelerate example, the latter decision might lead you to produce a type that looks like this:
class Car
{
    public void Accelerate(double speed) { Accelerate(speed, null, false); }
    public void Accelerate(double speed, int? gear) { Accelerate(speed, gear, false); }
    public void Accelerate(double speed, int? gear, bool inReverse) { /* ... */ }
}
Indexed Properties
Some smaller language features in C# 4.0 are supported only when writing code against a COM interop API. The Word interop in the previous illustration is one example.
C# code has always had the notion of an indexer that you can add to a class to effectively overload the [] operator on instances of that class. This sense of indexer is also called a default indexer, since it isn’t given a name and calling it requires no name. Some COM APIs also have indexers that aren’t default, which is to say that you can’t effectively call them simply by using []; you must specify a name. You can, alternatively, think of an indexed property as a property that takes some extra arguments.
C# 4.0 supports indexed properties on COM interop types. You can’t define types in C# that have indexed properties, but you can use them provided you’re doing so on a COM type. For an example of what C# code that does this looks like, consider the Range property on an Excel worksheet:
using Microsoft.Office.Interop.Excel;

class Program
{
    static void Main(string[] args)
    {
        Application excel = new Application();
        excel.Visible = true;
        Worksheet ws = excel.Workbooks.Add().Worksheets["Sheet1"];
        // Range is an indexed property
        ws.Range["A1", "C3"].Value = 123;
        System.Console.ReadLine();
        excel.Quit();
    }
}
In earlier versions of C#, you had to write the assignment through the accessor method instead:
ws.get_Range("A1", "C3").Value2 = 123;
Omitting the Ref Keyword at COM Call Sites
Some COM APIs were written with many parameters passed by reference, even when the implementation doesn’t write back to them. In the Office suite, Word stands out as an example; its COM APIs all do this.
When you’re confronted with such a library and you need to pass arguments by reference, you can no longer pass any expression that’s not a local variable or field, and that’s a big headache. In the Word SaveAs example, you can see this in action: you had to declare a local called filename and a local called missing just to call the SaveAs method, since those parameters needed to be passed by reference.
Document d = new Document();
object filename = "Foo.docx";
object missing = Type.Missing;
d.SaveAs(ref filename, ref missing, // ...
In C# 4.0, you can omit the ref keyword for COM methods like this one and pass the argument directly; the compiler creates the temporary local for you:
d.SaveAs(FileName: "Foo.docx");
This should make code that uses APIs like this much cleaner.
Embedding COM Interop Types
This is more of a C# compiler feature than a C# language feature, but
now you can use a COM interop assembly without that assembly having to
be present at run time. The goal is to reduce the burden of deploying
COM interop assemblies with your application.
When COM interop
was introduced in the original version of the .NET Framework, the notion
of a Primary Interop Assembly (PIA) was created. This was an attempt to
solve the problem of sharing COM objects among components. If you had
different interop assemblies that defined an Excel Worksheet, you
wouldn’t be able to share these Worksheets between components, because
they would be different .NET types. The PIA fixed this by existing only
once—all clients used it, and the .NET types always matched.
Though a fine idea on paper, in practice deploying a PIA turns out to be
a headache, because there’s only one, and multiple applications could
try to install or uninstall it. Matters are complicated because PIAs are
often large, Office doesn’t deploy them with default Office
installations, and users can circumvent this single assembly system
easily just by using TLBIMP to create their own interop assembly.
So now, in an effort to fix this situation, two things have happened:
- The runtime has been given the smarts to treat two structurally identical COM interop types that share the same identifying characteristics (name, GUID and so on) as though they were actually the same .NET type.
- The C# compiler takes advantage of this by simply reproducing the interop types in your own assembly when you compile, removing the need for the interop assembly to exist at run time.
I have to omit some
details in the interest of space, but even without knowledge of the
details, this is another feature—like dynamic—you should be able to use
without a problem. You tell the compiler to embed interop types for you
in Visual Studio by setting the Embed Interop Types property on your
reference to true.
Because the C# team expects this to be the
preferred method of referencing COM assemblies, Visual Studio will set
this property to True by default for any new interop reference added to a
C# project. If you’re using the command-line compiler (csc.exe) to
build your code, then to embed interop types you must reference the
interop assembly in question using the /L switch rather than /R.
Each of the features I’ve covered in this article could itself generate
much more discussion, and the topics all deserve articles of their own.
I’ve omitted or glossed over many details, but I hope this serves as a
good starting point for exploring C# 4.0 and you find time to
investigate and make use of these features. And if you do, I hope you
enjoy the benefits in productivity and program readability they were
designed to give you.