Expression Trees, dynamic, and Multithreading in C# 3.0/4.0

I just had a brain fart the the other day: the dynamic keyword in C# 4.0 is more than just a way to plug interact dynamic languages. It can be used as a mechanism to evaluate any expression tree at runtime. One such example is that any arbitrary data source can be generically fashioned into dynamic objects that developers can interact with at run time. Take for example the following XML document:

<People>
<Person>
<Name>Koushik</Name>
<Age>27</Age>
</Person>
<Person>
<Name>Bobo</Name>
<Age>12</Age>
</Person>
</People>

Given a proper dynamic binding implementation for XML, the following pseudo code could be used to interact with this document:

XmlDocument doc = new XmlDocument();
doc.Load("file.xml");
dynamic dynamicDoc = DynamicXmlDocument.From(doc);

foreach (dynamic person in dynamicDoc.Person)
{
Console.WriteLine("Name: {0}", person.Name);
// modify the person's age in the document
person.Age++;
}
doc.Save("file.xml");

This approach can be used to access or modify any potential data source, such as a databases, web services, RPC, etc. What would normally be done with code generation tools (like xsd.exe, LINQ to SQL, wsdl.exe, etc) at design time can now be done at runtime. [0]

This train of thought continued until it derailed into another area I find interesting: multithreading (and parallel computing). Consider the following simple program (disregard the [Future] attribute for now, more on that later):

[Future]
static int LongOperation(int result)
{
Thread.Sleep(1000);
return result;
}
static void Main(string[] args)
{
int normalResult = LongOperation(LongOperation(1) + LongOperation(2)) + LongOperation(LongOperation(3) + LongOperation(4));
}

LongOperation is an operation that takes 1 second and returns the value it was passed. It is a pseudo function that simulates an operation that takes a significant amount of time to complete and return a result. Examine the expression that you see above: How long do you think it would take to complete? If you guessed 6 seconds, you guessed correctly.

Those operations are begging to be run asynchronously by way of threads. A friend of mine, Jared Parsons, released his implementation of “Futures” in his BclExtras library, which makes building concurrent operations fairly easy. A Future is an object that wraps a long running operation in a thread, and blocks only when the Value of that operation is necessary. That way, a developer can define several values they will need in the Future, and start computing the results concurrently. For example, to do the above asynchronously using Futures, it would be expressed as follows:

var op1 = Future.Create<int>(() => LongOperation(1));
var op2 = Future.Create<int>(() => LongOperation(2));
var op3 = Future.Create<int>(() => LongOperation(3));
var op4 = Future.Create<int>(() => LongOperation(4));
var outer1 = Future.Create<int>(() => LongOperation(op1.Value + op2.Value));
var outer2 = Future.Create<int>(() => LongOperation(op4.Value + op3.Value));
int futureResult = outer1.Value + outer2.Value;

This implementation, which uses multithreading, will take only 2 seconds to complete (1 second to complete op1-op4 simultaneously, 1 second to compute outer1 and outer2 simultaneously). But it is not very pretty; it is difficult to look at this code and understand what is going on.

And although this is easy enough to write, I had the thought that migrating code towards a concurrent model can be done really easily with “dynamic” types in C# by implementing a dynamic runtime to handle futures. Consider following pseudo code:

public class Foo
{
[Future]
public int LongOperation(int value)
{
Thread.Sleep(1000);
return value;
}
}

dynamic future = new FutureObject(new Foo());
int result = future.LongOperation(future.LongOperation(1) + future.LongOperation(2)) + future.LongOperation(LongOperation(3) + future.LongOperation(4));

This would work because the implementation of the dynamic runtime would create expression trees which handle method calls with the [Future] attribute specially: Instead of calling the method directly, it would create all Future<T> objects up front when the expression is initially evaluated or created. Since I do not have the Visual Studio 2010 CTP, I decided to forego implementing this for the moment (although the implementation of a Future<T> dynamic runtime would provide this nice syntactic support for DLR languages like IronPython and IronRuby).

However, C# 3.0 does have a very powerful Expression support that can be used to achieve similar results. To paraphrase from IanG’s blog post about Expressions:

Consider the following code:

Expression<Func<int, bool>> exprLambda = x => (x & 1) == 0;

This takes that Func delegate, and uses it as the type parameter for a generic type called Expression<T>. It then proceeds to initialize it in exactly the same way, so you'd think it was doing much the same thing. But it turns out that the compiler knows about this Expression<T> type, and behaves differently. Rather than compiling the lambda into IL that evaluates the expression, it generates IL that constructs a tree of objects representing the expression.

This was the point at which I went: "what the?.."

To be more explicit about this, here's roughly what that second line compiles into:

ParameterExpression xParam = Expression.Parameter(typeof(int), "x");
Expression<Func<int, bool>> exprLambda = Expression.Lambda<Func<int, bool>>(
Expression.EQ(
Expression.BitAnd(xParam, Expression.Constant(1)),
Expression.Constant(0)),
xParam);

Basically, given a normal C# expression, we can see the expression tree at runtime! If we walk the expression tree, and modify all the MethodCallExpressions that have the [Future] attribute, we can achieve similar results to the previous solution:

// Notice that I am not changing the contained expression below at all! It is the same as the normal code.
// The futures are created automatically.
int futureResult = FutureExpression.Process<int>(() =>
LongOperation(LongOperation(1) + LongOperation(2)) + LongOperation(LongOperation(3) + LongOperation(4))
);

Remember, the contents of the expression with LongOperation are not actually being executed; only an expression tree for that expression is being created. It is not analyzed until FutureExpression.Process is called with that expression.

This is pretty awesome! Syntactic support for automatic multithreading with little to no code changes:

  • Tag any long running methods with [Future]
  • Wrap your original expression in a FutureExpression.Process

But, there are still an issue with Expressions, in that they are somewhat limited. Expressions that are parseable at compile time must be single statements. For example, the following would not compile:

// does not compile: A lambda expression with a statement body cannot be converted to an expression tree
Expression<Func<int>> expr = () => { if (someValue == 2) return 0; return 3; };

However, it can be rewritten as the following to make it compile:

Expression<Func<int>> expr2 = () => someValue == 2 ? 0 : 3;

 

It is possible to alleviate some of this problem, albeit somewhat sloppily:

  • FutureExpression.Process<T> can return a Future<T> rather than T.
  • Future<T> can implement all the operators to return further Future<T> which executes an expression tree of that operator on T (this would fail if T does not support that operator) .
  • The result of an operator on two Futures is a Future, so the actual result is never analyzed until the Value is explicitly retrieved.

Anyhow, the code samples containing the FutureExpression analyzer is available for download. I may resume working on the dynamic runtime for Futures, as that seems rather interesting. But don’t hold your breath for another post. :)

 

[0] Obviously this would be at the cost of runtime performance. Another issue with dynamic objects is that there is no “strong” typing. This can be addressed by duck typing.

3 comments:

ether said...

If you want dynamic binding from Xml to an object model and back, check out http://wiki.developer.mindtouch.com/User:Arnec/XObject

XObject basically tackles the problem in .NET 2.0 (using LinFu's DynamicProxy). Don't have the code in a public repository right now, but it's open source, so i can give you a copy if you want until i actually get it into our sourceforge tree. My email is that wiki user name at mindtouch.com.

Koush said...

Yeah, I've blogged about DynamicProxy before. It is possible to use it to get similar behavior to my dynamic solution.

Travis said...

Very cool info. Thanks for the detailed writeup.