
Whenever I am considering using a new technology in my work, the very first thing I do is look for books about it. This gives me what I feel is a great advantage: in a fairly short time, I have an excellent idea of the major arc of a technology, and I also learn many important facets that won't be at all obvious to a newcomer taking a more haphazard, "learn-as-you-go" approach. In the time I have spent on StackOverflow.com and lesser but similar sites, I have been acutely reminded of some prime examples. So, I thought I'd post about some!
Microsoft Language Integrated Query (LINQ)
My introduction to LINQ came via Joseph C. Rattz's book Pro LINQ: Language Integrated Query in C# 2008 (Apress). Among the topics I learned from this book that I have found are not common knowledge:
- Many LINQ operators are deferred.
- Deferred operators in LINQ to SQL behave differently again.
- The importance of IEnumerable and IQueryable in relation to the above and other issues.
- Using the Log member of DataContext.
- LINQ to XML is actually one of the best parts of LINQ!
So, without further ado...
Deferred Query Execution
Take this program as an example:
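static void Main() {
    // Note the null in the second position.
    string[] values = { "one", null, "three", "four", "five" };
    // Defining the query does not run the predicate yet.
    var results = values.Where(item => item.StartsWith("f"));
    Console.WriteLine(results.GetType().ToString());
    Console.Write("Press ENTER to display results:");
    Console.ReadLine();
    // Enumerating the results is what finally executes the query.
    foreach (var item in results) {
        Console.WriteLine(item);
    }
}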
Two things may surprise you about the output of this program. The second is the output of results.GetType().ToString(), which is this (the exact type name varies between .NET versions):
System.Linq.Enumerable+WhereArrayIterator`1[System.String]
Deferred operators in LINQ do not return simple collections of their results. Instead, they return objects that hold everything needed to run the query when required, which happens when the result is enumerated or passed to a non-deferred operator.
The first thing that may surprise you, though, is that the program outputs the type name of results at all; but indeed it does! You will also reach the call to Console.ReadLine(). Only when the collection is iterated is the query actually executed, and only then do you get the NullReferenceException you might have been expecting from the null in the second position of the array.
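To make the timing visible without relying on an exception, here is a minimal sketch that counts predicate calls. The names DeferredDemo, PredicateCalls and StartsWithF are made up for this illustration; the only library calls used are Where, ToList and string.Join.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class DeferredDemo
{
    // Counts how many times the predicate has actually been invoked.
    public static int PredicateCalls;

    public static bool StartsWithF(string item)
    {
        PredicateCalls++;
        return item.StartsWith("f");
    }

    public static void Main()
    {
        var values = new List<string> { "one", "three", "four", "five" };

        // Deferred: defining the query does not call the predicate.
        IEnumerable<string> deferred = values.Where(StartsWithF);
        Console.WriteLine(PredicateCalls);             // 0

        // ToList() is a non-deferred operator: it runs the query right away.
        List<string> snapshot = deferred.ToList();
        Console.WriteLine(PredicateCalls);             // 4

        // The deferred query re-runs on each enumeration, so it sees
        // changes made to the source after it was defined.
        values.Add("fifteen");
        Console.WriteLine(string.Join(",", deferred)); // four,five,fifteen
        Console.WriteLine(string.Join(",", snapshot)); // four,five
    }
}
```

The snapshot taken with ToList() stays fixed, while the deferred query is evaluated again each time it is enumerated.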
What Does Deferred Execution Do For (or to) Me?
One could argue that there was no need for deferred execution in LINQ to Objects queries. (One would need to be somewhat insane to argue the same for LINQ to SQL/Entity Framework queries.) But I won't argue the case; I'll just lay out some of what it means to you.
Chaining Query Conditions
You can easily chain query conditions together based on run-time conditions, without creating lots of unnecessary result sets in the interim. A set of chained queries will not be executed until you enumerate the results or call a non-deferred operator. What's more, the chained operators are composed into a single pipeline, so the source is walked only once. An example:
static void Main() {
    string[] values = { "one", "two", "three", "four", "five" };
    string[] queries = { "o", "3", "on" };
    var results = values.Where(item => true);
    foreach (var item in queries) {
        results = results.Where(it => it.StartsWith(item));
    }
    foreach (var item in results) {
        Console.WriteLine(item);
    }
}
This query starts with something of a dummy condition (item => true) to give the results variable its initial value. Due to deferred execution, it shouldn't result in any notable additional performance hit. (The code is written this way merely to prove a point; this isn't something you should normally need to do.) After that, a condition for each of the values in the queries array is chained on. Study this code closely. What do you think the result will be?
If you said there would be no results returned, you are astute (none of the items in the values array can possibly start with both "o" and "3"!) but, with the C# 3.0 and 4.0 compilers, you are also incorrect. The value one is printed out. (Starting with C# 5.0, foreach gives each iteration its own loop variable, and this exact code returns nothing; the underlying trap still applies to any variable shared across iterations.)
LINQ Queries in Loops
Due to deferred execution, the above query does not run quite how you would expect it to. You would think it's doing these tests:
it.StartsWith("o") && it.StartsWith("3") && it.StartsWith("on")
But it is not. Instead, it's doing this:
it.StartsWith("on") && it.StartsWith("on") && it.StartsWith("on")
Every condition ends up using the last value of the loop variable. Fixing this is easy; store the loop value in a temporary variable:
static void Main() {
    string[] values = { "one", "two", "three", "four", "five" };
    string[] queries = { "o", "3", "on" };
    var results = values.Where(item => true);
    foreach (var item in queries) {
        // A new variable per iteration, so each lambda captures its own copy.
        string thisItem = item;
        results = results.Where(it => it.StartsWith(thisItem));
    }
    foreach (var item in results) {
        Console.WriteLine(item);
    }
}
Now you will get the results you would expect: none! Why does this happen?
The extremely short version is this: what allows us to refer to item in the first place is that C# lambdas support something called closures, the ability to refer to variables declared outside the lambda itself. Think about what the argument to each Where() call actually defines: an anonymous delegate, itself a shortcut to 'real' delegates. Try to create a function and its delegate to do the exact same thing as it => it.StartsWith(item) and you'll quickly realize it's complex to do so.
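As a rough idea of what doing this by hand involves, here is a sketch of a class that plays the role of the closure. StartsWithClosure, Prefix, Matches and ClosureByHand are names invented for this illustration; the class the compiler really generates is named and laid out differently.

```csharp
using System;
using System.Linq;

// Hand-written stand-in for what a capturing lambda needs behind the scenes.
// The name and layout are illustrative only.
public sealed class StartsWithClosure
{
    // The captured variable lives in a field of the closure object.
    public string Prefix;

    // The lambda body becomes an ordinary method that reads that field.
    public bool Matches(string it)
    {
        return it.StartsWith(Prefix);
    }
}

public static class ClosureByHand
{
    public static string[] Filter(string[] values, string prefix)
    {
        var closure = new StartsWithClosure { Prefix = prefix };
        // A 'real' delegate bound to a method on the closure object.
        Func<string, bool> predicate = closure.Matches;
        return values.Where(predicate).ToArray();
    }

    public static void Main()
    {
        string[] values = { "one", "two", "three", "four", "five" };
        Console.WriteLine(string.Join(",", Filter(values, "t"))); // two,three
    }
}
```

Because the delegate reads Prefix from the object each time it runs, whatever the field holds at execution time is what the query uses.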
So, the closure helps us immensely here. But without that interim variable, and coupled with deferred execution, it also causes trouble: the lambda captures the item variable itself rather than its value at that moment, so the same instance of item ends up being the one that every condition runs against, and by the time the query executes it holds the last value. Declaring a new variable inside the loop (thisItem) and capturing that in the lambda fixes the issue.
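The difference between one shared variable and a fresh variable per iteration can be sketched side by side. Since C# 5.0, foreach creates a new loop variable for each iteration, so this sketch declares the shared variable outside the loop to reproduce the problem on any compiler version. SharedCaptureDemo, SharedVariable and FreshVariable are hypothetical names used only here.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class SharedCaptureDemo
{
    // Every lambda captures the same variable, declared outside the loop.
    public static List<string> SharedVariable(string[] values, string[] queries)
    {
        IEnumerable<string> results = values;
        string current = "";
        foreach (string query in queries)
        {
            current = query;
            results = results.Where(it => it.StartsWith(current));
        }
        // The query runs only here, when current holds the last query value.
        return results.ToList();
    }

    // Each lambda captures its own variable, declared inside the loop body.
    public static List<string> FreshVariable(string[] values, string[] queries)
    {
        IEnumerable<string> results = values;
        foreach (string query in queries)
        {
            string thisItem = query;
            results = results.Where(it => it.StartsWith(thisItem));
        }
        return results.ToList();
    }

    public static void Main()
    {
        string[] values = { "one", "two", "three", "four", "five" };
        string[] queries = { "o", "3", "on" };

        Console.WriteLine(SharedVariable(values, queries).Count); // 1 (just "one")
        Console.WriteLine(FreshVariable(values, queries).Count);  // 0
    }
}
```

Calling ToList() inside each loop iteration would also avoid the problem, but at the cost of building an intermediate list for every condition.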
'And' Was Easy. 'Or'? Not So Much
Chaining operators together like this to perform an "and" query on a variable number of operands was easy. But this does not work the same way when you want to do "or" queries. The expression can still be built dynamically, but it must be done in a different way. I'll cover that later!
Next: LINQ to SQL Queries
My next part will cover LINQ to SQL queries and how they differ from, and resemble, regular LINQ to Objects queries. They, too, have deferred operators, and most of what's above applies to them, but there is also more to it that you will definitely want to know!