I spent way too long this morning trying to debug some processing code in a Flutter app when I discovered that I was getting bitten by an artifact of lazy evalution of a collection stream operator. That caused me to do a deep dive of how they work in Dart and other languages. For posterity and my own memory I am documenting it all here.
Definition
Most modern languages have some concept of stream type operations on their collections, and streams themselves. The most basic of operations is the ability to transform a value, filter values, and reduce values into one aggregated value. These are so-called map/reduce style operations which can be applied as a series of operations. In psuedocode it would look like:
final sum = collection
.map(...)
.filter(...)
.map(...)
.map(...)
.filter(...)
.reduce(...);
The ability to chain operations together often allows for more concise/readable and in many cases higher performing code than traditional looping operations. So for example something like the above in a looping construct would look like:
var sum = 0.0;
for (final value in collection) {
final newValue1 = transform1(value);
if (shouldFilter1(newValue1) {
continue;
}
final newValue2 = transform2(newValue1);
final newvalue3 = transform3(newValue3);
if (shouldFilter2(newValue3)) {
continue;
}
sum += finalValue3;
}
The chaining together is what allows for a lot cleaner code compared to for loop methods. It also can help reduce errors when writing the code and allow for easier rearranging of code when the various stages get more complicated. The syntax and how you do it changes for various languages. .NET has LINQ, Java has the streams extensions on collections, and in Dart these are operators on the iterators which all collections support. The nuances of these operations is important though. How/when do they execute, what is the artifact of that behavior, etc.
The remainder of this discussion is going to focus on these behaviors around Dart Lists.
Lazy Evaluation
One of the reasons why these methods can get better performance is because of the lazy evaluation of each of the stages. Take the below Dart code:
print('Iterator demo');
final intValues = [1,2,3];
final stringValues = intValues.map((e) {
final stringValue = '$e';
print('Converted $e to string');
return stringValue;
});
print('Number of values: ${stringValues.length}');
The code has an initial array of three numbers that are mapped to string values. We are doing this a bit more verbose to have it print interim statements. What output are you expecting for this code? If you answered:
Iterator demo
Number of values: 3
…then you would be correct. What happened to all of our “Converted” status statements? They don’t appear because that code hasn’t been executed yet. The nature of lazy evaluation is that it will be evaluated when it needs it not all up front. The “do it all now” method is given the term “greedy evaluation”. If we instead printed each value by adding a line stringValues.forEach((e) => print(e));
to the end then we’d see each of those statements being evaluated:
Iterator demo
Number of values: 3
Converted 1 to string
1
Converted 2 to string
2
Converted 3 to string
3
As you can see it didn’t do all of the evaluations up front it only did it when it needed it.
Benefits of lazy initialization
So what’s the point of all that? If we are going to use all the values anyway why not just do it all at once? There are a few reasons why this is beneficial even in that case. First, it decouples the configuration of the operations and their execution. This means that we don’t have to wait for all of these values of these operations to complete before we move on to the next operation. That becomes especially relevant when we are chaining together operations. In our above pseudo-code example, lets say there were one million numbers. You wouldn’t want it to transform all of the values to the next million values, do the filtering on all those million values, and so on. You want it to “stream” a value from the top operation to the bottom. In steps that require a lot of resources, especially memory, this can help smooth out resource usage. But there is also the issue that you don’t always use all of the values.
It is often the case that while you may have all of the initial elements you may not want/need to use all of those elements. Doing operations on all of them then is a waste of resources. Dart iterators support the ability to skip values in a collection and only take certain values. In Dart those operators are skip
, skipWhile
, take
, firstWhere
, and takeWhile
. If you know you are only going to be working with a limited subset of values of a larger set then you can save a lot of processing because of the lazy evaluation. Take a collection defined by:
final randomValues = List.generate(1000, (index) => index);
final randomStrings = randomValues.map((e) {
final stringValue = '$e';
print('Converted $e to string');
return stringValue;
});
Let’s also say that we want to skip the first five values and print the next three:
randomStrings
.skip(5)
.take(3)
.forEach((e) => print(e));
When we look at the output we will see that only three of the 1000 potential mapping operations got executed:
Converted 5 to string
5
Converted 6 to string
6
Converted 7 to string
7
Let’s look at something less prescriptive, Let’s say that we want to find the first value that meets a criteria, in this case when it is equal to “3”:
final randomValues = List.generate(1000, (index) => Random().nextInt(10));
final randomStrings = randomValues.map((e) {
final stringValue = '$e';
print('Converted $e to string');
return stringValue;
});
print('First value of 3');
print(randomStrings.firstWhere((value) => value == "3"));
We will see that we would never evaluate it 1000 times and often finish in just a few iterations:
First value of 3
Converted 8 to string
Converted 3 to string
3
Perils of lazy evaluation
So lazy evaluation is the best thing ever and we can go happily on our way using it wherever we want. Of course the answer is yes and no. We have to remember the artifact of lazy evaluation is that it will be evaluated every time. Let’s look at our first example with three integers we convert to strings and print each out but instead print it twice:
print('Iterator demo');
final intValues = [1,2,3];
final stringValues = intValues.map((e) {
final stringValue = '$e';
print('Converted $e to string');
return stringValue;
});
print('Number of values: ${stringValues.length}');
stringValues.forEach((e) => print(e));
stringValues.forEach((e) => print(e));
While you may be expecting something like this:
Iterator demo
Number of values: 3
Converted 1 to string
1
Converted 2 to string
2
Converted 3 to string
3
1
2
3
It is actually:
Iterator demo
Number of values: 3
Converted 1 to string
1
Converted 2 to string
2
Converted 3 to string
3
Converted 1 to string
1
Converted 2 to string
2
Converted 3 to string
3
What happened is that Dart has to evaluate the computation every time it is called. There is no concept of “caching” the value. That means that if you plan on reading these values repeatedly during an execution they will be re-evaluated each time. While this may be an artifact you desire it often is not. In Kotlin/Java you have to explicitly create a new stream to march through it again otherwise it throws an exception so that you are aware of the issues. In .NET, Dart, and other languages however it is more than happy to just recompute the values. Converting the iterator to a list forces the evaluation at list creation time and thus the evaluation to happen only once because you then would be iterating over a list not working with the iterator with the transformations on it:
final intValues = [1,2,3];
final stringValues = intValues.map((e) {
final stringValue = '$e';
print('Converted $e to string');
return stringValue;
});
final stringValuesList = stringValues.toList();
stringValuesList.forEach((e) => print(e));
stringValuesList.forEach((e) => print(e));
Produces this output:
Converted 1 to string
Converted 2 to string
Converted 3 to string
1
2
3
1
2
3
The missing “toList()” is where I got burned. I had a map operation that was generating a new object based on some other parsed value objects. I was then permutating that object over a series of other iterations. A drastically simplified example of the problem is:
class Value {
int value;
Value(this.value);
int increment() => ++value;
}
...
final intValues = [1, 1, 1];
final wrappedValue = intValues.map((e) => Value(e));
print('First pass');
for (final wv in wrappedValue) {
print(wv.increment());
}
print('Second pass');
for (final wv in wrappedValue) {
print(wv.increment());
}
When all was said and done I wasn’t seeing all the updates. Once I added the now obviously missing “toList” at the end it all worked as expected. This is one of those easy to miss bugs which because it was in more complicated code sent me on a wild goose chase. For the simplified version essentialy I expect the output to be:
First pass
2
2
2
Second pass
3
3
3
Instead both the first and second passes output a string of twos. Why? Because each time the for loop executes on the iterator the mapping is re-evaluated. I therefore have a new Value wrapper object each time. By changing the second line to end with .toList()
it forces the evaluation once and the code works as expected.
Conclusion
So in conclusion, use streaming operators for transforming values, leverage the lazy evaluation but be aware of potential side effects both good and bad.