Summary: Guest blogger, Windows PowerShell MVP, Joel Bennett talks about using tuples in Windows PowerShell.
Microsoft Scripting Guy, Ed Wilson, is here. Today, we have a guest blog post by Windows PowerShell MVP, Joel Bennett. Joel has been an MVP for a while, and he created the infrastructure that is used by the Scripting Games before they were turned over to PowerShell.org. In addition, he was instrumental in creating PoshCode, which was the first Windows PowerShell repository on the Internet. So, without further ado, here is Joel…
When Ed asked me to write a post explaining tuples and comparing them to other similar types, my first thought was, “Wow, I rarely use tuples in Windows PowerShell. I wonder why that is?”
Of course, if you don’t have a programming or math background, you might be excused if your first thought was, “What do toupees have to do with Windows PowerShell?”
So what are tuples?
Put simply, a tuple is a data structure: an ordered list of elements. The actual term comes from set theory, where an ordered n-tuple is a sequence of n elements. The word itself is a generic form of words you’re probably quite familiar with: single, double, triple, quadruple, quintuple, sextuple, … n-tuple.
Like arrays and dictionaries, tuples are a standard data structure in almost all programming languages, and the differences between them are common to all languages. But in the context of Windows PowerShell, when we talk about tuples, we’re talking about System.Tuple.
System.Tuple is not simply an ordered list, but an ordered list of a specific number of elements, each with a specific type.
So why not just use…?
You might think that tuples sound like lists or arrays, but the List and Array classes don’t have a fixed number of elements. Instead, they have a Count or Length property so you can check how many items are in them. With lists or arrays, there is no way to declare in the type that a function takes an array of exactly three elements (although we could check the length and throw an error). Of course, there’s also no way to specify different types for the elements in a list or array: they all have to be the same type (or be cast to the same type, usually Object).
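To make the "check the length and throw an error" workaround concrete, here is a minimal sketch. The function name Assert-ElementCount and its parameter are made up for illustration; they are not part of any module.

```powershell
# Hypothetical helper: an array parameter cannot declare "exactly three
# elements" in its type, so the check has to happen at run time.
function Assert-ElementCount {
    param(
        [Parameter(Mandatory = $true)]
        [object[]]$Values
    )
    if ($Values.Length -ne 3) {
        throw "Expected exactly 3 elements but received $($Values.Length)."
    }
    $Values
}

# Three elements pass the check.
$ok = Assert-ElementCount -Values "Teresa Wilson", "April 1, 2014", 1

# Two elements are rejected, but only when the function actually runs.
$rejected = $false
try {
    Assert-ElementCount -Values "Teresa Wilson", 1
}
catch {
    $rejected = $true
}
```

Note that nothing here constrains the type of each element. The check only counts them.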
You might think tuples sound like hash tables or dictionaries, but the Hashtable and Dictionary classes have keys that you have to specify so you can name each element. However, like List and Array, they are homogeneous: all the keys are the same type, and all the values are the same type.
Of course, as with arrays, in Windows PowerShell, we frequently use Object as the type, which lets us put anything in there, but without knowing what’s in there. The point here is that you can only specify one type. They also don’t limit the number of elements. With a Hashtable or Dictionary, you can always call the Add method to add more key-value pairs.
Another difference is in how you access the elements. List and Array have a zero-based index accessor, so you can access the first element like $Array[0], and the next like $Array[1]. You can even count from the end by starting with -1, so the last element is $Array[-1].
On the other hand, Hashtable and Dictionary have a key accessor, where the key is of a specific type (as I mentioned earlier). The key could be a string or any other object, for example: $Hashtable["Key1"]. With a tuple, the only way to access the elements is through the hard-coded member names, for example: Item1, Item2, Item3. These member names are one-based, not zero-based.
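Here is a short sketch that puts the three access styles side by side. The sample values are only for illustration.

```powershell
# Three containers holding similar data, each read a different way.
$array     = "Teresa Wilson", [DateTime]"April 1, 2014", 1
$hashtable = @{ Name = "Teresa Wilson"; AwardCount = 1 }
$tuple     = [Tuple]::Create("Teresa Wilson", [DateTime]"April 1, 2014", 1)

$array[0]            # zero-based index: the first element
$array[-1]           # negative index: the last element
$hashtable["Name"]   # key accessor
$tuple.Item1         # one-based member name: the first element
$tuple.Item3         # the third (and last) element
```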
However, the biggest difference (and the reason you’d use a tuple) is about type safety. With a tuple you must specify exactly how many elements will be in it, and the type of each element in the specific order they’ll be in. On the upside, that means that you can count on a tuple having the right type of data in it!
In fact, a tuple is a lot like a struct. Each element has a specific type, and you can use the specific tuple type as a parameter constraint in a function or cmdlet to be sure that the provided values will be the right type.
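Before the fuller example, here is a rough sketch of what that type safety looks like in practice. It assumes the string "not a date" cannot be converted to a DateTime, which is what makes the second construction fail.

```powershell
# An untyped array accepts any values in any order.
$array = 1, "Teresa Wilson", "not a date"

# A tuple built with the expected types succeeds.
$good = New-Object -TypeName "Tuple[String,DateTime,Int]" -ArgumentList "Teresa Wilson", ([DateTime]"April 1, 2014"), 1

# A value that cannot become a DateTime is rejected at construction time.
$failed = $false
try {
    $bad = New-Object -TypeName "Tuple[String,DateTime,Int]" -ArgumentList "Teresa Wilson", "not a date", 1 -ErrorAction Stop
}
catch {
    $failed = $true
}
```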
Let me demonstrate what this means by way of an example…
If I told you I had written a function to compare MVPs, and you need to pass it an array of MVPs for comparison, how would you represent them? To be more specific: I want the name, most recent award date, and the number of awards the MVP has received.
If you wanted to represent it as an array, you might pick any of these:
$Array = "Teresa Wilson", 1, 2014
$Array = "Teresa", "Wilson", "1/4/2014", 1
$Array = "Teresa Wilson", [DateTime]"April 1, 2014", 1
And any of those might work, but there’s no way they all would. It would be too much work to write a function that could deal with all those different types and orders, so I need to be more specific. I need to know that the data structure you pass in will have specific types in specific places. Any of the other types that we’ve looked at would have a simple problem: I could tell you the types and names (or order) of the data in the structure I need and use any of them, but I would have to rely on you to follow my instructions.
With tuples, I can be more specific: I want the name, award date, and number of awards as a tuple[String, DateTime, Int]. Now you have only one choice for how to pass the data. I’ve specified the order and the object types, and taken all the complexity out of parsing it.
There are a couple of ways to build that tuple—one specifies the type and the other passes the objects and infers the type from them:
$tuple1 = New-Object "tuple[String, DateTime, Int]" "Teresa Wilson", "April 1, 2014", 1
$tuple2 = [tuple]::Create("Teresa Wilson", [DateTime]"April 1, 2014", 1)
Both of these create the same tuple object (they would be -eq if we compared them).
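If you want to check that claim yourself, a sketch like the following should do it. The variable names $sameType and $sameValue are mine, added only to hold the results.

```powershell
# Build the tuple both ways: explicit type versus inferred type.
$tuple1 = New-Object "Tuple[String,DateTime,Int]" "Teresa Wilson", "April 1, 2014", 1
$tuple2 = [Tuple]::Create("Teresa Wilson", [DateTime]"April 1, 2014", 1)

# Compare the concrete types, then the values.
$sameType  = $tuple1.GetType() -eq $tuple2.GetType()
$sameValue = $tuple1 -eq $tuple2

$sameType
$sameValue
```

Notice that New-Object converts the date string to a DateTime for us because the tuple type demands one, while Create has to be handed a real DateTime so it can infer the type.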
Hopefully, you can clearly see that the tuple is basically the same as the last array. Both structures have the same objects, of the same types, in the same order. The difference is all about type safety and the commitment to provide the data types in the right order. With the tuple, you know that I want exactly three pieces of information, and what order and format they need to be in.
So why not a struct?
I mentioned earlier that a tuple is a lot like a struct. In an example like this, it might be more elegant to create a custom struct type for our data structure. That works very nicely, and it lets me specify not only the type and order of the elements, but also the name of each one:
Add-Type @"
using System;
public struct MVP {
    public string Name;
    public DateTime LastAward;
    public int AwardCount;
}
"@
However, it requires a little knowledge of C# syntax, and it also requires that the Add-Type command be run before any code can create one of these objects. Additionally, authors reading the Help for a function that takes an MVP array are going to need to investigate the MVP object type to determine what its members are and possibly find a constructor. For example, consider this function:
function Sort-MVP {
    [CmdletBinding()]
    param(
        [Parameter(Mandatory=$True, ParameterSetName="tuple")]
        [tuple[string,datetime,int][]]$tuple,
        [Parameter(Mandatory=$True, ParameterSetName="Struct")]
        [MVP[]]$MVP
    )
    <# logic goes here #>
}
Now look at the syntax statements in the Help that Windows PowerShell gives us for those parameter sets:
SYNTAX
Sort-MVP -MVP <MVP[]> [<CommonParameters>]
Sort-MVP -tuple <tuple[string,datetime,int][]> [<CommonParameters>]
If you have a function that takes an array of tuple instead of a custom type, the syntax Help for the tuple is actually somewhat clearer (indicating the types of all the members). Because it’s a standard type, scripters trying to read your code may not even have to investigate the type to create it.
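To give a feel for working with the tuple parameter, here is a hedged sketch of what the sorting logic might look like. Sort-MvpTuple is a hypothetical stand-in for the function above (tuple parameter set only), and the sort order (most awards first, then most recent award) is my assumption, not a rule from the real function.

```powershell
# Hypothetical tuple-only variant of the function, with assumed sort logic.
function Sort-MvpTuple {
    [CmdletBinding()]
    param(
        [Parameter(Mandatory = $true)]
        [Tuple[string,datetime,int][]]$Tuple
    )
    # Item3 is the award count and Item2 is the most recent award date.
    $Tuple | Sort-Object -Property @{ Expression = 'Item3'; Descending = $true },
                                   @{ Expression = 'Item2'; Descending = $true }
}

$mvps = @(
    [Tuple]::Create("Teresa Wilson", [DateTime]"April 1, 2014", 1)
    [Tuple]::Create("Joel Bennett", [DateTime]"July 1, 2014", 6)
)

$sorted = Sort-MvpTuple -Tuple $mvps
$sorted[0].Item1
```

Because the parameter is constrained to the tuple type, the function body can rely on Item3 being an integer and Item2 being a DateTime without checking.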
Additionally, creating the tuple objects is quite a bit more succinct in code. Compare the syntax of the two:
$MVP = New-Object MVP -Property @{
    Name = "Joel Bennett"
    LastAward = "July 1, 2014"
    AwardCount = 6
}
$tuple = New-Object "tuple[String,DateTime,Int]" "Joel Bennett", "July 1, 2014", 6
There are obviously a few tradeoffs in usability (for instance, with the tuple, we have to access .Item2 instead of .LastAward), but not having to compile the struct is actually quite significant. If you are designing a type and a set of functions or cmdlets around it, and you need to change it, the struct is a lot more hassle. You have to recompile with Add-Type, which requires restarting Windows PowerShell to clear out the old type name. With the tuple, you can change the specifics by changing the type restriction on your function.
For instance, if I decided I need to sort by the last name (and thus, store the first name and last name separately), I could change my parameter definition to this:
[tuple[string,string,datetime,int][]]$tuple
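A caller would then build the four-element tuple like this. It is a minimal sketch, and the order (first name, last name, award date, award count) is my reading of the new definition.

```powershell
# Four elements: first name, last name, most recent award date, award count.
$tuple = [Tuple]::Create("Joel", "Bennett", [DateTime]"July 1, 2014", 6)

# The new type constraint accepts it.
$matchesNewType = $tuple -is [Tuple[string,string,datetime,int]]

$tuple.Item2   # the last name, now available for sorting
```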
Additionally, with a tuple, another Windows PowerShell programmer can write functions that use the same data structure without worrying about stepping on your type definition. If one of you needs to change it, you don’t have to worry about having different definitions of the same type. They may not be completely compatible anymore, but they’re not completely broken, either.
And finally, the output
There are two other major benefits to using tuples in Windows PowerShell.
First, Windows PowerShell doesn’t unroll them. Arrays and lists get unrolled, so when you return multiple data structures represented as arrays from a function, they end up blending together. To avoid that, you have to use Write-Output -NoEnumerate or return arrays of arrays. Of course, when you have to pass those output objects through multiple levels of functions, each function has to keep the arrays from unrolling, and you also have to avoid accidentally using -NoEnumerate on an array of arrays. It’s really a mess.
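A small sketch shows the difference. Both function names are hypothetical, and each function emits two records.

```powershell
# Each array is enumerated as it is written to the pipeline,
# so the two records blend into four loose values.
function Get-MvpArrays {
    "Teresa Wilson", 1
    "Joel Bennett", 6
}

# Each tuple travels through the pipeline as a single object.
function Get-MvpTuples {
    [Tuple]::Create("Teresa Wilson", 1)
    [Tuple]::Create("Joel Bennett", 6)
}

$fromArrays = @(Get-MvpArrays)
$fromTuples = @(Get-MvpTuples)

$fromArrays.Count   # 4 separate values
$fromTuples.Count   # 2 records
```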
Second, Windows PowerShell treats tuples as a custom type, so they are displayed as objects, not collections. Hashtable and Dictionary classes don’t get unrolled in Windows PowerShell, but they do get displayed as a table of Key and Value, rather than having the keys used as columns. With a tuple, you get the same output treatment as a struct or any other custom object. The only downside is that your columns have the unimaginative names: Item1, Item2, Item3…
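If the Item1, Item2, Item3 column names bother you at display time, one possible workaround is to project the tuple through Select-Object with calculated properties. This is only a sketch, and the column names are my own choice. Keep in mind that the result is a new custom object for display, not a tuple, so it no longer satisfies a tuple parameter constraint.

```powershell
$tuple = [Tuple]::Create("Joel Bennett", [DateTime]"July 1, 2014", 6)

# Map the positional members to friendlier (assumed) column names.
$named = $tuple | Select-Object -Property @{ Name = 'Name';       Expression = { $_.Item1 } },
                                          @{ Name = 'LastAward';  Expression = { $_.Item2 } },
                                          @{ Name = 'AwardCount'; Expression = { $_.Item3 } }

$named
```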
It’s easy to see that any time you need to work with data records, tuples are a great choice. Any time you need to work with a collection of strongly typed elements (such as a database record), tuples are a great choice. Any time you need to pass multiple values in or out of a function as a single object, tuples are a great choice. That’s not to say that tuples are always the best choice even in these situations: structs have almost all of the same benefits, but different downsides.
~Joel
Thank you, Joel, for a great conclusion to our tuple series.
So, that is all there is to working with tuples. Join me tomorrow when I will talk about some more cool Windows PowerShell stuff.
I invite you to follow me on Twitter and Facebook. If you have any questions, send email to me at scripter@microsoft.com, or post your questions on the Official Scripting Guys Forum. See you tomorrow. Until then, peace.
Ed Wilson, Microsoft Scripting Guy