Summary: Microsoft Scripting Guy, Ed Wilson, talks about using Windows PowerShell to analyze custom objects.
Microsoft Scripting Guy, Ed Wilson, is here. Last week, I talked about using a Windows PowerShell script to collect the number of words and documents for each of several years. To do this, I wrote a Windows PowerShell script that trawled a number of folders, opened the Word documents, and gathered the word count. (See Use PowerShell to Count Words and Display Progress Bar.)
The real power of the script comes not from simply emitting the objects to the Windows PowerShell console, but from collecting the objects and then using Windows PowerShell to process them. In fact, with a script that takes a while to run, this is the only practical solution: I run the script once and then ask the collected objects as many questions as I like, instead of rerunning the script for each question. All I need to do is add $objects = at the point in my script that creates the objects. The revised code is shown here:
Note: Remember, the key change is that I added $objects = to my script in the section where I created the custom objects.
$path = "E:\Data\ScriptingGuys"
$year = $NumberOfDocs = $NumberOfWords = $null
# Start at 0: $i is incremented before each document is reported
$i = 0
# Count the matching documents up front so Write-Progress can show a percentage
$totalDocs = (Get-ChildItem $path -Filter "*doc*" -Recurse -File |
  Where-Object {$_.BaseName -match '^(HSG|WES|QHF)'}).Count
$word = New-Object -ComObject Word.Application
$word.Visible = $false
# Each four-character folder name is a year; one custom object is emitted per year
$objects = Get-ChildItem $path -Filter "????" -Directory |
  ForEach-Object {
    $year = $_.Name
    Get-ChildItem $_.FullName -Filter "*doc*" -Recurse -File |
      Where-Object {$_.BaseName -match '^(HSG|WES|QHF)'} |
      ForEach-Object {
        $i++
        Write-Progress -Activity "Processing $($_.BaseName)" `
          -PercentComplete (($i / $totalDocs) * 100) -Status "Working on $year"
        $document = $word.Documents.Open($_.FullName)
        $NumberOfWords += $document.Words.Count
        $NumberOfDocs++
        $document.Close() | Out-Null
        # Release the COM reference to the document
        [System.Runtime.InteropServices.Marshal]::ReleaseComObject($document) |
          Out-Null
        Remove-Variable document }
    # This object is what ends up in $objects
    [PSCustomObject]@{
      "NumberOfDocuments" = $NumberOfDocs
      "NumberOfWords" = $NumberOfWords
      "Year" = $year}
    # Reset the per-year totals before the next folder
    $NumberOfDocs = $NumberOfWords = $year = $null }
# Close Word and release the remaining COM reference
$word.Quit()
[System.Runtime.InteropServices.Marshal]::ReleaseComObject($word) | Out-Null
Remove-Variable word
[gc]::Collect()
[gc]::WaitForPendingFinalizers()
After I have run the script and created my collection of objects, I will be able to work with the objects until I close Windows PowerShell, change the value of $objects, or remove the variable.
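Because the objects disappear when the session ends, one option that is not part of the original script is to save them to disk with Export-Clixml and load them again later with Import-Clixml. Here is a minimal sketch. The two stand-in objects reuse values from the output shown below, and the WordCounts.xml file in the temp folder is only an illustrative name.

```powershell
# Stand-in for the collection the script builds, so this snippet runs on its own
# (values copied from the 2008 and 2009 rows of the output in this post)
$objects = @(
  [PSCustomObject]@{NumberOfDocuments = 6;   NumberOfWords = 9083;   Year = '2008'}
  [PSCustomObject]@{NumberOfDocuments = 135; NumberOfWords = 281606; Year = '2009'}
)

# Save the objects to an XML file (the file name is an illustrative choice)
$file = Join-Path -Path ([System.IO.Path]::GetTempPath()) -ChildPath 'WordCounts.xml'
$objects | Export-Clixml -Path $file

# Later, even in a new Windows PowerShell session, read them back
$restored = Import-Clixml -Path $file
$restored | Format-Table -AutoSize
```

The restored objects are deserialized copies. They carry the same NumberOfDocuments, NumberOfWords, and Year properties, so the Select-Object and Measure-Object commands in this post work on them without Word or the original documents.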
The first thing I do is look at the variable to see what it contains. This is shown here:
PS C:\> $objects
NumberOfDocuments NumberOfWords Year
----------------- ------------- ----
6 9083 2008
135 281606 2009
387 672847 2010
379 600970 2011
392 598339 2012
363 502704 2013
388 456485 2014
180 123584 2015
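To try the analysis commands in this post without Word or the original document library, one approach is to rebuild $objects by hand from the output above. This is only a sketch: the values are copied from that table, and Year is kept as a string because the script takes it from the folder name.

```powershell
# Rebuild the collection from the output above (values copied from the table).
# Year is a string, as in the script, where it comes from the folder name.
$objects = @(
  [PSCustomObject]@{NumberOfDocuments = 6;   NumberOfWords = 9083;   Year = '2008'}
  [PSCustomObject]@{NumberOfDocuments = 135; NumberOfWords = 281606; Year = '2009'}
  [PSCustomObject]@{NumberOfDocuments = 387; NumberOfWords = 672847; Year = '2010'}
  [PSCustomObject]@{NumberOfDocuments = 379; NumberOfWords = 600970; Year = '2011'}
  [PSCustomObject]@{NumberOfDocuments = 392; NumberOfWords = 598339; Year = '2012'}
  [PSCustomObject]@{NumberOfDocuments = 363; NumberOfWords = 502704; Year = '2013'}
  [PSCustomObject]@{NumberOfDocuments = 388; NumberOfWords = 456485; Year = '2014'}
  [PSCustomObject]@{NumberOfDocuments = 180; NumberOfWords = 123584; Year = '2015'}
)

# Display the variable, just as it was displayed above
$objects
```

With this in place, commands such as $objects | Measure-Object -Property NumberOfWords -Sum behave the same way they do on the script's output.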
The next thing I want to do is look at some stats about the number of words written over the years. I start with the total:
PS C:\> $objects | measure -Property numberofwords -Sum
Count : 8
Average :
Sum : 3245618
Maximum :
Minimum :
Property : NumberOfWords
That is over 3 million words! Now I want to know the average number of words per year, the maximum number in one year, and the minimum number in one year. This code is shown here:
PS C:\> $objects | Measure-Object -Property numberofwords -Sum -Average -Maximum -Minimum
Count : 8
Average : 405702.25
Sum : 3245618
Maximum : 672847
Minimum : 9083
Property : NumberOfWords
But in the first year, I wrote only six articles, so that year skews the results. I decide to eliminate the first year, which I can do with the Select-Object cmdlet (select is its alias):
PS C:\> $objects | select -Last 7
NumberOfDocuments NumberOfWords Year
----------------- ------------- ----
135 281606 2009
387 672847 2010
379 600970 2011
392 598339 2012
363 502704 2013
388 456485 2014
180 123584 2015
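Select-Object -Last 7 works here because Get-ChildItem returned the year folders in name order, so the 2008 object happened to come first. An alternative that does not depend on order is to filter on the Year property with Where-Object. Because the script stores Year as the folder name (a string), this sketch casts it to [int] before comparing. The three stand-in rows are copied from the output above and are deliberately out of order.

```powershell
# Stand-in rows copied from the output above, deliberately out of year order
$objects = @(
  [PSCustomObject]@{NumberOfDocuments = 180; NumberOfWords = 123584; Year = '2015'}
  [PSCustomObject]@{NumberOfDocuments = 6;   NumberOfWords = 9083;   Year = '2008'}
  [PSCustomObject]@{NumberOfDocuments = 135; NumberOfWords = 281606; Year = '2009'}
)

# Keep every year after 2008, wherever it sits in the collection
$laterYears = $objects | Where-Object { [int]$_.Year -gt 2008 }
$laterYears | Sort-Object -Property Year | Format-Table -AutoSize
```

With the rows in this order, $objects | select -Last 2 would return 2008 and 2009 and drop 2015, while the Where-Object filter keeps 2015 and 2009. On the full, name-ordered collection, both approaches select the same seven years.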
Now that I have eliminated the first year, I add the Measure-Object cmdlet:
PS C:\> $objects | select -Last 7 | Measure-Object -Property numberofwords -Sum -Average -Maximum -Minimum
Count : 7
Average : 462362.142857143
Sum : 3236535
Maximum : 672847
Minimum : 123584
Property : NumberOfWords
But what if I want to see the average size of the documents in each year? Well, I did not specifically collect that, did I? No problem, I already have the information. All I need to do is create a new object that includes a calculated property. Once again, I use the Select-Object cmdlet, as shown here:
PS C:\> $objects | select Year, @{L = "AverageSize"; E = {$_.NumberOfWords / $_.NumberOfDocuments}} | ft -AutoSize
Year AverageSize
---- -----------
2008 1513.83333333333
2009 2085.97037037037
2010 1738.62273901809
2011 1585.672823219
2012 1526.375
2013 1384.85950413223
2014 1176.50773195876
2015 686.577777777778
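The hashtable passed to Select-Object defines a calculated property. L (label, or Name) sets the name of the new property, and E (Expression) is a script block that runs once per object, with $_ as the current object. The following sketch writes the same property with the full Name and Expression keys. The rounding with [math]::Round and the two stand-in rows (copied from the output above) are additions for illustration, not part of the original command.

```powershell
# Stand-in rows copied from the output above
$objects = @(
  [PSCustomObject]@{NumberOfDocuments = 6;   NumberOfWords = 9083;   Year = '2008'}
  [PSCustomObject]@{NumberOfDocuments = 135; NumberOfWords = 281606; Year = '2009'}
)

# Calculated property: Name is the new property, Expression runs once per object
$averageSize = @{
  Name       = 'AverageSize'
  Expression = { [math]::Round($_.NumberOfWords / $_.NumberOfDocuments) }
}

# Select-Object emits new objects that carry Year plus the calculated property
$sizes = $objects | Select-Object -Property Year, $averageSize
$sizes | Format-Table -AutoSize
```

This returns 1514 for 2008 and 2086 for 2009, which are the rounded forms of the values shown above.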
What if I want a more in-depth look at the average size? Well, I bring the Measure-Object cmdlet back into play. This is shown here:
PS C:\> $objects | select Year, @{L = "AverageSize"; E = {$_.NumberOfWords / $_.NumberOfDocuments}} | measure -Property AverageSize -Average -Maximum -Minimum
Count : 8
Average : 1462.3024099762
Sum :
Maximum : 2085.97037037037
Minimum : 686.577777777778
Property : AverageSize
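The Average in that output is the mean of eight yearly averages, so the six documents from 2008 count as much as the 392 documents from 2012. To find the length of a typical document across all years, you can instead divide the total number of words by the total number of documents. This sketch rebuilds the collection from the output shown earlier so it runs on its own:

```powershell
# Stand-in: all eight rows copied from the output shown earlier
$objects = @(
  [PSCustomObject]@{NumberOfDocuments = 6;   NumberOfWords = 9083;   Year = '2008'}
  [PSCustomObject]@{NumberOfDocuments = 135; NumberOfWords = 281606; Year = '2009'}
  [PSCustomObject]@{NumberOfDocuments = 387; NumberOfWords = 672847; Year = '2010'}
  [PSCustomObject]@{NumberOfDocuments = 379; NumberOfWords = 600970; Year = '2011'}
  [PSCustomObject]@{NumberOfDocuments = 392; NumberOfWords = 598339; Year = '2012'}
  [PSCustomObject]@{NumberOfDocuments = 363; NumberOfWords = 502704; Year = '2013'}
  [PSCustomObject]@{NumberOfDocuments = 388; NumberOfWords = 456485; Year = '2014'}
  [PSCustomObject]@{NumberOfDocuments = 180; NumberOfWords = 123584; Year = '2015'}
)

# Total words and total documents across all years
$words = ($objects | Measure-Object -Property NumberOfWords -Sum).Sum
$docs  = ($objects | Measure-Object -Property NumberOfDocuments -Sum).Sum

# Words per document, so each year counts in proportion to its document count
$overallAverage = $words / $docs
[math]::Round($overallAverage, 2)
```

With these numbers, the calculation is 3,245,618 words divided by 2,230 documents, or about 1,455.43 words per document. This is slightly below the 1,462.30 that comes from averaging the yearly averages.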
So, creating custom objects and saving them in a variable makes for great offline analysis.
That is all for now. Join me tomorrow for more way cool Windows PowerShell stuff.
I invite you to follow me on Twitter and Facebook. If you have any questions, send email to me at scripter@microsoft.com, or post your questions on the Official Scripting Guys Forum. See you tomorrow. Until then, peace.
Ed Wilson, Microsoft Scripting Guy