Summary: Microsoft Scripting Guy, Ed Wilson, talks about using Windows PowerShell to find specific built-in properties from Word documents.
Microsoft Scripting Guy, Ed Wilson, is here. Well, the script for today took a bit of work … actually, it took quite a bit of work. The script does the following:
- Searches a specific folder for Word documents
- Creates an array of specific Word document properties from the Word built-in document properties enumeration. The built-in Word properties are listed on MSDN.
- Retrieves the specific built-in Word properties and their associated values
- Creates a custom Windows PowerShell object with each of the specified properties, in addition to the full path to the Word document
Today’s script is similar to the Find All Word Documents that Contain a Specific Phrase script from yesterday, so reviewing that posting would be a good thing to do. This script also accomplishes a few of the things I wanted to do in yesterday’s script that I did not get a chance to do—namely, I return a custom object that contains the built-in properties I choose. This is a great benefit because it permits further analysis and processing of the data—and it would even permit export to a CSV file if I wish.
Working with Word Document properties
Working with Word document properties is difficult, and I have written several blog posts about it. Refer to those posts for additional information. The first thing I do is create a couple of command-line parameters. This permits changing the path to search, as well as modifying the include filter that is used by the Get-ChildItem cmdlet. I also store the names of the built-in properties I want in the $AryProperties array. Next, I create the Word.Application object and set it to be invisible. Then I create the BindingFlags and WdSaveOptions types. The reason for creating WdSaveOptions is so that I can close each document without saving it, which keeps Word from modifying the Word files. Finally, I obtain a collection of FileInfo objects and store the returned objects in the $docs variable. This portion of the script is shown here.
Param(
$path = "C:\fso",
[array]$include = @("HSG*.docx","WES*.docx"))
$AryProperties = "Title","Author","Keywords", "Number of words", "Number of pages"
$application = New-Object -ComObject word.application
$application.Visible = $false
$binding = "System.Reflection.BindingFlags" -as [type]
[ref]$SaveOption = "microsoft.office.interop.word.WdSaveOptions" -as [type]
$docs = Get-childitem -path $Path -Recurse -Include $include
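The way -Include handles an array of wildcard patterns can be checked without involving Word at all. The following is a minimal sketch; the scratch folder under the temp directory and the three file names are made up for illustration and are not part of the script.

```powershell
# Build a scratch folder with three empty files (hypothetical names).
$root = Join-Path ([System.IO.Path]::GetTempPath()) ("fso-demo-" + [guid]::NewGuid())
New-Item -ItemType Directory -Path $root | Out-Null
"HSG-1-1-12.docx", "WES-1-1-12.docx", "Notes.docx" | ForEach-Object {
    New-Item -ItemType File -Path (Join-Path $root $_) | Out-Null
}

# Same call shape as the script: -Recurse plus an array of patterns.
$include = @("HSG*.docx", "WES*.docx")
$found = Get-ChildItem -Path $root -Recurse -Include $include

# Only the files that match one of the patterns come back; Notes.docx is skipped.
$found | Select-Object -ExpandProperty Name

# Clean up the scratch folder.
Remove-Item -Path $root -Recurse -Force
```

Because $include is a parameter, a different set of patterns can be supplied at run time without editing the script.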
Now I need to walk through the collection of documents. I use the foreach statement. Inside the foreach loop, I open each document, and return the BuiltInDocumentProperties collection. I also create a hash table that I will use to create the custom object later in the script. This portion of the code is shown here.
Foreach($doc in $docs)
{
$document = $application.documents.open($doc.fullname)
$BuiltinProperties = $document.BuiltInDocumentProperties
$objHash = @{"Path"=$doc.FullName}
It is time to work through the array of built-in properties that I selected earlier. To do this, once again I use a foreach statement. I use Try when attempting to access each built-in property because an error is generated if the property contains no value. I already know the name of the property that I want to obtain; therefore, I use it directly when obtaining the value of the property. Both the name and the value of the built-in document property are added to the hash table as a key/value pair. If an error occurs, I print a message via Write-Host that the value was not found. I use Write-Host for this so I can specify the color (blue). The code is shown here.
foreach($p in $AryProperties)
{
  Try
  {
    # Look up the property by name, then read its value through late binding
    $pn = [System.__ComObject].invokemember("item",$binding::GetProperty,$null,$BuiltinProperties,$p)
    $value = [System.__ComObject].invokemember("value",$binding::GetProperty,$null,$pn,$null)
    $objHash.Add($p,$value)
  }
  Catch [system.exception]
  { write-host -foreground blue "Value not found for $p" }
}
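One side effect of skipping a property that cannot be read is that the resulting objects can have different sets of properties, and a plain hash table does not keep its keys in a fixed order. The following is a sketch of one possible variation, not the approach used in the script: it stores $null for a property that cannot be read and uses an ordered dictionary (Windows PowerShell 3.0 and later). Get-SampleProperty and the sample data are hypothetical stand-ins for the InvokeMember calls, so the pattern can be run without Word.

```powershell
# Hypothetical stand-in for the COM lookups: throws when the name is unknown.
function Get-SampleProperty {
    param([hashtable]$Source, [string]$Name)
    if (-not $Source.ContainsKey($Name)) { throw "No value for $Name" }
    $Source[$Name]
}

# Made-up data standing in for one document's built-in properties.
$sample = @{ "Title" = "Demo"; "Author" = "edwils" }
$AryProperties = "Title", "Author", "Keywords"

# [ordered] keeps the properties in the order they are added.
$objHash = [ordered]@{ "Path" = "C:\fso\Demo.docx" }
foreach ($p in $AryProperties) {
    try {
        $objHash.Add($p, (Get-SampleProperty -Source $sample -Name $p))
    }
    catch {
        # Keep the property but leave it empty, so every object has the same shape.
        $objHash.Add($p, $null)
    }
}

$docProperties = [pscustomobject]$objHash
$docProperties
```

Objects with a uniform shape are easier to sort, filter, and export later, at the cost of no longer seeing which properties failed to load.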
I then create a new custom PSObject and use the hash table for the properties of that object. I display that object, and close the Word document without saving any changes. Finally, I release the document object and the BuiltInProperties object, and I continue to loop through the collection of documents. This code is shown here.
$docProperties = New-Object psobject -Property $objHash
$docProperties
$document.close([ref]$saveOption::wdDoNotSaveChanges)
[System.Runtime.InteropServices.Marshal]::ReleaseComObject($BuiltinProperties) | Out-Null
[System.Runtime.InteropServices.Marshal]::ReleaseComObject($document) | Out-Null
Remove-Variable -Name document, BuiltinProperties
}
When I have completed processing the collection of documents, I release the Word.Application COM object and call garbage collection. This code is shown here.
$application.quit()
[System.Runtime.InteropServices.Marshal]::ReleaseComObject($application) | Out-Null
Remove-Variable -Name application
[gc]::collect()
[gc]::WaitForPendingFinalizers()
Using the returned objects
One reason for returning an object is that it allows for grouping, sorting, and for further processing. I could have written everything in a function, but it works just as well as a script. For example, when I run the script, it returns the following objects.
PS C:\> C:\data\ScriptingGuys\2012\HSG_7_30_12\Get-SpecificDocumentProperties.ps1
Path            : C:\fso\HSG-7-23-12.docx
Number of words : 1398
Number of pages : 4
Author          : edwils
Keywords        :
Title           :

Path            : C:\fso\HSG-7-24-12.docx
Number of words : 1035
Number of pages : 4
Author          : edwils
Keywords        : guest blogger, powershell
Title           :
Because the script returns objects, I can filter the output and find only the documents whose keywords contain the phrase “guest blogger,” as shown here.
PS C:\> C:\data\ScriptingGuys\2012\HSG_7_30_12\Get-SpecificDocumentProperties.ps1 | where keywords -match "guest blogger"
Path            : C:\fso\HSG-7-24-12.docx
Number of words : 1035
Number of pages : 4
Author          : edwils
Keywords        : guest blogger, powershell
Title           :
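The same filtered objects can also be sent to a CSV file. The following sketch uses two hand-built objects that mirror the output above so that it runs without Word; the output file name and its location in the temp directory are assumptions for illustration.

```powershell
# Sample objects shaped like the script's output (values copied from the output above).
$results = @(
    [pscustomobject]@{ Path = "C:\fso\HSG-7-23-12.docx"; "Number of words" = 1398; Author = "edwils"; Keywords = "" }
    [pscustomobject]@{ Path = "C:\fso\HSG-7-24-12.docx"; "Number of words" = 1035; Author = "edwils"; Keywords = "guest blogger, powershell" }
)

# Filter first, then export; -NoTypeInformation drops the "#TYPE" header line.
$csv = Join-Path ([System.IO.Path]::GetTempPath()) "GuestBloggerDocs.csv"
$results |
    Where-Object { $_.Keywords -match "guest blogger" } |
    Export-Csv -Path $csv -NoTypeInformation

# Read the file back to confirm what was written.
Import-Csv -Path $csv
```

With the real script, the $results array would be replaced by the script invocation at the start of the pipeline.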
It is even possible to modify the way the output appears, for example by sorting on the word count and showing only the file name instead of the full path. This is shown here.
PS C:\> C:\data\ScriptingGuys\2012\HSG_7_30_12\Get-SpecificDocumentProperties.ps1 | sort "number of words" -Descending | select @{LABEL="file";EXPRESSION={split-path $_.path -Leaf}}, "number of words", author, keywords | ft -AutoSize
file             Number of words Author Keywords
----             --------------- ------ --------
HSG-7-23-12.docx            1398 edwils
HSG-7-27-12.docx            1208 edwils
HSG-8-2-11.docx             1206 edwils
hsg-9-28-11.docx            1131 edwils
HSG-7-24-12.docx            1035 edwils guest blogger, powershell
HSG-8-1-11.docx              963 edwils
HSG-7-25-12.docx             882 edwils
HSG-7-26-12.docx             848 edwils
PS C:\>
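Grouping and measuring work the same way as sorting and selecting. The following is a minimal sketch, again with hand-built sample objects standing in for the real script output:

```powershell
# Sample objects shaped like the script's output (two documents only).
$results = @(
    [pscustomobject]@{ Path = "C:\fso\HSG-7-23-12.docx"; "Number of words" = 1398; Author = "edwils" }
    [pscustomobject]@{ Path = "C:\fso\HSG-7-24-12.docx"; "Number of words" = 1035; Author = "edwils" }
)

# One group per author, with a document count for each.
$byAuthor = $results | Group-Object -Property Author
$byAuthor | Select-Object Name, Count

# Total and average word count across the documents.
$stats = $results | Measure-Object -Property "Number of words" -Sum -Average
$stats | Select-Object Count, Sum, Average
```

Property names that contain spaces, such as "Number of words", need to be quoted when they are passed to -Property.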
The complete Get-SpecificDocumentProperties.ps1 script is on the Scripting Guys Script Repository.
Join me tomorrow when I will talk about programmatically assigning values to the Word documents.
I invite you to follow me on Twitter and Facebook. If you have any questions, send email to me at scripter@microsoft.com, or post your questions on the Official Scripting Guys Forum. See you tomorrow. Until then, peace.
Ed Wilson, Microsoft Scripting Guy