Summary: Guest blogger, James O'Neill, uses Windows PowerShell to help users with input for searching the Windows Index.
Microsoft Scripting Guy, Ed Wilson, is here. Today James O’Neill is back with Part Two.
Note: This is Part Two of a three-part series about using Windows PowerShell to search the Windows Index. Yesterday, James talked about building a query string to search the Windows Index.
Take it away, James…
In Part One, I developed a working Windows PowerShell function to query the Windows Index. It outputs data rows, which isn't the ideal behaviour, and I'll address that in Part Three. Today, I'll address another drawback: search terms passed as parameters to the function must be "SQL-Ready." I think that makes for a bad user experience, so I am going to look at the half-dozen bits of logic that I added to allow my function to process input that is a little more human. Regular expressions are the way to recognize text that must be changed, and I'll pay particular attention to those because I know a lot of people find them daunting. Let’s address the items from yesterday’s list that I want the function to do for me…
Replace * with %
SQL statements use % for a wildcard, but selecting files at the command prompt traditionally uses *. It's a simple matter to replace one with the other. But for the need to "escape" the * character (it is a quantifier in regular expressions), replacing * with % would be as simple as a –Replace statement gets. This command is shown here.
$Filter = $Filter -replace "\*","%"
For some reason, I am never sure if the camera maker is Canon or Cannon, so I would rather search for Can*…or rather Can%, and that replace operation will turn "CameraManufacturer=Can*" into "CameraManufacturer=Can%". It is worth noting that –Replace is just as happy to process an array of strings in $filter as it is to process one.
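As a quick check of the two points above (the escaped asterisk and the array handling), here is a minimal sketch. The sample filter strings are made up for illustration.

```powershell
# Sample filters, made up for illustration
$Filter = "CameraManufacturer=Can*", "*.jpg"

# [regex]::Escape shows how a literal asterisk has to be written in a pattern
$Escaped = [regex]::Escape("*")    # the two characters \*

# -replace is applied to every element of the array and returns an array
$Filter = $Filter -replace $Escaped, "%"
$Filter
```

Both strings come back with % in place of *, and the result is still a two-element array.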
Searching for a term across all fields uses "CONTAINS (*,'Stingray')", and if the –Replace operation changes * to % inside CONTAINS(), the result is no longer a valid SQL statement. So the regular expression needs to be a little more sophisticated, using a "negative look behind."
$Filter = $Filter -replace "(?<!\(\s*)\*","%"
To filter out cases like CONTAINS(*… , the new regular expression qualifies "Match on *", with a look behind "(?<!\(\s*)", which says, "If it isn’t immediately preceded by an opening bracket and any spaces." In regular expression syntax:
- (?=x) says, "Look ahead for x"
- (?<=x) says, "Look behind for x"
- (?!x) says, "Look ahead for anything EXCEPT x"
- (?<!x) says, "Look behind for anything EXCEPT x"
These will see a lot of use in this function. Here, (?<! ) is being used. The open bracket needs to be escaped, so it is written as \( , and \s* means 0 or more spaces.
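To see the negative look behind at work, this small sketch runs the expression against three sample strings (the samples are invented for illustration). Only an asterisk that does not directly follow an opening bracket, with or without spaces, should change.

```powershell
# Sample filters, made up for illustration
$Filter = "CameraManufacturer=Can*",
          "CONTAINS(*,'Stingray')",
          "CONTAINS( *,'Sting*')"

# Replace * with % unless it is preceded by "(" and optional spaces
$Result = $Filter -replace "(?<!\(\s*)\*", "%"
$Result
```

The first string gets its %, the second is left alone, and in the third only the asterisk inside the quoted search text is changed.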
Convert orphan search terms into Contains conditions
A term that needs to be wrapped as a "CONTAINS" search can be identified by the absence of quotation marks, = , < , or > signs, or the LIKE, CONTAINS, or FREETEXT search predicates. When these are present, the search term is left alone; otherwise, it goes to CONTAINS like this.
$filter = ($filter | ForEach-Object {
if ($_ -match "'|=|<|>|like|contains|freetext") {$_}
else {"Contains(*,'$_')"}
})
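Here is the same test run over a few sample terms, as a sketch with invented input values. Only the bare word should end up wrapped.

```powershell
# Sample terms, made up for illustration
$Filter = "Stingray",
          "CameraManufacturer=Can%",
          "Width > 1024",
          "FREETEXT('sea life')"

$Filter = $Filter | ForEach-Object {
    # -match is case-insensitive, so LIKE, Like, and like are all recognized
    if ($_ -match "'|=|<|>|like|contains|freetext") { $_ }
    else { "Contains(*,'$_')" }
}
$Filter
```

"Stingray" becomes Contains(*,'Stingray'), and the other three terms pass through unchanged.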
Add quotation marks if the user omits them
The next thing I check for is omitted quotation marks. I said I wanted to be able to use Can*, and we’ve seen it changed to Can%, but the search term needs to be transformed into "CameraManufacturer='Can%' ". Here is a –Replace operation to do that:
$Filter = $Filter -replace "\s*(=|<|>|like)\s*([^'\d][^\s']*)$",' $1 ''$2'' '
This is a more complex regular expression which takes a few moments to understand.
Regular expression | Meaning | Application
\s* (both occurrences) | Any spaces (or none) |
(=|<|>|like) | = or < or > or "Like" | the = in CameraManufacturer=Can%
[^'\d] | Anything that is NOT a ' character or a digit | the C in CameraManufacturer=Can%
[^\s']* | Any number of non-quotation mark, non-space characters (or none) | the an% in CameraManufacturer=Can%
$ | End of line |
( ) (both pairs) | Capture the enclosed sections as matches | $Matches[0] = "=Can%"
' $1 ''$2'' ' | Replace Matches[0] ("=Can%") with an expression that uses the two submatches "=" and "Can%". | = 'Can%'
Note: The expression that is being inserted uses $1 and $2 to mean $Matches[1] and $Matches[2]. If this is wrapped in double quotation marks, Windows PowerShell will try to evaluate these terms as variables before they get to the regex handler, so the replacement string must be wrapped in single quotation marks. But the desired replacement text contains single quotation marks, so they need to be doubled up.
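The following sketch applies the quoting expression to a handful of sample conditions (all invented for illustration) to show which ones it touches.

```powershell
$Pattern = "\s*(=|<|>|like)\s*([^'\d][^\s']*)$"

# Sample conditions, made up for illustration
$Samples = "CameraManufacturer=Can%",   # unquoted text: should be quoted
           "Width>1024",                # starts with a digit: left alone
           "Name like 'fish%'",         # already quoted: left alone
           "Width > 1024"               # space before the digit: see the note below

# Single-quoted replacement, so $1 and $2 reach the regex engine untouched
$Quoted = $Samples -replace $Pattern, ' $1 ''$2'' '
$Quoted
```

The first sample becomes CameraManufacturer = 'Can%' and the next two are unchanged. The last one is worth a second look: because a space is neither a quotation mark nor a digit, the space after > satisfies [^'\d], and the number ends up quoted as ' 1024'. Writing numeric comparisons without a space after the operator avoids this.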
Replace '=' with 'like' for wildcards
So far, =Can* has become ='Can%', which is good, but SQL needs "LIKE" instead of "=" to evaluate a wildcard. So the next operation converts "CameraManufacturer = 'Can%' " into "CameraManufacturer LIKE 'Can%' ".
$Filter = $Filter -replace "\s*=\s*(?='.+%'\s*$)" ," LIKE "
Regular expression | Meaning | Application
\s*=\s* | = sign surrounded by any spaces | the = and its spaces in CameraManufacturer = 'Can%'
' | A quotation mark character | the opening ' in CameraManufacturer = 'Can%'
.+ | Any characters (at least one) | the Can in CameraManufacturer = 'Can%'
%' | % character followed by ' | the %' in CameraManufacturer = 'Can%'
\s*$ | Any spaces (or none) followed by end of line |
(?= ) | Look ahead for the enclosed expression, but don't include it in the match | $Matches[0] = " = " (the = sign and the spaces around it)
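Putting the three replacements so far together, this sketch follows one sample filter through each step. The intermediate variable names ($Step1 and so on) are only there for illustration.

```powershell
$Filter = "CameraManufacturer=Can*"

# 1. Wildcard: * becomes % (unless it follows an opening bracket)
$Step1 = $Filter -replace "(?<!\(\s*)\*", "%"

# 2. Quotation marks: wrap the unquoted value
$Step2 = $Step1 -replace "\s*(=|<|>|like)\s*([^'\d][^\s']*)$", ' $1 ''$2'' '

# 3. Operator: = becomes LIKE, but only when the quoted value ends with %
$Step3 = $Step2 -replace "\s*=\s*(?='.+%'\s*$)", " LIKE "

# A value without a wildcard keeps its = sign
$Exact = "CameraManufacturer='Canon'" -replace "\s*=\s*(?='.+%'\s*$)", " LIKE "

$Step1, $Step2, $Step3, $Exact
```

The filter goes from CameraManufacturer=Can% to CameraManufacturer = 'Can%' to CameraManufacturer LIKE 'Can%', while the exact-match condition is returned as it was.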
Provide aliases
The previous steps reconstruct "WHERE" terms to build syntactically correct SQL, but what if I get confused and enter CameraMaker instead of CameraManufacturer or Keyword instead of Keywords? I need Aliases, and they should work anywhere in the SQL statement—not just in the "WHERE" clause, but also in "ORDER BY".
I defined a hash table (aka a "dictionary" or an "associative array") near the top of the script to act as a single place to store the aliases with their associated full canonical names, like this:
$PropertyAliases = @{Width="System.Image.HorizontalSize";
Height="System.Image.VerticalSize";
Name="System.FileName";
Extension="System.FileExtension";
Keyword="System.Keywords";
CameraMaker="System.Photo.CameraManufacturer" }
Later in the script, after the SQL statement is built, a loop runs through the aliases replacing each with its canonical name:
$PropertyAliases.Keys | ForEach-Object {
$SQL= $SQL -replace "(?<=\s)$($_)(?=\s*(=|>|<|,|Like))",$PropertyAliases[$_]
}
A hash table has .Keys and .Values properties, which return what is on the left and right of the equals sign respectively. $hashTable.keyName or $hashTable["keyName"] will return the value. If $_ takes the value "Width" on the first pass (a hash table does not guarantee the order of its keys), its replacement will be $PropertyAliases["Width"], which is "System.Image.HorizontalSize". On the next pass through the loop, another alias such as "Height" is replaced, and so on. To ensure that it matches on a field name and not text being searched for, the regular expression stipulates that the name must be preceded by a space and followed by "=", "like", and so on.
Regular expression | Meaning | Application
Width | The literal text "Width" | Width > 1024
\s | A space |
(?<= ) | Look behind for the enclosed expression, but don't include it in the match. | $Matches[0] = "Width"
\s* | Any spaces (or none) |
(=|>|<|,|Like) | The literal text "Like", or any of the following characters: comma, equals, greater than, or less than | the > in Width > 1024
(?= ) | Look ahead for the enclosed expression, but don't include it in the match. | $Matches[0] = "Width"
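This sketch runs the alias loop over a sample SQL string (invented for illustration, with only two aliases defined) to show that a field name is replaced while the same word inside a search term is not.

```powershell
$PropertyAliases = @{
    Width  = "System.Image.HorizontalSize"
    Height = "System.Image.VerticalSize"
}

# Sample statement, made up for illustration; note the trailing comma after Height
$SQL = "SELECT System.ItemUrl FROM SYSTEMINDEX WHERE Width > 1024 " +
       "AND Contains(*,'Width') ORDER BY Height,"

# ForEach-Object runs in the current scope, so the assignment updates $SQL
$PropertyAliases.Keys | ForEach-Object {
    $SQL = $SQL -replace "(?<=\s)$($_)(?=\s*(=|>|<|,|Like))", $PropertyAliases[$_]
}
$SQL
```

Width in the WHERE clause and Height in the ORDER BY clause are expanded. The 'Width' inside Contains() is preceded by a quotation mark rather than a space, so it stays as it is.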
Add the correct prefix if it is omitted
This builds on the ideas we've seen already. I want the list of fields and prefixes to be easy to maintain, so just after I define my aliases, I define a list of field types:
$FieldTypes = "System","Photo","Image","Music","Media","RecordedTv","Search"
For each type, I define two variables, a prefix and a fields list. The names must be the field type followed by Prefix and the field type followed by Fields (for example, $PhotoPrefix and $PhotoFields). The reason for this will become clear shortly, but here is what they look like:
$SystemPrefix = "System."
$SystemFields = "ItemName|ItemUrl"
$PhotoPrefix = "System.Photo."
$PhotoFields = "cameramodel|cameramanufacturer|orientation"
In practice, the field lists are much longer. System contains 25 field names, not just the two shown here. The lists are written with "|" between the names so they become a regular expression meaning "ItemName or ItemUrl Or …". The following code runs after aliases have been processed:
foreach ($type in $FieldTypes) {
$fields = (get-variable "$($type)Fields").value
$prefix = (get-variable "$($type)Prefix").value
$sql = $sql -replace "(?<=\s)(?=($Fields)\s*(=|>|<|,|Like))" , $Prefix
}
I can save repeating code by using Get-Variable in a loop to get $systemFields, $photoFields, and so on. If I want to add one more field or a whole type, I only need to change the variable declarations at the start of the script. The regular expression in the -Replace works like this:
Regular expression | Meaning | Application
(?<=\s) | Look behind for a space, but don't include it in the match. |
(cameramanufacturer|orientation) | The literal text "orientation" or "cameramanufacturer" | the CameraManufacturer in CameraManufacturer LIKE 'Can%'
\s* | Any spaces (or none) |
(=|>|<|,|Like) | The literal text "Like", or any of the following characters: comma, equals, greater than, or less than | the LIKE in CameraManufacturer LIKE 'Can%'
(?= ) | Look ahead for the enclosed expression, but don't include it in the match. | $Matches[0] is the point between the leading space and "CameraManufacturer LIKE", but it doesn't include either.
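As a sketch of the loop in action, here it is with the short field lists from above and a sample SQL string (invented for illustration, including the comma before FROM that the expression relies on).

```powershell
$FieldTypes   = "System", "Photo"
$SystemPrefix = "System."
$SystemFields = "ItemName|ItemUrl"
$PhotoPrefix  = "System.Photo."
$PhotoFields  = "cameramodel|cameramanufacturer|orientation"

# Sample statement, made up for illustration
$SQL = "SELECT ItemName, ItemUrl, FROM SYSTEMINDEX WHERE CameraManufacturer LIKE 'Can%' "

foreach ($type in $FieldTypes) {
    # Look up $SystemFields / $PhotoFields and $SystemPrefix / $PhotoPrefix by name
    $fields = (Get-Variable "$($type)Fields").Value
    $prefix = (Get-Variable "$($type)Prefix").Value
    # Zero-width match: the prefix is inserted just before the field name
    $SQL = $SQL -replace "(?<=\s)(?=($fields)\s*(=|>|<|,|Like))", $prefix
}
$SQL
```

Each field name gains its prefix, and because -replace is case-insensitive, CameraManufacturer matches the lowercase entry in $PhotoFields. The comma before FROM is still there at this point.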
Use ‑Replace with a regular expression
We get the effect of an "insert" operator by using ‑Replace with a regular expression that finds a place in the text, but doesn't select any of it.
This part of the function allows "CameraManufacturer LIKE 'Can%'" to become "System.Photo.CameraManufacturer LIKE 'Can%' " in a WHERE clause. I also wanted "CameraManufacturer" in an ORDER BY clause to become "System.Photo.CameraManufacturer".
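For comparison, this sketch shows the "insert" form next to a more conventional version that captures the field name and writes it back with $1. The sample clause is invented; both forms should give the same text.

```powershell
# Sample clause, made up for illustration
$Clause = " ORDER BY orientation,"

# Insert: the pattern matches a position, not any characters
$Inserted = $Clause -replace "(?<=\s)(?=orientation\s*,)", "System.Photo."

# Capture and rewrite: the field name is matched, then put back after the prefix
$Captured = $Clause -replace "(?<=\s)(orientation)(?=\s*,)", 'System.Photo.$1'

$Inserted
$Captured
```

The insert form keeps the replacement a plain string, which is handy when the prefix comes from a variable, because it needs no $1 and therefore no single-quoted replacement text.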
Very sharp-eyed readers may have noticed that I look for a comma after the fieldname in addition to <, >, =, and LIKE. I modified the code that appeared in Part One so that when an ORDER BY clause is inserted, it is followed by a trailing comma like this:
if ($orderby) { $sql += " ORDER BY " + ($OrderBy -join " , " ) + ","}
The new version will work with this regular expression, but the extra comma will cause a SQL error, so it must be removed later. When I introduced the SQL, I said the SELECT statement looks like this:
SELECT System.ItemName, System.ItemUrl, System.FileExtension, System.FileName, System.FileAttributes, System.FileOwner, System.ItemType, System.ItemTypeText , System.KindText, System.Kind, System.MIMEType, System.Size
Building this clause from the field lists simplifies code maintenance, and as a bonus, anything declared in the field lists will be retrieved by the query and accepted as input by its short name. The SELECT clause is prepared like this:
if ($First) {$SQL = "SELECT TOP $First "}
else {$SQL = "SELECT "}
foreach ($type in $FieldTypes) {
$SQL += ((get-variable "$($type)Fields").value -replace "\|",", " ) + ", "
}
This replaces the "|" with a comma and puts a comma after each set of fields. This means that there is a comma between the last field and FROM. This allows the regular expression to recognize field names, but it will break the SQL, so it is removed after the prefixes have been inserted (just like ORDER BY).
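The sketch below builds a SELECT clause from two short, made-up field lists, adds a sample WHERE and ORDER BY, and then applies the two clean-up replacements used in the finished function to strip the extra commas. Prefix insertion is left out to keep the focus on the commas.

```powershell
$FieldTypes   = "System", "Photo"
$SystemFields = "ItemName|ItemUrl"
$PhotoFields  = "cameramodel|orientation"
$First        = 5                          # sample values, made up for illustration
$OrderBy      = "ItemName", "orientation"

if ($First) { $SQL = "SELECT TOP $First " } else { $SQL = "SELECT " }
foreach ($type in $FieldTypes) {
    # Turn "a|b" into "a, b" and leave a comma after each list
    $SQL += ((Get-Variable "$($type)Fields").Value -replace "\|", ", ") + ", "
}
$SQL += " FROM SYSTEMINDEX WHERE Contains(*,'Stingray')"
$SQL += " ORDER BY " + ($OrderBy -join " , ") + ","
$WithCommas = $SQL                         # still has a comma before FROM and at the end

# Clean-up: remove the comma before FROM, then the one at the end of the statement
$SQL = $SQL -replace "\s*,\s*FROM\s+", " FROM "
$SQL = $SQL -replace "\s*,\s*$", ""
$SQL
```

Comparing $WithCommas with the final $SQL shows the only difference is the two removed commas and the extra space before FROM.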
This might seem inefficient, but when I checked the time it took to run the function and get the results (but not output them), it was typically about 0.05 seconds (50 ms) on my laptop. It takes more time to output the results.
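One way to take a timing like this is Measure-Command. The sketch below is only a stand-in: it times a single replacement, because the real call needs the Windows Search service. On a machine with the index available, the body of the script block would be the call to Get-IndexedItem with its result assigned to a variable, so that the results are fetched but not output.

```powershell
# Stand-in workload (an assumption for illustration); swap in the real function call
$Elapsed = Measure-Command {
    $Filter = "CameraManufacturer=Can*" -replace "\*", "%"
}

# Measure-Command returns a TimeSpan
"{0:N1} ms" -f $Elapsed.TotalMilliseconds
```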
Combining all the bits in this part with the bits in Part One turns my 36-line function into about a 60-line one as follows:
Function Get-IndexedItem{
Param ( [Alias("Where","Include")][String[]]$Filter ,
[Alias("Sort")][String[]]$OrderBy,
[Alias("Top")][String[]]$First,
[String]$Path,
[Switch]$Recurse )
$PropertyAliases = @{Width ="System.Image.HorizontalSize";
Height = "System.Image.VerticalSize"}
$FieldTypes = "System","Photo"
$PhotoPrefix = "System.Photo."
$PhotoFields = "cameramodel|cameramanufacturer|orientation"
$SystemPrefix = "System."
$SystemFields = "ItemName|ItemUrl|FileExtension|FileName"
if ($First) {$SQL = "SELECT TOP $First "}
else {$SQL = "SELECT "}
foreach ($type in $FieldTypes) {
$SQL += ((get-variable "$($type)Fields").value -replace "\|",", ")+", "
}
if ($Path -match "\\\\([^\\]+)\\.") {
$SQL += " FROM $($matches[1]).SYSTEMINDEX WHERE "
}
else {$SQL += " FROM SYSTEMINDEX WHERE "}
if ($Filter) {
$Filter = $Filter -replace "(?<!\(\s*)\*","%"
$Filter = $Filter -replace "\s*(=|<|>|like)\s*([^'\d][^\s']*)$",
' $1 ''$2'' '
$Filter = $Filter -replace "\s*=\s*(?='.+%'\s*$)" ," LIKE "
$Filter = ($Filter | ForEach-Object {
if ($_ -match "'|=|<|>|like|contains|freetext") {$_}
else {"Contains(*,'$_')"}
})
$SQL += $Filter -join " AND "
}
if ($Path) {
if ($Path -notmatch "\w{4}:") {$Path = "file:" + $Path}
$Path = $Path -replace "\\","/"
if ($SQL -notmatch "WHERE\s$") {$SQL += " AND " }
if ($Recurse) {$SQL += " SCOPE = '$Path' "}
else {$SQL += " DIRECTORY = '$Path' "}
}
if ($SQL -match "WHERE\s*$") {
Write-warning "You need to specify either a path , or a filter." ; return
}
if ($OrderBy) { $SQL += " ORDER BY " + ($OrderBy -join " , " ) + ","}
$PropertyAliases.Keys | ForEach-Object {
$SQL= $SQL -replace "(?<=\s)$($_)(?=\s*(=|>|<|,|Like))",
$PropertyAliases.$_
}
foreach ($type in $FieldTypes) {
$fields = (get-variable "$($type)Fields").value
$prefix = (get-variable "$($type)Prefix").value
$SQL = $SQL -replace "(?<=\s)(?=($Fields)\s*(=|>|<|,|Like))" , $Prefix
}
$SQL = $SQL -replace "\s*,\s*FROM\s+" , " FROM "
$SQL = $SQL -replace "\s*,\s*$" , ""
$Provider="Provider=Search.CollatorDSO;"+
"Extended Properties='Application=Windows';"
$Adapter = new-object system.data.oledb.oleDBDataadapter -argument $SQL,
$Provider
$DS = new-object system.data.dataset
if ($Adapter.Fill($DS)) { $DS.Tables[0] }
}
~James
Awesome job, James! I want to thank you for taking the time to share with us today. Guest Blogger Week will continue tomorrow when James returns with Part Three.
I invite you to follow me on Twitter and Facebook. If you have any questions, send email to me at scripter@microsoft.com, or post your questions on the Official Scripting Guys Forum. See you tomorrow. Until then, peace.
Ed Wilson, Microsoft Scripting Guy