Summary: Microsoft Scripting Guy, Ed Wilson, talks about using Windows PowerShell to clean up an impossible document.
Microsoft Scripting Guy, Ed Wilson, is here. Sometimes I just cringe. It is a reflex reaction born from many long (and at times, torturous) years in the IT field. I know, I should be able to get over it, and sometimes I actually think I am getting better. And then things like the following crop up in my email...
I recently received a panic email from an IT admin. He was tasked with creating over 1,200 new user accounts. Hey, no problem. Give me a properly formatted CSV file or Excel spreadsheet, and it is a piece of cake.
Dude, that was the problem. Some *&^%$#@!!! in HR typed all of the new names in a Word doc. Here is a screenshot of the Word document:
My initial reaction ran something along the lines of, “You have GOT to be kidding me!”
My secondary reaction went along the lines of, “OK. Reply to the email, attach an Excel spreadsheet, and tell them to copy the names into the proper columns. When they do the job right, I will be glad to help create the new user accounts.”
But then the email said that the person's boss said, “Just fix it.” So that ended that idea.
I closed my laptop and mumbled something along the lines of, “That’s ridiculous,” and I decided to go make a pot of tea. After having a great cup of Darjeeling tea, I was sufficiently calmed down and ready to try something.
First of all, I did not want to deal with automating Word, so I decided to copy the three columns of user names into a text file. I figured Windows PowerShell could easily clean up the text file, and I would be using it as an intermediate stage on the way to a CSV file.
I was pleasantly surprised that when I copied and pasted all of the user names into the text file, everything came out as a single column. Here is a screenshot of the text file:
The second issue I was worried about was that some of the names had Unicode characters in them. Luckily, Notepad prompted me about that, and I was able to save the file as Unicode. It is a simple control next to the Save button that permits me to choose the encoding type:
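For anyone who wants to confirm the encoding without trusting the Save dialog, here is a minimal sketch. Notepad's "Unicode" option means UTF-16 little-endian, which begins with the byte order mark FF FE. The temp file name and the sample name below are hypothetical stand-ins, not part of the original data.

```powershell
# Hypothetical demo file in the temp folder; the article's real file is C:\DataIn\Names_IN.txt.
$demoPath = Join-Path ([System.IO.Path]::GetTempPath()) 'Names_Encoding_Demo.txt'

# Write one sample name containing non-ASCII characters as UTF-16 LE ("Unicode").
"Mu$([char]0x00F1)oz, Jos$([char]0x00E9)" | Set-Content -Path $demoPath -Encoding Unicode

# The first two bytes of a UTF-16 LE file with a byte order mark are FF FE.
$bytes = [System.IO.File]::ReadAllBytes($demoPath)
$bom = '{0:X2} {1:X2}' -f $bytes[0], $bytes[1]
$bom

# Reading with the same encoding returns the original characters.
$line = Get-Content -Path $demoPath -Encoding Unicode
$line
```

If the first two bytes are anything else, the file was probably saved with a different encoding, and the accented characters may not survive the trip.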
The first thing I need to do is remove all of the single-letter labels (the A, B, C kind of things), which luckily appear on lines by themselves. To do this, I read the contents of the text file, store it in a variable, and then check the length of each line. For each line that passes the length check, I title case the letters, split on the comma, and trim any leading or trailing spaces from the pieces. I then create a custom object. Here is my code so far:
$rawtext = Get-Content C:\DataIn\Names_IN.txt
Foreach ($n in $rawtext)
{
    If($n.Length -gt 2)
    {
        $name = (Get-Culture).TextInfo.ToTitleCase($n).split(',').trim()
        [PSCustomObject]@{
            Lname = $name[0]
            Fname = $name[1]}
    }
}
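To see what the length check and the title-case step do to individual lines, here is a small sketch that runs on a few hypothetical sample lines instead of the real file. One thing to be aware of: ToTitleCase leaves words that are entirely uppercase alone (it treats them as acronyms), so this sketch lower-cases each line first. That extra ToLower() call is an addition for illustration, not part of the script above.

```powershell
# Hypothetical sample lines; the real input is the pasted text file.
$sample = 'A', 'adams, john q', 'BAKER, MARY', 'B'

$cleaned = foreach ($line in $sample) {
    # Single-letter section labels have a length of 1, so they are skipped.
    if ($line.Length -gt 2) {
        # Lower-case first (an addition), then title-case, split on the comma, and trim each piece.
        $parts = (Get-Culture).TextInfo.ToTitleCase($line.ToLower()).Split(',').Trim()
        [PSCustomObject]@{ Lname = $parts[0]; Fname = $parts[1] }
    }
}
$cleaned
```

None of this changes the script above; it only shows what a single line looks like after each step.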
When I run the script, I can see from the output that I have come a long way towards my goal. Here is the script and output:
But I now see that I have another problem. Some of the user names have a middle initial, and some do not. I need to fix that. It should be another simple if/else sort of thing. The tricky part (so to speak) is that I need to add my additional logic at the correct point in my code.
That point is where my custom object is emitted. Right now, it goes to the default output location, which is the Windows PowerShell ISE output pane. I want to send it down a pipeline, check to see if there is a space in the Fname property, and if there is, split that into two: an Fname property and an Mname property. If there is no middle initial, I will leave the Mname property as an empty string.
Here is my code now:
$rawtext = Get-Content C:\DataIn\Names_IN.txt
Foreach ($n in $rawtext)
{
    If($n.Length -gt 2)
    {
        $name = (Get-Culture).TextInfo.ToTitleCase($n).split(',').trim()
        [PSCustomObject]@{
            Lname = $name[0]
            Fname = $name[1]} |
        ForEach-Object {
            If($_.Fname -match ' ')
            {
                $fn = $_.Fname -split ' '
                [PSCustomObject]@{
                    Lname = $name[0]
                    Fname = $fn[0]
                    Mname = $fn[1]}
            }
            Else {
                [PSCustomObject]@{
                    Lname = $name[0]
                    Fname = $name[1]
                    Mname = ''}}
        }
    }
}
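The first-name and middle-initial split can also be looked at in isolation. The sketch below uses a hypothetical helper named Split-GivenName (it does not appear in the script above) and the -split operator with a limit of 2, so a name with no space comes back with an empty Mname and no if/else is needed. Note one difference in behavior: with a limit of 2, everything after the first space lands in Mname, whereas the script above keeps only the second word.

```powershell
# Hypothetical helper (not part of the original script): split a given name into first and middle parts.
function Split-GivenName {
    param([string]$GivenName)

    # A limit of 2 returns at most two pieces; a run of whitespace counts as one separator.
    $first, $middle = $GivenName.Trim() -split '\s+', 2

    [PSCustomObject]@{
        Fname = $first
        # $middle is $null when there was no space; casting $null to [string] gives an empty string.
        Mname = [string]$middle
    }
}

Split-GivenName 'John Q'
Split-GivenName 'Mary'
```

The script above reaches the same result for simple names with its if/else; this is only another way to think about the same step.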
Here is the script and the output as it stands at this point:
Now all I need to do is save my output as a CSV file so I will be able to import it into Active Directory. Luckily, there is an Export-CSV cmdlet that should be able to handle this. Because I emit my custom objects in two places, I need to pick up the objects in both places and append each one to my CSV file. Remember, I have Unicode characters in my text file, so I need to specify the encoding as Unicode. Here is my completed script:
$rawtext = Get-Content C:\DataIn\Names_IN.txt
Foreach ($n in $rawtext)
{
    If($n.Length -gt 2)
    {
        $name = (Get-Culture).TextInfo.ToTitleCase($n).split(',').trim()
        [PSCustomObject]@{
            Lname = $name[0]
            Fname = $name[1]} |
        ForEach-Object {
            If($_.Fname -match ' ')
            {
                $fn = $_.Fname -split ' '
                [PSCustomObject]@{
                    Lname = $name[0]
                    Fname = $fn[0]
                    Mname = $fn[1]} |
                Export-Csv -Path C:\DataOut\Names_Out.CSV -Encoding Unicode `
                    -NoTypeInformation -Append
            }
            Else {
                [PSCustomObject]@{
                    Lname = $name[0]
                    Fname = $name[1]
                    Mname = ''} |
                Export-Csv -Path C:\DataOut\Names_Out.CSV -Encoding Unicode `
                    -NoTypeInformation -Append }
        }
    }
}
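Because -Append adds to whatever is already in the file, running the script a second time would double the rows. A minimal sketch of one way to guard against that, and to check that the non-ASCII characters survive, is shown here. The temp file path and the two sample rows are hypothetical; the script above writes to C:\DataOut\Names_Out.CSV.

```powershell
# Hypothetical output path in the temp folder, used only for this demonstration.
$csvPath = Join-Path ([System.IO.Path]::GetTempPath()) 'Names_Out_Demo.csv'

# Remove any file left over from an earlier run so -Append starts from nothing.
if (Test-Path $csvPath) { Remove-Item $csvPath }

# Two sample rows, exported one at a time the way the script does it.
[PSCustomObject]@{ Lname = 'Adams'; Fname = 'John'; Mname = 'Q' } |
    Export-Csv -Path $csvPath -Encoding Unicode -NoTypeInformation -Append
[PSCustomObject]@{ Lname = "M$([char]0x00FC)ller"; Fname = 'Anna'; Mname = '' } |
    Export-Csv -Path $csvPath -Encoding Unicode -NoTypeInformation -Append

# Read the file back with the same encoding to confirm the rows and characters made it.
$rows = Import-Csv -Path $csvPath -Encoding Unicode
$rows
```

Import-Csv returns one object per row, so counting the rows is also a quick way to compare against the number of names in the original document.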
My CSV file is shown here:
OK...I will admit it. I spent more time whining about cleaning up this data than I spent writing the script. Windows PowerShell makes performing these sorts of data-grooming tasks easy. I knew this, but dude, I did not think it would be this easy. The script appears long, but it is repetitive. There may be an easier way to do this, but this did not take very long at all.
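As one possible shape for that easier way, here is a sketch (not a replacement for the script above) that sends every line through a single pipeline and calls Export-Csv once at the end, so -Append is not needed. The temp file names and sample lines are hypothetical, and the -split limit of 2 is an assumption about how middle names should be handled.

```powershell
# Hypothetical temp files standing in for C:\DataIn\Names_IN.txt and C:\DataOut\Names_Out.CSV.
$inPath  = Join-Path ([System.IO.Path]::GetTempPath()) 'Names_In_Demo.txt'
$outPath = Join-Path ([System.IO.Path]::GetTempPath()) 'Names_Out_Single.csv'
'A', 'adams, john q', 'B', 'baker, mary' | Set-Content -Path $inPath -Encoding Unicode

Get-Content -Path $inPath -Encoding Unicode |
    Where-Object { $_.Length -gt 2 } |        # drop the single-letter labels
    ForEach-Object {
        $name = (Get-Culture).TextInfo.ToTitleCase($_).Split(',').Trim()
        # At most two pieces; $middle is $null when the given name has no space.
        $first, $middle = $name[1] -split '\s+', 2
        [PSCustomObject]@{
            Lname = $name[0]
            Fname = $first
            Mname = [string]$middle           # $null becomes an empty string
        }
    } |
    Export-Csv -Path $outPath -Encoding Unicode -NoTypeInformation

# Read the result back to check it.
$result = Import-Csv -Path $outPath -Encoding Unicode
$result
```

With a single Export-Csv at the end of the pipeline, the file is overwritten on each run, so rerunning the script does not duplicate rows.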
Join me tomorrow when I will create a simple script to use my newly cleaned CSV file to create over 1,200 users in Active Directory. I KNOW this task will be easy.
I invite you to follow me on Twitter and Facebook. If you have any questions, send email to me at scripter@microsoft.com, or post your questions on the Official Scripting Guys Forum. See you tomorrow. Until then, peace.
Ed Wilson, Microsoft Scripting Guy