Summary: Guest blogger, Matt Tisdale, talks about using Windows PowerShell to remove data from a .csv file.
Microsoft Scripting Guy, Ed Wilson, is here. Welcome back guest blogger, Matt Tisdale…
Last night a geoscientist told me that he has almost 900 .csv files, and he needs to remove two columns of data from them. He has used Windows PowerShell for some other data management tasks and he wanted to know if Windows PowerShell could help with this task. Of course it can!
Here is the command I provided to him:
Get-ChildItem c:\temp *.csv | foreach { $filename=$_.fullname; Import-Csv $filename | select * -ExcludeProperty column1,column2 | Export-Csv $filename.Replace(".csv",".new.csv") -NoTypeInformation }
Here is a breakdown of each component of the command…
Get-ChildItem c:\temp *.csv
This finds all .csv files directly under c:\temp. If we wanted to find all .csv files anywhere in the subfolder structure below c:\temp, the command would be Get-ChildItem c:\temp *.csv -Recurse.
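To see the difference between the two forms, here is a small sketch that builds a throwaway folder under the temp directory and counts what each command finds. The folder and file names are made up for this example; they are not part of the original task.

```powershell
# Build a throwaway folder so the example does not touch real data (names are hypothetical).
$root = Join-Path ([System.IO.Path]::GetTempPath()) ("csvdemo_" + [guid]::NewGuid().ToString("N"))
$sub  = Join-Path $root "sub"
New-Item -ItemType Directory -Path $sub -Force | Out-Null
Set-Content -Path (Join-Path $root "a.csv")     -Value "column1,column2"
Set-Content -Path (Join-Path $root "notes.txt") -Value "not a csv"
Set-Content -Path (Join-Path $sub  "b.csv")     -Value "column1,column2"

# Top level only. The second positional argument binds to -Filter.
$topLevel = @(Get-ChildItem $root *.csv)

# The whole subfolder tree.
$allLevels = @(Get-ChildItem $root *.csv -Recurse)

$topLevel.Count    # 1 (a.csv)
$allLevels.Count   # 2 (a.csv and sub\b.csv)

# Clean up the demo folder.
Remove-Item $root -Recurse -Force
```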
foreach { .... }
We are using a foreach loop (foreach here is the alias for the ForEach-Object cmdlet) to run a series of commands for each file that is enumerated by Get-ChildItem. In most cases with Windows PowerShell, we can simply chain commands with the pipeline ( | ) character. But in this case, we are doing something special, which requires a foreach loop. I’ll discuss more details about this later.
$filename=$_.fullname; Import-Csv $filename | select * -ExcludeProperty column1,column2 | Export-Csv $filename.Replace(".csv",".new.csv") -NoTypeInformation
Inside the foreach loop, we are performing three specific tasks:
- Importing data from the csv file
- Selecting specific pieces of the data
- Exporting the selected data to a new .csv file
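The three tasks above can also be written out one step per line, which may be easier to read than the one-liner. This is only a sketch against a made-up file in a temp folder, not the command that was sent to the geoscientist. It also collects the file list into an array before the loop starts, which is an assumption on my part to make sure the newly written .new.csv files cannot be picked up by the same enumeration.

```powershell
# Hypothetical demo folder and sample file; the real data lived in c:\temp.
$folder = Join-Path ([System.IO.Path]::GetTempPath()) ("csvsteps_" + [guid]::NewGuid().ToString("N"))
New-Item -ItemType Directory -Path $folder | Out-Null
Set-Content -Path (Join-Path $folder "data1.csv") -Value "column1,column2,column3", "a,b,c"

# Collect the file list first, then loop over it.
$files = @(Get-ChildItem $folder *.csv)

$files | ForEach-Object {
    $filename = $_.FullName

    # 1. Import data from the .csv file.
    $rows = Import-Csv $filename

    # 2. Select everything except the two unwanted columns.
    $kept = $rows | Select-Object * -ExcludeProperty column1, column2

    # 3. Export the selected data to a new .csv file next to the original.
    $kept | Export-Csv $filename.Replace(".csv", ".new.csv") -NoTypeInformation
}

# Read the result back to check it.
$result = @(Get-Content (Join-Path $folder "data1.new.csv"))
$result[0]   # "column3"
$result[1]   # "c"

Remove-Item $folder -Recurse -Force
```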
Now let's look at why we are using a foreach loop here instead of just the pipeline. The pipeline is used to send output from one command to the next command. Here, we take the output from Get-ChildItem and send it to Import-csv. Then we take the output from Import-csv and send it to Select. Finally, we take the output from Select and send it to Export-csv.
All of this is pretty simple, but we need output from Get-ChildItem to make it all the way down the line to Export-csv. We need this because we want to maintain the original file name we are working with. If we only used the pipeline, the only input Export-csv would receive is what Select sent it.
Because Select does not know the file name, we are in trouble. The answer I chose here is to write the file name into a variable and maintain that variable inside the foreach loop. The next few sections explain more about this.
$filename=$_.fullname; Import-csv $filename
The very first thing we do inside the foreach loop is save the file's full path (the FullName property) into a variable named $filename. By doing this, we can use this variable anywhere further down the pipeline, as long as it is inside the foreach loop. As you can see when looking at the entire command, we reference this variable with Import-csv and Export-csv at different levels in the pipeline process.
Notice the semicolon after the assignment? The semicolon separates two statements on one line, so we can run more than one command before passing output through the pipeline to the next command. In this section we are first setting our variable and then running Import-csv, and it is the output of Import-csv that continues down the pipeline.
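Here is a minimal sketch of why the variable is needed. Inside the outer script block, $_ is the file. One pipeline stage later, $_ is a row from the file and no longer has a FullName property, but $filename is still available. The sample file and the property names RowKnowsFile and SavedName are invented for this illustration.

```powershell
# Hypothetical sample file in the temp folder.
$path = Join-Path ([System.IO.Path]::GetTempPath()) ("scope_" + [guid]::NewGuid().ToString("N") + ".csv")
Set-Content -Path $path -Value "column1,column2,column3", "a,b,c"

$report = Get-Item $path | ForEach-Object {
    # Here $_ is the file object, so we save its full path.
    $filename = $_.FullName

    Import-Csv $filename | ForEach-Object {
        # Here $_ is a CSV row. It has column properties but no FullName.
        [pscustomobject]@{
            RowKnowsFile = [bool]$_.PSObject.Properties['FullName']
            SavedName    = Split-Path $filename -Leaf
        }
    }
}

$report.RowKnowsFile   # False
$report.SavedName      # the file name we saved before Import-Csv ran

Remove-Item $path
```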
select * -ExcludeProperty column1,column2
Here we are using Select-Object (select) to select only specific pieces of data (columns in this case) from the data sent to us from Import-csv. In the test data I created while designing this command, I had a number of .csv files with five columns each. I named the columns column1, column2, column3, and so on. Under each column I added various text strings of data.
With Select-Object, we can include specific columns or exclude specific columns. I have no idea how many columns the production data may actually have, but I know the need is to only exclude two of them (by name), so I decided to go with the exclusion method because it will be less effort.
There are a few different ways you can include and exclude data with Select-Object, but I will focus on the specific parameters that I elected to use for today. Feel free to use Get-Help to research the other parameters that are available with Select-Object (for example, -First and -Last).
The asterisk ( * ) initially selects all of the data that is passed from the previous command. The -ExcludeProperty parameter is then used to exclude two columns by name. The result of this command against my test data is that only the data in columns 3, 4, and 5 was sent to the next command in the pipeline.
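The same selection can be tried without any files by building a few rows in memory. This sketch uses ConvertFrom-Csv on sample text shaped like my test data (five columns named column1 through column5); the values themselves are made up.

```powershell
# Sample rows shaped like the test data; the values are placeholders.
$rows = "column1,column2,column3,column4,column5",
        "a1,b1,c1,d1,e1",
        "a2,b2,c2,d2,e2" | ConvertFrom-Csv

# Start from every column, then drop two of them by name.
$trimmed = $rows | Select-Object * -ExcludeProperty column1, column2

# List the columns that survived on the first row.
$remaining = $trimmed[0].PSObject.Properties.Name -join ','
$remaining   # column3,column4,column5
```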
Export-Csv $filename.Replace(".csv",".new.csv") -NoTypeInformation
This command takes the data sent to us from Select-Object and saves it out to a new .csv file. The data sent through the pipeline from Select-Object does not have any details about the original file name, but because we saved this into a variable earlier, we can now access it and determine the original file name where this data came from. Two thumbs up for Windows PowerShell variables!
We want to save our new file by using the original file name with ".new" added, so we simply use the Replace method to find .csv in the file name and replace it with .new.csv. If my original file name is c:\temp\data1.csv, the new file name will be c:\temp\data1.new.csv. The -NoTypeInformation parameter is used to keep Export-csv from writing a #TYPE line that we do not need at the top of our .csv file.
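The file name handling is easy to try on its own. One thing worth knowing is that the Replace method is case-sensitive, so a file with an uppercase .CSV extension would keep its name unchanged, and the export would then point at the original file. The sketch below shows that, along with one possible alternative using [System.IO.Path]::ChangeExtension. The alternative and the DATA2.CSV path are my own illustration, not part of the command above.

```powershell
$lower = 'c:\temp\data1.csv'
$upper = 'c:\temp\DATA2.CSV'   # hypothetical file with an uppercase extension

$newLower = $lower.Replace('.csv', '.new.csv')
$newUpper = $upper.Replace('.csv', '.new.csv')
$newLower   # c:\temp\data1.new.csv
$newUpper   # c:\temp\DATA2.CSV (unchanged, because Replace is case-sensitive)

# A possible alternative that does not depend on the case of the extension.
$alt = [System.IO.Path]::ChangeExtension($upper, '.new.csv')
$alt        # c:\temp\DATA2.new.csv

# With -NoTypeInformation, the first line of the output is the column header.
$lines = @([pscustomobject]@{ column3 = 'c1' } | ConvertTo-Csv -NoTypeInformation)
$lines[0]   # "column3"
```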
There you have it. Ultimately this is a pretty simple business need, but it would take someone hours of time to manually remove two columns of data from almost 900 files. It only took me about 10 minutes to design and test this command in Windows PowerShell. Now this individual can spend more of his time working to find petroleum and less time performing tedious data management tasks.
I really enjoy seeing Windows PowerShell provide direct benefit to our business and company bottom line! Please keep your questions flowing in. I look forward to helping IT staff and business employees save more time and money in the future.
Happy scripting!
~Matt
Way cool. Thanks Matt. This is a very helpful technique. I invite you to follow me on Twitter and Facebook. If you have any questions, send email to me at scripter@microsoft.com, or post your questions on the Official Scripting Guys Forum. See you tomorrow. Until then, peace.
Ed Wilson, Microsoft Scripting Guy