In this article, I am going to share a easy and quick way to delete some duplicate rows (e.g. regarding column B) from a spreadsheet but the requirement is to keep the row with the maximum value (e.g. regarding column G).
If you convert the spreadsheet into *.csv (comma separated values), which are just plain text basically, then you can use many other programming languages easily that read text files and parse them into columns of values.
If the given files are in the format of *.XLS or *.XLSX, then it may not be so easy to read/understand these formats. Now my recommendation would be to use the VBScript and make use of the Excel.Application COM object.
To set up the object and read the spread sheet.
1 2 3 4 | Dim objXL Set objXL = WScript.CreateObject("Excel.Application") objXL.Visible = TRUE objXL.Workbooks.Open "data.xlsx" |
Dim objXL Set objXL = WScript.CreateObject("Excel.Application") objXL.Visible = TRUE objXL.Workbooks.Open "data.xlsx"
To check for duplicate rows (regarding column B) while considering the maximum value for column G. Mark rows to delete with ‘X’ (column A).
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 | row = 2 ' skip first line which is column names Do While True curtime = Trim(objXL.Range("B" & CStr(row))) If (Len(curtime) = 0) Then Exit Do End If curg = CDbl(objXL.Range("G" & CStr(row))) j = row + 1 Do While True nexttime = Trim(objXL.Range("B" & CStr(j))) If (Len(nexttime) = 0) Then Exit Do End If If (nexttime = curtime) Then nextg = CDbl(objXL.Range("G" & CStr(j))) If (nextg > curg) Then ' mark for deletion later curg = nextg objXL.Cells(row, 1) = "x" Else objXL.Cells(j, 1) = "x" End If Else Exit Do End If j = j + 1 Loop row = j Loop |
row = 2 ' skip first line which is column names Do While True curtime = Trim(objXL.Range("B" & CStr(row))) If (Len(curtime) = 0) Then Exit Do End If curg = CDbl(objXL.Range("G" & CStr(row))) j = row + 1 Do While True nexttime = Trim(objXL.Range("B" & CStr(j))) If (Len(nexttime) = 0) Then Exit Do End If If (nexttime = curtime) Then nextg = CDbl(objXL.Range("G" & CStr(j))) If (nextg > curg) Then ' mark for deletion later curg = nextg objXL.Cells(row, 1) = "x" Else objXL.Cells(j, 1) = "x" End If Else Exit Do End If j = j + 1 Loop row = j Loop
To delete these rows (the ones marked with ‘X’), it should be done in backward order.
1 2 3 4 5 | For i = row To 2 Step -1 If objXL.Cells(i, 1) = "x" Then objXL.Rows(i).Delete End If Next |
For i = row To 2 Step -1 If objXL.Cells(i, 1) = "x" Then objXL.Rows(i).Delete End If Next
Please note that if you run the Excel.Application in background (objXL.Visible = False), then you have to put objXL.Quit otherwise when the script quits, the Excel Process will still live in memory (background process) and you can’t control it, which is a memory leak unless you terminate it in the process monitor.
The full script source code with better user experiences (support command line parameter) is:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 | ' helloacm.com ' check command line parameter If WScript.Arguments.Count = 0 Then WScript.Echo "File not given." WScript.Quit End If CSV = Trim(WScript.Arguments.Item(0)) WScript.Echo "Processing " & CSV ' check if file is valid Set fso = CreateObject("Scripting.FileSystemObject") If (Not fso.FileExists(CSV)) Then WScript.Echo "File not found: " & CSV WScript.Quit End If Dim objXL Set objXL = WScript.CreateObject("Excel.Application") ' in background objXL.Visible = false objXL.Workbooks.Open CSV row = 2 Do While True curtime = Trim(objXL.Range("B" & CStr(row))) If (Len(curtime) = 0) Then Exit Do End If curg = CDbl(objXL.Range("G" & CStr(row))) j = row + 1 Do While True nexttime = Trim(objXL.Range("B" & CStr(j))) If (Len(nexttime) = 0) Then Exit Do End If If (nexttime = curtime) Then ' considered duplicate nextg = CDbl(objXL.Range("G" & CStr(j))) If (nextg > curg) Then ' keep the maximum curg = nextg objXL.Cells(row, 1) = "x" Else objXL.Cells(j, 1) = "x" End If Else Exit Do End If j = j + 1 Loop row = j ' jump to next row Loop For i = row To 2 Step -1 If objXL.Cells(i, 1) = "x" Then objXL.Rows(i).Delete End If Next objXL.DisplayAlerts = False ' Default Option objXL.ActiveWorkbook.SaveAs CSV & "_new.csv" WScript.Echo "New file saved to: " & CSV & "_new.csv" ' avoid memory leaks objXL.Quit Set fso = Nothing Set objXL = Nothing |
' helloacm.com ' check command line parameter If WScript.Arguments.Count = 0 Then WScript.Echo "File not given." WScript.Quit End If CSV = Trim(WScript.Arguments.Item(0)) WScript.Echo "Processing " & CSV ' check if file is valid Set fso = CreateObject("Scripting.FileSystemObject") If (Not fso.FileExists(CSV)) Then WScript.Echo "File not found: " & CSV WScript.Quit End If Dim objXL Set objXL = WScript.CreateObject("Excel.Application") ' in background objXL.Visible = false objXL.Workbooks.Open CSV row = 2 Do While True curtime = Trim(objXL.Range("B" & CStr(row))) If (Len(curtime) = 0) Then Exit Do End If curg = CDbl(objXL.Range("G" & CStr(row))) j = row + 1 Do While True nexttime = Trim(objXL.Range("B" & CStr(j))) If (Len(nexttime) = 0) Then Exit Do End If If (nexttime = curtime) Then ' considered duplicate nextg = CDbl(objXL.Range("G" & CStr(j))) If (nextg > curg) Then ' keep the maximum curg = nextg objXL.Cells(row, 1) = "x" Else objXL.Cells(j, 1) = "x" End If Else Exit Do End If j = j + 1 Loop row = j ' jump to next row Loop For i = row To 2 Step -1 If objXL.Cells(i, 1) = "x" Then objXL.Rows(i).Delete End If Next objXL.DisplayAlerts = False ' Default Option objXL.ActiveWorkbook.SaveAs CSV & "_new.csv" WScript.Echo "New file saved to: " & CSV & "_new.csv" ' avoid memory leaks objXL.Quit Set fso = Nothing Set objXL = Nothing
The example output:
D:\>cscript csv.vbs D:\data.csv Microsoft (R) Windows Script Host Version 5.8 Copyright (C) Microsoft Corporation. All rights reserved. Processing D:\data.csv New file saved to: D:\data.csv_new.csv D:\>
–EOF (The Ultimate Computing & Technology Blog) —
loading...
Last Post: C/C++ Function to Compute the Combination Number
Next Post: Interview Question: The Number of Bits Required to Convert One Integer to Another (C/C++ function)