How to Use VBScript to Delete Duplicate Excel Rows with Another Less Column Value


In this article, I am going to share a easy and quick way to delete some duplicate rows (e.g. regarding column B) from a spreadsheet but the requirement is to keep the row with the maximum value (e.g. regarding column G).

If you convert the spreadsheet into *.csv (comma separated values), which are just plain text basically, then you can use many other programming languages easily that read text files and parse them into columns of values.

If the given files are in the format of *.XLS or *.XLSX, then it may not be so easy to read/understand these formats. Now my recommendation would be to use the VBScript and make use of the Excel.Application COM object.

To set up the object and read the spread sheet.

1
2
3
4
Dim objXL
Set objXL = WScript.CreateObject("Excel.Application")
objXL.Visible = TRUE
objXL.Workbooks.Open "data.xlsx"
Dim objXL
Set objXL = WScript.CreateObject("Excel.Application")
objXL.Visible = TRUE
objXL.Workbooks.Open "data.xlsx"

To check for duplicate rows (regarding column B) while considering the maximum value for column G. Mark rows to delete with ‘X’ (column A).

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
row = 2 ' skip first line which is column names
Do While True
    curtime = Trim(objXL.Range("B" & CStr(row)))
    If (Len(curtime) = 0) Then
        Exit Do
    End If
    curg = CDbl(objXL.Range("G" & CStr(row)))
    j = row + 1
    Do While True
        nexttime = Trim(objXL.Range("B" & CStr(j)))
        If (Len(nexttime) = 0) Then
            Exit Do
        End If
        If (nexttime = curtime) Then
            nextg = CDbl(objXL.Range("G" & CStr(j)))
            If (nextg > curg) Then ' mark for deletion later
                curg = nextg
                objXL.Cells(row, 1) = "x"
            Else
                objXL.Cells(j, 1) = "x"
            End If
        Else
            Exit Do
        End If
        j = j + 1
    Loop
    row = j
Loop 
row = 2 ' skip first line which is column names
Do While True
	curtime = Trim(objXL.Range("B" & CStr(row)))
	If (Len(curtime) = 0) Then
		Exit Do
	End If
	curg = CDbl(objXL.Range("G" & CStr(row)))
	j = row + 1
	Do While True
		nexttime = Trim(objXL.Range("B" & CStr(j)))
		If (Len(nexttime) = 0) Then
			Exit Do
		End If
		If (nexttime = curtime) Then
			nextg = CDbl(objXL.Range("G" & CStr(j)))
			If (nextg > curg) Then ' mark for deletion later
				curg = nextg
				objXL.Cells(row, 1) = "x"
			Else
				objXL.Cells(j, 1) = "x"
			End If
		Else
			Exit Do
		End If
		j = j + 1
	Loop
	row = j
Loop 

To delete these rows (the ones marked with ‘X’), it should be done in backward order.

1
2
3
4
5
For i = row To 2 Step -1
    If objXL.Cells(i, 1) = "x" Then
        objXL.Rows(i).Delete
    End If
Next
For i = row To 2 Step -1
	If objXL.Cells(i, 1) = "x" Then
		objXL.Rows(i).Delete
	End If
Next

Please note that if you run the Excel.Application in background (objXL.Visible = False), then you have to put objXL.Quit otherwise when the script quits, the Excel Process will still live in memory (background process) and you can’t control it, which is a memory leak unless you terminate it in the process monitor.

The full script source code with better user experiences (support command line parameter) is:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
' helloacm.com
 
' check command line parameter
If WScript.Arguments.Count = 0 Then
    WScript.Echo "File not given."
    WScript.Quit
End If
 
CSV = Trim(WScript.Arguments.Item(0))
WScript.Echo "Processing " & CSV
' check if file is valid
Set fso = CreateObject("Scripting.FileSystemObject")
If (Not fso.FileExists(CSV)) Then
    WScript.Echo "File not found: " & CSV
    WScript.Quit
End If
 
Dim objXL
Set objXL = WScript.CreateObject("Excel.Application")
 
' in background
objXL.Visible = false
objXL.Workbooks.Open CSV
row = 2
Do While True
    curtime = Trim(objXL.Range("B" & CStr(row)))
    If (Len(curtime) = 0) Then
        Exit Do
    End If
    curg = CDbl(objXL.Range("G" & CStr(row)))
    j = row + 1
    Do While True
        nexttime = Trim(objXL.Range("B" & CStr(j)))
        If (Len(nexttime) = 0) Then
            Exit Do
        End If
        If (nexttime = curtime) Then ' considered duplicate
            nextg = CDbl(objXL.Range("G" & CStr(j)))
            If (nextg > curg) Then ' keep the maximum
                curg = nextg
                objXL.Cells(row, 1) = "x"
            Else
                objXL.Cells(j, 1) = "x"
            End If
        Else
            Exit Do
        End If
        j = j + 1
    Loop
    row = j ' jump to next row
Loop 
 
For i = row To 2 Step -1
    If objXL.Cells(i, 1) = "x" Then
        objXL.Rows(i).Delete
    End If
Next
 
objXL.DisplayAlerts = False ' Default Option
objXL.ActiveWorkbook.SaveAs CSV & "_new.csv"
WScript.Echo "New file saved to: " & CSV & "_new.csv"
 
' avoid memory leaks
objXL.Quit
Set fso = Nothing
Set objXL = Nothing
' helloacm.com

' check command line parameter
If WScript.Arguments.Count = 0 Then
	WScript.Echo "File not given."
	WScript.Quit
End If

CSV = Trim(WScript.Arguments.Item(0))
WScript.Echo "Processing " & CSV
' check if file is valid
Set fso = CreateObject("Scripting.FileSystemObject")
If (Not fso.FileExists(CSV)) Then
	WScript.Echo "File not found: " & CSV
	WScript.Quit
End If

Dim objXL
Set objXL = WScript.CreateObject("Excel.Application")

' in background
objXL.Visible = false
objXL.Workbooks.Open CSV
row = 2
Do While True
    curtime = Trim(objXL.Range("B" & CStr(row)))
    If (Len(curtime) = 0) Then
        Exit Do
    End If
    curg = CDbl(objXL.Range("G" & CStr(row)))
    j = row + 1
    Do While True
        nexttime = Trim(objXL.Range("B" & CStr(j)))
        If (Len(nexttime) = 0) Then
            Exit Do
        End If
        If (nexttime = curtime) Then ' considered duplicate
            nextg = CDbl(objXL.Range("G" & CStr(j)))
            If (nextg > curg) Then ' keep the maximum
                curg = nextg
                objXL.Cells(row, 1) = "x"
            Else
                objXL.Cells(j, 1) = "x"
            End If
        Else
            Exit Do
        End If
        j = j + 1
    Loop
    row = j ' jump to next row
Loop 

For i = row To 2 Step -1
    If objXL.Cells(i, 1) = "x" Then
        objXL.Rows(i).Delete
    End If
Next

objXL.DisplayAlerts = False ' Default Option
objXL.ActiveWorkbook.SaveAs CSV & "_new.csv"
WScript.Echo "New file saved to: " & CSV & "_new.csv"

' avoid memory leaks
objXL.Quit
Set fso = Nothing
Set objXL = Nothing

The example output:

D:\>cscript  csv.vbs D:\data.csv
Microsoft (R) Windows Script Host Version 5.8
Copyright (C) Microsoft Corporation. All rights reserved.

Processing D:\data.csv
New file saved to: D:\data.csv_new.csv

D:\>

–EOF (The Ultimate Computing & Technology Blog) —

GD Star Rating
loading...
721 words
Last Post: C/C++ Function to Compute the Combination Number
Next Post: Interview Question: The Number of Bits Required to Convert One Integer to Another (C/C++ function)

The Permanent URL is: How to Use VBScript to Delete Duplicate Excel Rows with Another Less Column Value

Leave a Reply