For my Data Analysts, in this video I will demonstrate how to perform a column comparison between two address fields so you don’t have to manually review every row. We’ll use a VBA function from Stack Overflow to provide the comparison results.
I should point out that Excel is NOT the preferred method for address matching, but sometimes it is your only option due to lack of time or better tools. Ideally, you should use address correction software that “fixes spelling errors, corrects abbreviations, and standardizes capitalization so each address in your list complies with the USPS official format” – (per the USPS). Once your addresses are standardized, THEN you should perform a comparison, but this rarely happens.
What typically happens is that some poor analyst like you is conscripted into performing address matching manually using some combination of SQL Server and manual Excel processes. That’s why a Google search led you to this page!
Informally, the Levenshtein distance between two words is the minimum number of single-character edits (insertions, deletions or substitutions) required to change one word into the other.
- The string “HAT” as compared to “hat” would have a Levenshtein Distance of 3
- Since the function is case sensitive all three characters are different
- The string “HAT” as compared to “BAT” would have a Levenshtein Distance of 1
- To turn the first string into the second string it would take 1 substitution of characters (H changed to B or vice-versa)
- The lower the number, the more the strings are similar
- The higher the number, the more the strings are dissimilar.
Activate the Developer Tab in Excel
The Developer tab is the place to go when you want to do or use the following:
- Write macros
- Run macros that you previously recorded
- Create VBA Modules and User Defined Functions <– This is our sweet spot
- On the File tab, go to Options > Customize Ribbon
- Under Customize the Ribbon and under Main Tabs, select the Developer check box
Create a Module in Excel
- On the Developer tab select Visual Basic
- In the VBA interface select Insert > Module
Insert Levenshtein Distance Function VBA Code
- Go to this link at Stack Overflow to view the code as originally referenced
- Or, simply copy the code below as developed by user “smirkingman” which is the first answer.
- Big shoutout to “smirkingman” for this great resource!
Option Explicit Public Function Levenshtein(s1 As String, s2 As String) Dim i As Integer Dim j As Integer Dim l1 As Integer Dim l2 As Integer Dim d() As Integer Dim min1 As Integer Dim min2 As Integer l1 = Len(s1) l2 = Len(s2) ReDim d(l1, l2) For i = 0 To l1 d(i, 0) = i Next For j = 0 To l2 d(0, j) = j Next For i = 1 To l1 For j = 1 To l2 If Mid(s1, i, 1) = Mid(s2, j, 1) Then d(i, j) = d(i - 1, j - 1) Else min1 = d(i - 1, j) + 1 min2 = d(i, j - 1) + 1 If min2 < min1 Then min1 = min2 End If min2 = d(i - 1, j - 1) + 1 If min2 < min1 Then min1 = min2 End If d(i, j) = min1 End If Next Next Levenshtein = d(l1, l2) End Function
- Paste the code into your newly created Excel module
- Debug > Compile VBAProject
You should not experience any errors after compiling the code.
Watch the Video to Use the Function
Using this function in a judicious manner can help you cut down on the mental energy required to manually review the address columns on each row. It is much better to mentally focus on 25% of the rows than 100%. The fewer rows you have to manually review in Excel, the less the chance of you making an error.
Please like and subscribe on the Anthony B. Smoak YouTube channel.
All views and opinions are solely my own and do not necessarily reflect those of my employer
Do Great Things with Your Data!
★☆★ Support this Channel: ★☆★
★☆★ FOLLOW ME BELOW: ★☆★
Twitter ► https://twitter.com/AnthonySmoak
Facebook ► https://www.facebook.com/AnthonyBSmoak/
Tableau Public ►Search for “Anthony B. Smoak”