Address Matching in Excel Using Levenshtein Distance

For my Data Analysts, in this video I will demonstrate how to perform a column comparison between two address fields so you don’t have to manually review every row. We’ll use a VBA function from Stack Overflow to provide the comparison results.

I should point out that Excel is NOT the preferred method for address matching, but sometimes it is your only option due to lack of time or better tools. Ideally, you should use address correction software that “fixes spelling errors, corrects abbreviations, and standardizes capitalization so each address in your list complies with the USPS official format” (per the USPS). Once your addresses are standardized, THEN you should perform a comparison, but in practice this rarely happens.

What typically happens is that some poor analyst like you is conscripted into performing address matching manually using some combination of SQL Server and manual Excel processes. That’s why a Google search led you to this page!

If you ever have to perform address matching in Excel, this could be you!

Informally, the Levenshtein distance between two words is the minimum number of single-character edits (insertions, deletions or substitutions) required to change one word into the other.

For example:

  • The string “HAT” compared to “hat” has a Levenshtein distance of 3
    • The function is case sensitive, so all three characters differ
  • The string “HAT” compared to “BAT” has a Levenshtein distance of 1
    • Turning the first string into the second takes 1 character substitution (H changed to B, or vice versa)
  • The lower the number, the more similar the strings
  • The higher the number, the more dissimilar the strings

Activate the Developer Tab in Excel

The Developer tab is the place to go when you want to do or use the following:

  • Write macros
  • Run macros that you previously recorded
  • Create VBA Modules and User Defined Functions <-- This is our sweet spot

To activate the Developer tab:

  1. On the File tab, go to Options > Customize Ribbon
  2. Under Customize the Ribbon and under Main Tabs, select the Developer check box

Create a Module in Excel

  1. On the Developer tab select Visual Basic
  2. In the VBA interface select Insert > Module

Insert Levenshtein Distance Function VBA Code

  1. Go to this link at Stack Overflow to view the code as originally referenced
  2. Or, simply copy the code below, developed by user “smirkingman” (it is the first answer).
    • Big shoutout to “smirkingman” for this great resource!
Option Explicit
Public Function Levenshtein(s1 As String, s2 As String)

Dim i As Integer
Dim j As Integer
Dim l1 As Integer
Dim l2 As Integer
Dim d() As Integer
Dim min1 As Integer
Dim min2 As Integer

l1 = Len(s1)
l2 = Len(s2)
ReDim d(l1, l2)
For i = 0 To l1
    d(i, 0) = i
Next
For j = 0 To l2
    d(0, j) = j
Next
For i = 1 To l1
    For j = 1 To l2
        If Mid(s1, i, 1) = Mid(s2, j, 1) Then
            d(i, j) = d(i - 1, j - 1)
        Else
            min1 = d(i - 1, j) + 1
            min2 = d(i, j - 1) + 1
            If min2 < min1 Then
                min1 = min2
            End If
            min2 = d(i - 1, j - 1) + 1
            If min2 < min1 Then
                min1 = min2
            End If
            d(i, j) = min1
        End If
    Next
Next
Levenshtein = d(l1, l2)
End Function
  3. Paste the code into your newly created Excel module
  4. Select Debug > Compile VBAProject

You should not experience any errors after compiling the code.
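
Once the module compiles, you can call the function from a worksheet cell like any built-in function, for example =Levenshtein(A2, B2) if your two addresses sit in A2 and B2 (that layout is just my assumption). Because the comparison is case sensitive, "HAT" vs "hat" scores 3. If you would rather ignore case and stray spaces, here is a minimal sketch of a wrapper. The name LevenshteinNoCase is hypothetical and not part of the Stack Overflow answer; it assumes the Levenshtein function above lives in the same module.

```vba
' Hypothetical helper (not from the Stack Overflow answer).
' Assumes the Levenshtein function above is in the same standard module.
' Upper-cases and trims both inputs so that differences in case or
' leading/trailing spaces do not count as edits.
Public Function LevenshteinNoCase(s1 As String, s2 As String) As Long
    LevenshteinNoCase = Levenshtein(UCase$(Trim$(s1)), UCase$(Trim$(s2)))
End Function
```

In a cell you would then use =LevenshteinNoCase(A2, B2). Whether ignoring case is appropriate depends on your data, so treat this as a starting point.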

Watch the Video to Use the Function

Using this function in a judicious manner can help you cut down on the mental energy required to manually review the address columns on each row. It is much better to mentally focus on 25% of the rows than 100%. The fewer rows you have to manually review in Excel, the lower your chance of making an error.
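
To make the "focus on fewer rows" idea concrete, here is a rough sketch of a macro that scores every row and marks only the suspicious ones. Everything specific in it is my assumption rather than something from the video: addresses in columns A and B starting at row 2, results written to columns C and D, the macro name FlagAddressesForReview, and a cutoff of 5 edits. It relies on the Levenshtein function above being in the same workbook.

```vba
' Hypothetical macro; layout and threshold are assumptions, adjust to your sheet.
' Column A / B: the two addresses. Column C: distance. Column D: review flag.
Public Sub FlagAddressesForReview()
    Const REVIEW_THRESHOLD As Long = 5   ' assumed cutoff; tune for your data

    Dim ws As Worksheet
    Dim lastRow As Long
    Dim r As Long
    Dim dist As Long

    Set ws = ActiveSheet
    ' Last used row in column A
    lastRow = ws.Cells(ws.Rows.Count, "A").End(xlUp).Row

    For r = 2 To lastRow
        dist = Levenshtein(CStr(ws.Cells(r, "A").Value), CStr(ws.Cells(r, "B").Value))
        ws.Cells(r, "C").Value = dist
        If dist > REVIEW_THRESHOLD Then
            ws.Cells(r, "D").Value = "REVIEW"
        Else
            ws.Cells(r, "D").ClearContents
        End If
    Next r
End Sub
```

A low distance is not proof that two addresses match, so it is still worth spot-checking some of the unflagged rows.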


Please like and subscribe on the Anthony B. Smoak YouTube channel.

All views and opinions are solely my own and do not necessarily reflect those of my employer

Do Great Things with Your Data!




Photo by Oladimeji Ajegbile from Pexels


How to Fix an Import Specification Error in Microsoft Access

There are certain aspects of Microsoft Access that can be downright frustrating and puzzling to debug. I want to share a tip with you that will hopefully save you hours of frustration. There is nothing more foundational than importing data into Microsoft Access so most likely you’ll appreciate the fix for this run-time error if you are attempting to use VBA.

If you encounter the following Microsoft Access Error:

“Run-Time error ‘3625’: The text file specification ‘My Saved Access Import Spec’ does not exist. You cannot import, export, or link using that specification.”

Most likely you have confused a saved set of “import steps” with a saved “Import/Export specification” while trying to use the DoCmd.TransferText command. At least, I did.

Consider the following sample VBA code that uses the DoCmd.TransferText command to import a delimited file (from a path stored in string variable strInputFileName) into a table named “tbl_Access_Import_Data” using an import specification.

Private Sub cmd_Import_Table_Click()

    Dim strInputFileName As String

    ' Path to the local CSV file that will be imported into an Access table.
    strInputFileName = "C:\Users\Desktop\Access Data\Access_Import_Data"

    ' Import a delimited file with DoCmd.TransferText:
    '   "My Saved Access Import Spec" = Import/Export specification name
    '   "tbl_Access_Import_Data"      = destination Access table
    '   strInputFileName              = hardcoded path to the source CSV file
    DoCmd.TransferText acImportDelim, "My Saved Access Import Spec", "tbl_Access_Import_Data", strInputFileName

End Sub

[Image 4: Run-time error 3625]

Let me show you where I went off track. I saved “import steps” and then tried to reference the saved “import steps” with the DoCmd.TransferText method. You cannot reference “import steps” with this method, only “Import/Export specifications”.

[Image 1: Import Text Wizard]

I used the Import Text Wizard to define and delimit the columns in a specified .csv file and indicated the table I desired to have that data imported into. Afterwards, I pressed the finish button.

[Image 2: Import Text Wizard]

Once I hit “Finish”, on the very next screen I saved the “import steps” that I previously defined. Notice the verbiage next to step 1 (i.e. “Save import steps”).

[Image 3: Saved “fake” spec]

As you can see above, I created saved “import steps” misleadingly named “My Saved Access Import Spec”. This is the name that I erroneously passed to the DoCmd.TransferText method in code.

[Image 4: Run-time error 3625]

These actions result in ‘Run-time error 3625’, which we will now fix.
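
If you want your code to fail with a friendlier message while you sort this out, one option is to trap the error number. Below is a minimal sketch under my own assumptions: the names DescribeImportError and ImportWithSpecCheck are hypothetical, and the message text is mine, not something Access produces.

```vba
' Hypothetical helper: turns an error number into a readable hint.
Public Function DescribeImportError(ByVal errNumber As Long, _
                                    ByVal specName As String) As String
    If errNumber = 3625 Then
        DescribeImportError = "'" & specName & "' is not a saved Import/Export " & _
            "specification. Saved import steps cannot be used with TransferText."
    Else
        DescribeImportError = "Import failed with error " & errNumber & "."
    End If
End Function

' Hypothetical wrapper around DoCmd.TransferText that reports the hint above.
Public Sub ImportWithSpecCheck(ByVal specName As String, _
                               ByVal tableName As String, _
                               ByVal filePath As String)
    On Error GoTo ImportFailed
    DoCmd.TransferText acImportDelim, specName, tableName, filePath
    Exit Sub

ImportFailed:
    MsgBox DescribeImportError(Err.Number, specName), vbExclamation
End Sub
```

This does not fix anything by itself; it only makes the cause easier to spot the next time the wrong name is passed in.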

[Image 5: Import data specification]

In order to save a legitimate Import/Export specification that can be successfully referenced with the DoCmd.TransferText method, make sure to hit the “Advanced” button before you hit “Finish” when you come to the last window of the Import Text Wizard.

Make sure to hit “Save As” (Step 2 above) on the right hand side of the window.

[Image 6: Capture]

At this point, name and save your true Import/Export specification, then hit “OK”.

Now when you come to the same window again you can hit the “Specs…” button to observe the names of all of the saved Import/Export specifications.

[Image 7: Specs button]

In the pic above I only have 1 Import/Export specification named “My Real Saved Access Import Spec”.

[Image 7.5: Import complete]

Observe that once the true Import/Export specification is referenced in VBA code, the code executes as intended.

Additional Tips

I am not aware of how to edit Import/Export specifications. The best advice that I have is to recreate and then overwrite the existing specification or save the new revised specification with a different name.

If you place the following SQL code in a blank Select Query, you can view all the true specification names along with field names and respective field widths.

SELECT
    MSysIMEXSpecs.SpecName,
    MSysIMEXColumns.FieldName,
    MSysIMEXColumns.Start,
    MSysIMEXColumns.Width,
    MSysIMEXColumns.SkipColumn
FROM MSysIMEXColumns
INNER JOIN MSysIMEXSpecs
    ON MSysIMEXColumns.SpecID = MSysIMEXSpecs.SpecID
ORDER BY
    MSysIMEXSpecs.SpecName,
    MSysIMEXColumns.Start,
    MSysIMEXColumns.Width;

[Image 8: SQL results]

The results of that query from my example database are shown above. All due credit goes to Stack Overflow for this SQL tip.

https://stackoverflow.com/questions/34295360/the-text-file-specification-does-not-exist-when-importing-into-access
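
The same system table can also be checked from VBA before you call TransferText. Here is a small sketch; the function name ImportSpecExists is hypothetical, and I am assuming the MSysIMEXSpecs table is readable in your database (as far as I can tell it only exists once at least one specification has been saved, so the sketch treats any lookup error as "not found").

```vba
' Hypothetical helper: True when a saved Import/Export specification
' with the given name is listed in the MSysIMEXSpecs system table.
Public Function ImportSpecExists(ByVal specName As String) As Boolean
    On Error GoTo LookupFailed

    ' Double any single quotes so the name is safe inside the criteria string.
    ImportSpecExists = DCount("*", "MSysIMEXSpecs", _
        "SpecName = '" & Replace(specName, "'", "''") & "'") > 0
    Exit Function

LookupFailed:
    ' Assumption: a failed lookup means the table (and so the spec) is absent.
    ImportSpecExists = False
End Function
```

For example, ImportSpecExists("My Saved Access Import Spec") should come back False in the scenario described above, because that name belongs to saved import steps rather than to a specification.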

[Image 3: Saved “fake” spec]

Furthermore, if you are intent on referencing saved import steps in VBA code (not to be confused with the aforementioned Import/Export specification), then use the DoCmd.RunSavedImportExport method.

To execute the “import step” shown in the picture above using VBA, I would use the following command:

DoCmd.RunSavedImportExport "My Saved Access Import Spec"
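
If you are unsure whether a name refers to saved import steps, you can look it up first. The sketch below is based on my understanding that saved imports and exports are exposed through CurrentProject.ImportExportSpecifications in Access 2007 and later; the function name SavedImportStepsExist is hypothetical.

```vba
' Hypothetical helper: True when saved import/export steps with this name exist.
' Assumes Access 2007 or later, where CurrentProject.ImportExportSpecifications
' lists the saved imports and exports (not the TransferText specifications).
Public Function SavedImportStepsExist(ByVal stepsName As String) As Boolean
    Dim i As Long

    With CurrentProject.ImportExportSpecifications
        For i = 0 To .Count - 1
            If .Item(i).Name = stepsName Then
                SavedImportStepsExist = True
                Exit Function
            End If
        Next i
    End With
End Function
```

You could then guard the call, for example: If SavedImportStepsExist("My Saved Access Import Spec") Then DoCmd.RunSavedImportExport "My Saved Access Import Spec".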

I hope this helps solve your “how to fix Run-Time error 3625 in Microsoft Access” question. Good luck!