Fuzzy logic - Damerau-Levenshtein


Post by HSeguin on Tue Jan 18, 2022 10:29 pm

I want to match people's names, each made up of two fields: first name and last name. On this site I found the very useful APL functions dist and fuzzy, which are based on the Levenshtein distance and address this problem: https://dfns.dyalog.com/n_dist.htm

I have several sources of people's names that I am trying to match. Even though most of this information was entered by the people themselves, it contains a multitude of typing errors. If the Levenshtein distance is low (e.g. 1), I would assume that there is a match.

The only problem arises when two adjacent characters are swapped. With no other difference, the Levenshtein approach gives a distance of 2, e.g. 'Hubert' dist 'Hubetr'.

I think this is a bit high, since it is really only one error.

I read that a variant, the Damerau-Levenshtein distance, handles exactly this case: it counts a transposition of two adjacent characters as a single edit.
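
To make the difference concrete, here is a minimal sketch of the restricted form of Damerau-Levenshtein, usually called optimal string alignment. It is written in Python rather than APL, it is not a modification of the dfns dist function, and the name osa_distance is made up for illustration. It is the usual Levenshtein table plus one extra case for adjacent transpositions.

```python
def osa_distance(a: str, b: str) -> int:
    """Optimal string alignment distance (restricted Damerau-Levenshtein).

    Hypothetical helper for illustration only; not the dfns `dist` function.
    """
    rows, cols = len(a) + 1, len(b) + 1
    # d[i][j] = distance between the first i chars of a and the first j chars of b
    d = [[0] * cols for _ in range(rows)]
    for i in range(rows):
        d[i][0] = i  # delete all i characters
    for j in range(cols):
        d[0][j] = j  # insert all j characters

    for i in range(1, rows):
        for j in range(1, cols):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(
                d[i - 1][j] + 1,         # deletion
                d[i][j - 1] + 1,         # insertion
                d[i - 1][j - 1] + cost,  # match or substitution
            )
            # Extra case: the last two characters are swapped
            if i > 1 and j > 1 and a[i - 1] == b[j - 2] and a[i - 2] == b[j - 1]:
                d[i][j] = min(d[i][j], d[i - 2][j - 2] + 1)
    return d[-1][-1]


print(osa_distance("Hubert", "Hubetr"))  # 1
```

With this recurrence the pair 'Hubert' and 'Hubetr' scores 1 instead of 2. Note that this restricted form never edits the same substring twice, so it can differ from the unrestricted Damerau-Levenshtein distance on some inputs (for 'ca' and 'abc' it gives 3 rather than 2).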

Can anybody tell me whether a modified version of the dist function that does this exists somewhere?

Thanks!

Hubert Séguin