String Similarity (Levenshtein)
From Erlang Community
Current revision: 08:50, 4 December 2006
Problem
You need to compare two strings and get an index of how similar they are.
Solution
This module implements the Levenshtein edit distance algorithm (described in more detail here). In short, it counts the minimum number of single-character edits (insertions, deletions, or substitutions) needed to transform the source string into the target string. The lower the number, the more similar the strings.
%%%=============================================================================
%%% @author Adam Lindberg
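The listing above shows only the module header. As a rough stand-in, here is a minimal sketch of a row-by-row dynamic-programming Levenshtein implementation. It is not the original author's code: the module name string_metrics and the function levenshtein/2 are taken from the shell examples on this page, while the helper next_row and everything inside it are assumptions made for illustration.

```erlang
-module(string_metrics).
-export([levenshtein/2]).

%% Levenshtein edit distance between two strings (lists of characters).
%% Illustrative sketch only, not the original listing from this page.
levenshtein(Source, Source) ->
    0;
levenshtein(Source, []) ->
    length(Source);
levenshtein([], Target) ->
    length(Target);
levenshtein(Source, Target) ->
    %% Row 0: distance from the empty source prefix to each target prefix.
    FirstRow = lists:seq(0, length(Target)),
    LastRow = lists:foldl(fun(Char, PrevRow) ->
                                  next_row(Char, Target, PrevRow)
                          end, FirstRow, Source),
    %% The final cell of the final row is the distance between the full strings.
    lists:last(LastRow).

%% Build the row for one more source character from the previous row.
next_row(Char, Target, [Diagonal | _] = PrevRow) ->
    %% First cell: the cost of deleting every source character seen so far.
    Left = Diagonal + 1,
    next_row(Char, Target, PrevRow, Left, [Left]).

next_row(_Char, [], _PrevRow, _Left, Acc) ->
    lists:reverse(Acc);
next_row(Char, [T | Target], [Diagonal, Above | PrevRest], Left, Acc) ->
    %% A substitution costs nothing when the two characters already match.
    Cost = if Char =:= T -> 0; true -> 1 end,
    Cell = lists:min([Above + 1,          % delete Char from the source
                      Left + 1,           % insert T into the source
                      Diagonal + Cost]),  % substitute Char with T
    next_row(Char, Target, [Above | PrevRest], Cell, [Cell | Acc]).
```

Only one previous row is kept at a time, so the sketch uses memory proportional to the length of the target string rather than building the full matrix.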
You can use the levenshtein function to compare two strings.
2> string_metrics:levenshtein("Aloha!", "Alhoa!").
2
3> string_metrics:levenshtein("adam", "Adam").
1
4> string_metrics:levenshtein("adam", "Assam").
3
5> string_metrics:levenshtein("teh", "the").
2
6> string_metrics:levenshtein("the", "the").
0
7> string_metrics:levenshtein("the", "").
3
Note that the function is case sensitive (as is the algorithm itself). You can always use the to_lower or to_upper functions from the httpd_util library to put the two strings on an equal footing:
8> string_metrics:levenshtein(httpd_util:to_lower("Adam"), "adam").
0
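If you would rather not depend on httpd_util, the same normalization can be sketched with string:lowercase/1 from the standard string module (available in OTP 20 and later). The module name ci_compare and the function distance_ci/3 below are hypothetical names chosen for this example; the distance function is passed in as a fun so the helper does not assume any particular implementation.

```erlang
-module(ci_compare).
-export([distance_ci/3]).

%% Lower-case both strings, then apply the supplied distance function.
%% DistanceFun is any fun of arity 2, for example
%% fun string_metrics:levenshtein/2 (assumed to be compiled and loaded).
distance_ci(DistanceFun, Source, Target) when is_function(DistanceFun, 2) ->
    DistanceFun(string:lowercase(Source), string:lowercase(Target)).
```

With the string_metrics module from this page loaded, a call would look like ci_compare:distance_ci(fun string_metrics:levenshtein/2, "Adam", "adam").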

