5. Now we can test some of these for equality:
user=> (records-match [:given-name :surname]
                      (data :mulder) (data :molder))
true
user=> (records-match [:given-name :surname]
                      (data :mulder) (data :mulder2))
true
user=> (records-match [:given-name :surname]
                      (data :scully) (data :scully2))
true
user=> (records-match [:given-name :surname]
                      (data :mulder) (data :scully))
false
How it works…
The fuzzy string matching function uses several parameters. Let's take a look at each
individually:
(def fuzzy-dist edit-distance)
fuzzy-dist is a function that returns a distance metric for two strings. Lower numbers
indicate that the two strings are more similar. In this case, we're using
clj-diff.core/edit-distance. The edit distance is the number of editing operations
(usually inserting or deleting a single character, with a change being a combination of
these) required to transform one string into another. For example, here are the edit
distances of a few simple strings:
user=> (edit-distance "abc" "bc")
1
user=> (edit-distance "abc" "abcd")
1
user=> (edit-distance "abc" "bac")
2
user=> (edit-distance "abc" "bbc")
2
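To make these numbers concrete, here is a minimal sketch of an edit distance that counts only single-character insertions and deletions. It is not clj-diff's implementation. The names lcs-length and indel-distance are hypothetical helpers introduced for illustration, and the approach (string lengths minus twice the longest common subsequence) is an assumption chosen because it matches the description above:

```clojure
;; Hypothetical helpers for illustration; not part of clj-diff.

(defn lcs-length
  "Length of the longest common subsequence of strings a and b,
  computed one dynamic-programming row at a time."
  [a b]
  (let [step (fn [prev-row ch-a]
               ;; Build the row for ch-a from the row for the previous character.
               (reduce (fn [row [j ch-b]]
                         (conj row
                               (if (= ch-a ch-b)
                                 ;; Characters match: extend the diagonal.
                                 (inc (nth prev-row j))
                                 ;; Otherwise keep the better of up and left.
                                 (max (nth prev-row (inc j)) (peek row)))))
                       [0]
                       (map-indexed vector b)))]
    (peek (reduce step (vec (repeat (inc (count b)) 0)) a))))

(defn indel-distance
  "Number of single-character inserts and deletes needed to turn a into b.
  Every character outside the common subsequence is deleted or inserted."
  [a b]
  (- (+ (count a) (count b))
     (* 2 (lcs-length a b))))
```

Under these assumptions, (indel-distance "abc" "bbc") is 2, because replacing the a with a b takes one delete and one insert.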
Based on these distances, the maximum allowable distance is determined by the following
two parameters:
(def fuzzy-max-diff 2)
First, for equality, the distance has to be at most fuzzy-max-diff. Setting it to 2 allows
a single replacement, which generally counts as two changes (a delete and an insert):
(def fuzzy-percent-diff 0.1)
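As a rough illustration of the fuzzy-max-diff check only, the following sketch tests an already-computed distance against the threshold. The name within-max-diff? is a hypothetical helper, not a function from the recipe, and it deliberately leaves fuzzy-percent-diff out:

```clojure
(def fuzzy-max-diff 2)

;; Hypothetical helper for illustration; not part of the recipe's code.
(defn within-max-diff?
  "True when an edit distance is small enough to pass the
  fuzzy-max-diff test."
  [dist]
  (<= dist fuzzy-max-diff))

;; A distance of 2 (one replacement, as in "abc" vs. "bbc") passes;
;; a distance of 3 does not.
(within-max-diff? 2)
(within-max-diff? 3)
```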
 