How to do it…
We'll first define a function to test for fuzzy equality. Then, we'll write another function that
uses fuzzy equality to test whether two records match.
1. Here are the main parameters for fuzzy string matching. We'll see how to use these
later in the recipe:
(def fuzzy-max-diff 2)
(def fuzzy-percent-diff 0.1)
(def fuzzy-dist edit-distance)
2. Now, we can define a function that uses these parameters to determine whether
two strings are approximately equal to each other:
(defn fuzzy= [a b]
  (let [dist (fuzzy-dist a b)]
    (or (<= dist fuzzy-max-diff)
        (<= (/ dist (min (count a) (count b)))
            fuzzy-percent-diff))))
3. Building on this, we can write a function that determines whether two records are
the same. It also takes one or more key functions, which return the values that the
items should be compared on:
(defn records-match
  [key-fn a b]
  (let [kfns (if (sequential? key-fn) key-fn [key-fn])
        rfn (fn [prev next-fn]
              (and prev (fuzzy= (next-fn a)
                                (next-fn b))))]
    (reduce rfn true kfns)))
4. These should allow you to test whether two records are approximately equal.
Let's create some data to test this out:
(def data
  {:mulder {:given-name "Fox" :surname "Mulder"}
   :molder {:given-name "Fox" :surname "Molder"}
   :mulder2 {:given-name "fox" :surname "mulder"}
   :scully {:given-name "Dana" :surname "Scully"}
   :scully2 {:given-name "Dan" :surname "Scully"}})
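The steps above can be tried end to end with a minimal sketch such as the following. This section does not show where edit-distance comes from, so the sketch substitutes a plain Levenshtein distance under the hypothetical name levenshtein. That substitution is an assumption: a different distance function may count edits differently and change which records match. The remaining definitions repeat the ones from the steps so that the block runs on its own.

```clojure
;; Hypothetical stand-in for the distance function (an assumption, not
;; from the recipe): Levenshtein distance, where each insertion,
;; deletion, or substitution costs 1.
(defn levenshtein [a b]
  (let [b-indexed (map-indexed vector b)]
    (peek
     (reduce
      (fn [prev-row [i ca]]
        (reduce
         (fn [row [j cb]]
           (conj row
                 (min (inc (peek row))              ; insert cb
                      (inc (nth prev-row (inc j)))  ; delete ca
                      (+ (nth prev-row j)           ; substitute, or match at no cost
                         (if (= ca cb) 0 1)))))
         [(inc i)]                                  ; first column of the new row
         b-indexed))
      (vec (range (inc (count b))))                 ; row for the empty prefix of a
      (map-indexed vector a)))))

;; Parameters and functions as defined in the steps above, with
;; fuzzy-dist bound to the stand-in.
(def fuzzy-max-diff 2)
(def fuzzy-percent-diff 0.1)
(def fuzzy-dist levenshtein)

(defn fuzzy= [a b]
  (let [dist (fuzzy-dist a b)]
    (or (<= dist fuzzy-max-diff)
        (<= (/ dist (min (count a) (count b)))
            fuzzy-percent-diff))))

(defn records-match
  [key-fn a b]
  (let [kfns (if (sequential? key-fn) key-fn [key-fn])
        rfn (fn [prev next-fn]
              (and prev (fuzzy= (next-fn a)
                                (next-fn b))))]
    (reduce rfn true kfns)))

(def data
  {:mulder {:given-name "Fox" :surname "Mulder"}
   :molder {:given-name "Fox" :surname "Molder"}
   :mulder2 {:given-name "fox" :surname "mulder"}
   :scully {:given-name "Dana" :surname "Scully"}
   :scully2 {:given-name "Dan" :surname "Scully"}})

;; "Mulder" and "Molder" differ by one substitution.
(records-match [:given-name :surname] (data :mulder) (data :molder))  ;; => true
;; "Fox" and "Dana" are four edits apart, more than either threshold allows.
(records-match [:given-name :surname] (data :mulder) (data :scully))  ;; => false
;; A single key function is accepted as well as a sequence of them.
(records-match :surname (data :scully) (data :scully2))               ;; => true
```

Because fuzzy-max-diff is 2, any two strings within two edits of each other match regardless of their length, which is why "Dana" and "Dan" are treated as the same given name. Note also that if one of the compared values is an empty string and the distance exceeds fuzzy-max-diff, the division in fuzzy= has a zero divisor and throws an ArithmeticException, so empty fields may need to be filtered out or guarded before matching.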
 