Name Matching
Many applications require name matching functionality. Examples include
customer account management, medical record searching, duplicate database
record identification and, more generally, any form of information
retrieval.
There are a variety of approaches to name matching, such as exact matching,
wildcarding, fuzzy matching, alphanumeric matching, Soundex or other
forms of phonetic matching, string distance metrics, probabilistic record
linkage, knowledge-based expertise and learning. Each of these approaches
has advantages and disadvantages: some work only in specific cases, while
others can meet more specialized application and business requirements.
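To make two of these approaches concrete, here is a minimal Python sketch of
Soundex (phonetic matching) and Levenshtein edit distance (a string distance
metric). The function names and the 0 to 100 similarity scale are assumptions
chosen for illustration; this is not how NameSearch® computes its scores.

```python
def soundex(name: str) -> str:
    """Classic four-character American Soundex code, e.g. 'Robert' -> 'R163'."""
    codes = {}
    for group, digit in (("BFPV", "1"), ("CGJKQSXZ", "2"), ("DT", "3"),
                         ("L", "4"), ("MN", "5"), ("R", "6")):
        for ch in group:
            codes[ch] = digit

    letters = [c for c in name.upper() if c.isalpha()]
    if not letters:
        return ""
    result = letters[0]
    prev = codes.get(letters[0], "")
    for ch in letters[1:]:
        if ch in "HW":
            continue  # H and W do not separate consonants with the same code
        code = codes.get(ch, "")  # vowels get "" and reset the previous code
        if code and code != prev:
            result += code
        prev = code
    return (result + "000")[:4]


def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions, substitutions."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                 # deletion
                           cur[j - 1] + 1,              # insertion
                           prev[j - 1] + (ca != cb)))   # substitution
        prev = cur
    return prev[-1]


def similarity(a: str, b: str) -> int:
    """Hypothetical 0-100 score derived from edit distance (case-insensitive)."""
    a, b = a.lower(), b.lower()
    longest = max(len(a), len(b))
    if longest == 0:
        return 100
    return round(100 * (1 - levenshtein(a, b) / longest))
```

Soundex treats "Robert" and "Rupert" as the same name because both encode to
R163, while the edit-distance score separates "Smith" and "Smyth" by a single
substitution. This illustrates why the two families of techniques are often
combined.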
The best name matching technique, of course, is to employ human data
operators to sift through the records and use their cognitive abilities
to decide whether the records are duplicates. However, employing a large
number of data operators and having them work through millions of records
is very costly and time-consuming. The problem gets even worse if this
has to be done weekly or monthly.
Intelligent Search Technology has developed searching and matching
software that produces a numeric score indicating the likelihood of
a match between two records. NameSearch®, our searching and matching
software engine, utilizes a combination of different matching techniques,
such as fuzzy matching, phonetic matching, knowledge-based expertise
and advanced heuristic algorithms, to arrive at the matching scores.
The scoring mimics the way in which a person would judge the likelihood
of a match between two input strings such as personal names, company
names, addresses, account numbers, etc.
NameSearch® Matching Algorithms
ALFACOMP
This is used to compare fields containing multi-word strings. The ALFACOMP routine
is based solely on a heuristic algorithm and is not dependent on rulebase
expertise.
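The ALFACOMP algorithm itself is not described here, so the following Python
sketch only illustrates the general idea of scoring multi-word fields
heuristically. It pairs each word with its most similar counterpart, ignores
word order and penalizes unmatched words. The function name
`multiword_score` and the scoring formula are assumptions, not the
NameSearch® implementation.

```python
from difflib import SequenceMatcher


def multiword_score(a: str, b: str) -> int:
    """Hypothetical 0-100 score for two multi-word strings, ignoring word order."""
    words_a, words_b = a.lower().split(), b.lower().split()
    if not words_a or not words_b:
        return 100 if words_a == words_b else 0

    shorter, longer = sorted((words_a, words_b), key=len)
    remaining = list(longer)
    total = 0.0
    for word in shorter:
        # Greedily pair each word with its most similar unused counterpart.
        best = max(remaining,
                   key=lambda cand: SequenceMatcher(None, word, cand).ratio())
        total += SequenceMatcher(None, word, best).ratio()
        remaining.remove(best)

    # Dividing by the longer word count penalizes words left unmatched.
    return round(100 * total / len(longer))
```

With this sketch, "John Smith" and "Smith John" score 100 because every word
has an exact counterpart, while "John A Smith" against "Smith John" loses
points for the unmatched middle initial.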
COMP, COMP1, COMP2
These are NameSearch®'s comparison routines used for scoring names and addresses.
These routines utilize NameSearch®'s rulebase expertise and phonetic
tokenization to determine scores. COMP was the original comparison routine
released with Version 1 of the NameSearch® product. COMP1 was introduced in
Version 2.0 of the NameSearch® product in order to provide a more representative
score. COMP2 was added in Version 2.5 of the NameSearch® product. This routine
uses ALFACOMP, an advanced heuristic algorithm, to arrive at its results.
DATESCR
The Date Score is used for comparison of two dates. The DATESCR
comparison routine uses rulebase expertise in order to arrive
at its results. For example, July 28, 1965 compared to 7/28/66 would
yield a score of 100. In this manner the DATESCR routine overcomes problems
due to inconsistency in date format. The routine also accepts several
parameters that dictate the penalty for mismatches based on the year. By
increasing these settings, the score can be made more tolerant. For example,
if you want all dates that correspond to July 28, 1965, plus or minus two
years, you would set the year range to 2. If you want plus or minus five
years, you would set the year range to 5. In this manner NameSearch® gives
you the ability to widen or narrow the range of dates being returned, given
that the month and day agree.
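The idea of a format-independent date score with a year tolerance can be
sketched in Python as follows. Everything here is an assumption made for
illustration: the accepted formats, the `PIVOT_YEAR` used to expand
two-digit years, the `year_range` parameter name and the linear penalty.
The actual DATESCR parameters and scores are not documented here and may
differ.

```python
from datetime import date, datetime

# Hypothetical settings for this sketch, not NameSearch parameters.
FOUR_DIGIT_FORMATS = ("%B %d, %Y", "%m/%d/%Y", "%Y-%m-%d")
TWO_DIGIT_FORMAT = "%m/%d/%y"
PIVOT_YEAR = 2030  # two-digit years that expand beyond this move back a century


def parse_date(text: str) -> date:
    """Parse a date written in any of the supported formats."""
    text = text.strip()
    for fmt in FOUR_DIGIT_FORMATS:
        try:
            return datetime.strptime(text, fmt).date()
        except ValueError:
            pass
    # Raises ValueError if the text matches none of the formats.
    parsed = datetime.strptime(text, TWO_DIGIT_FORMAT).date()
    if parsed.year > PIVOT_YEAR:
        parsed = parsed.replace(year=parsed.year - 100)
    return parsed


def date_score(a: str, b: str, year_range: int = 0) -> int:
    """Hypothetical 0-100 score; month and day must agree, years may differ."""
    da, db = parse_date(a), parse_date(b)
    if (da.month, da.day) != (db.month, db.day):
        return 0
    year_diff = abs(da.year - db.year)
    if year_diff > year_range:
        return 0
    # Linear penalty: each year of difference costs an equal share of the score.
    return round(100 * (1 - year_diff / (year_range + 1)))
```

Under these assumptions, `date_score("July 28, 1965", "7/28/65")` is 100
because both strings denote the same day, and raising `year_range` to 2 lets
`"7/28/67"` return a reduced but non-zero score against the same date.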
NUMCOMP
This is a comparison routine used for evaluating alphanumeric
strings. For example, this routine is well suited for social security
numbers, account numbers and phone numbers.
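As a rough illustration of comparing identifiers of this kind, the Python
sketch below strips punctuation and spacing and then compares the remaining
characters position by position. The function names and the scoring rule
are assumptions, not the NUMCOMP algorithm.

```python
def normalize(value: str) -> str:
    """Keep only letters and digits, uppercased, so formatting is ignored."""
    return "".join(ch for ch in value.upper() if ch.isalnum())


def alnum_score(a: str, b: str) -> int:
    """Hypothetical 0-100 score based on position-by-position agreement."""
    na, nb = normalize(a), normalize(b)
    if not na or not nb:
        return 100 if na == nb else 0
    matches = sum(x == y for x, y in zip(na, nb))
    # Dividing by the longer length penalizes missing or extra characters.
    return round(100 * matches / max(len(na), len(nb)))
```

For example, "123-45-6789" and "123456789" score 100 because they differ
only in formatting, while a single wrong digit in a nine-digit number lowers
the score to 89.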