Uniform Residential Loan Application I
TYPE GF'MORTGAGE-AHD TERMS
OF LOAN Mortgage flvAB—Ccnvniltiai
Applied fof: E¿HA Agency Case Num-
ber Lender Case Number Amount 5
I No. of Months Amortization Fixed
Rate Typo: I Other (explain): I ARM
(type): / LOAN Subject Property Ad-
dresi (street, city, state, ZIP) Legal De-
scripllon of Subject Property (attach
description If necessary) No. of Units
Year Built Purpose of Loan Construc-
tion Construction-Permanent Otner (ex-
plain): Property will be: Primary ” ”an.
Compl&te this line If construction or
construction-permanent loan. Secondary
Investment Year Lot Acquired Original
Cost S Amount Exjsling Uens $ (a)
Present Value of Lot $ (b) Cost o( Im-
provements $ Total (a+b) S Complete
this line if this Is a ra/fiuncfl loan. Year
Acquired Original Cost Amount Existing
Uens Title will be held in what Name(s)
Purpose of Refinance Describe Improva-
manU
Fig. 8.1. Image and OCR text from a sample loan application. While the forms
themselves are authentic, we redacted the information contained on them to ensure
privacy.
It has been shown [7] that, for certain classifiers and texts, these abstractions do
not reduce accuracy. In addition, abstraction reduces the number of parameters that
need to be estimated during training, which in turn reduces the number of training
samples that need to be provided. This aspect is of particular importance for us.
In order to deploy a separation solution, customers must prepare a certain number
of samples for each document type. Given the classification technology outlined in
section 8.4.2, we achieve acceptable results with as few as twenty to thirty examples
per document type. If we were using a classifier that takes word sequence information
into account, for instance a Bayesian classifier over word n-grams, we would need
hundreds of samples per document type; this would pose a severe barrier to entry
for customers. 5
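To illustrate why sequence-aware models need more training data, the following minimal sketch computes a worst-case upper bound on the number of n-gram probabilities a classifier would have to estimate. The function name, the vocabulary size, and the number of document types are hypothetical values chosen for illustration; they do not describe the classifier from section 8.4.2, and in practice many n-grams never occur, so the real count is lower.

```python
def max_ngram_parameters(vocab_size: int, n: int, num_classes: int) -> int:
    """Worst-case number of n-gram probabilities to estimate.

    Each class may need one probability per possible n-gram, and there
    are vocab_size ** n possible n-grams over the vocabulary.
    """
    return num_classes * vocab_size ** n


# Hypothetical figures, chosen only to show the growth rate.
VOCAB_SIZE = 1_000   # distinct words after abstraction
NUM_CLASSES = 10     # document types

unigram_params = max_ngram_parameters(VOCAB_SIZE, 1, NUM_CLASSES)
bigram_params = max_ngram_parameters(VOCAB_SIZE, 2, NUM_CLASSES)

print(unigram_params)  # 10000
print(bigram_params)   # 10000000
```

Moving from single words to word pairs multiplies the bound by the vocabulary size, which is consistent with the observation that an n-gram classifier needs considerably more samples per document type.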
5 However, for certain types of problems, sequence-aware modeling is superior and
even necessary. In one deployment, we encountered a fixed form with two broad
columns into which data could be entered. Depending on whether only one column
or both were filled out, the documents were categorized as different types. The
classification model had difficulties distinguishing between these two document
types. In an experiment, we collected enough sample data to train a word n-gram
classifier and were then able to reliably assess the correct type.
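The general effect can be sketched with a toy example: two token sequences that contain the same words in a different order have identical word counts but different bigram counts. The tokens below are invented for illustration and are not taken from the deployment described above; the helper names `ngrams` and `features` are likewise hypothetical.

```python
from collections import Counter


def ngrams(tokens, n):
    """Return all contiguous n-grams of a token list as tuples."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def features(tokens, n=1):
    """Count n-gram occurrences; n=1 gives a plain bag of words."""
    return Counter(ngrams(tokens, n))


# Hypothetical OCR token streams: same words, different reading order.
doc_a = "borrower income coborrower income".split()
doc_b = "borrower coborrower income income".split()

# Word counts alone cannot tell the two documents apart.
same_unigrams = features(doc_a, 1) == features(doc_b, 1)

# Bigram counts differ, so a sequence-aware model has a signal to use.
same_bigrams = features(doc_a, 2) == features(doc_b, 2)

print(same_unigrams)  # True
print(same_bigrams)   # False
```

This only shows that word order carries information that a bag-of-words representation discards; whether that information suffices for a given pair of document types still has to be established experimentally, as in the case described here.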
 