Data Preparation Basic Models - Data Preprocessing in Data Mining


3. Cochinwala, M., Kurien, V., Lalk, G., Shasha, D.: Efficient data reconciliation. Inf. Sci. 137(1-4), 1-15 (2001)

4. Cohen, W.W.: Integration of heterogeneous databases without common domains using queries based on textual similarity. In: Proceedings of the 1998 ACM SIGMOD International Conference on Management of Data, SIGMOD '98, pp. 201-212. New York (1998)

5. Dey, D., Sarkar, S., De, P.: Entity matching in heterogeneous databases: A distance based decision model. In: 31st Annual Hawaii International Conference on System Sciences (HICSS'98), pp. 305-313 (1998)

6. Do, H.H., Rahm, E.: Matching large schemas: approaches and evaluation. Inf. Syst. 32(6), 857-885 (2007)

7. Doan, A., Domingos, P., Halevy, A.Y.: Reconciling schemas of disparate data sources: A machine-learning approach. In: Proceedings of the 2001 ACM SIGMOD International Conference on Management of Data, SIGMOD '01, pp. 509-520 (2001)

8. Doan, A., Domingos, P., Halevy, A.: Learning to match the schemas of data sources: a multistrategy approach. Mach. Learn. 50, 279-301 (2003)

9. Elmagarmid, A.K., Ipeirotis, P.G., Verykios, V.S.: Duplicate record detection: a survey. IEEE Trans. Knowl. Data Eng. 19(1), 1-16 (2007)

10. Fellegi, I.P., Sunter, A.B.: A theory for record linkage. J. Am. Stat. Assoc. 64, 1183-1210 (1969)

11. Gill, L.E.: OX-LINK: The Oxford medical record linkage system. In: Proceedings of the International Record Linkage Workshop and Exposition, pp. 15-33 (1997)

12. Gravano, L., Ipeirotis, P.G., Jagadish, H.V., Koudas, N., Muthukrishnan, S., Pietarinen, L., Srivastava, D.: Using q-grams in a DBMS for approximate string processing. IEEE Data Engineering Bull. 24(4), 28-34 (2001)

13. Guha, S., Koudas, N., Marathe, A., Srivastava, D.: Merging the results of approximate match operations. In: Nascimento, M.A., Özsu, M.T., Kossmann, D., Miller, R.J., Blakeley, J.A., Schiefer, K.B. (eds.) VLDB. Morgan Kaufmann, San Francisco (2004)

14. Hall, M., Frank, E., Holmes, G., Pfahringer, B., Reutemann, P., Witten, I.H.: The WEKA data mining software: an update. SIGKDD Explor. Newsl. 11(1), 10-18 (2009)

15. Han, J., Kamber, M., Pei, J.: Data mining: concepts and techniques. The Morgan Kaufmann Series in Data Management Systems, 2nd edn. Morgan Kaufmann, San Francisco (2006)

16. Hulse, J., Khoshgoftaar, T., Huang, H.: The pairwise attribute noise detection algorithm. Knowl. Inf. Syst. 11(2), 171-190 (2007)

17. Jaro, M.A.: Unimatch: A record linkage system: User's manual. Technical report (1976)

18. Joachims, T.: Making large-scale support vector machine learning practical. In: Advances in Kernel Methods, pp. 169-184. MIT Press, Cambridge (1999)

19. Johnson, R.A., Wichern, D.W.: Applied Multivariate Statistical Analysis. Prentice-Hall, Englewood Cliffs (2001)

20. Kim, W., Choi, B.J., Hong, E.K., Kim, S.K., Lee, D.: A taxonomy of dirty data. Data Min. Knowl. Disc. 7(1), 81-99 (2003)

21. Koudas, N., Marathe, A., Srivastava, D.: Flexible string matching against large databases in practice. In: Proceedings of the Thirtieth International Conference on Very Large Data Bases, VLDB '04, vol. 30, pp. 1078-1086 (2004)

22. Kukich, K.: Techniques for automatically correcting words in text. ACM Comput. Surv. 24(4), 377-439 (1992)

23. Levenshtein, V.: Binary codes capable of correcting deletions, insertions and reversals. Sov. Phys. Doklady 163, 845-848 (1965)

24. Lin, T.Y.: Attribute transformations for data mining I: theoretical explorations. Int. J. Intell. Syst. 17(2), 213-222 (2002)

25. McCallum, A., Wellner, B.: Conditional models of identity uncertainty with application to noun coreference. In: Advances in Neural Information Processing Systems 17, pp. 905-912. MIT Press, Cambridge (2005)

26. Monge, A.E., Elkan, C.: The field matching problem: algorithms and applications. In: Simoudis, E., Han, J., Fayyad, U.M. (eds.) Proceedings of the Second International Conference on Knowledge Discovery and Data Mining (KDD-96), pp. 267-270. KDD, Portland, Oregon, USA (1996)
