istic for Web page datasets. We then use the threshold as a stopping factor in our clustering process to
automatically discover the number of clusters in Web page datasets.
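As a rough illustration of how a threshold can act as a stopping factor, the sketch below runs a plain agglomerative clustering and stops merging once the best available merge falls below the threshold. It is not the bi-directional hierarchical algorithm referred to in this chapter. The quantity being thresholded (average-link cosine similarity), the function names, and the threshold value are all assumptions made for the example.

```python
import math
from itertools import combinations


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def average_link(c1, c2, vectors):
    """Mean pairwise similarity between the members of two clusters."""
    total = sum(cosine(vectors[i], vectors[j]) for i in c1 for j in c2)
    return total / (len(c1) * len(c2))


def cluster_until_threshold(vectors, threshold):
    """Merge the most similar pair of clusters until no pair reaches `threshold`.

    The number of clusters is not supplied in advance; it is whatever
    remains when the stopping condition fires. (Hypothetical helper,
    assumed for illustration only.)
    """
    clusters = [[i] for i in range(len(vectors))]
    while len(clusters) > 1:
        (a, b), best = max(
            (((a, b), average_link(clusters[a], clusters[b], vectors))
             for a, b in combinations(range(len(clusters)), 2)),
            key=lambda item: item[1],
        )
        if best < threshold:  # stopping factor: remaining clusters are too dissimilar
            break
        clusters[a] = clusters[a] + clusters[b]  # a < b, so index b is still valid
        del clusters[b]
    return clusters


# Toy term-weight vectors standing in for four Web pages.
pages = [[1, 0], [0.9, 0.1], [0, 1], [0.1, 0.9]]
print(cluster_until_threshold(pages, threshold=0.8))
```

With a high threshold the process stops early and leaves several clusters; with a threshold of zero every merge is accepted and a single cluster remains.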
With the new stopping factor for the Web domain and the new bi-directional hierarchical
clustering algorithm, we have developed a clustering system suitable for mining the Web. We are
working to incorporate the new clustering system into our information classification and search engine
(Baberwal & Choi, 2004; Choi, 2001; Choi & Dhawan, 2004; Choi & Guo, 2003; Choi & Peng, 2004;
Yao & Choi, 2003, 2005, 2007; Chen & Choi, 2008).
The future of Web mining is moving toward the Semantic Web. Future work includes developing new
systems for automatically extracting useful information from the Web and creating new systems that use
the vast amount of information contained on the Web.
Acknowledgment
This research was funded in part by a grant from the Center for Entrepreneurship and Information
Technology (CEnIT), Louisiana Tech University.
References
Agrawal, R., Gehrke, J., Gunopulos, D., & Raghavan, P. (1998, June). Automatic subspace clustering
of high dimensional data for data mining applications. In Proceedings of the 1998 ACM SIGMOD
Conference on Management of Data, Seattle, WA.
Antoniou, G., & van Harmelen, F. (2004). A Semantic Web Primer. Cambridge, MA: The MIT Press.
Baberwal, S., & Choi, B. (2004, November). Speeding up keyword search for search engines. The
3rd IASTED International Conference on Communications, Internet, and Information Technology,
St. Thomas, VI.
Berkhin, P. (2002). Survey of clustering data mining techniques (Tech. Rep.). San Jose, CA: Accrue
Software.
Berry, M. W., & Browne, M. (1999). Understanding Search Engines. Philadelphia: SIAM.
Calinski, T., & Harabasz, J. (1974). A dendrite method for cluster analysis. Communications in
Statistics, 3(1), 1-27.
Chen, G., & Choi, B. (2008, March). Web page genre classification. The 23rd Annual ACM Symposium
on Applied Computing, Ceará, Brazil.
Choi, B. (2001, October). Making sense of search results by automatic Webpage classifications. WebNet
2001 World Conference on the WWW and Internet, Orlando, FL.
Choi, B., & Dhawan, R. (2004, September). Agent space architecture for search engines. The 2004
IEEE/WIC/ACM International Conference on Intelligent Agent Technology, Beijing, China.