Projects by HYUK CHO (completed and ongoing)

(Organized in reverse chronological order)

Current Projects

  • Co-clustering Algorithms
    • June 2003 ~ Current.
    • Preprocessing for Co-clustering Algorithms
    • Initialization for Co-clustering Algorithms
    • New Co-clustering Algorithms
    • Applications of Co-clustering Algorithms
    • Evaluation of Co-clustering Algorithms

Research Projects

  • Minimum Squared Residue Co-clustering of Gene Expression Data
    • June 2003 ~ December 2003.
    • With Dr. Inderjit S. Dhillon, Yuqiang Guan, and Suvrit Sra.
    • Datasets, Reports, and Programs are available upon request.

  • MDL-Based Formulation of Distributional Clustering
    • September 2002 ~ May 2003.
    • With Dr. Inderjit S. Dhillon, Manyam, and Dr. Byron Dom (IBM).
    • Apply the Minimum Description Length (MDL) formulation to predict the optimal number of feature clusters in Distributional Clustering, for better classification of text documents.
    • Use NEWS20 data and some artificial data for experiments.
    • Datasets, Reports, and Programs are available upon request.

  • Semisupervised Learning for Classification of Large Text Data using Feature Clustering
    • May 2002 ~ August 2002.
    • CS395T - Conference Course (Dr. Inderjit Dhillon)
    • Apply feature clustering to improve classification of text documents when labels are available for only a small part of the training data, and use the label information to enhance classification accuracy.
    • Apply different distance (or similarity or divergence) measures.
    • Use NEWS20 data for experiments.
    • Datasets, Reports, and Programs are available upon request.

  • Comparisons on Classification Algorithms
    • January 2001 ~ December 2001.
    • CS395T - Conference Course (Dr. Inderjit Dhillon).
    • With Yancong Zhou.
    • Implement well-known classification algorithms in MATLAB: Naive Bayesian (NB), K-Nearest Neighbor (KNN), Centroid-based (CB), and Support Vector Machine (SVM).
    • Propose several variations of the algorithms and compare their classification performance in terms of accuracy, precision, and recall.
    • Use CLASSIC3 and NEWS20 data for experiments.
    • Project Report Webpage
    • Datasets, Reports, and Programs are available upon request.
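
The three measures named above can be sketched as follows. This is a minimal illustration in Python rather than the project's MATLAB code, which is not shown here; the function name `evaluate` and the one-vs-rest treatment of a single positive class are assumptions made for the example.

```python
def evaluate(true_labels, predicted_labels, positive):
    """Return (accuracy, precision, recall) for one class treated as positive.

    Hypothetical helper for illustration; not taken from the project code.
    """
    pairs = list(zip(true_labels, predicted_labels))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    correct = sum(1 for t, p in pairs if t == p)

    accuracy = correct / len(pairs) if pairs else 0.0
    # Guard against division by zero when a class is never predicted or never occurs.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return accuracy, precision, recall
```

For a multi-class collection such as NEWS20, one plausible use is to call the function once per class and average the per-class precision and recall.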

  • Text Mining: Clustering and Querying
    • January 2001 ~ December 2001.
    • CS395T - Conference Course (Dr. Inderjit Dhillon)
    • Apply feature clustering to improve query retrieval.
    • Apply different normalizations and query expansions to Keyword Matching (KM) and the Generalized Vector Space Model (VSM) for query retrieval.
    • Use FBIS and LATIMES data (in TREC) for experiments.
    • Datasets, Reports, and Programs are available upon request.

  • Comparisons on Partitioning Algorithms
    • January 2000 ~ May 2000.
    • CS395T - Conference Course (Dr. Inderjit Dhillon)
    • Use METIS and hMETIS for text document clustering (treated here as a partitioning problem).
    • Modifying METIS is essential.
    • Convert the VSM into a graph model.
    • Compare with other graph partitioning algorithms such as CHACO and FM.
    • Datasets, Reports, and Programs are available upon request.
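
One possible way to turn a vector space model into a graph is sketched below: documents become vertices, and an edge joins two documents whose cosine similarity reaches a threshold. The function names, the raw term-frequency weighting, and the threshold value are all assumptions for illustration; the conversion actually used in the project is not described here.

```python
import math
from collections import Counter


def cosine(a, b):
    """Cosine similarity between two sparse term-frequency vectors."""
    dot = sum(weight * b[term] for term, weight in a.items() if term in b)
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def vsm_to_graph(docs, threshold=0.1):
    """Map documents to a weighted, undirected similarity graph.

    Returns {(i, j): similarity} with i < j. The threshold is a
    hypothetical knob that controls how sparse the graph is.
    """
    vectors = [Counter(doc.lower().split()) for doc in docs]
    edges = {}
    for i in range(len(vectors)):
        for j in range(i + 1, len(vectors)):
            sim = cosine(vectors[i], vectors[j])
            if sim >= threshold:
                edges[(i, j)] = sim
    return edges
```

A graph partitioner could then be run on the resulting edge list, so that each part of the partition corresponds to a cluster of documents.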

  • Spectral Graph Partitioning
    • August 1999 ~ May 2000.
    • CS395T - Conference Course (Dr. Inderjit Dhillon)
    • Use existing Lanczos-based software for computing eigenvectors of both adjacency and Laplacian matrices.
    • Some experience with the Lanczos algorithm (in FORTRAN and C) is essential.
    • Apply spectral algorithms to special graphs such as Clique and Roach graphs.
    • Datasets, Reports, and Programs are available upon request.
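
The idea behind spectral bisection can be sketched in a few lines: split the vertices by the sign of the eigenvector belonging to the second-smallest eigenvalue of the graph Laplacian. The sketch below swaps in plain shifted power iteration for the Lanczos software mentioned above, to stay dependency-free; it assumes a connected, unweighted graph with at least two vertices, and the function names are hypothetical.

```python
import math


def fiedler_vector(adj, iters=500):
    """Approximate the Laplacian eigenvector of the second-smallest eigenvalue.

    Uses power iteration on (shift*I - L), NOT Lanczos. `adj` is a dense
    symmetric 0/1 adjacency matrix given as a list of lists.
    """
    n = len(adj)
    deg = [sum(row) for row in adj]
    # Laplacian eigenvalues lie in [0, 2*max_degree], so this shift makes
    # the smallest eigenvalues of L the largest ones of (shift*I - L).
    shift = 2.0 * max(deg)
    v = [i - (n - 1) / 2.0 for i in range(n)]  # zero-mean start vector
    for _ in range(iters):
        # w = (shift*I - L) v, with L = D - A
        w = [(shift - deg[i]) * v[i] + sum(adj[i][j] * v[j] for j in range(n))
             for i in range(n)]
        # Remove the constant component (the eigenvector for eigenvalue 0 of L).
        mean = sum(w) / n
        w = [x - mean for x in w]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    return v


def spectral_bisect(adj):
    """Split vertices into two sets by the sign of the Fiedler vector."""
    v = fiedler_vector(adj)
    left = {i for i, x in enumerate(v) if x < 0}
    right = set(range(len(v))) - left
    return left, right
```

On large sparse graphs a Lanczos-based eigensolver would normally be preferred, since power iteration converges slowly when the second and third Laplacian eigenvalues are close.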

  • Design and Application of Intelligent System Using Clustering Technique and Evolution Program
    • July 1997 ~ July 1998.
    • With Dr. Daihee Park and Dr. Jooyoung Park.
    • Korea University Research Foundation.

  • Efficient Clustering Algorithm
    • April 1997 ~ March 1998.
    • With Dr. Daihee Park and Dr. Jooyoung Park.
    • Korea University Research Foundation.

  • An Optimal Design Procedure for BSB(Brain-State-in-a-Box) Neural Networks
    • August 1996 ~ July 1997.
    • With Dr. Daihee Park and Dr. Jooyoung Park.
    • Korea Research Foundation.

Class Projects

  • Feature Clustering on Clustering and Classification of .GOV TRECWeb Data
    • January 2003 ~ May 2003.
    • EE380L - Practicum in Data Mining (Dr. Joydeep Ghosh)
    • With Alex (EE) and Rajal (ME).
    • Evaluate how feature clustering affects both clustering and classification of very large text datasets.
    • Apply different distance (or similarity) measures in document clustering algorithms and compare feature selection vs. feature clustering.
    • Use .GOV TRECWeb data and some other artificial data for experiments.
    • Datasets, Reports, and Programs are available upon request.

  • Scheme Interpreter in Java
    • January 2002 ~ May 2002.
    • EE386L - Programming Languages (Dr. Greg Lavender)
    • Datasets, Reports, and Programs are available upon request.

  • Classification Algorithms on Gene Expression Data
    • August 2001 ~ December 2001.
    • CH391L - Bioinformatics (Dr. Edward M. Marcotte)
    • Compare the performance of supervised machine learning algorithms for classification based on gene expression data.
    • Evaluate the feasibility and performance of traditional classification algorithms.
    • Project Report Webpage
    • Datasets, Reports, and Programs are available upon request.

  • Evaluation of Algorithms on Document Retrieval
    • August 2000 ~ December 2000.
    • EE380L - Data Mining (Dr. Joydeep Ghosh)
    • Use SVDPACKC for Singular Value Decomposition (SVD) of the Vector Space Model (VSM) for query retrieval.
    • Modifying SVDPACKC is essential to work with the Compressed Column Storage (CCS) sparse matrix format.
    • Use CLASSIC3 data (CISI, CRAN, MED) for experiments.
    • Datasets, Reports, and Programs are available upon request.
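
For readers unfamiliar with Compressed Column Storage, the sketch below shows how a small dense matrix maps to the three CCS arrays. It is a generic illustration of the format, not the SVDPACKC modification itself; the function name `to_ccs` and the zero-based indexing are assumptions.

```python
def to_ccs(dense):
    """Convert a dense matrix (list of rows) to CCS arrays.

    Returns (values, row_idx, col_ptr):
      values  - nonzero entries, stored column by column
      row_idx - row index of each entry in `values`
      col_ptr - col_ptr[j]..col_ptr[j+1] delimits column j in `values`
    """
    n_rows = len(dense)
    n_cols = len(dense[0]) if dense else 0
    values, row_idx, col_ptr = [], [], [0]
    for j in range(n_cols):
        for i in range(n_rows):
            if dense[i][j] != 0:
                values.append(dense[i][j])
                row_idx.append(i)
        # Each column ends where the next one begins.
        col_ptr.append(len(values))
    return values, row_idx, col_ptr
```

For a term-document matrix with one document per column, each document's term weights end up contiguous in `values`, which suits the column-oriented matrix-vector products used by iterative SVD methods.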