Notes on Genomic Data Science and Clustering (Bioinformatics V)
Chapter 9: How Did Yeast Become a Wine Maker? (Clustering Algorithms)
This chapter covers the clustering algorithms used in bioinformatics.
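Key concepts:
Gene expression analysis.
Good Clustering Problem: partition the points so that points in the same cluster are closer to each other than to points in other clusters.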
k-Means Clustering Problem: an optimization problem (choose k centers that minimize the squared error distortion)
- Farthest First Traversal: a greedy heuristic for picking k centers
- Squared error distortion: the mean squared distance from each data point to its nearest center
- Lloyd algorithm for k-means clustering: alternates two steps, Centers to Clusters and Clusters to Centers, until the centers stop changing.
- Limitation: it makes a "hard" assignment of each point to exactly one cluster.
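The k-means items above can be sketched in a few lines of plain Python. This is a minimal illustration written for these notes, not the code from the repository linked at the end; the function names, the choice of the first point as the starting center in the traversal, and the use of the first k points as Lloyd's initial centers are all assumptions.

```python
import math

def dist(p, q):
    """Euclidean distance between two points given as tuples."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))

def farthest_first_traversal(points, k):
    """Pick k centers: start from the first point (assumption), then keep
    adding the point that is farthest from its nearest chosen center."""
    centers = [points[0]]
    while len(centers) < k:
        centers.append(max(points, key=lambda p: min(dist(p, c) for c in centers)))
    return centers

def distortion(points, centers):
    """Squared error distortion: mean squared distance to the nearest center."""
    return sum(min(dist(p, c) for c in centers) ** 2 for p in points) / len(points)

def lloyd(points, k, max_iters=100):
    """Lloyd algorithm with the first k points as initial centers (assumption)."""
    centers = list(points[:k])
    for _ in range(max_iters):
        # Centers to Clusters: assign each point to its nearest center.
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: dist(p, centers[i]))
            clusters[nearest].append(p)
        # Clusters to Centers: move each center to its cluster's center of gravity.
        # An empty cluster keeps its old center.
        new_centers = [
            tuple(sum(coord) / len(cluster) for coord in zip(*cluster)) if cluster else centers[i]
            for i, cluster in enumerate(clusters)
        ]
        if new_centers == centers:  # converged
            break
        centers = new_centers
    return centers

points = [(0.0, 0.0), (1.0, 0.0), (10.0, 0.0), (11.0, 0.0)]
print(farthest_first_traversal(points, 2))  # [(0.0, 0.0), (11.0, 0.0)]
print(lloyd(points, 2))                     # [(0.5, 0.0), (10.5, 0.0)]
```

Lloyd's result depends on the initial centers, so a different starting choice can end in a different (possibly worse) local optimum.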
Soft k-means clustering:
- Expectation Maximization Algorithm: starts with a random choice of Parameters. It then alternates between the E-step, in which we compute a responsibility matrix HiddenMatrix for Data given Parameters, and the M-step, in which we re-estimate Parameters using HiddenMatrix.
- Centers to Soft Clusters (E-step) and Soft Clusters to Centers (M-step)
- SoftKMeans Problem
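The E-step and M-step can be sketched as below. This is a minimal illustration under stated assumptions rather than the repository's implementation: responsibilities are computed with the partition function exp(-beta * distance), where the stiffness parameter `beta` is an assumption not named in these notes, and the initial centers are simply the first k points instead of a random choice so that the run is reproducible.

```python
import math

def dist(p, q):
    """Euclidean distance between two points given as tuples."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))

def e_step(points, centers, beta):
    """Centers to Soft Clusters: hidden[i][j] is the responsibility of
    center i for point j; every column sums to 1."""
    hidden = [[math.exp(-beta * dist(p, c)) for p in points] for c in centers]
    for j in range(len(points)):
        total = sum(row[j] for row in hidden)
        for row in hidden:
            row[j] /= total
    return hidden

def m_step(points, hidden):
    """Soft Clusters to Centers: each center becomes the average of all
    points, weighted by that center's row of the responsibility matrix."""
    dim = len(points[0])
    centers = []
    for row in hidden:
        weight = sum(row)
        centers.append(tuple(
            sum(w * p[d] for w, p in zip(row, points)) / weight for d in range(dim)
        ))
    return centers

def soft_k_means(points, k, beta, iters=100):
    """Alternate E-step and M-step for a fixed number of iterations."""
    centers = list(points[:k])  # assumption: deterministic start, not random
    for _ in range(iters):
        hidden = e_step(points, centers, beta)
        centers = m_step(points, hidden)
    return centers

points = [(0.0, 0.0), (1.0, 0.0), (10.0, 0.0), (11.0, 0.0)]
print(soft_k_means(points, 2, beta=2.0))
```

A larger `beta` makes the assignments harder (closer to Lloyd's behaviour), while a smaller `beta` spreads each point's responsibility more evenly across the centers.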
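Hierarchical Clustering:
- How to use a distance matrix to partition genes into clusters
- UPGMA in disguise: hierarchical clustering with the average distance between clusters follows the same merging procedure as UPGMA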
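The merging procedure can be sketched as follows. This is a minimal illustration assuming average linkage (the mean of all pairwise distances between two clusters); the function name, the `k` parameter for stopping early, and the toy distance matrix are hypothetical and not taken from the repository.

```python
def hierarchical_clustering(dist_matrix, k=1):
    """Agglomerative clustering with average linkage.

    Repeatedly merges the two closest clusters until k clusters remain.
    Returns (merges, clusters): the merged clusters in the order they were
    formed, and the clusters left at the end. Elements are row indices.
    """
    n = len(dist_matrix)
    clusters = [[i] for i in range(n)]
    merges = []

    def avg_dist(a, b):
        # Mean of all pairwise distances between members of a and b.
        return sum(dist_matrix[i][j] for i in a for j in b) / (len(a) * len(b))

    while len(clusters) > k:
        pairs = [(x, y) for x in range(len(clusters)) for y in range(x + 1, len(clusters))]
        x, y = min(pairs, key=lambda p: avg_dist(clusters[p[0]], clusters[p[1]]))
        merged = sorted(clusters[x] + clusters[y])
        clusters = [c for idx, c in enumerate(clusters) if idx not in (x, y)] + [merged]
        merges.append(merged)
    return merges, clusters

# Toy symmetric distance matrix for four genes (made-up values).
D = [
    [0, 1, 9, 10],
    [1, 0, 8, 9],
    [9, 8, 0, 2],
    [10, 9, 2, 0],
]
merges, clusters = hierarchical_clustering(D, k=2)
print(merges)    # [[0, 1], [2, 3]]
print(clusters)  # [[0, 1], [2, 3]]
```

Running with `k=1` continues merging until a single cluster remains, and the recorded merge order then describes the whole tree.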
Python implementations of all the algorithms are available on my GitHub: https://github.com/aprilchunyuzhao/BioinformaticsFromCoursera.