Friday, January 8, 2016

Notes of Bioinformatics V

Notes of Genomic Data Science and Clustering (Bioinformatics V)

Chapter 9: How Did Yeast Become a Wine Maker? (Clustering Algorithms)

This chapter covers clustering algorithms used in bioinformatics.

Some concepts:
- Gene expression analysis
- Good Clustering Problem

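k-Means Clustering Problem: an optimization problem
- Farthest First Traversal: a heuristic for picking centers (introduced for the related k-Center Clustering Problem)
- Squared error distortion: the mean squared distance from each point to its nearest center, which k-means tries to minimize
- Lloyd algorithm for k-means clustering: alternates two steps, Centers to Clusters and Clusters to Centers
- Limitation: it makes a "hard" assignment of each point to exactly one cluster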
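
To make the two Lloyd steps concrete, here is a minimal pure-Python sketch. It is not the code from my repository; the function names are made up for illustration, points are assumed to be tuples of numbers, and I assume the centers are initialized with farthest first traversal starting from the first data point (the traversal normally starts from an arbitrary point, and Lloyd can also start from random centers).

```python
def squared_distance(p, q):
    """Squared Euclidean distance between two points of equal dimension."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def farthest_first_traversal(data, k):
    """Pick k centers: start from the first point (an assumption made for
    determinism), then repeatedly add the point farthest from the centers
    chosen so far."""
    centers = [data[0]]
    while len(centers) < k:
        centers.append(
            max(data, key=lambda p: min(squared_distance(p, c) for c in centers))
        )
    return centers


def distortion(data, centers):
    """Squared error distortion: mean squared distance to the nearest center."""
    return sum(min(squared_distance(p, c) for c in centers) for p in data) / len(data)


def lloyd(data, k, max_iter=100):
    """Alternate Centers to Clusters and Clusters to Centers until stable."""
    centers = farthest_first_traversal(data, k)
    for _ in range(max_iter):
        # Centers to Clusters: assign each point to its nearest center.
        clusters = [[] for _ in range(k)]
        for p in data:
            nearest = min(range(k), key=lambda i: squared_distance(p, centers[i]))
            clusters[nearest].append(p)
        # Clusters to Centers: move each center to its cluster's center of
        # gravity; an empty cluster keeps its old center.
        new_centers = [
            tuple(sum(coord) / len(cluster) for coord in zip(*cluster))
            if cluster else centers[i]
            for i, cluster in enumerate(clusters)
        ]
        if new_centers == centers:
            break
        centers = new_centers
    return centers


data = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
centers = lloyd(data, 2)
print(centers, distortion(data, centers))
```

On this toy data the two centers settle on the centers of gravity of the two obvious groups. Lloyd only finds a local optimum in general, so the result depends on the initialization.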

Soft k-means clustering:
- Expectation Maximization Algorithm: starts with a random choice of Parameters. It then alternates between the E-step, in which we compute a responsibility matrix HiddenMatrix for Data given Parameters, and the M-step, in which we re-estimate Parameters using HiddenMatrix.
- The two steps: Centers to Soft Clusters (E-step) and Soft Clusters to Centers (M-step)
- SoftKMeans Problem
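
The E-step and M-step above can be sketched as follows. This is only an illustration under assumptions: I assume the responsibility of center i for point j is proportional to exp(-beta * distance), where beta is a stiffness parameter that these notes do not mention, and the function names are hypothetical. It needs Python 3.8+ for math.dist.

```python
import math


def e_step(data, centers, beta):
    """Centers to Soft Clusters: hidden[i][j] is the responsibility of
    center i for data point j. Each column is normalized to sum to 1."""
    hidden = [[math.exp(-beta * math.dist(c, p)) for p in data] for c in centers]
    for j in range(len(data)):
        total = sum(row[j] for row in hidden)
        for row in hidden:
            row[j] /= total
    return hidden


def m_step(data, hidden):
    """Soft Clusters to Centers: each center becomes the mean of all points,
    weighted by that center's responsibilities."""
    dims = len(data[0])
    centers = []
    for row in hidden:
        weight = sum(row)
        centers.append(tuple(
            sum(r * p[d] for r, p in zip(row, data)) / weight
            for d in range(dims)
        ))
    return centers


def soft_k_means(data, centers, beta=1.0, steps=100):
    """Run a fixed number (at least 1) of E-step / M-step rounds."""
    for _ in range(steps):
        hidden = e_step(data, centers, beta)
        centers = m_step(data, hidden)
    return centers, hidden


data = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
centers, hidden = soft_k_means(data, [(0, 0), (10, 10)])
print(centers)
```

A larger beta pushes the responsibilities toward a hard assignment. With very large beta or very distant points the exponentials can underflow to zero, which this sketch does not guard against.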

Hierarchical Clustering
- How to use a distance matrix to partition genes into clusters
- With the average distance between clusters, hierarchical clustering is UPGMA in disguise
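
Here is a rough sketch of the idea, assuming the input is a symmetric distance matrix and that we simply stop merging once k clusters remain instead of building the full tree. The function name and the stopping rule are my own choices for illustration. The distance between two clusters is the average pairwise distance between their elements, which is the same rule UPGMA uses.

```python
def hierarchical_clustering(dist, k):
    """Repeatedly merge the two closest clusters until k clusters remain.

    dist is a symmetric n x n distance matrix; clusters are lists of row
    indices into that matrix."""
    clusters = [[i] for i in range(len(dist))]

    def average_distance(a, b):
        # Average pairwise distance between the elements of two clusters.
        return sum(dist[i][j] for i in a for j in b) / (len(a) * len(b))

    while len(clusters) > k:
        # Find the pair of clusters with the smallest average distance.
        x, y = min(
            ((a, b) for a in range(len(clusters)) for b in range(a + 1, len(clusters))),
            key=lambda pair: average_distance(clusters[pair[0]], clusters[pair[1]]),
        )
        merged = clusters[x] + clusters[y]
        clusters = [c for idx, c in enumerate(clusters) if idx not in (x, y)] + [merged]
    return clusters


dist = [
    [0, 1, 9, 10],
    [1, 0, 8, 9],
    [9, 8, 0, 2],
    [10, 9, 2, 0],
]
print(hierarchical_clustering(dist, 2))
```

Recomputing every average from the original matrix at each round keeps the sketch short but is slow; a real implementation would update the distance matrix after each merge.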

My Python implementations of all the algorithms are available on GitHub: https://github.com/aprilchunyuzhao/BioinformaticsFromCoursera
