Progress in Bioinformatics World: Notes of Bioinformatics V

Notes on Genomic Data Science and Clustering (Bioinformatics V)

Chapter 9: How Did Yeast Become a Wine Maker? (Clustering Algorithms)

In this chapter, we learned about the clustering algorithms used in bioinformatics.

Some concepts:

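Gene expression analysis.

Good Clustering Problem: partition the data points so that points in the same cluster are close to each other and points in different clusters are far apart.

k-Means Clustering Problem: an optimization problem that asks for k centers minimizing the squared error distortion.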

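- Farthest First Traversal: a greedy heuristic that picks each new center as the point farthest from the centers chosen so far

- Squared error distortion: the mean squared distance from each point to its nearest center

- Lloyd Algorithm for k-means clustering: alternates between two steps, Centers to Clusters and Clusters to Centers.

- Limitation: it makes a "hard" assignment of each point to exactly one cluster.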
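The k-means ideas in this list can be sketched in a few lines of Python. This is only a minimal illustration, not the code from my repository: the function names are made up for this post, points are assumed to be tuples of numbers, and Lloyd is initialized with the first k points purely for simplicity (a random choice is more common).

```python
import math


def distance(p, q):
    """Euclidean distance between two points given as tuples."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


def farthest_first_traversal(data, k):
    """Start from the first point, then repeatedly add the point
    farthest from the centers chosen so far."""
    centers = [data[0]]
    while len(centers) < k:
        farthest = max(data, key=lambda p: min(distance(p, c) for c in centers))
        centers.append(farthest)
    return centers


def distortion(data, centers):
    """Squared error distortion: mean squared distance to the nearest center."""
    return sum(min(distance(p, c) for c in centers) ** 2 for p in data) / len(data)


def lloyd(data, k, max_iter=100):
    """Lloyd algorithm, initialized with the first k points (an assumption)."""
    centers = [tuple(p) for p in data[:k]]
    for _ in range(max_iter):
        # Centers to Clusters: assign each point to its nearest center.
        clusters = [[] for _ in centers]
        for p in data:
            nearest = min(range(len(centers)), key=lambda i: distance(p, centers[i]))
            clusters[nearest].append(p)
        # Clusters to Centers: move each center to its cluster's center of gravity.
        # An empty cluster keeps its old center.
        new_centers = [
            tuple(sum(coords) / len(cluster) for coords in zip(*cluster)) if cluster else center
            for cluster, center in zip(clusters, centers)
        ]
        if new_centers == centers:
            break  # converged: no center moved
        centers = new_centers
    return centers


points = [(0, 0), (10, 0), (0, 1), (10, 1)]
centers = lloyd(points, 2)
print(centers, distortion(points, centers))
```

Lloyd only finds a local optimum, so the result depends on the initial centers; running it from several starting points and keeping the lowest distortion is a common workaround.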

Soft k-means clustering:

- Expectation Maximization Algorithm: starts with a random choice of Parameters. It then alternates between the E-step, in which we compute a responsibility matrix HiddenMatrix for Data given Parameters, and the M-step, in which we re-estimate Parameters using HiddenMatrix.

- Centers to Soft Clusters (E-step) AND Soft Clusters to Centers (M-step)

- SoftKMeans Problem
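The two soft k-means steps can be sketched as below. This is a rough illustration under my own assumptions: responsibilities are computed with the exponential weighting exp(-beta * distance), where beta is a "stiffness" parameter, the initial centers are passed in by the caller instead of being chosen at random, and the function names are made up for this post.

```python
import math


def distance(p, q):
    """Euclidean distance between two points given as tuples."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


def e_step(data, centers, beta):
    """Centers to Soft Clusters: hidden[i][j] is the responsibility
    of center i for data point j. Each column sums to 1."""
    hidden = [[math.exp(-beta * distance(p, c)) for p in data] for c in centers]
    for j in range(len(data)):
        total = sum(row[j] for row in hidden)
        for row in hidden:
            row[j] /= total
    return hidden


def m_step(data, hidden):
    """Soft Clusters to Centers: each center becomes the average of all
    points, weighted by that center's responsibilities."""
    dims = len(data[0])
    centers = []
    for row in hidden:
        weight = sum(row)
        centers.append(
            tuple(sum(r * p[d] for r, p in zip(row, data)) / weight for d in range(dims))
        )
    return centers


def soft_k_means(data, centers, beta, iterations=100):
    """Alternate E-step and M-step for a fixed number of iterations."""
    for _ in range(iterations):
        hidden = e_step(data, centers, beta)
        centers = m_step(data, hidden)
    return centers, e_step(data, centers, beta)


data = [(0.0,), (1.0,), (10.0,), (11.0,)]
centers, hidden = soft_k_means(data, [(0.0,), (11.0,)], beta=1.0)
print(centers)
```

A larger beta makes the assignment "harder" (each point is dominated by its nearest center), while a small beta spreads responsibility more evenly across centers.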

Hierarchical Clustering

- How to use a distance matrix to partition genes into clusters

- Hierarchical clustering with the average distance between clusters is UPGMA in disguise
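As a rough sketch of the idea, the function below repeatedly merges the two closest clusters, using the average pairwise distance between clusters (the same rule UPGMA uses), and stops when k clusters remain. Stopping at k clusters instead of building the full tree is a simplification of mine, and the function name and the list-of-lists matrix format are assumptions for this example.

```python
def hierarchical_clustering(dist, k):
    """Merge the two closest clusters (average linkage) until k remain.

    dist is a symmetric n x n distance matrix given as a list of lists;
    clusters are returned as lists of row indices.
    """
    clusters = [[i] for i in range(len(dist))]

    def average_distance(a, b):
        # Average of all pairwise distances between members of a and b.
        return sum(dist[i][j] for i in a for j in b) / (len(a) * len(b))

    while len(clusters) > k:
        # Find the pair of clusters with the smallest average distance.
        x, y = min(
            ((a, b) for a in range(len(clusters)) for b in range(a + 1, len(clusters))),
            key=lambda pair: average_distance(clusters[pair[0]], clusters[pair[1]]),
        )
        merged = clusters[x] + clusters[y]
        clusters = [c for i, c in enumerate(clusters) if i not in (x, y)] + [merged]
    return clusters


matrix = [
    [0, 1, 9, 10],
    [1, 0, 8, 9],
    [9, 8, 0, 2],
    [10, 9, 2, 0],
]
print(hierarchical_clustering(matrix, 2))
```

Recomputing the averages from the original matrix at every step keeps the code short; a real implementation would update the distance matrix after each merge instead.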

The Python implementations of all the algorithms are available on my GitHub: https://github.com/aprilchunyuzhao/BioinformaticsFromCoursera.

Friday, January 8, 2016
