#机器学习# 文章列表 浅入浅出:PageRank算法 使用 TextRank 算法为文本生成关键字和摘要 基于物品的协同过滤 如何使用MapReduce实现基于物品的协同过滤(1) 如何使用MapReduce实现基于物品的协同过滤(2) 浅入浅出:K近邻算法 使用mahout下的朴素贝叶斯分类器对新闻分类 使用Affinity Propagation进行聚类 K-medoids聚类 矩阵分解在推荐系统中的应用:NMF和经典SVD实战 使用特征递归消除筛选特征 如何分配权重 比较NMF、PCA和VQ 方差和协方差 基于SVD的协同过滤 逻辑斯谛回归代码实现 隐语义模型和NMF(非负矩阵分解) 使用PCA处理MNIST数据集 使用GBDT选取特征 基于贝叶斯的文本分类系统的数据库设计 在hadoop1.2.1上安装mahout 0.9 Hadoop 2.4 实现Kmeans聚类算法 在Iris数据集上对比PCA、LDA、NMF 基于贝叶斯的文本分类实战 单层决策树 Logistic regression(逻辑斯蒂回归) 基于用户的协同过滤 词袋模型与文档-词矩阵 如何实现拼音与汉字的互相转换 梯度下降法 如何判定相似度 MovieLens数据集介绍 基于KNN的文本分类实战 Jasper文本分类系列博客阅读摘录 使用 Mean Shift进行聚类 朴素贝叶斯的三个常用模型:高斯、多项式、伯努利 使用决策树处理iris数据集 浅入浅出:从Kmeans到Kmeans++ 如何持久化scikit-learn中训练好的模型 浅入浅出:DBSCAN聚类算法(1) 浅入浅出:DBSCAN聚类算法(2) 2015阿里移动推荐算法比赛第一赛季总结 爬山算法 使用朴素贝叶斯分类器划分邮件 层次聚类 基于MapReduce的频繁项集挖掘 搜狗实体关系提取比赛

使用Affinity Propagation进行聚类


#机器学习#


2014-09-21

Affinity Propagation,可以翻译为近邻传播

它的思路是通过迭代的方法计算出一个数据点更适合和另外哪一个数据点属于一个簇。

AP的论文:Clustering by Passing Messages Between Data Points

http://www.psi.toronto.edu/index.php?q=affinity%20propagation提供了相关的论文、数据集、源代码。

近邻传播聚类Affinity Propagation是一篇少有的关于AP的中文博文。

http://scikit-learn.org/stable/modules/clustering.html#affinity-propagation中如此介绍AP:

Algorithm description: The messages sent between points belong to one of two categories. The first is the responsibility [latex]r(i, k)[/latex], which is the accumulated evidence that sample k should be the exemplar for sample i. The second is the availability [latex]a(i, k)[/latex] which is the accumulated evidence that sample [latex]i[/latex] should choose sample [latex]k[/latex] to be its exemplar, and considers the values for all other samples that [latex]k[/latex] should be an exemplar. In this way, exemplars are chosen by samples if they are (1) similar enough to many samples and (2) chosen by many samples to be representative of themselves.

More formally, the responsibility of a sample [latex]k[/latex] to be the exemplar of sample [latex]i[/latex] is given by:

[latex]
r(i, k) \gets s(i,k) - max_{k' \neq k}[a(i+k') + s(i+k')]
[/latex]

Where [latex]s(i, k)[/latex] is the similarity between samples [latex]i[/latex] and [latex]k[/latex]. The availability of sample [latex]k[/latex] to be the exemplar of sample [latex]i[/latex] is given by:

[latex]
a(i, k) \gets min[0, r(k,k)+\sum_{i' \neq i, i' \neq k} r(i', k)]
[/latex]

To begin with, all values for [latex]r[/latex] and [latex]a[/latex] are set to zero, and the calculation of each iterates until convergence.

http://www.biomedcentral.com/1471-2105/10/99中如此介绍AP:

Affinity Propagation (AP) identifies cluster centers, or exemplars, from the graph, which in some sense are a representative member of the cluster. Initially, all nodes are considered as exemplars, though each node is manually assigned a "preference" that it should be chosen as an exemplar. If no prior knowledge is available on which nodes should be favored as exemplars, then all nodes can be assigned the same preference value – where the magnitude can be used to control cluster granularity. For each node i and each candidate exemplar k, AP computes the "responsibility" [latex]r(i, k)[/latex], which indicates how well suited [latex]k[/latex] is as an exemplar for i, and the "availability" [latex]a(i, k)[/latex] reflecting the evidence that [latex]i[/latex] should choose [latex]k[/latex] as an exemplar.

[latex]
r(i, k) \gets s(i,k) - max_{k' \neq k}[a(i+k') + s(i+k')]
[/latex]

[latex]
a(i, k) \gets min[0, r(k,k)+\sum_{i' \neq i, i' \neq k} r(i', k)]
[/latex]

Where the matrix [latex]s(i, k)[/latex] denotes the similarity (eg. edge weight) between the two nodes [latex]i[/latex] and [latex]k[/latex], and the diagonal of this matrix contains the preferences for each node. The above two equations are iterated until a good set of exemplars emerges. Each node [latex]i[/latex] can then be assigned to the exemplar [latex]k[/latex] which maximizes the sum [latex]a(i, k) + r(i, k)[/latex], and if [latex]i = k[/latex], then [latex]i[/latex] is an exemplar. A damping factor between 0 and 1 is used to control for numerical oscillations.

不过上面少介绍了[latex]a(k,k)[/latex]:

[latex]
a(k,k) \gets \sum_{i' \neq k} max(0, r(i', k)) [/latex]


( 本文完 )