next up previous
Next: Correspondence via co-embedding prototypes Up: Algorithm Previous: Bipartite graph coclustering

Document-keyword correspondence

Our goal is to cluster documents by identifying exclusive correlations between keywords and documents. We can think of this problem as graph editing. Edges incident on common and random keywords must be removed, otherwise the entire graph would be densely connected. Edges incident on informative keywords should be preserved, otherwise the graph would be sparsely connected. A brute force strategy would be to search through all possible addition and deletion of edges such that the resulting graph partitioning is neither too coarse nor too fine.

To make this task precise we define a correspondence function $CP(p_i,v_j) \in [0,1]$ between a prototype feature $p_i$ and video segment $v_j$ with following properties: $CP(p_i,v_j)$ should be consistent through transitivity, i.e. $CP(p_i,v_j)$ should be high if there is a chain of correspondences linking them indirectly. We can express this consistency constraint using the notion of transitive closure. To allow continuous values in $CP(p_i,v_j)$, we introduce $\alpha$-transitive-closure: $\forall_{i_1,i_2,j_1,j_2}$ and $\forall_{\delta \in
[0,1]} [CP(p_{i_1},v_{j_1}), CP(p_{i_1},v_{j_2}), CP(p_{i_2}, v_{j_1}) >
1-\delta] \Rightarrow [CP(p_{i_2},v_{j_2}) > 1 - \alpha \delta]$. Transitive closure assures that no random or common features are in any correspondence relationship.

In addition, $CP(p_i,v_j)$ should be close to the co-occurrence $C(p_i,v_j)$, i.e. the graph editing should be minimal. We maximize

\begin{displaymath}
E_{C}(CP) = \sum_{p_i \in \mathbf{P},v_j \in \mathbf{V}} C(p_i,v_j) CP(p_i,v_j)
\end{displaymath} (1)

under certain constraints that prevent the trivial solution.


next up previous
Next: Correspondence via co-embedding prototypes Up: Algorithm Previous: Bipartite graph coclustering
Mirko Visontai 2004-05-13