CoClustering

CoClustering [1] predicts ratings by clustering users and items.

Definition

Let \(\mathcal{U} = \{u\}^m_{u=1}\) be the set of users such that \(|\mathcal{U}|=m\) and \(\mathcal{P} = \{i\}^n_{i=1}\) be the set of items such that \(|\mathcal{P}|=n\). Let \(A\) be the \(m\times n\) ratings matrix such that \(A_{ui}\) is the rating of the user \(u\) and let \(W\) be the \(m\times n\) matrix corresponding to the confidence of the ratings in \(A\). In the absence of explicit confidence information, we assume \(W_{ij} = 1\) when the rating is known and \(0\) otherwise.

Let \(\rho:\{1,\cdots,m\} \rightarrow \{1,\cdots,k\}\) and \(\gamma:\{1,\cdots,n\} \rightarrow \{1,\cdots,l\}\) denote the user and item clustering where \(k\) and \(l\) are number of user and item clusters.

\[\hat{A}_{ui}=A^{COC}_{gh}+(A^R_u-A^{RC}_g)+(A^C_i-A^{CC}_h)\]

where \(g=\rho(u)\), \(h=\gamma(i)\) and \(A^R_u\), \(A^C_i\) are the average ratings of user \(u\) and item \(i\), and \(A^{COC}_{gh}\), \(A^R_g\) and \(A^{CC}_h\) are the average ratings of the corresponding cocluster, user-cluster and item-cluster respectively, i.e.,

\[A_{g h}^{C O C}=\frac{\sum_{u^{\prime} | \rho\left(u^{\prime}\right)=g} \sum_{i^{\prime} | \gamma\left(i^{\prime}\right)=h} A_{u^{\prime} i^{\prime}}}{\sum_{u^{\prime} | \rho\left(u^{\prime}\right)=g} \sum_{i^{\prime} | \gamma\left(i^{\prime}\right)=h} W_{u^{\prime} i^{\prime}}}\]
\[A_{g}^{R C}=\frac{\sum_{u^{\prime} | \rho\left(u^{\prime}\right)=g} \sum_{i^{\prime}=1}^{n} A_{u^{\prime} i^{\prime}}}{\sum_{u^{\prime} | \rho\left(u^{\prime}\right)=g} \sum_{i^{\prime}=1}^{n} W_{u^{\prime} i^{\prime}}}\]
\[A_{h}^{C C}=\frac{\sum_{u^{\prime}=1}^{m} \sum_{i^{\prime} | \gamma\left(i^{\prime}\right)=h} A_{u^{\prime} i^{\prime}}}{\sum_{u^{\prime}=1}^{m} \sum_{i^{\prime} | \gamma\left(i^{\prime}\right)=h} W_{u^{\prime} i^{\prime}}}\]
\[A_{u}^{R}=\frac{\sum_{i^{\prime}=1}^{n} A_{u i^{\prime}}}{\sum_{i^{\prime}=1}^{n} W_{u i^{\prime}}} A_{i}^{C}=\frac{\sum_{u^{\prime}=1}^{m} A_{u^{\prime} i}}{\sum_{u^{\prime}=1}^{m} W_{u^{\prime} i}}\]

Using \(\hat A\), we can now pose the prediction of unknown ratings as a co-clustering problem where we seek to find the optimal user and item clustering \((\rho,\gamma)\) such that the approximation error of \(A\) (which is a function of \((\rho,\gamma)\)) with respect to the known ratings of \(A\) is minimized, i.e.,

\[\min _{(\rho, \gamma)} \sum_{u=1}^{m} \sum_{i=1}^{n} W_{u i}\left(A_{u i}-\hat{A}_{u i}\right)^{2}\]

Training

Input: Ratings Matrix ŸA , Non-zeros matrix ŸW , #row clusters \(l\), #col. clusters \(k\).

Output: Locally optimal co-clustering \((\rho,\gamma)\) and averages \(A^{COC}\), \(A^{RC}\), \(A^{CC}\), \(A^R\) and \(A^C\).

Method:

  • Randomly initialize \((\rho,\gamma)\).

  • repeat

    • Compute averages \(A^{COC}\), \(A^{RC}\), \(A^{CC}\), \(A^R\) and \(A^C\).

    • Update row cluster assignments

      \[\rho(u)=\underset{1 \leq g \leq k}{\operatorname{argmin}} \sum_{i=1}^{n} W_{u i} \left( A_{u i} - A_{g \gamma(j)}^{C O C} -A_{i}^{R}+A_{g}^{R C} -A_{j}^{C}+A_{\gamma(j)}^{C C} \right)^{2}, \quad 1 \leq i \leq m\]
    • Update column cluster assignments

      \[\gamma(j)=\underset{1 \leq h \leq l}{\operatorname{argmin}} \sum_{i=1}^{m} W_{i j}\left(A_{i j}-A_{\rho(i) h}^{C O C}-A_{i}^{R}+A_{\rho(i)}^{R C}-A_{j}^{C}+A_{h}^{C C} \right)^{2}, \quad 1 \leq j \leq n\]
  • until convergence

Prediction

Input: ž \(A^{COC}\), \(A^{RC}\), \(A^{CC}\), \(A^R\) , \(A^C\), \(A^{glob}\), co-clustering \((\rho,\gamma)\) user set \(\mathcal U\), item set \(\mathcal P\), user \(u\), item \(i\).

Output: Rating \(r\)

Method:

  • Case \(u\in\mathcal U\) and \(i\in\mathcal P\): // old user-old item
\[g \leftarrow \rho(i), h \leftarrow \gamma(j)\]
\[r \leftarrow A_{i}^{R}+A_{j}^{C}-A_{g}^{R C}-A_{h}^{C C}+A_{g h}^{C o C}\]
  • Case \(u\in\mathcal U\) and \(i\notin\mathcal P\): // old user-new item
\[g \leftarrow \rho(i), r \leftarrow A_{i}^{R}\]
  • Case \(u\notin\mathcal U\) and \(i\in\mathcal P\): // new user-old item
\[h \leftarrow \gamma(j), r \leftarrow A_{j}^{C}\]
  • Case \(u\notin\mathcal U\) and \(i\notin\mathcal P\): // new user-new item
\[r \leftarrow A^{g l o b}\]

References

[1]George, Thomas, and Srujana Merugu. “A scalable collaborative filtering framework based on co-clustering.” Data Mining, Fifth IEEE international conference on. IEEE, 2005.