# KNN¶

Neighbors-based models  predict perference from similar items or similar users. There are two kinds of neighbors-based models: user-based and item-based, which means whether predictions are made by similar users or items. In general, item-based model is better than user-based model since items’ characters are more consistent than users’ preferences.

Note

In this section, KNN models are introduced by user-based forms. Item-based models could be got by transposing positions of users and items.

## KNN for Explicit Feedback¶

For explicit feedback, neighbors-based models predict ratings from similar items or similar users.

### Definition¶

Items rated by two different users $$u$$ and $$v$$ are represented by $$I_u$$ and $$I_v$$. Users rated two different items $$i$$ and $$j$$ are represented by $$U_i$$ and $$U_j$$. The rating that user $$u$$ gives to item $$i$$ is $$r_{ui}$$ and the predicted rating is $$\hat r_{ui}$$.

### Similarity¶

Similarity metrics define nearest neighbors. There are several most used similarity functions:

#### Cosine¶

$\cos(u,v)=\frac{\sum\limits_{k\in|I_u\cap I_v|}r_{uk}\cdot r_{vk}}{\sqrt{\sum\limits_{k\in|I_u\cap I_v|}r_{uk}^2}\cdot\sqrt{\sum\limits_{k\in|I_u\cap I_v|}r_{vk}^2}}$

#### Pearson¶

Pearson similarity is similar to cosine similarity but ratings are subtracted by means first.

$\text{pearson}(u,v)=\frac{\sum\limits_{k\in|I_u\cap I_v|}(r_{uk}-\tilde r_u)\cdot (r_{vk}-\tilde r_v)}{\sqrt{\sum\limits_{k\in|I_u\cap I_v|}(r_{uk}-\tilde r_u)^2}\cdot\sqrt{\sum\limits_{k\in|I_u\cap I_v|}(r_{vk}-\tilde r_v)^2}}$

where $$\tilde r_u$$ is the mean of ratings rated by the user $$u$$:

$\tilde r_u = \sum_{k\in I_u} r_{uk}$

#### Mean Square Distance¶

The Mean Square Distance is

$\text{msd}(u,v)=\frac{1}{|I_u\cap I_v|}\sum_{k\in|I_u\cap I_v|}(r_{uk}-r_{vk})^2$

Then, the Mean Square Distance Similarity is

$\text{MSDSim}(u, v) = \frac{1}{\text{msd}(u, v) + 1}$

### Predict¶

A rating could be predict by k nearest neighbors $$\mathcal N_k(u)$$ (k users with top k similarities to user $$u$$).

$\hat r_{ui}=\frac{\sum_{v\in \mathcal N_k(u)}\text{sim}(u,v)r_{vi}}{\sum_{v\in \mathcal N_k(u)}\text{sim}(u,v)}$

The basic KNN prediction has some problem. There are more advanced methods which achieve higher accuracy.

#### KNN with Mean¶

Some users tend to give higher ratings but some users tend to give lower ratings. It’s resonable to substract ratings with the mean.

$\hat r_{ui}=\tilde r_u+\frac{\sum_{v\in \mathcal N_k(u)}\text{sim}(u,v)(r_{vi}-\tilde r_v)}{\sum_{v\in \mathcal N_k(u)}\text{sim}(u,v)}$

where $$\tilde r_v$$ is the mean of $$v$$-th user’s ratings.

#### KNN with Z-score¶

Different users give ratings with different scales, the solution is to standardlize ratings.

$\hat r_{ui}=\tilde r_u+\sigma(r_u)\frac{\sum_{v\in \mathcal N_k(u)}\text{sim}(u,v)\frac{r_{vj}-\tilde r_v}{\sigma(r_v)}}{\sum_{v\in \mathcal N_k(u)}\text{sim}(u,v)}$

where $$\sigma(r_v)$$ is the standard deviation of $$v$$-th user’s ratings.

#### KNN with Baseline¶

Ratings could be substract by biases from baseline model as well.

$\hat r_{ui}=b_u+\frac{\sum_{v\in \mathcal N_k(u)}\text{sim}(u,v)(r_{vi}- b_v)}{\sum_{v\in \mathcal N_k(u)}\text{sim}(u,v)}$

where $$b_u$$ is the bias comes from the baseline model $$\hat r_{ui}=b+b_u+b_i$$. The KNN model with baseline is the best model since biases are used.

## KNN for Implicit Feedback¶

For implicit feedback , neighbors-based models predict wheater a users will interact with a item from similar items or similar users.

### Definition¶

Items interacted with user $$u$$ are represented by $$I^+_u$$. Users interacted with item $$i$$ are represented by $$U^+_i$$. The confidence of predicting user $$u$$ will interact with item $$i$$ is $$\hat x_{ui}$$.

### Similarity¶

The similarity for implicit feedback is slightly different.

$c_{i, j}^{\operatorname{cosine}} :=\frac{\left|U_{i}^{+} \cap U_{j}^{+}\right|}{\sqrt{\left|U_{i}^{+}\right| \cdot\left|U_{j}^{+}\right|}}$

### Predict¶

The prediction is given by the sum of similarities.

$\hat{x}_{u i}=\sum_{l \in I_{u}^{+} \wedge l \neq i} c_{i l}$