# KNN

Neighbors-based models [1] predict preferences from similar users or similar items. There are two kinds of neighbors-based models, user-based and item-based, depending on whether predictions are made from similar users or from similar items. In general, item-based models outperform user-based models, since items' characteristics are more consistent than users' preferences.

Note

In this section, KNN models are introduced in their user-based form. Item-based models can be obtained by transposing the roles of users and items.

## KNN for Explicit Feedback

For explicit feedback, neighbors-based models predict ratings from similar items or similar users.

### Hyperparameters

Key | Hyperparameter | Type | Description | Default
---|---|---|---|---
lr | Lr | float64 | learning rate (for baseline) | 0.005
reg | Reg | float64 | regularization strength (for baseline) | 0.02
n_epochs | NEpochs | int | number of epochs (for baseline) | 20
type | Type | string | type of KNN (`basic`, `centered`, `z_score`, `baseline`) | basic
user_based | UserBased | bool | user-based if true, otherwise item-based | true
similarity | Similarity | string | similarity metric (`pearson`, `cosine`, `msd`) | msd
k | K | int | number of neighbors | 40
min_k | MinK | int | minimum number of neighbors | 1
shrinkage | Shrinkage | int | shrinkage strength of similarity | 100

### Definition

Items rated by two different users \(u\) and \(v\) are represented by \(I_u\) and \(I_v\). Users who rated two different items \(i\) and \(j\) are represented by \(U_i\) and \(U_j\). The rating that user \(u\) gives to item \(i\) is \(r_{ui}\), and the predicted rating is \(\hat r_{ui}\).

### Similarity

Similarity metrics define the nearest neighbors. The most commonly used similarity functions are:

#### Cosine

\[\text{cosine}(u, v) = \frac{\sum_{i \in I_u \cap I_v} r_{ui} r_{vi}}{\sqrt{\sum_{i \in I_u \cap I_v} r_{ui}^2} \sqrt{\sum_{i \in I_u \cap I_v} r_{vi}^2}}\]

#### Pearson

Pearson similarity is similar to cosine similarity, except that ratings are mean-centered first:

\[\text{pearson}(u, v) = \frac{\sum_{i \in I_u \cap I_v} (r_{ui} - \tilde r_u)(r_{vi} - \tilde r_v)}{\sqrt{\sum_{i \in I_u \cap I_v} (r_{ui} - \tilde r_u)^2} \sqrt{\sum_{i \in I_u \cap I_v} (r_{vi} - \tilde r_v)^2}}\]

where \(\tilde r_u\) is the mean of the ratings given by user \(u\):

\[\tilde r_u = \frac{1}{|I_u|} \sum_{i \in I_u} r_{ui}\]

#### Mean Square Distance

The *Mean Square Distance* is

\[\text{msd}(u, v) = \frac{1}{|I_u \cap I_v|} \sum_{i \in I_u \cap I_v} (r_{ui} - r_{vi})^2\]

Then, the *Mean Square Distance Similarity* is

\[\text{msd\_sim}(u, v) = \frac{1}{\text{msd}(u, v) + 1}\]
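As a concrete illustration, the three metrics can be sketched in a few lines of Python. The ratings below are made-up toy data, and the helper names are hypothetical, not part of any library:

```python
# Hypothetical ratings from two users; keys are item IDs (toy data).
from math import sqrt

ratings_u = {"a": 4.0, "b": 5.0, "c": 3.0}
ratings_v = {"a": 5.0, "b": 4.0, "d": 2.0}

common = set(ratings_u) & set(ratings_v)  # I_u ∩ I_v = {"a", "b"}

def cosine(ru, rv, items):
    # Cosine similarity over the items both users rated.
    num = sum(ru[i] * rv[i] for i in items)
    den = sqrt(sum(ru[i] ** 2 for i in items)) * sqrt(sum(rv[i] ** 2 for i in items))
    return num / den

def pearson(ru, rv, items):
    # Subtract each user's mean rating (over all their ratings), then cosine.
    mu = sum(ru.values()) / len(ru)
    mv = sum(rv.values()) / len(rv)
    cu = {i: ru[i] - mu for i in items}
    cv = {i: rv[i] - mv for i in items}
    return cosine(cu, cv, items)

def msd_similarity(ru, rv, items):
    # Mean square distance over common items, mapped into (0, 1].
    msd = sum((ru[i] - rv[i]) ** 2 for i in items) / len(items)
    return 1.0 / (msd + 1.0)
```

Note that cosine and Pearson are computed only over the common items \(I_u \cap I_v\), while the Pearson means are taken over each user's full rating history.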

### Predict

A rating can be predicted from the \(k\) nearest neighbors \(\mathcal N_k(u)\) (the \(k\) users with the highest similarities to user \(u\)):

\[\hat r_{ui} = \frac{\sum_{v \in \mathcal N_k(u)} \text{sim}(u, v) \, r_{vi}}{\sum_{v \in \mathcal N_k(u)} \text{sim}(u, v)}\]
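The similarity-weighted average can be sketched directly (the neighbor similarities and ratings below are toy numbers, not real data):

```python
# Each pair is (similarity to user u, that neighbor's rating of item i).
neighbors = [
    (0.9, 5.0),
    (0.5, 3.0),
    (0.1, 1.0),
]

# Weighted average of neighbor ratings, weighted by similarity.
num = sum(s * r for s, r in neighbors)
den = sum(s for s, _ in neighbors)
prediction = num / den
```

In practice the `min_k` hyperparameter guards this computation: if fewer than `min_k` neighbors rated the item, the prediction falls back to a default instead of the weighted average.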

The basic KNN prediction has a weakness: it ignores differences in users' rating scales. The following variants address this and achieve higher accuracy.

#### KNN with Mean

Some users tend to give higher ratings while others tend to give lower ratings. It is reasonable to subtract each user's mean rating first:

\[\hat r_{ui} = \tilde r_u + \frac{\sum_{v \in \mathcal N_k(u)} \text{sim}(u, v) (r_{vi} - \tilde r_v)}{\sum_{v \in \mathcal N_k(u)} \text{sim}(u, v)}\]

where \(\tilde r_v\) is the mean of user \(v\)'s ratings.

#### KNN with Z-score

Different users also rate on different scales; the solution is to standardize ratings:

\[\hat r_{ui} = \tilde r_u + \sigma(r_u) \frac{\sum_{v \in \mathcal N_k(u)} \text{sim}(u, v) (r_{vi} - \tilde r_v) / \sigma(r_v)}{\sum_{v \in \mathcal N_k(u)} \text{sim}(u, v)}\]

where \(\sigma(r_v)\) is the standard deviation of user \(v\)'s ratings.
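The mean-centered and z-score variants can be sketched as small functions; the argument layouts are hypothetical, chosen just to mirror the formulas above:

```python
def mean_centered_predict(mean_u, neighbors):
    """KNN with mean. neighbors: list of (sim, r_vi, mean_v)."""
    num = sum(s * (r - m) for s, r, m in neighbors)
    den = sum(s for s, _, _ in neighbors)
    return mean_u + num / den

def z_score_predict(mean_u, std_u, neighbors):
    """KNN with z-score. neighbors: list of (sim, r_vi, mean_v, std_v)."""
    # Each neighbor's deviation is rescaled to z-scores, then mapped back
    # onto user u's scale via std_u.
    num = sum(s * (r - m) / sd for s, r, m, sd in neighbors)
    den = sum(s for s, _, _, _ in neighbors)
    return mean_u + std_u * num / den
```

For example, a single neighbor with similarity 1.0 who rated the item one point above their own mean of 4.0 pushes a user with mean 4.0 to a prediction of 5.0 under mean-centering, but only 4.5 under z-score if that neighbor's standard deviation is 2.0.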

#### KNN with Baseline

Ratings can also be centered by the biases from the baseline model:

\[\hat r_{ui} = b_{ui} + \frac{\sum_{v \in \mathcal N_k(u)} \text{sim}(u, v) (r_{vi} - b_{vi})}{\sum_{v \in \mathcal N_k(u)} \text{sim}(u, v)}\]

where \(b_{ui} = b + b_u + b_i\) comes from the baseline model, with global bias \(b\), user bias \(b_u\), and item bias \(b_i\). KNN with baseline usually achieves the best accuracy among these variants, since both user and item biases are taken into account.
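Structurally this is the same weighted average, centered by baseline predictions instead of user means. A minimal sketch, with a hypothetical argument layout and toy bias values:

```python
def baseline_predict(b_ui, neighbors):
    """KNN with baseline.

    b_ui: baseline prediction b + b_u + b_i for the target pair.
    neighbors: list of (sim, r_vi, b_vi), where b_vi is the baseline
    prediction for neighbor v on item i.
    """
    num = sum(s * (r - b) for s, r, b in neighbors)
    den = sum(s for s, _, _ in neighbors)
    return b_ui + num / den
```

The biases themselves are fitted beforehand by the baseline model (hence the `lr`, `reg`, and `n_epochs` hyperparameters, which apply only to this variant).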

## KNN for Implicit Feedback

For implicit feedback [2], neighbors-based models predict whether a user will interact with an item from similar users or similar items.

### Hyperparameters

There are no hyperparameters for implicit version KNN.

### Definition

Items that user \(u\) has interacted with are represented by \(I^+_u\). Users who have interacted with item \(i\) are represented by \(U^+_i\). The confidence that user \(u\) will interact with item \(i\) is \(\hat x_{ui}\).

### Similarity

The similarity for implicit feedback is slightly different, since there are no rating values — it is the cosine similarity between binary interaction vectors:

\[\text{sim}(u, v) = \frac{|I^+_u \cap I^+_v|}{\sqrt{|I^+_u| |I^+_v|}}\]

### Predict

The prediction is given by the sum of similarities to users who interacted with the item:

\[\hat x_{ui} = \sum_{v \in U^+_i} \text{sim}(u, v)\]
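The whole implicit pipeline fits in a few lines. A minimal sketch over made-up interaction sets (the function names and data are hypothetical):

```python
from math import sqrt

# Hypothetical implicit feedback: user -> set of item IDs they interacted with.
interactions = {
    "u": {"a", "b"},
    "v": {"a", "b", "c"},
    "w": {"c", "d"},
}

def implicit_sim(iu, iv):
    # Cosine similarity over binary interaction vectors:
    # |I_u ∩ I_v| / sqrt(|I_u| * |I_v|).
    inter = len(iu & iv)
    return inter / sqrt(len(iu) * len(iv)) if inter else 0.0

def score(user, item, data):
    # Confidence x_ui: sum similarities of other users who interacted with item.
    return sum(
        implicit_sim(data[user], iv)
        for other, iv in data.items()
        if other != user and item in iv
    )
```

Here `score("u", "c", interactions)` is positive because user `v`, who shares two items with `u`, interacted with `c`, while `score("u", "d", interactions)` is zero because only the dissimilar user `w` interacted with `d`.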

## References

[1] Desrosiers, Christian, and George Karypis. "A comprehensive survey of neighborhood-based recommendation methods." Recommender systems handbook. Springer, Boston, MA, 2011. 107-144.

[2] Rendle, Steffen, et al. "BPR: Bayesian personalized ranking from implicit feedback." Proceedings of the twenty-fifth conference on uncertainty in artificial intelligence. AUAI Press, 2009.