Suppose we want to compute K hidden or latent features. Our task is to find out the matrices PU x K and QD x K U – Users, D – Movies such that P x QT approximates R which is the rating matrix. Now, each row of P will represent strength of association between user and the feature while each row of Q represents the same strength w. r.
t. the movie. To get the rating of a movie dj rated by user ui, we can calculate the dot product of 2 vectors corresponding to ui and dj All we need to do now is calculate P and Q matrices. We use gradient descent algorithm for doing this. The objective is to minimize the squared error between the actual rating and the one estimated by P and Q. The squared error is given by the following equation.
FM solves the problem of considering pairwise feature interactions. It allows us to train, based on reliable information latent features from every pairwise combination of features in the model. FM also allows us to do this in an efficient way both in terms of time and space complexity.