Information Filtering via a Scaling-Based Function

Tian Qiu; Zi-Ke Zhang; Guang Chen

doi:10.1371/journal.pone.0063531

Abstract

Finding a universal description of algorithm optimization is one of the key challenges in personalized recommendation. In this article, for the first time, we introduce a scaling-based algorithm (SCL), independent of the recommendation list length and built on a hybrid algorithm of heat conduction and mass diffusion, by finding the scaling function that relates the tunable parameter to the object average degree. The optimal value of the tunable parameter can be abstracted from the scaling function and is heterogeneous across individual objects. Experimental results on three real datasets, Netflix, MovieLens and RYM, show that the SCL is highly accurate in recommendation. More importantly, compared with a number of excellent algorithms, including the mass diffusion method, the original hybrid method, and an improved version of the hybrid method, the SCL algorithm remarkably improves personalized recommendation in three other aspects: solving the accuracy-diversity dilemma, presenting high novelty, and addressing the key challenge of the cold start problem.

Citation: Qiu T, Zhang Z-K, Chen G (2013) Information Filtering via a Scaling-Based Function. PLoS ONE 8(5): e63531. https://doi.org/10.1371/journal.pone.0063531

Editor: Danilo Roccatano, Jacobs University Bremen, Germany

Received: December 3, 2012; Accepted: April 2, 2013; Published: May 17, 2013

Copyright: © 2013 Qiu et al. This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.

Funding: The work was supported by the National Natural Science Foundation of China (grant nos. 11175079 and 11105024). The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.

Competing interests: The authors have declared that no competing interests exist.

Introduction

The growth of available information allows people to enjoy a richer life. However, it also puts people in the difficult position of finding what they actually prefer: for example, selecting a satisfactory dress among many brands, or finding an interesting book in a sea of books. Recommendation engines have emerged as a powerful tool to help people cope with information overload [1]. With the growing need for personalized recommendation, developing efficient recommendation methods has become one of the central scientific problems.

A great many algorithms have been proposed and have led to considerable progress. Examples include collaborative filtering (CF) algorithms [2], [3], which can be further divided into memory-based [4]–[6] and model-based methods [7]–[10], content-based algorithms [11]–[14], and related extensive studies [15]–[21]. Recently, benefiting from the fruitful achievements of complexity theory, complex-network-based recommendation algorithms have been proposed [21], [22], opening a promising direction for personalized recommendation [23]–[35]. Meanwhile, concepts from traditional physics have been introduced into algorithm design, e.g., mass diffusion [24], [28] and heat conduction [23], [28], which greatly improve recommendation accuracy and diversity, respectively.

Among these numerous physics-inspired recommendation algorithms, a representative work is the hybrid algorithm of heat conduction and mass diffusion (HHP) [28]. Generally, improving recommendation accuracy tends to reduce recommendation diversity. However, personalized recommendation calls for an engine that is not only accurate but also diverse and personalized. While improving recommendation accuracy, the HHP method simultaneously raises recommendation diversity. It therefore contributes greatly to solving the long-standing accuracy-diversity dilemma of network-based recommender systems. Inspired by this work, many methods have been proposed in various disciplines, such as integrated weighted tags [36] and drug-target prediction [37]. A promising direction of improvement is to consider the heterogeneity of users or objects [38], which might lead to more personalized recommendations that match individual tastes.

However, the performance of many algorithms is controlled by some ‘tunable parameter’, and a challenge common to all of them is how to find its optimal value. So far, most algorithms use single-evaluator parameter selection: they choose the optimal value of the tunable parameter according to the recommendation performance measured by one evaluator [28], [35], [39], [40]. For instance, one can take as optimal the value of the tunable parameter at which the system reaches its best recommendation accuracy. Nevertheless, different recommendation goals may favor different evaluators. Consequently, a challenging question emerges: which evaluator should serve as the reference when searching for the optimal value of the tunable parameter? Recommendation accuracy is widely accepted as the most important evaluator in personalized recommendation. Even so, the cold start problem and recommendation diversity and novelty also attract central interest [28], [41], [42]. The cold start problem refers to how to recommend new objects, or how to recommend interesting objects to new users, given the lack of activity records. Diversity and novelty also significantly mark the vitality of a system. Clearly, different recommendation goals rarely lead to the same optimal value of the tunable parameter. Moreover, even when only recommendation accuracy is evaluated, different indicators might reach different optimal values. For example, the ranking score [24] and the precision [43] are both used to evaluate recommendation accuracy. Yet the optimal values of the tunable parameter obtained from them are usually not consistent for the same method.

Motivated by the difficulty of choosing a proper reference for algorithm optimization, in the present paper we introduce, for the first time, a scaling-based algorithm (SCL) independent of the recommendation list length, based on the hybrid method of heat conduction and mass diffusion (HHP). Testing our algorithm on three real datasets, Netflix, MovieLens and RYM, we report two results:

A single curve independent of the recommendation list length is obtained by rescaling the tunable parameter and the object average degree, and it is described by a scaling function. The optimal value of the tunable parameter can be abstracted from the scaling function and is heterogeneous across individual objects.
The present algorithm shows high accuracy in recommendation. More importantly, it greatly improves personalized recommendation in three other challenging aspects: solving the accuracy-diversity dilemma, presenting high novelty, and solving the cold start problem.

The remainder of this paper is organized as follows. In the next section, we detail the bipartite network and the investigated recommendation algorithms. Popular indicators for evaluating recommendation performance are introduced in the Metrics section, followed by a description of the datasets in the Data section. In the Results and Discussion section, we compare the present algorithm with three others: the highly accurate mass diffusion algorithm, the original hybrid method, which is both accurate and diverse, and an improved version of the hybrid method that handles the cold start problem well. Finally, we draw conclusions.

Materials and Methods

A recommendation system can be described by a bipartite network composed of a user set and an object set. The user set contains $m$ users, $U=\{u_1, u_2, \ldots, u_m\}$, and the object set contains $n$ objects, $O=\{o_1, o_2, \ldots, o_n\}$. If an object $o_j$ is collected by a user $u_i$, a link is added between them. Users and objects are linked by the adjacency matrix $A=\{a_{ij}\}_{m\times n}$: if the object $o_j$ is collected by the user $u_i$, then $a_{ij}=1$; otherwise, $a_{ij}=0$.

In the following algorithms, a so-called “resource” is assigned to objects. For a particular user $u_i$, objects first receive an initial resource vector $\mathbf{f}$: if an object $o_j$ is collected by $u_i$, its initial resource is 1, otherwise 0. That is, for the user $u_i$, the initial resource of the object $o_j$ equals the adjacency matrix element, $f_j = a_{ij}$. After a resource reallocation process via a transformation matrix $W$, the objects obtain the final resource $\mathbf{f}' = W\mathbf{f}$. For each user, his/her uncollected objects are ranked in decreasing order of final resource, and the top $L$ objects are recommended. The form of the transformation matrix $W$, i.e., how the resources are redistributed, therefore plays a key role in the recommendation process.
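The generic recommendation step can be sketched in a few lines of plain Python. This is a minimal illustration under stated assumptions, not the authors' code. Matrices are lists of rows, and the function names `final_resource` and `recommend_top_l` are hypothetical. Ties in the final resource are broken by object index, an arbitrary choice the text does not specify.

```python
from typing import List

Matrix = List[List[float]]

def final_resource(W: Matrix, f: List[float]) -> List[float]:
    """Compute f' = W f for an n x n transformation matrix W."""
    n = len(f)
    return [sum(W[i][j] * f[j] for j in range(n)) for i in range(n)]

def recommend_top_l(a_row: List[int], W: Matrix, L: int) -> List[int]:
    """Return the indices of the top-L uncollected objects for one user.

    a_row is the user's row of the adjacency matrix A, so the initial
    resource is f_j = a_ij.
    """
    f_final = final_resource(W, [float(a) for a in a_row])
    uncollected = [j for j, a in enumerate(a_row) if a == 0]
    # Decreasing final resource; ties broken by object index (assumption).
    uncollected.sort(key=lambda j: (-f_final[j], j))
    return uncollected[:L]

# Hand-made 4 x 4 transformation matrix, for illustration only.
W_toy = [
    [0.5, 0.5, 0.0, 0.0],
    [0.2, 0.3, 0.5, 0.0],
    [0.0, 0.1, 0.9, 0.0],
    [0.25, 0.25, 0.25, 0.25],
]
print(recommend_top_l([1, 0, 0, 1], W_toy, L=1))  # [1]
```

Any of the transformation matrices below can be passed in as `W`.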

PBS and HTS Methods

The mass-diffusion-based algorithm, i.e., the algorithm based on the probability spreading (PBS) process, has been reported to be highly accurate [24]. Figure 1 (a) illustrates its resource reallocation process. Initially, the four objects are assigned resources. First, each object distributes its resource to its neighboring users with equal probability. Consider, for example, the particular user indicated by the solid circle, who has two neighboring objects, the first and the fourth. Each of these two objects passes a share of its resource to this user, and together the shares give the user a total resource of 1. Then the user redistributes this total resource of 1 to his/her neighboring objects with equal probability, i.e., the first and the fourth objects each receive 1/2 from the user. By summing the resources received from all their neighboring users, the objects obtain their final resources. The resource transformation matrix of the PBS is

$$W^{PBS}_{ij} = \frac{1}{k(o_j)}\sum_{l=1}^{m}\frac{a_{li}a_{lj}}{k(u_l)}, \qquad (1)$$

where $k(o_j)$ is the degree of object $o_j$ and $k(u_l)$ is the degree of user $u_l$ (the degree is the number of links owned by a user or an object). We call an object popular if it has a high degree, and cold otherwise. In the last step of the PBS, each object collects resources from all of its neighboring users, which strongly boosts the resources of high-degree objects. Hence, the PBS gives more priority to popular objects, leading to good recommendation accuracy but relatively low diversity.
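Equation (1) can be checked numerically with a small plain-Python sketch. The toy adjacency matrix (3 users, 4 objects) is hypothetical and is not the network of Figure 1, and the function names are ours. Every object hands out all of its resource and every user passes on all that he/she receives. As a result, each column of $W^{PBS}$ sums to 1 for objects of nonzero degree, so the PBS conserves the total resource.

```python
from typing import List, Tuple

Matrix = List[List[float]]

def degrees(A: List[List[int]]) -> Tuple[List[int], List[int]]:
    """Return (user degrees k(u_l), object degrees k(o_j)) of a 0/1 matrix A."""
    k_u = [sum(row) for row in A]
    k_o = [sum(col) for col in zip(*A)]
    return k_u, k_o

def pbs_matrix(A: List[List[int]]) -> Matrix:
    """W_ij = (1 / k(o_j)) * sum_l a_li a_lj / k(u_l), as in equation (1)."""
    k_u, k_o = degrees(A)
    m, n = len(A), len(A[0])
    W = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if k_o[j] == 0:
                continue  # an object nobody collected has nothing to spread
            s = sum(A[l][i] * A[l][j] / k_u[l] for l in range(m) if k_u[l] > 0)
            W[i][j] = s / k_o[j]
    return W

# Hypothetical toy data: rows are users u_1..u_3, columns objects o_1..o_4.
A = [
    [1, 0, 0, 1],
    [1, 1, 0, 0],
    [0, 1, 1, 1],
]
W = pbs_matrix(A)
col_sums = [sum(W[i][j] for i in range(4)) for j in range(4)]
print([round(c, 10) for c in col_sums])  # [1.0, 1.0, 1.0, 1.0]
```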


Figure 1. An illustration of the resource reallocation process.

(a) for the PBS method, and (b) for the HTS method.

https://doi.org/10.1371/journal.pone.0063531.g001

By incorporating a process analogous to heat conduction, the heat conduction (HTS) method was proposed; figure 1 (b) illustrates how its resources are reallocated. First, each user gets the average resource of all his/her neighboring objects. For example, the particular user indicated by the solid circle receives 1 from the first object and 1 from the fourth object. Averaging over the two objects, the user gets a total resource of 1. Then each object in turn gets the average resource of all its neighboring users. The transformation matrix reads

$$W^{HTS}_{ij} = \frac{1}{k(o_i)}\sum_{l=1}^{m}\frac{a_{li}a_{lj}}{k(u_l)}, \qquad (2)$$

where $k(o_i)$ is the degree of object $o_i$. In the last step of the HTS, the resource of each object is divided by its degree, which strongly lowers the rank of high-degree objects. Therefore, the HTS gives more priority to cold objects, leading to good recommendation diversity at the cost of recommendation accuracy.
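A matching sketch of equation (2) uses the same hypothetical toy matrix and a function name of our choosing. It shows the complementary property of the HTS: every row of $W^{HTS}$ sums to 1 for objects of nonzero degree. Each object's final resource is therefore a weighted average of the initial resources rather than a sum.

```python
from typing import List

Matrix = List[List[float]]

def hts_matrix(A: List[List[int]]) -> Matrix:
    """W_ij = (1 / k(o_i)) * sum_l a_li a_lj / k(u_l), as in equation (2)."""
    m, n = len(A), len(A[0])
    k_u = [sum(row) for row in A]
    k_o = [sum(col) for col in zip(*A)]
    W = [[0.0] * n for _ in range(n)]
    for i in range(n):
        if k_o[i] == 0:
            continue  # an isolated object receives nothing
        for j in range(n):
            s = sum(A[l][i] * A[l][j] / k_u[l] for l in range(m) if k_u[l] > 0)
            W[i][j] = s / k_o[i]
    return W

A = [
    [1, 0, 0, 1],
    [1, 1, 0, 0],
    [0, 1, 1, 1],
]
W = hts_matrix(A)
print([round(sum(row), 10) for row in W])  # [1.0, 1.0, 1.0, 1.0]
```

The cold object $o_3$ (degree 1) gets $W_{34} = 1/3$ here, twice its PBS value of $1/6$. This is the small-scale version of the priority given to cold objects.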

Hybrid Method and an Improved Version

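To achieve both high accuracy and high diversity, a hybrid method (HHP) was proposed [28]. It elegantly combines the heat conduction and mass diffusion methods as

$$W^{HHP}_{ij} = \frac{1}{k(o_i)^{1-\lambda}\,k(o_j)^{\lambda}}\sum_{l=1}^{m}\frac{a_{li}a_{lj}}{k(u_l)}, \qquad (3)$$

where $\lambda \in [0,1]$. The HHP reduces to the HTS for $\lambda=0$ and to the PBS for $\lambda=1$. When $\lambda$ is tuned to a suitable value, the HHP method shows a clear advantage in both recommendation accuracy and diversity.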
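Equation (3) interpolates between the two previous matrices. The plain-Python sketch below uses the same hypothetical toy data and a hypothetical function name. It checks both limits: for $\lambda=1$ the columns sum to 1, as in the PBS, and for $\lambda=0$ the rows sum to 1, as in the HTS.

```python
from typing import List

Matrix = List[List[float]]

def hhp_matrix(A: List[List[int]], lam: float) -> Matrix:
    """W_ij = sum_l a_li a_lj / k(u_l) / (k(o_i)^(1-lam) * k(o_j)^lam), eq. (3)."""
    m, n = len(A), len(A[0])
    k_u = [sum(row) for row in A]
    k_o = [sum(col) for col in zip(*A)]
    W = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if k_o[i] == 0 or k_o[j] == 0:
                continue
            s = sum(A[l][i] * A[l][j] / k_u[l] for l in range(m) if k_u[l] > 0)
            W[i][j] = s / (k_o[i] ** (1 - lam) * k_o[j] ** lam)
    return W

A = [
    [1, 0, 0, 1],
    [1, 1, 0, 0],
    [0, 1, 1, 1],
]
W_pbs = hhp_matrix(A, 1.0)  # mass diffusion limit
W_hts = hhp_matrix(A, 0.0)  # heat conduction limit
print([round(sum(W_pbs[i][j] for i in range(4)), 10) for j in range(4)])  # [1.0, 1.0, 1.0, 1.0]
print([round(sum(row), 10) for row in W_hts])  # [1.0, 1.0, 1.0, 1.0]
```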

Based on the HHP method, an improved object-oriented hybrid method (OHHP) was proposed [38] to address the cold start problem. The OHHP introduces an object-degree-dependent tunable parameter, and the resource transformation matrix becomes

$$W^{OHHP}_{ij} = \frac{1}{k(o_i)^{1-\lambda_{o_i}}\,k(o_j)^{\lambda_{o_i}}}\sum_{l=1}^{m}\frac{a_{li}a_{lj}}{k(u_l)}, \qquad (4)$$

where $\lambda_{o_i} = \left(k(o_i)/k_{\max}\right)^{\theta}$, $k_{\max}$ is the maximal object degree, and $\theta$ is a tunable parameter. The OHHP thus adapts the probability spreading factor of equation (3) to the degree of each individual object. It greatly enhances the recommendation accuracy of cold objects while keeping a high recommendation accuracy over all objects.
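A sketch of an object-dependent hybrid parameter, under explicit assumptions. The exponent is attached to the receiving object $o_i$ (the row index of $W$) and computed as $(k(o_i)/k_{\max})^{\theta}$; both choices are our reading, and the function names are hypothetical. With $\theta=0$ every object gets $\lambda=1$ and the method reduces to the PBS. With $\theta>0$, cold objects get a smaller $\lambda$ and hence more heat-conduction weight.

```python
from typing import List

Matrix = List[List[float]]

def ohhp_lambdas(k_o: List[int], theta: float) -> List[float]:
    """lambda_i = (k(o_i) / k_max) ** theta for every object (assumed form)."""
    k_max = max(k_o)
    return [(k / k_max) ** theta for k in k_o]

def ohhp_matrix(A: List[List[int]], theta: float) -> Matrix:
    """HHP-style matrix whose exponent depends on the receiving object o_i."""
    m, n = len(A), len(A[0])
    k_u = [sum(row) for row in A]
    k_o = [sum(col) for col in zip(*A)]
    lam = ohhp_lambdas(k_o, theta)
    W = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if k_o[i] == 0 or k_o[j] == 0:
                continue
            s = sum(A[l][i] * A[l][j] / k_u[l] for l in range(m) if k_u[l] > 0)
            W[i][j] = s / (k_o[i] ** (1 - lam[i]) * k_o[j] ** lam[i])
    return W

A = [
    [1, 0, 0, 1],
    [1, 1, 0, 0],
    [0, 1, 1, 1],
]
print(ohhp_lambdas([1, 2, 4], theta=1.0))  # [0.25, 0.5, 1.0]
```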

Scaling-based Method

A question common to most algorithms is how to find the optimal value of the tunable parameter. For example, the optimal value obtained with the ranking score as the reference usually differs from that obtained with the diversity as the reference. Moreover, diversity performance varies with the recommendation list length. Figure 2 shows the tunable parameter $\lambda$ versus the object average degree $\langle k\rangle$ for different recommendation list lengths $L$ in the HHP algorithm. For all three real datasets, Netflix, MovieLens and RYM (detailed in the Data section), the curve of $\lambda$ versus $\langle k\rangle$ changes with the recommendation list length. That is, different recommendation list lengths give different values of the tunable parameter for the same object average degree. If a scaling behavior independent of the recommendation list length can be found, the dependence of the tunable parameter on the object average degree can be described in a universal way for all recommendation list lengths.


Figure 2. The tunable parameter $\lambda$ versus the object average degree $\langle k\rangle$.

The black, red, green, blue and dark yellow lines are for the recommendation list lengths $L = 10$, 20, 30, 40 and 50, respectively.

https://doi.org/10.1371/journal.pone.0063531.g002

In order to obtain an $L$-independent scaling function, we analytically investigate the recommendation result of the HHP algorithm. On average, the probability that a user $u_i$ collects an object $o_j$ is directly proportional to the degree $k(o_j)$, with a normalization that involves the number of objects $n$. We assume that each link is independent of the other links. For the particular user $u_i$, the resource of an object can then be calculated from the transformation matrix as

(5)

where $p(k)$ is the probability distribution function of the object degrees. As suggested in Ref. [38], empirical studies show that $p(k)$ obeys a power law, $p(k) \propto k^{-\gamma}$. One can then calculate

(6)

where $k_{\max}$ and $k_{\min}$ are respectively the maximum and the minimum of the object degrees.

Inspired by the above theoretical analysis, we propose the scaling-based (SCL) algorithm. It uses the formula in equation (6) to collapse the data onto a single curve characterized by the scaling form

(7)

where the scaling function is universal. The maximum and minimum of the object average degree are taken over the whole range of $\lambda$. We rescale the $\lambda$ and $\langle k\rangle$ axes accordingly, so that all the curves roughly collapse onto a single curve. As shown in figure 3, the major part of the curves collapses well, while fluctuations in the empirical data make a small part collapse only approximately. The procedure to obtain the optimal value of the tunable parameter in the SCL is as follows:


Figure 3. The rescaled tunable parameter versus the rescaled object average degree.

The black, red, green, blue and dark yellow lines are for the recommendation list lengths $L = 10$, 20, 30, 40 and 50, respectively.

https://doi.org/10.1371/journal.pone.0063531.g003

Fit a polynomial to the single curve to obtain a set of fitting coefficients, whose number is set by the polynomial order.
With these coefficients, compute the optimal value $\lambda_{o_i}$ of the tunable parameter for a particular object $o_i$ from the fitted polynomial. The polynomial is evaluated at a rescaled variable determined by the degree $k(o_i)$ of the examined object and by $k_{\max}$ ($k_{\min}$), the maximal (minimal) degree of all the objects.
With the optimal value $\lambda_{o_i}$ for a particular object $o_i$, calculate its resource transformation matrix as

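$$W^{SCL}_{ij} = \frac{1}{k(o_i)^{1-\lambda_{o_i}}\,k(o_j)^{\lambda_{o_i}}}\sum_{l=1}^{m}\frac{a_{li}a_{lj}}{k(u_l)}. \qquad (8)$$

Hence, the optimal value of the tunable parameter in the SCL is no longer determined by any specific evaluator. Instead, it is abstracted from the scaling function obtained from the single curve.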
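The fitting-and-evaluation steps above can be sketched in plain Python under explicit assumptions. The least-squares polynomial fit (normal equations solved by Gaussian elimination) is standard. The rescaling of the object degree is not: the hypothetical function `rescaled_degree` uses a min-max normalization to [0, 1] purely as a placeholder for the SCL rescaling. Clipping $\lambda$ to [0, 1] is likewise our assumption. The resulting per-object values would then serve as the exponents $\lambda_{o_i}$ in equation (8).

```python
from typing import List, Sequence

def polyfit(xs: Sequence[float], ys: Sequence[float], order: int) -> List[float]:
    """Least-squares fit; returns coefficients c_0..c_order (lowest power first)."""
    size = order + 1
    # Normal equations (X^T X) c = X^T y with X[r][p] = xs[r] ** p.
    M = [[sum(x ** (p + q) for x in xs) for q in range(size)] for p in range(size)]
    b = [sum(y * x ** p for x, y in zip(xs, ys)) for p in range(size)]
    for col in range(size):  # Gaussian elimination with partial pivoting
        piv = max(range(col, size), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        b[col], b[piv] = b[piv], b[col]
        for r in range(col + 1, size):
            factor = M[r][col] / M[col][col]
            for c in range(col, size):
                M[r][c] -= factor * M[col][c]
            b[r] -= factor * b[col]
    coeffs = [0.0] * size
    for r in range(size - 1, -1, -1):  # back substitution
        tail = sum(M[r][c] * coeffs[c] for c in range(r + 1, size))
        coeffs[r] = (b[r] - tail) / M[r][r]
    return coeffs

def polyval(coeffs: Sequence[float], x: float) -> float:
    return sum(c * x ** p for p, c in enumerate(coeffs))

def rescaled_degree(k: int, k_min: int, k_max: int) -> float:
    """HYPOTHETICAL placeholder rescaling: min-max normalized degree."""
    return 0.0 if k_max == k_min else (k - k_min) / (k_max - k_min)

def scl_lambdas(object_degrees: Sequence[int], coeffs: Sequence[float]) -> List[float]:
    """Per-object lambda from the fitted curve, clipped to [0, 1] (assumption)."""
    k_min, k_max = min(object_degrees), max(object_degrees)
    return [min(1.0, max(0.0, polyval(coeffs, rescaled_degree(k, k_min, k_max))))
            for k in object_degrees]

# Toy check: exact quadratic data are recovered by a second-order fit.
xs = [0.0, 0.25, 0.5, 0.75, 1.0]
ys = [0.1 + 0.5 * x + 0.3 * x * x for x in xs]
coeffs = polyfit(xs, ys, order=2)
print([round(c, 6) for c in coeffs])  # [0.1, 0.5, 0.3]
```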

Metrics

Recommendation accuracy is undoubtedly one of the most important indicators of an algorithm's performance. Complementing accuracy, recommendation diversity and novelty are important evaluators of personalized recommendation. In our study, the ranking score, precision and recall quantify recommendation accuracy, and the object average degree quantifies novelty. The inter-diversity and inner-diversity quantify recommendation diversity. Moreover, to investigate specifically the recommendation accuracy of cold objects, we further study an object-degree-dependent ranking score, precision and recall.

1. Ranking score ($\langle r\rangle$) [24].

The ranking score of the object $o_j$ for the user $u_i$ is defined as

$$r_{ij} = \frac{l_{ij}}{n - k(u_i)}, \qquad (9)$$

where $n$ is the number of all objects, $k(u_i)$ is the degree of the user $u_i$, and $l_{ij}$ is the position of the object $o_j$ among all the uncollected objects of the user $u_i$. Generally speaking, users collect the objects they prefer. Hence, the algorithm is more accurate when an object $o_j$ whose link with $u_i$ has been deleted is ranked higher among $u_i$'s uncollected objects. The average ranking score $\langle r\rangle$ is the average of $r_{ij}$ over all the deleted links. The smaller the $\langle r\rangle$, the more accurate the algorithm.
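A minimal sketch of equation (9) and of the average $\langle r\rangle$ for one user computes $r_{ij}$ for each deleted link. It assumes the final resources have already been computed. The function names and toy scores are hypothetical, and breaking ties by object index is our choice, not something the text specifies.

```python
from typing import List, Sequence

def ranking_score(scores: Sequence[float], collected: Sequence[int], j: int) -> float:
    """r_ij = l_ij / (n - k(u_i)) for a deleted (probe) object j.

    scores: final resource of all n objects for user u_i.
    collected: the user's training-set row (1 = collected).
    """
    n = len(scores)
    k_u = sum(collected)
    uncollected = [o for o in range(n) if collected[o] == 0]
    # Rank by decreasing score; ties broken by index (assumption).
    uncollected.sort(key=lambda o: (-scores[o], o))
    position = uncollected.index(j) + 1  # l_ij is 1-based
    return position / (n - k_u)

def average_ranking_score(values: List[float]) -> float:
    """<r>: mean of r_ij over all deleted links."""
    return sum(values) / len(values)

scores = [9.0, 0.4, 0.8, 0.1, 0.3]
collected = [1, 0, 0, 0, 0]
r = [ranking_score(scores, collected, 2), ranking_score(scores, collected, 3)]
print(r, average_ranking_score(r))  # [0.25, 1.0] 0.625
```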

To focus on the recommendation accuracy of cold objects, we define an object-degree-dependent ranking score $\langle r\rangle_k$ as the average ranking score over objects with the same degree $k$ [39].

2. Precision ($P$) [43].

The recommendation precision is defined as

$$P(L) = \frac{1}{m}\sum_{i=1}^{m}\frac{d_i(L)}{L}, \qquad (10)$$

where $d_i(L)$ is the number of the user $u_i$'s deleted links contained in the top $L$ recommended objects. The larger the $P$, the more accurate the algorithm.

Similarly, to better understand the recommendation accuracy of cold objects, we define an object-degree-dependent precision by

$$P_k(L) = \frac{1}{m}\sum_{i=1}^{m}\frac{d_i^{k}(L)}{L}, \qquad (11)$$

where $d_i^{k}(L)$ is the number of the user $u_i$'s deleted links to objects with degree $k$ contained in the top $L$ recommended objects.
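Equations (10) and (11) differ only in which hits are counted. A minimal sketch takes recommendation lists and test sets per user, with object degrees in a dictionary; all names and toy data are hypothetical.

```python
from typing import Dict, List, Optional, Sequence, Set

def precision(rec_lists: List[Sequence[int]], test_sets: List[Set[int]], L: int,
              k_o: Optional[Dict[int, int]] = None, k: Optional[int] = None) -> float:
    """P(L) of eq. (10); if k_o and k are given, P_k(L) of eq. (11)."""
    total = 0.0
    for rec, test in zip(rec_lists, test_sets):
        hits = [o for o in rec[:L] if o in test]
        if k is not None and k_o is not None:
            hits = [o for o in hits if k_o[o] == k]  # only degree-k objects count
        total += len(hits) / L
    return total / len(rec_lists)

rec = [[1, 2], [0, 3]]
test = [{2}, {0, 3}]
k_o = {0: 1, 1: 3, 2: 1, 3: 2}
print(precision(rec, test, 2), precision(rec, test, 2, k_o, 1))  # 0.75 0.5
```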

3. Recall ($R$) [43].

The recall is defined as

$$R(L) = \frac{1}{m}\sum_{i=1}^{m}\frac{d_i(L)}{D_i}, \qquad (12)$$

where $d_i(L)$ is the number of user $u_i$'s deleted links contained in the top $L$ recommended objects, and $D_i$ is the number of user $u_i$'s deleted links in the test set.

The object-degree-dependent recall is defined analogously as

$$R_k(L) = \frac{1}{m}\sum_{i=1}^{m}\frac{d_i^{k}(L)}{D_i^{k}}, \qquad (13)$$

where $d_i^{k}(L)$ is the number of user $u_i$'s deleted links to objects with degree $k$ contained in the top $L$ recommended objects, and $D_i^{k}$ is the number of user $u_i$'s deleted links to objects with degree $k$ in the test set.
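A companion sketch for equations (12) and (13), with hypothetical names and toy data. Many users have no deleted link to degree-$k$ objects, which would make $D_i^{k}=0$. Here such users are simply skipped when averaging, an assumption made to avoid dividing by zero.

```python
from typing import Dict, List, Optional, Sequence, Set

def recall(rec_lists: List[Sequence[int]], test_sets: List[Set[int]], L: int,
           k_o: Optional[Dict[int, int]] = None, k: Optional[int] = None) -> float:
    """R(L) of eq. (12); with k_o and k, R_k(L) of eq. (13).

    Users whose (degree-k) test set is empty are skipped (assumption).
    """
    ratios = []
    for rec, test in zip(rec_lists, test_sets):
        if k is None or k_o is None:
            relevant = set(test)
        else:
            relevant = {o for o in test if k_o[o] == k}
        if not relevant:
            continue
        d = sum(1 for o in rec[:L] if o in relevant)
        ratios.append(d / len(relevant))
    return sum(ratios) / len(ratios) if ratios else 0.0

rec = [[1, 2], [0, 3]]
test = [{2, 3}, {0}]
k_o = {0: 1, 1: 3, 2: 1, 3: 2}
print(recall(rec, test, 2), recall(rec, test, 2, k_o, 1))  # 0.75 1.0
```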

4. Novelty ($\langle k\rangle$).

The average degree of the objects in the recommendation lists is widely used to measure the novelty of a recommender system. It is defined by

$$\langle k\rangle(L) = \frac{1}{mL}\sum_{i=1}^{m}\sum_{o_j\in O_R^{i}} k(o_j), \qquad (14)$$

where $O_R^{i}$ is the set of objects in the user $u_i$'s recommendation list. A small $\langle k\rangle$ indicates that the recommended objects have, on average, low degrees. In that case more cold objects are recommended, which is more novel to users. If the recommended objects have, on average, high degrees, i.e., they are popular, the recommendation is less novel.
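Equation (14) is a plain average of object degrees over all recommendation slots. A short sketch with hypothetical names and toy degrees:

```python
from typing import Dict, List, Sequence

def novelty(rec_lists: List[Sequence[int]], k_o: Dict[int, int], L: int) -> float:
    """<k>(L) = (1 / (m L)) * sum over users of the degrees of their top-L objects."""
    m = len(rec_lists)
    return sum(k_o[o] for rec in rec_lists for o in rec[:L]) / (m * L)

k_o = {0: 1, 1: 3, 2: 1, 3: 2}
print(novelty([[1, 2], [0, 3]], k_o, L=2))  # 1.75
```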

5. Inter-diversity ($H$).

$H$ quantifies the difference between the recommendation lists of different users:

$$H(L) = \frac{2}{m(m-1)}\sum_{i<j}\left(1-\frac{Q_{ij}(L)}{L}\right), \qquad (15)$$

where $Q_{ij}(L)$ is the number of common objects in the top $L$ recommendation lists of users $u_i$ and $u_j$. Generally, the greater the $H$, the more personalized the recommendations for different users, and vice versa.
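A sketch of equation (15) that compares users pairwise (function name hypothetical):

```python
from itertools import combinations
from typing import List, Sequence

def inter_diversity(rec_lists: List[Sequence[int]], L: int) -> float:
    """H(L): mean over user pairs of 1 - Q_ij(L) / L."""
    tops = [set(rec[:L]) for rec in rec_lists]
    pairs = list(combinations(range(len(tops)), 2))
    total = sum(1 - len(tops[i] & tops[j]) / L for i, j in pairs)
    return total / len(pairs)

print(inter_diversity([[1, 2], [0, 3], [1, 3]], L=2))  # 0.6666666666666666
```

Identical lists give $H=0$, and fully disjoint lists give $H=1$.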

6. Inner-diversity ($I$).

$I$ quantifies the difference among the objects within a single user's recommendation list:

$$I(L) = 1 - \frac{1}{m}\sum_{i=1}^{m}\frac{2}{L(L-1)}\sum_{\substack{o_j,o_l\in O_R^{i}\\ j<l}} s_{jl}, \qquad (16)$$

where $s_{jl}$ is the cosine similarity between objects $o_j$ and $o_l$ in a single user's top $L$ recommendation list. Generally, the greater the $I$, the more diversified a specific user's recommendation list, and vice versa.
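A sketch of inner-diversity, assuming it is one minus the mean pairwise cosine similarity inside each user's list. This matches "greater means more diverse", but the exact form is our assumption. The object similarity used is the usual cosine $s_{jl} = \sum_u a_{uj}a_{ul}/\sqrt{k(o_j)k(o_l)}$; function names and toy data are hypothetical, and lists must hold at least two objects.

```python
import math
from itertools import combinations
from typing import List, Sequence

def cosine_similarity(A: List[List[int]], j: int, l: int) -> float:
    """s_jl = (number of users who collected both) / sqrt(k(o_j) k(o_l))."""
    common = sum(row[j] * row[l] for row in A)
    k_j = sum(row[j] for row in A)
    k_l = sum(row[l] for row in A)
    return common / math.sqrt(k_j * k_l) if k_j and k_l else 0.0

def inner_diversity(A: List[List[int]], rec_lists: List[Sequence[int]], L: int) -> float:
    """I(L) = 1 - mean pairwise similarity inside each user's top-L list (assumed form)."""
    per_user = []
    for rec in rec_lists:
        pairs = list(combinations(rec[:L], 2))
        per_user.append(sum(cosine_similarity(A, j, l) for j, l in pairs) / len(pairs))
    return 1 - sum(per_user) / len(per_user)

A = [
    [1, 0, 0, 1],
    [1, 1, 0, 0],
    [0, 1, 1, 1],
]
print(round(inner_diversity(A, [[0, 3], [1, 2]], L=2), 6))  # 0.396447
```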

Data

We test the algorithm performance on three datasets: Netflix, MovieLens and RYM. Netflix and MovieLens are movie rating systems with five-level ratings, and RYM is a music rating system with ten-level ratings. The Netflix dataset is a random sample of the huge Netflix Prize dataset. The MovieLens dataset is downloaded from the website of GroupLens Research (http://grouplens.org), and the RYM dataset from the music rating website RateYourMusic.com. Because the rating levels differ, we coarse-grain the ratings of all three datasets into a unary form. An object is regarded as collected by a user if the rating is no less than three for Netflix and MovieLens, or no less than six for RYM. Netflix contains 9999 users, 5870 objects and 815917 links; MovieLens contains 943 users, 1682 objects and 100000 links; and RYM contains 10159 users, 5250 objects and 559634 links. The sparsity of a dataset is defined as the number of links divided by the number of all possible user-object pairs. It is $1.39\times10^{-2}$, $6.30\times10^{-2}$ and $1.05\times10^{-2}$ for Netflix, MovieLens and RYM, respectively.
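The sparsity values follow directly from the dataset sizes quoted above. A one-line check in Python (function name hypothetical):

```python
def sparsity(links: int, users: int, objects: int) -> float:
    """Number of links divided by the number of possible user-object pairs."""
    return links / (users * objects)

datasets = {
    "Netflix": (815917, 9999, 5870),
    "MovieLens": (100000, 943, 1682),
    "RYM": (559634, 10159, 5250),
}
for name, (links, users, objects) in datasets.items():
    print(f"{name}: {sparsity(links, users, objects):.2e}")
# Netflix: 1.39e-02, MovieLens: 6.30e-02, RYM: 1.05e-02
```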

We divide each dataset into a training set and a test set. A randomly chosen fraction of the links is removed to form the test set, and the remaining links form the training set. The training set is used to make predictions for users, and the test set to evaluate the algorithm performance.
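A minimal sketch of the random split follows. This passage does not state the test fraction, so `test_fraction` is a free parameter; the 0.1 below is only an illustrative value, and the function name is hypothetical. A plain random split may leave some objects absent from the training set, and the text does not say how such cases are handled.

```python
import random
from typing import List, Sequence, Tuple

Link = Tuple[int, int]  # (user index, object index)

def split_links(links: Sequence[Link], test_fraction: float,
                seed: int = 0) -> Tuple[List[Link], List[Link]]:
    """Randomly move a fraction of the links to the test set."""
    rng = random.Random(seed)
    shuffled = list(links)
    rng.shuffle(shuffled)
    n_test = round(test_fraction * len(shuffled))
    return shuffled[n_test:], shuffled[:n_test]  # (training set, test set)

links = [(u, o) for u in range(10) for o in range(10)]
train, test = split_links(links, test_fraction=0.1, seed=42)
print(len(train), len(test))  # 90 10
```

Results such as those in table 1 would then be averaged over several independent splits, e.g. by varying `seed`.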

Results and Discussion

To investigate the performance of the SCL algorithm thoroughly, we compare it with three typical and excellent algorithms: the PBS, the HHP and the OHHP. The PBS is highly accurate, the HHP resolves the accuracy-diversity dilemma well, and the OHHP further outperforms the HHP on the cold start problem. Table 1 summarizes the performance of the PBS, HHP, OHHP and SCL, with results averaged over six runs.


Table 1. The performance of the PBS, HHP, OHHP and SCL methods.

https://doi.org/10.1371/journal.pone.0063531.t001

To quantify how much the SCL outperforms the other three algorithms, we define an improvement percentage by

(17)

where the subscript $X$ refers to the investigated algorithm and $v$ is the value of the indicator. The indicators are the precision and recall of cold objects, the overall ranking score, precision and recall, the novelty, the inter-diversity and the inner-diversity. Table 2 summarizes the improvement percentages of the SCL against the PBS, the HHP and the OHHP.


Table 2. The improvement percentage of the SCL against the PBS, HHP and OHHP methods.

https://doi.org/10.1371/journal.pone.0063531.t002

Tables 1 and 2 show that, for all three datasets, the SCL has a great advantage in the recommendation accuracy of low-degree objects, as well as in novelty and diversity, while keeping a high overall recommendation accuracy.

For recommendation accuracy, we examine both the overall accuracy and the accuracy for cold objects. Compared with the highly accurate PBS method, the SCL performs better on almost all the metrics. Taking Netflix as an example, the SCL outperforms the PBS in the following respects:

- the recommendation accuracy of low-degree objects ($P_k$ and $R_k$);
- the overall recommendation accuracy ($\langle r\rangle$, $P$ and $R$);
- the novelty ($\langle k\rangle$);
- the inter-diversity ($H$) and the inner-diversity ($I$).

The corresponding improvement percentages are listed in table 2. Where an indicator of the PBS equals zero, the improvement percentage of the SCL against the PBS is infinite. The SCL performs similarly well against the PBS on MovieLens and RYM, indicating that the SCL is highly accurate.

The HHP performs excellently in both accuracy and diversity at the optimal value of its tunable parameter. Take the HHP at the optimal value determined by the ranking score. Compared with it, the SCL has a slightly lower overall recommendation accuracy but a much higher recommendation accuracy for cold objects. Moreover, the SCL outperforms the HHP in novelty ($\langle k\rangle$), inter-diversity ($H$) and inner-diversity ($I$) for all three datasets. On Netflix, for example, the HHP has a better overall ranking score than the SCL. However, the SCL has a better ranking score for cold objects, and it improves considerably on the HHP in the precision and recall of cold objects (table 2). This suggests that the SCL handles the cold start problem well while keeping a high recommendation accuracy. Notably, the SCL also improves substantially on the HHP in novelty, inter-diversity and inner-diversity (table 2).

The OHHP method has been reported to handle the cold start problem better than the HHP. Compared with the OHHP at the optimal value of the tunable parameter determined by the ranking score, the SCL further improves the recommendation accuracy of cold objects. The SCL also outperforms the OHHP in novelty, inter-diversity and inner-diversity for all three datasets.

The cold start problem is a long-standing challenge for traditional recommender systems, since the lack of sufficient auxiliary information makes it difficult for users to become aware of cold objects [42]. Basically, the cold start problem can be divided into two categories [44]: i) cold user start [45] and ii) cold object start [46]. The former concerns recommending objects to new users. The latter concerns designing algorithms that push new objects, which is exactly what we address in this paper. Most research in this area generates recommendations from additional information, such as trust relationships [47], social network structure [48], tags [21], [30], [31], [41], [49], [50], etc. [51]. However, this increases system complexity. In addition, cold objects make up a large proportion of most systems: in Netflix, MovieLens and RYM, a large share of the objects have degrees no more than 10. Effective information filtering techniques are therefore essential for solving the cold start problem. Without any additional information, the SCL greatly improves the recommendation accuracy of cold objects.

To further understand how well the four algorithms handle cold start, we investigate the object-degree-dependent ranking score $\langle r\rangle_k$ versus the object degree $k$. Figure 4 shows that the SCL gives a much smaller $\langle r\rangle_k$ for low-degree objects than the PBS and the HHP on all three datasets. It is even slightly smaller than that of the OHHP on MovieLens and RYM, while remaining close for popular, high-degree objects. This suggests that the SCL significantly raises the recommendation accuracy for cold objects.


Figure 4. The object-degree-dependent ranking score $\langle r\rangle_k$ versus the object degree $k$.

The black, red, green and blue lines are for the HHP, PBS, OHHP and SCL methods, respectively.

https://doi.org/10.1371/journal.pone.0063531.g004

Figure 5 shows the degree distribution of the objects in the top $L$ recommendation list. At low degrees, the distribution for the SCL is much larger than those for the PBS, the HHP and the OHHP. The SCL therefore recommends cold objects far more often.


Figure 5. The degree distribution of the objects in the top $L$ recommendation list.

The black, red, green and blue lines are for the HHP, PBS, OHHP and SCL methods, respectively.

https://doi.org/10.1371/journal.pone.0063531.g005

Besides the cold start problem, diversity and novelty are also important marks of the vitality of personalized recommendation. Recommendation accuracy and diversity have been regarded as a dilemma pair, as have accuracy and novelty. Typical examples are the PBS and HTS algorithms. The PBS is more accurate but less diverse and novel, whereas the HTS is more diverse and novel but less accurate.

Intuitively, improving the recommendation accuracy of cold objects should also raise recommendation novelty and diversity. However, the HHP outperforms the OHHP in novelty, inter-diversity and inner-diversity for all three datasets, even though the OHHP greatly improves the recommendation accuracy of cold objects. To understand this, figure 6 shows the optimal value of the tunable parameter versus the object degree for the OHHP and the SCL. The SCL curve is obtained from the empirical study. It is more heterogeneous than the OHHP curve. This can partially explain why the OHHP improves only the recommendation accuracy of cold objects, without simultaneously enhancing recommendation novelty and diversity. Compared with the OHHP, the SCL not only further improves the recommendation accuracy of cold objects, but also raises recommendation novelty and diversity.


Figure 6. The tunable parameter $\lambda$ versus the object degree $k$.

The black and red lines are for the SCL and OHHP methods, respectively.

https://doi.org/10.1371/journal.pone.0063531.g006

To show how the novelty evolves with the recommendation list length, we study the novelty $\langle k\rangle$ versus the recommendation list length $L$. As figure 7 shows, for all three datasets the $\langle k\rangle$ of the SCL is much smaller than those of the PBS, the HHP and the OHHP. This holds over the whole investigated range of $L$. Moreover, the novelty of the SCL remains quite stable as $L$ varies, on all three datasets. This supports the conclusion that the SCL has a clear advantage in novelty.


Figure 7. The novelty $\langle k\rangle$ versus the recommendation list length $L$.

The black, red, green and blue lines are for the HHP, PBS, OHHP and SCL methods, respectively.

https://doi.org/10.1371/journal.pone.0063531.g007

Figure 8 shows that, for all four methods, the inter-diversity $H$ decreases with the recommendation list length $L$. This is reasonable, since different users' recommendation lists become less distinct as they grow longer. Compared with the PBS, the HHP and the OHHP, the SCL exhibits a much higher $H$. Moreover, for Netflix and MovieLens, the inter-diversity of the SCL decays more slowly over the whole range of $L$. For RYM, the inter-diversity of the SCL is also higher than that of the PBS and the OHHP, and similar to that of the HHP as $L$ varies. This also indicates that the SCL has an advantage in recommendation diversity.


Figure 8. The inter-diversity $H$ versus the recommendation list length $L$.

The black, red, green and blue lines are for the HHP, PBS, OHHP and SCL methods, respectively.

https://doi.org/10.1371/journal.pone.0063531.g008

The SCL shows a similar advantage in the inner-diversity $I$, as figure 9 shows. $I$ increases with $L$ for all four algorithms on Netflix, MovieLens and RYM, and the $I$ of the SCL is higher than those of the other three methods.


Figure 9. The inner-diversity $I$ versus the recommendation list length $L$.

The black, red, green and blue lines are for the HHP, PBS, OHHP and SCL methods, respectively.

https://doi.org/10.1371/journal.pone.0063531.g009

Taken together, even though the optimal value of the tunable parameter is not searched for according to any particular evaluator but abstracted from the scaling function, the SCL remarkably outperforms the PBS, the HHP and the OHHP in recommendation accuracy for cold objects, as well as in recommendation novelty and diversity, while maintaining a high overall recommendation accuracy.
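The role of the tunable parameter can be illustrated with the heat-conduction/mass-diffusion hybrid of Zhou et al. [28], W_ab = (1 / (k_a^(1-λ) k_b^λ)) Σ_l a_la a_lb / k_l, where λ = 1 recovers mass diffusion (PBS) and λ = 0 recovers heat conduction. The sketch below allows a separate λ for each object, which is the kind of heterogeneity the SCL produces; it assumes λ is indexed by the target object a, and it takes the λ values as input rather than computing them, because the scaling function that maps object degree to λ is defined in the Methods section and is not reproduced here. Function names are hypothetical.

```python
import numpy as np


def hybrid_transfer(A: np.ndarray, lam: np.ndarray) -> np.ndarray:
    """Hybrid transfer matrix W (objects x objects) with one lambda per target object.

    W[a, b] = sum_l A[l, a] * A[l, b] / k_l  /  (k_a**(1 - lam[a]) * k_b**lam[a])
    lam[a] = 1 gives mass diffusion (PBS); lam[a] = 0 gives heat conduction.
    Indexing lambda by the target object a is an assumption of this sketch.
    """
    A = np.asarray(A, dtype=float)
    k_user = A.sum(axis=1)
    k_obj = A.sum(axis=0)
    # Isolated users/objects contribute zero rows/columns; avoid dividing by zero.
    k_user_safe = np.where(k_user > 0, k_user, 1.0)
    k_obj_safe = np.where(k_obj > 0, k_obj, 1.0)
    # C[a, b] = sum_l A[l, a] * A[l, b] / k_l
    C = (A / k_user_safe[:, None]).T @ A
    lam_col = np.asarray(lam, dtype=float)[:, None]  # lambda of row (target) object a
    denom = k_obj_safe[:, None] ** (1.0 - lam_col) * k_obj_safe[None, :] ** lam_col
    return C / denom


def hybrid_scores(A: np.ndarray, lam: np.ndarray) -> np.ndarray:
    """Scores F[i, a] = sum_b W[a, b] * A[i, b]; already-collected objects get -inf."""
    A = np.asarray(A, dtype=float)
    F = A @ hybrid_transfer(A, lam).T
    return np.where(A > 0, -np.inf, F)


# Hypothetical user-object adjacency (4 users x 3 objects) and per-object lambdas.
A = np.array([[1, 1, 0], [0, 1, 1], [1, 0, 0], [1, 1, 1]])
lam = np.array([0.2, 0.5, 0.8])
print(hybrid_scores(A, lam))
```

With a uniform λ this reduces to the original hybrid; a quick sanity check is that the columns of W sum to 1 for λ = 1 (mass diffusion conserves resource) and the rows sum to 1 for λ = 0 (heat conduction averages over neighbours).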

Conclusion

In conclusion, we have proposed a scaling-based (SCL) recommendation algorithm, in which the optimal value of the tunable parameter is abstracted, via a rescaling procedure, from a scaling function that is independent of the recommendation list length. On three real datasets, Netflix, MovieLens and RYM, the optimal value of the tunable parameter in the SCL algorithm is found to be heterogeneous across individual objects. Experimental results show that the SCL algorithm not only achieves high accuracy but also significantly improves three other important aspects of personalized recommendation: improving novelty, and solving both the long-standing cold-start problem and the accuracy-diversity dilemma.

A dilemma common to many algorithms is how to find the proper value of the tunable parameter for different recommendation focuses, e.g., accuracy, diversity, or the cold-start problem. Recommendation accuracy is undoubtedly one of the most important evaluators of algorithm performance. However, even when accuracy is used as the reference for searching for the optimal value of the tunable parameter, the optimal value may differ depending on which accuracy evaluator is used. By finding, from empirical data, a scaling function that is independent of the recommendation list length, we resolve the explicit dilemma of selecting the optimal value of the tunable parameter amid the conflicts among different recommendation focuses.

Author Contributions

Conceived and designed the experiments: TQ ZKZ GC. Performed the experiments: TQ GC. Analyzed the data: TQ ZKZ. Wrote the paper: TQ ZKZ.

References

1. Adomavicius G, Tuzhilin A (2005) Toward the next generation of recommender systems: A survey of the state-of-the-art and possible extensions. IEEE Trans Knowl Data Eng 17: 734–749.
2. Goldberg D, Nichols D, Oki BM, Terry D (1992) Using collaborative filtering to weave an information tapestry. Commun ACM 35: 61–70.
3. Schafer JB, Frankowski D, Herlocker J, Sen S (2007) Collaborative filtering recommender systems. In: The adaptive web, Springer. 291–324.
4. Breese JS, Heckerman D, Kadie C (1998) Empirical analysis of predictive algorithms for collaborative filtering. In: Proc. 14th Conf. Uncertainty Artif. Intel. Morgan Kaufmann Publishers Inc., 43–52.
5. Nakamura A, Abe N (1998) Collaborative filtering using weighted majority prediction algorithms. In: Proc. 5th Intl. Conf. Mach. Learn. 395–403.
6. Delgado J, Ishii N (1999) Memory-based weighted majority prediction. In: SIGIR Workshop Recomm. Syst. Citeseer.
7. Getoor L, Sahami M (1999) Using probabilistic relational models for collaborative filtering. In: Workshop Web Usage Anal. User Profil. Citeseer.
8. Hofmann T (2003) Collaborative filtering via gaussian probabilistic latent semantic analysis. In: Proc. 26th Ann. Intl. SIGIR Conf. Research Devel. Infor. Retr. ACM, 259–266.
9. Billsus D, Pazzani MJ (2000) User modeling for adaptive news access. User Model User-Adap 10: 147–180.
10. Marlin B (2003) Modeling user rating profiles for collaborative filtering. Adv Neural Inf Process Syst 16.
11. Pazzani MJ, Billsus D (2007) Content-based recommendation systems. In: The adaptive web, Springer. 325–341.
12. Lipczak M, Hu Y, Kollet Y, Milios E (2009) Tag sources for recommendation in collaborative tagging systems. Proc ECML/PKDD Discovery Challenge: 157–172.
13. Cantador I, Vallet D, Jose JM (2009) Measuring vertex centrality in co-occurrence graphs for online social tag recommendation. Proc ECML/PKDD Discovery Challenge: 17–33.
14. Ju S, Hwang KB (2009) A weighting scheme for tag recommendation in social bookmarking systems. In: Proc. ECML/PKDD Discovery Challenge. 109–118.
15. Balabanović M, Shoham Y (1997) Fab: content-based, collaborative recommendation. Commun ACM 40: 66–72.
16. Goldberg K, Roeder T, Gupta D, Perkins C (2001) Eigentaste: A constant time collaborative filtering algorithm. Infor Retr 4: 133–151.
17. Hofmann T (2004) Latent semantic models for collaborative filtering. ACM Trans Inf Syst 22: 89–115.
18. Blei DM, Ng AY, Jordan MI (2003) Latent dirichlet allocation. J Mach Learn Res 3: 993–1022.
19. Laureti P, Moret L, Zhang YC, Yu YK (2006) Information filtering via iterative refinement. EPL 75: 1006.
20. Ren J, Zhou T, Zhang YC (2008) Information filtering via self-consistent refinement. EPL 82: 58007.
21. Zhang ZK, Zhou T, Zhang YC (2011) Tag-aware recommender systems: a state-of-the-art survey. J Comput Sci Technol 26: 767–777.
22. Lü L, Medo M, Yeung CH, Zhang YC, Zhang ZK, et al. (2012) Recommender systems. Phys Rep 519: 1–49.
23. Zhang YC, Blattner M, Yu YK (2007) Heat conduction process on community networks as a recommendation model. Phys Rev Lett 99: 154301.
24. Zhou T, Ren J, Medo M, Zhang YC (2007) Bipartite network projection and personal recommendation. Phys Rev E 76: 046115.
25. Liu C, Zhou WX (2012) Heterogeneity in initial resource configurations improves network-based hybrid recommendation algorithm. Physica A 391: 5704–5711.
26. Zhou T, Su RQ, Liu RR, Jiang LL, Wang BH, et al. (2009) Accurate and diverse recommendations via eliminating redundant correlations. New J Phys 11: 123008.
27. Liu J, Deng G (2009) Link prediction in a user–object network based on time-weighted resource allocation. Physica A 388: 3643–3650.
28. Zhou T, Kuscsik Z, Liu JG, Medo M, Wakeling JR, et al. (2010) Solving the apparent diversity-accuracy dilemma of recommender systems. Proc Natl Acad Sci USA 107: 4511–4515.
29. Liu RR, Jia CX, Zhou T, Sun D, Wang BH (2009) Personal recommendation via modified collaborative filtering. Physica A 388: 462–468.
30. Zhang ZK, Zhou T, Zhang YC (2010) Personalized recommendation via integrated diffusion on user–item–tag tripartite graphs. Physica A 389: 179–186.
31. Shang MS, Zhang ZK, Zhou T, Zhang YC (2010) Collaborative filtering with diffusion-based similarity on tripartite graphs. Physica A 389: 1259–1264.
32. Liu JG, Zhou T, Wang BH, Zhang YC, Guo Q (2009) Effects of user’s tastes on personalized recommendation. Int J Mod Phys C 20: 1925–1932.
33. Liu JG, Zhou T, Che HA, Wang BH, Zhang YC (2010) Effects of high-order correlations on personalized recommendations for bipartite networks. Physica A 389: 881–886.
34. Zeng A, Yeung CH, Shang MS, Zhang YC (2012) The reinforcing influence of recommendations on global diversification. EPL 97: 18005.
35. Liu JG, Zhou T, Guo Q (2011) Information filtering via biased heat conduction. Phys Rev E 84: 037101.
36. Liang H, Xu Y, Li Y, Nayak R, Tao X (2010) Connecting users and items with weighted tags for personalized item recommendations. In: Proc. 21st ACM Conf. Hypertext hypermedia. ACM, 51–60.
37. Cheng F, Liu C, Jiang J, Lu W, Li W, et al. (2012) Prediction of drug-target interactions and drug repositioning via network-based inference. PLoS Comput Biol 8: e1002503.
38. Qiu T, Chen G, Zhang ZK, Zhou T (2011) An item-oriented recommendation algorithm on cold-start problem. EPL 95: 58003.
39. Zhou T, Jiang LL, Su RQ, Zhang YC (2008) Effect of initial configuration on network-based recommendation. EPL 81: 58004.
40. Lü L, Liu W (2011) Information filtering via preferential diffusion. Phys Rev E 83: 066119.
41. Zhang ZK, Liu C, Zhang YC, Zhou T (2010) Solving the cold-start problem in recommender systems with social tags. EPL 92: 28002.
42. Ahn HJ (2008) A new similarity measure for collaborative filtering to alleviate the new user cold-starting problem. Inf Sci 178: 37–51.
43. Herlocker JL, Konstan JA, Terveen LG, Riedl JT (2004) Evaluating collaborative filtering recommender systems. ACM Trans Inf Syst 22: 5–53.
44. Papagelis M, Plexousakis D (2005) Qualitative analysis of user-based and item-based prediction algorithms for recommendation agents. Engin Appl Artif Intel 18: 781–789.
45. Lam XN, Vu T, Le TD, Duong AD (2008) Addressing cold-start problem in recommendation systems. In: Proc. 2nd Intl. Conf. Ubiquitous Infor. Manag. Commun. ACM, 208–211.
46. Park YJ, Tuzhilin A (2008) The long tail of recommender systems and how to leverage it. In: Proc. 2008 ACM Conf. Recomm. Syst. ACM, 11–18.
47. Jamali M, Ester M (2009) Trustwalker: a random walk model for combining trust-based and item-based recommendation. In: Proc. 15th ACM SIGKDD Intl Conf. Knowl. Disc. Data Mining. ACM, 397–406.
48. Groh G, Ehmig C (2007) Recommendations in taste related domains: collaborative filtering vs. social filtering. In: Proc. 2007 Intl. Conf. Supporting Group Work. ACM, 127–136.
49. Zhang ZK, Liu C (2012) Hybrid recommendation algorithm based on two roles of social tags. Int J Bifurcat Chaos 22: 1250166.
50. Kim HN, Ji AT, Ha I, Jo GS (2010) Collaborative filtering based on collaborative tagging for enhancing the quality of recommendation. Electron Commerce Research Appl 9: 73–83.
51. Chu W, Park ST (2009) Personalized recommendation on dynamic content using predictive bilinear models. In: Proc. 18th Intl. Conf. World Wide Web. ACM, 691–700.
