A Continuous Dataflow Pipeline for Low-Latency Recommendations
Abstract: The goal of a recommender system is to generate personalized recommendations for users. Recommender systems have great value in many business verticals, such as video on demand, news, advertising and retail. To make recommendations for each individual, a large amount of personal preference data must be collected and processed, and processing such big data usually takes a long time. Because of the long delay between data entering the system and results being generated, recommender systems can benefit only returning users. This project is an attempt to build a low-latency recommender system as a service, making it applicable to more scenarios. In this paper, different recommendation algorithms and distributed computing frameworks are studied and compared to identify the most suitable design. Experimental results revealed a logarithmic relationship between recommendation quality and training data size in collaborative filtering. By applying this finding, a low-latency recommendation workflow is achieved by reducing the training data size and creating parallel computing partitions, at minimal cost to prediction quality. In this project, the calculation time is successfully limited to 3 seconds (compared with 25 seconds for the control) while maintaining 90% of the prediction quality.
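The central finding above is that collaborative-filtering quality grows roughly logarithmically with training data size, which is why the training set can be shrunk at little cost. A minimal Python sketch of that reasoning follows. It assumes a higher-is-better quality score modeled as q(n) = a + b ln n; the paper's exact metric and fitting procedure are not given here, and the function names `fit_log_curve` and `min_size_for_fraction` are hypothetical. The sketch fits a and b by ordinary least squares on ln n, then solves a + b ln n* = 0.9 (a + b ln N) for the smallest training size n* that keeps 90% of the quality reached with all N records.

```python
import math
from typing import List, Sequence, Tuple


def fit_log_curve(sizes: Sequence[float], quality: Sequence[float]) -> Tuple[float, float]:
    """Ordinary least-squares fit of quality ~= a + b * ln(size); returns (a, b)."""
    if len(sizes) != len(quality) or len(sizes) < 2:
        raise ValueError("need at least two (size, quality) pairs of equal length")
    xs = [math.log(n) for n in sizes]
    mean_x = sum(xs) / len(xs)
    mean_q = sum(quality) / len(quality)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("training sizes must not all be equal")
    sxq = sum((x - mean_x) * (q - mean_q) for x, q in zip(xs, quality))
    b = sxq / sxx
    a = mean_q - b * mean_x
    return a, b


def min_size_for_fraction(a: float, b: float, full_size: float, fraction: float = 0.9) -> float:
    """Smallest n with a + b*ln(n) >= fraction * (a + b*ln(full_size)).

    Assumes quality increases with data size (b > 0) and is positive at full size.
    """
    if b <= 0:
        raise ValueError("expected quality to grow with training size (b > 0)")
    target = fraction * (a + b * math.log(full_size))
    # Invert the model: ln(n) = (target - a) / b.
    return math.exp((target - a) / b)


if __name__ == "__main__":
    # Hypothetical measurements generated from a = 0.2, b = 0.05; not data from the paper.
    sizes: List[float] = [1e3, 1e4, 1e5, 1e6]
    measured = [0.2 + 0.05 * math.log(n) for n in sizes]
    a, b = fit_log_curve(sizes, measured)
    n_star = min_size_for_fraction(a, b, full_size=1e6, fraction=0.9)
    print(f"a={a:.3f} b={b:.3f} n*={n_star:,.0f} ({n_star / 1e6:.1%} of the full set)")
```

With these made-up parameters, n* comes out at roughly 168,000 records, about 17% of N, which illustrates why a logarithmic curve makes aggressive subsampling cheap in quality terms. The smaller training set can then be split into partitions and processed in parallel, which is the second half of the workflow the abstract describes.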