Reinforcement Learning for Interactive QoS-Aware Services Composition
Abstract
Selecting an appropriate composition of concrete services in a dynamic and unpredictable environment is an important and challenging research problem in the web of things. The main goal of this article is to select, from all possible compositions, the optimal one without knowing the users' quality of service (QoS) preferences a priori. From a theoretical point of view, we derive bounds on the size of the problem's search space. Since the user's QoS preferences are unknown, we propose a vector-valued Markov decision process (MDP) approach for finding the optimal QoS-aware services composition. The algorithm alternately solves the MDP with dynamic programming and learns the preferences through direct queries to the user. An important feature of the proposed algorithm is that it obtains the optimal composition while limiting the number of interactions with the user. Experiments on a large real-world dataset of more than 3,500 web services show that our algorithm finds the optimal composite service with around 50 user interactions.
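To make the alternation between dynamic programming and preference elicitation concrete, the sketch below is an illustrative toy implementation, not the paper's exact algorithm: it assumes a finite MDP whose rewards are QoS vectors, a hidden linear preference vector `w_true` standing in for the user, and a simple scheme that prunes candidate weight vectors inconsistent with each pairwise answer. All names (`solve`, `policy_qos`, `elicit`) and the candidate-pruning rule are assumptions introduced here for illustration.

```python
import numpy as np

def solve(P, R, w, gamma=0.95, iters=200):
    """Value iteration on the MDP scalarised by weights w (reward = w . QoS vector).
    P[s][a] -> list of (next_state, prob); R[s][a] -> QoS vector; terminal states have P[s] == []."""
    n = len(P)
    V = np.zeros(n)
    for _ in range(iters):
        for s in range(n):
            if P[s]:
                V[s] = max(w @ R[s][a] + gamma * sum(p * V[t] for t, p in P[s][a])
                           for a in range(len(P[s])))
    # greedy policy under the current candidate weights
    return [None if not P[s] else
            int(np.argmax([w @ R[s][a] + gamma * sum(p * V[t] for t, p in P[s][a])
                           for a in range(len(P[s]))]))
            for s in range(n)]

def policy_qos(P, R, policy, d, gamma=0.95, iters=200):
    """Vector-valued policy evaluation: expected discounted QoS vector of each state."""
    Q = np.zeros((len(P), d))
    for _ in range(iters):
        for s, a in enumerate(policy):
            if a is not None:
                Q[s] = R[s][a] + gamma * sum(p * Q[t] for t, p in P[s][a])
    return Q

def elicit(P, R, start, d, w_true, n_candidates=50, max_queries=20, seed=0):
    """Alternate MDP solving with pairwise user queries; w_true simulates the hidden user."""
    rng = np.random.default_rng(seed)
    W = rng.dirichlet(np.ones(d), size=n_candidates)  # candidate preference weights
    queries = 0
    while queries < max_queries and len(W) > 1:
        i, j = rng.choice(len(W), size=2, replace=False)
        v1 = policy_qos(P, R, solve(P, R, W[i]), d)[start]
        v2 = policy_qos(P, R, solve(P, R, W[j]), d)[start]
        if np.allclose(v1, v2):
            break  # both candidates lead to the same composition
        prefers_first = w_true @ v1 >= w_true @ v2  # the "user query"
        queries += 1
        diff = v1 - v2 if prefers_first else v2 - v1
        W = W[W @ diff >= 0]  # keep only candidates consistent with the answer
    return solve(P, R, W.mean(axis=0)), queries

# Tiny example: a workflow of two tasks, each with two candidate services,
# deterministic transitions and a 2-dimensional QoS vector (higher is better).
P = [
    [[(1, 1.0)], [(1, 1.0)]],   # task 0: two candidate services -> task 1
    [[(2, 1.0)], [(2, 1.0)]],   # task 1: two candidate services -> done
    [],                         # terminal state
]
R = [
    [np.array([0.9, 0.2]), np.array([0.3, 0.8])],
    [np.array([0.7, 0.4]), np.array([0.2, 0.9])],
    [],
]
w_true = np.array([0.8, 0.2])   # the user's hidden QoS preferences
policy, n_queries = elicit(P, R, start=0, d=2, w_true=w_true)
print("selected services per task:", policy, "queries used:", n_queries)
```

The loop never removes every candidate, since the weight vector that produced the preferred composition is always consistent with the user's answer; the query budget `max_queries` mirrors the paper's goal of bounding user interactions.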