RL4RS
RL4RS is a real-world deep reinforcement learning recommender system benchmark for practitioners and researchers.
GitHub Page: https://github.com/fuxiAIlab/RL4RS
Dataset Download (data only): https://zenodo.org/record/6622390#.YqBBpRNBxQK
Dataset Download (for reproduction): https://drive.google.com/file/d/1YbPtPyYrMvMGOuqD4oHvK0epDtEhEb9v/view?usp=sharing
Paper: https://arxiv.org/pdf/2110.11073.pdf
Kaggle Competition (old version): https://www.kaggle.com/c/bigdata2021-rl-recsys/overview
Resource Page: https://fuxi-up-research.gitbook.io/fuxi-up-challenges/
two real-world datasets: besides artificial or semi-simulated datasets, RL4RS collects raw logged data from one of the most popular games released by NetEase Games, in which recommendation is naturally a sequential decision-making problem.
data understanding tool: RL4RS provides a data understanding tool for testing the proper use of RL on recommendation system datasets.
advanced dataset setting: for each dataset, RL4RS provides separate data collected before and after the deployment of reinforcement learning, which simulates the difficulty of training a good RL policy from data collected by an SL-based algorithm.
model-free RL: RL4RS supports state-of-the-art RL libraries such as RLlib and Tianshou. We provide example code for state-of-the-art model-free algorithms (A2C, PPO, etc.) implemented with the RLlib library on both the discrete and the continuous (combining policy gradients with a K-NN search) RL4RS environments; a minimal usage sketch is given below.
offline RL: RL4RS implements offline RL algorithms, including BC, BCQ and CQL, through the d3rlpy library (see the sketch below). RL4RS is also the first to report the effectiveness of offline RL algorithms (BCQ and CQL) in the RL-based RS domain.
RL-based RS baselines: RL4RS implements some algorithms proposed in the RL-based RS domain, including Exact-k and Adversarial User Model.
offline RL evaluation: in addition to the reward indicator and the traditional RL evaluation setting (training and testing on the same environment), RL4RS tries to provide a complete evaluation framework by placing more emphasis on counterfactual policy evaluation; the idea is illustrated below.
low-coupling structure: RL4RS specifies a fixed data format to reduce code coupling, and data-related logic is unified into data preprocessing scripts or user-defined state classes.
file-based RL environment: RL4RS implements a file-based gym environment, which enables random sampling of and sequential access to datasets that exceed memory size, and it is easy to extend to distributed file systems; see the sketch below.
HTTP-based vector env: RL4RS natively supports vector envs, i.e., the environment processes a batch of data at a time. We further encapsulate the env behind an HTTP interface so that it can be deployed on multiple servers to accelerate sample generation; a client sketch is given below.
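For the model-free RL feature above, here is a minimal training sketch using RLlib's 1.x-style API. The env factory `make_rl4rs_env` and the registered name `"rl4rs_slate"` are hypothetical placeholders for however the RL4RS environment is actually constructed from the dataset; a toy gym env stands in for it so the snippet runs as-is.

```python
import ray
from ray.tune.registry import register_env
from ray.rllib.agents.ppo import PPOTrainer

def make_rl4rs_env(env_config):
    # Placeholder: swap in the real RL4RS environment constructor here.
    import gym
    return gym.make("CartPole-v0")

ray.init()
register_env("rl4rs_slate", make_rl4rs_env)  # hypothetical registered name

trainer = PPOTrainer(config={
    "env": "rl4rs_slate",
    "framework": "tf",      # or "torch"
    "num_workers": 2,
})
for i in range(5):
    result = trainer.train()
    print(i, result["episode_reward_mean"])
```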
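For the offline RL feature, a minimal d3rlpy (v1.x API) sketch of fitting discrete CQL on logged transitions. The arrays are random placeholders standing in for the preprocessed RL4RS logs, and the dimensions and action-space size are made up.

```python
import numpy as np
from d3rlpy.dataset import MDPDataset
from d3rlpy.algos import DiscreteCQL

# Random placeholder arrays; in practice these come from the RL4RS logs.
num_steps, obs_dim, num_actions = 1000, 32, 100
observations = np.random.random((num_steps, obs_dim)).astype(np.float32)
actions = np.random.randint(num_actions, size=num_steps)
rewards = np.random.random(num_steps).astype(np.float32)
terminals = (np.arange(num_steps) % 10 == 9)  # end an episode every 10 steps

dataset = MDPDataset(observations, actions, rewards, terminals)

cql = DiscreteCQL(use_gpu=False)
cql.fit(dataset, n_epochs=1)       # offline training on the logged data
cql.save_model("discrete_cql.pt")  # export the learned Q-function
```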
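For the counterfactual policy evaluation emphasis, a generic per-trajectory importance-sampling estimator, shown only to illustrate the idea of scoring a target policy on data logged by a behavior policy; it is not the benchmark's own evaluation code.

```python
import numpy as np

def importance_sampling_value(trajectories, target_prob, behavior_prob, gamma=1.0):
    """trajectories: list of [(state, action, reward), ...] lists.
    target_prob / behavior_prob: callables returning pi(action | state)."""
    values = []
    for traj in trajectories:
        ratio, ret, discount = 1.0, 0.0, 1.0
        for state, action, reward in traj:
            ratio *= target_prob(state, action) / behavior_prob(state, action)
            ret += discount * reward
            discount *= gamma
        values.append(ratio * ret)
    return float(np.mean(values))  # estimated value of the target policy
```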
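For the file-based RL environment, a sketch of the underlying idea: a gym env that streams records from a log file on demand instead of loading the whole dataset into memory. The file format, reward column, and one-step episodes here are illustrative assumptions, not the actual RL4RS implementation.

```python
import gym
import numpy as np
from gym import spaces

class FileReplayEnv(gym.Env):
    """Streams logged records from disk so the dataset never has to fit in memory."""

    def __init__(self, log_path, obs_dim=32, num_items=100):
        self.observation_space = spaces.Box(-np.inf, np.inf, shape=(obs_dim,), dtype=np.float32)
        self.action_space = spaces.Discrete(num_items)
        self._file = open(log_path, "r")
        self._obs = np.zeros(obs_dim, dtype=np.float32)

    def _next_record(self):
        line = self._file.readline()
        if not line:              # wrap around for repeated sequential reads
            self._file.seek(0)
            line = self._file.readline()
        return line.strip().split(",")

    def reset(self):
        record = self._next_record()
        self._obs = np.array(record[: self.observation_space.shape[0]], dtype=np.float32)
        return self._obs

    def step(self, action):
        record = self._next_record()
        reward = float(record[-1])  # illustrative: last column holds the logged reward
        done = True                 # illustrative: one interaction per session
        return self._obs, reward, done, {}
```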
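For the HTTP-based vector env, a sketch of a thin client that sends a whole batch of actions to an env server and receives batched observations and rewards. The endpoint paths and JSON fields are hypothetical.

```python
import requests

class HttpVectorEnvClient:
    """Batched reset/step against a remote env server (hypothetical API)."""

    def __init__(self, base_url, batch_size):
        self.base_url = base_url
        self.batch_size = batch_size

    def reset(self):
        resp = requests.post(self.base_url + "/reset", json={"batch": self.batch_size})
        return resp.json()["observations"]  # list of batch_size observations

    def step(self, actions):
        resp = requests.post(self.base_url + "/step", json={"actions": list(actions)})
        data = resp.json()
        return data["observations"], data["rewards"], data["dones"]

# e.g. point several training workers at one or more env servers:
# env = HttpVectorEnvClient("http://env-server:5000", batch_size=64)
```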
A new dataset for bundle recommendation with variable discounts, a flexible recommendation trigger, and modifiable item content is in preparation.
Take raw features rather than hidden-layer embeddings as the observation input for offline RL
Model-based RL Algorithms
Reward-oriented simulation environment construction
Reproduce more algorithms (RL models, safe exploration techniques, etc.) proposed in the RL-based RS domain
Support Parametric-Action DQN, in which concatenated state-action pairs are fed in and a Q-value is output for each pair; a sketch of the idea is given below.
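Parametric-Action DQN is listed above as future work; the sketch below only illustrates the pairwise scoring idea (one Q-network applied to each concatenated state-action pair) with a toy PyTorch module whose layer sizes are arbitrary.

```python
import torch
import torch.nn as nn

class PairwiseQNetwork(nn.Module):
    """Scores each candidate action by feeding (state, action-features) pairs through one Q-net."""

    def __init__(self, state_dim, action_dim, hidden=128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim + action_dim, hidden),
            nn.ReLU(),
            nn.Linear(hidden, 1),
        )

    def forward(self, state, action_feats):
        # state: (batch, state_dim); action_feats: (batch, num_candidates, action_dim)
        num_candidates = action_feats.shape[1]
        state = state.unsqueeze(1).expand(-1, num_candidates, -1)
        pairs = torch.cat([state, action_feats], dim=-1)
        return self.net(pairs).squeeze(-1)  # (batch, num_candidates) Q-values
```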
RL4RS supports Linux and requires at least 64 GB of memory.
Dataset Download: https://drive.google.com/file/d/1YbPtPyYrMvMGOuqD4oHvK0epDtEhEb9v/view?usp=sharing
See script/ and reproductions/.
RLlib examples: https://docs.ray.io/en/latest/rllib-examples.html
d3rlpy examples: https://d3rlpy.readthedocs.io/en/v1.0.0/
See reproductions/.
Any kind of contribution to RL4RS would be highly appreciated! Please contact us by email.
algorithm | category | support mode |
---|---|---|
DNN | supervised learning | item-wise classification/slate-wise classification/item ranking |
Wide&Deep | supervised learning | item-wise classification/slate-wise classification/item ranking |
LSTM | supervised learning | item-wise classification/slate-wise classification/item ranking |
DIEN | supervised learning | item-wise classification/slate-wise classification/item ranking |
Adversarial User Model | model-free learning | discrete env & hidden state as observation |
Exact-k | model-free RL | discrete env & hidden state as observation |
PG/DQN/IMPALA | model-free RL | discrete env & raw feature/hidden state as observation |
DDPG | model-free RL | conti env & raw feature/hidden state as observation |
A2C | model-free RL | discrete/conti env & raw feature/hidden state as observation |
PPO | model-free RL | discrete/conti env & raw feature/hidden state as observation |
Behavior Cloning | supervised learning/Offline RL | discrete env & hidden state as observation |
BCQ | Offline RL | discrete env & hidden state as observation |
CQL | Offline RL | discrete env & hidden state as observation |
experiment in the paper | shell script | optional param. | description |
---|---|---|---|
Sec.3 | run_split.sh | - | dataset split/shuffle/align (for dataset B)/to tfrecord |
Sec.4 | run_mdp_checker.sh | recsys15/movielens/rl4rs | unzip ml-25m.zip and yoochoose-clicks.dat.zip into dataset/ |
Sec.5.1 | run_supervised_item.sh | dnn/widedeep/lstm/dien | Table 5. Item-wise classification |
Sec.5.1 | run_supervised_slate.sh | dnn_slate/widedeep_slate/lstm_slate/dien_slate/adversarial_slate | Table 5. Item-wise rank |
Sec.5.1 | run_supervised_slate.sh | dnn_slate_multiclass/widedeep_slate_multiclass/lstm_slate_multiclass/dien_slate_multiclass | Table 5. Slate-wise classification |
Sec.5.1 & Sec.6 | run_simulator_train.sh | dien | dien-based simulator for different trainsets |
Sec.5.1 & Sec.6 | run_simulator_eval.sh | dien | Table 6. |
Sec.5.1 & Sec.6 | run_modelfree_rl.sh | PG/DQN/A2C/PPO/IMPALA/DDPG/*_conti | Table 7. |
Sec.5.2 & Sec.6 | run_batch_rl.sh | BC/BCQ/CQL | Table 8. |
Sec.5.1 | run_exact_k.sh | - | Exact-k |
- | run_simulator_env_test.sh | - | examining the consistency of features (observations) between RL env and supervised simulator |
Channel | Link |
---|---|
Materials | https://fuxi-up-research.gitbook.io/fuxi-up-challenges/ |
Issues | https://github.com/fuxiAIlab/RL4RS/issues |
Fuxi Team | |
Our Team | |