Super HN
Generalized on-policy distillation with reward extrapolation
(arxiv.org)
3 points by fzliu 1 day ago