Super HN - Super Hacker News

Reinforcement Learning from Human Feedback (arxiv.org) 7 points by onurkanbkrc 46 minutes ago