How does reinforcement learning (RL) work in LLMs?