Reward Model

A reward model is used in reinforcement learning to guide behavior by assigning a scalar score to each output according to how desirable or correct it is. These scores serve as the reward signal during training: they are used to update the policy's weights, steering the system toward better decisions over time. Reward models are crucial for fine-tuning language models and decision-making agents in enterprise workflows.
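
As a rough illustration, the sketch below shows one common way a reward model can be structured and trained: a small network with a scalar value head, fit on pairs of preferred and rejected outputs with a pairwise (Bradley-Terry style) loss. The architecture, embedding size, and training loop here are illustrative assumptions, not a specific production implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class RewardModel(nn.Module):
    """Maps a fixed-size representation of an output to a scalar reward score."""

    def __init__(self, hidden_size: int = 128):  # hidden_size is an arbitrary choice for this sketch
        super().__init__()
        self.backbone = nn.Sequential(
            nn.Linear(hidden_size, hidden_size),
            nn.ReLU(),
        )
        self.value_head = nn.Linear(hidden_size, 1)  # one scalar score per output

    def forward(self, features: torch.Tensor) -> torch.Tensor:
        # features: (batch, hidden_size) embeddings of candidate outputs
        return self.value_head(self.backbone(features)).squeeze(-1)


def preference_loss(r_chosen: torch.Tensor, r_rejected: torch.Tensor) -> torch.Tensor:
    """Pairwise loss: push preferred outputs to score higher than rejected ones."""
    return -F.logsigmoid(r_chosen - r_rejected).mean()


if __name__ == "__main__":
    model = RewardModel()
    optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)

    # Toy batch: random stand-ins for embeddings of preferred vs. rejected responses.
    chosen = torch.randn(8, 128)
    rejected = torch.randn(8, 128)

    loss = preference_loss(model(chosen), model(rejected))
    loss.backward()
    optimizer.step()
    print(f"pairwise loss: {loss.item():.4f}")
```

Once trained, a model like this can score new candidate outputs, and those scores can act as the reward signal when fine-tuning a policy with a reinforcement learning algorithm.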