Learn how to enhance the performance of agentic systems by fine-tuning them with Group Relative Policy Optimization (GRPO).
If `dspy` and its dependencies are not yet installed, you may need to install them. For this notebook, the key library is `dspy`, along with any others you need for data handling or model-specific interactions.
We represent our data as `dspy.Example` objects. Each example has a question as input and the expected answer for evaluation. To properly evaluate our approach, we split the dataset into training, validation, and test sets: the training set is used to optimize the agent, the validation set to tune parameters and monitor progress, and the test set for final evaluation.
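The split described above can be sketched as follows. This is a minimal illustration with placeholder records; in the notebook each record would instead be wrapped as a `dspy.Example` with the question marked as the input field, e.g. `dspy.Example(question=q, answer=a).with_inputs("question")`:

```python
import random

# Placeholder question/answer records; in practice these come from your dataset,
# and each would become dspy.Example(question=q, answer=a).with_inputs("question").
records = [{"question": f"q{i}", "answer": f"a{i}"} for i in range(100)]

# Shuffle once with a fixed seed so the split is reproducible.
random.Random(0).shuffle(records)

# 70/15/15 split: train to optimize the agent, val to tune parameters and
# monitor progress, test for the final evaluation only.
n = len(records)
n_train = 7 * n // 10
n_val = (n - n_train) // 2

train_set = records[:n_train]
val_set = records[n_train : n_train + n_val]
test_set = records[n_train + n_val :]

print(len(train_set), len(val_set), len(test_set))  # → 70 15 15
```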
- `update_interval`: How often to update the model
- `num_samples_per_input`: How many different outputs to generate for each input
- `num_train_steps`: Total number of training steps
- `beta`: Controls the trade-off between optimizing for rewards and staying close to the original model
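As an illustration, these hyperparameters might be collected into a single config dict before being handed to the trainer. The values below are placeholders, not recommendations, and the exact argument names accepted by your GRPO trainer may differ:

```python
# Hypothetical GRPO training configuration; all values are illustrative placeholders.
grpo_config = {
    "update_interval": 4,        # how often to update the model (in training steps)
    "num_samples_per_input": 8,  # outputs sampled per input to form the comparison group
    "num_train_steps": 500,      # total number of training steps
    "beta": 0.04,                # trade-off: reward optimization vs. staying close to the original model
}

# A quick sanity check before launching a potentially long training run.
assert grpo_config["num_train_steps"] % grpo_config["update_interval"] == 0
print(grpo_config)
```

A small `beta` lets the policy chase rewards more aggressively, while a larger `beta` keeps its outputs closer to the original model's distribution.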