Reinforcement Learning from Human Feedback (RLHF) is mainly used to:
- Encrypt prompts
- Align model outputs with human preferences
- Reduce dataset size
- Compress models
Explanation
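RLHF fine-tunes a model using human feedback: people rank or rate candidate outputs, a reward model is typically trained on those judgments, and the model is then optimized against that reward so its outputs better match human preferences and values.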
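One common ingredient of RLHF is a reward model trained on pairs of outputs where a human preferred one over the other. The snippet below is a minimal sketch of that idea, assuming a Bradley-Terry style pairwise loss; the function name and the example scores are hypothetical and are not taken from any particular library.

```python
import math


def preference_loss(reward_chosen: float, reward_rejected: float) -> float:
    """Pairwise preference loss: -log(sigmoid(reward_chosen - reward_rejected)).

    The loss is small when the reward model scores the human-preferred
    output higher than the rejected one, and large when it does not.
    """
    margin = reward_chosen - reward_rejected
    # -log(sigmoid(m)) == log(1 + exp(-m)); branch to avoid overflow in exp().
    if margin >= 0:
        return math.log1p(math.exp(-margin))
    return -margin + math.log1p(math.exp(margin))


# Hypothetical reward-model scores for two answers to the same prompt.
agrees_with_human = preference_loss(reward_chosen=2.0, reward_rejected=0.0)
disagrees_with_human = preference_loss(reward_chosen=0.0, reward_rejected=2.0)

print(f"loss when the model agrees with the human ranking:    {agrees_with_human:.4f}")
print(f"loss when the model disagrees with the human ranking: {disagrees_with_human:.4f}")
```

Minimizing this loss over many human-labeled pairs pushes the reward model to score preferred outputs higher; a later stage would then optimize the language model against that learned reward.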
Related MCQs
- Single images
- Small spreadsheets
- Extremely large, complex datasets
- Short text files
Read this question with its explanation
- Virus
- Voltage
- Vector
- Variety
Read this question with its explanation
- Classification
- Regression
- Backpropagation
- Clustering
Read this question with its explanation
- Add noise
- Reduce image size
- Encrypt data
- Convert outputs into a probability distribution
Read this question with its explanation
- Imputation
- Pooling
- Attention mechanism
- Padding