Reinforcement Learning from Human Feedback (RLHF) is an important component of fine-tuning AI models after pre-training, particularly in reducing hallucinations by GPTs. It mainly consists of humans rating model outputs, often with a simple Y/N click - a mind-numbingly repetitive job which, in the
case of Finland, is occasionally performed by prison labour. An interesting ethical conundrum - at what point does providing prisoners with a legal route into the economy become the exploitation of slave labour?