Researchers are ringing the alarm bells, warning that companies like OpenAI and Google are rapidly running out of human-written training data for their AI models.
There is so much more to this than the raw amount of data; it's not at all the bottleneck it seems to be. There's a lot of room for progress when it comes to how we clean the data, how we train, and the actual architecture of the models.
Right? What happened to that whole "millions of pages of text are being generated by internet users every minute" thing that people used to say? Look at Lemmy alone. Look how much text we're putting into the ether every day. They're never going to run out of text unless people stop typing. Is this not a fake problem?
Yeah, if AI can't pinpoint something when it has ALL OF HUMAN KNOWLEDGE to draw from, it's not the fault of the dataset.