Content whirlpool

LLMs are trained on public data; the limit to their growth is not hardware or parameter count but the quality of content available to train them further. With these new tools, new content will be generated at a pace that humans can't easily consume, and we may need to use the same tools again to summarise it and create action items for us to follow. A new risk has emerged: we may get stuck in a content whirlpool, where these tools create more content based on the content they have already created.

Photo by Q. Hương Phạm on Pexels.com

This is similar to what many algorithmic feeds already do to us. You get served content similar to what you have watched, songs you have heard and articles you have read. Those algorithms can be worked around by going into private mode, at least for that window, so that the serendipity factor increases. LLMs and similar tools, in a way, work from the same public dataset, which makes them behave like coupled systems: the more self-generated data they feed into the public domain, the more synced they become. This is very similar to what the Kuramoto model describes, where weakly coupled oscillators gradually drift into phase with one another. When the underlying foundation is the same, and content generated by different models keeps being added to that same foundation, the models may begin to converge on what they can generate.
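
To get a feel for how coupled systems fall into sync, here is a minimal sketch of the Kuramoto model in Python (my own illustration of the standard equations, not something from a specific library): N oscillators with different natural frequencies nudge each other's phases, and past a coupling threshold the population locks together, much as models feeding on a shared, self-generated corpus might.

```python
import numpy as np

# Kuramoto model: d(theta_i)/dt = omega_i + (K/N) * sum_j sin(theta_j - theta_i)
# Each oscillator has its own natural frequency omega_i, but the coupling
# term pulls every phase toward the rest of the population.

rng = np.random.default_rng(0)
N, K, dt, steps = 100, 2.0, 0.01, 5000  # illustrative values, not tuned

omega = rng.normal(0.0, 1.0, N)          # natural frequencies
theta = rng.uniform(0.0, 2 * np.pi, N)   # random initial phases

for _ in range(steps):
    # entry [i, j] is theta_j - theta_i; summing over j gives the pull on i
    coupling = np.sin(theta[None, :] - theta[:, None]).sum(axis=1)
    theta += dt * (omega + (K / N) * coupling)

# Order parameter r in [0, 1]: r near 0 means scattered phases,
# r near 1 means the population has synchronised.
r = np.abs(np.exp(1j * theta).mean())
print(f"order parameter r = {r:.3f}")
```

With the coupling K above the critical value, r climbs well away from zero; set K near zero and the phases stay scattered. The analogy is loose but useful: shared input plus mutual feedback is what drives the convergence.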

There are possible paths by which this convergence does not happen. One is to start using personal and proprietary data, which is out of bounds for the models today. Who gets access to what will start to matter a lot; our personal data is a goldmine and will be monetised well. The other is to advance the technology to reason well from first principles and heuristics, which would need far less data, but that may be years away. So the first option, getting into private data, is the more likely one, and entities with fiduciary responsibilities over data will be tempted to find legal ways of monetising it, through loopholes that may not be in the best interest of a layperson.

Until more reasoning capabilities are built, it is better to live in the content whirlpool than to feed our private lives to the insane computing power of these models.
