Leveraging the Internet Archive’s vast library collections to address the “hallucination problem” in chatbots is feasible, and it holds great potential for creating more dependable and trustworthy AI services.
By collaborating with responsible AI companies and research projects, the Internet Archive could offer an anti-hallucination service as an add-on to chatbots. This service would enable chatbots to cite supporting evidence, counter claims, and provide a reliable knowledge base by drawing upon the historical content available in the Internet Archive’s collections.
The Internet Archive’s extensive collection of “historical internet” content, accumulated over 27 years of web archiving, provides a unique resource to counter the rise in AI-generated content. By mining assertions in the literature and placing them in their historical context, the anti-hallucination service could make AI-generated responses more reliable and accurate.
Similar efforts have already been undertaken by the Internet Archive, such as fixing broken links in Wikipedia articles and linking assertions to specific pages in books. These processes, currently done manually or with the assistance of special-purpose robots, can be automated and integrated with AI models to create a more robust and dependable World Wide Web.
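As a rough illustration of what automating the broken-link repair might involve, the sketch below builds a query for the Wayback Machine availability endpoint and picks the closest archived snapshot out of a response. It is a minimal sketch under stated assumptions, not the Internet Archive's actual tooling: the function names are hypothetical, the sample response is made-up illustrative data in the shape that endpoint returns, and no network request is made.

```python
from typing import Optional
from urllib.parse import urlencode

# Public Wayback Machine availability endpoint.
AVAILABILITY_ENDPOINT = "https://archive.org/wayback/available"


def build_availability_query(url: str, timestamp: Optional[str] = None) -> str:
    """Build the lookup URL for a possibly dead link (hypothetical helper).

    `timestamp` is an optional YYYYMMDD-style string asking for the
    snapshot closest to that date.
    """
    params = {"url": url}
    if timestamp:
        params["timestamp"] = timestamp
    return f"{AVAILABILITY_ENDPOINT}?{urlencode(params)}"


def closest_snapshot(response: dict) -> Optional[str]:
    """Return the archived URL from a parsed JSON response, or None."""
    closest = response.get("archived_snapshots", {}).get("closest")
    if closest and closest.get("available"):
        return closest.get("url")
    return None


def repair_link(dead_url: str, response: dict) -> str:
    """Swap a dead link for its archived copy when one exists."""
    return closest_snapshot(response) or dead_url


# Illustrative response only (assumed data, not fetched from the service).
SAMPLE_RESPONSE = {
    "archived_snapshots": {
        "closest": {
            "available": True,
            "url": "http://web.archive.org/web/20060101000000/http://example.com/",
            "timestamp": "20060101000000",
            "status": "200",
        }
    }
}

if __name__ == "__main__":
    print(build_availability_query("example.com", "20060101"))
    print(repair_link("http://example.com/", SAMPLE_RESPONSE))
```

In a real pipeline the response would be fetched over HTTP and parsed as JSON before being passed to `closest_snapshot`. Keeping the fetch separate from the parsing, as here, makes the repair logic easy to test offline.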
While there may be legal challenges, such as the ongoing litigation with major publishers, the Internet Archive remains committed to its mission of owning collections that researchers and the public can use to gain a deeper understanding of the world. Realizing this vision would require collaboration among scientists, researchers, humanists, ethicists, engineers, governments, and philanthropists. Establishing a Public AI Research laboratory could facilitate the mining of vast collections without rights issues, and expanding the corpus by collecting and digitizing publications from democracies worldwide would further enrich the available knowledge.
Ultimately, building a better internet requires a shared purpose, collaborative partnerships, and sufficient resources. By combining their efforts, the Internet Archive, AI companies, and a diverse range of stakeholders could create a more reliable and trustworthy online ecosystem in which disinformation and propaganda can be effectively challenged and weakened.