Facebook is perhaps infamous for helping usher in the era of “fake news,” but it is also trying to carve out a place for itself in the sequel: the endless battle to fight it. In the latest development on that front, Facebook parent Meta today announced a new tool called Sphere, an AI system built around the concept of harnessing the vast repository of information on the open web to provide a knowledge base for AI and other systems to work with. According to Meta, Sphere’s first user is Wikipedia, which is using it to automatically scan entries and identify when the citations in those entries are strongly or weakly supported.
The research team has open-sourced Sphere, which is currently based on 134 million public web pages.
Here’s how it works in action:
The idea behind using Sphere for Wikipedia is simple: the online encyclopedia has 6.5 million entries and sees, on average, some 17,000 new entries added each month. The wiki concept effectively means that adding and editing content is crowdsourced, and although there is a team of editors overseeing it all, theirs is a daunting task that grows by the day – not just because of the site’s size but because of its mandate, given the number of researchers, educators and others who rely on it as a repository of record.
At the same time, the Wikimedia Foundation, which oversees Wikipedia, has been thinking about new ways to exploit all this data. Last month it announced an Enterprise tier and its first two commercial customers, Google and Internet Archive, which use Wikipedia-based data for their own business interests and will now have broader, more formal service agreements around that.
For Meta’s part, the company continues to be weighed down by poor public perception, stemming in part from accusations that it lets misinformation and toxic ideas run wild – or, if you are someone who has ended up in “Facebook jail,” from sharing something you thought was fine that nonetheless ran afoul of overzealous moderation. In that respect, launching something like Sphere feels a bit like a public relations exercise for Meta as well as a potentially useful tool: if it works, it shows there are people in the organization trying to work in good faith.
To be clear, today’s announcement about Meta’s work with Wikipedia makes no reference to Wikimedia Enterprise. But additional tools that help ensure Wikipedia’s content is verified and accurate will be exactly the kind of thing potential Enterprise customers will want to know about when considering whether to pay for the service.
It’s also unclear whether this arrangement makes Wikipedia a paying customer of Meta, or vice versa – say, Meta becoming an Enterprise customer in order to get more access to data to work on Sphere. Meta notes that to train the Sphere model, it created “a new dataset (WAFER) of 4 million Wikipedia citations, much more complex than ever used for this type of research.” And only five days ago, Meta announced that Wikipedia editors are also using a new AI-based translation tool it built, so there is clearly a relationship there.
We have asked and will update this post as we learn more.
For now, a few more details about Sphere and how Wikipedia uses it, and what might follow:
— Meta believes that the “white box” knowledge base Sphere represents contains significantly more data (and, therefore, more sources to match for verification) than a typical “black box” knowledge source based on results from, say, a proprietary search engine. “Because Sphere can access far more public information than current standard models, it could provide useful information that they cannot,” the company noted in a blog post. The 134 million documents Meta gathered to build Sphere were split into 906 million passages of 100 tokens each.
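To make that preprocessing step concrete, here is a minimal sketch of how a corpus could be split into fixed-size passages, as Meta describes doing for Sphere (906 million passages of 100 tokens each from 134 million documents). The whitespace tokenization here is an illustrative assumption; Meta has not detailed Sphere’s actual tokenizer in the announcement.

```python
def split_into_passages(document: str, passage_len: int = 100) -> list[str]:
    """Split a document into consecutive passages of `passage_len` tokens."""
    tokens = document.split()  # naive whitespace tokenization (assumption)
    return [
        " ".join(tokens[i:i + passage_len])
        for i in range(0, len(tokens), passage_len)
    ]

doc = ("word " * 250).strip()  # a toy 250-token document
passages = split_into_passages(doc)
print(len(passages))             # 3 passages: 100 + 100 + 50 tokens
print(len(passages[0].split()))  # 100
```

At Sphere’s scale this slicing would of course be distributed across a cluster, but the output shape is the same: a flat index of uniform passages that a retriever can score independently.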
— In open-sourcing the tool, Meta’s argument is that it makes a stronger base for AI training models and other work than any proprietary base. Still, the company admits that knowledge bases themselves are potentially fragile, especially in these early days. What if a “truth” is simply not disseminated as widely as disinformation? This is where Meta wants to focus its future efforts on Sphere. “Our next step is to train models to assess the quality of retrieved documents, detect potential contradictions, prioritize more reliable sources – and, if there is no convincing evidence, concede that they still may, like us, be puzzled,” the company noted.
— In that sense, it raises some interesting questions about what Sphere’s hierarchy of truth will be based on, compared with those of other knowledge bases. The idea seems to be that because Sphere is open source, users will have the ability to modify its algorithms in ways that better suit their own needs. (For example, a legal knowledge base might lend more credence to court records and case-law databases than a fashion or sports knowledge base would.)
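One simple way such a domain-specific trust hierarchy could be layered on top of an open source retriever is to scale each candidate passage’s retrieval score by a configurable per-domain weight. This is an illustrative sketch, not Sphere’s actual API; the domain names and weights below are assumptions standing in for whatever a legal deployment would actually configure.

```python
from urllib.parse import urlparse

# Hypothetical trust weights a legal deployment might configure to boost
# court records and case-law sources over generic web pages.
DOMAIN_WEIGHTS = {
    "supremecourt.gov": 2.0,
    "law.cornell.edu": 1.5,
}

def rerank(candidates: list[tuple[str, float]]) -> list[tuple[str, float]]:
    """Re-rank (url, score) pairs by scaling scores with per-domain weights."""
    weighted = [
        (url, score * DOMAIN_WEIGHTS.get(urlparse(url).netloc, 1.0))
        for url, score in candidates
    ]
    return sorted(weighted, key=lambda pair: pair[1], reverse=True)

results = rerank([
    ("https://example-blog.com/post", 0.9),
    ("https://supremecourt.gov/opinions/21-123", 0.6),
])
print(results[0][0])  # the court source now ranks first (0.6 * 2.0 = 1.2)
```

The point of the open source release is precisely that this kind of policy layer is user-editable rather than baked into a proprietary ranking.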
— We’ve asked but haven’t yet heard back on whether Meta uses Sphere, or a version of it, on its own platforms like Facebook, Instagram and Messenger, which have themselves long grappled with misinformation and toxicity from bad actors. (We also asked whether there are any other clients for Sphere.)
– The current size of Wikipedia is arguably beyond what any single team of humans could check for accuracy, so the idea here is that Sphere is used to automatically analyze hundreds of thousands of citations at once and spot when a citation has little support on the wider web: “When a citation seems irrelevant, our model will suggest a more applicable source, even pointing to the specific passage that supports the claim,” the company noted. For now, it appears that editors select which passages need to be checked. “Ultimately, our goal is to build a platform to help Wikipedia editors systematically spot citation issues and quickly fix the citation or the corresponding article content, at scale.”
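The citation-checking loop described above can be sketched in miniature: score how well each candidate web passage supports a claim, and flag the citation if nothing clears a threshold. The lexical-overlap scorer here is a toy stand-in; Sphere’s actual verification uses learned neural models, which this does not attempt to reproduce.

```python
def support_score(claim: str, passage: str) -> float:
    """Crude lexical-overlap proxy for 'does this passage support the claim?'"""
    claim_tokens = set(claim.lower().split())
    passage_tokens = set(passage.lower().split())
    return len(claim_tokens & passage_tokens) / len(claim_tokens)

def check_citation(claim: str, passages: list[str], threshold: float = 0.5):
    """Return (best_passage, score), or (None, score) if support is weak."""
    best = max(passages, key=lambda p: support_score(claim, p))
    score = support_score(claim, best)
    return (best, score) if score >= threshold else (None, score)

passage, score = check_citation(
    "the eiffel tower opened in 1889",
    ["the eiffel tower opened to the public in 1889", "paris is in france"],
)
print(passage is not None)  # True: a supporting passage was found
```

The “flag if weak, suggest the best alternative if strong” branch is the part that maps onto the editor workflow Meta describes: surfacing a better source and the specific passage behind it.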