A team of more than two dozen AI researchers from MIT, Cornell University, the University of Toronto, and other institutions has trained a large language model using only data that was openly licensed or in the public domain, the Washington Post reports, providing a blueprint for ethically developing the technology.
But, as the creators readily admit, it was far from easy.
As they describe in a yet-to-be-peer-reviewed paper published this week, it quickly became apparent that the bottleneck wouldn't be computing power, but personpower.
That's because the text in the more than eight-terabyte dataset they assembled, which they're calling the Common Pile v0.1, had to be manually cleaned and reformatted to make it suitable for AI training, WaPo explains. Then there was the amazing a