As artificial intelligence (AI) systems grow more advanced and data-driven, organizations are turning to synthetic and de-identified data to power model development while purportedly reducing legal risk. The assumption is simple: if the data no longer identifies a specific individual, it falls outside the reach of stringent privacy laws and related obligations. But this assumption is increasingly being tested—not just by evolving statutes and regulatory enforcement, but by advances in re-identification techniques and shifting societal expectations about data ethics.
This article examines the complex and evolving legal terrain surrounding the use of synthetic and de-identified data in AI training. It analyzes the viability of "privacy by de-identification" strategies in light of re-identification risk.