
While reading the latest WSJ interview with the OpenAI CTO, I was 🤔 curious to know what she meant by 😱 “publicly available data” when the interviewer asked about the training data for SORA (their text-to-video generator service). Specifically, I wanted to know whether they use data (like pictures and videos) from public social media accounts to train their models for Dall-E and SORA.
So I asked SORA’s 👪 family member ChatGPT what “publicly available data” really means and whether it includes data from public social media accounts. According to ChatGPT, yes, it does 😶. That is only what the term can cover, though. Whether OpenAI actually uses this data will remain unanswered unless OpenAI discloses it, so for now it is only a possibility that OpenAI uses data from public social media accounts to train its models.
I couldn’t find much info on this, so let me know if you find anything interesting. ✍️
April 13, 2025
