Is ChatGPT a disaster for data privacy?

AI & Automation, 17 Feb 2023

Camilla Winlo at Gemserv asks questions about how this powerful tool really uses the data it pulls from the web

As the dust settles and the novelty of ChatGPT starts to wear off, a few major queries around its use of data have arisen. We know that ChatGPT uses a large language model trained by OpenAI on billions of data points from across the internet, using this data to formulate a response to any question or instruction that a user inputs.

Therefore, ChatGPT’s responses could be fuelled by data scraped, without permission, from any of our digital footprints, including personal websites and even social media posts.

We’ve already seen the fallout of this data collection method from various AI image generators. Just last month, Getty Images kickstarted legal proceedings against Stability AI, claiming that the generator used its database to train its image generation model.

In addition, Clearview AI, a platform which built its facial recognition database using images scraped from the internet, was served enforcement notices by several data protection regulators last year.

With new AI chatbots currently in development, and Google having recently released Bard, the risk of data privacy disputes and copyright infringement claims aimed at conversational AI is a pressing one.

Is your data being stolen?

ChatGPT’s large language model requires a huge amount of data. OpenAI originally built the tool using 300 billion words lifted directly from the internet – everything from articles to books, webpages and product reviews.

All of this data was scraped without the original posters’ consent, meaning your personal information could very well have been collected and processed by ChatGPT, and may now be used in its conversations with strangers.

The company is now worth around $29 billion, yet the individuals and companies that produced the data it scraped from the internet have not been compensated. Even in cases where data is publicly available, ChatGPT has the potential to breach contextual integrity, a fundamental principle in legal discussions of privacy which holds that information should not be revealed outside of the context it was produced in.

The prompts that a user inputs into ChatGPT can also be a privacy risk, as any sensitive information inadvertently handed over could end up being exposed. For example, if a legal professional used the tool to draft an agreement or contract, any information included in that content could become part of ChatGPT’s database and be included in a response to another user’s prompt.
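
One way a business might reduce this exposure is to strip obvious identifiers from a prompt before it leaves the organisation. The Python sketch below is illustrative only: the function name `redact_prompt` and the two patterns are assumptions made for this example, not a feature of ChatGPT or of any OpenAI tool, and simple pattern matching will miss names, addresses and most contextual details.

```python
import re

# Hypothetical, deliberately simple patterns for illustration.
# A real deployment would need far broader coverage than this.
PATTERNS = {
    "EMAIL": re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}"),
    "PHONE": re.compile(r"\+?\d[\d\s().-]{7,}\d"),
}


def redact_prompt(text: str) -> str:
    """Replace email addresses and phone-like numbers with placeholders."""
    for label, pattern in PATTERNS.items():
        text = pattern.sub(f"[{label} REDACTED]", text)
    return text


prompt = "Contact jane.doe@example.com or 020 7946 0000 about the contract."
print(redact_prompt(prompt))
```

A filter like this is a backstop rather than a safeguard in its own right. It does nothing for a client’s name or the terms of a deal, which is why staff awareness still matters.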

How does this affect compliance?

In the EU, scraping data points from sites can be a direct breach of the GDPR, the ePrivacy Directive, and the EU Charter of Fundamental Rights. In the US, no federal law regulates the use of personal data within AI models.

However, organisations that collect and use data from individuals are required to comply with regulations such as the Health Insurance Portability and Accountability Act (HIPAA) and the Children’s Online Privacy Protection Act (COPPA).

And the California Consumer Privacy Act (CCPA), which covers the state in which many of the world’s tech giants are based, enforces many privacy requirements similar to those of the GDPR.

As of today, ChatGPT offers no method of requesting the removal of data from its database, a right guaranteed under both the GDPR and the CCPA. Other machine learning developers are working on ways to enable the removal of specific data points, but these are still in the early stages of development.

There are also major technical problems that arise when removing data from machine learning models if that data has been used to train the model itself, as it can lead to less accurate responses.

The “right to be forgotten” is particularly important in cases where information is inaccurate, biased or misleading, which seems to be a burgeoning threat for ChatGPT. If the tool’s training data includes errors or misinformation, or even if the algorithm used to train it is biased, it can lead to the spread of false information in sensitive areas like politics.

Without the ability to easily remove this data as part of the right to be forgotten, these incomplete or inaccurate outputs could become a much larger problem.

Cyber-criminals and ChatGPT

Another major data privacy risk lies in the nefarious actions of criminals online, who may have found their new favourite toy in ChatGPT. The billions of data points scraped by ChatGPT are now free to use for any number of targeted attacks, including malware, ransomware, phishing, Business Email Compromise (BEC) and social engineering.

ChatGPT’s ability to create instant, realistic-sounding conversations could be an effective tool in drafting phishing emails urging victims to click on malicious links, install malware or give away sensitive information. It makes the process of malicious impersonation a lot easier, allowing cyber-criminals to gain trust with their victims.

ChatGPT can also generate large volumes of automated messages for use in spam attacks that overwhelm servers, or in schemes to hold sensitive information to ransom or sell it on the dark web.

As the use of these large language models becomes more widespread, it’s never been more vital for companies like OpenAI to find a solution for privacy issues such as the right to be forgotten. Businesses also need to ensure that their teams understand the data privacy ramifications of tools like ChatGPT before they roll them out for use.

Being mindful of these risks, conducting in-depth risk assessments and taking a proactive, rather than reactive, stance on any issues that might arise are the only ways to harness a tool like ChatGPT without putting data in danger.

Camilla Winlo is Head of Data Privacy at Gemserv
