Business Reporter

AI and data exfiltration


Richard Ford at Integrity360 explains how AI technology could increase the risk of data leakage

 

The generative AI genie is well and truly out of the bottle: according to a recent survey, two-thirds of IT leaders intend to integrate the technology into business processes by the end of the year.

 

GenAI can process vast amounts of data, and its ability to create original content from that data is expected to see the technology widely embraced across security, sales, marketing and R&D departments. The problem is that businesses don’t want their sensitive data to be fed into these systems and, in the process, publicly exposed.

 

Users are already experimenting with large language models (LLMs) such as ChatGPT, Google’s PaLM and Gemini, and Meta’s Llama. Often their bosses are completely unaware of this, with 40% of workers having used the technology without sanction.

 

Shadow GenAI poses a significant risk. The Cloud and Threat Report: AI Apps in the Enterprise found that GenAI was being used daily in large businesses, and that for every 10,000 users there were 183 incidents per month of sensitive data being sent to ChatGPT. Source code was the most frequently leaked type of data, followed by regulated data.

 

Organisations are keen to capitalise on the productivity gains associated with the technology, despite the risks of data leaks, so banning GenAI is not an option. Fortunately, a plethora of specialist commercial LLMs launched in 2023 aim to control how the technology is fed source material, how it is prompted and what it outputs. The problem that remains is that these need to be correctly configured: get it wrong and sensitive company data could be exposed.

 

Controlling Copilot

A good example here is Microsoft’s Copilot, which is built on Microsoft Azure OpenAI. While it uses OpenAI’s models, it’s essentially a closed LLM and comes integrated into the Microsoft product portfolio. Copilot can use corporate data to summarise email threads, draft responses, encapsulate the key points of a meeting and suggest action points in real time, among other uses.

 

Copilot works by acting upon a prompt within an Office application such as Word or Outlook, then accessing Microsoft Graph and the Semantic Index to pre-process the request. A modified prompt is then sent to the LLM. The response received by Copilot is checked against Graph and the Semantic Index for post-processing before being sent back to the Office application.
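The flow above can be sketched as a toy pipeline. To be clear, none of the names below correspond to a real Microsoft API: the in-memory dictionary merely stands in for Graph and the Semantic Index, and the LLM call is a stub.

```python
# Toy sketch of the Copilot request flow described above. All names and
# data are illustrative only -- this is not Microsoft's API.

# Stand-in for the tenant's Graph / Semantic Index: documents each
# user is permitted to read.
SEMANTIC_INDEX = {
    "alice": ["Q3 sales summary", "Team meeting notes"],
}

def preprocess(prompt, user):
    """Ground the prompt with tenant data the user may access."""
    context = SEMANTIC_INDEX.get(user, [])
    return f"{prompt}\nContext: {'; '.join(context)}"

def call_llm(grounded_prompt):
    """Stand-in for the LLM call inside the trust boundary."""
    return f"Draft based on -> {grounded_prompt}"

def postprocess(response, user):
    """Re-check the response against permissions before returning it."""
    allowed = any(doc in response for doc in SEMANTIC_INDEX.get(user, []))
    return response if allowed else "[withheld: no permitted context]"

def handle_prompt(prompt, user):
    # Pre-process, query the model, then post-process -- the three
    # stages described in the text.
    return postprocess(call_llm(preprocess(prompt, user)), user)
```

The point of the sketch is the shape of the pipeline: the user’s permissions gate both what is fed into the model and what comes back out.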

 

It’s important to note that Copilot does have safeguards in place. All of this takes place within the Microsoft 365 trust boundary, with tenant isolation, so no data is shared with OpenAI or used to train the LLM.

 

The technology also has security controls in the form of two-factor authentication, compliance boundaries, and privacy protections and permissions to regulate access. However, these need to be correctly configured to be effective.

 

Installing Copilot without the necessary due diligence can allow users to view sensitive data and generate content that doesn’t carry the sensitivity labelling assigned to the original. Examples have been reported of Copilot being used to pull up user credentials, API and access keys, details of M&A activity, or simply any files labelled ‘sensitive’. If the new content isn’t assigned data loss prevention (DLP) labelling, that data becomes untrackable.
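The labelling gap described above can be illustrated with a short sketch: content generated from labelled sources should inherit the most restrictive source label, and anything without a known label falls outside DLP tracking. The label hierarchy here is purely illustrative, not a real DLP schema.

```python
# Sketch of DLP label inheritance for generated content. The ranking
# is invented for illustration; real sensitivity labels come from the
# organisation's compliance platform.
SENSITIVITY = {"Public": 0, "Internal": 1, "Confidential": 2}

def inherited_label(source_labels):
    """Most restrictive label among the sources, or None if unknown.

    A None result is the dangerous case described in the text: new
    content that DLP tooling can no longer track.
    """
    known = [label for label in source_labels if label in SENSITIVITY]
    if not known:
        return None  # untrackable by DLP
    return max(known, key=SENSITIVITY.get)
```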

 

Are you AI ready?

To prevent this, user permissions must be locked down. Microsoft has stated that Copilot requires “permission models in all available services, such as Sharepoint, [are used] to help ensure the right users or groups have the right access”. But the reality is that businesses struggle to implement the concept of least privilege, under which users are given access only to the data they need. The 2023 State of Cloud Permissions Risks Report found, for instance, that 50% of identities are super admins with unfettered access.
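As a rough illustration of the kind of check involved, the snippet below flags super-admin identities in a hypothetical permissions export. The data shape is invented; a real audit would read from an IAM or Entra ID export rather than a hard-coded list.

```python
# Minimal sketch of a least-privilege audit over an exported list of
# identities and their roles. Names, roles and data shape are invented.
IDENTITIES = [
    {"name": "svc-backup", "roles": ["super_admin"]},
    {"name": "j.smith",    "roles": ["reader"]},
    {"name": "build-bot",  "roles": ["super_admin", "contributor"]},
    {"name": "a.jones",    "roles": ["contributor"]},
]

def super_admin_ratio(identities):
    """Return the share of identities holding a super-admin role,
    plus their names, so over-privileged accounts can be reviewed."""
    admins = [i["name"] for i in identities if "super_admin" in i["roles"]]
    return len(admins) / len(identities), admins

ratio, names = super_admin_ratio(IDENTITIES)
print(f"{ratio:.0%} of identities are super admins: {names}")
```

In this toy export the ratio comes out at 50%, matching the figure the report cites: half the identities hold unfettered access that least privilege would strip away.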

 

It’s therefore critical that organisations assess their readiness and do the groundwork before using Copilot or any other commercial LLM. The focus needs to be on implementing effective data management: identifying sensitive information, uncovering overshared content and applying the right permissions.

 

This will require a thorough audit of data classification and of the sensitivity labelling of files spanning directories, email, chat and collaboration platforms in order to limit access. It’s a complex undertaking, both because much of that material is shared and because boundaries must be applied not just by role but also by other criteria such as department or location.
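A minimal sketch of such an audit is below, using invented file metadata. In practice the labels and sharing information would come from the document-management or compliance platform, not from hand-written dicts.

```python
# Toy audit of sensitivity labelling and oversharing, of the kind
# described above. All paths, labels and group names are invented.
FILES = [
    {"path": "finance/ma-deck.pptx", "label": "Confidential", "shared_with": ["all-staff"]},
    {"path": "hr/pay-grades.xlsx",   "label": None,           "shared_with": ["hr"]},
    {"path": "eng/design.docx",      "label": "Internal",     "shared_with": ["eng"]},
]

def audit(files):
    """Flag files with no sensitivity label, and confidential files
    shared with an organisation-wide group."""
    unlabelled = [f["path"] for f in files if f["label"] is None]
    overshared = [f["path"] for f in files
                  if f["label"] == "Confidential"
                  and "all-staff" in f["shared_with"]]
    return unlabelled, overshared
```

Both lists are exactly what a GenAI readiness review needs to produce: the unlabelled files are invisible to DLP, and the overshared ones are what a tool such as Copilot would happily surface to any employee who asks.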

 

Such security measures are not just desirable but part and parcel of effective cyber-security hygiene, and essential to preventing a data breach with or without AI.

 

A notable example of oversharing occurred last year when the Police Service of Northern Ireland (PSNI), responding to a Freedom of Information (FoI) request for staff rankings and pay grades, provided the first initial, surname, location and department of every serving officer and civilian member of staff.

 

The error shows the danger of divulging too much information and illustrates the need for security controls that regulate access and exposure, irrespective of whether it’s GenAI or a human doing the digging.

 


 

Richard Ford is CTO at Integrity360

 

Main image courtesy of iStockPhoto.com and IvelinRadkov

© 2024, Lyonsdown Limited. Business Reporter® is a registered trademark of Lyonsdown Ltd. VAT registration number: 830519543
