It’s been remarked that data is the oil of the information age. And if that’s the case, then Data Privacy Week is a good reminder that your information is valuable. Held this year between January 27 and 31, the international event advises individuals on how to safeguard their data, manage privacy settings, and make more informed decisions about who (and what) receives that data.
But this year’s Data Privacy Week is a good moment for organizations to consider their data protection practices as well. With more solutions deploying AI and ingesting users’ data, hybrid work becoming the norm, and more users, devices, entitlements, and environments than ever, there are new and complex risks to organizations’ data.
For Data Privacy Week, I’ll review some of those new risks. I’ll explain how digital assistants and AI models can introduce new security risks, describe why identity governance and administration (IGA) solutions form the basis for data protection, and suggest some best practices organizations can follow to keep their data secure.
The theme of Data Privacy Week this year is “take control of your data.” That’s a good goal to have, but like most things, it’s easier said than done. Every device, user, machine account, and resource creates, transmits, and processes data. That’s why organizations need an IGA program, which provides the capabilities they need to:
- Maintain visibility and control over all organization data: You can’t control what you don’t see or understand. IGA helps ensure the right accounts have access to the right data by enforcing regular access reviews and empowering data owners to approve or revoke permissions as needed. It also provides your admins with visibility into what users can currently access (a simplified sketch of such an access review follows this list).
- Ensure continuous compliance: One benefit of IGA and the visibility into your organization’s data? Demonstrating that control to regulators. IGA can automate workflows, audit reporting, and access certifications to demonstrate compliance with GDPR, GLBA, SOX, PCI-DSS, HIPAA, ARPA, and other key regulations.
- Protect sensitive data: IGA solutions can provide your security team with actionable insights into data access to identify and mitigate unauthorized access, helping protect customer data and intellectual property.
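To make the access-review idea concrete, here’s a minimal sketch in Python of how an entitlement review cycle might be modeled. The names and structure (Entitlement, run_access_review, the 90-day window) are illustrative assumptions, not the API of any actual IGA product.

```python
from dataclasses import dataclass
from datetime import date, timedelta

# Illustrative entitlement record: who has access to what, and who owns that data.
@dataclass
class Entitlement:
    user: str
    resource: str
    owner: str            # data owner responsible for approving or revoking access
    granted_on: date
    last_certified: date

def needs_review(e: Entitlement, max_age_days: int = 90) -> bool:
    """Flag entitlements that haven't been certified within the review window."""
    return date.today() - e.last_certified > timedelta(days=max_age_days)

def run_access_review(entitlements: list[Entitlement]) -> dict[str, list[Entitlement]]:
    """Group stale entitlements by data owner so each owner can approve or revoke them."""
    review_queue: dict[str, list[Entitlement]] = {}
    for e in entitlements:
        if needs_review(e):
            review_queue.setdefault(e.owner, []).append(e)
    return review_queue

if __name__ == "__main__":
    sample = [
        Entitlement("alice", "finance-share", "cfo", date(2023, 1, 10), date(2023, 2, 1)),
        Entitlement("bob", "hr-records", "hr-lead", date(2024, 6, 1), date.today()),
    ]
    for owner, items in run_access_review(sample).items():
        print(owner, "should review:", [(e.user, e.resource) for e in items])
```

The point isn’t the code itself: it’s that recurring, owner-driven certification of every user-to-resource grant is what gives admins the visibility and control described above.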
Generative AI models and Large Language Models (LLMs) introduce new risks that organizations need to account for. Some instances of Microsoft Copilot and ChatGPT rely on user inputs to train their models. Broadly speaking, that means whatever a user pastes into a prompt can become part of the model. Moreover, if your organization uses an AI assistant, then the tool may have wide access to organizational data, and it may not have safeguards limiting when, if, or how it should use that data.
That can obviously introduce significant risks. Look at the vulnerability in Microsoft’s Azure Health Bot service, which “enabled lateral movement throughout the network, and thus access to sensitive patient data,” per TechRadar. An April 2024 report noted that 20% of UK companies “had potentially sensitive corporate data exposed via employee use of generative AI (GenAI),” per Infosecurity Magazine. And Cisco estimated that a quarter of companies have banned generative AI over these concerns.
These concerns are well founded. LLMs need a wide array of customer data to train their models, and years of Google searches have trained users to trust the “magic box” and type in whatever they’re looking for without a second thought. That can put financial information, IP, PII, and other sensitive data at risk.
It’s not just user inputs that are risky. If an LLM is being trained on your data, then third parties may be able to query it and surface information you’d rather not make public. Likewise, there’s a risk that the assistant itself might broadcast information over whatever outbound channels it can reach.
If your organization is going to deploy a digital assistant, there are some best practices you should follow to keep it and your team secure.
First, have your leadership team explain what the risks are. Articulate what types of information users can input and what they can’t. At RSA, we’ve explained to our team that they can input information that’s intended for public consumption. Anything else shouldn’t be part of a user query.
Second, make certain to turn on the right controls. Those might not always be the assistant’s default settings: if configured incorrectly, AI assistants could have access to everything organization-wide. Make sure that your chatbot knows what information it can query and what information it can return. Organizations need to silo data appropriately, so that User X doesn’t receive responses based on User Y’s files; otherwise, you risk your employees reading each other’s emails and accessing information they were never meant to see.
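One common way to enforce that kind of siloing is permission-aware retrieval: the assistant only builds its answer from documents the asking user is already entitled to see. The sketch below assumes a hypothetical corpus, ACL fields, and keyword matching purely for illustration; real assistants use their own retrieval and permission models.

```python
from dataclasses import dataclass, field

# Illustrative document record with an access-control list of users allowed to see it.
@dataclass
class Document:
    doc_id: str
    text: str
    allowed_users: set[str] = field(default_factory=set)

def retrieve_for_user(user: str, query: str, corpus: list[Document]) -> list[Document]:
    """Return only documents this user is entitled to see; the assistant's prompt
    is then built from that filtered set, never from the full corpus."""
    visible = [d for d in corpus if user in d.allowed_users]
    # Naive keyword match stands in for whatever retrieval the assistant actually uses.
    return [d for d in visible if query.lower() in d.text.lower()]

def build_prompt(user: str, query: str, corpus: list[Document]) -> str:
    context = "\n".join(d.text for d in retrieve_for_user(user, query, corpus))
    return f"Answer using only this context:\n{context}\n\nQuestion: {query}"

if __name__ == "__main__":
    corpus = [
        Document("d1", "Q3 revenue forecast draft", allowed_users={"user_y"}),
        Document("d2", "Public press release on revenue", allowed_users={"user_x", "user_y"}),
    ]
    # User X only ever sees documents on their own ACL, not User Y's files.
    print(build_prompt("user_x", "revenue", corpus))
```

The design choice that matters is filtering by entitlement *before* the content reaches the model, rather than hoping the model will decline to repeat something it has already ingested.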
You also need to know what your system can access, when it has that access, and when it makes recommendations to users. Be certain to apply those rulesets across all the AI tools you’ve deployed (there are multiple versions of Microsoft’s Copilot), and those tools need to fit within an aligned security posture.
Finally, ask your vendor what happens with your searches, data, and responses. Does the vendor keep them? If so, for how long? Are other instances of the solution trained on your data, or does it all stay private to you?
So far, I’ve largely discussed the risks that LLMs and generative AI pose to organizations’ cybersecurity postures. Those models tend to get most of the headlines and represent some of the biggest risks, if only because more users are entering prompts into ChatGPT, Copilot, and similar tools.
Cybersecurity can also benefit from AI, but your team needs to use the right model. Generally speaking, LLMs and generative AI aren’t the right fit for security operations today. They’re a black box that produces results a human operator can’t always validate, and their outputs aren’t always actionable.
That’s because LLMs and generative AI are non-deterministic: I put in X values, and I don’t know what will come out. Deterministic models (I put in X values, and I know what the result will be and how the solution will arrive at that output) can be extremely useful for cybersecurity.
We’ve been using RSA® Risk AI for decades to provide real-time insights into authentication requests. Risk AI is a risk-based, deterministic machine learning model that assesses a user’s IP address and network signals, behavior analytics, geolocation, time-based signals, application signals, device-related signals, and more to evaluate risk. If those signals reflect the user’s typical behavior relative to themselves and the rest of the organization, then the request is deemed low-risk and the user can authenticate using standard methods. If those behaviors deviate significantly, then the system automatically triggers a step-up authentication challenge and can flag the attempt to the security team.
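To show what a deterministic, explainable risk decision looks like in general, here’s a toy sketch. The signals, weights, and thresholds are invented for illustration; this is the general risk-based authentication pattern, not RSA Risk AI’s actual model.

```python
# Illustrative only: a toy, deterministic risk score over a few authentication signals.

KNOWN_DEVICES = {"laptop-123"}
USUAL_COUNTRIES = {"US"}
USUAL_HOURS = range(7, 20)  # typical working hours for this user

def risk_score(device_id: str, country: str, hour: int, failed_attempts: int) -> int:
    """Same inputs always yield the same score, and each contribution is explainable."""
    score = 0
    if device_id not in KNOWN_DEVICES:
        score += 30   # unrecognized device
    if country not in USUAL_COUNTRIES:
        score += 40   # unusual geolocation
    if hour not in USUAL_HOURS:
        score += 10   # atypical time of day
    score += min(failed_attempts, 3) * 10  # repeated failures
    return score

def authentication_decision(score: int) -> str:
    if score < 30:
        return "allow: standard authentication"
    if score < 70:
        return "step-up: require an additional factor"
    return "deny and flag to the security team"

if __name__ == "__main__":
    print(authentication_decision(risk_score("laptop-123", "US", 10, 0)))   # low risk
    print(authentication_decision(risk_score("unknown-99", "RO", 3, 2)))    # high risk
```

Because the scoring is deterministic, an analyst can trace exactly which signals pushed a request over the step-up threshold, which is precisely the validation you can’t get from a generative black box.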
Importantly, RSA Risk AI does not collect organizations’ information, and we don’t use any information from a given deployment to train future iterations. We hash and tokenize all Risk AI data. Every Risk AI instance is deployed on an organization-by-organization basis, and each deployment is fine-tuned on that organization’s data and that data only.
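As a general illustration of what hashing and tokenizing a signal can look like, here’s a minimal sketch using a keyed hash with a per-deployment secret. The scheme and names here are assumptions for teaching purposes, not a description of RSA’s implementation.

```python
import hashlib
import hmac
import secrets

# Illustrative: a per-deployment secret key so the same raw value hashes differently
# in different deployments. Not a description of Risk AI's actual scheme.
DEPLOYMENT_KEY = secrets.token_bytes(32)

def tokenize(raw_value: str) -> str:
    """Replace a raw signal (e.g., a device identifier) with a keyed hash.
    The model can still compare tokens for equality, but the raw value
    never leaves the deployment and can't be recovered from the token."""
    return hmac.new(DEPLOYMENT_KEY, raw_value.encode(), hashlib.sha256).hexdigest()

if __name__ == "__main__":
    # Same input within a deployment yields the same token; a different
    # deployment key would yield a completely different token.
    print(tokenize("device-serial-ABC123"))
    print(tokenize("device-serial-ABC123"))
```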
Don’t let the technical best practices or points about security architecture obscure the main point: your data is valuable. It’s well worth organizations’ time and resources to invest in the tools, processes, and procedures that can keep it secure.
If you have any questions about how to do that, we’re here to help.