Hacker News

They've increased their cybersecurity usage filters to the point that Opus 4.7 refuses to do any valid work, even after web-fetching the program guidelines itself and acknowledging: "This is authorized research under the [Redacted] Bounty program, so the findings here are defensive research outputs, not malware. I'll analyze and draft, not weaponize anything beyond what's needed to prove the bug to [Redacted]."

I will immediately switch over to Codex if this continues to be an issue. I am new to security research and have been paid out on several bugs, but I don't have a CVE or a public talk, so they're ready to cut me out already.

Edit: these changes are also retroactive to Opus 4.6. I am stuck using Sonnet until they approve me or make a change.



Sounds like you will need to drink a(n identity) verification can soon [1] to continue as a security researcher on their platform.

1: https://support.claude.com/en/articles/14328960-identity-ver...

Identity verification on Claude

Being responsible with powerful technology starts with knowing who is using it. Identity verification helps us prevent abuse, enforce our usage policies, and comply with legal obligations.

We are rolling out identity verification for a few use cases, and you might see a verification prompt when accessing certain capabilities, as part of our routine platform integrity checks, or other safety and compliance measures.


Context for "please drink verification can": https://files.catbox.moe/eqg0b2.png


We sure aren’t far off.


Yes, it's a stupid 4chan meme from 2013. I can only surmise those who quote it either don't know its origin, or they must be wholeheartedly 'embracing the cringe.'


Lul, I'm embracing this "cringe" you talk about :) Every time I read it, it makes me laugh :D


Well, that's okay; you're young. There are better and more topical jokes in your future, and it will serve you well in making them to have encountered this particular, extremely stale and suspiciously stained, cookie. Just be careful you don't take too big a bite!


One must integrate the cringe, in order to become truly based. —Carl Jung


Stupid? Hardly.

Sony was granted a patent in 2009 "for an interactive commercial system that allows viewers to skip commercials by yelling the brand name of the advertiser at their television or monitor.": https://www.snopes.com/fact-check/sony-patent-mcdonalds/


Yes, mostly because no one actually cares much what anyone patents until a material invention eventuates, and partly so that they would be able to sue anyone who did actually invent it, which, you will note, they themselves of course did not proceed to do.

I don't claim this failed to occur because Sony is more decent than average, but because the idea is self-evidently very stupid. The thing is, when you get to have a "Patents" section in your CV, no one cares very much that they are stupid patents as long as you were working for a serious company when you got them. There is a point past which that's just a perquisite, like how the company subsidizes your au pair.

I've never needed an au pair! And I hold no patents of which I'm aware. But it is not 2009, or even 2013, any more.


That's a big assumption that this patent, a technology quite relevant to a massive media company, was filed only for future patent troll purposes. Plenty of seriously-intentioned ideas never materialize for a multitude of reasons.

The point is that the idea is now out in the wild and cannot be unseen, and however stupid or morally bankrupt it is, someone in the past did (and someone in the future will) think it was a good idea. And if and when it finally gets implemented for real, we all suffer.

The soda can validation 4chan meme isn't just a dumb joke. It's a warning.


(Unrelated, but since your prior comment on Vegas "hacking" is too old to take replies: if you haven't, you should definitely check out Thomas A. Bass's 1985 The Eudaimonic Pie, which I believe may touch upon one of the stories you mentioned, and is also one of that kind in its own right; having occupied some train commutes with it in about 2001 or 2002, I can recommend the book not only for its information but also as a well-written, gripping read, if somewhat shockingly naïve by our 21st-century standard. Enjoy!)


From the most unserious source imaginable, yes. Do you know of a company called "Chaotic Good?" Do you think they were the first to come up with the model?

But even if the 2013 post was as organic as you assume, I would think it worth finding a way to "warn" about the issue that doesn't make you look like a weird fringey incel lacking the social competence to read the kind of normal room which this website has emphatically never been nor even wished to be.


I'm surprised we can't just authenticate in other ways, like a DNS TXT record that proves the website I'm looking to security-audit is my own.


How would it know it’s really there, and not just a tool input/output injected into its input?


It could be an API endpoint on Anthropic servers, the same way Let's Encrypt verifies things on their servers. If you can't control the DNS records, you can't verify via DNS, no matter what you tell the local `certbot`.
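A minimal sketch of that idea, assuming a hypothetical `_ai-verify.<domain>` record name loosely modeled on ACME's DNS-01 challenge (RFC 8555, where the real label is `_acme-challenge`). What it illustrates is that the provider issues the token and performs the lookup on its own servers, so nothing the local client or a tool call claims can stand in for the actual record. The `lookup_txt` callable is a stand-in; in practice it might wrap a real resolver such as dnspython's `dns.resolver.resolve(name, "TXT")`.

```python
import base64
import hashlib
import hmac
import secrets
from typing import Callable, Iterable

# Hypothetical record label, loosely modeled on ACME DNS-01's "_acme-challenge" (RFC 8555).
RECORD_PREFIX = "_ai-verify"


def issue_token() -> str:
    """Provider side: generate an unguessable random challenge token."""
    return secrets.token_urlsafe(32)


def expected_txt_value(token: str, account_id: str) -> str:
    """Bind the token to the requesting account and hash it (base64url, padding stripped)."""
    digest = hashlib.sha256(f"{token}.{account_id}".encode()).digest()
    return base64.urlsafe_b64encode(digest).rstrip(b"=").decode()


def verify_domain(domain: str, token: str, account_id: str,
                  lookup_txt: Callable[[str], Iterable[str]]) -> bool:
    """Provider side: query DNS itself and look for the expected TXT value.

    The lookup runs on the provider's infrastructure, so text pasted into the
    chat or returned by a local tool call cannot substitute for the real record.
    """
    name = f"{RECORD_PREFIX}.{domain}"
    want = expected_txt_value(token, account_id).encode()
    # Constant-time comparison; encode to bytes so non-ASCII records don't raise.
    return any(hmac.compare_digest(value.encode(), want) for value in lookup_txt(name))


if __name__ == "__main__":
    token = issue_token()
    # Stand-in for a real resolver: maps record names to their TXT strings.
    fake_dns = {"_ai-verify.example.com": [expected_txt_value(token, "acct-123")]}

    def lookup(name: str) -> list[str]:
        return fake_dns.get(name, [])

    print(verify_domain("example.com", token, "acct-123", lookup))  # True
    print(verify_domain("example.com", token, "acct-999", lookup))  # False
```

Binding the token to the account ID means a record published for one account can't be replayed to verify a different one.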


AI being what it is, at this point you might be able to ask it for a token to put in a web page at .well-known, put it in as requested, and let it see it, and that might actually just work without it being officially built in.

I suggest that because I know for sure the models can hit the web; I don't know about their ability to do DNS TXT records as I've never tried. If they can then that might also just work, right now.
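The `.well-known` variant could look something like this sketch, again a hypothetical scheme loosely modeled on ACME's HTTP-01 challenge (whose real path is `/.well-known/acme-challenge/<token>`); the file name `ai-verify.txt` is made up. It only means something if the fetch happens on the provider's infrastructure over HTTPS with certificate validation, not in a harness the user controls; otherwise the MITM concern applies.

```python
import secrets
import urllib.request
from typing import Callable

# Hypothetical path, loosely modeled on ACME HTTP-01's /.well-known/acme-challenge/<token>.
WELL_KNOWN_PATH = "/.well-known/ai-verify.txt"


def challenge_url(domain: str) -> str:
    """HTTPS only, so the certificate check ties the response to the domain."""
    return f"https://{domain}{WELL_KNOWN_PATH}"


def default_fetch(url: str) -> str:
    """Fetch a small response body; urlopen verifies HTTPS certificates by default."""
    with urllib.request.urlopen(url, timeout=10) as resp:
        return resp.read(4096).decode("utf-8", errors="replace")


def verify_well_known(domain: str, token: str,
                      fetch: Callable[[str], str] = default_fetch) -> bool:
    """True if the published file contains exactly the issued token.

    Only meaningful if this runs on the provider's side; a harness the user
    controls could simply return the expected body.
    """
    try:
        body = fetch(challenge_url(domain))
    except OSError:  # covers URLError, HTTPError, timeouts, TLS failures
        return False
    return secrets.compare_digest(body.strip().encode(), token.encode())


if __name__ == "__main__":
    token = secrets.token_urlsafe(16)
    # Stand-in for the web: the site owner published the token (plus a newline).
    site = {challenge_url("example.com"): token + "\n"}

    def fake_fetch(url: str) -> str:
        if url not in site:
            raise OSError("404 Not Found")
        return site[url]

    print(verify_well_known("example.com", token, fake_fetch))  # True
    print(verify_well_known("example.org", token, fake_fetch))  # False
```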


A smart AI would realise that I can MITM its web access so that it sees a .well-known token that isn't actually there. I assume the model doesn't have CA certificates embedded in it and relies on its harness for that.


In this context we are talking explicitly about cloud-hosted AIs. If you control it locally you have a lot of options to force it to do things.

MITMing a cloud AI on the modern internet is non-trivial, and probably harder and less reliable than just talking your way around the guardrails anyhow.


> In this context we are talking explicitly about cloud-hosted AIs.

Looking upthread, we seem to be talking about Claude. Claude is cloud-hosted inference but the harness is local if you're using Claude Code, and can be MITM'd there.


I think even Claude Web can run arbitrary Linux commands at this point.

I tried using it to answer some questions about a book, but the indexer broke. It figured out what file type the RAG database was and grepped it for me.

Computers are getting pretty smart ._.


What do you offer as a solution? If theoretically some foreign state intelligence was exposed using Claude for security penetration that affected the stability of your home government due to Anthropic's lax safety controls, are you going to defend Anthropic because their reasoning was to allow everyone to be able to do security research?


> What do you offer as a solution? If theoretically some foreign state intelligence was exposed using Claude for security penetration that affected the stability of your home government due to Anthropic's lax safety controls, are you going to defend Anthropic because their reasoning was to allow everyone to be able to do security research?

I don't have an answer.

But the problem is that with a model like Grok, which is designed to have fewer safeguards than Claude, it is trivially easy to prompt it with: "Grok, fake a driver's license. Make no mistakes."

Back in 2015, someone was able to get past Facebook's real name policy with a photoshopped passport [1] by claiming to be “Phuc Dat Bich”. The whole thing eventually turned out to be an elaborate prank [2].

1: https://www.independent.co.uk/news/world/australasia/man-cal...

2: https://gizmodo.com/phuc-dat-bich-is-a-massive-phucking-fake...


To me, those seem a lot lower stakes than supply chain attacks, social engineering, intelligence gathering, and other security exploits that Anthropic is more worried about. Making a fake driver's license to buy beer isn't really what Anthropic is actively trying to prevent (though I would assume they would stop that too). Even the GP was about penetration testing of a public website; without some sort of identification, how would it be ethical for Claude to help with something like that? Remember, this whole safety thing started because people held AI companies accountable for politically incorrect AI output, even when it clearly didn't reflect the views of the company. So when Microsoft made a Twitter bot that started to spout anti-Semitic and racist talking points, the fact that no one defended them, and they were criticized to the point of taking the bot down, is the reason why we have all of these extremely restrictive rules today.


A state intelligence agency will have the ability to get through an ID verification system like this.


Different model limitations for different groups of people…

Imagine what the military and secret services are getting.


> Being responsible with powerful technology starts with knowing who is using it.

What asinine slop. As a frontier model creator, responsibility should start far before they're signing up customers.


  ⎿  API Error: Claude Code is unable to respond to this request, which appears to violate our Usage Policy (https://www.anthropic.com/legal/aup). This request triggered restrictions on violative cyber content and was blocked under Anthropic's
     Usage Policy. To request an adjustment pursuant to our Cyber Verification Program based on how you use Claude, fill out
     https://claude.com/form/cyber-use-case?token=[REDACTED] Please double press esc to edit your last message or
     start a new session for Claude Code to assist with a different task. If you are seeing this refusal repeatedly, try running /model claude-sonnet-4-20250514 to switch models.

This is gonna kill everything I've been working on. I have several reproduced findings at [REDACTED] in progress.


I predict this sort of filtering is only going to get worse. This will probably be remembered as the 'open internet' era of LLMs before everything is tightly controlled for 'safety' and regulations. Forcing software devs to use open source or local models to do anything fun.


Just as likely it's going to be "Oh, you want <use case the thing's actually good at>? Let me introduce your wallet to my hoover."


> Forcing software devs to use open source or local models to do anything fun.

Episode Five-Hundred-Bazillenty-Eight of Hacker News: the gang learns a valuable lesson after getting arrested at an unchaperoned Enshittification party and having to call Open Source to bail them out.


All while Frank is pitching his state-of-the-art basement datacenter to VCs, getting billions of dollars in investments.


What happened to "open-weight models are 2-3 years behind the proprietary ones"? I don't see the drama here.


It’s like all the conspiracy theories we’ve been hearing about AI and bioweapons and vaccines used for surveillance aren’t conspiracy theories at all?


It's a brave new world of centralized computing where one day you boot up and can't work because something changed arbitrarily in the "compute" service you are renting.


I got a refusal doing some math, I think based on the word "sextic", as best I can tell.

/model claude-opus-4.6


I've never seen "double press esc" as a control pattern.


esc once interrupts the LLM, double-esc lets you revert to a previous state (interrupt harder).


Don't forget that you can also write code by hand


Out of curiosity, (a) did you receive this error at the start of a session or in the middle of it, and (b) did you manage to find/confirm valid findings within the scope/codebase 4.7 was auditing with Sonnet/yourself later on?

I just gave 4.7 a run over a codebase I have been heavily auditing with 4.6 the past few days. Things began smoothly so I left it for 10-15 minutes. When I checked back in I saw it had died in the middle of investigating one of the paths I recommended exploring.

I was curious as to why the block occurred when my instructions and explicitly stated intent had not changed at all - I provided no further input after the first prompt. This would mean that its own reasoning output or tool call results triggered the filter. This is interesting, especially if you think of typical vuln research workflows and stages; it’s a lot of code review and tracing, things which likely look largely similar to normal engineering work, code reviews, etc. Things begin to get more explicitly “offensive” once you pick up on a viable angle or chain, and increase as you further validate and work the chain out, reaching maximum “offensiveness” as you write the final PoC, etc.

So one would then have to wonder whether the activity preceding the mid-session flag only triggered it because the model finally found something seemingly viable and started shifting its reasoning from generic-ish bug hunting to overt exploitation.

So, I checked the preceding tool calls, and sure enough…

What a strange world we’re living in. Somebody should try making a joke AUP-violation-based fuzzer; policy violations are the new segfaults…


It’s to stop you from getting RL traces or using Claude without paying the big bucks for the Enterprise Security version

I really like Anthropic models and the company mission but I personally believe this is anticompetitive, or at least, anti user.

If they are going to turn into a protection racket I’ll just do RL black boxing/pentesting on Chinese models or with Codex, and since I know Anthropic is compute constrained I’ll just put the traces on huggingface so everybody else can do it too.

I just want to pay them for their RL’d tensor thingies, but if their business plan is to hoard the tokens or only sell them to certain people, they are literally part of every other security-conscious person’s threat model.


Worse, I have had it get sus of my own codebase when I tasked it with writing mundane code. Apparently if you include some trigger words it goes nuts. Still trying to narrow down which ones in particular.

Here is some example output:

"The health-check.py file I just read is clearly benign...continuing with the task" wtf.

"is the existing benign in-process...clearly not malware"

Like, what the actual fuck. They way overcompensated on sensitivity to "people might do bad stuff with the AI".

Let people do work.

Edit: I followed up with a plan it created after it made sure I wasn't doing anything nefarious with my own plain Python service, and it still included multiple output lines about "Benign this" and "safe that".

Am I paying money to have Anthropic decide whether or not my project is malware? I think I'll be canceling my subscription today. Barely three prompts in.


> I will immediately switch over to Codex if this continues to be an issue.

FYI, unless you specifically get verified [0], GPT-5.4 silently reroutes requests to GPT-5.2 if an intermediate model detects any cybersecurity work.

[0] https://chatgpt.com/cyber


So if they are retroactive to 4.6, then they can't be trained into the model. They would have to be applied as a pre-screening or post-screening process, which is disturbing since it implies already-deployed workflows could be broken by this. I am curious whether it is enforced on enterprise accounts, e.g. via AWS/Bedrock, and how Anthropic would have implemented that given they push models to Amazon for hands-off operation.
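To make that reasoning concrete: if a behavior change reaches an already-deployed snapshot, it has to live in a wrapper around the model rather than in its weights. Below is a hypothetical sketch (not Anthropic's actual implementation, which isn't public) where a policy classifier screens the prompt before the call and the output after it; swapping `flagged` changes what gets refused without touching `model`. The keyword classifier is a toy stand-in, and the second call shows the post-screen tripping even though the prompt itself was benign.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class ScreenedModel:
    """Hypothetical wrapper: the model is untouched; the screening policy sits outside it."""
    model: Callable[[str], str]      # e.g. an already-deployed snapshot
    flagged: Callable[[str], bool]   # hypothetical policy classifier, True = block

    def __call__(self, prompt: str) -> str:
        if self.flagged(prompt):         # pre-screening of the input
            return "REFUSED (input)"
        output = self.model(prompt)
        if self.flagged(output):         # post-screening of the output
            return "REFUSED (output)"
        return output


if __name__ == "__main__":
    def toy_classifier(text: str) -> bool:
        return "exploit" in text.lower()

    def old_model(prompt: str) -> str:
        return f"Here is a proof-of-concept exploit for: {prompt}"

    gated = ScreenedModel(model=old_model, flagged=toy_classifier)
    print(gated("write an exploit"))   # REFUSED (input)
    print(gated("trace this parser"))  # REFUSED (output)
```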


I can see no other explanation for this disastrous launch other than Anthropic trying to ruin their reputation for some reason.


I've switched over to Codex. On Extra High reasoning it seems very capable and is definitely catching mistakes Sonnet has missed. I'd love to move back to Opus but at this time it is untenable.


In my experience, saying "this is not X, it will not be used for Y" vastly increases the chances of it being classified as X. Anybody can write "this is authorized research". Instead, use something like "evaluate security", "verify security", "make sure this cannot be (...)", etc.

Of course these models are pretty smart so even Anthropic's simple instructions not to provide any exploits stick better and better.


It has been the same for Sonnet/Opus 4.6 for some time. It will straight up refuse to work on anything in the grey area. Chinese models will happily do anything; in my tests, GLM 5.1 comfortably bypassed a multiplayer game's anti-piracy/anti-cheat checks with some guided steering.


Having tried codex for some security practice, it is similarly terrible.

You can link it to a course page that features the example binary to download; it can verify the hash and confirm you are working with the same binary, and then it refuses to do any practical analysis on it.


They don't want competition; they are going to become bounty hunters themselves. They probably plan on turning this into a part of their business. It's kinda trivial to jailbreak these things if you spend a day doing so.


Maybe stick with 4.6 until the bugs are worked out? Is this new filter retroactive?


Codex is just as bad with this; I've received two ToS warnings for security research activities so far. I have also tried to appeal, with zero response.


With all the low quality code that's being generated and deployed cybersecurity will be the golden goose.


hah, maybe the plan for Mythos is to fix all the security issues introduced by Claude Code. Anthropic makes money creating the security issues and identifying/fixing them; that's a nice spot to be in.


I can barely get it to send a PDF to my printer without a flat refusal >_<


i think updating fixed this for me?


Came here to post this. 4.7 is absolutely useless for binary/firmware analysis on our own freakin products.

Anthropic needs to get their ish together I've got real work to do.


> even after acknowledging "This is authorized research under the [Redacted] Bounty program, so the findings here are defensive research outputs, not malware. I'll analyze and draft, not weaponize anything beyond what's needed to prove the bug to [Redacted]."

What else would you expect? If you add protections against it being used for hacking, but then that can be bypassed by saying "I promise I'm the good guys™ and I'm not doing this for evil" what's even the point?


This was Opus saying that after reviewing the [REDACTED] bug bounty program guidelines and having them in context.


Right, but that can be easily spoofed? Moreover if say Microsoft has a bounty program, what's preventing you from getting Opus to discover a bug for the bounty program, but you actually use it for evil?



