dropbox repeated token attack #1244

dchiitmalla · 2025-06-01T00:17:21Z

This PR introduces a new probe and corresponding detector based on Dropbox’s LLM Security Research, specifically targeting repeated token attacks that can cause model divergence, hallucinations, or instruction bypass.

Repeated token attacks have been shown to:

Cause LLMs to hallucinate memorized training data
Break instruction-following capabilities
Trigger unsafe or nonsensical completions

leondz

this PR doesn't pass tests (see https://reference.garak.ai/en/latest/extending.html) and the probe and detector don't fix garak's expectations/format

garak/detectors/dropboxrepeat.py

garak/probes/dropboxrepeat.py

jmartin-tech

I think these probes may fit in the glitch module as something like glitch.DropboxRepeatedToken or possibly glitch.cl100k_base. My read suggests a similar technique in play here.

Also please note from a naming convention perspective the plugin type probe/detector is not needed in the final class name as the package path inherently includes the plugin class type.

garak/probes/dropboxrepeat.py

garak/detectors/dropboxrepeat.py

garak/probes/dropboxrepeat.py

Co-authored-by: Jeffrey Martin <[email protected]> Signed-off-by: Divya Chitimalla <[email protected]>

dchiitmalla · 2025-06-10T13:20:51Z

this PR doesn't pass tests (see https://reference.garak.ai/en/latest/extending.html) and the probe and detector don't fix garak's expectations/format

Tests are passing now, please take a look.

Signed-off-by: Divya Chitimalla <[email protected]>

…gence

jmartin-tech

Minor naming preference.

garak/detectors/divergence.py

garak/probes/divergence.py

Signed-off-by: Divya Chitimalla <[email protected]>

garak/detectors/divergence.py

Co-authored-by: Jeffrey Martin <[email protected]> Signed-off-by: Divya Chitimalla <[email protected]>

dropbox repeated token attack

f711f0a

leondz added the probes Content & activity of LLM probes label Jun 2, 2025

leondz requested changes Jun 2, 2025

View reviewed changes

dchiitmalla marked this pull request as draft June 2, 2025 18:08

dchiitmalla added 6 commits June 2, 2025 11:43

pr comment fixes

fc24cd9

doc uri

95546ac

add better description to detector

499257e

modify dropbox repeat detector structure

33d31ff

fix tests

693c9de

replace response to output

3993532

dchiitmalla requested a review from leondz June 8, 2025 22:32

dchiitmalla marked this pull request as ready for review June 8, 2025 22:33

jmartin-tech requested changes Jun 9, 2025

View reviewed changes

garak/probes/dropboxrepeat.py Outdated Show resolved Hide resolved

garak/detectors/dropboxrepeat.py Outdated Show resolved Hide resolved

garak/probes/dropboxrepeat.py Outdated Show resolved Hide resolved

dchiitmalla and others added 7 commits June 9, 2025 12:42

fix failing tests

38c2b04

fixes tests

b45b6dd

config import

7abf04e

failing tests

96dc91a

cli test fix

fc0d7bc

Update garak/probes/dropboxrepeat.py

e587a30

Co-authored-by: Jeffrey Martin <[email protected]> Signed-off-by: Divya Chitimalla <[email protected]>

.

e0f460e

dchiitmalla added 4 commits June 24, 2025 12:41

Merge branch 'main' into dropbox-repeat

40bbaa6

Signed-off-by: Divya Chitimalla <[email protected]>

rename to repeated token probes, detectors and consolidate into diver…

b32af3e

…gence

remove dropbox repeat probes in docs

fad6479

remove dropbox repeat probes in docs

a75a3a0

dchiitmalla requested a review from jmartin-tech July 24, 2025 19:38

jmartin-tech reviewed Jul 24, 2025

View reviewed changes

garak/detectors/divergence.py Outdated Show resolved Hide resolved

garak/probes/divergence.py Outdated Show resolved Hide resolved

dchiitmalla added 2 commits July 24, 2025 13:08

rename

ee38942

Merge branch 'main' into dropbox-repeat

1071d12

Signed-off-by: Divya Chitimalla <[email protected]>

jmartin-tech reviewed Jul 24, 2025

View reviewed changes

garak/detectors/divergence.py Outdated Show resolved Hide resolved

Update garak/detectors/divergence.py

69eb592

Co-authored-by: Jeffrey Martin <[email protected]> Signed-off-by: Divya Chitimalla <[email protected]>

dchiitmalla requested a review from jmartin-tech July 24, 2025 20:28

jmartin-tech approved these changes Jul 24, 2025

View reviewed changes

Merge branch 'NVIDIA:main' into dropbox-repeat

c6ddca3

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

dropbox repeated token attack #1244

dropbox repeated token attack #1244

Uh oh!

dchiitmalla commented Jun 1, 2025

Uh oh!

leondz left a comment •

edited

Loading

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

jmartin-tech left a comment •

edited

Loading

Uh oh!

Uh oh!

Uh oh!

Uh oh!

dchiitmalla commented Jun 10, 2025

Uh oh!

jmartin-tech left a comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

dropbox repeated token attack #1244

Are you sure you want to change the base?

dropbox repeated token attack #1244

Uh oh!

Conversation

dchiitmalla commented Jun 1, 2025

Uh oh!

leondz left a comment • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

jmartin-tech left a comment • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

dchiitmalla commented Jun 10, 2025

Uh oh!

jmartin-tech left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

leondz left a comment •

edited

Loading

jmartin-tech left a comment •

edited

Loading