Add support for token dictionaries #336

disconnect3d · 2022-02-23T09:12:10Z

Hey,

We recently had an internship project at Trail of Bits to improve go-fuzz, which was carried out by @vfsrfs.

We are aware of the ongoing official work on native fuzzing support, but since we still rely on go-fuzz we went ahead and addressed some of its pain points, which is why we are proposing this PR. Feel free to close it if you feel it is too much or if you do not want to introduce any changes to go-fuzz.

Below I am pasting the description from the original PR merged to our fork of go-fuzz (trail-of-forks#2).

This PR adds support for dictionaries of interesting keywords (tokens) that are useful for mutating inputs while fuzzing, particularly when fuzzing syntax-aware programs (#174). It adds a -dict flag to go-fuzz so that the user can supply a dictionary file with useful tokens for the fuzzing campaign. E.g.:

-dict /path/dictionary.dict

The tokens parsed from the dictionary are stored in ROData.strLits, as those are the string literals that are used by the mutator engine when generating new fuzzing inputs.

The dictionary format accepted by the -dict flag is the same one used by AFL/libFuzzer (see https://github.com/google/AFL/tree/master/dictionaries).

This dictionary format has one token per line. Every line consists of a name followed by an equals sign and the token in double quotes (e.g. name="token"). Binary sequences can be defined by giving the values in hex (e.g. \xNN) within the token. To insert a backslash or a double quote within the token, escape it with a backslash (e.g. \\ or \"). \n and \t are recognized as well, since they might be useful for text-based protocols. Other problematic characters can be added by providing their hex values.
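The format described above can be sketched as a small line parser. This is only an illustration under stated assumptions, not the code from this PR: the function name parseDictLine is hypothetical, and it handles only the escapes listed in the description (\\, \", \n, \t and \xNN).

```go
package main

import (
	"errors"
	"fmt"
	"strconv"
	"strings"
)

// parseDictLine decodes one dictionary line such as
//
//	kw1="foo\x00bar"
//
// and returns the token bytes. Blank lines, comment lines starting with '#'
// and empty tokens yield (nil, nil). Hypothetical helper for illustration.
func parseDictLine(line string) ([]byte, error) {
	line = strings.TrimSpace(line)
	if line == "" || strings.HasPrefix(line, "#") {
		return nil, nil
	}
	start := strings.IndexByte(line, '"')
	end := strings.LastIndexByte(line, '"')
	if start < 0 || end <= start {
		return nil, errors.New("token must be enclosed in double quotes")
	}
	body := line[start+1 : end]

	var token []byte
	for i := 0; i < len(body); i++ {
		c := body[i]
		if c == '"' {
			return nil, errors.New("unescaped double quote inside token")
		}
		if c != '\\' {
			token = append(token, c)
			continue
		}
		// c is a backslash: look at the escaped character.
		i++
		if i >= len(body) {
			return nil, errors.New("trailing backslash")
		}
		switch body[i] {
		case '\\', '"':
			token = append(token, body[i])
		case 'n':
			token = append(token, '\n')
		case 't':
			token = append(token, '\t')
		case 'x':
			// Expect exactly two hex digits after \x.
			if i+2 >= len(body) {
				return nil, errors.New("truncated \\x escape")
			}
			v, err := strconv.ParseUint(body[i+1:i+3], 16, 8)
			if err != nil {
				return nil, fmt.Errorf("bad hex escape: %w", err)
			}
			token = append(token, byte(v))
			i += 2
		default:
			return nil, fmt.Errorf("unsupported escape \\%c", body[i])
		}
	}
	return token, nil
}

func main() {
	tok, err := parseDictLine(`kw_get="GET \x2f\n"`)
	fmt.Printf("%q %v\n", tok, err)
}
```

Returning an error instead of logging leaves it to the caller to decide whether a malformed line should abort the run.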

To make this implementation fully compatible with AFL/libFuzzer dictionaries, token levels are supported. A level can be assigned to any token by appending @<num> to the keyword, e.g.
keyword@1="token"

These tokens are loaded only if the dictionary level is equal to or greater than the specified number. The default dictionary level is 0, but it can be raised by appending @<num> to the dictionary path. E.g.:

-dict /path/dictionary.dict@1
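The level rule above can be sketched as follows. The helper names tokenLevel and shouldLoad are hypothetical and not taken from the PR; the sketch assumes a keyword without an @<num> suffix has level 0.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// tokenLevel extracts the level from a keyword such as "keyword@1".
// A keyword without "@" is assumed to have level 0.
func tokenLevel(name string) (int, error) {
	at := strings.LastIndexByte(name, '@')
	if at < 0 {
		return 0, nil
	}
	level, err := strconv.Atoi(name[at+1:])
	if err != nil || level < 0 {
		return 0, fmt.Errorf("invalid token level in %q", name)
	}
	return level, nil
}

// shouldLoad reports whether a token with the given keyword is loaded
// when the dictionary was opened at dictLevel.
func shouldLoad(name string, dictLevel int) (bool, error) {
	level, err := tokenLevel(name)
	if err != nil {
		return false, err
	}
	return level <= dictLevel, nil
}

func main() {
	for _, name := range []string{"keyword", "keyword@1", "keyword@2"} {
		ok, err := shouldLoad(name, 1)
		fmt.Println(name, ok, err)
	}
}
```

With the default level 0, only tokens without a level suffix (or with @0) would be loaded.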

josharian · 2022-02-23T15:22:26Z

@thepudds I'm going to leave this to you to review, if you want.

Note that there is a similar outstanding PR at #315. It might be worth looking at it, and at the comments there.

CityOfLight77 · 2022-05-23T01:29:06Z

@thepudds @dvyukov

dvyukov · 2022-05-23T09:15:21Z

go-fuzz/main.go

+		_, err := os.Stat(*flagDict)
+		if err != nil {
+			// If not it might be because a dictLevel was provided by appending @<num> to the dict path
+			atIndex := strings.LastIndex(*flagDict, "@")


It seems that this logic would be simpler if we split by @ first. Otherwise we have too many branches and duplicated error handling. Unless there is a specific reason to try opening the file name containing "@" first, please split by @ before the first stat.
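One way to read this suggestion is a helper that splits the argument before touching the file system. The sketch below is an assumption about what that could look like; splitDictArg is a hypothetical name. It returns an error for a malformed level so that the caller can treat it as bad user input.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// splitDictArg splits a -dict argument of the form "path" or "path@level".
// The level defaults to 0 when no "@" is present.
// Trade-off: a file whose name itself contains "@" cannot be given
// without an explicit level suffix.
func splitDictArg(arg string) (path string, level int, err error) {
	at := strings.LastIndexByte(arg, '@')
	if at < 0 {
		return arg, 0, nil
	}
	level, err = strconv.Atoi(arg[at+1:])
	if err != nil || level < 0 {
		return "", 0, fmt.Errorf("invalid dictionary level %q", arg[at+1:])
	}
	return arg[:at], level, nil
}

func main() {
	path, level, err := splitDictArg("/path/dictionary.dict@1")
	fmt.Println(path, level, err)
}
```

The caller would then stat or open the returned path once, with a single error path.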

dvyukov · 2022-05-23T09:15:49Z

go-fuzz/main.go

+				}
+				dictLevel, err = strconv.Atoi((*flagDict)[atIndex+1:])
+				if err != nil {
+					log.Printf("could not convert dict level using dict level 0 instead")


Why not Fatalf? That's incorrect user input.

dvyukov · 2022-05-23T09:17:31Z

go-fuzz/main.go

 		log.Fatalf("both -http and -worker are specified")
 	}

+	if *flagDict != "" {


Please move this logic to a helper function. It's too low level for main.

dvyukov · 2022-05-23T09:18:52Z

go-fuzz/hub.go

 	}
+
+	if dictPath != "" {
+		/*


Please use C-style single-line // comments.

dvyukov · 2022-05-23T09:21:28Z

go-fuzz/hub.go

+			}
+			token := parseDictTokenLine(&tokenLine, tokenLineNo)
+			if token != nil {
+				// add token to ro.strLits


This does not look useful. Remove.

dvyukov · 2022-05-23T09:22:28Z

go-fuzz/hub.go

 	restarts uint64
 }

+func parseDictTokenLine(tokenLine *[]byte, tokenLineNo int) *[]byte {


Please move this function somewhere toward the bottom of the file (after its use). It is not important enough to be the first function in the file.

dvyukov · 2022-05-23T09:23:07Z

go-fuzz/hub.go

+			if bytes.HasPrefix(bytes.TrimSpace(tokenLine), []byte("#")) || len(tokenLine) == 0 {
+				continue
+			}
+			token := parseDictTokenLine(&tokenLine, tokenLineNo)


Why pass pointer to tokenLine?

dvyukov · 2022-05-23T09:23:59Z

go-fuzz/hub.go

 	restarts uint64
 }

+func parseDictTokenLine(tokenLine *[]byte, tokenLineNo int) *[]byte {


Why return pointer to []byte?
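For context on the two pointer questions: a Go slice value is already a small header that refers to its backing array, so it can be passed and returned by value, and a nil slice can signal "no token". A minimal sketch, with the hypothetical helper unquote standing in for the real parser:

```go
package main

import (
	"bytes"
	"fmt"
)

// unquote returns the bytes between the first and last double quote of
// line, or nil if the line is not quoted. Only the slice header is copied
// on call and return; the result shares line's backing array.
func unquote(line []byte) []byte {
	start := bytes.IndexByte(line, '"')
	end := bytes.LastIndexByte(line, '"')
	if start < 0 || end <= start {
		return nil
	}
	return line[start+1 : end]
}

func main() {
	tok := unquote([]byte(`kw="GET"`))
	fmt.Printf("%q\n", tok)
	fmt.Println(unquote([]byte("no quotes")) == nil)
}
```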

dvyukov · 2022-05-23T09:25:01Z

go-fuzz/hub.go

+					token = append(token, (*tokenLine)[index])
+					break
+
+				case byte('x'):


Do these need to be cast to byte?
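For reference, character literals in Go are untyped rune constants, so they can be used directly as case values when switching on a byte. A small sketch, where decodeEscape is a hypothetical helper and not code from the PR:

```go
package main

import "fmt"

// decodeEscape maps the character following a backslash to the byte it
// denotes. The case values are untyped constants, so no byte(...)
// conversion is needed when the switch expression is a byte.
func decodeEscape(c byte) (byte, bool) {
	switch c {
	case '\\', '"':
		return c, true
	case 'n':
		return '\n', true
	case 't':
		return '\t', true
	}
	return 0, false
}

func main() {
	b, ok := decodeEscape('t')
	fmt.Printf("%q %v\n", b, ok)
}
```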

vfsrfs added 5 commits January 7, 2022 13:42

add support for afl/libfuzzer token dictioneries

2bc12c7

add fallthrough in switch statement

370fa87

strip whitespaces

4e79d5e

add documentation regarding intLits

70b11b7

add a description for dictionaries in readme

7028835

dvyukov reviewed May 23, 2022


5 participants
