Further read optimization #1

@oliverpool

Hi, first of all thank you for sharing your code!

I was quite impressed by the reading speed, and I think that I found a way to make this part a bit faster:

1brc/main.go

Lines 131 to 155 in 8513d5e

	// spawn a goroutine to read file in chunks and send it to the chunk channel for further processing
	go func() {
		buf := make([]byte, chunkSize)
		leftover := make([]byte, 0, chunkSize)
		for {
			readTotal, err := file.Read(buf)
			if err != nil {
				if errors.Is(err, io.EOF) {
					break
				}
				panic(err)
			}
			buf = buf[:readTotal]
			toSend := make([]byte, readTotal)
			copy(toSend, buf)
			lastNewLineIndex := bytes.LastIndex(buf, []byte{'\n'})
			toSend = append(leftover, buf[:lastNewLineIndex+1]...)
			leftover = make([]byte, len(buf[lastNewLineIndex+1:]))
			copy(leftover, buf[lastNewLineIndex+1:])
			chunkStream <- toSend

Currently, each loop iteration calls make([]byte) and copy twice. This could be reduced to one call of each by storing the leftover as a length into the read buffer instead of as a separate byte slice:

		buf := make([]byte, chunkSize)
		leftover := 0
		for {
			n, err := file.Read(buf[leftover:]) // append to the leftover
			if err != nil {
				if errors.Is(err, io.EOF) {
					break
				}
				panic(err)
			}
			toSend := buf[:leftover+n]

			lastNewLineIndex := bytes.LastIndexByte(toSend, '\n')

			buf = make([]byte, chunkSize) // prepare a new buffer for next read
			leftover = copy(buf, toSend[lastNewLineIndex+1:])

			chunkStream <- toSend[:lastNewLineIndex+1]
		}

On a sample file, this code is about 10% faster than the current version.
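For anyone who wants to try the idea outside of main.go, here is a minimal self-contained sketch of the same approach. The names readChunks, collectChunks and errLineTooLong are hypothetical and not part of the repository, and an emit callback stands in for the chunkStream channel. It also covers two cases that the snippet above leaves open and that I am assuming need handling: a buffer that contains no newline at all, and a final line that is not terminated by a newline.

```go
package main

import (
	"bytes"
	"errors"
	"fmt"
	"io"
	"strings"
)

// errLineTooLong is a hypothetical error for a line that does not fit in one buffer.
var errLineTooLong = errors.New("line longer than chunk size")

// readChunks (hypothetical helper) reads r in buffers of chunkSize bytes and
// passes newline-terminated chunks to emit. The unfinished tail of each read
// is tracked as a length at the start of the next buffer, so each emitted
// chunk costs one make and one copy.
func readChunks(r io.Reader, chunkSize int, emit func([]byte)) error {
	buf := make([]byte, chunkSize)
	leftover := 0 // bytes at the start of buf carried over from the previous read
	for {
		n, err := r.Read(buf[leftover:]) // append to the leftover
		filled := buf[:leftover+n]
		if err != nil {
			if errors.Is(err, io.EOF) {
				if len(filled) > 0 {
					emit(filled) // whatever remains, e.g. an unterminated last line
				}
				return nil
			}
			return err
		}

		last := bytes.LastIndexByte(filled, '\n')
		if last < 0 {
			if len(filled) == len(buf) {
				return errLineTooLong // buffer is full and holds no complete line
			}
			leftover = len(filled) // keep reading into the same buffer
			continue
		}

		next := make([]byte, chunkSize) // fresh buffer; the emitted chunk keeps the old one
		leftover = copy(next, filled[last+1:])
		emit(filled[:last+1])
		buf = next
	}
}

// collectChunks (hypothetical helper) runs readChunks over a string and
// returns the emitted chunks as strings, which is convenient for inspection.
func collectChunks(input string, chunkSize int) ([]string, error) {
	var chunks []string
	err := readChunks(strings.NewReader(input), chunkSize, func(b []byte) {
		chunks = append(chunks, string(b))
	})
	return chunks, err
}

func main() {
	chunks, err := collectChunks("a;1\nbb;2\nc;3", 6)
	if err != nil {
		panic(err)
	}
	for _, c := range chunks {
		fmt.Printf("%q\n", c)
	}
}
```

A fresh buffer is still allocated per emitted chunk because the consumer may hold on to the previous one; reusing buffers (for example through a pool) would be a separate change.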
