
Reading data using the executeSelect API is slow #2764

Open
o-shevchenko opened this issue Nov 11, 2024 · 1 comment
Labels
api: bigquerystorage Issues related to the googleapis/java-bigquerystorage API.

Comments


o-shevchenko commented Nov 11, 2024

We use the executeSelect API to run SQL queries and read the results from BigQuery. We expected good read speed based on

However, reading data with the executeSelect API is extremely slow.
Reading 100,000 rows takes 23,930 ms (roughly 4,200 rows per second).
Profiling showed no obvious hotspot where most of the time is spent.

Have there been any recent changes that might cause a performance regression in this API?
Do you have a benchmark showing what performance we should expect?
Thanks!
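The question of what performance to expect is easier to answer with a repeatable measurement than with a single run. Below is a minimal Kotlin sketch of a benchmark harness that performs a few warm-up reads, then times several full reads and reports the median time and rows per second. The names `benchmarkReads` and `ReadBenchmark` are hypothetical and not part of google-cloud-bigquery; `readAll` is assumed to drain the whole result set and return the number of rows it read.

```kotlin
import kotlin.system.measureNanoTime

// Hypothetical result type for a repeated read benchmark (not part of google-cloud-bigquery).
data class ReadBenchmark(val rowsPerRun: Long, val runMillis: List<Double>) {
    init {
        require(runMillis.isNotEmpty()) { "at least one measured run is required" }
    }

    // Median wall-clock time of the measured runs, in milliseconds.
    val medianMillis: Double
        get() {
            val sorted = runMillis.sorted()
            val mid = sorted.size / 2
            return if (sorted.size % 2 == 1) sorted[mid] else (sorted[mid - 1] + sorted[mid]) / 2.0
        }

    // Rows per second derived from the median run.
    val rowsPerSecond: Double
        get() = rowsPerRun * 1000.0 / medianMillis
}

/**
 * Calls [readAll] [warmupRuns] times without timing it, then [measuredRuns] times while
 * timing each call. [readAll] is assumed to drain the whole result and return its row count.
 */
fun benchmarkReads(warmupRuns: Int, measuredRuns: Int, readAll: () -> Long): ReadBenchmark {
    require(warmupRuns >= 0 && measuredRuns > 0) { "invalid run counts" }
    repeat(warmupRuns) { readAll() }
    var rows = 0L
    val runMillis = List(measuredRuns) {
        val nanos = measureNanoTime { rows = readAll() }
        nanos / 1_000_000.0
    }
    return ReadBenchmark(rows, runMillis)
}
```

For example, `readAll` could call `connection.executeSelect(sql)` and count rows in a `while (resultSet.next())` loop. With `setUseQueryCache(true)`, repeated runs may be answered from the query cache, which arguably helps isolate the read path from query execution time.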

Environment details

  1. com.google.cloud:google-cloud-bigquery:2.43.3
  2. macOS Sonoma (Apple M1)
  3. Java version: 17

Code example

```kotlin
Mono.fromCallable { bigQueryOptionsBuilder.build().service }
    .flatMap { context ->
        val connectionSettings = ConnectionSettings.newBuilder()
            .setRequestTimeout(10L)
            .setUseReadAPI(true)
            .setMaxResults(1000)
            .setNumBufferedRows(1000)
            .setUseQueryCache(true)
            .build()
        val connection = context.createConnection(connectionSettings)
        val bqResult = connection.executeSelect(sql)
        val result = Flux.usingWhen(
            Mono.just(bqResult.resultSet),
            { resultSet -> resultSet.toFlux(bqResult.schema) },
            { _ -> Mono.fromRunnable<Unit> { connection.close() } }
        )
        Mono.just(Data(result, bqResult.schema.toSchema()))
    }
    ...

// Emits one DataRecord per row until the result set is exhausted.
fun ResultSet.toFlux(schema: Schema): Flux<DataRecord> {
    return Flux.generate<DataRecord> { sink ->
        if (next()) {
            sink.next(toDataRecord(schema))
        } else {
            sink.complete()
        }
    }
}
```
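Since profiling showed no single hotspot, one way to narrow down where the time goes is to time the cursor advance (`resultSet.next()`, which may block while the client fetches rows) separately from the per-row conversion (`toDataRecord`). A minimal sketch, using hypothetical names `profileRead` and `ReadProfile`, that works on plain functions so it does not depend on any BigQuery or Reactor types:

```kotlin
// Hypothetical per-phase totals for a row-by-row read.
data class ReadProfile(val rows: Long, val advanceNanos: Long, val convertNanos: Long)

/**
 * Drains a cursor while timing [advance] (for example `resultSet::next`, which may block
 * while rows are fetched) separately from [convert] (for example a per-row mapping).
 * Each converted value is handed to [sink]; the time spent in [sink] is not counted.
 */
fun <T> profileRead(
    advance: () -> Boolean,
    convert: () -> T,
    sink: (T) -> Unit = {}
): ReadProfile {
    var rows = 0L
    var advanceNanos = 0L
    var convertNanos = 0L
    while (true) {
        val start = System.nanoTime()
        val hasRow = advance()
        val afterAdvance = System.nanoTime()
        advanceNanos += afterAdvance - start
        if (!hasRow) break
        val value = convert()
        convertNanos += System.nanoTime() - afterAdvance
        sink(value)
        rows++
    }
    return ReadProfile(rows, advanceNanos, convertNanos)
}
```

Outside the Flux pipeline it could be called as `profileRead(resultSet::next, { resultSet.toDataRecord(schema) })`, where `toDataRecord` is the reporter's own mapping function. If `advanceNanos` dominates, the time is likely spent inside the client while fetching rows; if `convertNanos` dominates, the mapping is the more likely bottleneck.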
product-auto-label bot added the api: bigquerystorage label Nov 11, 2024
o-shevchenko changed the title "Read data via executeSelect API is slow" to "Read data with executeSelect API is slow" Nov 11, 2024
o-shevchenko changed the title "Read data with executeSelect API is slow" to "Reading data using the executeSelect API is slow" Nov 11, 2024

o-shevchenko commented Nov 14, 2024

I've created a simplified test that demonstrates the performance:

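```kotlin
import com.google.cloud.bigquery.ConnectionSettings
import java.time.Duration
import java.time.Instant
import java.time.format.DateTimeFormatter
import org.junit.jupiter.api.Test // JUnit 5 assumed

// Member of a test class that provides `bigQueryOptionsBuilder`.
@Test
fun `test read`() {
    val sql =
        """
        SELECT *
        FROM `pr`
        """.trimIndent().replace("\n", " ")
    val connectionSettings = ConnectionSettings.newBuilder()
        .setRequestTimeout(300)
        .setUseReadAPI(true)
        .setMaxResults(5000)
        .setUseQueryCache(true)
        .build()
    val connection = bigQueryOptionsBuilder.build().service.createConnection(connectionSettings)
    val bqResult = connection.executeSelect(sql)
    val resultSet = bqResult.resultSet

    var n = 0 // rows read so far
    var lastTime = Instant.now()
    while (n < 1_000_000 && resultSet.next()) {
        n++
        // Log the time taken by each batch of 30,000 rows.
        if (n % 30_000 == 0) {
            val now = Instant.now()
            val duration = Duration.between(lastTime, now)
            println("ROW $n Time: ${duration.toMillis()} ms ${DateTimeFormatter.ISO_INSTANT.format(now)}")
            lastTime = now
        }
    }
    connection.close()
}
```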
```
ROW 30000 Time: 5516 ms 2024-11-14T12:35:54.354169Z
ROW 60000 Time: 11230 ms 2024-11-14T12:36:05.585005Z
ROW 90000 Time: 5645 ms 2024-11-14T12:36:11.230378Z
ROW 120000 Time: 5331 ms 2024-11-14T12:36:16.561915Z
ROW 150000 Time: 5458 ms 2024-11-14T12:36:22.019994Z
ROW 180000 Time: 5391 ms 2024-11-14T12:36:27.411807Z
```

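That is roughly 5.3 to 5.6 seconds per 30,000 rows (about 5,500 rows per second), with one slower interval of 11.2 seconds between rows 30,000 and 60,000.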
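The interval log can be summarized with a little arithmetic. A minimal sketch, using a hypothetical `summarizeIntervals` helper and an arbitrary 1.5 times the median as the outlier threshold, applied to the six durations logged above:

```kotlin
// Hypothetical summary of an interval log like the one above.
data class IntervalSummary(
    val medianMillis: Double,
    val medianRowsPerSecond: Double,
    val outlierIndices: List<Int>
)

/**
 * [intervalMillis] holds one duration per logged interval of [rowsPerInterval] rows.
 * An interval is reported as an outlier when it exceeds [outlierFactor] times the median
 * (the 1.5 default is an arbitrary choice).
 */
fun summarizeIntervals(
    rowsPerInterval: Int,
    intervalMillis: List<Long>,
    outlierFactor: Double = 1.5
): IntervalSummary {
    require(intervalMillis.isNotEmpty()) { "no intervals to summarize" }
    val sorted = intervalMillis.sorted()
    val mid = sorted.size / 2
    val median = if (sorted.size % 2 == 1) sorted[mid].toDouble() else (sorted[mid - 1] + sorted[mid]) / 2.0
    val outliers = intervalMillis.indices.filter { intervalMillis[it] > outlierFactor * median }
    return IntervalSummary(median, rowsPerInterval * 1000.0 / median, outliers)
}

// Durations from the log above, in milliseconds; each interval covers 30,000 rows.
val loggedIntervals = listOf(5516L, 11230L, 5645L, 5331L, 5458L, 5391L)
```

For the logged values, `summarizeIntervals(30_000, loggedIntervals)` gives a median of 5,487 ms per interval, about 5,467 rows per second, and flags index 1 (the 11,230 ms interval ending at row 60,000) as the only outlier.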
