Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[BUG] Spans from flag evaluation not being created in Java #1369

Open
julianocosta89 opened this issue Jul 24, 2024 · 9 comments
Open

[BUG] Spans from flag evaluation not being created in Java #1369

julianocosta89 opened this issue Jul 24, 2024 · 9 comments
Labels
bug Something isn't working Needs Triage This issue needs to be investigated by a maintainer

Comments

@julianocosta89
Copy link

Observed behavior

When updating the dependencies for the AdService in the OTel demo, I've noticed that the flagd spans are not being created in Java.
We do have the flagd spans connected with RecommendationService (Python), CartService (.NET) and Frontend-proxy (Envoy).

But for AdService no.
Even though, we follow the recommended approach documented here: https://flagd.dev/providers/java/#opentelemetry-tracing-rpc-only

https://github.com/open-telemetry/opentelemetry-demo/blob/main/src/adservice/src/main/java/oteldemo/AdService.java#L86-L94

Expected Behavior

Whenever AdService checks the state of the FeatureFlag, I'd expect 3 spans from flagd to be created, as it is happening with other services.

Steps to reproduce

git clone [email protected]:open-telemetry/opentelemetry-demo.git
cd opentelemetry-demo
docker compose up -d

Navigate to Jaeger UI: http://localhost:8080/jaeger/ui/

Check the received traces from AdService and Flagd.

@julianocosta89 julianocosta89 added bug Something isn't working Needs Triage This issue needs to be investigated by a maintainer labels Jul 24, 2024
@Kavindu-Dodan
Copy link
Contributor

@julianocosta89 I had a look into this. The service use withGlobalTelemetry(true) which internally obtains an OpenTelemetry instance from GlobalOpenTelemetry.get(). This follows Java manual instrumentation guidelines [1]. And in the AdService, there is no manual configuration initializations and this is the root cause for this observation.

[1] - https://opentelemetry.io/docs/languages/java/instrumentation/#manual-configuration

@julianocosta89
Copy link
Author

@Kavindu-Dodan we are using the OTel Java agent, isn't that done automatically?
We have another service, written in Kotlin, and for that it is working fine.
https://github.com/open-telemetry/opentelemetry-demo/blob/main/src/frauddetectionservice/src/main/kotlin/frauddetectionservice/main.kt#L32-L36

@julianocosta89
Copy link
Author

I've done some tests again and it seems that it is actually creating spans, but for some reason just once.
Here is the trace containing the flagd spans:
Screenshot 2024-07-25 at 12 21 23

Here are the metrics from one of the flags from AdService:
Screenshot 2024-07-25 at 12 20 43

As we can see, the flag is being queried every time AdService is called, but the spans from flagd are just arriving once.
Maybe we have some error in the implementation of AdService.

@beeme1mr
Copy link
Member

Hey @julianocosta89, this is an optimization in the Java provider that can disabled. Basically, a value can be cached if it's "static" and the provider is able to establish a persistent connection with flagd to bust the cache when the flag configuration changes.

Here are the configuration options that are available.
https://github.com/open-feature/java-sdk-contrib/tree/main/providers/flagd#configuration-options

@julianocosta89
Copy link
Author

Hey @julianocosta89, this is an optimization in the Java provider that can disabled. Basically, a value can be cached if it's "static" and the provider is able to establish a persistent connection with flagd to bust the cache when the flag configuration changes.

Here are the configuration options that are available. https://github.com/open-feature/java-sdk-contrib/tree/main/providers/flagd#configuration-options

Oh, cool TIL.

Any reason the behaviour is different for Java and Kotlin?
Both are using the same dependencies and none have the cache disabled.

@beeme1mr
Copy link
Member

There are a few reasons why a value wouldn't be cashed. For example, if there's a targeting rule in the flag configuration. I'll take a look tomorrow.

@beeme1mr
Copy link
Member

Hey @julianocosta89, can you confirm that the kafkaQueueProblems doesn't have a targeting rule defined in your environment? That would immediately make it ineligible to be cached. I didn't notice any obvious reasons this isn't being cached, but I didn't have a chance to investigate in depth.

@julianocosta89
Copy link
Author

this is what I currently have for kafkaQueueProblems:

"kafkaQueueProblems": {
  "description": "Overloads Kafka queue while simultaneously introducing a consumer side delay leading to a lag spike",
  "state": "ENABLED",
  "variants": {
    "on": 100,
    "off": 0
  },
  "defaultVariant": "off"
},

And this is the kotlin snippet:

fun main() {
    val options = FlagdOptions.builder()
    .withGlobalTelemetry(true)
    .build()
    val flagdProvider = FlagdProvider(options)
    OpenFeatureAPI.getInstance().setProvider(flagdProvider)

    val props = Properties()
    props[KEY_DESERIALIZER_CLASS_CONFIG] = StringDeserializer::class.java.name
    props[VALUE_DESERIALIZER_CLASS_CONFIG] = ByteArrayDeserializer::class.java.name
    props[GROUP_ID_CONFIG] = groupID
    val bootstrapServers = System.getenv("KAFKA_SERVICE_ADDR")
    if (bootstrapServers == null) {
        println("KAFKA_SERVICE_ADDR is not supplied")
        exitProcess(1)
    }
    props[BOOTSTRAP_SERVERS_CONFIG] = bootstrapServers
    val consumer = KafkaConsumer<String, ByteArray>(props).apply {
        subscribe(listOf(topic))
    }

    var totalCount = 0L
    consumer.use {
        while (true) {
            totalCount = consumer
                .poll(ofMillis(100))
                .fold(totalCount) { accumulator, record ->
                    val newCount = accumulator + 1
                    if (getFeatureFlagValue("kafkaQueueProblems") > 0) {
                        logger.info("FeatureFlag 'kafkaQueueProblems' is enabled, sleeping 1 second")
                        Thread.sleep(1000)
                    }
                    val orders = OrderResult.parseFrom(record.value())
                    logger.info("Consumed record with orderId: ${orders.orderId}, and updated total count to: $newCount")
                    newCount
                }
        }
    }
}

/**
* Retrieves the status of a feature flag from the Feature Flag service.
*
* @param ff The name of the feature flag to retrieve.
* @return `true` if the feature flag is enabled, `false` otherwise or in case of errors.
*/
fun getFeatureFlagValue(ff: String): Int {
    val client = OpenFeatureAPI.getInstance().client
    // TODO: Plumb the actual session ID from the frontend via baggage?
    val uuid = UUID.randomUUID()

    val clientAttrs = mutableMapOf<String, Value>()
    clientAttrs["session"] = Value(uuid.toString())
    client.evaluationContext = ImmutableContext(clientAttrs)
    val intValue = client.getIntegerValue(ff, 0)
    return intValue
}

@toddbaert
Copy link
Member

toddbaert commented Aug 13, 2024

@julianocosta89 could it be that you are evaluating the same flag in quick succession, and evaluations are interleaving? The flag value can't be cached until the RPC completes, so if 2 evaluations happen quickly, they will both cause RPCs since theirs no previously resolved value in the cache (yet)..

Besides some transient connection issue, I can't see another reason. If the reason field is coming back as STATIC, subsequent evaluations should result in cache hits with your configuration 🤔

EDIT:

Oh, one more reason could be that the provider hasn't fully connected its stream before which no caching happens... this is plausible since you're using setProvider (which doesn't block awaiting a connection) instead of setProviderAndWait which blocks until the provider is fully initialized.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working Needs Triage This issue needs to be investigated by a maintainer
Projects
None yet
Development

No branches or pull requests

4 participants