Skip to content

Support "collStats" command for Spark compatibility #45

@apkar

Description

@apkar

collStats command is used across many different tools and frameworks like MongoExpress and Spark. Some Spark jobs wouldn't even start without some stats in collStats. Following is the collStats response format.

{
  "ns" : <string>,
  "count" : <number>,
  "size" : <number>,
  "avgObjSize" : <number>,
  "storageSize" : <number>,
  "capped" : <boolean>,
  "max" : <number>,
  "maxSize" :  <number>,
  "wiredTiger" : {   
  },
  "nindexes" : <number>,         // number of indexes
  "totalIndexSize" : <number>,   // total index size in bytes
  "indexSizes" : {                // size of specific indexes in bytes
          "_id_" : <number>,
          "username" : <number>
  },
  // ...
  "ok" : <number>
}

We don't have to implement all the fields part of this issue.

count - This is important for Spark jobs to work reasonably well. We can use Atomic operations to maintain the count. A trivial implementation would just maintain a single counter, which would generate a hotkey. Considering write hotkeys are not as bad, and also Atomic operations don't cause any conflict ranges, this could be a reasonable immediate solution.

Metadata

Metadata

Assignees

Labels

In progressActively working on the issuecompatibilityEither missing or incompatible features causing driver or tool compatibility

Type

No type

Projects

No projects

Relationships

None yet

Development

No branches or pull requests

Issue actions