Use qs2 serialization #201

wlandau · 2024-10-31T20:05:40Z

wlandau
Oct 31, 2024

As we have discussed, crew cannot set a serialization method through mirai because the mechanism relies on everywhere(). As a compromise, would it be okay to have the option to disable built-in serialization and leave it to the user? That would allow crew to leverage the compression capabilities of qs::qserialize()/qs::qdeserialize() without incurring duplicate overhead from serializing twice.

shikokuchuo · 2024-11-01T11:10:15Z

shikokuchuo
Nov 1, 2024
Maintainer

I'm copying @traversc as we need to have a joined-up discussion here.

The only meaningful integration I can see is at the C level with nanonext, if qs2 can be used as a drop-in replacement for R serialization.

@traversc, for qs2 do you plan, or would you consider, exposing functions with the following signatures (or equivalent) by registering them as C callables?

void qs2_serialize(unsigned char *, size_t *, SEXP);
SEXP qs2_unserialize(unsigned char *, size_t);

For serialization, I'm thinking it could be 2-pass, where passing NULL for the buffer returns the size required. But you might do all the memory management in-package, so I won't presuppose anything.
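For readers unfamiliar with the 2-pass idiom, here is a minimal sketch in plain C. The names `toy_serialize` and `toy_serialize_alloc` and the 4-byte header are hypothetical stand-ins chosen for illustration; they are not part of qs2 or nanonext, and a C string takes the place of the SEXP.

```c
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical toy serializer: prefixes the input string with a 4-byte tag.
 * Pass 1: buf == NULL  -> only *size is written (bytes required).
 * Pass 2: buf != NULL  -> *size is the capacity on entry, bytes written on return.
 * Returns 0 on success, -1 if the supplied buffer is too small. */
int toy_serialize(unsigned char *buf, size_t *size, const char *input) {
    static const unsigned char magic[4] = {'T', 'O', 'Y', '1'};
    size_t needed = sizeof magic + strlen(input);

    if (buf == NULL) {          /* size query only */
        *size = needed;
        return 0;
    }
    if (*size < needed) {       /* caller's buffer is too small */
        return -1;
    }
    memcpy(buf, magic, sizeof magic);
    memcpy(buf + sizeof magic, input, needed - sizeof magic);
    *size = needed;
    return 0;
}

/* Caller side: query the size, allocate, then serialize for real.
 * The caller owns the returned buffer and must free() it. */
unsigned char *toy_serialize_alloc(const char *input, size_t *size) {
    if (toy_serialize(NULL, size, input) != 0) {
        return NULL;
    }
    unsigned char *buf = malloc(*size);
    if (buf == NULL) {
        return NULL;
    }
    if (toy_serialize(buf, size, input) != 0) {
        free(buf);
        return NULL;
    }
    return buf;
}
```

The pattern keeps all allocation on the caller's side, at the cost of calling the serializer twice unless the implementation caches its result between calls.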

It should be fairly straightforward to wrap your C++ functions with an extern "C" for these exports.
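As background on "C callables": in R, a package publishes a function pointer with `R_RegisterCCallable()` and another package looks it up by package and function name with `R_GetCCallable()` (both declared in `R_ext/Rdynload.h`). The sketch below mimics that lookup in plain C so it can be compiled without R; `register_callable`, `get_callable`, `toy_bound`, and the package name `"toypkg"` are hypothetical stand-ins, not the R API itself.

```c
#include <stddef.h>
#include <string.h>

/* Generic function pointer type, playing the role of R's DL_FUNC. */
typedef void (*generic_fn)(void);

#define MAX_CALLABLES 16

static struct {
    const char *package;
    const char *name;
    generic_fn fn;
} registry[MAX_CALLABLES];
static size_t registry_len = 0;

/* Provider side: publish a function under (package, name).
 * Returns 0 on success, -1 if the table is full. */
int register_callable(const char *package, const char *name, generic_fn fn) {
    if (registry_len == MAX_CALLABLES) {
        return -1;
    }
    registry[registry_len].package = package;
    registry[registry_len].name = name;
    registry[registry_len].fn = fn;
    registry_len++;
    return 0;
}

/* Consumer side: look the function up by name; NULL if not registered. */
generic_fn get_callable(const char *package, const char *name) {
    for (size_t i = 0; i < registry_len; i++) {
        if (strcmp(registry[i].package, package) == 0 &&
            strcmp(registry[i].name, name) == 0) {
            return registry[i].fn;
        }
    }
    return NULL;
}

/* Hypothetical exported function with a concrete signature. */
size_t toy_bound(size_t n) {
    return n + 4;
}
```

A consumer casts the generic pointer back to the agreed signature before calling it, for example `size_t (*f)(size_t) = (size_t (*)(size_t)) get_callable("toypkg", "toy_bound");`. This is why both sides need to agree on the exact signatures up front.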

traversc · 2024-11-01T16:40:02Z

traversc
Nov 1, 2024

A two-pass approach would be pretty inefficient. If it were C++, returning a std::vector would make sense. What would be a C way of doing something like that? Returning a buffer that the caller would then need to manage?

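unsigned char *qs2_serialize(size_t *size, SEXP object) {
   *size = ...  /* number of bytes in the serialized result */
   unsigned char *result = (unsigned char *) malloc(*size);
   return result;  /* the caller owns the buffer and must free() it */
}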

shikokuchuo · 2024-11-01T16:50:47Z

shikokuchuo
Nov 1, 2024
Maintainer

Yeah the 2-pass is a kind of C way of thinking of things. I think you should just do the serialization and store it in whatever C++ object or class.

Then have an extern "C" wrapper which returns the raw pointer and the size, which can be consumed by a C program. Don't worry about this step - I can have a look at it when you get there.

Just wanted to check conceptually you plan it to be a drop-in for R serialization?
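One way to picture the wrapper described above is an opaque handle: the library keeps the serialized bytes in its own object, and C code only sees a pointer, a size, and a release function. The sketch below is written in plain C for illustration; in a real C++ implementation the struct could wrap a std::vector behind an extern "C" boundary. All names (`toy_buffer`, `toy_buffer_create`, and so on) are hypothetical and are not the qs2 or nanonext API.

```c
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Callers treat this as opaque and go through the accessor functions. */
typedef struct toy_buffer {
    unsigned char *data;
    size_t size;
} toy_buffer;

/* "Serialize" the input (here: just copy its bytes) into a library-owned object.
 * Returns NULL on allocation failure. */
toy_buffer *toy_buffer_create(const char *input) {
    toy_buffer *b = malloc(sizeof *b);
    if (b == NULL) {
        return NULL;
    }
    b->size = strlen(input);
    b->data = malloc(b->size ? b->size : 1);  /* avoid a zero-byte malloc */
    if (b->data == NULL) {
        free(b);
        return NULL;
    }
    memcpy(b->data, input, b->size);
    return b;
}

/* Raw pointer and size for the C consumer; valid until toy_buffer_free(). */
const unsigned char *toy_buffer_data(const toy_buffer *b) {
    return b->data;
}

size_t toy_buffer_size(const toy_buffer *b) {
    return b->size;
}

/* Release the object; safe to call with NULL. */
void toy_buffer_free(toy_buffer *b) {
    if (b != NULL) {
        free(b->data);
        free(b);
    }
}
```

With this shape, serialization runs once and the consumer never needs to know how the bytes were allocated, only that it must call the release function when it is done with them.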

traversc · 2024-11-01T18:24:53Z

traversc
Nov 1, 2024

Yes, it should be a drop-in replacement since I'm using R_serialize. The one difference is that I'm not passing a callback function, which I haven't had a use for.

Question for you, do you see a use for in-memory compression? Or is the data generally small enough that it doesn't matter? We can get on average ~8x compression with some overhead, but that can be tuned based on zstd compression level.

wlandau · 2024-11-01T19:40:13Z

wlandau
Nov 1, 2024
Author

> Question for you, do you see a use for in-memory compression?

From my end, in-memory compression would be extremely valuable, and it's the reason why I am interested in this thread and qsbase/qs2#4

traversc · 2024-11-04T07:20:08Z

traversc
Nov 4, 2024

> Yeah the 2-pass is a kind of C way of thinking of things. I think you should just do the serialization and store it in whatever C++ object or class.
>
> Then have an extern "C" wrapper which returns the raw pointer and the size, which can be consumed by a C program. Don't worry about this step - I can have a look at it when you get there.
>
> Just wanted to check conceptually you plan it to be a drop-in for R serialization?

Can you have a look at the latest commit and example here? qsbase/qs2#4

shikokuchuo · 2024-11-04T12:14:20Z

shikokuchuo
Nov 4, 2024
Maintainer

I haven't been able to run it yet, but it looks promising! Sorry I was not totally up to date - I thought qs2 was still an early stage proof of concept, didn't realise it had already reached CRAN. Will let you know once I've tested.

As for compression, yes it would be useful as creating the serialised object (buffer) necessarily creates a copy. So I think this would be useful even on the same machine. But regardless, minimising bytes transferred over the wire makes sense in a networked scenario e.g. HPC cluster.

traversc · 2024-11-08T19:09:57Z

traversc
Nov 8, 2024

@shikokuchuo Have you had a chance to check it out? I am hoping to submit a CRAN update soon.

shikokuchuo · 2024-11-08T19:47:55Z

shikokuchuo
Nov 8, 2024
Maintainer

> @shikokuchuo Have you had a chance to check it out? I am hoping to submit a CRAN update soon.

Hi Travers, sorry I didn't get the chance today. I'll see if I have time over the weekend, otherwise please go ahead if you don't hear from me by Monday.

shikokuchuo · 2025-01-05T20:05:53Z

shikokuchuo
Jan 5, 2025
Maintainer

Once the functionality is merged in nanonext, it will be easy to enable it via an argument to daemons() here in mirai. The new dispatcher architecture in mirai v2 makes it possible for daemons to be told to 'upgrade' to qs2 when they first connect.

shikokuchuo · 2025-01-28T10:56:33Z

shikokuchuo
Jan 28, 2025
Maintainer

Just to update on this, I need to have a re-think of where this functionality naturally sits (long-term in R).

I've been introduced to the carrier package, and of course I also put together the sakura package. Perhaps there could be a marshalling/serialization package that combines all relevant functionality?

traversc · 2025-01-28T20:19:53Z

traversc
Jan 28, 2025

The best place for this functionality would be in base R ;)

But given that is unlikely any time soon, I think it will be difficult (logistically and technically) to merge all possible functionality into a single comprehensive package.

Briefly looking over the carrier package, it is pure R so it's something you could import into your packages easily, right?

For the sakura package, I am much more interested in the functionality there and merging/porting/importing it to qs2. What do you think is the best way?

1 reply

shikokuchuo Feb 16, 2025
Maintainer

Hi @traversc, apologies, I missed your reply entirely. I just posted this: shikokuchuo/nanonext#71 (comment) and came across this discussion again purely by chance!

If it's possible to integrate qs2 in the way I suggest, then down the line we could probably talk about porting across the sakura capabilities and then nanonext/mirai could delegate this all to qs2.
