-
Notifications
You must be signed in to change notification settings - Fork 17
introduce ndarr support #98
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
questdb-rs/src/ingress/mod.rs
Outdated
let index = self.output.len(); | ||
let reserve_size = reserve_size.checked_add(index).ok_or(error::fmt!( | ||
ArrayViewError, | ||
"Array total elem size overflow" | ||
))?; | ||
self.output | ||
.try_reserve(reserve_size) | ||
.map_err(|_| error::fmt!(BufferOutOfMemory, "Buffer out of memory"))?; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's not how reserve works: https://doc.rust-lang.org/std/vec/struct.Vec.html#method.reserve
You pass in the argument for additional required size, not the total.
Also try_reserve
is completely pointless here since we can end up with large memory pressure already: A regular reserve
should do.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
fixed it
questdb-rs/src/ingress/mod.rs
Outdated
unsafe { self.output.set_len(reserve_size) } | ||
|
||
// ndarr data | ||
view.write_row_major_buf(&mut self.output[index..]); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Security!
The memory would most likely not be initialized, I'll need to think about this in more depth as it could be an attack vector in the application.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok.. there's certainly no attack vector that I can think of, but it is easy for an API to write too little data and the we'd end up with junk data in the db.
This bit I think we ought to do better.
I suggest:
// NB: I haven't run any of this, my code is probably rather broken :-)
let output_len = self.output.len();
let writable = unsafe {
std::slice::from_raw_parts_mut(
self.output.ptr_mut().add(output_len),
array_buf_size)
};
let mut cursor = Cursor::new(writable);
if let Err(..) = view.write_row_major(&mut cursor) {
... add context to the error and return `Err` ...
}
if cursor.position() != (array_buf_size as u64) {
return Err(... tell the user that their array write op was the wrong size ...);
}
// Update last so in case of errors we haven't grown the buffer.
unsafe { self.output.set_len(output_len + array_buf_size) };
It should be logically very similar, one extra check and if I'm not wrong (I could be) the cursor should not perform an extra copy, but this last detail should probably be triple-checked by reading the standard lib impl.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I forgot to mention.. the trait method would then have a different signature:
fn write_row_major<W: std::io::Write>(&self, dest: &mut W) -> std::io::Result<()>;
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
fixed it
questdb-rs/src/ingress/ndarr.rs
Outdated
@@ -0,0 +1,133 @@ | |||
pub(crate) const MAX_DIMS: usize = 32; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd move this top-level in questb::ingress
and make it public, then add a doc to it. It might be useful to callers.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
questdb-rs/src/error.rs
Outdated
/// Buffer outOfMemory. | ||
BufferOutOfMemory, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Remove
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
|
||
#[repr(u8)] | ||
#[derive(Debug, PartialEq)] | ||
pub enum ElemDataType { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Where does this come from?
Is there a spec document that maps the supported datatypes?
If there's a format document link to it, otherwise let's write a spec for it and link to it.
This enum here is just too confusing and too error-prone otherwise.
Each entry should also be explicit about its value, e.g. Boolean = 1
, Byte = 2
.. etc. This is to quickly be able to cross-reference a format spec and avoid the enum getting accidentally reordered and breaking everything.
Additionally, why is there an Undefined
entry?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
PS, do we really intend to support all of these datatypes as binary?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The current version only supports double. We have plans to support more data types in the future.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Then this list should only contain double :-)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done/
questdb-rs/src/ingress/mod.rs
Outdated
@@ -24,8 +24,8 @@ | |||
|
|||
#![doc = include_str!("mod.md")] | |||
|
|||
pub use self::ndarr::*; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The *
risks leaking too many types.
Instead just pick the ones you explicitly are sure you want in the public API.
questdb-rs/src/tests/mock.rs
Outdated
self.msgs = accum; | ||
Ok(self.msgs.len()) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
?
The previous code was there to handle cases where multiple messages are received in a single chunk. I don't think that possibility goes away.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In binary format, the binary data might inherently contain characters that would normally separate rows (\n). Distinguishing rows in this case can be achieved through further parsing of the binary data. However, implementing this is relatively complex and would require supporting binary format parsing in mock.rs. Therefore, for convenience, the output in the test scenario has been changed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, but ignoring the problem is not a solution :-)
This makes the tests frail and prone to occasionally fail.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why would this make the tests more frail? Compared to the previous testing method, this only changes the returned number of msgs to the returned number of bytes. If network uncertainties cause the number of received bytes to be uncertain here, that issue existed previously. IMHO, this doesn't makes the problem any worse.
@@ -29,6 +29,9 @@ mod http; | |||
mod mock; | |||
mod sender; | |||
|
|||
#[cfg(feature = "ndarray")] | |||
mod ndarr; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There should be tests for the trait directly as well, even without ndarray
enabled.
This is to be sure that we never tie the ndarray
feature to the ability to write arrays.
The tests that check the NdArrayView
trait should also check what happens when the trait does nothing or less data.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Another test to write is if the trait gives back the wrong number of dimensions.
template <typename T, size_t N> | ||
line_sender_buffer& column( | ||
column_name_view name, | ||
const std::vector<uint32_t>& shape, | ||
const std::array<T, N>& data) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We need to come up with something that's a bit more idiomatic C++.
Let's park the C++ API for now until we have everything else nailed down.
questdb-rs/src/ingress/ndarr.rs
Outdated
} | ||
} | ||
|
||
fn write_row_major_buf(&self, buf: &mut [u8]) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The implementation here is incomplete.
It should handle cases where the strides are non-row-major standard.
The current code will serialize the same buffer for both array A and its transpose A'.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Worth having a bunch of independent unit tests (without constructing full buffers) that just test this single trait method for different ndarrays of different shapes and strides.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The implementation here is incomplete.
It should handle cases where the strides are non-row-major standard.
The current code will serialize the same buffer for both array A and its transpose A'.
These cases are handled here, with corresponding tests (test_2d_non_contiguous_buffer
and test_strided_layout
). When the array
is non-row-major, view.as_slice()
returns None, and the array's iterator(have considerd all kinds of layouts inner) is used to copy individual elements.
Or did I miss something?
# Conflicts: # questdb-rs/Cargo.toml
introduce ndarr support