-
Notifications
You must be signed in to change notification settings - Fork 3
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Added Parse functionnality and some unit tests #3
base: master
Are you sure you want to change the base?
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for following up! I've added some comments.
HexDump/HexDump_Parse.cs
Outdated
/// - includeOffset = true | ||
/// - includeAscii = true |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These documented restrictions don't appear to match the regular expression pattern.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
implenented in parse so we have same behaviour in both methodes
HexDump/HexDump_Parse.cs
Outdated
/// </summary> | ||
/// <param name="dump"></param> | ||
/// <returns></returns> | ||
public static byte[] Parse(string dump ) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just an FYI: This method allocates a lot of temporary memory on the heap. It is possible to write this without allocating anything other than the List<byte>
, though the parsing has to be more manual. I can provide more details if you're interested.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
glad to see any improvement if you feel like it's worth it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Whether it's worth it depends upon the consumer's situation. If they're only parsing one short blob of text then it's probably fine as is. If they're running many operations concurrently or in an application that's sensitive to GC pauses, then the current implementation may be problematic. This is general library code, so I try to be as well behaved as possible as we cannot make many assumptions about the user's requirements.
At a high level, all of these can be avoided:
- A string per line of the input
- A
Replace
string per line - A string per hex character pair
- Enumerators (via Linq) per line
- StringReader per call
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ParseLookup implementation seems the most promising one.
Will have a look at a state machine to implement special cases.
HexDump/HexDump_Parse.cs
Outdated
//00000000 01 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................ | ||
|
||
var result = new List<byte>(); | ||
var lines = dump.Split(Environment.NewLine.ToCharArray()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Using Environment.NewLine
means you can get different behaviours on different machines for the same input. This kind of thing regularly breaks CI, for example. I'd rather see split explicitly on '\r'
and '\n'
and remove empty entries (though I'd rather not use String.Split
at all as it allocates an array and temporary strings, both of which can be avoided).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Changed it to use StringReader and remove the ToArray() in linq query.
HexDump/HexDump_Parse.cs
Outdated
|
||
int hexaWidth = (columnWidth * 3 * columnCount) + columnCount - 1 - 1; | ||
var _re = new Regex($"^{rio}(?<hexa>[0-9a-f\\s]{{{hexaWidth}}}){ria}$", | ||
RegexOptions.Compiled | RegexOptions.IgnoreCase); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would avoid compiling the regex here, given its only used once.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
totally right here, compiling the regex is worthless and take too much time.
HexDump/HexDump_Parse.cs
Outdated
/// <param name="includeOffset"></param> | ||
/// <param name="includeAscii"></param> | ||
/// <returns></returns> | ||
public static byte[] Parse(string dump, int columnWidth = 8, int columnCount = 2, bool includeOffset = true, bool includeAscii = true) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do you think it's possible to make parsing work without so many options?
00000000 01 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
What if the parser just looked for valid pairs of hex characters separated by whitespace?
- The offset would have to have more than two characters to be automatically excluded
- The ASCII section could contain eg
00 01 02
The second point is the hardest one to deal with.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
basic state machine will do the trick
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Consider the case:
00000000 30 31 20 30 32 30 30 33 20 30 34 20 30 35 20 30 01 02 03 04 05 0
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ParseStateMachine takes into account special cases like this one.
I'd be interested to see benchmark results. |
The only features complete are ParseRegex and ParseStateMachine. ParseStateMachine is the best so far. // * Summary * BenchmarkDotNet=v0.12.0, OS=macOS 10.15.4 (19E266) [Darwin 19.4.0] Runtime=.NET Core 3.1
// * Legends * // * Diagnostic Output - MemoryDiagnoser * // ***** BenchmarkRunner: End ***** |
Benchmark on Feature complete implementation only.. // * Summary * BenchmarkDotNet=v0.12.0, OS=macOS 10.15.4 (19E266) [Darwin 19.4.0] Runtime=.NET Core 3.1
// * Hints * // * Legends * // * Diagnostic Output - MemoryDiagnoser * |
This might be of interest