Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[rqd] Fix non ASCII chars #1335

Merged
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
20 changes: 14 additions & 6 deletions rqd/rqd/rqcore.py
Original file line number Diff line number Diff line change
Expand Up @@ -318,17 +318,13 @@ def runLinux(self):
else:
tempCommand += [self._createCommandFile(runFrame.command)]

if rqd.rqconstants.RQD_PREPEND_TIMESTAMP:
file_descriptor = subprocess.PIPE
else:
file_descriptor = self.rqlog
# pylint: disable=subprocess-popen-preexec-fn
frameInfo.forkedCommand = subprocess.Popen(tempCommand,
env=self.frameEnv,
cwd=self.rqCore.machine.getTempPath(),
stdin=subprocess.PIPE,
stdout=file_descriptor,
stderr=file_descriptor,
stdout=subprocess.PIPE,
stderr=subprocess.PIPE,
close_fds=True,
preexec_fn=os.setsid)
finally:
Expand All @@ -343,6 +339,16 @@ def runLinux(self):

if rqd.rqconstants.RQD_PREPEND_TIMESTAMP:
pipe_to_file(frameInfo.forkedCommand.stdout, frameInfo.forkedCommand.stderr, self.rqlog)
else:
with open(self.rqlog, 'a') as f:
Copy link
Contributor

@lithorus lithorus Jul 16, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It looks like this broke file logging. self.rqlog is a file object and not a string.
(When not using RQD_PREPEND_TIMESTAMP=True)

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll take a look and fix that soon!

Thanks!

Copy link
Contributor

@lithorus lithorus Jul 16, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I should note that #1401 changes alot of this logic anyway (if it gets merged)

Copy link
Collaborator Author

@ramonfigueiredo ramonfigueiredo Jul 16, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi @lithorus

FYI ...

I changed the code to prevent crashes due to TypeError and fixed the broken file logging. Thanks!

This is the new PR: #1417 . PR was merged into the master!

Copy link
Collaborator Author

@ramonfigueiredo ramonfigueiredo Jul 16, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi @lithorus ,

Let me know when your PR #1401 is ready for review. For now, it is on draft and some checks/pipelines are falling.

For sure it is a nice feature. Since Loki offers efficient, scalable log aggregation, cost savings, seamless Prometheus/Grafana integration, and strong community support. Thanks!

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I will write some examples and documentation on how it's used after #1416 is merged (some of the loki stuff is not compatible with python 2.x)
Also, seeing the recent sentry support in cuebot, it would be interesting to also add support for that aswell.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great. Thanks!

# Convert to ASCII while discarding characters that can not be encoded
for line in frameInfo.forkedCommand.stdout:
line = line.encode('ascii', 'ignore')
f.write(line.decode('ascii') + '\n')
for line in frameInfo.forkedCommand.stderr:
line = line.encode('ascii', 'ignore')
f.write(line.decode('ascii') + '\n')

returncode = frameInfo.forkedCommand.wait()

# Find exitStatus and exitSignal
Expand Down Expand Up @@ -1222,6 +1228,8 @@ def print_and_flush_ln(fd, last_timestamp):

remainder = lines[-1]
for line in lines[0:-1]:
# Convert to ASCII while discarding characters that can not be encoded
line = line.encode('ascii', 'ignore')
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@ramonfigueiredo Thank you for the PR and the detailed write up. The detail makes it nice and easy to review with the proper context.

So, this error occurs when we intercept output in order to prepend a timestamp. What is logged to the file when RQD_PREPEND_TIMESTAMP is False?

My sense is that the output which is logged when RQD_PREPEND_TIMESTAMP is True vs False should be as similar as possible, aside from the timestamp of course.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Dear @bcipriano ,

Sorry for the delay in answering your question. Thank you for your thorough review and the positive feedback on the PR and write-up.

Regarding your comment, the UnicodeEncodeError occurs when we intercept output to prepend a timestamp. When RQD_PREPEND_TIMESTAMP is False, the output is logged directly without any modification or encoding to ASCII, preserving the original characters, including any non-ASCII ones.

Here is how the output differs based on the value of RQD_PREPEND_TIMESTAMP:

  1. When RQD_PREPEND_TIMESTAMP is True:
  • We intercept the output to prepend a timestamp.
  • Non-ASCII characters are encoded to ASCII with the 'ignore' option, discarding any non-ASCII characters.
  • Example log: "[12:34:56] text here , caf" (non-ASCII characters are ignored).
  1. When RQD_PREPEND_TIMESTAMP is False:
  • The output is logged directly without any modification.
  • Non-ASCII characters are preserved.
  • Example log: "text here 영화, café" (non-ASCII characters are retained).

To ensure the outputs are as similar as possible, aside from the timestamp, I can adjust our approach to handle encoding more gracefully. Instead of discarding non-ASCII characters, I can consider other strategies, such as escaping them or converting them to a specific placeholder.

However, given the current solution, the primary goal was to prevent crashes due to UnicodeEncodeError by ignoring non-ASCII characters when RQD_PREPEND_TIMESTAMP is True (default option). This approach was chosen for simplicity and robustness.

If maintaining non-ASCII characters is critical, I can explore additional strategies to handle encoding more gracefully without discarding valuable information. I am open to suggestions and further discussion on the best approach to balance robustness and information retention.

Thank you again for your review and insights. I look forward to your feedback.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi @bcipriano

FYI ...

The updated logic to ensure consistent handling of non-ASCII characters in logs with RQD_PREPEND_TIMESTAMP = True or RQD_PREPEND_TIMESTAMP = False.

print("[%s] %s" % (curr_line_timestamp, line), file=outfile)
outfile.flush()
os.fsync(outfile)
Expand Down
Loading