FIX: Restore generate_gantt_chart functionality #3290

Open: wants to merge 14 commits into base: master

@shnizzedy (Member) commented Jan 8, 2021

Summary

Fixes #2982. Maybe fixes #3527.

All tests pass locally. 8/13 jobs pass on Travis. The Travis failures seem unrelated to the changes in this PR.

List of changes proposed in this PR (pull-request)

Acknowledgment

  • (Mandatory) I acknowledge that this contribution will be available under the Apache 2 license.

@shnizzedy (Member Author)

As noted

[T]here is an issue with the number of threads being estimated by the callback, or the gantt chart creation script is pulling in the wrong numbers. Some of the nodes are reporting using 210 threads!

Originally posted by @ccraddock in FCP-INDI/C-PAC#1404 (comment)

I thought maybe runtime_threads was counting something different than I expected.

I see the profiler uses cpu_percent for runtime_threads, which returns a percentage of one CPU, so I think something like math.ceil(cpu_percent / 100) would be an estimate of the number of threads. However, there's some disconnected code that looks like it collects the actual number of threads used (as opposed to a percentage of one CPU).

Originally posted by @shnizzedy in FCP-INDI/C-PAC#1404 (comment)

I think estimating the number of threads (by dividing cpu_percent by 100 and rounding up) is good enough for what I'm trying to do.

[Screenshot: callback.log.html]

Originally posted by @shnizzedy in FCP-INDI/C-PAC#1404 (comment)

I think the issues of

  1. what runtime_threads is logging and
  2. whether the number of threads used by a node is recorded

are related to this PR and issue, but beyond the scope of these changes. C-PAC has its own callback function in which I'm dividing and rounding, so I made no changes regarding runtime_threads in Nipype.

@codecov bot commented Jan 8, 2021

Codecov Report

Merging #3290 (933fad3) into master (47fe00b) will increase coverage by 3.87%.
The diff coverage is 68.42%.

@@            Coverage Diff             @@
##           master    #3290      +/-   ##
==========================================
+ Coverage   64.70%   68.57%   +3.87%     
==========================================
  Files         302      302              
  Lines       39869    48743    +8874     
  Branches     5288     7226    +1938     
==========================================
+ Hits        25796    33425    +7629     
- Misses      12984    14091    +1107     
- Partials     1089     1227     +138     
Flag Coverage Δ
unittests 65.01% <64.70%> (+0.31%) ⬆️

Flags with carried forward coverage won't be shown.

Impacted Files Coverage Δ
nipype/utils/draw_gantt_chart.py 93.33% <68.42%> (+83.26%) ⬆️
nipype/utils/logger.py 81.60% <0.00%> (-3.01%) ⬇️
nipype/utils/onetime.py 81.81% <0.00%> (-2.80%) ⬇️
nipype/interfaces/niftyseg/label_fusion.py 55.42% <0.00%> (-1.72%) ⬇️
nipype/interfaces/diffusion_toolkit/dti.py 61.64% <0.00%> (-1.57%) ⬇️
nipype/pipeline/plugins/base.py 57.89% <0.00%> (-0.19%) ⬇️
nipype/algorithms/icc.py 57.53% <0.00%> (ø)
nipype/utils/docparse.py 52.21% <0.00%> (ø)
nipype/interfaces/fsl/utils.py 63.76% <0.00%> (ø)
nipype/interfaces/afni/__init__.py 100.00% <0.00%> (ø)
... and 119 more

Legend
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 47fe00b...933fad3.

@effigies (Member) left a comment

Looks reasonable, though I don't have any experience with this bit of the code. Inclined to merge tomorrow unless someone complains.

@shnizzedy (Member Author)

My only hesitance is the potentially misleading runtime_threads ― maybe that should be fixed before restoring this functionality?

@mgxd (Member) left a comment

Looks good, just some minor nits.

My only hesitance is the potentially misleading runtime_threads ― maybe that should be fixed before restoring this functionality?

I agree 👍

try:
    all_res += float(event[resource])
except ValueError:
    next
Member

Suggested change
-    next
+    pass

Member Author

whoops, good catch! I actually meant

25dd1fc
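The reviewed snippet appears to sum a resource field across callback-log events while skipping values that can't be parsed as numbers (the callback writes "N/A" when a value is missing). A minimal, self-contained sketch of that pattern, assuming the try sits inside a loop over events; the function name total_resource and the sample events are hypothetical, not Nipype's code. A bare next only evaluates the builtin function object and discards it, so it silently does nothing; pass states the no-op explicitly (and, since the except is the last statement in the loop body, continue would behave the same).

```python
def total_resource(events, resource):
    """Sum one numeric field across events, skipping non-numeric entries.

    Hypothetical helper for illustration; not Nipype's implementation.
    """
    all_res = 0.0
    for event in events:
        try:
            all_res += float(event[resource])
        except ValueError:
            # float("N/A") raises ValueError; skip this entry explicitly.
            pass
    return all_res


events = [
    {"runtime_threads": "2"},
    {"runtime_threads": "N/A"},
    {"runtime_threads": 1.5},
]
print(total_resource(events, "runtime_threads"))
```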

try:
    all_res -= float(event[resource])
except ValueError:
    next
Member

Suggested change
-    next
+    pass

Member Author

whoops, good catch! I actually meant

25dd1fc

nipype/pipeline/plugins/tests/test_callback.py (outdated, resolved)
nipype/utils/draw_gantt_chart.py (outdated, resolved)
shnizzedy and others added 3 commits April 1, 2021 16:03
Co-authored-by: Mathias Goncalves <[email protected]>
Co-authored-by: Mathias Goncalves <[email protected]>
@effigies (Member)

My only hesitance is the potentially misleading runtime_threads ― maybe that should be fixed before restoring this functionality?

I agree

Was this fixed? What needs doing?

@shnizzedy (Member Author)

Was this fixed? What needs doing?

I haven't fixed it (yet at least). The issue is that the chart uses runtime_threads from the callback log as a count of threads observed being used at runtime, but the value actually stored there is cpu_percent,

"runtime_threads": getattr(node.result.runtime, "cpu_percent", "N/A"),

a float representing the current process CPU utilization as a percentage

This leads to thread counts in the hundreds when they're expected to be in the single digits, as in this Gantt chart showing CPU percent under "Threads":

[Screenshot: Gantt chart with CPU percent in "Threads"]

So I think the "threads" part of these charts should be changed before the chart functionality is restored, either:

  • by updating the log to include an integer count of threads and using that value in the chart,
  • by changing the column from threads to CPU percentage, or
  • something else?
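As a sketch of how a log record could make the distinction explicit (roughly combining the first two options), the hypothetical helper below moves the value logged as runtime_threads into a field named for what it is, runtime_cpu_percent, and adds a separate, clearly labeled estimated_threads. Both field names are assumptions for illustration, not part of Nipype's callback log. A true integer thread count would have to be sampled from the running process rather than derived from CPU percent.

```python
import math


def normalize_callback_record(record):
    """Return a copy of a callback-log record with the CPU percentage that
    was logged under "runtime_threads" moved to explicitly named fields.

    Field names are hypothetical, chosen only for this sketch.
    """
    out = dict(record)  # leave the caller's record untouched
    value = out.pop("runtime_threads", "N/A")
    if value == "N/A":
        out["runtime_cpu_percent"] = "N/A"
        out["estimated_threads"] = "N/A"
        return out
    cpu_percent = float(value)
    out["runtime_cpu_percent"] = cpu_percent
    # Rough estimate only: round up, and never report fewer than one thread.
    out["estimated_threads"] = max(1, math.ceil(cpu_percent / 100))
    return out


record = {"name": "example_node", "runtime_threads": 210.0}
print(normalize_callback_record(record))
```

Keeping the raw percentage alongside the estimate means the chart can label whichever one it plots without losing the measured value.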

@effigies (Member) commented May 6, 2021

Yeah, seems like we want something like:

if status_dict['runtime_threads'] != "N/A":
    status_dict['runtime_threads'] //= 100

@shnizzedy (Member Author)

An existing unit test does

assert (
    int(result.runtime.cpu_percent / 100 + 0.2) == n_procs
), "wrong number of threads estimated"

which is similar to what we're doing for now in C-PAC:

import math

if runtime_threads != 'N/A':
    runtime_threads = math.ceil(runtime_threads / 100)
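The three roundings in this thread (floor division as in the `//= 100` suggestion, the unit test's `int(x / 100 + 0.2)`, and C-PAC's `math.ceil(x / 100)`) can disagree for the same cpu_percent. A small comparison over made-up sample percentages: for a node reporting 45%, floor division and the unit-test rounding both estimate 0 threads while math.ceil estimates 1; for a value like the 210 quoted earlier, they give 2, 2 and 3 respectively. The function names are hypothetical.

```python
import math


def floor_estimate(cpu_percent):
    # Floor division, as in the `//= 100` suggestion
    return int(cpu_percent // 100)


def unit_test_estimate(cpu_percent):
    # Rounding used in the existing unit test assertion
    return int(cpu_percent / 100 + 0.2)


def ceil_estimate(cpu_percent):
    # Round up, as in the C-PAC callback
    return math.ceil(cpu_percent / 100)


for pct in (45.0, 95.0, 100.0, 150.0, 210.0):
    print(f"{pct:6.1f}%  floor={floor_estimate(pct)}  "
          f"test={unit_test_estimate(pct)}  ceil={ceil_estimate(pct)}")
```

Only math.ceil gives a nonzero estimate for any node that used some CPU, but it also rounds a slightly-over-100 value such as 101% up to 2.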

My concern is that, as I read

Note: the returned value can be > 100.0 in case of a process running multiple threads on different CPU cores.
Note: the returned value is explicitly not split evenly between all available CPUs (differently from psutil.cpu_percent()). This means that a busy loop process running on a system with 2 logical CPUs will be reported as having 100% CPU utilization instead of 50%. This was done in order to be consistent with top UNIX utility and also to make it easier to identify processes hogging CPU resources independently from the number of CPUs. It must be noted that taskmgr.exe on Windows does not behave like this (it would report 50% usage instead). To emulate Windows taskmgr.exe behavior you can do: p.cpu_percent() / psutil.cpu_count().

psutil documentation: Process.cpu_percent

this number can be a misleading estimate. For example, if a process is using 25% of each of 4 CPUs, I believe this would report 100%, which would reduce to 1 or 2 threads (depending on how we round) even though 4 CPUs are in use. I'd be happy to learn that either I'm misunderstanding the number or that the number is good enough.
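The ambiguity can be shown with arithmetic alone. Because Process.cpu_percent() is not divided by the number of CPUs, it behaves roughly like the sum of each thread's utilization (a process's CPU time is the total of its threads' CPU time). The helper below models that sum; it is a simplification for illustration, not psutil's implementation, and the names are hypothetical. Four threads at 25% and one thread at 100% both come out as 100%, so any rounding of cpu_percent / 100 maps them to the same estimate.

```python
import math


def modeled_cpu_percent(per_thread_percents):
    """Approximate a process-level cpu_percent as the sum of per-thread
    utilization (a simplification; not psutil's implementation)."""
    return sum(per_thread_percents)


def ceil_estimate(cpu_percent):
    return math.ceil(cpu_percent / 100)


four_light_threads = [25.0, 25.0, 25.0, 25.0]
one_busy_thread = [100.0]

for label, threads in (("4 threads @ 25%", four_light_threads),
                       ("1 thread @ 100%", one_busy_thread)):
    pct = modeled_cpu_percent(threads)
    print(label, "->", pct, "% ->", ceil_estimate(pct), "estimated thread(s)")
```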

Successfully merging this pull request may close these issues:

  • generate_gantt_chart fails on logfile
  • generate_gantt_chart fails to convert strings to datetime objects
3 participants