Add tracking to data file type column names 2. #553
Closed
Changes from 5 commits
Commits (8):
2cb095b Changed how paths, nodes, and indexes are handled (ptth222)
ca12fe4 Added new test (ptth222)
e50e65e Name headers also tracked (ptth222)
bc33a2f Addressed comments in #553 (ptth222)
ed25daa Merge branch 'issue-511' into fix-data-file-name-bug2 (proccaserra)
ad22cae Changed write based on comments in #553 (ptth222)
4a8b735 Changes to make tests work (ptth222)
770db3c Removed commented out code (ptth222)
@@ -16,6 +16,7 @@
 )
 from isatools.isatab.defaults import log
 from isatools.isatab.graph import _all_end_to_end_paths, _longest_path_and_attrs
+from isatools.model.utils import _build_paths_and_indexes
 from isatools.isatab.utils import (
 get_comment_column,
 get_pv_columns,
@@ -260,24 +261,21 @@ def flatten(current_list):

 columns = []

-# start_nodes, end_nodes = _get_start_end_nodes(a_graph)
-paths = _all_end_to_end_paths(
-a_graph, [x for x in a_graph.nodes()
-if isinstance(a_graph.indexes[x], Sample)])
+paths, indexes = _build_paths_and_indexes(assay_obj.process_sequence)
 if len(paths) == 0:
 log.info("No paths found, skipping writing assay file")
 continue
-if _longest_path_and_attrs(paths, a_graph.indexes) is None:
+if _longest_path_and_attrs(paths, indexes) is None:
 raise IOError(
 "Could not find any valid end-to-end paths in assay graph")

 protocol_in_path_count = 0
-for node_index in _longest_path_and_attrs(paths, a_graph.indexes):
-node = a_graph.indexes[node_index]
+output_label_in_path_counts = {}
+name_label_in_path_counts = {}
+for node_index in _longest_path_and_attrs(paths, indexes):
+node = indexes[node_index]
 if isinstance(node, Sample):
 olabel = "Sample Name"
-# olabel = "Sample Name.{}".format(sample_in_path_count)
-# sample_in_path_count += 1
 columns.append(olabel)
 columns += flatten(
 map(lambda x: get_comment_column(olabel, x),
@@ -313,22 +311,21 @@ def flatten(current_list):
oname_label = None

if oname_label is not None:
columns.append(oname_label)

if node.executes_protocol.protocol_type.term.lower() in \
protocol_types_dict["nucleic acid hybridization"][SYNONYMS]:
columns.append("Array Design REF")

if oname_label not in name_label_in_path_counts:
name_label_in_path_counts[oname_label] = 0
new_oname_label = oname_label + "." + str(name_label_in_path_counts[oname_label])

columns.append(new_oname_label)
name_label_in_path_counts[oname_label] += 1
elif node.executes_protocol.protocol_type.term.lower() \
in protocol_types_dict["nucleic acid hybridization"][SYNONYMS]:
columns.extend(
["Hybridization Assay Name",
"Array Design REF"])
columns += flatten(
map(lambda x: get_comment_column(olabel, x),
node.comments))

for output in [x for x in node.outputs if isinstance(x, DataFile)]:
if output.label not in columns:
columns.append(output.label)
columns += flatten(
map(lambda x: get_comment_column(output.label, x),
output.comments))
elif isinstance(node, Material):
olabel = node.type
columns.append(olabel)
@@ -340,7 +337,18 @@ def flatten(current_list):
 node.comments))

 elif isinstance(node, DataFile):
-pass # handled in process
+# pass # handled in process
+
+output_label = node.label
+if output_label not in output_label_in_path_counts:
+output_label_in_path_counts[output_label] = 0
+new_output_label = output_label + "." + str(output_label_in_path_counts[output_label])
+
+columns.append(new_output_label)
+output_label_in_path_counts[output_label] += 1
+columns += flatten(
+map(lambda x: get_comment_column(new_output_label, x),
+node.comments))

 omap = get_object_column_map(columns, columns)

@@ -355,8 +363,10 @@ def pbar(x):
 df_dict[k].extend([""])

 protocol_in_path_count = 0
+output_label_in_path_counts = {}
+name_label_in_path_counts = {}
 for node_index in path_:
-node = a_graph.indexes[node_index]
+node = indexes[node_index]
 if isinstance(node, Process):
 olabel = "Protocol REF.{}".format(protocol_in_path_count)
 protocol_in_path_count += 1
@@ -374,12 +384,19 @@ def pbar(x):
oname_label = None

if oname_label is not None:
df_dict[oname_label][-1] = node.name
if oname_label not in name_label_in_path_counts:
name_label_in_path_counts[oname_label] = 0
new_oname_label = oname_label + "." + str(name_label_in_path_counts[oname_label])

df_dict[new_oname_label][-1] = node.name
name_label_in_path_counts[oname_label] += 1
elif node.executes_protocol.protocol_type.term.lower() in \
protocol_types_dict["nucleic acid hybridization"][SYNONYMS]:
Review comment (on the `elif` above): see comment above, same logic
df_dict["Hybridization Assay Name"][-1] = \
node.name
df_dict["Array Design REF"][-1] = \
node.array_design_ref

if node.executes_protocol.protocol_type.term.lower() in \
protocol_types_dict["nucleic acid hybridization"][SYNONYMS]:
df_dict["Array Design REF"][-1] = node.array_design_ref

if node.date is not None:
df_dict[olabel + ".Date"][-1] = node.date
if node.performer is not None:
@@ -391,23 +408,8 @@ def pbar(x):
 colabel = "{0}.Comment[{1}]".format(olabel, co.name)
 df_dict[colabel][-1] = co.value

-for output in [x for x in node.outputs if isinstance(x, DataFile)]:
-output_by_type = []
-delim = ";"
-olabel = output.label
-if output.label not in columns:
-columns.append(output.label)
-output_by_type.append(output.filename)
-df_dict[olabel][-1] = delim.join(map(str, output_by_type))
-
-for co in output.comments:
-colabel = "{0}.Comment[{1}]".format(olabel, co.name)
-df_dict[colabel][-1] = co.value
-
 elif isinstance(node, Sample):
 olabel = "Sample Name"
-# olabel = "Sample Name.{}".format(sample_in_path_count)
-# sample_in_path_count += 1
 df_dict[olabel][-1] = node.name
 for co in node.comments:
 colabel = "{0}.Comment[{1}]".format(
@@ -434,7 +436,19 @@ def pbar(x):
 df_dict[colabel][-1] = co.value

 elif isinstance(node, DataFile):
-pass # handled in process
+# pass # handled in process
+
+output_label = node.label
+if output_label not in output_label_in_path_counts:
+output_label_in_path_counts[output_label] = 0
+new_output_label = output_label + "." + str(output_label_in_path_counts[output_label])
+df_dict[new_output_label][-1] = node.filename
+output_label_in_path_counts[output_label] += 1
+
+for co in node.comments:
+colabel = "{0}.Comment[{1}]".format(
+new_output_label, co.name)
+df_dict[colabel][-1] = co.value

 DF = DataFrame(columns=columns)
 DF = DF.from_dict(data=df_dict)
@@ -482,6 +496,11 @@ def pbar(x):
 columns[i] = "Protocol REF"
 elif "." in col:
 columns[i] = col[:col.rindex(".")]
+else:
+for output_label in output_label_in_path_counts:
+if output_label in col:
+columns[i] = output_label
+break

 log.debug("Rendered {} paths".format(len(DF.index)))
 if len(DF.index) > 1:
@@ -521,8 +540,6 @@ def write_value_columns(df_dict, label, x):
 elif x.unit.term_source.name:
 df_dict[label + ".Unit.Term Source REF"][-1] = x.unit.term_source.name

-# df_dict[label + ".Unit.Term Source REF"][-1] = \
-# x.unit.term_source.name if x.unit.term_source else ""
 df_dict[label + ".Unit.Term Accession Number"][-1] = \
 x.unit.term_accession
 else:
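The tracking scheme used in the diff can be illustrated in isolation: each occurrence of a label along a path gets a `.<n>` suffix so that repeated headers stay distinct, and the suffix is stripped again before the headers are written. The following is a minimal, simplified sketch rather than the PR's code; `track_labels` and `strip_tracking` are hypothetical helper names, and the example labels are assumptions chosen for illustration.

```python
def track_labels(labels):
    """Suffix each label with its occurrence index along the path.

    Hypothetical helper: mirrors the counter-dictionary pattern in the diff,
    not an isatools function.
    """
    counts = {}
    tracked = []
    for label in labels:
        index = counts.get(label, 0)          # occurrences seen so far
        tracked.append("{}.{}".format(label, index))
        counts[label] = index + 1
    return tracked


def strip_tracking(columns):
    """Drop the trailing '.<n>' suffix to recover the header as written."""
    return [col[:col.rindex(".")] if "." in col else col for col in columns]


# Assumed example path with a repeated data file label.
path_labels = ["Raw Data File", "Derived Data File", "Raw Data File"]
tracked = track_labels(path_labels)
restored = strip_tracking(tracked)
print(tracked)
print(restored)
```

One caveat of this simplified form is that stripping at the last `.` assumes the label itself carries no meaningful trailing dot segment.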
@ptth222: while doing the code review and trying to merge, 2 tests failed.
There are several issues we need to discuss, but the PR cannot be merged as is:
At the `elif` at line 320, `Hybridization Assay Name` and `Array Design REF` are appended with `.0` even when there is only one occurrence. This prevents `df_dict` from retrieving the right key and raises a `KeyError`. We suggest a first pass to count the number of headers, appending the process number only when there is more than one.
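The two-pass idea suggested above could look roughly like the following. This is only a sketch under assumptions, not code from the PR or from isatools: `label_headers` is a hypothetical helper, and the header list is an invented example.

```python
from collections import Counter


def label_headers(headers):
    """Suffix a header with '.<n>' only if it occurs more than once.

    Hypothetical helper illustrating the reviewer's suggestion.
    """
    totals = Counter(headers)   # first pass: count every header
    seen = Counter()            # running index per repeated header
    labelled = []
    for header in headers:      # second pass: suffix repeated headers only
        if totals[header] > 1:
            labelled.append("{}.{}".format(header, seen[header]))
            seen[header] += 1
        else:
            labelled.append(header)
    return labelled


# Assumed example: two single headers and one repeated header.
example = ["Hybridization Assay Name", "Array Design REF",
           "Raw Data File", "Raw Data File"]
labelled = label_headers(example)
print(labelled)
```

With this approach a header that appears once keeps its plain name, so a lookup such as `df_dict["Array Design REF"]` would still find its key.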

I'm not sure why I made that an "elif". I think it's been too long and I can't remember. I found a dataset that uses "nucleic acid hybridization" and used it to test, so now it should work. I'm not sure what you mean about the KeyError. If you have a specific dataset that illustrates it, that would be helpful.