Line numbers #647

Open
wants to merge 53 commits into
base: main
Changes from 28 commits
Commits
a57ae4a
Create test_line_numbers.py
acoleman2000 Nov 21, 2022
4b23d8d
adding tests
acoleman2000 Jan 9, 2023
2e584ac
updating save method within python_codegen_support.py
acoleman2000 Jan 9, 2023
b7f22de
adding helper methods
acoleman2000 Jan 9, 2023
21659d2
adding additional helper method
acoleman2000 Jan 9, 2023
f68e74a
fixing bugs in helper method
acoleman2000 Jan 9, 2023
2c26bfa
updating python codegen_support.py and python_codegen.py
acoleman2000 Jan 10, 2023
e0ee3f4
updating files
acoleman2000 Jan 11, 2023
8b31a59
updating test
acoleman2000 Jan 12, 2023
6d8bce9
updating files and adding test cwl files
acoleman2000 Jan 13, 2023
3150b2f
Merge branch 'common-workflow-language:main' into line_numbers
acoleman2000 Jan 13, 2023
3143595
Fixing bug with updating sub-docs and non-kv values getting added to …
acoleman2000 Jan 17, 2023
9919bc4
Merge branch 'line_numbers' of https://github.com/acoleman2000/schema…
acoleman2000 Jan 17, 2023
3b80801
updating CommentedSeq lc update
acoleman2000 Jan 17, 2023
8152f0c
updating CommentedSeq lc data
acoleman2000 Jan 17, 2023
6ec8730
Updating metaschema.py
acoleman2000 Jan 17, 2023
149b1ba
updating type -> asinstance and bug fix in save
acoleman2000 Jan 17, 2023
16996bb
Adding doc = copy.copy(doc) before removing values
acoleman2000 Jan 17, 2023
0522ce8
removing typecheck for key in val
acoleman2000 Jan 17, 2023
6f62fb8
running make cleanup
acoleman2000 Jan 18, 2023
ba2dd90
updating metaschema.py
acoleman2000 Jan 18, 2023
2dea597
Fixing type warning for doc
acoleman2000 Jan 23, 2023
74b23d0
working on test
acoleman2000 Jan 23, 2023
8575f3f
Fixing issue with type hints and indentation of setting global variable
acoleman2000 Jan 23, 2023
a087833
adding metaschema.py
acoleman2000 Jan 23, 2023
7bfac7c
fix type error
acoleman2000 Jan 23, 2023
06fc513
fix type error
acoleman2000 Jan 23, 2023
5fd6ca1
Merge branch 'common-workflow-language:main' into line_numbers
acoleman2000 Jan 25, 2023
41d406d
updates to codegen
acoleman2000 Mar 28, 2023
27c7314
Merge branch 'line_numbers' of https://github.com/acoleman2000/schema…
acoleman2000 May 2, 2023
f4e098b
Updating python_codegen and python_codegen_support for cleaner logic …
acoleman2000 May 5, 2023
8263fdb
updating for consistent line numbers
acoleman2000 May 8, 2023
add86c6
adding files for line number tests
acoleman2000 May 11, 2023
625a3a5
adding cwl python codegen files for tests and having them be ignored …
acoleman2000 May 11, 2023
6f544e9
updating python codegen/codegen_support, metaschema, and tests.
acoleman2000 May 11, 2023
4178c78
Merge branch 'main' into line_numbers
acoleman2000 May 11, 2023
74e3247
running make clean-up
acoleman2000 May 11, 2023
c9d35a2
Merge branch 'line_numbers' of https://github.com/acoleman2000/schema…
acoleman2000 May 11, 2023
752dbab
trying to pass tox tests
acoleman2000 May 15, 2023
5d198ee
updating to remove inserted_line_info from global variable
acoleman2000 May 15, 2023
160f559
updating cwl codegen filesfor updated codegen
acoleman2000 May 15, 2023
bdd5c04
Updating codegen to support shifting down of text
acoleman2000 Jun 5, 2023
cc76eb9
Updating metaschema.py and updating to pass lint
acoleman2000 Jun 5, 2023
f85ed3c
running make cleanup
acoleman2000 Jun 5, 2023
3d61e55
updating Makefile to properly exclude cwl files
acoleman2000 Jun 8, 2023
63da121
Trying to pass metaschema up to date test
acoleman2000 Jun 8, 2023
ba8be89
trying alternate style of loading test files in
acoleman2000 Jun 9, 2023
d84a8bd
Merge branch 'main' into line_numbers
acoleman2000 Jun 14, 2023
be53207
Bogus commit to re-run testS
acoleman2000 Jun 15, 2023
5b10422
Merge branch 'main' into line_numbers
Nov 9, 2023
93406dd
Updating line numbers tests to use generated cwl files.
Nov 14, 2023
154af86
Removing static cwl files.
Nov 14, 2023
3afd4b0
Merge branch 'main' into line_numbers
acoleman2000 Nov 14, 2023
1,496 changes: 1,379 additions & 117 deletions schema_salad/metaschema.py

Large diffs are not rendered by default.

43 changes: 37 additions & 6 deletions schema_salad/python_codegen.py
@@ -274,10 +274,29 @@ def fromDoc(
self.serializer.write(
"""
def save(
self, top: bool = False, base_url: str = "", relative_uris: bool = True
) -> Dict[str, Any]:
r: Dict[str, Any] = {}

self, top: bool = False, base_url: str = "", relative_uris: bool = True, keys: Optional[List[Any]] = None
) -> CommentedMap:
if keys is None:
keys = []
r = CommentedMap()
doc = doc_line_info
for key in keys:

if isinstance(doc, CommentedMap):
doc = doc.get(key)
elif isinstance(doc, (CommentedSeq, list)) and isinstance(key, int):
if key < len(doc):
doc = doc[key]
else:
doc = None
else:
doc = None
break
if doc is not None:
r._yaml_set_line_col(doc.lc.line, doc.lc.col)
line_numbers = get_line_numbers(doc)
max_len = get_max_line_num(doc)
cols: Dict[int, int] = {}
if relative_uris:
for ef in self.extension_fields:
r[prefix_url(ef, self.loadingOptions.vocab)] = self.extension_fields[ef]
@@ -301,6 +320,7 @@ def save(
self.serializer.write(
"""
r["class"] = "{class_}"
max_len = add_kv(old_doc=doc, new_doc=r, line_numbers=line_numbers, key="class", val=r.get("class"), max_len=max_len, cols=cols)
""".format(
class_=classname
)
@@ -395,6 +415,7 @@ def type_loader(
sub_names: List[str] = list(
dict.fromkeys([self.type_loader(i).name for i in type_declaration])
)

return self.declare_type(
TypeDef(
"union_of_{}".format("_or_".join(sub_names)),
@@ -566,12 +587,15 @@ def declare_field(
if self.{safename} is not None:
u = save_relative_uri(self.{safename}, {baseurl}, {scoped_id}, {ref_scope}, relative_uris)
r["{fieldname}"] = u
max_len = add_kv(old_doc = doc, new_doc = r, line_numbers = line_numbers, key = "{key_1}", val = r.get("{key_2}"), max_len = max_len, cols = cols)
""".format(
safename=self.safe_name(name),
fieldname=shortname(name).strip(),
baseurl=baseurl,
scoped_id=fieldtype.scoped_id,
ref_scope=fieldtype.ref_scope,
key_1=self.safe_name(name),
key_2=self.safe_name(name),
),
8,
)
@@ -581,9 +605,16 @@ def declare_field(
fmt(
"""
if self.{safename} is not None:
r["{fieldname}"] = save(
self.{safename}, top=False, base_url={baseurl}, relative_uris=relative_uris
saved_val = save(
self.{safename}, top=False, base_url={baseurl}, relative_uris=relative_uris, keys = keys + ["{fieldname}"]
)

if type(saved_val) == list:
if len(saved_val) == 1: # If the returned value is a list of size 1, just save the value in the list
saved_val = saved_val[0]
r["{fieldname}"] = saved_val

max_len = add_kv(old_doc = doc, new_doc = r, line_numbers = line_numbers, key = "{fieldname}", val = r.get("{fieldname}"), max_len = max_len, cols = cols)
Review comment (Member):

Since you already have r, key, and val together, consider moving the assignment r["{fieldname}"] = saved_val into add_kv(), so that add_kv() sets both the value and the metadata of the entry at the same time.
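A minimal sketch of the suggestion above, not the PR's actual `add_kv()`. To stay stdlib-only it assumes a hypothetical `PositionedMap` class standing in for ruamel.yaml's `CommentedMap`, with `positions` playing the role of `lc.data`; `set_kv` and `known_position` are illustrative names as well.

```python
from typing import Any, Dict, List, Optional


class PositionedMap(dict):
    """Hypothetical stand-in for CommentedMap: a dict plus per-key positions."""

    def __init__(self) -> None:
        super().__init__()
        # key -> [key_line, key_col, value_line, value_col]
        self.positions: Dict[str, List[int]] = {}


def set_kv(
    new_doc: PositionedMap,
    key: str,
    val: Any,
    known_position: Optional[List[int]],
    max_len: int,
) -> int:
    """Store the value and its position together; return the updated max_len."""
    new_doc[key] = val
    if known_position is not None:
        # The key existed in the original document: reuse its position.
        new_doc.positions[key] = list(known_position)
        return max_len
    # New key: place it on the next free line.
    new_doc.positions[key] = [max_len, 0, max_len, len(key) + 2]
    return max_len + 1


r = PositionedMap()
max_len = 10
max_len = set_kv(r, "class", "Workflow", [1, 0, 1, 7], max_len)
max_len = set_kv(r, "label", "added later", None, max_len)
```

With this shape the generated `save()` body would need one call per field instead of an assignment followed by a metadata call, so the value and its position cannot drift apart.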

""".format(
safename=self.safe_name(name),
fieldname=shortname(name),
179 changes: 162 additions & 17 deletions schema_salad/python_codegen_support.py
@@ -27,7 +27,7 @@

from rdflib import Graph
from rdflib.plugins.parsers.notation3 import BadSyntax
from ruamel.yaml.comments import CommentedMap
from ruamel.yaml.comments import CommentedMap, CommentedSeq

from schema_salad.exceptions import SchemaSaladException, ValidationException
from schema_salad.fetcher import DefaultFetcher, Fetcher, MemoryCachingFetcher
@@ -43,6 +43,9 @@
IdxType = MutableMapping[str, Tuple[Any, "LoadingOptions"]]


doc_line_info = CommentedMap()
Review comment (Member):

Instead of making doc_line_info a global variable, how about having save() take LoadingOptions and use original_doc?
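One way to read the suggestion above, as a rough sketch: the source document travels with an options object instead of living in module state. `Options` is a hypothetical stand-in for `LoadingOptions`, `lookup_original` is an illustrative helper, and plain `dict`/`list` values stand in for `CommentedMap`/`CommentedSeq`.

```python
from dataclasses import dataclass
from typing import Any, List, Optional


@dataclass
class Options:
    """Hypothetical stand-in for LoadingOptions carrying the source document."""

    original_doc: Optional[Any] = None


def lookup_original(options: Options, keys: List[Any]) -> Optional[Any]:
    """Walk `keys` through options.original_doc; return None if the path is absent."""
    node = options.original_doc
    for key in keys:
        if isinstance(node, dict) and key in node:
            node = node[key]
        elif isinstance(node, list) and isinstance(key, int) and 0 <= key < len(node):
            node = node[key]
        else:
            return None
    return node


# Two documents loaded side by side keep their own line-info source.
opts_a = Options(original_doc={"steps": {"step1": {"run": "a.cwl"}}})
opts_b = Options(original_doc={"steps": {"step1": {"run": "b.cwl"}}})
```

Because each lookup reads from the options object passed in, loading a second document does not overwrite the line information of the first, which is the risk with a single module-level variable.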



class LoadingOptions:
idx: IdxType
fileuri: Optional[str]
@@ -203,8 +206,12 @@ def fromDoc(

@abstractmethod
def save(
self, top: bool = False, base_url: str = "", relative_uris: bool = True
) -> Dict[str, Any]:
self,
top: bool = False,
base_url: str = "",
relative_uris: bool = True,
keys: Optional[List[Any]] = None,
) -> CommentedMap:
"""Convert this object to a JSON/YAML friendly dictionary."""


@@ -238,26 +245,154 @@ def load_field(val, fieldtype, baseuri, loadingOptions):
]


def add_kv(
old_doc: CommentedMap,
new_doc: CommentedMap,
line_numbers: Dict[Any, Dict[str, int]],
key: str,
val: Any,
max_len: int,
cols: Dict[int, int],
) -> int:
"""Add key value pair into Commented Map.

Function to add key value pair into new CommentedMap given old CommentedMap, line_numbers for each key/val pair in the old CommentedMap,
key/val pair to insert, max_line of the old CommentedMap, and max col value taken for each line.
"""
if key in line_numbers: # If the key to insert is in the original CommentedMap
new_doc.lc.add_kv_line_col(key, old_doc.lc.data[key])
elif isinstance(val, (int, float, bool, str)): # If the value is hashable
if val in line_numbers: # If the value is in the original CommentedMap
line = line_numbers[val]["line"]
if line in cols:
col = max(line_numbers[val]["col"], cols[line])
else:
col = line_numbers[val]["col"]
new_doc.lc.add_kv_line_col(key, [line, col, line, col + len(key) + 2])
cols[line] = col + len("id") + 2
else: # If neither the key or value is in the original CommentedMap (or value is not hashable)
new_doc.lc.add_kv_line_col(key, [max_len, 0, max_len, len(key) + 2])
max_len += 1
else: # If neither the key or value is in the original CommentedMap (or value is not hashable)
new_doc.lc.add_kv_line_col(key, [max_len, 0, max_len, len(key) + 2])
max_len += 1
return max_len


def get_line_numbers(doc: CommentedMap) -> Dict[Any, Dict[str, int]]:
"""Get line numbers for kv pairs in CommentedMap.

For each key/value pair in a CommentedMap, save the line/col info into a dictionary,
only save value info if value is hashable.
"""
line_numbers: Dict[Any, Dict[str, int]] = {}
if isinstance(doc, dict) or doc is None:
return {}
for key, value in doc.lc.data.items():
line_numbers[key] = {}

line_numbers[key]["line"] = doc.lc.data[key][0]
line_numbers[key]["col"] = doc.lc.data[key][1]
if isinstance(value, (int, float, bool, str)):
line_numbers[value] = {}
line_numbers[value]["line"] = doc.lc.data[key][2]
line_numbers[value]["col"] = doc.lc.data[key][3]
return line_numbers


def get_max_line_num(doc: CommentedMap) -> int:
"""Get the max line number for a CommentedMap.

Iterate through the the key with the highest line number until you reach a non-CommentedMap value or empty CommentedMap.
"""
max_line = 0
max_key = ""
cur = doc
while isinstance(cur, CommentedMap) and len(cur) > 0:
for key in cur.lc.data.keys():
if cur.lc.data[key][2] >= max_line:
max_line = cur.lc.data[key][2]
max_key = key
cur = cur[max_key]
return max_line + 1


def save(
val: Any,
top: bool = True,
base_url: str = "",
relative_uris: bool = True,
keys: Optional[List[Any]] = None,
) -> save_type:
"""Save a val of any type.

Recursively calls save method from class if val is of type Saveable. Otherwise, saves val to CommentedMap or CommentedSeq
"""
if keys is None:
keys = []
doc = doc_line_info
for key in keys:
if isinstance(doc, CommentedMap):
doc = doc.get(key)
elif isinstance(doc, (CommentedSeq, list)) and isinstance(key, int):
if key < len(doc):
doc = doc[key]
else:
doc = None
else:
doc = None
break
Review comment (Member):

This needs some discussion about what's going on here. It looks like you're using "keys" to find a path through the original document, to reach the leaf node that has the line number info we want.

What happens when you have a field with mapSubject and it's been converted (internally) from a dict to a list? In this case it is intentional for save() to emit the normalized form, which is the list form, but that may or may not correspond to the original document, depending on whether the original document used the list form or the dict form.
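The mismatch described above can be illustrated with plain Python containers. This is a sketch under assumptions: `find_original_entry` is a hypothetical helper, `"id"` is assumed to be the mapSubject field, and the data mirrors the `inputs` section of the test workflow in this PR.

```python
from typing import Any, Dict, List, Optional, Union

# Original document used the dict (mapSubject) form.
original_inputs: Dict[str, Any] = {"file1": "File[]", "file2": "File[]"}

# Internal, normalized list form that save() emits.
normalized_inputs: List[Dict[str, Any]] = [
    {"id": "file1", "type": "File[]"},
    {"id": "file2", "type": "File[]"},
]


def find_original_entry(
    original: Union[Dict[str, Any], List[Any]],
    index: int,
    normalized: List[Dict[str, Any]],
    subject: str = "id",
) -> Optional[Any]:
    """Locate the original node for normalized[index], whichever form was used."""
    if isinstance(original, list):
        # List form: positions line up directly.
        return original[index] if 0 <= index < len(original) else None
    if not 0 <= index < len(normalized):
        return None
    # Dict form: an integer index means nothing here, so go through
    # the mapSubject value of the normalized entry instead.
    return original.get(normalized[index][subject])


naive = original_inputs.get(1)  # what an index-only lookup finds: nothing
mapped = find_original_entry(original_inputs, 1, normalized_inputs)
```

An index-only path such as `["inputs", 1]` finds nothing in a dict-form original, so the line information is silently dropped; some translation through the subject key would be needed to recover it.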


if isinstance(val, Saveable):
return val.save(top=top, base_url=base_url, relative_uris=relative_uris)
return val.save(
top=top, base_url=base_url, relative_uris=relative_uris, keys=keys
)
if isinstance(val, MutableSequence):
return [
save(v, top=False, base_url=base_url, relative_uris=relative_uris)
for v in val
]
r = CommentedSeq()
r.lc.data = {}
for i in range(0, len(val)):
new_keys = keys
if doc:
if i in doc:
r.lc.data[i] = doc.lc.data[i]
new_keys.append(i)
Review comment (Member):

append mutates the list in place. Because new_keys = keys only gives the same list a second name, appending to new_keys also modifies the contents of keys, which is probably not what you intended.
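The aliasing issue raised above can be reproduced in isolation. The two helpers below are hypothetical and only mirror the loop shape from the diff; `keys + [i]` is one possible fix, not necessarily the one the PR adopted.

```python
from typing import Any, List


def child_paths_aliased(keys: List[Any], count: int) -> List[List[Any]]:
    """Mirrors the pattern under review: new_keys is just another name for keys."""
    paths = []
    for i in range(count):
        new_keys = keys  # no copy is made here
        new_keys.append(i)  # so this also grows the caller's list
        paths.append(list(new_keys))
    return paths


def child_paths_copied(keys: List[Any], count: int) -> List[List[Any]]:
    """One possible fix: build a fresh list for each child."""
    return [keys + [i] for i in range(count)]


parent = ["steps"]
aliased = child_paths_aliased(parent, 3)
parent_after_aliased = list(parent)

parent = ["steps"]
copied = child_paths_copied(parent, 3)
```

In the aliased version each sibling inherits the indices of the siblings before it (`["steps", 0, 1, 2]` for the third element), and the caller's list is changed as a side effect. The copied version yields one clean path per child and leaves `parent` untouched.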

r.append(
save(
val[i],
top=False,
base_url=base_url,
relative_uris=relative_uris,
keys=new_keys,
)
)
return r
# return [
# save(v, top=False, base_url=base_url, relative_uris=relative_uris)
# for v in val
# ]
if isinstance(val, MutableMapping):
newdict = {}
newdict = CommentedMap()
new_keys = keys
for key in val:
if doc:
if key in doc:
newdict.lc.add_kv_line_col(key, doc.lc.data[key])
new_keys.append(key)

newdict[key] = save(
val[key], top=False, base_url=base_url, relative_uris=relative_uris
val[key],
top=False,
base_url=base_url,
relative_uris=relative_uris,
keys=new_keys,
)
return newdict
# newdict = {}
# for key in val:
# newdict[key] = save(
# val[key], top=False, base_url=base_url, relative_uris=relative_uris
# )
# return newdict
tetron marked this conversation as resolved.
if val is None or isinstance(val, (int, float, bool, str)):
return val
raise Exception("Not Saveable: %s" % type(val))
@@ -697,7 +832,7 @@ def load(self, doc, baseuri, loadingOptions, docRoot=None):

def _document_load(
loader: _Loader,
doc: Union[str, MutableMapping[str, Any], MutableSequence[Any]],
doc: Union[CommentedMap, str, MutableMapping[str, Any], MutableSequence[Any]],
baseuri: str,
loadingOptions: LoadingOptions,
addl_metadata_fields: Optional[MutableSequence[str]] = None,
@@ -729,11 +864,22 @@ def _document_load(
addl_metadata=addl_metadata,
)

doc = {
k: v
for k, v in doc.items()
if k not in ("$namespaces", "$schemas", "$base")
}
# doc = {
# k: v
# for k, v in doc.items()
# if k not in ("$namespaces", "$schemas", "$base")
# }
tetron marked this conversation as resolved.
doc = copy.copy(doc)
if "$namespaces" in doc:
doc.pop("$namespaces")
if "$schemas" in doc:
doc.pop("$schemas")
if "$base" in doc:
doc.pop("$base")
tetron marked this conversation as resolved.

if isinstance(doc, CommentedMap):
global doc_line_info
doc_line_info = doc

if "$graph" in doc:
loadingOptions.idx[baseuri] = (
@@ -750,7 +896,6 @@ def _document_load(
loadingOptions.idx[docuri] = loadingOptions.idx[baseuri]

return loadingOptions.idx[baseuri]

if isinstance(doc, MutableSequence):
loadingOptions.idx[baseuri] = (
loader.load(doc, baseuri, loadingOptions),
26 changes: 26 additions & 0 deletions schema_salad/tests/count-lines6-wf_v1_0.cwl
@@ -0,0 +1,26 @@
#!/usr/bin/env cwl-runner
class: Workflow
cwlVersion: v1.0

requirements:
- class: ScatterFeatureRequirement
- class: MultipleInputFeatureRequirement

inputs:
file1: File[]
file2: File[]

outputs:
count_output:
type: int
outputSource: step1/output

steps:
step1:
run: wc3-tool_v1_0.cwl
scatter: file1
in:
file1:
source: [file1, file2]
linkMerge: merge_nested
out: [output]
26 changes: 26 additions & 0 deletions schema_salad/tests/count-lines6-wf_v1_1.cwl
@@ -0,0 +1,26 @@
#!/usr/bin/env cwl-runner
class: Workflow
cwlVersion: v1.1

requirements:
- class: ScatterFeatureRequirement
- class: MultipleInputFeatureRequirement

inputs:
file1: File[]
file2: File[]

outputs:
count_output:
type: int
outputSource: step1/output

steps:
step1:
run: wc3-tool_v1_1.cwl
scatter: file1
in:
file1:
source: [file1, file2]
linkMerge: merge_nested
out: [output]