Ensure all body content can have multiple types
We've found that content with multiple types pops up in more places than
we originally thought.

- Use the new `BodyContent` parser for all extracted values in
  `PublishingApi::Content`
- Add new `manual_section` message fixture as a regression integration
  test (where we saw it failing)
csutter committed Nov 24, 2023
1 parent 143f53d commit 0ae3744
Showing 4 changed files with 218 additions and 3 deletions.
9 changes: 6 additions & 3 deletions app/models/concerns/publishing_api/content.rb
@@ -23,15 +23,18 @@ module Content

# Extracts a single string of indexable unstructured content from the document.
def content
-  values_from_json_paths = INDEXABLE_CONTENT_VALUES_JSON_PATHS.map { _1.on(document_hash) }
-  values_from_parts = document_hash.dig(:details, :parts)&.map do
+  values_from_json_paths = INDEXABLE_CONTENT_VALUES_JSON_PATHS.map do |item|
+    item.on(document_hash).map { |body| BodyContent.new(body).html_content }
+  end
+  values_from_parts = document_hash.dig(:details, :parts)&.map do |part|
     # Add the part title as a heading to help the search model better understand the structure
     # of the content
-    ["<h1>#{_1[:title]}</h1>", BodyContent.new(_1[:body]).html_content]
+    ["<h1>#{part[:title]}</h1>", BodyContent.new(part[:body]).html_content]
   end

[*values_from_json_paths, *values_from_parts]
.flatten
.compact
.join(INDEXABLE_CONTENT_SEPARATOR)
.truncate_bytes(INDEXABLE_CONTENT_MAX_BYTE_SIZE)
end
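
To illustrate why every extracted value is now wrapped in `BodyContent`, here is a minimal sketch of what such a parser might do. It is not the repository's actual class: the name `BodyContentSketch`, the helper `extract_content`, and the assumed shape of a multi-typed body (an array of hashes with `:content_type` and `:content` keys) are all hypothetical and only serve to show the idea.

```ruby
# Hypothetical sketch, NOT the real BodyContent implementation.
# Assumption: a body is either a plain String of HTML, or an Array of
# hashes such as { content_type: "text/html", content: "<p>...</p>" }.
class BodyContentSketch
  HTML_CONTENT_TYPE = "text/html"

  def initialize(raw)
    @raw = raw
  end

  # Returns the HTML representation of the body, or nil if there is none.
  def html_content
    case @raw
    when String
      # Single-typed content: assume it is already HTML.
      @raw
    when Array
      # Multi-typed content: pick the entry declared as HTML, if any.
      html = @raw.find { |item| item.is_a?(Hash) && item[:content_type] == HTML_CONTENT_TYPE }
      html&.dig(:content)
    end
  end
end

# Hypothetical helper mirroring the shape of the method in the diff:
# parse every value, drop the ones without HTML, and join the rest.
def extract_content(values, separator: "\n")
  values.map { |value| BodyContentSketch.new(value).html_content }.compact.join(separator)
end
```

Under these assumptions, passing each value through the same parser means a field that unexpectedly arrives as an array of typed representations is reduced to its HTML rather than being joined into the indexable content as raw hashes. Values with no HTML representation come back as `nil`, which is why a `compact` step before joining is useful.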