Merge pull request #124 from alphagov/parse-all-body-content
Ensure all body content can have multiple types
csutter authored Nov 24, 2023
2 parents 72f46ea + 0ae3744 commit fc3d352
Showing 4 changed files with 218 additions and 3 deletions.
9 changes: 6 additions & 3 deletions app/models/concerns/publishing_api/content.rb
@@ -23,15 +23,18 @@ module Content

# Extracts a single string of indexable unstructured content from the document.
def content
- values_from_json_paths = INDEXABLE_CONTENT_VALUES_JSON_PATHS.map { _1.on(document_hash) }
- values_from_parts = document_hash.dig(:details, :parts)&.map do
+ values_from_json_paths = INDEXABLE_CONTENT_VALUES_JSON_PATHS.map do |item|
+   item.on(document_hash).map { |body| BodyContent.new(body).html_content }
+ end
+ values_from_parts = document_hash.dig(:details, :parts)&.map do |part|
# Add the part title as a heading to help the search model better understand the structure
# of the content
- ["<h1>#{_1[:title]}</h1>", BodyContent.new(_1[:body]).html_content]
+ ["<h1>#{part[:title]}</h1>", BodyContent.new(part[:body]).html_content]
end

[*values_from_json_paths, *values_from_parts]
.flatten
.compact
.join(INDEXABLE_CONTENT_SEPARATOR)
.truncate_bytes(INDEXABLE_CONTENT_MAX_BYTE_SIZE)
end
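
To illustrate what the updated `content` method appears to do, here is a minimal, standalone sketch in plain Ruby. It is not the repository's code. `BodyContentSketch`, `truncate_to_bytes`, `indexable_content`, the `SEPARATOR` value, and the `content_type`/`content` hash keys are all assumptions made for illustration, since `BodyContent` and the constants are not shown in this diff. The byte truncation is a simplified stand-in for ActiveSupport's `truncate_bytes` and does not append an omission marker.

```ruby
# Hypothetical stand-in for the repository's BodyContent class (not shown in the diff).
class BodyContentSketch
  def initialize(body)
    @body = body
  end

  # Assumption: a body is either a plain HTML string or an array of typed
  # representations such as { content_type: "text/html", content: "..." }.
  # Any other value (including nil) yields nil.
  def html_content
    case @body
    when String
      @body
    when Array
      @body.find { |item| item[:content_type] == "text/html" }&.dig(:content)
    end
  end
end

SEPARATOR = "\n".freeze # assumed value; the real separator constant is not shown

# Simplified replacement for ActiveSupport's String#truncate_bytes:
# cut at the byte limit, then drop any partial multi-byte character.
def truncate_to_bytes(string, limit)
  string.byteslice(0, limit).scrub("")
end

# bodies: values already extracted from the document (stands in for the JSON path lookups)
# parts:  array of { title:, body: } hashes, or nil when the document has no parts
def indexable_content(bodies, parts, max_bytes: 1_000)
  values_from_bodies = bodies.map { |body| BodyContentSketch.new(body).html_content }
  values_from_parts = parts&.map do |part|
    ["<h1>#{part[:title]}</h1>", BodyContentSketch.new(part[:body]).html_content]
  end

  # Splatting nil yields nothing, so documents without parts are handled here.
  combined = [*values_from_bodies, *values_from_parts].flatten.compact.join(SEPARATOR)
  truncate_to_bytes(combined, max_bytes)
end

bodies = [
  "<p>a</p>",
  [{ content_type: "text/govspeak", content: "x" }, { content_type: "text/html", content: "<p>b</p>" }],
  nil,
]
parts = [{ title: "T", body: "<p>c</p>" }]
puts indexable_content(bodies, parts)
```

In this sketch, `flatten` followed by `compact` means a body with no HTML representation is dropped rather than contributing an empty segment between separators.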