✨ Add flat gdoc components table #4237

Merged on Dec 13, 2024 (12 commits)

@danyx23 danyx23 commented Dec 1, 2024

This PR adds a new table, posts_gdocs_components, that contains a flattened list of every component in every gdoc.

For each gdoc, the tree in $.content.body is traversed recursively. Each component is copied without its children and stored as its own row, and the children are then visited in turn. Span arrays are converted to plain text.
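The traversal described above could be sketched roughly as follows. This is an illustration, not the code in this PR. The type names, the `flattenComponents` helper, the rule for telling child components (objects with a `type` key) apart from spans (objects with a `spanType` key), and the exact path format are all assumptions made for the example.

```typescript
// Hypothetical, simplified JSON types; the real gdoc component types are richer.
type Json = string | number | boolean | null | Json[] | { [key: string]: Json }
type JsonObject = { [key: string]: Json }

interface ComponentRow {
    path: string // JSON path of this component
    parent: string | null // JSON path of the parent component, null at the top level
    config: JsonObject // copy of the component without child components
}

const isObject = (v: Json): v is JsonObject =>
    typeof v === "object" && v !== null && !Array.isArray(v)
// Assumption: spans carry a `spanType` key, components carry a `type` key.
const isSpan = (v: Json): v is JsonObject => isObject(v) && "spanType" in v
const isComponent = (v: Json): v is JsonObject => isObject(v) && "type" in v

// Concatenate the text of a span array, descending into nested spans.
function spansToText(spans: Json[]): string {
    return spans
        .map((span) => {
            if (!isObject(span)) return ""
            if (typeof span.text === "string") return span.text
            return Array.isArray(span.children) ? spansToText(span.children) : ""
        })
        .join("")
}

// Emit one row per component, then recurse into its child component arrays.
function flattenComponents(
    nodes: Json[],
    basePath: string,
    parent: string | null = null,
    rows: ComponentRow[] = []
): ComponentRow[] {
    nodes.forEach((node, index) => {
        if (!isComponent(node)) return
        const path = `${basePath}[${index}]`
        const config: JsonObject = {}
        const childLists: [string, Json[]][] = []
        for (const [key, value] of Object.entries(node)) {
            if (Array.isArray(value) && value.length > 0 && value.every(isSpan)) {
                config[key] = spansToText(value) // spans become plain text
            } else if (Array.isArray(value) && value.length > 0 && value.every(isComponent)) {
                childLists.push([key, value]) // children are left out of the copy
            } else {
                config[key] = value
            }
        }
        rows.push({ path, parent, config })
        for (const [key, children] of childLists) {
            flattenComponents(children, `${path}.${key}`, path, rows)
        }
    })
    return rows
}
```

Calling `flattenComponents(body, "$.content.body")` on a body array yields one row per component, with parents listed before their children.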

A new script, reconstructPostsGdocsComponents, fills the table initially. Whenever a gdoc is saved, its components are updated in the new posts_gdocs_components table.

To test this, run the migration and then the reconstructPostsGdocsComponents script. Then query the new table, e.g.:

select * from posts_gdocs_components where gdocId = 'SOME-ID';
select * from posts_gdocs_components where config ->> '$.type' = 'image';

select gdocId, count(*)
from posts_gdocs_components
group by gdocId
order by count(*) desc
limit 10;

The path and parent columns are JSON path expressions, so you can use them to query back into the gdoc.content column if you want the original component (e.g. if you care about the spans).
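If the gdoc JSON is already loaded in application code, a stored path can also be followed there. The sketch below is a minimal resolver for the simple subset of JSON path syntax made up of `$`, `.key` and `[index]` steps. `resolveJsonPath` is a hypothetical helper, not part of this PR, and `root` must be whatever document the stored paths are relative to.

```typescript
// Hypothetical helper: follow a simple JSON path such as "$.content.body[1].left[0]".
// Supports only `.key` and `[index]` steps; quoted keys and wildcards are not handled.
function resolveJsonPath(root: unknown, path: string): unknown {
    if (!path.startsWith("$")) throw new Error(`Unsupported path: ${path}`)
    // Sticky regex: each step must start exactly where the previous one ended.
    const step = /\.([A-Za-z_]\w*)|\[(\d+)\]/y
    step.lastIndex = 1 // skip the leading "$"
    let current: unknown = root
    while (step.lastIndex < path.length) {
        const match = step.exec(path)
        if (match === null) throw new Error(`Unsupported path: ${path}`)
        const key: string | number = match[1] !== undefined ? match[1] : Number(match[2])
        // A missing intermediate value resolves to undefined instead of throwing.
        if (typeof current !== "object" || current === null) return undefined
        current = (current as Record<string | number, unknown>)[key]
    }
    return current
}
```

Given a row's path value and the parsed gdoc, `resolveJsonPath(gdoc, row.path)` returns the original component with its spans and children intact.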

@danyx23 danyx23 changed the title Add flat gdoc components table ✨ Add flat gdoc components table Dec 2, 2024
@danyx23 danyx23 force-pushed the gdoc-components-table branch 2 times, most recently from e04cb1e to d26ae5d Compare December 11, 2024 10:42
@danyx23 danyx23 force-pushed the gdoc-components-table branch from d26ae5d to 769f953 Compare December 12, 2024 18:37
@danyx23 danyx23 merged commit bea23fc into master Dec 13, 2024
16 of 18 checks passed
@danyx23 danyx23 deleted the gdoc-components-table branch December 13, 2024 14:12