Complete reference for read_ast() output columns.
Default output (20 columns):
| Column | Type | Description |
|---|---|---|
node_id |
BIGINT | Unique node identifier within file |
type |
VARCHAR | Tree-sitter AST node type |
semantic_type |
SEMANTIC_TYPE | Universal semantic category |
flags |
UTINYINT | Node property flags |
name |
VARCHAR | Extracted identifier name |
signature_type |
VARCHAR | Type/return type information |
parameters |
STRUCT[] | Function parameters with names and types |
modifiers |
VARCHAR[] | Access modifiers and keywords |
annotations |
VARCHAR | Decorator/annotation text |
qualified_name |
LIST<STRUCT> | Scope-based path as segment list (semantic_type + name + index), unique within a file |
file_path |
VARCHAR | Source file path |
language |
VARCHAR | Programming language |
start_line |
UINTEGER | Starting line (1-based) |
end_line |
UINTEGER | Ending line (1-based) |
parent_id |
BIGINT | Parent node ID |
depth |
UINTEGER | Tree depth (0 for root) |
sibling_index |
UINTEGER | Position among siblings |
children_count |
UINTEGER | Direct child count |
descendant_count |
UINTEGER | Total descendant count |
peek |
VARCHAR | Source code snippet |
Additional columns with source := 'full' (22 columns total):
| Column | Type | Description |
|---|---|---|
start_column |
UINTEGER | Starting column (1-based) |
end_column |
UINTEGER | Ending column (1-based) |
-- Get column positions with source := 'full'
SELECT name, start_line, start_column, end_line, end_column
FROM read_ast('example.py', source := 'full')
WHERE is_function_definition(semantic_type);Type: BIGINT
Unique identifier for each AST node within a file.
SELECT node_id, type, name
FROM read_ast('example.py')
ORDER BY node_id;- Starts at 0 for the root node
- Sequential within each file
- Use with
parent_idfor tree traversal
Type: VARCHAR
Language-specific AST node type from Tree-sitter.
-- Common types vary by language
SELECT DISTINCT type
FROM read_ast('example.py')
ORDER BY type;Python examples: function_definition, class_definition, identifier
JavaScript examples: function_declaration, class_declaration, identifier
Type: SEMANTIC_TYPE (custom logical type)
Universal semantic category. This is a custom DuckDB type that:
- Displays as string - Shows
DEFINITION_FUNCTIONinstead of numeric code - Supports string comparison -
WHERE semantic_type = 'DEFINITION_FUNCTION' - Stores efficiently - Underlying storage is UTINYINT for fast comparisons
-- Direct string comparison (natural syntax)
SELECT * FROM read_ast('example.py')
WHERE semantic_type = 'DEFINITION_FUNCTION';
-- Group by semantic type
SELECT semantic_type, COUNT(*)
FROM read_ast('example.py')
GROUP BY semantic_type
ORDER BY COUNT(*) DESC;- Cross-language semantic classification
- See Semantic Types for values
Type: UTINYINT
Bitfield containing node properties. Use helper predicates to check flags:
| Flag | Predicate | Description |
|---|---|---|
IS_CONSTRUCT |
is_construct(flags) |
Semantic construct (not punctuation) |
IS_EMBODIED |
is_embodied(flags) / has_body(flags) |
Has implementation body |
IS_DECLARATION_ONLY |
is_declaration_only(flags) |
Forward declaration without body |
IS_SYNTAX_ONLY |
is_syntax_only(flags) |
Pure syntax token (keyword, punctuation) |
-- Find function definitions with implementations (not just declarations)
SELECT name, file_path
FROM read_ast('**/*.cpp', ignore_errors := true)
WHERE is_function_definition(semantic_type)
AND has_body(flags);
-- Find forward declarations only
SELECT name, file_path
FROM read_ast('**/*.h', ignore_errors := true)
WHERE is_function_definition(semantic_type)
AND is_declaration_only(flags);Type: VARCHAR (nullable)
Extracted identifier or name for the node.
SELECT name, type, start_line
FROM read_ast('example.py')
WHERE name IS NOT NULL;- NULL for nodes without meaningful names
- Populated for definitions, identifiers, and named constructs
- Extraction depends on language-specific rules
Type: VARCHAR (nullable)
Type information extracted from the node. Meaning varies by semantic type:
| Semantic Type | signature_type Contains |
|---|---|
DEFINITION_FUNCTION |
Return type (int, void, *big.Int) |
DEFINITION_CLASS |
Class kind (class, interface, trait, struct) |
COMPUTATION_CALL |
Full call expression (obj.method, pkg.func) |
DEFINITION_VARIABLE |
Variable type |
-- Find functions with their return types
SELECT name, signature_type as return_type, start_line
FROM read_ast('src/**/*.go')
WHERE is_function_definition(semantic_type);
-- Find method calls by signature
SELECT file_path, start_line, signature_type
FROM read_ast('src/**/*.cpp')
WHERE is_function_call(semantic_type)
AND signature_type LIKE '%.empty';Type: STRUCT("name" VARCHAR, "type" VARCHAR)[]
Function parameters as an array of structs containing parameter name and type.
-- List functions with their parameters
SELECT
name,
parameters,
array_length(parameters) as param_count
FROM read_ast('example.py')
WHERE is_function_definition(semantic_type);
-- Find functions with specific parameter names
SELECT name, parameters
FROM read_ast('src/**/*.py')
WHERE is_function_definition(semantic_type)
AND list_contains([p.name FOR p IN parameters], 'self');- Empty array
[]for functions with no parameters - Parameter types may be NULL for dynamically-typed languages
Type: VARCHAR[]
Access modifiers, keywords, and other declarative attributes.
-- Find public static methods in Java
SELECT name, modifiers
FROM read_ast('src/**/*.java')
WHERE is_function_definition(semantic_type)
AND list_contains(modifiers, 'public')
AND list_contains(modifiers, 'static');
-- Find async functions
SELECT name, file_path
FROM read_ast('src/**/*.js')
WHERE is_function_definition(semantic_type)
AND list_contains(modifiers, 'async');Common modifier values by language:
- Java:
public,private,protected,static,final,abstract - JavaScript:
async,const,let,var - Go:
var - Rust:
pub,mut,async
Type: VARCHAR (nullable)
Decorator or annotation text for the node.
-- Find decorated Python functions
SELECT name, annotations
FROM read_ast('src/**/*.py')
WHERE is_function_definition(semantic_type)
AND annotations IS NOT NULL;Type: LIST(STRUCT(semantic_type SEMANTIC_TYPE, name VARCHAR, index INTEGER)) (nullable)
Scope-based definition path that uniquely identifies a node within its file. Each element of the list is one segment of the path from outermost scope to innermost. Built during AST traversal by tracking nested definition scopes.
Segment fields:
| Field | Type | Meaning |
|---|---|---|
semantic_type |
SEMANTIC_TYPE |
The semantic role of this scope level (e.g. DEFINITION_FUNCTION, DEFINITION_CLASS, DEFINITION_VARIABLE, DEFINITION_MODULE, EXTERNAL_IMPORT, EXTERNAL_EXPORT). Cast to VARCHAR for the long-form name. |
name |
VARCHAR |
The identifier name at that scope level. |
index |
INTEGER |
1-based occurrence index for disambiguating same-name collisions within a scope. The first x = ... in a scope gets index = 1; the second gets index = 2; and so on. Always explicit. |
-- Inspect the raw segment list
SELECT name, qualified_name
FROM read_ast('src/models.py')
WHERE name = 'validate';
-- validate [{semantic_type: DEFINITION_CLASS, name: User, index: 1},
-- {semantic_type: DEFINITION_FUNCTION, name: validate, index: 1}]
-- Filter by semantic role at any depth (e.g. all definitions inside a class)
SELECT name
FROM read_ast('src/**/*.py')
WHERE len(list_filter(qualified_name,
s -> is_class_definition(s.semantic_type))) > 0;
-- Innermost enclosing class name
SELECT name,
(list_reverse(list_filter(qualified_name,
s -> is_class_definition(s.semantic_type))))[1].name AS in_class
FROM read_ast('src/**/*.py')
WHERE is_function_definition(semantic_type);
-- Top-level definitions only (depth = 1)
SELECT name
FROM read_ast('src/**/*.py')
WHERE len(qualified_name) = 1 AND is_function_definition(semantic_type);
-- Innermost segment
SELECT name, qualified_name[-1].name, qualified_name[-1].index
FROM read_ast('src/**/*.py')
WHERE is_definition(semantic_type);
-- Composite join key across parses
SELECT a.name, a.start_line, b.start_line
FROM read_ast('v1/**/*.py') a
JOIN read_ast('v2/**/*.py') b USING (file_path, qualified_name);Collision disambiguation. Within a single file, each (semantic_type, name) pair in a given scope gets a monotonically increasing index. The first occurrence has index = 1, the second has index = 2, and so on. This makes qualified_name unique within a file for every named definition.
counter = 0 # [{variable, counter, 1}]
counter = 1 # [{variable, counter, 2}]
counter = 2 # [{variable, counter, 3}]Counters reset at scope boundaries — each function/class gets a fresh counter, so [{function, foo}, {variable, x}] and [{function, bar}, {variable, x}] both use index = 1 for their x.
String form. For display, logging, or test assertions, ast_qualified_name_as_string(qualified_name) renders the list into the legacy bracket form:
SELECT name, ast_qualified_name_as_string(qualified_name)
FROM read_ast('src/models.py')
WHERE name = 'validate';
-- validate C[User] F[validate]The bracket form uses single-letter prefixes (F/C/V/M/I/E) and omits the [N] suffix for index = 1, so V[x] means "the first x" and V[x][2] means "the second x in that scope".
Key properties:
- NULL for non-definition nodes (the
qualified_namecolumn holds a NULL list, not an empty one) - Only populated for named definition nodes (functions, classes, variables, modules, imports, exports)
- Available at
context := 'normalized'and above - Unique within a file — every named definition node gets a distinct segment list
- Joinable:
USING (file_path, qualified_name)works as a cross-parse composite key
Type: VARCHAR
Path to the source file.
SELECT DISTINCT file_path
FROM read_ast('src/**/*.py');- Relative paths preserved from input
- Absolute paths if provided as input
Type: VARCHAR
Detected or specified programming language.
SELECT language, COUNT(*) as node_count
FROM read_ast(['**/*.py', '**/*.js'], ignore_errors := true)
GROUP BY language;Type: UINTEGER
Line numbers (1-based).
SELECT name, start_line, end_line, end_line - start_line + 1 as line_count
FROM read_ast('example.py')
WHERE type = 'function_definition';Type: UINTEGER
Column positions (1-based). Only available with source := 'full'.
-- Must use source := 'full' to get column positions
SELECT name, start_line, start_column, end_line, end_column
FROM read_ast('example.py', source := 'full')
WHERE type = 'identifier';Type: BIGINT (nullable)
Node ID of the parent node.
-- Find children of a specific node
SELECT type, name
FROM read_ast('example.py')
WHERE parent_id = 5;- NULL for root node
- Use for tree traversal
Type: UINTEGER
Tree depth (0 for root).
-- Find deeply nested code
SELECT type, name, depth
FROM read_ast('example.py')
WHERE depth > 5
ORDER BY depth DESC;Type: UINTEGER
Position among siblings (0-based).
SELECT type, sibling_index
FROM read_ast('example.py')
WHERE parent_id = 0
ORDER BY sibling_index;Type: UINTEGER
Number of direct child nodes.
SELECT type, name, children_count
FROM read_ast('example.py')
WHERE children_count > 10;Type: UINTEGER
Total number of descendant nodes. Useful as a complexity metric.
-- Find complex functions
SELECT name, descendant_count as complexity
FROM read_ast('example.py')
WHERE type = 'function_definition'
ORDER BY complexity DESC;Type: VARCHAR (nullable)
Source code snippet for the node.
SELECT type, name, peek
FROM read_ast('example.py', peek := 100)
WHERE type = 'function_definition';- Size controlled by
peekparameter - NULL if
peek := 'none' - Use
peek := 'full'for complete source text
| Column | 'none' |
'node_types_only' |
'normalized' |
'native' |
|---|---|---|---|---|
node_id |
Yes | Yes | Yes | Yes |
type |
Yes | Yes | Yes | Yes |
semantic_type |
No | Yes | Yes | Yes |
flags |
No | Yes | Yes | Yes |
name |
No | No | Yes | Yes |
signature_type |
No | No | No | Yes |
parameters |
No | No | No | Yes |
modifiers |
No | No | No | Yes |
annotations |
No | No | No | Yes |
qualified_name |
No | No | Yes | Yes |
file_path |
Yes | Yes | Yes | Yes |
language |
Yes | Yes | Yes | Yes |
start_line |
Yes | Yes | Yes | Yes |
end_line |
Yes | Yes | Yes | Yes |
parent_id |
Yes* | Yes* | Yes* | Yes |
depth |
Yes* | Yes* | Yes* | Yes |
sibling_index |
Yes* | Yes* | Yes* | Yes |
children_count |
Yes* | Yes* | Yes* | Yes |
descendant_count |
Yes* | Yes* | Yes* | Yes |
peek |
Yes** | Yes** | Yes** | Yes** |
* Depends on structure parameter
** Depends on peek parameter
| Column | 'none' |
'path' |
'lines_only' |
'lines' |
'full' |
|---|---|---|---|---|---|
file_path |
No | Yes | Yes | Yes | Yes |
start_line |
No | No | Yes | Yes | Yes |
end_line |
No | No | Yes | Yes | Yes |
start_column |
No | No | No | No | Yes |
end_column |
No | No | No | No | Yes |
-- Default: line positions only
SELECT start_line, end_line FROM read_ast('example.py');
-- With source := 'full': includes column positions
SELECT start_line, start_column, end_line, end_column
FROM read_ast('example.py', source := 'full');CREATE TABLE ast_cache AS
SELECT * FROM read_ast('src/**/*.py', ignore_errors := true);
-- Query cached data
SELECT type, COUNT(*) FROM ast_cache GROUP BY type;COPY (
SELECT file_path, type, name, start_line, semantic_type
FROM read_ast('src/**/*.py')
) TO 'ast_export.parquet';-- Join with file metadata
SELECT
ast.file_path,
ast.name,
files.last_modified
FROM read_ast('src/**/*.py') ast
JOIN file_metadata files ON ast.file_path = files.path
WHERE ast.type = 'function_definition';- Core Functions - Function reference
- Parameters - Parameter reference
- Semantic Types - Type system
- Native Extraction Semantics - Cross-language field behavior