Extend Power Query capabilities using Python as backend system
A FastAPI-based backend service that provides advanced text processing capabilities through HTTP endpoints, specifically designed to extend Power Query functionality.
-
Regex Functionality - Complete regex operations via REST API
- Pattern matching
- Find all matches
- Text substitution with backreferences
- Text splitting
- Pattern validation
- Support for all standard regex flags
- Bulk Operations - Optimized processing for multiple texts
-
JSONPath Functionality - Query and filter JSON data via REST API
- Search JSON data with JSONPath expressions
- Load and search JSON files
- Multiple JSONPath queries on single data
- Batch processing with file loading
- Complete JSONPath syntax support
- XPath
- HTML Selector Functionality
- Fuzzy Match
- Clone the repository:
git clone <repository-url>
cd extend-pq
- Install dependencies:
pip install -r requirements.txt
- Run the server:
python run.py
The API will be available at http://localhost:28080
Once the server is running, visit:
- Interactive API Docs: http://localhost:28080/docs
- ReDoc Documentation: http://localhost:28080/redoc
POST /match
- Find the first match of a pattern
{
"pattern": "\\b\\w+@\\w+\\.\\w+\\b",
"text": "Contact us at [email protected]",
"flags": ["IGNORECASE"]
}
POST /findall
- Find all matches of a pattern
{
"pattern": "\\d+",
"text": "There are 123 apples and 456 oranges",
"flags": []
}
POST /substitute
- Replace matches with replacement text
{
"pattern": "(\\w+)@(\\w+)\\.(\\w+)",
"replacement": "[EMAIL: \\1 at \\2 dot \\3]",
"text": "Contact [email protected]",
"count": 0,
"flags": ["IGNORECASE"]
}
POST /split
- Split text using regex pattern
{
"pattern": "[,;]\\s*",
"text": "apple, banana; cherry, date",
"maxsplit": 0,
"flags": []
}
POST /validate
- Validate regex pattern syntax
{
"pattern": "\\b\\w+@\\w+\\.\\w+\\b"
}
GET /flags
- Get list of supported regex flags
IGNORECASE
- Case-insensitive matchingMULTILINE
- ^ and $ match start/end of each lineDOTALL
- . matches any character including newlineVERBOSE
- Allow verbose regex with commentsASCII
- ASCII-only matching for \w, \d, \sLOCALE
- Locale-dependent matching
Optimized for processing multiple texts with the same pattern. The regex pattern is compiled once and applied to all texts, providing significant performance improvements.
POST /match
- Find first match in multiple texts
{
"pattern": "\\b\\w+@\\w+\\.\\w+\\b",
"texts": [
"Email me at [email protected]",
"Contact [email protected]",
"No email here"
],
"flags": ["IGNORECASE"]
}
POST /findall
- Find all matches in multiple texts
{
"pattern": "\\d+",
"texts": [
"Room 123 and 456",
"No numbers here",
"Years: 2023, 2024, 2025"
],
"flags": []
}
POST /substitute
- Replace matches in multiple texts
{
"pattern": "(\\w+)@(\\w+)\\.(\\w+)",
"replacement": "[EMAIL: \\1 at \\2 dot \\3]",
"texts": [
"Contact [email protected]",
"Email [email protected]"
],
"count": 0,
"flags": ["IGNORECASE"]
}
POST /split
- Split multiple texts using regex pattern
{
"pattern": "[,;]\\s*",
"texts": [
"apple, banana, cherry",
"red; green; blue",
"single-item"
],
"maxsplit": 0,
"flags": []
}
GET /info
- Get information about bulk operations
JSONPath is a query language for JSON, similar to XPath for XML. It allows you to extract and filter data from JSON documents using expressions.
POST /search
- Search JSON data with JSONPath expression
{
"json_data": "{\"store\":{\"book\":[{\"title\":\"Book 1\",\"price\":10},{\"title\":\"Book 2\",\"price\":15}]}}",
"jsonpath": "$.store.book[*].title"
}
POST /load-and-search
- Load JSON file and search with JSONPath
{
"file_path": "C:\\data\\products.json",
"jsonpath": "$.products[?(@.price < 50)].name"
}
POST /search-all
- Apply multiple JSONPath expressions to JSON data
{
"json_data": "{\"users\":[{\"name\":\"John\",\"age\":30},{\"name\":\"Jane\",\"age\":25}]}",
"jsonpaths": [
"$.users[*].name",
"$.users[?(@.age > 25)].name",
"$.users[0]"
]
}
POST /load-and-search-all
- Load JSON file and apply multiple JSONPath expressions
{
"file_path": "C:\\data\\inventory.json",
"jsonpaths": [
"$.products[*].name",
"$.products[?(@.stock > 0)]",
"$.categories[*]"
]
}
GET /info
- Get JSONPath syntax help and examples
$
- Root element$.store.book[*]
- All books in store$.store.book[0]
- First book$.store.book[-1]
- Last book$.store.book[0:2]
- First two books (slice)$.store.book[?(@.price < 10)]
- Books with price less than 10$..author
- All authors (recursive descent)$.store.*
- All things in store
- Pattern Compilation: Regex compiled once for all texts
- Error Isolation: Individual text failures don't stop processing
- Detailed Tracking: Per-text success/failure status
- Performance Metrics: Processing time measurement
- Aggregated Statistics: Total matches, substitutions, splits across all texts
Performance Example: Processing 100 texts is typically 5-10x faster using bulk operations compared to 100 individual requests.
extend-pq/
βββ main.py # FastAPI application entry point
βββ run.py # Server startup script
βββ requirements.txt # Python dependencies
βββ models/ # Pydantic data models
β βββ __init__.py
β βββ regex_models.py # Standard regex request/response models
β βββ bulk_regex_models.py # Bulk regex operations models
β βββ jsonpath_models.py # JSONPath operations models
βββ services/ # Business logic
β βββ __init__.py
β βββ regex_service.py # Standard regex operations service
β βββ bulk_regex_service.py # Bulk regex operations service
β βββ jsonpath_service.py # JSONPath operations service
βββ routes/ # API endpoints
β βββ __init__.py
β βββ regex_routes.py # Standard regex API routes
β βββ bulk_regex_routes.py # Bulk regex operations API routes
β βββ jsonpath_routes.py # JSONPath operations API routes
βββ examples/ # Usage examples
βββ test_regex_api.py # Standard API testing examples
βββ test_bulk_regex_api.py # Bulk operations testing examples
import requests
# Test regex match
response = requests.post("http://localhost:28080/regex/match", json={
"pattern": r"\b\w+@\w+\.\w+\b",
"text": "Email me at [email protected]",
"flags": ["IGNORECASE"]
})
result = response.json()
if result["match"]:
print(f"Found email: {result['match']['match']}")
import requests
# Search JSON data with JSONPath
response = requests.post("http://localhost:28080/jsonpath/search", json={
"json_data": '{"store":{"book":[{"title":"Book 1","price":10},{"title":"Book 2","price":15}]}}',
"jsonpath": "$.store.book[*].title"
})
result = response.json()
if result["success"]:
print(f"Found {result['matches_found']} matches: {result['result']}")
# Load and search JSON file
response = requests.post("http://localhost:28080/jsonpath/load-and-search", json={
"file_path": "C:\\data\\products.json",
"jsonpath": "$.products[?(@.price < 50)]"
})
# Multiple JSONPath searches
response = requests.post("http://localhost:28080/jsonpath/search-all", json={
"json_data": '{"users":[{"name":"John","age":30},{"name":"Jane","age":25}]}',
"jsonpaths": [
"$.users[*].name",
"$.users[?(@.age > 25)].name"
]
})
result = response.json()
print(f"Processed {result['total_jsonpaths']} JSONPath expressions")
for res in result['results']:
if res['success']:
print(f"JSONPath '{res['jsonpath']}': {res['matches_found']} matches")
import requests
# Process multiple texts with one API call
response = requests.post("http://localhost:28080/regex/bulk/match", json={
"pattern": r"\b\w+@\w+\.\w+\b",
"texts": [
"Email me at [email protected]",
"Contact [email protected]",
"No email in this text"
],
"flags": ["IGNORECASE"]
})
result = response.json()
print(f"Processed {result['total_texts']} texts in {result['processing_time_ms']}ms")
print(f"Successful: {result['successful_operations']}, Failed: {result['failed_operations']}")
for res in result['results']:
if res['success'] and res['match']:
print(f"Found: {res['match']['match']} in text {res['text_index']}")
You can call these endpoints from Power Query using Web.Contents()
:
let
// Create the request payload
RequestData = [
pattern = "\b\w+@\w+\.\w+\b",
text = "Contact [email protected]",
flags = {"IGNORECASE"}
],
// Convert to JSON and make the API call
Source = Json.Document(Web.Contents("http://localhost:28080/regex/match", [
Headers = [#"Content-Type"="application/json"],
Content = Text.ToBinary(Json.FromValue(RequestData))
]))
in
Source
let
// Create the request payload for bulk operation
RequestData = [
pattern = "\b\w+@\w+\.\w+\b",
texts = {"Contact [email protected]", "Email [email protected]", "No email here"},
flags = {"IGNORECASE"}
],
// Convert to JSON and make the API call
Source = Json.Document(Web.Contents("http://localhost:28080/regex/bulk/match", [
Headers = [#"Content-Type"="application/json"],
Content = Text.ToBinary(Json.FromValue(RequestData))
]))
in
Source
let
// Create the request payload
RequestData = [
pattern = "\b\w+@\w+\.\w+\b",
text = "Contact [email protected]",
flags = {"IGNORECASE"}
],
// Make the API call
ApiResponse = Json.Document(Web.Contents("http://localhost:28080/regex/match", [
Headers = [#"Content-Type"="application/json"],
Content = Text.ToBinary(Json.FromValue(RequestData))
])),
// Extract the match result
MatchFound = if ApiResponse[success] and ApiResponse[match] <> null
then ApiResponse[match][match]
else null,
// Create final result
Result = [
Success = ApiResponse[success],
Pattern = ApiResponse[pattern],
InputText = ApiResponse[text],
MatchedText = MatchFound,
ErrorMessage = ApiResponse[error]
]
in
Result
let
// Create JSONPath search request
RequestData = [
json_data = "{""users"":[{""name"":""John"",""age"":30},{""name"":""Jane"",""age"":25}]}",
jsonpath = "$.users[*].name"
],
// Make the API call
ApiResponse = Json.Document(Web.Contents("http://localhost:28080/jsonpath/search", [
Headers = [#"Content-Type"="application/json"],
Content = Text.ToBinary(Json.FromValue(RequestData))
])),
// Extract results
ExtractedData = if ApiResponse[success] then ApiResponse[result] else null,
// Create final result
Result = [
Success = ApiResponse[success],
JSONPath = ApiResponse[jsonpath],
MatchesFound = ApiResponse[matches_found],
Data = ExtractedData,
ErrorMessage = ApiResponse[error]
]
in
Result
let
// Create file load and search request
RequestData = [
file_path = "C:\\data\\products.json",
jsonpath = "$.products[?(@.price < 100)].name"
],
// Make the API call
ApiResponse = Json.Document(Web.Contents("http://localhost:28080/jsonpath/load-and-search", [
Headers = [#"Content-Type"="application/json"],
Content = Text.ToBinary(Json.FromValue(RequestData))
])),
// Convert result to table if it's a list
ResultData = if ApiResponse[success] and ApiResponse[result] <> null then
if Value.Is(ApiResponse[result], type list) then
Table.FromList(ApiResponse[result], Splitter.SplitByNothing(), {"ProductName"})
else
Table.FromRecords({[ProductName = ApiResponse[result]]})
else
Table.FromRecords({[Error = ApiResponse[error]]}),
// Add metadata
FinalResult = Table.AddColumn(ResultData, "Metadata", each [
FilePath = ApiResponse[file_path],
MatchesFound = ApiResponse[matches_found],
FileSizeBytes = ApiResponse[file_size_bytes]
])
in
FinalResult
The project is structured to easily add new capabilities:
- Models: Add request/response models in
models/
package - Services: Add business logic in
services/
package - Routes: Add API endpoints in
routes/
package - Integration: Register routes in
main.py
Copy .env.example
to .env
and adjust settings as needed.
All endpoints return structured responses with:
success
: Operation success statuserror
: Error message (if any)- Operation-specific data fields
Example response:
{
"success": true,
"pattern": "\\d+",
"text": "123 apples",
"match": {
"match": "123",
"groups": [],
"start": 0,
"end": 3
},
"error": null
}
- Fork the repository
- Create a feature branch
- Add your functionality following the existing patterns
- Test your changes
- Submit a pull request
[Add your license information here]