AI-Powered Document Summarizer for Enterprise Applications: A Low-Code Approach Using Microsoft Power Platform and Azure OpenAI
Ranjith Kumar Neela
Abstract
Organizations today face an overwhelming challenge: processing vast amounts of textual information efficiently. While artificial intelligence has made significant strides in natural language processing, implementing these solutions often requires substantial technical expertise and development resources. This paper presents a practical implementation of an AI-powered document summarization system built using Microsoft Power Platform and Azure OpenAI (GPT-4o). The approach demonstrates how low-code platforms can democratize access to advanced AI capabilities, enabling business users to create sophisticated solutions without extensive programming knowledge. Through empirical evaluation, I achieved a summarization accuracy of 92% (ROUGE-L score) with an average processing latency of 2.5 seconds per document. User satisfaction surveys indicated a rating of 4.8 out of 5, suggesting strong acceptance of the system. My findings suggest that low-code AI solutions can effectively bridge the gap between advanced machine learning capabilities and practical business applications.
The exponential growth of digital documentation has created a paradox in modern organizations. While I have more information available than ever before, extracting meaningful insights from this data becomes increasingly difficult. Employees spend countless hours reading lengthy reports, contracts, and technical documents, often searching for specific information that may occupy just a few paragraphs within hundreds of pages. This inefficiency not only wastes valuable time but also increases the risk of missing critical information buried within extensive documentation.
Traditional approaches to document management have relied on manual summarization or simple keyword extraction. However, these methods fall short when dealing with complex documents that require understanding context, relationships between concepts, and nuanced interpretations. Recent advances in large language models, particularly those based on transformer architectures, have shown remarkable capabilities in understanding and generating human-like text. Yet, implementing these technologies typically requires specialized machine learning expertise, significant computational resources, and months of development effort.
The emergence of low-code platforms presents an intriguing opportunity. These platforms promise to make complex technologies accessible to a broader audience, including business analysts, process owners, and citizen developers who may lack formal programming training. Microsoft Power Platform, in particular, has gained traction in enterprise environments due to its integration with existing Microsoft ecosystems and its visual development approach. When combined with Azure OpenAI Service, which provides access to state-of-the-art language models through simple API calls, I hypothesized that it would be possible to create an effective document summarization system without writing extensive custom code.
This paper documents my journey in building such a system. Rather than presenting a purely theoretical framework, I focus on the practical challenges encountered during implementation, the design decisions made along the way, and the real-world performance characteristics observed during testing. My goal is to provide a roadmap that other practitioners can follow, complete with the pitfalls to avoid and the strategies that proved most effective.
My primary objective was to evaluate whether a low-code approach could deliver document summarization capabilities that meet enterprise requirements for accuracy, speed, and usability. Specifically, I sought to answer three questions:
First, can a low-code implementation achieve summarization quality comparable to custom-coded solutions? I measured this through standard metrics like ROUGE scores, comparing my system's output against human-generated summaries.
Second, what are the performance characteristics of such a system in terms of processing speed and resource consumption? Enterprise applications must handle realistic workloads, so I needed to understand latency, throughput, and cost implications.
Third, how do end users perceive and interact with a low-code AI system? Technical performance metrics tell only part of the story; user acceptance and satisfaction are equally important for successful deployment.
This research focuses specifically on extractive and abstractive summarization of business documents in English, including reports, proposals, meeting minutes, and similar content. I did not address specialized domains like medical literature or legal documents, which may require domain-specific training or fine-tuning. The system was designed for documents ranging from a few hundred words to approximately 50 pages, as this covers the majority of business documentation I encountered in my target environment.
Additionally, while I discuss the integration architecture and implementation patterns, I do not delve deeply into the internal workings of the GPT models themselves. My focus remains on the application layer and the practical considerations of building production-ready systems using these models as services.
Understanding document summarization requires examining both the evolution of natural language processing techniques and the recent developments in low-code platforms. These two domains have largely evolved independently, but their convergence creates interesting possibilities.
Document summarization has been studied extensively in computational linguistics and information retrieval. Traditional approaches fall into two broad categories: extractive and abstractive methods. Extractive summarization identifies the most important sentences or passages from the original document and presents them verbatim. This approach has the advantage of preserving the original wording and avoiding the risk of introducing factual errors. However, extractive summaries can feel disjointed, as they lack the narrative flow of naturally written text.
Abstractive summarization, by contrast, generates new text that captures the essence of the original document. This more closely mimics how humans summarize content, paraphrasing and synthesizing information rather than simply copying sentences. Early abstractive systems relied on template-based approaches or rule-based text generation, which produced rigid and often unnatural output. The advent of neural sequence-to-sequence models marked a significant improvement, but these systems struggled with longer documents and often generated summaries that, while fluent, contained factual inaccuracies or hallucinations.
The introduction of transformer-based models, particularly BERT and GPT architectures, represented a breakthrough. These models demonstrated unprecedented ability to understand context over long passages and generate coherent, contextually appropriate text. Liu et al. (2024) explored the robustness of large language models for retrieval-augmented generation in summarization tasks, finding that combining retrieval mechanisms with generation significantly improved both accuracy and factual consistency. Their work highlighted the importance of grounding generated summaries in retrieved context, a principle I adopted in my implementation.
Retrieval-Augmented Generation, or RAG, has emerged as a powerful pattern for improving the reliability of language model outputs. Rather than relying solely on the model's parametric knowledge (information encoded in its weights during training), RAG systems first retrieve relevant information from a knowledge base and then use that information to guide generation. This approach offers several advantages: it reduces hallucinations by grounding responses in actual source material, allows the system to access up-to-date information beyond the model's training cutoff, and provides a degree of explainability by showing which sources informed the generated output.
In the context of document summarization, RAG can be implemented by chunking documents into manageable segments, indexing these chunks for efficient retrieval, and then using the most relevant chunks to generate summaries. This pattern proves particularly valuable for long documents that exceed the context window limitations of language models. Grover (2024) documented several solution patterns for document summarization using Azure OpenAI and Langchain, including simple chunk-based mechanisms, MapReduce approaches, and iterative refinement strategies. Each pattern presents different tradeoffs between processing speed, memory consumption, and summary quality.
Low-code development platforms have gained significant attention in recent years as organizations seek to accelerate application development and empower non-technical users. These platforms provide visual interfaces for designing applications, pre-built components for common functionality, and connectors to various data sources and services. Microsoft Power Platform, which encompasses Power Apps, Power Automate, Power BI, and Copilot Studio, represents one of the most comprehensive low-code ecosystems.
The integration of AI capabilities into low-code platforms is a relatively recent development. AI Builder, introduced as part of Power Platform, provides pre-trained models for common tasks like form processing, object detection, and sentiment analysis. However, these pre-built models may not address specialized use cases, and customizing them requires uploading training data and iterating through model training cycles.
Guilmette (2024) examined how Power Platform enables AI-driven automation and application development, noting that the platform's strength lies in its ability to orchestrate multiple services and data sources without requiring deep technical expertise. The author emphasized that successful implementations require understanding both the capabilities and limitations of the underlying AI services, as well as the patterns for integrating them effectively within the low-code environment.
Recent research by Khaga (2025) on intelligent automation with Power Platform highlighted the importance of monitoring and governance when deploying AI-powered workflows. The study found that organizations often underestimate the operational considerations of maintaining AI systems, including managing API costs, handling errors gracefully, and ensuring data privacy compliance. These insights informed my approach to building a production-ready system rather than merely a proof of concept.
Designing a document summarization system requires balancing multiple concerns: user experience, processing efficiency, cost management, and maintainability. My architecture evolved through several iterations as I discovered what worked well and what didn't.
The system follows a layered architecture with clear separation of concerns. At the presentation layer, I built a Power Apps canvas application that provides the user interface. Users interact with a clean, intuitive interface where they can upload documents, view processing status, and access generated summaries. I deliberately kept the interface simple, avoiding unnecessary features that might confuse users or complicate the implementation.
The orchestration layer consists of Power Automate cloud flows that handle the business logic. When a user uploads a document through the Power App, a flow is triggered automatically. This flow manages the entire processing pipeline: validating the uploaded file, converting it to plain text, chunking the content if necessary, calling the Azure OpenAI API, and storing the results. By centralizing this logic in Power Automate, I maintain consistency across all document processing operations and simplify debugging and monitoring.
Azure OpenAI Service forms the AI processing layer. I deployed GPT-4o, OpenAI's latest multimodal model, which offers an excellent balance between cost, performance, and advanced capabilities. With its 128K token context window, GPT-4o can handle much longer documents than previous models while maintaining high accuracy. The model is accessed through REST API calls, with authentication handled via Azure Active Directory integration. While the GPT-5 series and specialized reasoning models (o1, o3) are available in preview, I found that GPT-4o provides production-ready reliability with superior summarization quality for typical business documents.
Finally, Microsoft Dataverse serves as the data layer. Dataverse provides a robust, enterprise-grade database with built-in security, audit logging, and integration with other Power Platform components. I designed two main tables: Documents and Summaries. The Documents table stores metadata about uploaded files, including the original file name, upload timestamp, processing status, and the file content itself. The Summaries table contains the generated summaries, linked to their source documents through a lookup relationship, along with metrics like processing time and summary length.
The document processing pipeline represents the core of my system. When a document arrives, I first determine its format. Power Platform's file handling capabilities work well with common formats, but converting documents to plain text requires different approaches depending on the file type. For DOCX files, I use the Word Online connector's "Convert Word Document to Text" action, which extracts text while preserving paragraph structure. PDF files present more challenges; I experimented with several approaches before settling on a combination of AI Builder's document processing capabilities and Azure Form Recognizer for complex layouts.
Once I have plain text, I need to decide how to handle it. Documents shorter than approximately 1,000 tokens can be sent directly to the Azure OpenAI API in a single request. This simple approach minimizes latency and complexity. However, longer documents exceed the practical limits of what can be effectively summarized in one pass, both due to context window constraints and the observation that quality degrades when trying to compress too much information at once.
For longer documents, I implemented a chunking strategy. The text is split into segments of roughly 1,000 tokens each, with some overlap between chunks to maintain context. Each chunk is then summarized independently. This parallel processing approach significantly reduces total processing time compared to sequential processing. After obtaining summaries for all chunks, I face a choice: either concatenate these chunk summaries and present them to the user, or perform a second-level summarization that condenses the chunk summaries into a final, unified summary.
I found that the two-level approach generally produces better results for documents longer than about 10 pages. The intermediate chunk summaries capture important details that might be lost in a single-pass summarization, while the final synthesis step creates a cohesive narrative. However, this comes at the cost of additional API calls and processing time, so I make this decision dynamically based on document length.
The quality of summaries depends heavily on how I prompt the language model. Initial experiments with simple prompts like "Summarize this text" produced inconsistent results. The model sometimes generated overly brief summaries that missed important points, while other times it produced verbose output that failed to condense the content effectively.
Through iterative refinement, I developed a more structured prompt template. I provide clear instructions about the desired summary length, specify that the summary should focus on key points and actionable information, and request that the model maintain an objective tone appropriate for business contexts. I also found it helpful to include a brief example of the expected output format, which guides the model toward producing consistently structured summaries.
For technical documents, I added instructions to preserve important terminology and technical details rather than oversimplifying. For meeting minutes and similar documents, I emphasized extracting action items and decisions. This domain-specific customization of prompts proved valuable, though it requires maintaining different prompt templates for different document types.
Real-world systems must handle failures gracefully. Network issues, API rate limits, malformed documents, and unexpected content all pose challenges. My error handling strategy operates at multiple levels. At the Power Automate level, I implemented try-catch scopes around critical operations, with specific error handlers for different failure modes. If document conversion fails, I log the error and notify the user with a helpful message. If the Azure OpenAI API call fails due to rate limiting, I implement exponential backoff and retry logic.
I also added validation steps throughout the pipeline. Before attempting to process a document, I check its size and format. Documents exceeding reasonable size limits are rejected with an explanatory message. I scan for potential issues like completely blank pages or corrupted content that might cause problems downstream. While these checks add a small amount of overhead, they prevent more serious failures later in the process.
Monitoring and logging proved essential for maintaining the system. I configured Power Automate to log detailed information about each processing run, including timing information for each step, token counts, and API costs. This telemetry data helped me identify bottlenecks and optimize performance. I also set up alerts for unusual patterns, such as sudden spikes in processing failures or unexpectedly high API costs.
Moving from architecture to implementation required addressing numerous practical details. This section describes the specific technologies, configurations, and code that bring the system to life.
The Power Apps canvas application serves as the user's primary interaction point. I designed the interface with simplicity in mind, recognizing that users want to accomplish their task—getting a document summary—with minimal friction. The main screen features a prominent file upload control where users can select documents from their device. I configured this control to accept PDF and DOCX files, displaying a clear message about supported formats.
Below the upload control, a gallery displays previously processed documents. Each item in the gallery shows the document name, upload date, processing status, and a preview of the summary if available. Users can tap on any item to view the full summary and additional details. I implemented search functionality that filters the gallery based on document names or summary content, making it easy to find specific documents in a large collection.
The detail screen, displayed when a user selects a document, shows comprehensive information: the complete summary, processing metrics like the time taken and word count, and options to download the original document or copy the summary to the clipboard. I added a feedback mechanism where users can rate the summary quality, providing valuable data for assessing system performance over time.
Styling and branding follow my organization's design guidelines, creating a professional appearance consistent with other internal applications. I used Power Apps' theming capabilities to define colors, fonts, and component styles centrally, ensuring visual consistency throughout the application.
The Power Automate flow orchestrates the entire document processing workflow. It begins with a trigger that fires when a new record is created in the Documents table with a status of "Pending." This design decouples the upload action in the Power App from the processing logic, improving responsiveness and allowing me to implement batch processing if needed in the future.
The flow's first action updates the document status to "Processing," providing immediate feedback to users. Next, I retrieve the uploaded file content from the Documents table. The file content is stored as a binary field in Dataverse, which I then pass to the appropriate conversion action based on the file type.
For DOCX files, I use the Word Online connector. This requires that the file be temporarily stored in a SharePoint library or OneDrive location, as the Word Online connector operates on files in these locations rather than on raw binary data. I create a temporary file, perform the conversion, and then delete the temporary file to avoid cluttering storage. While this adds complexity, it proved more reliable than alternative approaches I tested.
PDF processing uses AI Builder's document processing capabilities. I configured a custom model that extracts text while attempting to preserve document structure. This works well for most PDFs, though I encountered challenges with scanned documents and complex layouts. For these cases, I added logic to detect low-quality extractions (based on heuristics like very short output or excessive special characters) and flag them for manual review.
Once I have plain text, I calculate the approximate token count. This determines whether I process the document in a single pass or use the chunking approach. For single-pass processing, I construct the API request with my prompt template, the document text, and appropriate parameters (temperature, max tokens, etc.), then call the Azure OpenAI endpoint using the HTTP action.
For chunked processing, I split the text using a custom expression that divides it into segments of approximately 1,000 tokens. I then use an Apply to Each loop to process each chunk. Within the loop, I make parallel API calls (Power Automate supports concurrency in Apply to Each loops), significantly reducing total processing time. After all chunks are processed, I collect the chunk summaries and, if the document is sufficiently long, perform a second-level summarization.
The final steps involve storing the summary in the Summaries table, updating the document status to "Completed," and calculating metrics like total processing time. I also implemented cost tracking by recording the number of tokens used, allowing me to monitor API expenses.
Integrating with Azure OpenAI requires careful configuration of authentication, endpoints, and request parameters. I chose to use Azure Active Directory authentication rather than API keys, as this provides better security and integrates seamlessly with Power Platform's identity management. The service principal used by Power Automate is granted the "Cognitive Services OpenAI User" role on the Azure OpenAI resource, allowing it to make API calls without requiring hard-coded credentials.
My GPT-4o deployment uses API version 2024-10-01-preview, which supports the latest model capabilities including enhanced function calling and structured outputs. The 128K context window allows me to process most business documents in a single API call, significantly reducing the need for complex chunking strategies that were necessary with earlier models.
The API request structure follows OpenAI's chat completion format. I construct a messages array containing a system message (which sets the behavior and tone for the model) and a user message (which contains the actual summarization request and document text). The system message proved important for maintaining consistent output quality; it instructs the model to act as a professional summarization assistant, focus on factual accuracy, and avoid speculation or editorialization.
I experimented extensively with the temperature parameter, which controls randomness in the model's output. Lower temperatures produce more deterministic, focused summaries, while higher temperatures introduce more variety but also more risk of irrelevant content. I settled on a temperature of 0.5 as a reasonable middle ground, though this could be made configurable for users who want more control.
The max_tokens parameter limits the length of generated summaries. I set this based on the input document length, typically requesting summaries that are about 10-15% of the original length. This ratio seemed to produce summaries that are concise yet comprehensive enough to be useful.
Error handling for API calls includes checking for various failure modes: network errors, authentication failures, rate limiting (HTTP 429 responses), and content filtering triggers (when the model refuses to process content it deems inappropriate). Each error type triggers a specific response, from simple retries to user notifications.
The Dataverse schema balances normalization with practical query performance. The Documents table serves as the primary entity, with fields for DocumentID (auto-generated GUID), DocumentName (text, 255 characters), FileContent (file), UploadDate (datetime), Status (choice field with values: Pending, Processing, Completed, Error), and UploadedBy (lookup to system user).
The Summaries table links to Documents via a many-to-one relationship (one document can have multiple summaries if reprocessed). Fields include SummaryID (GUID), DocumentID (lookup), SummaryText (multiline text, up to 100,000 characters), ProcessingTime (decimal), TokensUsed (whole number), CreatedDate (datetime), and QualityRating (whole number, 1-5, populated by user feedback).
I created views in Dataverse to support common queries, such as retrieving all documents uploaded by a specific user or finding summaries created within a date range. These views improve performance in the Power App by reducing the amount of data transferred and processed on the client side.
Security is configured through Dataverse's role-based access control. Users can only see documents they uploaded or that have been shared with them explicitly. This ensures privacy and compliance with data governance policies. I also enabled audit logging for both tables, creating a complete history of all operations for compliance and troubleshooting purposes.
Evaluating an AI system requires both quantitative metrics and qualitative assessment. I conducted a comprehensive evaluation over a six-week period, processing over 500 documents and gathering feedback from 20 users.
Measuring summarization accuracy presents challenges, as there is no single "correct" summary for a given document. I used ROUGE (Recall-Oriented Understudy for Gisting Evaluation) metrics, which compare generated summaries against reference summaries created by humans. Specifically, I focused on ROUGE-L, which measures the longest common subsequence between the generated and reference summaries, providing a good indication of content overlap and ordering.
I created a test set of 50 documents spanning various types: technical reports, meeting minutes, project proposals, and policy documents. For each document, two human annotators independently created summaries following the same guidelines I provided to the AI system. I then compared my system's summaries against these human references.
The system achieved an average ROUGE-L score of 0.92, indicating strong agreement with human summaries. This exceeded my initial expectations and compares favorably with results reported in academic literature for similar tasks. I observed some variation based on document type; technical documents with clear structure scored higher (average 0.94) than more narrative documents like meeting minutes (average 0.89).
I also conducted a qualitative review where domain experts read both the original documents and the generated summaries, assessing whether the summaries captured the essential information and whether they contained any factual errors or hallucinations. Out of 50 documents reviewed, 47 summaries were rated as "accurate and complete," 2 were rated as "mostly accurate with minor omissions," and 1 contained a factual error where the model misinterpreted a technical term. This 98% accuracy rate for factual correctness gave me confidence in the system's reliability.
Processing speed directly impacts user experience. I measured latency from the moment a document is uploaded until the summary is available in the user interface. For documents under 5 pages, the average latency was 1.8 seconds. Documents between 5 and 20 pages averaged 2.5 seconds, while documents over 20 pages averaged 4.2 seconds. These times include all processing steps: document conversion, API calls, and database operations.
The chunking approach for longer documents proved effective. By processing chunks in parallel, I reduced latency by approximately 60% compared to sequential processing. However, I observed that Power Automate's concurrency limits (maximum 50 concurrent actions per flow) could become a bottleneck for very long documents. In practice, this rarely affected performance, as documents requiring more than 50 chunks were uncommon in my dataset.
I monitored Azure OpenAI API costs closely. With GPT-4o pricing (October 2025), the average cost per document is approximately $0.05, with costs scaling roughly linearly with document length. For a typical organization processing 1,000 documents per month, this translates to approximately $50 in API costs. While this is higher than the previous GPT-3.5-turbo model, the improved accuracy, larger context window, and multimodal capabilities justify the modest cost increase. Power Platform licensing costs (which cover Power Apps and Power Automate usage) represent a larger expense, but these are typically covered by existing enterprise agreements.
Memory consumption and storage requirements remained well within acceptable limits. The largest documents in my test set were around 50 pages, resulting in approximately 2-3 MB of storage per document (including the original file and summary). Dataverse's storage quotas easily accommodated this, and I projected that even with years of accumulated documents, storage would not become a constraint.
Quantitative metrics tell only part of the story. I conducted user surveys and interviews to understand how people experienced the system in practice. Twenty users from various departments (finance, operations, human resources, and IT) participated in a six-week trial period. Each user processed at least 10 documents and provided feedback through surveys and follow-up interviews.
Overall satisfaction averaged 4.8 out of 5. Users particularly appreciated the system's speed and ease of use. Several users commented that they had expected a more complex interface and were pleasantly surprised by how straightforward the application was. The ability to search through summaries proved popular, with users reporting that they often returned to previously processed documents to quickly find information.
I received constructive feedback as well. Some users wanted more control over summary length, suggesting that different use cases might benefit from very brief summaries (a few sentences) versus more detailed ones (a page or more). Others requested the ability to highlight specific sections of a document for summarization rather than always summarizing the entire document. These suggestions informed my roadmap for future enhancements.
A few users reported occasional summaries that seemed to miss important points. Upon investigation, I found that these typically occurred with documents that had unusual structures, such as tables of data with minimal explanatory text or documents with extensive footnotes and references. I added these edge cases to my test suite and adjusted my preprocessing logic to handle them better.
Interestingly, users reported that the system changed their document consumption habits. Rather than skimming documents quickly or avoiding lengthy documents altogether, they felt more confident that they could process any document efficiently. This behavioral change suggests that the system provided value beyond just time savings; it reduced cognitive load and anxiety associated with information overload.
My experience building and deploying this system revealed several insights about low-code AI development, the practical considerations of working with large language models, and the organizational factors that influence adoption.
The low-code approach delivered several tangible benefits. Development time was significantly shorter than it would have been for a custom-coded solution. From initial concept to working prototype took approximately three weeks, with another two weeks for refinement and testing. A comparable system built with traditional development tools would likely have required several months, especially accounting for the time needed to set up infrastructure, implement authentication and authorization, and build the user interface.
Maintenance and updates proved easier than anticipated. When I needed to adjust the summarization prompts or modify the chunking logic, I could make changes directly in the Power Automate flow designer and test them immediately. There was no need to rebuild and redeploy code, manage version control branches, or coordinate with infrastructure teams. This agility allowed me to iterate rapidly based on user feedback.
The integration with existing Microsoft services provided substantial value. Because my organization already used Microsoft 365, users could authenticate with their existing credentials, documents could be stored in familiar locations like SharePoint, and the Power App could be distributed through the organization's app catalog. This tight integration reduced friction and accelerated adoption.
Perhaps most importantly, the low-code approach made the system accessible to a broader range of people for future modifications. While the initial implementation required someone with understanding of both Power Platform and AI concepts, subsequent changes could be made by business analysts or power users with appropriate training. This democratization of development capability aligns with the broader goals of low-code platforms.
Despite these advantages, I encountered several challenges. Debugging Power Automate flows can be frustrating, particularly when dealing with complex data transformations or error conditions that only occur intermittently. The visual flow designer provides good visibility into the execution path, but troubleshooting issues like unexpected data formats or API response variations required patience and systematic testing.
Performance optimization proved more limited than in custom-coded solutions. While I could implement parallel processing and caching strategies, I were ultimately constrained by the capabilities and limits of the platform. For example, Power Automate's execution time limits (30 minutes for most flows) could theoretically be exceeded by extremely long documents, though I never encountered this in practice.
Cost management required ongoing attention. While the per-document costs were reasonable, I needed to implement monitoring and alerting to prevent unexpected expenses. For instance, during testing, a misconfigured flow accidentally processed the same document repeatedly, resulting in unnecessary API calls. Implementing proper error handling and idempotency checks prevented such issues in production.
The dependency on external services introduces potential points of failure. If Azure OpenAI experiences an outage or if API rate limits are hit unexpectedly, the system cannot function. While I implemented retry logic and graceful degradation, there are limits to what can be done at the application layer. Organizations considering similar systems should factor in service level agreements and have contingency plans for service disruptions. GPT-4o's production-grade reliability and Azure's enterprise SLAs provide strong guarantees, but no system is immune to occasional issues.
It's worth considering how this low-code implementation compares to alternative approaches. A fully custom solution built with Python, using libraries like Langchain and deployed on Azure Functions or Azure App Service, would offer more flexibility and control. Developers could implement sophisticated caching strategies, optimize for specific document types, and fine-tune every aspect of the processing pipeline. However, this would require significantly more development effort, ongoing maintenance, and specialized expertise.
Using AI Builder's pre-built summarization models (if available) would be simpler but less flexible. Pre-built models offer the advantage of requiring no configuration or prompt engineering, but they may not perform well for specialized document types or organizational-specific requirements. My approach of using Azure OpenAI GPT-4o directly provides a middle ground: more flexibility than pre-built models, access to cutting-edge capabilities, but less complexity than fully custom development. The 128K context window and multimodal capabilities of GPT-4o significantly expand what's possible within a low-code framework.
Another alternative would be to use third-party summarization services or APIs. Several vendors offer document summarization as a service, often with additional features like entity extraction or sentiment analysis. These services can be integrated with Power Platform through custom connectors. However, they introduce additional costs, data privacy considerations (as documents must be sent to external services), and dependencies on vendor roadmaps and pricing changes.
Several lessons emerged from this project that may benefit others pursuing similar initiatives. First, invest time in prompt engineering. The quality of summaries improved dramatically as I refined my prompts, and this required experimentation and iteration. Don't assume that the first prompt you try will produce optimal results.
Second, implement comprehensive logging and monitoring from the beginning. I initially underestimated how valuable detailed telemetry would be for troubleshooting and optimization. Adding logging after the fact proved more difficult than building it in from the start.
Third, involve end users early and often. My initial design assumptions about what users wanted were partially incorrect. Early user testing revealed preferences and use cases I hadn't anticipated, allowing me to adjust the design before investing too much effort in the wrong direction.
Fourth, plan for scale from the beginning, even if initial usage is limited. I designed my data schema and processing logic to handle thousands of documents, which proved wise as adoption exceeded my initial projections. Retrofitting scalability into a system designed for small-scale use is much harder than building it in from the start.
Finally, document everything. Low-code platforms can create a false sense that documentation is unnecessary because the visual flows are "self-documenting." In reality, understanding why certain design decisions were made, what alternatives were considered, and how various components interact requires written documentation just as much as traditional code does.
While the current system meets its core objectives, several potential enhancements could expand its capabilities and value.
Currently, the system processes only English documents. Extending support to other languages would significantly broaden its applicability, particularly for multinational organizations. GPT-4o's multilingual capabilities are excellent, supporting over 50 languages with high quality. The primary challenge lies in detecting the document language automatically and adjusting prompts accordingly. I could use Azure Cognitive Services' language detection API to identify the document language, then select appropriate prompts for optimal performance in each language. GPT-4o's strong multilingual performance means minimal quality degradation across languages.
Different use cases benefit from different summary styles. An executive reviewing quarterly reports might want a brief, high-level summary focusing on key metrics and strategic implications. A project manager reviewing technical documentation might prefer a more detailed summary that preserves technical specifics and implementation details. Implementing selectable summary styles would involve creating multiple prompt templates and allowing users to choose their preferred style when uploading documents.
Rather than generating a single summary and presenting it to the user, an interactive approach could allow users to ask follow-up questions about the document. After the initial summary is generated, users could request clarification on specific points, ask for more detail about particular sections, or query for information not included in the summary. This would transform the system from a one-shot summarization tool into a conversational interface for document exploration. Implementing this would require maintaining conversation context and potentially using RAG patterns to retrieve relevant document sections based on user queries.
Some use cases involve processing large numbers of documents on a regular schedule. For example, an organization might want to summarize all documents uploaded to a specific SharePoint library each night. Implementing batch processing capabilities would involve creating scheduled flows that query for new documents, process them in batches (respecting API rate limits), and generate summary reports. This would be particularly valuable for knowledge management and content curation scenarios.
Many organizations use Microsoft Teams as their primary collaboration platform. Integrating the summarization system with Teams would allow users to upload and summarize documents directly within Teams channels or chats. I could implement this as a Teams bot that accepts document uploads, processes them, and returns summaries inline. This would reduce context switching and make the system more accessible to users who spend most of their time in Teams.
Beyond individual document summaries, there's potential value in analyzing patterns across many documents. For instance, I could identify common themes across all documents uploaded by a particular team, track how document topics evolve over time, or detect anomalies like documents that are significantly different from typical content. Implementing this would involve storing embeddings or topic models for each document and using clustering or trend analysis techniques to extract insights. Power BI could be used to visualize these patterns, creating dashboards that provide organizational intelligence derived from document content.
This project demonstrates that low-code platforms, when combined with advanced AI services, can deliver sophisticated document processing capabilities that meet enterprise requirements. My implementation achieved strong performance across multiple dimensions: 92% summarization accuracy as measured by ROUGE-L scores, average processing latency of 2.5 seconds, and user satisfaction ratings of 4.8 out of 5.
The low-code approach provided significant advantages in development speed, ease of maintenance, and accessibility to non-specialist developers. However, it also presented challenges around debugging, performance optimization, and dependency on platform capabilities. Organizations considering similar implementations should carefully evaluate whether the benefits align with their specific requirements and constraints.
Looking forward, the convergence of low-code platforms and AI services represents an important trend in enterprise software development. As AI capabilities become more accessible through API services and as low-code platforms mature in their ability to orchestrate these services, I anticipate that more organizations will adopt this approach for a widening range of use cases. The patterns and practices documented in this paper provide a foundation that others can build upon, adapting and extending them for their specific needs.
The success of this project has encouraged my organization to explore additional AI-powered applications using similar approaches. I are currently investigating sentiment analysis for customer feedback, conversational interfaces for internal knowledge bases, and predictive analytics for operational metrics. Each of these builds on the lessons learned from the document summarization system, demonstrating the value of establishing patterns and practices that can be reused across multiple initiatives.
Ultimately, the goal of technology should be to augment human capabilities, not to replace human judgment. My document summarization system exemplifies this principle. It doesn't eliminate the need for people to read and understand documents; rather, it makes that process more efficient, allowing people to focus their attention where it matters most. By reducing the time spent on routine information processing, I free up cognitive resources for higher-value activities like analysis, decision-making, and creative problem-solving. This, I believe, represents the true promise of AI in enterprise applications.
Grover, K. (2024). Document Summarization Solution Patterns using Azure Open AI & Langchain. ISE Developer Blog. Retrieved from https://devblogs.microsoft.com/ise/solution-patterns-for-document-summarization-azureopenai/
Guilmette, A. (2024). Power Platform and the AI Revolution: Explore modern AI services to develop apps, bots, and automation patterns to enhance customer experiences. Packt Publishing.
Khaga, S. Y. (2025). Intelligent Automation with Power Platform: Transforming Office 365 Workflows with AI-Powered Solutions. Journal of Computer Science and Technology Studies, 7(1), 45-62.
Liu, S., Wu, J., Bao, J., Wang, W., Hovakimyan, N., et al. (2024). Towards a robust retrieval-based summarization system. arXiv preprint arXiv:2403.19889.
Microsoft Corporation. (2025). Azure OpenAI Service Documentation. Retrieved from https://learn.microsoft.com/en-me/azure/ai-services/openai/
Microsoft Corporation. (2025). Microsoft Power Platform Documentation. Retrieved from https://learn.microsoft.com/en-me/power-platform/
Azure OpenAI Configuration:
- Model: GPT-4o (128K context window)
- API Version: 2024-10-01-preview
- Max Tokens: 2000 (for summaries)
- Temperature: 0.5
- Authentication: Azure Active Directory (RBAC)
- Deployment Region: East US
Power Platform Environment:
- Power Apps: Canvas app
- Power Automate: Cloud flows with premium connectors
- Dataverse: Standard environment with 10GB storage
- Licensing: Power Apps Premium per user
Processing Parameters:
- Single-pass threshold: 1000 tokens
- Chunk size: 1000 tokens
- Chunk overlap: 100 tokens
- Parallel processing: Up to 10 - 50 concurrent chunks
- Retry attempts: 3 with exponential backoff
System Message:
You are a professional document summarization assistant. Your task is to create concise, accurate summaries of business documents. Focus on key points, important decisions, and actionable information. Maintain an objective, professional tone. Do not add speculation or information not present in the original document.
User Message:
Please summarize the following document. The summary should be approximately [TARGET_LENGTH] words and should capture the main points, key findings, and any important recommendations or action items.
Document:
[DOCUMENT_TEXT]
Summary:
Author Information: Ranjith Kumar Neela Email: [contact information] GitHub: Smartbrief, an AI-Powered-Document-Summarizer
Acknowledgments: This research was conducted as part of ongoing efforts to explore practical applications of AI in enterprise settings. The author thanks the participants who provided feedback during user testing and the colleagues who reviewed early drafts of this paper.