diff --git a/Solar-Fullstack-LLM-101/15_auto_marketing.ipynb b/Solar-Fullstack-LLM-101/15_auto_marketing.ipynb
index 86e0f09..79a8aa9 100644
--- a/Solar-Fullstack-LLM-101/15_auto_marketing.ipynb
+++ b/Solar-Fullstack-LLM-101/15_auto_marketing.ipynb
@@ -1,499 +1,543 @@
{
- "cells": [
- {
- "cell_type": "markdown",
- "metadata": {},
- "source": [
- "\n",
- "
\n",
- ""
- ]
- },
- {
- "cell_type": "markdown",
- "metadata": {},
- "source": [
- "# Auto Marketing \n",
- "Using product descriptions and corresponding webpage content, Solar will create marketing email messages automatically."
- ]
- },
- {
- "cell_type": "code",
- "execution_count": 1,
- "metadata": {},
- "outputs": [],
- "source": [
- "! pip3 install -qU langchain langchain-upstage langchain_community python-dotenv firecrawl-py==0.0.20 marko"
- ]
- },
- {
- "cell_type": "markdown",
- "metadata": {},
- "source": [
- "## UPSTAGE_API_KEY\n",
- "To obtain your Upstage API key, follow these steps:\n",
- "\n",
- "1. Visit the Upstage AI console at .\n",
- "2. Sign up for an account if you don't already have one.\n",
- "3. Log in to your account.\n",
- "4. Navigate to the API key section.\n",
- "5. Generate your API key.\n",
- "6. Copy the key and save it securely.\n",
- "\n",
- ""
- ]
- },
- {
- "cell_type": "code",
- "execution_count": 2,
- "metadata": {},
- "outputs": [],
- "source": [
- "# @title set API key\n",
- "import os\n",
- "import getpass\n",
- "\n",
- "from pprint import pprint\n",
- "\n",
- "import warnings\n",
- "\n",
- "warnings.filterwarnings(\"ignore\")\n",
- "\n",
- "if \"google.colab\" in str(get_ipython()):\n",
- " # Running in Google Colab. Please set the UPSTAGE_API_KEY in the Colab Secrets\n",
- " from google.colab import userdata\n",
- "\n",
- " os.environ[\"UPSTAGE_API_KEY\"] = userdata.get(\"UPSTAGE_API_KEY\")\n",
- "else:\n",
- " # Running locally. Please set the UPSTAGE_API_KEY in the .env file\n",
- " from dotenv import load_dotenv\n",
- "\n",
- " load_dotenv()\n",
- "\n",
- "if \"UPSTAGE_API_KEY\" not in os.environ:\n",
- " os.environ[\"UPSTAGE_API_KEY\"] = getpass.getpass(\"Enter your Upstage API key: \")"
- ]
- },
- {
- "cell_type": "code",
- "execution_count": 3,
- "metadata": {},
- "outputs": [],
- "source": [
- "# We will use FireCrawl, https://python.langchain.com/docs/integrations/document_loaders/firecrawl/\n",
- "# You will need to set FIRECRAWL_API_KEY. Learn mode https://firecrawl.dev/.\n",
- "\n",
- "\n",
- "if \"FIRECRAWL_API_KEY\" not in os.environ:\n",
- " os.environ[\"FIRECRAWL_API_KEY\"] = getpass.getpass(\"Enter your Firecrawl API key: \")"
- ]
- },
- {
- "cell_type": "code",
- "execution_count": 4,
- "metadata": {},
- "outputs": [],
- "source": [
- "product_info = \"\"\"\n",
- "UpstageAI provides AI-driven tools and solutions aimed at enhancing productivity, automating tasks, and assisting decision-making for businesses. Their core offerings include:\n",
- "\n",
- "1. **Solar Pro Preview**: A cutting-edge, highly intelligent large language model (LLM) that runs efficiently on a single GPU, designed to support complex tasks like data analysis and decision support. It's particularly suitable for businesses seeking advanced AI capabilities.\n",
- " \n",
- "2. **Solar Mini**: A purpose-trained LLM tailored for specific tasks, providing businesses with powerful yet efficient AI functionalities.\n",
- " \n",
- "3. **Document AI Tools**: These include Document Parsing, OCR (Optical Character Recognition), and Key Information Extraction, all designed to streamline document handling. Users can extract tables, figures, and text while enabling automation workflows for data management.\n",
- "\n",
- "4. **Task APIs**: Upstage offers several APIs, such as:\n",
- " - **Chat**: For building conversational agents.\n",
- " - **Translation**: Context-aware English-Korean translation.\n",
- " - **Embeddings**: For text-to-vector conversion.\n",
- " - **Groundedness Check**: Ensures AI-generated responses are based on accurate data.\n",
- " - **Text-to-SQL**: Converts natural language into SQL queries (coming soon).\n",
- "\n",
- "5. **Industry-Specific Intelligence**: Upstage is developing specialized APIs for industries such as healthcare, finance, and law, providing domain-specific AI solutions (coming soon).\n",
- "\n",
- "Upstage focuses on automating repetitive tasks, improving business decision-making, and providing generative business intelligence tools to increase efficiency across various sectors.\n",
- "\"\"\""
- ]
- },
- {
- "cell_type": "code",
- "execution_count": 5,
- "metadata": {},
- "outputs": [],
- "source": [
- "from langchain_community.document_loaders import FireCrawlLoader\n",
- "\n",
- "target_page = \"https://www.firecrawl.dev/\"\n",
- "loader = FireCrawlLoader(url=target_page, mode=\"scrape\")\n",
- "docs = loader.load()"
- ]
- },
- {
- "cell_type": "code",
- "execution_count": 6,
- "metadata": {},
- "outputs": [
+ "cells": [
{
- "name": "stdout",
- "output_type": "stream",
- "text": [
- "{'description': 'Firecrawl crawls and converts any website into clean '\n",
- " 'markdown.',\n",
- " 'keywords': 'Firecrawl,Markdown,Data,Mendable,Langchain',\n",
- " 'language': 'en',\n",
- " 'ogDescription': 'Turn any website into LLM-ready data.',\n",
- " 'ogImage': 'https://www.firecrawl.dev/og.png?123',\n",
- " 'ogLocaleAlternate': [],\n",
- " 'ogSiteName': 'Firecrawl',\n",
- " 'ogTitle': 'Firecrawl',\n",
- " 'ogUrl': 'https://www.firecrawl.dev/',\n",
- " 'pageStatusCode': 200,\n",
- " 'robots': 'follow, index',\n",
- " 'sitemap': {'changefreq': 'weekly', 'lastmod': '2024-09-16T18:04:06.184Z'},\n",
- " 'sourceURL': 'https://www.firecrawl.dev/',\n",
- " 'title': 'Home - Firecrawl'}\n"
- ]
- }
- ],
- "source": [
- "pprint(docs[0].metadata)"
- ]
- },
- {
- "cell_type": "code",
- "execution_count": 7,
- "metadata": {},
- "outputs": [
+ "cell_type": "markdown",
+ "metadata": {
+ "id": "FYkZT8VA8jv_"
+ },
+ "source": [
+ "\n",
+ "
\n",
+ ""
+ ]
+ },
{
- "name": "stdout",
- "output_type": "stream",
- "text": [
- "('Our first Launch Week is over! Beta Features 13.9k Turn websites '\n",
- " 'into LLM-ready data Power ')\n"
- ]
- }
- ],
- "source": [
- "import marko\n",
- "from marko.inline import RawText, Emphasis, CodeSpan, Link, Image\n",
- "import re\n",
- "\n",
- "\n",
- "def extract_clean_text(markdown_text):\n",
- " # Parse the markdown into an AST\n",
- " ast = marko.parse(markdown_text)\n",
- "\n",
- " clean_text = []\n",
- "\n",
- " def traverse(node):\n",
- " # If the node is a RawText, append its content\n",
- " if isinstance(node, RawText):\n",
- " clean_text.append(node.children)\n",
- " # If the node has children, recurse into them unless it's a link or image\n",
- " elif hasattr(node, \"children\") and not isinstance(node, (Link, Image)):\n",
- " for child in node.children:\n",
- " traverse(child)\n",
- " # For other node types that contain text (e.g., Emphasis, CodeSpan), handle accordingly\n",
- " elif isinstance(node, (Emphasis, CodeSpan)):\n",
- " traverse(node.children if hasattr(node, \"children\") else node)\n",
- " # You can add more conditions here if you want to handle other specific node types\n",
- "\n",
- " # Start traversing from the root of the AST\n",
- " traverse(ast)\n",
- "\n",
- " # Join all extracted text parts with spaces and clean up whitespace\n",
- " text = \" \".join(clean_text).strip()\n",
- "\n",
- " # Remove URLs\n",
- " text = re.sub(r\"http[s]?://\\S+\", \"\", text)\n",
- " text = re.sub(r\"\\bwww\\.\\S+\", \"\", text)\n",
- "\n",
- " return text\n",
- "\n",
- "\n",
- "# Use the function to extract clean text\n",
- "target_info = extract_clean_text(docs[0].page_content)\n",
- "\n",
- "# Print the cleaned text\n",
- "pprint(target_info[:100])"
- ]
- },
- {
- "cell_type": "code",
- "execution_count": 8,
- "metadata": {},
- "outputs": [],
- "source": [
- "from langchain_core.prompts import ChatPromptTemplate\n",
- "from langchain_core.output_parsers import StrOutputParser\n",
- "from langchain_upstage import ChatUpstage\n",
- "\n",
- "marketing_prompt_teample = ChatPromptTemplate.from_messages(\n",
- " [\n",
- " (\n",
- " \"human\",\n",
- " \"\"\"You are a marketing specialist at UpsatgeAI selling the product.\n",
- "Write a convincing and professional email message to a target company to sell your product.\n",
- "Use the provided product information and details about the target company to personalize the message effectively.\n",
- "Make your message relevant to the target company.\n",
- "\n",
- "---\n",
- "**Requirements:**\n",
- "- The email should have a clear subject line.\n",
- "- Highlight how your product can solve specific problems or add value to the target company.\n",
- "- Maintain a professional and engaging tone.\n",
- "- Include a call to action encouraging the recipient to take the next step.\n",
- "\n",
- "--- \n",
- "**Target Company Information to Email:** \n",
- "{target_company_info}\n",
- "--- \n",
- "**Your Product to Sell:** \n",
- "{product_info}\n",
- "\n",
- "\"\"\",\n",
- " )\n",
- " ]\n",
- ")"
- ]
- },
- {
- "cell_type": "markdown",
- "metadata": {},
- "source": []
- },
- {
- "cell_type": "markdown",
- "metadata": {},
- "source": []
- },
- {
- "cell_type": "code",
- "execution_count": 9,
- "metadata": {},
- "outputs": [
+ "cell_type": "markdown",
+ "metadata": {
+ "id": "HikAulTP8jwB"
+ },
+ "source": [
+ "# Auto Marketing\n",
+ "Using product descriptions and corresponding webpage content, Solar will create marketing email messages automatically."
+ ]
+ },
{
- "name": "stdout",
- "output_type": "stream",
- "text": [
- "You are a marketing specialist at UpsatgeAI selling the product.\n",
- "Write a convincing and professional email message to a target company to sell your product.\n",
- "Use the provided product information and details about the target company to personalize the message effectively.\n",
- "Make your message relevant to the target company.\n",
- "\n",
- "---\n",
- "**Requirements:**\n",
- "- The email should have a clear subject line.\n",
- "- Highlight how your product can solve specific problems or add value to the target company.\n",
- "- Maintain a professional and engaging tone.\n",
- "- Include a call to action encouraging the recipient to take the next step.\n",
- "\n",
- "--- \n",
- "**Target Company Information to Email:** \n",
- "Our first Launch Week is over! Beta Features 13.9k Turn websites into LLM-ready data Power your AI apps with clean data crawled from any website. It's also open-source. Start for free (500 credits)Start for free A product by Crawl, Scrape, Clean We crawl all accessible subpages and give you clean markdown for each. No sitemap required. \n",
- " [\\\n",
- " {\\\n",
- " \"url\": \"\n",
- " \"markdown\": \"## Welcome to Firecrawl\\\n",
- " Firecrawl is a web scraper that allows you to extract the content of a webpage.\"\\\n",
- " },\\\n",
- " {\\\n",
- " \"url\": \"\n",
- " \"markdown\": \"## Features\\\n",
- " Discover how Firecrawl's cutting-edge features can\\\n",
- " transform your data operations.\"\\\n",
- " },\\\n",
- " {\\\n",
- " \"url\": \"\n",
- " \"markdown\": \"## Pricing Plans\\\n",
- " Choose the perfect plan that fits your needs.\"\\\n",
- " },\\\n",
- " {\\\n",
- " \"url\": \"\n",
- " \"markdown\": \"## About Us\\\n",
- " Learn more about Firecrawl's mission and the\\\n",
- " team behind our innovative platform.\"\\\n",
- " }\\\n",
- " ]\n",
- "\n",
- " Note: The markdown has been edited for display purposes. Trusted by Top Companies Integrate today Enhance your applications with top-tier web scraping and crawling capabilities. Scrape Extract markdown or structured data from websites quickly and efficiently. Crawling Navigate and retrieve data from all accessible subpages, even without a sitemap. Node.js Python cURL 1\n",
- " 2\n",
- " 3\n",
- " 4\n",
- " 5\n",
- " 6\n",
- " 7\n",
- " 8\n",
- " 9\n",
- " 10\n",
- " // npm install @mendable/firecrawl-js\n",
- "\n",
- "import FirecrawlApp from '@mendable/firecrawl-js';\n",
- "\n",
- "const app = new FirecrawlApp({ apiKey: \"fc-YOUR_API_KEY\" });\n",
- "\n",
- "// Scrape a website:\n",
- "const scrapeResult = await app.scrapeUrl('firecrawl.dev');\n",
- "\n",
- "console.log(scrapeResult.data.markdown)\n",
- " Use well-known tools Already fully integrated with the greatest existing tools and workflows. Start for free, scale easily Kick off your journey for free and scale seamlessly as your project expands. Open-source Developed transparently and collaboratively. Join our community of contributors. We handle the hard stuff Rotating proxies, caching, rate limits, js-blocked content and more Crawling Firecrawl crawls all accessible subpages, even without a sitemap. Dynamic content Firecrawl gathers data even if a website uses javascript to render content. To Markdown Firecrawl returns clean, well formatted markdown - ready for use in LLM applications Crawling Orchestration Firecrawl orchestrates the crawling process in parallel for the fastest results. Caching Firecrawl caches content, so you don't have to wait for a full scrape unless new content exists. Built for AI Built by LLM engineers, for LLM engineers. Giving you clean data the way you want it. Our wall of love Don't take our word for it Greg Kamradt LLM structured data via API, handling requests, cleaning, and crawling. Enjoyed the early preview. Amit Naik #llm success with RAG relies on Retrieval. Firecrawl by @mendableai structures web content for processing. 👏 Jerry Liu Firecrawl is awesome 🔥 Turns web pages into structured markdown for LLM apps, thanks to @mendableai. Bardia Pourvakil These guys ship. I wanted types for their node SDK, and less than an hour later, I got them. Can't recommend them enough. latentsauce 🧘🏽 Firecrawl simplifies data preparation significantly, exactly what I was hoping for. Thank you for creating Firecrawl ❤️❤️❤️ Greg Kamradt LLM structured data via API, handling requests, cleaning, and crawling. Enjoyed the early preview. Amit Naik #llm success with RAG relies on Retrieval. Firecrawl by @mendableai structures web content for processing. 👏 Jerry Liu Firecrawl is awesome 🔥 Turns web pages into structured markdown for LLM apps, thanks to @mendableai. Bardia Pourvakil These guys ship. I wanted types for their node SDK, and less than an hour later, I got them. Can't recommend them enough. latentsauce 🧘🏽 Firecrawl simplifies data preparation significantly, exactly what I was hoping for. Thank you for creating Firecrawl ❤️❤️❤️ Michael Ning Firecrawl is impressive, saving us 2/3 the tokens and allowing gpt3.5turbo use over gpt4. Major savings in time and money. Alex Reibman 🖇️ Moved our internal agent's web scraping tool from Apify to Firecrawl because it benchmarked 50x faster with AgentOps. Michael @michael chomsky I really like some of the design decisions Firecrawl made, so I really want to share with others. Paul Scott Appreciating your lean approach, Firecrawl ticks off everything on our list without the cost prohibitive overkill. Michael Ning Firecrawl is impressive, saving us 2/3 the tokens and allowing gpt3.5turbo use over gpt4. Major savings in time and money. Alex Reibman 🖇️ Moved our internal agent's web scraping tool from Apify to Firecrawl because it benchmarked 50x faster with AgentOps. Michael @michael chomsky I really like some of the design decisions Firecrawl made, so I really want to share with others. Paul Scott Appreciating your lean approach, Firecrawl ticks off everything on our list without the cost prohibitive overkill. Flexible Pricing Start for free, then scale as you grow Yearly (17% off)Yearly (2 months free)Monthly Free Plan 500 credits $0 one-time Scrape 500 pages 10 /scrape per min 1 /crawl per min Get Started Hobby 3,000 credits $16/month Billed annually Scrape 3,000 pages 20 /scrape per min 3 /crawl per min Subscribe StandardMost Popular 100,000 credits $83/month Billed annually Scrape 100,000 pages 100 /scrape per min 10 /crawl per min 2 seats Subscribe Growth 500,000 credits $333/month Billed annually Scrape 500,000 pages 1000 /scrape per min 50 /crawl per min 4 seats Priority Support Subscribe Enterprise Plan Unlimited credits. Custom RPMs. Talk to us Top priority support Feature Acceleration SLAs Account Manager Custom rate limits volume Custom concurrency limits Custom seats CEO's number * a /scrape refers to the API endpoint. Structured extraction costs vary. See . * a /crawl refers to the API endpoint. API Credits Credits are consumed for each API request, varying by endpoint and feature. | Features | Credits | | --- | --- | | Scrape(/scrape) | 1 / page | | Crawl(/crawl) | 1 / page | | Map(/map) | 1 / call | | Search(/search) | 1 / page | | Scrape + LLM extraction(/scrape) | 5 / page | Ready to Build? Start scraping web data for your AI apps today. No credit card needed. FAQ Frequently asked questions about Firecrawl General What is Firecrawl? Firecrawl turns entire websites into clean, LLM-ready markdown or structured data. Scrape, crawl and extract the web with a single API. Ideal for AI companies looking to empower their LLM applications with web data. What sites work? Firecrawl is best suited for business websites, docs and help centers. We currently don't support social media platforms. Who can benefit from using Firecrawl? Firecrawl is tailored for LLM engineers, data scientists, AI researchers, and developers looking to harness web data for training machine learning models, market research, content aggregation, and more. It simplifies the data preparation process, allowing professionals to focus on insights and model development. Is Firecrawl open-source? Yes, it is. You can check out the repository on GitHub. Keep in mind that this repository is currently in its early stages of development. We are in the process of merging custom modules into this mono repository. Scraping & Crawling How does Firecrawl handle dynamic content on websites? Unlike traditional web scrapers, Firecrawl is equipped to handle dynamic content rendered with JavaScript. It ensures comprehensive data collection from all accessible subpages, making it a reliable tool for scraping websites that rely heavily on JS for content delivery. Why is it not crawling all the pages? There are a few reasons why Firecrawl may not be able to crawl all the pages of a website. Some common reasons include rate limiting, and anti-scraping mechanisms, disallowing the crawler from accessing certain pages. If you're experiencing issues with the crawler, please reach out to our support team at help@firecrawl.com. Can Firecrawl crawl websites without a sitemap? Yes, Firecrawl can access and crawl all accessible subpages of a website, even in the absence of a sitemap. This feature enables users to gather data from a wide array of web sources with minimal setup. What formats can Firecrawl convert web data into? Firecrawl specializes in converting web data into clean, well-formatted markdown. This format is particularly suited for LLM applications, offering a structured yet flexible way to represent web content. How does Firecrawl ensure the cleanliness of the data? Firecrawl employs advanced algorithms to clean and structure the scraped data, removing unnecessary elements and formatting the content into readable markdown. This process ensures that the data is ready for use in LLM applications without further preprocessing. Is Firecrawl suitable for large-scale data scraping projects? Absolutely. Firecrawl offers various pricing plans, including a Scale plan that supports scraping of millions of pages. With features like caching and scheduled syncs, it's designed to efficiently handle large-scale data scraping and continuous updates, making it ideal for enterprises and large projects. Does it respect robots.txt? Yes, Firecrawl crawler respects the rules set in a website's robots.txt file. If you notice any issues with the way Firecrawl interacts with your website, you can adjust the robots.txt file to control the crawler's behavior. Firecrawl user agent name is 'FirecrawlAgent'. If you notice any behavior that is not expected, please let us know at help@firecrawl.com. What measures does Firecrawl take to handle web scraping challenges like rate limits and caching? Firecrawl is built to navigate common web scraping challenges, including reverse proxies, rate limits, and caching. It smartly manages requests and employs caching techniques to minimize bandwidth usage and avoid triggering anti-scraping mechanisms, ensuring reliable data collection. Does Firecrawl handle captcha or authentication? Firecrawl avoids captcha by using stealth proxyies. When it encounters captcha, it attempts to solve it automatically, but this is not always possible. We are working to add support for more captcha solving methods. Firecrawl can handle authentication by providing auth headers to the API. API Related Where can I find my API key? Click on the dashboard button on the top navigation menu when logged in and you will find your API key in the main screen and under API Keys. Billing Is Firecrawl free? Firecrawl is free for the first 500 scraped pages (500 free credits). After that, you can upgrade to our Standard or Scale plans for more credits. Is there a pay per use plan instead of monthly? No we do not currently offer a pay per use plan, instead you can upgrade to our Standard or Growth plans for more credits and higher rate limits. How many credit does scraping, crawling, and extraction cost? Scraping costs 1 credit per page. Crawling costs 1 credit per page. Do you charge for failed requests (scrape, crawl, extract)? We do not charge for any failed requests (scrape, crawl, extract). Please contact support at caleb@firecrawl.com if you have any questions. What payment methods do you accept? We accept payments through Stripe which accepts most major credit cards, debit cards, and PayPal. © A product by Mendable.ai - All rights reserved. Helpful Links Backed by Resources Legals\n",
- "--- \n",
- "**Your Product to Sell:** \n",
- "\n",
- "UpstageAI provides AI-driven tools and solutions aimed at enhancing productivity, automating tasks, and assisting decision-making for businesses. Their core offerings include:\n",
- "\n",
- "1. **Solar Pro Preview**: A cutting-edge, highly intelligent large language model (LLM) that runs efficiently on a single GPU, designed to support complex tasks like data analysis and decision support. It's particularly suitable for businesses seeking advanced AI capabilities.\n",
- " \n",
- "2. **Solar Mini**: A purpose-trained LLM tailored for specific tasks, providing businesses with powerful yet efficient AI functionalities.\n",
- " \n",
- "3. **Document AI Tools**: These include Document Parsing, OCR (Optical Character Recognition), and Key Information Extraction, all designed to streamline document handling. Users can extract tables, figures, and text while enabling automation workflows for data management.\n",
- "\n",
- "4. **Task APIs**: Upstage offers several APIs, such as:\n",
- " - **Chat**: For building conversational agents.\n",
- " - **Translation**: Context-aware English-Korean translation.\n",
- " - **Embeddings**: For text-to-vector conversion.\n",
- " - **Groundedness Check**: Ensures AI-generated responses are based on accurate data.\n",
- " - **Text-to-SQL**: Converts natural language into SQL queries (coming soon).\n",
- "\n",
- "5. **Industry-Specific Intelligence**: Upstage is developing specialized APIs for industries such as healthcare, finance, and law, providing domain-specific AI solutions (coming soon).\n",
- "\n",
- "Upstage focuses on automating repetitive tasks, improving business decision-making, and providing generative business intelligence tools to increase efficiency across various sectors.\n",
- "\n",
- "\n",
- "\n"
- ]
- }
- ],
- "source": [
- "# write prompt\n",
- "prompt = marketing_prompt_teample.format_messages(\n",
- " target_company_info=target_info,\n",
- " product_info=product_info,\n",
- ")\n",
- "\n",
- "print(prompt[0].content)"
- ]
- },
- {
- "cell_type": "code",
- "execution_count": 10,
- "metadata": {},
- "outputs": [
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {
+ "id": "nx792nok8jwB"
+ },
+ "outputs": [],
+ "source": [
+ "! pip3 install -qU langchain langchain-upstage langchain_community python-dotenv firecrawl-py==0.0.20 marko"
+ ]
+ },
{
- "name": "stdout",
- "output_type": "stream",
- "text": [
- "Subject: Enhance Your Business Intelligence with UpstageAI's AI-Driven Solutions\n",
- "\n",
- "Dear [Target Company],\n",
- "\n",
- "I hope this email finds you well. I am reaching out to introduce you to UpstageAI, a leading provider of AI-driven tools and solutions designed to enhance productivity, automate tasks, and support decision-making for businesses.\n",
- "\n",
- "As a company that specializes in turning websites into clean, LLM-ready markdown, we believe that our suite of AI solutions can complement your existing offerings and help you provide even more value to your clients.\n",
- "\n",
- "Our core products include:\n",
- "\n",
- "1. **Solar Pro Preview**: A highly intelligent, single GPU-efficient large language model (LLM) suitable for complex tasks like data analysis and decision support.\n",
- "2. **Solar Mini**: A purpose-trained LLM tailored for specific tasks, providing powerful yet efficient AI functionalities.\n",
- "3. **Document AI Tools**: These include Document Parsing, OCR (Optical Character Recognition), and Key Information Extraction, streamlining document handling and enabling automation workflows for data management.\n",
- "4. **Task APIs**: We offer several APIs, such as Chat, Translation, Embeddings, Groundedness Check, and Text-to-SQL (coming soon), to automate repetitive tasks and improve decision-making.\n",
- "5. **Industry-Specific Intelligence**: We are developing specialized APIs for industries such as healthcare, finance, and law, providing domain-specific AI solutions (coming soon).\n",
- "\n",
- "By integrating our solutions with your existing services, you can offer your clients a more comprehensive and efficient data processing experience, making the most of the data you extract from websites.\n",
- "\n",
- "I would be delighted to discuss how UpstageAI's AI-driven tools and solutions can benefit your company and help you provide even more value to your clients. Please let me know if you are interested in scheduling a call to explore this opportunity further.\n",
- "\n",
- "Best regards,\n",
- "\n",
- "[Your Name]\n",
- "UpstageAI Marketing Specialist\n"
- ]
- }
- ],
- "source": [
- "llm = ChatUpstage(model=\"solar-1-mini-chat\")\n",
- "chain = marketing_prompt_teample | llm | StrOutputParser()\n",
- "\n",
- "email_msg = chain.invoke(\n",
- " {\n",
- " \"target_company_info\": target_info,\n",
- " \"product_info\": product_info,\n",
- " }\n",
- ")\n",
- "\n",
- "print(email_msg)"
- ]
- },
- {
- "cell_type": "code",
- "execution_count": 11,
- "metadata": {},
- "outputs": [
+ "cell_type": "markdown",
+ "metadata": {
+ "id": "UbYig4L-8jwC"
+ },
+ "source": [
+ "## UPSTAGE_API_KEY\n",
+ "To obtain your Upstage API key, follow these steps:\n",
+ "\n",
+ "1. Visit the Upstage AI console at .\n",
+ "2. Sign up for an account if you don't already have one.\n",
+ "3. Log in to your account.\n",
+ "4. Navigate to the API key section.\n",
+ "5. Generate your API key.\n",
+ "6. Copy the key and save it securely.\n",
+ "\n",
+ ""
+ ]
+ },
{
- "name": "stdout",
- "output_type": "stream",
- "text": [
- "Subject: Boost Your Business Productivity with UpstageAI's AI-Powered Tools\n",
- "\n",
- "---\n",
- "\n",
- "**Unlock Efficiency and Innovation with UpstageAI's AI Solutions**\n",
- "\n",
- "Dear [Target Company],\n",
- "\n",
- "As a marketing specialist at UpsageAI, I'm excited to introduce you to our cutting-edge AI-driven tools and solutions designed to streamline your operations, automate repetitive tasks, and enhance decision-making. Our product suite, tailored for businesses like yours, includes:\n",
- "\n",
- "1. **Solar Pro Preview:** A highly intelligent large language model (LLM) that runs efficiently on a single GPU, ideal for complex tasks like data analysis and decision support.\n",
- "\n",
- "2. **Solar Mini:** A purpose-trained LLM for specific tasks, providing powerful yet efficient AI functionalities.\n",
- "\n",
- "3. **Document AI Tools:** Streamline document handling with Document Parsing, OCR, and Key Information Extraction, enabling automation workflows for efficient data management.\n",
- "\n",
- "4. **Task APIs:** Leverage our APIs for conversational agents (Chat), context-aware English-Korean translation (Translation), text-to-vector conversion (Embeddings), and more.\n",
- "\n",
- "5. **Industry-Specific Intelligence:** Stay ahead with our upcoming specialized APIs for healthcare, finance, and law, providing domain-specific AI solutions.\n",
- "\n",
- "By integrating UpstageAI's offerings, your team can focus on high-value tasks while our tools handle the mundane, leading to increased productivity, cost savings, and better decision-making.\n",
- "\n",
- "To learn more about how UpstageAI can support your business, I invite you to schedule a demo with our product experts. Let's un\n"
+ "cell_type": "code",
+ "execution_count": 2,
+ "metadata": {
+ "id": "s-NjajHy8jwC"
+ },
+ "outputs": [],
+ "source": [
+ "# @title set API key\n",
+ "from pprint import pprint\n",
+ "import os\n",
+ "\n",
+ "import warnings\n",
+ "\n",
+ "warnings.filterwarnings(\"ignore\")\n",
+ "\n",
+ "if \"google.colab\" in str(get_ipython()):\n",
+ " # Running in Google Colab. Please set the UPSTAGE_API_KEY in the Colab Secrets\n",
+ " from google.colab import userdata\n",
+ "\n",
+ " os.environ[\"UPSTAGE_API_KEY\"] = userdata.get(\"UPSTAGE_API_KEY\")\n",
+ "else:\n",
+ " # Running locally. Please set the UPSTAGE_API_KEY in the .env file\n",
+ " from dotenv import load_dotenv\n",
+ "\n",
+ " load_dotenv()\n",
+ "\n",
+ "assert (\n",
+ " \"UPSTAGE_API_KEY\" in os.environ\n",
+ "), \"Please set the UPSTAGE_API_KEY environment variable\""
]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 3,
+ "metadata": {
+ "id": "JkX5iW2X8jwD",
+ "outputId": "44db4d87-71a6-49e0-9b67-d4538152cf26",
+ "colab": {
+ "base_uri": "https://localhost:8080/"
+ }
+ },
+ "outputs": [
+ {
+ "name": "stdout",
+ "output_type": "stream",
+ "text": [
+ "Enter your Firecrawl API key: ··········\n"
+ ]
+ }
+ ],
+ "source": [
+ "# We will use FireCrawl, https://python.langchain.com/docs/integrations/document_loaders/firecrawl/\n",
+ "# You will need to set FIRECRAWL_API_KEY. Learn mode https://firecrawl.dev/.\n",
+ "\n",
+ "\n",
+ "if \"FIRECRAWL_API_KEY\" not in os.environ:\n",
+ " os.environ[\"FIRECRAWL_API_KEY\"] = getpass.getpass(\"Enter your Firecrawl API key: \")"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 4,
+ "metadata": {
+ "id": "KoRUxT9O8jwD"
+ },
+ "outputs": [],
+ "source": [
+ "product_info = \"\"\"\n",
+ "UpstageAI provides AI-driven tools and solutions aimed at enhancing productivity, automating tasks, and assisting decision-making for businesses. Their core offerings include:\n",
+ "\n",
+ "1. **Solar Pro Preview**: A cutting-edge, highly intelligent large language model (LLM) that runs efficiently on a single GPU, designed to support complex tasks like data analysis and decision support. It's particularly suitable for businesses seeking advanced AI capabilities.\n",
+ "\n",
+ "2. **Solar Mini**: A purpose-trained LLM tailored for specific tasks, providing businesses with powerful yet efficient AI functionalities.\n",
+ "\n",
+ "3. **Document AI Tools**: These include Document Parsing, OCR (Optical Character Recognition), and Key Information Extraction, all designed to streamline document handling. Users can extract tables, figures, and text while enabling automation workflows for data management.\n",
+ "\n",
+ "4. **Task APIs**: Upstage offers several APIs, such as:\n",
+ " - **Chat**: For building conversational agents.\n",
+ " - **Translation**: Context-aware English-Korean translation.\n",
+ " - **Embeddings**: For text-to-vector conversion.\n",
+ " - **Groundedness Check**: Ensures AI-generated responses are based on accurate data.\n",
+ " - **Text-to-SQL**: Converts natural language into SQL queries (coming soon).\n",
+ "\n",
+ "5. **Industry-Specific Intelligence**: Upstage is developing specialized APIs for industries such as healthcare, finance, and law, providing domain-specific AI solutions (coming soon).\n",
+ "\n",
+ "Upstage focuses on automating repetitive tasks, improving business decision-making, and providing generative business intelligence tools to increase efficiency across various sectors.\n",
+ "\"\"\""
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 5,
+ "metadata": {
+ "id": "8QmG12bW8jwE"
+ },
+ "outputs": [],
+ "source": [
+ "from langchain_community.document_loaders import FireCrawlLoader\n",
+ "\n",
+ "target_page = \"https://www.firecrawl.dev/\"\n",
+ "loader = FireCrawlLoader(url=target_page, mode=\"scrape\")\n",
+ "docs = loader.load()"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 6,
+ "metadata": {
+ "id": "Op9kJV0Y8jwE",
+ "outputId": "789dc502-8f5f-49b6-9daa-fe0f61fc040a",
+ "colab": {
+ "base_uri": "https://localhost:8080/"
+ }
+ },
+ "outputs": [
+ {
+ "output_type": "stream",
+ "name": "stdout",
+ "text": [
+ "{'description': 'Firecrawl crawls and converts any website into clean '\n",
+ " 'markdown.',\n",
+ " 'keywords': 'Firecrawl,Markdown,Data,Mendable,Langchain',\n",
+ " 'language': 'en',\n",
+ " 'ogDescription': 'Turn any website into LLM-ready data.',\n",
+ " 'ogImage': 'https://www.firecrawl.dev/og.png?123',\n",
+ " 'ogLocaleAlternate': [],\n",
+ " 'ogSiteName': 'Firecrawl',\n",
+ " 'ogTitle': 'Firecrawl',\n",
+ " 'ogUrl': 'https://www.firecrawl.dev/',\n",
+ " 'pageStatusCode': 200,\n",
+ " 'robots': 'follow, index',\n",
+ " 'sitemap': {'changefreq': 'weekly', 'lastmod': '2024-10-03T20:51:47.851Z'},\n",
+ " 'sourceURL': 'https://www.firecrawl.dev/',\n",
+ " 'title': 'Home - Firecrawl'}\n"
+ ]
+ }
+ ],
+ "source": [
+ "pprint(docs[0].metadata)"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 7,
+ "metadata": {
+ "id": "ep0Vm1E58jwF",
+ "outputId": "5788565e-42cc-4f60-d2e6-7b40171e3969",
+ "colab": {
+ "base_uri": "https://localhost:8080/"
+ }
+ },
+ "outputs": [
+ {
+ "output_type": "stream",
+ "name": "stdout",
+ "text": [
+ "('Our first Launch Week is over! Beta Features Resources 15.3k Turn '\n",
+ " 'websites into LLM-ready d')\n"
+ ]
+ }
+ ],
+ "source": [
+ "import marko\n",
+ "from marko.inline import RawText, Emphasis, CodeSpan, Link, Image\n",
+ "import re\n",
+ "\n",
+ "\n",
+ "def extract_clean_text(markdown_text):\n",
+ " # Parse the markdown into an AST\n",
+ " ast = marko.parse(markdown_text)\n",
+ "\n",
+ " clean_text = []\n",
+ "\n",
+ " def traverse(node):\n",
+ " # If the node is a RawText, append its content\n",
+ " if isinstance(node, RawText):\n",
+ " clean_text.append(node.children)\n",
+ " # If the node has children, recurse into them unless it's a link or image\n",
+ " elif hasattr(node, \"children\") and not isinstance(node, (Link, Image)):\n",
+ " for child in node.children:\n",
+ " traverse(child)\n",
+ " # For other node types that contain text (e.g., Emphasis, CodeSpan), handle accordingly\n",
+ " elif isinstance(node, (Emphasis, CodeSpan)):\n",
+ " traverse(node.children if hasattr(node, \"children\") else node)\n",
+ " # You can add more conditions here if you want to handle other specific node types\n",
+ "\n",
+ " # Start traversing from the root of the AST\n",
+ " traverse(ast)\n",
+ "\n",
+ " # Join all extracted text parts with spaces and clean up whitespace\n",
+ " text = \" \".join(clean_text).strip()\n",
+ "\n",
+ " # Remove URLs\n",
+ " text = re.sub(r\"http[s]?://\\S+\", \"\", text)\n",
+ " text = re.sub(r\"\\bwww\\.\\S+\", \"\", text)\n",
+ "\n",
+ " return text\n",
+ "\n",
+ "\n",
+ "# Use the function to extract clean text\n",
+ "target_info = extract_clean_text(docs[0].page_content)\n",
+ "\n",
+ "# Print the cleaned text\n",
+ "pprint(target_info[:100])"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 8,
+ "metadata": {
+ "id": "355CGYNe8jwF"
+ },
+ "outputs": [],
+ "source": [
+ "from langchain_core.prompts import ChatPromptTemplate\n",
+ "from langchain_core.output_parsers import StrOutputParser\n",
+ "from langchain_upstage import ChatUpstage\n",
+ "\n",
+ "marketing_prompt_teample = ChatPromptTemplate.from_messages(\n",
+ " [\n",
+ " (\n",
+ " \"human\",\n",
+ " \"\"\"You are a marketing specialist at UpsatgeAI selling the product.\n",
+ "Write a convincing and professional email message to a target company to sell your product.\n",
+ "Use the provided product information and details about the target company to personalize the message effectively.\n",
+ "Make your message relevant to the target company.\n",
+ "\n",
+ "---\n",
+ "**Requirements:**\n",
+ "- The email should have a clear subject line.\n",
+ "- Highlight how your product can solve specific problems or add value to the target company.\n",
+ "- Maintain a professional and engaging tone.\n",
+ "- Include a call to action encouraging the recipient to take the next step.\n",
+ "\n",
+ "---\n",
+ "**Target Company Information to Email:**\n",
+ "{target_company_info}\n",
+ "---\n",
+ "**Your Product to Sell:**\n",
+ "{product_info}\n",
+ "\n",
+ "\"\"\",\n",
+ " )\n",
+ " ]\n",
+ ")"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 9,
+ "metadata": {
+ "id": "THPNTWUZ8jwG",
+ "outputId": "cf84650e-fbc4-48e3-ac8f-a84f787d3689",
+ "colab": {
+ "base_uri": "https://localhost:8080/"
+ }
+ },
+ "outputs": [
+ {
+ "output_type": "stream",
+ "name": "stdout",
+ "text": [
+ "You are a marketing specialist at UpsatgeAI selling the product.\n",
+ "Write a convincing and professional email message to a target company to sell your product.\n",
+ "Use the provided product information and details about the target company to personalize the message effectively.\n",
+ "Make your message relevant to the target company.\n",
+ "\n",
+ "---\n",
+ "**Requirements:**\n",
+ "- The email should have a clear subject line.\n",
+ "- Highlight how your product can solve specific problems or add value to the target company.\n",
+ "- Maintain a professional and engaging tone.\n",
+ "- Include a call to action encouraging the recipient to take the next step.\n",
+ "\n",
+ "--- \n",
+ "**Target Company Information to Email:** \n",
+ "Our first Launch Week is over! Beta Features Resources 15.3k Turn websites into LLM-ready data Power your AI apps with clean data crawled from any website. It's also open-source. Start for free (500 credits)Start for free A product by Crawl, Scrape, Clean We crawl all accessible subpages and give you clean markdown for each. No sitemap required. \n",
+ " [\\\n",
+ " {\\\n",
+ " \"url\": \"\n",
+ " \"markdown\": \"## Welcome to Firecrawl\\\n",
+ " Firecrawl is a web scraper that allows you to extract the content of a webpage.\"\\\n",
+ " },\\\n",
+ " {\\\n",
+ " \"url\": \"\n",
+ " \"markdown\": \"## Features\\\n",
+ " Discover how Firecrawl's cutting-edge features can\\\n",
+ " transform your data operations.\"\\\n",
+ " },\\\n",
+ " {\\\n",
+ " \"url\": \"\n",
+ " \"markdown\": \"## Pricing Plans\\\n",
+ " Choose the perfect plan that fits your needs.\"\\\n",
+ " },\\\n",
+ " {\\\n",
+ " \"url\": \"\n",
+ " \"markdown\": \"## About Us\\\n",
+ " Learn more about Firecrawl's mission and the\\\n",
+ " team behind our innovative platform.\"\\\n",
+ " }\\\n",
+ " ]\n",
+ "\n",
+ " Note: The markdown has been edited for display purposes. Trusted by Top Companies Integrate today Enhance your applications with top-tier web scraping and crawling capabilities. Scrape Extract markdown or structured data from websites quickly and efficiently. Crawling Navigate and retrieve data from all accessible subpages, even without a sitemap. Node.js Python cURL 1\n",
+ " 2\n",
+ " 3\n",
+ " 4\n",
+ " 5\n",
+ " 6\n",
+ " 7\n",
+ " 8\n",
+ " 9\n",
+ " 10\n",
+ " // npm install @mendable/firecrawl-js\n",
+ "\n",
+ "import FirecrawlApp from '@mendable/firecrawl-js';\n",
+ "\n",
+ "const app = new FirecrawlApp({ apiKey: \"fc-YOUR_API_KEY\" });\n",
+ "\n",
+ "// Scrape a website:\n",
+ "const scrapeResult = await app.scrapeUrl('firecrawl.dev');\n",
+ "\n",
+ "console.log(scrapeResult.data.markdown)\n",
+ " Use well-known tools Already fully integrated with the greatest existing tools and workflows. Start for free, scale easily Kick off your journey for free and scale seamlessly as your project expands. Open-source Developed transparently and collaboratively. Join our community of contributors. We handle the hard stuff Rotating proxies, orchestration, rate limits, js-blocked content and more Crawling Firecrawl crawls all accessible subpages, even without a sitemap. Dynamic content Firecrawl gathers data even if a website uses javascript to render content. To Markdown Firecrawl returns clean, well formatted markdown - ready for use in LLM applications Reliability first Reliability is our core focus. Firecrawl is designed to ensure you get all the data you need. No Caching Firecrawl doesn't cache content by default. You always get the latest data. Built for AI Built by LLM engineers, for LLM engineers. Giving you clean data the way you want it. Our wall of love Don't take our word for it Greg Kamradt LLM structured data via API, handling requests, cleaning, and crawling. Enjoyed the early preview. Amit Naik #llm success with RAG relies on Retrieval. Firecrawl by @mendableai structures web content for processing. 👏 Jerry Liu Firecrawl is awesome 🔥 Turns web pages into structured markdown for LLM apps, thanks to @mendableai. Bardia Pourvakil These guys ship. I wanted types for their node SDK, and less than an hour later, I got them. Can't recommend them enough. latentsauce 🧘🏽 Firecrawl simplifies data preparation significantly, exactly what I was hoping for. Thank you for creating Firecrawl ❤️❤️❤️ Greg Kamradt LLM structured data via API, handling requests, cleaning, and crawling. Enjoyed the early preview. Amit Naik #llm success with RAG relies on Retrieval. Firecrawl by @mendableai structures web content for processing. 👏 Jerry Liu Firecrawl is awesome 🔥 Turns web pages into structured markdown for LLM apps, thanks to @mendableai. Bardia Pourvakil These guys ship. I wanted types for their node SDK, and less than an hour later, I got them. Can't recommend them enough. latentsauce 🧘🏽 Firecrawl simplifies data preparation significantly, exactly what I was hoping for. Thank you for creating Firecrawl ❤️❤️❤️ Michael Ning Firecrawl is impressive, saving us 2/3 the tokens and allowing gpt3.5turbo use over gpt4. Major savings in time and money. Alex Reibman 🖇️ Moved our internal agent's web scraping tool from Apify to Firecrawl because it benchmarked 50x faster with AgentOps. Michael @michael chomsky I really like some of the design decisions Firecrawl made, so I really want to share with others. Paul Scott Appreciating your lean approach, Firecrawl ticks off everything on our list without the cost prohibitive overkill. Michael Ning Firecrawl is impressive, saving us 2/3 the tokens and allowing gpt3.5turbo use over gpt4. Major savings in time and money. Alex Reibman 🖇️ Moved our internal agent's web scraping tool from Apify to Firecrawl because it benchmarked 50x faster with AgentOps. Michael @michael chomsky I really like some of the design decisions Firecrawl made, so I really want to share with others. Paul Scott Appreciating your lean approach, Firecrawl ticks off everything on our list without the cost prohibitive overkill. Flexible Pricing Start for free, then scale as you grow Yearly (17% off)Yearly (2 months free)Monthly Free Plan 500 credits $0 one-time Scrape 500 pages 10 /scrape per min 1 /crawl per min Get Started Hobby 3,000 credits $16/month Billed annually Scrape 3,000 pages 20 /scrape per min 3 /crawl per min Subscribe StandardMost Popular 100,000 credits $83/month Billed annually Scrape 100,000 pages 100 /scrape per min 10 /crawl per min 2 seats Subscribe Growth 500,000 credits $333/month Billed annually Scrape 500,000 pages 1000 /scrape per min 50 /crawl per min 4 seats Priority Support Subscribe Enterprise Plan Unlimited credits. Custom RPMs. Talk to us Top priority support Feature Acceleration SLAs Account Manager Custom rate limits volume Custom concurrency limits Custom seats CEO's number * a /scrape refers to the API endpoint. Structured extraction costs vary. See . * a /crawl refers to the API endpoint. API Credits Credits are consumed for each API request, varying by endpoint and feature. | Features | Credits | | --- | --- | | Scrape(/scrape) | 1 / page | | Crawl(/crawl) | 1 / page | | Map(/map) | 1 / call | | Search(/search) | 1 / page | | Scrape + LLM extraction(/scrape) | 5 / page | Ready to Build? Start scraping web data for your AI apps today. No credit card needed. FAQ Frequently asked questions about Firecrawl General What is Firecrawl? Firecrawl turns entire websites into clean, LLM-ready markdown or structured data. Scrape, crawl and extract the web with a single API. Ideal for AI companies looking to empower their LLM applications with web data. What sites work? Firecrawl is best suited for business websites, docs and help centers. We currently don't support social media platforms. Who can benefit from using Firecrawl? Firecrawl is tailored for LLM engineers, data scientists, AI researchers, and developers looking to harness web data for training machine learning models, market research, content aggregation, and more. It simplifies the data preparation process, allowing professionals to focus on insights and model development. Is Firecrawl open-source? Yes, it is. You can check out the repository on GitHub. Keep in mind that this repository is currently in its early stages of development. We are in the process of merging custom modules into this mono repository. Scraping & Crawling How does Firecrawl handle dynamic content on websites? Unlike traditional web scrapers, Firecrawl is equipped to handle dynamic content rendered with JavaScript. It ensures comprehensive data collection from all accessible subpages, making it a reliable tool for scraping websites that rely heavily on JS for content delivery. Why is it not crawling all the pages? There are a few reasons why Firecrawl may not be able to crawl all the pages of a website. Some common reasons include rate limiting, and anti-scraping mechanisms, disallowing the crawler from accessing certain pages. If you're experiencing issues with the crawler, please reach out to our support team at help@firecrawl.com. Can Firecrawl crawl websites without a sitemap? Yes, Firecrawl can access and crawl all accessible subpages of a website, even in the absence of a sitemap. This feature enables users to gather data from a wide array of web sources with minimal setup. What formats can Firecrawl convert web data into? Firecrawl specializes in converting web data into clean, well-formatted markdown. This format is particularly suited for LLM applications, offering a structured yet flexible way to represent web content. How does Firecrawl ensure the cleanliness of the data? Firecrawl employs advanced algorithms to clean and structure the scraped data, removing unnecessary elements and formatting the content into readable markdown. This process ensures that the data is ready for use in LLM applications without further preprocessing. Is Firecrawl suitable for large-scale data scraping projects? Absolutely. Firecrawl offers various pricing plans, including a Scale plan that supports scraping of millions of pages. With features like caching and scheduled syncs, it's designed to efficiently handle large-scale data scraping and continuous updates, making it ideal for enterprises and large projects. Does it respect robots.txt? Yes, Firecrawl crawler respects the rules set in a website's robots.txt file. If you notice any issues with the way Firecrawl interacts with your website, you can adjust the robots.txt file to control the crawler's behavior. Firecrawl user agent name is 'FirecrawlAgent'. If you notice any behavior that is not expected, please let us know at help@firecrawl.com. What measures does Firecrawl take to handle web scraping challenges like rate limits and caching? Firecrawl is built to navigate common web scraping challenges, including reverse proxies, rate limits, and caching. It smartly manages requests and employs caching techniques to minimize bandwidth usage and avoid triggering anti-scraping mechanisms, ensuring reliable data collection. Does Firecrawl handle captcha or authentication? Firecrawl avoids captcha by using stealth proxyies. When it encounters captcha, it attempts to solve it automatically, but this is not always possible. We are working to add support for more captcha solving methods. Firecrawl can handle authentication by providing auth headers to the API. API Related Where can I find my API key? Click on the dashboard button on the top navigation menu when logged in and you will find your API key in the main screen and under API Keys. Billing Is Firecrawl free? Firecrawl is free for the first 500 scraped pages (500 free credits). After that, you can upgrade to our Standard or Scale plans for more credits. Is there a pay per use plan instead of monthly? No we do not currently offer a pay per use plan, instead you can upgrade to our Standard or Growth plans for more credits and higher rate limits. How many credit does scraping, crawling, and extraction cost? Scraping costs 1 credit per page. Crawling costs 1 credit per page. Do you charge for failed requests (scrape, crawl, extract)? We do not charge for any failed requests (scrape, crawl, extract). Please contact support at caleb@firecrawl.com if you have any questions. What payment methods do you accept? We accept payments through Stripe which accepts most major credit cards, debit cards, and PayPal. © A product by Mendable.ai - All rights reserved. Helpful Links Backed by Resources Legals\n",
+ "--- \n",
+ "**Your Product to Sell:** \n",
+ "\n",
+ "UpstageAI provides AI-driven tools and solutions aimed at enhancing productivity, automating tasks, and assisting decision-making for businesses. Their core offerings include:\n",
+ "\n",
+ "1. **Solar Pro Preview**: A cutting-edge, highly intelligent large language model (LLM) that runs efficiently on a single GPU, designed to support complex tasks like data analysis and decision support. It's particularly suitable for businesses seeking advanced AI capabilities.\n",
+ " \n",
+ "2. **Solar Mini**: A purpose-trained LLM tailored for specific tasks, providing businesses with powerful yet efficient AI functionalities.\n",
+ " \n",
+ "3. **Document AI Tools**: These include Document Parsing, OCR (Optical Character Recognition), and Key Information Extraction, all designed to streamline document handling. Users can extract tables, figures, and text while enabling automation workflows for data management.\n",
+ "\n",
+ "4. **Task APIs**: Upstage offers several APIs, such as:\n",
+ " - **Chat**: For building conversational agents.\n",
+ " - **Translation**: Context-aware English-Korean translation.\n",
+ " - **Embeddings**: For text-to-vector conversion.\n",
+ " - **Groundedness Check**: Ensures AI-generated responses are based on accurate data.\n",
+ " - **Text-to-SQL**: Converts natural language into SQL queries (coming soon).\n",
+ "\n",
+ "5. **Industry-Specific Intelligence**: Upstage is developing specialized APIs for industries such as healthcare, finance, and law, providing domain-specific AI solutions (coming soon).\n",
+ "\n",
+ "Upstage focuses on automating repetitive tasks, improving business decision-making, and providing generative business intelligence tools to increase efficiency across various sectors.\n",
+ "\n",
+ "\n",
+ "\n"
+ ]
+ }
+ ],
+ "source": [
+ "# write prompt\n",
+ "prompt = marketing_prompt_teample.format_messages(\n",
+ " target_company_info=target_info,\n",
+ " product_info=product_info,\n",
+ ")\n",
+ "\n",
+ "print(prompt[0].content)"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 10,
+ "metadata": {
+ "id": "rBT_VyHc8jwG",
+ "outputId": "4356d3d7-da5f-4ee3-e1ab-5937bbca6fa9",
+ "colab": {
+ "base_uri": "https://localhost:8080/"
+ }
+ },
+ "outputs": [
+ {
+ "output_type": "stream",
+ "name": "stdout",
+ "text": [
+ "Subject: Transform Your Business with UpstageAI's AI-Driven Solutions\n",
+ "\n",
+ "Dear [Target Company],\n",
+ "\n",
+ "I hope this email finds you well. I am writing to introduce you to UpstageAI, a company that specializes in providing AI-driven tools and solutions designed to enhance productivity, automate tasks, and assist decision-making for businesses.\n",
+ "\n",
+ "As a marketing specialist at UpstageAI, I have been thoroughly impressed by our product offerings and believe they could significantly benefit your company. Our core products include:\n",
+ "\n",
+ "1. **Solar Pro Preview**: A highly intelligent Large Language Model (LLM) that runs efficiently on a single GPU, designed to support complex tasks like data analysis and decision support.\n",
+ "2. **Solar Mini**: A purpose-trained LLM tailored for specific tasks, providing businesses with powerful yet efficient AI functionalities.\n",
+ "3. **Document AI Tools**: Including Document Parsing, OCR (Optical Character Recognition), and Key Information Extraction, all designed to streamline document handling.\n",
+ "4. **Task APIs**: Our APIs include Chat, Translation, Embeddings, Groundedness Check, and Text-to-SQL (coming soon), all aimed at automating repetitive tasks and improving business decision-making.\n",
+ "5. **Industry-Specific Intelligence**: Upstage is developing specialized APIs for industries such as healthcare, finance, and law, providing domain-specific AI solutions (coming soon).\n",
+ "\n",
+ "I understand that your company is focused on turning websites into LLM-ready data and powering AI apps with clean data crawled from any website. Our AI-driven solutions can complement your existing offerings and help you streamline your processes, enabling you to provide even more value to your clients.\n",
+ "\n",
+ "I would be delighted to discuss how UpstageAI's solutions can help your company achieve its goals. Please let me know if you would be interested in a demonstration or if you have any questions.\n",
+ "\n",
+ "Thank you for considering UpstageAI. I look forward to hearing from you soon.\n",
+ "\n",
+ "Best regards,\n",
+ "\n",
+ "[Your Name]\n",
+ "Marketing Specialist, UpstageAI\n"
+ ]
+ }
+ ],
+ "source": [
+ "llm = ChatUpstage(model=\"solar-1-mini-chat\")\n",
+ "chain = marketing_prompt_teample | llm | StrOutputParser()\n",
+ "\n",
+ "email_msg = chain.invoke(\n",
+ " {\n",
+ " \"target_company_info\": target_info,\n",
+ " \"product_info\": product_info,\n",
+ " }\n",
+ ")\n",
+ "\n",
+ "print(email_msg)"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 13,
+ "metadata": {
+ "id": "-tRD37TI8jwG",
+ "outputId": "cdd40302-b662-4f14-d0d1-f7cf84f1f778",
+ "colab": {
+ "base_uri": "https://localhost:8080/"
+ }
+ },
+ "outputs": [
+ {
+ "output_type": "stream",
+ "name": "stdout",
+ "text": [
+ "Subject: Transform Your Business Processes with UpstageAI's AI-Powered Solutions\n",
+ "\n",
+ "Dear [Target Company Name],\n",
+ "\n",
+ "I hope this message finds you well. As a marketing specialist at UpstageAI, I'm excited to introduce our groundbreaking AI-driven tools and solutions designed to significantly enhance productivity, automate tasks, and assist decision-making for businesses like yours.\n",
+ "\n",
+ "UpstageAI's Solar Pro Preview, a cutting-edge large language model (LLM) running efficiently on a single GPU, is perfect for businesses seeking advanced AI capabilities to support complex tasks like data analysis and decision support. This tool can help your team make informed decisions faster and more accurately.\n",
+ "\n",
+ "For specific task automation, consider our Solar Mini, a purpose-trained LLM that offers powerful yet efficient AI functionalities tailored to your unique needs. This solution can help you streamline operations, improve efficiency, and reduce costs.\n",
+ "\n",
+ "Our Document AI Tools, including Document Parsing, OCR, and Key Information Extraction, are designed to revolutionize your document handling processes. With these tools, you can extract tables, figures, and text while enabling automation workflows for data management. This can save your team countless hours spent on manual document handling, allowing them to focus on higher-value tasks.\n",
+ "\n",
+ "UpstageAI's Task APIs, such as Chat, Translation, Embeddings, Groundedness Check, and Text-to-SQL, offer additional ways to automate tasks and improve decision-making. For instance, our Chat API enables you to build conversational agents, improving customer interactions and support. Our Translation API provides context-aware English-Korean translation, facilitating smoother communication with international partners or customers.\n",
+ "\n",
+ "Moreover, UpstageAI is committed to providing industry-\n"
+ ]
+ }
+ ],
+ "source": [
+ "llm = ChatUpstage(model=\"solar-pro\")\n",
+ "chain = marketing_prompt_teample | llm | StrOutputParser()\n",
+ "\n",
+ "email_msg = chain.invoke(\n",
+ " {\n",
+ " \"target_company_info\": target_info,\n",
+ " \"product_info\": product_info,\n",
+ " }\n",
+ ")\n",
+ "\n",
+ "print(email_msg)"
+ ]
+ }
+ ],
+ "metadata": {
+ "kernelspec": {
+ "display_name": "Python 3 (ipykernel)",
+ "language": "python",
+ "name": "python3"
+ },
+ "language_info": {
+ "codemirror_mode": {
+ "name": "ipython",
+ "version": 3
+ },
+ "file_extension": ".py",
+ "mimetype": "text/x-python",
+ "name": "python",
+ "nbconvert_exporter": "python",
+ "pygments_lexer": "ipython3",
+ "version": "3.9.6"
+ },
+ "colab": {
+ "provenance": []
}
- ],
- "source": [
- "llm = ChatUpstage(model=\"solar-pro\")\n",
- "chain = marketing_prompt_teample | llm | StrOutputParser()\n",
- "\n",
- "email_msg = chain.invoke(\n",
- " {\n",
- " \"target_company_info\": target_info,\n",
- " \"product_info\": product_info,\n",
- " }\n",
- ")\n",
- "\n",
- "print(email_msg)"
- ]
- }
- ],
- "metadata": {
- "kernelspec": {
- "display_name": "Python 3 (ipykernel)",
- "language": "python",
- "name": "python3"
},
- "language_info": {
- "codemirror_mode": {
- "name": "ipython",
- "version": 3
- },
- "file_extension": ".py",
- "mimetype": "text/x-python",
- "name": "python",
- "nbconvert_exporter": "python",
- "pygments_lexer": "ipython3",
- "version": "3.9.6"
- }
- },
- "nbformat": 4,
- "nbformat_minor": 4
+ "nbformat": 4,
+ "nbformat_minor": 0
}