使用 Colaboratory 建立

JosephBless · JosephBless · commit 2e2d55248cb9 · 2021-10-25T08:16:40.000+08:00
diff --git a/W&B_Artifacts_API_Demo.ipynb b/W&B_Artifacts_API_Demo.ipynb
@@ -0,0 +1,213 @@
+{
+  "nbformat": 4,
+  "nbformat_minor": 0,
+  "metadata": {
+    "colab": {
+      "name": "W&B Artifacts API Demo",
+      "private_outputs": true,
+      "provenance": [],
+      "collapsed_sections": [],
+      "toc_visible": true,
+      "include_colab_link": true
+    },
+    "kernelspec": {
+      "name": "python3",
+      "display_name": "Python 3"
+    }
+  },
+  "cells": [
+    {
+      "cell_type": "markdown",
+      "metadata": {
+        "id": "view-in-github",
+        "colab_type": "text"
+      },
+      "source": [
+        "<a href=\"https://colab.research.google.com/github/JosephBless/DL/blob/main/W%26B_Artifacts_API_Demo.ipynb\" target=\"_parent\"><img src=\"https://colab.research.google.com/assets/colab-badge.svg\" alt=\"Open In Colab\"/></a>"
+      ]
+    },
+    {
+      "cell_type": "markdown",
+      "metadata": {
+        "id": "hpPsCoeioB9F"
+      },
+      "source": [
+        "# W&B Artifacts API Demo\n",
+        "\n",
+        "In this notebook, we'll show you how to use W&B Artifacts to keep track of dataset versions in Weights & Biases.\n",
+        "\n",
+        "### How it works\n",
+        " Using our Artifacts API, you can log artifacts as outputs of W&B runs, or use artifacts as input to runs.\n",
+        " \n",
+        "\n",
+        "<img src=\"https://i.imgur.com/Pw8luzD.png\" width=\"500\">\n",
+        "\n",
+        "Since a run can use another run’s output artifact as input, artifacts and runs together form a directed graph. You don’t need to define pipelines ahead of time. Just use and log artifacts, and we’ll stitch everything together."
+      ]
+    },
+    {
+      "cell_type": "markdown",
+      "metadata": {
+        "id": "G03d9HSHuKF9"
+      },
+      "source": [
+        "## 1. Install Weights & Biases"
+      ]
+    },
+    {
+      "cell_type": "code",
+      "metadata": {
+        "id": "Ny-d-pHrnrui"
+      },
+      "source": [
+        "# Install Weights & Biases\n",
+        "!pip install wandb -qqq\n",
+        "\n",
+        "import os\n",
+        "import random\n",
+        "import wandb"
+      ],
+      "execution_count": null,
+      "outputs": []
+    },
+    {
+      "cell_type": "markdown",
+      "metadata": {
+        "id": "4PE2_FgKodSi"
+      },
+      "source": [
+        "## 2. Create an Artifact\n",
+        "\n",
+        "Call `wandb.init` to initialize a new run. Use a run to track any script in your pipeline— anything from training and evaluation to scraping and preprocessing data. Specify what type of run it is in the **job_type**, and you'll be able to filter and group based on **job_type** in the web interface.\n",
+        "\n",
+        "`wandb.Artifact()`\n",
+        "- **name**: The name of this dataset, model, or other set of files. The files inside can change, but the name stays the same across versions.\n",
+        "- **type**: This helps you group together top-level artifacts, like \"dataset\" and \"model\".\n",
+        "- **metadata**: This dictionary lets you track class distributions, UUIDs, and any important details you want to save and search over later."
+      ]
+    },
+    {
+      "cell_type": "code",
+      "metadata": {
+        "id": "tHvfBXS0oig1"
+      },
+      "source": [
+        "# This will create a new run in the W&B database.\n",
+        "# Init starts tracking stdout/stderr and system metrics automatically.\n",
+        "run = wandb.init(project='artifacts-demo', job_type='producer')\n",
+        "\n",
+        "# Create an Artifact. Give it a name and a type. Type is used for\n",
+        "# organizational purposes and should typically be \"dataset\" or \"model\".\n",
+        "artifact = wandb.Artifact('mountain-view-scenes', type='dataset', metadata={})\n",
+        "\n",
+        "# Store a new file in the artifact\n",
+        "with artifact.new_file('overview.txt') as f:\n",
+        "    f.write('This is my first artifact!\\n')\n",
+        "\n",
+        "# Save the artifact to W&B. It will be tracked as output of the current run\n",
+        "# and appended to the Artifact Sequence called 'hello artifacts dataset'.\n",
+        "run.log_artifact(artifact)"
+      ],
+      "execution_count": null,
+      "outputs": []
+    },
+    {
+      "cell_type": "markdown",
+      "metadata": {
+        "id": "MLhTfDMNokk7"
+      },
+      "source": [
+        "Open [wandb.ai](https://app.wandb.ai/home) and click on your latest run to see the artifact tab appear on the left sidebar."
+      ]
+    },
+    {
+      "cell_type": "markdown",
+      "metadata": {
+        "id": "ihqE_Qn7pK1g"
+      },
+      "source": [
+        "## 3. Use that Artifact\n",
+        "Next, let's use that artifact as input to another run."
+      ]
+    },
+    {
+      "cell_type": "code",
+      "metadata": {
+        "id": "ZKpridn_pczr"
+      },
+      "source": [
+        "# Start a new run\n",
+        "run = wandb.init(project='artifacts-demo', job_type='consumer')\n",
+        "\n",
+        "# We'll use the artifact we created in the previous run as input to this run.\n",
+        "# This will fetch the latest entry in the 'hello artifacts dataset' Artifact Sequence\n",
+        "artifact = run.use_artifact('mountain-view-scenes:latest')\n",
+        "\n",
+        "# Download all of the files contained in the artifact.\n",
+        "artifact_dir = artifact.download()\n",
+        "\n",
+        "# Let's take a look at the downloaded files contents\n",
+        "print(open(os.path.join(artifact_dir, 'overview.txt')).read())"
+      ],
+      "execution_count": null,
+      "outputs": []
+    },
+    {
+      "cell_type": "markdown",
+      "metadata": {
+        "id": "o1GXQwx3rU0G"
+      },
+      "source": [
+        "Let's log an output artifact too— in this example it's just a fake model file."
+      ]
+    },
+    {
+      "cell_type": "code",
+      "metadata": {
+        "id": "oFchQuvVrY9T"
+      },
+      "source": [
+        "artifact = wandb.Artifact('run-%s-model' % run.id, type='model')\n",
+        "\n",
+        "# This time we'll use artifact.add_file, to add a file that already exists.\n",
+        "f = open('mymodel.txt', 'w')\n",
+        "f.write('This is a really awesome trained model: %s' % random.random())\n",
+        "f.close()\n",
+        "\n",
+        "artifact.add_file('mymodel.txt')\n",
+        "run.log_artifact(artifact)\n",
+        "\n",
+        "# end the current run\n",
+        "run.finish()"
+      ],
+      "execution_count": null,
+      "outputs": []
+    },
+    {
+      "cell_type": "markdown",
+      "metadata": {
+        "id": "Z9lxlSqnsY8I"
+      },
+      "source": [
+        "Now you can navigate to your project page (linked above), and then click on the artifacts tab, to dig into all the artifacts you've created so far.\n",
+        "\n",
+        "If you click through to an artifact, and then click on the \"Graph\" tab, you'll see a visualization that shows how your artifacts and runs are related to each other."
+      ]
+    },
+    {
+      "cell_type": "markdown",
+      "metadata": {
+        "id": "J3JeRwj_2oG-"
+      },
+      "source": [
+        "## Documentation\n",
+        "\n",
+        "For more details, [see the docs →](https://docs.wandb.com/artifacts)\n",
+        "- Storing directories in artifacts\n",
+        "- Referring to external data using references\n",
+        "- Automatic file and artifact deduplication\n",
+        "- Best practices for dataset versioning and model management"
+      ]
+    }
+  ]
+}