diff --git a/quantization/int8/introduction-to-weight-quantization.ipynb b/quantization/int8/introduction-to-weight-quantization.ipynb new file mode 100644 index 0000000..b2e255a --- /dev/null +++ b/quantization/int8/introduction-to-weight-quantization.ipynb @@ -0,0 +1 @@ +{"cells":[{"source":"\"Kaggle\"","metadata":{},"cell_type":"markdown"},{"cell_type":"markdown","id":"b36dc721","metadata":{"papermill":{"duration":0.008326,"end_time":"2024-02-15T23:59:56.861498","exception":false,"start_time":"2024-02-15T23:59:56.853172","status":"completed"},"tags":[]},"source":["# Overview\n","\n","**Note: all the images are from the sources in the Credit section or from the internet.**\n","\n","Typically, the size of a model is calculated by multiplying the number of parameters (**size**) by the precision of these values (**data type**). However, to save memory, weights can be stored using lower-precision data types through a process known as quantization.\n","\n","We distinguish two main families of weight quantization techniques in the literature:\n","\n","**Post-Training Quantization (PTQ)**\n","\n","A straightforward technique where the weights of an already trained model are converted to lower precision without any retraining. Although easy to implement, PTQ is associated with potential performance degradation. For more examples of post-training quantization, see [Post-Training Quantization](https://www.kaggle.com/code/aisuko/post-training-quantization-methods/notebook).\n","\n","**Quantization-Aware Training (QAT)**\n","\n","It incorporates the weight conversion process during the pre-training or fine-tuning stage, resulting in enhanced model performance. However, QAT is computationally expensive and demands representative training data.\n","\n","Here, we focus on PTQ to reduce the precision of our parameters.\n"]},
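{"cell_type":"markdown","metadata":{},"source":["As a quick sanity check of the parameters-times-precision rule, the cell below estimates the size of a hypothetical 124M-parameter model (roughly GPT-2 small) at different precisions. These are back-of-the-envelope figures, not exact memory footprints.\n"]},{"cell_type":"code","metadata":{},"execution_count":null,"outputs":[],"source":["# Rough model size: number of parameters x bytes per parameter.\n","# 124M parameters is roughly GPT-2 small; treat these as estimates only.\n","n_params=124_000_000\n","bytes_per_param={\"FP32\":4,\"FP16/BF16\":2,\"INT8\":1}\n","\n","for dtype,nbytes in bytes_per_param.items():\n","    print(f\"{dtype}: {n_params*nbytes/1024**2:.1f} MB\")"]},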
{"cell_type":"markdown","id":"36b59bb9","metadata":{"papermill":{"duration":0.007493,"end_time":"2024-02-15T23:59:56.876718","exception":false,"start_time":"2024-02-15T23:59:56.869225","status":"completed"},"tags":[]},"source":["# Background on Floating Point representation\n","\n","The choice of data type dictates the quantity of computational resources required, affecting the speed and efficiency of the model. In deep learning applications, balancing precision and computational performance becomes a vital exercise, as higher precision often implies greater computational demands.\n","\n","Among various data types, floating point numbers are predominantly employed in deep learning due to their ability to represent a wide range of values with high precision. Typically, a floating point number uses n bits to store a numerical value. These n bits are further partitioned into three distinct components:\n","\n","\n","## Sign\n","\n","The sign bit indicates the positive or negative nature of the number. It uses one bit, where 0 indicates a positive number and 1 signals a negative number.\n","\n","## Exponent\n","\n","The exponent is a segment of bits that represents the power to which the base (usually 2 in binary representation) is raised. The exponent can also be positive or negative, allowing the number to represent very large or very small values.\n","\n","## Significand/Mantissa\n","\n","The remaining bits are used to store the significand, also referred to as the mantissa. This represents the significant digits of the number. The precision of the number heavily depends on the length of the significand.\n","\n","\n","This design allows floating point numbers to cover a wide range of values with varying levels of precision. The formula used for this representation is:\n","\n","\n","$$(-1)^{sign}*base^{exponent}*significand$$\n","\n","For example, here is how an integer is converted to binary:\n"]},
\"Converting
\n","\n","We are trying to convert float to a binary\n","\n","
\"Converting
\n","\n","To understand this better, let's delve into some commonly used data types in deep learning"]},{"cell_type":"markdown","id":"6a75bc9c","metadata":{"papermill":{"duration":0.007706,"end_time":"2024-02-15T23:59:56.892158","exception":false,"start_time":"2024-02-15T23:59:56.884452","status":"completed"},"tags":[]},"source":["\n","# Common data types used in ML\n","\n","The size of a model is determined by the number of its parameters, and their precision, typically one of float32(FP32), float16(FP16) or bfloat16(BF16)\n","\n","
\"float
\n","\n","\n","## Float32(FP32)\n","\n","It is stands for the standardized IEEE 32-bit floating point representation. With this data type it is possible to represent a wide range of floating numbers. In FP32, 8 bits are reserved for the \"exponent\", 23 bits for the \"manitissa\" and 1 bit for the \"sign\" of the number. In addition to that, most of the hardware supports FP32 operations and instructions. While it provides a high degree of precision, the downside of FP32 is its high computational and memory footprint.\n","\n","\n","## Float16(FP16)\n","\n","5 bits are reserved for the exponent and 10 bits are reserved for the mantissa. This makes the representable range of FP16 numbers much lower than FP32, so it is more memory-efficient and accelerate computations. However, the reduced range and precision can introduce numberical instability, potentially impacting model accuracy. \n","\n","For example, if you do 10k* 10k you end up with 100M which is not possible to represent in FP16, as the largest number possible is 64k.\n","\n","\n","## BFloat16(BF16)\n","\n","It is also a 16-bit format but with one bit for the sign, eight for the exponent, and seven for the significand. BF16 expands the representable range compared to FP16, thus decreasing underflow and overflow risk. Despite a reduction in precision due to fewer significand bits, BF16 typically does not signigicantly impact model performance and is a useful compromise for deep learning tasks.\n","\n","\n","In ML jargon, FP32 is often termed \"full precision\"(4 bytes), while BF16 and FP16 are \"half-precision\"(2bytes).\n","\n","We need to store those weights with less memory using a different data type, it is **quantization**. And according to the blog in the Credit section, Int8(8bits) consists of an 8-bit representation capable of storing $2^8=256$ different values. "]},{"cell_type":"markdown","id":"21f9e6f1","metadata":{"_cell_guid":"b1076dfc-b9ad-4769-8c92-a6c4dae69d19","_uuid":"8f2839f25d086af736a60e9eeb907d3b93b6e0e5","papermill":{"duration":0.007131,"end_time":"2024-02-15T23:59:56.907395","exception":false,"start_time":"2024-02-15T23:59:56.900264","status":"completed"},"tags":[]},"source":["# Introduction to model quantization(8bit)\n","\n","Experimentially, we have discovered that instead of using the 4-byte FP32 precision, we can get an almost identical inference outcome with 2-byte BF16/FP16 hald-precision, which halves the model size. If we cut it further, the inference quality outcome starts to drop dramatically at lower precision.\n","\n","To remediate that, we introduce 8-bit quantization. This method uses a quarter precision, thus needing only 1/4th of the model size. But it's not done by just dropping another half of the bits.\n","\n","Quantization is done by essentially \"rounding\" from one data type to another. For example, if one data type has the range 0..9 and another 0..4, then the value \"4\" in the first data type would be rounded to \"2\" in the second datatype. However, if we have the value \"3\" in the first data type, it lies between 1 and 2 of the second type, then we would usually round to \"2\". This shows that both values \"4\" and \"3\" of the first data type have the same value \"2\" in the second data type. This highlights that quantization is a noisy process that can lead to information loss, a sort of lossy compression.\n","\n","The two most common 8-bit quantization techniques are zero-point quantization and absolute maximum(absmax) quantization. 
{"cell_type":"markdown","id":"21f9e6f1","metadata":{"_cell_guid":"b1076dfc-b9ad-4769-8c92-a6c4dae69d19","_uuid":"8f2839f25d086af736a60e9eeb907d3b93b6e0e5","papermill":{"duration":0.007131,"end_time":"2024-02-15T23:59:56.907395","exception":false,"start_time":"2024-02-15T23:59:56.900264","status":"completed"},"tags":[]},"source":["# Introduction to model quantization (8-bit)\n","\n","Experimentally, we have discovered that instead of using the 4-byte FP32 precision, we can get an almost identical inference outcome with 2-byte BF16/FP16 half precision, which halves the model size. If we cut it further, inference quality starts to drop dramatically at lower precision.\n","\n","To remedy that, we introduce 8-bit quantization. This method uses quarter precision, thus needing only a quarter of the model size. But it's not done by just dropping another half of the bits.\n","\n","Quantization is done by essentially \"rounding\" from one data type to another. For example, if one data type has the range 0..9 and another 0..4, then the value \"4\" in the first data type would be rounded to \"2\" in the second data type. However, if we have the value \"3\" in the first data type, it lies between 1 and 2 of the second type, and we would usually round it to \"2\". This shows that both values \"4\" and \"3\" of the first data type have the same value \"2\" in the second data type. This highlights that quantization is a noisy process that can lead to information loss, a sort of lossy compression.\n","\n","The two most common 8-bit quantization techniques are zero-point quantization and absolute maximum (absmax) quantization. Both map floating point values into more compact INT8 (1 byte) values. Let's map an FP32 tensor X to an INT8 tensor X_quant.\n","\n","\n","## Absolute maximum (absmax)\n","\n","With absmax quantization, the original number is divided by the absolute maximum value of the tensor and multiplied by a scaling factor (127) to map inputs into the range [-127, 127]. To retrieve the original values, the INT8 number is divided by the quantization factor, acknowledging some loss of precision due to rounding.\n","\n","$$X_{quant}=round({127 \over max|X|}*X)$$\n","\n","$$X_{dequant}={{max|X| \over 127 }* X_{quant}}$$\n","\n","\n","For instance, let's say we have an absolute maximum value of 3.2. A weight of 0.1 would be quantized to $round(0.1*127/3.2)=4$. If we want to dequantize it, we would get $4*3.2/127=0.1008$, which implies an error of about 0.0008. Here's the corresponding Python implementation:"]},{"cell_type":"code","execution_count":1,"id":"6f0f4643","metadata":{"execution":{"iopub.execute_input":"2024-02-15T23:59:56.923495Z","iopub.status.busy":"2024-02-15T23:59:56.923118Z","iopub.status.idle":"2024-02-16T00:00:00.361097Z","shell.execute_reply":"2024-02-16T00:00:00.360228Z"},"papermill":{"duration":3.448918,"end_time":"2024-02-16T00:00:00.363625","exception":false,"start_time":"2024-02-15T23:59:56.914707","status":"completed"},"tags":[]},"outputs":[],"source":["import torch\n","\n","def absmax_quantize(X):\n","    # Calculate scale\n","    scale=127/torch.max(torch.abs(X))\n","\n","    # Quantize\n","    X_quant=(scale*X).round()\n","\n","    # Dequantize\n","    X_dequant=X_quant/scale\n","\n","    return X_quant.to(torch.int8), X_dequant"]},{"cell_type":"markdown","id":"e67f16f9","metadata":{"papermill":{"duration":0.007771,"end_time":"2024-02-16T00:00:00.38007","exception":false,"start_time":"2024-02-16T00:00:00.372299","status":"completed"},"tags":[]},"source":["## Zero-point quantization\n","\n","With zero-point quantization, we can consider asymmetric input distributions, which is useful when you consider the output of a ReLU function (only positive values), for example. The input values are first scaled by the total range of values (255) divided by the difference between the maximum and minimum values. This distribution is then shifted by the zero-point to map it into the range [-128, 127] (notice the extra value compared to absmax). First, we calculate the scale factor and the zero-point value:\n","\n","$$scale={255 \over max(X)-min(X)}$$\n","\n","$$zeropoint=-round(scale*min(X))-128$$\n","\n","Then, we can use these variables to quantize or dequantize our weights:\n","\n","$$X_{quant}=round(scale*X+zeropoint)$$\n","\n","$$X_{dequant}={X_{quant}-zeropoint \over scale}$$\n","\n","For example, suppose we have a maximum value of 3.2 and a minimum value of -3.0. We can calculate the scale as $255/(3.2+3.0)=41.13$ and the zero-point as $-round(41.13*(-3.0))-128=123-128=-5$, so our previous weight of 0.1 would be quantized to $round(41.13*0.1-5)=-1$. This is very different from the previous value obtained using absmax (4 vs. -1).\n","\n","*(image: absmax vs. zero-point quantization applied to an example weight distribution)*\n"]},
\"Absmax/Zero-point
"]},{"cell_type":"code","execution_count":2,"id":"fe38654c","metadata":{"execution":{"iopub.execute_input":"2024-02-16T00:00:00.39765Z","iopub.status.busy":"2024-02-16T00:00:00.397128Z","iopub.status.idle":"2024-02-16T00:00:00.404637Z","shell.execute_reply":"2024-02-16T00:00:00.40365Z"},"papermill":{"duration":0.018817,"end_time":"2024-02-16T00:00:00.406799","exception":false,"start_time":"2024-02-16T00:00:00.387982","status":"completed"},"tags":[]},"outputs":[],"source":["def zeropoint_quantize(X):\n"," # Calculate value range (denominator)\n"," x_range=torch.max(X) -torch.min(X)\n"," x_range=1 if x_range==0 else x_range\n"," \n"," # Calculate scale\n"," scale=255/x_range\n"," \n"," # Shift by zero-point\n"," zeropoint=(-scale*torch.min(X)-128).round()\n"," \n"," # Scale and round the inputs\n"," X_quant=torch.clip((X*scale+zeropoint).round(),-128,127)\n"," \n"," # Dequantize\n"," X_dequant=(X_quant-zeropoint) / scale\n"," \n"," return X_quant.to(torch.int8), X_dequant"]},{"cell_type":"markdown","id":"535d6b73","metadata":{"papermill":{"duration":0.007743,"end_time":"2024-02-16T00:00:00.422242","exception":false,"start_time":"2024-02-16T00:00:00.414499","status":"completed"},"tags":[]},"source":["# Demo with Transformers\n","\n","We start by loading the model and tokenizer for GPT-2. It is a very small modell for us to do the demo easier."]},{"cell_type":"code","execution_count":3,"id":"53bd8e9c","metadata":{"execution":{"iopub.execute_input":"2024-02-16T00:00:00.444035Z","iopub.status.busy":"2024-02-16T00:00:00.443Z","iopub.status.idle":"2024-02-16T00:00:44.921664Z","shell.execute_reply":"2024-02-16T00:00:44.920398Z"},"papermill":{"duration":44.492651,"end_time":"2024-02-16T00:00:44.924673","exception":false,"start_time":"2024-02-16T00:00:00.432022","status":"completed"},"tags":[]},"outputs":[],"source":["%%capture\n","!pip install transformers==4.36.2\n","!pip install accelerate==0.25.0\n","!pip install bitsandbytes==0.41.3"]},{"cell_type":"code","execution_count":4,"id":"45e38be0","metadata":{"execution":{"iopub.execute_input":"2024-02-16T00:00:44.942325Z","iopub.status.busy":"2024-02-16T00:00:44.941891Z","iopub.status.idle":"2024-02-16T00:00:54.436227Z","shell.execute_reply":"2024-02-16T00:00:54.434939Z"},"papermill":{"duration":9.506333,"end_time":"2024-02-16T00:00:54.43906","exception":false,"start_time":"2024-02-16T00:00:44.932727","status":"completed"},"tags":[]},"outputs":[{"name":"stdout","output_type":"stream","text":["Loading pretrained config for `gpt2` from `transformers`...\r\n","config.json: 100%|█████████████████████████████| 665/665 [00:00<00:00, 3.11MB/s]\r\n","┌────────────────────────────────────────────────────┐\r\n","│ Memory Usage for loading `gpt2` │\r\n","├───────┬─────────────┬──────────┬───────────────────┤\r\n","│ dtype │Largest Layer│Total Size│Training using Adam│\r\n","├───────┼─────────────┼──────────┼───────────────────┤\r\n","│float32│ 147.24 MB │ 476.2 MB │ 1.86 GB │\r\n","│float16│ 73.62 MB │ 238.1 MB │ 952.4 MB │\r\n","│ int8 │ 36.81 MB │119.05 MB │ 476.2 MB │\r\n","│ int4 │ 18.4 MB │ 59.53 MB │ 238.1 MB │\r\n","└───────┴─────────────┴──────────┴───────────────────┘\r\n"]}],"source":["!accelerate estimate-memory gpt2 --library_name 
transformers"]},{"cell_type":"code","execution_count":5,"id":"495adee5","metadata":{"execution":{"iopub.execute_input":"2024-02-16T00:00:54.457578Z","iopub.status.busy":"2024-02-16T00:00:54.45722Z","iopub.status.idle":"2024-02-16T00:01:01.552229Z","shell.execute_reply":"2024-02-16T00:01:01.551159Z"},"papermill":{"duration":7.106615,"end_time":"2024-02-16T00:01:01.554721","exception":false,"start_time":"2024-02-16T00:00:54.448106","status":"completed"},"tags":[]},"outputs":[{"data":{"application/vnd.jupyter.widget-view+json":{"model_id":"087049aac69c40dfa0c108c7f4ed3600","version_major":2,"version_minor":0},"text/plain":["model.safetensors: 0%| | 0.00/548M [00:00=1.16.5 and <1.23.0 is required for this version of SciPy (detected version 1.24.3\n"," warnings.warn(f\"A NumPy version >={np_minversion} and <{np_maxversion}\"\n"]}],"source":["def generate_text(model, input_text, max_length=50):\n"," input_ids=tokenizer.encode(input_text, return_tensors='pt').to('cuda')\n"," output=model.generate(\n"," inputs=input_ids,\n"," max_length=max_length,\n"," do_sample=True,\n"," top_k=30,\n"," pad_token_id=tokenizer.eos_token_id,\n"," attention_mask=input_ids.new_ones(input_ids.shape)\n"," )\n"," \n"," return tokenizer.decode(output[0], skip_special_tokens=True)\n","\n","# Generate text with original and quantized models\n","original_text=generate_text(model, \"The weather in Melbourne\")\n","abs_max_text=generate_text(model_abs, \"The weather in Melbourne\")\n","zp_text=generate_text(model_zp, \"The weather in Melbourne\")"]},{"cell_type":"code","execution_count":11,"id":"b1634a70","metadata":{"execution":{"iopub.execute_input":"2024-02-16T00:01:18.22585Z","iopub.status.busy":"2024-02-16T00:01:18.224235Z","iopub.status.idle":"2024-02-16T00:01:18.388448Z","shell.execute_reply":"2024-02-16T00:01:18.387195Z"},"papermill":{"duration":0.178788,"end_time":"2024-02-16T00:01:18.391567","exception":false,"start_time":"2024-02-16T00:01:18.212779","status":"completed"},"tags":[]},"outputs":[{"name":"stdout","output_type":"stream","text":["Original perplexity: 14.23\n","Absmax perplexity: 15.59\n","Zeropoint perplexity: 17.43\n"]},{"name":"stderr","output_type":"stream","text":["/tmp/ipykernel_26/162940024.py:14: UserWarning: To copy construct from a tensor, it is recommended to use sourceTensor.clone().detach() or sourceTensor.clone().detach().requires_grad_(True), rather than torch.tensor(sourceTensor).\n"," ppl=torch.exp(torch.tensor(neg_log_likelihood,dtype=torch.float))\n"]}],"source":["def calculate_perplexity(model, text):\n"," encodings=tokenizer(text, return_tensors=\"pt\").to('cuda')\n"," \n"," input_ids=encodings.input_ids\n"," target_ids=input_ids.clone()\n"," \n"," with torch.no_grad():\n"," outputs=model(input_ids, labels=target_ids)\n"," \n"," # loss calculation\n"," neg_log_likelihood=outputs.loss\n"," \n"," # Perplexity calculation\n"," ppl=torch.exp(torch.tensor(neg_log_likelihood,dtype=torch.float))\n"," \n"," return ppl\n","\n","ppl = calculate_perplexity(model, original_text)\n","ppl_abs=calculate_perplexity(model_abs, abs_max_text)\n","ppl_zp=calculate_perplexity(model_zp, zp_text)\n","\n","print(f\"Original perplexity: {ppl.item():.2f}\")\n","print(f\"Absmax perplexity: {ppl_abs.item():.2f}\")\n","print(f\"Zeropoint perplexity: {ppl_zp.item():.2f}\")"]},{"cell_type":"markdown","id":"05e07dea","metadata":{"papermill":{"duration":0.010031,"end_time":"2024-02-16T00:01:18.411662","exception":false,"start_time":"2024-02-16T00:01:18.401631","status":"completed"},"tags":[]},"source":["We see that 
the perplexity of the original model is lower than that of the two quantized models. A single experiment is not very reliable, but we could repeat this process multiple times to see the difference between each model. In theory, zero-point quantization should be slightly better than absmax, but it is also more costly to compute.\n","\n","We applied these quantization techniques to entire layers (on a per-tensor basis). However, we could apply them at different granularity levels: from the entire model down to individual values. Quantizing the entire model in one pass would seriously degrade performance, while quantizing individual values would create a big overhead. In practice, we often prefer **vector-wise quantization**, which considers the variability of values in rows and columns inside the same tensor.\n","\n","However, even vector-wise quantization doesn't solve the problem of outlier features. **Outlier features are extreme values (negative or positive) that appear in all transformer layers when the model reaches a certain scale (>6.7B parameters).** This is an issue since a single outlier can reduce the precision for all other values. But discarding these outlier features is not an option, since it would **greatly degrade** the model's performance."]},{"cell_type":"markdown","id":"349e5d24","metadata":{"papermill":{"duration":0.010005,"end_time":"2024-02-16T00:01:18.431801","exception":false,"start_time":"2024-02-16T00:01:18.421796","status":"completed"},"tags":[]},"source":["# LLM.int8()\n","\n","It is a solution to the outlier problem. **It relies on a vector-wise (absmax) quantization scheme and introduces mixed-precision quantization.** This means that outlier features are processed in an FP16 format to retain their precision, while the other values are processed in an INT8 format. As outliers represent about 0.1% of values, this effectively reduces the memory footprint of the LLM by almost 2x.\n","\n","LLM.int8() works by conducting the matrix multiplication computation in three key steps (sketched in the code cell after this list):\n","1. Extract the columns from the input hidden states X that contain outlier features, using a custom threshold.\n","2. Perform the matrix multiplication of the outliers in FP16 and of the non-outliers in INT8 with vector-wise quantization (row-wise for the hidden state X and column-wise for the weight matrix W).\n","3. Dequantize the non-outlier results (INT8 to FP16) and add them to the outlier results to get the full result in FP16.\n"]},
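{"cell_type":"markdown","metadata":{},"source":["As a toy illustration of these three steps, the sketch below decomposes a matrix multiplication in plain PyTorch. Note that it uses per-tensor absmax scales for brevity where the paper uses vector-wise ones; it is a didactic sketch, not the bitsandbytes kernels.\n"]},{"cell_type":"code","metadata":{},"execution_count":null,"outputs":[],"source":["import torch\n","\n","def llm_int8_matmul_sketch(X,W,threshold=6.0):\n","    # 1. Columns of X holding at least one outlier (|x| > threshold)\n","    outlier_cols=(X.abs()>threshold).any(dim=0)\n","\n","    # 2a. Outlier columns: keep full floating-point precision\n","    out_fp=X[:,outlier_cols] @ W[outlier_cols,:]\n","\n","    # 2b. Remaining values: absmax-quantize to INT8, multiply, dequantize\n","    #     (per-tensor absmax here; the paper uses vector-wise scales)\n","    X_sub,W_sub=X[:,~outlier_cols],W[~outlier_cols,:]\n","    sx,sw=127/X_sub.abs().max(),127/W_sub.abs().max()\n","    out_int8=((sx*X_sub).round() @ (sw*W_sub).round())/(sx*sw)\n","\n","    # 3. Sum the dequantized result with the outlier result\n","    return out_fp+out_int8\n","\n","X=torch.randn(4,8); X[:,0]*=20  # force an outlier column\n","W=torch.randn(8,3)\n","print((llm_int8_matmul_sketch(X,W)-X@W).abs().max())  # small quantization error"]},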
{"cell_type":"markdown","metadata":{},"source":["This approach is necessary because 8-bit precision is limited and can lead to substantial errors when quantizing a vector with large values. These errors also tend to amplify as they propagate through multiple layers.\n","\n","8-bit models also bring some useful features, like offloading, outlier thresholds, and skipping module conversion.\n","\n","\n","## Offloading\n","\n","8-bit models can offload weights between the CPU and GPU to support fitting very large models into memory. The weights dispatched to the CPU are actually stored in `float32`, and aren't converted to 8-bit.\n","\n","## Outlier threshold\n","\n","An \"outlier\" is a hidden state value greater than a certain threshold, and these values are computed in FP16. While the values are usually normally distributed ([-3.5, 3.5]), this distribution can be very different for large models ([-60, 6] or [6, 60]). 8-bit quantization works well for values around ±5, but beyond that, there is a significant performance penalty. A good default threshold value is 6, but a lower threshold may be needed for less stable models (small models or fine-tuning).\n","\n","## Skip module conversion\n","\n","For some models, like Jukebox, we do not need to quantize every module to 8-bit, as that can actually cause instability. With Jukebox, there are several `lm_head` modules that should be skipped using the `llm_int8_skip_modules` parameter."]},{"cell_type":"code","execution_count":12,"id":"9939e4d9","metadata":{"execution":{"iopub.execute_input":"2024-02-16T00:01:18.454626Z","iopub.status.busy":"2024-02-16T00:01:18.453622Z","iopub.status.idle":"2024-02-16T00:01:20.342599Z","shell.execute_reply":"2024-02-16T00:01:20.341275Z"},"papermill":{"duration":1.903001,"end_time":"2024-02-16T00:01:20.344944","exception":false,"start_time":"2024-02-16T00:01:18.441943","status":"completed"},"tags":[]},"outputs":[{"data":{"text/plain":["176527896"]},"execution_count":12,"metadata":{},"output_type":"execute_result"}],"source":["from transformers import AutoModelForCausalLM, BitsAndBytesConfig\n","\n","bnb_config=BitsAndBytesConfig(\n","    load_in_8bit=True,\n","    # https://github.com/huggingface/transformers/issues/22018#issuecomment-1460139242\n","#   llm_int8_enable_fp32_cpu_offload=True,\n","#   llm_int8_skip_modules=[\"lm_head\"],\n","    llm_int8_threshold=6.0,\n",")\n","\n","# device_map={\n","#     \"transformer.word_embeddings\":0,\n","#     \"transformer.word_embeddings_layernorm\":0,\n","#     \"lm_head\":\"cpu\",\n","#     \"transformer.h\":0,\n","#     \"transformer.ln_f\":0,\n","# }\n","\n","int8_model=AutoModelForCausalLM.from_pretrained(\n","    model_name,\n","    device_map='auto',\n","    quantization_config=bnb_config\n",")\n","\n","int8_model.get_memory_footprint()"]},{"cell_type":"markdown","id":"83895a57","metadata":{"papermill":{"duration":0.010024,"end_time":"2024-02-16T00:01:20.366645","exception":false,"start_time":"2024-02-16T00:01:20.356621","status":"completed"},"tags":[]},"source":["The INT8 model is almost 3 times smaller than the original (FP32) model."]},{"cell_type":"code","execution_count":13,"id":"18fdefc4","metadata":{"execution":{"iopub.execute_input":"2024-02-16T00:01:20.389183Z","iopub.status.busy":"2024-02-16T00:01:20.388779Z","iopub.status.idle":"2024-02-16T00:01:20.397233Z","shell.execute_reply":"2024-02-16T00:01:20.396246Z"},"papermill":{"duration":0.02296,"end_time":"2024-02-16T00:01:20.400278","exception":false,"start_time":"2024-02-16T00:01:20.377318","status":"completed"},"tags":[]},"outputs":[{"name":"stdout","output_type":"stream","text":["2.8910002530138352\n"]}],"source":["print(f\"{model.get_memory_footprint()/int8_model.get_memory_footprint()}\")"]},{"cell_type":"code","execution_count":14,"id":"fe30501e","metadata":{"execution":{"iopub.execute_input":"2024-02-16T00:01:20.421166Z","iopub.status.busy":"2024-02-16T00:01:20.420842Z","iopub.status.idle":"2024-02-16T00:01:22.788577Z","shell.execute_reply":"2024-02-16T00:01:22.787544Z"},"papermill":{"duration":2.381751,"end_time":"2024-02-16T00:01:22.791844","exception":false,"start_time":"2024-02-16T00:01:20.410093","status":"completed"},"tags":[]},"outputs":[{"name":"stdout","output_type":"stream","text":["Model with INT8 perplexity: 20.28\n"]},{"name":"stderr","output_type":"stream","text":["/tmp/ipykernel_26/162940024.py:14: UserWarning: To copy construct from a tensor, it is recommended to use sourceTensor.clone().detach() or sourceTensor.clone().detach().requires_grad_(True), rather than torch.tensor(sourceTensor).\n","  ppl=torch.exp(torch.tensor(neg_log_likelihood,dtype=torch.float))\n"]}],"source":["prompt=generate_text(int8_model, \"The weather in Melbourne is \")\n","\n","ppl_int8=calculate_perplexity(int8_model, prompt)\n","\n","print(f\"Model with INT8 perplexity: {ppl_int8.item():.2f}\")"]},
{"cell_type":"markdown","id":"9628ef07","metadata":{"papermill":{"duration":0.00973,"end_time":"2024-02-16T00:01:22.811702","exception":false,"start_time":"2024-02-16T00:01:22.801972","status":"completed"},"tags":[]},"source":["# More demo using int8\n","\n","For practice with Transformers, see [Lighter models on GPU for inference](https://www.kaggle.com/code/aisuko/lighter-models-on-gpu-for-inference/notebook)\n","\n","For practice with PyTorch, see [Zero degradation matrix multiplication](https://www.kaggle.com/code/aisuko/zero-degradation-matrix-multiplication)"]},{"cell_type":"markdown","id":"6cdbe2de","metadata":{"papermill":{"duration":0.00945,"end_time":"2024-02-16T00:01:22.831151","exception":false,"start_time":"2024-02-16T00:01:22.821701","status":"completed"},"tags":[]},"source":["# Credit\n","\n","* https://huggingface.co/blog/hf-bitsandbytes-integration\n","* https://huggingface.co/blog/4bit-transformers-bitsandbytes\n","* https://towardsdatascience.com/introduction-to-weight-quantization-2494701b9c0c\n","* https://huggingface.co/docs/transformers/main/quantization#8-bit\n","* https://pub.towardsai.net/how-to-fit-large-language-models-in-small-memory-quantization-e8c3981430b2"]}],"metadata":{"kaggle":{"accelerator":"nvidiaTeslaT4","dataSources":[],"dockerImageVersionId":30635,"isGpuEnabled":true,"isInternetEnabled":true,"language":"python","sourceType":"notebook"},"kernelspec":{"display_name":"Python 3","language":"python","name":"python3"},
3","language":"python","name":"python3"},"language_info":{"codemirror_mode":{"name":"ipython","version":3},"file_extension":".py","mimetype":"text/x-python","name":"python","nbconvert_exporter":"python","pygments_lexer":"ipython3","version":"3.10.12"},"papermill":{"default_parameters":{},"duration":92.702644,"end_time":"2024-02-16T00:01:25.850812","environment_variables":{},"exception":null,"input_path":"__notebook__.ipynb","output_path":"__notebook__.ipynb","parameters":{},"start_time":"2024-02-15T23:59:53.148168","version":"2.4.0"},"widgets":{"application/vnd.jupyter.widget-state+json":{"state":{"031a7d4f123c42369409ac83128ee45f":{"model_module":"@jupyter-widgets/controls","model_module_version":"1.5.0","model_name":"ProgressStyleModel","state":{"_model_module":"@jupyter-widgets/controls","_model_module_version":"1.5.0","_model_name":"ProgressStyleModel","_view_count":null,"_view_module":"@jupyter-widgets/base","_view_module_version":"1.2.0","_view_name":"StyleView","bar_color":null,"description_width":""}},"087049aac69c40dfa0c108c7f4ed3600":{"model_module":"@jupyter-widgets/controls","model_module_version":"1.5.0","model_name":"HBoxModel","state":{"_dom_classes":[],"_model_module":"@jupyter-widgets/controls","_model_module_version":"1.5.0","_model_name":"HBoxModel","_view_count":null,"_view_module":"@jupyter-widgets/controls","_view_module_version":"1.5.0","_view_name":"HBoxView","box_style":"","children":["IPY_MODEL_44fc2d0bfff547f7b12cbd6fac93bedb","IPY_MODEL_703c6e12ed304de798b9acbecac7de7f","IPY_MODEL_34a6435b78d541139c6846e0585d9a5d"],"layout":"IPY_MODEL_6fd84237ccaf448392de9c54f6aa3f9c"}},"0fa4bcf979774ec398911b69bc032e19":{"model_module":"@jupyter-widgets/controls","model_module_version":"1.5.0","model_name":"DescriptionStyleModel","state":{"_model_module":"@jupyter-widgets/controls","_model_module_version":"1.5.0","_model_name":"DescriptionStyleModel","_view_count":null,"_view_module":"@jupyter-widgets/base","_view_module_version":"1.2.0","_view_name":"StyleView","description_width":""}},"151d36a75998486393dc9524d7fed797":{"model_module":"@jupyter-widgets/controls","model_module_version":"1.5.0","model_name":"DescriptionStyleModel","state":{"_model_module":"@jupyter-widgets/controls","_model_module_version":"1.5.0","_model_name":"DescriptionStyleModel","_view_count":null,"_view_module":"@jupyter-widgets/base","_view_module_version":"1.2.0","_view_name":"StyleView","description_width":""}},"158c4856c73c42bc828839f60b9c77cc":{"model_module":"@jupyter-widgets/base","model_module_version":"1.2.0","model_name":"LayoutModel","state":{"_model_module":"@jupyter-widgets/base","_model_module_version":"1.2.0","_model_name":"LayoutModel","_view_count":null,"_view_module":"@jupyter-widgets/base","_view_module_version":"1.2.0","_view_name":"LayoutView","align_content":null,"align_items":null,"align_self":null,"border":null,"bottom":null,"display":null,"flex":null,"flex_flow":null,"grid_area":null,"grid_auto_columns":null,"grid_auto_flow":null,"grid_auto_rows":null,"grid_column":null,"grid_gap":null,"grid_row":null,"grid_template_areas":null,"grid_template_columns":null,"grid_template_rows":null,"height":null,"justify_content":null,"justify_items":null,"left":null,"margin":null,"max_height":null,"max_width":null,"min_height":null,"min_width":null,"object_fit":null,"object_position":null,"order":null,"overflow":null,"overflow_x":null,"overflow_y":null,"padding":null,"right":null,"top":null,"visibility":null,"width":null}},"1a93aadf584940269135c068956e6206":{"model_module":"@jupyter-widgets/base"
,"model_module_version":"1.2.0","model_name":"LayoutModel","state":{"_model_module":"@jupyter-widgets/base","_model_module_version":"1.2.0","_model_name":"LayoutModel","_view_count":null,"_view_module":"@jupyter-widgets/base","_view_module_version":"1.2.0","_view_name":"LayoutView","align_content":null,"align_items":null,"align_self":null,"border":null,"bottom":null,"display":null,"flex":null,"flex_flow":null,"grid_area":null,"grid_auto_columns":null,"grid_auto_flow":null,"grid_auto_rows":null,"grid_column":null,"grid_gap":null,"grid_row":null,"grid_template_areas":null,"grid_template_columns":null,"grid_template_rows":null,"height":null,"justify_content":null,"justify_items":null,"left":null,"margin":null,"max_height":null,"max_width":null,"min_height":null,"min_width":null,"object_fit":null,"object_position":null,"order":null,"overflow":null,"overflow_x":null,"overflow_y":null,"padding":null,"right":null,"top":null,"visibility":null,"width":null}},"28a49b1c60854631bcc747192228e9a1":{"model_module":"@jupyter-widgets/base","model_module_version":"1.2.0","model_name":"LayoutModel","state":{"_model_module":"@jupyter-widgets/base","_model_module_version":"1.2.0","_model_name":"LayoutModel","_view_count":null,"_view_module":"@jupyter-widgets/base","_view_module_version":"1.2.0","_view_name":"LayoutView","align_content":null,"align_items":null,"align_self":null,"border":null,"bottom":null,"display":null,"flex":null,"flex_flow":null,"grid_area":null,"grid_auto_columns":null,"grid_auto_flow":null,"grid_auto_rows":null,"grid_column":null,"grid_gap":null,"grid_row":null,"grid_template_areas":null,"grid_template_columns":null,"grid_template_rows":null,"height":null,"justify_content":null,"justify_items":null,"left":null,"margin":null,"max_height":null,"max_width":null,"min_height":null,"min_width":null,"object_fit":null,"object_position":null,"order":null,"overflow":null,"overflow_x":null,"overflow_y":null,"padding":null,"right":null,"top":null,"visibility":null,"width":null}},"30c07a5f5b4f48c9bbeb85c7ccc0e366":{"model_module":"@jupyter-widgets/base","model_module_version":"1.2.0","model_name":"LayoutModel","state":{"_model_module":"@jupyter-widgets/base","_model_module_version":"1.2.0","_model_name":"LayoutModel","_view_count":null,"_view_module":"@jupyter-widgets/base","_view_module_version":"1.2.0","_view_name":"LayoutView","align_content":null,"align_items":null,"align_self":null,"border":null,"bottom":null,"display":null,"flex":null,"flex_flow":null,"grid_area":null,"grid_auto_columns":null,"grid_auto_flow":null,"grid_auto_rows":null,"grid_column":null,"grid_gap":null,"grid_row":null,"grid_template_areas":null,"grid_template_columns":null,"grid_template_rows":null,"height":null,"justify_content":null,"justify_items":null,"left":null,"margin":null,"max_height":null,"max_width":null,"min_height":null,"min_width":null,"object_fit":null,"object_position":null,"order":null,"overflow":null,"overflow_x":null,"overflow_y":null,"padding":null,"right":null,"top":null,"visibility":null,"width":null}},"34a6435b78d541139c6846e0585d9a5d":{"model_module":"@jupyter-widgets/controls","model_module_version":"1.5.0","model_name":"HTMLModel","state":{"_dom_classes":[],"_model_module":"@jupyter-widgets/controls","_model_module_version":"1.5.0","_model_name":"HTMLModel","_view_count":null,"_view_module":"@jupyter-widgets/controls","_view_module_version":"1.5.0","_view_name":"HTMLView","description":"","description_tooltip":null,"layout":"IPY_MODEL_b66c2936deee4c5e98c791592b030410","placeholder":"​","style":"IPY_MODEL
_c173f9a65ba24897b79d824f84313c04","value":" 548M/548M [00:03<00:00, 185MB/s]"}},"34c816da7fcf43b6b5b751e72f2d5fb9":{"model_module":"@jupyter-widgets/base","model_module_version":"1.2.0","model_name":"LayoutModel","state":{"_model_module":"@jupyter-widgets/base","_model_module_version":"1.2.0","_model_name":"LayoutModel","_view_count":null,"_view_module":"@jupyter-widgets/base","_view_module_version":"1.2.0","_view_name":"LayoutView","align_content":null,"align_items":null,"align_self":null,"border":null,"bottom":null,"display":null,"flex":null,"flex_flow":null,"grid_area":null,"grid_auto_columns":null,"grid_auto_flow":null,"grid_auto_rows":null,"grid_column":null,"grid_gap":null,"grid_row":null,"grid_template_areas":null,"grid_template_columns":null,"grid_template_rows":null,"height":null,"justify_content":null,"justify_items":null,"left":null,"margin":null,"max_height":null,"max_width":null,"min_height":null,"min_width":null,"object_fit":null,"object_position":null,"order":null,"overflow":null,"overflow_x":null,"overflow_y":null,"padding":null,"right":null,"top":null,"visibility":null,"width":null}},"3782cf886a7d4615a6dcd7ff779423cc":{"model_module":"@jupyter-widgets/controls","model_module_version":"1.5.0","model_name":"ProgressStyleModel","state":{"_model_module":"@jupyter-widgets/controls","_model_module_version":"1.5.0","_model_name":"ProgressStyleModel","_view_count":null,"_view_module":"@jupyter-widgets/base","_view_module_version":"1.2.0","_view_name":"StyleView","bar_color":null,"description_width":""}},"396e609488504e9283b8ec53e1554fa5":{"model_module":"@jupyter-widgets/controls","model_module_version":"1.5.0","model_name":"HBoxModel","state":{"_dom_classes":[],"_model_module":"@jupyter-widgets/controls","_model_module_version":"1.5.0","_model_name":"HBoxModel","_view_count":null,"_view_module":"@jupyter-widgets/controls","_view_module_version":"1.5.0","_view_name":"HBoxView","box_style":"","children":["IPY_MODEL_f136a904f6694082950e4106fb6e5a3f","IPY_MODEL_b98d529db59d450b9f72656acb07b5ab","IPY_MODEL_3db874bbdaf44c9ba694bccc3bbab0c0"],"layout":"IPY_MODEL_158c4856c73c42bc828839f60b9c77cc"}},"3a5c793f89b344a3b539de7009cd0643":{"model_module":"@jupyter-widgets/base","model_module_version":"1.2.0","model_name":"LayoutModel","state":{"_model_module":"@jupyter-widgets/base","_model_module_version":"1.2.0","_model_name":"LayoutModel","_view_count":null,"_view_module":"@jupyter-widgets/base","_view_module_version":"1.2.0","_view_name":"LayoutView","align_content":null,"align_items":null,"align_self":null,"border":null,"bottom":null,"display":null,"flex":null,"flex_flow":null,"grid_area":null,"grid_auto_columns":null,"grid_auto_flow":null,"grid_auto_rows":null,"grid_column":null,"grid_gap":null,"grid_row":null,"grid_template_areas":null,"grid_template_columns":null,"grid_template_rows":null,"height":null,"justify_content":null,"justify_items":null,"left":null,"margin":null,"max_height":null,"max_width":null,"min_height":null,"min_width":null,"object_fit":null,"object_position":null,"order":null,"overflow":null,"overflow_x":null,"overflow_y":null,"padding":null,"right":null,"top":null,"visibility":null,"width":null}},"3db874bbdaf44c9ba694bccc3bbab0c0":{"model_module":"@jupyter-widgets/controls","model_module_version":"1.5.0","model_name":"HTMLModel","state":{"_dom_classes":[],"_model_module":"@jupyter-widgets/controls","_model_module_version":"1.5.0","_model_name":"HTMLModel","_view_count":null,"_view_module":"@jupyter-widgets/controls","_view_module_version":"1.5.0","_view_name":"HTMLVie
w","description":"","description_tooltip":null,"layout":"IPY_MODEL_30c07a5f5b4f48c9bbeb85c7ccc0e366","placeholder":"​","style":"IPY_MODEL_151d36a75998486393dc9524d7fed797","value":" 1.36M/1.36M [00:00<00:00, 43.1MB/s]"}},"3f6de851352c4fc5ae273ee568f97ace":{"model_module":"@jupyter-widgets/base","model_module_version":"1.2.0","model_name":"LayoutModel","state":{"_model_module":"@jupyter-widgets/base","_model_module_version":"1.2.0","_model_name":"LayoutModel","_view_count":null,"_view_module":"@jupyter-widgets/base","_view_module_version":"1.2.0","_view_name":"LayoutView","align_content":null,"align_items":null,"align_self":null,"border":null,"bottom":null,"display":null,"flex":null,"flex_flow":null,"grid_area":null,"grid_auto_columns":null,"grid_auto_flow":null,"grid_auto_rows":null,"grid_column":null,"grid_gap":null,"grid_row":null,"grid_template_areas":null,"grid_template_columns":null,"grid_template_rows":null,"height":null,"justify_content":null,"justify_items":null,"left":null,"margin":null,"max_height":null,"max_width":null,"min_height":null,"min_width":null,"object_fit":null,"object_position":null,"order":null,"overflow":null,"overflow_x":null,"overflow_y":null,"padding":null,"right":null,"top":null,"visibility":null,"width":null}},"44fc2d0bfff547f7b12cbd6fac93bedb":{"model_module":"@jupyter-widgets/controls","model_module_version":"1.5.0","model_name":"HTMLModel","state":{"_dom_classes":[],"_model_module":"@jupyter-widgets/controls","_model_module_version":"1.5.0","_model_name":"HTMLModel","_view_count":null,"_view_module":"@jupyter-widgets/controls","_view_module_version":"1.5.0","_view_name":"HTMLView","description":"","description_tooltip":null,"layout":"IPY_MODEL_3f6de851352c4fc5ae273ee568f97ace","placeholder":"​","style":"IPY_MODEL_979931378c0d4abf9f59f8b3366dd748","value":"model.safetensors: 
100%"}},"45e9faa2d04842b18dbf9a04f751544d":{"model_module":"@jupyter-widgets/base","model_module_version":"1.2.0","model_name":"LayoutModel","state":{"_model_module":"@jupyter-widgets/base","_model_module_version":"1.2.0","_model_name":"LayoutModel","_view_count":null,"_view_module":"@jupyter-widgets/base","_view_module_version":"1.2.0","_view_name":"LayoutView","align_content":null,"align_items":null,"align_self":null,"border":null,"bottom":null,"display":null,"flex":null,"flex_flow":null,"grid_area":null,"grid_auto_columns":null,"grid_auto_flow":null,"grid_auto_rows":null,"grid_column":null,"grid_gap":null,"grid_row":null,"grid_template_areas":null,"grid_template_columns":null,"grid_template_rows":null,"height":null,"justify_content":null,"justify_items":null,"left":null,"margin":null,"max_height":null,"max_width":null,"min_height":null,"min_width":null,"object_fit":null,"object_position":null,"order":null,"overflow":null,"overflow_x":null,"overflow_y":null,"padding":null,"right":null,"top":null,"visibility":null,"width":null}},"4679ce24416d49d883576828d300a36a":{"model_module":"@jupyter-widgets/base","model_module_version":"1.2.0","model_name":"LayoutModel","state":{"_model_module":"@jupyter-widgets/base","_model_module_version":"1.2.0","_model_name":"LayoutModel","_view_count":null,"_view_module":"@jupyter-widgets/base","_view_module_version":"1.2.0","_view_name":"LayoutView","align_content":null,"align_items":null,"align_self":null,"border":null,"bottom":null,"display":null,"flex":null,"flex_flow":null,"grid_area":null,"grid_auto_columns":null,"grid_auto_flow":null,"grid_auto_rows":null,"grid_column":null,"grid_gap":null,"grid_row":null,"grid_template_areas":null,"grid_template_columns":null,"grid_template_rows":null,"height":null,"justify_content":null,"justify_items":null,"left":null,"margin":null,"max_height":null,"max_width":null,"min_height":null,"min_width":null,"object_fit":null,"object_position":null,"order":null,"overflow":null,"overflow_x":null,"overflow_y":null,"padding":null,"right":null,"top":null,"visibility":null,"width":null}},"5d7a4bf19d924e459718533cb2238db4":{"model_module":"@jupyter-widgets/base","model_module_version":"1.2.0","model_name":"LayoutModel","state":{"_model_module":"@jupyter-widgets/base","_model_module_version":"1.2.0","_model_name":"LayoutModel","_view_count":null,"_view_module":"@jupyter-widgets/base","_view_module_version":"1.2.0","_view_name":"LayoutView","align_content":null,"align_items":null,"align_self":null,"border":null,"bottom":null,"display":null,"flex":null,"flex_flow":null,"grid_area":null,"grid_auto_columns":null,"grid_auto_flow":null,"grid_auto_rows":null,"grid_column":null,"grid_gap":null,"grid_row":null,"grid_template_areas":null,"grid_template_columns":null,"grid_template_rows":null,"height":null,"justify_content":null,"justify_items":null,"left":null,"margin":null,"max_height":null,"max_width":null,"min_height":null,"min_width":null,"object_fit":null,"object_position":null,"order":null,"overflow":null,"overflow_x":null,"overflow_y":null,"padding":null,"right":null,"top":null,"visibility":null,"width":null}},"688325eac4964c89ad02225917166075":{"model_module":"@jupyter-widgets/base","model_module_version":"1.2.0","model_name":"LayoutModel","state":{"_model_module":"@jupyter-widgets/base","_model_module_version":"1.2.0","_model_name":"LayoutModel","_view_count":null,"_view_module":"@jupyter-widgets/base","_view_module_version":"1.2.0","_view_name":"LayoutView","align_content":null,"align_items":null,"align_self":null,"border":null,"bot
tom":null,"display":null,"flex":null,"flex_flow":null,"grid_area":null,"grid_auto_columns":null,"grid_auto_flow":null,"grid_auto_rows":null,"grid_column":null,"grid_gap":null,"grid_row":null,"grid_template_areas":null,"grid_template_columns":null,"grid_template_rows":null,"height":null,"justify_content":null,"justify_items":null,"left":null,"margin":null,"max_height":null,"max_width":null,"min_height":null,"min_width":null,"object_fit":null,"object_position":null,"order":null,"overflow":null,"overflow_x":null,"overflow_y":null,"padding":null,"right":null,"top":null,"visibility":null,"width":null}},"6fc0482b9d5648d5bd6ddb417efd523f":{"model_module":"@jupyter-widgets/controls","model_module_version":"1.5.0","model_name":"HTMLModel","state":{"_dom_classes":[],"_model_module":"@jupyter-widgets/controls","_model_module_version":"1.5.0","_model_name":"HTMLModel","_view_count":null,"_view_module":"@jupyter-widgets/controls","_view_module_version":"1.5.0","_view_name":"HTMLView","description":"","description_tooltip":null,"layout":"IPY_MODEL_a9b043c242f24a52a0f93414d7274ec6","placeholder":"​","style":"IPY_MODEL_d0a9fd41c82a4692ac4ee727f9d4fc95","value":" 456k/456k [00:00<00:00, 13.2MB/s]"}},"6fd84237ccaf448392de9c54f6aa3f9c":{"model_module":"@jupyter-widgets/base","model_module_version":"1.2.0","model_name":"LayoutModel","state":{"_model_module":"@jupyter-widgets/base","_model_module_version":"1.2.0","_model_name":"LayoutModel","_view_count":null,"_view_module":"@jupyter-widgets/base","_view_module_version":"1.2.0","_view_name":"LayoutView","align_content":null,"align_items":null,"align_self":null,"border":null,"bottom":null,"display":null,"flex":null,"flex_flow":null,"grid_area":null,"grid_auto_columns":null,"grid_auto_flow":null,"grid_auto_rows":null,"grid_column":null,"grid_gap":null,"grid_row":null,"grid_template_areas":null,"grid_template_columns":null,"grid_template_rows":null,"height":null,"justify_content":null,"justify_items":null,"left":null,"margin":null,"max_height":null,"max_width":null,"min_height":null,"min_width":null,"object_fit":null,"object_position":null,"order":null,"overflow":null,"overflow_x":null,"overflow_y":null,"padding":null,"right":null,"top":null,"visibility":null,"width":null}},"703c6e12ed304de798b9acbecac7de7f":{"model_module":"@jupyter-widgets/controls","model_module_version":"1.5.0","model_name":"FloatProgressModel","state":{"_dom_classes":[],"_model_module":"@jupyter-widgets/controls","_model_module_version":"1.5.0","_model_name":"FloatProgressModel","_view_count":null,"_view_module":"@jupyter-widgets/controls","_view_module_version":"1.5.0","_view_name":"ProgressView","bar_style":"success","description":"","description_tooltip":null,"layout":"IPY_MODEL_28a49b1c60854631bcc747192228e9a1","max":548105171.0,"min":0.0,"orientation":"horizontal","style":"IPY_MODEL_9bb67e0cf3244709b126739de56fc5a5","value":548105171.0}},"760519d895904ff5936c4163449d819a":{"model_module":"@jupyter-widgets/controls","model_module_version":"1.5.0","model_name":"FloatProgressModel","state":{"_dom_classes":[],"_model_module":"@jupyter-widgets/controls","_model_module_version":"1.5.0","_model_name":"FloatProgressModel","_view_count":null,"_view_module":"@jupyter-widgets/controls","_view_module_version":"1.5.0","_view_name":"ProgressView","bar_style":"success","description":"","description_tooltip":null,"layout":"IPY_MODEL_45e9faa2d04842b18dbf9a04f751544d","max":124.0,"min":0.0,"orientation":"horizontal","style":"IPY_MODEL_85d1f6cec9a9440f94cf6501be59e8fe","value":124.0}},"77cd011a5a6a48f981c5
75d64df0492b":{"model_module":"@jupyter-widgets/controls","model_module_version":"1.5.0","model_name":"DescriptionStyleModel","state":{"_model_module":"@jupyter-widgets/controls","_model_module_version":"1.5.0","_model_name":"DescriptionStyleModel","_view_count":null,"_view_module":"@jupyter-widgets/base","_view_module_version":"1.2.0","_view_name":"StyleView","description_width":""}},"788ef963f916435e82e21370f553c45e":{"model_module":"@jupyter-widgets/base","model_module_version":"1.2.0","model_name":"LayoutModel","state":{"_model_module":"@jupyter-widgets/base","_model_module_version":"1.2.0","_model_name":"LayoutModel","_view_count":null,"_view_module":"@jupyter-widgets/base","_view_module_version":"1.2.0","_view_name":"LayoutView","align_content":null,"align_items":null,"align_self":null,"border":null,"bottom":null,"display":null,"flex":null,"flex_flow":null,"grid_area":null,"grid_auto_columns":null,"grid_auto_flow":null,"grid_auto_rows":null,"grid_column":null,"grid_gap":null,"grid_row":null,"grid_template_areas":null,"grid_template_columns":null,"grid_template_rows":null,"height":null,"justify_content":null,"justify_items":null,"left":null,"margin":null,"max_height":null,"max_width":null,"min_height":null,"min_width":null,"object_fit":null,"object_position":null,"order":null,"overflow":null,"overflow_x":null,"overflow_y":null,"padding":null,"right":null,"top":null,"visibility":null,"width":null}},"79f5a7f600474b33bf8cc3ba492b8b8a":{"model_module":"@jupyter-widgets/controls","model_module_version":"1.5.0","model_name":"ProgressStyleModel","state":{"_model_module":"@jupyter-widgets/controls","_model_module_version":"1.5.0","_model_name":"ProgressStyleModel","_view_count":null,"_view_module":"@jupyter-widgets/base","_view_module_version":"1.2.0","_view_name":"StyleView","bar_color":null,"description_width":""}},"7ec4057879794525928536528dec55be":{"model_module":"@jupyter-widgets/controls","model_module_version":"1.5.0","model_name":"HTMLModel","state":{"_dom_classes":[],"_model_module":"@jupyter-widgets/controls","_model_module_version":"1.5.0","_model_name":"HTMLModel","_view_count":null,"_view_module":"@jupyter-widgets/controls","_view_module_version":"1.5.0","_view_name":"HTMLView","description":"","description_tooltip":null,"layout":"IPY_MODEL_1a93aadf584940269135c068956e6206","placeholder":"​","style":"IPY_MODEL_77cd011a5a6a48f981c575d64df0492b","value":"merges.txt: 
100%"}},"82109babffd74ae6954fc22e895bb1ae":{"model_module":"@jupyter-widgets/base","model_module_version":"1.2.0","model_name":"LayoutModel","state":{"_model_module":"@jupyter-widgets/base","_model_module_version":"1.2.0","_model_name":"LayoutModel","_view_count":null,"_view_module":"@jupyter-widgets/base","_view_module_version":"1.2.0","_view_name":"LayoutView","align_content":null,"align_items":null,"align_self":null,"border":null,"bottom":null,"display":null,"flex":null,"flex_flow":null,"grid_area":null,"grid_auto_columns":null,"grid_auto_flow":null,"grid_auto_rows":null,"grid_column":null,"grid_gap":null,"grid_row":null,"grid_template_areas":null,"grid_template_columns":null,"grid_template_rows":null,"height":null,"justify_content":null,"justify_items":null,"left":null,"margin":null,"max_height":null,"max_width":null,"min_height":null,"min_width":null,"object_fit":null,"object_position":null,"order":null,"overflow":null,"overflow_x":null,"overflow_y":null,"padding":null,"right":null,"top":null,"visibility":null,"width":null}},"85d1f6cec9a9440f94cf6501be59e8fe":{"model_module":"@jupyter-widgets/controls","model_module_version":"1.5.0","model_name":"ProgressStyleModel","state":{"_model_module":"@jupyter-widgets/controls","_model_module_version":"1.5.0","_model_name":"ProgressStyleModel","_view_count":null,"_view_module":"@jupyter-widgets/base","_view_module_version":"1.2.0","_view_name":"StyleView","bar_color":null,"description_width":""}},"8e52e34d11664e569641212428d1adcb":{"model_module":"@jupyter-widgets/controls","model_module_version":"1.5.0","model_name":"HBoxModel","state":{"_dom_classes":[],"_model_module":"@jupyter-widgets/controls","_model_module_version":"1.5.0","_model_name":"HBoxModel","_view_count":null,"_view_module":"@jupyter-widgets/controls","_view_module_version":"1.5.0","_view_name":"HBoxView","box_style":"","children":["IPY_MODEL_7ec4057879794525928536528dec55be","IPY_MODEL_fe3d9087836642d7a925a9d0c359e9c3","IPY_MODEL_6fc0482b9d5648d5bd6ddb417efd523f"],"layout":"IPY_MODEL_afd1d2ec038a4a5bb9bb1577533fc78e"}},"961e57581afc4e48bb4cae31451d7689":{"model_module":"@jupyter-widgets/controls","model_module_version":"1.5.0","model_name":"DescriptionStyleModel","state":{"_model_module":"@jupyter-widgets/controls","_model_module_version":"1.5.0","_model_name":"DescriptionStyleModel","_view_count":null,"_view_module":"@jupyter-widgets/base","_view_module_version":"1.2.0","_view_name":"StyleView","description_width":""}},"979931378c0d4abf9f59f8b3366dd748":{"model_module":"@jupyter-widgets/controls","model_module_version":"1.5.0","model_name":"DescriptionStyleModel","state":{"_model_module":"@jupyter-widgets/controls","_model_module_version":"1.5.0","_model_name":"DescriptionStyleModel","_view_count":null,"_view_module":"@jupyter-widgets/base","_view_module_version":"1.2.0","_view_name":"StyleView","description_width":""}},"9bb67e0cf3244709b126739de56fc5a5":{"model_module":"@jupyter-widgets/controls","model_module_version":"1.5.0","model_name":"ProgressStyleModel","state":{"_model_module":"@jupyter-widgets/controls","_model_module_version":"1.5.0","_model_name":"ProgressStyleModel","_view_count":null,"_view_module":"@jupyter-widgets/base","_view_module_version":"1.2.0","_view_name":"StyleView","bar_color":null,"description_width":""}},"a423d561c98c481585c116c17fb27d5a":{"model_module":"@jupyter-widgets/controls","model_module_version":"1.5.0","model_name":"HTMLModel","state":{"_dom_classes":[],"_model_module":"@jupyter-widgets/controls","_model_module_version":"1.5.0","_model_name"
:"HTMLModel","_view_count":null,"_view_module":"@jupyter-widgets/controls","_view_module_version":"1.5.0","_view_name":"HTMLView","description":"","description_tooltip":null,"layout":"IPY_MODEL_5d7a4bf19d924e459718533cb2238db4","placeholder":"​","style":"IPY_MODEL_ffbfde32c15343d993234206cafdfd17","value":"vocab.json: 100%"}},"a8136e76324449a1aa2617e4797d2e22":{"model_module":"@jupyter-widgets/controls","model_module_version":"1.5.0","model_name":"HBoxModel","state":{"_dom_classes":[],"_model_module":"@jupyter-widgets/controls","_model_module_version":"1.5.0","_model_name":"HBoxModel","_view_count":null,"_view_module":"@jupyter-widgets/controls","_view_module_version":"1.5.0","_view_name":"HBoxView","box_style":"","children":["IPY_MODEL_a423d561c98c481585c116c17fb27d5a","IPY_MODEL_d29e4ea67e134d5592a207e8da3578a9","IPY_MODEL_f6865970f6274aa38cab2e4ea89e1cf6"],"layout":"IPY_MODEL_82109babffd74ae6954fc22e895bb1ae"}},"a9b043c242f24a52a0f93414d7274ec6":{"model_module":"@jupyter-widgets/base","model_module_version":"1.2.0","model_name":"LayoutModel","state":{"_model_module":"@jupyter-widgets/base","_model_module_version":"1.2.0","_model_name":"LayoutModel","_view_count":null,"_view_module":"@jupyter-widgets/base","_view_module_version":"1.2.0","_view_name":"LayoutView","align_content":null,"align_items":null,"align_self":null,"border":null,"bottom":null,"display":null,"flex":null,"flex_flow":null,"grid_area":null,"grid_auto_columns":null,"grid_auto_flow":null,"grid_auto_rows":null,"grid_column":null,"grid_gap":null,"grid_row":null,"grid_template_areas":null,"grid_template_columns":null,"grid_template_rows":null,"height":null,"justify_content":null,"justify_items":null,"left":null,"margin":null,"max_height":null,"max_width":null,"min_height":null,"min_width":null,"object_fit":null,"object_position":null,"order":null,"overflow":null,"overflow_x":null,"overflow_y":null,"padding":null,"right":null,"top":null,"visibility":null,"width":null}},"afd1d2ec038a4a5bb9bb1577533fc78e":{"model_module":"@jupyter-widgets/base","model_module_version":"1.2.0","model_name":"LayoutModel","state":{"_model_module":"@jupyter-widgets/base","_model_module_version":"1.2.0","_model_name":"LayoutModel","_view_count":null,"_view_module":"@jupyter-widgets/base","_view_module_version":"1.2.0","_view_name":"LayoutView","align_content":null,"align_items":null,"align_self":null,"border":null,"bottom":null,"display":null,"flex":null,"flex_flow":null,"grid_area":null,"grid_auto_columns":null,"grid_auto_flow":null,"grid_auto_rows":null,"grid_column":null,"grid_gap":null,"grid_row":null,"grid_template_areas":null,"grid_template_columns":null,"grid_template_rows":null,"height":null,"justify_content":null,"justify_items":null,"left":null,"margin":null,"max_height":null,"max_width":null,"min_height":null,"min_width":null,"object_fit":null,"object_position":null,"order":null,"overflow":null,"overflow_x":null,"overflow_y":null,"padding":null,"right":null,"top":null,"visibility":null,"width":null}},"b66c2936deee4c5e98c791592b030410":{"model_module":"@jupyter-widgets/base","model_module_version":"1.2.0","model_name":"LayoutModel","state":{"_model_module":"@jupyter-widgets/base","_model_module_version":"1.2.0","_model_name":"LayoutModel","_view_count":null,"_view_module":"@jupyter-widgets/base","_view_module_version":"1.2.0","_view_name":"LayoutView","align_content":null,"align_items":null,"align_self":null,"border":null,"bottom":null,"display":null,"flex":null,"flex_flow":null,"grid_area":null,"grid_auto_columns":null,"grid_auto_flow":null,
"grid_auto_rows":null,"grid_column":null,"grid_gap":null,"grid_row":null,"grid_template_areas":null,"grid_template_columns":null,"grid_template_rows":null,"height":null,"justify_content":null,"justify_items":null,"left":null,"margin":null,"max_height":null,"max_width":null,"min_height":null,"min_width":null,"object_fit":null,"object_position":null,"order":null,"overflow":null,"overflow_x":null,"overflow_y":null,"padding":null,"right":null,"top":null,"visibility":null,"width":null}},"b86558bad9474d8aa17ee7bbda3d064b":{"model_module":"@jupyter-widgets/controls","model_module_version":"1.5.0","model_name":"DescriptionStyleModel","state":{"_model_module":"@jupyter-widgets/controls","_model_module_version":"1.5.0","_model_name":"DescriptionStyleModel","_view_count":null,"_view_module":"@jupyter-widgets/base","_view_module_version":"1.2.0","_view_name":"StyleView","description_width":""}},"b98d529db59d450b9f72656acb07b5ab":{"model_module":"@jupyter-widgets/controls","model_module_version":"1.5.0","model_name":"FloatProgressModel","state":{"_dom_classes":[],"_model_module":"@jupyter-widgets/controls","_model_module_version":"1.5.0","_model_name":"FloatProgressModel","_view_count":null,"_view_module":"@jupyter-widgets/controls","_view_module_version":"1.5.0","_view_name":"ProgressView","bar_style":"success","description":"","description_tooltip":null,"layout":"IPY_MODEL_3a5c793f89b344a3b539de7009cd0643","max":1355256.0,"min":0.0,"orientation":"horizontal","style":"IPY_MODEL_79f5a7f600474b33bf8cc3ba492b8b8a","value":1355256.0}},"c16b053ccbde425095b384b416e6cea2":{"model_module":"@jupyter-widgets/base","model_module_version":"1.2.0","model_name":"LayoutModel","state":{"_model_module":"@jupyter-widgets/base","_model_module_version":"1.2.0","_model_name":"LayoutModel","_view_count":null,"_view_module":"@jupyter-widgets/base","_view_module_version":"1.2.0","_view_name":"LayoutView","align_content":null,"align_items":null,"align_self":null,"border":null,"bottom":null,"display":null,"flex":null,"flex_flow":null,"grid_area":null,"grid_auto_columns":null,"grid_auto_flow":null,"grid_auto_rows":null,"grid_column":null,"grid_gap":null,"grid_row":null,"grid_template_areas":null,"grid_template_columns":null,"grid_template_rows":null,"height":null,"justify_content":null,"justify_items":null,"left":null,"margin":null,"max_height":null,"max_width":null,"min_height":null,"min_width":null,"object_fit":null,"object_position":null,"order":null,"overflow":null,"overflow_x":null,"overflow_y":null,"padding":null,"right":null,"top":null,"visibility":null,"width":null}},"c173f9a65ba24897b79d824f84313c04":{"model_module":"@jupyter-widgets/controls","model_module_version":"1.5.0","model_name":"DescriptionStyleModel","state":{"_model_module":"@jupyter-widgets/controls","_model_module_version":"1.5.0","_model_name":"DescriptionStyleModel","_view_count":null,"_view_module":"@jupyter-widgets/base","_view_module_version":"1.2.0","_view_name":"StyleView","description_width":""}},"c25d488cc76642d29861ef42e4964470":{"model_module":"@jupyter-widgets/controls","model_module_version":"1.5.0","model_name":"HTMLModel","state":{"_dom_classes":[],"_model_module":"@jupyter-widgets/controls","_model_module_version":"1.5.0","_model_name":"HTMLModel","_view_count":null,"_view_module":"@jupyter-widgets/controls","_view_module_version":"1.5.0","_view_name":"HTMLView","description":"","description_tooltip":null,"layout":"IPY_MODEL_c81b41991ad9407c8bdd2c04a7a601b6","placeholder":"​","style":"IPY_MODEL_961e57581afc4e48bb4cae31451d7689","value":"gene
ration_config.json: 100%"}},"c81b41991ad9407c8bdd2c04a7a601b6":{"model_module":"@jupyter-widgets/base","model_module_version":"1.2.0","model_name":"LayoutModel","state":{"_model_module":"@jupyter-widgets/base","_model_module_version":"1.2.0","_model_name":"LayoutModel","_view_count":null,"_view_module":"@jupyter-widgets/base","_view_module_version":"1.2.0","_view_name":"LayoutView","align_content":null,"align_items":null,"align_self":null,"border":null,"bottom":null,"display":null,"flex":null,"flex_flow":null,"grid_area":null,"grid_auto_columns":null,"grid_auto_flow":null,"grid_auto_rows":null,"grid_column":null,"grid_gap":null,"grid_row":null,"grid_template_areas":null,"grid_template_columns":null,"grid_template_rows":null,"height":null,"justify_content":null,"justify_items":null,"left":null,"margin":null,"max_height":null,"max_width":null,"min_height":null,"min_width":null,"object_fit":null,"object_position":null,"order":null,"overflow":null,"overflow_x":null,"overflow_y":null,"padding":null,"right":null,"top":null,"visibility":null,"width":null}},"d0a9fd41c82a4692ac4ee727f9d4fc95":{"model_module":"@jupyter-widgets/controls","model_module_version":"1.5.0","model_name":"DescriptionStyleModel","state":{"_model_module":"@jupyter-widgets/controls","_model_module_version":"1.5.0","_model_name":"DescriptionStyleModel","_view_count":null,"_view_module":"@jupyter-widgets/base","_view_module_version":"1.2.0","_view_name":"StyleView","description_width":""}},"d29e4ea67e134d5592a207e8da3578a9":{"model_module":"@jupyter-widgets/controls","model_module_version":"1.5.0","model_name":"FloatProgressModel","state":{"_dom_classes":[],"_model_module":"@jupyter-widgets/controls","_model_module_version":"1.5.0","_model_name":"FloatProgressModel","_view_count":null,"_view_module":"@jupyter-widgets/controls","_view_module_version":"1.5.0","_view_name":"ProgressView","bar_style":"success","description":"","description_tooltip":null,"layout":"IPY_MODEL_34c816da7fcf43b6b5b751e72f2d5fb9","max":1042301.0,"min":0.0,"orientation":"horizontal","style":"IPY_MODEL_3782cf886a7d4615a6dcd7ff779423cc","value":1042301.0}},"d91b73dee6df4459ad16f55471c59926":{"model_module":"@jupyter-widgets/controls","model_module_version":"1.5.0","model_name":"HBoxModel","state":{"_dom_classes":[],"_model_module":"@jupyter-widgets/controls","_model_module_version":"1.5.0","_model_name":"HBoxModel","_view_count":null,"_view_module":"@jupyter-widgets/controls","_view_module_version":"1.5.0","_view_name":"HBoxView","box_style":"","children":["IPY_MODEL_c25d488cc76642d29861ef42e4964470","IPY_MODEL_760519d895904ff5936c4163449d819a","IPY_MODEL_f18d9ba6db42414690c736c551e34579"],"layout":"IPY_MODEL_faef3794647e476ea38eec9f7d343a4e"}},"e057087ebdf04fdb91d30fb88b755e95":{"model_module":"@jupyter-widgets/controls","model_module_version":"1.5.0","model_name":"DescriptionStyleModel","state":{"_model_module":"@jupyter-widgets/controls","_model_module_version":"1.5.0","_model_name":"DescriptionStyleModel","_view_count":null,"_view_module":"@jupyter-widgets/base","_view_module_version":"1.2.0","_view_name":"StyleView","description_width":""}},"f136a904f6694082950e4106fb6e5a3f":{"model_module":"@jupyter-widgets/controls","model_module_version":"1.5.0","model_name":"HTMLModel","state":{"_dom_classes":[],"_model_module":"@jupyter-widgets/controls","_model_module_version":"1.5.0","_model_name":"HTMLModel","_view_count":null,"_view_module":"@jupyter-widgets/controls","_view_module_version":"1.5.0","_view_name":"HTMLView","description":"","description_tooltip":nu
ll,"layout":"IPY_MODEL_788ef963f916435e82e21370f553c45e","placeholder":"​","style":"IPY_MODEL_0fa4bcf979774ec398911b69bc032e19","value":"tokenizer.json: 100%"}},"f18d9ba6db42414690c736c551e34579":{"model_module":"@jupyter-widgets/controls","model_module_version":"1.5.0","model_name":"HTMLModel","state":{"_dom_classes":[],"_model_module":"@jupyter-widgets/controls","_model_module_version":"1.5.0","_model_name":"HTMLModel","_view_count":null,"_view_module":"@jupyter-widgets/controls","_view_module_version":"1.5.0","_view_name":"HTMLView","description":"","description_tooltip":null,"layout":"IPY_MODEL_4679ce24416d49d883576828d300a36a","placeholder":"​","style":"IPY_MODEL_b86558bad9474d8aa17ee7bbda3d064b","value":" 124/124 [00:00<00:00, 9.80kB/s]"}},"f6865970f6274aa38cab2e4ea89e1cf6":{"model_module":"@jupyter-widgets/controls","model_module_version":"1.5.0","model_name":"HTMLModel","state":{"_dom_classes":[],"_model_module":"@jupyter-widgets/controls","_model_module_version":"1.5.0","_model_name":"HTMLModel","_view_count":null,"_view_module":"@jupyter-widgets/controls","_view_module_version":"1.5.0","_view_name":"HTMLView","description":"","description_tooltip":null,"layout":"IPY_MODEL_688325eac4964c89ad02225917166075","placeholder":"​","style":"IPY_MODEL_e057087ebdf04fdb91d30fb88b755e95","value":" 1.04M/1.04M [00:00<00:00, 13.0MB/s]"}},"faef3794647e476ea38eec9f7d343a4e":{"model_module":"@jupyter-widgets/base","model_module_version":"1.2.0","model_name":"LayoutModel","state":{"_model_module":"@jupyter-widgets/base","_model_module_version":"1.2.0","_model_name":"LayoutModel","_view_count":null,"_view_module":"@jupyter-widgets/base","_view_module_version":"1.2.0","_view_name":"LayoutView","align_content":null,"align_items":null,"align_self":null,"border":null,"bottom":null,"display":null,"flex":null,"flex_flow":null,"grid_area":null,"grid_auto_columns":null,"grid_auto_flow":null,"grid_auto_rows":null,"grid_column":null,"grid_gap":null,"grid_row":null,"grid_template_areas":null,"grid_template_columns":null,"grid_template_rows":null,"height":null,"justify_content":null,"justify_items":null,"left":null,"margin":null,"max_height":null,"max_width":null,"min_height":null,"min_width":null,"object_fit":null,"object_position":null,"order":null,"overflow":null,"overflow_x":null,"overflow_y":null,"padding":null,"right":null,"top":null,"visibility":null,"width":null}},"fe3d9087836642d7a925a9d0c359e9c3":{"model_module":"@jupyter-widgets/controls","model_module_version":"1.5.0","model_name":"FloatProgressModel","state":{"_dom_classes":[],"_model_module":"@jupyter-widgets/controls","_model_module_version":"1.5.0","_model_name":"FloatProgressModel","_view_count":null,"_view_module":"@jupyter-widgets/controls","_view_module_version":"1.5.0","_view_name":"ProgressView","bar_style":"success","description":"","description_tooltip":null,"layout":"IPY_MODEL_c16b053ccbde425095b384b416e6cea2","max":456318.0,"min":0.0,"orientation":"horizontal","style":"IPY_MODEL_031a7d4f123c42369409ac83128ee45f","value":456318.0}},"ffbfde32c15343d993234206cafdfd17":{"model_module":"@jupyter-widgets/controls","model_module_version":"1.5.0","model_name":"DescriptionStyleModel","state":{"_model_module":"@jupyter-widgets/controls","_model_module_version":"1.5.0","_model_name":"DescriptionStyleModel","_view_count":null,"_view_module":"@jupyter-widgets/base","_view_module_version":"1.2.0","_view_name":"StyleView","description_width":""}}},"version_major":2,"version_minor":0}}},"nbformat":4,"nbformat_minor":5} \ No newline at end of file