Sunday, January 9, 2022

PyTorch: Applying Softmax to a Matrix


{ "cells": [ { "cell_type": "markdown", "id": "420ffd5f-8871-40e6-97b0-001df7ce78ad", "metadata": { "tags": [] }, "source": [ "\n", "原函數定義: https://pytorch.org/docs/stable/generated/torch.nn.functional.softmax.html\n", "\n", "$$ Softmax(x_i) = \\frac{e^{x_i}}{\\sum_{j}{e^{x_j}}} $$\n", "\n", "其中 $ x_i $ 的 i 是指某個維度上的陣列每一個元素。\n", "\n", "其中 $ x_j $ 的 j 是指沿著指定維度上的每一個值,這會被拿來加總成分母。\n", "\n", "假設我們生成一個 2 維, 包含兩個 2x2 隨機陣列向量的值:" ] }, { "cell_type": "code", "execution_count": 14, "id": "b9b43aca-e48c-4eaa-a3ed-968e33a287e8", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "tensor([[[ 0.4890, 0.0378],\n", " [-1.4088, 0.3213]],\n", "\n", " [[-0.4598, -0.4771],\n", " [-1.6266, -0.8554]]])\n" ] } ], "source": [ "import torch\n", "import torch.nn.functional as F\n", "\n", "input = torch.randn(2,2,2)\n", "print(input)" ] }, { "cell_type": "markdown", "id": "5178d221-32e9-46dd-8bbf-8b580141a8fe", "metadata": {}, "source": [ "那麼矩陣就會像以下:\n", "\n", "$$\n", "\\begin{bmatrix}\n", "\\begin{bmatrix}\n", "0.4890 & 0.0378\\\\\n", "-1.4088 & 0.3213\\\\\n", "\\end{bmatrix} \\\\\n", "\\begin{bmatrix} \n", "-0.4598 & -0.4771\\\\\n", "-1.6266 & -0.8554\\\\\n", "\\end{bmatrix}\n", "\\end{bmatrix}\n", "$$\n", "\n", "使用代數來代表這些數值,則有:\n", "\n", "$$\n", "\\begin{bmatrix}\n", "\\begin{bmatrix}\n", "x_1 & x_2\\\\\n", "x_3 & x_4\\\\\n", "\\end{bmatrix} \\\\\n", "\\begin{bmatrix} \n", "x_5 & x_6\\\\\n", "x_7 & x_8\\\\\n", "\\end{bmatrix}\n", "\\end{bmatrix}\n", "$$\n", "\n", "如果對它們做 `softmax` 計算,選擇對第 0 維度做計算,得到:" ] }, { "cell_type": "code", "execution_count": 23, "id": "b39c4faf-67fb-42e3-8c27-acabe13ed7bb", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "tensor([[[0.7209, 0.6260],\n", " [0.5542, 0.7644]],\n", "\n", " [[0.2791, 0.3740],\n", " [0.4458, 0.2356]]])\n" ] } ], "source": [ "matrix = F.softmax(input, dim=0)\n", "print(matrix)" ] }, { "cell_type": "markdown", "id": "c9c43304-507d-4baf-8982-a9692e417fb5", "metadata": {}, "source": [ "`dim=0` 這個計算方式是目前所在位置上當作分子,對所有維度相同位置的數值進行加總後變成分母,相除的值。\n", "\n", "在上面的數值中,得到了這樣的結果:\n", "\n", "$$\n", "\\begin{bmatrix}\n", "\\begin{bmatrix}\n", "0.7209 & 0.6260\\\\\n", "0.5542 & 0.7644\\\\\n", "\\end{bmatrix} \\\\\n", "\\begin{bmatrix} \n", "0.2791 & 0.3740\\\\\n", "0.4458 & 0.2356\\\\\n", "\\end{bmatrix}\n", "\\end{bmatrix}\n", "$$\n", "\n", "為了方便解釋,這樣的結果可以想成套上了 s(x) 這個函數,才得到上述的結果:\n", "\n", "$$\n", "\\begin{bmatrix}\n", "\\begin{bmatrix}\n", "s(0.4890) & s(0.0378)\\\\\n", "s(-1.4088) & s(0.3213)\\\\\n", "\\end{bmatrix} \\\\\n", "\\begin{bmatrix} \n", "s(-0.4598) & s(-0.4771)\\\\\n", "s(-1.6266) & s(-0.8554)\\\\\n", "\\end{bmatrix}\n", "\\end{bmatrix}\n", "$$\n", "\n", "\n", "該演算如下:\n", "\n", "假設要計算 $ s(x_1) $,則為\n", "\n", "$$\n", "Softmax(x_1) = \\frac{e^{x_1}}{\\sum_{j}{e^{x_j}}} = \\frac{e^{x_1}}{e^{x_1} + e^{x_5}} = \\frac{e^{0.4890}}{e^{0.4890} + e^{-0.4598}} = 0.7209\n", "$$\n", "\n", "相同計算手法要從 $x_1$ 算到 $x_8$ 共 8 次。" ] }, { "cell_type": "markdown", "id": "3f6f5215-36b7-4098-bcd6-a8fd8890e765", "metadata": {}, "source": [ "為什麼分母的加總是 $ e^{x_1} + e^{x_5} $ ? 
\n", "\n", "`softmax` 指定 `dim=0`,就是對第 0 維度做計算,這裡的意思是每一個維度上,每一個相同位置進行 `softmax` 計算,我們已知現在要計算所在第一維度的 $ x_1 $ ,而在第二維度上與 $ x_1 $ 相同位置的,就只有 $ x_5 $。\n", "\n", "分母的 $ \\sum_{j}{e^{x_j}} $ 就是加總那個維度上的資訊,所以就是分子除上所有位置的算術平均。\n", "\n", "用程式碼來驗證的話,即:" ] }, { "cell_type": "code", "execution_count": 41, "id": "f8e97c77-fae7-4cb5-a05e-906c7dad8ee7", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "0.7208737843071955" ] }, "execution_count": 41, "metadata": {}, "output_type": "execute_result" } ], "source": [ "import math\n", "ex1 = math.exp(0.4890)\n", "ex5 = math.exp(-0.4598)\n", "\n", "# s(x1) = 𝑠(0.4890)\n", "ex1 / (ex1 + ex5)" ] }, { "cell_type": "code", "execution_count": 42, "id": "e57be527-912c-4e5b-b450-7aef1b139c78", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "0.6259544445137329" ] }, "execution_count": 42, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# s(x2) = 𝑠(0.0378) \n", "ex2 = math.exp(0.0378)\n", "ex6 = math.exp(-0.4771)\n", "\n", "ex2 / (ex2 + ex6)" ] }, { "cell_type": "code", "execution_count": 43, "id": "61904b76-1a4b-4e09-abeb-804612ba7da4", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "0.23564606371434452" ] }, "execution_count": 43, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# s(x8) = 𝑠(−0.8554) \n", "\n", "ex8 = math.exp(-0.8554)\n", "ex4 = math.exp(0.3213)\n", "\n", "ex8 / (ex4 + ex8)" ] }, { "cell_type": "markdown", "id": "c94e4dd7-31c1-43be-a91e-9e59844d763b", "metadata": {}, "source": [ "softmax 處理後的結果會有機率論中的特性,直接將 dim=0 ,也就是相同維度的值相加後 = 1。\n", "\n", "比方說:\n", "\n", "$$ e^{x_1} + e^{x_5} = 0.7209 + 0.2791 = 1 $$\n", "\n", "對 dim=n 維度亦同,只是加總的位置不同而已。\n", "\n", "## 計算維度為 1 時\n", "\n", "如果對它們做 softmax 計算,選擇對第 1 維度 `dim=1` 做計算,得到:" ] }, { "cell_type": "code", "execution_count": 44, "id": "2eff857d-ac73-46eb-8d29-1c3967d33b10", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "tensor([[[0.8696, 0.4296],\n", " [0.1304, 0.5704]],\n", "\n", " [[0.7626, 0.5935],\n", " [0.2374, 0.4065]]])\n" ] } ], "source": [ "matrix = F.softmax(input, dim=1)\n", "print(matrix)" ] }, { "cell_type": "markdown", "id": "d7f2baef-6ad7-4d33-8cc6-164e0559db45", "metadata": {}, "source": [ "`dim=1` 的計算方式是目前所在維度的 `行(row)` 上進行運算:\n", "\n", "假設要計算 $ s(x_1) $,則為\n", "\n", "$$\n", "Softmax(x_1) = \\frac{e^{x_1}}{\\sum_{j}{e^{x_j}}} = \\frac{e^{x_1}}{e^{x_1} + e^{x_3}} = \\frac{e^{0.4890}}{e^{0.4890} + e^{-0.8554}} = 0.7209\n", "$$\n", "\n", "*避免混淆,請記得參考 `input` 變數上的矩陣值,不要誤會到 `dim=0` 計算的值。\n", "\n", "*相同計算手法也要從 $x_1$ 算到 $x_8$ 共 8 次。 \n", "\n", "用程式碼驗算則為:" ] }, { "cell_type": "code", "execution_count": 45, "id": "35c1a7b8-b912-41e4-aad9-a77f188f369d", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "0.8696423263784095" ] }, "execution_count": 45, "metadata": {}, "output_type": "execute_result" } ], "source": [ "ex1 = math.exp(0.4890)\n", "ex3 = math.exp(-1.4088)\n", "\n", "# s(x1) = 𝑠(0.4890)\n", "ex1 / (ex1 + ex3)" ] }, { "cell_type": "code", "execution_count": 47, "id": "a6e26832-afb9-42b8-8a36-8ac9d527ae38", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "0.5934630182609121" ] }, "execution_count": 47, "metadata": {}, "output_type": "execute_result" } ], "source": [ "ex6 = math.exp(-0.4771)\n", "ex8 = math.exp(-0.8554)\n", "\n", "# s(x6) = 𝑠(−0.4771)\n", "ex6 / (ex6 + ex8)" ] }, { "cell_type": "markdown", "id": "d08d9e0a-1152-4680-a678-41fefe079fd4", "metadata": {}, "source": [ "## 計算維度為 2, -1 時" ] }, { "cell_type": "code", "execution_count": 52, "id": 
"b29239b0-f767-458f-9696-01e54f8add23", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "tensor([[[0.6109, 0.3891],\n", " [0.1506, 0.8494]],\n", "\n", " [[0.5043, 0.4957],\n", " [0.3162, 0.6838]]])\n", "===============================\n", "tensor([[[0.6109, 0.3891],\n", " [0.1506, 0.8494]],\n", "\n", " [[0.5043, 0.4957],\n", " [0.3162, 0.6838]]])\n" ] } ], "source": [ "matrix = F.softmax(input, dim=2)\n", "print(matrix)\n", "\n", "print(\"===============================\")\n", "\n", "matrix = F.softmax(input, dim=-1)\n", "print(matrix)" ] }, { "cell_type": "markdown", "id": "4bd1a3cd-8cc7-467f-9717-25a466f261c9", "metadata": {}, "source": [ "另一個計算方式是 `dim=2`, `dim=-1` 的情形,這兩個都屬於一樣的計算方式,是將每個維度的每一個 `列 (column)` 進行運算:\n", "\n", "假設要計算 $ s(x_1) $,則為\n", "\n", "$$\n", "Softmax(x_1) = \\frac{e^{x_1}}{\\sum_{j}{e^{x_j}}} = \\frac{e^{x_1}}{e^{x_1} + e^{x_2}} = \\frac{e^{0.4890}}{e^{0.4890} + e^{0.0378}} = 0.6109\n", "$$\n" ] }, { "cell_type": "code", "execution_count": 64, "id": "3b5102f0-fc0a-452d-a104-e233df041a15", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "s(x1) = 0.61092450679204\n", "s(x2) = 0.38907549320796\n" ] } ], "source": [ "ex1 = math.exp(0.4890)\n", "ex2 = math.exp(0.0378)\n", "\n", "# s(x1) = 𝑠(0.4890)\n", "print('s(x1) = ' + str(ex1 / (ex1 + ex2)))\n", "print('s(x2) = ' + str(ex2 / (ex1 + ex2)))" ] }, { "cell_type": "code", "execution_count": null, "id": "dbc67a44-db60-459c-ba25-c6ebe123e59a", "metadata": {}, "outputs": [], "source": [] } ], "metadata": { "kernelspec": { "display_name": "Python 3 (ipykernel)", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.7.2" } }, "nbformat": 4, "nbformat_minor": 5 }


References:

https://blog.csdn.net/will_ye/article/details/104994504
https://stackoverflow.com/questions/52513802/pytorch-softmax-with-dim

