{
"cells": [
{
"cell_type": "markdown",
"id": "420ffd5f-8871-40e6-97b0-001df7ce78ad",
"metadata": {
"tags": []
},
"source": [
"\n",
"原函數定義: https://pytorch.org/docs/stable/generated/torch.nn.functional.softmax.html\n",
"\n",
"$$ Softmax(x_i) = \\frac{e^{x_i}}{\\sum_{j}{e^{x_j}}} $$\n",
"\n",
"其中 $ x_i $ 的 i 是指某個維度上的陣列每一個元素。\n",
"\n",
"其中 $ x_j $ 的 j 是指沿著指定維度上的每一個值,這會被拿來加總成分母。\n",
"\n",
"假設我們生成一個 2 維, 包含兩個 2x2 隨機陣列向量的值:"
]
},
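{
"cell_type": "markdown",
"id": "f0a1b2c3-d4e5-4f60-8a71-92b3c4d5e6f7",
"metadata": {},
"source": [
"As a quick one-dimensional sanity check of the formula (a minimal sketch; the values below are arbitrary), softmax is just `exp` followed by normalization:"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "a1b2c3d4-e5f6-4a70-8b81-92c3d4e5f6a7",
"metadata": {},
"outputs": [],
"source": [
"import torch\n",
"import torch.nn.functional as F\n",
"\n",
"x = torch.tensor([1.0, 2.0, 3.0])\n",
"manual = torch.exp(x) / torch.exp(x).sum()  # exp, then normalize by the total\n",
"print(manual)\n",
"print(F.softmax(x, dim=0))  # should match the manual computation"
]
},
{
"cell_type": "markdown",
"id": "b2c3d4e5-f6a7-4b80-8c91-a2d3e4f5a6b8",
"metadata": {},
"source": [
"Now suppose we generate a random tensor of shape (2, 2, 2), i.e. two 2x2 matrices:"
]
},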
{
"cell_type": "code",
"execution_count": 14,
"id": "b9b43aca-e48c-4eaa-a3ed-968e33a287e8",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"tensor([[[ 0.4890, 0.0378],\n",
" [-1.4088, 0.3213]],\n",
"\n",
" [[-0.4598, -0.4771],\n",
" [-1.6266, -0.8554]]])\n"
]
}
],
"source": [
"import torch\n",
"import torch.nn.functional as F\n",
"\n",
"input = torch.randn(2,2,2)\n",
"print(input)"
]
},
{
"cell_type": "markdown",
"id": "5178d221-32e9-46dd-8bbf-8b580141a8fe",
"metadata": {},
"source": [
"那麼矩陣就會像以下:\n",
"\n",
"$$\n",
"\\begin{bmatrix}\n",
"\\begin{bmatrix}\n",
"0.4890 & 0.0378\\\\\n",
"-1.4088 & 0.3213\\\\\n",
"\\end{bmatrix} \\\\\n",
"\\begin{bmatrix} \n",
"-0.4598 & -0.4771\\\\\n",
"-1.6266 & -0.8554\\\\\n",
"\\end{bmatrix}\n",
"\\end{bmatrix}\n",
"$$\n",
"\n",
"使用代數來代表這些數值,則有:\n",
"\n",
"$$\n",
"\\begin{bmatrix}\n",
"\\begin{bmatrix}\n",
"x_1 & x_2\\\\\n",
"x_3 & x_4\\\\\n",
"\\end{bmatrix} \\\\\n",
"\\begin{bmatrix} \n",
"x_5 & x_6\\\\\n",
"x_7 & x_8\\\\\n",
"\\end{bmatrix}\n",
"\\end{bmatrix}\n",
"$$\n",
"\n",
"如果對它們做 `softmax` 計算,選擇對第 0 維度做計算,得到:"
]
},
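{
"cell_type": "code",
"execution_count": null,
"id": "c3d4e5f6-a7b8-4c90-8da1-b2e3f4a5b6c9",
"metadata": {},
"outputs": [],
"source": [
"# How the symbols x1..x8 above map onto tensor indices:\n",
"# x1 = input[0,0,0], x2 = input[0,0,1], x3 = input[0,1,0], x4 = input[0,1,1]\n",
"# x5 = input[1,0,0], x6 = input[1,0,1], x7 = input[1,1,0], x8 = input[1,1,1]\n",
"print(input[0, 0, 0], input[1, 0, 0])  # x1 and x5: same (row, column), different slice"
]
},
{
"cell_type": "markdown",
"id": "d4e5f6a7-b8c9-4da0-8eb1-c2f3a4b5c6d0",
"metadata": {},
"source": [
"If we apply `softmax` along dimension 0 (`dim=0`), we get:"
]
},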
{
"cell_type": "code",
"execution_count": 23,
"id": "b39c4faf-67fb-42e3-8c27-acabe13ed7bb",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"tensor([[[0.7209, 0.6260],\n",
" [0.5542, 0.7644]],\n",
"\n",
" [[0.2791, 0.3740],\n",
" [0.4458, 0.2356]]])\n"
]
}
],
"source": [
"matrix = F.softmax(input, dim=0)\n",
"print(matrix)"
]
},
{
"cell_type": "markdown",
"id": "c9c43304-507d-4baf-8982-a9692e417fb5",
"metadata": {},
"source": [
"`dim=0` 這個計算方式是目前所在位置上當作分子,對所有維度相同位置的數值進行加總後變成分母,相除的值。\n",
"\n",
"在上面的數值中,得到了這樣的結果:\n",
"\n",
"$$\n",
"\\begin{bmatrix}\n",
"\\begin{bmatrix}\n",
"0.7209 & 0.6260\\\\\n",
"0.5542 & 0.7644\\\\\n",
"\\end{bmatrix} \\\\\n",
"\\begin{bmatrix} \n",
"0.2791 & 0.3740\\\\\n",
"0.4458 & 0.2356\\\\\n",
"\\end{bmatrix}\n",
"\\end{bmatrix}\n",
"$$\n",
"\n",
"為了方便解釋,這樣的結果可以想成套上了 s(x) 這個函數,才得到上述的結果:\n",
"\n",
"$$\n",
"\\begin{bmatrix}\n",
"\\begin{bmatrix}\n",
"s(0.4890) & s(0.0378)\\\\\n",
"s(-1.4088) & s(0.3213)\\\\\n",
"\\end{bmatrix} \\\\\n",
"\\begin{bmatrix} \n",
"s(-0.4598) & s(-0.4771)\\\\\n",
"s(-1.6266) & s(-0.8554)\\\\\n",
"\\end{bmatrix}\n",
"\\end{bmatrix}\n",
"$$\n",
"\n",
"\n",
"該演算如下:\n",
"\n",
"假設要計算 $ s(x_1) $,則為\n",
"\n",
"$$\n",
"Softmax(x_1) = \\frac{e^{x_1}}{\\sum_{j}{e^{x_j}}} = \\frac{e^{x_1}}{e^{x_1} + e^{x_5}} = \\frac{e^{0.4890}}{e^{0.4890} + e^{-0.4598}} = 0.7209\n",
"$$\n",
"\n",
"相同計算手法要從 $x_1$ 算到 $x_8$ 共 8 次。"
]
},
{
"cell_type": "markdown",
"id": "3f6f5215-36b7-4098-bcd6-a8fd8890e765",
"metadata": {},
"source": [
"為什麼分母的加總是 $ e^{x_1} + e^{x_5} $ ? \n",
"\n",
"`softmax` 指定 `dim=0`,就是對第 0 維度做計算,這裡的意思是每一個維度上,每一個相同位置進行 `softmax` 計算,我們已知現在要計算所在第一維度的 $ x_1 $ ,而在第二維度上與 $ x_1 $ 相同位置的,就只有 $ x_5 $。\n",
"\n",
"分母的 $ \\sum_{j}{e^{x_j}} $ 就是加總那個維度上的資訊,所以就是分子除上所有位置的算術平均。\n",
"\n",
"用程式碼來驗證的話,即:"
]
},
{
"cell_type": "code",
"execution_count": 41,
"id": "f8e97c77-fae7-4cb5-a05e-906c7dad8ee7",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"0.7208737843071955"
]
},
"execution_count": 41,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"import math\n",
"ex1 = math.exp(0.4890)\n",
"ex5 = math.exp(-0.4598)\n",
"\n",
"# s(x1) = 𝑠(0.4890)\n",
"ex1 / (ex1 + ex5)"
]
},
{
"cell_type": "code",
"execution_count": 42,
"id": "e57be527-912c-4e5b-b450-7aef1b139c78",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"0.6259544445137329"
]
},
"execution_count": 42,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# s(x2) = 𝑠(0.0378) \n",
"ex2 = math.exp(0.0378)\n",
"ex6 = math.exp(-0.4771)\n",
"\n",
"ex2 / (ex2 + ex6)"
]
},
{
"cell_type": "code",
"execution_count": 43,
"id": "61904b76-1a4b-4e09-abeb-804612ba7da4",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"0.23564606371434452"
]
},
"execution_count": 43,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# s(x8) = 𝑠(−0.8554) \n",
"\n",
"ex8 = math.exp(-0.8554)\n",
"ex4 = math.exp(0.3213)\n",
"\n",
"ex8 / (ex4 + ex8)"
]
},
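{
"cell_type": "markdown",
"id": "e5f6a7b8-c9d0-4eb1-8fc2-d3a4b5c6d7e1",
"metadata": {},
"source": [
"Rather than checking one entry at a time, the whole `dim=0` result can be reproduced at once with tensor operations (a minimal sketch):"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "f6a7b8c9-d0e1-4fc2-8ad3-e4b5c6d7e8f2",
"metadata": {},
"outputs": [],
"source": [
"# Manual softmax over dim=0: exponentiate, then divide by the sum across dim=0.\n",
"manual = torch.exp(input) / torch.exp(input).sum(dim=0, keepdim=True)\n",
"print(manual)\n",
"print(torch.allclose(manual, F.softmax(input, dim=0)))  # expected: True"
]
},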
{
"cell_type": "markdown",
"id": "c94e4dd7-31c1-43be-a91e-9e59844d763b",
"metadata": {},
"source": [
"softmax 處理後的結果會有機率論中的特性,直接將 dim=0 ,也就是相同維度的值相加後 = 1。\n",
"\n",
"比方說:\n",
"\n",
"$$ e^{x_1} + e^{x_5} = 0.7209 + 0.2791 = 1 $$\n",
"\n",
"對 dim=n 維度亦同,只是加總的位置不同而已。\n",
"\n",
"## 計算維度為 1 時\n",
"\n",
"如果對它們做 softmax 計算,選擇對第 1 維度 `dim=1` 做計算,得到:"
]
},
{
"cell_type": "code",
"execution_count": 44,
"id": "2eff857d-ac73-46eb-8d29-1c3967d33b10",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"tensor([[[0.8696, 0.4296],\n",
" [0.1304, 0.5704]],\n",
"\n",
" [[0.7626, 0.5935],\n",
" [0.2374, 0.4065]]])\n"
]
}
],
"source": [
"matrix = F.softmax(input, dim=1)\n",
"print(matrix)"
]
},
{
"cell_type": "markdown",
"id": "d7f2baef-6ad7-4d33-8cc6-164e0559db45",
"metadata": {},
"source": [
"`dim=1` 的計算方式是目前所在維度的 `行(row)` 上進行運算:\n",
"\n",
"假設要計算 $ s(x_1) $,則為\n",
"\n",
"$$\n",
"Softmax(x_1) = \\frac{e^{x_1}}{\\sum_{j}{e^{x_j}}} = \\frac{e^{x_1}}{e^{x_1} + e^{x_3}} = \\frac{e^{0.4890}}{e^{0.4890} + e^{-0.8554}} = 0.7209\n",
"$$\n",
"\n",
"*避免混淆,請記得參考 `input` 變數上的矩陣值,不要誤會到 `dim=0` 計算的值。\n",
"\n",
"*相同計算手法也要從 $x_1$ 算到 $x_8$ 共 8 次。 \n",
"\n",
"用程式碼驗算則為:"
]
},
{
"cell_type": "code",
"execution_count": 45,
"id": "35c1a7b8-b912-41e4-aad9-a77f188f369d",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"0.8696423263784095"
]
},
"execution_count": 45,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"ex1 = math.exp(0.4890)\n",
"ex3 = math.exp(-1.4088)\n",
"\n",
"# s(x1) = 𝑠(0.4890)\n",
"ex1 / (ex1 + ex3)"
]
},
{
"cell_type": "code",
"execution_count": 47,
"id": "a6e26832-afb9-42b8-8a36-8ac9d527ae38",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"0.5934630182609121"
]
},
"execution_count": 47,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"ex6 = math.exp(-0.4771)\n",
"ex8 = math.exp(-0.8554)\n",
"\n",
"# s(x6) = 𝑠(−0.4771)\n",
"ex6 / (ex6 + ex8)"
]
},
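{
"cell_type": "markdown",
"id": "d0e1f2a3-b4c5-4da6-8eb7-c8f9a0b1c2d6",
"metadata": {},
"source": [
"The same vectorized check works for `dim=1` (a minimal sketch):"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "e1f2a3b4-c5d6-4eb7-8fc8-d9a0b1c2d3e7",
"metadata": {},
"outputs": [],
"source": [
"# Manual softmax over dim=1: the sum now runs down each column of each 2x2 matrix.\n",
"manual = torch.exp(input) / torch.exp(input).sum(dim=1, keepdim=True)\n",
"print(torch.allclose(manual, F.softmax(input, dim=1)))  # expected: True"
]
},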
{
"cell_type": "markdown",
"id": "d08d9e0a-1152-4680-a678-41fefe079fd4",
"metadata": {},
"source": [
"## 計算維度為 2, -1 時"
]
},
{
"cell_type": "code",
"execution_count": 52,
"id": "b29239b0-f767-458f-9696-01e54f8add23",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"tensor([[[0.6109, 0.3891],\n",
" [0.1506, 0.8494]],\n",
"\n",
" [[0.5043, 0.4957],\n",
" [0.3162, 0.6838]]])\n",
"===============================\n",
"tensor([[[0.6109, 0.3891],\n",
" [0.1506, 0.8494]],\n",
"\n",
" [[0.5043, 0.4957],\n",
" [0.3162, 0.6838]]])\n"
]
}
],
"source": [
"matrix = F.softmax(input, dim=2)\n",
"print(matrix)\n",
"\n",
"print(\"===============================\")\n",
"\n",
"matrix = F.softmax(input, dim=-1)\n",
"print(matrix)"
]
},
{
"cell_type": "markdown",
"id": "4bd1a3cd-8cc7-467f-9717-25a466f261c9",
"metadata": {},
"source": [
"另一個計算方式是 `dim=2`, `dim=-1` 的情形,這兩個都屬於一樣的計算方式,是將每個維度的每一個 `列 (column)` 進行運算:\n",
"\n",
"假設要計算 $ s(x_1) $,則為\n",
"\n",
"$$\n",
"Softmax(x_1) = \\frac{e^{x_1}}{\\sum_{j}{e^{x_j}}} = \\frac{e^{x_1}}{e^{x_1} + e^{x_2}} = \\frac{e^{0.4890}}{e^{0.4890} + e^{0.0378}} = 0.6109\n",
"$$\n"
]
},
{
"cell_type": "code",
"execution_count": 64,
"id": "3b5102f0-fc0a-452d-a104-e233df041a15",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"s(x1) = 0.61092450679204\n",
"s(x2) = 0.38907549320796\n"
]
}
],
"source": [
"ex1 = math.exp(0.4890)\n",
"ex2 = math.exp(0.0378)\n",
"\n",
"# s(x1) = 𝑠(0.4890)\n",
"print('s(x1) = ' + str(ex1 / (ex1 + ex2)))\n",
"print('s(x2) = ' + str(ex2 / (ex1 + ex2)))"
]
},
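{
"cell_type": "markdown",
"id": "f2a3b4c5-d6e7-4fc8-8ad9-e0b1c2d3e4f8",
"metadata": {},
"source": [
"Finally, we can confirm that `dim=2` and `dim=-1` really are the same call for this tensor (a minimal sketch):"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "a3b4c5d6-e7f8-4ad9-8bea-f1c2d3e4f5a9",
"metadata": {},
"outputs": [],
"source": [
"# dim=-1 refers to the last dimension, which is dim=2 for this 3-D tensor.\n",
"print(torch.equal(F.softmax(input, dim=2), F.softmax(input, dim=-1)))  # expected: True"
]
}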
],
"metadata": {
"kernelspec": {
"display_name": "Python 3 (ipykernel)",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.7.2"
}
},
"nbformat": 4,
"nbformat_minor": 5
}