Term Extraction Steps
[R: the number of distinct characters that appear immediately to the left (or right) of a string]
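Candidate Terms: R>1 [STRING] R>1 (number of strings whose left and right R values are both greater than 1): 15,694,556
(i.e., on each side of the term, the adjacent characters include a full stop (句號) or at least two distinct characters)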
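The candidate-selection rule above can be sketched in Python on a toy string. This is a brute-force illustration, not the procedure that produced the 15,694,556 candidates: the length cap `max_len`, the use of 。 as the only boundary marker, and treating the start and end of the text as boundaries are all assumptions. A corpus of real size would need an index such as a suffix array.

```python
from collections import defaultdict

BOUNDARY = "。"  # assumption: the full stop is the only sentence-boundary marker


def neighbor_sets(corpus: str, max_len: int = 4):
    """Map every substring (up to max_len chars, not crossing 。) to the sets of
    distinct characters seen immediately to its left and right."""
    left, right = defaultdict(set), defaultdict(set)
    n = len(corpus)
    for i in range(n):
        for j in range(i + 1, min(i + max_len, n) + 1):
            s = corpus[i:j]
            if BOUNDARY in s:
                break  # every longer substring starting at i also contains 。
            # Assumption: the start/end of the text counts as a boundary.
            left[s].add(corpus[i - 1] if i > 0 else BOUNDARY)
            right[s].add(corpus[j] if j < n else BOUNDARY)
    return left, right


def side_ok(neighbors: set) -> bool:
    """A side qualifies if a full stop was seen there or R (= set size) is at least 2."""
    return BOUNDARY in neighbors or len(neighbors) > 1


def candidate_terms(corpus: str, max_len: int = 4) -> set:
    left, right = neighbor_sets(corpus, max_len)
    return {s for s in left if side_ok(left[s]) and side_ok(right[s])}


cands = candidate_terms("佛法。佛經。說法。")
print(sorted(cands))
```

Here 經 is rejected because its only left neighbor is 佛, and 說 is rejected because its only right neighbor is 法.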
Number of 佛光大辭典 (Foguang Dictionary) entries: 24,448 (after removing entries that contain missing characters)
Number of terms found in both the Candidate Terms and 佛光大辭典: 16,884
Calculations and charts as |R| varies
candidate terms: 15,694,556
佛光大辭典 terms: 24,448
candidates ∩ 佛光大辭典: 16,844 (when |R| >= 2)
Precision = 16,844 / 15,694,556 (candidates ∩ 佛光大辭典 / candidate terms)
Recall = 16,844 / 24,448 (candidates ∩ 佛光大辭典 / 佛光大辭典 terms)
F-measure = 2 / ((1 / Precision) + (1 / Recall)) = 2 × Precision × Recall / (Precision + Recall)
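The three measures can be checked against the tables below with a small helper. The function name `prf` and its argument names are illustrative; F is computed as the harmonic mean 2 / ((1/P) + (1/R)). The usage line reproduces row X = 321 of the left-side |R| table.

```python
def prf(hits: int, candidates: int, gold: int) -> tuple[float, float, float]:
    """Precision, Recall and F-measure.

    hits:       candidates that are also 佛光大辭典 entries
    candidates: number of candidate terms kept
    gold:       number of 佛光大辭典 entries (24,448 here)
    Assumes hits > 0, otherwise 1 / precision divides by zero.
    """
    precision = hits / candidates
    recall = hits / gold
    f_measure = 2 / ((1 / precision) + (1 / recall))  # harmonic mean of P and R
    return precision, recall, f_measure


# Row X = 321 of the left-side |R| table
p, r, f = prf(2687, 26355, 24448)
print(f"{p:.6f} {r:.6f} {f:.6f}")  # 0.101954 0.109907 0.105781
```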
Y : Precision, Recall, F-measure
X : |R| value, restricted to R > 2 and R < 500
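One way to produce a table like those below is to sweep the |R| threshold X from 3 to 499 (R > 2 and R < 500, assuming integer |R|). The notes do not say whether a candidate is kept when |R| >= X or |R| > X; this sketch assumes >=. It would be run once with left-side |R| values (Figure 1) and once with right-side values (Figure 2). The function and argument names are hypothetical. Counting how many candidates sit at each |R| value and accumulating from the top down avoids rescanning millions of candidates per threshold.

```python
from collections import Counter


def sweep_r(r_values: dict, gold: set, lo: int = 3, hi: int = 499):
    """Score candidates at every |R| threshold X in [lo, hi].

    r_values: candidate term -> its left-side (or right-side) |R|
    gold:     set of 佛光大辭典 entries
    Assumption: a candidate is kept at threshold X when |R| >= X.
    """
    all_at = Counter(r_values.values())                           # candidates with |R| == r
    hit_at = Counter(r for t, r in r_values.items() if t in gold)  # ... that are also entries
    # Start with everything above hi, then add one |R| level at a time going down.
    kept = sum(c for r, c in all_at.items() if r > hi)
    hits = sum(c for r, c in hit_at.items() if r > hi)
    rows = []
    for x in range(hi, lo - 1, -1):
        kept += all_at[x]
        hits += hit_at[x]
        p = hits / kept if kept else 0.0
        rec = hits / len(gold) if gold else 0.0
        f = 2 / (1 / p + 1 / rec) if p and rec else 0.0
        rows.append((x, kept, hits, p, rec, f))
    rows.reverse()  # ascending X, as in the tables
    return rows


rows = sweep_r({"佛法": 5, "般若": 3, "的人": 3, "之中": 1}, {"佛法", "般若", "涅槃"}, lo=1, hi=5)
for row in rows:
    print(row)
```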
[Figure 1] Precision, Recall, and F-measure against 佛光大辭典 as the left-side |R| of terms varies
Values plotted in Figure 1 above
X (left |R|) | All Candidates | Candidates ∩ 佛光大辭典 | Y (Precision) | Y (Recall) | Y (F-measure) |
... | ... | ... | ... | ... | ... |
321 | 26355 | 2687 | 0.101954088408 | 0.109906740838 | 0.105781154656 |
322 | 26241 | 2675 | 0.101939712663 | 0.109415903141 | 0.105545581882 |
323 | 26131 | 2672 | 0.102254027783 | 0.109293193717 | 0.105656497756 |
324 | 26025 | 2669 | 0.102555235351 | 0.109170484293 | 0.105759514988 |
325 | 25917 | 2659 | 0.102596751167 | 0.10876145288 | 0.105589198848 |
326 | 25822 | 2650 | 0.102625668035 | 0.108393324607 | 0.105430674358 |
327 | 25735 | 2645 | 0.102778317466 | 0.108188808901 | 0.105414184086 |
328 | 25633 | 2635 | 0.102797175516 | 0.107779777487 | 0.105229528164 |
329 | 25532 | 2630 | 0.103007989973 | 0.10757526178 | 0.105242096839 |
330 | 25434 | 2622 | 0.103090351498 | 0.107248036649 | 0.105128102321 |
331 | 25342 | 2619 | 0.10334622366 | 0.107125327225 | 0.105201847761 |
332 | 25250 | 2615 | 0.103564356436 | 0.10696171466 | 0.105235623164 |
333 | 25165 | 2606 | 0.103556526922 | 0.106593586387 | 0.10505311108 |
334 | 25087 | 2602 | 0.103719057679 | 0.106429973822 | 0.105057030383 |
335 | 24995 | 2598 | 0.103940788158 | 0.106266361257 | 0.105090710515 |
336 | 24906 | 2589 | 0.103950855216 | 0.105898232984 | 0.104915508368 |
337 | 24813 | 2585 | 0.104179260871 | 0.105734620419 | 0.104951178417 |
338 | 24721 | 2581 | 0.104405161603 | 0.105571007853 | 0.104984848177 |
339 | 24637 | 2574 | 0.104477006129 | 0.105284685864 | 0.104879291026 |
... | ... | ... | ... | ... | ... |
[Figure 2] Precision, Recall, and F-measure against 佛光大辭典 as the right-side |R| of terms varies
Values plotted in Figure 2 above
X (right |R|) | All Candidates | Candidates ∩ 佛光大辭典 | Y (Precision) | Y (Recall) | Y (F-measure) |
... | ... | ... | ... | ... | ... |
341 | 25948 | 3165 | 0.121974718668 | 0.129458442408 | 0.125605206762 |
342 | 25845 | 3159 | 0.122228670923 | 0.12921302356 | 0.125623844273 |
343 | 25773 | 3152 | 0.122298529469 | 0.128926701571 | 0.12552517871 |
344 | 25664 | 3143 | 0.122467269327 | 0.128558573298 | 0.125439016603 |
345 | 25580 | 3139 | 0.122713057076 | 0.128394960733 | 0.125489725754 |
346 | 25482 | 3136 | 0.123067263166 | 0.128272251309 | 0.125615862207 |
347 | 25401 | 3130 | 0.123223495138 | 0.128026832461 | 0.125579249333 |
348 | 25328 | 3127 | 0.123460202148 | 0.127904123037 | 0.125642880103 |
349 | 25241 | 3122 | 0.123687651044 | 0.12769960733 | 0.125661615247 |
350 | 25135 | 3114 | 0.123890988661 | 0.127372382199 | 0.12560756711 |
351 | 25042 | 3108 | 0.124111492692 | 0.127126963351 | 0.125601131542 |
352 | 24963 | 3101 | 0.1242238513 | 0.126840641361 | 0.125518609217 |
353 | 24873 | 3096 | 0.124472319382 | 0.126636125654 | 0.125544899738 |
354 | 24785 | 3094 | 0.124833568691 | 0.126554319372 | 0.12568805476 |
355 | 24687 | 3085 | 0.124964556244 | 0.126186191099 | 0.125572402564 |
356 | 24606 | 3081 | 0.125213362594 | 0.126022578534 | 0.125616667346 |
357 | 24522 | 3076 | 0.125438381861 | 0.125818062827 | 0.125627935471 |
358 | 24428 | 3067 | 0.125552644506 | 0.125449934555 | 0.125501268516 |
359 | 24336 | 3063 | 0.125862919132 | 0.12528632199 | 0.125573958675 |
... | ... | ... | ... | ... | ... |
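[Max: among the distinct neighboring characters (counted by |R|) on the left or right of a term, the number of occurrences of the most frequent one]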
Normalization: Max / fx, where fx is the total frequency of the term
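Max/fx for a single term can be sketched as follows. Occurrences are counted with overlap, and the start or end of the text is recorded as a full stop; both are assumptions, as are the function names and the toy corpus.

```python
from collections import Counter

BOUNDARY = "。"  # assumption: text edges are recorded as a full stop


def neighbor_counters(corpus: str, term: str):
    """Count each left and right neighbor over all (overlapping) occurrences of term."""
    left, right = Counter(), Counter()
    start = corpus.find(term)
    while start != -1:
        end = start + len(term)
        left[corpus[start - 1] if start > 0 else BOUNDARY] += 1
        right[corpus[end] if end < len(corpus) else BOUNDARY] += 1
        start = corpus.find(term, start + 1)
    return left, right


def max_over_fx(neighbors: Counter, fx: int) -> float:
    """Max/fx: count of the most frequent neighbor divided by the term frequency fx."""
    return max(neighbors.values()) / fx if fx and neighbors else 0.0


corpus = "佛說法。佛說經。佛說法。"
left, right = neighbor_counters(corpus, "佛說")
fx = sum(left.values())  # every occurrence has exactly one left neighbor
print(max_over_fx(left, fx), max_over_fx(right, fx))  # 1.0 0.6666666666666666
```

A value near 1 means a single neighbor dominates that side of the term; a low value means the term appears in varied contexts.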
Charts of left-side and right-side Max/fx.
Y : Precision, Recall, F-measure
X : Max/fx value, in steps of 0.1
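The Max/fx sweep might look like this sketch. The notes give the step (0.1) but not which side of each threshold is kept, so the default `keep` rule (keep terms whose Max/fx is at most t) is a hypothetical choice that can be swapped. Thresholds are computed as i / steps rather than by repeatedly adding 0.1, which drifts (0.1 + 0.1 + 0.1 is 0.30000000000000004 in floating point).

```python
def max_fx_sweep(ratios: dict, gold: set, steps: int = 10,
                 keep=lambda ratio, t: ratio <= t):
    """Score the candidates kept at each threshold t = 1/steps, 2/steps, ..., 1.0.

    ratios: candidate term -> its Max/fx on one side
    gold:   set of 佛光大辭典 entries
    keep:   hypothetical rule deciding whether a term survives threshold t
    """
    rows = []
    for i in range(1, steps + 1):
        t = i / steps  # exact grid point, no accumulated rounding error
        kept = {term for term, ratio in ratios.items() if keep(ratio, t)}
        hits = len(kept & gold)
        p = hits / len(kept) if kept else 0.0
        r = hits / len(gold) if gold else 0.0
        f = 2 * p * r / (p + r) if p + r else 0.0
        rows.append((t, len(kept), hits, p, r, f))
    return rows


rows = max_fx_sweep({"佛法": 0.25, "般若": 0.6, "的人": 0.95}, {"佛法", "般若"})
for row in rows:
    print(row)
```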
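[Figure 3] Precision, Recall, and F-measure against 佛光大辭典 as the left-side Max/fx of terms varies
[Figure 4] Precision, Recall, and F-measure against 佛光大辭典 as the right-side Max/fx of terms varies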
[Algorithm: AEc]
AEc = fx / (fy + fz - fx), where y is string x without its last character and z is x without its first character
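Why the AEc thresholds stop at 1.0 follows from the definition AEc = fx / (fy + fz - fx). Every occurrence of x contains an occurrence of y at the same position and of z one character later, so fy ≥ fx and fz ≥ fx. Treating an occurrence of x as y at position i co-occurring with z at position i+1, the denominator is the inclusion-exclusion size of their union, so AEc can be read as a Jaccard-style overlap ratio (an interpretation, not something the notes state).

```latex
f_y \ge f_x,\ f_z \ge f_x
\;\Longrightarrow\; f_y + f_z - f_x \ge f_x
\;\Longrightarrow\; 0 < \mathrm{AEc} = \frac{f_x}{f_y + f_z - f_x} \le 1 \qquad (f_x > 0)
```

AEc = 1 exactly when fy = fz = fx, i.e., y and z never occur outside x.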
Ex:
string: 中華佛學研究所
fx = No. of 中華佛學研究所
fy = No. of 中華佛學研究
fz = No. of 華佛學研究所
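The example can be turned into a direct computation of AEc = fx / (fy + fz - fx) from raw counts. This sketch counts overlapping occurrences with `str.find`, which is an assumption about how frequencies were counted; the function names and the toy corpus are hypothetical.

```python
def count_overlapping(corpus: str, s: str) -> int:
    """Number of (possibly overlapping) occurrences of s in corpus."""
    count, start = 0, corpus.find(s)
    while start != -1:
        count += 1
        start = corpus.find(s, start + 1)
    return count


def aec(corpus: str, x: str) -> float:
    """AEc = fx / (fy + fz - fx), y = x minus its last char, z = x minus its first char."""
    if len(x) < 2:
        raise ValueError("AEc needs a string of at least two characters")
    fx = count_overlapping(corpus, x)
    fy = count_overlapping(corpus, x[:-1])
    fz = count_overlapping(corpus, x[1:])
    denom = fy + fz - fx
    return fx / denom if denom else 0.0


# Toy corpus: 中華佛學研究 and 華佛學研究所 each also occur once on their own.
corpus = "中華佛學研究所。中華佛學研究。華佛學研究所。"
print(aec(corpus, "中華佛學研究所"))  # fx = 1, fy = 2, fz = 2, so 1 / 3
```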
Take AEc thresholds at intervals of 0.01, giving 100 values (0.01 to 1.0)
For each threshold, compare the candidate terms whose AEc is >= that threshold with 佛光大辭典
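The 100-step AEc comparison can be sketched with `bisect` on sorted scores, so each threshold costs a binary search instead of a full pass over millions of candidates. The `>=` rule comes from the notes; the dictionary-of-scores input, the function name, and the toy data are assumptions.

```python
from bisect import bisect_left


def aec_sweep(aec_scores: dict, gold: set, steps: int = 100):
    """For t = 0.01, 0.02, ..., 1.00 keep candidates with AEc >= t and
    compare them with the 佛光大辭典 entries in `gold`."""
    all_sorted = sorted(aec_scores.values())
    hit_sorted = sorted(s for term, s in aec_scores.items() if term in gold)
    rows = []
    for i in range(1, steps + 1):
        t = i / steps
        # bisect_left gives the index of the first score >= t
        kept = len(all_sorted) - bisect_left(all_sorted, t)
        hits = len(hit_sorted) - bisect_left(hit_sorted, t)
        p = hits / kept if kept else 0.0
        r = hits / len(gold) if gold else 0.0
        f = 2 * p * r / (p + r) if p + r else 0.0
        rows.append((t, kept, hits, p, r, f))
    return rows


scores = {"中華佛學研究所": 1 / 3, "佛法": 0.8, "的人": 0.05, "般若": 1.0}
rows = aec_sweep(scores, {"中華佛學研究所", "佛法", "般若", "涅槃"})
print(rows[49])  # threshold 0.5
```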