Lecture 6. The overfitting problem

Paweł Skórzewski 2022-11-24 07:22:33 +01:00
parent ff0bda56cf
commit eacef8109a
12 changed files with 6330 additions and 51 deletions


@@ -199,18 +199,6 @@
 "As shown above, besides numbers this column also contains some special text values, such as `parter` (ground floor), `poddasze` (attic) or `niski parter` (lower ground floor)."
 ]
 },
-{
-"cell_type": "markdown",
-"metadata": {},
-"source": [
-"Such values need to be converted to numbers. How?\n",
-"* It seems that `parter` and `niski parter` can safely be treated as floor “zero” and replaced with `0`.\n",
-"* With the attic (`poddasze`) the situation is less obvious. Do you have any suggestions?\n",
-" * Perhaps replace `poddasze` with NaN (see below)?\n",
-" * Perhaps use the value from the neighbouring column *Liczba pięter w budynku* (number of floors in the building)?\n",
-" * Perhaps discard the examples containing this value altogether? (if there are very few of them)"
-]
-},
 {
 "cell_type": "code",
 "execution_count": 8,
@@ -251,6 +239,18 @@
 "alldata[\"Piętro\"].value_counts()\n"
 ]
 },
+{
+"cell_type": "markdown",
+"metadata": {},
+"source": [
+"Such values need to be converted to numbers. How?\n",
+"* It seems that `parter` and `niski parter` can safely be treated as floor “zero” and replaced with `0`.\n",
+"* With the attic (`poddasze`) the situation is less obvious. Do you have any suggestions?\n",
+" * Perhaps replace `poddasze` with NaN (see below)?\n",
+" * Perhaps use the value from the neighbouring column *Liczba pięter w budynku* (number of floors in the building)?\n",
+" * Since `poddasze` appears in only a few examples, perhaps discard those examples altogether?"
+]
+},
 {
 "cell_type": "markdown",
 "metadata": {},

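The options listed in the markdown cell above can be sketched in plain Python. This is a minimal illustration, not the notebook's actual code: the helper `floor_to_number` and its `attic_value` parameter are hypothetical names, and mapping `poddasze` to NaN by default is only one of the options the cell proposes.

```python
import math

# Hypothetical helper, not taken from the notebook.
GROUND_FLOOR_LABELS = {"parter", "niski parter"}


def floor_to_number(value, attic_value=math.nan):
    """Convert a raw floor entry to a float.

    Ground-floor labels become 0.0, `poddasze` becomes `attic_value`
    (NaN by default), and any other non-numeric text becomes NaN.
    """
    text = str(value).strip().lower()
    if text in GROUND_FLOOR_LABELS:
        return 0.0
    if text == "poddasze":
        return attic_value
    try:
        return float(text)
    except ValueError:
        return math.nan


raw_floors = ["parter", "3", "poddasze", "niski parter", "10"]
converted = [floor_to_number(v) for v in raw_floors]

# Last option from the cell: discard the examples whose floor is unknown.
kept = [x for x in converted if not math.isnan(x)]
```

Passing the building's floor count as `attic_value` would correspond to the second option in the cell. Assuming `alldata` is a pandas DataFrame, as the `value_counts()` call suggests, the same function could be applied with `alldata["Piętro"].map(floor_to_number)`.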

@@ -98,7 +98,6 @@
 "data = preprocess(data) # initial data preprocessing\n",
 "\n",
 "# Split the data into training and test sets\n",
-"split_point = int(0.8 * len(data))\n",
 "data_train, data_test = train_test_split(data, test_size=0.2)\n",
 "\n",
 "# Train the model\n",
@@ -252,7 +251,6 @@
 ")\n",
 "\n",
 "# Split the data into a training set and a test set\n",
-"split_point = int(0.8 * len(data_iris))\n",
 "data_train, data_test = train_test_split(data_iris, test_size=0.2)\n",
 "\n",
 "# Train the model\n",
@@ -283,7 +281,7 @@
 "metadata": {
 "celltoolbar": "Slideshow",
 "kernelspec": {
-"display_name": "Python 3.10.6 64-bit",
+"display_name": "Python 3 (ipykernel)",
 "language": "python",
 "name": "python3"
 },
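
The `split_point` lines removed in this file appear to be leftovers from a manual split: the `train_test_split(..., test_size=0.2)` call that follows them does not use that variable. For comparison, a manual shuffled 80/20 split could look like the sketch below. It assumes `train_test_split` here is scikit-learn's function and only approximates its behaviour; `manual_split` and `seed` are hypothetical names.

```python
import random


def manual_split(rows, train_fraction=0.8, seed=None):
    """Shuffle a copy of `rows` and cut it at `split_point`."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    # Same formula as the removed notebook line.
    split_point = int(train_fraction * len(shuffled))
    return shuffled[:split_point], shuffled[split_point:]


rows = list(range(10))
train, test = manual_split(rows, seed=0)
```

Fixing `seed` makes the split reproducible, which helps when comparing models across runs.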

File diff suppressed because one or more lines are too long

File diff suppressed because one or more lines are too long

BIN
wyk/bias2.png Normal file

Binary file not shown.

Size: 122 KiB

BIN
wyk/curves.jpg Normal file

Binary file not shown.

Size: 94 KiB

50
wyk/data-metrics.tsv Normal file

@@ -0,0 +1,50 @@
0 -0.9410633308036449 0.46518252113944425
1 0.4700636553691919 -0.3970321538875541
1 -0.01609299859794966 0.23161453968628254
0 -0.9966154155058933 0.06419313152355421
0 0.8000009607150127 0.44133107977776875
0 0.389227379480078 -0.8415416694237676
0 -0.7786281038890375 0.2833716839963434
1 -0.10150562150521569 -0.02968754639839366
1 -0.14995353486391494 0.30921523116923866
0 0.3150219624148183 0.4186143523577863
0 -0.5542734031872467 0.9291684810885719
0 -0.44750469543445215 -0.8240387195698262
0 -0.7875312310670415 0.27475695030524894
0 0.20470154428730747 -0.8122722630746713
0 0.07472783793361693 0.8936381678688297
0 -0.6016285994197443 -0.9783927694535444
0 0.4235345463350013 -0.23977977886239832
0 0.256790496684171 -0.5587059709121811
0 -0.2172656054288027 0.8015306542483966
0 0.2009238354275602 0.9376873763906164
0 -0.8760038215191506 0.015194717659306356
0 -0.1512141038160364 -0.9575528046526418
0 -0.6378974241766098 0.35900665963616696
0 -0.6219617077011876 0.04019896541474166
0 -0.2533778634666939 -0.8576798720089458
0 -0.9398823073223508 0.806594859009744
0 -0.24161324930138606 -0.6982896600554984
0 -0.967724402993285 0.15651783268628372
0 0.9587968810951801 -0.3382309645563397
1 0.18040441263417084 -0.026706542719935777
0 -0.2403226372749332 -0.2694487472698215
0 -0.49494412803453747 -0.6833825934742561
0 -0.32266963833818574 0.6299706350061482
0 -0.716450532167108 0.7792499086149187
1 -0.5661825812948427 -0.3045016769669948
0 -0.9014952263862088 0.19697267011506714
1 0.3192734822128551 -0.3145295901019187
1 -0.4386590899062277 0.6119229005694005
0 -0.6306933372350818 0.4721301354446683
0 0.3302936606411402 -0.3047093070118343
1 -0.38049655790356285 -0.609474130471132
1 0.32069301644263426 0.17266197471996692
1 0.8349752241994568 0.4408717276862013
0 -0.26741723386938343 -0.4919294757003996
0 -0.7786699335922747 -0.47305795528791905
0 0.723410510517891 -0.010095862311693793
0 0.0902826080483603 -0.6805262097228113
0 -0.9286972617786873 0.7200430642275493
0 -0.0623197964184079 0.8187639325432745
0 -0.20572090815735944 -0.6655000969777327

File diff suppressed because it is too large

118
wyk/ex2data2.txt Normal file

@@ -0,0 +1,118 @@
0.051267,0.69956,1
-0.092742,0.68494,1
-0.21371,0.69225,1
-0.375,0.50219,1
-0.51325,0.46564,1
-0.52477,0.2098,1
-0.39804,0.034357,1
-0.30588,-0.19225,1
0.016705,-0.40424,1
0.13191,-0.51389,1
0.38537,-0.56506,1
0.52938,-0.5212,1
0.63882,-0.24342,1
0.73675,-0.18494,1
0.54666,0.48757,1
0.322,0.5826,1
0.16647,0.53874,1
-0.046659,0.81652,1
-0.17339,0.69956,1
-0.47869,0.63377,1
-0.60541,0.59722,1
-0.62846,0.33406,1
-0.59389,0.005117,1
-0.42108,-0.27266,1
-0.11578,-0.39693,1
0.20104,-0.60161,1
0.46601,-0.53582,1
0.67339,-0.53582,1
-0.13882,0.54605,1
-0.29435,0.77997,1
-0.26555,0.96272,1
-0.16187,0.8019,1
-0.17339,0.64839,1
-0.28283,0.47295,1
-0.36348,0.31213,1
-0.30012,0.027047,1
-0.23675,-0.21418,1
-0.06394,-0.18494,1
0.062788,-0.16301,1
0.22984,-0.41155,1
0.2932,-0.2288,1
0.48329,-0.18494,1
0.64459,-0.14108,1
0.46025,0.012427,1
0.6273,0.15863,1
0.57546,0.26827,1
0.72523,0.44371,1
0.22408,0.52412,1
0.44297,0.67032,1
0.322,0.69225,1
0.13767,0.57529,1
-0.0063364,0.39985,1
-0.092742,0.55336,1
-0.20795,0.35599,1
-0.20795,0.17325,1
-0.43836,0.21711,1
-0.21947,-0.016813,1
-0.13882,-0.27266,1
0.18376,0.93348,0
0.22408,0.77997,0
0.29896,0.61915,0
0.50634,0.75804,0
0.61578,0.7288,0
0.60426,0.59722,0
0.76555,0.50219,0
0.92684,0.3633,0
0.82316,0.27558,0
0.96141,0.085526,0
0.93836,0.012427,0
0.86348,-0.082602,0
0.89804,-0.20687,0
0.85196,-0.36769,0
0.82892,-0.5212,0
0.79435,-0.55775,0
0.59274,-0.7405,0
0.51786,-0.5943,0
0.46601,-0.41886,0
0.35081,-0.57968,0
0.28744,-0.76974,0
0.085829,-0.75512,0
0.14919,-0.57968,0
-0.13306,-0.4481,0
-0.40956,-0.41155,0
-0.39228,-0.25804,0
-0.74366,-0.25804,0
-0.69758,0.041667,0
-0.75518,0.2902,0
-0.69758,0.68494,0
-0.4038,0.70687,0
-0.38076,0.91886,0
-0.50749,0.90424,0
-0.54781,0.70687,0
0.10311,0.77997,0
0.057028,0.91886,0
-0.10426,0.99196,0
-0.081221,1.1089,0
0.28744,1.087,0
0.39689,0.82383,0
0.63882,0.88962,0
0.82316,0.66301,0
0.67339,0.64108,0
1.0709,0.10015,0
-0.046659,-0.57968,0
-0.23675,-0.63816,0
-0.15035,-0.36769,0
-0.49021,-0.3019,0
-0.46717,-0.13377,0
-0.28859,-0.060673,0
-0.61118,-0.067982,0
-0.66302,-0.21418,0
-0.59965,-0.41886,0
-0.72638,-0.082602,0
-0.83007,0.31213,0
-0.72062,0.53874,0
-0.59389,0.49488,0
-0.48445,0.99927,0
-0.0063364,0.99927,0
0.63265,-0.030612,0

BIN
wyk/fit.png Normal file

Binary file not shown.

Size: 138 KiB

BIN
wyk/learning-curves.png Normal file

Binary file not shown.

Size: 5.9 KiB

100
wyk/polynomial_logistic.tsv Normal file

@@ -0,0 +1,100 @@
1 0.25777005758108174 0.601012316037165
1 0.3659669567447452 -0.11214686303429633
0 0.49453050141627375 0.47110655546911206
0 0.7029060372914113 -0.9225798301680093
0 0.46658862037642423 -0.6226973935055724
0 0.8793946243263941 -0.11408014657778076
0 -0.3311850002119068 0.8444766749977881
0 -0.5435170087333634 0.8851383010436487
0 0.9197924083397226 0.41607011737177735
0 0.28011742147804797 0.6143115673056148
0 0.9475436344725683 -0.7830731144606005
0 0.4904989452188586 0.649356142549592
0 -0.865983500565505 0.9896361556274065
0 -0.8579184997717257 0.3062253122060574
0 0.08082005095746103 -0.7736760810964189
0 -0.3363842450225085 -0.8802992880290186
0 0.4748472924067402 0.9756949850919965
0 -0.7956979203895616 0.8751067723304518
1 0.06752895667287606 -0.7683056187589332
0 -0.5825898275446799 0.8068359661366173
1 0.1109238791315652 -0.2034825016864903
0 0.5011542085506828 0.9366868642789181
0 0.2011359606302785 0.4800561245801245
1 -0.38620580274071115 0.4003933803256208
1 -0.1722113915778094 0.3926707935387965
0 0.6575404624823169 -0.7070032890943085
1 -0.2832309098070882 0.034184675674787446
1 -0.16828017341376333 -0.1628482245819587
0 -0.6552618226108893 -0.3159705063754401
0 -0.6466772083696701 -0.07116372625398881
0 0.848711325640519 0.2132898335742659
1 -0.35490315474701606 -0.0025105634256454845
0 -0.36568446532837817 0.5637325774329354
0 -0.5089179414092766 0.8086671779253405
0 0.9609295951994559 -0.12114542304082354
0 0.055563338045806265 0.8532855304613407
0 -0.8937129542754998 -0.02555660184206876
0 0.40678784672410284 -0.5480665560665205
0 -0.7683896050204841 0.9475293644451854
0 -0.515467982993429 -0.5389177617277066
0 0.9693903475176826 -0.9765032993967369
0 -0.5476549714934908 -0.018838768427513974
0 0.5262277827151787 -0.9936327305281174
1 0.9394838829593151 0.9962891110157359
0 -0.935709119652979 -0.6940925482964921
0 0.6161569745665239 -0.044448545050667976
0 -0.08521587367561922 0.9636255303204684
0 0.9073344675416231 -0.08813265618067079
1 -0.1563237189794715 0.05022859605451302
0 -0.9785642881644829 -0.5076719844587916
0 -0.5494648865481802 -0.6044852696776528
0 -0.7170122682018529 -0.6250685449461151
1 0.5333872877810009 0.1395189003073396
0 -0.49270328980187905 0.9081426529064955
1 0.07777642690144848 -0.44188199856981347
0 0.8328452661100116 0.5508441451500428
1 -0.33275827507477573 -0.15434344174028314
0 -0.9057550401714867 0.6324599729071743
0 -0.8476574433184823 0.5739140088331203
0 -0.37393930555231103 0.7361874446899226
1 0.6610910543790163 0.0036185958785315275
0 0.49147748571126004 -0.6155167984371757
0 0.31992462553488177 -0.38253832622755657
0 0.7398386519468336 -0.915886088774648
0 0.5915392280694003 0.011422405850611383
0 -0.5818860867200502 -0.44086037005029377
0 -0.9066322824076023 0.21754010215910524
1 0.12243932470792318 -0.3830697406526009
0 0.40607941790742297 0.5626829623336307
1 -0.1210920179663808 -0.20552144405177608
0 0.48099006522554233 0.9583656149315158
0 -0.059491720260914205 0.6161097510891897
1 -0.053220979060695006 0.07562497263502688
0 -0.8742304482942296 -0.13488952315510616
0 0.7362712712103594 0.6087347685508093
0 0.025549937023763736 -0.6202087182389777
0 0.6755333538371804 0.7047713746899604
0 -0.3954771867034055 0.3567082570178153
1 0.24896928809009156 -0.17106278785061302
0 0.6133735778535989 -0.6297261231852487
1 -0.35955189872833593 -0.2086164112593747
0 0.646544898896497 0.8858921579510579
0 0.6459228334265068 -0.9141274779126995
0 -0.5279127041052518 -0.11119649758918437
0 -0.47141090620857784 -0.29849889702571786
1 0.1901970467567704 -0.5049996808415897
0 -0.5497623652380574 -0.49032403671408553
0 -0.5759454285366339 0.445122514716527
0 -0.7800687910859982 -0.4823078816937112
0 0.39722150362989095 0.5827352140491311
1 0.018540458464545218 -0.20805328372207677
0 -0.14419638252986933 -0.8679481460173017
1 -0.15012196110925857 0.5474017473230433
1 -0.11028545705088533 0.5371497474265077
0 -0.46577855502057375 -0.9226883886539352
0 0.4843595022265692 0.47692504895620713
0 0.4330264545403766 -0.40096944878062857
0 -0.7401024435876022 0.758623363044544
0 0.20470935356917574 -0.7551473328272353
0 0.1877078820888327 -0.3377139504156679