This is the second part of the project based on the Kaggle competition dataset "House Prices - Advanced Regression Techniques", whose goals were:
Analyze and compare the performance of 4 Machine Learning models (Linear Regression, Random Forest Regressor, KNN Regressor, and Support Vector Regression).
Optimize each model by tuning its hyperparameters, analyzing how they impact performance.
Use the Stacked Generalization technique to combine all the predictions into a model whose performance surpasses every individual model.
For the full Exploratory Data Analysis, click here.
import pandas as pd
import numpy as np
from sklearn.impute import SimpleImputer
from sklearn.preprocessing import StandardScaler
from sklearn.preprocessing import OneHotEncoder
from sklearn.compose import ColumnTransformer
from sklearn.pipeline import Pipeline
from sklearn.model_selection import cross_val_score
import matplotlib.pyplot as plt
import seaborn as sns
pd.set_option('display.max_columns', 81)
train = pd.read_csv('/content/drive/MyDrive/Colab Notebooks/Projetos/Advanced Houses/train.csv')
train.head()
Id | MSSubClass | MSZoning | LotFrontage | LotArea | Street | Alley | LotShape | LandContour | Utilities | LotConfig | LandSlope | Neighborhood | Condition1 | Condition2 | BldgType | HouseStyle | OverallQual | OverallCond | YearBuilt | YearRemodAdd | RoofStyle | RoofMatl | Exterior1st | Exterior2nd | MasVnrType | MasVnrArea | ExterQual | ExterCond | Foundation | BsmtQual | BsmtCond | BsmtExposure | BsmtFinType1 | BsmtFinSF1 | BsmtFinType2 | BsmtFinSF2 | BsmtUnfSF | TotalBsmtSF | Heating | HeatingQC | CentralAir | Electrical | 1stFlrSF | 2ndFlrSF | LowQualFinSF | GrLivArea | BsmtFullBath | BsmtHalfBath | FullBath | HalfBath | BedroomAbvGr | KitchenAbvGr | KitchenQual | TotRmsAbvGrd | Functional | Fireplaces | FireplaceQu | GarageType | GarageYrBlt | GarageFinish | GarageCars | GarageArea | GarageQual | GarageCond | PavedDrive | WoodDeckSF | OpenPorchSF | EnclosedPorch | 3SsnPorch | ScreenPorch | PoolArea | PoolQC | Fence | MiscFeature | MiscVal | MoSold | YrSold | SaleType | SaleCondition | SalePrice | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 1 | 60 | RL | 65.0 | 8450 | Pave | NaN | Reg | Lvl | AllPub | Inside | Gtl | CollgCr | Norm | Norm | 1Fam | 2Story | 7 | 5 | 2003 | 2003 | Gable | CompShg | VinylSd | VinylSd | BrkFace | 196.0 | Gd | TA | PConc | Gd | TA | No | GLQ | 706 | Unf | 0 | 150 | 856 | GasA | Ex | Y | SBrkr | 856 | 854 | 0 | 1710 | 1 | 0 | 2 | 1 | 3 | 1 | Gd | 8 | Typ | 0 | NaN | Attchd | 2003.0 | RFn | 2 | 548 | TA | TA | Y | 0 | 61 | 0 | 0 | 0 | 0 | NaN | NaN | NaN | 0 | 2 | 2008 | WD | Normal | 208500 |
1 | 2 | 20 | RL | 80.0 | 9600 | Pave | NaN | Reg | Lvl | AllPub | FR2 | Gtl | Veenker | Feedr | Norm | 1Fam | 1Story | 6 | 8 | 1976 | 1976 | Gable | CompShg | MetalSd | MetalSd | None | 0.0 | TA | TA | CBlock | Gd | TA | Gd | ALQ | 978 | Unf | 0 | 284 | 1262 | GasA | Ex | Y | SBrkr | 1262 | 0 | 0 | 1262 | 0 | 1 | 2 | 0 | 3 | 1 | TA | 6 | Typ | 1 | TA | Attchd | 1976.0 | RFn | 2 | 460 | TA | TA | Y | 298 | 0 | 0 | 0 | 0 | 0 | NaN | NaN | NaN | 0 | 5 | 2007 | WD | Normal | 181500 |
2 | 3 | 60 | RL | 68.0 | 11250 | Pave | NaN | IR1 | Lvl | AllPub | Inside | Gtl | CollgCr | Norm | Norm | 1Fam | 2Story | 7 | 5 | 2001 | 2002 | Gable | CompShg | VinylSd | VinylSd | BrkFace | 162.0 | Gd | TA | PConc | Gd | TA | Mn | GLQ | 486 | Unf | 0 | 434 | 920 | GasA | Ex | Y | SBrkr | 920 | 866 | 0 | 1786 | 1 | 0 | 2 | 1 | 3 | 1 | Gd | 6 | Typ | 1 | TA | Attchd | 2001.0 | RFn | 2 | 608 | TA | TA | Y | 0 | 42 | 0 | 0 | 0 | 0 | NaN | NaN | NaN | 0 | 9 | 2008 | WD | Normal | 223500 |
3 | 4 | 70 | RL | 60.0 | 9550 | Pave | NaN | IR1 | Lvl | AllPub | Corner | Gtl | Crawfor | Norm | Norm | 1Fam | 2Story | 7 | 5 | 1915 | 1970 | Gable | CompShg | Wd Sdng | Wd Shng | None | 0.0 | TA | TA | BrkTil | TA | Gd | No | ALQ | 216 | Unf | 0 | 540 | 756 | GasA | Gd | Y | SBrkr | 961 | 756 | 0 | 1717 | 1 | 0 | 1 | 0 | 3 | 1 | Gd | 7 | Typ | 1 | Gd | Detchd | 1998.0 | Unf | 3 | 642 | TA | TA | Y | 0 | 35 | 272 | 0 | 0 | 0 | NaN | NaN | NaN | 0 | 2 | 2006 | WD | Abnorml | 140000 |
4 | 5 | 60 | RL | 84.0 | 14260 | Pave | NaN | IR1 | Lvl | AllPub | FR2 | Gtl | NoRidge | Norm | Norm | 1Fam | 2Story | 8 | 5 | 2000 | 2000 | Gable | CompShg | VinylSd | VinylSd | BrkFace | 350.0 | Gd | TA | PConc | Gd | TA | Av | GLQ | 655 | Unf | 0 | 490 | 1145 | GasA | Ex | Y | SBrkr | 1145 | 1053 | 0 | 2198 | 1 | 0 | 2 | 1 | 4 | 1 | Gd | 9 | Typ | 1 | TA | Attchd | 2000.0 | RFn | 3 | 836 | TA | TA | Y | 192 | 84 | 0 | 0 | 0 | 0 | NaN | NaN | NaN | 0 | 12 | 2008 | WD | Normal | 250000 |
Since submissions in this competition are evaluated on the Root-Mean-Squared-Error (RMSE) between the logarithm of the predicted value and the logarithm of the observed sale price, we transform the target variable right away, using NumPy's log1p function:
y = np.log1p(train['SalePrice'])
X = train.drop(columns=['Id','SalePrice'])
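As a quick sanity check, the sketch below (with made-up prices) shows that computing the ordinary RMSE on log1p-transformed values reproduces the competition's log-scale metric, and that expm1 inverts the transform:
# Sketch with hypothetical prices: RMSE on log1p values = the competition metric
y_true = np.array([208500., 181500., 223500.])
y_pred = np.array([200000., 185000., 230000.])
rmse_log = np.sqrt(np.mean((np.log1p(y_pred) - np.log1p(y_true)) ** 2))
print(rmse_log)                    # metric used on the leaderboard
print(np.expm1(np.log1p(y_true)))  # expm1 recovers the original prices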
To simplify the preprocessing step, let's split the columns into two types, numeric and categorical, since each demands different transformations.
types = pd.DataFrame(X.dtypes).T
types
MSSubClass | MSZoning | LotFrontage | LotArea | Street | Alley | LotShape | LandContour | Utilities | LotConfig | LandSlope | Neighborhood | Condition1 | Condition2 | BldgType | HouseStyle | OverallQual | OverallCond | YearBuilt | YearRemodAdd | RoofStyle | RoofMatl | Exterior1st | Exterior2nd | MasVnrType | MasVnrArea | ExterQual | ExterCond | Foundation | BsmtQual | BsmtCond | BsmtExposure | BsmtFinType1 | BsmtFinSF1 | BsmtFinType2 | BsmtFinSF2 | BsmtUnfSF | TotalBsmtSF | Heating | HeatingQC | CentralAir | Electrical | 1stFlrSF | 2ndFlrSF | LowQualFinSF | GrLivArea | BsmtFullBath | BsmtHalfBath | FullBath | HalfBath | BedroomAbvGr | KitchenAbvGr | KitchenQual | TotRmsAbvGrd | Functional | Fireplaces | FireplaceQu | GarageType | GarageYrBlt | GarageFinish | GarageCars | GarageArea | GarageQual | GarageCond | PavedDrive | WoodDeckSF | OpenPorchSF | EnclosedPorch | 3SsnPorch | ScreenPorch | PoolArea | PoolQC | Fence | MiscFeature | MiscVal | MoSold | YrSold | SaleType | SaleCondition | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | int64 | object | float64 | int64 | object | object | object | object | object | object | object | object | object | object | object | object | int64 | int64 | int64 | int64 | object | object | object | object | object | float64 | object | object | object | object | object | object | object | int64 | object | int64 | int64 | int64 | object | object | object | object | int64 | int64 | int64 | int64 | int64 | int64 | int64 | int64 | int64 | int64 | object | int64 | object | int64 | object | object | float64 | object | int64 | int64 | object | object | object | int64 | int64 | int64 | int64 | int64 | int64 | object | object | object | int64 | int64 | int64 | object | object |
We need to fix the type of 'MSSubClass', which, according to the descriptions available in the competition, is a numeric code for the dwelling type and therefore categorical:
X['MSSubClass'] = X['MSSubClass'].astype(object)
var_cat =[]
var_num =[]
for col in X.columns:
if X[col].dtypes == object:
var_cat.append(col)
else:
var_num.append(col)
print(f"Há {len(var_cat)} variáveis categóricas ('object'): \n {var_cat}\n")
print(f"Há {len(var_num)} varáveis numéricas ('int64 e float 64'): \n {var_num}")
There are 44 categorical variables ('object'):
['MSSubClass', 'MSZoning', 'Street', 'Alley', 'LotShape', 'LandContour', 'Utilities', 'LotConfig', 'LandSlope', 'Neighborhood', 'Condition1', 'Condition2', 'BldgType', 'HouseStyle', 'RoofStyle', 'RoofMatl', 'Exterior1st', 'Exterior2nd', 'MasVnrType', 'ExterQual', 'ExterCond', 'Foundation', 'BsmtQual', 'BsmtCond', 'BsmtExposure', 'BsmtFinType1', 'BsmtFinType2', 'Heating', 'HeatingQC', 'CentralAir', 'Electrical', 'KitchenQual', 'Functional', 'FireplaceQu', 'GarageType', 'GarageFinish', 'GarageQual', 'GarageCond', 'PavedDrive', 'PoolQC', 'Fence', 'MiscFeature', 'SaleType', 'SaleCondition']

There are 35 numeric variables ('int64' and 'float64'):
['LotFrontage', 'LotArea', 'OverallQual', 'OverallCond', 'YearBuilt', 'YearRemodAdd', 'MasVnrArea', 'BsmtFinSF1', 'BsmtFinSF2', 'BsmtUnfSF', 'TotalBsmtSF', '1stFlrSF', '2ndFlrSF', 'LowQualFinSF', 'GrLivArea', 'BsmtFullBath', 'BsmtHalfBath', 'FullBath', 'HalfBath', 'BedroomAbvGr', 'KitchenAbvGr', 'TotRmsAbvGrd', 'Fireplaces', 'GarageYrBlt', 'GarageCars', 'GarageArea', 'WoodDeckSF', 'OpenPorchSF', 'EnclosedPorch', '3SsnPorch', 'ScreenPorch', 'PoolArea', 'MiscVal', 'MoSold', 'YrSold']
Let's now move on to handling missing data:
pd.DataFrame(X.isna().sum()/X.shape[0]).sort_values(by=0, ascending=False).T
PoolQC | MiscFeature | Alley | Fence | FireplaceQu | LotFrontage | GarageType | GarageYrBlt | GarageQual | GarageCond | GarageFinish | BsmtFinType2 | BsmtExposure | BsmtCond | BsmtFinType1 | BsmtQual | MasVnrArea | MasVnrType | Electrical | BedroomAbvGr | BsmtHalfBath | FullBath | KitchenAbvGr | HalfBath | Functional | Fireplaces | KitchenQual | TotRmsAbvGrd | MSSubClass | GrLivArea | GarageCars | GarageArea | PavedDrive | WoodDeckSF | OpenPorchSF | EnclosedPorch | 3SsnPorch | ScreenPorch | PoolArea | MiscVal | MoSold | YrSold | SaleType | BsmtFullBath | HeatingQC | LowQualFinSF | Neighborhood | OverallCond | OverallQual | HouseStyle | BldgType | Condition2 | Condition1 | LandSlope | 2ndFlrSF | LotConfig | Utilities | LandContour | LotShape | Street | LotArea | YearBuilt | YearRemodAdd | RoofStyle | RoofMatl | Exterior1st | Exterior2nd | ExterQual | ExterCond | Foundation | BsmtFinSF1 | BsmtFinSF2 | BsmtUnfSF | TotalBsmtSF | Heating | MSZoning | CentralAir | 1stFlrSF | SaleCondition | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 0.995205 | 0.963014 | 0.937671 | 0.807534 | 0.472603 | 0.177397 | 0.055479 | 0.055479 | 0.055479 | 0.055479 | 0.055479 | 0.026027 | 0.026027 | 0.025342 | 0.025342 | 0.025342 | 0.005479 | 0.005479 | 0.000685 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
We can see that the columns 'PoolQC', 'MiscFeature', 'Alley', 'Fence', and 'FireplaceQu' have a large share of missing data. The problem description tells us that 'NA' means absence of the feature ("no pool" for 'PoolQC', for example); however, the data does not actually contain the value 'NA', but rather 'NaN'.
For the columns 'PoolQC', 'MiscFeature', 'Alley', and 'Fence', even if we replaced the 'NaN' values with the string 'NA', these columns would hold quasi-constant values with little predictive power. We therefore chose to remove them from the list of variables to be used.
For the 'FireplaceQu' variable, however, we chose to replace the 'NaN' values with the string 'NA' instead of removing the feature, since fewer than 50% of its values are 'NaN'.
The remaining columns have a low share of missing values; those values will be replaced by the median for numeric variables, or by the most frequent value for categorical variables.
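Before dropping them, a quick check (a sketch; nothing here is required by the pipeline) confirms how dominant the most frequent value of each candidate column is:
# Sketch: share of the most frequent value (NaN included) per candidate column
for col in ['PoolQC', 'MiscFeature', 'Alley', 'Fence', 'FireplaceQu']:
    top_share = X[col].value_counts(dropna=False, normalize=True).iloc[0]
    print(f"{col}: {top_share:.1%}")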
# Removing the quasi-constant variables
var_cat.remove('PoolQC')
var_cat.remove('MiscFeature')
var_cat.remove('Alley')
var_cat.remove('Fence')
# Replacing the 'NaN' values with the string 'NA'
X['FireplaceQu'] = X['FireplaceQu'].replace(np.nan, 'NA')
Let's now redefine the features dataframe to reflect the dropped columns and the NaN-to-NA replacement in 'FireplaceQu':
X = X[var_cat+var_num]
Let's now create transformation pipelines using the ColumnTransformer:
num_trans = Pipeline(
steps=[("imputer", SimpleImputer(strategy="median")),
("scaler", StandardScaler())
]
)
cat_trans = Pipeline(
steps=[("imputer", SimpleImputer(strategy="most_frequent")),
("ohe", OneHotEncoder(drop='first', handle_unknown="ignore"))]
)
preprocessor = ColumnTransformer(
[("num", num_trans, var_num),
("cat", cat_trans, var_cat)
]
)
X = preprocessor.fit_transform(X)
Initially, we will test several different models to select the most promising ones. We will then work on optimizing the best ones and finally try an ensemble of the models using the Stacking method.
import warnings
warnings.filterwarnings('ignore')
warnings.simplefilter('ignore')
from sklearn.model_selection import KFold
from sklearn.linear_model import LinearRegression
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import SGDRegressor
from sklearn.neighbors import KNeighborsRegressor
from sklearn.svm import SVR
To make things easier, let's create a function to get the models, another to evaluate them using Cross-Validation, and a third to generate boxplots where we can visually compare the performance of each one. We will test the Linear Regression, Random Forest Regressor, K-Nearest Neighbors Regressor, and Support Vector Regressor models.
# Creating a list of models for evaluation
def get_models():
models = dict()
models['lr'] = LinearRegression()
models['rfr'] = RandomForestRegressor(random_state=0)
models['knn'] = KNeighborsRegressor()
models['svr'] = SVR()
return models
# Evaluating each model using cross-validation
def evaluate_model(model, X, y):
cv = KFold(n_splits=5, random_state=0, shuffle=True)
scores = -cross_val_score(model, X, y, scoring = "neg_root_mean_squared_error",cv=cv)
return scores
# Generating boxplots for visual performance comparison (uses the global results and names lists)
def graph():
g = sns.boxplot(data=results)
g.set_xticklabels(names)
g.set_xlabel("Modelo")
g.set_ylabel("RMSE")
plt.show()
# Getting the models
models = get_models()
# Evaluating the models
results, names = list(), list()
for name, model in models.items():
scores = evaluate_model(model, X, y)
results.append(scores)
names.append(name)
print('%s = %.5f (%.5f)' % (name, np.mean(scores), np.std(scores)))
lr = 0.20495 (0.02186)
rfr = 0.14340 (0.01834)
knn = 0.17359 (0.02001)
svr = 0.14246 (0.02211)
graph()
From here on, we will work on optimizing each model separately.
Part of the models' performance is related to the distribution of the variables (for more, see the EDA here). So we will apply a logarithmic transformation to the numeric variables with skew > 0.5 (in absolute value), in order to reduce the asymmetries.
y = np.log1p(train['SalePrice'])
X = train.drop(columns=['Id','SalePrice'])
As an example of what we mean, look at the two distributions of the 'LotFrontage' column below. On the left is the distribution of the data in its original form; on the right, the same distribution after taking the logarithm of the values. Note that the shape of the distribution became "more normal".
fig, ax = plt.subplots(1,2,figsize=(15,5))
sns.histplot(train['LotFrontage'], kde=True, ax=ax[0]).set_title('Original LotFrontage distribution')
sns.histplot(np.log1p(train['LotFrontage']), kde=True, ax=ax[1]).set_title('LotFrontage distribution after log transform')
plt.show()
We then proceed with the logarithmic transformation of the highly skewed columns.
skewed = X[var_num].skew().abs()
skewed = skewed[(skewed>0.5)].sort_values(ascending=False)
skewed
MiscVal          24.476794
PoolArea         14.828374
LotArea          12.207688
3SsnPorch        10.304342
LowQualFinSF      9.011341
KitchenAbvGr      4.488397
BsmtFinSF2        4.255261
ScreenPorch       4.122214
BsmtHalfBath      4.103403
EnclosedPorch     3.089872
MasVnrArea        2.669084
OpenPorchSF       2.364342
LotFrontage       2.163569
BsmtFinSF1        1.685503
WoodDeckSF        1.541376
TotalBsmtSF       1.524255
1stFlrSF          1.376757
GrLivArea         1.366560
BsmtUnfSF         0.920268
2ndFlrSF          0.813030
OverallCond       0.693067
TotRmsAbvGrd      0.676341
HalfBath          0.675897
Fireplaces        0.649565
GarageYrBlt       0.649415
YearBuilt         0.613461
BsmtFullBath      0.596067
YearRemodAdd      0.503562
dtype: float64
X[skewed.index] = np.log1p(X[skewed.index])
Let's again replace the NaN values in 'FireplaceQu' with NA:
X['FireplaceQu'] = X['FireplaceQu'].replace(np.nan, 'NA')
And redefine the dataframe X without the removed variables:
X = X[var_cat+var_num]
X.shape
(1460, 75)
We apply the ColumnTransformer preprocessing again:
X = preprocessor.fit_transform(X)
To optimize the Linear Regression, we will employ Ridge regularization, varying the 'alpha' parameter:
from sklearn.linear_model import Ridge
alphas = [0.05, 0.1, 0.3, 1, 3, 5, 10, 15, 20, 25, 30, 50, 75,100]
rmse_ridge = [evaluate_model(Ridge(alpha = alpha), X, y).mean() for alpha in alphas]
plt.figure(figsize=(8,4))
sns.lineplot(x=alphas, y=rmse_ridge, marker='o')
plt.xlabel("alpha")
plt.ylabel("RMSE")
plt.text(25, 0.130, f'alpha = {alphas[int(np.argmin(rmse_ridge))]}, lowest RMSE = {min(rmse_ridge):.5}')
plt.show()
Just by tuning the 'alpha' hyperparameter, we improved the model's performance, reducing the RMSE from 0.20495 to 0.12757!
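As an aside, scikit-learn can run this search internally with RidgeCV; the sketch below is equivalent in spirit to the loop above, reusing the alphas list we already defined:
from sklearn.linear_model import RidgeCV
# Sketch: RidgeCV picks the best alpha via internal cross-validation
ridge_cv = RidgeCV(alphas=alphas, scoring='neg_root_mean_squared_error', cv=5)
ridge_cv.fit(X, y)
print(ridge_cv.alpha_)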
We will also try to optimize the Linear Regression using Lasso regularization:
from sklearn import linear_model
alphas = [0.0001,0.0002,0.0003,0.0004,0.0005,0.00075,0.00085,0.001,0.0015,0.0020]
rmse_lasso = [evaluate_model(linear_model.Lasso(alpha=alpha), X, y).mean() for alpha in alphas]
plt.figure(figsize=(8,4))
sns.lineplot(x=alphas, y=rmse_lasso, marker='o')
plt.xlabel("alpha")
plt.ylabel("RMSE")
plt.text(0.0005, 0.13, f'alpha = {alphas[int(np.argmin(rmse_lasso))]}, lowest RMSE = {min(rmse_lasso):.5}')
plt.show()
We see that Lasso Regression reached even lower values than Ridge Regression (RMSE = 0.12643).
We chose not to tune the Random Forest Regressor.
To improve the KNN model, let's evaluate how the 'n_neighbors' parameter (with distance-weighted neighbors) impacts the model's RMSE.
from sklearn.neighbors import KNeighborsRegressor
n_neighbors = [2,3,4,5,6,7,8,9,10,20,30,40,50]
rmse_knn = [evaluate_model(KNeighborsRegressor(n_neighbors=n, weights='distance'),X,y).mean() for n in n_neighbors]
plt.figure(figsize=(8,4))
sns.lineplot(x=n_neighbors, y=rmse_knn, marker='o')
plt.xlabel("n_neighbors")
plt.ylabel("RMSE")
plt.text(4, 0.184, f'n_neighbors = {n_neighbors[int(np.argmin(rmse_knn))]}, lowest RMSE = {min(rmse_knn):.5}')
plt.show()
The KNN model achieved RMSE = 0.16977.
To optimize the SVR model, let's analyze how the RMSE varies as a function of the C and epsilon parameters.
from sklearn.svm import SVR
Cs = [0.05,0.5,0.6,0.7,0.8,0.9,1,4]
eps = [0.0001,0.001,0.005,0.010,0.015,0.020,0.040]
# Building the results table row by row (DataFrame.append is deprecated)
rows = []
for C in Cs:
    for e in eps:
        rmse = evaluate_model(SVR(C=C, epsilon=e), X, y).mean()
        rows.append({'C': C, 'epsilon': e, 'RMSE': rmse})
table = pd.DataFrame(rows, columns=['C', 'epsilon', 'RMSE'])
Below are the combinations of C and epsilon that best minimize the RMSE:
table.sort_values(by='RMSE').head()
C | epsilon | RMSE | |
---|---|---|---|
18 | 0.6 | 0.015 | 0.134012 |
17 | 0.6 | 0.010 | 0.134021 |
19 | 0.6 | 0.020 | 0.134092 |
24 | 0.7 | 0.010 | 0.134097 |
25 | 0.7 | 0.015 | 0.134107 |
We can also visualize the behavior of the RMSE as a function of C and epsilon through the curves below:
plt.figure(figsize=(8,4))
sns.lineplot(x=table[table['C']==0.6]['epsilon'],y=table[table['C']==0.6]['RMSE'], marker='o')
sns.lineplot(x=table[table['C']==1]['epsilon'],y=table[table['C']==1]['RMSE'], marker='o')
sns.lineplot(x=table[table['C']==4]['epsilon'],y=table[table['C']==4]['RMSE'], marker='o')
plt.legend(labels=['C = 0.6','C = 1','C = 4'])
plt.text(0.001, 0.1355, f"Lowest RMSE = {table['RMSE'].min():.5}, C=0.6, epsilon=0.015")
plt.show()
For the Support Vector Regressor, we see that the combination that minimizes the RMSE is C=0.6 and epsilon=0.015 (RMSE = 0.13401).
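The manual double loop above could also be written with GridSearchCV; a sketch, reusing the Cs and eps grids already defined:
from sklearn.model_selection import GridSearchCV
# Sketch: the same C/epsilon grid via GridSearchCV instead of nested loops
grid = GridSearchCV(SVR(), {'C': Cs, 'epsilon': eps},
                    scoring='neg_root_mean_squared_error',
                    cv=KFold(n_splits=5, random_state=0, shuffle=True))
grid.fit(X, y)
print(grid.best_params_, -grid.best_score_)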
Finally, let's build an Ensemble model that combines all the previous models and compare its performance with each individual model.
The 'Stacked Generalization' algorithm, or simply 'stacking', combines the predictions of different machine learning models (the base models) by feeding them to another model (the meta model). In scikit-learn's StackingRegressor, the meta model is trained on out-of-fold predictions of the base models (here with cv=5), which avoids leaking the training targets.
from sklearn.ensemble import StackingRegressor
def get_stacking():
# Define the base models
level0 = list()
level0.append(('ridge', Ridge(alpha = 20)))
level0.append(('lasso', linear_model.Lasso(alpha=0.00075)))
level0.append(('rfr', RandomForestRegressor(random_state=0)))
level0.append(('knn', KNeighborsRegressor(n_neighbors=8, weights='distance')))
level0.append(('svr', SVR(C=0.6, epsilon=0.015)))
# Define the meta-learner model
level1 = LinearRegression()
# Define the stacking ensemble
model_stack = StackingRegressor(estimators=level0, final_estimator = level1, cv = 5)
return model_stack
# Redefining the model-getting function to include the ensemble
def get_models():
models = dict()
models['ridge'] = Ridge(alpha = 20)
models['lasso'] = linear_model.Lasso(alpha = 0.00075)
models['rfr'] = RandomForestRegressor(random_state=0)
models['knn'] = KNeighborsRegressor(n_neighbors=8, weights='distance')
models['svr'] = SVR(C=0.6, epsilon=0.015)
models['stack'] = get_stacking()
return models
# Getting the models
models = get_models()
# Evaluating the models
results, names = list(), list()
for name, model in models.items():
scores = evaluate_model(model, X, y)
results.append(scores)
names.append(name)
print('RMSE(%s) = %.5f (%.5f)' % (name, np.mean(scores), np.std(scores)))
RMSE(ridge) = 0.12757 (0.02473)
RMSE(lasso) = 0.12643 (0.02487)
RMSE(rfr) = 0.14353 (0.01795)
RMSE(knn) = 0.16977 (0.01739)
RMSE(svr) = 0.13401 (0.02006)
RMSE(stack) = 0.12113 (0.02356)
graph()
Comparing the individual models, we conclude that the stacking ensemble is slightly better than all the others, with an RMSE of 0.12113 against the second-best model, Lasso (RMSE = 0.12643). However, it has a higher computational cost, taking much longer to compute; in a later deployment it would be necessary to analyze this trade-off and decide whether such a model is worth using.
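To put a number on that cost, one could time a full cross-validated evaluation of each model; a quick sketch using the functions defined above:
import time
# Sketch: wall-clock time of one 5-fold evaluation per model
for name, model in get_models().items():
    t0 = time.perf_counter()
    evaluate_model(model, X, y)
    print(f"{name}: {time.perf_counter() - t0:.1f} s")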
Since our ensemble model outperformed the other models individually, let's use it to predict the competition's test data:
# Reading the data
test = pd.read_csv('/content/drive/MyDrive/Colab Notebooks/Projetos/Advanced Houses/test.csv')
# Defining the features dataframe
X_test = test.drop(columns=['Id'])
# Applying the logarithm to the variables with skew > 0.5, as in section 5.1
X_test[skewed.index] = np.log1p(X_test[skewed.index])
# Again replacing the NaN values with the string 'NA'
X_test['FireplaceQu'] = X_test['FireplaceQu'].replace(np.nan, 'NA')
X_test = X_test[var_num+var_cat]
# Applying the data transformation
X_test = preprocessor.transform(X_test)
# Fitting the Stacking model
model_stack = get_stacking()
model_stack.fit(X,y)
# Predicting the test data
y_test = model_stack.predict(X_test)
# Building the submission data
submission_stack = pd.DataFrame()
submission_stack['Id'] = test['Id']
submission_stack['SalePrice'] = np.expm1(y_test)
submission_stack
Id | SalePrice | |
---|---|---|
0 | 1461 | 117578.259435 |
1 | 1462 | 154601.103861 |
2 | 1463 | 184322.369472 |
3 | 1464 | 197057.079360 |
4 | 1465 | 199596.386230 |
... | ... | ... |
1454 | 2915 | 86660.037827 |
1455 | 2916 | 84167.554825 |
1456 | 2917 | 169186.286304 |
1457 | 2918 | 119185.445876 |
1458 | 2919 | 229885.536233 |
1459 rows × 2 columns
# Saving the submission file (writing to CSV instead of returning a string)
submission_stack.to_csv('submission_stack.csv', index=False)
The score obtained in the competition was 0.12191.
In this project, we compared the performance of the Linear Regression, Random Forest Regressor, KNN Regressor, and Support Vector Regression models. Before tuning, the best was Support Vector Regression, with RMSE = 0.14246. After tuning each model, the lowest RMSE was obtained using Lasso regularization (RMSE = 0.12643). Finally, all of these models were surpassed by the Stacking Regression, which reached RMSE = 0.12113.
The technique of stacking several models can be effective for building an ensemble model that outperforms the individual models. However, it may come at a high computational cost, and the gain may not be large enough to justify it, making its use impractical.