Modeling the Kaggle competition dataset "House Prices - Advanced Regression Techniques"¶

housesbanner.png

1 Summary¶

This is the second part of the project on the Kaggle competition dataset "House Prices - Advanced Regression Techniques", whose goals were:

  1. Analyze and compare the performance of 4 Machine Learning models (Linear Regression, Random Forest Regressor, KNN Regressor, and Support Vector Regression).

  2. Optimize each model by tuning its respective hyperparameters, analyzing how they impact performance.

  3. Use the Stacked Generalization technique to combine all of the predictions, in order to find a model whose performance surpasses every other model individually.

For the complete Exploratory Data Analysis, click here.

2 Data acquisition¶

In [1]:
import pandas as pd
import numpy as np
from sklearn.impute import SimpleImputer
from sklearn.preprocessing import StandardScaler
from sklearn.preprocessing import OneHotEncoder
from sklearn.compose import ColumnTransformer
from sklearn.pipeline import Pipeline
from sklearn.model_selection import cross_val_score
import matplotlib.pyplot as plt
import seaborn as sns
In [2]:
pd.set_option('display.max_columns', 81)
In [3]:
train = pd.read_csv('/content/drive/MyDrive/Colab Notebooks/Projetos/Advanced Houses/train.csv')
train.head()
Out[3]:
Id MSSubClass MSZoning LotFrontage LotArea Street Alley LotShape LandContour Utilities LotConfig LandSlope Neighborhood Condition1 Condition2 BldgType HouseStyle OverallQual OverallCond YearBuilt YearRemodAdd RoofStyle RoofMatl Exterior1st Exterior2nd MasVnrType MasVnrArea ExterQual ExterCond Foundation BsmtQual BsmtCond BsmtExposure BsmtFinType1 BsmtFinSF1 BsmtFinType2 BsmtFinSF2 BsmtUnfSF TotalBsmtSF Heating HeatingQC CentralAir Electrical 1stFlrSF 2ndFlrSF LowQualFinSF GrLivArea BsmtFullBath BsmtHalfBath FullBath HalfBath BedroomAbvGr KitchenAbvGr KitchenQual TotRmsAbvGrd Functional Fireplaces FireplaceQu GarageType GarageYrBlt GarageFinish GarageCars GarageArea GarageQual GarageCond PavedDrive WoodDeckSF OpenPorchSF EnclosedPorch 3SsnPorch ScreenPorch PoolArea PoolQC Fence MiscFeature MiscVal MoSold YrSold SaleType SaleCondition SalePrice
0 1 60 RL 65.0 8450 Pave NaN Reg Lvl AllPub Inside Gtl CollgCr Norm Norm 1Fam 2Story 7 5 2003 2003 Gable CompShg VinylSd VinylSd BrkFace 196.0 Gd TA PConc Gd TA No GLQ 706 Unf 0 150 856 GasA Ex Y SBrkr 856 854 0 1710 1 0 2 1 3 1 Gd 8 Typ 0 NaN Attchd 2003.0 RFn 2 548 TA TA Y 0 61 0 0 0 0 NaN NaN NaN 0 2 2008 WD Normal 208500
1 2 20 RL 80.0 9600 Pave NaN Reg Lvl AllPub FR2 Gtl Veenker Feedr Norm 1Fam 1Story 6 8 1976 1976 Gable CompShg MetalSd MetalSd None 0.0 TA TA CBlock Gd TA Gd ALQ 978 Unf 0 284 1262 GasA Ex Y SBrkr 1262 0 0 1262 0 1 2 0 3 1 TA 6 Typ 1 TA Attchd 1976.0 RFn 2 460 TA TA Y 298 0 0 0 0 0 NaN NaN NaN 0 5 2007 WD Normal 181500
2 3 60 RL 68.0 11250 Pave NaN IR1 Lvl AllPub Inside Gtl CollgCr Norm Norm 1Fam 2Story 7 5 2001 2002 Gable CompShg VinylSd VinylSd BrkFace 162.0 Gd TA PConc Gd TA Mn GLQ 486 Unf 0 434 920 GasA Ex Y SBrkr 920 866 0 1786 1 0 2 1 3 1 Gd 6 Typ 1 TA Attchd 2001.0 RFn 2 608 TA TA Y 0 42 0 0 0 0 NaN NaN NaN 0 9 2008 WD Normal 223500
3 4 70 RL 60.0 9550 Pave NaN IR1 Lvl AllPub Corner Gtl Crawfor Norm Norm 1Fam 2Story 7 5 1915 1970 Gable CompShg Wd Sdng Wd Shng None 0.0 TA TA BrkTil TA Gd No ALQ 216 Unf 0 540 756 GasA Gd Y SBrkr 961 756 0 1717 1 0 1 0 3 1 Gd 7 Typ 1 Gd Detchd 1998.0 Unf 3 642 TA TA Y 0 35 272 0 0 0 NaN NaN NaN 0 2 2006 WD Abnorml 140000
4 5 60 RL 84.0 14260 Pave NaN IR1 Lvl AllPub FR2 Gtl NoRidge Norm Norm 1Fam 2Story 8 5 2000 2000 Gable CompShg VinylSd VinylSd BrkFace 350.0 Gd TA PConc Gd TA Av GLQ 655 Unf 0 490 1145 GasA Ex Y SBrkr 1145 1053 0 2198 1 0 2 1 4 1 Gd 9 Typ 1 TA Attchd 2000.0 RFn 3 836 TA TA Y 192 84 0 0 0 0 NaN NaN NaN 0 12 2008 WD Normal 250000

Since submissions in this competition are evaluated on the Root-Mean-Squared-Error (RMSE) between the logarithm of the predicted value and the logarithm of the observed sale price, we transform the target variable right away, using NumPy's log1p function:

In [4]:
y = np.log1p(train['SalePrice'])
X = train.drop(columns=['Id','SalePrice'])
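
As a quick sanity check (a minimal sketch, using sale prices from the first rows shown above): log1p and expm1 are exact inverses, which is what lets us map log-scale predictions back to prices in section 6.

In [ ]:
# np.expm1 undoes np.log1p, so predictions made on the log scale
# can be converted back to sale prices at submission time
prices = np.array([208500.0, 181500.0, 223500.0])
assert np.allclose(np.expm1(np.log1p(prices)), prices)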

3 Pré-processamento¶

To simplify the preprocessing step, we'll split the columns into two types, numeric and categorical, since each demands different transformations.

3.1 Data types¶

In [5]:
types = pd.DataFrame(X.dtypes).T
types
Out[5]:
MSSubClass MSZoning LotFrontage LotArea Street Alley LotShape LandContour Utilities LotConfig LandSlope Neighborhood Condition1 Condition2 BldgType HouseStyle OverallQual OverallCond YearBuilt YearRemodAdd RoofStyle RoofMatl Exterior1st Exterior2nd MasVnrType MasVnrArea ExterQual ExterCond Foundation BsmtQual BsmtCond BsmtExposure BsmtFinType1 BsmtFinSF1 BsmtFinType2 BsmtFinSF2 BsmtUnfSF TotalBsmtSF Heating HeatingQC CentralAir Electrical 1stFlrSF 2ndFlrSF LowQualFinSF GrLivArea BsmtFullBath BsmtHalfBath FullBath HalfBath BedroomAbvGr KitchenAbvGr KitchenQual TotRmsAbvGrd Functional Fireplaces FireplaceQu GarageType GarageYrBlt GarageFinish GarageCars GarageArea GarageQual GarageCond PavedDrive WoodDeckSF OpenPorchSF EnclosedPorch 3SsnPorch ScreenPorch PoolArea PoolQC Fence MiscFeature MiscVal MoSold YrSold SaleType SaleCondition
0 int64 object float64 int64 object object object object object object object object object object object object int64 int64 int64 int64 object object object object object float64 object object object object object object object int64 object int64 int64 int64 object object object object int64 int64 int64 int64 int64 int64 int64 int64 int64 int64 object int64 object int64 object object float64 object int64 int64 object object object int64 int64 int64 int64 int64 int64 object object object int64 int64 int64 object object

We need to fix the types of some columns, following the descriptions available on the competition page:

In [6]:
X['MSSubClass'] = X['MSSubClass'].astype(object)
In [7]:
var_cat =[]
var_num =[]

for col in X.columns:
    if X[col].dtypes == object:
        var_cat.append(col)
    else:
        var_num.append(col)
print(f"Há {len(var_cat)} variáveis categóricas ('object'): \n {var_cat}\n")
print(f"Há {len(var_num)} varáveis numéricas ('int64 e float 64'): \n {var_num}")
Há 44 variáveis categóricas ('object'): 
 ['MSSubClass', 'MSZoning', 'Street', 'Alley', 'LotShape', 'LandContour', 'Utilities', 'LotConfig', 'LandSlope', 'Neighborhood', 'Condition1', 'Condition2', 'BldgType', 'HouseStyle', 'RoofStyle', 'RoofMatl', 'Exterior1st', 'Exterior2nd', 'MasVnrType', 'ExterQual', 'ExterCond', 'Foundation', 'BsmtQual', 'BsmtCond', 'BsmtExposure', 'BsmtFinType1', 'BsmtFinType2', 'Heating', 'HeatingQC', 'CentralAir', 'Electrical', 'KitchenQual', 'Functional', 'FireplaceQu', 'GarageType', 'GarageFinish', 'GarageQual', 'GarageCond', 'PavedDrive', 'PoolQC', 'Fence', 'MiscFeature', 'SaleType', 'SaleCondition']

There are 35 numeric variables ('int64' and 'float64'): 
 ['LotFrontage', 'LotArea', 'OverallQual', 'OverallCond', 'YearBuilt', 'YearRemodAdd', 'MasVnrArea', 'BsmtFinSF1', 'BsmtFinSF2', 'BsmtUnfSF', 'TotalBsmtSF', '1stFlrSF', '2ndFlrSF', 'LowQualFinSF', 'GrLivArea', 'BsmtFullBath', 'BsmtHalfBath', 'FullBath', 'HalfBath', 'BedroomAbvGr', 'KitchenAbvGr', 'TotRmsAbvGrd', 'Fireplaces', 'GarageYrBlt', 'GarageCars', 'GarageArea', 'WoodDeckSF', 'OpenPorchSF', 'EnclosedPorch', '3SsnPorch', 'ScreenPorch', 'PoolArea', 'MiscVal', 'MoSold', 'YrSold']
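
As an aside, pandas offers select_dtypes as a more concise way to do this split (equivalent here, under the assumption that every categorical column has dtype object):

In [ ]:
# same split, built directly from the dtypes
var_cat = X.select_dtypes(include='object').columns.tolist()
var_num = X.select_dtypes(exclude='object').columns.tolist()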

3.2 Missing values¶

Let's now move on to the missing-data step:

In [8]:
pd.DataFrame(X.isna().sum()/X.shape[0]).sort_values(by=0, ascending=False).T
Out[8]:
PoolQC MiscFeature Alley Fence FireplaceQu LotFrontage GarageType GarageYrBlt GarageQual GarageCond GarageFinish BsmtFinType2 BsmtExposure BsmtCond BsmtFinType1 BsmtQual MasVnrArea MasVnrType Electrical BedroomAbvGr BsmtHalfBath FullBath KitchenAbvGr HalfBath Functional Fireplaces KitchenQual TotRmsAbvGrd MSSubClass GrLivArea GarageCars GarageArea PavedDrive WoodDeckSF OpenPorchSF EnclosedPorch 3SsnPorch ScreenPorch PoolArea MiscVal MoSold YrSold SaleType BsmtFullBath HeatingQC LowQualFinSF Neighborhood OverallCond OverallQual HouseStyle BldgType Condition2 Condition1 LandSlope 2ndFlrSF LotConfig Utilities LandContour LotShape Street LotArea YearBuilt YearRemodAdd RoofStyle RoofMatl Exterior1st Exterior2nd ExterQual ExterCond Foundation BsmtFinSF1 BsmtFinSF2 BsmtUnfSF TotalBsmtSF Heating MSZoning CentralAir 1stFlrSF SaleCondition
0 0.995205 0.963014 0.937671 0.807534 0.472603 0.177397 0.055479 0.055479 0.055479 0.055479 0.055479 0.026027 0.026027 0.025342 0.025342 0.025342 0.005479 0.005479 0.000685 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0

We can see that the 'PoolQC', 'MiscFeature', 'Alley', 'Fence', and 'FireplaceQu' columns have a very large share of missing data. The problem description informs us that 'NA' means absence of the feature (that is, "no pool" for 'PoolQC', for example); however, the data does not contain the value 'NA', but rather 'NaN'.

For the 'PoolQC', 'MiscFeature', 'Alley', and 'Fence' columns, even if we were to replace the 'NaN' values with the string 'NA', these columns would hold quasi-constant values with little predictive power. We therefore chose to remove them from the list of variables to be used.

For the 'FireplaceQu' variable, however, we chose to replace the 'NaN' values with the string 'NA' instead of removing the feature, since fewer than 50% of its values are 'NaN'.

The remaining columns have a low share of missing values; these will be imputed with the median for numeric variables, or with the most frequent value for categorical variables.
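
Incidentally, the 'NA' → 'NaN' confusion happens at load time: pd.read_csv treats the literal string 'NA' as a missing-value marker by default. A possible alternative (not used here, since it changes parsing for every column and the numeric ones would then need explicit conversion) is to disable that behavior:

In [ ]:
# keep 'NA' as a real category instead of parsing it as NaN
# (path abbreviated; empty strings remain the only NA marker)
raw = pd.read_csv('train.csv', keep_default_na=False, na_values=[''])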

In [9]:
# drop the quasi-constant variables

var_cat.remove('PoolQC')
var_cat.remove('MiscFeature')
var_cat.remove('Alley')
var_cat.remove('Fence')

# replace the 'NaN' values with the string 'NA'
# (fillna is used instead of replace(..., inplace=True), which is not copy-on-write safe in recent pandas)

X['FireplaceQu'] = X['FireplaceQu'].fillna('NA')

3.3 Data transformation¶

Let's now redefine the features dataframe, reflecting the columns that were dropped and the replacement of NaN by NA in the 'FireplaceQu' column:

In [10]:
X = X[var_cat+var_num]

Let's now create the transformation pipelines using the ColumnTransformer:

In [11]:
num_trans = Pipeline(
    steps=[("imputer", SimpleImputer(strategy="median")),
           ("scaler", StandardScaler())
           ]
)

cat_trans = Pipeline(
    steps=[("imputer", SimpleImputer(strategy="most_frequent")),
           ("ohe", OneHotEncoder(drop='first', handle_unknown="ignore"))]
)

preprocessor = ColumnTransformer(
    [("num", num_trans, var_num),
     ("cat", cat_trans, var_cat)
    ]
)
In [12]:
X = preprocessor.fit_transform(X)
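
If we want to inspect the columns produced by the preprocessing, a sketch like the one below works; note that get_feature_names_out on a ColumnTransformer requires a recent scikit-learn (roughly >= 1.1 for all the steps used here).

In [ ]:
# the transformed matrix has one column per scaled numeric variable,
# plus one per one-hot-encoded category (minus the dropped first levels)
print(X.shape)
print(preprocessor.get_feature_names_out()[:5])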

4 Modeling¶

Initially, we'll test several different models to select the most promising ones. Then we'll work on optimizing the best ones and, finally, try an ensemble of the models using the Stacking method.

In [13]:
import warnings
warnings.filterwarnings('ignore')
warnings.simplefilter('ignore')
In [14]:
from sklearn.model_selection import KFold
from sklearn.linear_model import LinearRegression
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import SGDRegressor
from sklearn.neighbors import KNeighborsRegressor
from sklearn.svm import SVR

To make things easier, we'll create one function to get the models, another to evaluate them using Cross-Validation, and a third to draw boxplots where we can visually compare the performance of each one. We'll test the Linear Regression, Random Forest Regressor, K-Nearest Neighbors Regressor, and Support Vector Regressor models.

In [15]:
# create a list of models for evaluation

def get_models():

    models = dict()
    models['lr'] = LinearRegression()
    models['rfr'] = RandomForestRegressor(random_state=0)
    models['knn'] = KNeighborsRegressor()
    models['svr'] = SVR()

    return models

# evaluate a model using cross-validation

def evaluate_model(model, X, y):
    cv = KFold(n_splits=5, random_state=0, shuffle=True)
    scores = -cross_val_score(model, X, y, scoring="neg_root_mean_squared_error", cv=cv)

    return scores

# plot boxplots for a visual comparison of performance
# (reads the global 'results' and 'names' lists filled in below)

def graph():
    g = sns.boxplot(data=results)
    g.set_xticklabels(names)
    g.set_xlabel("Model")
    g.set_ylabel("RMSE")
    plt.show()
In [16]:
# get the models

models = get_models()

# evaluate the models

results, names = list(), list()

for name, model in models.items():
    scores = evaluate_model(model, X, y)
    results.append(scores)
    names.append(name)
    print('%s = %.5f (%.5f)' % (name, np.mean(scores), np.std(scores)))
lr = 0.20495 (0.02186)
rfr = 0.14340 (0.01834)
knn = 0.17359 (0.02001)
svr = 0.14246 (0.02211)
In [17]:
graph()

From now on we'll work on optimizing each model separately.

5 Optimization¶

5.1 Feature Engineering¶

Part of the models' performance is related to the shape of the variables' distributions (for more, see the full EDA here). So we'll apply a log transformation to the numeric variables with skew > 0.5 (in absolute value), in order to reduce their asymmetry.

In [18]:
y = np.log1p(train['SalePrice'])
X = train.drop(columns=['Id','SalePrice'])

As an example of what we mean, look at the two distributions of the 'LotFrontage' column below. On the left is the distribution of the data in its original form. On the right, the same distribution after taking the logarithm of the values. Note that the shape of the distribution "became more normal".

In [19]:
fig, ax = plt.subplots(1,2,figsize=(15,5))
sns.histplot(train['LotFrontage'],kde=True, ax=ax[0]).set_title('Original LotFrontage distribution')
sns.histplot(np.log1p(train['LotFrontage']),kde=True,ax=ax[1]).set_title('LotFrontage distribution after the log transform')
plt.show()

We then proceed with the log transformation of the highly skewed columns.

In [20]:
skewed = X[var_num].skew().abs()
skewed = skewed[(skewed>0.5)].sort_values(ascending=False)
skewed
Out[20]:
MiscVal          24.476794
PoolArea         14.828374
LotArea          12.207688
3SsnPorch        10.304342
LowQualFinSF      9.011341
KitchenAbvGr      4.488397
BsmtFinSF2        4.255261
ScreenPorch       4.122214
BsmtHalfBath      4.103403
EnclosedPorch     3.089872
MasVnrArea        2.669084
OpenPorchSF       2.364342
LotFrontage       2.163569
BsmtFinSF1        1.685503
WoodDeckSF        1.541376
TotalBsmtSF       1.524255
1stFlrSF          1.376757
GrLivArea         1.366560
BsmtUnfSF         0.920268
2ndFlrSF          0.813030
OverallCond       0.693067
TotRmsAbvGrd      0.676341
HalfBath          0.675897
Fireplaces        0.649565
GarageYrBlt       0.649415
YearBuilt         0.613461
BsmtFullBath      0.596067
YearRemodAdd      0.503562
dtype: float64
In [21]:
X[skewed.index] = np.log1p(X[skewed.index])

5.2 Data transformation¶

Let's again replace the NaN values with the string NA.

In [22]:
X['FireplaceQu'] = X['FireplaceQu'].fillna('NA')

And redefine the X dataframe without the variables that were removed:

In [23]:
X = X[var_cat+var_num]
In [24]:
X.shape
Out[24]:
(1460, 75)

We again apply the variable transformation using the ColumnTransformer:

In [25]:
X = preprocessor.fit_transform(X)

5.3 Parameter tuning¶

5.3.1 Ridge Regression¶

To optimize the Linear Regression, we'll apply the Ridge regularization method, varying the 'alpha' parameter:
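
For reference, Ridge regression minimizes the least-squares loss plus an L2 penalty on the coefficients,

$$\min_w \; \lVert y - Xw \rVert_2^2 + \alpha \lVert w \rVert_2^2,$$

so larger values of 'alpha' shrink the coefficients more aggressively toward zero.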

In [26]:
from sklearn.linear_model import Ridge
In [27]:
alphas = [0.05, 0.1, 0.3, 1, 3, 5, 10, 15, 20, 25, 30, 50, 75,100]
rmse_ridge = [evaluate_model(Ridge(alpha = alpha), X, y).mean() for alpha in alphas]
In [28]:
plt.figure(figsize=(8,4))
sns.lineplot(x=alphas, y=rmse_ridge, marker='o')
plt.xlabel("alpha")
plt.ylabel("RMSE")
plt.text(25,0.130, f'alpha = {alphas[-6]}, lowest RMSE = {min(rmse_ridge):.5}')
plt.show()

Just by tuning the 'alpha' hyperparameter, we improved the model's performance, reducing the RMSE from 0.20495 to 0.12757!

5.3.2 Lasso Regression¶

We'll try to optimize the Linear Regression using Lasso regularization:
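
For reference, scikit-learn's Lasso minimizes

$$\min_w \; \frac{1}{2n} \lVert y - Xw \rVert_2^2 + \alpha \lVert w \rVert_1,$$

and the L1 penalty drives some coefficients exactly to zero, which acts as implicit feature selection (useful here, given the many one-hot-encoded columns).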

In [29]:
from sklearn import linear_model
In [30]:
alphas = [0.0001,0.0002,0.0003,0.0004,0.0005,0.00075,0.00085,0.001,0.0015,0.0020]
rmse_lasso = [evaluate_model(linear_model.Lasso(alpha=alpha), X, y).mean() for alpha in alphas]
In [31]:
plt.figure(figsize=(8,4))
sns.lineplot(x=alphas, y=rmse_lasso, marker='o')
plt.xlabel("alpha")
plt.ylabel("RMSE")
plt.text(0.0005,0.13, f'alpha = {alphas[-5]}, lowest RMSE = {min(rmse_lasso):.5}')
plt.show()

We see that the Lasso Regression reached values even lower than the Ridge Regression (RMSE = 0.12643).

5.3.3 Random Forest Regressor¶

We chose not to try to optimize the Random Forest Regressor model.

5.3.4 KNN Regressor¶

To improve the KNN model, we'll evaluate how the 'n_neighbors' parameter impacts the model's RMSE (using distance-weighted neighbors, weights='distance', as in the code below).

In [32]:
from sklearn.neighbors import KNeighborsRegressor
In [33]:
n_neighbors = [2,3,4,5,6,7,8,9,10,20,30,40,50]
rmse_knn = [evaluate_model(KNeighborsRegressor(n_neighbors=n, weights='distance'),X,y).mean() for n in n_neighbors]
In [34]:
plt.figure(figsize=(8,4))
sns.lineplot(x=n_neighbors, y=rmse_knn, marker='o')
plt.xlabel("n_neighbors")
plt.ylabel("RMSE")
plt.text(4,0.184, f'n_neighbors = {n_neighbors[6]}, lowest RMSE = {min(rmse_knn):.5}')
plt.show()

The KNN model achieved RMSE = 0.16977.

5.3.5 Support Vector Regression¶

To optimize the SVR model, we'll analyze how the RMSE varies as a function of the C and epsilon parameters: C controls the strength of the regularization (the smaller the C, the stronger the regularization), and epsilon sets the width of the tube within which errors are not penalized.

In [35]:
from sklearn.svm import SVR
In [36]:
Cs = [0.05,0.5,0.6,0.7,0.8,0.9,1,4]
eps = [0.0001,0.001,0.005,0.010,0.015,0.020,0.040]

# collect one row per (C, epsilon) combination
# (DataFrame.append was removed in pandas 2.0, so we build the frame at the end)
rows = []
for C in Cs:
    for e in eps:
        rmse = evaluate_model(SVR(C=C, epsilon=e), X, y).mean()
        rows.append({'C': C, 'epsilon': e, 'RMSE': rmse})

table = pd.DataFrame(rows)
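
The same sweep could also be written with scikit-learn's GridSearchCV (a sketch under the same 5-fold setup; not used here so the full RMSE table remains available for the plots below):

In [ ]:
from sklearn.model_selection import GridSearchCV

# exhaustive search over the same (C, epsilon) grid
grid = GridSearchCV(SVR(), {'C': Cs, 'epsilon': eps},
                    scoring='neg_root_mean_squared_error',
                    cv=KFold(n_splits=5, random_state=0, shuffle=True))
grid.fit(X, y)
print(grid.best_params_, -grid.best_score_)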

Below are the combinations of C and epsilon that best minimize the RMSE:

In [37]:
table.sort_values(by='RMSE').head()
Out[37]:
C epsilon RMSE
18 0.6 0.015 0.134012
17 0.6 0.010 0.134021
19 0.6 0.020 0.134092
24 0.7 0.010 0.134097
25 0.7 0.015 0.134107

We can also visualize the behavior of the RMSE as a function of C and epsilon through the curves below:

In [38]:
plt.figure(figsize=(8,4))
sns.lineplot(x=table[table['C']==0.6]['epsilon'],y=table[table['C']==0.6]['RMSE'], marker='o')
sns.lineplot(x=table[table['C']==1]['epsilon'],y=table[table['C']==1]['RMSE'], marker='o')
sns.lineplot(x=table[table['C']==4]['epsilon'],y=table[table['C']==4]['RMSE'], marker='o')
plt.legend(labels=['C = 0.6','C = 1','C = 4'])
plt.text(0.001,0.1355,f"Lowest RMSE = {table['RMSE'].min():.5}, C=0.6, epsilon=0.015")
plt.show()

For the Support Vector Regression, we see that the combination that best reduces the RMSE is C=0.6 and epsilon=0.015 (RMSE = 0.13401).

5.3.6 Stacked Generalization¶

To wrap up, we'll build an ensemble model that combines all the previous models and compare its performance against each model individually.

The 'Stacked Generalization' algorithm, or simply 'stacking', combines the predictions of different machine learning models (base models) by training another model (the meta-model) on top of them. In scikit-learn's StackingRegressor, the meta-model is fit on out-of-fold predictions of the base models, controlled by the cv parameter.

In [39]:
from sklearn.ensemble import StackingRegressor
In [40]:
def get_stacking():

    # define the base models
    level0 = list()
    level0.append(('ridge', Ridge(alpha = 20)))
    level0.append(('lasso', linear_model.Lasso(alpha=0.00075)))
    level0.append(('rfr', RandomForestRegressor(random_state=0)))
    level0.append(('knn', KNeighborsRegressor(n_neighbors=8, weights='distance')))
    level0.append(('svr', SVR(C=0.6, epsilon=0.015)))

    # define the meta-learner model
    level1 = LinearRegression()

    # define the stacking ensemble
    model_stack = StackingRegressor(estimators=level0, final_estimator = level1, cv = 5)

    return model_stack
In [41]:
# redefine the model-getting function to include the ensemble

def get_models():

    models = dict()
    models['ridge'] = Ridge(alpha = 20)
    models['lasso'] = linear_model.Lasso(alpha = 0.00075)
    models['rfr'] = RandomForestRegressor(random_state=0)
    models['knn'] = KNeighborsRegressor(n_neighbors=8, weights='distance')
    models['svr'] = SVR(C=0.6, epsilon=0.015)
    models['stack'] = get_stacking()

    return models
In [42]:
# get the models
models = get_models()

# evaluate the models
results, names = list(), list()

for name, model in models.items():
    scores = evaluate_model(model, X, y)
    results.append(scores)
    names.append(name)
    print('RMSE(%s) = %.5f (%.5f)' % (name, np.mean(scores), np.std(scores)))
RMSE(ridge) = 0.12757 (0.02473)
RMSE(lasso) = 0.12643 (0.02487)
RMSE(rfr) = 0.14353 (0.01795)
RMSE(knn) = 0.16977 (0.01739)
RMSE(svr) = 0.13401 (0.02006)
RMSE(stack) = 0.12113 (0.02356)
In [43]:
graph()

Comparing against each model individually, we conclude that the stacking ensemble is slightly better than all the others, with an RMSE of 0.12113 versus the second-best model, Lasso (RMSE = 0.12643). However, it has a higher computational cost and takes considerably longer to run, so in a later deployment one would have to weigh this trade-off to decide whether such a model is worth using.
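
A rough way to quantify that trade-off is to time the cross-validation of each model (a sketch; absolute timings depend on the machine):

In [ ]:
import time

# wall-clock time of the 5-fold evaluation for each model
for name, model in get_models().items():
    start = time.time()
    evaluate_model(model, X, y)
    print(f'{name}: {time.time() - start:.1f} s')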

6 Test data submission¶

Since our ensemble model performed better than the other models in isolation, we'll use it to predict the competition's test data:

In [44]:
# read the data
test = pd.read_csv('/content/drive/MyDrive/Colab Notebooks/Projetos/Advanced Houses/test.csv')

# define the dataframe with the features
X_test = test.drop(columns=['Id'])

# apply the logarithm to the variables with skew > 0.5, as in section 5.1
X_test[skewed.index] = np.log1p(X_test[skewed.index])

# again replace the NaN values with the string 'NA'
X_test['FireplaceQu'] = X_test['FireplaceQu'].fillna('NA')
X_test = X_test[var_num+var_cat]

# apply the data transformation (transform only, reusing the preprocessing fitted on the training data)
X_test = preprocessor.transform(X_test)
In [45]:
# fit using the Stacking model
model_stack = get_stacking()
model_stack.fit(X,y)

# predict the test data
y_test = model_stack.predict(X_test)
In [46]:
# create the submission data

submission_stack = pd.DataFrame()
submission_stack['Id'] = test['Id']
submission_stack['SalePrice'] = np.expm1(y_test)
submission_stack
Out[46]:
Id SalePrice
0 1461 117578.259435
1 1462 154601.103861
2 1463 184322.369472
3 1464 197057.079360
4 1465 199596.386230
... ... ...
1454 2915 86660.037827
1455 2916 84167.554825
1456 2917 169186.286304
1457 2918 119185.445876
1458 2919 229885.536233

1459 rows × 2 columns

In [47]:
# write the submission file (the 'submission.csv' filename is our choice; Kaggle expects a CSV with the 'Id' and 'SalePrice' columns)
submission_stack.to_csv('submission.csv', index=False)

The score obtained in the competition was 0.12191.

7 Conclusion¶

In this work, we compared the performance of the Linear Regression, Random Forest Regressor, KNN Regressor, and Support Vector Regression models; before optimization, the best was the Support Vector Regression, with RMSE = 0.14246. After optimizing each model, the lowest RMSE was obtained using Lasso regularization (RMSE = 0.12643). Finally, all of these models were surpassed by the Stacking Regressor, which reached RMSE = 0.12113.

The technique of stacking several models, called Stacking, can be effective for building an ensemble model that outperforms the individual models. However, it can come at a high computational cost, and the gain may not be large enough, which can make its use impractical.