
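The dataset covers the following categories of data: rental listings, residential communities, second-hand housing, amenities, new housing, land, population, customers, and actual rent.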

  • Score: similar to mse/var; it measures how large the error is relative to the spread (variance) of the target
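To make the Score note concrete, here is a minimal sketch of the mse/var ratio and its link to R². The exact competition formula is not given in this notebook, so treating the score as `1 - mse/var` is an assumption; that expression is, however, the quantity that `r2_score` from scikit-learn computes. The helper names are illustrative.

```python
from statistics import fmean, pvariance

def mse(y_true, y_pred):
    """Mean squared error between two equal-length sequences."""
    return fmean([(t - p) ** 2 for t, p in zip(y_true, y_pred)])

def mse_over_var(y_true, y_pred):
    """Ratio of MSE to the population variance of the targets."""
    return mse(y_true, y_pred) / pvariance(y_true)

def r2(y_true, y_pred):
    """R^2 written as 1 - MSE / Var(y).

    1.0 is a perfect fit; 0.0 is no better than always predicting the mean.
    """
    return 1.0 - mse_over_var(y_true, y_pred)

# Toy values, not taken from the dataset.
y_true = [3.0, 5.0, 7.0, 9.0]
y_pred = [2.5, 5.0, 7.5, 9.0]
print(mse_over_var(y_true, y_pred), r2(y_true, y_pred))
```

A ratio near 0 means the error is small compared with the natural spread of the target, and a ratio of 1 is what a constant mean prediction would achieve.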

In [1]:
from yuan.pipe import *
from yuan.eda import SimpleEDA
from sklearn.metrics import mean_squared_error, r2_score

In [2]:
# tradeMeanPrice leak? Is it an average over future data?
# Is the ID sampling order periodic?
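One way to probe the periodicity question is to look at the autocorrelation of the target when rows are ordered by ID. The sketch below is a plain-Python illustration on a toy series. The function name `lag_autocorr` is hypothetical, and the idea that ID order reflects sampling order is exactly the assumption under test.

```python
from statistics import fmean

def lag_autocorr(values, lag):
    """Sample autocorrelation of `values` at a positive integer `lag`."""
    if not 0 < lag < len(values):
        raise ValueError("lag must be between 1 and len(values) - 1")
    mean = fmean(values)
    centered = [v - mean for v in values]
    denom = sum(c * c for c in centered)
    if denom == 0:
        return 0.0  # constant series: no variation to correlate
    num = sum(a * b for a, b in zip(centered, centered[lag:]))
    return num / denom

# Toy series that repeats every 4 rows, standing in for tradeMoney sorted by ID.
series = [1.0, 2.0, 3.0, 4.0] * 25
scores = {lag: lag_autocorr(series, lag) for lag in range(1, 9)}
best_lag = max(scores, key=scores.get)
print(best_lag, round(scores[best_lag], 3))
```

With the notebook's data, one could pass `train.sort_values('ID')['tradeMoney'].tolist()` as `values`. A clear peak at some lag would hint at a periodic sampling order, while values near zero at every lag would not.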
In [11]:
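import pandas as pd  # explicit import; the cell previously relied on `from yuan.pipe import *`

train = pd.read_csv('../../Data/Future/train_data.csv')
y = train.tradeMoney

# Column groups by position (English meaning in the trailing comments)
租赁房源 = train.columns[:8]      # rental listing
小区信息 = train.columns[8:14]    # residential community info
配套设施 = train.columns[14:27]   # amenities
二手房 = train.columns[27:31]     # second-hand housing
新房 = train.columns[31:37]       # new housing
土地 = train.columns[37:43]       # land
人口 = train.columns[43:46]       # population
客户 = train.columns[46:49]       # customers
真实租金 = train.columns[49:51]   # actual rent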
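Positional slices like the ones above are easy to get wrong by one column, so a quick partition check can help. The sketch below copies the slice boundaries from the cell. The English keys are illustrative labels, and the 51-column demo frame is an assumption implied by the last slice, not something the notebook states.

```python
# Column ranges copied from the cell above; the English keys are illustrative labels.
GROUP_SLICES = {
    "rental_listing": slice(0, 8),
    "community": slice(8, 14),
    "amenities": slice(14, 27),
    "second_hand_housing": slice(27, 31),
    "new_housing": slice(31, 37),
    "land": slice(37, 43),
    "population": slice(43, 46),
    "customers": slice(46, 49),
    "actual_rent": slice(49, 51),
}

def split_columns(columns, group_slices=GROUP_SLICES):
    """Map each group name to its list of column names."""
    columns = list(columns)
    return {name: columns[s] for name, s in group_slices.items()}

def check_partition(columns, groups):
    """Return True if the groups cover every column exactly once, in order."""
    flattened = [c for cols in groups.values() for c in cols]
    return flattened == list(columns)

# Placeholder names; with the notebook's data, pass train.columns instead.
demo_columns = [f"col_{i}" for i in range(51)]
demo_groups = split_columns(demo_columns)
print(check_partition(demo_columns, demo_groups))
```

If `check_partition(train.columns, split_columns(train.columns))` returned `False`, the frame would have more or fewer columns than the slices assume.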
In [4]:
se =  SimpleEDA(train, ['ID'])
se.summary(20)
1. Missing-value rate...
uv    0.043436
pv    0.043436
dtype: float64
2. Number of distinct values...
city                 1
houseFloor           3
rentType             4
houseDecoration      4
supplyLandNum        4
tradeLandNum         5
interSchoolNum       7
houseToward         10
hospitalNum         11
subwayStationNum    13
region              15
privateSchoolNum    17
mallNum             17
parkNum             18
saleSecHouseNum     28
lookNum             32
gymNum              39
drugStoreNum        42
schoolNum           44
bankNum             45
dtype: int64

Rental listings

In [5]:
df1 = train[租赁房源]
df1.head()
SimpleEDA(df1, ['ID']).summary()
Out[5]:
1. Missing-value rate...
Series([], dtype: float64)
2. Number of distinct values...
houseFloor           3
rentType             4
houseDecoration      4
houseToward         10
totalFloor          55
houseType          104
dtype: int64
In [6]:
# sns.countplot('houseFloor', hue='rentType', data=df1)
# train.groupby('tradeLandArea')['tradeMoney'].median().sort_index().plot()
# train.tradeMeanPrice.apply(np.log1p).hist()

# train.tradeMoney[:1000].diff().reset_index(drop=True).plot()
In [7]:
from sklearn.preprocessing import LabelEncoder
In [8]:
for i in train.dtypes[lambda x: x=='object'].index:
    train[i] = LabelEncoder().fit_transform(train[i])
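For readers unfamiliar with `LabelEncoder`, the following plain-Python sketch mirrors its mapping: the sorted distinct values receive the codes 0..n-1. The helper names and the example labels are illustrative and not taken from the dataset.

```python
def fit_label_mapping(values):
    """Assign 0..n-1 to the sorted distinct values, mirroring LabelEncoder."""
    return {label: code for code, label in enumerate(sorted(set(values)))}

def encode(values, mapping):
    """Replace each value by its integer code; unseen values raise KeyError."""
    return [mapping[v] for v in values]

# Illustrative labels only.
rent_type = ["whole", "shared", "unknown", "whole"]
mapping = fit_label_mapping(rent_type)
codes = encode(rent_type, mapping)
print(mapping, codes)
```

The integer codes carry an arbitrary order, which tree-based models such as LightGBM usually tolerate. Because the loop above creates a fresh encoder per column, any mapping that must also be applied to a test set would need to be stored and reused.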
In [9]:
X = train.drop(columns=['tradeMoney'])  # the positional axis argument is not accepted by recent pandas
# y = np.where(train.tradeMoney>5500, 1, 0)
In [13]:
# from yuan.models import OOF
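from sklearn.model_selection import KFold
from lightgbm import LGBMClassifier, LGBMRegressor
# shuffle and random_state are keyword-only in recent scikit-learn releases
oof = OOF(LGBMRegressor(), folds=KFold(n_splits=5, shuffle=True, random_state=666))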
In [ ]:
oof.fit(X, y, X, feval=r2_score)
Fold 1 started at Thu Apr 25 10:48:05 2019
Training until validation scores don't improve for 300 rounds.
[100]  training's l2: 4.39292e+10  valid_1's l2: 1.20865e+12
Did not meet early stopping. Best iteration is:
[100]  training's l2: 4.39292e+10  valid_1's l2: 1.20865e+12
Fold 2 started at Thu Apr 25 10:48:06 2019
Training until validation scores don't improve for 300 rounds.
[100]  training's l2: 2.14152e+11  valid_1's l2: 1.33175e+10
Did not meet early stopping. Best iteration is:
[100]  training's l2: 2.14152e+11  valid_1's l2: 1.33175e+10
Fold 3 started at Thu Apr 25 10:48:06 2019
Training until validation scores don't improve for 300 rounds.
[100]  training's l2: 2.21599e+11  valid_1's l2: 3.4149e+10
Did not meet early stopping. Best iteration is:
[100]  training's l2: 2.21599e+11  valid_1's l2: 3.4149e+10
Fold 4 started at Thu Apr 25 10:48:07 2019
Training until validation scores don't improve for 300 rounds.
[100]  training's l2: 2.17142e+11  valid_1's l2: 3.33666e+11
Did not meet early stopping. Best iteration is:
[100]  training's l2: 2.17142e+11  valid_1's l2: 3.33666e+11
Fold 5 started at Thu Apr 25 10:48:08 2019
Training until validation scores don't improve for 300 rounds.
[100]  training's l2: 2.6807e+11  valid_1's l2: 5.74428e+10
Did not meet early stopping. Best iteration is:
[100]  training's l2: 2.6807e+11  valid_1's l2: 5.74428e+10
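The `l2` values in the log are mean squared errors, so taking the square root puts them back on the scale of `tradeMoney`. The short sketch below does that for the five validation scores copied from the log and compares the best and worst folds. Reading the gap as a sign of a few extreme rents landing in particular folds is only a hypothesis, not something the log proves.

```python
import math

# Validation l2 (mean squared error) per fold, copied from the log above.
valid_l2 = [1.20865e12, 1.33175e10, 3.4149e10, 3.33666e11, 5.74428e10]

# RMSE is on the same scale as the target.
valid_rmse = [math.sqrt(v) for v in valid_l2]
spread = max(valid_rmse) / min(valid_rmse)

for fold, rmse in enumerate(valid_rmse, start=1):
    print(f"fold {fold}: valid RMSE ~ {rmse:,.0f}")
print(f"largest / smallest fold RMSE ~ {spread:.1f}")
```

A spread this large between folds suggests checking the distribution of `tradeMoney` for outliers, or trying a transformed target such as `np.log1p(y)`, before tuning the model.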
In [1]:
from yuan.pipe import *

In [2]:
from yuan.utils import jupyter
In [5]:
jupyter.commit(nb_filename='RentForecast.ipynb')
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-5-cecfce311dda> in <module>()
----> 1 jupyter.commit(nb_filename='RentForecast.ipynb')

~/Desktop/intelpython3/lib/python3.6/site-packages/yuan/utils/jupyter/__init__.py in commit(notebook_id, nb_filename)
     20         "[access token omitted]")
     21     print('\n')
---> 22     jovian.commit(nb_filename=nb_filename, env_type='pip', notebook_id=notebook_id)

TypeError: commit() got an unexpected keyword argument 'nb_filename'
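The traceback shows a wrapper forwarding a keyword that the wrapped function does not accept. One defensive pattern is to inspect the target's signature and forward only the keywords it can receive. In the sketch below, `fake_commit` is a hypothetical stand-in that does not reproduce `jovian.commit`'s real signature, and `accepted_kwargs` is an illustrative helper.

```python
import inspect

def accepted_kwargs(func, **kwargs):
    """Keep only the keyword arguments that `func` can actually receive."""
    params = inspect.signature(func).parameters
    if any(p.kind is inspect.Parameter.VAR_KEYWORD for p in params.values()):
        return dict(kwargs)  # func takes **kwargs, so everything is accepted
    allowed = {
        name for name, p in params.items()
        if p.kind in (inspect.Parameter.POSITIONAL_OR_KEYWORD,
                      inspect.Parameter.KEYWORD_ONLY)
    }
    return {k: v for k, v in kwargs.items() if k in allowed}

# Hypothetical stand-in; it does not reproduce jovian.commit's real signature.
def fake_commit(env_type="pip", notebook_id=None):
    return {"env_type": env_type, "notebook_id": notebook_id}

wanted = {"nb_filename": "RentForecast.ipynb", "env_type": "pip", "notebook_id": None}
usable = accepted_kwargs(fake_commit, **wanted)
dropped = sorted(set(wanted) - set(usable))
print(fake_commit(**usable), dropped)
```

Silently dropping arguments can hide real mismatches, so logging the `dropped` names, or simply updating the wrapper to match the installed library version, is usually the safer fix.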
In [7]:
import jovian
In [ ]:
jovian.commit()
[jovian] Saving notebook..
In [12]:
import prettytable
In [28]:
p = prettytable.PrettyTable(['ID', 'area', 'rentType'])

In [29]:
d = train
In [33]:
p.add_row([1, 3, 2])  # one value per column: ID, area, rentType
In [34]:
print(p)

+----+------+----------+
| ID | area | rentType |
+----+------+----------+
| 1  |  3   |    2     |
+----+------+----------+