I've been struggling for almost an hour with an error that DecisionTreeClassifier raises when I call fit:
TypeError                                 Traceback (most recent call last)
<ipython-input-42-dbd33597c073> in <module>()
      7 learner = DecisionTreeClassifier(random_state = 2)
----> 8 learner = learner.fit(features_train[:int(sample_size)], outcome_train[:int(sample_size)])
********* omitted **********
TypeError: float() argument must be a string or a number
Some answers say this is caused by NaN values, so I checked, but there were no NaN values:
print(features_train.isnull().values.any())  # -> False
Could something be wrong with the lines where I convert values to datetime? They look right to me:
features_train['date_x'] = pd.to_datetime(features_train['date_x'], format='%Y-%m-%d')
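A likely explanation, offered as an assumption rather than something verified against the original data: isnull() only reports missing values, while fit tries to cast every feature to float. The to_datetime line is correct, but the datetime64 column it produces is not numeric, so the NaN check cannot catch it. Listing non-numeric dtypes is a quick way to find such columns. The helper non_numeric_columns and the toy frame (including the column name some_number) are hypothetical stand-ins for features_train.

```python
import pandas as pd


def non_numeric_columns(df):
    """Return names of columns whose dtype is not numeric (e.g. datetime64, object)."""
    return df.select_dtypes(exclude='number').columns.tolist()


# Hypothetical stand-in for features_train: one numeric and one date column, no NaN
features_train = pd.DataFrame({
    'some_number': [36, 76, 99],
    'date_x': pd.to_datetime(['2022-07-01', '2022-07-02', '2022-07-03'], format='%Y-%m-%d'),
})

print(features_train.isnull().values.any())  # False: there are no missing values
print(non_numeric_columns(features_train))   # ['date_x']: the likely culprit for the float() error
```

Running the same check on the real features_train before calling fit should point at the date columns, if this assumption holds.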
Finally, I found a link saying that you just need to convert the datetime values into categorical values.
I had intentionally left the datetime columns out when encoding the data set, because I wasn't sure that treating dates as categories was mathematically sound.
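On the "mathematically sound" question: LabelEncoder sorts its classes, so label-encoded dates keep their chronological order but lose the spacing between them. A common alternative, which is not what this post ended up doing, is to replace each date with numeric parts plus a day count from a fixed origin, which keeps both order and spacing. The function add_date_parts and its origin default of 2000-01-01 are hypothetical choices for illustration; the same origin would have to be reused for test data.

```python
import pandas as pd


def add_date_parts(df, col, origin='2000-01-01'):
    """Replace a datetime64 column with integer date features (hypothetical helper)."""
    out = df.copy()
    dates = out.pop(col)  # remove the raw datetime column
    out[col + '_year'] = dates.dt.year
    out[col + '_month'] = dates.dt.month
    out[col + '_day'] = dates.dt.day
    out[col + '_weekday'] = dates.dt.dayofweek  # Monday=0 ... Sunday=6
    out[col + '_days'] = (dates - pd.Timestamp(origin)).dt.days  # keeps spacing between dates
    return out


# Hypothetical toy frame with a single date column
df = pd.DataFrame({'date_x': pd.to_datetime(['2022-07-01', '2023-01-15'], format='%Y-%m-%d')})
parts = add_date_parts(df, 'date_x')
print(parts.values.tolist())  # [[2022, 7, 1, 4, 8217], [2023, 1, 15, 6, 8415]]
```

Whether this beats plain label encoding depends on the data; a decision tree only uses the order of values within a feature, so order-preserving codes are often enough for it.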
After I included the date columns in the encoding step, DecisionTreeClassifier worked as expected and returned predictions.
from sklearn.preprocessing import LabelEncoder

columns = ['activity_category', 'people_id', 'activity_id', 'date_x', ****** omitted ***** ]
# Set up the encoder
encoder = LabelEncoder()
# Encode the data frame, one column at a time
encoded_people_act_train_df = people_act_train_df.copy()
for col in columns:
    encoded_people_act_train_df[col] = encoder.fit_transform(people_act_train_df[col])
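A self-contained sketch of the working approach on invented toy data, so the path from raw columns to predictions can be run end to end. The column names come from the question, but the values, the outcome column, and the encoders dict are hypothetical. It differs from the loop above in one deliberate way: it keeps a separate fitted LabelEncoder per column, because reusing a single encoder leaves only the last column's mapping available for encoding test data later.

```python
import pandas as pd
from sklearn.preprocessing import LabelEncoder
from sklearn.tree import DecisionTreeClassifier

# Hypothetical toy data; column names borrowed from the question, values invented
people_act_train_df = pd.DataFrame({
    'activity_category': ['type 1', 'type 2', 'type 1', 'type 2'],
    'people_id': ['ppl_1', 'ppl_2', 'ppl_1', 'ppl_3'],
    'date_x': pd.to_datetime(['2022-07-03', '2022-07-01', '2022-07-02', '2022-07-01'],
                             format='%Y-%m-%d'),
    'outcome': [1, 0, 1, 0],
})
columns = ['activity_category', 'people_id', 'date_x']

# One fitted encoder per column, so each mapping can be reused on test data
encoders = {}
encoded_people_act_train_df = people_act_train_df.copy()
for col in columns:
    encoders[col] = LabelEncoder()
    encoded_people_act_train_df[col] = encoders[col].fit_transform(people_act_train_df[col])

features_train = encoded_people_act_train_df[columns]  # all integer columns now
outcome_train = encoded_people_act_train_df['outcome']

learner = DecisionTreeClassifier(random_state=2)
learner = learner.fit(features_train, outcome_train)  # no float() TypeError on integer codes
predictions = learner.predict(features_train)
```

Because LabelEncoder sorts the unique dates, the encoded date_x column follows chronological order here: 2022-07-01 becomes 0, 2022-07-02 becomes 1, and 2022-07-03 becomes 2.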