What are the column names of input variables in assignment 2 DataFrame (insurance.csv)?
I am confused because the smoker and sex columns don't hold integer values, yet they are still major factors for calculating charges.
Can someone please quickly help me regarding this?
These columns are first converted into categorical variables, and then their codes are used as the values passed to the model.
It's actually done in one line in the dataframe_to_arrays function.
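To make that concrete, here is a minimal sketch of what the conversion does. The three rows below are made up for illustration and only borrow the column names from insurance.csv; pandas sorts the categories, so the codes follow alphabetical order.

```python
import pandas as pd

# Toy stand-in for a few rows of insurance.csv (values are made up)
df = pd.DataFrame({
    "age": [19, 33, 45],
    "sex": ["female", "male", "female"],
    "smoker": ["yes", "no", "no"],
})

# Categories are sorted, so "female" -> 0, "male" -> 1
# and "no" -> 0, "yes" -> 1
sex_codes = df["sex"].astype("category").cat.codes
smoker_codes = df["smoker"].astype("category").cat.codes

print(sex_codes.tolist())     # [0, 1, 0]
print(smoker_codes.tolist())  # [1, 0, 0]
```

So the model never sees the strings, only these integer codes.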
OK, I have already put them in the categorical columns, so I don't need to put them in input_cols again, right?
They're still an input; categorical_cols is only used to specify which columns should be treated as described above.
If you left them out of input_cols, then inputs_array wouldn't contain them.
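A small sketch of the difference, assuming a cut-down helper (to_inputs is a hypothetical name, not the assignment's function) and two made-up rows:

```python
import pandas as pd

# Two made-up rows using column names from insurance.csv
df = pd.DataFrame({
    "age": [19, 33],
    "smoker": ["yes", "no"],
    "charges": [16884.92, 4449.46],
})

categorical_cols = ["smoker"]

def to_inputs(dataframe, input_cols):
    # Hypothetical helper: same idea as dataframe_to_arrays, inputs only
    data = dataframe.copy(deep=True)
    for col in categorical_cols:
        data[col] = data[col].astype("category").cat.codes
    return data[input_cols].to_numpy()

with_smoker = to_inputs(df, ["age", "smoker"])
without_smoker = to_inputs(df, ["age"])

print(with_smoker.shape)     # (2, 2)
print(without_smoker.shape)  # (2, 1)
```

The smoker column gets encoded either way, but it only reaches the model if it is also listed in input_cols.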
Ok that makes it clear. Thank you.
Could someone kindly help me out, please? I keep getting the error
'DataFrame' object has no attribute 'to_numpy'
whenever I run the dataframe_to_arrays function.
Please help!
Can you send your code for the dataframe_to_arrays function?
It's the very one from Step 2 of assignment 2:
def dataframe_to_arrays(dataframe):
    # Make a copy of the original dataframe
    dataframe1 = dataframe.copy(deep=True)
    # Convert non-numeric categorical columns to numbers
    for col in categorical_cols:
        dataframe1[col] = dataframe1[col].astype('category').cat.codes
    # Extract inputs & outputs as numpy arrays
    inputs_array = dataframe1[input_cols].to_numpy()
    targets_array = dataframe1[output_cols].to_numpy()
    return inputs_array, targets_array
Hi,
Your error arises from how you assign input_cols and categorical_cols. When you assign them from df.columns, you get an Index object, whereas you need a list of column names:
input_cols = list(dataframe.columns)[:5]
The dataframe_to_arrays() function runs without an error when input_cols and categorical_cols are lists.
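If you want to check which one you have, a quick sketch like this shows the two types side by side. The single row is made up, and the column order is my assumption about insurance.csv.

```python
import pandas as pd

# One made-up row; column names and order are assumed to match insurance.csv
df = pd.DataFrame({
    "age": [19], "sex": ["female"], "bmi": [27.9], "children": [0],
    "smoker": ["yes"], "region": ["southwest"], "charges": [16884.92],
})

cols_index = df.columns[:5]       # slicing df.columns keeps it a pandas Index
cols_list = list(df.columns)[:5]  # converting first gives a plain Python list

print(type(cols_index).__name__)  # Index
print(type(cols_list).__name__)   # list
print(cols_list)
```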
Hope it helps
Oh! Perfect! Many thanks for the help, it works now. It was a big frustration for me and I could not move forward.
Great indeed and many thanks again.
Thank you so much! I've been wondering what went wrong because I had the list of column names. But I think that's where I got it wrong: my list isn't a list.
Yeah, that happens. Be sure to check your data types when working with pandas. It may not always return what you expect.
Thanks so much @shaica! But unfortunately, my issues still persist. I keep getting an error saying that the copied dataframe1 does not have 'col'. I've checked the code line by line, and it seems like the dataframe does not recognize the elements of categorical_cols as its own columns. I don't understand this, given that categorical_cols is a list of the column names in dataframe1.
I did test the lists of column names and their values beforehand, but I still get this error. Do you have any idea why?
I think I found the solution!!! It is in the function definition.
In the loop:
for col in categorical_cols:
    dataframe1[col] = dataframe1.col.astype('category').cat.codes
I changed dataframe1.col.astype('category').cat.codes to dataframe1[col].astype('category').cat.codes. Then it works! I'm not sure why though…
Hi again,
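There is a typo in your code. Under the for loop, to the right of the equals sign, dataframe1.col.astype … looks for a column literally named col, whereas dataframe1[col] uses the value of the loop variable. The line should be
dataframe1[col] = dataframe1[col].astype('category').cat.codes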
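Here is a minimal sketch of the difference on a made-up one-column frame, in case it helps to see both forms fail and succeed:

```python
import pandas as pd

# Made-up data; only the column name comes from the assignment
df = pd.DataFrame({"smoker": ["yes", "no", "yes"]})
col = "smoker"

# Bracket access uses the value stored in the variable `col`
codes = df[col].astype("category").cat.codes
print(codes.tolist())  # [1, 0, 1]

# Attribute access ignores the variable and looks for a column named "col"
try:
    df.col
    found = True
except AttributeError as err:
    found = False
    print(err)
```

That is why the loop only works with the bracket form: the attribute form never looks at what col contains.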
Ah! Hahahaha! Thanks! It came that way in the assignment and I didn’t think twice about it. Thanks again!