Zero to GANs: Assignment 2

What are the column names of input variables in assignment 2 DataFrame (insurance.csv)?
I am confused because the smoker and sex columns don’t have integer dtypes, yet they are still major factors for calculating charges.
Can someone please quickly help me regarding this?

These columns are first converted into categorical variables, and then their integer category codes are used as the values passed to the model.

It’s actually done in one line in the dataframe_to_arrays function.
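To illustrate the conversion described above, here is a minimal sketch. The rows are made up for the example; only the column names smoker and sex come from the thread.

```python
import pandas as pd

# Made-up sample rows (not taken from insurance.csv)
df = pd.DataFrame({
    "age": [19, 33, 45],
    "smoker": ["yes", "no", "no"],
    "sex": ["female", "male", "male"],
})

# astype("category") sorts the distinct string values,
# and .cat.codes numbers them starting from 0
smoker_codes = df["smoker"].astype("category").cat.codes
sex_codes = df["sex"].astype("category").cat.codes

print(smoker_codes.tolist())  # [1, 0, 0]  ("no" -> 0, "yes" -> 1)
print(sex_codes.tolist())     # [0, 1, 1]  ("female" -> 0, "male" -> 1)
```

The codes are just labels for the categories, so the model sees numbers where the CSV had strings.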

OK, I have already put them in categorical_cols, so I don’t need to put them in input_cols again, right?

They’re still inputs; categorical_cols is only used to specify which columns should be treated as described above.

If you left them out of input_cols, then inputs_array wouldn’t contain them.
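A small sketch of that point, assuming a made-up two-row DataFrame. Unlike the assignment's function, this version takes input_cols as a parameter (an assumption made here so both cases can be shown side by side).

```python
import pandas as pd

# Made-up rows; the charges values are placeholders, not real data
df = pd.DataFrame({
    "age": [19, 33],
    "smoker": ["yes", "no"],
    "charges": [100.0, 200.0],
})
categorical_cols = ["smoker"]
output_cols = ["charges"]

def dataframe_to_arrays(dataframe, input_cols):
    # input_cols is passed in here only for illustration
    dataframe1 = dataframe.copy(deep=True)
    for col in categorical_cols:
        dataframe1[col] = dataframe1[col].astype("category").cat.codes
    return dataframe1[input_cols].to_numpy(), dataframe1[output_cols].to_numpy()

with_smoker, _ = dataframe_to_arrays(df, ["age", "smoker"])
without_smoker, _ = dataframe_to_arrays(df, ["age"])

print(with_smoker.shape)     # (2, 2)
print(without_smoker.shape)  # (2, 1)
```

Listing a column in categorical_cols only changes how it is encoded; whether it reaches the model is decided by input_cols.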

Ok that makes it clear. Thank you.

Could someone kindly help me out, please? I keep getting the error
'DataFrame' object has no attribute 'to_numpy'
whenever I run the dataframe_to_arrays function.
Please help!

Can you send your code for the dataframe_to_arrays function?

The very one from Step 2 of Assignment 2:

def dataframe_to_arrays(dataframe):
    # Make a copy of the original dataframe
    dataframe1 = dataframe.copy(deep=True)
    # Convert non-numeric categorical columns to numbers
    for col in categorical_cols:
        dataframe1[col] = dataframe1[col].astype('category').cat.codes
    # Extract inputs & outputs as numpy arrays
    inputs_array = dataframe1[input_cols].to_numpy()
    targets_array = dataframe1[output_cols].to_numpy()
    return inputs_array, targets_array

Hi,

Your error arises from how you assign input_cols and categorical_cols. When you assign them from df.columns, you get a pandas Index object, whereas you need a list of column names:

input_cols = list(dataframe.columns)[:5]

The dataframe_to_arrays() function runs without an error when input_cols and categorical_cols are lists.
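A quick way to check what you actually have is to print the type. This sketch assumes the usual insurance.csv column names and uses an empty DataFrame, since only the column labels matter here.

```python
import pandas as pd

# Assumed column names for insurance.csv; no rows are needed for this check
df = pd.DataFrame(columns=["age", "sex", "bmi", "children", "smoker", "region", "charges"])

as_index = df.columns[:5]       # slicing df.columns keeps it a pandas Index
as_list = list(df.columns)[:5]  # converting first gives a plain Python list

print(type(as_index).__name__)  # Index
print(type(as_list).__name__)   # list
print(as_list)                  # ['age', 'sex', 'bmi', 'children', 'smoker']
```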

Hope it helps


Oh! Perfect! Many thanks for the help, it now works. It was a big frustration on my part and I could not move forward.
Great indeed and many thanks again.

Thank you so much! I’ve been wondering what went wrong because I thought I had a list of column names. But I think that’s where I got it wrong. My list isn’t a list :sweat_smile: :sweat_smile: :sweat_smile:

Yeah that happens :slight_smile: Be sure to check your datatypes when working with pandas. It may not always return what you expect.

Thanks so much @shaica! But unfortunately, my issue still persists. I keep getting an error saying that the copied dataframe1 does not have ‘col’. I’ve checked the code line by line, and it seems the dataframe does not recognize the elements of categorical_cols as its columns. That is hard for me to understand, given that categorical_cols is a list of column names in dataframe1.

I did test the lists of column names and their values beforehand, but I still get this error. Do you have any idea why?

I think I found the solution!!! It is in the function definition.

At the loop:

for col in categorical_cols:
    dataframe1[col] = dataframe1.col.astype('category').cat.codes

I changed the conversion from dataframe1.col.astype('category').cat.codes to dataframe1[col].astype('category').cat.codes. Then it works! I’m not sure why though… :thinking:

Hi again,

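There is a typo in your code. Inside the for loop, on the right-hand side of the equals sign, dataframe1.col.astype … looks for a column literally named "col" instead of using the loop variable. The line should be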

dataframe1[col] = dataframe1[col].astype('category').cat.codes
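As to why the bracket form works and the dot form does not, here is a small sketch with made-up data. Dot access treats the name after the dot as the column label itself, while brackets evaluate the variable first.

```python
import pandas as pd

# Made-up rows; there is no column named "col" in this frame
df = pd.DataFrame({"smoker": ["yes", "no"], "sex": ["female", "male"]})

col = "smoker"

# Bracket indexing uses the value stored in the variable col
print(df[col].tolist())  # ['yes', 'no']

# Attribute access ignores the variable and looks up the name "col"
try:
    df.col
    attribute_lookup_failed = False
except AttributeError as err:
    attribute_lookup_failed = True
    print(err)
```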

Ah! Hahahaha! Thanks! It came that way in the assignment and I didn’t think twice about it. Thanks again!