Pandas vs SQL/Excel?

Why use Pandas when data analysis and presentation can be done in SQL or Excel? Please elaborate on more use cases. Thank you!

You can check this:

Will check it out. Thanks!

I'm sure you have gotten a ton of answers, but here are my 2 cents.

First off, SQL is a relational database language. It is very good at storing, retrieving, filtering, and joining database tables. It absolutely sucks at data analysis. SQL, get me x, y, and sometimes z = awesome. SQL, give me descriptive statistics on dataset x = not awesome: AVG is built in, but medians, percentiles, and correlations range from clunky to unsupported depending on the database.
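A minimal sketch of that division of labour, assuming a hypothetical SQLite table called `measurements` (built in memory so the snippet runs on its own): SQL does the fetching and filtering, and pandas does the statistics.

```python
import sqlite3

import pandas as pd

# Build a tiny in-memory SQLite table so the example is self-contained.
# The table name "measurements" and its columns are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE measurements (sensor TEXT, x REAL, y REAL)")
conn.executemany(
    "INSERT INTO measurements VALUES (?, ?, ?)",
    [("a", 1.0, 2.0), ("a", 2.0, 4.1), ("b", 3.0, 6.2), ("b", 4.0, 7.9), ("b", 10.0, 20.5)],
)

# SQL does what it is good at: fetching and filtering rows.
df = pd.read_sql_query("SELECT sensor, x, y FROM measurements WHERE x > 0", conn)

# pandas does the analysis: each statistic is a single method call.
summary = df[["x", "y"]].describe()                    # count, mean, std, min, quartiles, max
medians = df.groupby("sensor")[["x", "y"]].median()    # per-sensor medians
corr_xy = df["x"].corr(df["y"])                        # Pearson correlation of x and y

print(summary)
print(medians)
print(round(corr_xy, 3))
```

The query stays a plain `SELECT`, so it is cheap for the database, and everything after `read_sql_query` runs locally on the returned DataFrame.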

Excel is very good at quick calculations. However, Excel lets users keep reinforcing bad habits. First and foremost, it hides formulas and hides work: looking through an Excel sheet someone else made is an exhausting exercise.

Moreover, if the data analysis you are working on is going to be repeated, Pandas can be set up as a program process. Make it once, run it forever. Excel takes a bit more effort to reach this functionality.

Lastly, if you are working with big data, Excel has limits on the number of rows and columns (1,048,576 rows by 16,384 columns per worksheet in current versions), and the data has to be loaded into memory while you work with it. If your dataset is larger than your memory… Pandas can work on portions of the data without loading the entire dataset into memory, for example by reading a file in chunks with the `chunksize` argument of `read_csv`.
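Here's a rough sketch of both points together, assuming a CSV with hypothetical `region` and `amount` columns. The logic lives in a function, so rerunning it on next month's file is one call, and the file is read a few rows at a time rather than all at once. Note that pandas loads the whole file by default; chunking is something you opt into.

```python
import io

import pandas as pd


def total_by_region(source, chunksize=2):
    """Sum the hypothetical 'amount' column per 'region', reading `chunksize` rows at a time."""
    totals = None
    # With chunksize set, read_csv yields a sequence of small DataFrames instead of
    # one big one, so only a single chunk has to be held in memory at a time.
    for chunk in pd.read_csv(source, chunksize=chunksize):
        part = chunk.groupby("region")["amount"].sum()
        # Combine partial sums; fill_value=0 handles regions missing from one side.
        totals = part if totals is None else totals.add(part, fill_value=0)
    return totals.sort_index()


csv_text = """region,amount
north,10
south,5
north,7
east,3
south,1
"""

# In practice you would pass a file path and a much larger chunksize (e.g. 100_000);
# chunksize=2 here just forces the demo to actually split the data.
result = total_by_region(io.StringIO(csv_text))
print(result)
```

This works for aggregations that can be combined piece by piece (sums, counts, min/max). Something like an exact median needs a different approach, because it can't be assembled from per-chunk results.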

I can hate on Excel all day long.

Hope that helps.

Entropy.

My initial impressions of the advantages of Pandas over SQL:

  • Legibility: To build calculated columns in SQL, you need to embed each calculation within the SELECT statement. In a complex query with many joins, the statement can become very convoluted and difficult to read. Pandas avoids that problem by separating the data-reading step from the calculation steps, and each calculation can be written (and commented) as its own statement.

  • Local or distributed processing: If you've worked with a large transactional database, you'll know that the DBA doesn't like many people running queries against the system, because this slows down transactions. In some cases, complex queries have to be run outside normal operational hours. With pandas, the processing runs on your own CPU or on a separate data-service machine instead of the database server.

  • Big data analysis: A huge amount of data online is not in a database format. Pandas allows you to write processing code against common, easy-to-produce formats such as CSV.

  • Data presentation: SQL by itself only performs CRUD (Create, Read, Update, Delete) operations on a database, and it can only present results as text. Pandas lets you visualise the data as different types of charts (its plotting methods use Matplotlib under the hood).
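To illustrate the legibility point, here's a small sketch using a made-up orders CSV (the column names and the threshold of 100 are hypothetical). Each calculated column is a separate, commented step. In a single SQL SELECT, `quantity * unit_price` would have to be repeated inside the discount and net expressions, or pushed into a subquery.

```python
import io

import pandas as pd

# Hypothetical order data; in practice this could come from pd.read_csv("orders.csv")
# or from one simple SQL query.
csv_text = """order_id,quantity,unit_price,discount_pct
1,3,10.0,0
2,1,250.0,10
3,10,2.5,20
"""
orders = pd.read_csv(io.StringIO(csv_text))

# Step 1: gross value of each order.
orders["gross"] = orders["quantity"] * orders["unit_price"]

# Step 2: discount amount, kept as its own column so it can be checked separately.
orders["discount"] = orders["gross"] * orders["discount_pct"] / 100

# Step 3: net value after discount.
orders["net"] = orders["gross"] - orders["discount"]

# Step 4: flag large orders; 100 is an arbitrary example threshold.
orders["is_large"] = orders["net"] > 100

print(orders[["order_id", "gross", "discount", "net", "is_large"]])
```

With Matplotlib installed, the same frame can go straight to a chart, e.g. `orders.plot(x="order_id", y="net", kind="bar")`.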