PyByte
🤯 ChatGPT just released CRAZY new features

What's the fastest way to iterate over a DataFrame?

Joel Johnson
November 09, 2023

from PyByte import spotlight_story, deep_dive, get_inspired, byte_break

# Welcome to the PyByte!

👋 Hello coders! Elevate your Python knowledge with this week's PyByte.

spotlight_story()

OpenAI announced a TON of stuff this week. The most exciting things for me are:

GPT-4 Turbo, which has a HUGE context window (128K tokens, that’s about 300 pages of a book 📕)
GPT-4 Turbo is WAY cheaper than GPT-4 🤑 (3x cheaper for input [think prompts & stuff you pass it] and 2x cheaper for output [its response])
GPT-4 Turbo supports visual inputs
You can use DALL-E 3 for image generation
Text-to-speech capabilities 🗣️ (CRAZY natural sounding voices, IMO better than Amazon Polly)

I’m sharing this because ALL of this is available through their API, which you can use from Python.

If you’ve taken the OpenAI section of my Python for Professionals course, you’ll be able to use all of these immediately (and possibly 10x your workflow).

If you’d like to see a technical deep_dive() on how to use the OpenAI API in Python in next week’s newsletter, reply “CHATGPT TUTORIAL” to this email.

deep_dive()

This deep dive comes from an article by a friend of mine who went deep on one question: what is the fastest way to iterate over a Pandas DataFrame?

Speed tests and 13 examples: how to iterate over Pandas DataFrames in Python without iterating

Key takeaways: Use vectorization. Speed profile your code! Don’t assume something is faster because you think it is faster; speed profile it and prove it is faster. The results may surprise you.

gabrielstaples.com/python_iterate_over_pandas_dataframe/#gsc.tab=0

It’s worth the read if you’ve ever wondered about the ABSOLUTE fastest way to work with DataFrames, but below I’ve got the TL;DR version.

But first, a quick recap. Pandas is a library that makes it easy to work with lots of data in a DataFrame, which is like a giant, smart spreadsheet.

Here’s how we create a DataFrame from a dictionary of fake data (2 columns, 3 rows):

import pandas as pd

df = pd.DataFrame({'A': [1, 2, 3], 'B': [4, 5, 6]})

Iterating over the data row by row with something like iterrows() is slow. Here we create a new column C that holds the sum of columns A and B:

# Slow: a Python-level loop that handles one row at a time
for index, row in df.iterrows():
    df.at[index, 'C'] = row['A'] + row['B']
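If a loop is truly unavoidable (say, per-row logic that’s too tangled to vectorize), itertuples() is usually a much faster choice than iterrows(), since it hands you lightweight namedtuples instead of building a pandas Series for every row. Here’s a minimal sketch. It assumes your column names are valid Python identifiers, because the fields are read as attributes like row.A:

```python
import pandas as pd

df = pd.DataFrame({"A": [1, 2, 3], "B": [4, 5, 6]})

# itertuples() yields one namedtuple per row; index=False drops the row label,
# so each tuple only has the fields A and B (this assumes identifier-safe column names)
sums = [row.A + row.B for row in df.itertuples(index=False)]

# Assign the whole column once instead of filling it cell by cell
df["C"] = sums

print(df)
```

Collecting the results in a list and assigning column C in one go also avoids growing the column one cell at a time.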

Here's the pro tip: instead of iterating, we use something called 'vectorization'.

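Vectorization is like giving all the data one group instruction so every value changes at once: pandas hands whole columns to fast, compiled NumPy code instead of stepping through rows one by one in Python.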

df['C'] = df['A'] + df['B']
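Whenever you swap a loop for a vectorized version, it’s worth checking that both give the same answer. Here’s a minimal sketch (df_loop and df_vec are just names for this example). It also surfaces a small gotcha: the iterrows() loop creates column C one cell at a time, so pandas starts the new column out as NaN and it ends up as floats (5.0, 7.0, 9.0), while the vectorized sum stays as integers (5, 7, 9).

```python
import pandas as pd

df_loop = pd.DataFrame({"A": [1, 2, 3], "B": [4, 5, 6]})
df_vec = df_loop.copy()

# Row-by-row version (same loop as above)
for index, row in df_loop.iterrows():
    df_loop.at[index, "C"] = row["A"] + row["B"]

# Vectorized version: one addition over entire columns
df_vec["C"] = df_vec["A"] + df_vec["B"]

# Same values...
print((df_loop["C"] == df_vec["C"]).all())
# ...but different dtypes, because the loop built column C one cell at a time
print(df_loop["C"].dtype, df_vec["C"].dtype)
```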

So whenever speed matters, stick with vectorization (and list comprehensions, which we’ll learn about later) to stay speedy.
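To take the “speed profile it and prove it” advice literally, here’s a minimal sketch that times three approaches with Python’s built-in timeit module: iterrows(), a list comprehension over zipped columns, and vectorization. The 2,000-row random dataset and the function names are made up for this example, and I’m not quoting any numbers because timings depend on your machine and pandas version. Run it and see for yourself.

```python
import timeit

import numpy as np
import pandas as pd

# Hypothetical test data: 2,000 rows of random integers (size picked only for illustration)
rng = np.random.default_rng(seed=0)
df = pd.DataFrame({"A": rng.integers(0, 100, 2_000), "B": rng.integers(0, 100, 2_000)})


def with_iterrows(frame: pd.DataFrame) -> pd.Series:
    """Row-by-row loop: pandas builds a Series object for every row."""
    return pd.Series([row["A"] + row["B"] for _, row in frame.iterrows()], index=frame.index)


def with_list_comprehension(frame: pd.DataFrame) -> pd.Series:
    """Still a Python loop, but over plain column values zipped together."""
    return pd.Series([a + b for a, b in zip(frame["A"], frame["B"])], index=frame.index)


def vectorized(frame: pd.DataFrame) -> pd.Series:
    """One operation over whole columns, executed by pandas/NumPy."""
    return frame["A"] + frame["B"]


approaches = (with_iterrows, with_list_comprehension, vectorized)

# Check correctness before speed: a fast answer is useless if it's a different answer
expected = vectorized(df)
for func in approaches:
    assert func(df).equals(expected), func.__name__

timings = {}
for func in approaches:
    # Best of 3 repeats, each calling the function 3 times; store seconds per call
    timings[func.__name__] = min(timeit.repeat(lambda: func(df), number=3, repeat=3)) / 3

# Print fastest first
for name, seconds in sorted(timings.items(), key=lambda item: item[1]):
    print(f"{name:>25}: {seconds * 1000:.3f} ms per call")
```

Taking the minimum of several repeats is a common way to reduce noise from whatever else your machine is doing at the time.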


get_inspired()

Competence comes in all shapes and sizes, so let’s define it simply:

The ability to get what you want, how & when you want it.

Incompetence is where all the dangerous mistakes are made. Here’s a quick refresher on the Dunning-Kruger effect.

It occurs when someone is not only incompetent but also doesn’t realize how bad they are.

When you review your month or year, what’s that “one thing” you’ve constantly tried to fix, improve, or master without making any progress? There’s likely a cornerstone skill or perspective keeping you stuck at level 1.

byte_break()

The only good thing I’ve seen on threads.net all week 🙂

See you next week 👋

Joel