GenAIClub.ro, artisans, speeding up OpenAI, and staying safe when downloading random models
Hi friends, and welcome to another issue of this newsletter!
Let’s see what’s on the menu — a new GenAI community that’s entirely unrelated to that Gen V series you prolly haven’t seen, a brilliant article comparing the mass-production of digital goods to the mass-production of physical goods, tips on speeding up batches of OpenAI calls, and some safety info for when you just want to try out new PyTorch models.
New Community
Yup, I'm doing a community again, and this time it's focused on GenAI and Romania 🤖🇷🇴. Daniel Costea and I co-founded it, and we're calling it... GenAIClub.ro, b/c, well, you know, we like being obvious.
Our first event is on March 26 -- it has a couple of cool talks (if I do say so myself), plus a panel with several other Microsoft MVPs.
You might want to follow us here. 🤗
Speeding up OpenAI calls
Assumption: You're processing a pandas dataframe (or any sort of list really), which can reasonably fit in memory. Not only that, but you’re doing one or more OpenAI calls for each and every row.
If this sounds like you, then here’s how you can speed this up (in my case, I got a total runtime reduction of about 60%):
- Use the Async* clients (AsyncOpenAI / AsyncAzureOpenAI)
- await all API calls
- Replace pandas.apply with asyncio.gather
In a nutshell, use the code below:
from openai import AsyncOpenAI, AsyncAzureOpenAI
import asyncio
client_async = AsyncAzureOpenAI(...)
chat_response = await client_async.chat.completions.create(
model=model, messages=messages,
)
df.loc[:, 'is_relevant'] = await asyncio.gather(*(add_relevance_async(company['name'], row) for _, row in df.iterrows()))
instead of this:
from openai import OpenAI, AzureOpenAI
client = AzureOpenAI(...)
chat_response = client.chat.completions.create(
model=model, messages=messages,
)
df.loc[:, 'is_relevant'] = df.apply(lambda row: add_relevance(company['name'], row), axis=1)
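If you want to see why gather makes such a difference before touching your own pipeline, here's a self-contained sketch that needs no OpenAI account -- fake_api_call is a made-up stand-in for one chat completion request:

```python
import asyncio
import time

# Hypothetical stand-in for an OpenAI call: each "request"
# spends 0.1 s waiting on I/O.
async def fake_api_call(i):
    await asyncio.sleep(0.1)
    return i * 2

async def sequential(n):
    # One call at a time -- roughly what pandas.apply does.
    return [await fake_api_call(i) for i in range(n)]

async def concurrent(n):
    # asyncio.gather overlaps all the waits.
    return await asyncio.gather(*(fake_api_call(i) for i in range(n)))

start = time.perf_counter()
seq = asyncio.run(sequential(10))
seq_time = time.perf_counter() - start

start = time.perf_counter()
conc = asyncio.run(concurrent(10))
conc_time = time.perf_counter() - start

print(seq == conc)           # True -- same results either way
print(conc_time < seq_time)  # True -- ~0.1 s instead of ~1 s
```

The calls don't run faster individually; they just overlap, so total wall time collapses from the sum of the latencies to roughly the slowest single call.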
(at this point I’m really amazed at how bad Substack’s formatting options are, and am seriously considering showing code as images in the future)
On Artisans
You’re an artisan.
You’ve spent years or decades cultivating your craft, drawing knowledge and inspiration from family, teachers, peers, and artistic heroes. Inspired by them, you learned, practiced, and developed, hoping that someday you can leverage your passion, skills, and creativity into not just a remunerative career, but an emotionally and spiritually fulfilling one.
But then someone invents a way to automate what you do, to have machines carry out the same craft. The results aren’t quite as good as what artisans like you can do, but they are good enough to meet most people’s needs—and, even more damaging to your dreams, they are considerably cheaper. You can only work so fast, and thus need to charge a pretty high price for your work to keep a roof over your head, the lights on, and your stomach full.
The machines work far quicker, producing en masse, and you now justifiably fear you’ll never be able to make your art more than a hobby.
You may or may not agree with Aaron Ross Powell’s article on mass-produced art, but it will make for a good read.
Security, security, security
Recently, JFrog's researchers identified about 100 malicious models hosted on Hugging Face. Wondering how that's possible and/or what you can do about it?
Well, it all starts from PyTorch's default serialization format -- Python pickle files. The issue with pickle files is that they can include EXECUTABLE CODE (because why not), so naturally some people will try to include "executable code that does bad things to your computer" (tm).
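If that sounds abstract, here's a tiny, deliberately harmless demo. The PWNED flag is a made-up marker just to prove the code ran; a real payload would do something far nastier than set a variable:

```python
import builtins
import pickle

class Evil:
    # __reduce__ tells pickle how to rebuild the object -- and it can
    # name *any* callable. Here it runs exec() during unpickling; a
    # real attack would download malware or steal credentials instead.
    def __reduce__(self):
        return (exec, ("import builtins; builtins.PWNED = True",))

payload = pickle.dumps(Evil())   # looks like ordinary serialized data
pickle.loads(payload)            # merely *loading* it runs the code
print(getattr(builtins, "PWNED", False))  # True -- the code executed
```

Note that no method on the object was ever called by us -- pickle.loads alone was enough. That's exactly why "just loading" an untrusted .pt file is already game over.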
This is why, incidentally, running open-source apps that gleefully download tens of gigabytes of models from dozens of obscure HuggingFace accounts with no confirmation whatsoever scares me so much.
One way to mitigate this is to use safe serialization formats -- I'm talking of course about safetensors here -- which are not able to carry executable code BY DESIGN. When using the diffusers library, for example, you should always set use_safetensors=True when loading pipelines, and the library will make sure to download the correct files.
Or, if you find yourself desperately wanting to run that random .pt or .ckpt model that will improve your life tenfold, you can spin up a throwaway CPU VM somewhere, load the .pt with PyTorch, save it as safetensors to purge any executable code, then load the safetensors file back and save it as a fresh .pt.
Still a better experience than getting hacked, I guess.