Episode 3

HockeyStick #3 - Generative AI with Mark Liu

Published on: 15th April, 2024

Exploring the Frontiers of Generative AI with Mark Liu

In this episode of HockeyStick, host Miko Pawlikowski interviews Mark Liu, a finance professor and seasoned coder, to dive into the world of Generative AI. Liu, author of 'Learn Generative AI with PyTorch', shares his journey from finance to AI, emphasizing the importance of coding in modern finance and his transition to Python for teaching analytics. He recounts the creation of his books, starting with Python applications in finance, moving to machine learning and generative AI, and finally to PyTorch for efficient model training. The conversation explores the basics of generative AI, breakthroughs like GANs, transformers, and diffusion models, and Liu's predictions for AI's impact on future jobs and technology development. They discuss the significance of generative AI in various industries, ethical considerations, and the potential directions for Liu's future work, highlighting the critical role of hands-on learning in understanding and leveraging AI technologies.

00:00 Welcome to HockeyStick: The Generative AI Revolution

00:25 Meet Mark Liu: From Finance to AI Pioneer

02:56 The Journey of Learning and Teaching Python

08:00 Exploring Generative AI and PyTorch

10:39 The Magic of Generative AI: From Text to Lifelike Images

14:50 The Impact of AI on Industries and Job Security

21:18 Understanding the Breakthroughs Behind Generative AI

34:16 The Role of Hardware and Data in AI's Rapid Advancement

37:15 Exploring the Versatility of GPUs

38:17 The Future of AI: Predictions and Possibilities

39:49 Navigating the Hype Cycle of AI

42:09 Ethical Considerations and the Future of AI Regulation

45:39 The Impact of AI on Politics and Elections

46:58 The OpenAI Controversy: From Open Source to For-Profit

51:02 PyTorch vs. TensorFlow: Predicting the Future

54:21 A Deep Dive into Generative AI and PyTorch

01:14:22 Future Projects and Closing Thoughts

Transcript
Speaker:

I'm Miko Pawlikowski and this is HockeyStick.

Speaker:

Generative AI is on everyone's mind.

Speaker:

From essays to photorealistic pictures to high quality videos it

Speaker:

has changed the way we think about creativity and intelligence forever.

Speaker:

If the AI won't steal your job, but somebody using AI will, then

Speaker:

the best defense is to learn how this technology works ASAP.

Speaker:

Today, I'm bringing you Mark Liu, the author of Learn Generative AI with

Speaker:

PyTorch, a tenured finance professor and the founding director of the Master

Speaker:

of Science in Finance program at the University of Kentucky and a veteran

Speaker:

coder with over 20 years of experience.

Speaker:

In this conversation, we'll talk about learning through doing, how everybody can

Speaker:

build generative AI models, the various breakthroughs that allowed for the current

Speaker:

AI explosion to take place, and make some wild predictions about the future.

Speaker:

Welcome to this episode and please enjoy.

Speaker:

How are you doing today?

Speaker:

Pretty good.

Speaker:

Thank you Miko.

Speaker:

glad to be here.

Speaker:

Yeah, I'm very excited.

Speaker:

not only because I'm hoping to learn so many interesting things from

Speaker:

your book, but also because I'm very curious, how does somebody who's a

Speaker:

founding director of a master of science in finance and a tenured professor

Speaker:

in finance, decide to go into AI.

Speaker:

Tell us a little bit about your story.

Speaker:

it goes back to, like five years ago, in 2017, our department wanted to launch

Speaker:

a Master of Science in Finance program.

Speaker:

And at that point, I had been tenured for about five years.

Speaker:

I was always, very adventurous, trying to do new things.

Speaker:

I was appointed the founding director to start an academic

Speaker:

graduate program from scratch.

Speaker:

And, I was very much into it.

Speaker:

it was a lot of work.

Speaker:

But I thoroughly enjoyed it.

Speaker:

So our program launched in fall of 2017.

Speaker:

And it's a one year program.

Speaker:

at the end of 2017, We started to, place our students.

Speaker:

the very first year we had 30 students in the program, which is a great number.

Speaker:

And, I talked to many employers, many companies, trying to

Speaker:

place our MS Finance students.

Speaker:

I heard the same thing again and again.

Speaker:

they told me that they want somebody who not only knows finance, but also

Speaker:

knows coding programming analytics and the number one programming language

Speaker:

in finance is Python and I've been doing programming for many years

Speaker:

but those were mainly statistical software packages to run regressions

Speaker:

for the finance research.

Speaker:

And then I had to learn Python from scratch in order to teach my students.

Speaker:

And it turns out that Python is a very user-friendly programming language,

Speaker:

so even if you never programmed before, you can guess what a block

Speaker:

of code is trying to accomplish.

Speaker:

I started to run Python workshops to MS finance students and gradually I

Speaker:

accumulated a lot of teaching notes and I also had to convince my students to

Speaker:

use Python, because some of the students said that, "I can do everything in

Speaker:

Excel, why should I learn Python", right?

Speaker:

And then I told them that, Excel is not exactly a programming language,

Speaker:

and you do need a programming language in order to automate things

Speaker:

to make things more convenient, to build bigger programs, that kind of stuff.

Speaker:

So what I did was I started to create fun projects in finance, like speech

Speaker:

recognition and text to speech.

Speaker:

So one example would be I add those features to a finance calculator.

Speaker:

what you can do is that you can actually speak to a computer, and ask the

Speaker:

computer to do a finance calculation.

Speaker:

you can tell the program in a human voice "what is the present value of

Speaker:

$1000 in five years", And then the program will do the calculation and

Speaker:

tell you the answer in a human voice.
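For anyone curious, here is a minimal sketch of the kind of calculation behind that voice calculator example. The 5% discount rate and the pyttsx3 text-to-speech library are assumptions for illustration, not necessarily what Mark's project uses.

```python
# Sketch of the present value calculation behind the voice calculator
# example; the 5% rate and the pyttsx3 library are assumptions.
import pyttsx3

def present_value(future_value, rate, years):
    # PV = FV / (1 + r)^n
    return future_value / (1 + rate) ** years

pv = present_value(1000, 0.05, 5)
answer = f"The present value of 1000 dollars in five years is {pv:.2f} dollars."

engine = pyttsx3.init()   # offline text-to-speech engine
engine.say(answer)        # queue the spoken answer
engine.runAndWait()       # speak it out loud
```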

Speaker:

and then that caught a student's attention.

Speaker:

So I started to do those kind of applications.

Speaker:

And then after a year or so, I had plenty of projects.

Speaker:

And then some students told me "you should write a book about it".

Speaker:

So I started to send the manuscript to No Starch Press to publish the book.

Speaker:

The moment my colleagues, or my students, or a lot of my friends, even my family

Speaker:

members, heard that I was writing a programming book in Python about the

Speaker:

speech recognition and the text to speech, their first reaction was, "I

Speaker:

thought you were a finance professor"

Speaker:

that question came up again and again.

Speaker:

And then I gave them a famous quote by a chief risk officer from Deutsche Bank.

Speaker:

"banks are essentially technology firms now".

Speaker:

So there is a lot of truth in that because in order to be in the field of finance,

Speaker:

you need to know a lot of technology, know programming, know, analytics and so forth.

Speaker:

So that was my first book.

Speaker:

I wrote it in 2020, and it was finally published in 2021.

Speaker:

So I think I, signed a contract with them in 2019.

Speaker:

And then after that, I

Speaker:

started to teach a course in the MS Finance program.

Speaker:

So it's called, Python, predictive analytics.

Speaker:

so using Python to build machine learning models

Speaker:

for business analytics, and, I started to, teach students a lot of machine learning

Speaker:

models, including, deep neural networks.

Speaker:

And then, again, I, accumulated a lot of, notes.

Speaker:

And then,

Speaker:

I came across a video from DeepMind, showing how you can actually play

Speaker:

Atari games like, Breakout, by training a computer program to play

Speaker:

the game, at a superhuman level.

Speaker:

So what happened was, not only did the computer program learn to play

Speaker:

the game, it actually figured out a way to score very efficiently, a

Speaker:

way human beings didn't know before.

Speaker:

So you, dig a tunnel at the side of the wall, and then you send

Speaker:

the ball to the back of the wall to score it very efficiently.

Speaker:

When I saw that video, I was completely amazed.

Speaker:

I told myself, "I gotta figure out how this works".

Speaker:

I spent several months experimenting with different kinds of programs,

Speaker:

trying to figure out how it works.

Speaker:

And eventually I figured it out.

Speaker:

And that became my second book.

Speaker:

it's Machine Learning, Animated.

Speaker:

So it's published with CRC Press, last year.

Speaker:

And then, recently, once, ChatGPT was out, generative AI was very popular.

Speaker:

I was very curious.

Speaker:

I was trying to figure out how exactly a large language model

Speaker:

works, and how a computer program can understand the human language.

Speaker:

I spent a lot of time trying to figure it out.

Speaker:

Before, I was actually using TensorFlow.

Speaker:

It worked pretty well for me with Atari games and so on and so forth.

Speaker:

apparently it's not great in terms of GPU training.

Speaker:

You can do GPU training, but there is an overhead.

Speaker:

So you have to program everything in CPU and then send it to the GPU.

Speaker:

Do the calculation and then send it back.

Speaker:

the overhead is just too much.

Speaker:

So it ended up, not very fast.

Speaker:

then I learned another AI framework called PyTorch.

Speaker:

you can explicitly send a tensor to GPU to do the calculation and so on and so forth.

Speaker:

It's a little more complicated than TensorFlow because you do have to send

Speaker:

something to GPU and then, get it back.

Speaker:

So in terms of coding, you have to do a slightly more work, but in

Speaker:

terms of performance, it's amazing.

Speaker:

So I get to, train models.

Speaker:

7 to 10 times faster, compared to CPU training.
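For reference, a minimal sketch of the explicit device placement being described, using the standard PyTorch API (illustrative, not code from the book):

```python
# Move data to the GPU, compute there, then bring the result back.
import torch

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

x = torch.randn(1024, 1024)          # tensors start on the CPU
w = torch.randn(1024, 1024)

x_gpu = x.to(device)                 # explicitly send tensors to the GPU
w_gpu = w.to(device)

y_gpu = x_gpu @ w_gpu                # matrix multiply runs on the GPU
y = y_gpu.cpu()                      # get the result back on the CPU
```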

Speaker:

as all those large language models, they have billions or hundreds

Speaker:

of billions of parameters, right?

Speaker:

So the speed is crucial.

Speaker:

Right now, I'm training models with millions of parameters,

Speaker:

which is fine.

Speaker:

As for even larger language models, there's my third book, which

Speaker:

is with Manning Publications.

Speaker:

So in this book, I'm doing generative AI with PyTorch.

Speaker:

the reason I switched to PyTorch is because of dynamic, computing,

Speaker:

graph, and then, the GPU training.

Speaker:

I can train most models in a matter of minutes.

Speaker:

Sometimes, for the larger ones, maybe a couple of hours.

Speaker:

That's it.

Speaker:

I can see the model in action and then I can tune the model

Speaker:

so that's the third book.

Speaker:

So let me, conclude by quickly summarizing what I'm doing in the third book.

Speaker:

the name, I think you just mentioned at the beginning.

Speaker:

Learn Generative AI with PyTorch.

Speaker:

Readers learn to create generative AI models from scratch, to create the

Speaker:

different contents like, images, shapes, numbers, text, music, sound, so forth,

Speaker:

all with PyTorch and deep learning models.

Speaker:

And in particular,

Speaker:

readers learn how to create

Speaker:

a ChatGPT-style transformer from scratch, and then in particular, I teach

Speaker:

readers how to create a GPT-2 XL with 1.5 billion parameters. Of course, with 1.5

Speaker:

billion parameters, it's very hard to train, right?

Speaker:

It's very slow, number one.

Speaker:

Number two, GPT-2 was trained with huge amounts of data, and regular readers don't

Speaker:

have access to this training data, right?

Speaker:

but, I also teach readers how to extract the pre trained weights from

Speaker:

OpenAI and then you load those weights into the GPT-2 model you created from

Speaker:

scratch, and start to generate the text.
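As an illustration of the same idea, here is a sketch that loads the pretrained GPT-2 XL weights and generates text, using the Hugging Face transformers library. The book builds the model from scratch and loads the weights into it, so this library shortcut is only an assumption for demonstration.

```python
# Load pretrained GPT-2 XL (1.5B parameters) weights and generate text.
from transformers import GPT2LMHeadModel, GPT2Tokenizer

tokenizer = GPT2Tokenizer.from_pretrained("gpt2-xl")
model = GPT2LMHeadModel.from_pretrained("gpt2-xl")

inputs = tokenizer("The future of generative AI is", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=40, do_sample=True, top_k=50)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```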

Speaker:

So the text you generate is very coherent without grammar errors,

Speaker:

It's amazing. Of course it's not as powerful as ChatGPT or GPT-4, but

Speaker:

a normal person without access to super computing facilities, without access

Speaker:

to larger amounts of training data can create a ChatGPT-style deep neural network

Speaker:

from scratch, and use it to generate text and lifelike music.

Speaker:

It's amazing.

Speaker:

And that's the text part.

Speaker:

on the image part, you can create like a color image.

Speaker:

You can also convert a horse to a zebra.

Speaker:

You can convert blonde hair to black hair in images.

Speaker:

You can add or remove glasses in images and so forth.

Speaker:

So the whole experience is amazing.

Speaker:

it worked better than anticipated.

Speaker:

And that's a whole experience.

Speaker:

It reminded me of the famous quote, "any sufficiently advanced technology is

Speaker:

indistinguishable from magic".

Speaker:

The whole thing is really magic.

Speaker:

That's my long answer to your question.

Speaker:

Thank you for that.

Speaker:

just for anybody who's not familiar with Manning, the book, is currently

Speaker:

available in what's called MEAP.

Speaker:

That's for Manning Early Access Program, you can read the chapters

Speaker:

as, they are produced, by Mark.

Speaker:

So at the moment there are five chapters available, but I'm being told

Speaker:

that 11 will be coming very soon.

Speaker:

And the estimated time for the whole book to be available is May

Speaker:

2024, so for anybody who's eager and who might be thinking that the

Speaker:

book is not finished yet, you can actually start reading it right now.

Speaker:

speaking of the magic and the building from scratch, I think what I liked the

Speaker:

most about your book, and what initially attracted me to actually go and read it,

Speaker:

It's that 'build from scratch' thing.

Speaker:

And I love that you used Richard Feynman's philosophy, the quote, "What

Speaker:

I cannot create, I do not understand".

Speaker:

I think that's a very good motto to live by.

Speaker:

it's absolutely great that, you take us on this journey to build

Speaker:

things up, even though I've only read the five chapters so far.

Speaker:

all of a sudden with ChatGPT, everybody started talking

Speaker:

about this and this explosion.

Speaker:

what were some other moments, other than ChatGPT, where you realized,

Speaker:

Oh man, this is going to blow up.

Speaker:

This is going to be massive with generative AI.

Speaker:

I believe you mentioned the Writers Guild of America versus AI story.

Speaker:

Can we talk about that for a minute?

Speaker:

before I answer that question, I encourage you to read my chapter one for free,

Speaker:

you don't even have to buy my book.

Speaker:

manning has a great feature.

Speaker:

If you go to manning.com and if you look for my book, Learn Generative

Speaker:

AI with PyTorch, you can find it.

Speaker:

I have a fairly long chapter one summarizing the state of the art

Speaker:

in generative AI and also what I've been doing in the book.

Speaker:

What Miko talked about, the Writers Guild of America.

Speaker:

So a few months ago, they, negotiated with, big firms.

Speaker:

about the threat of AI.

Speaker:

And as a result, there's a contract to limit how much AI you can use

Speaker:

in writing, in production, in order to protect the jobs of the writers.

Speaker:

And, this is just one example of the, Disruptive power of AI

Speaker:

in many different industries.

Speaker:

Writers are just one example; it threatens many other industries.

Speaker:

Another example is Chegg, which is an online educational platform.

Speaker:

So college students go there to get tutoring service and so forth,

Speaker:

and with the ChatGPT actually their business model is threatened, right?

Speaker:

I think, in the month after the release of ChatGPT, their, stock

Speaker:

price plunged by almost 40%.

Speaker:

So that's how serious the, competition is.

Speaker:

Those are just, a couple of examples.

Speaker:

the potential of generative AI is huge, but at the same time, if you don't,

Speaker:

catch up with the trend, there is,

Speaker:

a risk that your job might be replaced by AI.

Speaker:

there is a, an interesting quote.

Speaker:

I think there is a lot of truth.

Speaker:

It says that, "AI will not take your job.

Speaker:

somebody using AI will".

Speaker:

So I think there is a lot of truth in that.

Speaker:

So in order to avoid being replaced by AI, I think the best

Speaker:

strategy is to get in the game.

Speaker:

to learn about generative AI, to protect yourself in terms of your future career.

Speaker:

so that's,

Speaker:

the big motivation, behind my books.

Speaker:

the main motivation, of course, is intellectual curiosity.

Speaker:

I'm by nature a very curious person.

Speaker:

So when I saw like ChatGPT works like magic, I really want

Speaker:

to get to the bottom of it.

Speaker:

and try to figure out how it works.

Speaker:

So that's the main reason.

Speaker:

But at the same time, I'm trying to teach my students.

Speaker:

programming skills, machine learning skills, AI skills, generative AI skills in

Speaker:

order to prepare them for the job market.

Speaker:

so that, in the future, their skill sets will not be outdated.

Speaker:

that's my second motivation for writing the books.

Speaker:

Do you buy into this comparison that AI is like personal computers?

Speaker:

And that, a lot of people were worried about how personal computers

Speaker:

were going to just remove jobs.

Speaker:

But what ended up happening was, some, small portion of jobs was eliminated,

Speaker:

but most of the jobs were modified, and became, operating computers.

Speaker:

Do you think that's the most apt comparison of what we're likely to

Speaker:

experience with AI in the coming years?

Speaker:

the future, is hard to predict, but personally, I think, most likely, that's

Speaker:

what's going to happen in the near future.

Speaker:

if you can actually use generative AI to increase your productivity,

Speaker:

you'll have more job opportunities.

Speaker:

On the other hand, if you, basically, completely stay away from it, your

Speaker:

skill sets might be outdated but at the same time, I think technology

Speaker:

will make all this AI stuff more accessible to most people, right?

Speaker:

You don't necessarily have to be a programmer, so one

Speaker:

example is Midjourney right?

Speaker:

you can actually just go to a browser and then you can use Midjourney

Speaker:

or DALL-E 2, DALL-E 3, or whatever to create very fancy images.

Speaker:

You can use a text prompt to create

Speaker:

an image of what you mean; you don't have to draw it yourself,

Speaker:

in that sense, I'm optimistic.

Speaker:

I think for most people, generative AI will be a very valuable tool

Speaker:

to increase their productivity.

Speaker:

as long as, you keep up with the technology,

Speaker:

I'm glad you mentioned Midjourney because I think for me personally, that was where

I realized:

'okay, this is the hockeystick moment' because I remember the little

I realized:

tiny, blurry pictures from the GAN paper.

I realized:

and then all of a sudden I saw some pictures that were generated by

I realized:

Midjourney and I went and I, I tried it myself and, it was more or less able

I realized:

to produce almost everything I threw at it, other than some particular types of

I realized:

dinosaurs that it just didn't recognize.

I realized:

That was like the one thing I knew, 'okay, they didn't train

I realized:

it on that kind of dinosaur'.

I realized:

But, that was definitely one of those moments where I realized, wow.

I realized:

And the other is, I think, I live in London, one way or another, you end

I realized:

up using the tube a lot, and, usually you're annoyed at people who, play

I realized:

some music on like public transport.

I realized:

And then, at some point I realized that I was getting annoyed at people

I realized:

talking about generative AI, on the public transport and making noise.

I realized:

And that's when you realize that, 'okay, so this has now gone, mainstream and,

I realized:

and everybody's talking about that'.

I realized:

But let's talk a little bit about, The actual underlying breakthroughs,

I realized:

that brought us to where we are.

I realized:

And, in particular, I'm thinking about GAN, the generative adversarial

I realized:

networks and transformers and diffusion.

I realized:

where should we start?

I realized:

what's the first important breakthrough that everybody should know about?

I realized:

I think, all the generative AI models, in my book are deep neural networks.

I realized:

machine learning is a very wide field.

I realized:

there are many traditional machine learning models, random forest,

I realized:

linear regressions, this and that, but about, 20 years ago, deep neural

I realized:

networks became very powerful.

I realized:

one great thing about the neural networks is that you can scale it and, deep neural

I realized:

network can approximate any relationship, even if we human beings don't know what's

I realized:

the exact relationship, as long as you create a large enough model to capture it.

I realized:

so that's the foundation 20 years ago.

I realized:

And then over the past 20 years or so, many people made

I realized:

breakthroughs in the deep learning field. Now, let's talk about ChatGPT.

I realized:

Okay, so ChatGPT is a huge deep neural network trained on huge amounts of data.

I realized:

And before that, state of the art, natural language processing models

I realized:

are recurrent neural networks.

I realized:

So how it worked was, it progresses along the timeline.

I realized:

Let's say you have a sentence like, this is a sentence, right?

I realized:

So you have like four words in the sentence, right?

I realized:

the model uses the first, word, "this" to predict the second word "is" and then

I realized:

it uses the first two words to predict the third word, and so on and so forth.

I realized:

it worked to some degree, but it's very slow because, you have

I realized:

to, predict one word at a time.
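A minimal sketch of that sequential, one-word-at-a-time loop, using a toy PyTorch LSTM. The vocabulary, layer sizes, and token ids are made up for illustration.

```python
# A recurrent model predicts the next word one step at a time.
import torch
import torch.nn as nn

vocab_size, embed_dim, hidden_dim = 100, 32, 64
embed = nn.Embedding(vocab_size, embed_dim)
rnn = nn.LSTM(embed_dim, hidden_dim, batch_first=True)
head = nn.Linear(hidden_dim, vocab_size)

tokens = torch.tensor([[1, 7, 3, 9]])          # "this is a sentence" as toy ids
hidden = None
for t in range(tokens.size(1) - 1):
    # feed one word at a time: words 0..t are used to predict word t+1
    out, hidden = rnn(embed(tokens[:, t:t+1]), hidden)
    next_word_logits = head(out[:, -1])         # scores for the next word
```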

I realized:

And then in 2017, there is a huge breakthrough.

I realized:

There's a paper.

I realized:

called "attention is all you need" by a group of, Google scholars,

I realized:

and they used a different mechanism to capture the, relationship of

I realized:

different words in a sentence.

I realized:

So it's called the attention mechanism, and it's much more effective. On top of that,

I realized:

it's not sequential.

I realized:

Which means one word can pay attention to all

I realized:

other words at the same time.

I realized:

And this allows for, parallel training.

I realized:

And this has huge implications.

I realized:

number one, it works better in terms of capturing long-term relationships.

I realized:

between different words in a sentence so that you can understand the meaning of

I realized:

a long sentence, long text, number one.

I realized:

Number two, because of the non-sequential nature of the attention mechanism,

I realized:

You can use parallel training.

I realized:

you can train the same model on many different devices.

I realized:

this makes training much faster.

I realized:

And this also allows you to train the model on more data.

I realized:

that's why ChatGPT became so powerful, because, you can train them much faster,

I realized:

and then you can train them on more data.

I realized:

On top of that, the mechanism works much better than recurrent neural

I realized:

networks, because it can capture really long term relationships in a sequence,

I realized:

like as a text is a sequence, right?
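For reference, a minimal sketch of scaled dot-product self-attention, the mechanism from "Attention Is All You Need" being described here. The toy sizes and random weights are for illustration only.

```python
# Every word attends to every other word at once: no sequential loop.
import torch
import torch.nn.functional as F

seq_len, d_model = 4, 8                      # e.g. "this is a sentence"
x = torch.randn(seq_len, d_model)            # toy word embeddings

Wq = torch.randn(d_model, d_model)           # query, key, value projections
Wk = torch.randn(d_model, d_model)
Wv = torch.randn(d_model, d_model)

Q, K, V = x @ Wq, x @ Wk, x @ Wv
scores = Q @ K.T / d_model ** 0.5            # every word scored against every word
weights = F.softmax(scores, dim=-1)          # attention weights, rows sum to 1
out = weights @ V                            # weighted mix of all words, computed in parallel
```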

I realized:

that propelled, uh, OpenAI to have all these models, including ChatGPT.

I realized:

now let's go to, the recent development, the text to image transformers.

I realized:

this is a new innovation in transformer models called, multimodal models.

I realized:

The original transformer model, "attention is all you need", which powers the

I realized:

ChatGPT, only uses text, right?

I realized:

So the input is a sequence of text, the output is also a sequence of text, but

I realized:

the multimodal models, the input and output can be, different formats, right?

I realized:

In DALL-E 2 and DALL-E 3, the input is text and the output is an image, right?

I realized:

you can have a different, inputs, outputs.

I realized:

You can have audio, you can have video, Sora has videos, that kind of stuff.

I realized:

but let's talk about what is the underlying mechanism

I realized:

behind multi modal models.

I realized:

DALL-E 2, DALL-E 3, it has something to do with diffusion models.

I realized:

So I think you mentioned that, at first the generated image is very grainy, right?

I realized:

Diffusion models add noise to an image gradually.

I realized:

let's say there are like 1000 time steps.

I realized:

And then at each time step, you can actually add a little bit of noise

I realized:

to the image and gradually you have a 1000 different images and each one

I realized:

becomes progressively noisier and at the end, it becomes completely noisy.

I realized:

And then what you can do is that you can give those images to a

I realized:

machine learning model and you can train the model to remove those

I realized:

noises, progressively, step by step.

I realized:

that's how, DALL-E and all those text to image models work.

I realized:

first step is that you use a text prompt to generate a very grainy image, and

I realized:

then after that you use a model which is very much like a diffusion model.

I realized:

You progressively refine it so that you turn a very grainy

I realized:

image into a high resolution image.

I realized:

That's why, when you enter a short prompt, DALL-E 2 can

I realized:

give you a higher resolution image.

I realized:

capturing what you are trying to produce in the text prompt.

I realized:

So that's actually chapter 14 of my book.

I realized:

I'm going to talk about how you can add a little bit of noise to

I realized:

the image, one step at a time.

I realized:

And then you can use those, images to train the model to remove the noise step

I realized:

by step progressively, and very much like, DALL-E 2 trying to, make the image clearer

I realized:

and clearer step by step progressively.
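A minimal sketch of the forward "add a little noise at each step" process used to build training data for a diffusion model. The noise schedule values are illustrative assumptions, not the book's exact settings.

```python
# Gradually noise an image over T steps; a denoising model is then
# trained to reverse this process step by step.
import torch

T = 1000                                   # number of noising steps
betas = torch.linspace(1e-4, 0.02, T)      # how much noise to add at each step
alphas_bar = torch.cumprod(1.0 - betas, dim=0)

def noisy_image(x0, t):
    """Return the image after t steps of gradual noising."""
    noise = torch.randn_like(x0)
    return alphas_bar[t].sqrt() * x0 + (1 - alphas_bar[t]).sqrt() * noise

x0 = torch.rand(3, 64, 64)                 # a clean (toy) image
x_mid = noisy_image(x0, 500)               # partly noisy
x_end = noisy_image(x0, T - 1)             # almost pure noise
```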

I realized:

Generative adversarial networks, which was an interesting

I realized:

development, from Ian Goodfellow.

I realized:

How does that fit into the rest of what you just described?

I realized:

Generative Adversarial, Networks, so it's great at generating

I realized:

different forms, of content.

I realized:

a lot of times when readers learn something, if you give them the end

I realized:

product, it's too complicated, right?

I realized:

So they may get frustrated and they just give up.

I realized:

as an author, my job is how to make sure that readers stay engaged throughout

I realized:

the book and never get tired, never get frustrated, and gradually learn and

I realized:

finally learn to build state-of-the-art generative machine learning

I realized:

models, like a ChatGPT-style transformer, to generate text and audio, right?

I realized:

So what is the idea behind the GANs?

I realized:

You have two networks.

I realized:

One is a generator network.

I realized:

The other one is a discriminator network, so the job of the generator is trying

I realized:

to generate a piece of work similar to that from the training data set.

I realized:

let's use a grayscale image as an example, right?

I realized:

you have a training dataset of grayscale images of,

I realized:

handwritten digits, like 0 to 9.

I realized:

And then, those are the real images.

I realized:

And then you will ask the generator to generate something similar to

I realized:

that, so that it can pass as real in front of the discriminator.

I realized:

before you train the model, the generator is terrible.

I realized:

So whatever the generator generates is complete gibberish.

I realized:

it's like a snowflake on a screen, that kind of stuff.

I realized:

But, this is where training, comes in.

I realized:

you will have a training loop, and then, in each iteration,

I realized:

you will ask the generator to generate a bunch of fake images.

I realized:

At the same time, you also have a bunch of real images from the training set and

I realized:

you give all those to the discriminator and ask the discriminator to determine

I realized:

whether each image is real or fake

I realized:

And then the generator's job is trying to create an image so that the

I realized:

discriminator would think it's real.

I realized:

that's the generator's objective.

I realized:

So therefore you have a loss function, and then you train the model.

I realized:

You gradually fine tune the model parameters so that in the next

I realized:

iteration, whatever image generated by the generator will have a higher

I realized:

probability of passing as real.

I realized:

And then you do this again and again, you can do the thousands of iterations.

I realized:

And, if you do that, long enough, then eventually the generator will

I realized:

be able to create an image identical to the image from the training set.
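A minimal sketch of one GAN training iteration as just described. G and D are assumed to be already-defined PyTorch models, with D ending in a sigmoid so it outputs a probability of "real".

```python
# One round of the two-network game: train D to tell real from fake,
# then train G to fool D.
import torch
import torch.nn.functional as F

def gan_step(G, D, real_images, opt_g, opt_d, noise_dim=100):
    batch = real_images.size(0)

    # 1) Train the discriminator: real images -> 1, fake images -> 0.
    z = torch.randn(batch, noise_dim)
    fake_images = G(z).detach()
    d_loss = F.binary_cross_entropy(D(real_images), torch.ones(batch, 1)) + \
             F.binary_cross_entropy(D(fake_images), torch.zeros(batch, 1))
    opt_d.zero_grad(); d_loss.backward(); opt_d.step()

    # 2) Train the generator: try to make D label its fakes as real.
    z = torch.randn(batch, noise_dim)
    g_loss = F.binary_cross_entropy(D(G(z)), torch.ones(batch, 1))
    opt_g.zero_grad(); g_loss.backward(); opt_g.step()
    return d_loss.item(), g_loss.item()
```

Repeated over thousands of iterations, this is the loop that gradually makes the generator's images pass as real.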

I realized:

So that's how a GAN works: you have a zero-sum game, you have a competitive

I realized:

kind of two networks competing with each other, trying to outsmart

I realized:

each other and eventually, the generator gets better and better.

I realized:

So that's the idea behind GANs, it's a revolutionary idea.

I realized:

in 2014, 2015, Ian Goodfellow and his co authors proposed the model.

I realized:

A great thing about the model is that it can generate different content: numbers,

I realized:

images, shapes, even music, and so on and so forth.

I realized:

I love this idea because on top of that, you've got this

I realized:

built-in target point, right?

I realized:

When your discriminator can no longer discriminate between

I realized:

what you're generating and the real thing, you know

I realized:

when you're finished. It's not arbitrary.

I realized:

You've got that.

I realized:

And the other reason why I love that is that it's got this anecdote attached

I realized:

to it that, legend has it, it was written one evening, when Ian was

I realized:

celebrating in a pub I think someone was graduating, some fellow students.

I realized:

And, they were discussing a problem when they wanted to generate some pictures.

I realized:

And he came up with this idea that, 'oh, what you're suggesting is too

I realized:

complicated, and you should pit two networks against each other'.

I realized:

And they laughed.

I realized:

he went home and, still slightly drunk.

I realized:

he wrote a proof of concept of that.

I realized:

And then it turned out that it actually worked.

I realized:

I think in one of the interviews later, he said that if he wasn't drunk,

I realized:

he probably wouldn't have done it because it sounded like a silly idea.

I realized:

Okay.

I realized:

Yeah, that's right.

I realized:

Yeah.

I realized:

how random some of those things are.

I realized:

How weird and unpredictable.

I realized:

And I think one of the things I wanted to ask you about is also

I realized:

what made all of those kind of recent breakthroughs possible?

I realized:

what was missing?

I realized:

Because we've had the neural network since what the 80s or something like that.

I realized:

all of a sudden, it looks like in the last few years, or maybe last decade or

I realized:

so, it was just like one breakthrough after another breakthrough just dropping.

I realized:

And if you try to keep up with currently written papers on AI,

I realized:

there's just so many of them.

I realized:

And it looks like every other day, there's something super interesting that's been

I realized:

developed and it's literally hard to keep up just with other people's ideas.

I realized:

What do you think enabled this kind of explosion in the recent years?

I realized:

Actually, neural networks were proposed even earlier than the 1980s.

I realized:

I think in 1960s, researchers proposed artificial neural networks, basically

I realized:

modeled after the human brain. The idea was a great one, but at that point, we

I realized:

didn't have the hardware to support it. And then, starting in the 1990s and early 2000s,

I realized:

the hardware became much more powerful, number one.

Number two:

there was more research, more breakthroughs in the research

Number two:

field of, artificial neural networks.

Number two:

so one example is LeCun's convolutional neural networks.

Number two:

most neural networks are fully connected, dense neural networks,

Number two:

which means, a neuron in the previous layer is connected to all the neurons

Number two:

in the next layer, and it works great.

Number two:

Except that once your model becomes larger, the number of parameters,

Number two:

grow exponentially, and then it's very hard to train it, right?

Number two:

So that's a problem.

Number two:

convolutional neural networks is, you localize the weights, okay?

Number two:

You have a filter, and the weights in the filter are fixed when you move

Number two:

the filter on an image, and then this greatly reduced the number of parameters.

Number two:

it makes, computer vision much more efficient.

Number two:

because of that in, Early 2000s, there were a lot of breakthroughs in computer

Number two:

vision, in convolutional neural networks, and I think that's a huge breakthrough.
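A quick illustration of the parameter savings being described: a fully connected layer on a 64x64 image versus a single 3x3 convolutional filter.

```python
# Dense layer vs. one small convolutional filter on a 64x64 image.
import torch.nn as nn

dense = nn.Linear(64 * 64, 64 * 64)               # every pixel connected to every pixel
conv = nn.Conv2d(1, 1, kernel_size=3, padding=1)  # one 3x3 filter slid over the image

count = lambda m: sum(p.numel() for p in m.parameters())
print(count(dense))   # 16,781,312 weights and biases
print(count(conv))    # 10 (3*3 weights + 1 bias)
```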

Number two:

And then

Number two:

after that, you also have, GPU training.

Number two:

GPU training became very popular in the past maybe 10 years or So.

Number two:

And that's a huge game changer, because as deep neural networks became larger

Number two:

and larger, It's very hard to train them, without, extra help, right?

Number two:

When you train on CPU.

Number two:

CPU is a general purpose kind of processor.

Number two:

you have to do many things on it.

Number two:

But, GPU is specialized.

Number two:

So you can do machine learning jobs much faster.

Number two:

and of course, we also have more and more.

Number two:

training data available, and that also is necessary for

Number two:

large language models to work.

Number two:

it takes time, but I think, the past 20 years or so, we suddenly have,

Number two:

everything come together to make it work,

Number two:

basically, we've got gamers to thank for their breakthroughs in AI

Number two:

because of the graphic cards, the GPUs that they requested, right?

Number two:

you have a very good point, I think GPU was originally designed

Number two:

for gaming purpose, right?

Number two:

And then suddenly right now, it has a completely different purpose, And

Number two:

I have several GPUs at home, not very powerful, but I think they're powerful enough

Number two:

for me to experiment on different models.

Number two:

It costs maybe several hundred dollars, thousand dollars.

Number two:

I have three of them.

Number two:

Two of them are from my son.

Number two:

My son was playing video games.

Number two:

And then now he doesn't use those computers anymore.

Number two:

And then he just gave it to me.

Number two:

And then I just simply take them out and use it for my own,

Number two:

But the cost is not that much.

Number two:

the cost is not that much unless you go for like the top of the line

Number two:

80 gig ones, which are very hard to come by and also quite expensive.

Number two:

Yeah, so thank you gamers.

Number two:

Thank you for enabling the AI revolution in many ways.

Number two:

it goes back to what I was saying about how random some

Number two:

of these things seem to be.

Number two:

so where do you think, we're heading?

Number two:

Like you said, the future is notoriously difficult to predict, obviously.

Number two:

But, if you were still going to venture and make a guess, that will

Number two:

probably prove completely wrong a few years down the line, where do you

Number two:

think we're heading with all of this?

Number two:

If I had to venture a guess, the large language models will become even

Number two:

more powerful in the near future, not only in terms of generating, cohesive

Number two:

text, but also generating images, generating videos and also Multimodal

Number two:

models will become very popular.

Number two:

Okay, you can generate not only images, text, you can also generate

Number two:

audio, video, sound, and so forth.

Number two:

other than that, I think, it really depends on, which breakthrough will

Number two:

come through in the near future.

Number two:

And you never know, maybe one day suddenly there's a huge breakthrough, and

Number two:

then it'll completely change the landscape of AI, just like what

Number two:

ChatGPT did a couple years ago, right?

Number two:

the future is very exciting, but at the same time, like you

Number two:

said, it's very hard to predict.

Number two:

But, I think right now is a very fortunate time, a very exciting

Number two:

time for, tech enthusiasts.

Number two:

for anybody who is passionate about ai, about technology, is very exciting.

Number two:

So two follow up questions then.

Number two:

one it's, like anything else, there are these fashion waves

Number two:

that kind of, come and go.

Number two:

and AI is now the latest hottest thing.

Number two:

So all the VCs, everybody's throwing money at it.

Number two:

But at some point people will probably move on to the next thing, just like

Number two:

they did with crypto and smartphones and internet and whatever else before, right?

Number two:

So I'm wondering, where do you think we are in that, hype cycle, and what's

Number two:

going to happen when all of a sudden slapping AI-first on your startup, no

Number two:

longer makes sure that you get funding.

Number two:

So that's question number one, follow up.

Number two:

and then the second question is, if you were to plot, a graph of how you

Number two:

expect, the large language models to continue developing, I think we

Number two:

can all agree that there are some kind of like very exponential growth

Number two:

where somebody figured out, ChatGPT or one of those massive models.

Number two:

If you throw enough data at it, and you massage it for long enough, you

Number two:

can create this impression of, 'oh, this is magic, how on earth is that

Number two:

even happening?' But then, at some point it has to plateau, right?

Number two:

it's not possible for it to go, at that kind of speed, into the sky.

Number two:

Feeling.

Number two:

Again, it's hard to predict the sense.

Number two:

course, all the usual disclaimers about predictions, but what's your take

Number two:

on what it means about us as humans?

Number two:

Does it mean that what we, cherish as one of the unique capabilities

Number two:

of humans, the human intelligence?

Number two:

it's not actually all that unique, because it's hard to not have this

Number two:

feeling when you talk to one of those big large language models

Number two:

and, during the time it doesn't go haywire and start behaving weird,

Number two:

but on the times where it works well.

Number two:

It's really hard to not have this impression that you're talking to somebody

Number two:

with, some amount of intelligence to it.

Number two:

So does it mean that we're all some kind of statistical models and the

Number two:

intelligence that we demonstrate is also an emerging property?

Number two:

What's your take on that?

Number two:

I don't think, many people in the world right now have a

Number two:

good answer to that question.

Number two:

that said, I do want to point out that there are many people

Number two:

right now who have concerns about AI.

Number two:

Because of the potential damage it can do, so it's all about the objective

Number two:

function. So if you give a task to the model, in terms of the loss function,

Number two:

and then you can just try it again and again, and eventually it will become

Number two:

very good at, whatever objective you want the model to do so that is good,

Number two:

but at the same time, it can be bad the AI may not even know it, right?

Number two:

It's just trying to accomplish a certain goal.

Number two:

It just happens that a human being is standing in the way of that goal.

Number two:

so in that sense, I do think that, Human beings need to be careful.

Number two:

I think AI needs to be regulated to some degree.

Number two:

we cannot let it to, do whatever it wants.

Number two:

It may have serious

Number two:

negative consequences for human beings.

Number two:

I think that a lot of what you just described has been the main kind of

Number two:

concern for everybody making sci-fi movies from the Terminator and Skynet

Number two:

And, I certainly get that, but I think I'm probably more worried about.

Number two:

going back to what we said about, you won't be losing your job to AI, you'll

Number two:

be losing your job to someone using, an AI, I think this probably applies

Number two:

here too, that AI, as an enabler, scales up the amount of

Number two:

damage that a nefarious party can produce by using it to bad ends.

Number two:

a lot of the security that we rely on is practical, right?

Number two:

Like for example, all the encryption keys that we use for everything are only safe

Number two:

because it would be computationally too expensive to actually figure that out.

Number two:

But then when you've got tools like this, it's easy to be scared about the

Number two:

possibility of that figuring out, and making things possible, that previously

Number two:

weren't, so I think I'm more worried about that scenario, where someone uses

Number two:

the AI to bad ends and it enables them to do more damage that they would be

Number two:

able to do with traditional methods,

Number two:

even in the current stage, if AI falls into the wrong hands,

Number two:

it can do a lot of damage.

Number two:

not that catastrophic, but it can do a lot of damage to a lot of families, right?

Number two:

I think, There were like stories about, people use the generative AI to create

Number two:

a fake phone call to their parents and, demand a ransom money so I think it

Number two:

causes, financial damage and also a lot of emotional distress, like fake news.

Number two:

Fake video, a lot of deep fake stuff, so even at this stage I

Number two:

think it can do a lot of harm if it falls into the wrong hands.

Number two:

Yeah, that's a very good example of the call.

Number two:

Like you can technically go and call people and scam and, people do that,

Number two:

but there is a limit to how many people you can physically call in a day.

Number two:

If on the other hand, you have a powerful enough AI, you can scale it up and

Number two:

you can probably call everybody in the United States, a certain amount of times.

Number two:

That's

Number two:

Are you concerned about AI involvement in the upcoming election?

Number two:

So we have to be careful, but I think so far the impact is limited.

Number two:

but at the same time, I think all the parties, politicians need to

Number two:

pay attention to generative AI.

Number two:

Because of what it can do, fake news and so forth.

Number two:

imagine you are running a political campaign, right?

Number two:

You must, get to know, analytics, how AI can influence your campaign either

Number two:

positively or negatively, if your team can utilize AI, uh, to, Strengthen

Number two:

your position legally, you're in a very good, position, it can help you,

Number two:

but on the other hand, if you're not careful, your opponents or somebody can

Number two:

use deepfakes to disrupt your campaign or your cause. That's why I think AI

Number two:

is so powerful and also so widespread.

Number two:

It affects every single industry in the economy, not just a few isolated sectors.

Number two:

that's very unique.

Number two:

About AI.

Number two:

Did you hear about the Elon Musk lawsuit against OpenAI from a few days ago?

Number two:

obviously OpenAI initially started as an alternative to the big

Number two:

companies, and the massive labs like Google, Facebook and so on.

Number two:

And their pitch and the initial mission statement was to

Number two:

release everything open source.

Number two:

Now, hence the name OpenAI.

Number two:

And then somewhere along the way, that turned and it's currently a for profit,

Number two:

closed source company, worth, what, under a hundred billion at the moment.

Number two:

We're recording this on March the 4th, and a few days ago,

Number two:

Elon Musk, opened this lawsuit, where he alleges that, he was basically scammed

Number two:

because they turned the company around and they went against the initial mission.

Number two:

And, I think the opinions on the internet, vary from, 'okay, this is

Number two:

jealousy', because he's jealous of the success that OpenAI has seen.

Number two:

To, 'okay, this is a nice publicity stunt;

Number two:

he probably has a point, but this is probably not going

Number two:

to stand in court'.

Number two:

and I'm trying to make sense of, how much of that is actually valid and

Number two:

how much I should be worried about OpenAI being, at the forefront of

Number two:

this, a big closed source company.

Number two:

I also heard that, many years ago, when Elon Musk and Sam Altman co-founded

Number two:

OpenAI, their objective was a nonprofit organization. Given the

Number two:

competition from other big players in the industry, I think OpenAI was under

Number two:

pressure to commercialize ChatGPT and this may go against the original objective so

Number two:

I can see the argument from both sides.

Number two:

on the one hand, we have to be careful like we just discussed about the use of,

Number two:

AI that may lead to, the end of humanity as we know it, if we're not careful.

Number two:

But at the same time, if we use it properly,

Number two:

It can be a great tool, that's why there is such a great market for, generative AI,

Number two:

so I think there is some tension, within the company, so you have different views.

Number two:

that's why, I think, a few months ago, within several days, Altman

Number two:

was fired and then got hired back, and so on and so forth.

Number two:

in the background, I think it's really just those two forces at play, so

Number two:

the force wants to make sure that, AI does not go out of control, harm human

Number two:

beings and at the same time, there is huge pressure from, industry peers to

Number two:

Commercialize those applications to make profits, Actually I'm glad that, Elon

Number two:

Musk actually made the lawsuit in the sense that it may, swing the pendulum

Number two:

to the other side so eventually what I think, uh, the view that we should

Number two:

commercialize and make money out of it, I think that kind of view prevailed, right?

Number two:

that's why Sam Altman got hired back, but that can go too far, because, in the

Number two:

process of competition, making profits, you may sacrifice security, so I think,

Number two:

the lawsuit by Elon Musk can potentially put the original mission in check.

Number two:

So to speak, and maybe, force OpenAI and other tech companies to think

Number two:

more about, guardrails around, AI to make sure It doesn't go out

Number two:

of control and harm human beings,

Number two:

time will tell if anything comes out of it other than, one billionaire

Number two:

being upset at another, but we'll see.

Number two:

So I'm going to ask you for one more prediction, and this time

Number two:

a little bit more down-to-earth.

Number two:

PyTorch.

Number two:

It appears to be still on the rise and, it appears to be the kind of

Number two:

go-to option for any new papers.

Number two:

TensorFlow seems to be, stagnating a little bit.

Number two:

you talked a little bit about the advantages of PyTorch and

Number two:

why you chose it for your book.

Number two:

and, I'm wondering, do you see this being like the prevailing platform?

Number two:

because now I think the main kind of breakthrough for PyTorch was, you

Number two:

mentioned the GPU support, obviously, and also the built in, backpropagation,

Number two:

right, the autograd now, the other frameworks also provide the autograd.

Number two:

so I guess they're closing up the gap a little bit in that respect, if

Number two:

you were to venture one more crazy prediction, would you see PyTorch

Number two:

leading the way going forward?

Number two:

Are you going to update your book in a couple of years to

Number two:

port it to some other framework?

Number two:

I think PyTorch is going to prevail in the near future.

Number two:

So I mentioned this in my book.

Number two:

So what PyTorch does is, using a dynamic computational graph, which means it

Number two:

creates the computational graph on the fly, so it's faster and more flexible.

Number two:

TensorFlow is using static computational graph.

Number two:

so it's slower.

Number two:

so that's the main difference.

Number two:

And, it affects the training speed greatly.
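A minimal sketch of what a dynamic computational graph allows in PyTorch: ordinary Python control flow inside forward(), with the graph built on the fly for each input. The toy network is illustrative only.

```python
# The graph is rebuilt on every forward pass, so it can depend on the data.
import torch
import torch.nn as nn

class DynamicNet(nn.Module):
    def __init__(self):
        super().__init__()
        self.layer = nn.Linear(8, 8)

    def forward(self, x):
        # The number of times the layer is applied depends on the input,
        # so the graph can differ from one forward pass to the next.
        steps = int(x.abs().mean().item() * 3) + 1
        for _ in range(steps):
            x = torch.relu(self.layer(x))
        return x

net = DynamicNet()
out = net(torch.randn(2, 8))
```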

Number two:

so in TensorFlow, you don't really have to worry about which device you can use.

Number two:

it's all done at the backend automatically by TensorFlow.

Number two:

But at a cost. If you have industry-scale models, and you have a lot

Number two:

of GPUs and you do huge calculations, maybe the overhead is negligible, it

Number two:

doesn't affect things much but for a lot of researchers it makes a huge difference

Number two:

because we are already working with a lot of toy models, not huge ones. Therefore, if you

Number two:

use the PyTorch, there is a little bit of inconvenience in the sense that you

Number two:

have to, specify whether to move this tensor to GPU, and then once you are

Number two:

done with it, you have to, get it back.

Number two:

But the benefit is huge because it greatly

Number two:

increases the training speed.

Number two:

I think like at least for, small players, regular readers, and also

Number two:

for researchers around the world.

Number two:

I think a PyTorch is much more convenient.

Number two:

It's much faster.

Number two:

And certain large corporations, they may not care that much.

Number two:

for regular people PyTorch is much more convenient, it's much faster

Number two:

and in the near term it may, win out.

Number two:

for anybody listening to this, I know that if I haven't, read your book

Number two:

before, I would probably be on manning.com,

Number two:

looking at it.

Number two:

And then at some point I would reach chapter 4, where you're walking

Number two:

us through building a network that does, generation of anime faces.

Number two:

Which I thought was a pretty cool example.

Number two:

Can you give us a taste, for, anybody who's going to be doing that?

Number two:

what's the training gonna look like?

Number two:

what data we're going to use, how we're going to implement a network.

Number two:

And then in terms of training, what kind of hardware you need for the

Number two:

training to be, quick, how much time you need to, see for that.

Number two:

give us an idea whether this is something that, someone who is comfortable with

Number two:

Python can just pick up on a Sunday, on a random weekend and go through, or whether

Number two:

there's any extra prep that's needed.

Number two:

In order to train a GAN model to produce color images of

Number two:

anime faces, obviously you need the training data, right?

Number two:

the research community has a lot of human-created data

Number two:

for us to experiment on.

Number two:

So you can actually go to a website, download the anime faces.

Number two:

I think tens of thousands of them, and then you need to

Number two:

create two neural networks.

Number two:

One is the generator, one is the discriminator, and the generator is

Number two:

trying to create an image that can pass as real in front of the discriminator.

Number two:

you just train the model, many rounds and then eventually you will

Number two:

see that the generator is able to generate an anime face, which is very

Number two:

much like the ones from the training set.

Number two:

I want to mention that in order to, generate, color images of

Number two:

human faces, you need to use convolutional neural networks

Number two:

because, we mentioned this earlier.

Number two:

If you use fully connected, dense neural networks,

Number two:

there are just too many parameters and the training will be too slow.

Number two:

So on the other hand, if you use the convolutional neural

Number two:

networks, you localize the weights.

Number two:

So the weights will stay the same in a filter and then you

Number two:

move the filter around the image.

Number two:

So that's a way to greatly reduce the number of parameters in the model and

Number two:

make the model training much faster.
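For a taste of what that kind of convolutional generator can look like, here is a sketch of a small DCGAN-style generator for 64x64 color images. The layer sizes are illustrative assumptions, not the book's exact chapter 4 architecture.

```python
# A convolutional generator: upsample a noise vector into a 64x64 RGB image.
import torch
import torch.nn as nn

generator = nn.Sequential(
    nn.ConvTranspose2d(100, 256, kernel_size=4, stride=1, padding=0),  # -> 4x4
    nn.BatchNorm2d(256), nn.ReLU(True),
    nn.ConvTranspose2d(256, 128, 4, 2, 1),                             # -> 8x8
    nn.BatchNorm2d(128), nn.ReLU(True),
    nn.ConvTranspose2d(128, 64, 4, 2, 1),                              # -> 16x16
    nn.BatchNorm2d(64), nn.ReLU(True),
    nn.ConvTranspose2d(64, 32, 4, 2, 1),                               # -> 32x32
    nn.BatchNorm2d(32), nn.ReLU(True),
    nn.ConvTranspose2d(32, 3, 4, 2, 1),                                # -> 64x64 RGB
    nn.Tanh(),
)

fake_images = generator(torch.randn(16, 100, 1, 1))  # a batch of 16 fakes
```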

Number two:

this is on the software side, on the training side.

Number two:

In terms of hardware, so I trained it using, GeForce RTX 2060, like a GPU.

Number two:

I think right now the cost is three or four hundred bucks.

Number two:

It's not that expensive You can easily buy it or if you have a older

Number two:

gaming computer, you can just grab it and then put on your computer.

Number two:

It's very easy to do, you don't really need a lot of knowledge

Number two:

about computer hardware to do it.

Number two:

Nowadays, computers are very user friendly. You can just pop one open and change

Number two:

parts very fast, that kind of stuff.

Number two:

So it took me like, 30 minutes to an hour to train the model.

Number two:

So it's very fast.

Number two:

However, if you don't really want to bother with the GPU, you can train

Number two:

the same model with the CPU and, what you can do is, you can simply

Number two:

leave your computer on all night it may take, five, six or seven hours,

Number two:

but, it can be easily done overnight.

Number two:

You just leave the program on, go to sleep, next morning, you see the result.

Number two:

so in that sense, computationally, it's not that costly.

Number two:

I think the most complicated model would be in chapter six, where you have to convert

Number two:

like a horse image into a zebra image.

Number two:

It's called CycleGAN, and then you have to convert blonde hair to black hair

Number two:

in images or black hair to blonde hair,

Number two:

Those kinds of models are a little bit more

Number two:

time consuming, because you are using higher resolution, number one.

Number two:

Number two, you are actually training two generators and two discriminators.

Number two:

Okay, so how CycleGAN works is that you have two generators. Let's use a horse

Number two:

and a zebra as the example, how to convert a horse image to a zebra image, right?

Number two:

So you have two generators.

Number two:

One generator is called a horse generator, the other one is called a zebra generator.

Number two:

So what the horse generator does is that it takes in a zebra image

Number two:

and converts it into a horse image.

Number two:

And then what the zebra generator does is that it will take a horse

Number two:

image and convert it into a zebra.

Number two:

And then you also have two discriminators.

Number two:

the horse discriminator will tell whether an image is a horse image or not, and

Number two:

then the zebra discriminator will tell if an image is a zebra image or not.

Speaker:

And then CycleGAN has another element: the loss function has a component called the cycle loss. So what do you do? I think the idea is really ingenious; that's why I mentioned that with the right loss function, you can achieve anything. Originally you have a horse image, right? You give that image to the zebra generator to create a fake zebra image. Now you use that fake zebra image as input to the horse generator, and ask the horse generator to convert the fake zebra image into a fake horse image. Here is the key: if both generators do their job right, then the fake horse image you get will be identical to the original horse image. That's why it's called CycleGAN. The cycle loss tries to minimize the difference between the original horse image and the fake horse image after a round trip. That's a very powerful tool, because it forces both models, the zebra generator and the horse generator, to generate realistic images.
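A minimal PyTorch sketch of that round-trip cycle loss (not the book's implementation; `horse_gen` and `zebra_gen` stand in for trained generator networks):

```python
# Minimal sketch of CycleGAN's cycle-consistency loss as described above.
# horse_gen and zebra_gen are hypothetical generator networks, each mapping
# images from one domain to the other.
import torch
import torch.nn as nn

l1 = nn.L1Loss()

def cycle_loss(real_horse: torch.Tensor,
               zebra_gen: nn.Module,
               horse_gen: nn.Module) -> torch.Tensor:
    fake_zebra = zebra_gen(real_horse)             # horse -> fake zebra
    reconstructed_horse = horse_gen(fake_zebra)    # fake zebra -> back to a horse
    # If both generators do their job, the round trip reproduces the original.
    return l1(reconstructed_horse, real_horse)
```

In the full model this term is added to the adversarial losses from the two discriminators, which is what pushes both generators toward realistic output.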

Speaker:

Since your show is called HockeyStick: when I was experimenting with the different models, that was pretty much my hockey-stick moment. When I saw it, I thought, this cycle loss is really ingenious, because that component in the loss function is crucial for successfully converting a horse image into a zebra and a zebra image into a horse. I was completely amazed, not just by how well the model works, but also by the ingenious mechanism devised by the researchers. Again, there are tons of smart people in this profession. Sometimes I see what they are doing, and once I understand it, I am completely amazed. I say, this method is amazing, the author must be a genius. I think there are tons of geniuses in our profession.

Speaker:

Love that story. And also, FYI, I'm totally stealing that quote from you: with the right loss function, you can achieve anything. I think it should go on a t-shirt.

Speaker:

That's right, yeah: with the right loss function, you can achieve anything. That's my belief; the concept of the loss function is very powerful. The loss function is another way of saying the objective function, right? You are telling the model what to achieve, what to do. It's very powerful.
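A tiny PyTorch illustration of that point (a generic sketch, not from the book): swapping the loss changes what the same model is asked to achieve.

```python
# Minimal sketch: the loss function is the objective you hand to the model.
import torch
import torch.nn as nn

model = nn.Linear(10, 1)
x, target = torch.randn(32, 10), torch.randn(32, 1)

mse = nn.MSELoss()(model(x), target)  # objective: small squared error against the targets
mae = nn.L1Loss()(model(x), target)   # objective: small absolute error against the targets
# Backpropagating one or the other trains the same network toward a different goal.
```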

Speaker:

Yeah, I think what keeps striking me is that once you go and look into these ideas, they're not actually that complicated; there's not too much magic in them. But to come up with the idea initially, to be the first one to propose it, does require a certain level of genius. I think that decades from now, kids will be learning a lot of this stuff in primary school, or early in their education. It just feels like we're really experiencing some kind of breakthrough in this profession, a hockey-stick moment.

Speaker:

Absolutely. It's good that a lot of smart researchers are working in the field. Sometimes you get stuck on a question and you may work on it for years, right? Without any breakthrough, struggling year after year, and then suddenly there is an aha moment, you figure out a way to tackle the problem, and it works. And then the method may become revolutionary; it may completely change the field.

Speaker:

You're about to finish your book. Is there anything you would do differently if you were starting to write it today? Would you make any different choices?

Speaker:

Good question. I don't think there are many things I would change. The reason is that even though it's a new book, I have actually been working on it for a couple of years now. I had a GitHub repository before I submitted a proposal to Manning; it's my way of working things out.

Speaker:

A couple of years ago I started to use PyTorch for machine learning models, and I started to get into generative AI. I used PyTorch to generate shapes and images, and eventually I got into natural language processing and large language models, so I had a lot of projects on my computer. Writing a book is my way of organizing things, thinking things through, and making sure everything works out.

Speaker:

But I knew that in order to write a compelling proposal, I needed to prepare well, right? Especially since there are not too many good publishers out there, you only have one shot with a good publisher, and Manning is one of the great ones. Over the years I've read many books from Manning and really enjoyed them, so I knew I needed to write a good proposal to make it work. I didn't want to lose the chance.

Speaker:

So what I did was spend several months in the summer creating a huge GitHub repository. I laid out all the chapters initially, like a first draft; it had 17 chapters, and for each chapter I used a Jupyter notebook to explain everything to the best of my ability. All the code is there, so it's pretty much like a book.

Speaker:

Once I had that, I spent another month converting it into an actual book, a PDF file. A lot of tech people use LaTeX. LaTeX is a typesetting system, and especially if you have a lot of math, you can generate beautiful equations with it. My book has some equations, some math, but not a whole lot.
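For instance, a short LaTeX fragment (a generic illustration, not taken from the book) shows how little markup a typeset equation needs:

```latex
% A generic example of typesetting an equation in LaTeX (not from the book):
% the binary cross-entropy loss often used when training GAN discriminators.
\documentclass{article}
\begin{document}
\begin{equation}
  \mathcal{L} = -\frac{1}{N}\sum_{i=1}^{N}
    \left[ y_i \log \hat{y}_i + (1 - y_i) \log\!\left(1 - \hat{y}_i\right) \right]
\end{equation}
\end{document}
```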

Speaker:

But the conversion forced me to go through everything one more time, in the process of turning the GitHub repository into a PDF file. I spent a lot of time converting everything. And it looks beautiful, because it looks exactly like a book: you have a template, a cover, a table of contents; each chapter has section numbers, section titles, subsections and so forth; you have images. In short, it's pretty much a book ready to be published.

Speaker:

I sent that to Manning in the summer: the PDF file, the proposal file, and a link to the GitHub page. What Manning did was send the book proposal to more than ten reviewers in the profession. The reviewers are all data scientists, people who know AI, and they comment on whether the book should be published. They gave a lot of very valuable feedback, and the feedback was very positive, partly because it's a hot topic and partly because I spent a lot of time preparing it. So to answer your question: because I have already been through several rounds and incorporated the great feedback from about a dozen reviewers on the proposal, there's not much I would change now.

Speaker:

Fair enough. How many copies have you sold so far?

Speaker:

It's already sold more than a thousand copies; I think the daily high was 58. That says a lot about the demand for generative AI. If you look at the top ten on the Manning website every week, you will see that generative AI is hot; there's a lot of demand. Another trend is PyTorch: a lot of people are switching to PyTorch. There is a book from Manning called "Deep Learning with PyTorch" that is selling very well, and another book called "Large Language Models from Scratch" that also uses PyTorch, just as I do, but it focuses only on large language models, whereas my book covers many different kinds of content: large language models, music, images, shapes, numbers.

Speaker:

Another thing I want to mention is that I did spend a lot of time thinking about how to help readers learn progressively, step by step.

Speaker:

Chapter one, of course, is an overview of the generative AI landscape and of what the book is trying to accomplish. Chapter two is deep learning with PyTorch, so even readers with no background in PyTorch will, after reading chapter two, be able to use PyTorch to create deep learning models from A to Z: they can do the whole thing. That's very important.
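As a rough idea of what "deep learning with PyTorch from A to Z" involves, here is a minimal, generic training loop on synthetic data (a sketch, not an example from chapter two):

```python
# Minimal, generic PyTorch training loop on synthetic data (a sketch, not the book's code).
import torch
import torch.nn as nn

model = nn.Sequential(nn.Linear(20, 64), nn.ReLU(), nn.Linear(64, 1))
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.MSELoss()

x = torch.randn(256, 20)           # fake inputs
y = x.sum(dim=1, keepdim=True)     # fake targets: the sum of each input row

for epoch in range(100):
    optimizer.zero_grad()          # clear old gradients
    loss = loss_fn(model(x), y)    # forward pass plus the objective
    loss.backward()                # backpropagate
    optimizer.step()               # update the weights
```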

Speaker:

Then in chapter three we get into GANs. You use GANs to generate numbers and shapes, so the models are very simple: you only have two or three layers of neurons in those models. That makes them very easy to understand and easy to create, and the training takes a matter of minutes, so readers won't get frustrated, because everything is so simple.
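A minimal sketch of what such a two- or three-layer GAN pair might look like (illustrative only, not the chapter's code):

```python
# Minimal sketch of a simple GAN with only a few dense layers, in the spirit of
# generating low-dimensional shapes or numbers (not the book's actual models).
import torch
import torch.nn as nn

latent_dim, data_dim = 16, 2   # e.g., 2-D points sampled along some shape

generator = nn.Sequential(     # noise -> fake data point
    nn.Linear(latent_dim, 32), nn.ReLU(),
    nn.Linear(32, data_dim),
)

discriminator = nn.Sequential( # data point -> probability it is real
    nn.Linear(data_dim, 32), nn.ReLU(),
    nn.Linear(32, 1), nn.Sigmoid(),
)

z = torch.randn(8, latent_dim)         # a batch of random noise vectors
fake_points = generator(z)             # shape: (8, 2)
realness = discriminator(fake_points)  # shape: (8, 1), values in (0, 1)
```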

Speaker:

Then in chapter four I kick things up a notch: instead of fully connected dense layers, I use convolutional layers, which are what you need for image processing. If you want to create high-resolution color images, fully connected dense layers won't really work; they may work, but they're very slow. Convolutional layers, on the other hand, are much faster, because you use filters that move around the image and you only train the weights in the filter itself, which is much more efficient. So readers learn to use convolutional layers in chapter four to generate color images, and then in chapter five I kick things up another level.

Speaker:

In chapter five, readers learn to select characteristics in images: you can choose to generate an image with eyeglasses or without eyeglasses, and you can transition from an image with glasses to an image without glasses, all that latent-space arithmetic kind of stuff. Chapter six is not out yet, but it will cover CycleGAN, which is computationally costly for the reason I just mentioned: it has two generators and two discriminators.
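A sketch of the chapter-five idea of moving between attributes in latent space (the `generator` and the two latent vectors here are hypothetical stand-ins, not the book's code):

```python
# Minimal sketch: blending between two latent vectors to transition, for example,
# from a face with glasses to one without. `generator` is a hypothetical trained model.
import torch
import torch.nn as nn

latent_dim = 100
generator = nn.Sequential(nn.Linear(latent_dim, 3 * 64 * 64), nn.Tanh())  # stand-in

z_glasses = torch.randn(1, latent_dim)     # latent code that decodes to a face with glasses
z_no_glasses = torch.randn(1, latent_dim)  # latent code that decodes to a face without

for alpha in torch.linspace(0, 1, steps=5):
    z = (1 - alpha) * z_glasses + alpha * z_no_glasses  # interpolate in latent space
    image = generator(z).view(3, 64, 64)                # decode the blended code
```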

Speaker:

Chapter seven is about variational autoencoders. That's a different model from a GAN, and it's important because it has an encoder-decoder architecture, which is very common in machine learning models. For example, ChatGPT is a decoder-only model, while the original transformer paper, "Attention Is All You Need", has both an encoder part and a decoder part.
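A compact sketch of that encoder-decoder (VAE) shape in PyTorch (illustrative only, not the chapter's implementation):

```python
# Minimal sketch of a variational autoencoder's encoder-decoder structure
# (illustrative, not the book's implementation).
import torch
import torch.nn as nn

class TinyVAE(nn.Module):
    def __init__(self, data_dim: int = 784, latent_dim: int = 16):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(data_dim, 128), nn.ReLU())
        self.to_mu = nn.Linear(128, latent_dim)      # mean of the latent distribution
        self.to_logvar = nn.Linear(128, latent_dim)  # log-variance of the latent distribution
        self.decoder = nn.Sequential(nn.Linear(latent_dim, 128), nn.ReLU(),
                                     nn.Linear(128, data_dim), nn.Sigmoid())

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        h = self.encoder(x)
        mu, logvar = self.to_mu(h), self.to_logvar(h)
        z = mu + torch.exp(0.5 * logvar) * torch.randn_like(mu)  # reparameterization trick
        return self.decoder(z)                                   # reconstruction of x

x = torch.rand(4, 784)          # e.g., four flattened 28x28 images
reconstruction = TinyVAE()(x)   # shape: (4, 784)
```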

Speaker:

After that, I get into transformers and natural language processing: how to do tokenization and how to create a transformer from scratch, including a ChatGPT-style model; you can create a GPT from scratch and train it. I saw that you have several posts on LinkedIn about how to create a GPT from scratch, right? My book does exactly that in chapter 10. Chapter 11 then shows how to create a small GPT from scratch and train it to generate text; its focus is not mainly on creating but on training a GPT from scratch. Of course, it's much smaller, only 5 million parameters, but you learn how to train a model from scratch.
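As a taste of the tokenization step he mentions, here is a minimal character-level tokenizer (a generic sketch; the book's tokenizer may differ):

```python
# Minimal character-level tokenization sketch: turning text into the integer IDs
# a GPT-style model consumes (generic illustration, not the book's tokenizer).
import torch

text = "with the right loss function, you can achieve anything."
vocab = sorted(set(text))                          # every distinct character
char_to_id = {ch: i for i, ch in enumerate(vocab)}
id_to_char = {i: ch for ch, i in char_to_id.items()}

ids = torch.tensor([char_to_id[ch] for ch in text])  # token IDs fed to an embedding layer
decoded = "".join(id_to_char[int(i)] for i in ids)   # round-trips back to the text
assert decoded == text
```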

Speaker:

After that comes music generation with different models, and then how you can use LangChain to chain together different large language models. So that's the whole book.

Speaker:

It's been a real pleasure to talk to you. I'm personally super excited and can't wait until the rest of the chapters become available, so, you know, hurry up. Before I let you go, I'm curious whether you already have the idea for your next book in mind, or whether you're going to take a small break before book number four.

Speaker:

So far I'm very busy writing the current book, but I do get ideas from time to time. One example: I think this text-to-image, multimodal-model thing is amazing. There could be another book there, focused purely on diffusion models and multimodal transformers: how to convert text to image, or text to video. I've thought about it, but I haven't spent a lot of time on it because I'm busy writing the current book.

Speaker:

The other idea I've thought about is also related to multimodal models. My first book is called Make Python Talk, right? But it actually uses the Google API to do the speech recognition and text-to-speech; I don't do any of the machine learning part myself, I just use the Google API to do all the heavy lifting. But there are open-source models out there, and you can actually train a model to do speech recognition. That's really a multimodal model, because for speech recognition the input is audio and the output is text, and then you can also do text to speech.
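For example, Miko brings up Whisper a little later; OpenAI's open-source Whisper models can be run in a few lines (assuming the `openai-whisper` package and a local audio file; a generic usage sketch, not something from the book):

```python
# Generic sketch of open-source speech recognition with OpenAI's Whisper
# (assumes `pip install openai-whisper` and a local audio file; not from the book).
import whisper

model = whisper.load_model("base")               # small open-source speech model
result = model.transcribe("interview_clip.mp3")  # audio in -> text out (multimodal)
print(result["text"])
```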

Speaker:

That could be another interesting project. I have some ideas about how these models work, but I would have to spend a lot of time experimenting. So I would say that in another two or three years I may venture into one of those ideas and maybe write another book about it.

Speaker:

Awesome. You're going to have one reader already interested in that, so definitely go for it.

Speaker:

Okay, let me ask you then: which idea do you like better, the speech recognition model, or a book purely about text-to-image multimodal transformers?

Speaker:

I've been meaning to properly read the Whisper paper, so I think speech recognition is actually a pretty good use case, and I would definitely be interested in reading that.

Speaker:

Good to know. I may put more emphasis on that project.

Speaker:

Awesome.

Speaker:

Thanks for the feedback.

Speaker:

All right, thank you so much. It's been a pleasure, and hopefully I'll get you next time with your next book.

Speaker:

Thanks a lot.

Speaker:

Thank you.
