Zico Colter

👤 Person
483 total appearances

Podcast Appearances

The Twenty Minute VC (20VC): Venture Capital | Startup Funding | The Pitch
20VC: OpenAI's Newest Board Member, Zico Colter on The Biggest Bottlenecks to the Performance of Foundation Models | The Biggest Questions and Concerns in AI Safety | How to Regulate an AI-Centric World

There's only so much really high-quality, good text that's available out there. On the flip side, and this is the point I often make, first of all, we're only talking about text there. We're only talking about publicly available text.

If you start talking about internally available text, stuff like this, from a very straightforward standpoint, we have not gotten close to using all the data that's available. Public models train on the order of 30 terabytes of data or something like this, right? So 30 terabytes of text data. This sounds like a lot, but this is a tiny, tiny amount of data.

And there is so much more data that's available that we are not using right now to build these models. And of course, I'm thinking about things like multimodal data, stuff like this: video data, audio data, all these things. We have massive amounts available. I mean, just a few tens of terabytes is not the amount of data these large companies that index the internet are storing.

There is so much more data than this, and we have not really come close to tapping that whole reserve. Now, whether or not we can use that data well, right, because text data in some sense is the most distilled form of a lot of this, and a lot of this is not textual data, that remains to be seen. But we are nowhere close to hitting the limits of available data in these models, right?

Arguably, we're unable to process it, because we don't have enough compute and things like this. But we're nowhere close to data limits in other senses.

Host: What are the challenges of using these new forms of multimodal data well?

Zico Colter: I think the biggest challenge is simply compute.

If you have something like video data, just think about the size of a video file versus a text file. So if we transcribed this podcast, it would be a few kilobytes. If you take the dump of video from it, it'll be on the order of, I don't even know...

Host: I do. It would be about six and a half gigabytes.

Zico Colter: Gigabytes, exactly, right?
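
To put the size comparison in concrete terms, here is a rough back-of-the-envelope sketch. The 5 KB transcript size is an assumed stand-in for "a few kilobytes", and the 6.5 GB figure is the one quoted in the conversation; neither is a measurement, and the function name is purely illustrative.

```python
import math

# Assumed sizes, taken loosely from the conversation (not measured):
TRANSCRIPT_BYTES = 5 * 10**3   # hypothetical stand-in for "a few kilobytes"
VIDEO_BYTES = 6.5 * 10**9      # "about six and a half gigabytes"


def orders_of_magnitude(larger: float, smaller: float) -> float:
    """Return the base-10 logarithm of the size ratio larger/smaller."""
    return math.log10(larger / smaller)


ratio = VIDEO_BYTES / TRANSCRIPT_BYTES
print(f"video is about {ratio:,.0f}x larger than the transcript")
print(f"orders of magnitude: {orders_of_magnitude(VIDEO_BYTES, TRANSCRIPT_BYTES):.1f}")
```

Under these assumptions the video is roughly a million times larger than the transcript, which is about six orders of magnitude.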

So tens of thousands of magnitudes of difference, orders of magnitude of difference, right? Now, arguably, depending on people's opinion, maybe the entirety of the actual valuable information is not in the audio of my voice and the video. You could argue that there's not as much usable content there. When we think about what kind of data humans use, I would argue that visual data
