I thought I would write a small thing this week talking about what I’ve been building with Suyash in the last ~2 weeks or so.
What is this?
Transcribe (https://transcribe.param.codes) uses OpenAI’s Whisper to create transcriptions of YouTube videos and then uses GPT-3 to improve them (removing filler words, fixing transcription errors, etc.)
Why is it useful?
I’ve always preferred reading to watching videos, especially for educational content. However, as the internet has grown, there is an increasing amount of valuable information that’s only available in video form. It’s hard for me to consume this information because I don’t have the attention span to watch a 1.5-hour video. So, I thought I would create a tool that makes this information easily available as text.
The sad thing is that this encoding is, in essence, lossy. People use videos because they get a visual component that is just not reproducible in text. However, a transcription is still useful in many cases, even if you prefer videos to text.
The most obvious case is recall. It’s hard to find a random part of a video where the person talked about X topic, weeks after you originally watched the video. It’s much easier to do Ctrl+F “X” and then read from there. Another case is when you want an LLM like GPT-3 to process the information for you. Currently, this isn’t possible for videos. But it is possible for a transcription.
See it in action!
Link: https://transcribe.param.codes/
Example transcriptions:
Here’s a quick 30 second demo:
How it works
It’s very easy to build AI apps these days. I used Replicate to run Whisper on audio files. It’s very nice (at least in the beginning) not to have to worry about managing GPUs when trying to use these models. The code for this is very simple; it’s something like:
import replicate

model = replicate.models.get("openai/whisper")
version = model.versions.get("VERSION_NUMBER_HERE")
# Open the audio file for reading in binary mode ("rb", not "wb").
transcription = version.predict(audio=open("/path/to/file", "rb"))
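The post doesn’t mention how the audio gets from YouTube to Whisper. One common way to do this (an assumption on my part, not necessarily what Transcribe uses) is to extract the audio track with yt-dlp before sending it to Replicate:

```python
import subprocess

def ytdlp_cmd(url, out_path):
    # Build a yt-dlp invocation that extracts audio only (-x)
    # and converts it to mp3, writing to out_path.
    return ["yt-dlp", "-x", "--audio-format", "mp3", "-o", out_path, url]

def download_audio(url, out_path="audio.mp3"):
    # Run yt-dlp; raises CalledProcessError if the download fails.
    subprocess.run(ytdlp_cmd(url, out_path), check=True)
    return out_path
```

The resulting mp3 is what you’d pass as the `audio` argument in the Replicate call above.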
After that, I send the transcription to GPT-3, which improves it (removing filler words and fixing obvious transcription errors).

That’s the app for now. It’s very simple: it uses Flask, Next.js, and SQLite (!)
Last minute notes
If you try it out, I would recommend signing in before transcribing something. That way, you can keep track of the things you’ve transcribed. The transcription process does take a reasonable amount of time and if you haven’t signed in, you have to keep track of your video’s transcription link yourself.
What’s next?
Our goal for this week is to group information from different videos together and then see if we can do some LLM magic with this data. We’re still experimenting here, trying to find what’s useful for us and for other people, so if you have ideas about things you’d want a tool like this to do (or even if you just want to chat), please reach out to me, either by replying to this post or via Twitter.