How To Download YouTube Playlist Using Python


Step-by-Step Guide to Automate YouTube Playlist Downloading with Python, Pytube and FFmpeg


YouTube is the world's largest video-sharing platform, with millions of videos uploaded daily. You may come across a YouTube playlist that you want to watch offline or save for future reference. Downloading a single video is easy, but downloading an entire playlist by hand can be a daunting task. Automating the process with Python saves time and effort.

In this tutorial, you will use the Pytube library to download entire YouTube playlists in various resolutions, up to the high-quality 2160p resolution. Pytube is a lightweight library that gives you easy access to YouTube videos and their metadata: with it, you can extract video URLs, download videos, and download audio-only streams. So, let's dive into the tutorial and learn how to download a YouTube playlist using Python.

If you're in a hurry, jump to the GitHub gist to get the code immediately.

Prerequisites

To follow this tutorial, you should have a basic understanding of the Python programming language, including how to install packages with pip.

You will also need to install Pytube, along with two helper packages. Open a command prompt or terminal window and run the following command:

pip install pytube tqdm tenacity

Apart from Python and these packages, you also need FFmpeg installed on your system. FFmpeg is a command-line tool for processing multimedia files. Pytube itself doesn't use it, but at higher resolutions YouTube serves video and audio as separate streams, and this script uses FFmpeg to merge them into a single file. You can download FFmpeg from the official website and follow the installation instructions for your operating system.

Besides Pytube, the command installs tqdm, which displays a progress bar, and tenacity, which provides the retry mechanism.
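Because the script calls the ffmpeg executable, it helps to confirm that FFmpeg is on your PATH before starting a long playlist download. Here is a minimal sketch that uses only the standard library. The helper name is_on_path is hypothetical and not part of Pytube or FFmpeg:

```python
import shutil


def is_on_path(executable="ffmpeg"):
    """Return True if `executable` can be found on the system PATH."""
    # shutil.which returns the full path to the executable, or None if not found
    return shutil.which(executable) is not None


if __name__ == "__main__":
    if is_on_path("ffmpeg"):
        print("FFmpeg found at", shutil.which("ffmpeg"))
    else:
        print("FFmpeg not found; install it and make sure it is on your PATH.")
```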

Note: Pytube sometimes stops working, typically after YouTube changes how it serves videos. In that case, install pytubefix (pip install pytubefix), a fork of Pytube, and use it instead.

Let's Code It!

Roll up your sleeves, fire up your favorite code editor, and let's get started!

Import the Libraries

Let's import the necessary libraries for the program to run correctly.

import os
import re

from pytube import Playlist, YouTube
# If you're using pytubefix, comment out the line above and uncomment the line below
# from pytubefix import Playlist, YouTube
from tenacity import retry, stop_after_attempt, wait_fixed
from tqdm import tqdm

The code imports the os module for file-system operations, the re module for regular expressions, and the Playlist and YouTube classes from Pytube (or pytubefix) to fetch the playlist and its videos.
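Instead of editing the import by hand, you could pick whichever library is installed at runtime. Here is a minimal sketch, assuming you prefer pytubefix when both are present. pick_backend is a hypothetical helper, and it relies only on the fact that both packages expose Playlist and YouTube at the top level, as the imports above do:

```python
import importlib
import importlib.util


def pick_backend(candidates=("pytubefix", "pytube")):
    """Return the name of the first installed package in `candidates`, or None."""
    for name in candidates:
        # find_spec returns None when a top-level package is not installed
        if importlib.util.find_spec(name) is not None:
            return name
    return None


backend = pick_backend()
if backend is None:
    print("Neither pytubefix nor pytube is installed; run: pip install pytubefix")
else:
    # Both libraries export Playlist and YouTube from their top-level package
    module = importlib.import_module(backend)
    Playlist, YouTube = module.Playlist, module.YouTube
    print(f"Using {backend}")
```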

Util Function to Sanitize Filenames

Playlist and video titles often contain special characters that some operating systems don't allow in filenames. For example, Windows won't accept a file named "Who are you?" because it contains a question mark (?). Let's write a function to sanitize such names:

def sanitize_filename(filename):
    # Replace each character that Windows forbids in filenames with a hyphen
    return re.sub(r'[<>:"/\\|?*]', '-', filename)

The sanitize_filename function replaces invalid characters in filenames with a hyphen (-).
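To see what the function does, here is a quick check. The sample titles are made up for illustration. Letters, digits, spaces, and the period before the extension are left untouched:

```python
import re


def sanitize_filename(filename):
    # Same pattern as above: the characters Windows forbids in filenames
    return re.sub(r'[<>:"/\\|?*]', '-', filename)


print(sanitize_filename("1. Who are you?.mp4"))      # 1. Who are you-.mp4
print(sanitize_filename('Part 1: "Intro" / Setup'))  # Part 1- -Intro- - Setup
```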

Create the Function

Let's define a function called download_playlist. It takes two parameters: the playlist URL and the desired video resolution.

def download_playlist(playlist_url, resolution):
    playlist = Playlist(playlist_url)
    playlist_name = sanitize_filename(re.sub(r'\W+', '-', playlist.title))

    if not os.path.exists(playlist_name):
        os.mkdir(playlist_name)

It first creates a new Playlist object using the Pytube library and extracts the name of the playlist. It then checks if a folder with the same name exists in the current working directory. If not, it creates a new folder with the playlist name.

Note that re.sub(r'\W+', '-', ...) replaces each run of non-word characters (anything other than letters, digits, and underscores) in the playlist title with a single hyphen (-). We do this because most file systems restrict the characters that folder and file names can contain.
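Here is how that substitution transforms a sample title. The title is hypothetical, and folder_name_for is an illustrative helper wrapping the same re.sub call. The sketch also shows os.makedirs(..., exist_ok=True), a standard-library alternative to the separate exists() check and mkdir() call, run inside a temporary directory so it leaves nothing behind:

```python
import os
import re
import tempfile


def folder_name_for(title):
    # Collapse every run of non-word characters into a single hyphen,
    # the same substitution download_playlist applies to playlist.title
    return re.sub(r'\W+', '-', title)


name = folder_name_for("Python Tutorial: Part 1?")
print(name)  # Python-Tutorial-Part-1-

with tempfile.TemporaryDirectory() as base:
    target = os.path.join(base, name)
    # exist_ok=True makes the call a no-op if the folder already exists
    os.makedirs(target, exist_ok=True)
    os.makedirs(target, exist_ok=True)  # the second call does not raise
    print(os.path.isdir(target))  # True
```

Because Python's \W is Unicode-aware for str patterns, non-ASCII letters such as "é" are kept rather than replaced.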

Downloading the Videos

The Playlist object provides a videos property, an iterable of YouTube objects. Iterate over it using a for loop and the enumerate function:

for index, video in enumerate(tqdm(playlist.videos, desc="Downloading playlist", unit="video"), start=1):
    ...

The start parameter is set to 1 so that counting begins at 1 instead of 0. Since the index becomes part of each filename, numbering from 1 is more natural.

The enumerate function supplies the current index, while tqdm, wrapped around playlist.videos, draws a progress bar for the loop. The desc parameter sets the bar's label ("Downloading playlist"), and the unit parameter labels each step as a "video", so every increment of the bar corresponds to one processed video.
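To see the loop's numbering in isolation, here is a minimal sketch that uses a plain list of made-up titles in place of playlist.videos. The fallback definition of tqdm exists only so the sketch runs even if tqdm isn't installed; it is not part of the tutorial's code:

```python
try:
    from tqdm import tqdm
except ImportError:
    # Fallback so the sketch still runs without tqdm: return the iterable unchanged
    def tqdm(iterable, **kwargs):
        return iterable

# Hypothetical titles standing in for the videos in a playlist
titles = ["Introduction", "Installing Python", "Your First Script"]

filenames = []
for index, title in enumerate(tqdm(titles, desc="Downloading playlist", unit="video"), start=1):
    # index starts at 1, so the first file is "1. Introduction.mp4"
    filenames.append(f"{index}. {title}.mp4")

print(filenames)
```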

Next, create a YouTube object for each video using its watch URL:

for index, video in enumerate(tqdm(playlist.videos, desc="Downloading playlist", unit="video"), start=1):
    yt = YouTube(video.watch_url, on_progress_callback=progress_function)

The keyword argument on_progress_callback=progress_function registers a callback that Pytube calls repeatedly during the download to report progress. We'll define it later.

Filter the available video streams to select the one with the desired resolution, and get its filename:

for index, video in enumerate(tqdm(playlist.videos, desc="Downloading playlist", unit="video"), start=1):
    yt = YouTube(video.watch_url, on_progress_callback=progress_function)
    video_streams = yt.streams.filter(res=resolution)

    video_filename = sanitize_filename(f"{index}. {yt.title}.mp4")
    video_path = os.path.join(playlist_name, video_filename)

The above code snippet filters available video streams to match a specified resolution and prepares the filename and path for downloading a YouTube video.

Check if the video already exists. If it does, skip downloading this video and move on to the next one:

for index, video in enumerate(tqdm(playlist.videos, desc="Downloading playlist", unit="video"), start=1):
    yt = YouTube(video.watch_url, on_progress_callback=progress_function)
    video_streams = yt.streams.filter(res=resolution)

    video_filename = sanitize_filename(f"{index}. {yt.title}.mp4")
    video_path = os.path.join(playlist_name, video_filename)

    if os.path.exists(video_path):
        print(f"{video_filename} already exists")
        continue

If the desired resolution isn't available, fall back to the highest-resolution progressive stream, which contains both video and audio in a single file:

for index, video in enumerate(tqdm(playlist.videos, desc="Downloading playlist", unit="video"), start=1):
    ...

    if os.path.exists(video_path):
        print(f"{video_filename} already exists")
        continue

    if not video_streams:
        highest_resolution_stream = yt.streams.get_highest_resolution()
        video_name = sanitize_filename(highest_resolution_stream.default_filename)
        print(f"Downloading {video_name} in {highest_resolution_stream.resolution}")
        download_with_retries(highest_resolution_stream, video_path)
    else:
        video_stream = video_streams.first()
        video_name = sanitize_filename(video_stream.default_filename)
        print(f"Downloading video for {video_name} in {resolution}")
        download_with_retries(video_stream, "video.mp4")

If no stream matches the desired resolution, the code falls back to get_highest_resolution(), which returns the highest-resolution progressive stream (one that already includes audio). It then builds a sanitized name for this stream, prints it, and downloads the stream directly to video_path using the download_with_retries function.

If streams with the desired resolution are available, the code takes the first one in the filtered list and downloads it as "video.mp4". This way, you get the requested resolution when it exists and a reasonable fallback when it doesn't.
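The branching above can be modeled without any network access. Here is a minimal sketch using a hypothetical FakeStream class in place of Pytube's Stream objects. plan_download only mirrors the decision logic; the real code relies on yt.streams.filter() and get_highest_resolution() instead:

```python
from dataclasses import dataclass


@dataclass
class FakeStream:
    """Hypothetical stand-in for a Pytube Stream (only the fields used here)."""
    resolution: str
    progressive: bool  # True if the stream contains both video and audio


def plan_download(streams, resolution):
    """Mirror the branching above: return ("merge", stream) or ("single", stream)."""
    matching = [s for s in streams if s.resolution == resolution]
    if matching:
        # Desired resolution found: download it, then merge it with an audio stream
        return "merge", matching[0]
    # Otherwise fall back to the highest-resolution progressive stream (None if there is none)
    progressive = [s for s in streams if s.progressive]
    best = max(progressive, key=lambda s: int(s.resolution.rstrip("p")), default=None)
    return "single", best


streams = [FakeStream("360p", True), FakeStream("720p", False), FakeStream("1080p", False)]
print(plan_download(streams, "1080p"))
print(plan_download(streams, "2160p"))
```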

When the desired resolution is available, downloading the video stream alone isn't enough: at higher resolutions, YouTube serves video and audio as separate streams, so the downloaded file would have no sound. Instead, download the video and audio streams separately, and merge them with FFmpeg to create the final video file. Finally, rename the merged file and delete the temporary video and audio files:

for index, video in enumerate(tqdm(playlist.videos, desc="Downloading playlist", unit="video"), start=1):
    ...

    if not video_streams:
        ...
    else:
        video_stream = video_streams.first()
        video_name = sanitize_filename(video_stream.default_filename)
        print(f"Downloading video for {video_name} in {resolution}")
        download_with_retries(video_stream, "video.mp4")

        audio_stream = yt.streams.get_audio_only()
        download_with_retries(audio_stream, "audio.mp4")

        os.system("ffmpeg -y -i video.mp4 -i audio.mp4 -c:v copy -c:a aac final.mp4 -loglevel quiet -stats")
        os.rename("final.mp4", video_path)
        os.remove("video.mp4")
        os.remove("audio.mp4")

    print("----------------------------------")

The video stream is downloaded as video.mp4 and the audio stream as audio.mp4. The ffmpeg command then takes these two files as inputs, copies the video stream without re-encoding it (-c:v copy), encodes the audio as AAC (-c:a aac), and writes the merged result to final.mp4. The -y flag tells FFmpeg to overwrite final.mp4 if it already exists.

After FFmpeg finishes, final.mp4 is renamed to video_path, and the temporary video and audio files are deleted.
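The code above ignores the return value of os.system(), so a failed merge only surfaces when os.rename() can't find final.mp4. Here is a minimal alternative sketch using the standard library's subprocess.run with check=True. The helper names build_merge_command and merge are hypothetical:

```python
import subprocess


def build_merge_command(video_file, audio_file, output_file):
    """Build the same ffmpeg invocation as the tutorial, as an argument list."""
    return [
        "ffmpeg", "-y",        # overwrite output_file if it already exists
        "-i", video_file,      # first input: the video stream
        "-i", audio_file,      # second input: the audio stream
        "-c:v", "copy",        # keep the video as-is, without re-encoding
        "-c:a", "aac",         # encode the audio as AAC
        "-loglevel", "quiet", "-stats",
        output_file,
    ]


def merge(video_file, audio_file, output_file):
    # check=True raises CalledProcessError if ffmpeg exits with a non-zero status,
    # so a failed merge stops the script before os.rename() runs
    subprocess.run(build_merge_command(video_file, audio_file, output_file), check=True)
```

Passing the arguments as a list also avoids shell quoting, so filenames with spaces work unchanged. If ffmpeg isn't installed, subprocess.run raises FileNotFoundError.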

The download_playlist Function

At this point, you have completed the download_playlist function. It should look like this:

def download_playlist(playlist_url, resolution):
    playlist = Playlist(playlist_url)
    playlist_name = sanitize_filename(re.sub(r'\W+', '-', playlist.title))

    if not os.path.exists(playlist_name):
        os.mkdir(playlist_name)

    for index, video in enumerate(tqdm(playlist.videos, desc="Downloading playlist", unit="video"), start=1):
        yt = YouTube(video.watch_url, on_progress_callback=progress_function)
        video_streams = yt.streams.filter(res=resolution)

        video_filename = sanitize_filename(f"{index}. {yt.title}.mp4")
        video_path = os.path.join(playlist_name, video_filename)

        if os.path.exists(video_path):
            print(f"{video_filename} already exists")
            continue

        if not video_streams:
            highest_resolution_stream = yt.streams.get_highest_resolution()
            video_name = sanitize_filename(highest_resolution_stream.default_filename)
            print(f"Downloading {video_name} in {highest_resolution_stream.resolution}")
            download_with_retries(highest_resolution_stream, video_path)
        else:
            video_stream = video_streams.first()
            video_name = sanitize_filename(video_stream.default_filename)
            print(f"Downloading video for {video_name} in {resolution}")
            download_with_retries(video_stream, "video.mp4")

            audio_stream = yt.streams.get_audio_only()
            print(f"Downloading audio for {video_name}")
            download_with_retries(audio_stream, "audio.mp4")

            os.system("ffmpeg -y -i video.mp4 -i audio.mp4 -c:v copy -c:a aac final.mp4 -loglevel quiet -stats")
            os.rename("final.mp4", video_path)
            os.remove("video.mp4")
            os.remove("audio.mp4")

        print("----------------------------------")

A line of dashes is also printed after every iteration to separate each video as they are downloaded.

Function to Download with Retries

The code above calls a download_with_retries function. Let's define it:

@retry(stop=stop_after_attempt(5), wait=wait_fixed(2))
def download_with_retries(stream, filename):
    stream.download(filename=filename)

The function attempts to download a file using the provided stream, with retry logic to handle potential failures. It is decorated with the @retry decorator from the tenacity library, which adds retry functionality. The decorator parameters specify the retry behavior:

  • stop=stop_after_attempt(5): This tells tenacity to give up after 5 attempts in total.

  • wait=wait_fixed(2): This specifies a fixed wait time of 2 seconds between each retry attempt.

Inside the function, stream.download(filename=filename) performs the actual download. If it raises an exception, tenacity calls the function again, making at most 5 attempts in total with a 2-second pause between them. If every attempt fails, tenacity raises a RetryError. This makes the download more robust against transient network errors.
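To make the attempt counting concrete, here is a hand-rolled, standard-library equivalent of stop_after_attempt(5) combined with wait_fixed(2). It's a swapped-in illustration, not how tenacity is implemented, and unlike tenacity's default behavior it re-raises the original exception rather than a RetryError:

```python
import time


def call_with_retries(func, attempts=5, wait_seconds=2):
    """Call func(); on an exception, try again until `attempts` calls have been made."""
    for attempt in range(1, attempts + 1):
        try:
            return func()
        except Exception:
            if attempt == attempts:
                raise  # out of attempts: re-raise the last error
            time.sleep(wait_seconds)  # fixed pause between attempts, like wait_fixed(2)
```

For example, call_with_retries(lambda: stream.download(filename="video.mp4")) would make the same number of attempts as the decorated function.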

Function to Handle Progress Bar

We've also passed a progress callback to each YouTube object. Let's define it:

def progress_function(stream, chunk, bytes_remaining):
    total_size = stream.filesize
    bytes_downloaded = total_size - bytes_remaining
    percentage_of_completion = bytes_downloaded / total_size * 100
    print(f"Downloading... {percentage_of_completion:.2f}% complete", end="\r")

The function takes three parameters:

  • stream: Represents the video stream being downloaded.

  • chunk: The current chunk of data being downloaded.

  • bytes_remaining: The number of bytes remaining to download.

Inside the function:

  1. total_size = stream.filesize: Retrieves the total size of the file from the stream.

  2. bytes_downloaded = total_size - bytes_remaining: Calculates the number of bytes already downloaded by subtracting the remaining bytes from the total size.

  3. percentage_of_completion = bytes_downloaded / total_size * 100: Computes the percentage of completion by dividing the bytes downloaded by the total size and multiplying by 100.

Finally, the function prints the progress as a percentage. Because of end="\r", the cursor returns to the start of the line after each print, so each update overwrites the previous one and gives a real-time view of the download.
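You can exercise the callback without downloading anything by passing it a stand-in object with a filesize attribute. Here is a minimal sketch. fake_stream is a hypothetical SimpleNamespace, not a real Pytube stream, and percentage_complete is a helper factored out only so the arithmetic can be checked:

```python
from types import SimpleNamespace


def percentage_complete(total_size, bytes_remaining):
    """Same arithmetic as progress_function: percent of bytes already downloaded."""
    bytes_downloaded = total_size - bytes_remaining
    return bytes_downloaded / total_size * 100


def progress_function(stream, chunk, bytes_remaining):
    percentage_of_completion = percentage_complete(stream.filesize, bytes_remaining)
    print(f"Downloading... {percentage_of_completion:.2f}% complete", end="\r")


# Simulate the callback being invoked as chunks arrive for a fake 1,000-byte stream
fake_stream = SimpleNamespace(filesize=1000)
for remaining in (750, 500, 0):
    progress_function(fake_stream, b"", remaining)
print()  # move to a new line after the final update
```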

Main Function

Now add a main block that asks the user for the playlist URL and the desired video resolution, then calls the download_playlist function with those inputs.

if __name__ == "__main__":
    playlist_url = input("Enter the playlist url: ")
    resolutions = ["240p", "360p", "480p", "720p", "1080p", "1440p", "2160p"]
    resolution = input(f"Please select a resolution {resolutions}: ")
    download_playlist(playlist_url, resolution)
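If you'd rather pass the inputs on the command line than type them at a prompt, argparse from the standard library can also validate the resolution for you. Here is a minimal sketch. The parse_args helper and the positional argument names are illustrative choices, not part of the original script:

```python
import argparse

RESOLUTIONS = ["240p", "360p", "480p", "720p", "1080p", "1440p", "2160p"]


def parse_args(argv=None):
    """Parse the playlist URL and resolution from argv (defaults to sys.argv[1:])."""
    parser = argparse.ArgumentParser(description="Download a YouTube playlist.")
    parser.add_argument("playlist_url", help="URL of the YouTube playlist")
    # choices makes argparse reject any resolution not in the list
    parser.add_argument("resolution", choices=RESOLUTIONS, help="desired video resolution")
    return parser.parse_args(argv)
```

In the main block, you would then write args = parse_args() and call download_playlist(args.playlist_url, args.resolution). An invalid resolution is rejected with an error message before any download starts.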

Completed Code

Demo

Conclusion

In this tutorial, you learned how to use Pytube and FFmpeg to download videos from a YouTube playlist in high resolutions such as 1080p, 1440p, and even 2160p.

I hope you found the tutorial helpful. If so, don't forget to star the GitHub gist and share this tutorial with others.

The code has been tested with pytubefix 6.5.1. Because YouTube changes frequently, it may stop working properly in the future.


Support Ashutosh Krishna by becoming a sponsor. Any amount is appreciated!
