Building a YouTube to Lo-Fi Converter: A Deep Dive into Audio Processing with Python

Lo-fi music has surged in popularity over the past few years, becoming a staple for study sessions, relaxation, and background ambiance. Characterized by its warm, nostalgic sound featuring reverb effects, slower tempos, and often vinyl crackle, lo-fi creates an atmosphere that many find perfect for focus and calm.

In this article, we’ll explore how to build a YouTube-to-Lo-Fi converter application using Python and Streamlit. This application allows users to transform any YouTube video’s audio into a lo-fi track by applying reverb and slowdown effects. We’ll dive into the technical implementation, explore the audio processing techniques, and discuss the challenges faced during development.

What is Lo-Fi Music?

Lo-fi (low-fidelity) music embraces imperfections in recording and production, creating a warm, nostalgic sound. Key characteristics include:

Reverb effects: Creating a sense of space and atmosphere
Slowed-down tempo: Often 5-30% slower than the original
Vinyl crackle and noise: Adding texture and warmth
Filtered frequencies: High frequencies are typically rolled off for a warmer sound

Our application focuses on implementing the first two characteristics: reverb and slowdown effects.

Technical Architecture

The YouTube to Lo-Fi converter is built using several key technologies:

Streamlit: For creating the web interface
PyTubeFix: For downloading YouTube audio
FFmpeg: For audio processing and format conversion
Librosa: For advanced audio analysis and processing
PyDub: For audio manipulation
Matplotlib: For waveform visualization

The application follows a straightforward workflow:

User inputs a YouTube URL
Application downloads the audio using PyTubeFix
Audio processing applies reverb and optional slowdown effects
The user can preview and download the processed audio

Implementation Details

Setting Up FFmpeg

One of the core components of our application is FFmpeg, a powerful multimedia framework for handling audio and video. We need to properly configure it before using any audio processing libraries:

import os

# Set ffmpeg paths - using normpath to ensure correct path format
# (the folder and .exe names below match the bundled Windows build)
ffmpeg_dir = os.path.normpath(os.path.join(os.path.dirname(os.path.abspath(__file__)),
                              "ffmpeg", "ffmpeg-master-latest-win64-gpl", "bin"))
ffmpeg_path = os.path.normpath(os.path.join(ffmpeg_dir, "ffmpeg.exe"))
ffprobe_path = os.path.normpath(os.path.join(ffmpeg_dir, "ffprobe.exe"))

# Add ffmpeg directory to system PATH first
os.environ['PATH'] = ffmpeg_dir + os.pathsep + os.environ['PATH']

# Set environment variables for all audio libraries
os.environ['FFMPEG_BINARY'] = ffmpeg_path
os.environ['FFPROBE_BINARY'] = ffprobe_path
os.environ['LIBROSA_FFMPEG_EXECUTABLE'] = ffmpeg_path

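This setup points all our audio libraries (PyDub, Librosa, etc.) at the same FFmpeg binaries, preventing compatibility issues. Prepending the directory to PATH matters most, because libraries that shell out to FFmpeg typically look the executable up by name.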
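Before any library tries to call FFmpeg, it can help to confirm that the executables actually resolve. The following is a minimal sketch of such a check, not code from the application itself: `find_tool` is a hypothetical helper built on the standard library's `shutil.which`, and its `extra_dir` argument stands in for the bundled FFmpeg directory.

```python
import os
import shutil


def find_tool(name, extra_dir=None):
    """Return the full path of an executable, or None if it cannot be found.

    extra_dir is searched before the regular PATH, mirroring the way the
    bundled FFmpeg directory is prepended to PATH in the setup code.
    """
    search_path = os.environ.get("PATH", "")
    if extra_dir:
        search_path = extra_dir + os.pathsep + search_path
    return shutil.which(name, path=search_path)


if __name__ == "__main__":
    for tool in ("ffmpeg", "ffprobe"):
        location = find_tool(tool)
        print(f"{tool}: {location or 'not found on PATH'}")
```

Running a check like this at startup lets the app show a clear "FFmpeg not found" message instead of failing later in the middle of a conversion.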

Downloading YouTube Audio

The first step in our process is downloading audio from YouTube. We use PyTubeFix, a fork of PyTube that addresses some of the issues with YouTube’s constantly changing API:

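import os
import subprocess

from pytubefix import YouTube


def download_youtube_audio(url, output_path):
    """Download audio from a YouTube video, retrying on failure"""
    max_retries = 3
    retry_count = 0
    last_error = None

    # Clean and validate the URL
    url = url.strip()

    # Handle various YouTube URL formats
    if 'youtu.be' in url:
        video_id = url.split('/')[-1].split('?')[0]
        url = f"https://www.youtube.com/watch?v={video_id}"
    # ... other URL format handling ...

    while retry_count < max_retries:
        try:
            # Create YouTube object
            yt = YouTube(url, use_oauth=False, allow_oauth_cache=True)

            # Select the audio-only stream with the highest bitrate
            audio_streams = yt.streams.filter(only_audio=True)
            video = audio_streams.order_by('abr').desc().first()
            if video is None:
                raise RuntimeError("No audio-only stream available")

            # Download the file (typically an M4A or WebM container)
            out_file = video.download(output_path=output_path)

            # Convert to MP3. Renaming the file would only change its
            # extension, so re-encode with the FFmpeg binary configured earlier.
            base, ext = os.path.splitext(out_file)
            mp3_file = base + '.mp3'
            subprocess.run([ffmpeg_path, '-y', '-i', out_file, mp3_file],
                           check=True, stdout=subprocess.PIPE, stderr=subprocess.PIPE)
            os.remove(out_file)

            return mp3_file, yt.title

        except Exception as e:
            # Error handling and retry logic
            last_error = e
            retry_count += 1
            # ... retry logic ...

    # All attempts failed: surface the last error instead of returning None
    raise RuntimeError(f"Download failed after {max_retries} attempts") from last_error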

The function normalizes different YouTube URL formats, selects the highest-bitrate audio stream available, and retries the download when an attempt fails.
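The URL handling above is only shown for `youtu.be` links, and the rest is elided in the excerpt. As an illustration of what fuller normalization could look like, here is a sketch using the standard library's `urllib.parse`. The helper name `normalize_youtube_url` and the list of hostnames and path prefixes it accepts are assumptions, not the application's actual logic.

```python
from urllib.parse import urlparse, parse_qs


def normalize_youtube_url(url):
    """Return a canonical watch URL, or None if no video ID can be found."""
    url = url.strip()
    if "://" not in url:
        url = "https://" + url  # tolerate links pasted without a scheme
    parsed = urlparse(url)
    host = (parsed.hostname or "").lower()
    if host.startswith("www."):
        host = host[4:]
    parts = [p for p in parsed.path.split("/") if p]

    video_id = None
    if host == "youtu.be" and parts:
        video_id = parts[0]
    elif host in ("youtube.com", "m.youtube.com", "music.youtube.com"):
        if parsed.path == "/watch":
            video_id = parse_qs(parsed.query).get("v", [None])[0]
        elif len(parts) >= 2 and parts[0] in ("shorts", "embed", "live"):
            video_id = parts[1]

    if not video_id:
        return None
    return f"https://www.youtube.com/watch?v={video_id}"
```

Parsing the URL rather than checking for substrings means a link such as `https://example.com/?ref=youtu.be` is rejected instead of being mistaken for a short link.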

Applying Lo-Fi Effects

The heart of our application is the audio processing that transforms regular audio into lo-fi. We implement two main effects:

1. Slowdown Effect

The slowdown effect reduces the playback speed without changing the pitch, creating that characteristic lo-fi vibe:

import subprocess

# Calculate slowdown factor (0.5 to 1.0, where 0.5 is half speed)
slowdown_factor = 1.0 - (effects['slowdown_amount'] / 200.0)

# Use ffmpeg's atempo filter for slowdown
cmd = [
    ffmpeg_path,
    '-y',  # Overwrite output files
    '-i', temp_slowdown_path,
    '-filter:a', f"atempo={slowdown_factor}",  # Slowdown effect
    output_slowdown_temp
]

# Run the command
subprocess.run(cmd, check=True, stdout=subprocess.PIPE, stderr=subprocess.PIPE)

We use FFmpeg’s atempo filter, which allows us to adjust the tempo without affecting the pitch. The slowdown factor ranges from 0.5 (half speed) to 1.0 (original speed), so a slider value of 30 gives a factor of 0.85, or playback at 85% of the original speed.
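The mapping from slider value to tempo factor is easy to get wrong at the edges, so it can be worth isolating in a small function. The sketch below uses the same formula as the snippet above. The function names and the clamping of out-of-range input are our own additions for illustration.

```python
def slowdown_to_atempo(slowdown_amount):
    """Map a 0-100 slider value to an atempo factor between 1.0 and 0.5."""
    amount = max(0, min(100, slowdown_amount))  # clamp out-of-range input
    return 1.0 - amount / 200.0


def atempo_filter(slowdown_amount):
    """Build the value passed to -filter:a for the given slider value."""
    return f"atempo={slowdown_to_atempo(slowdown_amount):.3f}"


def stretched_duration(seconds, slowdown_amount):
    """Length of the track after slowdown: slower tempo means longer audio."""
    return seconds / slowdown_to_atempo(slowdown_amount)
```

For example, a slider value of 30 stretches a track of 170 seconds to roughly 200 seconds, because the duration is divided by the factor 0.85.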

2. Reverb Effect

The application offers two reverb implementation options:

Fast Reverb (using FFmpeg’s aecho filter):

# Fast reverb implementation using direct ffmpeg
reverb_amount = effects['reverb_amount']  # slider value, 1-10

# Use ffmpeg's aecho filter for a quick reverb-like effect.
# Arguments: in_gain:out_gain:delay (ms):decay
cmd = [
    ffmpeg_path,
    '-y',
    '-i', temp_path,
    '-af', f"aecho=0.8:0.9:{reverb_amount * 50}:0.5",  # One echo, 50-500 ms after the source
    output_temp
]

# Run the command
subprocess.run(cmd, check=True, stdout=subprocess.PIPE, stderr=subprocess.PIPE)
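A single delay produces one discrete echo. FFmpeg's aecho filter also accepts several delays and decays separated by `|`, and a few progressively quieter taps tend to sound closer to a reverb tail. The helper below is a hypothetical sketch of building such a filter string. The number of taps, their spacing, and the decay values are illustrative assumptions rather than tuned settings.

```python
def multi_tap_aecho(reverb_amount, taps=3, in_gain=0.8, out_gain=0.9):
    """Build an aecho filter string with several progressively quieter taps."""
    base_delay_ms = reverb_amount * 50  # same scaling as the single-tap version
    # Spread the taps evenly up to the base delay
    delays = [base_delay_ms * (n + 1) / taps for n in range(taps)]
    # Each later tap is quieter than the one before it
    decays = [0.5 / (n + 1) for n in range(taps)]
    delay_arg = "|".join(f"{d:.0f}" for d in delays)
    decay_arg = "|".join(f"{d:.2f}" for d in decays)
    return f"aecho={in_gain}:{out_gain}:{delay_arg}:{decay_arg}"
```

Because the command is passed to `subprocess.run` as a list, the `|` characters reach FFmpeg unchanged and need no shell quoting.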

High-Quality Reverb (using convolution):

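# High-quality reverb using librosa
import librosa
import numpy as np

y, sr = librosa.load(temp_path, sr=None)

# Create an exponentially decaying impulse response for reverb
reverb_length = int(sr * effects['reverb_amount'] / 20)
impulse_response = np.exp(-np.arange(reverb_length) / (sr * 0.5))

# Process in chunks to reduce memory usage
chunk_size = 2**18  # Process ~0.25M samples at a time
# Leave room for the reverb tail of the last chunk
y_reverb = np.zeros(len(y) + reverb_length - 1, dtype=y.dtype)

# Process audio in chunks
for i in range(0, len(y), chunk_size):
    chunk = y[i:i + chunk_size]
    # Full convolution: the tail spills into the next chunk's region,
    # so add it in rather than cutting it off at the chunk boundary
    chunk_reverb = np.convolve(chunk, impulse_response, mode='full')
    y_reverb[i:i + len(chunk_reverb)] += chunk_reverb

# Trim to the original length and match the input's peak level,
# since the unnormalized impulse response boosts the signal heavily
y_reverb = y_reverb[:len(y)]
peak = np.max(np.abs(y_reverb))
if peak > 0:
    y_reverb *= np.max(np.abs(y)) / peak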

The high-quality reverb uses convolution with an exponentially decaying impulse response, which creates a more natural reverb sound. To manage memory usage, we process the audio in chunks.
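One detail matters when convolving in chunks: the reverb tail of each chunk extends past the chunk's end and has to be added into the region of the next chunk, a technique known as overlap-add. Otherwise the tail is lost at every boundary. The sketch below shows the idea on synthetic data and compares the result with a single full convolution. The function name, chunk size, and test signal are assumptions chosen for the demonstration.

```python
import numpy as np


def overlap_add_convolve(signal, impulse_response, chunk_size=4096):
    """Convolve in chunks, carrying each chunk's tail into the next one."""
    out = np.zeros(len(signal) + len(impulse_response) - 1)
    for start in range(0, len(signal), chunk_size):
        chunk = signal[start:start + chunk_size]
        # mode="full" keeps the tail that extends past the end of the chunk
        wet = np.convolve(chunk, impulse_response, mode="full")
        out[start:start + len(wet)] += wet
    return out


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    signal = rng.standard_normal(10_000)
    ir = np.exp(-np.arange(500) / 100.0)
    ir /= ir.sum()  # normalize so the reverb does not raise the overall level
    chunked = overlap_add_convolve(signal, ir, chunk_size=1024)
    direct = np.convolve(signal, ir, mode="full")
    print("chunked result matches direct convolution:", np.allclose(chunked, direct))
```

Dividing the impulse response by its sum, as in the demo, is one simple way to keep the processed audio at roughly the same level as the input.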

User Interface with Streamlit

Streamlit provides a simple yet powerful way to create web interfaces for Python applications. Our UI is designed to be intuitive and visually appealing:

import streamlit as st


def main():
    # App title and description
    st.markdown("<h1 class='main-header'>YouTube to Lo-Fi Converter</h1>", unsafe_allow_html=True)

    # Create two columns for the main layout
    col1, col2 = st.columns([3, 2])

    with col1:
        # YouTube URL input
        st.markdown("<h2 class='sub-header'>Step 1: Enter YouTube URL</h2>", unsafe_allow_html=True)
        youtube_url = st.text_input("Paste YouTube URL here")

        # Reverb effect options
        st.markdown("<h2 class='sub-header'>Step 2: Customize Reverb Effect</h2>", unsafe_allow_html=True)
        reverb_amount = st.slider("Reverb Amount", 1, 10, 5)
        reverb_quality = st.radio("Reverb Quality", ["fast", "high-quality"], index=0)

        # Slowdown controls
        slowdown_enabled = st.checkbox("Enable Slowdown Effect", value=False)
        slowdown_amount = st.slider("Slowdown Amount", 0, 100, 30)

        # Process button
        process_button = st.button("Apply Effects")

The interface includes sliders for adjusting effect parameters, allowing users to customize the lo-fi sound to their preference.
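The processing snippets earlier read their settings from an `effects` dictionary. One way to connect the widgets to that dictionary is a small pure function, sketched below. The keys `reverb_amount` and `slowdown_amount` appear in the code above, while `reverb_quality`, `slowdown_enabled`, and the function name `build_effects` are assumptions made for this example.

```python
def build_effects(reverb_amount, reverb_quality, slowdown_enabled, slowdown_amount):
    """Collect widget values into a single settings dictionary."""
    if reverb_quality not in ("fast", "high-quality"):
        raise ValueError(f"unknown reverb quality: {reverb_quality!r}")
    return {
        "reverb_amount": int(reverb_amount),
        "reverb_quality": reverb_quality,
        "slowdown_enabled": bool(slowdown_enabled),
        # Ignore the slider entirely when the checkbox is off
        "slowdown_amount": int(slowdown_amount) if slowdown_enabled else 0,
    }
```

Keeping this step separate from the Streamlit calls makes the settings logic testable without starting the web app.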

Waveform Visualization

To provide visual feedback, we generate a waveform visualization of the processed audio:

import librosa
import librosa.display
import matplotlib.pyplot as plt


def create_waveform_plot(audio_path):
    """Create a waveform visualization of the audio"""
    # Load the audio file at its native sample rate
    y, sr = librosa.load(audio_path, sr=None)

    # Create the figure and axes
    fig, ax = plt.subplots(figsize=(10, 4))

    # Plot the waveform
    librosa.display.waveshow(y, sr=sr, ax=ax, alpha=0.6, color='#9370DB')

    # Customize the plot
    ax.set_title('Waveform Visualization', color='#483D8B')
    ax.set_xlabel('Time (s)', color='#483D8B')
    ax.set_ylabel('Amplitude', color='#483D8B')
    fig.tight_layout()

    return fig

This visualization helps users see the audio’s amplitude patterns and provides a more engaging experience.

Challenges and Solutions

Challenge 1: YouTube API Changes

YouTube frequently changes its API and page structure, which can break libraries like PyTube. To address this, we:

Used PyTubeFix, a more actively maintained fork of PyTube
Implemented a robust retry mechanism with exponential backoff
Added fallback methods for different extraction approaches
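The retry logic itself is elided in the download function shown earlier. As a generic illustration of retrying with exponential backoff, here is a minimal sketch. The helper `retry_with_backoff`, its default delays, and the injectable `sleep` parameter are assumptions for this example, not the application's code.

```python
import time


def retry_with_backoff(func, max_retries=3, base_delay=1.0, sleep=time.sleep):
    """Call func() until it succeeds, doubling the wait after each failure."""
    last_error = None
    for attempt in range(max_retries):
        try:
            return func()
        except Exception as exc:  # broad on purpose: this is only a sketch
            last_error = exc
            if attempt < max_retries - 1:
                # Wait base_delay, then 2x, then 4x, and so on
                sleep(base_delay * 2 ** attempt)
    raise RuntimeError(f"gave up after {max_retries} attempts") from last_error
```

Passing `sleep` as a parameter keeps the helper easy to test, since a test can record the requested delays instead of actually waiting.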

Challenge 2: Memory Management for Audio Processing

Audio processing, especially reverb using convolution, can be memory-intensive. Our solution:

Processed audio in chunks to reduce memory usage
Offered a “fast” reverb option using FFmpeg’s filters for less resource-intensive processing
Cleaned up temporary files immediately after use

Challenge 3: FFmpeg Integration

Ensuring consistent FFmpeg usage across different libraries was challenging. We solved this by:

Bundling FFmpeg with the application
Setting environment variables to ensure all libraries use the same FFmpeg binaries
Adding robust error handling for FFmpeg operations
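For the error handling around FFmpeg calls, a thin wrapper around `subprocess.run` can turn a failed command into a readable message. The following is a sketch under our own assumptions, with `run_tool` as a hypothetical name. It relies on the fact that FFmpeg writes its diagnostics to stderr.

```python
import subprocess


def run_tool(cmd):
    """Run an external command and raise a readable error if it fails."""
    try:
        result = subprocess.run(cmd, check=True, capture_output=True,
                                text=True, errors="replace")
    except FileNotFoundError as exc:
        raise RuntimeError(f"executable not found: {cmd[0]}") from exc
    except subprocess.CalledProcessError as exc:
        # Keep only the last few lines of stderr, where the cause usually is
        tail = "\n".join((exc.stderr or "").strip().splitlines()[-5:])
        raise RuntimeError(
            f"command failed with exit code {exc.returncode}:\n{tail}"
        ) from exc
    return result.stdout
```

A call such as `run_tool([ffmpeg_path, '-y', '-i', source, target])` would then either return normally or raise a `RuntimeError` whose message can be shown to the user, for example with `st.error`.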

Future Enhancements

The YouTube to Lo-Fi converter could be enhanced in several ways:

Additional Effects: Adding vinyl crackle, frequency filtering, and beat manipulation
Batch Processing: Allowing users to process multiple YouTube videos at once
Preset Profiles: Creating preset lo-fi styles (e.g., “Study Beats”, “Chill Vibes”)
Audio Visualization: Adding more advanced visualizations like spectrograms
Cloud Deployment: Hosting the application on a cloud platform for wider accessibility

Conclusion

Building a YouTube to Lo-Fi converter demonstrates the power of Python’s audio processing capabilities. By combining libraries like PyDub, Librosa, and FFmpeg with Streamlit’s intuitive interface, we’ve created an application that makes audio transformation accessible to everyone.

The project showcases several important concepts in audio processing:

Digital signal processing techniques (convolution, filtering)
Audio format conversion and manipulation
Memory-efficient processing of large audio files
Creating intuitive user interfaces for complex operations

Whether you’re a music producer looking to experiment with lo-fi sounds or a developer interested in audio processing, this project provides a solid foundation for understanding and implementing audio effects in Python.

Here’s how the Streamlit app looks.

Also, note that you will need the FFmpeg binaries on your local system to run this. They are not included in the GitHub repository because they exceed 100 MB in size. Binaries can be downloaded from here.

Note: The complete source code for this project is available on GitHub. Feel free to fork, contribute, or use it as a starting point for your own audio processing applications.

About Lahaul Seth
