Machine Learning (audio based)

BETA

Machine Learning with audio

Standalone Audio ML Application

The Standalone Audio ML Application chains three machine learning stages into a single audio processing pipeline: Automatic Speech Recognition (ASR), Natural Language Processing (NLP) for sentiment analysis, and Text-to-Speech (TTS). Spoken input is transcribed, analyzed, and answered with synthesized speech, giving an interactive audio experience.

This Python-based application supports both microphone recording and pre-recorded audio file processing. It performs a multi-stage analysis:

  • Transcription: Uses Wav2Vec2 ASR for accurate speech-to-text conversion.
  • Sentiment Analysis: Applies a fine-tuned DistilBERT model to determine the sentiment of the text.
  • Response Generation: Based on sentiment analysis, the application generates an appropriate response.
  • Speech Synthesis: Converts the response text into natural-sounding speech with Tacotron2 TTS.
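
The four stages above can be wired together roughly as in the following minimal sketch. It is not the application's actual code: `run_pipeline`, `generate_response`, `PipelineResult`, and the texts in `RESPONSES` are hypothetical names and values chosen for illustration. The model-backed stages are passed in as plain functions, so wrappers around Wav2Vec2, DistilBERT, and Tacotron2 could be plugged in without changing the control flow.

```python
from dataclasses import dataclass
from typing import Callable

# Hypothetical canned replies keyed by sentiment label (an assumption;
# the application's real response texts are not documented here).
RESPONSES = {
    "positive": "Glad to hear that!",
    "negative": "Sorry to hear that. How can I help?",
    "neutral": "Thanks for sharing.",
}


def generate_response(sentiment: str) -> str:
    """Pick a reply for a sentiment label, falling back to the neutral one."""
    return RESPONSES.get(sentiment.lower(), RESPONSES["neutral"])


@dataclass
class PipelineResult:
    transcript: str
    sentiment: str
    response: str
    output_path: str


def run_pipeline(
    audio_path: str,
    transcribe: Callable[[str], str],        # ASR stage, e.g. a Wav2Vec2 wrapper
    classify: Callable[[str], str],          # sentiment stage, e.g. a DistilBERT wrapper
    synthesize: Callable[[str, str], None],  # TTS stage: (text, wav_path)
    output_path: str = "response.wav",
) -> PipelineResult:
    """Run transcription, sentiment analysis, response generation, and TTS in order."""
    transcript = transcribe(audio_path)
    sentiment = classify(transcript)
    response = generate_response(sentiment)
    synthesize(response, output_path)
    return PipelineResult(transcript, sentiment, response, output_path)
```

Keeping the stages as injected callables also makes the flow easy to exercise with stub functions before any model has been downloaded.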

A user-friendly graphical interface, built with Tkinter, allows users to interact with the application without needing to use the command line.


Key Features:

  • Automatic Speech Recognition: Transcribes speech with the Wav2Vec2 model, which takes audio at a 16 kHz sample rate.
  • Sentiment Analysis: Analyzes text sentiment using DistilBERT, categorizing it as positive, negative, or neutral.
  • Text-to-Speech: Converts generated responses to speech, producing clear .wav files.
  • Audio Acquisition: Supports direct microphone recording or loading pre-recorded audio files.
  • Graphical User Interface: Built with Tkinter for easy interaction, including buttons for recording and file processing.
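
Because the ASR stage works on 16 kHz audio, it can help to check a file's sample rate before transcription. The sketch below uses only Python's standard `wave` module, which reads uncompressed PCM .wav files, and is an illustration rather than the application's own code; `WavInfo`, `wav_info`, `is_asr_ready`, and `TARGET_SAMPLE_RATE` are hypothetical names. A file at a different rate would need resampling first, for example with librosa.

```python
import wave
from typing import NamedTuple

TARGET_SAMPLE_RATE = 16_000  # Hz, the rate stated for the Wav2Vec2 stage


class WavInfo(NamedTuple):
    sample_rate: int
    channels: int
    duration_s: float


def wav_info(path: str) -> WavInfo:
    """Read sample rate, channel count, and duration from a PCM .wav header."""
    with wave.open(path, "rb") as wav:
        rate = wav.getframerate()
        return WavInfo(rate, wav.getnchannels(), wav.getnframes() / rate)


def is_asr_ready(path: str) -> bool:
    """Return True if the file is already at the target sample rate."""
    return wav_info(path).sample_rate == TARGET_SAMPLE_RATE
```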

How to Use:

  • Clone the Repository:

    git clone https://github.com/yourusername/standalone-audio-ml-app.git

    cd standalone-audio-ml-app

  • Install Dependencies:

    pip install -r requirements.txt

Run the Application:

  • Command-line: Record audio or process existing files:

    python main.py --record

    python main.py --file path/to/audio/file

  • GUI: Launch the GUI application:

    python gui_app.py
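
The two command-line modes listed above could be parsed as in this sketch. How `main.py` actually handles its arguments is not documented here, so `build_parser` and the choice to make the two flags mutually exclusive are assumptions; only the `--record` and `--file` flags come from the commands above.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Build a parser that accepts exactly one of --record or --file."""
    parser = argparse.ArgumentParser(description="Standalone Audio ML Application")
    # Assumption: the two input sources are alternatives, so one is required
    # and they cannot be combined.
    source = parser.add_mutually_exclusive_group(required=True)
    source.add_argument("--record", action="store_true",
                        help="record audio from the microphone")
    source.add_argument("--file", metavar="PATH",
                        help="process a pre-recorded audio file")
    return parser
```

Calling `build_parser().parse_args(["--file", "clip.wav"])` yields a namespace whose `file` attribute is `"clip.wav"` and whose `record` attribute is `False`; the caller can then branch on `args.record` to choose between recording and file processing.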


Dependencies:

  • transformers (Hugging Face models)
  • torch (Deep learning)
  • librosa (Audio processing)
  • TTS (Text-to-Speech synthesis)
  • sounddevice and soundfile (Audio recording and saving)
  • tkinter (GUI development; part of the Python standard library rather than a pip package)
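
Before launching the application, it can be useful to confirm that these packages are importable. The following sketch is one way to do that with the standard library. `missing_modules` and `REQUIRED_MODULES` are hypothetical names, and the list assumes that each dependency's import name matches the name given above.

```python
import importlib.util

# Assumed import names for the dependencies listed above.
REQUIRED_MODULES = [
    "transformers", "torch", "librosa", "TTS",
    "sounddevice", "soundfile", "tkinter",
]


def missing_modules(names=REQUIRED_MODULES):
    """Return the module names that cannot be located for import."""
    # find_spec looks a top-level module up without importing it.
    return [name for name in names if importlib.util.find_spec(name) is None]
```

An empty list from `missing_modules()` means every listed module was found; any names returned point to packages that still need installing.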

License:

MIT License


Contributing:

Feel free to contribute! Fork the repository, create a branch for your changes, and submit a pull request. Please follow the project's style guide and include tests where applicable.

About

A Python app combining ASR, NLP, and TTS for audio processing

Date

17 Nov 2024

Last revision

22 Nov 2024

Languages

PyTorch, Python, JavaScript

Tags: PyTorch, Python, Machine Learning, TTS, ASR, NLP, Open Source, Wav2Vec2, DistilBERT, Tacotron2, Sentiment Analysis