Machine Learning (audio based)

BETA

Standalone Audio ML Application

The Standalone Audio ML Application combines three machine learning models into an end-to-end audio processing pipeline. It integrates Automatic Speech Recognition (ASR), Natural Language Processing (NLP), and Text-to-Speech (TTS), so that spoken input receives a spoken reply.

This Python-based application accepts audio either recorded from the microphone or loaded from a pre-recorded file. It processes the audio in four stages:

Transcription: Converts speech to text with the Wav2Vec2 ASR model.
Sentiment Analysis: Applies a fine-tuned DistilBERT model to determine the sentiment of the transcribed text.
Response Generation: Generates a response chosen according to the detected sentiment.
Speech Synthesis: Converts the response text into speech with Tacotron2 TTS.
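
The four stages above can be sketched as a small pipeline with pluggable stages. This is a minimal illustration, not the application's actual code: the names run_pipeline, generate_response, PipelineResult, and RESPONSES are hypothetical, the reply texts are invented placeholders, and the three model stages are passed in as plain callables so that the Wav2Vec2, DistilBERT, and Tacotron2 wrappers can be supplied separately.

```python
from dataclasses import dataclass
from typing import Callable

# Hypothetical canned replies keyed by sentiment label. The wording the
# real application uses is not documented here.
RESPONSES = {
    "positive": "Glad to hear that! How else can I help?",
    "negative": "I'm sorry to hear that. Is there anything I can do?",
    "neutral": "Thanks for sharing. Tell me more.",
}


@dataclass
class PipelineResult:
    transcript: str
    sentiment: str
    response: str
    audio_path: str


def generate_response(sentiment: str) -> str:
    """Pick a reply for a sentiment label, falling back to the neutral one."""
    return RESPONSES.get(sentiment.lower(), RESPONSES["neutral"])


def run_pipeline(
    audio_path: str,
    transcribe: Callable[[str], str],
    classify: Callable[[str], str],
    synthesize: Callable[[str, str], None],
    out_path: str = "response.wav",
) -> PipelineResult:
    """Run transcription -> sentiment -> response -> speech synthesis."""
    transcript = transcribe(audio_path)      # ASR stage (e.g. Wav2Vec2)
    sentiment = classify(transcript)         # NLP stage (e.g. DistilBERT)
    response = generate_response(sentiment)  # template lookup by label
    synthesize(response, out_path)           # TTS stage (e.g. Tacotron2)
    return PipelineResult(transcript, sentiment, response, out_path)
```

Passing the stages as arguments keeps the control flow testable with simple stand-in functions, without loading any model.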

A graphical interface, built with Tkinter, lets users run the application without using the command line.

Key Features:

Automatic Speech Recognition: Transcribes speech with the Wav2Vec2 model, which works on audio sampled at 16 kHz.
Sentiment Analysis: Analyzes text sentiment using DistilBERT, categorizing it as positive, negative, or neutral.
Text-to-Speech: Converts generated responses to speech and saves the result as .wav files.
Audio Acquisition: Supports direct microphone recording or loading pre-recorded audio files.
Graphical User Interface: Built with Tkinter for easy interaction, including buttons for recording and file processing.

How to Use:

Clone the Repository:

git clone https://github.com/yourusername/standalone-audio-ml-app.git
cd standalone-audio-ml-app

Install Dependencies:

pip install -r requirements.txt

Run the Application:

Command-line: Record audio or process an existing file:

python main.py --record
python main.py --file path/to/audio/file

GUI: Launch the GUI application:

python gui_app.py
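
For reference, the two command-line flags shown above could be parsed with argparse roughly as follows. This is a sketch and not the contents of main.py: the function name build_parser is hypothetical, and treating --record and --file as mutually exclusive, with exactly one required, is an assumption.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Build a parser for the --record and --file options."""
    parser = argparse.ArgumentParser(
        description="Standalone Audio ML Application"
    )
    # Assumption: exactly one audio source must be chosen per run.
    source = parser.add_mutually_exclusive_group(required=True)
    source.add_argument(
        "--record",
        action="store_true",
        help="record audio from the microphone",
    )
    source.add_argument(
        "--file",
        metavar="PATH",
        help="process a pre-recorded audio file",
    )
    return parser
```

With this parser, build_parser().parse_args(["--file", "clip.wav"]) yields a namespace where file is "clip.wav" and record is False.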

Dependencies:

transformers (Hugging Face models)
torch (Deep learning)
librosa (Audio processing)
TTS (Text-to-Speech synthesis)
sounddevice and soundfile (Audio recording and saving)
tkinter (GUI development; part of the Python standard library, not installed with pip)
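
A quick way to confirm that the packages listed above are importable is to look them up with importlib. This is an optional sketch: the helper name missing_modules is hypothetical, and the module names are assumed to match the names in the list, since the contents of requirements.txt are not shown here.

```python
import importlib.util

# Import names assumed to match the dependency list above.
REQUIRED_MODULES = (
    "transformers",
    "torch",
    "librosa",
    "TTS",
    "sounddevice",
    "soundfile",
    "tkinter",
)


def missing_modules(names=REQUIRED_MODULES):
    """Return the module names that cannot be found in this environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_modules()
    print("Missing:", ", ".join(missing) if missing else "none")
```

The check only looks the modules up and does not import them, so it runs quickly even when the deep learning packages are installed.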

License:

MIT License

Contributing:

Feel free to contribute! Fork the repository, create a branch for your changes, and submit a pull request. Please follow the project's style guide and include tests where applicable.

About

A Python app combining ASR, NLP, and TTS for audio processing

Date

17/nov/2024

Last revision

22/nov/2024

Languages

PyTorch, Python, JavaScript

PyTorch

Python

Machine Learning

TTS

ASR

NLP

Open Source

Wav2Vec2

DistilBERT

Tacotron2

Sentiment Analysis