Voice Detection on ESP-32

Discussion in 'UDOO KEY' started by bcerjan, Jul 16, 2023.

  1. bcerjan (New Member)

    I've been building an offline voice timer that uses the ESP32 for voice recognition and the RP2040 for everything else (a small display, sounds, ...). I've hit a wall with the speech recognition part, and I was wondering if anyone has suggestions or knows of a better-performing method for speech detection.

    Knowing very little about machine learning, I followed the Google Colab documentation for training 'tflite' models for microcontrollers, but adjusted it to detect the words marvin (as a wake word), stop, and the digits 0-9. I've experimented with the settings and implemented methods from a few papers (e.g. this one) to try to improve performance, but I'm typically limited to ~85% accuracy, and that is on a reduced set of words (marvin, stop, 0-3 and 5). When I modify the ESP32 example code (from the Espressif repo) to use my model, I can see that it is now trying to detect the correct words, but with very low confidence (typically ~130 on the scale it uses), and I am fairly certain it would never "work" in any real sense.
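
To make the confidence problem concrete, here is a rough Python sketch of the kind of post-processing I understand the example to do: average the per-frame scores over a short window and only report a word once the average crosses a threshold. The 0-255 score scale, the threshold of 200, the window of 3 frames and the function name `smoothed_detections` are all assumptions for illustration, not values read from the Espressif code, so check them against your copy of the example.

```python
from collections import deque


def smoothed_detections(frames, labels, threshold=200, window=3):
    """Yield (frame_index, label, average_score) for each confident detection.

    frames: iterable of per-frame score lists, one score per label.
    threshold and window are illustrative assumptions, not library values.
    """
    history = deque(maxlen=window)  # keeps only the most recent frames
    for i, scores in enumerate(frames):
        history.append(scores)
        # Average each label's score over the frames currently in the window.
        averages = [sum(col) / len(history) for col in zip(*history)]
        best = max(range(len(labels)), key=lambda k: averages[k])
        if averages[best] >= threshold:
            yield i, labels[best], averages[best]


labels = ["marvin", "stop", "silence"]

# Scores hovering around 130 never cross the assumed threshold.
weak = [[130, 60, 65]] * 5
print(list(smoothed_detections(weak, labels)))

# A run of strong frames does, once the weak frame has left the window.
strong = [[130, 60, 65], [220, 20, 15], [230, 10, 15], [240, 5, 10]]
print(list(smoothed_detections(strong, labels)))
```

If the real threshold is in this range, scores around 130 would never trigger a detection no matter how long the window is, which matches what I'm seeing.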

    Does anyone have any suggestions about how to improve accuracy? I also tried using ESP-Skainet, but I couldn't get it to recognize any commands (though I am not sure it was receiving audio correctly).
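
One way I could check whether audio is actually arriving is to dump a short buffer of raw samples from the board (e.g. over serial) and look at the levels on a PC. Below is a minimal sketch of that check. It assumes the buffer is signed 16-bit little-endian mono PCM, which depends on the microphone and I2S configuration, and `pcm16_levels` is just a hypothetical helper name.

```python
import math
import struct


def pcm16_levels(raw: bytes):
    """Return (rms, peak) for signed 16-bit little-endian mono PCM bytes.

    The sample format is an assumption; adjust it to match the I2S setup.
    """
    count = len(raw) // 2  # two bytes per sample; ignore a trailing odd byte
    if count == 0:
        return 0.0, 0
    samples = struct.unpack("<%dh" % count, raw[: count * 2])
    rms = math.sqrt(sum(s * s for s in samples) / count)
    peak = max(abs(s) for s in samples)
    return rms, peak


# An all-zero buffer suggests the microphone data never reaches the code.
silent = struct.pack("<4h", 0, 0, 0, 0)
print(pcm16_levels(silent))

# A buffer with real signal shows clearly non-zero levels.
tone = struct.pack("<4h", 1000, -1000, 1000, -1000)
print(pcm16_levels(tone))
```

An RMS that stays at or near zero while speaking into the microphone would point to a wiring or driver problem rather than a recognition problem.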
     
  2. Lhimo (New Member)

    might be due to this?
     
