04 Apr, 2022

ESP32-S3-BOX: AI Voice Development Kit with 16MB QSPI flash and 8MB Octal PSRAM designed for AIoT Applications

Espressif’s AI voice development kit, ESP32-S3-BOX, provides a platform for controlling smart devices with offline and online voice assistants. It is well suited to developing AIoT applications with reconfigurable AI voice functions, such as smart speakers and IoT devices that support direct human-computer voice interaction. ESP32-S3-BOX can serve as the “control center” for a user’s surroundings, connecting multiple smart devices that are then operated either with voice commands or through the device’s touch screen. It also supports Espressif’s AI image processing, Wi-Fi human-body detection, and wireless image transmission, which can be useful in an office environment, for example in running a reception area or conference rooms.

This AI voice development kit is based on Espressif’s ESP32-S3 Wi-Fi + Bluetooth 5 (LE) SoC, which provides 512KB of SRAM, and comes with 16MB of QSPI flash and 8MB of Octal PSRAM. ESP32-S3-BOX is equipped with a variety of peripherals, including a 2.4-inch display with 320x240 resolution, a capacitive touch screen, dual microphones, a speaker, and two Pmod-compatible headers for hardware expansion. Its USB Type-C connector provides 5 V power input and also serves as the interface for programming and for serial and JTAG debugging.
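
To put the display and memory figures above in perspective, the short Python sketch below estimates how much memory one full-screen frame buffer would occupy. The 16-bit (RGB565) color depth is an assumption made for illustration; the kit's actual pixel format is not stated here, and the constant and function names are hypothetical.

```python
# Figures quoted in the article.
DISPLAY_WIDTH = 320                # pixels
DISPLAY_HEIGHT = 240               # pixels
SRAM_BYTES = 512 * 1024            # ESP32-S3 internal SRAM
PSRAM_BYTES = 8 * 1024 * 1024      # external Octal PSRAM

# ASSUMPTION (not from the article): 16 bits per pixel, i.e. RGB565.
BYTES_PER_PIXEL = 2


def framebuffer_bytes(width: int, height: int, bytes_per_pixel: int) -> int:
    """Return the size in bytes of one full-screen frame buffer."""
    return width * height * bytes_per_pixel


frame = framebuffer_bytes(DISPLAY_WIDTH, DISPLAY_HEIGHT, BYTES_PER_PIXEL)

print(f"One full frame: {frame} bytes ({frame / 1024:.0f} KiB)")
print(f"Share of internal SRAM: {frame / SRAM_BYTES:.1%}")
print(f"Full frames that fit in PSRAM: {PSRAM_BYTES // frame}")
```

Under that assumption a single frame takes a sizeable share of the internal SRAM, which suggests why the external PSRAM matters for graphics-heavy interfaces. Real firmware also needs memory for audio buffers, networking, and application code, so these numbers are only a rough upper bound on what is available for graphics.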

ESP32-S3-BOX runs Espressif’s own audio front-end (AFE) algorithm, ESP-Skainet, and the Alexa-for-IoT SDK, which together provide offline and online voice functions. It can also run the LVGL-based HMI solution and SDKs such as ESP-DL and ESP-ADF. Espressif’s complete AIoT platform, ESP RainMaker, can be used with ESP32-S3-BOX as well, for configuring GPIOs and offline commands and for control via phone apps and/or a voice assistant.