Text-to-speech with Yandex SpeechKit API v3: Synthesizer (gRPC)

24.02.2024

A set of methods for voice synthesis.

Call	Description
UtteranceSynthesis	Synthesizes text into speech.

UtteranceSynthesis

Synthesizes text into speech.

rpc UtteranceSynthesis (UtteranceSynthesisRequest) returns (stream UtteranceSynthesisResponse)

UtteranceSynthesisRequest

Field	Description
model	string The name of the model. Specifies the basic synthesis functionality. Currently it must be empty; do not use it.
Utterance	oneof: `text` or `text_template` Text to synthesize, given in one of the supported text markups.
text	string Raw text (e.g. "Hello, Alice").
text_template	TextTemplate Text template instance, e.g. `{"Hello, {username}" with username="Alice"}`.
hints[]	Hints Optional. Speech synthesis settings.
output_audio_spec	AudioFormatOptions Optional. Default: 22050 Hz, linear 16-bit signed little-endian PCM, with a WAV header.
loudness_normalization_type	enum LoudnessNormalizationType Optional. Specifies the type of loudness normalization. Default: `LUFS`. `MAX_PEAK`: the gain is changed to bring the highest PCM sample value or analog signal peak to a given level. `LUFS`: normalization based on the EBU R 128 recommendation.
unsafe_mode	bool Optional. Automatically split long text into several utterances and bill accordingly. Some degradation in service quality is possible.
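The field rules above can be sketched as a small request builder. This is only an illustration under stated assumptions: it uses a plain Python dict rather than the generated protobuf classes, `build_request` is a hypothetical helper, and `example_voice` is a placeholder, not a real voice name.

```python
import json


def build_request(text=None, text_template=None, hints=None,
                  output_audio_spec=None,
                  loudness_normalization_type=None, unsafe_mode=False):
    """Return a dict that mirrors UtteranceSynthesisRequest (illustrative only)."""
    # The Utterance oneof allows exactly one of text / text_template.
    if (text is None) == (text_template is None):
        raise ValueError("set exactly one of text or text_template")
    request = {}
    if text is not None:
        request["text"] = text
    else:
        request["text_template"] = text_template
    if hints:
        request["hints"] = list(hints)
    if output_audio_spec is not None:
        request["output_audio_spec"] = output_audio_spec
    if loudness_normalization_type is not None:
        request["loudness_normalization_type"] = loudness_normalization_type
    if unsafe_mode:
        request["unsafe_mode"] = True
    # 'model' is deliberately never set: the table says it must stay empty.
    return request


request = build_request(
    text="Hello, Alice",
    # Each Hints entry carries one member of the Hint oneof.
    hints=[{"voice": "example_voice"}, {"speed": 1.0}],  # placeholder voice
    output_audio_spec={"container_audio": {"container_audio_type": "WAV"}},
)
print(json.dumps(request, indent=2))
```

Leaving optional fields out of the dict, rather than filling in the defaults, lets the service apply its own defaults.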

TextTemplate

Field	Description
text_template	string Template text. Sample:`The {animal} goes to the {place}.`
variables[]	TextVariable Defining variables in template text. Sample: `{animal: cat, place: forest}`

TextVariable

Field	Description
variable_name	string The name of the variable.
variable_value	string The text of the variable.
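To preview what a TextTemplate and its TextVariable entries describe, the substitution can be imitated locally. This is a sketch only: the service performs the real substitution, `render_template` is a hypothetical helper, and the assumption that placeholders are `{name}` with word characters comes from the samples above rather than from a stated grammar.

```python
import re


def render_template(text_template, variables):
    """Substitute {name} placeholders locally to preview a TextTemplate."""
    values = {v["variable_name"]: v["variable_value"] for v in variables}

    def replace(match):
        name = match.group(1)
        if name not in values:
            raise KeyError(f"no TextVariable named {name!r}")
        return values[name]

    return re.sub(r"\{(\w+)\}", replace, text_template)


template = {
    "text_template": "The {animal} goes to the {place}.",
    "variables": [
        {"variable_name": "animal", "variable_value": "cat"},
        {"variable_name": "place", "variable_value": "forest"},
    ],
}
print(render_template(template["text_template"], template["variables"]))
# prints: The cat goes to the forest.
```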

Hints

Field	Description
Hint	oneof: `voice`, `audio_template`, `speed`, `volume`, `role`, `pitch_shift` or `duration` The hint for TTS engine to specify synthesised audio characteristics.
voice	string Name of speaker to use.
audio_template	AudioTemplate Template for synthesizing.
speed	double Hint to change speed.
volume	double Hint to regulate normalization level. For `MAX_PEAK` loudness_normalization_type: volume changes in a range (0;1], default value is 0.7. For `LUFS` loudness_normalization_type: volume changes in a range [-145;0), default value is -19.
role	string Hint to specify pronunciation character for the speaker.
pitch_shift	double Hint to increase (or decrease) speaker’s pitch, measured in Hz. Valid values are in range [-1000;1000], default value is 0.
duration	DurationHint Hint to limit both minimum and maximum audio duration.
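The numeric ranges in the table can be checked on the client before a request is sent. A minimal sketch, assuming hypothetical helper names (`check_volume`, `check_pitch_shift`); only the ranges and defaults are taken from the table.

```python
# Default volume per loudness_normalization_type, as listed in the table.
DEFAULT_VOLUME = {"MAX_PEAK": 0.7, "LUFS": -19}


def check_volume(volume, loudness_normalization_type="LUFS"):
    """Return True if volume is inside the range for the normalization type."""
    if loudness_normalization_type == "MAX_PEAK":
        return 0 < volume <= 1          # range (0;1]
    if loudness_normalization_type == "LUFS":
        return -145 <= volume < 0       # range [-145;0)
    raise ValueError(f"unknown type: {loudness_normalization_type!r}")


def check_pitch_shift(pitch_shift_hz):
    """Return True if pitch_shift is inside [-1000;1000] Hz."""
    return -1000 <= pitch_shift_hz <= 1000


print(check_volume(0.7, "MAX_PEAK"))   # True
print(check_volume(0, "LUFS"))         # False: 0 is excluded from [-145;0)
print(check_pitch_shift(-1000))        # True
```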

AudioTemplate

Field	Description
audio	AudioContent Audio file.
text_template	TextTemplate Template and description of its variables.
variables[]	AudioVariable Describing variables in audio.

AudioContent

Field	Description
AudioSource	oneof: `content` The audio source to read the data from.
content	bytes Bytes with audio data.
audio_spec	AudioFormatOptions Description of the audio format.

AudioVariable

Field	Description
variable_name	string The name of the variable.
variable_start_ms	int64 Start time of the variable in milliseconds.
variable_length_ms	int64 Length of the variable in milliseconds.

DurationHint

Field	Description
policy	enum DurationHintPolicy Type of duration constraint. `EXACT_DURATION`: Limit audio duration to exact value. `MIN_DURATION`: Limit the minimum audio duration. `MAX_DURATION`: Limit the maximum audio duration.
duration_ms	int64 Constraint on audio duration in milliseconds.
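One way to read the three DurationHintPolicy values is as a predicate on the length of the resulting audio. The sketch below is an interpretation, not the service's implementation: `satisfies_duration_hint` is a hypothetical helper, and any tolerance the service may apply to `EXACT_DURATION` is not described in the source.

```python
def satisfies_duration_hint(policy, duration_ms, actual_ms):
    """Return True if an audio length of actual_ms meets the constraint."""
    if policy == "EXACT_DURATION":
        return actual_ms == duration_ms
    if policy == "MIN_DURATION":
        return actual_ms >= duration_ms
    if policy == "MAX_DURATION":
        return actual_ms <= duration_ms
    raise ValueError(f"unknown policy: {policy!r}")


# A 3000 ms clip checked against a 2500 ms constraint under each policy.
for policy in ("EXACT_DURATION", "MIN_DURATION", "MAX_DURATION"):
    print(policy, satisfies_duration_hint(policy, 2500, 3000))
```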

AudioFormatOptions

Field	Description
AudioFormat	oneof: `raw_audio` or `container_audio`
raw_audio	RawAudio The audio format specified in request parameters.
container_audio	ContainerAudio The audio format specified inside the container metadata.

RawAudio

Field	Description
audio_encoding	enum AudioEncoding Encoding type. `LINEAR16_PCM`: Audio bit depth 16-bit signed little-endian (Linear PCM).
sample_rate_hertz	int64 Sampling frequency of the signal.
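Audio requested as `raw_audio` arrives without a header, so a player needs the encoding and sample rate supplied separately. A minimal sketch of wrapping such bytes in a WAV container with Python's standard `wave` module; `pcm_to_wav` is a hypothetical helper, and the single channel is an assumption, since the tables above do not state a channel count.

```python
import io
import wave


def pcm_to_wav(pcm_bytes, sample_rate_hertz, channels=1):
    """Wrap headerless LINEAR16_PCM bytes in a WAV container."""
    buffer = io.BytesIO()
    with wave.open(buffer, "wb") as wav:
        wav.setnchannels(channels)          # assumption: mono
        wav.setsampwidth(2)                 # 16-bit samples = 2 bytes
        wav.setframerate(sample_rate_hertz)
        wav.writeframes(pcm_bytes)
    return buffer.getvalue()


# One second of silence at 22050 Hz stands in for synthesized audio.
silence = b"\x00\x00" * 22050
wav_bytes = pcm_to_wav(silence, 22050)

# Read the result back to confirm the header describes the data.
with wave.open(io.BytesIO(wav_bytes), "rb") as wav:
    print(wav.getframerate(), wav.getsampwidth(), wav.getnframes())
```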

ContainerAudio

Field	Description
container_audio_type	enum ContainerAudioType `WAV`: Audio bit depth 16-bit signed little-endian (Linear PCM). `OGG_OPUS`: Data is encoded using the OPUS audio codec and compressed using the OGG container format. `MP3`: Data is encoded using MPEG-1/2 Layer III and compressed using the MP3 container format.

UtteranceSynthesisResponse

Field	Description
audio_chunk	AudioChunk Part of synthesized audio.
text_chunk	TextChunk Part of synthesized text.
start_ms	int64 Start time of the audio chunk in milliseconds.
length_ms	int64 Length of the audio chunk in milliseconds.

AudioChunk

Field	Description
data	bytes Sequence of bytes of the synthesized audio in format specified in output_audio_spec.

TextChunk

Field	Description
text	string Synthesized text.
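Because the RPC returns a stream of UtteranceSynthesisResponse messages, a client typically accumulates `audio_chunk.data` as the messages arrive. The sketch below uses stand-in dataclasses instead of the generated protobuf classes, and a plain iterator in place of a real gRPC response stream; `collect_audio` is a hypothetical helper, and it assumes the chunks arrive in order so that concatenation yields the full audio.

```python
from dataclasses import dataclass


@dataclass
class AudioChunk:
    """Stand-in for the AudioChunk message."""
    data: bytes


@dataclass
class UtteranceSynthesisResponse:
    """Stand-in for the response message (text_chunk omitted for brevity)."""
    audio_chunk: AudioChunk
    start_ms: int = 0
    length_ms: int = 0


def collect_audio(responses):
    """Concatenate chunk bytes and track where the last chunk ends, in ms."""
    audio = bytearray()
    end_ms = 0
    for response in responses:
        audio.extend(response.audio_chunk.data)
        end_ms = max(end_ms, response.start_ms + response.length_ms)
    return bytes(audio), end_ms


# A fake stream of two chunks; a real client would iterate the gRPC call.
fake_stream = iter([
    UtteranceSynthesisResponse(AudioChunk(b"ab"), start_ms=0, length_ms=100),
    UtteranceSynthesisResponse(AudioChunk(b"cd"), start_ms=100, length_ms=200),
])
audio, end_ms = collect_audio(fake_stream)
print(len(audio), end_ms)
```

Accumulating into a `bytearray` avoids repeated copying when the stream contains many small chunks.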
