Commit

deploy: 9e4cc57
lepisma committed May 9, 2024
1 parent 747c113 commit d7accf4
Showing 67 changed files with 69 additions and 69 deletions.
2 changes: 1 addition & 1 deletion 404/index.html
@@ -291,7 +291,7 @@
"id": 37,
"url": "/speech-conversational-llms/",
"title": "Speech LLMs for Conversations",
"body": "2024/05/09 - With LLMs making conversational systems has become easier. You no longer need tofocus on the low-level details of categorizing semantics and designingresponses. Instead, you can concentrate on controlling high-level behaviors viaan LLM. This is the trend that we see most of the world moving towards asproducts are using vendor combinations of ASR, LLM, and TTS with some dialogmanagement stitched in between. While this is going to be the norm soon, we wantto keep exploring areas from where the next set of quality improvements willcome. Earlier we discussed how spokenconversations are richer than pure text and how the gap would be not bridged byLLMs purely working on transcriptions. In one of our recent experiments we buildan efficient multi-modal LLM that takes speech directly to provide betterconversational experience. For production usage, the constraint here is thatthis should happen without losing the flexibility that you get in a text-onlyLLM around writing prompts, making changes, evaluating, and debugging. Below is a conversation with our recent in-house Speech LLM based conversationalsystem. Notice that because of the extra information in speech some micropersonalizations can happen like usage of gendered pronouns1. You also getlower impact of transcription errors and in general better responses innon-speech signals. With access to both speech and text domains, the modelallows for more fluent turn-taking, though not demonstrated in the currentconversation. In addition, our approach also reduces the combined model size(<2B) for taking speech to response, leading to lower compute latency ascompared to larger systems. The model above doesn’t yet control speech synthesis beyond the textual markersit can generate, but that’s something to be added soon (you might have noticederratic pitch shifts in the call above since TTS vendors don’t contextualizebased on past conversations). Stay tuned for more details on how we take thisand similar research areas forward. Of course concerns around paralinguistic prediction accuracies areextremely important to take something like this in production.  ↩ "
"body": "2024/05/09 - With LLMs making conversational systems has become easier. You no longer need tofocus on the low-level details of categorizing semantics and designingresponses. Instead, you can concentrate on controlling high-level behaviors viaan LLM. This is the trend that we see most of the world moving towards asproducts are using vendor combinations of ASR, LLM, and TTS with some dialogmanagement stitched in between. While this is going to be the norm soon, we wantto keep exploring areas from where the next set of quality improvements willcome. Earlier we discussed how spokenconversations are richer than pure text and how the gap would be not bridged byLLMs purely working on transcriptions. In one of our recent experiments we builtan efficient multi-modal LLM that takes speech directly to provide betterconversational experience. For production usage, the constraint here is thatthis should happen without losing the flexibility that you get in a text-onlyLLM around writing prompts, making changes, evaluating, and debugging. Below is a conversation with our recent in-house Speech LLM based conversationalsystem. Notice that because of the extra information in speech some micropersonalizations can happen like usage of gendered pronouns1. You also getlower impact of transcription errors and in general better responses innon-speech signals. With access to both speech and text domains, the modelallows for more fluent turn-taking, though not demonstrated in the currentconversation. In addition, our approach also reduces the combined model size(<2B) for taking speech to response, leading to lower compute latency ascompared to larger systems. The model above doesn’t yet control speech synthesis beyond the textual markersit can generate, but that’s something to be added soon (you might have noticederratic pitch shifts in the call above since TTS vendors don’t contextualizebased on past conversations). Stay tuned for more details on how we take thisand similar research areas forward. Of course concerns around paralinguistic prediction accuracies areextremely important to take something like this in production.  ↩ "
}, {
"id": 38,
"url": "/confidence-calibration/",
2 changes: 1 addition & 1 deletion Code-Mixing-Metrics/index.html
@@ -296,7 +296,7 @@
"id": 37,
"url": "/speech-conversational-llms/",
"title": "Speech LLMs for Conversations",
"body": "2024/05/09 - With LLMs making conversational systems has become easier. You no longer need tofocus on the low-level details of categorizing semantics and designingresponses. Instead, you can concentrate on controlling high-level behaviors viaan LLM. This is the trend that we see most of the world moving towards asproducts are using vendor combinations of ASR, LLM, and TTS with some dialogmanagement stitched in between. While this is going to be the norm soon, we wantto keep exploring areas from where the next set of quality improvements willcome. Earlier we discussed how spokenconversations are richer than pure text and how the gap would be not bridged byLLMs purely working on transcriptions. In one of our recent experiments we buildan efficient multi-modal LLM that takes speech directly to provide betterconversational experience. For production usage, the constraint here is thatthis should happen without losing the flexibility that you get in a text-onlyLLM around writing prompts, making changes, evaluating, and debugging. Below is a conversation with our recent in-house Speech LLM based conversationalsystem. Notice that because of the extra information in speech some micropersonalizations can happen like usage of gendered pronouns1. You also getlower impact of transcription errors and in general better responses innon-speech signals. With access to both speech and text domains, the modelallows for more fluent turn-taking, though not demonstrated in the currentconversation. In addition, our approach also reduces the combined model size(<2B) for taking speech to response, leading to lower compute latency ascompared to larger systems. The model above doesn’t yet control speech synthesis beyond the textual markersit can generate, but that’s something to be added soon (you might have noticederratic pitch shifts in the call above since TTS vendors don’t contextualizebased on past conversations). Stay tuned for more details on how we take thisand similar research areas forward. Of course concerns around paralinguistic prediction accuracies areextremely important to take something like this in production.  ↩ "
"body": "2024/05/09 - With LLMs making conversational systems has become easier. You no longer need tofocus on the low-level details of categorizing semantics and designingresponses. Instead, you can concentrate on controlling high-level behaviors viaan LLM. This is the trend that we see most of the world moving towards asproducts are using vendor combinations of ASR, LLM, and TTS with some dialogmanagement stitched in between. While this is going to be the norm soon, we wantto keep exploring areas from where the next set of quality improvements willcome. Earlier we discussed how spokenconversations are richer than pure text and how the gap would be not bridged byLLMs purely working on transcriptions. In one of our recent experiments we builtan efficient multi-modal LLM that takes speech directly to provide betterconversational experience. For production usage, the constraint here is thatthis should happen without losing the flexibility that you get in a text-onlyLLM around writing prompts, making changes, evaluating, and debugging. Below is a conversation with our recent in-house Speech LLM based conversationalsystem. Notice that because of the extra information in speech some micropersonalizations can happen like usage of gendered pronouns1. You also getlower impact of transcription errors and in general better responses innon-speech signals. With access to both speech and text domains, the modelallows for more fluent turn-taking, though not demonstrated in the currentconversation. In addition, our approach also reduces the combined model size(<2B) for taking speech to response, leading to lower compute latency ascompared to larger systems. The model above doesn’t yet control speech synthesis beyond the textual markersit can generate, but that’s something to be added soon (you might have noticederratic pitch shifts in the call above since TTS vendors don’t contextualizebased on past conversations). Stay tuned for more details on how we take thisand similar research areas forward. Of course concerns around paralinguistic prediction accuracies areextremely important to take something like this in production.  ↩ "
}, {
"id": 38,
"url": "/confidence-calibration/",
2 changes: 1 addition & 1 deletion Code-Mixing-Seminar/index.html
@@ -296,7 +296,7 @@
"id": 37,
"url": "/speech-conversational-llms/",
"title": "Speech LLMs for Conversations",
"body": "2024/05/09 - With LLMs making conversational systems has become easier. You no longer need tofocus on the low-level details of categorizing semantics and designingresponses. Instead, you can concentrate on controlling high-level behaviors viaan LLM. This is the trend that we see most of the world moving towards asproducts are using vendor combinations of ASR, LLM, and TTS with some dialogmanagement stitched in between. While this is going to be the norm soon, we wantto keep exploring areas from where the next set of quality improvements willcome. Earlier we discussed how spokenconversations are richer than pure text and how the gap would be not bridged byLLMs purely working on transcriptions. In one of our recent experiments we buildan efficient multi-modal LLM that takes speech directly to provide betterconversational experience. For production usage, the constraint here is thatthis should happen without losing the flexibility that you get in a text-onlyLLM around writing prompts, making changes, evaluating, and debugging. Below is a conversation with our recent in-house Speech LLM based conversationalsystem. Notice that because of the extra information in speech some micropersonalizations can happen like usage of gendered pronouns1. You also getlower impact of transcription errors and in general better responses innon-speech signals. With access to both speech and text domains, the modelallows for more fluent turn-taking, though not demonstrated in the currentconversation. In addition, our approach also reduces the combined model size(<2B) for taking speech to response, leading to lower compute latency ascompared to larger systems. The model above doesn’t yet control speech synthesis beyond the textual markersit can generate, but that’s something to be added soon (you might have noticederratic pitch shifts in the call above since TTS vendors don’t contextualizebased on past conversations). Stay tuned for more details on how we take thisand similar research areas forward. Of course concerns around paralinguistic prediction accuracies areextremely important to take something like this in production.  ↩ "
"body": "2024/05/09 - With LLMs making conversational systems has become easier. You no longer need tofocus on the low-level details of categorizing semantics and designingresponses. Instead, you can concentrate on controlling high-level behaviors viaan LLM. This is the trend that we see most of the world moving towards asproducts are using vendor combinations of ASR, LLM, and TTS with some dialogmanagement stitched in between. While this is going to be the norm soon, we wantto keep exploring areas from where the next set of quality improvements willcome. Earlier we discussed how spokenconversations are richer than pure text and how the gap would be not bridged byLLMs purely working on transcriptions. In one of our recent experiments we builtan efficient multi-modal LLM that takes speech directly to provide betterconversational experience. For production usage, the constraint here is thatthis should happen without losing the flexibility that you get in a text-onlyLLM around writing prompts, making changes, evaluating, and debugging. Below is a conversation with our recent in-house Speech LLM based conversationalsystem. Notice that because of the extra information in speech some micropersonalizations can happen like usage of gendered pronouns1. You also getlower impact of transcription errors and in general better responses innon-speech signals. With access to both speech and text domains, the modelallows for more fluent turn-taking, though not demonstrated in the currentconversation. In addition, our approach also reduces the combined model size(<2B) for taking speech to response, leading to lower compute latency ascompared to larger systems. The model above doesn’t yet control speech synthesis beyond the textual markersit can generate, but that’s something to be added soon (you might have noticederratic pitch shifts in the call above since TTS vendors don’t contextualizebased on past conversations). Stay tuned for more details on how we take thisand similar research areas forward. Of course concerns around paralinguistic prediction accuracies areextremely important to take something like this in production.  ↩ "
}, {
"id": 38,
"url": "/confidence-calibration/",
2 changes: 1 addition & 1 deletion Turn_Taking_Dynamics_in_Voice_Bots/index.html
@@ -294,7 +294,7 @@
"id": 37,
"url": "/speech-conversational-llms/",
"title": "Speech LLMs for Conversations",
"body": "2024/05/09 - With LLMs making conversational systems has become easier. You no longer need tofocus on the low-level details of categorizing semantics and designingresponses. Instead, you can concentrate on controlling high-level behaviors viaan LLM. This is the trend that we see most of the world moving towards asproducts are using vendor combinations of ASR, LLM, and TTS with some dialogmanagement stitched in between. While this is going to be the norm soon, we wantto keep exploring areas from where the next set of quality improvements willcome. Earlier we discussed how spokenconversations are richer than pure text and how the gap would be not bridged byLLMs purely working on transcriptions. In one of our recent experiments we buildan efficient multi-modal LLM that takes speech directly to provide betterconversational experience. For production usage, the constraint here is thatthis should happen without losing the flexibility that you get in a text-onlyLLM around writing prompts, making changes, evaluating, and debugging. Below is a conversation with our recent in-house Speech LLM based conversationalsystem. Notice that because of the extra information in speech some micropersonalizations can happen like usage of gendered pronouns1. You also getlower impact of transcription errors and in general better responses innon-speech signals. With access to both speech and text domains, the modelallows for more fluent turn-taking, though not demonstrated in the currentconversation. In addition, our approach also reduces the combined model size(<2B) for taking speech to response, leading to lower compute latency ascompared to larger systems. The model above doesn’t yet control speech synthesis beyond the textual markersit can generate, but that’s something to be added soon (you might have noticederratic pitch shifts in the call above since TTS vendors don’t contextualizebased on past conversations). Stay tuned for more details on how we take thisand similar research areas forward. Of course concerns around paralinguistic prediction accuracies areextremely important to take something like this in production.  ↩ "
"body": "2024/05/09 - With LLMs making conversational systems has become easier. You no longer need tofocus on the low-level details of categorizing semantics and designingresponses. Instead, you can concentrate on controlling high-level behaviors viaan LLM. This is the trend that we see most of the world moving towards asproducts are using vendor combinations of ASR, LLM, and TTS with some dialogmanagement stitched in between. While this is going to be the norm soon, we wantto keep exploring areas from where the next set of quality improvements willcome. Earlier we discussed how spokenconversations are richer than pure text and how the gap would be not bridged byLLMs purely working on transcriptions. In one of our recent experiments we builtan efficient multi-modal LLM that takes speech directly to provide betterconversational experience. For production usage, the constraint here is thatthis should happen without losing the flexibility that you get in a text-onlyLLM around writing prompts, making changes, evaluating, and debugging. Below is a conversation with our recent in-house Speech LLM based conversationalsystem. Notice that because of the extra information in speech some micropersonalizations can happen like usage of gendered pronouns1. You also getlower impact of transcription errors and in general better responses innon-speech signals. With access to both speech and text domains, the modelallows for more fluent turn-taking, though not demonstrated in the currentconversation. In addition, our approach also reduces the combined model size(<2B) for taking speech to response, leading to lower compute latency ascompared to larger systems. The model above doesn’t yet control speech synthesis beyond the textual markersit can generate, but that’s something to be added soon (you might have noticederratic pitch shifts in the call above since TTS vendors don’t contextualizebased on past conversations). Stay tuned for more details on how we take thisand similar research areas forward. Of course concerns around paralinguistic prediction accuracies areextremely important to take something like this in production.  ↩ "
}, {
"id": 38,
"url": "/confidence-calibration/",
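The post body patched in the files above contrasts two conversational stacks: the common cascade of vendor ASR, a text-only LLM, and TTS with dialog management stitched in between, versus a compact (<2B) multi-modal Speech LLM that takes audio directly to a response while keeping the text-style prompting workflow. Below is a minimal, hypothetical sketch of that contrast; every name in it (asr, build_prompt, text_llm, speech_llm) is an illustrative stub, not the in-house system the post describes.

from dataclasses import dataclass

@dataclass
class Turn:
    audio: bytes        # caller's speech for this turn
    history: list[str]  # prior dialog kept as text, so prompt tooling stays the same

def cascaded_response(turn: Turn) -> str:
    """Cascade: speech is flattened to a transcript before the LLM sees it,
    so paralinguistic cues (voice, pitch, hesitations) are lost."""
    transcript = asr(turn.audio)                     # vendor ASR
    prompt = build_prompt(turn.history, transcript)  # same prompting workflow
    return text_llm(prompt)                          # text-only LLM

def speech_llm_response(turn: Turn) -> str:
    """Speech LLM: the audio itself goes to the model alongside the text
    prompt, so the reply can react to how something was said."""
    prompt = build_prompt(turn.history, "[audio]")
    return speech_llm(prompt, turn.audio)            # single compact multi-modal model

# --- stubs so the sketch runs end to end; real components would replace these ---
def asr(audio: bytes) -> str:
    return "i want to reschedule my appointment"

def build_prompt(history: list[str], user_text: str) -> str:
    return "\n".join(history) + f"\nUser: {user_text}\nAssistant:"

def text_llm(prompt: str) -> str:
    return "Sure, which day works for you?"

def speech_llm(prompt: str, audio: bytes) -> str:
    # e.g. a gendered pronoun picked up from the voice, per the post's example
    return "Sure ma'am, which day works for you?"

if __name__ == "__main__":
    turn = Turn(audio=b"\x00\x01", history=["Assistant: Hello, how can I help?"])
    print(cascaded_response(turn))
    print(speech_llm_response(turn))

The only point of the sketch is that the speech path keeps the same text-prompt interface (build_prompt is shared), which is the production constraint the post calls out around prompting, evaluation, and debugging.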