Prakash Siva, VP, Technology & Strategy, Radisys

This is going to be a discussion in a couple of areas. One, we’re going to talk about media processing, audio and video processing. But more interesting, we’re going to talk about how service providers can use the concept of keywords and voice-activated automation, that is, voice-activated detection of keywords as part of an audio or a video stream, and can use that as a means to create new revenue streams and new kinds of capabilities. So that’s why the “Hey Siri”.

Siri, as most people probably know, is the voice-activated assistant from Apple. I could have put Alexa, “Hello Alexa”, but really this talk is about how you embed voice recognition seamlessly as part of a telecom audio stream.

So let me jump in. We’ll talk first about the media engine and the media server product. Then we’ll go into a little bit of why this is important: why this area of keyword detection matters, how it’s affecting the industry, and how that whole area plays out. Finally, we will finish with deployment options.

We see three major trends for media processing in IP networks. The first is the expanded domain of contextual IP communications: video collaboration, and voice and video over LTE and Wi-Fi. We see a lot of that, and it is expanding. Second, there is a tremendous amount of user-generated content, whether it comes from video feeds, audio feeds, or whatever else is submitted by users, and that is expanding too. Third, we see the rise of open architectures and the cloud economy, with more focus on open architectures. That’s why we’ll talk about disaggregation and open architectures. Basically, this is hard to do if you don’t have an open architecture that can support these kinds of capabilities. Closed architectures find it hard, and we believe open is the winning way for these types of applications.

Given these trends, what do we see as the result? We want to see cost-efficient multi-service enablement, we want to optimize the media quality of experience, and our service providers want to ensure high performance in a cloud environment. These are all the basics, just to set the stage.

Now a quick view of our media engine itself. There are multiple layers here. The media engine is software.

We sell it in software and hardware forms, but ultimately this is platform-agnostic software: it can run on bare metal, DSPs, GPUs, private clouds, and public clouds. There’s core functionality with play, record, DTMF, ASR, conferencing, and so forth. On top of that there are software packages that can do HD video and audio transcoding. And finally you can have a range of real-time communications, everything from WebRTC and unified communications to transcoding along with content optimization. We’ll talk a little bit about the edge and why this is important at the edge.

Fundamentally, the media engine enables differentiated real-time communication, and that is key for a lot of service providers going forward. How do they differentiate? That differentiation is the key to bringing in new revenues. It also simplifies multi-technology, multi-vendor interworking. If you have a common MRF, a common media server that can work with different application servers, then you can start to coalesce; you can have common interworking among them. And how can you do this cost-effectively? How can you move this to the cloud and an open architecture?

A little bit about the service delivery platform that we offer, the media engine platform. We can do everything from support for VoLTE and IMS to traditional multimedia messaging. We’ll talk about the speech-enabled applications. Audio and video collaboration, analytics, value-added services, WebRTC: all of this is supported by our media engine. And because it is open, because a lot of it is open source and standards-based, there is a lot of capex reduction and a tremendous TCO benefit.

By the way, all of this is elastically scalable, both in a private cloud and in a public cloud, so you can run on Amazon or on your private cloud for these types of applications.

Where do we fit into the big picture? The red is the media server, running as VNFs. We can run on top of DSPs, with DSP acceleration or GPU acceleration. We work closely with two SDKs, both Intel SDKs. One is called the Intel Media SDK, for encoding, decoding, and so forth, and the other is OpenVINO, which is an AI library that runs on top of x86. You do not need GPU acceleration; if you have GPUs, that is great, but you can still run this on x86.

Our product has been integrated with multiple VNFMs and EMSs. We are fully integrated with ONAP, with Nokia CloudBand, ZTE [unclear], and so forth. We run on top of OpenStack; you see the complete set here: VMware, EC2, Amazon Web Services. And of course, depending on the number of sessions and on how much you need to accelerate this processing, you could do this with accelerator hardware: you could run this with DSPs, or you could run this with extra GPUs. All of that is possible with this architecture. So what is media activation?

Media activation ultimately provides what we believe is integrated speech recognition: in-call keyword detection. What does that mean? As we’re talking (and by the way, it can be a point-to-point stream, a point-to-multipoint stream, or a conferencing stream), you can detect any word based on a library, and we’ll talk about that. Today it is based on US English, but it can be customized for any language. It has built-in acoustic models, so there is no training needed for all this. You can offer these capabilities in multiple languages, as I said, but even more importantly, it is agnostic to the age of the speaker, it is agnostic to gender, it’s agnostic to all of those things. And you can embed this functionality into the stream. So that’s the in-call keyword detection.

It reduces costs, and we’ll talk about how. It improves user experience. You can do high-volume video and voice analytics to improve business processes and so forth, if you’re a call center. You can also take this and put it at the edge. Why would you want this at the edge? One of the key reasons, we believe, is that if you had a large number of video streams coming in, you would want to perform some processing at the edge so that you did not burden the core. That’s an important application that we can perform. And last is on-the-fly media adaptation: transcoding and transrating of user-generated content, which enhances QoE. But we’ll focus on a couple of these: the edge and the voice recognition. Let’s talk about the keyword detection market today.

This market is growing incredibly rapidly. We see the explosion of applications tied around Alexa, the Amazon product, around Google Cloud, around every major vendor; each one has this. So part of our question is: how do we get telcos, how do you get service providers, in on the action? Because this is a market that is growing incredibly fast, and natural language processing will be the primary method of access to services in the future. What do we do today? We’re using CLIs, well, I’m sorry, you’re using text-based interfaces, this, that, and the other. Ultimately it’s moving towards voice-based. And how do we get telcos in on the action? That’s part of what we will talk about.

Basically, the solutions are not inline; these are all done OTT today. And there’s a reliance on external digital assistants, whether it’s Alexa or Google, and on speech servers. So they can be expensive, and ultimately there are privacy issues.

Beyond privacy, there are issues with cost, performance, flexibility, and user experience; all of these are issues in the current solutions today. How do we do this? Our media engine sits inline in the audio or video path, and we can perform audio decoding and keyword detection, and there are quality enhancements we can perform. Once that action is taken by the media engine, that is, once the keyword is detected, signals are sent out as DTMF digits to the application server. So this line here is once it detects that. But before you do that, you have to enable keyword detection and you have to specify the keywords. And for the keywords you specify, we are not talking about audio files here; we are talking about plain text. If you want, you can even put in “Hey Siri” as the keyword. We have done this with the keyword “hey an idea”. So “hey an idea”, “hello an idea”, or “hi an idea” is the keyword that the system will detect, and that allows us to trigger a set of actions based on that trigger.
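
To make the flow just described a bit more concrete, here is a minimal Python sketch of specifying keywords as plain text and reporting each detection as a DTMF digit. Everything in it is an assumption for illustration: the names `KeywordRule`, `expand_greetings`, and `detect_keywords`, the digit values, and the use of an already-decoded word stream in place of real audio are not taken from the talk or from any Radisys interface.

```python
from dataclasses import dataclass
from typing import Iterable, Iterator


@dataclass(frozen=True)
class KeywordRule:
    """A plain-text keyword phrase and the DTMF digit reported on a match."""
    phrase: str      # plain text such as "hey siri", not an audio file
    dtmf_digit: str  # hypothetical digit signalled to the application server


def expand_greetings(name: str, digit: str,
                     greetings=("hey", "hello", "hi")) -> list[KeywordRule]:
    """Build one rule per greeting variant, all reporting the same digit."""
    return [KeywordRule(f"{g} {name}", digit) for g in greetings]


def detect_keywords(words: Iterable[str],
                    rules: list[KeywordRule]) -> Iterator[tuple[int, str, str]]:
    """Yield (word_index, phrase, dtmf_digit) for each phrase seen in the stream.

    `words` stands in for the result of audio decoding and acoustic matching,
    which this sketch does not implement.
    """
    patterns = [(rule, rule.phrase.lower().split()) for rule in rules]
    if not patterns:
        return
    longest = max(len(pattern) for _, pattern in patterns)
    window: list[str] = []
    for index, word in enumerate(words):
        # Normalise case and trailing punctuation, keep only the recent words.
        window.append(word.lower().strip(".,!?"))
        window = window[-longest:]
        for rule, pattern in patterns:
            if pattern and window[-len(pattern):] == pattern:
                yield index, rule.phrase, rule.dtmf_digit


rules = expand_greetings("siri", digit="1")
stream = "so I said hey Siri, what is the weather".split()
print(list(detect_keywords(stream, rules)))  # [(4, 'hey siri', '1')]
```

The sliding window keeps the matcher stateless between calls and cheap per word, which is the kind of property an inline component would want; a real detector would of course match on acoustics rather than on text tokens.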

Once triggered, we can take the audio stream at that point and feed it to an external server, we can perform an announcement, we can perform a range of actions once you detect the keyword as part of the stream. By the way, this can be applied to audio and video, and we’ll talk about the video case, where you want to perform some form of inline, let’s say, analytics and trigger certain actions. But that is the main path: audio decoding, keyword detection, and the quality piece. External are the application server and the ASR server. ASR is the automatic speech recognition server; it’s optional.
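
Seen from the application server, the step from a reported keyword to an action could look roughly like the Python sketch below. It is only an illustration under stated assumptions: the class `KeywordActionDispatcher`, the two action functions, the call identifiers, and the digit assignments are all hypothetical, and the talk does not describe the actual signalling interface.

```python
from typing import Callable

# An action takes a call identifier and returns a short log line.
Action = Callable[[str], str]


def forward_to_asr(call_id: str) -> str:
    """Placeholder for feeding the audio stream to an external ASR server."""
    return f"{call_id}: audio stream forwarded to external ASR server"


def play_announcement(call_id: str) -> str:
    """Placeholder for playing an announcement into the call."""
    return f"{call_id}: announcement played"


class KeywordActionDispatcher:
    """Maps DTMF digits reported by the media engine to lists of actions."""

    def __init__(self) -> None:
        self._actions: dict[str, list[Action]] = {}

    def on_digit(self, digit: str, action: Action) -> None:
        """Register an action to run when `digit` is reported."""
        self._actions.setdefault(digit, []).append(action)

    def handle(self, call_id: str, digit: str) -> list[str]:
        """Run every action registered for `digit`; unknown digits do nothing."""
        return [action(call_id) for action in self._actions.get(digit, [])]


dispatcher = KeywordActionDispatcher()
dispatcher.on_digit("1", forward_to_asr)
dispatcher.on_digit("2", play_announcement)
print(dispatcher.handle("call-42", "1"))
# ['call-42: audio stream forwarded to external ASR server']
```

Keeping the digit-to-action table on the application server side means the set of actions can change without touching the inline detection path, and the external ASR server stays optional: it is only involved if an action that uses it is registered.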

You use it if you want to do something externally as well. Key benefits: much lower cost than traditional approaches, and the accuracy of speech recognition. We can improve the quality of the