Generative AI features under the Apple Intelligence banner have steered clear of NVIDIA GPUs for cloud-based inference, with the California-based giant sticking to the custom silicon in its servers, which will eventually be replaced by the unreleased M4 Ultra, to run its Large Language Models. However, a recent blog post from the iPhone maker reveals that Apple and its engineers are not shying away from partnering with NVIDIA when the two companies share a common goal: faster text generation with LLMs.
A new ‘Recurrent Drafter’ technique has been published and open-sourced by Apple, and the company says it ‘achieves state of the art performance’
Known as ‘ReDrafter’ for short, the technique, as the blog post explains, combines two approaches: beam search and tree attention, both of which are designed to improve text generation performance. After publishing its own research, Apple collaborated with NVIDIA to integrate ReDrafter into TensorRT-LLM, a framework that helps Large Language Models run faster on NVIDIA GPUs. The technique can also reduce latency while drawing less power.
“This research work demonstrated strong results, but its greater impact comes from being applied in production to accelerate LLM inference. To make this advancement production-ready for NVIDIA GPUs, we collaborated with NVIDIA to integrate ReDrafter into the NVIDIA TensorRT-LLM inference acceleration framework.
Although TensorRT-LLM supports numerous open source LLMs and the Medusa speculative decoding method, ReDrafter’s beam search and tree attention algorithms rely on operators that had never been used in previous applications. To enable the integration of ReDrafter, NVIDIA added new operators or exposed existing ones, which considerably improved TensorRT-LLM’s capability to accommodate sophisticated models and decoding methods. ML developers using NVIDIA GPUs can now easily benefit from ReDrafter’s accelerated token generation for their production LLM applications with TensorRT-LLM.
In benchmarking a tens-of-billions parameter production model on NVIDIA GPUs, using the NVIDIA TensorRT-LLM inference acceleration framework with ReDrafter, we have seen 2.7x speed-up in generated tokens per second for greedy decoding. These benchmark results indicate this tech could significantly reduce latency users may experience, while also using fewer GPUs and consuming less power.”
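To make the source of that speedup concrete, below is a minimal, self-contained Python sketch of speculative decoding, the general family of techniques ReDrafter belongs to. This is an illustrative toy under stated assumptions, not Apple's ReDrafter implementation or the TensorRT-LLM API: target_model, draft_model, and the 80% agreement rate are hypothetical stand-ins, and a real system verifies all draft tokens in a single batched forward pass rather than a Python loop.

```python
import random

random.seed(0)
VOCAB = list("abcde")

def target_model(context):
    # Stand-in for the large, expensive LLM: a deterministic
    # next-token rule keyed on context length (hypothetical).
    return VOCAB[(len(context) * 2) % len(VOCAB)]

def draft_model(context):
    # Stand-in for the small, cheap drafter, which agrees with
    # the target most of the time (hypothetical 80% rate).
    if random.random() < 0.8:
        return target_model(context)
    return random.choice(VOCAB)

def speculative_decode(prompt, num_tokens, draft_len=4):
    # The drafter proposes draft_len tokens; the target model then
    # verifies the whole draft (one batched pass in a real system)
    # and keeps the longest correct prefix plus one corrected token.
    output = list(prompt)
    target_passes = 0
    while len(output) - len(prompt) < num_tokens:
        draft = []
        for _ in range(draft_len):
            draft.append(draft_model(output + draft))
        target_passes += 1  # one verification pass per draft
        for tok in draft:
            expected = target_model(output)
            output.append(expected)  # verification yields the correct token either way
            if tok != expected:      # first mismatch ends this draft
                break
    return "".join(output), target_passes

text, passes = speculative_decode("ab", num_tokens=20)
print(f"generated {text!r} with {passes} target passes (vs. 20 serial passes)")
```

Because each verification pass can accept several tokens at once, the number of expensive target-model passes falls well below the token count, which is the same lever behind the 2.7x greedy-decoding speedup Apple reports; ReDrafter additionally uses beam search and tree attention to propose and verify multiple candidate drafts efficiently.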
While this collaboration shows there is a sliver of a chance that Apple and NVIDIA could enter into a broader agreement, we strongly believe that such a partnership will not materialize, given the two technology giants' fraught history. We may see short-term tie-ups like this again in the future, but a deeper business relationship appears to be off the table.
News Source: Apple