Generative AI is a rapidly growing technology that currently powers chat tools such as OpenAI’s ChatGPT and Google Bard, as well as image generation systems like Stable Diffusion and DALL-E. However, these tools rely on cloud-based data centers with multiple GPUs to perform the computation behind each query. MediaTek, a Taiwanese semiconductor company, believes that running generative AI tasks directly on mobile devices, connected cars, and smart speakers like Amazon Echo, Google Home, or Apple HomePod is closer than we realize. The company has announced a collaboration with Meta to port the Llama 2 LLM to its latest-generation APUs and NeuroPilot software development platform, enabling generative AI tasks on devices without relying on external processing.
While this development won’t completely eliminate the need for data centers, it will significantly reduce their size. For example, Llama 2’s “small” model has 7 billion parameters, roughly 13GB of weights, which is suitable for basic generative AI functions. The larger 70-billion-parameter version, however, requires far more storage, making it impractical for smartphones. In the future, LLMs may be 10 to 100 times the size of Llama 2 or GPT-4, with storage requirements in the hundreds of gigabytes and higher. To accommodate this, specially designed cache appliances with fast flash storage and terabytes of RAM can be used. MediaTek envisions such a device, optimized for serving mobile clients, fitting in a single rack unit and providing impressive capabilities without heavy compute.
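The storage figures above follow directly from parameter count times bytes per parameter. A quick back-of-envelope sketch (the 16-bit and 4-bit precisions here are illustrative assumptions, not figures published by MediaTek or Meta):

```python
def weight_size(params: float, bytes_per_param: float) -> tuple[float, float]:
    """Approximate raw weight storage for a model: returns (decimal GB, binary GiB)."""
    total_bytes = params * bytes_per_param
    return total_bytes / 1e9, total_bytes / 2**30

# Llama 2 7B stored at 16-bit precision (2 bytes per parameter):
gb, gib = weight_size(7e9, 2)    # ~14.0 GB, ~13.0 GiB -- the "around 13GB" figure
# Llama 2 70B at the same precision:
gb70, _ = weight_size(70e9, 2)   # ~140 GB -- well beyond a smartphone's storage budget
# 4-bit quantization (0.5 bytes per parameter) shrinks the 7B model considerably:
gb_q4, _ = weight_size(7e9, 0.5)  # ~3.5 GB
```

This is also why the hypothetical 10–100x larger future models land in the hundreds-of-gigabytes range: the weight footprint scales linearly with parameter count at a fixed precision.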
MediaTek expects that smartphones powered by their next-generation flagship SoC, set to be released by the end of the year, will be able to support Llama 2-based AI applications. To access these datasets on devices, mobile carriers would need to rely on low-latency edge networks, which are small data centers or equipment closets with fast connections to 5G towers. This would eliminate the need for LLMs running on smartphones to go through multiple network hops before accessing the parameter data. Additionally, domain-specific LLMs can be moved closer to the application workload by running in a hybrid fashion with caching appliances within the miniature data center, creating a “constrained device edge” scenario.
There are several benefits to on-device generative AI. First, it reduces latency, since data is processed directly on the device, especially when localized caching keeps frequently accessed parts of the parameter dataset close at hand. Second, it improves data privacy: only model data moves through the network, while the user’s data stays on the device. Third, it improves bandwidth efficiency, as much of the processing happens on the device itself. Fourth, it increases operational resiliency: the system can keep functioning through a network disruption, provided the device holds a sufficiently large parameter cache. Lastly, it is more energy-efficient, requiring fewer compute-intensive resources in the data center and less energy to move data between the device and the network.
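One way to picture the localized-caching idea behind the latency and resiliency points: keep the most recently used parameter shards on the device and pay a network round-trip only on a miss. The sketch below is a generic LRU cache, not MediaTek’s or Meta’s actual mechanism; the `fetch` callback and per-shard granularity are hypothetical stand-ins:

```python
from collections import OrderedDict

class ShardCache:
    """LRU cache for model parameter shards, keyed by shard id."""

    def __init__(self, capacity: int, fetch):
        self.capacity = capacity   # max shards held on device
        self.fetch = fetch         # callback: shard_id -> bytes (e.g. an edge-network fetch)
        self._store = OrderedDict()
        self.hits = self.misses = 0

    def get(self, shard_id: str) -> bytes:
        if shard_id in self._store:
            self._store.move_to_end(shard_id)  # mark as most recently used
            self.hits += 1
            return self._store[shard_id]
        self.misses += 1
        data = self.fetch(shard_id)            # network round-trip only on a miss
        self._store[shard_id] = data
        if len(self._store) > self.capacity:
            self._store.popitem(last=False)    # evict the least recently used shard
        return data

# Hypothetical usage: 'fetch' simulates pulling a shard from an edge cache appliance.
cache = ShardCache(capacity=2, fetch=lambda sid: f"weights:{sid}".encode())
cache.get("layer0")   # miss -- fetched over the network
cache.get("layer1")   # miss
cache.get("layer0")   # hit -- served locally, no network hop
cache.get("layer2")   # miss -- evicts layer1 to stay within capacity
```

During a network outage, any shard still in `_store` remains servable, which is the resiliency argument in miniature: the larger the on-device cache, the more of the model survives a disruption.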
However, achieving these benefits may require splitting workloads and applying load-balancing techniques to offset the computational costs and network overhead of centralized data centers. There are also open questions about how capable an LLM can be when constrained to current device hardware, and security risks if sensitive data on local devices is not properly managed. Updating model data and maintaining consistency across a large fleet of distributed edge caching devices pose further challenges.
Furthermore, there is the question of cost. Who will bear the expenses of these mini edge data centers? Currently, edge networking is primarily employed by Edge Service Providers like Equinix, catering to services such as Netflix and Apple’s iTunes, rather than mobile network operators like AT&T, T-Mobile, or Verizon. Generative AI service providers like OpenAI/Microsoft, Google, and Meta would need to establish similar arrangements. While there are numerous considerations with on-device generative AI, it is evident that tech companies are actively exploring this field. Within the next five years, it is possible that our on-device intelligent assistants will be capable of independent thought. AI in our pockets is coming sooner than expected.