AI21 Labs enhances the capabilities of gen AI transformers through Jamba integration

AI21 Labs, a leading player in the generative AI landscape, is pushing the capabilities of gen AI transformers forward with its latest release, Jamba. While transformers have dominated the field since the groundbreaking 2017 research paper “Attention Is All You Need,” AI21 Labs is taking a new approach with Jamba to go beyond transformers.

Jamba combines the Mamba model, based on the Structured State Space model (SSM), with a transformer architecture to create an optimized gen AI model. The name Jamba stands for Joint Attention and Mamba, reflecting its goal of bringing together the best attributes of SSMs and transformers. The model is released as open source under the Apache 2.0 license.
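Because the weights are open source, the base model can in principle be loaded like any other causal language model. The sketch below is illustrative only: it assumes a Hugging Face checkpoint named ai21labs/Jamba-v0.1 and a recent version of the transformers library with Jamba support, neither of which is confirmed by the article.

```python
# Minimal sketch of loading the open-source Jamba weights with Hugging Face
# transformers. The checkpoint identifier "ai21labs/Jamba-v0.1" is an
# assumption, not an official reference from the article.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_ID = "ai21labs/Jamba-v0.1"  # assumed repository name

tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
model = AutoModelForCausalLM.from_pretrained(
    MODEL_ID,
    torch_dtype=torch.bfloat16,   # half precision to keep memory manageable
    device_map="auto",            # spread layers across available GPUs
)

prompt = "Jamba combines Mamba-style SSM layers with attention in order to"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```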

Although Jamba is not intended to replace current transformer-based large language models (LLMs), it is expected to supplement them in certain areas. AI21 Labs claims that Jamba outperforms traditional transformer-based models on generative reasoning tasks as measured by benchmarks like HellaSwag. However, it does not yet surpass transformer-based models on critical benchmarks such as the Massive Multitask Language Understanding (MMLU) benchmark for problem-solving.

AI21 Labs has a strong focus on gen AI for enterprise use cases and has raised $155 million in funding to support its efforts. The company’s enterprise tools include Wordtune, a service that helps enterprises generate content matching their tone and brand. AI21 Labs has successfully competed against gen AI giant OpenAI and won enterprise business from it.

Transformers, while dominant in the gen AI landscape, have some shortcomings, particularly around attention and context. The attention mechanism compares every token with every other token, so its cost grows quadratically with the context window, slowing inference for long-context use cases. Transformers also require memory that scales with context length, making it difficult to run long context windows or many parallel batches without extensive hardware resources.
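A back-of-the-envelope calculation shows why this matters at Jamba-scale context lengths. The sketch below uses illustrative numbers (32 heads, half-precision values), not AI21's figures, and naively materializes the full attention score matrix; real implementations avoid storing it, but the compute still scales quadratically.

```python
# Quadratic scaling of attention: the score matrix for one head is
# context_len x context_len, so memory (and compute) grows with the square
# of the context window, while a fixed-size SSM state does not.
BYTES_PER_VALUE = 2  # fp16 / bf16

def attention_scores_bytes(context_len: int, num_heads: int = 32) -> int:
    """Memory for the full attention score matrices of a single layer."""
    return num_heads * context_len * context_len * BYTES_PER_VALUE

for ctx in (4_096, 32_768, 262_144):  # 4K, 32K, 256K tokens
    gib = attention_scores_bytes(ctx) / 2**30
    print(f"{ctx:>8} tokens -> ~{gib:,.0f} GiB of attention scores per layer")
```

Going from a 4K to a 256K context multiplies that per-layer cost by roughly 4,000x, which is the pressure the SSM side of Jamba is meant to relieve.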

To address these concerns, AI21 Labs turned to the Mamba SSM architecture, which requires less memory and replaces attention with a mechanism better suited to large context windows. However, the Mamba approach falls short of transformer models in output quality. Jamba, by contrast, is a hybrid SSM-Transformer model that combines the resource and context optimization of the SSM architecture with the strong output capabilities of a transformer.
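To make that division of labor concrete, here is a toy PyTorch sketch of the hybrid idea, under the assumption (not taken from the article) that the stack simply interleaves fixed-state recurrent blocks with occasional attention blocks. The ToySSMBlock is a stand-in for a real Mamba layer, and the layer ratio is illustrative.

```python
# Toy hybrid stack: most layers keep a fixed-size recurrent state (a stand-in
# for a Mamba/SSM block); every few layers, an attention block provides
# transformer-style mixing. Only the attention layers pay a cost that grows
# with context length.
import torch
import torch.nn as nn

class ToySSMBlock(nn.Module):
    """Stand-in for a Mamba/SSM layer: a simple gated recurrent scan whose
    state size is independent of sequence length."""
    def __init__(self, dim: int):
        super().__init__()
        self.gate = nn.Linear(dim, dim)
        self.proj = nn.Linear(dim, dim)

    def forward(self, x: torch.Tensor) -> torch.Tensor:  # x: (batch, seq, dim)
        state = torch.zeros(x.size(0), x.size(2), device=x.device)
        outputs = []
        for t in range(x.size(1)):
            g = torch.sigmoid(self.gate(x[:, t]))          # how much to keep
            state = g * state + (1 - g) * self.proj(x[:, t])
            outputs.append(state)
        return torch.stack(outputs, dim=1)

class ToyHybridStack(nn.Module):
    """Interleaves SSM-style blocks with an attention block every few layers."""
    def __init__(self, dim: int = 64, n_layers: int = 8, attn_every: int = 4):
        super().__init__()
        self.layers = nn.ModuleList(
            nn.MultiheadAttention(dim, num_heads=4, batch_first=True)
            if (i + 1) % attn_every == 0 else ToySSMBlock(dim)
            for i in range(n_layers)
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        for layer in self.layers:
            if isinstance(layer, nn.MultiheadAttention):
                attn_out, _ = layer(x, x, x)
                x = x + attn_out
            else:
                x = x + layer(x)
        return x

hidden = ToyHybridStack()(torch.randn(2, 16, 64))
print(hidden.shape)  # torch.Size([2, 16, 64])
```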

AI21 Labs’ Jamba model offers a 256K context window and delivers three times the throughput on long contexts compared to Mixtral 8x7B. Moreover, Jamba is the only model in its size class that fits up to a 140K context on a single GPU. Jamba also uses a Mixture of Experts (MoE) approach, similar to Mixtral, so that only a fraction of its parameters is active for any given token: its MoE layers let Jamba draw on just 12B of its available 52B parameters at inference, making it more efficient than a transformer-only model of equivalent size.
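The 12B-of-52B figure comes from this kind of routing: a small router sends each token to only a few experts, so most expert parameters sit idle for that token. Below is a toy MoE layer in PyTorch with illustrative sizes (16 experts, top-2 routing) that are assumptions, not Jamba's actual configuration.

```python
# Toy Mixture of Experts layer: the router picks the top-k experts per token,
# so only a fraction of the layer's parameters are used for any given token.
import torch
import torch.nn as nn

class ToyMoELayer(nn.Module):
    def __init__(self, dim: int = 64, n_experts: int = 16, top_k: int = 2):
        super().__init__()
        self.router = nn.Linear(dim, n_experts)
        self.experts = nn.ModuleList(nn.Linear(dim, dim) for _ in range(n_experts))
        self.top_k = top_k

    def forward(self, x: torch.Tensor) -> torch.Tensor:  # x: (tokens, dim)
        scores = self.router(x)                              # (tokens, n_experts)
        weights, chosen = scores.topk(self.top_k, dim=-1)    # pick top-k experts
        weights = torch.softmax(weights, dim=-1)
        out = torch.zeros_like(x)
        for slot in range(self.top_k):
            for e, expert in enumerate(self.experts):
                mask = chosen[:, slot] == e                  # tokens routed to e
                if mask.any():
                    out[mask] += weights[mask, slot, None] * expert(x[mask])
        return out

layer = ToyMoELayer()
total = sum(p.numel() for p in layer.experts.parameters())
active = total * layer.top_k // len(layer.experts)   # expert params touched per token
print(f"expert params total: {total}, active per token: ~{active}")
```

With 16 experts and top-2 routing, each token touches roughly an eighth of the expert parameters, which is the same idea behind activating 12B out of 52B parameters at inference.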

Although Jamba is still in its early stages and not yet part of AI21 Labs’ enterprise offering, the company plans to release an instruct version as a beta on the AI21 Platform soon. With its promising capabilities and potential to supplement existing transformer-based models, Jamba represents a significant advancement in the field of generative AI.