Unlocking the Potential of Machine Learning Research: Recent Developments

Recent developments in machine learning research have the potential to make a lasting impact on academic work. From deploying large language models (LLMs) on CPUs with automatic INT4 weight-only quantization, to incorporating explicit morphological knowledge into language model pre-training, to using small language models as evaluation metrics with zero-shot and one-shot prompting, the range of current work is broad. In this newsletter, we survey recent papers with the potential to reshape the field.

Efficient LLM Inference on CPUs (2311.00502v1)

This paper presents an efficient approach to deploying large language models (LLMs) on CPUs, combining automatic INT4 weight-only quantization with a dedicated LLM runtime built on highly optimized kernels. Making LLM inference practical on commodity CPUs could have a lasting impact on academic research, where GPU access is often limited.
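To make the core idea concrete, here is a minimal sketch of group-wise symmetric INT4 weight-only quantization in plain NumPy. The group size, symmetric scheme, and round-to-nearest mapping are illustrative assumptions; the paper's runtime uses highly optimized CPU kernels rather than this dequantize-then-matmul form.

```python
import numpy as np

def quantize_int4(W, group_size=32):
    """Group-wise symmetric INT4 quantization: each group of weights
    shares one FP32 scale, and values map to signed 4-bit ints in [-8, 7]."""
    groups = W.reshape(-1, group_size)
    scales = np.maximum(np.abs(groups).max(axis=1, keepdims=True) / 7.0, 1e-8)
    q = np.clip(np.round(groups / scales), -8, 7).astype(np.int8)
    return q, scales

def dequantize_int4(q, scales, shape):
    """Recover approximate FP32 weights for use in a standard matmul."""
    return (q.astype(np.float32) * scales).reshape(shape)

# Quantize a toy weight matrix and check the reconstruction error.
rng = np.random.default_rng(0)
W = rng.normal(size=(64, 64)).astype(np.float32)
q, s = quantize_int4(W)
W_hat = dequantize_int4(q, s, W.shape)
print("mean abs error:", np.abs(W - W_hat).mean())
```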

Text Rendering Strategies for Pixel Language Models (2311.00522v1)

This paper presents a new text rendering strategy for pixel language models that improves performance on sentence-level tasks while maintaining performance on token-level and multilingual tasks. The proposed approach also reduces the number of parameters needed to train the model, making it more compact and efficient, and it points toward stronger language modeling with image-based text input.
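As a rough sketch of what a pixel language model's input pipeline looks like (not the paper's specific rendering strategy), the snippet below renders a sentence to a grayscale strip with Pillow and slices it into fixed-size patches; the strip width, patch size, and default font are illustrative choices.

```python
from PIL import Image, ImageDraw, ImageFont
import numpy as np

def render_text_to_patches(text, patch=16, width=1024):
    """Render a sentence to a grayscale strip and cut it into square
    patches, the input unit a pixel language model consumes."""
    img = Image.new("L", (width, patch), color=255)   # white strip
    ImageDraw.Draw(img).text((0, 0), text, fill=0, font=ImageFont.load_default())
    arr = np.asarray(img, dtype=np.float32) / 255.0   # normalize to [0, 1]
    # Split the strip into (width // patch) non-overlapping square patches.
    return arr.reshape(patch, -1, patch).transpose(1, 0, 2)

patches = render_text_to_patches("Pixel language models read text as images.")
print(patches.shape)  # (64, 16, 16): a sequence of 16x16 patches
```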

Attention Alignment and Flexible Positional Embeddings Improve Transformer Length Extrapolation (2311.00684v1)

This paper presents two attention alignment strategies that improve the long-context utilization of the T5 Transformer language model. Because they improve language modeling, retrieval, and multi-document question answering without any fine-tuning, these techniques could see wide adoption in academic research.
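The paper's two strategies are specific to T5's attention and positional scheme; as a generic illustration of one way to re-align attention beyond the training length, the sketch below applies a log-length temperature to the attention logits so the distribution does not flatten as the context grows. The scaling rule is an assumption for illustration, not the paper's method.

```python
import numpy as np

def attention_with_alignment(q, k, v, train_len=512):
    """Scaled dot-product attention with a log-length logit rescaling,
    one generic way to keep attention sharp past the training length."""
    d = q.shape[-1]
    n = k.shape[0]
    temp = max(1.0, np.log(n) / np.log(train_len))  # grows with context size
    logits = (q @ k.T) / np.sqrt(d) * temp
    w = np.exp(logits - logits.max(axis=-1, keepdims=True))
    w /= w.sum(axis=-1, keepdims=True)
    return w @ v
```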

Kronecker-Factored Approximate Curvature for Modern Neural Network Architectures (2311.00636v1)

This paper presents two flavours of Kronecker-Factored Approximate Curvature (K-FAC) for linear weight-sharing layers in modern neural network architectures, reducing computational cost and speeding up training. Results show that K-FAC reduces the number of steps needed to reach a fixed validation metric target by 50-75%, a gain large enough to influence how such models are trained in academic research.
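For readers new to K-FAC, here is a minimal sketch of the classic preconditioner for a single dense layer, where the Fisher block is approximated as a Kronecker product of two small matrices. The paper's contribution, extending K-FAC to linear weight-sharing layers, is not reproduced here; the damping value and toy shapes are illustrative.

```python
import numpy as np

def kfac_precondition(grad_W, acts, grad_out, damping=1e-3):
    """K-FAC for one linear layer: approximate the Fisher block as
    A ⊗ G with A = E[a aᵀ] (inputs) and G = E[g gᵀ] (output grads),
    so inverting it reduces to two small matrix inverses."""
    n = acts.shape[0]
    A = acts.T @ acts / n                      # (in, in)
    G = grad_out.T @ grad_out / n              # (out, out)
    A_inv = np.linalg.inv(A + damping * np.eye(A.shape[0]))
    G_inv = np.linalg.inv(G + damping * np.eye(G.shape[0]))
    return G_inv @ grad_W @ A_inv              # (A ⊗ G)^-1 applied to vec(grad)

# Toy usage: a batch of 128 examples through a 32 -> 16 linear layer.
rng = np.random.default_rng(0)
acts = rng.normal(size=(128, 32))              # layer inputs a
grad_out = rng.normal(size=(128, 16))          # backpropagated gradients g
grad_W = grad_out.T @ acts / 128               # raw weight gradient (16, 32)
step = kfac_precondition(grad_W, acts, grad_out)
```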

Style Locality for Controllable Generation with kNN Language Models (2311.00475v1)

This paper presents a novel approach to controllable generation using kNN language models with locality levels. In evaluation, the approach successfully controls style while offering a better fluency-style trade-off than previous work.
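The paper builds its locality levels on top of the standard kNN-LM; below is a sketch of the base interpolation it extends, with a hypothetical datastore of cached hidden states (`keys`) and their recorded next tokens (`values`). The interpolation weight and softmax temperature are assumptions.

```python
import numpy as np

def knn_lm_next_token(p_lm, query, keys, values, vocab_size,
                      k=8, lam=0.25, tau=1.0):
    """Mix a base LM's next-token distribution with a kNN distribution
    built from the nearest cached (hidden state, next token) pairs."""
    dists = np.sum((keys - query) ** 2, axis=1)   # L2 to all datastore keys
    nn = np.argsort(dists)[:k]                    # k nearest neighbors
    w = np.exp(-dists[nn] / tau)                  # closer neighbors weigh more
    w /= w.sum()
    p_knn = np.zeros(vocab_size)
    np.add.at(p_knn, values[nn], w)               # scatter onto their tokens
    return lam * p_knn + (1 - lam) * p_lm         # interpolated distribution
```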

The Development of LLMs for Embodied Navigation (2311.00530v1)

This article reviews the potential of large language models (LLMs) to augment embodied intelligence systems with sophisticated environmental perception and decision-making support for navigation tasks. It surveys existing models, research methodologies, and datasets, forecasts future directions, and discusses the lasting impact LLMs could have on embodied navigation research.

Explicit Morphological Knowledge Improves Pre-training of Language Models for Hebrew (2311.00658v1)

This paper presents techniques for incorporating explicit morphological knowledge into the pre-training of language models for Hebrew, a morphologically rich language. The results show improved performance on semantic and morphological tasks, suggesting that explicit morphological cues can meaningfully benefit models for morphologically rich languages.
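As a toy illustration of what "explicit morphological knowledge" can mean in practice, the snippet below peels common one-letter Hebrew prefixes (conjunctions, prepositions, the definite article) off a word so a tokenizer sees morphemes rather than fused surface forms. The prefix list and stem-length rule are crude illustrative assumptions, not the paper's pre-training techniques.

```python
# Common one-letter Hebrew prefixes: "ve-" (and), "ha-" (the), "be-" (in), etc.
PREFIXES = ("ו", "ה", "ב", "ל", "מ", "ש", "כ")

def segment(word):
    """Naive prefix peeling: strip known prefixes while keeping at least
    a three-letter stem (most Hebrew roots are triliteral)."""
    morphemes = []
    while len(word) > 3 and word[0] in PREFIXES:
        morphemes.append(word[0])
        word = word[1:]
    morphemes.append(word)
    return morphemes

print(segment("ובבית"))  # ['ו', 'ב', 'בית'] -> "and" + "in" + "house"
```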

Unleashing the Creative Mind: Language Model As Hierarchical Policy For Improved Exploration on Challenging Problem Solving (2311.00694v1)

This paper presents a novel approach to leveraging LLMs for better exploration on challenging problem-solving tasks. By framing an LLM as a hierarchical policy, the method has a leader propose multiple diverse high-level problem-solving tactics as hints, then samples multiple reasoning chains to form a solution group for each leader proposal. The hints prove meaningful and inspiring in practice, and the approach improves the accuracy of the final answer.
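Schematically, the leader-and-follower loop might look like the sketch below, where `llm` stands in for any text-completion call (an assumption, not the paper's API) and the prompt wording is illustrative.

```python
def solve_hierarchically(problem, llm, n_tactics=4, n_chains=3):
    """Leader proposes diverse high-level tactics; a follower samples
    several reasoning chains under each hint, one solution group per
    leader proposal. Selection/voting over groups happens downstream."""
    tactics = [
        llm(f"Propose one distinct high-level tactic for solving:\n"
            f"{problem}\nTactic {i + 1}:")
        for i in range(n_tactics)
    ]
    solution_groups = []
    for tactic in tactics:
        chains = [
            llm(f"Problem: {problem}\nHint: {tactic}\n"
                f"Reason step by step and give a final answer:")
            for _ in range(n_chains)
        ]
        solution_groups.append((tactic, chains))
    return solution_groups
```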

Crosslingual Retrieval Augmented In-context Learning for Bangla (2311.00587v1)

This paper presents a pioneering approach to improving the performance of LLMs on low-resource languages such as Bangla. Using crosslingual retrieval augmented in-context learning, the authors show that multilingual pretrained language models (MPLMs) can be boosted on Bangla tasks, with evaluation results demonstrating steady improvements over zero-shot performance.
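In outline, crosslingual retrieval augmented in-context learning retrieves semantically similar labeled examples from a high-resource pool and prepends them as demonstrations for the Bangla input. The sketch below assumes multilingual sentence embeddings are already computed; the pool, similarity measure, and prompt template are illustrative assumptions.

```python
import numpy as np

def build_crosslingual_prompt(query_vec, pool_vecs, pool_examples,
                              test_input, k=4):
    """Pick the k pool examples whose embeddings are most similar to the
    Bangla query embedding and format them as in-context demonstrations."""
    sims = pool_vecs @ query_vec / (
        np.linalg.norm(pool_vecs, axis=1) * np.linalg.norm(query_vec) + 1e-9
    )
    top = np.argsort(-sims)[:k]
    demos = "\n\n".join(f"Input: {pool_examples[i][0]}\nLabel: {pool_examples[i][1]}"
                        for i in top)
    return f"{demos}\n\nInput: {test_input}\nLabel:"
```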

Little Giants: Exploring the Potential of Small LLMs as Evaluation Metrics in Summarization in the Eval4NLP 2023 Shared Task (2311.00686v1)

This paper explores the potential of small language models to serve as evaluation metrics for summarization. Through experiments with various prompting techniques, the authors show that combining these approaches with zero-shot and one-shot learning yields competitive results, suggesting that small models can be practical, inexpensive evaluators.
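A zero-shot version of the idea is easy to sketch: prompt a small model to emit a single numeric score and parse it out. As before, `llm` is a placeholder for any text-completion call, and the rubric wording is an illustrative assumption, not the shared-task prompt.

```python
def score_summary(source, summary, llm):
    """Ask a small LM for a 1-5 quality score and parse the first digit."""
    prompt = (
        "Rate how faithful and fluent the summary is on a scale of 1 to 5. "
        "Answer with a single number.\n\n"
        f"Source:\n{source}\n\nSummary:\n{summary}\n\nScore:"
    )
    reply = llm(prompt)
    digits = [c for c in reply if c.isdigit()]
    return int(digits[0]) if digits else None
```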