Who Will Replace Transformer?

The common challenge facing non-Transformer architectures is still to prove how high their ceiling can be. Author: Zhang Jin. Editor: Chen Caixian. The paper “Attention Is All You Need”, published by Google in 2017, has become a bible for today’s artificial intelligence, and the global AI boom can be traced directly back to the … Read more

Illustration Of Transformer Architecture

1. Overview. The overall architecture of the Transformer was introduced in the first section. Before entering the encoder and decoder, data must first pass through an Embedding Layer and a Positional Encoding Layer. The encoder stack consists of several encoders, each containing a Multi-Head Attention Layer and a Feed Forward Layer. The decoder stack consists of several decoders. … Read more
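
To make the block structure above concrete, here is a minimal, self-contained PyTorch sketch of one encoder block and a stack of six. The hyperparameters mirror the base configuration of the original paper; the class and variable names are illustrative, not the article’s.

```python
import torch
import torch.nn as nn

class EncoderLayer(nn.Module):
    """One encoder block: multi-head self-attention followed by a feed-forward
    network, each wrapped in a residual connection and layer normalization."""
    def __init__(self, d_model=512, n_heads=8, d_ff=2048, dropout=0.1):
        super().__init__()
        self.attn = nn.MultiheadAttention(d_model, n_heads, dropout=dropout,
                                          batch_first=True)
        self.ff = nn.Sequential(nn.Linear(d_model, d_ff), nn.ReLU(),
                                nn.Linear(d_ff, d_model))
        self.norm1 = nn.LayerNorm(d_model)
        self.norm2 = nn.LayerNorm(d_model)
        self.drop = nn.Dropout(dropout)

    def forward(self, x):
        attn_out, _ = self.attn(x, x, x)            # self-attention: Q = K = V = x
        x = self.norm1(x + self.drop(attn_out))     # residual + norm
        x = self.norm2(x + self.drop(self.ff(x)))   # feed-forward sublayer
        return x

# A stack of several encoders, as the article describes (6 in the base model).
encoder_stack = nn.Sequential(*[EncoderLayer() for _ in range(6)])
x = torch.randn(2, 10, 512)   # (batch, seq_len, d_model) after embedding + positional encoding
print(encoder_stack(x).shape) # torch.Size([2, 10, 512])
```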

Higher Order Transformers Enhance Stock Movement Prediction

Source: Time Series Research. This article is approximately 2,800 words long and is recommended as a 5-minute read. It proposes a higher-order Transformer architecture designed specifically to handle multimodal stock data for predicting stock movements. For investors and traders, predicting stock movements in the financial market is crucial, as it enables them to make … Read more
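
The paper’s higher-order attention mechanism is not detailed in this excerpt, so the sketch below is a plain Transformer-encoder baseline for movement classification, intended only to make the task setup concrete; all names, feature counts, and window sizes are illustrative assumptions, not the paper’s method.

```python
import torch
import torch.nn as nn

class MovementClassifier(nn.Module):
    """Baseline: encode a window of daily feature vectors, classify up/down."""
    def __init__(self, n_features=16, d_model=64, n_heads=4, n_layers=2):
        super().__init__()
        self.proj = nn.Linear(n_features, d_model)   # embed per-day features
        layer = nn.TransformerEncoderLayer(d_model, n_heads, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, n_layers)
        self.head = nn.Linear(d_model, 2)            # two classes: up / down

    def forward(self, x):                            # x: (batch, days, features)
        h = self.encoder(self.proj(x))
        return self.head(h[:, -1])                   # classify from the last day

model = MovementClassifier()
window = torch.randn(8, 30, 16)  # 8 stocks, 30 trading days, 16 features each
print(model(window).shape)       # torch.Size([8, 2])
```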

The GPT Legacy: Tracing the Transformer Family Tree

From Machine Heart’s Analyst Network. Author: Wang Zijia. Editor: H4O. This article introduces the Transformer family. Recently, the large language model arms race has dominated discussion among peers, with many articles exploring what these models can do and their commercial value. However, as a humble researcher immersed in the field of … Read more

How to Enable AI to Speak Chinese: A Step-by-Step Guide

Last week, Huxiu’s “Huxiu Research” column released an episode titled “Is Chinese Bound to Fall Behind in the AI Wave?” After the episode aired, we received discussion and doubts from many quarters. The questions fell mainly into two categories: one category came from many AI practitioners pointing out that our understanding of the principles … Read more

Understanding Google’s Powerful NLP Model BERT

Written by AI Technology Review. Report from Leiphone (leiphone-sz). Leiphone AI Technology Review’s note: this article is Pan Shengfeng of Zhuiyi Technology’s interpretation of Google’s paper, prepared for AI Technology Review. Recently, Google researchers achieved state-of-the-art results on 11 NLP tasks with the … Read more
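
For readers who want to try BERT hands-on, here is a minimal sketch using the Hugging Face transformers library and the public English base checkpoint; the library choice is our assumption, since the article itself is an interpretation of Google’s paper.

```python
# Load Google's pretrained BERT and extract contextual token embeddings.
from transformers import BertTokenizer, BertModel

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
model = BertModel.from_pretrained("bert-base-uncased")

inputs = tokenizer("BERT produces contextual embeddings.", return_tensors="pt")
outputs = model(**inputs)
print(outputs.last_hidden_state.shape)  # (batch, tokens, 768 hidden dims)
```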

NVIDIA’s 50-Minute BERT Training: Beyond Just GPUs

Selected from arXiv. Author: Mohammad Shoeybi et al. Translated by Machine Heart. Contributor: Mo Wang. Machine Heart previously introduced a study by NVIDIA that broke three records in the NLP field: reducing BERT’s training time to 53 minutes, reducing BERT’s inference time to 2.2 milliseconds, and increasing the parameter count of GPT-2 to 8 billion (previously, GPT-2 … Read more
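
NVIDIA’s speedups rest on large-scale, mixed-precision training with Megatron-LM across many GPUs. As a loose illustration of the mixed-precision ingredient only, here is a generic PyTorch AMP training step; this is our sketch, not NVIDIA’s code, and it requires a CUDA GPU to run.

```python
import torch

# Toy stand-in for a real network; the AMP pattern is what matters here.
model = torch.nn.Linear(1024, 1024).cuda()
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)
scaler = torch.cuda.amp.GradScaler()

x = torch.randn(32, 1024, device="cuda")
with torch.cuda.amp.autocast():   # FP16 forward pass
    loss = model(x).pow(2).mean()
scaler.scale(loss).backward()     # scale the loss to avoid FP16 gradient underflow
scaler.step(optimizer)
scaler.update()
```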

Qwen2.5 Technical Report

In December 2024, the paper “Qwen2.5 Technical Report” from Tongyi Qianwen was released. It introduces Qwen2.5, a comprehensive series of large language models (LLMs) designed to meet diverse needs. Compared with previous iterations, Qwen2.5 makes significant improvements in both the pre-training and post-training phases. In terms of pre-training, the high-quality pre-training dataset has … Read more
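
As a quick way to try the models the report describes, here is a minimal sketch of loading a Qwen2.5 chat checkpoint through the Hugging Face transformers library; the model ID is one of the published instruct checkpoints, and device_map="auto" additionally requires the accelerate package.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "Qwen/Qwen2.5-7B-Instruct"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype="auto",
                                             device_map="auto")

messages = [{"role": "user",
             "content": "Summarize the Qwen2.5 technical report in one sentence."}]
inputs = tokenizer.apply_chat_template(messages, add_generation_prompt=True,
                                       return_tensors="pt").to(model.device)
outputs = model.generate(inputs, max_new_tokens=128)
# Decode only the newly generated tokens, skipping the prompt.
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```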

Guide to Deploying Llama3 Locally with Ollama

As we all know, Zuckerberg’s Meta has open-sourced Llama3 in two sizes, 8B and 70B, each with pretrained and instruction-tuned models. A larger 400B-parameter version is expected this summer and may be the first open-source model at the GPT-4 level! Let’s start with a preliminary look at Llama3. Model Architecture … Read more
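
Once Ollama is installed and the model has been pulled (`ollama pull llama3`), it serves a REST API on localhost:11434 by default. Here is a minimal Python sketch of querying the locally deployed model; the prompt is only an example.

```python
import requests

# Single non-streaming request to the local Ollama generate endpoint.
resp = requests.post(
    "http://localhost:11434/api/generate",
    json={"model": "llama3", "prompt": "Why is the sky blue?", "stream": False},
)
print(resp.json()["response"])
```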

NLP and Transformer Converge in Computer Vision: DETR as a New Paradigm for Object Detection

Original by Machine Heart. Author: Chen Ping. Since its introduction, the Transformer has swept through the entire NLP field; in fact, it can also be used for object detection. Researchers at Facebook AI launched the first visual version of the Transformer, the Detection Transformer (DETR), filling the gap in using Transformers for object detection and surpassing … Read more
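
To see DETR in action without touching the original Facebook AI repository, here is a minimal inference sketch using the Hugging Face transformers port of the pretrained checkpoint; the image path is a hypothetical placeholder.

```python
import torch
from PIL import Image
from transformers import DetrImageProcessor, DetrForObjectDetection

processor = DetrImageProcessor.from_pretrained("facebook/detr-resnet-50")
model = DetrForObjectDetection.from_pretrained("facebook/detr-resnet-50")

image = Image.open("example.jpg").convert("RGB")  # hypothetical local image
inputs = processor(images=image, return_tensors="pt")
outputs = model(**inputs)

# Keep detections above a confidence threshold; DETR's set-based prediction
# means no non-maximum suppression is needed.
target_sizes = torch.tensor([image.size[::-1]])
results = processor.post_process_object_detection(
    outputs, target_sizes=target_sizes, threshold=0.9)[0]
for score, label, box in zip(results["scores"], results["labels"], results["boxes"]):
    print(model.config.id2label[label.item()], round(score.item(), 3), box.tolist())
```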