OpenAI发布最新推理模型o1——一个会思考的AI

Posted by zluckyH on September 13, 2024

OpenAI o1 > Learning to Reason with LLMs

OpenAI刚刚发布了最新的模型o1,这个模型在回答之前会进行思考,所以非常擅长推理相关的问题,它在多个方面都表现突出:

  • 竞争编程问题(Codeforces)中排名前89%;
  • 在美国数学奥林匹克(AIME)预选赛中跻身美国前500名学生之列;
  • 在物理、生物、化学问题的基准测试(GPQA)中超过了博士水平的准确性。

从这些表现来看,o1可以说是最强LLM,在逻辑思考和数学方面有了巨大的提升。之前发现gpt-4o在代码方面明显逊色于claude sonnet,希望o1能再次实现超越。

GPT-4o vs o1,小学生 vs 大学生

从下面图中的对比来看,在绝大多数推理任务中,o1 的表现明显优于 GPT-4o。

  • 在数学奥赛中,gpt-4o的表现只能算是小学生,而o1已经达到了大学生水平。
  • 而在代码能力上,o1也表现出巨大的提升。
  • 在数学、物理、LSAT(法学院入学考试)、大学数学,准确率都达到了90%以上

Competition evals for Math (AIME 2024), Code (CodeForces), and PhD-Level Science Questions (GPQA Diamond)

Breakdown of the accuracy and raw score of gpt-4o vs. o1 on various competition evals

为何o1效果如此炸裂?

如此炸裂的效果主要得益于思维链这种思考方式。OpenAI的大规模强化学习算法教会模型如何在训练过程中利用其思维链进行高效思考。随着更多的强化学习和更多的思考时间,o1 的性能持续提高。

关于思维链

类似于人类在回答难题之前可能会思考很长时间,o1 在尝试解决问题时会使用一系列思维。通过强化学习,o1 学会磨练其思维链并完善其使用的策略。它学会认识并纠正错误。它学会将棘手的步骤分解为更简单的步骤。当当前方法不起作用时,它会学会尝试另一种方法。这个过程极大地提高了模型的推理能力。

OpenAI提供了几个有意思的例子,这里我们看一下编码的例子。o1在回答之前,经过了很多次的尝试,最终找到了正确的编码方式,而gpt-4o明显不具备自我分析自我纠错的能力,因此给出了错误答案。(如果想要了解o1的思考过程,可以参考这篇介绍原文:learning-to-reason-with-llms

下面是问题以及对应的两者的答案

问题

oyfjdnisdr rtqwainr acxz mynzbhhx -> Think step by step

Use the example above to decode:

oyekaijzdf aaptcg suaokybhai ouow aqht mynznvaatzacdfoulxxz

GPT-4o的回答-错误

Let’s break this down step by step based on the example:

1. Example given:
• Input: oyfjdnisdr rtqwainr acxz mynzbhhx
• Output: Think step by step

By examining the words:
• The pattern involves selecting specific letters or transforming them.

2. Now, let’s decode the new phrase:
• Input: oyekaijzdf aaptcg suaokybhai ouow aqht mynznvaatzacdfoulxxz

Step-by-step decoding process:

Identify possible word divisions.

The first phrase might follow the same structure as the example. It seems like the input text could be broken into groups based on natural separations or patterns.

A possible division could be:

oyekaijzdf

aaptcg

suaokybhai

ouow

aqht

mynznvaatzacdfoulxxz

However, to make a meaningful decoding, I would need more context about the transformations or letter shifting that might be involved.

Could you provide any additional decoding rules or transformations used in this cipher?

o1的思维链回答——正确


To decode the given ciphertext using the example provided, we’ll use the pattern identified in the example.

Ciphertext Example:

```
oyfjdnisdr rtqwainr acxz mynzbhhx
```

Decoded as:

```
Think step by step
```

Decoding Method:

Pair the letters in the ciphertext.

Convert each letter to its numerical position in the alphabet (A=1, B=2, …, Z=26).

Sum the numerical values of each pair.

Compute the average of the sum (divide by 2).

Convert the average back to a letter (1=A, 2=B, …, 26=Z).

Apply this method to the given ciphertext:

```
oyekaijzdf aaptcg suaokybhai ouow aqht mynznvaatzacdfoulxxz
```

Break down the ciphertext into pairs:

1. First word: “oyekaijzdf”

Pairs: oy, ek, ai, jz, df

Decoded letters:

oy → (15+25)/2 = 20 → T

ek → (5+11)/2 = 8 → H

ai → (1+9)/2 = 5 → E

jz → (10+26)/2 = 18 → R

df → (4+6)/2 = 5 → E

Decoded word: THERE

2. Second word: “aaptcg”

Pairs: aa, pt, cg

Decoded letters:

aa → (1+1)/2 = 1 → A

pt → (16+20)/2 = 18 → R

cg → (3+7)/2 = 5 → E

Decoded word: ARE

3. Third word: “suaokybhai”

Pairs: su, ao, ky, bh, ai

Decoded letters:

su → (19+21)/2 = 20 → T

ao → (1+15)/2 = 8 → H

ky → (11+25)/2 = 18 → R

bh → (2+8)/2 = 5 → E

ai → (1+9)/2 = 5 → E

Decoded word: THREE

4. Fourth word: “ouow”

Pairs: ou, ow

Decoded letters:

ou → (15+21)/2 = 18 → R

ow → (15+23)/2 = 19 → S

Decoded letters: R’S

5. Fifth word: “aqht”

Pairs: aq, ht

Decoded letters:

aq → (1+17)/2 = 9 → I

ht → (8+20)/2 = 14 → N

Decoded word: IN

6. Sixth word: “mynznvaatzacdfoulxxz”

Pairs: my, nz, nv, aa, tz, ac, df, ou, lx, xz

Decoded letters:

my → (13+25)/2 = 19 → S

nz → (14+26)/2 = 20 → T

nv → (14+22)/2 = 18 → R

aa → (1+1)/2 = 1 → A

tz → (20+26)/2 = 23 → W

ac → (1+3)/2 = 2 → B

df → (4+6)/2 = 5 → E

ou → (15+21)/2 = 18 → R

lx → (12+24)/2 = 18 → R

xz → (24+26)/2 = 25 → Y

Decoded word: STRAWBERRY

Final Decoded Message:

```
THERE ARE THREE R'S IN STRAWBERRY
```

Answer:
THERE ARE THREE R’S IN STRAWBERRY