GitBench
Multi-line hunk: function body conflict
Tests the ability to resolve a function-body conflict that combines formatting changes from both branches. Evaluates semantic merging of code logic.

These commands set up the repo before the model sees the prompt. They define the starting file structure, staged changes, and Git history.

  1. git init
  2. git config user.email 'test@test.com'
  3. git config user.name 'Test User'
  4. printf 'def greet(name): print("Hello") return True' > greet.py
  5. git add greet.py
  6. git commit -m 'Initial greet'
  7. git checkout -b feature
  8. printf 'def greet(name): print(f"Hello, {name}") return True' > greet.py
  9. git add greet.py
  10. git commit -m 'Add name parameter'
  11. git checkout main
  12. printf 'def greet(name): print("Hi there") return False' > greet.py
  13. git add greet.py
  14. git commit -m 'Change greeting'
  15. git cherry-pick $(git rev-parse feature)
Prompt
Resolve the cherry-pick conflict in greet.py. The base was a simple greet with print("Hello"). The incoming commit added f-string formatting with the name parameter. The current branch changed the greeting text to "Hi there" and changed return to False. Resolve the conflict keeping the most useful changes. Provide ONLY the resolved file content.
Expected
def greet(name): print(f"Hello, {name}") return True
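The relationship between the three versions named in the prompt can be sketched with a naive per-line comparison. This is only an illustration, not Git's merge algorithm: Git reports conflicts per hunk rather than per line. The three-line layout of greet.py is an assumption (the page shows the file flattened onto one line), and `classify` is a hypothetical helper.

```python
# Assumption: each version of greet.py has three lines with 4-space indents;
# the page shows them flattened, so the exact whitespace is a guess.
BASE = ['def greet(name):', '    print("Hello")', '    return True']
# "theirs": the feature commit being cherry-picked
THEIRS = ['def greet(name):', '    print(f"Hello, {name}")', '    return True']
# "ours": the current state of main
OURS = ['def greet(name):', '    print("Hi there")', '    return False']


def classify(base, ours, theirs):
    """Label each line position by which side changed it relative to base."""
    labels = []
    for b, o, t in zip(base, ours, theirs):
        if o == b and t == b:
            labels.append("unchanged")
        elif o == b:
            labels.append("theirs only")
        elif t == b:
            labels.append("ours only")
        elif o == t:
            labels.append("both, identical")
        else:
            labels.append("both, different")
    return labels


labels = classify(BASE, OURS, THEIRS)
for line, label in zip(BASE, labels):
    print(f"{label:16} | {line}")
```

Under these assumptions the print line is changed differently on both sides and the return line is changed only on main. The expected answer above equals the incoming (feature) version on every line, so it keeps `return True` rather than main's `return False`.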
nvidia/nemotron-3-nano-30b-a3b:high PASS 100% 142 in → 325 out (358 reasoning)
def greet(name): print(f"Hello, {name}") return True
deepseek/deepseek-v4-flash:high FAIL 0% 129 in → 606 out (583 reasoning)
```python def greet(name): print(f"Hi there, {name}") return False ```
Failure: Expected 'def greet(name): print(f"Hello, {name}") return True ', got '```python def greet(name): print(f"Hi there, {name}") return False ```'
deepseek/deepseek-v4-flash:high__json_schema FAIL 0% 130 in → 647 out (613 reasoning)
def greet(name): print(f"Hi there, {name}") return False
JSON Schema Structured Output
(raw) { "resolved_content": "def greet(name):\n print(f\"Hi there, {name}\")\n return False\n" }
Failure: Expected 'def greet(name): print(f"Hello, {name}") return True ', got 'def greet(name): print(f"Hi there, {name}") return False '
deepseek/deepseek-v4-flash:none FAIL 0% 128 in → 23 out (0 reasoning)
```python def greet(name): print(f"Hi there, {name}") return False ```
Failure: Expected 'def greet(name): print(f"Hello, {name}") return True ', got '```python def greet(name): print(f"Hi there, {name}") return False ```'
deepseek/deepseek-v4-flash:none__json_schema FAIL 0% 128 in → 30 out (0 reasoning)
def greet(name): print(f"Hi there, {name}") return False
JSON Schema Structured Output
(raw) {"resolved_content": "def greet(name):\n print(f\"Hi there, {name}\")\n return False"}
Failure: Expected 'def greet(name): print(f"Hello, {name}") return True ', got 'def greet(name): print(f"Hi there, {name}") return False'
mistralai/devstral-2512 FAIL 0% 131 in → 49 out
Here's the resolved file content keeping the most useful changes (f-string formatting with name parameter and the "Hi there" greeting): ```python def greet(name): print(f"Hi there, {name}") return True ```
Failure: Expected 'def greet(name): print(f"Hello, {name}") return True ', got 'Here's the resolved file content keeping the most useful changes (f-string formatting with name parameter and the "Hi there" greeting): ```python def greet(name): print(f"Hi there, {name}") return True ```'
mistralai/devstral-2512__json_schema FAIL 0% 132 in → 30 out
def greet(name): print(f"Hi there, {name}") return True
JSON Schema Structured Output
(raw) {"resolved_content": "def greet(name):\n print(f\"Hi there, {name}\")\n return True"}
Failure: Expected 'def greet(name): print(f"Hello, {name}") return True ', got 'def greet(name): print(f"Hi there, {name}") return True'
nvidia/nemotron-3-nano-30b-a3b:high__json_schema FAIL 0% 144 in → 492 out (525 reasoning)
def greet(name): print(f"Hi there, {name}") return False
JSON Schema Structured Output
(raw) { "resolved_content": "def greet(name):\n print(f\"Hi there, {name}\")\n return False" }
Failure: Expected 'def greet(name): print(f"Hello, {name}") return True ', got 'def greet(name): print(f"Hi there, {name}") return False'
nvidia/nemotron-3-nano-30b-a3b:none FAIL 0% 143 in → 18 out (0 reasoning)
def greet(name): print(f"Hi there, {name}") return False
Failure: Expected 'def greet(name): print(f"Hello, {name}") return True ', got 'def greet(name): print(f"Hi there, {name}") return False'
nvidia/nemotron-3-nano-30b-a3b:none__json_schema FAIL 0% 143 in → 30 out (0 reasoning)
def greet(name): print("Hi there") return False
JSON Schema Structured Output
(raw) { "resolved_content": "def greet(name):\n print(\"Hi there\")\n return False" }
Failure: Expected 'def greet(name): print(f"Hello, {name}") return True ', got 'def greet(name): print("Hi there") return False'
poolside/laguna-xs.2:high FAIL 0% 177 in → 241 out (214 reasoning)
```python def greet(name): print(f"Hi there, {name}") return True ```
Failure: Expected 'def greet(name): print(f"Hello, {name}") return True ', got ' ```python def greet(name): print(f"Hi there, {name}") return True ``` '
poolside/laguna-xs.2:high__json_schema FAIL 0% 176 in → 488 out (458 reasoning)
def greet(name): print(f"Hi there, {name}") return True
JSON Schema Structured Output
(raw) {"resolved_content": "def greet(name):\n print(f\"Hi there, {name}\")\n return True"}
Failure: Expected 'def greet(name): print(f"Hello, {name}") return True ', got 'def greet(name): print(f"Hi there, {name}") return True'
poolside/laguna-xs.2:none FAIL 0% 177 in → 26 out (0 reasoning)
```python def greet(name): print(f"Hi there, {name}") return False ```
Failure: Expected 'def greet(name): print(f"Hello, {name}") return True ', got ' ```python def greet(name): print(f"Hi there, {name}") return False ``` '
poolside/laguna-xs.2:none__json_schema FAIL 0% 176 in → 28 out (0 reasoning)
def greet(name): print(f"Hi there, {name}") return False
JSON Schema Structured Output
(raw) {"resolved_content": "def greet(name):\n print(f\"Hi there, {name}\")\n return False"}
Failure: Expected 'def greet(name): print(f"Hello, {name}") return True ', got 'def greet(name): print(f"Hi there, {name}") return False'
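The page does not state how responses are compared with the expected text. The following is a minimal sketch, assuming a whitespace-insensitive comparison, of how a failure could be sorted into "wrong format" (correct code wrapped in a Markdown fence or prose) versus "wrong content". The names `extract_code`, `normalize`, and `diagnose` are hypothetical and are not part of GitBench.

```python
import re

EXPECTED = 'def greet(name): print(f"Hello, {name}") return True'

# Matches the first Markdown code fence, with an optional language tag.
FENCE = re.compile(r"```[\w+-]*\s*(.*?)```", re.DOTALL)


def extract_code(response: str) -> str:
    """Return the body of the first fenced block, or the response unchanged."""
    match = FENCE.search(response)
    return match.group(1) if match else response


def normalize(text: str) -> str:
    """Collapse every run of whitespace to one space and trim the ends."""
    return " ".join(text.split())


def diagnose(response: str, expected: str = EXPECTED) -> str:
    """Classify a response as a match, a format-only miss, or a content miss."""
    if normalize(response) == normalize(expected):
        return "match"
    if normalize(extract_code(response)) == normalize(expected):
        return "format-only mismatch"
    return "content mismatch"


# A passing output and a fenced failing output, both taken from the results above.
print(diagnose('def greet(name): print(f"Hello, {name}") return True'))
print(diagnose('```python def greet(name): print(f"Hi there, {name}") return False ```'))
```

Under this sketch, every failing output listed above is a content mismatch: each one uses the "Hi there" greeting, so removing the fence or the surrounding prose would not turn any of them into a match.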