GitBench
Multi-line hunk: function body conflict
Tests the ability to resolve a merge conflict in a function body where one branch changes formatting and the other changes logic. Evaluates semantic code merging.

These commands set up the repo before the model sees the prompt. They define the starting file structure, staged changes, and Git history.

  1. git init
  2. git config user.email 'test@test.com'
  3. git config user.name 'Test User'
  4. printf 'def greet(name): print("Hello") return True' > greet.py
  5. git add greet.py
  6. git commit -m 'Initial greet'
  7. git checkout -b feature
  8. printf 'def greet(name): print(f"Hello, {name}") return True' > greet.py
  9. git add greet.py
  10. git commit -m 'Add name parameter'
  11. git checkout main
  12. printf 'def greet(name): print("Hi there") return False' > greet.py
  13. git add greet.py
  14. git commit -m 'Change greeting'
  15. git merge feature
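The setup steps above can be reproduced locally with a short script. This is a minimal sketch, not the benchmark's own harness. It makes two assumptions: the file contents are multi-line with 4-space indentation (the listing above shows them flattened onto one line), and the initial branch is forced to `main` with `git symbolic-ref`, because a plain `git init` may create a different default branch. The helper names `git`, `commit_file`, and `build_conflicted_repo` are hypothetical.

```python
import shutil
import subprocess
import tempfile
from pathlib import Path

# Assumed file contents: the line breaks and 4-space indentation are a guess,
# since the original printf arguments appear flattened in the listing.
BASE = 'def greet(name):\n    print("Hello")\n    return True\n'
FEATURE = 'def greet(name):\n    print(f"Hello, {name}")\n    return True\n'
MAIN = 'def greet(name):\n    print("Hi there")\n    return False\n'


def git(repo: Path, *args: str, check: bool = True) -> subprocess.CompletedProcess:
    """Run one git command inside the repository directory."""
    return subprocess.run(
        ["git", *args], cwd=repo, check=check, capture_output=True, text=True
    )


def commit_file(repo: Path, content: str, message: str) -> None:
    """Overwrite greet.py, stage it, and commit it."""
    (repo / "greet.py").write_text(content)
    git(repo, "add", "greet.py")
    git(repo, "commit", "-m", message)


def build_conflicted_repo(repo: Path) -> bool:
    """Replay the setup steps; return True if the merge ends in a conflict."""
    git(repo, "init")
    # Assumption: name the unborn branch "main" so "git checkout main" works.
    git(repo, "symbolic-ref", "HEAD", "refs/heads/main")
    git(repo, "config", "user.email", "test@test.com")
    git(repo, "config", "user.name", "Test User")
    commit_file(repo, BASE, "Initial greet")
    git(repo, "checkout", "-b", "feature")
    commit_file(repo, FEATURE, "Add name parameter")
    git(repo, "checkout", "main")
    commit_file(repo, MAIN, "Change greeting")
    # A conflicting merge exits non-zero, so do not raise on failure here.
    merge = git(repo, "merge", "feature", check=False)
    return merge.returncode != 0 and "<<<<<<<" in (repo / "greet.py").read_text()


if __name__ == "__main__":
    if shutil.which("git"):
        with tempfile.TemporaryDirectory() as tmp:
            print("conflict:", build_conflicted_repo(Path(tmp)))
```

Both branches edit the `print` line, so the merge in the last step should stop with conflict markers in greet.py, which is the state the model is asked to resolve.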
Prompt
Resolve the merge conflict in greet.py. The base was a simple greet with print("Hello"). Branch A added f-string formatting with the name parameter. Branch B changed the greeting text to "Hi there" and changed return to False. Resolve the conflict keeping the most useful changes. Provide ONLY the resolved file content.
Expected
def greet(name): print(f"Hello, {name}") return True
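For readability, the expected resolution can be written out as a runnable file and exercised once. This is only an illustration: the 4-space indentation is an assumption (the line above is shown flattened), and the sample argument `"Ada"` is made up.

```python
import contextlib
import io


# Expected resolution, laid out with assumed 4-space indentation.
def greet(name):
    print(f"Hello, {name}")
    return True


# Capture stdout so the printed greeting can be inspected alongside the return value.
buffer = io.StringIO()
with contextlib.redirect_stdout(buffer):
    result = greet("Ada")
```

The expected output keeps the f-string with the name parameter and the original `return True`, which matches the feature branch's version of the function.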
deepseek/deepseek-v4-flash:high PASS 100% 117 in → 491 out (547 reasoning)
```python def greet(name): print(f"Hello, {name}") return True ```
nvidia/nemotron-3-nano-30b-a3b:high PASS 100% 130 in → 781 out (881 reasoning)
def greet(name): print(f"Hello, {name}") return True
nvidia/nemotron-3-nano-30b-a3b:high__json_schema PASS 100% 130 in → 281 out (306 reasoning)
def greet(name): print(f"Hello, {name}") return True
JSON Schema Structured Output
(raw) { "resolved_content": "def greet(name):\n print(f\"Hello, {name}\")\n return True" }
nvidia/nemotron-3-nano-30b-a3b:none PASS 100% 130 in → 17 out (0 reasoning)
def greet(name): print(f"Hello, {name}") return True
nvidia/nemotron-3-nano-30b-a3b:none__json_schema PASS 100% 130 in → 33 out (0 reasoning)
def greet(name): print(f"Hello, {name}") return True
JSON Schema Structured Output
(raw) { "resolved_content": "def greet(name):\n print(f\"Hello, {name}\")\n return True\n" }
poolside/laguna-xs.2:high PASS 100% 162 in → 369 out (345 reasoning)
``` def greet(name): print(f"Hello, {name}") return True ```
deepseek/deepseek-v4-flash:high__json_schema FAIL 0% 119 in → 1,378 out (1,342 reasoning)
def greet(name): print(f"Hi there, {name}") return False
JSON Schema Structured Output
(raw) { "resolved_content": "def greet(name):\n print(f\"Hi there, {name}\")\n return False\n" }
Failure: Expected 'def greet(name): print(f"Hello, {name}") return True ', got 'def greet(name): print(f"Hi there, {name}") return False '
deepseek/deepseek-v4-flash:none FAIL 0% 117 in → 23 out (0 reasoning)
```python def greet(name): print(f"Hi there, {name}") return False ```
Failure: Expected 'def greet(name): print(f"Hello, {name}") return True ', got '```python def greet(name): print(f"Hi there, {name}") return False ```'
deepseek/deepseek-v4-flash:none__json_schema FAIL 0% 117 in → 32 out (0 reasoning)
def greet(name): print(f"Hi there, {name}") return False
JSON Schema Structured Output
(raw) { "resolved_content": "def greet(name):\n print(f\"Hi there, {name}\")\n return False" }
Failure: Expected 'def greet(name): print(f"Hello, {name}") return True ', got 'def greet(name): print(f"Hi there, {name}") return False'
mistralai/devstral-2512 FAIL 0% 117 in → 23 out
```python def greet(name): print(f"Hi there, {name}") return True ```
Failure: Expected 'def greet(name): print(f"Hello, {name}") return True ', got '```python def greet(name): print(f"Hi there, {name}") return True ```'
mistralai/devstral-2512__json_schema FAIL 0% 117 in → 30 out
def greet(name): print(f"Hi there, {name}") return True
JSON Schema Structured Output
(raw) {"resolved_content": "def greet(name):\n print(f\"Hi there, {name}\")\n return True"}
Failure: Expected 'def greet(name): print(f"Hello, {name}") return True ', got 'def greet(name): print(f"Hi there, {name}") return True'
poolside/laguna-xs.2:high__json_schema FAIL 0% 162 in → 477 out (448 reasoning)
def greet(name): print(f"Hi there, {name}") return True
JSON Schema Structured Output
(raw) {"resolved_content": "def greet(name):\n print(f\"Hi there, {name}\")\n return True"}
Failure: Expected 'def greet(name): print(f"Hello, {name}") return True ', got 'def greet(name): print(f"Hi there, {name}") return True'
poolside/laguna-xs.2:none FAIL 0% 162 in → 26 out (0 reasoning)
```python def greet(name): print(f"Hi there, {name}") return False ```
Failure: Expected 'def greet(name): print(f"Hello, {name}") return True ', got ' ```python def greet(name): print(f"Hi there, {name}") return False ``` '
poolside/laguna-xs.2:none__json_schema FAIL 0% 162 in → 28 out (0 reasoning)
def greet(name): print(f"Hi there, {name}") return False
JSON Schema Structured Output
(raw) {"resolved_content": "def greet(name):\n print(f\"Hi there, {name}\")\n return False"}
Failure: Expected 'def greet(name): print(f"Hello, {name}") return True ', got 'def greet(name): print(f"Hi there, {name}") return False'
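The results above mix three output shapes: bare code, code wrapped in a Markdown fence, and a JSON object with a `resolved_content` field. The grader's actual comparison logic is not shown on this page; the sketch below is one hypothetical way to reduce all three shapes to a comparable string, assuming that fences are stripped and whitespace is collapsed before comparison (the fenced outputs that pass suggest something along these lines). The names `EXPECTED` and `extract_answer` are illustrative.

```python
import json
import re

# Expected answer with whitespace collapsed, as it appears in the listing.
EXPECTED = 'def greet(name): print(f"Hello, {name}") return True'


def extract_answer(raw: str) -> str:
    """Reduce a model output to a whitespace-normalized code string."""
    text = raw.strip()
    # Structured-output variants: pull the code out of the JSON object.
    if text.startswith("{"):
        try:
            text = json.loads(text)["resolved_content"]
        except (json.JSONDecodeError, KeyError, TypeError):
            pass  # Not the assumed schema; fall through and treat as plain text.
    # Strip one surrounding Markdown fence, with or without a language tag.
    fence = re.fullmatch(r"```[^\s`]*\s+(.*?)\s*```", text.strip(), flags=re.DOTALL)
    if fence:
        text = fence.group(1)
    # Collapse all runs of whitespace so indentation and newlines do not matter.
    return " ".join(text.split())


if __name__ == "__main__":
    fenced = '```python def greet(name): print(f"Hello, {name}") return True ```'
    print(extract_answer(fenced) == EXPECTED)
```

Under this normalization, the failing entries still differ from the expected string in the greeting text or the return value, so formatting alone does not explain those failures.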