GitBench
Multi-line hunk: config block conflict
Tests the ability to resolve a conflict in a config block where one branch changes the port and the other changes the host. Evaluates merging of infrastructure configuration.

These commands set up the repo before the model sees the prompt. They define the starting file structure, staged changes, and Git history.

  1. git init
  2. git config user.email 'test@test.com'
  3. git config user.name 'Test User'
  4. printf 'server: port: 8080 host: localhost' > config.yaml
  5. git add config.yaml
  6. git commit -m 'Initial config'
  7. git checkout -b prod
  8. printf 'server: port: 443 host: localhost' > config.yaml
  9. git add config.yaml
  10. git commit -m 'Use HTTPS port'
  11. git checkout main
  12. printf 'server: port: 8080 host: 0.0.0.0' > config.yaml
  13. git add config.yaml
  14. git commit -m 'Listen on all interfaces'
  15. git cherry-pick $(git rev-parse prod)
Prompt
Resolve the cherry-pick conflict in config.yaml. The base had port: 8080 and host: localhost. The incoming commit changed port to 443 (HTTPS) keeping localhost. The current branch kept port 8080 but changed host to 0.0.0.0. Provide ONLY the resolved file content.
Expected
server: port: 443 host: 0.0.0.0
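The reasoning the prompt spells out is a three-way merge: for each line, keep whichever side differs from the base. Git itself stops with a conflict here, most likely because the two edits sit on adjacent lines and so fall into one overlapping hunk. A minimal Python sketch of the per-line rule follows. The function name `merge_three_way` is hypothetical, the two-space indentation is an assumption (this page flattens whitespace), and the sketch only handles versions with the same number of lines.

```python
def merge_three_way(base, ours, theirs):
    """Merge line by line, keeping whichever side changed relative to base.

    Illustrative only: assumes all three versions have the same line count.
    """
    if not (len(base) == len(ours) == len(theirs)):
        raise ValueError("this sketch only handles equal-length files")
    merged = []
    for b, o, t in zip(base, ours, theirs):
        if o == t:
            merged.append(o)   # both sides agree (changed or not)
        elif o == b:
            merged.append(t)   # only the incoming commit changed this line
        elif t == b:
            merged.append(o)   # only the current branch changed this line
        else:
            raise ValueError(f"both sides changed the same line: {b!r}")
    return merged


# Indentation below is assumed; the values come from the prompt.
base = ["server:", "  port: 8080", "  host: localhost"]
ours = ["server:", "  port: 8080", "  host: 0.0.0.0"]    # main: listen on all interfaces
theirs = ["server:", "  port: 443", "  host: localhost"]  # prod: use HTTPS port

resolved = merge_three_way(base, ours, theirs)
print("\n".join(resolved))
```

Each side touches a different line, so the rule never reaches the conflict branch and the result combines port 443 with host 0.0.0.0.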
deepseek/deepseek-v4-flash:high PASS 100% 134 in → 338 out (317 reasoning)
server: port: 443 host: 0.0.0.0
deepseek/deepseek-v4-flash:high__json_schema PASS 100% 135 in → 292 out (259 reasoning)
server: port: 443 host: 0.0.0.0
JSON Schema Structured Output
(raw) { "resolved_content": "server:\n port: 443\n host: 0.0.0.0\n" }
deepseek/deepseek-v4-flash:none PASS 100% 137 in → 26 out (0 reasoning)
```yaml server: port: 443 host: 0.0.0.0 ```
deepseek/deepseek-v4-flash:none__json_schema PASS 100% 135 in → 30 out (0 reasoning)
server: port: 443 host: 0.0.0.0
JSON Schema Structured Output
(raw) { "resolved_content": "server:\n port: 443\n host: 0.0.0.0" }
mistralai/devstral-2512 PASS 100% 144 in → 27 out
```yaml server: port: 443 host: 0.0.0.0 ```
mistralai/devstral-2512__json_schema PASS 100% 144 in → 30 out
server: port: 443 host: 0.0.0.0
JSON Schema Structured Output
(raw) {"resolved_content": "server:\n port: 443\n host: 0.0.0.0"}
poolside/laguna-xs.2:high PASS 100% 189 in → 294 out (269 reasoning)
server: port: 443 host: 0.0.0.0
poolside/laguna-xs.2:high__json_schema PASS 100% 186 in → 393 out (363 reasoning)
server: port: 443 host: 0.0.0.0
JSON Schema Structured Output
(raw) {"resolved_content": "server:\n port: 443\n host: 0.0.0.0"}
poolside/laguna-xs.2:none PASS 100% 187 in → 24 out (0 reasoning)
server: port: 443 host: 0.0.0.0
poolside/laguna-xs.2:none__json_schema PASS 100% 188 in → 29 out (0 reasoning)
server: port: 443 host: 0.0.0.0
JSON Schema Structured Output
(raw) {"resolved_content": "server:\n port: 443\n host: 0.0.0.0"}
nvidia/nemotron-3-nano-30b-a3b:high FAIL 0% 157 in → 1,075 out (1,055 reasoning)
server: port: 443 host: localhost
Failure: Expected 'server: port: 443 host: 0.0.0.0 ', got 'server: port: 443 host: localhost'
Invalid JSON. Output:
JSON Schema Structured Output
Structured Output Error
Failed to parse structured JSON response: Expecting value: line 1 column 1 (char 0)
Failure: Failed to parse structured JSON response: Expecting value: line 1 column 1 (char 0)
nvidia/nemotron-3-nano-30b-a3b:none FAIL 0% 158 in → 27 out (0 reasoning)
--- config.yaml server: port: 8080 host: 0.0.0.0
Failure: Expected 'server: port: 443 host: 0.0.0.0 ', got '--- config.yaml server: port: 8080 host: 0.0.0.0'
nvidia/nemotron-3-nano-30b-a3b:none__json_schema FAIL 0% 157 in → 27 out (0 reasoning)
server: port: 443 host: localhost
JSON Schema Structured Output
(raw) { "resolved_content": "server:\n port: 443\n host: localhost\n" }
Failure: Expected 'server: port: 443 host: 0.0.0.0 ', got 'server: port: 443 host: localhost '
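The results above suggest a lenient comparison: answers wrapped in a yaml code fence pass, and structured answers pass with or without a trailing newline. The harness's actual grading code is not shown on this page, so the following is only a sketch of one way such a check could work. The names `strip_fence`, `normalize`, `grade_plain`, and `grade_structured` are hypothetical, and the whitespace-collapsing rule is an assumption. The parse error text matches what Python's `json` module reports for an empty response.

```python
import json
import re

# Expected file content; the two-space indentation is an assumption.
EXPECTED = "server:\n  port: 443\n  host: 0.0.0.0\n"


def strip_fence(text):
    """Drop a surrounding Markdown code fence, if the whole answer is fenced."""
    match = re.match(r"^\s*```[\w-]*\s*\n(.*?)\n?\s*```\s*$", text, re.DOTALL)
    return match.group(1) if match else text


def normalize(text):
    """Collapse all whitespace runs so indentation and trailing newlines are ignored."""
    return " ".join(strip_fence(text).split())


def grade_plain(output):
    """Compare a free-text answer against the expected content."""
    return normalize(output) == normalize(EXPECTED)


def grade_structured(raw):
    """Parse a JSON answer and grade its `resolved_content` field.

    Returns (passed, message).
    """
    try:
        payload = json.loads(raw)
    except json.JSONDecodeError as exc:
        return False, f"Failed to parse structured JSON response: {exc}"
    if not isinstance(payload, dict) or "resolved_content" not in payload:
        return False, "Missing 'resolved_content' field"
    content = payload["resolved_content"]
    if not isinstance(content, str):
        return False, "'resolved_content' is not a string"
    if grade_plain(content):
        return True, "PASS"
    return False, f"Expected {normalize(EXPECTED)!r}, got {normalize(content)!r}"


print(grade_plain("```yaml\nserver:\n  port: 443\n  host: 0.0.0.0\n```"))
print(grade_structured(""))
```

Collapsing whitespace keeps the check tolerant of formatting differences, but it would also accept YAML with broken indentation. A stricter grader could parse both sides as YAML and compare the resulting mappings instead.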