The primary problem is that you initialize DS to 0. When the BIOS passes control to your init routine, it does a FAR CALL to offset 3 of the option ROM. The FAR CALL sets CS to the segment the option ROM was loaded at and sets IP (the instruction pointer) to 3, which is the offset just past the signature and size bytes.
By setting DS to zero you'll be accessing your string relative to the 0x0000 segment. You want to use the segment at which the option ROM is loaded. To do that you initialize DS to the value of the CS register. Instead of:
xor ax, ax ; make it zero
mov ds, ax
You do this:
mov ax, cs              ; CS contains segment we are running in
mov ds, ax              ;    so copy it to DS
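The effect of the wrong DS is plain real-mode address arithmetic: the CPU forms a physical address as segment * 16 + offset. A minimal Python sketch of that calculation follows. The load address 0xD0000 matches the Bochs configuration shown later in this answer, while the string offset 0x0020 is a made-up value used only for illustration.

```python
def physical_address(segment: int, offset: int) -> int:
    """Real-mode address translation: segment * 16 + offset."""
    return (segment << 4) + offset


ROM_SEGMENT = 0xD000     # assumption: ROM mapped at physical 0xD0000
STRING_OFFSET = 0x0020   # hypothetical offset of text_string inside the ROM

wrong = physical_address(0x0000, STRING_OFFSET)       # DS = 0
right = physical_address(ROM_SEGMENT, STRING_OFFSET)  # DS = CS

print(f"DS=0000h reads from {wrong:05X}h")   # inside the interrupt vector table
print(f"DS=CS    reads from {right:05X}h")   # inside the option ROM
```

With DS set to 0 the same offset points into low memory rather than into the ROM, which is why the string was never found.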
You should also set the direction flag for string instructions to forward using CLD. You can't guarantee the BIOS set it that way before calling your option ROM code.
As I've never written an option ROM and couldn't find any specific documentation on the calling convention, I was uncertain whether you need to preserve all the registers you change. I looked at the option ROMs in my own PC using the ree program under Linux. They use pusha and popa to save and restore all the general purpose registers, and they push/pop individual segment registers to save and restore them. It is probably good practice to do the same in your own option ROM. One requirement that dates back to the old Phoenix BIOS is that after the size byte there needs to be a NEAR JMP to the entry point of the initialization code.
When an option ROM initialization routine is finished, it returns to the BIOS using retf (FAR return) so that the BIOS can continue scanning for other option ROMs and complete the boot sequence.
I fixed up your code a bit since there were some glitches in the print routine. This code should work:
use16                       ; ISA module operates in the 16-bit segment.
DB      55h, 0AAh           ; Option ROM signature
DB      01h                 ; ROM size in 512-byte (200h) blocks
jmp start                   ; NearJMP part of Phoenix BIOS specification
start:
    pushf                   ; Save the flags as we modify direction bit
    pusha                   ; Save all general purpose registers
    push ds                 ; we modify DS so save it
    cld                     ; Ensure forward string direction
    mov ax, cs              ; CS contains segment we are running in
    mov ds, ax              ;    so copy it to DS
    mov si, text_string     ; Put string position into SI
    call print_string       ; Call our string-printing routine
    pop ds                  ; Restore all registers and flags we saved
    popa
    popf
    retf                    ; Far return to exit option init routine
print_string:               ; Routine: output string at DS:SI to screen
    mov ah, 0eh             ; BIOS tty Print
    xor bx, bx              ; Set display page to 0 (BH); BL = color in graphics modes
    jmp .getch
.repeat:
    int 10h                 ; print character
.getch:
    lodsb                   ; Get character from string
    test al,al              ; Have we reached end of string?
    jnz .repeat             ;     if not process next character
.end:
    ret
; String ends with 0dh (Carriage return) and 0ah (linefeed) to
; advance cursor to the beginning of next line
text_string db 'Hello World!', 0dh, 0ah, 0
times 512-($-$$) db 0
Note: The code uses pusha/popa instructions that are only available on 80186+ processors. If you are targeting 8086/8088 then you'll need to individually push and pop each register you modify.
It is also possible to leave DS alone and apply a CS segment override to LODSB, writing it as cs lodsb. DS then remains unmodified, so you don't need to save and restore it, and you don't need to copy CS to DS. You can similarly avoid saving and restoring the flags and clearing the direction flag with CLD if you replace cs lodsb with:
    mov al, [cs:si]
    inc si 
The simplified code could look like:
use16                       ; ISA module operates in the 16-bit segment.
DB      55h, 0AAh           ; Option ROM signature
DB      01h                 ; ROM size in 512-byte (200h) blocks
jmp start                   ; NearJMP part of Phoenix BIOS specification
start:
    pusha                   ; Save all general purpose registers
    mov si, text_string     ; Put string position into SI
    call print_string       ; Call our string-printing routine
    popa                    ; Restore all general purpose registers
    retf                    ; Far return to exit option init routine
print_string:               ; Routine: output string at CS:SI to screen
    mov ah, 0eh             ; BIOS tty Print
    xor bx, bx              ; Set display page to 0 (BH); BL = color in graphics modes
    jmp .getch
.repeat:
    int 10h                 ; print character
.getch:
    mov al, [cs:si]
    inc si
    test al,al              ; Have we reached end of string?
    jnz .repeat             ;     if not process next character
.end:
    ret
; String ends with 0dh (Carriage return) and 0ah (linefeed) to
; advance cursor to the beginning of next line
text_string db 'Hello World!', 0dh, 0ah, 0
times 512-($-$$) db 0
The Bochs configuration file I used for testing on Debian Linux is:
# configuration file generated by Bochs
plugin_ctrl: unmapped=1, biosdev=1, speaker=1, extfpuirq=1, parallel=1, serial=1, iodebug=1
config_interface: textconfig
display_library: x
memory: host=32, guest=32
romimage: file="/usr/local/share/bochs/BIOS-bochs-latest", address=0x0, options=none
vgaromimage: file="/usr/local/share/bochs/VGABIOS-lgpl-latest"
# no floppyb
ata0: enabled=1, ioaddr1=0x1f0, ioaddr2=0x3f0, irq=14
ata0-master: type=none
ata0-slave: type=none
ata1: enabled=1, ioaddr1=0x170, ioaddr2=0x370, irq=15
ata1-master: type=none
ata1-slave: type=none
ata2: enabled=0
ata3: enabled=0
optromimage1: file="optrom.bin", address=0xd0000
pci: enabled=1, chipset=i440fx
vga: extension=vbe, update_freq=5, realtime=1
cpu: count=1:1:1, ips=4000000, quantum=16, model=bx_generic, reset_on_triple_fault=1, cpuid_limit_winnt=0, ignore_bad_msrs=1, mwait_is_nop=0
cpuid: level=6, stepping=3, model=3, family=6, vendor_string="GenuineIntel", brand_string="              Intel(R) Pentium(R) 4 CPU        "
cpuid: mmx=1, apic=xapic, simd=sse2, sse4a=0, misaligned_sse=0, sep=1, movbe=0, adx=0
cpuid: aes=0, sha=0, xsave=0, xsaveopt=0, x86_64=1, 1g_pages=0, pcid=0, fsgsbase=0
cpuid: smep=0, smap=0, mwait=1
print_timestamps: enabled=0
debugger_log: -
magic_break: enabled=0
port_e9_hack: enabled=0
private_colormap: enabled=0
clock: sync=none, time0=local, rtc_sync=0
# no cmosimage
# no loader
log: -
logprefix: %t%e%d
debug: action=ignore
info: action=report
error: action=report
panic: action=ask
keyboard: type=mf, serial_delay=250, paste_delay=100000, user_shortcut=none
mouse: type=ps2, enabled=0, toggle=ctrl+mbutton
speaker: enabled=1, mode=system
parport1: enabled=1, file=none
parport2: enabled=0
com1: enabled=1, mode=null
com2: enabled=0
com3: enabled=0
com4: enabled=0
This configuration assumes the option ROM is in a file named optrom.bin and will be loaded at memory address 0xd0000.
Option ROMs need a checksum: the last byte of the image file is set so that all bytes of the image sum to zero modulo 256. QEMU provides a script, signrom.py, that can be used for that purpose. To update the checksum of an image you can do:
python signrom.py inputimagefile outputimagefile
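If you want to see what the checksum step amounts to, here is a minimal Python sketch. It is not QEMU's signrom.py. The function name sign_rom is hypothetical, and the sketch assumes the last byte of the image is padding that may be overwritten, as it is in the code above, which pads the image with zeros to 512 bytes.

```python
def sign_rom(image: bytes) -> bytes:
    """Return a copy of image whose last byte makes the 8-bit sum zero."""
    if image[:2] != b"\x55\xaa":
        raise ValueError("missing 55 AA option ROM signature")
    size = image[2] * 512                  # size byte counts 512-byte blocks
    if size == 0 or len(image) > size:
        raise ValueError("image does not fit the size given in byte 2")
    # Pad with zeros to the declared size, then drop the final byte so it
    # can be replaced by the checksum (assumes that byte held no real data).
    body = image.ljust(size, b"\x00")[:size - 1]
    checksum = (-sum(body)) & 0xFF         # value that brings the sum to 0 mod 256
    return body + bytes([checksum])


# In-memory demo: signature, size byte, then a few arbitrary filler bytes.
demo = b"\x55\xaa\x01" + b"\xeb\x00\xcb"
signed = sign_rom(demo)
print(len(signed), sum(signed) % 256)      # prints "512 0"
```

A BIOS that validates the ROM adds up every byte in the declared size and accepts the image only if the low 8 bits of the total are zero, which is what the final print confirms.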