From my understanding, if the subnet mask was 255.255.255.0, it would mean we have 254 hosts, as there are 2 reserved IPs: one for broadcast (all 0s) and one for the network (all ones).
Correct, but at this point it's more important that you have 256 addresses, not that you have 254 hosts.
The all-ones and all-zeros addresses only gain their special meaning at the very end of the process, when the network is actually configured on an interface (and even though they're not available for hosts they're still included as part of the network). But as the original network won't be used directly on an interface, the "reserved" discount does not apply to it yet.
So while subnetting, you're not dividing the 254 hosts in half – you're dividing the whole 256-address network into two 128-address halves, and only then counting each half as "126 hosts + 2 reserved".
(Also, you got the reserved addresses the wrong way round: all-ones is the modern broadcast address. All-zeros used to be the broadcast address in the 1980s; now it's called the "network address" and is just reserved for legacy reasons.)
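To make the "split 256 addresses first, subtract the reserved ones last" order concrete, here is a minimal sketch using Python's standard `ipaddress` module. The 192.0.2.0 range is only an assumption for illustration (it is a documentation range, not something from your question).

```python
import ipaddress

# 192.0.2.0 is an arbitrary example range, not taken from the question.
network = ipaddress.ip_network("192.0.2.0/255.255.255.0")
print(network.num_addresses)  # 256 addresses in the original network

# Split the whole 256-address block into two halves (one extra network bit).
halves = list(network.subnets(prefixlen_diff=1))

for half in halves:
    # Only now, per final subnet, set aside the all-zeros and all-ones addresses.
    usable = half.num_addresses - 2
    print(half, half.num_addresses, usable,
          half.network_address, half.broadcast_address)
# 192.0.2.0/25 128 126 192.0.2.0 192.0.2.127
# 192.0.2.128/25 128 126 192.0.2.128 192.0.2.255
```

Note that the all-zeros address of each half is its network address and the all-ones address is its broadcast address.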
I am not sure why 255.255.255.128 would give 2 networks, each with 126 hosts.
The way a subnet mask works, each '1' bit defines the network part of the address, while each '0' bit defines the host part. For example, your original netmask (255.255.255.0) allows for 256 addresses because it has 8 'host' bits, and those bits can represent 2^8 = 256 distinct values.
So if you convert both netmasks to binary, you should see that they differ by one bit (therefore splitting the original network into 2^1 = 2 parts), and you should see that the smaller network has 7 "host" bits (therefore giving you 2^7 = 128 addresses in total).
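If you'd rather not do the binary conversion by hand, the comparison can be sketched in Python. The helper name `mask_bits` is made up for this example.

```python
import ipaddress

def mask_bits(mask: str) -> str:
    """Return the 32-bit binary form of a dotted-decimal netmask."""
    return format(int(ipaddress.IPv4Address(mask)), "032b")

old = mask_bits("255.255.255.0")
new = mask_bits("255.255.255.128")
print(old)  # 11111111111111111111111100000000
print(new)  # 11111111111111111111111110000000

# Extra '1' bits in the new mask = bits moved from host part to network part.
borrowed = new.count("1") - old.count("1")
# Remaining '0' bits in the new mask = host bits of each smaller network.
host_bits = new.count("0")

print(2 ** borrowed)   # 2 subnets
print(2 ** host_bits)  # 128 addresses per subnet
```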
Instead of netmasks, these days network sizes are more commonly written in "prefix length" notation, counting the number of '1' bits (e.g. "/24" for 255.255.255.0).
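As a rough illustration of how the two notations map onto each other, Python's `ipaddress` module can convert in both directions. The `prefix_length` helper is a hypothetical name, and 0.0.0.0 is used only as a placeholder base address.

```python
import ipaddress

def prefix_length(mask: str) -> int:
    """Count the '1' bits of a dotted-decimal netmask."""
    # 0.0.0.0 is just a placeholder address so the mask can be parsed.
    return ipaddress.ip_network(f"0.0.0.0/{mask}").prefixlen

print(prefix_length("255.255.255.0"))    # 24
print(prefix_length("255.255.255.128"))  # 25

# And back again: prefix length to dotted-decimal netmask.
print(ipaddress.ip_network("0.0.0.0/25").netmask)  # 255.255.255.128
```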