The key trade-off is silicon area versus performance.
A regular single-threaded core can be doing many different things. It might spend most of its cycles performing operations on data already in its caches. Or it might run a large matrix multiplication where almost every operation needs access to main memory. (There are other cases, which I will skip to keep this to the point.) On average, it does a bit of both.
Two independent computers could do the same thing twice as fast.
One computer with two cores but only one path to memory performs differently. If it runs two processes doing calculations on data within their caches, it is twice as fast. If both cores constantly try to access memory, each may have to wait up to half of its cycles until it is its turn for access, and the setup will be slower than two independent single-core computers. Roughly, it ends up 80-100% faster, at the cost of a 100% increase in silicon.
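The contention effect above can be illustrated with a toy model (my own simplification, not a real hardware simulation): each core has one unit of work, a fraction f of which needs the shared memory bus; the compute parts overlap, but the bus accesses take turns.

```python
# Toy model of two cores sharing one memory bus. Assumes the fraction f
# of each core's work that needs the bus fully serializes across cores,
# while the remaining (1 - f) compute work runs in parallel.

def two_core_speedup(f):
    """Speedup of 2 shared-bus cores over 1 core, for memory fraction f."""
    # One core alone: 2 units of work take 2 time units.
    # Two cores: compute parts (1 - f each) overlap, bus parts (f each)
    # take turns, so wall time is (1 - f) + 2 * f = 1 + f.
    return 2.0 / (1.0 + f)

for f in (0.0, 0.25, 0.5, 1.0):
    print(f"memory fraction {f:.2f}: speedup {two_core_speedup(f):.2f}x")
# Purely cache-resident work (f = 0) gives 2.00x; fully
# memory-bound work (f = 1) gives 1.00x, i.e. no gain at all.
```

Two independent computers, by contrast, would get 2x regardless of f, which is why the shared-memory pair falls somewhere below that.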
Threading (Hyper-Threading) takes this to a new level. It does not duplicate a whole core; it duplicates only the most heavily used parts.
In the ideal case those duplicated parts let two operations run simultaneously, doubling speed. In the worst case the duplicated part always has to wait for access to the shared resources.
In practice this averages out to about a 30% speed gain for less than 30% extra silicon. That is more efficient than duplicating a complete core.
Or, in a very rough table:

    Config           Silicon (cost)   Speed       Efficiency (speed/cost)
    1 core           100%             100%        100/100
    2 cores          200%             180-200%    190/200
    1 core with HT   120%             130%        130/120
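As a quick arithmetic check of those efficiency ratios (using the rough estimates from the table, with 190% as the midpoint of the two-core range):

```python
# Speed gained per unit of silicon spent, from the rough table figures.
configs = {
    "1 core":         {"silicon": 1.00, "speed": 1.00},
    "2 cores":        {"silicon": 2.00, "speed": 1.90},  # midpoint of 180-200%
    "1 core with HT": {"silicon": 1.20, "speed": 1.30},
}

for name, c in configs.items():
    ratio = c["speed"] / c["silicon"]
    print(f"{name:15s} speed/silicon = {ratio:.2f}")
# HT comes out ahead: roughly 1.08 speed per unit of silicon,
# versus 0.95 for a second full core.
```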
HT with one additional thread, on typical x86 cores with their instruction set, typically wins on cost (the amount of silicon space used translates to cost) versus performance.