You can use a regular expression to remove the first occurrence of </body></html> when another pair of the same tags follows it (with more than two pairs, every pair except the last is removed):
// https://regex101.com/r/nVuN8S/1
$regex = '/(?<replace><\/body>\s*<\/html>)(?=(?:.|\s)*<\/body>\s*<\/html>)/';
$new_html = preg_replace($regex, '', $html);
The pattern looks for </body> and </html> separated by any amount of whitespace (e.g. a newline). A positive lookahead then checks that the pair is followed, after any number of characters (including whitespace), by another </body> and </html> pair.
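As a quick check of that behaviour, here is a minimal sketch that runs the same pattern against a made-up input. The sample $html string is an assumption for illustration only (two fragments glued together, so the closing pair appears twice).

```php
<?php
// Hypothetical sample input: the closing </body></html> pair appears twice.
$html = "<html><body><p>first</p>\n</body>\n</html>\n<p>second</p>\n</body></html>";

// Same pattern as above: match a closing pair only if another pair follows it.
$regex = '/(?<replace><\/body>\s*<\/html>)(?=(?:.|\s)*<\/body>\s*<\/html>)/';
$new_html = preg_replace($regex, '', $html);

// Only the final closing pair should be left in the result.
$remaining = substr_count($new_html, '</body>');
echo $new_html, PHP_EOL;
```

If only the very first pair should ever be removed, preg_replace also accepts a limit as its fourth argument, e.g. preg_replace($regex, '', $html, 1).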
To read "all the data" (assuming that means everything between the <body> tags), you can use another regex, e.g.:
// https://regex101.com/r/nVuN8S/2
$regex = '/<body>(?<data>(?:.|\s)+)<\/body>/';
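A minimal sketch of how this pattern might be applied with preg_match follows. The sample $html string and the $data variable are assumptions for illustration. Note that the pattern matches a literal <body>, so an opening tag with attributes (such as <body class="x">) would not match without adjusting it.

```php
<?php
// Hypothetical sample input.
$html = "<html><body>\n<p>Hello</p>\n</body></html>";

// The named group "data" captures everything between <body> and the last </body>.
$regex = '/<body>(?<data>(?:.|\s)+)<\/body>/';

if (preg_match($regex, $html, $matches) === 1) {
    // Trim the surrounding whitespace left over from the markup.
    $data = trim($matches['data']);
} else {
    $data = null; // no <body>...</body> pair found
}
```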
Of course, there are other ways to get the data: simple string manipulation (strip the text before <body> and after </body>, along with the tags themselves), DOM document functionality (e.g. PHP's DOMDocument class), etc.
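Both alternatives could look roughly like the sketch below. The sample $html string and the variable names are assumptions for illustration, and the DOMDocument part assumes the dom extension is enabled (it usually is).

```php
<?php
// Hypothetical sample input.
$html = "<html><body><p>Hello</p></body></html>";

// 1) Plain string functions: cut between the first <body> and the last </body>.
$start = strpos($html, '<body>');
$end   = strrpos($html, '</body>');
$data  = null;
if ($start !== false && $end !== false && $end > $start) {
    $start += strlen('<body>');
    $data = substr($html, $start, $end - $start);
}

// 2) DOMDocument: parse the markup and serialize each child node of <body>.
libxml_use_internal_errors(true);   // keep warnings about malformed HTML quiet
$doc = new DOMDocument();
$doc->loadHTML($html);
libxml_clear_errors();

$inner = '';
$body = $doc->getElementsByTagName('body')->item(0);
if ($body !== null) {
    foreach ($body->childNodes as $child) {
        $inner .= $doc->saveHTML($child);
    }
}
```

The string version is the cheapest but relies on a literal <body> tag; the DOM version tolerates attributes and sloppy markup, at the cost of the parser normalizing the HTML it returns.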