I want to allow embedding of HTML but avoid DoS due to deeply nested HTML documents that crash some browsers. I'd like to be able to accommodate 99.9% of documents, but reject those that nest too deeply.
Two closely related question:
- What document depth limits are built into browsers? E.g. browser X fails to parse or does not build documents with depth > some limit.
 - Are document depth statistics for documents available on the web? Is there a site with web statistics that explains that some percentage of real documents on the web have document depths less than some value.
 
Document depth is defined as 1 + the maximum number of parent traversals needed to reach the document root from any node in a document. For example, in
<html>                   <!-- 1 -->
  <body>                 <!-- 2 -->
    <div>                <!-- 3 -->
      <table>            <!-- 4 -->
        <tbody>          <!-- 5 -->
          <tr>           <!-- 6 -->
            <td>         <!-- 7 -->
              Foo        <!-- 8 -->
the maximum depth is 8 since the text node "Foo" has 8 ancestors. Ancestor here is interpreted non-strictly, i.e. ever node is its own ancestor and its own descendent.
Opera has some table nesting stats, which suggest that 99.99% of documents have a table nesting depth of less than 22, but that data does not contain whole document depth.
EDIT:
If people would like to criticize the HTML sanitization library instead of answering this question, please do. http://code.google.com/p/owasp-java-html-sanitizer/wiki/AttackReviewGroundRules explains how to find the code, where to find a testbed that lets you try out attacks, and how to report issues.
EDIT:
I asked Adam Barth, and he very kindly pointed me to webkit code that handles this.
Webkit, at least, enforces this limit. When a treebuilder is created it receives a tree limit that is configurable:
m_treeBuilder(HTMLTreeBuilder::create(this, document, reportErrors, usePreHTML5ParserQuirks(document), maximumDOMTreeDepth**(document)))
and it is tested by the block-nesting-cap test.