DuckDuckGo is an odd duck when it comes to inclusion in their results. I've done a fair bit of research on this topic across a number of search engines and have had some email back and forth with DDG.
Here's the deal. They get their content from other search engines, as listed here. To my knowledge, their search results don't indicate which upstream engine a given result came from, so to get your content removed you basically need to go upstream to all of their sources and have it removed there. If that sounds onerous, don't worry: you'd want to do that anyway, right?
DDG does have its own crawler as well, aptly called the DuckDuckBot. It honors neither the noindex HTML meta tag nor the equivalent HTTP header (it does honor robots.txt), but that doesn't seem to matter because the DuckDuckBot doesn't create any new results. To my knowledge, this isn't documented anywhere, but I spoke with their staff, whom I quote below:
DDG says (2014-06-06):
We get our results from multiple sources and our own crawler wouldn't be the cause of your [problem]. Our crawler only does very specific tasks, like looking (and not actually crawling) parked domains, spam sites, etc.
If there are results from [your website] appearing on DuckDuckGo and shouldn't be, they're likely flowing from one of our upstream sources. If removed there, then they'll stop showing in our results.
I respond:
OK, so nothing gets put in your index via your crawlers, which indeed do not support noindex HTML or HTTP tags?
They confirm:
Yep! Sorry for the confusion and, if you see anything out of the ordinary, please feel free to let us know.
So then the only remaining question is how to remove your content from the upstream providers. For that, I point you to my blog, since it differs by provider. The crux of it is:
- Use the noindex HTML meta tag and the X-Robots-Tag HTTP header (for images and such) to tell search engines not to include something in their results;
- List your entire website in your sitemap.xml file so that all search engines can find it there.
- Use robots.txt to block the search engines that do not support noindex or x-robots tag.
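To make those three signals concrete, here is a minimal Python sketch. The helper name `robots_txt_blocking` and the example path are hypothetical, and blocking `DuckDuckBot` is only an illustration of a crawler that honors robots.txt but not noindex; which user agents you actually need to list depends on the upstream providers.

```python
from urllib.robotparser import RobotFileParser

# The two noindex signals, for engines that support them.
NOINDEX_META = '<meta name="robots" content="noindex">'   # goes in the HTML <head>
NOINDEX_HEADER = ("X-Robots-Tag", "noindex")               # HTTP header, works for images etc.


def robots_txt_blocking(user_agents):
    """Build a robots.txt body that disallows everything for the given crawlers.

    Hypothetical helper: meant for crawlers that ignore noindex.
    """
    lines = []
    for agent in user_agents:
        lines += [f"User-agent: {agent}", "Disallow: /", ""]
    return "\n".join(lines)


# Illustration: block a crawler that honors robots.txt but not noindex.
robots_body = robots_txt_blocking(["DuckDuckBot"])

# Sanity-check the generated rules with the standard-library parser.
parser = RobotFileParser()
parser.parse(robots_body.splitlines())
blocked = parser.can_fetch("DuckDuckBot", "/private/page.html")
others = parser.can_fetch("SomeOtherBot", "/private/page.html")
```

Here `blocked` comes out `False` and `others` comes out `True`, because the generated file only contains a rule for the listed crawler. Keep in mind that robots.txt stops crawling, while noindex stops inclusion in results, so the two are not interchangeable.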
And for bonus points:
- Set up your sitemap.xml files so they are served with noindex (and thus won't show up in search results).
- Do likewise for your robots.txt file.
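Since sitemap.xml and robots.txt are not HTML, the meta tag is not an option for them; the HTTP header is. A minimal sketch of the idea, assuming your server lets you add response headers per path (the function name `extra_headers` and the path set are hypothetical):

```python
# Paths that should be crawlable but kept out of search results.
NOINDEX_PATHS = {"/sitemap.xml", "/robots.txt"}


def extra_headers(path):
    """Return additional response headers for a request path.

    Hypothetical hook: wire this into whatever your server or framework
    uses to set response headers.
    """
    if path in NOINDEX_PATHS:
        return {"X-Robots-Tag": "noindex"}
    return {}


sitemap_headers = extra_headers("/sitemap.xml")
page_headers = extra_headers("/index.html")
```

Most web servers can do the same thing with a per-file header rule in their own configuration, which is usually the simpler route.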
It's a complicated world.