Parsing specific part of class

Question

I would like to extract elements from an HTML document by class, but only those whose class name contains a specific word. For example, given

<div class="article-xyz"> or <div class="abcd-xyzefg">

this Python code

from bs4 import BeautifulSoup

with open('simple2.html') as html_file:
    soup = BeautifulSoup(html_file, 'lxml')

article_all = soup.find_all('div', class_='xyz')

should return some results when I search for 'xyz', but it doesn't.

This is my test html:

<!doctype html>
<html class="no-js" lang="">
    <head>
        <title>Test - A Sample Website</title>
        <meta charset="utf-8">
        <link rel="stylesheet" href="css/normalize.css">
        <link rel="stylesheet" href="css/main.css">
    </head>
    <body>
        <h1 id='site_title'>Test Website</h1>
        <hr></hr>
        <div class="article">
            <h2><a href="article_1.html">Article 1 Headline</a></h2>
            <p>This is a summary of article 1</p>
        </div>
        <hr></hr>
        <div class="article">
            <h2><a href="article_2.html">Article 2 Headline</a></h2>
            <p>This is a summary of article 2</p>
        </div>
        <hr></hr>
        <div class="article-xyz">
            <h2><a href="article_2.html">Article 2 test headline dings</a></h2>
            <p> article 2 test thing</p>
        </div>
        <div class='footer'>
            <p>Footer Information</p>
        </div>
        <div class="other-xyz-stuff">
            <h2><a href="article_2.html">other-xyz-stuff test headline </a></h2>
            <p>other-xyz-stuff test </p>
        </div>
        <script src="js/vendor/modernizr-3.5.0.min.js"></script>
        <script src="js/plugins.js"></script>
        <script src="js/main.js"></script>
    </body>
</html>

I am using Python 3.7 with BS4 so far.

Can anyone help me?

Thank you and Greetings

Does this answer your question? [How to find elements by class](https://stackoverflow.com/questions/5041008/how-to-find-elements-by-class) — metatoaster, Jun 04 '20 at 12:40

score 1 · Answer 1 · answered Jun 04 '20 at 12:40

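class_='xyz' only matches a complete class name, so it does not find 'article-xyz' or 'other-xyz-stuff'. Pass a function instead. BeautifulSoup calls it with each class value (the `x and` guard covers tags that have no class, where the value is None) and keeps the tags for which it returns a true value: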

article_all = soup.find_all('div', class_=lambda x: x and 'xyz' in x)
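As a rough check, here is a minimal sketch that runs the lambda against a trimmed-down copy of the question's test page, next to the original exact-match call. It also shows two alternatives that are not part of this answer: a compiled regular expression and a CSS attribute selector. The `HTML` string and the `class_names` helper are made up for illustration, `html.parser` is used only to avoid depending on lxml, and `select` assumes a BeautifulSoup version that ships with soupsieve (4.7 or later).

```python
import re

from bs4 import BeautifulSoup

# Trimmed-down version of the test page from the question (illustrative only).
HTML = """
<body>
  <div class="article"><p>This is a summary of article 1</p></div>
  <div class="article-xyz"><p>article 2 test thing</p></div>
  <div class="footer"><p>Footer Information</p></div>
  <div class="other-xyz-stuff"><p>other-xyz-stuff test</p></div>
  <div><p>a div without any class</p></div>
</body>
"""

# 'html.parser' ships with Python; the question's 'lxml' should find the same tags.
soup = BeautifulSoup(HTML, "html.parser")


def class_names(tags):
    """Hypothetical helper: join each tag's class list into one string."""
    return [" ".join(tag["class"]) for tag in tags]


# Exact match: 'xyz' is compared against whole class names, so nothing is found.
exact = soup.find_all("div", class_="xyz")

# Function match (this answer): substring test, guarded against a missing class.
by_function = soup.find_all("div", class_=lambda x: x and "xyz" in x)

# Alternative: a compiled regex is applied as a search, so it also matches substrings.
by_regex = soup.find_all("div", class_=re.compile("xyz"))

# Alternative: the CSS "attribute contains" selector.
by_selector = soup.select('div[class*="xyz"]')

print(class_names(exact))        # []
print(class_names(by_function))  # ['article-xyz', 'other-xyz-stuff']
print(class_names(by_regex))     # ['article-xyz', 'other-xyz-stuff']
print(class_names(by_selector))  # ['article-xyz', 'other-xyz-stuff']
```

All three substring variants select the same two divs here. The lambda is the most flexible if the condition later needs to be more than a substring test.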

Abhishek J
