How to, FAQ

What you need to know to use this library

You know how to use scrapy/parsel library
You know css selectors*
You know xpath selectors*
Optional You know regex

* for avoid manually write selectors - see easy selectors page

How this library works?

At a high level performs the following steps:

Prepare class steps

collect all types for Fields objects
collect all attributes names for Fields objects
collect all Fields objects

parse steps

accept markup argument, convert to parsel.Selector object if markup is str or bytes
passes the markup to the Field object
all methods are executed from left to right
If there are no errors and auto_type is enabled, it converts the received value based on the annotation

Sequence diagram

bs4 allowed?

Yes you can! But you will lose all the features and profits of this library!

Also, you will complicate your code and debugging if you don't use fields objects.

# Note: bs4 is not recommended: you will lose performance and core library functions!
from typing import Any

from bs4 import BeautifulSoup
from scrape_schema import BaseSchema, sc_param


class Schema(BaseSchema):
    def __init__(self, markup: Any):
        super().__init__(markup)
        self.soup = BeautifulSoup(markup, "lxml")

    @sc_param
    def h1(self):
        return self.soup.find("h1").text

    @sc_param
    def urls(self):
        return [li.find("a").get("href") for li in self.soup.find_all("li")]

    # I'm too lazy to write more :D


if __name__ == '__main__':
    text = """
            <html>
                <body>
                    <h1>Hello, Parsel!</h1>
                    <ul>
                        <li><a href="http://example.com">Link 1</a></li>
                        <li><a href="http://scrapy.org">Link 2</a></li>
                    </ul>
                    <script type="application/json">{"a": ["b", "c"]}</script>
                </body>
            </html>"""
    print(Schema(text).dict())
    # {'h1': 'Hello, Parsel!', 'urls': ['http://example.com', 'http://scrapy.org']}