2022-06-18 12:36:54 +01:00
|
|
|
from typing import List, Dict
|
2022-05-09 08:09:57 +01:00
|
|
|
|
2022-03-10 14:26:22 +00:00
|
|
|
from bs4 import BeautifulSoup
|
|
|
|
import requests
|
|
|
|
|
|
|
|
from app import app
|
2022-06-18 12:36:54 +01:00
|
|
|
from app.terraform.block_mirror import BlockMirrorAutomation
|
2022-05-09 08:09:57 +01:00
|
|
|
|
|
|
|
|
2022-06-18 12:36:54 +01:00
|
|
|
class BlockExternalAutomation(BlockMirrorAutomation):
    """
    Automation task to import proxy reachability results from external source.
    """
    short_name = "block_external"
    description = "Import proxy reachability results from external source"

    # Raw HTML body fetched from the external check service; written by
    # _fetch() and consumed by _parse().
    _content: bytes

    def _fetch(self) -> None:
        """Download the external reachability report page.

        Stores the raw response body in ``self._content`` for later parsing
        by :meth:`_parse`.
        """
        user_agent = {'User-agent': 'BypassCensorship/1.0'}
        # Explicit timeout so an unresponsive external service cannot hang
        # the automation task indefinitely (requests has no default timeout).
        page = requests.get(app.config['EXTERNAL_CHECK_URL'],
                            headers=user_agent,
                            timeout=30)
        self._content = page.content

    def _parse(self) -> None:
        """Extract blocked-proxy URL patterns from the fetched HTML.

        The report pairs each ``<h2>`` heading (a vantage point name) with a
        ``<div class="overflow-auto mb-5">`` listing affected hostnames as
        anchors. Only vantage points listed in the
        ``EXTERNAL_VANTAGE_POINTS`` config are imported; each anchor's text
        is recorded as an ``https://`` URL pattern.
        """
        soup = BeautifulSoup(self._content, 'html.parser')
        headings = soup.find_all('h2')
        result_divs = soup.find_all('div', class_="overflow-auto mb-5")
        for idx, heading in enumerate(headings):
            # A nested <div> inside the results block marks an empty or
            # unavailable result set for that vantage point — skip it.
            if not result_divs[idx].div and heading.text in app.config['EXTERNAL_VANTAGE_POINTS']:
                for anchor in result_divs[idx].find_all('a'):
                    self.patterns.append("https://" + anchor.text)
|