Speed up the URL checks by running multiple HTTP requests in parallel.
Python's requests module supports only synchronous requests, so use
aiohttp and asyncio instead.
Make the link's validate method a coroutine that can be driven by asyncio.
Run the checks in groups of 120 requests. This way, each group can use
cached results from previous groups. It would be possible to pass all
requests to aiohttp immediately; aiohttp would then run them subject to
its internal connection limit (which defaults to 100). However, caching
would not work in that case: every request would see an empty cache,
because the cache is consulted before a request is submitted to aiohttp.
The field name for the status code in aiohttp's response is "status". For
the requests module, it was called "status_code". Rename this everywhere.
Update the signal handler for the async operation. If we're interrupted
while we extract the URLs from the source files, we can exit immediately.
If we're interrupted during the async link checking, we cancel all
requests in the current group and make sure that we don't start another
group. When an async operation is cancelled, each coroutine and the
asyncio.gather() call receive an asyncio.CancelledError exception, which
we have to handle.
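The cancellation path can be illustrated in a few lines. This sketch simulates an interrupt by cancelling the tasks directly rather than installing a real SIGINT handler; check_one and run_group are invented names, not the script's actual functions:

```python
import asyncio

async def check_one(url):
    try:
        await asyncio.sleep(3600)      # stand-in for an aiohttp request
        return 200
    except asyncio.CancelledError:
        # report the cancelled request instead of crashing the run
        return 'cancelled'

async def run_group(urls):
    tasks = [asyncio.ensure_future(check_one(u)) for u in urls]
    await asyncio.sleep(0)             # let every task start its request
    for t in tasks:                    # simulated SIGINT: cancel the group
        t.cancel()
    # gather() completes normally because each coroutine handled the
    # CancelledError itself; otherwise gather() would re-raise it
    return await asyncio.gather(*tasks)
```

A real handler would additionally set a flag so that no further group is started after the current one is torn down.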
On my system, a complete run takes about 7 minutes now. We could speed
things up further by starting all tasks in parallel and using a different
approach for caching.
Please note that this change hasn't been tested on Windows. (Maybe the
SIGINT handling isn't portable anyway...)
The first parameter of filter must be a function that takes an element
and returns True or False. We can pass is_dissector_file directly;
there's no need to wrap it in an anonymous function.
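For illustration (the predicate body below is invented; only the name is_dissector_file comes from the script):

```python
# Hypothetical predicate: takes one element, returns True or False,
# so it can be handed to filter() as-is.
def is_dissector_file(filename):
    return filename.startswith('packet-') and filename.endswith('.c')

files = ['packet-tcp.c', 'README.md', 'packet-udp.c']

# instead of: filter(lambda f: is_dissector_file(f), files)
dissector_files = list(filter(is_dissector_file, files))
```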
Fix some issues discovered by common python linters including:
* switch `None` comparisons to use `is` rather than `==`. Identity !=
equality, and I've previously spent 40+ hours tracking down a subtle bug
caused by exactly this issue. Note that this may introduce a problem if
one of the scripts depends on this behavior, in which case the
comparison should be changed to `True`/`False` rather than `None`.
* Use `except Exception:` as bare `except:` statements have been
discouraged for years. Ideally for some of these we'd examine if there
were specific exceptions that should be caught, but for now I simply
caught all. Again, this could introduce very subtle behavioral changes
under Python 2, but IIUC, that was all fixed in Python 3, so safe to
move to `except Exception:`.
* Use the more idiomatic `if x not in y` instead of `if not x in y`.
* Use more idiomatic 2 blank lines. I only did this at the beginning,
until I realized how overwhelming this was going to be to apply, and
then I stopped.
* Add a TODO where an undefined function name is called, so the call
will fail whenever that code is run.
* Add more idiomatic spacing around `:`. This is also only partially
cleaned up, as I gave up when I saw how `asn2wrs.py` was clearly
infatuated with the construct.
* Various other small cleanups, removed some trailing whitespace and
improper indentation that wasn't a multiple of 4, etc.
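Two of the pitfalls above can be demonstrated in a few lines (the class and function names below are invented purely for illustration):

```python
# `== None` vs `is None`: a type that overrides __eq__ can claim
# equality with None even though it is not None.
class Chatty:
    def __eq__(self, other):
        return True        # answers "equal" to anything, including None

x = Chatty()
eq_none = (x == None)      # True  -- __eq__ gets to decide
is_none = (x is None)      # False -- identity cannot be overridden

# bare `except:` vs `except Exception:`: only the bare form also
# swallows SystemExit and KeyboardInterrupt, which derive from
# BaseException rather than Exception.
def bare_except():
    try:
        raise SystemExit(1)
    except:                # noqa: E722
        return 'swallowed'

def except_exception():
    try:
        raise SystemExit(1)
    except Exception:      # lets SystemExit propagate
        return 'swallowed'
```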
There is still _much_ to do, but I haven't been heavily involved with
this project before, so I thought this was a sufficient amount to put up
and see what the feedback is.
Linters that I have enabled which highlighted some of these issues
include:
* `pylint`
* `flake8`
* `pycodestyle`
The intention is to try to run this on the Petri-dish buildbot,
where it could run with '--commits 1' to warn about files touched
in the most recent commit.
Change-Id: Ie924d39e093d1fef8cfbdf02d15bbede386b2862
Reviewed-on: https://code.wireshark.org/review/37826
Petri-Dish: Martin Mathieson <martin.r.mathieson@googlemail.com>
Tested-by: Petri Dish Buildbot
Reviewed-by: Anders Broman <a.broman58@gmail.com>
Prefer:
- html (rather than txt)
- https
Also includes the script check_dissector_urls.py, which can be used to
find links in code and test them.
Change-Id: Iafd8bb8948674a38ad5232bf5b5432ffb2b1251b
Reviewed-on: https://code.wireshark.org/review/36821
Petri-Dish: Martin Mathieson <martin.r.mathieson@googlemail.com>
Tested-by: Petri Dish Buildbot
Reviewed-by: Peter Wu <peter@lekensteyn.nl>