Commit Graph

12 Commits

Author SHA1 Message Date
Martin Mathieson f4f008770b More testing of check_dissector.py 2023-05-29 22:07:20 +00:00
Martin Kaiser 57a2313e71 check_dissector_urls: use the cache for all links 2023-04-23 10:33:01 +00:00
Martin Kaiser 0066ef6978 check_dissector_urls: use concurrent http requests
Speed up the url checks by using multiple http requests in parallel.

Python's requests module supports only synchronous requests, so use
aiohttp and asyncio instead.

Make link's validate method a coroutine that can be controlled by asyncio.
Run the checks in groups of 120 requests. This way, each group can use
cached results from previous groups. It would be possible to pass all
requests to aiohttp immediately. aiohttp would then run them based on its
internal connection limit (this defaults to 100). However, caching would
not work then: the cache is checked before a request is submitted to
aiohttp, so all requests would see an empty cache.

The field name for the status code in aiohttp's response is "status". For
the requests module, it was called "status_code". Rename this everywhere.

Update the signal handler for the async operation. If we're interrupted
while we extract the urls from the source files, we can exit immediately.
If we're interrupted during the async link checking, we cancel all
requests from the current group and make sure that we're not starting
another group. When an async operation is cancelled, each coroutine and
the asyncio.gather() call receive an asyncio.CancelledError exception. We
have to handle these exceptions.

On my system, a complete run takes about 7 minutes now. We could speed
things up further by starting all tasks in parallel and using a different
approach for caching.

Please note that this change hasn't been tested on Windows. (Maybe the
sigint handling isn't portable anyway...)
2023-04-11 18:05:31 +00:00
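
The grouping-and-caching idea described in the message above can be sketched roughly as follows. This is only an illustration, not the code from check_dissector_urls.py: the `Link` class, the `cache` dict, `check_links` and the `fake_fetch` stand-in (used here in place of a real aiohttp request) are all assumed names. Only the group size of 120 and the "check the cache before submitting" behaviour come from the commit message.

```python
import asyncio

GROUP_SIZE = 120  # group size mentioned in the commit message

cache = {}  # url -> status code, shared by all groups (hypothetical layout)


async def fake_fetch(url):
    """Stand-in for an HTTP GET; a real version might use aiohttp instead."""
    await asyncio.sleep(0)  # yield to the event loop like a real request would
    return 200


class Link:
    """Hypothetical link object with a coroutine validate() method."""

    def __init__(self, url):
        self.url = url
        self.status = None
        self.from_cache = False

    async def validate(self, fetch):
        # The cache is consulted before the request is submitted.
        if self.url in cache:
            self.status = cache[self.url]
            self.from_cache = True
            return
        self.status = await fetch(self.url)
        cache[self.url] = self.status


async def check_links(links, fetch=fake_fetch, group_size=GROUP_SIZE):
    # Run one group at a time, so later groups can reuse results
    # that earlier groups stored in the cache.
    for start in range(0, len(links), group_size):
        group = links[start:start + group_size]
        await asyncio.gather(*(link.validate(fetch) for link in group))


if __name__ == "__main__":
    urls = ["https://example.org/a", "https://example.org/b"] * 3
    demo_links = [Link(u) for u in urls]
    asyncio.run(check_links(demo_links, group_size=2))
    print(sum(link.from_cache for link in demo_links), "results came from the cache")
```

In this sketch, duplicate URLs inside the same group are still fetched twice, because every coroutine in a group looks at the cache before any of them has stored a result. That is the same effect the message describes for submitting all requests at once.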
Martin Mathieson 4614bd1837 check_dissector_url.py: fix --file handling 2023-04-05 09:19:27 +00:00
Martin Kaiser 3e36272dd6 check_dissector_urls.py: enumerate counter is unused 2023-04-03 20:50:19 +00:00
Martin Kaiser fae0f31ff1 check_dissector_urls.py: remove unnecessary wrapper
The first parameter for filter must be a function that takes an element
and returns True or False. We can use is_dissector_file directly, there's
no need to wrap it into an anonymous function.
2023-04-03 20:48:26 +00:00
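
As a small illustration of the point made in the message above, the two `filter` calls below give the same result. The body of `is_dissector_file` and the file names are made up for this example and are not taken from the script.

```python
def is_dissector_file(filename):
    # Hypothetical predicate: treat packet-*.c files as dissectors.
    return filename.startswith("packet-") and filename.endswith(".c")


files = ["packet-tcp.c", "README", "packet-udp.c", "packet-tcp.h"]

# Wrapped in an anonymous function (the unnecessary form).
wrapped = list(filter(lambda f: is_dissector_file(f), files))

# Passing the predicate directly.
direct = list(filter(is_dissector_file, files))
```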
Martin Mathieson 55d3a9db9e tools/check_*.py: allow multiple --file entries 2022-02-20 23:12:10 +00:00
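
One common way to accept a repeated option is argparse's `append` action. The sketch below assumes the scripts use argparse, which the commit subject does not confirm; the parser description and help text are invented for illustration.

```python
import argparse


def build_parser():
    # Hypothetical parser; only the --file option name comes from the log.
    parser = argparse.ArgumentParser(description="Check links in dissector files")
    parser.add_argument("--file", action="append", default=[],
                        help="file to scan (may be given more than once)")
    return parser


# Each --file occurrence is appended to the same list.
args = build_parser().parse_args(["--file", "packet-tcp.c", "--file", "packet-udp.c"])
```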
Martin Mathieson 070cc0a47a Run dissector URLs check - fix up a couple of entries. 2022-01-03 20:01:27 +00:00
Jeff Widman 8d7ebc732e Fix issues discovered by common python linters
Fix some issues discovered by common python linters including:
* switch `None` comparisons to use `is` rather than `==`. Identity !=
equality, and I've spent 40+ hours before tracking down a subtle bug
caused by exactly this issue. Note that this may introduce a problem if
one of the scripts is depending on this behavior, in which case the
comparison should be changed to `True`/`False` rather than `None`.
* Use `except Exception:` as bare `except:` statements have been
discouraged for years. Ideally for some of these we'd examine if there
were specific exceptions that should be caught, but for now I simply
caught all. Again, this could introduce very subtle behavioral changes
under Python 2, but IIUC, that was all fixed in Python 3, so safe to
move to `except Exception:`.
* Use more idiomatic `if not x in y` --> `if x not in y`
* Use more idiomatic 2 blank lines. I only did this at the beginning,
until I realized how overwhelming this was going to be to apply, then I
stopped.
* Add a TODO where an undefined function name is called, so will fail
whenever that code is run.
* Add more idiomatic spacing around `:`. This is also only partially
cleaned up, as I gave up when I saw how `asn2wrs.py` was clearly
infatuated with the construct.
* Various other small cleanups, removed some trailing whitespace and
improper indentation that wasn't a multiple of 4, etc.

There is still _much_ to do, but I haven't been heavily involved with
this project before, so thought this was a sufficient amount to put up
and see what the feedback is.

Linters that I have enabled which highlighted some of these issues
include:
* `pylint`
* `flake8`
* `pycodestyle`
2020-09-26 04:38:18 +00:00
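
Two of the points in the message above can be shown with a short sketch. The `AlwaysEqual` class and the helper function are hypothetical and exist only to make the difference visible.

```python
class AlwaysEqual:
    """Hypothetical class whose __eq__ claims equality with anything."""

    def __eq__(self, other):
        return True


obj = AlwaysEqual()
equal_to_none = (obj == None)  # noqa: E711 - equality can be overridden
is_none = (obj is None)        # identity cannot be overridden


def exit_inside_try():
    # `except Exception:` does not catch SystemExit or KeyboardInterrupt,
    # whereas a bare `except:` would swallow them.
    try:
        raise SystemExit(1)
    except Exception:
        return "swallowed"


try:
    exit_inside_try()
    propagated = False
except SystemExit:
    propagated = True
```

Here `equal_to_none` is True while `is_none` is False, and `propagated` ends up True because `SystemExit` derives from `BaseException` rather than `Exception`.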
Martin Mathieson 7e4ff6f826 cppcheck.sh and check_dissector_urls.py: Show which files are being examined.
Change-Id: Ib5ecb215050dea6bf2f03014d544dac49e56fe12
Reviewed-on: https://code.wireshark.org/review/37865
Petri-Dish: Martin Mathieson <martin.r.mathieson@googlemail.com>
Tested-by: Petri Dish Buildbot
Reviewed-by: Martin Mathieson <martin.r.mathieson@googlemail.com>
2020-07-15 11:32:42 +00:00
Martin Mathieson 70119bb905 check_dissector_urls.py: Add options to control which files to scan
The intention is to try to run this on the Petri-dish buildbot,
where it could run with '--commits 1' to warn about files touched
in the most recent commit.

Change-Id: Ie924d39e093d1fef8cfbdf02d15bbede386b2862
Reviewed-on: https://code.wireshark.org/review/37826
Petri-Dish: Martin Mathieson <martin.r.mathieson@googlemail.com>
Tested-by: Petri Dish Buildbot
Reviewed-by: Anders Broman <a.broman58@gmail.com>
2020-07-12 05:05:08 +00:00
Martin Mathieson d70a4a9321 Standardise IETF RFC and Draft URLs in dissectors.
Prefer:
- html (rather than txt)
- https

Also includes the script check_dissector_urls.py,
that can be used to find links in code and test them.

Change-Id: Iafd8bb8948674a38ad5232bf5b5432ffb2b1251b
Reviewed-on: https://code.wireshark.org/review/36821
Petri-Dish: Martin Mathieson <martin.r.mathieson@googlemail.com>
Tested-by: Petri Dish Buildbot
Reviewed-by: Peter Wu <peter@lekensteyn.nl>
2020-04-13 14:58:48 +00:00