Remove curly quotes and double angle bracket punctuation too.
For some reason this has to be done in a different regex substitution
or Python handles it incorrectly.
Also add Chengdu to the list of Chinese locations
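A minimal sketch of the two-pass substitution described above. The function name and the exact character sets are assumptions for illustration, not the script's actual lists; "double angle bracket" is taken here to cover both guillemets and CJK double angle brackets.

```python
import re


def strip_extra_punctuation(name):
    """Hypothetical helper: remove punctuation in two separate passes."""
    # First pass: plain ASCII punctuation (illustrative subset only).
    name = re.sub(r'[",/;:]', ' ', name)
    # Second pass, kept separate as the commit describes: curly double
    # quotes, guillemets, and CJK double angle brackets.
    name = re.sub('[\u201c\u201d\u00ab\u00bb\u300a\u300b]', ' ', name)
    # Collapse the whitespace left behind by the substitutions.
    return ' '.join(name.split())
```

For example, `strip_extra_punctuation('\u201cAcme\u201d \u300aWidgets\u300b')` yields `'Acme Widgets'`.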
Remove the contents of fullwidth parentheses for the same reason as
regular parentheses.
Special case 杭州德澜科技有限公司(HangZhou Delan Technology Co.,Ltd)
where the item in the parentheses is the English name.
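One way the parenthesis handling could look, as a rough sketch. The function name is hypothetical, and the special case is generalized here to "a non-ASCII name followed by an ASCII name in parentheses", which is an assumption; the script may simply match the one company literally.

```python
import re

# Opening and closing parentheses, ASCII and fullwidth (U+FF08 / U+FF09).
OPEN = '[(\uff08]'
CLOSE = '[)\uff09]'


def drop_parenthesized(name):
    """Hypothetical helper: strip parenthesized text from a company name."""
    name = name.strip()
    # Assumed special case: the whole name is non-ASCII and the
    # parentheses hold an ASCII (English) name, so keep that instead.
    m = re.fullmatch(r'[^\x00-\x7f]+\s*' + OPEN + r'([\x20-\x7e]+)' + CLOSE, name)
    if m:
        return m.group(1).strip()
    # General case: remove the parentheses and whatever they contain.
    return re.sub(r'\s*' + OPEN + '[^)\uff09]*' + CLOSE, '', name).strip()
```

With this sketch, `'Acme Corp\uff08深圳\uff09'` becomes `'Acme Corp'`, while the Delan entry above becomes `'HangZhou Delan Technology Co.,Ltd'`.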
Add Guangdong to the list of city names to remove at the start.
These words from Jörg Mayer are as true today as they were
21 years ago in 49a2f32336,
"I still have yet to see a case when a MAC address starting
with 0:0:0 actually means Xerox", but there are lots of cases
where all zero OUIs, MAC-48s, EUI-64s, etc are used to mean
null.
[skip ci]
* None of the OUI tables are supposed to be written to, so constify them.
* Use proper types in the bsearch parameters to avoid confusion.
* Move masking outside the bsearch function as a tiny optimization.
* Document the MA-L/M/S macros.
To reduce external file parsing at startup, replace the manuf file with
static arrays compiled into the binary.
Add 3 tables for MA-L, MA-M and MA-S. Add a fourth table to direct
a 24-bit MAC prefix (OUI) to one of these tables.
Adapt the make-manuf.py script to generate the static C data
instead of the text file.
The arrays are sorted and a binary search is performed to map
an OUI (24-bit/28-bit/36-bit) to a short and long name.
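The lookup scheme above can be sketched in Python (the real tables are generated C arrays searched with bsearch). All table contents, names, and the `lookup` helper below are made-up placeholders, and the layout is an assumption based only on the description: one table keyed by the 24-bit prefix selects the registry, then the prefix of the matching width is searched in that registry's sorted table.

```python
from bisect import bisect_left

# Prefix widths in bits for the three registries.
MA_L, MA_M, MA_S = 24, 28, 36

# Placeholder data, sorted by key. Not real IEEE assignments.
OUI_KIND = [(0x001122, MA_L), (0x334455, MA_M), (0x667788, MA_S)]
TABLES = {
    MA_L: [(0x001122, "ExampleL", "Example L Corp")],
    MA_M: [(0x3344556, "ExampleM", "Example M Corp")],
    MA_S: [(0x667788999, "ExampleS", "Example S Corp")],
}


def find(table, key):
    """Binary search a sorted list of tuples whose first item is the key."""
    i = bisect_left(table, (key,))
    if i < len(table) and table[i][0] == key:
        return table[i]
    return None


def lookup(mac):
    """Map a 48-bit MAC (as an int) to (short_name, long_name) or None."""
    kind = find(OUI_KIND, mac >> 24)
    if kind is None:
        return None
    bits = kind[1]
    # Shift (mask) outside the search so the search only compares keys.
    entry = find(TABLES[bits], mac >> (48 - bits))
    return entry[1:] if entry else None
```

Here `lookup(0x3344556ABCDE)` finds the placeholder MA-M entry, while `lookup(0x334455700000)` returns `None` because no 28-bit prefix matches.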
Add - and + to punctuation exclusion list.
Do not remove the first word as a general term. When an exclusion
term is used as the first word, it is usually not only legalese and
should not be rejected. The exception is "The".
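A small sketch of the first-word rule. The term list and the function name are illustrative assumptions, not the script's actual list or code.

```python
# Illustrative subset of general terms (assumption, not the real list).
GENERAL_TERMS = {"the", "inc", "ltd", "co", "corp", "limited", "of"}


def drop_general_terms(name):
    """Hypothetical helper: drop general terms except in first position."""
    words = name.split()
    if not words:
        return name
    first, rest = words[0], words[1:]
    kept = [w for w in rest if w.lower().strip('.,') not in GENERAL_TERMS]
    # The first word is kept even if it is a general term, unless it is "The".
    if first.lower() != "the":
        kept.insert(0, first)
    return ' '.join(kept)
```

So `"Limited Brands Inc"` keeps its leading "Limited" and becomes `"Limited Brands"`, while `"The Acme Co"` becomes `"Acme"`.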
Skip some locations in company names that are just repeated low-value
information. Many different Chinese companies will shorten to the same
name (Shenzhen for example).
This is a heuristic and not 100% reliable but in the vast majority of
cases it cuts down on noise and generates more informative names.
The truncation size of 8 is too short to convey enough information
in many cases. Some experimentation suggests it can be safely
increased for better readability without any other ill effects.
Make a conservative size increase to 12. Arguably it could be larger.
The cavebear OUI list is hopelessly outdated (last updated 1999?)
and our template file mostly contains obsolete or poorly formatted
entries, compared to the official IEEE registry. We should rely on
the official registry, which is the best and most up-to-date source,
despite some minor inconsistencies and glitches.
Remove the template file and use the IEEE registry exclusively.
PEP 394[1] says,
"In cases where the script is expected to be executed outside virtual
environments, developers will need to be aware of the following
discrepancies across platforms and installation methods:
* Older Linux distributions will provide a python command that refers
to Python 2, and will likely not provide a python2 command.
* Some newer Linux distributions will provide a python command that
refers to Python 3.
* Some Linux distributions will not provide a python command at all by
default, but will provide a python3 command by default."
Debian has forced the issue by choosing the third option[2]:
"NOTE: Debian testing (bullseye) has removed the "python" package and
the '/usr/bin/python' symlink due to the deprecation of Python 2."
Switch our shebang from "#!/usr/bin/env python" to "#!/usr/bin/env
python3" in some places. Remove some 2/3 version checks if we know we're
running under Python 3. Remove the "coding: utf-8" in a bunch of places
since that's the default in Python 3.
[1] https://www.python.org/dev/peps/pep-0394/#for-python-script-publishers
[2] https://wiki.debian.org/Python
Fix some issues discovered by common python linters including:
* switch `None` comparisons to use `is` rather than `==`. Identity !=
equality, and I've spent 40+ hours before tracking down a subtle bug
caused by exactly this issue. Note that this may introduce a problem if
one of the scripts is depending on this behavior, in which case the
comparison should be changed to `True`/`False` rather than `None`.
* Use `except Exception:` as bare `except:` statements have been
discouraged for years. Ideally for some of these we'd examine if there
were specific exceptions that should be caught, but for now I simply
caught all. Again, this could introduce very subtle behavioral changes
under Python 2, but IIUC, that was all fixed in Python 3, so safe to
move to `except Exception:`.
* Use the more idiomatic `if x not in y` instead of `if not x in y`.
* Use more idiomatic 2 blank lines. I only did this at the beginning,
until I realized how overwhelming this was going to be to apply, then I
stopped.
* Add a TODO where an undefined function name is called, so will fail
whenever that code is run.
* Add more idiomatic spacing around `:`. This is also only partially
cleaned up, as I gave up when I saw how `asn2wrs.py` was clearly
infatuated with the construct.
* Various other small cleanups, removed some trailing whitespace and
improper indentation that wasn't a multiple of 4, etc.
There is still _much_ to do, but I haven't been heavily involved with
this project before, so thought this was a sufficient amount to put up
and see what the feedback is.
Linters that I have enabled which highlighted some of these issues
include:
* `pylint`
* `flake8`
* `pycodestyle`
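To illustrate why the `None` comparison and bare `except:` changes in the list above matter, here is a short sketch. The class and function names are invented for the example.

```python
class AlwaysEqual:
    """Toy class whose __eq__ claims equality with everything."""

    def __eq__(self, other):
        return True


obj = AlwaysEqual()
eq_result = (obj == None)  # noqa: E711 - equality can be overridden
is_result = (obj is None)  # identity cannot be overridden


def swallow(exc):
    """Raise exc and report whether `except Exception:` catches it."""
    try:
        raise exc
    except Exception:
        # Unlike a bare `except:`, this does not catch KeyboardInterrupt
        # or SystemExit, which derive from BaseException only.
        return "caught"
```

Here `eq_result` is `True` even though `obj` is not `None`, while `is_result` is `False`. `swallow(ValueError())` returns `"caught"`, but `swallow(KeyboardInterrupt())` lets the interrupt propagate.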
Add "of" to the list of general terms to remove when shortening.
Change-Id: Idbfea2d502a89d668ba2f170bf3450cfcbb91fe5
Reviewed-on: https://code.wireshark.org/review/35627
Reviewed-by: Gerald Combs <gerald@wireshark.org>
Handle cases where we might shorten a name (e.g. "ZAO") down to
nothing.
Change-Id: I5ecb9592d2ecd8225d0ed459ef16885214af5da4
Reviewed-on: https://code.wireshark.org/review/35584
Reviewed-by: Gerald Combs <gerald@wireshark.org>
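One possible guard against shortening a name down to nothing, as a sketch only. Falling back to the original name is an assumption made for illustration; the script may handle this case differently. `shorten` stands for any hypothetical shortening function.

```python
def shorten_or_keep(name, shorten):
    """Apply shorten(name), but never return an empty string."""
    short = shorten(name).strip()
    # If every word was removed (e.g. the whole name is a business
    # type such as "ZAO"), fall back to the original name.
    return short if short else name.strip()


def drop_zao(name):
    """Toy shortener for the example: removes the business type "ZAO"."""
    return ' '.join(w for w in name.split() if w.upper() != "ZAO")
```

With these, `shorten_or_keep("ZAO", drop_zao)` returns `"ZAO"` rather than an empty string, and `shorten_or_keep("ZAO Example", drop_zao)` returns `"Example"`.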
Move our business types and general terms to a list and add more. Only
convert all upper case names to title case. Remove double quotes when
shortening names.
Change-Id: I31e9799986542270350b8c2436929f293de4e36c
Reviewed-on: https://code.wireshark.org/review/35577
Reviewed-by: Gerald Combs <gerald@wireshark.org>
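The "only convert all upper case names" rule can be sketched with the standard `str.isupper()` and `str.title()` methods. The function name is hypothetical.

```python
def normalize_case(name):
    """Title-case a name only when all of its cased letters are upper case."""
    # str.isupper() is True only if there is at least one cased
    # character and none of them are lower case.
    return name.title() if name.isupper() else name
```

This turns `"ACME NETWORKS"` into `"Acme Networks"` but leaves a deliberately mixed-case name such as `"ZyXEL Communications"` alone. A known cost of this rule is that short all-caps names are title-cased too (`"IBM"` becomes `"Ibm"`).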
Remove Python 2 support from tools/make-manuf.py and tools/make-usb.py.
Don't double-escape UTF-8 sequences in make-usb.py so that we generate
{ 0x045e000e, "SideWinder\xc2\xae Freestyle Pro" },
instead of
{ 0x045e000e, "SideWinder\\xc2\\xae Freestyle Pro" },
Change-Id: I918f854ccba868a122fd7b138c1654b2c7615f94
Reviewed-on: https://code.wireshark.org/review/32839
Reviewed-by: Gerald Combs <gerald@wireshark.org>
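A sketch of escaping a name exactly once for a C string literal, to show where the doubled backslashes come from. The function is hypothetical and not taken from make-usb.py.

```python
def c_string_literal(text):
    """Encode text as UTF-8 and emit a C string literal, escaping once."""
    parts = []
    for byte in text.encode("utf-8"):
        ch = chr(byte)
        if ch in '"\\':
            parts.append("\\" + ch)
        elif 0x20 <= byte < 0x7F:
            parts.append(ch)
        else:
            # Non-printable and non-ASCII bytes become \xNN escapes.
            # Caveat: a C compiler would misread the escape if the next
            # character were a hex digit; this sketch does not handle that.
            parts.append("\\x{:02x}".format(byte))
    return '"' + "".join(parts) + '"'
```

`c_string_literal("SideWinder\u00ae Freestyle Pro")` produces the text `"SideWinder\xc2\xae Freestyle Pro"`. Running a second escaping pass over that result, for instance `.replace("\\", "\\\\")`, is what yields the unwanted `\\xc2\\xae` form.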
Add comments containing the response headers for the URLs we fetch.
standards-oui.ieee.org currently returns inconsistent results depending
on which host you happen to resolve.
Change-Id: I4adba7e51628d0350ba8e091523807ec85009700
Reviewed-on: https://code.wireshark.org/review/29729
Reviewed-by: Gerald Combs <gerald@wireshark.org>
If the PyICU module is available, use it to truncate manufacturer
names by grapheme clusters.
Change-Id: Ib7dcbb126809df496a534f44a47871a1b28dc539
Reviewed-on: https://code.wireshark.org/review/29660
Reviewed-by: Gerald Combs <gerald@wireshark.org>
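A minimal sketch of optional grapheme-aware truncation. The function name and the default limit are assumptions. It relies on PyICU's `BreakIterator.createCharacterInstance`, and falls back to plain slicing when the module is missing. One caveat, to my understanding: ICU reports boundaries as UTF-16 offsets, so for characters outside the Basic Multilingual Plane they may not line up with Python string indices.

```python
try:
    import icu  # PyICU, optional
    HAVE_ICU = True
except ImportError:
    HAVE_ICU = False


def truncate(name, limit=8):
    """Keep at most `limit` grapheme clusters (or code points without ICU)."""
    if HAVE_ICU:
        bi = icu.BreakIterator.createCharacterInstance(icu.Locale('en_US'))
        bi.setText(name)
        # Iterating yields the end offset of each grapheme cluster.
        bounds = list(bi)[:limit]
        return name[:bounds[-1]] if bounds else name
    # Fallback: truncate by code points, which may split a cluster.
    return name[:limit]
```

For plain ASCII input both paths agree, e.g. `truncate("Abcdefghijkl")` gives `"Abcdefgh"`. The ICU path matters for names containing combining marks, where slicing by code point could cut a character from its accent.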
Remove some entries from manuf.tmpl that are either redundant or less
informative than their corresponding IEEE entries. Add a missing '"' to
make-manuf.py.
Change-Id: Ia69f4529c5fa1b39f1662b94d072c65bd2d969ea
Reviewed-on: https://code.wireshark.org/review/29568
Reviewed-by: Gerald Combs <gerald@wireshark.org>
The download links offered by the IEEE at
https://standards.ieee.org/products-services/regauth/ are CSV files.
Updating the Perl version to support CSV would have required rewriting a
significant portion of the script along with either adding a dependency
on Text::CSV or writing our own CSV parser.
Migrate it to Python, which has a built-in CSV module.
Change-Id: I39ba0ec873145f44374ab9f751e8bde51535ca4d
Reviewed-on: https://code.wireshark.org/review/29442
Reviewed-by: Gerald Combs <gerald@wireshark.org>
Petri-Dish: Gerald Combs <gerald@wireshark.org>
Tested-by: Petri Dish Buildbot
Reviewed-by: Anders Broman <a.broman58@gmail.com>
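To show why the built-in CSV module helps, here is a short sketch of parsing registry rows. The sample rows are invented placeholders, the column layout (Registry, Assignment, Organization Name, Organization Address) is assumed, and `parse_registry` is a hypothetical name.

```python
import csv
import io

# Made-up sample in the assumed column layout. Not real IEEE data.
SAMPLE = '''Registry,Assignment,Organization Name,Organization Address
MA-L,001122,"Example Corp, Inc.","1 Example Way, Springfield"
MA-M,3344556,Example M Ltd,2 Sample Road
'''


def parse_registry(fileobj):
    """Return {assignment: (registry, organization name)} from a CSV file."""
    reader = csv.reader(fileobj)
    next(reader)  # skip the header row
    entries = {}
    for registry, assignment, name, _address in reader:
        entries[assignment.upper()] = (registry, name.strip())
    return entries
```

`parse_registry(io.StringIO(SAMPLE))["001122"]` returns `("MA-L", "Example Corp, Inc.")`. The quoted comma inside the organization name is handled by `csv.reader`, which is the part a hand-written parser would have had to get right.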