We have sporadic osmo-bts-trx shutdowns with "No clock from osmo-trx" error
messages. Around it are L1C logs on level 'notice', so let's log those as well.
Related: OS#2325
Change-Id: Ic306d2dd8670550d84e9c446098bccaba147e13b
The raise_exn method was removed in
1a7a3f0e43, but due to the merge process it
remained in use in some places.
Change-Id: I89f4abe3d69aa4e7bc1fd6c6fd7e9fdea2cd8d19
ofono's NetworkRegistration.Scan() method fails sporadically. On failure,
check if we are now registered to the desired network, and schedule another
scan otherwise.
For instance it fails with org.ofono.Error.Failed if the modem starts to
register internally after we started Scan() and the registering succeeds
while we are still waiting for Scan() to finish.
Change-Id: I4a2265ee39a94daa00f525b1c7037a6775509425
With the recent fix of the junit report related issues, another issue arose:
the 'with log.Origin' was changed to disallow __enter__ing an object twice to
fix problems, but some code still fails because it tries to do 'with' on the
same object twice. The only reason for those 'with' scopes is to ensure that
logging is associated with a given object. Instead of complicating things even
more, implement this differently.
Refactor logging to simplify use: drop the 'with Origin' style completely, and
instead use the python stack to determine which objects are created by which,
and which object to associate a log statement with.
The new way: we rely on the convention that each class instance has a local
'self' referencing the object instance. If we need to find an origin as a new
object's parent, or to associate a log message with, we traverse each stack
frame, fetching the first local 'self' object that is a log.Origin class
instance.
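The stack lookup described above can be sketched roughly as follows. This is
a minimal illustration, not the actual log.py code: the Origin class here is a
stripped-down stand-in, and find_origin_on_stack(), DemoSuite and DemoTest are
hypothetical names chosen for the example.

```python
import inspect


class Origin:
    """Stand-in for log.Origin; the real class carries more state."""

    def __init__(self, name):
        self.name = name
        # The frames of our own constructors also have a local 'self',
        # so exclude this object when looking for the parent.
        self.parent = find_origin_on_stack(exclude=self)


def find_origin_on_stack(exclude=None):
    """Walk outward through the call stack and return the first local
    'self' that is an Origin instance (hypothetical helper)."""
    frame = inspect.currentframe().f_back  # start at our caller
    while frame is not None:
        candidate = frame.f_locals.get('self')
        if isinstance(candidate, Origin) and candidate is not exclude:
            return candidate
        frame = frame.f_back
    return None


class DemoSuite(Origin):
    def __init__(self):
        super().__init__('suite')

    def make_test(self):
        # No parent is passed explicitly; it is found on the stack.
        return DemoTest()

    def who(self):
        return find_origin_on_stack()


class DemoTest(Origin):
    def __init__(self):
        super().__init__('test')
```

An object created inside DemoSuite.make_test() ends up with the suite as its
parent, because the suite is the first 'self' found above the constructor
frames.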
How to use:
Simply call log.log() anywhere, and it finds an Origin object to log for, from
the stack. Alternatively call self.log() for any Origin() object to skip the
lookup.
Create classes as child class of log.Origin and make sure to call
super().__init__(category, name). This constructor will magically find a parent
Origin on the stack.
When an exception happens, we first escalate the exception up through call
scopes to wherever it is handled by log.log_exn(). This then finds an Origin
object in the traceback's stack frames, no need to nest in 'with' scopes.
Hence the 'with log.Origin' now "happens implicitly", we can write pure natural
python code, no more hassles with scope ordering.
Furthermore, any frame can place additional logging information in a frame by
calling log.ctx(). This is automatically inserted in the ancestry associated
with a log statement / exception.
Change-Id: I5f9b53150f2bb6fa9d63ce27f0806f0ca6a45e90
A new mcc_mnc parameter is now optionally passed to connect() in order
to manually register to a specific network with a given MCC+MNC pair.
If no parameter is passed (or None), then the modem will be instructed
to attempt an automatic registration with any available network which
permits it.
We get the MCC+MNC parameter from the MSC/NITB and we pass it to the
modem object at connect time as shown in the modified tests. Two new
simple tests to check network registration is working are added in this
commit.
Ofono modems seem to be automatically registering at some point after
they are set Online=true, and we were actually using that 'feature'
before this patch. Thus, it is possible that a modem quickly becomes
registered, and we then check so before starting the scan+registration
process, which can take a few seconds.
The scanning method can take a few seconds to complete. To avoid
blocking in the dbus ofono Scan() method, this commit adds some code to
make use of glib/gdbus async methods, which are not yet supported
directly by pydbus. This way, we can continue polling while waiting for
the scan process to complete and we can register several modems in
parallel. When scan completes, a callback is run which attempts to
register. If no MCC+MNC was passed, as we just finished scanning the
modem should have enough fresh operator information to take good and
quick decisions on where to connect. If we have an MCC+MNC, then we check
the operator list received by Scan() method. If operator with desired
MCC+MNC is there, we register with it. If it's not there, we start
scanning() again asynchronously hoping the operator will show up in next
scan.
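The decision taken in the scan-complete callback can be summarized in a small
sketch. It only models the branching described above; it does not call ofono
or pydbus, and next_step_after_scan() as well as its return values are
hypothetical names for illustration.

```python
def next_step_after_scan(operators, mcc_mnc=None):
    """Decide what to do once a scan has completed.

    operators: iterable of (mcc, mnc) tuples reported by the scan.
    mcc_mnc:   desired (mcc, mnc) tuple, or None for automatic registration.
    Returns 'register_auto', 'register_manual' or 'rescan'.
    """
    if mcc_mnc is None:
        # No specific network requested: let the modem pick one.
        return 'register_auto'
    if mcc_mnc in set(operators):
        # The wanted operator showed up in the scan results.
        return 'register_manual'
    # Not seen yet: schedule another asynchronous scan.
    return 'rescan'
```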
As scanning() and registration are done in the background, tests are
expected to call connect(), and then later on wait for the modem to
register by waiting/polling the method "modem.is_connected()". Tests
first check for the modem being connected and afterwards check the MSC's
subscriber_attached(). The order is intentional because the latter has to
poll through the network and adds unneeded garbage to the pcap files being
recorded.
Change-Id: I8d9eb47eac1044550d3885adb55105c304b0c15c
Even if aborted due to signal, write a JUnit report XML, and make sure to
indicate the runs as erratic.
Change-Id: I7a334ef3463896c543c0fe592d3903c15e67d4c4
A bit of refactoring to fix logging and error reporting, and simplify the code.
This transmogrifies some of the things committed in
0ffb414406 "Add JUnit XML reports; refactor test
reporting", which did not fully match the code structuring ideas used in
osmo-gsm-tester. Also solve some problems present from the start of the code
base.
Though this is a bit of a code bomb, it would take a lot of time to separate
this into smaller bits: these changes are closely related and resulted
incrementally from testing error handling and logging details. I hope it's ok.
Things changed / problems fixed:
Allow only a single trial to be run per cmdline invocation: unbloat trial and
suite invocation in osmo-gsm-tester.py.
There is a SuiteDefinition, intended to be immutable, and a mutable SuiteRun.
SuiteDefinition had a list of tests, which was modified by the SuiteRun to
record test results. Instead, have only the test basenames in the
SuiteDefinition and create a new set of Test() instances for each SuiteRun, to
ensure that no state leaks between separate suite runs.
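A minimal sketch of that split, assuming simplified constructors (the real
classes take more arguments and do more work):

```python
class SuiteDefinition:
    """Immutable description of a suite: only the test basenames."""

    def __init__(self, test_basenames):
        self.test_basenames = tuple(test_basenames)


class Test:
    """Mutable per-run state of one test script."""

    def __init__(self, basename):
        self.basename = basename
        self.status = 'unknown'


class SuiteRun:
    """One execution of a SuiteDefinition with its own Test instances."""

    def __init__(self, definition):
        self.definition = definition
        # Fresh Test objects per run, so results cannot leak between runs.
        self.tests = [Test(name) for name in definition.test_basenames]
```

Two SuiteRun instances created from the same definition then hold independent
Test objects, so a failure recorded in one run does not overwrite the result
of another.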
State leaking across runs can be seen in
http://jenkins.osmocom.org/jenkins/view/osmo-gsm-tester/job/osmo-gsm-tester_run/453/
where an earlier sms test for sysmo succeeds, but its state gets overwritten by
the later sms test for trx that fails. The end result is that both tests
failed, although the first run was successful.
Fix a problem with Origin: log.Origin could be __enter__ed more than once,
skipping the second entry. The problem there is that we'd still __exit__ twice
or more, popping the Origin off the stack even though it should still remain.
We could count __enter__ recurrences, but instead, completely disallow entering
a second time.
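Disallowing a second entry could look roughly like this. It is a sketch only:
the '_entered' flag and the choice of RuntimeError are assumptions made for the
example, not necessarily what log.Origin does.

```python
class Origin:
    """Context manager that refuses to be entered while already entered."""

    def __init__(self, name):
        self.name = name
        self._entered = False  # hypothetical flag for this sketch

    def __enter__(self):
        if self._entered:
            raise RuntimeError('%s: already entered' % self.name)
        self._entered = True
        return self

    def __exit__(self, exc_type, exc, tb):
        self._entered = False
        return False  # never swallow exceptions
```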
A code path should have one 'with' statement per object, at pivotal points like
run_suites or run_tests. Individual utility functions should not do 'with' on a
central object. The structure needed is, in pseudo code:
  try:
    with trial:
      try:
        with suite_run:
          try:
            with test:
              test_actions()
The 'with' needs to be inside the 'try', so that the exception can be handled
in __exit__ before it reaches the exception logging.
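The ordering can be checked with a small stand-alone example. Scope and the
'events' list are hypothetical helpers that only record what happens when:

```python
events = []


class Scope:
    def __init__(self, name):
        self.name = name

    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc, tb):
        # Runs first, while the exception is still propagating.
        seen = exc_type.__name__ if exc_type else None
        events.append('exit %s saw %s' % (self.name, seen))
        return False  # let the exception continue to the 'except'


def run():
    try:
        with Scope('test'):
            raise ValueError('boom')
    except Exception as e:
        # Runs second: the place where the exception would be logged.
        events.append('logged %s' % type(e).__name__)


run()
```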
To clarify this, like test exceptions caught in Test.run(), also move suite
exception handling from Trial into SuiteRun.run_tests(). There are 'with self'
in Test.run() and SuiteRun.run_tests(), which are well placed, because these
are pivotal points in the main code path.
Log output: clearly separate logging of distinct suites and test scripts, by
adding more large_separator() calls at the start of each test. Place these
separator calls in more logical places. Add separator size and spacing args.
Log output: print tracebacks only once, for the test script where they happen.
Have less state that duplicates other state: drop SuiteRun.test_failed_ctr and
suite.test_skipped_ctr, instead add SuiteRun.count_test_results().
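Deriving the counts from the tests themselves, instead of keeping separate
counters in sync, might look like the following sketch. The status strings and
the returned tuple layout are assumptions for illustration.

```python
from collections import Counter


def count_test_results(tests):
    """Return (passed, skipped, failed) computed from each test's status."""
    counts = Counter(t.status for t in tests)
    return counts['pass'], counts['skip'], counts['FAIL']
```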
For test failure reporting, store the traceback text in a separate member var.
In the text report, apply above changes and unclutter to achieve a brief and
easy to read result overview: print less filler characters, drop the starting
times, drop the tracebacks. These can be found in the individual test logs.
Because the tracebacks are no longer in the text report, the suite_test.py can
just print the reports and expect that output instead of asserting individual
contents.
In the text report, print duration in precision of .1 seconds.
Add origin information and a traceback text to the junit XML result to give
more context when browsing the result XML. For 'AssertionError', add the source
line of where the assertion hit.
Drop the explicit Failure exception. We don't need one specific exception to
mark a failure, instead any arbitrary exception is treated as a failure. Use
the exception's class name as fail_type.
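As a rough sketch of that idea, with run_and_record() and the result dict keys
being hypothetical names rather than the actual Test members:

```python
import traceback


def run_and_record(test_func):
    """Run a test callable; treat any Exception as a failure."""
    result = {'status': 'pass', 'fail_type': None,
              'fail_message': None, 'fail_tb': None}
    try:
        test_func()
    except Exception as e:
        result['status'] = 'FAIL'
        result['fail_type'] = type(e).__name__   # e.g. 'AssertionError'
        result['fail_message'] = str(e)
        result['fail_tb'] = traceback.format_exc()  # kept separately
    return result
```

SystemExit and KeyboardInterrupt do not derive from Exception, so they pass
through this handler untouched.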
Though my original idea was to use raising exceptions as the only way to cause
a test failure, I'm keeping the set_fail() function as an alternative way,
because it allows test specific cleanup and may come in handy later. To have
both ways integrate seamlessly, shift some result setting into 'finally'
clauses and make sure higher levels (suite, trial) count the contained items'
stati.
Minor tweak: write the 'pass' and 'skip' reports in lower case so that the
'FAIL' stands out.
Minor tweak: pass the return code that the program exit should return further
outward, so that the exit(1) call does not cause a SystemExit exception to be
logged.
The aims of this patch are:
- Logs are readable so that it is clear which logging belongs to which test and
  suite.
- The logging origins are correct (vs. parents gone missing as previously)
- A single test error does not cause following tests or suites to be skipped.
- An exception "above" Exception, i.e. SystemExit and the like, *does*
  immediately abort all tests and suites, and the results for tests that were
  not run are reported as "unknown" (rather than skipped on purpose):
  - Raising a SystemExit aborts all.
  - Hitting ctrl-c aborts all.
- The resulting summary in the log is brief and readable.
Change-Id: Ibf0846d457cab26f54c25e6906a8bb304724e2d8
Figure out how many resources were reserved, how many of those match the
requirements, and how many are used, and log one of three matching error
messages for that situation.
For that purpose, allow find()ing reserved resources without logging anything,
using a log_label=None arg.
Change-Id: I1c67600ba69351859e46b8b2f368ee8106db0993
In db59bcf9fc we added a configured GSUP server
address for the osmo-hlr, but the osmo-msc is still trying to connect to
127.0.0.1.
In the same way as for mgcpgw, add conf_for_msc() to OsmoHLR, and use that to
configure the HLR's address in osmo-msc.cfg.
Related: OS#2320
Change-Id: I005aa160c679fc92b248abd762888959bd5b2c55
For all those API functions that directly use reserved_resources.get(), add a
'specifics' argument to be able to pick specific resources. For example, this
allows picking a suite.bts(specifics={'type': 'osmo-bts-sysmo'}).
I needed this to test error reporting for over-using resources, but it will most
probably make sense in the future.
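Matching a 'specifics' dict against reserved resources can be pictured like
this. The helper names and the plain-dict resource representation are
assumptions for the sketch; the real reserved_resources.get() works on the
tester's own resource structures.

```python
def matches(resource, specifics):
    """True if every key/value in specifics is present in the resource."""
    return all(resource.get(k) == v for k, v in (specifics or {}).items())


def pick(reserved, specifics=None):
    """Return the first reserved resource matching specifics, or None."""
    for resource in reserved:
        if matches(resource, specifics):
            return resource
    return None
```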
Change-Id: If6f175f4bb53dec5306fb3c6479202a7bf1c7116
When trying to reserve more resources than available in the resources.conf,
actually print an intelligible error message: catch the nameless error from
solve() and fill in what was requested for solution.
Change-Id: Iba3707f1aaeb40a58c616c33af52a60c9a2e7e1f
Also add various comments to illustrate what is going on during origin
resolution.
In the regression tests' expectations, some duplicate entries in the origins
are removed, and hence no list of deeper origin ancestry is printed anymore.
Change-Id: I42c3b8635b54c31c27699140e200c1f75a6ada29
Add -c cmdline option to do the same as / in addition to the
OSMO_GSM_TESTER_CONF var, because setting the var is cumbersome in daily
development.
Change-Id: I4c3b482f31f638047ab3f3d785d294b28d244b80
The "Affero" nature makes sense for the Osmocom network components like
BSC, SGSN, etc. as they are typically operated to provide a network
service.
For testing, this doesn't make so much sense as it is difficult to
imagine people creating a business out of offering to run test cases on
an end-to-end Osmocom GSM network. So let's drop the 'Affero' here.
All code is so far developed by sysmocom staff, so as Managing Director
of sysmocom I can effect such a license change unilaterally.
Change-Id: I8959c2d605854ffdc21cb29c0fe0e715685c4c05
Otherwise 0.0.0.0 was being used and we want all interfaces for a
specific osmo-hlr instance to use the same IP
Requires osmo-hlr change id I79f7a300480f308b21116dd14d1698be38725afd
otherwise osmo-hlr won't be able to parse the configuration file.
Change-Id: I4e0063abc8de3d739ebd81942b692cc2e75792f1
Have all complexity in one common shell script, greatly simplify the individual
scripts.
This allows providing a specific branch or git hash to build instead of
current master. Some scripts allowed providing branch names before; this now
also allows using git hashes directly.
Environment variables can be used to override the git hash/branch to use for
specific repositories.
Motivation for this patch: we need this to investigate failure causes more
easily.
Change-Id: I5ac2f90d006a1b2f6c246976346d852a70c89089
Otherwise 127.0.0.1 was being used and we want all interfaces for a
specific osmo-mgcpgw instance to use the same IP
Change-Id: I60dbfbb66458cd333fe07139ee175c94fa1672a7
Otherwise 127.0.0.1 was being used and we want all interfaces for a
specific osmo-bsc instance to use the same IP
Change-Id: I38dccac6707bf55f0abcf96e3a9d7d8ec765a156
Preparation for following commits to add smpp support, as we will have a
class SmppClient with a method accepting an Sms object to send it.
Change-Id: I1f28e14e963abb64df687b69d54975be2aeb0d0d
Implement the Modem.log_info() function, use that instead of logging all modem
properties.
Tweak mo_mt_sms.py print() statements.
Pass modem object to SMS generation to include the modem name as SMS token.
Change-Id: I2b17fce0b3b05594fd9038b54e5b65f5127bd0a4
I got a backtrace in which the modem was lacking feature 'net'. That
happens for 2 reasons:
1- net feature is not shown unless the modem is Online (at least for
sierra modems)
2- Even after it has been set online, a lapse of time can pass before
the feature gets shown.
This was added in 896f08f6ab
"fix: refresh dbus object when interfaces have changed"
with the expectation that the 'Features' list would be available in all
modem states. Since it depends on being powered and online, the same
functionality is already provided by checking the Interfaces list, hence
this code can be dropped entirely.
Change-Id: Iedd62235d1a3a8b917ad4ac0b61b9c5dbf0fe43c
Completely discard prefix/share/doc in builds. There was still ~1.2Mb from
libosmo-netif around.
Exception: osmo-hlr installs a bootstrap sql in prefix/share/doc/osmo-hlr,
so leave that script as-is.
Change-Id: I7f3f3cfed0f56099bdff93b11a0009c1caef67c8
Allows using 'with some_origin() as foo:' constructs.
Not used actively, but is sometimes useful during debugging sessions.
Change-Id: I7a6463ee39761775305dd2272c24f248552db4ad
This seems to be the default address used to communicate via SSH with the
sysmoBTS. Whichever process ends up getting this address sees all of the
SSH in its pcap (for the AoIP build it tends to be OsmoHLR).
We could filter properly, but instead simply take this address out of
the pool for allocation to server processes.
Change-Id: I07e74ba0b9a5b08a308aae7646c4b7c70fe4aa0e
After a suite was done, the modem object would linger. If two suites were run
consecutively, the first suite's modem objects would still log incoming SMS.
Add an object cleanup mechanism in the SuiteRun class. Start by adding a
cleanup() to the Modem object and subscribing created modems there.
Move the modem_obj() function into SuiteRun, there is no use of it being
separate, and it makes for better logging.
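The cleanup mechanism can be sketched as follows. The method name
register_for_cleanup() and the reverse-order teardown are assumptions made for
this example; only the existence of a cleanup() on the modem object comes from
the description above.

```python
class SuiteRun:
    def __init__(self):
        self._objects_to_clean_up = []

    def register_for_cleanup(self, obj):
        """Remember an object whose cleanup() must run when the suite ends."""
        self._objects_to_clean_up.append(obj)

    def objects_cleanup(self):
        # Clean up in reverse creation order; empty the list as we go so a
        # second call is harmless.
        while self._objects_to_clean_up:
            self._objects_to_clean_up.pop().cleanup()
```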
Change-Id: I0048d33e661d683a263c98128cd5c38b8d897dab
This solves the KeyError problems when we attempt to use new Interfaces that
have come up. The solution is to get a fresh pydbus object when interfaces have
been added.
Another key solution is to not completely discard and unregister all signals
every time. This is racy and may cause signals getting lost. If an interface
was not removed, it is not harmful to have it subscribed using an older pydbus
object. These older objects may linger until the specific signal subscriptions
are disconnected. It is important to fetch a new dbus object for subscribing to
signals on interfaces that have just been added.
Put signal subscription and property watching in a separate class
ModemDbusInteraction. This class may also be used without signals or a modem
config, in anticipation of the IMSI discovery patch that's coming up.
Related: OS#2233
Change-Id: Ia36b881c25976d7e69dbb587317dd139169ce3d9
Add missing code to free resources, not upon program exit, but when a suite is
done.
This allows running more than one suite in a row.
Also add a check to not attempt to free if there is nothing to be freed, to
avoid a regression test failure triggered when a suite exits without reserving
anything.
Change-Id: Ic017a1cf07052f5e48812c8553fba6f972d280f0
Related: OS#2301
If not a single resource of a wanted item was left, we ran into a None. Report
unavailability instead.
Change-Id: Ie1849a74cb227964e7c3ac06852582baa2333697
Before this, the network opened up by osmo-bts-trx would be invisible through
the attenuation of the osmo-gsm-tester hardware, because tx-attenuation would
apparently default to 50, meaning maximum attenuation.
Change-Id: I1c026b5691033127eef766d82566c39cc070e14a
The idea is to see the full origin list for log level ERR, while the rest
of the logging can be kept less verbose.
Change-Id: I0277782652548fa321f767da79b207d70678fad1