Vigor Modem to Prometheus
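A Python script to read the detailed line stats from a DrayTek (Vigor) router.
The script parses the output of the router's "vdsl status more" command, shown below, and exposes the error counters and errored-seconds values as OpenMetrics (Prometheus) metrics.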
> vdsl status more
---------------------- ATU-R Info (hw: annex A, f/w: annex A/B/C) -----------
                    Near End       Far End    Note
Trellis          :        1             1
Bitswap          :        1             1
ReTxEnable       :        1             1
VirtualNoise     :        0             0
20BitSupport     :        0             0
LatencyPath      :        0             0
LOS              :        0             0
LOF              :        0             0
LPR              :        0             0
LOM              :        0             0
SosSuccess       :        0             0
NCD              :        0             0
LCD              :        0             0
FECS             :        0          2575    (seconds)
ES               :       35             3    (seconds)
SES              :        3             1    (seconds)
LOSS             :        0             0    (seconds)
UAS              :       27          1861    (seconds)
HECError         :        0             0
CRC              :      126            31
RsCorrection     :        0             0
INP              :      230           160    (symbols)
InterleaveDelay  :        0            62    (1/100 ms)
NFEC             :       32            32
RFEC             :       16            16
LSYMB            :       16            16
INTLVBLOCK       :       32            32
AELEM            :        0          ----
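To see how the two value columns map onto the ne_ (near end) and fe_ (far end) keys that the script uses, here is a minimal sketch that parses this output with a plain regular expression instead of TTP. It assumes the "Name : near far [note]" layout shown above; parse_vdsl_status_more is a hypothetical helper, not part of the script, and it simply skips rows whose values are not integers (such as AELEM).

```python
import re

# A data row looks like "CRC : 126 31" or "UAS : 27 1861 (seconds)":
# field name, colon, near-end value, far-end value, optional note.
ROW_RE = re.compile(r"^\s*(\w+)\s*:\s*(\S+)\s+(\S+)")


def parse_vdsl_status_more(text):
    """Return {'ne_<Field>': int, 'fe_<Field>': int} for each numeric row."""
    stats = {}
    for line in text.splitlines():
        match = ROW_RE.match(line)
        if match is None:
            continue  # banner, column header or prompt line
        field, near, far = match.groups()
        if not (near.isdigit() and far.isdigit()):
            continue  # e.g. "AELEM : 0 ----" has no far-end number
        stats["ne_" + field] = int(near)
        stats["fe_" + field] = int(far)
    return stats
```

Fed the full output above, it would give, for example, ne_CRC == 126 and fe_UAS == 1861.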
Environment Variables
The script reads its settings from environment variables. I load them in Docker from a .env config file:
IP=192.168.100.1
USERNAME=admin
PASSWORD=admin
SERVER_PORT=8081
METRICS_PORT=8001
TELNET_CMD=/usr/bin/telnet
SPAWN_TIMEOUT=5
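Docker reads the .env file for you. If you want to run the script directly instead, a minimal sketch like the one below can load the same KEY=VALUE file into the environment before Config() reads it. load_env_file is a hypothetical helper, not part of the script; like docker run --env-file, it does not strip quotes from values.

```python
import os


def load_env_file(path, override=False):
    """Load KEY=VALUE lines from a .env file into os.environ.

    Blank lines and '#' comments are skipped and quotes are not stripped.
    Variables that are already set are kept unless override=True.
    Returns the parsed key/value pairs.
    """
    parsed = {}
    with open(path, encoding="utf-8") as env_file:
        for raw_line in env_file:
            line = raw_line.strip()
            if not line or line.startswith("#") or "=" not in line:
                continue
            key, value = line.split("=", 1)
            key = key.strip()
            parsed[key] = value
            if override or key not in os.environ:
                os.environ[key] = value
    return parsed
```

Call it before Config() is constructed, e.g. load_env_file(".env"). Because existing variables win by default, a USERNAME already set in your shell (Windows sets one) would take precedence; pass override=True to force the file's values.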
Telnet
The script spawns a telnet session to the router and gathers the stats from its CLI (it could easily be changed to use SSH).
Each Prometheus call to /metrics fires off a fresh telnet request, so the router is polled exactly as often as Prometheus scrapes it; the scrape_interval sets the polling rate.
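The on-demand pattern is easy to see with the standard library alone. The sketch below is not the script's code (the script uses prometheus_client's start_http_server and a custom collector); it is a hypothetical stand-in in which gather() plays the role of the telnet session and runs once for every GET /metrics, with no caching in between.

```python
import http.server


def make_metrics_handler(gather):
    """Return a handler class that calls gather() on every GET /metrics.

    gather is a hypothetical stand-in for the telnet session: it must return
    a dict of metric name -> value, read fresh for each request.
    """

    class MetricsHandler(http.server.BaseHTTPRequestHandler):
        def do_GET(self):
            if self.path != "/metrics":
                self.send_error(404)
                return
            stats = gather()  # one router poll per scrape, no caching
            body = "".join(f"{name} {value}\n" for name, value in stats.items())
            self.send_response(200)
            self.send_header("Content-Type", "text/plain; version=0.0.4")
            self.end_headers()
            self.wfile.write(body.encode("utf-8"))

        def log_message(self, format, *args):
            pass  # keep the console quiet

    return MetricsHandler
```

Serving it with http.server.HTTPServer(("", 8001), make_metrics_handler(my_gather)).serve_forever(), where my_gather is your own (hypothetical) poll function, makes each scrape trigger exactly one my_gather() call.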
Python Modules
Python Expect (pexpect), to gather the stats
Template Text Parser (ttp), to parse the stats
Prometheus Python Client (prometheus_client), to make them available to Prometheus
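For reference, prometheus_client turns each metric into lines of the Prometheus text exposition format. The sketch below is a simplified, hypothetical renderer (not the client's own code, and not byte-for-byte identical to its output) that shows the shape of that output for the two metric types the script uses: counters, which gain a _total suffix, and gauges.

```python
def render_exposition(metrics):
    """Render (name, type, help, value) tuples in the Prometheus text format.

    Simplified illustration only: type is "counter" or "gauge", and counters
    get the conventional "_total" suffix.
    """
    lines = []
    for name, metric_type, help_text, value in metrics:
        if metric_type == "counter" and not name.endswith("_total"):
            name += "_total"
        # HELP text must escape backslashes and newlines
        escaped = help_text.replace("\\", "\\\\").replace("\n", "\\n")
        lines.append(f"# HELP {name} {escaped}")
        lines.append(f"# TYPE {name} {metric_type}")
        lines.append(f"{name} {float(value)}")
    return "\n".join(lines) + "\n"
```

For example, render_exposition([("router_ne_CRC", "counter", "CRC errors", 126)]) returns a HELP line, a TYPE line and the sample line router_ne_CRC_total 126.0.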
Prometheus
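Add the exporter as a scrape target as normal, pointing the target at the host port that is mapped to the container's METRICS_PORT: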
- job_name: router_stats
  honor_timestamps: true
  scrape_interval: 5m
  scrape_timeout: 10s
  metrics_path: /metrics
  scheme: http
  static_configs:
    - targets:
        - 192.168.0.1:8003
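One thing worth checking in this job is timing. Prometheus rejects a scrape_timeout longer than the scrape_interval, and every scrape waits for the whole telnet session. In the script each pexpect expect() call can wait up to SPAWN_TIMEOUT seconds and there are five of them per session, so a sluggish router could in theory need 5 × 5 = 25 s, longer than the 10 s timeout above. The sketch below automates that arithmetic; duration_seconds and check_scrape_timing are hypothetical helpers, and the parser only handles single-unit durations such as 10s or 5m.

```python
import re

_UNIT_SECONDS = {"ms": 0.001, "s": 1, "m": 60, "h": 3600}


def duration_seconds(text):
    """Parse a single-unit Prometheus duration such as '10s' or '5m'."""
    match = re.fullmatch(r"(\d+)(ms|s|m|h)", text)
    if match is None:
        raise ValueError(f"unsupported duration: {text!r}")
    return int(match.group(1)) * _UNIT_SECONDS[match.group(2)]


def check_scrape_timing(scrape_interval, scrape_timeout, spawn_timeout, expect_calls=5):
    """Return warnings about the job's timing (an empty list means OK).

    expect_calls=5 assumes the script's login and command sequence:
    Account:, Password: and three "DrayTek> " prompts.
    """
    interval = duration_seconds(scrape_interval)
    timeout = duration_seconds(scrape_timeout)
    warnings = []
    if timeout > interval:
        warnings.append("scrape_timeout must not exceed scrape_interval")
    # each expect() may wait up to spawn_timeout seconds before giving up
    worst_case = expect_calls * spawn_timeout
    if worst_case > timeout:
        warnings.append(
            f"a slow router could need up to {worst_case}s, "
            f"longer than scrape_timeout ({timeout}s)"
        )
    return warnings
```

With the job above and SPAWN_TIMEOUT=5, check_scrape_timing("5m", "10s", 5) returns one warning; raising scrape_timeout to 30s clears it.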
Dockerfile and Script
dockerfile
FROM python:3-slim
RUN set -eux; \
    useradd -ms /bin/bash puser; \
    pip install prometheus_client pexpect ttp; \
    apt-get update; \
    apt-get install -y --no-install-recommends \
        curl \
        telnet \
    && rm -rf /var/lib/apt/lists/* ; \
    echo "Done"
# run as non-root user - safety first!
USER puser
WORKDIR /home/puser
# copy code into image (if your .env file sits in this directory, list it in
# .dockerignore so the router credentials are not baked into the image)
COPY . /home/puser/
# fix for https://github.com/dmulyalin/ttp/issues/54
ENV TTPCACHEFOLDER=/tmp
ENV IP="192.168.200.1"
ENV USERNAME="admin"
ENV PASSWORD="admin"
ENV SERVER_PORT=8081
ENV METRICS_PORT=8001
# where does telnet live?
ENV TELNET_CMD='/usr/bin/telnet'
# timeout to telnet to router and grab stats
ENV SPAWN_TIMEOUT=5
EXPOSE 8081
EXPOSE 8001
# and run the code :)
ENTRYPOINT ["python3"]
CMD ["vigor-to-prom.py"]
vigor-to-prom.py
#!/usr/bin/python3
#
# A python OpenMetrics / Prometheus server to grab data from a DrayTek router
#
# Uses 'expect' to gather data from the telnet port
# https://www.draytek.co.uk/support/guides/kb-dsl-status-more
# https://kitz.co.uk/adsl/linestats_errors.htm

from os import getenv
import http.server

from prometheus_client import start_http_server
from prometheus_client import Summary
from prometheus_client.core import CounterMetricFamily, GaugeMetricFamily, REGISTRY
import pexpect  # expect
from ttp import ttp  # templating

# https://prometheus.io/docs/concepts/metric_types/
# event counts, exported as counters: router_<key>_total
CUSTOM_COUNTERS = {
    'ne_LOS': 'Loss Of Signal count',
    'ne_LOF': 'Loss Of Frame count',
    'ne_LPR': 'Loss Of Power count',
    'ne_LOM': 'Loss Of Margin count',
    'ne_NCD': 'No Cell Delineation failure count',
    'ne_LCD': 'Loss Of Cell Delineation failure count',
    'ne_CRC': 'Cyclic Redundancy Check error count, number of CRC 8 anomalies (number of incorrect CRC)',
    'ne_HECError': 'Header Error Check Error count, HEC anomalies in the ATM Data Path',
    'fe_LOS': 'Loss Of Signal count',
    'fe_LOF': 'Loss Of Frame count',
    'fe_LPR': 'Loss Of Power count',
    'fe_LOM': 'Loss Of Margin count',
    'fe_NCD': 'No Cell Delineation failure count',
    'fe_LCD': 'Loss Of Cell Delineation failure count',
    'fe_CRC': 'Cyclic Redundancy Check error count, number of CRC 8 anomalies (number of incorrect CRC)',
    'fe_HECError': 'Header Error Check Error count, HEC anomalies in the ATM Data Path',
}

# everything that is returned in seconds, exported as gauges: router_<key in lower case>
SECONDS_GAUGES = {
    'ne_FECS': 'Forward Error Correction Seconds - Line near-end (FECS-L)',
    'ne_ES': 'Errored Seconds - Line near-end (ES-L)',
    'ne_SES': 'Severely Errored Seconds - Line near-end (SES-L)',
    'ne_LOSS': 'Loss Of Signal Seconds',
    'ne_UAS': 'Un-Available Seconds - Line near-end (UAS-L)',
    'fe_FECS': 'Forward Error Correction Seconds - Line far-end (FECS-LFE)',
    'fe_ES': 'Errored Seconds - Line far-end (ES-LFE)',
    'fe_SES': 'Severely Errored Seconds - Line far-end (SES-LFE)',
    'fe_LOSS': 'Loss Of Signal Seconds',
    'fe_UAS': 'Un-Available Seconds - Line far-end (UAS-LFE)',
}

LATENCY = Summary('server_latency_gather_router_data_seconds', 'Time to get stats from the router')

#TODO: add some info gauges for the static text
ext_template = """
<group name="rs">
---------------------- ATU-R Info (hw: annex , f/w: annex ) -----------
Near End Far End Note
Trellis :
Bitswap :
ReTxEnable :
VirtualNoise :
20BitSupport :
LatencyPath :
LOS : {{ ne_LOS }} {{ fe_LOS }}
LOF : {{ ne_LOF }} {{ fe_LOF }}
LPR : {{ ne_LPR }} {{ fe_LPR }}
LOM : {{ ne_LOM }} {{ fe_LOM }}
SosSuccess :
NCD : {{ ne_NCD }} {{ fe_NCD }}
LCD : {{ ne_LCD }} {{ fe_LCD }}
FECS : {{ ne_FECS }} {{ fe_FECS }} (seconds)
ES : {{ ne_ES }} {{ fe_ES }} (seconds)
SES : {{ ne_SES }} {{ fe_SES }} (seconds)
LOSS : {{ ne_LOSS }} {{ fe_LOSS }} (seconds)
UAS : {{ ne_UAS }} {{ fe_UAS }} (seconds)
HECError : {{ ne_HECError }} {{ fe_HECError }}
CRC : {{ ne_CRC }} {{ fe_CRC }}
INP : (symbols)
INTLVDelay : (1/100 ms)
NFEC :
RFEC :
LSYMB :
INTLVBLOCK :
AELEM : ----
</group>
"""

wan_template = """
<group name="wan">
Link Status: {{ link_status }}
Firmware Version:
VDSL2 Profile:
Basic Status Upstream Downstream
Actual Data Rate:
SNR:
G.Vectoring Status:
</group>
"""


def gather_router_stats():
    # log in over telnet, run the two CLI commands and
    # return (link status, dict of values from the 'rs' group)
    child = pexpect.spawn(config.TELNET_CMD, [config.IP], timeout=config.SPAWN_TIMEOUT)
    try:
        child.expect("Account:")
        child.send(config.USERNAME + "\r")
        child.expect("Password: ")
        child.send(config.PASSWORD + "\r")
        child.expect("DrayTek> ")

        child.send("wan vdsl show basic\r")
        child.expect("DrayTek> ")
        # convert to ascii string for parsing
        wan_results = str(child.before.replace(b'\r', b''), 'ascii')

        child.send("vdsl status more\r")
        child.expect("DrayTek> ")
        ext_results = str(child.before.replace(b'\r', b''), 'ascii')

        child.send("exit\r")
    finally:
        child.close()  # always reap the telnet process, even after a timeout

    parser = ttp(data=wan_results, template=wan_template, log_level='INFO')
    parser.parse()  # extract the info
    wm = parser.result(format='raw')[0][0]

    parser = ttp(data=ext_results, template=ext_template, log_level='INFO')
    parser.parse()  # extract the info
    # print(parser.result(format='json')[0])  # handy when debugging the template
    om = parser.result(format='raw')[0][0]

    return wm['wan']['link_status'], om['rs']


class RouterCollector(object):
    #
    # Prometheus stuff, called whenever /metrics on METRICS_PORT is scraped
    #
    def describe(self):
        # nothing to pre-declare; without this method REGISTRY.register()
        # would call collect() (and log in to the router) at start-up
        return []

    def collect(self):
        print("Gathering router stats...")
        try:
            # measure how long the telnet session and parsing take
            # (a decorator on a generator would only time its creation)
            with LATENCY.time():
                link_status, rs = gather_router_stats()

            # built fresh on every scrape, so the values are current and an
            # old status label does not linger with a stale value
            up_down = GaugeMetricFamily('router_up_down', '1=up "SHOWTIME", 0=down anything else', labels=['status'])
            up_down.add_metric([link_status], 1 if link_status == "SHOWTIME" else 0)
            metrics = [up_down]

            # Load up the Prometheus custom counters
            for c in CUSTOM_COUNTERS:
                metrics.append(CounterMetricFamily("router_" + c, CUSTOM_COUNTERS[c], value=float(rs[c])))

            # load up the gauges - everything that is returned in seconds
            for g in SECONDS_GAUGES:
                metrics.append(GaugeMetricFamily("router_" + g.lower(), SECONDS_GAUGES[g], value=float(rs[g])))
        except Exception as e:
            print("Error gathering stats:", e)
            return
        yield from metrics


class ServerHandler(http.server.BaseHTTPRequestHandler):
    #
    # Web Browser stuff
    #
    def do_GET(self):
        self.send_response(200)
        self.end_headers()
        self.wfile.write("Prometheus metrics available on port {} /metrics\n".format(config.METRICS_PORT).encode("utf-8"))  # a byte string


class Config(object):
    def __init__(self):
        # read in environment variables
        self.IP = getenv('IP', '192.168.0.1')
        self.USERNAME = getenv('USERNAME', 'admin')
        self.PASSWORD = getenv('PASSWORD', 'password')
        self.SERVER_PORT = int(getenv('SERVER_PORT', '8081'))
        self.METRICS_PORT = int(getenv('METRICS_PORT', '8001'))
        self.TELNET_CMD = getenv('TELNET_CMD', '/usr/bin/telnet')  # where does telnet live?
        self.SPAWN_TIMEOUT = int(getenv('SPAWN_TIMEOUT', '5'))


if __name__ == "__main__":
    # read in config from Environment Variables
    config = Config()
    #print(config.IP,config.USERNAME,config.PASSWORD,config.SPAWN_TIMEOUT)

    # collect() is called every time the /metrics URI is scraped
    REGISTRY.register(RouterCollector())

    # start metrics server
    start_http_server(config.METRICS_PORT)

    # start web server - keeps the app up and running
    server = http.server.HTTPServer(('', config.SERVER_PORT), ServerHandler)
    print("Prometheus metrics available on port " + str(config.METRICS_PORT) + " /metrics")
    print("HTTP server available on port " + str(config.SERVER_PORT))
    server.serve_forever()
https://github.com/mohclips/draytek_router_stats_prometheus