Vigor Modem to Prometheus

A Python script to read the detailed DrayTek router stats.

The following stats are read from the router and exposed as OpenMetrics (Prometheus) metrics.

> vdsl status more
  ---------------------- ATU-R Info (hw: annex A, f/w: annex A/B/C) -----------
                  Near End        Far End    Note
 Trellis      :      1               1
 Bitswap      :      1               1
 ReTxEnable   :      1               1
 VirtualNoise :      0               0
 20BitSupport :      0               0
 LatencyPath  :      0               0
 LOS          :      0               0
 LOF          :      0               0
 LPR          :      0               0
 LOM          :      0               0
 SosSuccess   :      0               0
 NCD          :      0               0
 LCD          :      0               0
 FECS         :      0            2575 (seconds)
 ES           :     35               3 (seconds)
 SES          :      3               1 (seconds)
 LOSS         :      0               0 (seconds)
 UAS          :     27            1861 (seconds)
 HECError     :      0               0
 CRC          :    126              31
 RsCorrection :      0               0
 INP          :    230             160 (symbols)
 InterleaveDelay :      0              62 (1/100 ms)
 NFEC         :     32              32
 RFEC         :     16              16
 LSYMB        :     16              16
 INTLVBLOCK   :     32              32
 AELEM        :      0            ----
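
Each row of that output carries a near-end and a far-end value, which is where the ne_ and fe_ prefixes used later in the script come from. As a rough illustration only (the script itself uses a TTP template, not this regex), the rows could be split as below. `parse_status_rows` is a hypothetical helper name, not something from the repository.

```python
import re

# Matches rows such as " CRC          :    126              31",
# tolerating an optional trailing unit like "(seconds)".
ROW = re.compile(r"^\s*(\w+)\s*:\s+(\S+)\s+(\S+)(?:\s+\(.*\))?\s*$")


def parse_status_rows(text):
    """Return {'ne_<name>': value, 'fe_<name>': value} for each numeric row."""
    stats = {}
    for line in text.splitlines():
        match = ROW.match(line)
        if not match:
            continue  # banner, header, or blank line
        name, near, far = match.groups()
        for prefix, raw in (("ne_", near), ("fe_", far)):
            if raw.isdigit():  # skip placeholders such as "----"
                stats[prefix + name] = int(raw)
    return stats


sample = """\
 LOS          :      0               0
 FECS         :      0            2575 (seconds)
 CRC          :    126              31
 AELEM        :      0            ----
"""
print(parse_status_rows(sample))
```

Rows whose value is not a plain integer, such as the far-end AELEM placeholder, are simply left out of the result.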

Environment Variables

The script requires some environment variables to be set. I do this in Docker using a .env config file:

IP=192.168.100.1
USERNAME=admin
PASSWORD=admin
SERVER_PORT=8081
METRICS_PORT=8001
TELNET_CMD=/usr/bin/telnet
SPAWN_TIMEOUT=5
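
Outside Docker, the same file can be loaded before starting the script. The sketch below is one stdlib-only way to do that, assuming simple KEY=VALUE lines with no quoting or variable expansion. `load_env_file` is a hypothetical helper and not part of the repository.

```python
import os


def load_env_file(path, environ=os.environ):
    """Copy KEY=VALUE pairs from a .env-style file into environ.

    Existing variables win, so values already set in the shell are kept.
    """
    with open(path, encoding="utf-8") as handle:
        for line in handle:
            line = line.strip()
            if not line or line.startswith("#") or "=" not in line:
                continue  # skip blanks, comments, and malformed lines
            key, _, value = line.partition("=")
            environ.setdefault(key.strip(), value.strip())
    return environ
```

Calling `load_env_file(".env")` before the script builds its configuration would make the values visible to `getenv`. All values stay strings, so numeric settings such as SPAWN_TIMEOUT still need converting with `int()`.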

Telnet

The script spawns a telnet connection to the router to gather the stats from the CLI. (It could easily be changed to use SSH.)

Each Prometheus scrape of /metrics triggers the telnet request, so the router is polled exactly as often as Prometheus scrapes it.
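
Because every scrape opens a telnet session, a short scrape interval means frequent logins on the router. If that ever becomes a concern, one option is a small time-based cache in front of the gather step. This is only a sketch: `TTLCache` and `fetch` are hypothetical names, and the original script does not cache anything.

```python
import time


class TTLCache:
    """Reuse the last result of fetch() until it is older than ttl seconds."""

    def __init__(self, fetch, ttl, clock=time.monotonic):
        self._fetch = fetch
        self._ttl = ttl
        self._clock = clock
        self._stamp = None   # time of the last fetch, None until the first one
        self._value = None

    def get(self):
        now = self._clock()
        if self._stamp is None or now - self._stamp >= self._ttl:
            self._value = self._fetch()  # e.g. the telnet session
            self._stamp = now
        return self._value


# Demonstration with a fake clock and a counter standing in for the router.
calls = []
fake_now = [0.0]
cache = TTLCache(lambda: calls.append(1) or len(calls), ttl=30,
                 clock=lambda: fake_now[0])

cache.get()          # first call fetches
cache.get()          # served from the cache
fake_now[0] = 31.0
cache.get()          # entry expired, fetches again
print(len(calls))    # number of real fetches
```

The clock is injectable only so the behaviour can be shown without sleeping. In real use the default `time.monotonic` would be left in place.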

Python Modules

Python Expect (pexpect) to gather the stats, Template Text Parser (ttp) to parse them, and the Prometheus Python Client (prometheus_client) to make them available to Prometheus.

Prometheus

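Add the exporter as a scrape target as normal. Point it at whichever host and port the container's METRICS_PORT is published on.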

- job_name: router_stats
  honor_timestamps: true
  scrape_interval: 5m
  scrape_timeout: 10s
  metrics_path: /metrics
  scheme: http
  static_configs:
  - targets:
    - 192.168.0.1:8003
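
Before adding the target, it can help to confirm the exporter answers. The sketch below parses Prometheus text exposition output into a dict. The parsing is deliberately naive (it assumes plain "name value" lines with no timestamps and no spaces inside label values), and `parse_metrics` and `fetch_metrics` are hypothetical helpers. The metric names in the sample are ones the script defines; the values are made up for illustration.

```python
from urllib.request import urlopen


def parse_metrics(text):
    """Map 'name{labels}' to a float for each sample line."""
    samples = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # HELP/TYPE comments and blank lines
        name, _, value = line.rpartition(" ")
        try:
            samples[name] = float(value)
        except ValueError:
            continue  # not a plain "name value" line
    return samples


def fetch_metrics(url, timeout=10):
    """Fetch and parse a /metrics page; url is whatever target you configured."""
    with urlopen(url, timeout=timeout) as response:
        return parse_metrics(response.read().decode("utf-8"))


sample = """\
# HELP router_fe_fecs Forward Error Correction Seconds - Line far-end (FECS-LFE)
# TYPE router_fe_fecs gauge
router_fe_fecs 2575.0
router_up_down{status="SHOWTIME"} 1.0
"""
print(parse_metrics(sample))
```

With the target from the config above, the live check would be `fetch_metrics("http://192.168.0.1:8003/metrics")`.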

Dockerfile and Script

Dockerfile

FROM python:3-slim

RUN set -eux; \
    useradd -ms /bin/bash puser; \
    pip install prometheus_client pexpect ttp; \
    apt-get update; \
    apt-get install -y --no-install-recommends \
        curl \
        telnet \
        && rm -rf /var/lib/apt/lists/* ; \
    echo "Done"

# run as non-root user - safety first!
USER puser
WORKDIR /home/puser

# copy code into image
COPY . /home/puser/

# fix for https://github.com/dmulyalin/ttp/issues/54
ENV TTPCACHEFOLDER=/tmp
ENV IP="192.168.200.1"
ENV USERNAME="admin"
ENV PASSWORD="admin"

ENV SERVER_PORT=8081
ENV METRICS_PORT=8001

# where does telnet live?
ENV TELNET_CMD='/usr/bin/telnet' 
# timeout to telnet to router and grab stats
ENV SPAWN_TIMEOUT=5

EXPOSE 8081
EXPOSE 8001

# and run the code :)
ENTRYPOINT ["python3"]
CMD ["vigor-to-prom.py"]

vigor-to-prom.py

#!/usr/bin/python3

#
# A python OpenMetrics / Prometheus server to grab data from a DrayTek router
#
# Uses 'expect' to gather data from the telnet port

# https://www.draytek.co.uk/support/guides/kb-dsl-status-more

# https://kitz.co.uk/adsl/linestats_errors.htm

from os import getenv

import http.server

from prometheus_client import start_http_server
from prometheus_client import Summary, Gauge
from prometheus_client.core import CounterMetricFamily, REGISTRY

import pexpect # expect
from ttp import ttp # templating

# https://prometheus.io/docs/concepts/metric_types/

CUSTOM_COUNTERS = {
    'ne_LOS': 'Loss Of Signal count',
    'ne_LOF': 'Loss Of Frame count',
    'ne_LPR': 'Loss Of Power count',
    'ne_LOM': 'Loss Of Margin count',
    'ne_NCD': 'No Cell Delineation failure count',
    'ne_LCD': 'Loss Of Cell Delineation failure count',
    'ne_CRC': 'Cyclic Redundancy Check error count, number of CRC 8 anomalies (number of incorrect CRC)',
    'ne_HECError': 'Header Error Check Error count, HEC anomalies in the ATM Data Path',

    'fe_LOS': 'Loss Of Signal count',
    'fe_LOF': 'Loss Of Frame count',
    'fe_LPR': 'Loss Of Power count',
    'fe_LOM': 'Loss Of Margin count',
    'fe_NCD': 'No Cell Delineation failure count',
    'fe_LCD': 'Loss Of Cell Delineation failure count',
    'fe_CRC': 'Cyclic Redundancy Check error count, number of CRC 8 anomalies (number of incorrect CRC)',
    'fe_HECError': 'Header Error Check Error count, HEC anomalies in the ATM Data Path',
    
}

LATENCY = Summary('server_latency_gather_router_data_seconds', 'Time to get stats from the router')

NE_FECS  = Gauge('router_ne_fecs','Forward Error Correction Seconds - Line (FECS-L)')
NE_ES    = Gauge('router_ne_es','Errored Seconds - Line (ES-L)')
NE_SES   = Gauge('router_ne_ses','Severely Errored Seconds - Line (SES-L)')
NE_LOSS  = Gauge('router_ne_loss','Loss Of Signal Seconds - Line (LOSS-L)')
NE_UAS   = Gauge('router_ne_uas','Unavailable Seconds - Line (UAS-L)')

FE_FECS  = Gauge('router_fe_fecs','Forward Error Correction Seconds - Line far-end (FECS-LFE)')
FE_ES    = Gauge('router_fe_es','Errored Seconds - Line far-end (ES-LFE)')
FE_SES   = Gauge('router_fe_ses','Severely Errored Seconds - Line far-end (SES-LFE)')
FE_LOSS  = Gauge('router_fe_loss','Loss Of Signal Seconds - Line far-end (LOSS-LFE)')
FE_UAS   = Gauge('router_fe_uas','Unavailable Seconds - Line far-end (UAS-LFE)')

DSL_UPDOWN = Gauge('router_up_down','1=up "SHOWTIME", 0=down anything else', ['status'])

#TODO: add some info gauges for the static text 

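# ne_ = near end, fe_ = far end. Only the rows the collector reads capture values.
ext_template = """
<group name="rs">
  ---------------------- ATU-R Info (hw: annex , f/w: annex ) -----------
 Near End                   Far End  Note
 Trellis          :  
 Bitswap          :  
 ReTxEnable       :  
 VirtualNoise     :  
 20BitSupport     :  
 LatencyPath      :  
 LOS              : {{ ne_LOS }} {{ fe_LOS }}
 LOF              : {{ ne_LOF }} {{ fe_LOF }}
 LPR              : {{ ne_LPR }} {{ fe_LPR }}
 LOM              : {{ ne_LOM }} {{ fe_LOM }}
 SosSuccess       :  
 NCD              : {{ ne_NCD }} {{ fe_NCD }}
 LCD              : {{ ne_LCD }} {{ fe_LCD }}
 FECS             : {{ ne_FECS }} {{ fe_FECS }} (seconds)
 ES               : {{ ne_ES }} {{ fe_ES }} (seconds)
 SES              : {{ ne_SES }} {{ fe_SES }} (seconds)
 LOSS             : {{ ne_LOSS }} {{ fe_LOSS }} (seconds)
 UAS              : {{ ne_UAS }} {{ fe_UAS }} (seconds)
 HECError         : {{ ne_HECError }} {{ fe_HECError }}
 CRC              : {{ ne_CRC }} {{ fe_CRC }}
 INP              :   (symbols)
 INTLVDelay       :   (1/100 ms)
 NFEC             :  
 RFEC             :  
 LSYMB            :  
 INTLVBLOCK       :  
 AELEM            :  ----
</group>
"""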

# Only link_status is read by the collector.
wan_template = """
<group name="wan">

Link Status: {{ link_status }}
Firmware Version: 
VDSL2 Profile: 
Basic	Status	Upstream	Downstream 
Actual Data Rate: 
SNR: 
G.Vectoring Status:	
</group>
"""

class RouterCollector(object):
    #
    # Prometheus collector, called whenever /metrics on METRICS_PORT is scraped
    #
    @LATENCY.time() # measure how long this function takes to run
    def collect(self):

        print("Gathering router stats...")

        try:
            child = pexpect.spawn (config.TELNET_CMD + ' ' + config.IP, timeout=config.SPAWN_TIMEOUT)

            child.expect("Account:")

            child.send (config.USERNAME +"\r")
            child.expect ("Password: ")
            child.send (config.PASSWORD+"\r")
            child.expect ("DrayTek> ")

            child.send ("wan vdsl show basic\r")
            child.expect ("DrayTek> ")
            wan_results = child.before
            wan_results=str(wan_results.replace(b'\r',b''),'ascii') # convert to ascii string for parsing

            parser = ttp(data=wan_results, template=wan_template, log_level='INFO')
            parser.parse() # extract the info
            wm = parser.result(format='raw')[0][0]

            link_status = wm['wan']['link_status']
            if link_status == "SHOWTIME":
                DSL_UPDOWN.labels(link_status).set(1)
            else:
                DSL_UPDOWN.labels(link_status).set(0)
            
            
        

            child.send ("vdsl status more\r")
            child.expect ("DrayTek> ")
            ext_results = child.before
            child.send ("exit\r")

            ext_results=str(ext_results.replace(b'\r',b''),'ascii') # convert to ascii string for parsing
            #print(ext_results)

            parser = ttp(data=ext_results, template=ext_template, log_level='INFO')
            parser.parse() # extract the info

            # print result in JSON format
            #p_ext_results = parser.result(format='json')[0]
            #print(p_ext_results)

            om = parser.result(format='raw')[0][0]
            #print(om)

            # Load up the Prometheus custom counters
            for c in CUSTOM_COUNTERS:
                #print("{} | {} | {}".format(c, CUSTOM_COUNTERS[c], om['rs'][c]))
                yield CounterMetricFamily("router_" + c, CUSTOM_COUNTERS[c], value=float(om['rs'][c]))
            
            # load up the standard gauges - everything that is returned in seconds
            NE_FECS.set(om['rs']['ne_FECS'])
            NE_ES.set(om['rs']['ne_ES'])
            NE_SES.set(om['rs']['ne_SES'])
            NE_LOSS.set(om['rs']['ne_LOSS'])
            NE_UAS.set(om['rs']['ne_UAS'])

            FE_FECS.set(om['rs']['fe_FECS'])
            FE_ES.set(om['rs']['fe_ES'])
            FE_SES.set(om['rs']['fe_SES'])
            FE_LOSS.set(om['rs']['fe_LOSS'])
            FE_UAS.set(om['rs']['fe_UAS'])

        except Exception as e:
            print("Error gathering stats:", e)
            return


class ServerHandler(http.server.BaseHTTPRequestHandler):
#
# Web Browser stuff
#
    def do_GET(self):
        self.send_response(200)
        self.end_headers()
        self.wfile.write("Prometheus metrics available on port {} /metrics\n".format(config.METRICS_PORT).encode("utf-8")) # a byte string

class Config(object):

    def __init__(self):
    # read in environment variables

        self.IP=getenv('IP','192.168.0.1')
        self.USERNAME=getenv('USERNAME','admin')
        self.PASSWORD=getenv('PASSWORD','password')

        self.SERVER_PORT=int(getenv('SERVER_PORT', '8081'))
        self.METRICS_PORT=int(getenv('METRICS_PORT','8001'))

        self.TELNET_CMD=getenv('TELNET_CMD','/usr/bin/telnet') # where does telnet live?
        self.SPAWN_TIMEOUT=int(getenv('SPAWN_TIMEOUT','5'))


if __name__ == "__main__":

    # read in config from Environment Variables
    config = Config()
    #print(config.IP,config.USERNAME,config.PASSWORD,config.SPAWN_TIMEOUT)

    # this is called every time the /metrics URI is requested
    REGISTRY.register(RouterCollector())

    # start metrics server
    start_http_server(config.METRICS_PORT)
    
    # start web server - keeps the app up and running
    server = http.server.HTTPServer(('', config.SERVER_PORT), ServerHandler)
    print("Prometheus metrics available on port "+str(config.METRICS_PORT)+" /metrics")
    print("HTTP server available on port "+str(config.SERVER_PORT))
    server.serve_forever()
https://github.com/mohclips/draytek_router_stats_prometheus
This post is licensed under CC BY 4.0 by the author.