Survey and Data Standards - Lessons Learnt

Background

Recently, 27/03/2019, I've been working a lot with InfoNet databases. In one specific project, I am working with possibly the best and most detailed asset survey standard I have ever seen. This is fantastic and pros to the water company who've built this standard.

Unfortunately this standard is not "the British standard", and thus, infuriatingly the British contractor has been making a complete hash of the data and their surveys.

I could discuss at length about the issues we have faced with this particular project, however, the more important discussion is how we can improve our standards and systems to overcome these obstacles!

After all, often this surveyed information becomes widespread and part of vital waste water management decisions. From prediction of flooding, to flooding alleviation. There is no doubt this information is important, so we have to make sure our systems are as robust as possible to withstand, detect and correct human errors where they occur.

So without further ado here are some lessons we learnt while assessing this work:

SURVEY STANDARDS

Survey status

Have the surveyor log the status of the survey (e.g. Unable to raise, Unable to find, Unable to access, Surveyed, Partially surveyed, not surveyed).

Choices which display intent

At no point should a surveyor be able to get away with inserting no data into the table apart from when they haven't surveyed this data.

The specific standard we are currently working with specifies that gas type found in the manhole should be recorded. This is not standard as per STC25. Unfortunately they didn't include the option 'None found'. For this reason future observers won't be able to know whether the field was populated by the surveyor or inadvertantly not (which already happened in our case). This is especially critical when surveyors are dealing with standards they are not familiar with.

The whole point of this specific tip is to ensure that a standard is in place such that all choices on any particular field display intent.

Data input priority

Different people have different ideas about which data sources are 'more accurate' than other data sources. For example, typically British standards dictate that a manhole survey contains more accurate pipe width than that from a CCTV survey. However CCTV surveys give more accurate descriptions of pipe material than manhole surveys. This is a British standard but you may have other ideas.

When you're making your standards be sure to specify these details! Don't trust engineers to use their judgement, because you will end with inconsistency, (and ultimately less trust), in data! Instead, build this into your standards. A table like the following should suffice:

Field	cams_pipe::width	cams_pipe::material	cams_pipe::cover_level	...
1	MH Survey	CCTV Survey	MH Survey	...
2	CCTV Survey	MH Survey	CCTV Survey	...
3	Flow Survey	Sewer Records	LIDAR Data	...
4	Flow Survey	Sewer Records	LIDAR Data	...
...	...	...	...	...

Where 1 is the highest confidence data, and as the number increases, the confidence in that data source decreases. These confidence charts are especially useful when building models from sewer records and survey data.

Don't attach! Hotlink!

ICM databases typically store all attached files outside of the database in a folder called "Extras". In this folder they store GUIDs for each image and then use the db_ref field of the database object to store the link to the external file.

Intrinsically this generally means that there is actually little difference physically between hotlinking and attaching a file, apart from when you try to save these files as snapshot files or copy them to transportables. In these cases with the huge amount of pictures copied, the transportable database often swells massively in size, becoming barely transportable at all.

The alternative, hotlinking, ensures that the network size remains small allowing databases to be imported and ran quickly.

Hotlinks also allow you to be more tactical about the links you use. For example, it might be worth having seperate folders for ancillaries and manholes, and naming conventions. For example:


        cso
        |- C0001 - Madavale Road (CSO)
        |  |- Drawings
        |  |  |- SK12345678_Chamber.png
        |  |- Photos
        |  |  |- SK12345678_Location.png
        |  |  |- SK12345678_Sketch.png
        |  |- Misc
        |  |  |- SK12345678_Plan.png
        |- ...

With hotlinks:

Name	Hotlink
Chamber Drawing	cso/C0001 - Madavale Road (CSO)/Drawings/SK12345678_Chamber.png
Location Photo	cso/C0001 - Madavale Road (CSO)/Photos/SK12345678_Location.png
Sketch	cso/C0001 - Madavale Road (CSO)/Photos/SK12345678_Sketch.png
Location Plan	cso/C0001 - Madavale Road (CSO)/Misc/SK12345678_Plan.png

DATA VALIDATION

Conditional data quality rules dependant on survey status

While checking the data for inconsistencies the above point can be extremely useful. For examle, If the surveyor is unable to raise a manhole, they should still record information including:

Ground Level

Cover Shape

All of these fields should be filled with data. On the other hand, some fields would be impossible to fill in:

Type of manhole (e.g. CSO, Outfall, SPS, Bifurcation, ...)

Chamber depth, shape or dimensions

Shaft depth, shape or dimensions

Where as, if the surveyor was unable to find the manhole, then they won't be able to get any of the information. QA checks can then be built with this information in mind, check for logical inconsistencies and then flag them up for manual review.

These data quality checks could easily be checked using ruby!


        $errors = []
        def ensure(boolean,message)
          if !boolean
            $errors.push(message)
          end
        end
        
        isSurveyed = "user_text_1"
        net = WSApplication.current_network
        mhSurveys = net.row_objects("cams_manhole_survey")
        mhSurveys.map do |row|
          if row[isSurveyed]=="UTR"
            ensure(row.ground_level != nil ,"UTR: Ground level should be surveyed")
            ensure(row.cover_level != nil  ,"UTR: Cover level should be surveyed")
            ensure(row.cover_shape != nil  ,"UTR: Cover shape should be surveyed")
            ensure(row.x != nil            ,"UTR: x should be surveyed")
            ensure(row.y != nil            ,"UTR: y should be surveyed")
            ensure(row.type == nil         ,"UTR: type should not be surveyed")
            ensure(row.shaft_depth == nil  ,"UTR: shaft depth should not be surveyed")
            ensure(row.chamber_depth == nil,"UTR: chamber depth should not be surveyed")
            #...
          elsif row[isSurveyed]=="UTF"
            ensure(row.ground_level != nil ,"UTF: Ground level should be surveyed")
            ensure(row.cover_level  == nil ,"UTF: Cover level should not be surveyed if MH was not found")
            #...
          elsif row[isSurveyed]=="YES"
            #...test everything
          end
        end

The code could be made much more efficient, but this would at least work, and would do so quite well!

Include notes searches in data validation

In numerous cases we found a comment like "Flap valve on 'A' pipe" within the notes field, however no surveyed information about the flap valve itself (width, shape, invert, ...).

Validation algorithms should check for text within surveyed objects, and flag all assets as 'info' warnings, for manual review. A simple ruby script would be the following:


          mhSurveys.map do |row|
            if row.notes[/flap|valve|NRV/i]
              raise_info(row)
            end
            if row.notes[/penstock|sluice|gate/i]
              raise_info(row)
            end
            if row.notes[/hole|hiw|orifice/]
              raise_info(row)
            end
            #...
          end

Choice lists

In InfoNet, choice lists contain lists of valid options for specific fields. They usually map short codes to descriptions. For instance, if you have a "Survey status field", you might have a choice list which allows the user to input "Unable to raise", "Unable to find", "Surveyed", etc. In the choice lists, these options are mapped to codes e.g. "UTR", "UTF", "S", etc. Which are imported into the database via the data input userform.

In InfoNet, you can attach your choice lists to transportable databases, however I would strongly advise against this. There is no way to check whether the choice list attached is correct or not without creating a new database and testing it. Instead of attaching choice lists to the database, simply export the choice list CSV and supply it with the transportable database.

Choice lists are vital if you want InfoNet/InfoWorks Validation Objects to work, so make sure your choice lists are correct before you send them to the contractor!

Connectivity issues

It is fairly common for surveyors to accidentally insert the incorrect manhole reference into the raw survey card. When this data is inputted into the InfoNet database it is vital that the standard declares that connectivity of manholes is checked.

In numerous cases we picked up InfoNet databases with misconnections caused by surveyors incorrectly inputting a number. Validation algorithms should search for this also.

A technique we used to pull out these connectivity issues is by selecting all non-rising main pipes greater than 150 meters. This might extract some false negatives, however, it is always best to do a sense check, in order to make sure the data is correct.

Conclusion

That's all information I have for now. Remember to comment your own ideas on the linked in post!