Similarity matching method of power distribution system operating data based on neural information retrieval

Kai Xiao1,Daoxing Li1,Pengtian Guo1,Xiaohui Wang1,Yong Chen1

1.China Electric Power Research Institute Co.Ltd.,Beijing 100192,P.R.China

Abstract

Operation control of power systems has become challenging as power distribution systems grow in scale and complexity and renewable energy is connected on a large scale. Data-driven operation management, intelligent analysis, and data mining capabilities therefore urgently need improvement. To explore the regularities shared by similar historical operating sections of the power distribution system, and to help the power grid obtain high-value historical operation and maintenance experience and knowledge in a systematic way, a neural information retrieval model with an attention mechanism is proposed based on graph data computing technology. Based on the processing flow of the operating data of the power distribution system, a technical framework for neural information retrieval is established. Exploiting the natural graph characteristics of the power distribution system, a unified graph data structure and methods for data access, data completion, and multi-source data fusion are constructed. Further, a graph node feature-embedding representation learning algorithm and a neural information retrieval algorithm model are constructed. The neural information retrieval algorithm model is trained and tested using the generated set of graph node feature representation vectors. The model is verified on the operating sections of the power distribution system of a provincial grid area. The results show that the proposed method demonstrates high accuracy in the similarity matching of historical operation characteristics and effectively supports intelligent fault diagnosis and elimination in power distribution systems.

Keywords: Neural information retrieval, Power distribution, Graph data, Operating section, Similarity matching.

0 Introduction

Currently, the power grid is expanding in scale with the integration of new energy sources and changes in load characteristics, and power grid operation is becoming more diversified. Fully utilizing power grid topology information and massive operating data, and mining the regularities hidden in the data, are important for the optimal operation and auxiliary decision-making of the power grid. Power distribution networks abroad entered the saturation stage in the 1980s, and distribution automation in the developed countries of the United States, Europe, and Asia has its own characteristics in each region. The Oncor Corporation and Alabama Power Company in the United States have successively built distribution automation systems that mainly meet the needs of active distribution network operation monitoring and management, giving full play to the role of distributed power and optimizing the operation of the distribution network [1,2]. In Europe, the Geographic Information System (GIS) is widely used in power distribution automation, and data applications focus on the timely detection, treatment, and repair of distribution network equipment faults [3]. Japan and Singapore have built advanced distribution automation systems for medium-voltage distribution networks, which collect a large amount of measurement data and report distribution network anomalies in a timely manner to improve power supply reliability and economy [4]. Distribution automation in China started late, and there is a large gap between urban and rural distribution networks. Especially in the context of the rapid development of distributed energy, power network measurement and control involve many objects, many terminal devices, and complex line connections. Therefore, while fully absorbing the technical achievements of developed countries in distribution equipment management, fault monitoring, and data acquisition, work in China mainly uses the massive distribution data collected by a large number of purpose-built systems to carry out fusion analysis and improve the reliability of distribution network operation. The focus is on forecasting, security knowledge acquisition, and power grid stability assessment. Traditional relational databases and machine learning methods have achieved certain results. In [5], the application of data mining technology in short-term load forecasting was analyzed, and the prediction performances of historical data mining technology and traditional methods were compared. The study in [6] applied fuzzy theory in historical data mining to spatial modeling to achieve long-term load forecasting. The study in [7] provides an adaptive feedforward backpropagation neural network method for forecasting the power generation of large-scale photovoltaic power stations; historical data and a radial basis function neural network were used to analyze the data obtained per hour. In [8], the stability of a power system is judged based on historical data, and the problems of rapid fault-set screening and rapid decision-making in emergency situations are solved by improving feature selection and the learning algorithm. However, with the increasing complexity of the sources and loads of the distribution network, the accuracy, generalization ability, and efficiency of the above methods in data retrieval and analysis can be significantly affected by continuous changes in high-dimensional massive data and in the distribution network topology.

A graph database [9] is a non-relational database based on graph theory, first proposed by Google in the United States and used for large-scale linked data search. Unlike relational databases, graph databases do not impose a strict data definition, which makes them conducive to the flexible expansion of data models. In terms of association processing, graph data nodes are physically characterized by "index-free adjacency": the physical address of adjacent data is stored directly in the nodes and edges, and traversal by direct addressing eliminates the overhead of index-based scanning and searching. Graph databases therefore have great technical advantages in high-dimensional complex network data retrieval [10-12]. In terms of data calculation, relational databases only provide conventional statistical analysis functions, while most current graph databases provide graph machine learning and deep learning computing frameworks with complete reasoning capabilities. Graph data computing has attracted extensive attention nationally and internationally owing to its distributed, efficient, and concurrent computing capabilities for large-scale data and its powerful representation of complex dynamic networks [13-15]. The study [16] developed a parallel power flow algorithm suitable for the distribution network, together with analysis software that provides data visualization and management analysis functions based on the graph data model. The study [17] proposed a method to quickly find cascading faults, which applies a graph convolutional neural network to graph data to quickly find and reveal the cause of a fault. The study [18] proposed a full-node parallel iterative algorithm based on the graph data model, which achieved a hundred-millisecond computing speed for a system on the scale of ten thousand nodes. Studies [19-21] analyzed the potential advantages of graph databases in large-scale power systems, proposed a topology description method and grid data model based on graph databases, and verified the advantages of graph databases in search efficiency.

Power grid data retrieval aims to mine valuable information from the data and provide auxiliary decision-making for the safe operation of power systems. In [22], stable operation rules were obtained using a clustering method to provide security warnings for the power system. The study in [23] proposed a fast stability judgment method for a large power grid based on a support vector machine (SVM), which can achieve fast stability judgment by analyzing the massive historical data of the power system. In reference [24], a power grid operating section retrieval method based on the k-means algorithm was proposed. A decision tree was used to reduce the data dimension, and the k-means algorithm was used to improve the similarity matching performance. In reference [25], a power grid operating section retrieval method was proposed based on a convolutional neural network (CNN), which effectively reduced the loss of features and improved the matching accuracy by borrowing from image recognition technology. Among the above retrieval methods, those using clustering and SVM are time-consuming in model training, whereas using a CNN can further improve the retrieval accuracy.

Graph data similarity matching is an NP-complete problem. The subgraph matching algorithms proposed in the literature, such as Ullmann [26], VF2 [27], GADDI [28], GraphQL [29], and SPath [30], are mainly based on indexes or paths for pattern matching. They place high demands on index establishment and maintenance and on path selection, and give little consideration to the attributes of the subgraph itself. Therefore, it is necessary to design an appropriate representation that considers the attributes and relations of the nodes according to the characteristics of the fused power distribution data, and to use an appropriate deep learning algorithm to train the information retrieval model, so as to overcome the sparsity problem of the traditional vector space model and improve the retrieval effect. The study [31] proposed the deep relevance matching model (DRMM), which used word2vec to train word vectors and deep neural networks to rank the results. The study [32] proposed a kernel-based neural ranking model (KNRM) based on the DRMM model, which adjusted the word vectors through Gaussian kernel functions and improved the performance of the model. The study [33] proposed a CNN model for soft-matching n-grams (Conv-KNRM) based on the KNRM model, which represented query and document information as n-grams, generated a similarity matrix by cross-matching the n-gram representations, and produced a similarity score through pooling, scoring, and sorting. At present, most power distribution operation data retrieval is based on relational databases, with a narrow range of data acquisition and limited intelligence. Based on the original data characteristics of power distribution and utilization, this study expands the scope of domain data and constructs an intelligent analysis and retrieval model integrating the attention mechanism and Conv-KNRM (Conv-KNRMA), which not only improves the generalization ability of historical operation data analysis and retrieval for power distribution and utilization but also strengthens the data fusion processing ability of the power distribution and utilization business.

In summary, there are three shortcomings in the retrieval of power distribution operating data. First, the distribution data processing models proposed by researchers are mainly validated in a specific scenario tied to one business system, so their generalization ability is limited. Second, the distribution data are scattered and insufficiently fused, and the data model still needs further improvement. Finally, power distribution data retrieval is mainly aimed at core business system data and gives little consideration to distributed energy data, which affects the retrieval accuracy. It is therefore necessary to fully integrate the data of power distribution and the related integrated energy fields, construct intelligent retrieval technology based on deep learning and graph computing, and improve the generalization and data fusion ability of the retrieval model for historical power distribution operating data.

This study first proposes a neural information retrieval technology framework for similarity matching of operating data in a power distribution system and designs the functions of each layer in the framework in detail. Second, based on this framework, a spatiotemporal sequence graph [...]

1 Neural information retrieval technology frameworks

The framework for neural information retrieval is illustrated in Fig. 1. The technical principle is to associate and fuse the operating section data of the various power grid systems, convert them into a graph data structure, and use a graph representation learning algorithm to generate a low-dimensional dense vector representation of all the features of a section. A supervised learning method is used to construct the section vector matrix representation in a fixed format. A deep CNN of the kind used in the image field is then used to train the section information retrieval model to learn the latent mapping between the sample data and the similarity result, thereby achieving accurate matching of similar power grid sections through neural information retrieval. The framework is divided into four layers: the data access, graph data construction, graph data processing, and grid application layers. The data access layer obtains data related to the measurement of the power distribution system, equipment operation configuration, and other data, and performs data completion and normalization. The graph data construction layer converts the obtained multi-source heterogeneous data into attribute graph data through structured and unstructured description methods, data fusion, and other technologies, and constructs a dataset for model training. The graph data processing layer contains various encapsulated graph machine learning and graph neural network algorithms, including node feature embedding, graph convolutional networks, and graph attention networks, and provides graph data processing algorithms to support business applications. The grid application layer provides intelligent retrieval, data matching, decision response, and customized interface services for data analysis and mining in the power distribution business. Once online, only the trained information retrieval model is needed for an operating section to be matched against historical data and for the relevant processing information to be acquired.

Fig.1 Neural information retrieval technology framework for operating section of the power distribution system

2 Data access and fusion

2.1 Data access

The power distribution system includes the original distribution network, power supply and consumption, distributed new energy, and other networks, and the power grid structure is complex. The operating data mainly come from the measurement and monitoring terminals deployed in each system and include the voltage, active load, current, phase angle, event data, equipment files, configuration parameters, and operation and maintenance records. Considering that the measurement error may increase as the level of statistical aggregation increases, this study preferentially selects the measurement data used for power distribution and consumption assessment and calibration and the data collected by the core integrated energy system. Specifically, phasor measurement unit (PMU) and µPMU data are accessed through SCADA; the measurement and assessment master table data are accessed from the electricity information collection system and the integrated energy service system; topological relations of core nodes such as lines, distribution transformers, districts, and distributed power sources are established based on the connections in the distribution and marketing systems; and topology change information is recorded at the collection frequency.

2.2 Data completion

The success rate of power distribution operating data collection is affected by failures of the power grid communication lines, abnormalities of the master station, and failures of the measurement equipment, which can easily cause packet loss, parsing errors, or missing data, and thus affect the accuracy of data processing and model training. The rule adopted here is that any missing data should first be recollected. When constructing the model training dataset, a section whose missing data cannot be recollected is discarded and reacquired. When constructing the testing dataset, a section with more than 20% of its data missing is discarded and recollected, whereas a section with less than 20% missing has its data filled in.
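The completion rule above can be illustrated with a minimal Python sketch. The function names and the use of `None` to mark a missing measurement are assumptions made for illustration, and the behavior at exactly 20% (treated here as "fill") is not specified in the text.

```python
from typing import Optional, Sequence

MISSING_THRESHOLD = 0.20  # the 20% limit stated in the text


def missing_ratio(section: Sequence[Optional[float]]) -> float:
    """Fraction of measurement slots in one operating section that are missing (None)."""
    if not section:
        raise ValueError("section must contain at least one measurement slot")
    return sum(v is None for v in section) / len(section)


def completion_action(section: Sequence[Optional[float]], for_training: bool) -> str:
    """Return 'keep', 'fill', or 'discard' for a section whose gaps could not be recollected."""
    ratio = missing_ratio(section)
    if ratio == 0:
        return "keep"
    if for_training:
        # Training set: sections with unrecovered gaps are dropped and reacquired.
        return "discard"
    # Test set: drop heavily damaged sections, fill the rest.
    # Assumption: a ratio of exactly 20% is filled rather than discarded.
    return "discard" if ratio > MISSING_THRESHOLD else "fill"
```

For example, a test-set section with one missing value out of six would be filled, whereas the same section in the training set would be discarded.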

Considering that the historical changes in the same data item across adjacent sections of the power distribution system show a certain regularity, and that the data items of the same section are constrained by the physical laws of the power grid, the variation patterns of the voltage, load, and current data of adjacent nodes on the same branch are similar. As both the load and current are related to the voltage, the voltage values of the adjacent nodes before and after the missing node over consecutive sections can be used for filling. As these are all structured data, interpolation can be used to complete the missing values of the nodes. Taking the voltage data as an example, the specific method is as follows:

where $u_{i,t}$ is the voltage value of node $i$ at time $t$, $n$ is the number of nodes on the same line as node $i$, and the two bounds are the minimum and maximum measurements of node $i$ over the $m$ historical operating sections, respectively. Missing values such as the load, current, and phase angle can be derived from the physical characteristics of the electrical quantities.
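As a rough illustration of this kind of completion, the sketch below estimates a missing voltage from the adjacent nodes on the same line and keeps the estimate within the node's own historical range. It is not the paper's interpolation formula: the simple mean and the clipping step are assumptions chosen only to show how neighbor values and historical bounds can be combined, and `fill_voltage` is a hypothetical name.

```python
from typing import Sequence


def fill_voltage(neighbor_values: Sequence[float], history: Sequence[float]) -> float:
    """Estimate a missing voltage for one node at one time step.

    neighbor_values: voltages of adjacent nodes on the same line at that time step.
    history: the node's own voltages over the last m operating sections.
    """
    if not neighbor_values or not history:
        raise ValueError("need at least one neighbour value and one historical value")
    # Assumption: the plain mean of the neighbours stands in for the interpolation.
    estimate = sum(neighbor_values) / len(neighbor_values)
    # Keep the estimate inside the node's historical minimum and maximum.
    lowest, highest = min(history), max(history)
    return min(max(estimate, lowest), highest)
```

An estimate that falls outside the historical range is pulled back to the nearest bound, which guards against filling a gap with a physically implausible value.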

Moreover,semi-structured and unstructured data,such as work tickets,fault descriptions,missing records,images,videos,and other data,are directly extracted from the graph database.Based on the time-series characteristics of the operating section,the section nodes and time-series nodes are connected as the related knowledge results of similarity matching retrieval results,which provide a historical processing experience reference for the current operating section.

2.3 Data fusion

The power distribution system operating section is a series of sequential and continuous datasets in time;therefore,it is necessary to establish an efficient linked data structure to improve the data processing performance of the framework.The physical connection and operation of the power distribution system has natural graphical characteristics.Therefore,this study adopts the graph data technology of fusion time series to establish a data path between multiple systems to realize data fusion of spatiotemporal information integration.

Equipment data are the master data in some core systems and can effectively connect other core data within their respective business scopes. Therefore, the equipment and wiring relations are converted into a graph data model, and the systems are associated through shared key equipment nodes, thereby forming a unified graph data structure of the power distribution system. After analyzing the data models of the Production Management System (PMS), Electricity Information Collection System (EICS), and Comprehensive Energy-related Business Systems (CEBS), it is concluded that distribution transformers, districts, lines, and distributed power sources are the core nodes of power distribution data fusion. For the time-series data generated by the devices, a time object network $G(V, E)$ is established, where $V$ is the set of year, month, day, hour, and minute nodes, and $E$ is the set of relationships between year and month, month and day, day and hour, and hour and minute. The specific collection-time node is used to connect the summary table data of the distribution transformers, districts, lines, and distributed power sources. By establishing edges between the equipment and the time nodes, the fused data of power distribution and distributed energy are formed as graph data. The distributed computing and index-free adjacency features of graph data can then be fully utilized to improve the data processing performance after data fusion.
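The time object network can be sketched in a few lines of Python. This is only an illustration using plain sets rather than a graph database: the node identifiers, the device names, and the helper functions are hypothetical, and the chain year, month, day, hour, minute follows the edge relationships listed above.

```python
from datetime import datetime
from typing import List, Set, Tuple

Edge = Tuple[str, str]


def time_chain(ts: datetime) -> List[str]:
    """Node ids for the year -> month -> day -> hour -> minute hierarchy of a timestamp."""
    return [
        ts.strftime("%Y"),
        ts.strftime("%Y-%m"),
        ts.strftime("%Y-%m-%d"),
        ts.strftime("%Y-%m-%d %H"),
        ts.strftime("%Y-%m-%d %H:%M"),
    ]


def add_measurement(nodes: Set[str], edges: Set[Edge], device: str, ts: datetime) -> None:
    """Attach a device node to the time node at which its data were collected."""
    chain = time_chain(ts)
    nodes.update(chain)
    nodes.add(device)
    # Edges between consecutive levels of the time hierarchy.
    edges.update(zip(chain, chain[1:]))
    # Edge between the equipment node and the specific collection-time node.
    edges.add((device, chain[-1]))


nodes: Set[str] = set()
edges: Set[Edge] = set()
add_measurement(nodes, edges, "transformer_T1", datetime(2021, 3, 5, 14, 15))
add_measurement(nodes, edges, "line_L7", datetime(2021, 3, 5, 14, 30))
```

Because both measurements fall in the same hour, they share the year, month, day, and hour nodes, so devices from different systems become reachable from one another through the time hierarchy.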

The relationships between the graph nodes of the power distribution operating data fusion graph are presented in Table 1.

Table 1 The graph nodes relation of the power distribution system operating section

2.4 Data normalization

As the power distribution system operates in a dynamically stable state, the measurement data of each node on different operating sections are very close, and the changes are small. If the original measurement data were used directly to embed the node feature vectors for subsequent information retrieval model training, training would become costly and complex, and the model's retrieval and matching performance would suffer. To eliminate the negative influence of possible outliers in the measurement data, the measurement data need to be rescaled proportionally. In this study, the measurement values are normalized to accentuate the differences between the data. Normalization speeds up model training and testing, keeps the feature values in a relatively small range, and avoids memory overflow during model training. Its disadvantage is that the data quality requirements are high, and the data samples must have the same range of variation. The data normalization process is as follows:

where $u'$ is the normalized voltage value, $u_{cur}$ is the original voltage value, and $u_{max}$ and $u_{min}$ are the maximum and minimum, respectively, of all the historical section voltage values of the node being normalized.
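Given these definitions, the normalization is most plausibly the standard min-max form, $u' = (u_{cur} - u_{min}) / (u_{max} - u_{min})$. The sketch below assumes that form; the function name and the handling of a constant history (mapped to 0) are illustrative assumptions.

```python
def min_max_normalize(u_cur: float, u_min: float, u_max: float) -> float:
    """Scale a voltage into [0, 1] using the node's historical minimum and maximum."""
    if u_max == u_min:
        # Assumption: a node whose history never varies is mapped to 0.
        return 0.0
    return (u_cur - u_min) / (u_max - u_min)
```

A node whose voltage has ranged from 10.0 to 10.4 and currently reads 10.2 is mapped to about 0.5, so small absolute differences between sections become clearly separated values.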

3 Feature embedding and matching method

3.1 Graph node feature embeddings

Aiming at the established data structure of the operating section sequence graph of the power distribution system,this study proposes a graph node embedding representation method that fuses prior features.By transforming the high-dimensional sparse adjacency matrix of the graph node into a low-dimensional dense vector form that fuses the characteristics of the node itself and the network characteristics,fast and efficient graph data similarity matching can be achieved through vector operations.

For the constructed sequence graph data,the maximum mutual information algorithm is used to filter out the attribute set that is most related to the target reasoning,and the attribute set is then used to filter out the key subgraphs in the whole graph.DeepWalk[34]and word2vec[35]are primarily used for data feature embedding.The specific process is as follows:

(1) The known attributes of the power distribution operating sections and equipment nodes, including the rated power, rated voltage, rated current, capacitance, inductance, and other data, are taken as prior knowledge. Graph representation learning is converted into word representation learning by the DeepWalk algorithm to obtain the node features of the temporal graph. An objective function that considers both the network similarity and the attribute similarity of graph nodes is designed, and the low-dimensional representation vectors are obtained by iteratively optimizing this function. A skip-gram model, as used in natural language processing, is used to optimize and express the joint probability model of network features between nodes, and the known node attribute features are integrated into the model as follows:

(2) As the time complexity of the normalization term in (4) is very high, negative sampling is used to convert it into a low-complexity expression. It is replaced by the following function, which is maximized to learn the representation vectors of the nodes in the graph:

(4) The batch stochastic gradient descent method adjusts the parameter $\Phi$ until the model converges, and the vector $f_j$ is spliced with $\Phi(v_i)$ to obtain the vector representation of the node feature embedding.
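The data preparation behind these steps can be sketched as follows. The code only shows how DeepWalk-style random walks turn a graph into "sentences", how skip-gram (centre, context) pairs are drawn from a walk, and how an attribute vector is spliced with a learned embedding; the training of $\Phi$ with negative sampling is omitted. All function names, the toy adjacency list, and the splice order are assumptions for illustration.

```python
import random
from typing import Dict, List, Sequence, Tuple


def random_walks(adj: Dict[str, List[str]], walk_length: int,
                 walks_per_node: int, seed: int = 0) -> List[List[str]]:
    """Generate truncated uniform random walks starting from every node."""
    rng = random.Random(seed)
    walks = []
    for _ in range(walks_per_node):
        for start in sorted(adj):
            walk = [start]
            while len(walk) < walk_length:
                neighbours = adj[walk[-1]]
                if not neighbours:
                    break  # dead end: stop this walk early
                walk.append(rng.choice(neighbours))
            walks.append(walk)
    return walks


def skipgram_pairs(walk: Sequence[str], window: int) -> List[Tuple[str, str]]:
    """(centre, context) pairs within a sliding window, as consumed by skip-gram."""
    pairs = []
    for i, centre in enumerate(walk):
        for j in range(max(0, i - window), min(len(walk), i + window + 1)):
            if j != i:
                pairs.append((centre, walk[j]))
    return pairs


def fuse(attributes: Sequence[float], embedding: Sequence[float]) -> List[float]:
    """Splice the prior attribute vector with the learned embedding (order assumed)."""
    return list(attributes) + list(embedding)


adj = {"bus": ["line"], "line": ["bus", "transformer"], "transformer": ["line"]}
walks = random_walks(adj, walk_length=4, walks_per_node=2)
```

Each walk plays the role of a sentence and each node the role of a word, which is what allows a word2vec-style model to be reused for graph nodes.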

3.2 Attention mechanism

The attention mechanism focuses on key features: important features are assigned correspondingly larger weights so that limited computing resources are allocated optimally and used efficiently. The attention mechanism essentially maps a query onto a set of key-value pairs in the source, and the similarity between the query and each key is used as the weight of the corresponding value:

where Source is the dataset or information source to be processed, which in this study refers to the fused feature vectors of all graph data nodes; Query is the known information currently input to the algorithm, which refers to the fused feature vector of the operating section to be matched; and $Key_i$ and $Value_i$ are the primary key and feature vector of the $i$-th neighbor node, respectively. According to the core processing logic of the attention mechanism, the correlation between the feature vector to be matched and each neighboring node is used as a weight, and the features of all nodes are weighted, summed, and sorted, so that the most important node features are preserved and used preferentially when the system has insufficient computing power or must meet high performance requirements. The attention mechanism can improve the performance of a graph CNN [36-38]. When the convolution and pooling layers of a CNN extract deep features, the attention mechanism focuses the model's learning on useful information, thereby reducing the computational complexity of large-scale operating section similarity matching and effectively improving the learning performance of neural information retrieval models.
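The query, key, and value computation described here can be sketched as a weighted sum. The text only says that the query-key similarity is used as the weight; the dot-product similarity and the softmax normalization below are common choices assumed for illustration, and the function names are hypothetical.

```python
import math
from typing import List, Sequence, Tuple


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


def softmax(scores: Sequence[float]) -> List[float]:
    """Turn raw similarity scores into weights that sum to 1."""
    peak = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention(query: Sequence[float], keys: Sequence[Sequence[float]],
              values: Sequence[Sequence[float]]) -> Tuple[List[float], List[float]]:
    """Return the weighted sum of the values and the weight given to each key."""
    weights = softmax([dot(query, k) for k in keys])
    dim = len(values[0])
    output = [sum(w * v[d] for w, v in zip(weights, values)) for d in range(dim)]
    return output, weights


out, weights = attention([1.0, 0.0], [[1.0, 0.0], [0.0, 1.0]], [[2.0], [4.0]])
```

The key that is more similar to the query receives the larger weight, so its value dominates the output; sorting the weights also gives the order in which node features would be prioritized.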

4 Neural information retrieval model for similarity matching of power distribution operating data

Neural information retrieval is a similarity matching model based on an improved Conv-KNRM model,and its output is the ranking information based on the similarity score.The Conv-KNRM model includes the embedding,convolutional,cross-matching,pooling,fully connected,and training decision layers.First,the query information and the retrieved data are converted into embedding matrices through the embedding layer,and then two types of convolution kernels(unigrams and bigrams)are used to represent the n-gram of the query information and the data to be retrieved,which are then processed with different filters to obtain a new embedding matrix.Then,through the cross-matching layer,the feature vectors in the embedding matrix are matched to each other to obtain a matching matrix.Finally,it is scored and sorted through the pooling,fully connected,and training decision layers.
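The cross-matching and pooling steps can be illustrated with a small sketch. It assumes that the n-gram vectors have already been produced by the convolutional layer, uses cosine similarity for the matching matrix, and applies Gaussian kernel pooling in the style of KNRM; the kernel centres, the width `sigma`, and the function names are illustrative assumptions, and the learned ranking layer is omitted.

```python
import math
from typing import List, Sequence


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity, defined as 0 when either vector is all zeros."""
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return sum(x * y for x, y in zip(a, b)) / (norm_a * norm_b)


def match_matrix(query_vecs: Sequence[Sequence[float]],
                 doc_vecs: Sequence[Sequence[float]]) -> List[List[float]]:
    """Cross-match every query n-gram vector with every candidate n-gram vector."""
    return [[cosine(q, d) for d in doc_vecs] for q in query_vecs]


def kernel_pooling(matrix: Sequence[Sequence[float]], mus: Sequence[float],
                   sigma: float) -> List[float]:
    """One soft-match feature per Gaussian kernel, summed in log space over query rows."""
    features = []
    for mu in mus:
        total = 0.0
        for row in matrix:
            soft_count = sum(math.exp(-((s - mu) ** 2) / (2 * sigma ** 2)) for s in row)
            total += math.log(max(soft_count, 1e-10))  # clamp to avoid log(0)
        features.append(total)
    return features


query = [[1.0, 0.0], [0.0, 1.0]]
candidate = [[1.0, 0.0], [1.0, 1.0]]
features = kernel_pooling(match_matrix(query, candidate), mus=[1.0, 0.0], sigma=0.1)
```

Each kernel counts, softly, how many matches fall near a given similarity level, so the resulting feature vector summarizes the whole matching matrix for the later scoring layers.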

This study adopted the Conv-KNRM model and proposed a neural information retrieval model framework based on the attention mechanism(Conv-KNRMA),as shown in Fig.2.The performance of the model was improved by adding an attention mechanism(Attention Pooling)to the Conv-KNRM model.

Fig.2 Neural information retrieval model framework

The neural information retrieval model is divided into embedding, convolution, cross-matching, pooling, fully connected, and training decision layers. The embedding layer is used for vector transformation of sample data or retrieval information; the convolution layer is used to train the feature weights and obtain the required key sampling values; the cross-matching layer cross-matches the retrieval vector against the vector set and performs soft matching in multiple dimensions. The pooling layer uses the attention mechanism to assign weights to important features, the fully connected layer calculates the degree of similarity of the important features of the section, and the training decision layer produces the final minimum-distance similarity result. The core idea of the neural information retrieval algorithm is to cross the query graph data with the graph data to be retrieved and feed the result into the CNN convolutional and pooling layers, which screen the vector features and extract the key feature vectors. The model is trained on the existing samples and key feature vectors, and the graph retrieval and matching model is output after multiple iterations. This model scores all candidate entities on the graph against the query entities entered by the user and returns the highest-scoring candidates. The algorithm takes the feature vector set of the operating section graph formed after data access, processing, and graph node feature embedding, and randomly splits it into training and test sets. The algorithm training process is as follows:

(1) Assuming that the retrieval vector is $T_q$ and the vector to be retrieved is $T_d$, calculate the features of the h-gram:

where $w_h$ and $b_h$ are weights and $h$ is the size of the convolution sliding window.

(2) Then, calculate the similarity between:

Compared with the existing subgraph [...]

5 Application for data retrieval of historical operating section in power distribution system

5.1 Application scenario design

The power distribution system is an important part of ensuring power safety and supply at the end of the power system. After years of operation, the information systems related to power distribution and consumption have accumulated rich historical operating data and fault processing records. The business scope of the power distribution system and the physical characteristics of the power grid mean that the power grid operation characteristics and fault types in this field are limited. Therefore, constructing an efficient similarity matching technology for massive historical operating sections can provide historical experience for the management and maintenance of the current power distribution system, and its value grows as power grid complexity increases and historical data continue to accumulate.

Based on the neural information retrieval algorithm proposed above, an algorithm model development component and a graph computation component provided by a mainstream artificial intelligence framework in China were used to construct a sequence graph matching retrieval technology for the historical operating sections of the power distribution system, as shown in Fig. 3. First, the load, voltage, current, and topology data of the equipment within the scope of the power distribution system to be retrieved are converted into sequence graph data. Second, the graph data are processed to obtain the node feature vectors and realize feature representation learning of the power grid operating section. Third, the real-time power grid operating section and the historical power grid operating sections are expressed as n-grams. Finally, the neural information retrieval algorithm performs similarity matching, and the top n results are returned according to the matching scores.
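The final step, returning the top n historical sections by matching score, can be sketched as follows. The section identifiers and scores are made up for illustration, and `top_n` is a hypothetical helper; in practice the scores would come from the trained retrieval model.

```python
import heapq
from typing import Dict, List, Tuple


def top_n(scores: Dict[str, float], n: int) -> List[Tuple[str, float]]:
    """Return the n historical sections with the highest matching scores, best first."""
    return heapq.nlargest(n, scores.items(), key=lambda item: item[1])


# Hypothetical matching scores between the current section and three historical ones.
example_scores = {"section_A": 0.2, "section_B": 0.9, "section_C": 0.5}
best = top_n(example_scores, 2)
```

Using a heap avoids sorting the full score list, which matters when the number of historical sections is much larger than n.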

Fig.3 Retrieval process of power distribution historical operating section

5.2 Application result evaluation

To verify the feasibility of the technology, this study obtains six months of historical operating sections at 15-minute resolution for the distribution backbone network in the area of a provincial company of the State Grid Corporation of China, and integrates them with data from the PMS, SCADA, EICS, and CEBS to obtain the associated transformer, bus, line, PV, and wind power measurement data for the same period. The generator active and reactive power, generator power factor, node active and reactive load, load power factor, node voltage amplitude, node voltage angle, reactive power compensation power, new energy active and reactive power, line active and reactive power flow, DC line current, and other quantities were selected as features. The maximum, minimum, and average values were calculated, and 17280 fused operating section samples were formed. The original data and calculation results were stored in the Nebula graph database.

By dividing the sample set into training and test sets at ratios of 6:4, 7:3, 8:2, and 9:1, different training schemes were configured to statistically analyze the similarity matching accuracy of the model. In the model training process, this study used a server equipped with two Tesla V100s GPUs, and the different model training schemes used the same epoch, batch, and iteration parameters. The normalized discounted cumulative gain (NDCG) metric, commonly used for recommendation systems, was used to evaluate the ranking quality of each model. Considering the workload of business personnel in analyzing the matching results, this study mainly used the NDCG@1 and NDCG@3 indicators. The training and testing results are presented in Table 2.
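For reference, NDCG@k can be computed as in the sketch below. It uses the linear-gain form of DCG (some implementations use an exponential gain instead), and the relevance lists in the example are made up for illustration rather than taken from the experiments.

```python
import math
from typing import Sequence


def dcg_at_k(relevances: Sequence[float], k: int) -> float:
    """Discounted cumulative gain of the first k results, in ranked order."""
    return sum(rel / math.log2(rank + 2) for rank, rel in enumerate(relevances[:k]))


def ndcg_at_k(relevances: Sequence[float], k: int) -> float:
    """DCG@k divided by the DCG@k of the ideal (descending) ordering."""
    ideal = dcg_at_k(sorted(relevances, reverse=True), k)
    return dcg_at_k(relevances, k) / ideal if ideal > 0 else 0.0


# Relevance of the returned sections in ranked order (1 = truly similar, 0 = not).
ranked = [0, 1, 0]
ndcg1 = ndcg_at_k(ranked, 1)
ndcg3 = ndcg_at_k(ranked, 3)
```

In this example the only similar section is returned in second place, so NDCG@1 is 0 while NDCG@3 is positive, which shows why the two indicators are reported together.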

Table 2 Neural information retrieval model test results

The trained model supports matching-based retrieval of spatiotemporal sequence graph data that integrate entity attribute and graph topology information. Under all five training and testing schemes above, the similarity matching analysis of the neural information retrieval model returns results within 8 s, which meets the performance requirements of business analysis.

According to the above test results, taking the first column as an example, the ratio of the training set to the test set in Testing1 is 6:4. In this scheme, the NDCG@1 and NDCG@3 indices of all neural information retrieval models are at their lowest, and they increase gradually as more matching results are returned. Overall, as the ratio of the training set to the test set increases, the matching accuracy improves significantly, suggesting that the state types of the operating section of the power distribution system form a finite set. In Testing4, the power of a single distribution transformer is increased or decreased by 5% in 2% of the samples in the training set. The NDCG@1 and NDCG@3 indices of all neural information retrieval models decrease, but those of Conv-KNRMA decrease less than those of Conv-KNRM, indicating that the attention mechanism can improve the generalization ability of the model. However, the experimental results did not reach high accuracy. We speculate that the dataset does not cover the vast majority of the state types of the operating section of the power grid, or that the number of recommendations is too small. In the future, we will consider obtaining historical operating sections over a longer time span, retraining the model on an enlarged training dataset, and adjusting the attention weights to further improve the similarity matching accuracy of the model.
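The perturbation used in Testing4 can be sketched as follows, under one possible reading of the description: 2% of the training samples are chosen at random, and in each chosen sample the power of one randomly selected distribution transformer is scaled by +5% or -5%. The sample layout (a list of transformer powers per sample), the random selection, and the function name are assumptions made for illustration and are not taken from the study.

```python
import random


def perturb_training_set(samples, fraction=0.02, change=0.05, seed=0):
    """Return a perturbed copy of the samples and the indices that were changed.

    Each sample is a list of transformer powers. In the selected samples,
    exactly one transformer power is scaled by (1 + change) or (1 - change).
    """
    rng = random.Random(seed)
    perturbed = [list(sample) for sample in samples]  # leave the input untouched
    n_perturb = round(len(samples) * fraction)
    chosen = rng.sample(range(len(samples)), n_perturb)
    for idx in chosen:
        transformer = rng.randrange(len(perturbed[idx]))
        sign = rng.choice((1, -1))
        perturbed[idx][transformer] *= 1 + sign * change
    return perturbed, sorted(chosen)


# Hypothetical training set: 100 samples with three transformer powers (kW) each.
training = [[100.0, 200.0, 300.0] for _ in range(100)]
perturbed, changed = perturb_training_set(training)
print(changed)
```

Comparing the ranking metrics of a model trained on the original samples with those of a model trained on the perturbed samples then gives a simple indication of robustness to small measurement changes.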

6 Conclusion

Currently, the application of graph data analysis computing technology and graph neural network algorithms in the field of electric power is still in the initial exploration stage, and the various graph neural network models constructed are only applicable to specific scenarios. First, this study proposes a neural information retrieval technology framework, based on graph data and a graph neural network algorithm, that integrates an attention mechanism and targets the scattered operating data of the power distribution system. Second, the method and process of data access and processing are established, and a fusion graph data structure for the operating data of the power distribution system and distributed energy is designed. Third, combined with machine learning and deep learning algorithms, a low-dimensional vector embedding representation learning algorithm that combines the node characteristics of the historical operating section with related features is constructed, and a neural information retrieval algorithm model based on graph vector similarity matching is proposed. Finally, the neural information retrieval algorithm model is trained and tested using six months of fusion operating section data from a distribution backbone network in a certain province. A comparative analysis of multiple model results verifies that the model performs better in the similarity matching of historical operating sections.

Theoretically, the neural information retrieval method proposed in this study depends heavily on the method used to generate graph representation vectors from multi-source fusion data in the distribution field. When the retrieval dataset grows substantially, the efficiency of generating incremental training data for the model drops considerably. Therefore, designing a fast method for generating graph representation vectors will be the focus of our future research.

Acknowledgments

This study was supported by the National Key R&D Program of China(2020YFB0905900).

Declaration of Competing Interest

We declare that we have no conflict of interest.

Received: 8 October 2022/Accepted: 16 January 2023/Published: 25 February 2023

Kai Xiao

xiaokai1@epri.sgcc.com.cn

Daoxing Li

lidaoxin@epri.sgcc.com.cn

Xiaohui Wang

wangxiaohui@epri.sgcc.com.cn

Pengtian Guo

guopengtian@epri.sgcc.com.cn

Yong Chen

ychen@epri.sgcc.com.cn

2096-5117/© 2023 Global Energy Interconnection Development and Cooperation Organization. Production and hosting by Elsevier B.V. on behalf of KeAi Communications Co., Ltd. This is an open access article under the CC BY-NC-ND license (http://creativecommons.org/licenses/by-nc-nd/4.0/).

Biographies

Kai Xiao received a master’s degree from North China Electric Power University, Baoding, in 2013. He works at China Electric Power Research Institute Co., Ltd. His research interests include power big data technology, power graph computing, and power marketing business.

Daoxing Li received a master’s degree from North China Electric Power University, Beijing, in 2021. He works at China Electric Power Research Institute Co., Ltd. His research interests include artificial intelligence and graph computing.

Pengtian Guo received a master’s degree from North China Electric Power University, Beijing, in 2020. He works at China Electric Power Research Institute Co., Ltd. His research interests include the power Internet of things and artificial intelligence.

Xiaohui Wang received a doctoral degree from North China Electric Power University, Beijing, in 2012. He currently works at China Electric Power Research Institute Co., Ltd., Beijing. His research interests include power big data technology, artificial intelligence, active distribution networks, and the energy internet.

Yong Chen received a doctoral degree from Huazhong University of Science and Technology, Wuhan. He works at China Electric Power Research Institute Co., Ltd. His research interests include high-performance computing and artificial intelligence.

(Editor Dawei Wang)