KDD2019 Research Track Session RT15: Mining in Emerging Applications II

DataMining

Last updated at 2019-08-12, posted at 2019-08-07

Research Track Session RT15: Mining in Emerging Applications II – Summit 4, Ground Level, Egan Center

Chair: Petko Bogdanov

Optimizing Impression Counts for Outdoor Advertising

Yipeng Zhang (RMIT University); Yuchen Li (Singapore Management University); Zhifeng Bao (RMIT University); Songsong Mo
(Wuhan University & RMIT University); Ping Zhang (Huawei)

How does a billboard impress an audience?
Influence Measurement
The logistic function (Advertising market and Customer behavior)
The effectiveness of advertisement repetition varies from one person to another.

Influence measurement is not submodular

no approximation ratio for a greedy-based algorithm

NP-hard to approximate within any constant factor
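The non-submodularity noted above can be seen in a small sketch. It assumes a logistic influence curve over the number of impressions; the function name `influence` and the parameters `alpha` and `beta` are hypothetical and not taken from the talk. With an S-shaped curve, the marginal gain of one more impression first grows and only later shrinks, so the diminishing-returns property that greedy guarantees rely on does not hold.

```python
import math

def influence(impressions: int, alpha: float = 4.0, beta: float = 1.5) -> float:
    """Hypothetical logistic influence of seeing an ad `impressions` times.

    alpha and beta are illustrative values, not parameters from the paper.
    """
    if impressions <= 0:
        return 0.0  # assumption: no impression means no influence
    return 1.0 / (1.0 + math.exp(alpha - beta * impressions))

# Marginal gain of the (k+1)-th impression for k = 0..4.
gains = [influence(k + 1) - influence(k) for k in range(5)]

# A submodular function would have non-increasing marginal gains.
is_diminishing = all(gains[k + 1] <= gains[k] for k in range(len(gains) - 1))
print(gains, is_diminishing)
```

Here the second and third impressions add more than the first, which is why `is_diminishing` comes out false for these parameters.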

Upper-bound estimation
Branch-and-Bound Framework

optimization

BBS: Branch-and-Bound Framework
PBBS: Branch-and-Bound Framework with Progressive Bound-Estimation
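The general shape of a branch-and-bound search under a budget can be sketched as follows. This is a toy, not the paper's BBS or PBBS: it assumes each billboard has an independent additive value and a positive cost, and it uses the classic fractional-knapsack relaxation as the upper bound. The paper's objective is the non-additive influence over trajectories and its bound estimation is not reproduced here. All names (`branch_and_bound`, `values`, `costs`) are hypothetical.

```python
from typing import List, Tuple

def branch_and_bound(values: List[float], costs: List[float],
                     budget: float) -> Tuple[float, List[int]]:
    """Pick items maximizing total value within budget (costs must be > 0)."""
    # Visit items by value density so the fractional bound below is valid.
    order = sorted(range(len(values)),
                   key=lambda i: values[i] / costs[i], reverse=True)
    best_value = 0.0
    best_set: List[int] = []

    def upper_bound(pos: int, value: float, remaining: float) -> float:
        # Fractional relaxation: fill the remaining budget greedily,
        # allowing a fraction of the first item that does not fit.
        bound = value
        for i in order[pos:]:
            if costs[i] <= remaining:
                remaining -= costs[i]
                bound += values[i]
            else:
                bound += values[i] * remaining / costs[i]
                break
        return bound

    def search(pos: int, value: float, remaining: float,
               chosen: List[int]) -> None:
        nonlocal best_value, best_set
        if value > best_value:
            best_value, best_set = value, sorted(chosen)
        # Prune when even the optimistic bound cannot beat the incumbent.
        if pos == len(order) or upper_bound(pos, value, remaining) <= best_value:
            return
        i = order[pos]
        if costs[i] <= remaining:  # branch 1: take item i
            search(pos + 1, value + values[i], remaining - costs[i], chosen + [i])
        search(pos + 1, value, remaining, chosen)  # branch 2: skip item i

    search(0, 0.0, budget, [])
    return best_value, best_set

best, picked = branch_and_bound([60, 100, 120], [10, 20, 30], budget=50)
print(best, picked)
```

In this example the densest item (index 0) is not part of the best set, which a density-ordered greedy pass would miss; the bound lets the search discard the last branch without expanding it.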

Data: trajectory dataset: NYC, LA
experiment: algorithm
greedy,top-k, BBS, PBBS, LazyProbe
Varying the budget B
Varying the number of Trajectories
Scalability test in NYC
Comparison with LazyProbe

Conclusion

real problem
meet more than one billboard in each travel
non-uniform cost of billboards
budget
real solution
while having the approximation guarantee
real-world trajectory dataset and billboard

Three-Dimensional Stable Matching Problem for Spatial Crowdsourcing Platforms

Boyang Li (Northeastern University); Yurong Cheng (Beijing Institute of Technology); Ye Yuan (Northeastern University); Guoren
Wang (Beijing Institute of Technology); Lei Chen (The Hong Kong University of Science and Technology)

No talk

Hidden POI Ranking with Spatial Crowdsourcing

Yue Cui (University of Electronic Science and Technology of China); Liwei Deng (University of Electronic Science and Technology
of China); Yan Zhao (Soochow University); Bin Yao (Shanghai Jiao Tong University); Vincent W. Zheng (WeBank); Kai Zheng
(University of Electronic Science and Technology of China)

Inequity of business market
how to generate candidate tasks and distribute proper tasks to proper workers, with the goal of aggregating an accurate ranking for comparable H-POIs

H-POI ranking consideration

Maintain effectiveness
worker quality
task generation
Improve efficiency
Budget
Time

Hidden POI ranking with spatial crowdsourcing

Preview: key Definitions

spatial task
comparable H-POI
Valid Task Set(VTS)

Minimum VTS (MinVTS) Greedy Search Algorithm
Find MinVTS
Worker reliability
category reliability
Adapted from metapath2vec
Three meta paths are designed on check-in graph
area reliability
adapted from X-means: AIC BIC based X-means clustering

adaptive number of clusters can be selected
simplify reliability calculation in an internal based approach

Ranking aggregation
main purpose: Predict a gold-standard ranking that hinges on combining pairwise comparisons via crowdsourcing

closely related work: crowdBT(Chen et al.)
Tree-constrained Skip (TCS) Search
construct minimum spanning trees (MSTs)

with H-POI ranking supervised(TCSS)
information gain
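Aggregating a ranking from pairwise comparisons can be illustrated with the plain Bradley-Terry model, which CrowdBT is commonly described as extending with worker quality. The sketch below fits Bradley-Terry scores with the standard minorization-maximization update and ignores worker reliability entirely, so it is not CrowdBT and not the method from the talk. The function name and the toy `wins` matrix are hypothetical, and the code assumes every item has won at least one comparison.

```python
from typing import List

def bradley_terry(wins: List[List[int]], iterations: int = 200) -> List[float]:
    """Estimate Bradley-Terry scores; wins[i][j] = times item i beat item j.

    Assumes every item has at least one win, so no score collapses to zero.
    """
    n = len(wins)
    scores = [1.0 / n] * n
    for _ in range(iterations):
        updated = []
        for i in range(n):
            total_wins = sum(wins[i])
            # Sum over opponents of (games played) / (score_i + score_j).
            denom = sum((wins[i][j] + wins[j][i]) / (scores[i] + scores[j])
                        for j in range(n) if j != i)
            updated.append(total_wins / denom if denom > 0 else scores[i])
        norm = sum(updated)
        scores = [s / norm for s in updated]  # keep scores summing to 1
    return scores

# Toy comparisons among three hidden POIs (hypothetical counts).
wins = [
    [0, 3, 4],  # POI 0 beat POI 1 three times and POI 2 four times
    [1, 0, 3],
    [1, 1, 0],
]
scores = bradley_terry(wins)
ranking = sorted(range(len(scores)), key=lambda i: -scores[i])
print(scores, ranking)
```

The resulting scores order the items by how consistently they win, which is the "gold-standard ranking" role the aggregation step plays in the notes above.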

data: yelp-dataset
Effect of sampling number
Effect of k(kappa)
Effect of H-POI number

Conclusion

Analyzed the necessity of H-POI exploration
Proposed a method that can aggregate an H-POI ranking from pairwise comparisons by the crowd
The proposed approach can greatly reduce text pair searching time cost

Hidden Markov Contour Tree: A Spatial Structured Model for Hydrological Applications

Zhe Jiang (University of Alabama); Arpan Man Sainju (University of Alabama)

Spatial structured prediction

structured prediction
syntax tree in linguistics, music note sequence
spatial structured prediction
terrain map in hydrology
potential energy landscape in material science

Data-Driven Approach: The Fourth Paradigm of Scientific Discovery
The pitfalls of Data-driven approach

Generalizability
Interpretability
Reproducibility
scalability
Limited ground truth

flood mapping in Disaster Response
flood mapping in Hydrology
National water forecasting
In: spatial raster framework
Out: A spatial classification model
Objective: Minimize classification errors
Constraints: explanatory feature layers contain noise

challenges

noise and obstacles in imagery: clouds, shadows, tree canopies
special structure on 3D surface
large data volumes

related works

spatial proximity
- MRF, CRF, SAR
- Gaussian process(Kriging)
- CNN, Embedding
  spatial network
- network kriging
geometric DL
Graph DL

Contour Tree on 3D surface
Contour tree (poly-tree structure)
Collapsed contour tree
Hidden Markov contour tree
Hidden Markov Contour Tree(HMCT)

probabilistic graphical model
Assumptions:
feature is Gaussian
class transitional probability
HMCT: Parameter Learning
Approach: EM algorithm
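The two assumptions above (Gaussian features per class, and a class transition probability between neighboring nodes) can be combined in a very reduced sketch. It simplifies the contour tree to a single chain of nodes and runs only forward filtering with fixed parameters; it is neither the paper's inference algorithm nor its EM learning. The class labels (0 = dry, 1 = flood), the feature values, and all parameters are hypothetical.

```python
import math
from typing import List

def gaussian_pdf(x: float, mean: float, std: float) -> float:
    """Density of a 1-D Gaussian."""
    return math.exp(-0.5 * ((x - mean) / std) ** 2) / (std * math.sqrt(2 * math.pi))

def forward_posteriors(features: List[float], means: List[float],
                       stds: List[float], transition: List[List[float]],
                       prior: List[float]) -> List[List[float]]:
    """Filtered P(class | features seen so far) along a chain of nodes."""
    posteriors = []
    belief = list(prior)
    for t, x in enumerate(features):
        if t > 0:
            # Propagate the previous node's belief through the transition matrix.
            belief = [sum(belief[a] * transition[a][b] for a in range(2))
                      for b in range(2)]
        # Weight by the Gaussian likelihood of the observed feature.
        weighted = [belief[c] * gaussian_pdf(x, means[c], stds[c]) for c in range(2)]
        total = sum(weighted)
        belief = [w / total for w in weighted]
        posteriors.append(belief)
    return posteriors

# Hypothetical chain of four nodes with one spectral feature each.
posteriors = forward_posteriors(
    features=[0.9, 0.8, 0.2, 0.1],
    means=[0.8, 0.2],                      # class 0 = dry, class 1 = flood
    stds=[0.2, 0.2],
    transition=[[0.9, 0.1], [0.1, 0.9]],   # neighboring nodes tend to share a class
    prior=[0.5, 0.5],
)
print(posteriors)
```

In an EM setting, posteriors of this kind would feed the re-estimation of the Gaussian means, variances, and transition probabilities; that step is omitted here.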

experimental setup

data: Hurricane Matthew flood, NC, 2016
two study areas: Greenville NC, Grimesland NC

conclusion

HMCT model
contour tree construction, parameter learning, class inference algorithm

Urban Traffic Prediction from Spatio-Temporal Data Using Deep Meta-Learning

Zheyi Pan (Shanghai Jiaotong University); Yuxuan Liang (Xidian University); Weifeng Wang (Shanghai Jiaotong University); Yong
Yu (Shanghai Jiaotong University); Yu Zheng (JD Intelligent Cities Research); Junbo Zhang (JD Intelligent Cities Research)

Intro

Important for
traffic management
public risk assessment
public safety

challenge

spatial correlations
temporal correlations

challenge: diversity of spatio-temporal correlations

characteristic of locations and their mutual relationship are diverse
conventional methods: ARIMA, GBRT, linear model
deep model
spatial network + temporal network to model ST correlations
use a single model to predict traffic at all locations

Insights

build a geo graph to describe spatial structure
Node: locations
Edges: relations between locations
geographical features reveal characteristics of nodes and edges & impact different types of ST correlations

Overview of ST-MetaNet

Encoder-Decoder
meta graph Attention network
meta recurrent network
meta knowledge learner
meta learner

meta graph attention

calculate attention scores
get meta knowledge about edge
weight generation
softmax&linear combination
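The four steps listed above can be sketched in plain Python. This is only an illustration of the idea that attention weights are generated from edge meta knowledge instead of being shared across all edges: a single linear map stands in for the learned meta learner, and every name (`meta_attention`, `weight_generator`, `edge_meta`) and number is hypothetical rather than taken from ST-MetaNet.

```python
import math
from typing import List, Tuple

def matvec(matrix: List[List[float]], vector: List[float]) -> List[float]:
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(m * v for m, v in zip(row, vector)) for row in matrix]

def softmax(scores: List[float]) -> List[float]:
    """Numerically stable softmax."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def meta_attention(node_state: List[float], neighbor_states: List[List[float]],
                   edge_meta: List[List[float]],
                   weight_generator: List[List[float]]
                   ) -> Tuple[List[float], List[float]]:
    scores = []
    for state, meta in zip(neighbor_states, edge_meta):
        # Weight generation: edge-specific attention weights from meta knowledge.
        weights = matvec(weight_generator, meta)
        pair = node_state + state  # concatenate the two node states
        scores.append(sum(w * v for w, v in zip(weights, pair)))
    attention = softmax(scores)
    hidden = len(node_state)
    # Linear combination of neighbor states weighted by attention.
    combined = [sum(a * s[d] for a, s in zip(attention, neighbor_states))
                for d in range(hidden)]
    return combined, attention

combined, attention = meta_attention(
    node_state=[1.0, 0.0],
    neighbor_states=[[1.0, 0.0], [0.0, 1.0]],
    edge_meta=[[1.0], [0.5]],                          # one meta feature per edge
    weight_generator=[[1.0], [0.0], [1.0], [0.0]],     # maps meta -> 4 weights
)
print(combined, attention)
```

Because the weights depend on each edge's meta knowledge, two edges with identical node states can still receive different attention scores.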

Meta gated recurrent unit

evaluation

data: TaxiBJ, METR-LA
Metrics: MAE & RMSE
uses far fewer parameters
evaluation on meta networks
evaluation on meta knowledge

validate that meta-knowledge can reveal the similarity of ST correlations on nodes
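For reference, the two metrics named in the evaluation, MAE and RMSE, are computed as below. The sample values are made up for illustration and are not results from the talk.

```python
import math
from typing import List

def mae(y_true: List[float], y_pred: List[float]) -> float:
    """Mean absolute error."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)

def rmse(y_true: List[float], y_pred: List[float]) -> float:
    """Root mean squared error; penalizes large errors more than MAE."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))

# Hypothetical traffic counts and predictions.
observed = [10.0, 20.0, 30.0]
predicted = [12.0, 18.0, 33.0]
mae_value = mae(observed, predicted)
rmse_value = rmse(observed, predicted)
print(mae_value, rmse_value)
```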

conclusion

deep meta learning based framework for Spatio-temporal data
meta graph attention for modeling diverse spatial correlations
meta gated recurrent Unit for modeling diverse temporal correlations
achieves significant improvement on two real-world traffic prediction tasks

Co-Prediction of Multiple Transportation Demands Based on Deep Spatio-Temporal Neural Network

Junchen Ye (Beihang University); Leilei Sun (Beihang University); Bowen Du (Beihang University); Yanjie Fu (University of Central
Florida); Xinran Tong (Beihang University); Hui Xiong (Rutgers University)

background

sharing transportation plays important roles
pickup demand

related-work

hand-designed features
LSTM/ConvLSTM + CNN

summary

employ a spatial + temporal model
use CNN directly
use a single transport mode to predict that same mode

from a Micro View

Bases Decomposing
wavelet transform
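As a concrete instance of decomposing a signal into bases, here is one level of an unnormalized Haar wavelet transform. The talk did not say which wavelet was meant, so the choice of Haar is an assumption, and the demand series is a made-up example with an even number of points.

```python
from typing import List, Tuple

def haar_step(signal: List[float]) -> Tuple[List[float], List[float]]:
    """One level of an (unnormalized) Haar transform; len(signal) must be even."""
    approx = [(signal[i] + signal[i + 1]) / 2 for i in range(0, len(signal), 2)]
    detail = [(signal[i] - signal[i + 1]) / 2 for i in range(0, len(signal), 2)]
    return approx, detail

def haar_inverse(approx: List[float], detail: List[float]) -> List[float]:
    """Rebuild the original signal from averages and differences."""
    signal: List[float] = []
    for a, d in zip(approx, detail):
        signal += [a + d, a - d]
    return signal

demand = [4.0, 6.0, 10.0, 12.0]           # hypothetical hourly pickup counts
approx, detail = haar_step(demand)        # coarse trend and local fluctuation
restored = haar_inverse(approx, detail)
print(approx, detail, restored)
```

The averages capture the coarse pattern and the differences capture local variation, and together they reconstruct the input exactly.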

Base decomposition

decomposing methods
clustering
linear decomposing
proposed: autoencoder(applied)

Heterogeneous Information Fusion

In fact, there are some deep correlations between different modes of transportation

extract the hidden correlations between different transport modes to improve the prediction accuracy

Our method: CoST-Net

transportation pattern decomposing
heterogeneous information fusions
spatial autoencoder & heterogeneous information fusion

Experiment

Data: NYC Citi Bike, NYC taxi
Time shift: a small dataset hurts the autoencoder's learning of high-level features.
cutting the original data at all time shifts expands the dataset used to train the autoencoder
this not only expands the dataset but also helps to learn the traffic features
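One reading of the time-shift idea is a sliding window: instead of cutting the series into non-overlapping samples, every possible start offset yields a sample. The sketch below follows that interpretation; the function name, window length, and series are hypothetical, and the talk's exact cutting scheme may differ.

```python
from typing import List

def shifted_windows(series: List[float], window: int, shift: int = 1) -> List[List[float]]:
    """Cut `series` into windows of length `window`, starting every `shift` steps."""
    return [series[start:start + window]
            for start in range(0, len(series) - window + 1, shift)]

series = [float(v) for v in range(10)]       # stand-in for a demand time series
plain = shifted_windows(series, window=4, shift=4)    # non-overlapping cuts
augmented = shifted_windows(series, window=4, shift=1)  # all time shifts
print(len(plain), len(augmented))
```

With ten points and a window of four, the non-overlapping cut gives two samples while the shifted cut gives seven, which is the dataset expansion the notes describe.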

How to evaluate methods?

best/mean result? No
In the box plot, our method has the best mean performance and the smallest variance

single-channel: single to single; two-channel: double to double; our method: four-channel
the more channels we employ, the better the results
