
Support vector machine with quantile hyper-spheres for pattern classification

  • Maoxiang Chu ,

    Roles Funding acquisition, Methodology, Validation, Writing – original draft

    chu52_2004@163.com

    Affiliations School of Electronic and Information Engineering, University of Science and Technology Liaoning, Anshan, Liaoning, China, Department of Electrical Engineering, Lakehead University, Thunder Bay, Ontario, Canada

  • Xiaoping Liu,

    Roles Methodology, Validation

    Affiliation Department of Electrical Engineering, Lakehead University, Thunder Bay, Ontario, Canada

  • Rongfen Gong,

    Roles Funding acquisition, Validation, Writing – review & editing

    Affiliation School of Electronic and Information Engineering, University of Science and Technology Liaoning, Anshan, Liaoning, China

  • Jie Zhao

    Roles Funding acquisition, Writing – review & editing

    Affiliation State Key Laboratory of Robotics and System (HIT), Harbin, Heilongjiang, China

Abstract

This paper formulates a support vector machine with quantile hyper-spheres (QHSVM) for pattern classification. The idea of QHSVM is to build two quantile hyper-spheres with the same center for the positive or negative training samples. Every quantile hyper-sphere is constructed by using pinball loss instead of hinge loss, which makes the new classification model insensitive to noise, especially the feature noise around the decision boundary. Moreover, the robustness and generalization of QHSVM are strengthened by maximizing the margin between the two quantile hyper-spheres, maximizing the inner-class clustering of samples, and optimizing an independent quadratic programming problem for each target class. In addition, this paper proposes a novel local center-based density estimation method. Based on it, ρ-QHSVM with surrounding and clustering samples is given. Under the premise of high accuracy, the execution speed of ρ-QHSVM can be adjusted. The experimental results on artificial, benchmark and strip steel surface defect datasets show that the QHSVM model has distinct advantages in accuracy and that the ρ-QHSVM model is well suited to large-scale datasets.

1. Introduction

Support vector machine (SVM) [1], proposed by Vapnik and his collaborators, has become an excellent tool for machine learning. SVM is a comprehensive technique that integrates the margin maximization principle, the kernel trick and the dual method. It rests on a solid statistical learning theory, which has made SVM widely applied in many fields [2–4]. In spite of that, considerable effort is still devoted to improving SVM, and SVMs with different attributes have been proposed, such as least squares SVM (LS-SVM) [5], proximal SVM (PSVM) [6], v-SVM [7], fuzzy SVM (FSVM) [8] and pinball loss SVM (Pin-SVM) [9].

In 2007, Jayadeva et al. proposed a twin support vector machine (TWSVM) [10] for pattern classification. TWSVM is derived from the generalized eigenvalue proximal SVM (GEPSVM) [11]. GEPSVM and other multi-surface classifiers [12,13] are used to solve XOR problems and to reduce the computing time of SVM. Similarly, the TWSVM classifier determines two nonparallel separating hyper-planes by solving two quadratic programming problems (QPPs) of smaller size. TWSVM has advantages in classification speed and generalization, which has made it a popular new tool for machine learning. Based on TWSVM, several extensions have been proposed, such as least squares TWSVM (LS-TSVM) [14], twin bounded SVM (TBSVM) [15], twin parametric-margin SVM (TPMSVM) [16], Laplacian TWSVM (LTWSVM) [17] and weighted TWSVM with local information (WLTSVM) [18].

Support vector data description (SVDD) [19], inspired by the support vector classifier, is a one-class learning tool. SVDD implements a minimum-volume description by building a hyper-sphere around the target samples. When negative samples are available, [19] provides a new SVDD with negative examples (SVDD_neg). SVDD_neg merges negative samples into the training dataset to improve the minimum-volume hyper-sphere description. Various classifiers have been extended from SVDD because it gathers the samples within a class as tightly as possible. These classifiers include the maximal-margin spherical-structured multi-class SVM (MSM-SVM) [20], twin support vector hyper-sphere (TSVH) [21], twin-hypersphere support vector machine (THSVM) [22], maximum margin and minimum volume hyper-spheres machine with pinball loss (Pin-M3HM) [23] and least squares twin support vector hyper-sphere (LS-TSVH) [24].

A main challenge for all versions of SVM is to avoid the adverse impact of noise. As mentioned in [9], classification problems may have label noise and feature noise. So, anti-noise versions of SVM have been proposed. [13] proposed an L1-norm twin projection support vector machine and showed that the L1-norm is robust to noise and outliers in the data. [25] overcame the impact of noise on LS-SVM with varying weights. [26] adopted a robust optimization method in SVM to deal with uncertain noise. [27] built a total margin SVM whose separating hyperplane is insensitive to noise. [8] built a fuzzy SVM by assigning a fuzzy membership to each input sample, which restrains the adverse effect of noise. These versions of SVM have achieved some success in avoiding the adverse impact of noise, but they are not good at dealing with the feature noise around the decision boundary. In 2014, Huang et al. [9] designed a novel Pin-SVM by introducing the pinball loss. Pin-SVM uses the pinball loss to replace the hinge loss, which makes Pin-SVM not only maintain the good properties of SVM but also be less sensitive to noise, especially the feature noise around the decision boundary. Since then, the pinball loss has been introduced into different versions of SVM in [23], [28] and [29].

In this paper, a novel support vector machine with quantile hyper-spheres (QHSVM) for pattern classification is proposed. It inherits the strengths of SVDD_neg, TWSVM and Pin-SVM. QHSVM has the following attributes and advantages.

  1. QHSVM adopts pinball losses instead of hinge losses. The hinge losses, which maximize the shortest distance between the two classes of samples, are sensitive to noise. The pinball losses replace the shortest distance with a quantile distance. The quantile distance depends on many samples and therefore reduces the sensitivity to noise, especially the feature noise around the decision boundary. So, QHSVM improves the anti-noise ability of the hyper-spheres by using pinball losses.
  2. QHSVM searches for two quantile hyper-spheres with the same center for the positive or negative samples. On the premise of using pinball losses, the volume of one quantile hyper-sphere is required to be as small as possible, while that of the other is required to be as big as possible. Moreover, QHSVM requires the target samples to lie as close as possible to the common center of the two hyper-spheres. These attributes ensure that the margin maximization principle and the inner-class clustering maximization of samples are implemented.
  3. QHSVM has one QPP for the positive samples and one for the negative samples. Each QPP treats one class as the target class and the other class as the negative class. QHSVM explores the potential information of the target samples to the greatest extent, and the negative samples are used to improve the description of the hyper-sphere. These attributes improve the generalization of QHSVM.
  4. In order to meet the requirement of highly efficient classification, a new local center-based density estimation method is proposed, and a QHSVM with surrounding and clustering samples (ρ-QHSVM) is given. The local center-based density estimation method appropriately splits the training samples into surrounding samples and clustering samples. The hyper-spheres of ρ-QHSVM are described by the sparse surrounding samples, while the center of the hyper-spheres is clustered by the clustering samples.

Pin-M3HM in [23] also combines ideas from THSVM and Pin-SVM, so our QHSVM may seem similar to Pin-M3HM. In fact, QHSVM differs from Pin-M3HM in attributes 2, 3 and 4 above. Furthermore, our QHSVM formulates two QPPs with the same structure, whereas Pin-M3HM has two QPPs with different structures.

This paper is organized as follows. Section 2 reviews related work. Section 3 proposes the model of QHSVM and the local center-based density estimation method. Section 4 solves the new QHSVM and ρ-QHSVM. Section 5 deals with experimental results and Section 6 contains concluding remarks.

2. Related work

2.1. Support vector machines with hinge loss and pinball loss

For binary classification, the hinge loss is widely used. The hinge loss proposed in [1] leads to the popular standard SVM classifier. Suppose a training dataset Tr = {(X1,y1),(X2,y2),⋯,(Xm,ym)}, where Xi is an input vector and yi∈{1,−1}. Standard SVM searches for an optimal separating hyperplane wTφ(x)+b = 0 by convex optimization, where w is the weight vector, b is the bias, and φ(⋅) is a nonlinear feature mapping function. Its corresponding optimization problem can be described as follows:

\min_{w,b}\ \frac{1}{2}\|w\|^{2}+c\sum_{i=1}^{m}(L_{h})_{i} \qquad (1)

where c is a trade-off parameter. The hinge loss (Lh)i is given by

(L_{h})_{i}=\max(0,u_{i}) \qquad (2)

where ui = 1−yi(wTφ(Xi)+b). Substituting (2) into (1), the final QPP of SVM can be obtained:

\min_{w,b,\xi}\ \frac{1}{2}\|w\|^{2}+c\sum_{i=1}^{m}\xi_{i}\quad \text{s.t.}\ \ y_{i}(w^{T}\varphi(X_{i})+b)\ge 1-\xi_{i},\ \ \xi_{i}\ge 0,\ i=1,\dots,m \qquad (3)

QPP (3) of SVM searches for two support hyper-planes wTφ(x)+b = ±1 by maximizing the shortest distance between the two classes of samples. The support hyper-planes are boundary hyper-planes, so SVM is sensitive to noise. In 2014, Huang et al. [9] proposed the Pin-SVM classifier by introducing the pinball loss into standard SVM. Pin-SVM keeps the good properties of standard SVM and is insensitive to noise, especially the feature noise around the decision boundary. The pinball loss in [9] is the following:

(L_{\tau})_{i}=\begin{cases}u_{i}, & u_{i}\ge 0\\ -\tau u_{i}, & u_{i}<0\end{cases} \qquad (4)

where τ is an adjusting parameter. Replacing (Lh)i in (1) with (Lτ)i, the QPP of Pin-SVM can be obtained:

\min_{w,b,\xi}\ \frac{1}{2}\|w\|^{2}+c\sum_{i=1}^{m}\xi_{i}\quad \text{s.t.}\ \ y_{i}(w^{T}\varphi(X_{i})+b)\ge 1-\xi_{i},\ \ y_{i}(w^{T}\varphi(X_{i})+b)\le 1+\frac{\xi_{i}}{\tau},\ i=1,\dots,m \qquad (5)

Pin-SVM is insensitive to noise because the pinball loss is related to quantiles [30,31]. The pinball loss in (5) changes the idea of (3) into maximizing the quantile distance. In particular, when τ→0, Pin-SVM reduces to SVM. The decision functions of QPPs (3) and (5) can be determined by using the Lagrangian function, the Karush-Kuhn-Tucker (KKT) conditions and a kernel function; their formulas can be found in [1] and [9].
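
To make the losses in (2) and (4) concrete, the following small numpy sketch evaluates the hinge and pinball losses on the margin variable ui = 1−yi(wTφ(Xi)+b); the function names and the toy values are illustrative only, not taken from the paper.

```python
import numpy as np

def hinge_loss(u):
    """Hinge loss (2): only samples with u > 0 (margin violations) are penalized."""
    return np.maximum(0.0, u)

def pinball_loss(u, tau):
    """Pinball loss (4): keeps the hinge branch for u >= 0 but also charges
    -tau*u for u < 0, so samples deep inside the margin still contribute."""
    return np.where(u >= 0, u, -tau * u)

# toy margin variables u_i = 1 - y_i*(w^T phi(X_i) + b)
u = np.array([-2.0, -0.5, 0.0, 0.5, 2.0])
print(hinge_loss(u))             # [0.   0.   0.   0.5  2. ]
print(pinball_loss(u, tau=0.5))  # [1.   0.25 0.   0.5  2. ]
```

Because every sample with ui < 0 also contributes to the pinball loss, its minimizer depends on a quantile of the whole sample distribution rather than only on the few points nearest the boundary, which is the source of the noise insensitivity discussed above.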

2.2. Twin support vector machine

TWSVM determines two nonparallel hyper-planes by optimizing two QPPs, which is different from standard SVM. Each QPP of TWSVM has a form similar to that of standard SVM, but its size is smaller than the single QPP of SVM. So, TWSVM is comparable with SVM in classification accuracy and has higher efficiency. Moreover, TWSVM is good at dealing with datasets with cross planes.

Suppose that X_i^+ is a sample in class +1 and X_j^− is a sample in class −1, where i = 1,2,⋯,m+ and j = 1,2,⋯,m−. The two QPPs of TWSVM can be described as follows:

\min_{w_{+},b_{+},\xi}\ \frac{1}{2}\sum_{i=1}^{m_{+}}\left(w_{+}^{T}\varphi(X_{i}^{+})+b_{+}\right)^{2}+c_{1}\sum_{j=1}^{m_{-}}\xi_{j}\quad \text{s.t.}\ -\left(w_{+}^{T}\varphi(X_{j}^{-})+b_{+}\right)\ge 1-\xi_{j},\ \xi_{j}\ge 0 \qquad (6)

\min_{w_{-},b_{-},\eta}\ \frac{1}{2}\sum_{j=1}^{m_{-}}\left(w_{-}^{T}\varphi(X_{j}^{-})+b_{-}\right)^{2}+c_{2}\sum_{i=1}^{m_{+}}\eta_{i}\quad \text{s.t.}\ \left(w_{-}^{T}\varphi(X_{i}^{+})+b_{-}\right)\ge 1-\eta_{i},\ \eta_{i}\ge 0 \qquad (7)

where c1 and c2 are trade-off parameters. QPPs (6) and (7) determine two nonparallel support hyper-planes (w+)Tφ(x)+b+ = 0 and (w−)Tφ(x)+b− = 0. A new sample x is assigned to class +1 or −1 depending on the following decision function:

\text{class}(x)=\arg\min_{k\in\{+,-\}}\ \frac{\left|w_{k}^{T}\varphi(x)+b_{k}\right|}{\|w_{k}\|} \qquad (8)
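
In the linear case, the decision rule (8) simply assigns a new sample to the class whose hyper-plane is nearer. The sketch below is a minimal illustration with plain numpy vectors; it assumes the two hyper-plane parameters have already been obtained from (6) and (7).

```python
import numpy as np

def twsvm_predict(X, w_pos, b_pos, w_neg, b_neg):
    """Assign each row of X to class +1 or -1 according to which of the two
    nonparallel hyper-planes (w_pos, b_pos) / (w_neg, b_neg) is closer, cf. (8)."""
    d_pos = np.abs(X @ w_pos + b_pos) / np.linalg.norm(w_pos)
    d_neg = np.abs(X @ w_neg + b_neg) / np.linalg.norm(w_neg)
    return np.where(d_pos <= d_neg, 1, -1)
```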

2.3. Support vector data description with negative examples

SVDD is an efficient method for the one-class data description problem. It builds a hyper-sphere that covers one class of target samples with a minimum-volume description. The hyper-sphere embodies the inner-class clustering maximization of samples. Based on SVDD, SVDD_neg adds negative samples: when negative samples are available, they can improve the hyper-sphere description of the target samples. The QPP of SVDD_neg can be given by

\min_{R,C,\xi}\ R^{2}+c_{1}\sum_{i}\xi_{i}+c_{2}\sum_{j}\xi_{j}\quad \text{s.t.}\ \ \|\varphi(X_{i})-C\|^{2}\le R^{2}+\xi_{i}\ \text{(target samples)},\ \ \|\varphi(X_{j})-C\|^{2}\ge R^{2}-\xi_{j}\ \text{(negative samples)},\ \ \xi_{i},\xi_{j}\ge 0 \qquad (9)

where R and C are the radius and center of the hyper-sphere respectively, c1 and c2 are trade-off parameters and the ξ's are slack variables. QPP (9) requires the target samples to be inside the hyper-sphere and the negative samples to be outside of it. On one hand, this requirement ensures that the hyper-sphere describes a closed boundary around the target samples well. On the other hand, it can be used to distinguish the target samples from the negative samples. Inspired by SVDD_neg, several hyper-sphere classifiers have been proposed in [20–24].
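
Once a hyper-sphere such as the one in (9) has been trained, testing whether a new point lies inside it only needs kernel evaluations, because the center is a kernel expansion C = Σi βi φ(Xi) over the support vectors (with signed coefficients in SVDD_neg). The sketch below assumes the expansion coefficients beta and the radius R are already available from the dual solution; it is an illustration using the Gaussian kernel, not the paper's implementation.

```python
import numpy as np

def rbf_kernel(A, B, sigma2):
    """Gaussian kernel matrix K[i, j] = exp(-||A_i - B_j||^2 / (2*sigma2))."""
    sq = np.sum(A**2, 1)[:, None] - 2.0 * A @ B.T + np.sum(B**2, 1)[None, :]
    return np.exp(-sq / (2.0 * sigma2))

def sq_dist_to_center(Z, X_sv, beta, sigma2):
    """||phi(z) - C||^2 with C = sum_i beta_i * phi(X_sv_i), per row of Z."""
    Kzz = np.ones(len(Z))                  # K(z, z) = 1 for the Gaussian kernel
    Kzx = rbf_kernel(Z, X_sv, sigma2)      # K(z, x_i)
    Kxx = rbf_kernel(X_sv, X_sv, sigma2)   # K(x_i, x_j)
    return Kzz - 2.0 * Kzx @ beta + beta @ Kxx @ beta

def inside_sphere(Z, X_sv, beta, R, sigma2):
    """True for the points of Z that fall inside the trained hyper-sphere."""
    return sq_dist_to_center(Z, X_sv, beta, sigma2) <= R**2
```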

3. Support vector machine with quantile hyper-spheres

3.1. Pinball losses for quantile hyper-spheres

The idea of QHSVM is similar to that of SVDD_neg in building a hyper-sphere. However, QHSVM needs to build two hyper-spheres with the same center for the target samples, which is different from SVDD_neg. We first consider a support vector machine with boundary hyper-spheres (BHSVM). BHSVM has two boundary hyper-spheres with the same center for the target samples, which are shown in Fig 1(A). For binary classification, a sample of class +1 is first considered as a target sample, so a sample of class −1 is considered as a negative sample. The two boundary hyper-spheres must satisfy the following inequality constraints: (10) (11) where R+ is the radius of the boundary hyper-sphere covering the target samples and the other radius is that of the second boundary hyper-sphere. C+ is the common center of the two hyper-spheres, and the negative samples are outside of the hyper-sphere with the second radius. The remaining variables are the corresponding slack variables. Moreover, BHSVM minimizes R+ and maximizes the second radius, so BHSVM satisfying (10) and (11) maximizes the shortest distance between the two classes of samples. Hinge losses are adopted in (10) and (11), which can be given as (12) where (13)

Fig 1. The illustration of boundary hyper-spheres, quantile hyper-spheres, hinge losses and pinball losses.

(a) Two boundary hyper-spheres for BHSVM, (b) Two quantile hyper-spheres for QHSVM, (c) Hinge losses for BHSVM and (d) Pinball losses for QHSVM.

https://doi.org/10.1371/journal.pone.0212361.g001

The hinge losses (12) are shown in Fig 1(C). It is known that hinge losses are sensitive to noise [9]. In order to reduce the adverse effect brought by noise, QHSVM is generated by introducing the pinball losses into BHSVM. In this respect, QHSVM inherits the idea of Pin-SVM. The pinball losses for quantile hyper-spheres can be expressed as follows: (14)

The pinball losses (14) are shown in Fig 1(D). If the hinge losses in (10) and (11) are replaced by (14), then two inequality constraints with pinball losses can be obtained: (15) (16)

Under the constraints of (15) and (16), the hyper-spheres of QHSVM are insensitive to noise because they are quantile hyper-spheres. The quantile hyper-spheres are shown in Fig 1(B). Maximizing the quantile distance instead of maximizing the shortest distance is implemented. Compared with (10), (15) requires that some samples must be distributed outside of the hyper-sphere, which can be controlled with parameter τ. That is to say, maximizing the quantile distance of QHSVM depends on a number of samples. So, QHSVM is insensitive to noise, especially the feature noise around the decision boundary. When τ→0, (15) becomes (10). For (16), a similar conclusion can be drawn.

For binary classification, the other case is that a sample of class −1 is the target sample and a sample of class +1 is the negative sample. Similarly, the corresponding pinball losses can be obtained as (17) where (18)

And the inequality constraints with pinball losses can be expressed as: (19) (20) where C− is the common center of the two quantile hyper-spheres, R− and the second radius are the radii of the two quantile hyper-spheres respectively, and the remaining variables are the corresponding slack variables.

3.2. Primal formulation and analysis

For binary classification, consider two pairs of training datasets, one pair for each class. Next, we formulate two QPPs with the inequality constraints (15), (16), (19) and (20): (21) (22) where the first pair of datasets belongs to class +1 and the second pair belongs to class −1, with the corresponding numbers of samples in the four datasets. For QHSVM, these datasets are all specified as X+ or X−, so QPPs (21) and (22) of QHSVM need to satisfy the following condition: (23)

For QPP (21) with the condition (23), the samples of class +1 are the target samples and the samples of class −1 are the negative samples. QPP (21) searches for two quantile hyper-spheres with the same center C+; the one associated with the target samples is denoted Ω+ with radius R+, and the other has a second radius. The first term of the objective function in QPP (21) minimizes (R+)2, which tends to keep the volume of Ω+ as small as possible. The second term maximizes the squared second radius, which forces the volume of the second hyper-sphere to be as big as possible. In other words, minimizing (R+)2 and maximizing the second squared radius keep the margin between the two quantile hyper-spheres as big as possible, which embodies the margin maximization principle. The first and second constraint conditions in QPP (21) make Ω+ a quantile hyper-sphere controlled by τ instead of a boundary hyper-sphere, because some target samples fall outside of Ω+. The third and fourth constraint conditions in QPP (21) also make the second hyper-sphere a quantile hyper-sphere controlled by τ, because some negative samples fall inside of it. These constraints make the maximum margin depend on many samples instead of a few samples, which ensures that QPP (21) is insensitive to noise, especially the feature noise around the decision boundary. The third and fourth terms of the objective function in QPP (21) minimize the sum of the slack variables caused by samples not satisfying the constraint conditions. The fifth term and the corresponding constraint condition require the target samples to be distributed around the center of Ω+ as much as possible; in other words, the center of Ω+ is close to the cluster of target samples. This means QHSVM exploits the prior structural information of the target samples and should not be sensitive to the structure of the data distribution, so this term ensures that the inner-class clustering of samples is maximized. The last constraint condition imposes an ordering on the two radii. The remaining parameters, including v+, are trade-off parameters.

For QPP (22) with the condition (23), the samples of class −1 are the target samples and the samples of class +1 are the negative samples. QPP (22) is similar to QPP (21) in its attributes and conclusions, so it is not analyzed again.

Similar to TWSVM, QHSVM builds two support hyper-spheres for binary classification. For QPP (21) with the condition (23), Ω+ with parameters C+ and R+ is referred to as the support hyper-sphere of the target class +1. The negative samples are only used to improve the description of Ω+. Ω+ is described by using the margin maximization principle and the inner-class clustering maximization of samples, and it is insensitive to noise; the second quantile hyper-sphere is only used to implement the margin maximization principle. Similarly, for QPP (22) with the condition (23), Ω− with parameters C− and R− is regarded as the support hyper-sphere of class −1, and the samples of class +1 are only used to improve the description of Ω−. All of the above helps to improve the generalization of QHSVM.

For binary QHSVM, the following decision function can be obtained.

(24)

3.3 QHSVM with surrounding and clustering samples

For QPPs (21) and (22) with the condition (23), all training samples enter the optimization through inequality constraints, which means QHSVM is suited to classification tasks without a high-efficiency requirement. For highly efficient classification problems, we provide a QHSVM with surrounding and clustering samples, called ρ-QHSVM. The surrounding samples are samples distributed near the boundary of the quantile hyper-spheres; in the case of X+, its surrounding samples are distributed near the boundary of Ω+. The clustering samples are samples distributed near the center of the quantile hyper-spheres. The quantile hyper-spheres of ρ-QHSVM can be obtained by using the sparse surrounding samples rather than all samples. So, the training samples should be divided into surrounding samples and clustering samples. In order to achieve this, a novel local center-based density estimation method is proposed.

Local center-based density estimation originates from the kernel density estimation in [32]. Kernel density estimation yields a Gaussian weight by calculating the distance between a sample and its K-nearest neighbors. This kernel density weight can efficiently characterize the local geometry of the sample manifold, but it cannot capture the surrounding samples in the training dataset. So, the local center-based density estimation method is designed.

Consider a training dataset X = {Xi|i = 1,2,⋯,m}. Firstly, the kernel function Ψ(Xi,Xl) = φ(Xi)⋅φ(Xl) is introduced. Then, the steps for a local center-based density estimation method are given in nonlinear feature mapping space:

  1. Step 1: Calculate the square distance between each sample Xi and the others.
(25)
  2. Step 2: Search for the K-nearest neighbors of each sample Xi in the nonlinear feature mapping space.
(26)
  3. Step 3: Calculate the mean of the square distances over the training dataset.
(27)
  4. Step 4: Calculate the kernel density weight for each sample Xi.
(28)
  5. Step 5: Determine the center of the K-nearest neighbors of sample Xi.
(29)
  6. Step 6: Estimate the local center-based density of Xi as follows.
(30)

It can be seen from the above steps that the local center-based density of Xi is estimated with the distance between the sample and its K-nearest neighbors, where K is given by the user. Moreover, the local center-based density is a Gaussian kernel density. When qi = 1, it reduces to the kernel density weight of step 4. A bigger kernel density weight indicates that Xi is closer to its K-nearest neighbors, so this weight can be used to check whether Xi is a clustering sample or an isolated sample. However, it cannot be used to identify surrounding samples. The training dataset can be divided into clustering samples and surrounding samples from the center outwards. The surrounding samples distributed near the boundary of the quantile hyper-spheres deviate from the center of their K-nearest neighbors. Fig 2 shows that the surrounding sample xs is far from the center of its K-nearest neighbors, while the clustering sample xc is close to that center. This is their distinguishing characteristic. So, qi is used to represent the deviation degree. When qi≠1, each ρi must be compensated with qi. ρi is called the local center-based density. The smaller ρi is, the closer Xi is to the boundary; the bigger ρi is, the closer Xi is to the clustering region. On the other hand, the Gaussian kernel parameter δ2 is set as the mean of the square distances over the training dataset, which makes (30) fit different training datasets with different clustering degrees.
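
Since Eqs (25)–(30) are not reproduced here, the sketch below is only one plausible reading of the six steps under stated assumptions: the kernel density weight is taken as a Gaussian function of the mean K-nearest-neighbor distance, the deviation degree qi as a Gaussian function of the distance to the K-nearest-neighbor center, and the local center-based density as their product; the paper's exact formulas may differ.

```python
import numpy as np

def local_center_density(K, k):
    """Plausible implementation of steps 1-6; K is the (m, m) kernel matrix
    Psi(X_i, X_l) in feature space and k is the number of nearest neighbours."""
    m = K.shape[0]
    diag = np.diag(K)
    # step 1: squared feature-space distances ||phi(X_i) - phi(X_l)||^2
    D = diag[:, None] + diag[None, :] - 2.0 * K
    # step 2: indices of the k nearest neighbours (the sample itself excluded)
    nn = np.argsort(D, axis=1)[:, 1:k + 1]
    # step 3: mean squared distance over the dataset, used here as delta^2
    delta2 = D[np.triu_indices(m, 1)].mean()
    rho = np.empty(m)
    for i in range(m):
        idx = nn[i]
        # step 4: Gaussian kernel-density weight from the mean k-NN distance
        w_i = np.exp(-D[i, idx].mean() / delta2)
        # step 5: squared distance to the centre of the k nearest neighbours,
        #         ||phi(X_i) - (1/k) sum_l phi(X_l)||^2, via kernel evaluations
        d_center = diag[i] - 2.0 * K[i, idx].mean() + K[np.ix_(idx, idx)].mean()
        # step 6: deviation degree q_i compensates the weight; rho_i = w_i * q_i
        q_i = np.exp(-d_center / delta2)
        rho[i] = w_i * q_i
    return rho
```

With this reading, a surrounding sample far from the center of its K-nearest neighbors receives a small qi and hence a small ρi, while a clustering sample receives a large ρi, consistent with the discussion above.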

Fig 2. The illustration of surrounding sample, clustering sample and the centers of K-nearest neighbors.

The surrounding sample xs and the clustering sample xc (black dots with red circles), K-nearest neighbors (black dots with blue circles) and the centers of K-nearest neighbors (blue dots).

https://doi.org/10.1371/journal.pone.0212361.g002

For binary classification, consider the datasets X+ and X−. Their local center-based densities are estimated by the above steps. Based on the magnitude of the densities, X+ is divided into a surrounding subset and a clustering subset by a ratio ε (0<ε<1): (31) where the two groups of densities correspond to the surrounding and clustering subsets respectively. The principle of the division is that a surrounding sample has a small local center-based density, while a clustering sample has a big local center-based density. Fig 3 illustrates the surrounding samples and clustering samples obtained with different ε in a two-dimensional feature space. It can be seen that the surrounding samples are distributed near the boundary of the two-dimensional dataset. Similarly, based on the magnitude of the densities, X− is divided into a surrounding subset and a clustering subset according to the ratio ε: (32)
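
A minimal sketch of the split in (31)–(32), assuming ε is the fraction of each class retained as surrounding samples (lowest densities) and the remaining samples form the clustering set; the paper's exact split may differ in detail.

```python
import numpy as np

def split_by_density(X, rho, eps):
    """Split X into surrounding samples (smallest local center-based densities)
    and clustering samples (largest densities) according to the ratio eps."""
    order = np.argsort(rho)                 # ascending density
    n_surround = int(round(eps * len(X)))   # eps controls the sparseness
    surround = X[order[:n_surround]]        # near the hyper-sphere boundary
    cluster = X[order[n_surround:]]         # near the clustering region
    return surround, cluster
```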

Fig 3. The illustration of surrounding samples and clustering samples with different ε in two-dimensional feature space.

The surrounding samples (“·”in red) and the clustering samples (“·” in black).

https://doi.org/10.1371/journal.pone.0212361.g003

Based on the surrounding and clustering subsets of the two classes, the two QPPs of ρ-QHSVM are still expressed as (21) and (22). Compared with QHSVM, the datasets entering the inequality constraints are replaced by the surrounding subsets. The surrounding subsets are sparse datasets because the number of samples is greatly reduced, and their sparseness is controlled by ε. This means that the number of samples with inequality constraints is greatly reduced, so the optimization speed of ρ-QHSVM is improved. Moreover, it can be seen from QPP (21) that the optimization accuracy is controlled by the boundary samples, so the surrounding subsets preserve the optimization accuracy because they include the boundary samples. On the other hand, compared with QHSVM, the dataset entering the equality constraint is replaced by the clustering subset, which shows that the center of the support hyper-sphere is pulled closer to the samples with a higher clustering degree; this subset is also sparse because its number of samples is reduced. So, the clustering samples improve the optimization speed and accuracy through the equality constraints. For QPP (22), similar attributes and conclusions can be obtained. In summary, ρ-QHSVM is suited to highly efficient classification.

4. Solution to ρ-QHSVM

Compared with ρ-QHSVM, QHSVM only adds the condition (23), so QHSVM can be considered a special case of ρ-QHSVM. Therefore, only the solution of ρ-QHSVM is given in this section; the solution of QHSVM can be obtained from it directly.

In order to solve the QPPs of the ρ-QHSVM classifier, nonnegative Lagrangian operators, including γ≥0, are introduced. The Lagrangian function of QPP (21) is (33)

Then, the KKT necessary and sufficient optimality conditions for QPP (21) are given by (34) (35) (36) (37) (38) (39) (40) (41) (42)

Define (43)

From (35) and (36), it follows that (44) (45)

According to (37), (38), (42) and (43), we can get (46)

After introducing a suitable substitution, (34) can be rewritten as (47)

Substituting (35), (36), (37) and (38) into (33) leads to (48)

By substituting (44), (45), (46) and (47) into (48), the dual QPP of (21) can be changed to (49)

The matrix form for QPP (49) can be expressed as (50) where (51) (52) (53) (54) (55) (56) (57)
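
The matrices in (51)–(57) and (61)–(67) are not reproduced here, but once they are assembled, the duals (50) and (60) are ordinary convex QPPs and can be handed to any quadratic-programming routine (the experiments in Section 5 use MATLAB's quadprog). The sketch below shows the generic call with cvxopt for a problem of the form min ½αᵀHα + fᵀα subject to linear inequality and equality constraints; the matrix names are placeholders, not the paper's notation.

```python
import numpy as np
from cvxopt import matrix, solvers

def solve_dual_qpp(H, f, G, h, A, b):
    """Solve min 0.5*a'*H*a + f'*a  s.t.  G*a <= h,  A*a = b  with cvxopt.
    H, f, G, h, A, b stand for the matrices assembled from (51)-(57) or (61)-(67),
    passed as float numpy arrays."""
    solvers.options["show_progress"] = False
    sol = solvers.qp(matrix(H), matrix(f), matrix(G), matrix(h), matrix(A), matrix(b))
    return np.array(sol["x"]).ravel()
```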

Similarly, by taking (22) into account and introducing the corresponding Lagrangian operators, the dual QPP can be expressed as (58) where (59)

The matrix form for QPP (58) can be expressed as (60) where (61) (62) (63) (64) (65) (66) (67)

The following can be obtained from (35), (36), (42) and (43).

(68)

According to (39), (40) and (68), the two squared radii in QPP (21) can be derived as (69). Similarly, the corresponding quantities for QPP (22) can be obtained as (70).

5. Experiments and results analysis

In order to test the performance of the proposed classification model, QHSVM, ρ-QHSVM, SVM, Pin-SVM, TWSVM and THSVM are compared on artificial and benchmark datasets with noise. Moreover, ρ-QHSVM is used to classify strip steel surface defect datasets obtained from a steel plant in China. It must be noted that THSVM is an extended binary classifier based on SVDD_neg. In the experiments, the nonlinear classifiers adopt the kernel function Ψ(Xi,Xl) = exp(−‖Xi−Xl‖2/2δ2), and the linear classifiers adopt Ψ(Xi,Xl) = Xi⋅Xl. All classifiers are solved and executed with MATLAB 7.11 on Windows 7 running on a PC with an Intel Core CPU (3.2 GHz) and 4 GB RAM.
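
For reference, the two kernels stated above can be written directly; a minimal numpy version:

```python
import numpy as np

def gaussian_kernel(Xi, Xl, delta2):
    """Nonlinear kernel used in the experiments: exp(-||Xi - Xl||^2 / (2*delta2))."""
    return float(np.exp(-np.sum((Xi - Xl) ** 2) / (2.0 * delta2)))

def linear_kernel(Xi, Xl):
    """Kernel used by the linear classifiers: the inner product Xi . Xl."""
    return float(np.dot(Xi, Xl))
```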

Moreover, for a fair comparison, all classifiers use the same quadprog solver in MATLAB. For QHSVM, several parameters need to be determined. In order to reduce the computational complexity, the corresponding trade-off parameters of the two QPPs are assumed to be equal (for example, v+ = v−) for QHSVM and ρ-QHSVM. This simplification has also been used in [16], [22], [23], [29] and [32]. For TWSVM and THSVM, c1 = c2 and v1 = v2 are set. All parameters c, v and δ are chosen from the set {2^l | l = −9,−8,⋯,10}. K controls the number of nearest neighbors; for nearest-neighbor algorithms, K is generally determined by grid search, it has been discussed in [18] and [32], and according to them K is set to 8. The parameter τ is chosen from {0.1,0.2,0.5,1}. Common parameter selection methods include exhaustive search, 5-fold cross validation, grid search and optimization search. In the experiments, in order to completely separate the training and testing phases, the following procedure is adopted. Firstly, we randomly split the m_all available samples into m_training training samples and m_testing testing samples, where m_all = m_training + m_testing, and this split is repeated n_training times, yielding n_training training/testing datasets. Then, the parameter values for the i-th training dataset (i = 1,2,⋯,n_training) are determined by 5-fold cross validation and grid search. The final classifier is built with the determined parameter values and is used to evaluate the accuracy on the i-th testing dataset; this step is repeated n_training times, giving n_training testing accuracies. Finally, the average accuracy and the standard deviation of all accuracies are calculated. The average accuracy and the standard deviation are used to evaluate the performance of the classifiers on the UCI datasets and the strip steel surface defect datasets; on the artificial datasets, the average accuracy alone is reported. To make the statistical analysis sound, n_training is set to 50 and m_training = 5·m_testing.
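
The evaluation protocol just described can be summarized in a short script. The sketch below uses scikit-learn and a standard SVC only as a stand-in for the classifiers compared in the paper; the repeated split, the 5-fold cross-validation grid search over {2^l}, and the averaging of the n_training test accuracies follow the description above.

```python
import numpy as np
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.svm import SVC

def evaluate(X, y, n_repeats=50, seed=0):
    """Repeated random split (5/6 train, 1/6 test) with 5-fold CV grid search."""
    rng = np.random.RandomState(seed)
    grid = {"C": [2.0 ** l for l in range(-9, 11)],
            "gamma": [2.0 ** l for l in range(-9, 11)]}
    accs = []
    for _ in range(n_repeats):
        # m_training = 5 * m_testing
        Xtr, Xte, ytr, yte = train_test_split(
            X, y, test_size=1.0 / 6.0, random_state=rng.randint(1 << 30))
        search = GridSearchCV(SVC(kernel="rbf"), grid, cv=5).fit(Xtr, ytr)
        accs.append(search.score(Xte, yte))
    return float(np.mean(accs)), float(np.std(accs))
```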

5.1 Artificial datasets

To illustrate the ability of QHSVM graphically, 2-D artificial datasets with Gaussian distributions are adopted. Suppose the class +1 samples (i = 1,2,⋯,m+) follow the Gaussian distribution N(μ1,Σ1) with mean μ1 = [−0.38,−0.38]T and covariance matrix Σ1 = diag(0.1,0.1). Suppose the class −1 samples (j = 1,2,⋯,m−) follow the Gaussian distribution N(μ2,Σ2) with μ2 = [0.38,0.38]T and Σ2 = diag(0.03,0.03). Moreover, some samples with noise around the decision boundary, called noise samples, are introduced into the artificial datasets using an adjustable parameter θ, which is the ratio of the number of noise samples to the number of training samples. These noise samples affect the labels around the boundary: their labels are selected from {+1,−1} with equal probability, and their positions follow a Gaussian distribution with parameters μn = [0,0]T and Σn = diag(0.03,0.03).
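
The artificial data described above can be generated directly; the sketch below reproduces the stated means, covariances and boundary-noise mechanism (labels of noise samples drawn uniformly from {+1,−1}), with function and variable names of our own choosing.

```python
import numpy as np

def make_artificial(m_pos, m_neg, theta, seed=0):
    """2-D Gaussian dataset of Section 5.1 plus boundary noise samples;
    theta is the ratio of noise samples to training samples."""
    rng = np.random.RandomState(seed)
    Xp = rng.multivariate_normal([-0.38, -0.38], np.diag([0.10, 0.10]), m_pos)
    Xn = rng.multivariate_normal([0.38, 0.38], np.diag([0.03, 0.03]), m_neg)
    X = np.vstack([Xp, Xn])
    y = np.hstack([np.ones(m_pos), -np.ones(m_neg)])
    # noise samples around the decision boundary with random labels
    m_noise = int(round(theta * (m_pos + m_neg)))
    Xnz = rng.multivariate_normal([0.0, 0.0], np.diag([0.03, 0.03]), m_noise)
    ynz = rng.choice([1, -1], size=m_noise)
    return np.vstack([X, Xnz]), np.hstack([y, ynz])

X, y = make_artificial(100, 100, theta=0.10)   # D1 plus 10% noise samples
```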

Firstly, the dataset D1 with m+ = 100 and m− = 100 is built according to the above Gaussian distributions. Then, a noisy dataset is obtained by introducing noise samples with θ = 10% into D1. Fig 4 shows the training results on D1 and its noisy version for SVM, Pin-SVM, TWSVM, THSVM, QHSVM (τ = 0) and QHSVM (τ = 0.5) with the linear kernel. It can be seen from Fig 4(A-1) that the decision boundary of SVM is obtained from two parallel support hyper-planes, which are boundary hyper-planes. Compared with Fig 4(A-1), the boundary hyper-planes of SVM in Fig 4(A-2) change in position, which shows that SVM is adversely affected by the noise samples. The support hyper-planes of Pin-SVM are quantile hyper-planes; many samples lie between the two quantile hyper-planes, which dilutes the adverse impact of the noise samples. So, the decision boundary of Pin-SVM changes little between D1 and its noisy version. Different from SVM, TWSVM uses two nonparallel support hyper-planes to describe the two classes of samples. This attribute favors the description of the training dataset, especially datasets with cross planes. However, each support hyper-plane of TWSVM needs to be supported by a parallel boundary hyper-plane, so the noise samples also affect the nonparallel support hyper-planes of TWSVM. THSVM builds two support hyper-spheres: each hyper-sphere covers one class of samples and keeps away from the other class. THSVM maximizes the margin between the two classes and the inner-class clustering of samples, so its decision boundary becomes more reasonable. It can be seen from Fig 4(D-1) that the decision boundary of THSVM curves toward the clustering samples. However, its two support hyper-spheres are boundary hyper-spheres and are affected by the noise samples near the boundary. If τ = 0, the quantile hyper-spheres of QHSVM reduce to boundary hyper-spheres, so QHSVM (τ = 0) includes two boundary hyper-spheres with the same center for every class of samples and has attributes similar to THSVM. It is therefore clear that the decision boundary of QHSVM (τ = 0) is changed by the noise samples. QHSVM (τ = 0.5) builds two quantile hyper-spheres with the same center for every class of samples. Compared with the boundary hyper-spheres, some samples are added inside or outside of the quantile hyper-spheres, and these samples reduce the adverse impact caused by the noise samples around the decision boundary. So, the training results of QHSVM (τ = 0.5), including the support hyper-spheres and the decision boundary, do not change obviously between D1 and its noisy version. Moreover, the decision boundaries of QHSVM (τ = 0.5) for both datasets are reasonable, curving toward the clustering samples. All these results show that QHSVM has better performance because it integrates the excellent attributes of Pin-SVM, TWSVM and THSVM.

Fig 4. The training results for six classifiers with linear kernel.

(a-i) SVM, (b-i) Pin-SVM, (c-i) TWSVM, (d-i) THSVM, (e-i) QHSVM (τ = 0) and (f-i) QHSVM (τ = 0.5), where i = 1, 2. (a-1) ‒ (f-1) for the dataset D1 and (a-2) ‒ (f-2) for the dataset . The decision boundaries (blue thick solid curves), support hyper-planes or hyper-spheres (magenta and red thin solid curves) and the hyper-planes paralleling to support hyper-planes or the hyper-spheres having the same center of support hyper-spheres (magenta and red thin dashed curves). Two classes of samples (“·” and “+” in black) and noise samples (“·” and “+” in green).

https://doi.org/10.1371/journal.pone.0212361.g004

Then, the dataset D2 with m+ = 200 and m− = 200 is built. According to the prescribed rules, it is divided into training and testing datasets, and noise samples with θ = 0%, 5%, 10% and 20% are introduced into the training dataset respectively. The testing accuracies for the different classifiers with the linear kernel are shown in Table 1. For θ = 0%, THSVM and QHSVM have better classification accuracies than SVM and TWSVM, which shows that the nonparallel hyper-planes (hyper-spheres) and the inner-class clustering of samples strengthen the performance of the classifiers. For θ = 0%, the testing accuracy of Pin-SVM is lower than that of SVM. One possible reason is that there are some isolated samples in D2, as can be seen from Fig 4(B-1): the only error point in Fig 4(B-1) deviates from the class marked with “+” in black, and a quantile hyper-plane is sensitive to isolated samples as well as noise samples. For θ≠0%, QHSVM provides the best testing accuracy compared with the other classifiers. All these results show that QHSVM performs best in accuracy on datasets with noise samples, which is due to the pinball losses, the two nonparallel support hyper-spheres and the inner-class clustering of samples. For θ≠0%, the testing accuracy of Pin-SVM is higher than that of SVM, TWSVM and THSVM, which shows that the pinball loss can improve a classifier's performance on datasets with noise samples. The testing accuracy of Pin-SVM is lower than that of QHSVM because Pin-SVM lacks the attributes of inner-class clustering of samples and nonparallel support hyper-planes. Testing accuracies for the different classifiers with the nonlinear kernel are shown in Table 2. For all conditions, QHSVM has the best testing accuracy. Compared with Table 1, the testing accuracies of all classifiers in Table 2 are improved, which shows that the nonlinear kernel improves the classification results.

Table 1. The testing accuracies for linear classifiers on datasets with noise.

https://doi.org/10.1371/journal.pone.0212361.t001

Table 2. The testing accuracies for nonlinear classifiers on datasets with noise.

https://doi.org/10.1371/journal.pone.0212361.t002

Finally, in order to test the performance of ρ-QHSVM, the datasets D3 (m+ = m− = 100), D4 (m+ = m− = 400), D5 (m+ = m− = 700) and D6 (m+ = m− = 1000) are built, and noise samples with θ = 0% are introduced into these datasets. The nonlinear classifiers SVM, Pin-SVM, TWSVM, THSVM and QHSVM are tested for accuracy and speed, and the results are shown in Table 3. The conclusions from Table 3 are nearly the same as those from Table 2, which shows that QHSVM has excellent and stable performance on datasets of different scales. THSVM and TWSVM are faster than SVM, Pin-SVM and QHSVM because these two classifiers solve two smaller QPPs instead of the one large QPP used by SVM and Pin-SVM. The efficiency of QHSVM is the lowest because it solves two large QPPs to obtain better classification accuracy, so QHSVM is not suited to tasks with a high-efficiency requirement. To solve this problem, ρ-QHSVM with adjustable execution speed is proposed; it uses the parameter ε to adjust the execution speed. The accuracy and execution time of ρ-QHSVM with different ε on the different-scale datasets are shown in Table 3. The sparseness of the surrounding samples and clustering samples is controlled by ε, and the classification accuracy of ρ-QHSVM decreases as ε becomes small. This is because reducing ε means reducing the number of surrounding samples, and fewer surrounding samples inevitably reduce the classification accuracy for datasets with noise samples. For small-scale datasets, if ε is big, ρ-QHSVM is close to QHSVM and exceeds the other classifiers in accuracy; taking the dataset D3 as an example, when ε = 0.7, ρ-QHSVM is close to QHSVM in accuracy. For large-scale datasets, ρ-QHSVM is close to QHSVM in accuracy even when ε is small, and it also exceeds the other classifiers in accuracy. For the dataset D6, the classification accuracy of ρ-QHSVM exceeds that of Pin-SVM when ε = 0.3 and is close to that of QHSVM when ε = 0.4. It can be seen from Table 3 that the smaller ε is, the higher the efficiency of ρ-QHSVM is. When ε≤0.4, ρ-QHSVM is the fastest classifier, which shows that the efficiency of ρ-QHSVM can be adjusted by ε. The results in Table 3 show that the improvement in execution time brought by ρ-QHSVM is limited for small-scale datasets under the premise of high accuracy; however, for small-scale datasets this difference is insignificant because the execution time of all classifiers is small. For large-scale datasets, the execution time of ρ-QHSVM is reduced greatly under the premise of high accuracy; for example, ρ-QHSVM has high efficiency and testing accuracy for the dataset D6 when ε = 0.3. So, ρ-QHSVM is suited to large-scale datasets with a high-efficiency requirement.

Table 3. The testing accuracies and execution time for six classifiers on different-scale datasets with noise.

https://doi.org/10.1371/journal.pone.0212361.t003

5.2 UCI datasets with noise samples

In order to further test the performance of QHSVM, all classifiers are run on fifteen public benchmark datasets downloaded from the UCI Machine Learning Repository [33]. Ten small-scale or middle-scale datasets are used for testing accuracy, including Heart, Ionosphere, Breast, Thyroid, Australian, WPBC, Pima, German, Sonar and ILPD. Five large-scale datasets are used for testing accuracy and speed, including Wifi, Splice, Wilt, Musk and Spambase. The details of these original benchmark datasets are listed in Table 4. In order to highlight the anti-noise ability of QHSVM, benchmark datasets with noise samples are also tested: each benchmark dataset is corrupted by zero-mean Gaussian noise, and for each feature the ratio of the noise variance to the feature variance, denoted θ, is set to 0%, 5% and 10%. All original and corrupted benchmark datasets are normalized before training.
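
The feature-noise corruption used here is straightforward to reproduce; a short sketch, assuming min-max scaling for the final normalization step (the paper does not specify which normalization is used):

```python
import numpy as np

def corrupt_features(X, theta, seed=0):
    """Add zero-mean Gaussian noise to every feature so that the noise variance
    equals theta times the variance of that feature."""
    rng = np.random.RandomState(seed)
    std = np.sqrt(theta * X.var(axis=0))
    return X + rng.randn(*X.shape) * std

def normalize(X):
    """Scale each feature to [0, 1] (one common choice, assumed here)."""
    mn, mx = X.min(axis=0), X.max(axis=0)
    return (X - mn) / np.where(mx > mn, mx - mn, 1.0)
```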

Table 5 shows the testing accuracies of SVM, Pin-SVM, TWSVM, THSVM and QHSVM with nonlinear kernels on the ten benchmark datasets. It can be seen that QHSVM achieves the best testing accuracy on the majority of datasets. For the original benchmark datasets with θ = 0%, QHSVM and THSVM yield the best testing accuracy on 5 and 2 of the 10 datasets respectively, while SVM, Pin-SVM and TWSVM each yield the best testing accuracy on 1 of the 10 datasets. This result shows that the nonparallel hyper-spheres and inner-class clustering of samples in QHSVM and THSVM strengthen the performance of the classifiers. It should be pointed out that QHSVM has an obvious advantage on the corrupted benchmark datasets. For the corrupted benchmark datasets with θ = 5% and θ = 10%, QHSVM and Pin-SVM yield the best testing accuracies on 13 and 5 of the 20 datasets respectively, while SVM, TWSVM and THSVM yield the best testing accuracy on 2, 1 and 1 of the 20 datasets respectively. This result shows that QHSVM and Pin-SVM are better than the hinge-loss classifiers on the corrupted datasets. Moreover, on both the original and the corrupted benchmark datasets, QHSVM is superior to THSVM and Pin-SVM because it combines the merits of pinball losses, nonparallel hyper-spheres and inner-class clustering of samples. This conclusion is consistent with the experimental results on the artificial datasets.

Table 5. The testing accuracies of five classifiers on ten benchmark datasets.

https://doi.org/10.1371/journal.pone.0212361.t005

Table 6 shows the classification accuracies and execution time of SVM, Pin-SVM, TWSVM, THSVM and ρ-QHSVM with nonlinear kernels on the five large-scale datasets. In the previous section, it was found that ρ-QHSVM improves the execution efficiency on the large-scale artificial datasets under the premise of high accuracy, and this part of the experiment supports the same conclusion. According to the experimental results on the artificial datasets, the parameter ε of ρ-QHSVM is set to 0.3. Compared with the other classifiers, the execution time of ρ-QHSVM is the shortest, because ρ-QHSVM solves smaller QPPs with inequality constraints built on the sparse surrounding samples. On the other hand, ρ-QHSVM achieves the best testing accuracy on the majority of datasets. For the original benchmark datasets with θ = 0%, ρ-QHSVM yields the best testing accuracy on 3 of the 5 datasets, while for the corrupted benchmark datasets with θ = 5% and θ = 10%, ρ-QHSVM yields the best accuracy on 6 of the 10 datasets. The reason is that the local center-based density estimation method ensures a reasonable division into surrounding samples and clustering samples. In general, ρ-QHSVM has higher efficiency and accuracy than the other classifiers for large-scale datasets.

Table 6. The testing accuracies and execution time of five classifiers on five large-scale datasets.

https://doi.org/10.1371/journal.pone.0212361.t006

5.3 PASCAL VOC dataset

The PASCAL VOC dataset [34] is a public benchmark dataset that is often used in challenge competitions for supervised machine learning. The dataset is composed of color images of twenty visual object classes in realistic scenes. In the experiment, ten of these classes are chosen: person, cat, cow, dog, horse, sheep, bicycle, bus, car and motorbike. The color images are converted to intensity images and then resized by a factor s, where s is a positive real number, so that each image has 4096 pixels. So, each image is represented as a sample vector with 4096 elements. We choose 800 and 1600 vectors as training samples respectively, and the others are testing samples. In order to highlight the anti-noise ability, a noisy PASCAL VOC dataset is also built: the PASCAL VOC dataset is corrupted by zero-mean Gaussian noise, and for each feature the ratio of the noise variance to the feature variance, denoted θ, is set to 5%. For brevity, we build ten nonlinear binary classifiers with the one-against-rest method. The mean of the ten accuracies is presented in Fig 5.
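
The one-against-rest evaluation used for Fig 5 can be sketched as follows; SVC is again only a stand-in for the binary classifiers compared in the paper, and the averaging of the ten binary accuracies follows the description above.

```python
import numpy as np
from sklearn.svm import SVC

def one_against_rest_mean_accuracy(Xtr, ytr, Xte, yte, classes):
    """Train one binary classifier per class (that class vs. the rest) and
    return the mean of the per-class binary test accuracies."""
    accs = []
    for c in classes:
        clf = SVC(kernel="rbf").fit(Xtr, np.where(ytr == c, 1, -1))
        accs.append(clf.score(Xte, np.where(yte == c, 1, -1)))
    return float(np.mean(accs))
```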

Fig 5. The mean accuracies of five classifiers in two datasets.

(a) the PASCAL VOC dataset with noise, (b) the original PASCAL VOC dataset.

https://doi.org/10.1371/journal.pone.0212361.g005

It can be seen from Fig 5(A) that the performance of QHSVM is superior to that of SVM, Pin-SVM, TWSVM and THSVM on the noisy PASCAL VOC dataset. The result highlights that the pinball losses improve the anti-noise ability of QHSVM, and the nonparallel hyper-spheres strengthen the generalization performance of the classifier. Fig 5(B) shows that the average accuracy of QHSVM is not lower than that of the other classifiers, which indicates that QHSVM also performs reliably on the original PASCAL VOC dataset. The robustness of QHSVM is strengthened by maximizing the margin between the two hyper-spheres with the same center and maximizing the inner-class clustering of samples. Moreover, the two nonparallel quantile hyper-spheres improve the generalization of QHSVM. In addition, the performance of all classifiers improves as the number of training samples increases, and with more training samples the performance of all classifiers on the corrupted dataset is close to that on the original dataset. Notably, the accuracies of our classifier on the two datasets are close, which also shows that QHSVM has better robustness than the other classifiers.

5.4 Strip steel surface defects datasets

Strip steel surface defect datasets are obtained from the Northeastern University (NEU) surface database [35]. In the experiment, four typical defect datasets in the NEU surface database are investigated: patches (S1 Dataset), inclusion (S2 Dataset), scratches (S3 Dataset) and scale (S4 Dataset). Their typical images are shown in Fig 6. The defect images are extracted as defect samples, and each defect sample includes sixteen attributes, so each defect sample is a 16-dimensional vector. The related attributes have been described in our previous work [36]. Strip steel surface defect classification is therefore a multi-class classification problem. There are many multi-class classification methods based on binary classifiers, such as one-against-one, one-against-rest, decision directed acyclic graph and binary tree [37], and the binary tree model is the most widely used. Multi-class classifiers for SVM, Pin-SVM, TWSVM, THSVM and ρ-QHSVM can be built on the binary tree. According to the binary tree model, three QPPs need to be solved for SVM and Pin-SVM, while six QPPs need to be solved for TWSVM, THSVM and ρ-QHSVM.

Moreover, to obtain more samples, the strip steel surface defect datasets are augmented by rotation, distortion, translation and scaling. In the end, the strip steel surface defect datasets include 8000 samples, with 2000 samples for each type of defect. All parameters of the classifiers are obtained with the same method mentioned above, and the parameter ε of ρ-QHSVM is set to 0.3. The accuracies and execution time of all classifiers for all types of defects are shown in Table 7 and Fig 7 respectively. It can be seen from Table 7 that the accuracy of ρ-QHSVM is always the best for all types of defects. The accuracy of Pin-SVM is better than that of SVM, TWSVM and THSVM. The reason is that the strip steel surface defect datasets are usually corrupted by noise; it is well known that there is noise on the strip steel production line. So, the pinball losses in ρ-QHSVM and Pin-SVM work well for the strip steel surface defect datasets with noise samples, and the other attributes further improve the performance of ρ-QHSVM. Besides, the efficiency of ρ-QHSVM is high. TWSVM, THSVM and ρ-QHSVM are better than SVM and Pin-SVM in execution time, as shown in Fig 7. Though SVM and Pin-SVM only need to solve three QPPs for the four types of datasets, these QPPs are all large; TWSVM, THSVM and ρ-QHSVM solve smaller QPPs, which reduces the execution time. ρ-QHSVM has the fastest speed, which benefits from the local center-based density estimation method; the method improves the classification efficiency under the premise of high accuracy. In summary, ρ-QHSVM is well suited to strip steel surface defect classification.

Fig 7. Execution time of five classifiers for strip steel surface defects datasets.

https://doi.org/10.1371/journal.pone.0212361.g007

Table 7. The accuracies of five classifiers for all types of defects.

https://doi.org/10.1371/journal.pone.0212361.t007

6. Conclusions

A novel QHSVM classifier is proposed for pattern recognition in this paper. QHSVM has remarkable attributes: pinball losses, two nonparallel quantile hyper-spheres and inner-class clustering of samples. The quantile hyper-spheres ensure that QHSVM is insensitive to noise, especially the feature noise around the decision boundary. The robustness of the QHSVM algorithm is strengthened by maximizing the margin between the two hyper-spheres with the same center and maximizing the inner-class clustering of samples. Moreover, compared with the standard SVM model, the two nonparallel quantile hyper-spheres improve the generalization of QHSVM. On the other hand, in order to satisfy the high-efficiency requirement of large-scale classification, a new version of QHSVM with adjustable execution speed, called ρ-QHSVM, is proposed. Under the premise of high accuracy, ρ-QHSVM reduces the execution time; this benefits from the local center-based density estimation, which reasonably divides the training samples into surrounding samples and clustering samples. The proposed QHSVM and ρ-QHSVM are compared with SVM, Pin-SVM, TWSVM and THSVM through numerical experiments on artificial, benchmark and strip steel surface defect datasets with noise. The results show that QHSVM performs best in accuracy on datasets with noise samples, which is due to the pinball losses, the two nonparallel support hyper-spheres and the inner-class clustering of samples. The execution time of ρ-QHSVM is reduced greatly under the premise of high accuracy for large-scale datasets, especially the strip steel surface defect datasets, and ρ-QHSVM has the fastest speed thanks to the local center-based density estimation method. In future work, effective methods for finding the optimal parameters of QHSVM will be studied, and the application of QHSVM to unbalanced datasets will be investigated.

Supporting information

S1 Dataset. Patches dataset.

The first typical strip steel surface defects dataset.

https://doi.org/10.1371/journal.pone.0212361.s001

(ZIP)

S2 Dataset. Inclusion dataset.

The second typical strip steel surface defects dataset.

https://doi.org/10.1371/journal.pone.0212361.s002

(ZIP)

S3 Dataset. Scratches dataset.

The third typical strip steel surface defects dataset.

https://doi.org/10.1371/journal.pone.0212361.s003

(ZIP)

S4 Dataset. Scale dataset.

The fourth typical strip steel surface defects dataset.

https://doi.org/10.1371/journal.pone.0212361.s004

(ZIP)

S1 Text. Fifteen benchmark datasets.

UCI datasets.

https://doi.org/10.1371/journal.pone.0212361.s005

(TXT)

Acknowledgments

The authors would like to thank the editors and reviewers; their comments and advice have made the paper more solid. This work was supported by the Liaoning Province PhD Start-up Fund (No. 201601291), the Natural Science Foundation of Liaoning Province of China (No. 20180550067), the Liaoning Province Ministry of Education Scientific Study Project (No. 2017LNQN11 and No. 2016TSPY13) and the State Key Laboratory of Robotics and System (HIT) (No. SKLRS-2016-KF-08).

References

  1. Cortes C, Vapnik V. Support-vector networks. Machine Learning. 1995; 20(3): 273–297.
  2. Zhang Z, Chow TWS. Maximum margin multisurface support tensor machines with application to image classification and segmentation. Expert Systems with Applications. 2012; 39(1): 849–860.
  3. Maulik U, Chakraborty D. A novel semisupervised SVM for pixel classification of remote sensing imagery. International Journal of Machine Learning and Cybernetics. 2012; 3(3): 247–258.
  4. Chu M, Gong R, Wang A. Strip steel surface defect classification method based on enhanced twin support vector machine. ISIJ International. 2014; 54(1): 119–124.
  5. Suykens JAK, Vandewalle J. Least squares support vector machine classifiers. Neural Processing Letters. 1999; 9(3): 293–300.
  6. Fung G, Mangasarian OL. Proximal support vector machine classifiers. In: Proceedings of the Seventh ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. San Francisco: ACM SIGKDD; 2001. pp. 77–86.
  7. Schölkopf B, Smola AJ, Williamson RC, Bartlett PL. New support vector algorithms. Neural Computation. 2000; 12(5): 1207–1245. pmid:10905814
  8. Lin CF, Wang SD. Fuzzy support vector machines. IEEE Transactions on Neural Networks. 2002; 13(2): 464–471. pmid:18244447
  9. Huang X, Shi L, Suykens JAK. Support vector machine classifier with pinball loss. IEEE Transactions on Pattern Analysis and Machine Intelligence. 2014; 36(5): 984–997. pmid:26353231
  10. Jayadeva, Khemchandani R, Chandra S. Twin support vector machines for pattern classification. IEEE Transactions on Pattern Analysis and Machine Intelligence. 2007; 29(5): 905–910. pmid:17469239
  11. Mangasarian OL, Wild EW. Multisurface proximal support vector machine classification via generalized eigenvalues. IEEE Transactions on Pattern Analysis and Machine Intelligence. 2006; 28(1): 69–74. pmid:16402620
  12. Ye Q, Ye N, Yin T. Enhanced multi-weight vector projection support vector machine. Pattern Recognition Letters. 2014; 42: 91–100.
  13. Gu Z, Zhang Z, Sun J, Li B. Robust image recognition by L1-norm twin-projection support vector machine. Neurocomputing. 2017; 223: 1–11.
  14. Kumar MA, Gopal M. Least squares twin support vector machines for pattern classification. Expert Systems with Applications. 2009; 36(4): 7535–7543.
  15. Shao Y, Zhang C, Wang X, Dong N. Improvements on twin support vector machines. IEEE Transactions on Neural Networks. 2011; 22(6): 962–968. pmid:21550880
  16. Peng X. TPMSVM: a novel twin parametric-margin support vector machine for pattern recognition. Pattern Recognition. 2011; 44(10): 2678–2692.
  17. Qi Z, Tian Y, Shi Y. Laplacian twin support vector machine for semi-supervised classification. Neural Networks. 2012; 35: 46–53. pmid:22954478
  18. Ye Q, Zhao C, Gao S, Zheng H. Weighted twin support vector machines with local information and its application. Neural Networks. 2012; 35: 31–39. pmid:22944307
  19. Tax DMJ, Duin RPW. Support vector data description. Machine Learning. 2004; 54(1): 45–66.
  20. Hao PY, Chiang JH, Lin YH. A new maximal-margin spherical-structured multi-class support vector machine. Applied Intelligence. 2009; 30(2): 98–111.
  21. Peng X, Xu D. Twin support vector hypersphere (TSVH) classifier for pattern recognition. Neural Computing and Applications. 2014; 24(5): 1207–1220.
  22. Peng X, Xu D. A twin-hypersphere support vector machine classifier and the fast learning algorithm. Information Sciences. 2013; 221: 12–27.
  23. Xu Y, Yang Z, Zhang Y, Pan X, Wang L. A maximum margin and minimum volume hyper-spheres machine with pinball loss for imbalanced data classification. Knowledge-Based Systems. 2016; 95: 75–85.
  24. Peng X. Least squares twin support vector hypersphere (LS-TSVH) for pattern recognition. Expert Systems with Applications. 2010; 37(12): 8371–8378.
  25. Cawley GC. Leave-one-out cross-validation based model selection criteria for weighted LS-SVMs. In: International Joint Conference on Neural Networks. Vancouver, BC: IEEE; 2006. pp. 1661–1668.
  26. Xu H, Caramanis C, Mannor S. Robustness and regularization of support vector machines. Journal of Machine Learning Research. 2009; 10: 1485–1510.
  27. Yoon M, Yun Y, Nakayama H. A role of total margin in support vector machines. In: Proceedings of the International Joint Conference on Neural Networks. Portland, OR: IEEE; 2003. pp. 2049–2053.
  28. Gong R, Wu C, Chu M, Wang H. Twin pinball loss support vector hyper-sphere classifier for pattern recognition. In: 2016 Chinese Control and Decision Conference (CCDC). Yinchuan: IEEE; 2016. pp. 6551–6556.
  29. Xu Y, Yang Z, Pan X. A novel twin support-vector machine with pinball loss. IEEE Transactions on Neural Networks and Learning Systems. 2017; 28(2): 359–370. pmid:26766383
  30. Koenker R. Quantile regression. New York: Cambridge University Press; 2005.
  31. Steinwart I, Christmann A. Estimating conditional quantiles with the help of the pinball loss. Bernoulli. 2011; 17(1): 211–225.
  32. Peng X, Xu D. Bi-density twin support vector machines for pattern recognition. Neurocomputing. 2013; 99: 134–143.
  33. Dua D, Taniskidou EK. UCI Machine Learning Repository; 2017 [cited 2017 May]. Available from: http://archive.ics.uci.edu/ml.
  34. Everingham M, Van Gool L, Williams CKI, Winn J, Zisserman A. The PASCAL Visual Object Classes Challenge 2012 (VOC2012) Results; 2012 [cited 2012 February]. Available from: http://www.pascal-network.org/challenges/VOC/voc2012/workshop/index.html.
  35. Chu M, Gong R. Invariant feature extraction method based on smoothed local binary pattern for strip steel surface defect. ISIJ International. 2015; 55(9): 1956–1962.
  36. Song K, Yan Y. A noise robust method based on completed local binary patterns for hot-rolled steel strip surface defects. Applied Surface Science. 2013; 285: 858–864.
  37. Cheng L, Zhang J, Yang J, Ma J. An improved hierarchical multi-class support vector machine with binary tree architecture. In: International Conference on Internet Computing in Science and Engineering. Harbin: IEEE; 2008. pp. 106–109.