Computational Aspects of Regression Analysis of Interval Data
Authors: Michal Cerny
Abstract:
We consider linear regression models where both input data (the values of independent variables) and output data (the observations of the dependent variable) are interval-censored. We introduce a possibilistic generalization of the least squares estimator, the so-called OLS-set for the interval model. This set captures the impact of the loss of information caused by interval censoring on the OLS estimator and provides a tool for quantifying this effect. We study complexity-theoretic properties of the OLS-set. We also deal with restricted versions of the general interval linear regression model, in particular the crisp input – interval output model. We give an argument that natural descriptions of the OLS-set in the crisp input – interval output model cannot be computed in polynomial time. We then derive easily computable approximations of the OLS-set which can be used instead of the exact description. We illustrate the approach with an example.
Keywords: Linear regression, interval-censored data, computational complexity.
Digital Object Identifier (DOI): doi.org/10.5281/zenodo.1062994
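To make the crisp input – interval output case from the abstract concrete, here is a minimal sketch. It assumes a fixed design matrix X with full column rank and outputs y that range over a box [y_lo, y_hi]. The set of OLS estimates is then the image of that box under the linear map Q = (X^T X)^{-1} X^T, and its tightest axis-aligned enclosure follows from interval midpoints and radii. This is a generic interval-arithmetic bound and not necessarily the approximation derived in the paper; the function names and the data are hypothetical.

```python
def ols_operator(X):
    """Return Q = (X^T X)^{-1} X^T as a list of rows.

    Assumes X (a list of rows) has full column rank.
    """
    n, p = len(X), len(X[0])
    # Build the augmented matrix [X^T X | X^T]; Gauss-Jordan elimination
    # turns it into [I | (X^T X)^{-1} X^T].
    A = [
        [sum(X[k][i] * X[k][j] for k in range(n)) for j in range(p)]
        + [X[k][i] for k in range(n)]
        for i in range(p)
    ]
    for c in range(p):
        # Partial pivoting for numerical stability.
        piv = max(range(c, p), key=lambda r: abs(A[r][c]))
        if abs(A[piv][c]) < 1e-12:
            raise ValueError("X does not have full column rank")
        A[c], A[piv] = A[piv], A[c]
        d = A[c][c]
        A[c] = [v / d for v in A[c]]
        for r in range(p):
            if r != c:
                f = A[r][c]
                A[r] = [vr - f * vc for vr, vc in zip(A[r], A[c])]
    return [row[p:] for row in A]


def ols_box(X, y_lo, y_hi):
    """Componentwise bounds on b = Q y over all y with y_lo <= y <= y_hi."""
    Q = ols_operator(X)
    y_mid = [(lo + hi) / 2 for lo, hi in zip(y_lo, y_hi)]
    y_rad = [(hi - lo) / 2 for lo, hi in zip(y_lo, y_hi)]
    # Each coordinate of Q y is linear in y, so its range over the box is
    # (Q y_mid) +/- (|Q| y_rad).
    b_mid = [sum(q * m for q, m in zip(row, y_mid)) for row in Q]
    b_rad = [sum(abs(q) * r for q, r in zip(row, y_rad)) for row in Q]
    lower = [m - r for m, r in zip(b_mid, b_rad)]
    upper = [m + r for m, r in zip(b_mid, b_rad)]
    return lower, upper


# Hypothetical data: intercept plus one regressor, three observations,
# each output known only up to +/- 0.1.
X = [[1.0, 0.0], [1.0, 1.0], [1.0, 2.0]]
y_lo = [0.9, 1.9, 2.9]
y_hi = [1.1, 2.1, 3.1]
lower, upper = ols_box(X, y_lo, y_hi)
print("intercept in", (lower[0], upper[0]))
print("slope in", (lower[1], upper[1]))
```

The box bounds each coefficient separately, so it generally contains points that are not OLS estimates of any admissible data; the exact set in this setting is a linear image of a box and is in general strictly smaller than its enclosure.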