Course: Machine Learning

Objective: To introduce the basic concepts and methods of machine learning, and to analyze real-world applications of the main methods.

Instructors: José Alvarez and Igor Zwir

Prerequisites: Logic

Workload: One weekly 3-hour combined lecture and practical class

Credits: 2

Schedule

Contents

1. Introduction. Well-posed learning problems. Designing a learning system. Fundamental perspectives and issues.

2. Concept learning. The inductive learning hypothesis. General-to-specific ordering of hypotheses. Version spaces and the candidate-elimination algorithm. Concept learning as search. Inductive bias.
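
As a minimal illustration of the general-to-specific ordering, the Python sketch below implements only the specific-to-general Find-S step (candidate elimination, which maintains the full version space, is not shown). All function names are ours, and the attribute data loosely follow the EnjoySport example in Mitchell (1997); they are purely illustrative.

    # A minimal sketch of the specific-to-general side of concept learning
    # (Find-S): start with the most specific hypothesis and generalize it
    # just enough to cover each positive example.

    def find_s(examples):
        """Return the maximally specific conjunctive hypothesis consistent
        with the positive examples; '?' stands for "any value"."""
        hypothesis = None
        for attributes, label in examples:
            if not label:                      # Find-S ignores negative examples
                continue
            if hypothesis is None:             # first positive example: copy it verbatim
                hypothesis = list(attributes)
                continue
            for i, value in enumerate(attributes):
                if hypothesis[i] != value:     # generalize mismatching attributes to '?'
                    hypothesis[i] = "?"
        return hypothesis

    if __name__ == "__main__":
        data = [
            (("Sunny", "Warm", "Normal", "Strong"), True),
            (("Sunny", "Warm", "High", "Strong"), True),
            (("Rainy", "Cold", "High", "Strong"), False),
        ]
        print(find_s(data))                    # ['Sunny', 'Warm', '?', 'Strong']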

3. Decision tree learning. Decision tree representation. Appropriate problems for decision tree learning. The basic algorithm. The ID3, ASSISTANT and C4.5 algorithms. Decision tree learning as search. Inductive bias. Main issues in decision tree learning.
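
The sketch below isolates the attribute-selection step at the heart of ID3: the entropy of a labeled sample and the information gain of splitting on one attribute. It is a simplified illustration on a tiny hypothetical dataset, not the full tree-growing algorithm; names and data are ours.

    # Entropy and information gain, the selection criterion used by ID3.

    from collections import Counter
    from math import log2

    def entropy(labels):
        """Shannon entropy of a list of class labels."""
        total = len(labels)
        return -sum((n / total) * log2(n / total) for n in Counter(labels).values())

    def information_gain(rows, labels, attribute_index):
        """Entropy reduction obtained by partitioning the sample on one attribute."""
        partitions = {}
        for row, label in zip(rows, labels):
            partitions.setdefault(row[attribute_index], []).append(label)
        remainder = sum(len(part) / len(labels) * entropy(part)
                        for part in partitions.values())
        return entropy(labels) - remainder

    if __name__ == "__main__":
        rows = [("sunny", "hot"), ("sunny", "mild"), ("rain", "mild"), ("rain", "hot")]
        labels = ["no", "yes", "yes", "no"]
        for i in range(2):
            print(f"gain(attribute {i}) = {information_gain(rows, labels, i):.3f}")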

4. Bayesian learning. Bayes' theorem and concept learning. Maximum likelihood and least-squared-error hypotheses. The minimum description length principle. The Bayes optimal classifier. Bayesian belief networks. The EM algorithm.
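
As one concrete, count-based use of Bayes' theorem, the sketch below implements a naive Bayes classifier with Laplace smoothing (the naive independence assumption is discussed at length in the bibliography, e.g. Domingos and Pazzani 1997). The dataset, the smoothing rule and all names are illustrative assumptions of ours.

    # A minimal categorical naive Bayes classifier with Laplace smoothing.

    from collections import Counter, defaultdict

    def train(rows, labels):
        """Estimate class counts and per-attribute value counts per class."""
        priors = Counter(labels)
        cond = defaultdict(Counter)            # (attribute index, class) -> value counts
        for row, label in zip(rows, labels):
            for i, value in enumerate(row):
                cond[(i, label)][value] += 1
        return priors, cond

    def predict(row, priors, cond):
        """Return the class maximizing P(class) * prod_i P(value_i | class)."""
        total = sum(priors.values())
        best_label, best_score = None, -1.0
        for label, count in priors.items():
            score = count / total
            for i, value in enumerate(row):
                counts = cond[(i, label)]
                # Laplace smoothing keeps unseen values from zeroing the product
                score *= (counts[value] + 1) / (sum(counts.values()) + len(counts) + 1)
            if score > best_score:
                best_label, best_score = label, score
        return best_label

    if __name__ == "__main__":
        rows = [("sunny", "hot"), ("sunny", "mild"), ("rain", "mild"), ("rain", "cool")]
        labels = ["no", "yes", "yes", "yes"]
        priors, cond = train(rows, labels)
        print(predict(("sunny", "cool"), priors, cond))    # 'yes'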

5. Instance-based learning. k-nearest neighbor learning. Locally weighted regression. Radial basis functions. Case-based reasoning.
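
A minimal sketch of k-nearest-neighbor classification with Euclidean distance and majority voting; the points, labels and the choice k = 3 are purely illustrative.

    # k-nearest-neighbor classification: distance ranking plus majority vote.

    from collections import Counter
    from math import dist

    def knn_predict(points, labels, query, k=3):
        """Classify `query` by majority vote among its k nearest neighbors."""
        neighbors = sorted(zip(points, labels),
                           key=lambda pair: dist(pair[0], query))[:k]
        votes = Counter(label for _, label in neighbors)
        return votes.most_common(1)[0][0]

    if __name__ == "__main__":
        points = [(0.0, 0.0), (0.2, 0.1), (1.0, 1.0), (0.9, 1.1), (1.1, 0.9)]
        labels = ["a", "a", "b", "b", "b"]
        print(knn_predict(points, labels, (0.8, 1.0), k=3))    # 'b'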

6. Genetic algorithms. Genetic algorithms as search through a hypothesis space. Genetic programming. Models of evolution and learning.
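
A minimal sketch of a generational genetic algorithm searching a space of bit-string hypotheses, with fitness-proportional selection, single-point crossover and bit-flip mutation. The "one-max" fitness function and every parameter value are illustrative assumptions, not part of the course material.

    # A generational genetic algorithm maximizing the one-max fitness.

    import random

    def fitness(bits):
        return sum(bits)                       # one-max: number of 1s in the string

    def select(population):
        """Roulette-wheel (fitness-proportional) selection of one parent."""
        return random.choices(population,
                              weights=[fitness(p) + 1 for p in population])[0]

    def crossover(a, b):
        point = random.randrange(1, len(a))    # single crossover point
        return a[:point] + b[point:]

    def mutate(bits, rate=0.01):
        return [1 - b if random.random() < rate else b for b in bits]

    def evolve(pop_size=30, length=20, generations=40):
        population = [[random.randint(0, 1) for _ in range(length)]
                      for _ in range(pop_size)]
        for _ in range(generations):
            population = [mutate(crossover(select(population), select(population)))
                          for _ in range(pop_size)]
        return max(population, key=fitness)

    if __name__ == "__main__":
        random.seed(0)
        best = evolve()
        print(fitness(best), best)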

7. Learning sets of rules. Sequential covering algorithms. Learning first-order rules. The FOIL program. Induction as inverted deduction. Inverting resolution.
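
A simplified, propositional sketch of sequential covering: greedily grow one conjunctive rule until it excludes the negative examples, add it to the rule set, drop the positives it covers, and repeat. FOIL works over first-order literals and uses a refined gain heuristic; this sketch, with hypothetical data and names of our own, only conveys the covering loop.

    # Sequential covering over attribute=value rules (propositional sketch).

    def covers(rule, row):
        return all(row[i] == v for i, v in rule)

    def learn_one_rule(positives, negatives):
        """Greedily add the test that best separates positives from negatives."""
        rule = []
        while negatives:
            candidates = {(i, v) for row in positives for i, v in enumerate(row)} - set(rule)
            if not candidates:
                break
            best = max(candidates,
                       key=lambda test: sum(covers(rule + [test], r) for r in positives)
                                        - sum(covers(rule + [test], r) for r in negatives))
            rule.append(best)
            positives = [r for r in positives if covers(rule, r)]
            negatives = [r for r in negatives if covers(rule, r)]
        return rule

    def sequential_covering(positives, negatives):
        rules = []
        while positives:
            rule = learn_one_rule(positives, negatives)
            if not rule:
                break
            rules.append(rule)
            positives = [r for r in positives if not covers(rule, r)]
        return rules

    if __name__ == "__main__":
        pos = [("sunny", "hot"), ("sunny", "mild"), ("rain", "mild")]
        neg = [("rain", "hot"), ("rain", "cool")]
        print(sequential_covering(pos, neg))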

8. Analytical learning. Definition and relation to inductive learning. Learning with perfect domain theories. Explanation-based learning.

9. Combining inductive and analytical learning. Using prior knowledge to initialize the hypothesis. Using prior knowledge to alter the search objective.

10. Reinforcement learning. Q-learning. Non-deterministic rewards and actions. Generalizing from examples. Relationship to dynamic programming.
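
To close the contents, a minimal sketch of tabular Q-learning on a five-state deterministic chain: moving right from state 3 into the terminal state 4 yields reward 1. The environment, the alpha/gamma/epsilon values and all names are illustrative assumptions of ours, not part of the course material.

    # Tabular Q-learning with an epsilon-greedy behavior policy.

    import random
    from collections import defaultdict

    N_STATES = 5                           # states 0..4; state 4 is terminal
    ACTIONS = (+1, -1)                     # move right / move left
    ALPHA, GAMMA, EPSILON = 0.5, 0.9, 0.1  # learning rate, discount, exploration

    def step(state, action):
        """Deterministic transition; reward 1 only when the terminal state is reached."""
        nxt = min(max(state + action, 0), N_STATES - 1)
        return nxt, (1.0 if nxt == N_STATES - 1 else 0.0), nxt == N_STATES - 1

    def q_learning(episodes=300):
        q = defaultdict(float)             # (state, action) -> estimated value
        for _ in range(episodes):
            state, done = 0, False
            while not done:
                if random.random() < EPSILON:              # explore
                    action = random.choice(ACTIONS)
                else:                                      # exploit current estimates
                    action = max(ACTIONS, key=lambda a: q[(state, a)])
                nxt, reward, done = step(state, action)
                target = reward + GAMMA * max(q[(nxt, a)] for a in ACTIONS)
                # one-step Q-learning update toward the bootstrapped target
                q[(state, action)] += ALPHA * (target - q[(state, action)])
                state = nxt
        return q

    if __name__ == "__main__":
        random.seed(0)
        q = q_learning()
        print([round(max(q[(s, a)] for a in ACTIONS), 2) for s in range(N_STATES)])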

Bibliography

Aha, David. “Feature weighting for lazy learning algorithms”, in Liu, H. and Motoda, H. (eds.): Feature Extraction, Construction and Selection: A Data Mining Perspective, Norwell, MA, Kluwer, 1998, Ch. 1.

Ali, Kamal M. and Pazzani, Michael J. “Error Reduction through Learning Multiple Descriptions”, Machine Learning, 24(3), 1996.

Atkeson, Christopher; Moore, Andrew W. and Schaal, Stefan. “Locally Weighted Learning”, Artificial Intelligence Review, 11, 1997, pp. 11-73.

Bauer, Eric and Kohavi, Ron. “An Empirical Comparison of Voting Classification Algorithms: Bagging, Boosting, and Variants”, Machine Learning, 36(1/2), July/August 1999, pp. 105-139.

Baxter, Rohan A. and Oliver, Jonathan J. “MDL and MML: Similarities and Differences (Introduction to Minimum Encoding Inference - Part III)”, Department of Computer Science, Monash University, Tech Report 207, 1994.

Bay, Stephen D. “Combining Nearest Neighbor Classifiers Through Multiple Feature Subsets”, Proceedings of the International Conference on Machine Learning, Morgan Kaufmann Publishers, Madison, WI, 1998.

Bloedorn, Eric and Michalski, Ryszard S. “Data-Driven Constructive Induction: A Methodology and Its Applications”, IEEE Intelligent Systems, Special issue on Feature Transformation and Subset Selection, March/April 1998, pp. 30-37.

Blum, Avrim L. and Langley, Pat. “Selection of Relevant Features and Examples in Machine Learning”, Artificial Intelligence, 1997.

Bradford, Jeffrey; Kunz, Clayton; Kohavi, Ron and Brodley, Carla E. “Pruning Decision Trees with Misclassification Costs”, ECML-98 (long version), 1998.

Breslow, Leonard A. and Aha, David W. “Simplifying Decision Trees: A Survey”, Knowledge Engineering Review, 1, 1997, pp. 1-40.

Chaudhry, Anurag and Holder, Lawrence B. “An Empirical Approach to Solving the General Utility Problem in Speedup Learning”, in F. D. Anger and M. Ali (eds.): Machine Reasoning, Gordon and Breach Science Publishers, 1996.

Chown, Eric and Dietterich, Thomas G. “A Divide-and-Conquer Approach to Learning from Prior Knowledge”, Department of Computer Science, Oregon State University, Technical Report, 1998.

Cunningham, Sally Jo. “Dataset cataloging metadata for machine learning applications and research”, Proc. International Artificial Intelligence and Statistics Workshop, Ft. Lauderdale, Florida, 1997.

De Jong, Kenneth A. and Spears, William M. “A Formal Analysis of the Role of Multi-Point Crossover in Genetic Algorithms”, Annals of Mathematics and Artificial Intelligence Journal, 5(1), 1992, pp. 1-26.

Dietterich, Thomas G. “Fundamental Experimental Research in Machine Learning”, in McCarthy, John (ed.): Basic Topics in Experimental Computer Science, 1997.

Dietterich, Thomas G. “Approximate Statistical Tests for Comparing Supervised Classification Learning Algorithms”, Neural Computation, 10(7) 1895-1924, 1998.

Dietterich, Thomas G. “An Experimental Comparison of Three Methods for Constructing Ensembles of Decision Trees: Bagging, Boosting, and Randomization”, Machine Learning, 1-22 (1999).

Dietterich, Thomas G. and Flann, Nicholas S. “Explanation-Based Learning and Reinforcement Learning: A Unified View”, Machine Learning, 28, 169-214, 1997.

Dietterich, Tom; Kearns, Michael and Mansour, Yishay. “Applying the Weak Learning Framework to Understand and Improve C4.5”, Proceedings of the Thirteenth International Conference on Machine Learning, pp. 96-104, 1996.

Dietterich, Thomas G. and Kong, Eun Bae. “Machine Learning Bias, Statistical Bias, and Statistical Variance of Decision Tree Algorithms”, Technical Report, Department of Computer Science, Oregon State University, 1995.

Domingos, Pedro. “Unifying Instance-Based and Rule-Based Induction”, Machine Learning, 24, 141-168, 1996.

Domingos, Pedro. “Why Does Bagging Work? A Bayesian Account and Its Implications”, Proceedings of the Third International Conference on Knowledge Discovery and Data Mining, Newport Beach, CA, AAAI Press, 1997.

Domingos, Pedro and Pazzani, Michael. “On the Optimality of the Simple Bayesian Classifier under Zero-One Loss”, Machine Learning, 29, 103-130, 1997.

Dougherty, James; Kohavi, Ron and Sahami, Mehran. “Supervised and Unsupervised Discretization of Continuous Features”, in Armand Prieditis and Stuart Russell (eds.): Machine Learning: Proceedings of the Twelfth International Conference, Morgan Kaufmann Publishers, San Francisco, CA, 1995.

Friedman, Jerome; Kohavi, Ron and Yun, Yeogirl. “Lazy Decision Trees”, AAAI-96, 1996, pp. 717-724.

Fürnkranz, Johannes. “Separate-and-Conquer Rule Learning”, Artificial Intelligence Review 13(1) 1999.

Giordana, A.; Neri, F.; Saitta, L. and Botta, M. “Integrating Multiple Learning Strategies in First Order Logics”, Machine Learning, 27, 1997, pp. 209-240.

Gordon, Diana F. and des Jardins, Marie. “Evaluation and Selection of Biases in Machine Learning”, Machine Learning, 20(1/2), 1995.

Hall, Mark and Smith, Lloyd A. “Practical Feature Subset Selection for Machine Learning”, Proc. 21st Australasian Computer Science Conference, 1998.

Hirsh, Haym. “The Computational Complexity of the Candidate-Elimination Algorithm”, Technical Report ML-TR-36, Rutgers University, 1992.

Holmes, Geoffrey. “Discovering Inter-Attribute Relationships”, in N. Kasabov et al. (eds.): Proc. Fourth International Conference on Neural Information Processing and Intelligent Information Systems, Dunedin, 1997, pp. 914-917.

Holte, Robert C. “Very Simple Classification Rules Perform Well on Most Commonly Used Datasets”, Machine Learning, 11, 1993, pp. 63-91.

Kaelbling, Leslie Pack; Littman, Michael L. and Moore, Andrew W. “Reinforcement Learning: A Survey”, Journal of Artificial Intelligence Research, 4, 1996, pp. 237-285.

Keogh, Eamonn and Pazzani, Michael J. “Learning Augmented Bayesian Classifiers: A Comparison of Distribution-based and Classification-based Approaches”, Uncertainty 99, 7th International Workshop on AI and Statistics, Ft. Lauderdale, Florida, 1999, pp. 225-230.

Kohavi, Ron; Becker, Barry and Sommerfield, Dan. “Improving Simple Bayes”, ECML-97, 1997.

Kohavi, Ron and John, George H. “Automatic Parameter Selection by Minimizing Estimated Error”, in Armand Prieditis and Stuart Russell (eds.): Machine Learning: Proceedings of the Twelfth International Conference, Morgan Kaufmann Publishers, San Francisco, CA, 1995.

Kohavi, Ron and John, George H. “The Wrapper Approach”, in Liu, Huan and Motoda, Hiroshi (eds.): Feature Selection for Knowledge Discovery and Data Mining, Kluwer International Series in Engineering and Computer Science, 1998, Ch. 1.

Kohavi, Ron and Provost, Foster. “Glossary of Terms”, Machine Learning, 30(2/3), 271-274, 1998.

Kohavi, Ron and Sahami, Mehran. “Error-Based and Entropy-Based Discretization of Continuous Features”, KDD-96, 1996.

Kohavi, Ron; Sommerfield, Dan and Dougherty, James. “Data Mining using MLC++ - A Machine Learning Library in C++”, International Journal on Artificial Intelligence Tools, 6(4), 1997, pp. 537-566.

Kohavi, Ron and Wolpert, David H. “Bias Plus Variance Decomposition for Zero-One Loss Functions”, Machine Learning: Proceedings of the Thirteenth International Conference, 1996.

Korf, Richard E. “Artificial Intelligence Search Algorithms”, in Atallah, M.J. (ed.): CRC Handbook of Algorithms and Theory of Computation, CRC Press, Boca Raton, FL, 1998, pp. 36-1 to 36-20.

Kubat, Miroslav; Bratko, Ivan and Michalski, Ryszard S. “A Review of Machine Learning Methods”, in Michalski, R.; Bratko, I. and Kubat, M. (eds.): Machine Learning and Data Mining: Methods and Applications, Wiley, New York, 1998.

Langley, Pat and Simon, Herbert A. “Applications of machine learning and rule induction”, Communications of the ACM, 38(11), 1995, pp. 54-64.

Liu, Huan and Setiono, Rudy. “Some Issues on Scalable Feature Selection”, Expert Systems with Applications, 15, 1998a, pp. 333-339.

Liu, Huan and Setiono, Rudy. “Incremental Feature Selection”, Applied Intelligence, 9(3), November/December 1998b, pp. 217-230.

Lu, Hongjun; Setiono, Rudy and Liu, Huan. “NeuroRule: A Connectionist Approach to Data Mining”, Proceedings of VLDB '95, 1995, pp. 478-489.

Mataric, Maja J. “Reward Functions for Accelerated Learning”, in Cohen, William W. and Hirsh, Haym (eds.): Machine Learning: Proceedings of the Eleventh International Conference, Morgan Kaufmann Publishers, San Francisco, CA, 1994, pp. 181-189.

Michalski, Ryszard. “Inferential Theory of Learning: Developing Foundations for Multistrategy Learning”, in Michalski, R.S. and Tecuci, G. (eds.): Machine Learning: A Multistrategy Approach, Vol. IV, Morgan Kaufmann Publishers, 1994.

Mitchell, Tom. Machine Learning. McGraw-Hill, Boston, 1997.

Mitchell, Tom M. “Machine Learning and Data Mining”, Communications of the ACM, Vol. 42, No. 11, November 1999.

Mitchell, Tom M. “The Role of Unlabeled Data in Supervised Learning”, Proceedings of the Sixth International Colloquium on Cognitive Science, San Sebastian, Spain, 1999a.

Mitchell, T. and Thrun, S. “Learning Analytically and Inductively”, in Steier and Mitchell (eds.): Mind Matters: A Tribute to Allen Newell, Erlbaum, 1995.

Moriarty, David E.; Schultz, Alan C. and Grefenstette, John J. “Evolutionary Algorithms for Reinforcement Learning”, Journal of Artificial Intelligence Research, February 1999.

Musick, Ron; Catlett, Jason and Russell, Stuart. “Decision Theoretic Subsampling for Induction on Large Databases”, Machine Learning, 1993.

Nilsson, N. Introduction to Machine Learning – An early draft of a proposed textbook, Stanford, 1996.

Oates, Tim and Jensen, David. “The Effects of Training Set Size on Decision Tree Complexity”, Proceedings of the Fourteenth International Conference on Machine Learning, 1997, pp. 254-261.

Oates, Tim and Jensen, David. “Large Datasets Lead to Overly Complex Models: an Explanation and a Solution”, Proceedings of the Fourth International Conference on Knowledge Discovery and Data Mining, 1998, pp. 294-298.

Oates, Tim and Jensen, David. “Toward a Theoretical Understanding of Why and When Decision Tree Pruning Algorithms Fail”, Proceedings of the Sixteenth National Conference on Artificial Intelligence, 1999, pp. 372-378.

Oliver, Jonathan J. “Decision Graphs - An Extension of Decision Trees”, Department of Computer Science, Monash University, Technical Report No. 92/173, 1993; a shorter version appears in Proceedings of the Fourth International Workshop on Artificial Intelligence and Statistics, 1993.

Oliver, Jonathan J. and Baxter, Rohan A. “MML and Bayesianism: Similarities and Differences (Introduction to Minimum Encoding Inference - Part II)”, Department of Computer Science, Monash University, Tech Report 206, 1994.

Oliver, Jonathan; Baxter, Rohan A. and Wallace, Chris S. “Unsupervised Learning Using MML”, in Saitta, Lorenza (ed.): Machine Learning: Proceedings of the Thirteenth International Conference (ICML 96), Morgan Kaufmann Publishers, San Francisco, 1996, pp. 364-372.

Oliver, Jonathan J. and Hand, David. “Introduction to Minimum Encoding Inference”, Department of Statistics, Open University, Tech Report 4-94; Department of Computer Science, Monash University, Tech Report 205, 1994.

Oliver, Jonathan J. and Hand, David J. “On Pruning and Averaging Decision Trees”, Machine Learning: Proceedings of the Twelfth International Workshop, 1995, pp. 430-437.

Pazzani, Michael J. “Searching for dependencies in Bayesian classifiers”, Artificial Intelligence and Statistics IV, Lecture Notes in Statistics, Springer-Verlag, New York, 1997.

Piater, Justus H.; Cohen, Paul R.; Zhang, Xiaoqin and Atighetchi, Michael. “A Randomized ANOVA Procedure for Comparing Performance Curves”, Proceedings of the Fifteenth International Conference on Machine Learning, 1998, pp. 430-438.

Provost, Foster; Fawcett, Tom and Kohavi, Ron. “The Case Against Accuracy Estimation for Comparing Induction Algorithms”, ICML-98 Workshop on the Methodology of Applying Machine Learning, 1998.

Provost, Foster; Jensen, David and Oates, Tim. “Efficient Progressive Sampling”, Proceedings of KDD-99, International Conference on Knowledge Discovery and Data Mining, 1999.

Quinlan, J. Ross. “Improved Use of Continuous Attributes in C4.5”, Journal of Artificial Intelligence Research 4 (1996a) 77-90.

Quinlan, J. Ross. “Bagging, Boosting, and C4.5”, AAAI'96, 1996b.

Rumelhart, David E.; Widrow, Bernard and Lehr, Michael A. “The basic ideas in neural networks”, Communications of the ACM, 37(3), 87-92, 1994.

Russell, Stuart and Norvig, Peter. Artificial Intelligence – A Modern Approach, Prentice-Hall, Upper Saddle River, 1995.

Sarle, Warren S. “Neural Networks and Statistical Models”, Proceedings of the Nineteenth Annual SAS Users Group International Conference, April, 1994.

Sebag, Michèle. “Using Constraints to Building Version Spaces”, in De Raedt, L. and Bergadano, F. (eds.): 7th European Conference on Machine Learning (ECML '94), Springer-Verlag, April 1994, pp. 257-271.

Sebag, Michèle. “Delaying the Choice of Bias: A Disjunctive Version Space Approach”, in Saitta, L. (ed.): 13th International Conference on Machine Learning (ICML '96), Bari, Italy, Morgan Kaufmann, 1996.

Singh, Satinder P. “An (almost) Tutorial on Reinforcement Learning”, in Learning to Solve Markovian Decision Tasks (doctoral dissertation), 1994, Chs. 2, 3 and 4.

Sutton, Richard. “On the Significance of Markov Decision Processes”, in Gerstner, W.; Germond, A.; Hasler, M. and Nicoud, J.-D. (eds.): Artificial Neural Networks - ICANN'97, Springer, 1997, pp. 273-282.

Sutton, Richard S. “Open Theoretical Questions in Reinforcement Learning”, in Fischer, P. and Simon, H.U. (eds.): Computational Learning Theory, Proceedings EuroCOLT'99, 1999, pp. 11-17.

Tadepalli, Prasad and Natarajan, Balas K. “A Formal Framework for Speedup Learning from Problems and Solutions”, Journal of Artificial Intelligence Research, 4, 1996, pp. 445-475.

Thrun, Sebastian. “Is Learning The n-th Thing Any Easier Than Learning The First?”, in Touretzky, D. and Mozer, M. (eds.): Advances in Neural Information Processing Systems (NIPS) 8, MIT Press, 1996.

Thrun, S.B.; Bala, J.; Bloedorn, E.; Bratko, I.; Cestnik, B.; Cheng, J.; De Jong, K.; Dzeroski, S.; Fahlman, S.E.; Fisher, D.; Hamann, R.; Kaufman, K.; Keller, S.; Kononenko, I.; Kreuziger, J.; Michalski, R.S.; Mitchell, T.; Pachowicz, P.; Reich, Y. et al. “The Monk's Problems - A Performance Comparison of Different Learning Algorithms”, Carnegie Mellon University, Technical Report CMU-CS-91-197, December 1991.

Thrun, Sebastian and Schwartz, Anton. “Finding Structure in Reinforcement Learning”, in Tesauro, G.; Touretzky, D. and Leen, T. (eds.): Advances in Neural Information Processing Systems (NIPS) 7, MIT Press, 1995.

Webb, Geoffrey I. and Pazzani, Michael J. “Adjusted Probability Naive Bayesian Induction”, 11th Australian Joint Conference on Artificial Intelligence, Brisbane, QLD, Australia, 1998.

Wettschereck, Dietrich; Aha, David W. and Mohri, Takao. “A Review and Empirical Evaluation of Feature Weighting Methods for a Class of Lazy Learning Algorithms”, Artificial Intelligence Review, 11, 1997, pp. 273-314.

Wilson, D. Randall and Martínez, Tony R. “Reduction Techniques for Exemplar-Based Learning Algorithms”, Machine Learning, 1998.

Yang, Yiming; Ault, Thomas and Pierce, Thomas. “Combining multiple learning strategies for effective cross validation”, Language Technologies Institute and Computer Science Department, Carnegie Mellon University, 1999.

Zheng, Zijian. “Naive Bayesian Classifier Committees”, Proceedings of the 10th European Conference on Machine Learning (ECML-98), Berlin, Springer-Verlag, 1998, pp. 196-207.