Mining the Web 2.0 for Better Search
Ricardo Baeza-Yates, Yahoo! Research
There are several semantic sources that can be found in the Web that are either explicit, e.g. Wikipedia, or implicit, e.g. derived from Web usage data. Most of them are related to user generated content (UGC) or what is today called Web 2.0. In this talk we show several applications of mining the wisdom of crowds behind UGC to improve search. We will show live demos to find relations in Wikipedia or to improve image search, as well as our current research on the topic. Our final goal is to produce a virtuous data feedback circuit to leverage the Web itself.
Ricardo Baeza-Yates is VP of Research for Europe and Latin America, leading the Yahoo! Research labs at Barcelona, Spain and Santiago, Chile, and also supervising the lab in Haifa, Israel. Until 2005 he was the director of the Center for Web Research at the Department of Computer Science of the Engineering School of the University of Chile; and ICREA Professor and founder of the Web Research Group at the Dept. of Information and Communication Technologies of Univ. Pompeu Fabra in Barcelona, Spain. He maintains ties with both mentioned universities as a part-time professor for the Ph.D. program.
His research interests include algorithms and data structures, information retrieval, web mining, text and multimedia databases, software and database visualization, and user interfaces.
We present the AgreementMaker system for matching real-world
ontologies, which may consist of hundreds or even thousands of
concepts. The end users of the system are sophisticated domain experts
whose needs have driven the design and implementation of the system:
they require a responsive, powerful, and extensible framework to perform,
evaluate, and compare matching methods. The system comprises a wide
range of matching methods addressing different levels of granularity
of the components being matched (conceptual vs. structural), the
amount of user intervention that they require (manual vs. automatic),
their usage (stand-alone vs. composed), and the types of components
to consider (schema only or schema and instances). Performance
measurements (recall, precision, and runtime) are supported by the
system, along with the weighted combination of the results provided by
those methods. The AgreementMaker has been used and tested in
practical applications and in the Ontology Alignment Evaluation
Initiative (OAEI) competition. We report here on some of its most
advanced features, including its extensible architecture that
facilitates the integration and performance tuning of a variety of
matching methods, its capability to evaluate, compare, and combine
matching results, and its user interface with a control panel that
drives all the matching methods and evaluation strategies.
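The evaluation and combination steps mentioned in the abstract can be illustrated with a minimal sketch. It is not AgreementMaker code: the function names, the dictionary representation of similarity scores, and the selection threshold are assumptions made for illustration only.

```python
from typing import Dict, Set, Tuple

Mapping = Tuple[str, str]        # (source concept, target concept)
Scores = Dict[Mapping, float]    # similarity in [0, 1] per candidate mapping


def combine(results: Dict[str, Scores], weights: Dict[str, float]) -> Scores:
    """Weighted average of the similarity scores produced by several matchers.

    A pair that a matcher did not score contributes 0 for that matcher.
    """
    total = sum(weights.values())
    combined: Scores = {}
    for name, scores in results.items():
        for pair, sim in scores.items():
            combined[pair] = combined.get(pair, 0.0) + weights[name] * sim / total
    return combined


def select(scores: Scores, threshold: float) -> Set[Mapping]:
    """Keep the candidate mappings whose combined similarity reaches the threshold."""
    return {pair for pair, sim in scores.items() if sim >= threshold}


def precision_recall(found: Set[Mapping], reference: Set[Mapping]) -> Tuple[float, float]:
    """Precision and recall of a computed alignment against a reference alignment."""
    correct = len(found & reference)
    precision = correct / len(found) if found else 0.0
    recall = correct / len(reference) if reference else 0.0
    return precision, recall


if __name__ == "__main__":
    # Hypothetical outputs of two matchers on a toy pair of ontologies.
    results = {
        "lexical": {("Author", "Writer"): 0.8, ("Author", "Paper"): 0.6, ("Article", "Paper"): 0.4},
        "structural": {("Author", "Writer"): 0.6, ("Author", "Paper"): 0.2, ("Article", "Paper"): 0.8},
    }
    weights = {"lexical": 0.5, "structural": 0.5}
    reference = {("Author", "Writer"), ("Article", "Paper"), ("Venue", "Conference")}

    alignment = select(combine(results, weights), threshold=0.5)
    print(sorted(alignment))
    print(precision_recall(alignment, reference))
```

Changing the weights or the threshold and recomputing precision and recall is the kind of comparison between matching methods that the abstract describes.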
Isabel Cruz is Professor of Computer Science in the College of Engineering at the University of Illinois at Chicago. She holds a PhD degree in Computer Science from the University of Toronto. She has received numerous awards for teaching, service, and research including the NSF Career Award and recently the Great Cities Institute Scholar Award in 2009. She has more than 90 refereed publications in databases, geographic information systems, semantic web, visual languages, graph drawing, user interfaces, multimedia, information retrieval, and security. The broader impact of her research has been demonstrated in several real-world scenarios, including biomedical and geospatial interdisciplinary applications, and in many teaching innovations.
She has been general chair, program committee chair or program committee vice/associate chair of several of the most important conferences in her fields of expertise, such as IEEE ICDE, ACM GIS, International Conference on GeoSpatial Semantics, ACM Multimedia, ISWC, and ESWC, and has served on the program committees of more than 130 conferences. She has been editor of several books and journals and has served on strategic committees including the Mapping Science Committee of the National Academies.
She directs the Advances in Information Systems (ADVIS) Lab, has supervised sixteen students through their PhD and MS theses, and has published with fourteen of her undergraduate research advisees. Her leadership roles at UIC include being the CS facilitator of the NSF Advance WISEST program (to increase the number, participation and leadership status of women in academic science and engineering), chair of the CS faculty search committee, and twice a member of the dean of engineering search committee.
To integrate information, data in different formats, from different, potentially overlapping sources, must be related and transformed to meet the users' needs. Ten years ago, Clio introduced a new paradigm for creating declarative schema mappings to describe the relationship between data in heterogeneous schemas. This enabled powerful tools for mapping discovery and integration code generation, greatly simplifying the integration process. In this talk, I take a look at where our field was a decade ago and where it is now in terms of support for data integration. I share a vision for raising the level of abstraction further, to better isolate applications from the details of how the integration is accomplished. Integration independence allows applications to be independent of how, when, and where information integration takes place, making materialization and the timing of transformations an optimization decision that is transparent to applications. I identify a number of research challenges that remain to be addressed in order to ultimately achieve this vision.
Renee J. Miller is a professor of computer science and the Bell Canada Chair of Information Systems at the University of Toronto. She received the US Presidential Early Career Award for Scientists and Engineers (PECASE), the highest honor bestowed by the United States government on outstanding scientists and engineers beginning their careers. She received an NSF CAREER Award, the Premier's Research Excellence Award, and an IBM Faculty Award. She is a fellow of the ACM. Her research interests are in the efficient, effective use of large volumes of complex, heterogeneous data. This interest spans data integration and exchange, inconsistent and uncertain data management, and data curation and cleaning. She serves on the Board of Trustees of the VLDB Endowment and was elected to serve as VLDB President beginning January 2010. She serves on the editorial board for the VLDB Journal, IEEE's TKDE, and the new JIDM. She received her PhD in Computer Science from the University of Wisconsin, Madison and bachelor's degrees in Mathematics and Cognitive Science from MIT.
Self-Indexing XML
Gonzalo Navarro, Universidad de Chile
Self-indexing is a technology that integrates text compression and text indexing, such that a text collection can be simultaneously compressed and indexed. The resulting representation, called a self-index of the text, takes space close to that of the compressed text, is able to reproduce any text substring, and offers indexed searching of the collection. This has been a major breakthrough in the last decade, allowing one to handle huge text collections within main memory by representing them succinctly.
In this talk I will, besides presenting the basics of this technique, discuss how it can be extended to index XML collections, where, on one hand, the text has structure and, on the other hand, we wish to support much more complex query languages, XPath at least. I will first describe two plug-and-play solutions, where a text self-index is coupled with compressed data structures that handle trees and sequences. Then I will introduce two more innovative solutions, where the compressed data structures are designed specifically for this problem. This area is full of open challenges and I will conclude by enumerating the most relevant ones.
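To make the notion of a self-index concrete, here is a toy sketch in the style of an FM-index, one well-known family of self-indexes. It is an illustration under simplifying assumptions, not the technique presented in the talk: the Burrows-Wheeler transform is stored uncompressed, the suffix array is built naively, and the rank function `occ` is a linear scan, whereas practical self-indexes use compressed rank structures. All function names are hypothetical.

```python
from collections import Counter
from typing import Dict, Tuple

SENTINEL = "\0"  # assumed not to occur in the text; sorts before every other symbol


def build_fm_index(text: str) -> Tuple[str, Dict[str, int]]:
    """Return the BWT of text and C, where C[c] counts symbols smaller than c."""
    text += SENTINEL
    # Naive suffix array construction, adequate only for a toy example.
    suffix_array = sorted(range(len(text)), key=lambda i: text[i:])
    bwt = "".join(text[i - 1] for i in suffix_array)
    counts = Counter(bwt)
    first_row: Dict[str, int] = {}
    total = 0
    for symbol in sorted(counts):
        first_row[symbol] = total
        total += counts[symbol]
    return bwt, first_row


def occ(bwt: str, symbol: str, i: int) -> int:
    """Number of occurrences of symbol in bwt[:i] (linear scan in this sketch)."""
    return bwt[:i].count(symbol)


def count(bwt: str, C: Dict[str, int], pattern: str) -> int:
    """Count occurrences of pattern using backward search over the BWT."""
    lo, hi = 0, len(bwt)
    for symbol in reversed(pattern):
        if symbol not in C:
            return 0
        lo = C[symbol] + occ(bwt, symbol, lo)
        hi = C[symbol] + occ(bwt, symbol, hi)
        if lo >= hi:
            return 0
    return hi - lo


def extract(bwt: str, C: Dict[str, int]) -> str:
    """Recover the original text from the index alone, walking backwards via LF-mapping."""
    row = 0  # row 0 is the suffix that consists of the sentinel only
    symbols = []
    for _ in range(len(bwt) - 1):
        symbol = bwt[row]
        symbols.append(symbol)
        row = C[symbol] + occ(bwt, symbol, row)
    return "".join(reversed(symbols))


if __name__ == "__main__":
    bwt, C = build_fm_index("abracadabra")
    print(count(bwt, C, "abra"), extract(bwt, C))
```

The point of the sketch is that the original text is never stored: both searching and text recovery run on the transformed sequence, which is what allows a self-index to replace the text it indexes.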
Gonzalo Navarro earned his PhD in Computer Science in 1998 from the University of Chile, where he is currently Full Professor. His areas of interest include algorithms and data structures, text searching, compression, and metric space searching.
He is currently a researcher at the Millennium Institute for Cell Dynamics and Biotechnology, and runs a Fondecyt project funded by CONICYT (the Chilean agency for research funding). He has headed the Center for Web Research (a Millennium Nucleus funded by the Chilean government); RIBIDI, a Latin American project funded by CYTED; a project funded by Yahoo! Research; and three other Fondecyt projects. He has participated in several research projects such as ECOS/CONICYT (Chile-France cooperation), AMYRI (a previous CYTED project), and five other national projects (Fondecyt and Fondef).
He is or has been PC (co-)chair of several conferences: SPIRE 2001, SCCC 2004 and a track of ENC 2007 (published by IEEE CS Press); SPIRE 2005 and IFIP TCS 2006 (published by Springer); and SIGIR 2005 Posters (published by ACM). He co-created the SISAP conference in 2008 (published by IEEE CS Press). He is a member of the Steering Committees of LATIN and SISAP, and of the Editorial Boards of the Information Retrieval journal and the ACM Journal of Experimental Algorithmics. He has been a PC member of 45 national and international conferences and a reviewer for around 40 international journals. He has given around 45 invited talks at several universities and international conferences, including six plenary talks at conferences.
He has coauthored a book published by Cambridge University Press, 15 book chapters, 6 international conference proceedings (as editor), a special issue of an international journal (as editor), 78 papers in international journals, 142 papers in international conferences, and 24 in national and regional conferences. According to Thomson ISI, two of his papers rank within the 1% most cited in Computer Science. According to ResearchIndex, he is ranked at position 2,387 among the most cited authors in Computer Science, first within Chile.
Database transformations (queries, views, mappings) take apart, filter, and recombine source data in order to populate warehouses, materialize views, and provide inputs to analysis tools. As they do so, applications often need to track the relationship between parts and pieces of the sources and parts and pieces of the transformations' output. This relationship is what we call database provenance.
This tutorial presents an approach to database provenance that relies on two observations. First, provenance is a kind of annotation, and we can develop a general approach to annotation propagation that also covers other applications, for example to uncertainty and access control. In fact, provenance turns out to be the most general kind of such annotation, in a precise and practically useful sense. Second, the propagation of annotation through a broad class of transformations relies on just two operations: one when annotations are jointly used and one when they are used alternatively. This leads to annotations forming a specific algebraic structure, a commutative semiring.
The semiring approach works for annotating tuples, field values and attributes in standard relations, in nested relations (complex values), and for annotating nodes in (unordered) XML. It works for transformations expressed in the positive fragment of relational algebra, nested relational calculus, unordered XQuery, as well as for Datalog, GLAV schema mappings, and tgd constraints. Specific semirings correspond to earlier approaches to provenance, while others correspond to forms of uncertainty, trust, cost, and access control.
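The two operations can be illustrated with a small sketch of semiring-annotated relations. It covers only a binary join and a projection, and the `Semiring` class, the function names, and the three example semirings (multiplicity, set membership, minimum cost) are illustrative assumptions rather than code from the Orchestra or pPOD projects.

```python
import operator
from dataclasses import dataclass
from typing import Any, Callable, Dict, Sequence


@dataclass(frozen=True)
class Semiring:
    zero: Any                          # annotation of an absent tuple
    one: Any                           # neutral annotation for joint use
    plus: Callable[[Any, Any], Any]    # alternative use (union, projection)
    times: Callable[[Any, Any], Any]   # joint use (join)


Relation = Dict[tuple, Any]  # tuple -> annotation


def join(s: Semiring, r1: Relation, r2: Relation) -> Relation:
    """Join R(a, b) with S(b, c) on b; annotations of the joined tuples are multiplied."""
    out: Relation = {}
    for (a, b), k1 in r1.items():
        for (b2, c), k2 in r2.items():
            if b == b2:
                out[(a, b, c)] = s.times(k1, k2)
    return out


def project(s: Semiring, r: Relation, cols: Sequence[int]) -> Relation:
    """Keep the given columns; annotations of tuples that collapse together are added."""
    out: Relation = {}
    for row, k in r.items():
        key = tuple(row[i] for i in cols)
        out[key] = s.plus(out.get(key, s.zero), k)
    return out


def query(s: Semiring, r1: Relation, r2: Relation) -> Relation:
    """Project the join of R and S onto its first and last columns."""
    return project(s, join(s, r1, r2), cols=(0, 2))


COUNTING = Semiring(0, 1, operator.add, operator.mul)       # bag semantics
BOOLEAN = Semiring(False, True, operator.or_, operator.and_)  # set semantics
MIN_COST = Semiring(float("inf"), 0, min, operator.add)     # cheapest derivation

if __name__ == "__main__":
    R = {("a", "b"): 2, ("a", "c"): 1}
    S = {("b", "d"): 3, ("c", "d"): 4}
    print(query(COUNTING, R, S))   # annotations read as multiplicities
    print(query(MIN_COST, R, S))   # the same annotations read as costs
```

The output tuple ("a", "d") has two alternative derivations, each using one tuple of R jointly with one tuple of S, so its annotation has the shape (x times z) plus (y times w); only the interpretation of plus and times changes between semirings.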
This is joint work with J.N. Foster, T.J. Green, Z. Ives, and G. Karvounarakis, done in part within the frameworks of the Orchestra and pPOD projects.
Workshop Venue
Travel and Accommodation
Getting There
The Ministro Pistarini International Airport is located 22 km (13.6 miles) south-southwest of
Buenos Aires, the capital of Argentina. The airport covers an area of 3475 hectares (8586 acres)
and it is operated by Aeropuertos Argentina 2000 S.A.
The airport is named after a general and politician (1882-1956), but is more commonly known as
Ezeiza International Airport because of its location in the city of Ezeiza, on the outskirts of
Buenos Aires. It is the country's largest international airport and a hub for the international
routes of Aerolineas Argentinas.
Ezeiza International Airport was recently expanded and is located about 30 minutes
away from downtown. To get downtown you can book an airport transfer or take a bus or a taxi.
It relies on the most suitable and advanced technology and infrastructure to operate large aircraft.
Thus, Ezeiza International Airport has become the most important airport in the country and handles 90%
of all the international air traffic arriving in and departing from Argentina.
AMW Official Hotels
AMW 2010 has made special arrangements with the following hotels in Buenos Aires, offering special rates for people registered at AMW.
See AMW 2010 map.
To get the special rates for AMW, you must contact the official travel agency,
Endless Emotions.
Due to the limited number of rooms at these special rates, we suggest making your reservations as soon as possible.
-
Melia Buenos Aires Hotel is a 5-star hotel located in the Puerto Madero district, just 7 km from the Ministro Pistarini Airport. This highly luxurious accommodation offers a perfect location close to San Martin Square and Florida Street.
Standard room:
Spacious, bright and comfortable rooms measuring approximately 42 square metres. All have views of the city and are designed in a classic, welcoming style. Equipped with broadband Internet access and WiFi. The hotel also has the Azorin Restaurant, the Machado Cafe & Restaurant, and the Gongora lobby bar.
Services: 24-hour room service, Wake-up call (via Serviexpress and automatic programming), Serviexpress customer care service, Express laundry service, Shoe-shine service, Babysitting service.
Equipment:
Broadband Internet connection and WiFi, 29" TV, Pay per view, Radio clock, 2 telephone lines, Minibar, Safe, Bathrobes, Individually controlled central heating, Smoke detectors and sprinklers, Desk, Kettle, Thermal/acoustic system (double glazing), Bathroom (Bathtub, Bathroom supplies, Hairdryer, Telephone).
Rate (double or single room): U$S 130 plus tax (21%); Includes buffet breakfast and access to the Health Center
-
The Hilton Buenos Aires hotel is ideally situated in the Puerto Madero District, one of the newest and most picturesque neighborhoods in the city. Because the hotel is in Puerto Madero, you'll be just a short walk from the financial district, prominent tourist attractions, numerous restaurants and nightlife, with quick access routes to both the domestic and international airports.
Contemporary in design and style, the Hilton Hotel Buenos Aires boasts a breathtaking glass atrium lobby which stands seven stories high and is covered with a 7,535-square-foot glass roof, plus two additional Executive Floors which complete the architectural facade.
Deluxe room:
The 42 sq. m (452 sq. ft) Deluxe Rooms have a contemporary design and offer a comfortable work area with high-speed internet access; a 32-inch LCD TV; multiple-line phones; an electronic safe which perfectly fits your laptop and valuable possessions; a spacious walk-in closet; and a beautiful marble bathroom.
Rate (double/single room): U$S 220 plus tax (21%)
-
Located in downtown Buenos Aires and just a few steps away from the most outstanding points of interest in the city, whether tourist, business or cultural (the Stock Exchange, the financial area, cinemas, theatres, shopping malls and the reclaimed Puerto Madero).
Room:
Telephone with direct dialing, Fax, DDI and DDN, Cable television, Internet Access, Air conditioning and central heating, Bar, Concierge and Room service 24 hours.
Rate (double or single room): U$S 69; Includes breakfast
-
Room:
Rate (superior double or single room): U$S 88 plus tax (21%); Includes
buffet breakfast and access to facilities
-
Room:
Rate (double or single room): U$S 110 plus tax (21%); Includes buffet
breakfast and Internet Wi-Fi access
The Official Travel Agency of the Workshop is Endless Emotions.
- Maipu 726 3rd floor, of. "B"
- Ph: +5411-43285376-77
- Fax: +5411-52566526
- Contact:
Local Information
About Argentina
Located in South America, Argentina has an area of almost 3.8 million square kilometers, 2.8
million of them on the continent, where approximately 54% is plains (grasslands and savannahs),
23% plateaus and the other 23% mountains, and the remainder in the Antarctic. It is 3,800 km long and
lies between latitudes 22º and 55º south. Its border with Uruguay, Brazil, Paraguay, Bolivia
and Chile has a total length of 9,376 km, while its Atlantic coastline
is 4,725 km long.
Geography
Argentina’s main characteristic is the enormous contrast between the immense eastern plains and
the impressive Andes mountain range to the west. The Andes form the frontier with Chile and boast the
highest peak in the Western hemisphere: the 6,959 m high Aconcagua.
From Jujuy to Tierra del Fuego, the Andes present marvelous contrasts: the northwest plateaus,
the lake region, and the forests and glaciers of Patagonia.
To the north, Chaco is a forested area linked to the Bermejo, Salado and Pilcomayo rivers.
Between the Paraná and Uruguay rivers, the Argentine Mesopotamia (provinces of Entre Ríos,
Corrientes and Misiones) is formed by low hills, where pools and marshlands evidence the ancient
courses of these great rivers. In some places within the subtropical rain forest, there are
fissures which provide such spectacular phenomena as the Iguazú Falls.
The Pampas, in the center of Argentina, is the largest and best-known area of plains. Agricultural
and livestock activities are performed in this area, which includes the province of Buenos Aires,
the northeast of La Pampa, the south of Córdoba and south of Santa Fe. To the south, the plains
give way to small hills in Tandil and de la Ventana, and to the west, to the Córdoba hills.
Towards the south, from the Andes to the sea, there appear the stony plateaus of Patagonia,
swept by the wind during most of the year. The Atlantic coast, lined with high cliffs,
forms massive indentations like the Peninsula Valdés, with its spectacular and unique colonies
of sea animals.
Climate
The country’s territory offers a wide variety of climates: subtropical in the North, sub-Antarctic
in southern Patagonia, and mild and humid in the Pampas plains. The average temperature from November
to March is 23° C, and 12° C from June to September.
Population
Argentina’s current population is more than 38 million inhabitants, almost half of whom live in
the city and the province of Buenos Aires. Population density calculated on a national basis is
14 inhabitants per square kilometer.
Language
Spanish is the official language of the Argentine Republic. In Buenos Aires, some “lunfardo”
(city slang) expressions are used.
About Buenos Aires
Buenos Aires, the capital city of Argentina, is the starting point for any visit to Argentina.
It is a cosmopolitan city with vibrant nightlife. Buenos Aires doesn't sleep.
Enjoy its cultural events, main theatres, cinemas, architectural glories and the magical tango shows.
Moving around
-
• Subway or Metro:
The Subte (as it is called in Buenos Aires), or Metro, is the fastest, most frequent (every 5 minutes) and safest public transport in the city.
Buenos Aires was the first Latin American city with a subway system (since 1913). In that decade only 10 countries had an underground network: England, France, Turkey, Austria, Hungary, Germany, Greece, USA and Argentina. The current network of Buenos Aires consists of 5 lines: A, B, C, D and E. The A Line (Alem to Caballito) is the line that preserves historic trains.
The ticket costs USD 0.40. It is open from 5:30 am to 11:00 pm.
-
• Urban Bus:
Urban buses, commonly known as "Colectivos", form a network of more than 150 bus lines, with more than 15,000 buses running through the streets of Buenos Aires. The vast majority of buses are adapted for passengers with disabilities.
-
• Railway:
The train is the fastest and most widely used way to enter the city. From the Retiro Train Station you can travel to the B.A. north zone, from Constitution Station to the south of Buenos Aires province, and from Lacroze Station to the northwest zone.
-
• Taxis:
Taxis are easy to identify because they are painted yellow and black (roof), and there are 33 thousand of them in the city. A 10-minute ride costs about USD 7.
-
• Remises:
Remises are a service similar to taxis, with the difference that the fare is calculated only from the kilometers traveled and the service can be requested only by phone. They do not have the typical yellow and black colors of taxis.