[05:02] http://www.gazopa.com/ just found out about this
[07:40] I have been searching for a few days now and didn't find it...
[07:40] does someone here have a tool or know how to extract data from a few hundred xml's?
[07:45] http://pastebin.com/2gfBGmkK
[07:45] this is one of the xml files:
[07:46] then there is this line:
[07:46]
[07:46] and I would like to extract Verdrag tot regeling van zekere wetsconflicten ten aanzien van cheques; (met Protocol) Genève, 19 maart 1931 from it
[07:46] and that also needs to be done for the other lines, like the lines with the dates
[07:46] someone knows how I can do this?
[07:55] arkhive: you could use for example Python + lxml for that
[07:55] using xpath
[07:59] +1 for lxml
[08:00] arkhive, warning ahead: lxml is dependency hell, probably even moreso on Windows
[08:00] but once you get it running, and figure out how to use it despite their godawful docs
[08:00] it'll work beautifully
[08:00] lol
[08:00] Wait a minute, I looked at the XML. > valign="top"
[08:00] I don't want to know.
[08:04] joepie91 and w0rp: thank you, I have to go now, but I'll try it out in 4 hours...
[08:04] if y:D
[08:05] I'll let you know how it goes then
[08:06] I... am pretty sure that's not valid in XHTML
[08:06] lol
[08:06] let me guess
[08:07] "oh, we need to upgrade to XHTML, the newest technology... let me just add / before the end of every single tag"
[08:07] arkiver: alright :P
[08:07] also, sorry arkhive for dinging you again :(
[08:11] "This CSS thing is just a fad, and there's no way future browsers will be able to parse it efficiently in the future. We need to base everything on XML. Styles can go on the elements, and if you want to change the style you can transform the style attributes with XSLT. It will be very elegant! You'll be able to use the same parser for everything!"
[08:11] (I hope that this never happened in reality.)
[08:12] (Hidden comedy being that it would eventually apply styles via CSS through JavaScript.)
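Note on the XML extraction question above (the "Python + lxml, using xpath" suggestion): a minimal sketch of that approach, under stated assumptions. The element names `title` and `date`, the `SAMPLE` document, and the helper names `extract_fields` and `extract_from_dir` are hypothetical, because the real structure of the pastebin file is not shown in the log. To keep the sketch free of third-party installs it uses the standard library's `xml.etree.ElementTree`, which supports only a subset of XPath. `lxml.etree` offers the same `find`/`findall` calls plus a full `xpath()` method, so the code should carry over with little change.

```python
import xml.etree.ElementTree as ET
from pathlib import Path

# Hypothetical sample document; the real files' element names are unknown.
SAMPLE = """<record>
  <title>Verdrag tot regeling van zekere wetsconflicten ten aanzien van cheques; (met Protocol) Genève, 19 maart 1931</title>
  <date>19 maart 1931</date>
</record>"""

# Assumed mapping of field name -> path expression (ElementTree's XPath subset).
PATHS = {"title": ".//title", "date": ".//date"}


def extract_fields(root, paths):
    """Return {field: text} for the first element matching each path, or None."""
    result = {}
    for name, path in paths.items():
        node = root.find(path)
        has_text = node is not None and node.text
        result[name] = node.text.strip() if has_text else None
    return result


def extract_from_dir(directory, paths):
    """Run extract_fields over every .xml file in a directory."""
    return {
        p.name: extract_fields(ET.parse(str(p)).getroot(), paths)
        for p in sorted(Path(directory).glob("*.xml"))
    }


if __name__ == "__main__":
    print(extract_fields(ET.fromstring(SAMPLE), PATHS))
```

For a few hundred files, `extract_from_dir("some_folder", PATHS)` would return one dictionary per file, which can then be written out as CSV or similar. The folder name here is a placeholder.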
[08:28] w0rp: I... would not discount the possibility of that actually having occurred
[08:28] :|
[08:28] and that is a very disconcerting thought
[13:06] ???
[13:06] http://news.cnet.com/8301-1023_3-57617497-93/yahoo-tops-the-most-trafficked-web-site-list-for-desktop-in-us/
[13:13] i think thats cause of yahoo toolbar or something
[13:13] it just doesn't die
[13:17] it's because we backup ALL the data
[13:17] http://www.usatoday.com/story/life/people/2014/01/23/justin-bieber-arrested-for-drag-racing-dui-in-miami/4792013/
[13:18] this is the best thursday ever
[13:18] rduser: did you grab it yet
[13:18] grab the story? nah
[13:18] it's in my head now.
[13:19] i seent it to archivebot
[15:08] drag racing? now we get to see how much privilege he has. that should autoqualify him for deportation
[15:08] at 4am in the morning XD
[15:08] oh, well at least he was doing it at the right time
[15:08] lol
[15:58] SketchCow: https://www.youtube.com/watch?v=xt7spKKbNo4
[15:58] this seems like something worth featuring in the console living room
[15:59] but a quick search gave me 0 results :P
[16:51] hmm. is anybody archiving absolutely god-awful banner ads? http://cdn.adnxs.com/p/95/94/6a/8a/95946a8a70cb9ca37ffd4d66ce6a2da1.gif looks like a doozy ;)
[16:52] ow my eyes
[16:52] tell me about it... sorry, I saw that one on eBay and just had to share
[16:54] You attempted to reach cdn.adnxs.com, but instead you actually reached a server identifying itself as a248.e.akamai.net.
[16:54] lol
[16:54] LOL
[16:54] oh god
[16:54] one of those
[16:54] they're still around?!
[16:56] the ol' mental filter usually doesn't even notice them these days fnord, but that one is so obnoxious it managed it. I suppose someone's proud...
[16:58] hahaha
[16:58] Baljem: adblock :)
[16:59] yeah, it's got to a point that adblock isn't so much a workaround for the odd annoyance, but more essential for the ol' blood pressure
[17:00] I should probably install it on the office machine...
[19:51] joepie91: no worries. heh.
[19:51] I'm taking a SQL intro class
[19:51] kinda neat
[19:52] err.. i gotta restart my imac for some stupid java update but Colloquy IRC is a pain to reconnect (hard to figure out)
[23:25] I offered this guy 35 USD for it and he counter offered at 45 USD
[23:25] the shirt
[23:26] http://www.ebay.com/itm/WebTV-Internet-T-Shirt-Philips-Magnavox-Promo-Clothing-NOS-Brand-New-XL-/390493732777?pt=LH_DefaultDomain_0&hash=item5aeb3d7fa9&autorefresh=true
[23:26] it's buy it now at 50 USD
[23:35] yeeah
[23:35] oea.larc.nasa.gov is in the archive