#archiveteam-bs 2014-01-23,Thu

Time Nickname Message
05:02 🔗 arkhive http://www.gazopa.com/ just found out about this
07:40 🔗 arkiver I have been searching for a few days now and haven't found it...
07:40 🔗 arkiver does anyone here have a tool, or know how to extract data from a few hundred XML files?
07:45 🔗 arkiver http://pastebin.com/2gfBGmkK
07:45 🔗 arkiver this is one of the XML files:
07:46 🔗 arkiver then there is this line:
07:46 🔗 arkiver <meta content="Verdrag tot regeling van zekere wetsconflicten ten aanzien van cheques; (met Protocol) Genève, 19 maart 1931" name="DC.title" scheme=""/>
07:46 🔗 arkiver and I would like to extract Verdrag tot regeling van zekere wetsconflicten ten aanzien van cheques; (met Protocol) Genève, 19 maart 1931 from it
07:46 🔗 arkiver and that also needs to be done for the other lines, like the lines with the dates
07:46 🔗 arkiver does anyone know how I can do this?
07:55 🔗 joepie91 arkhive: you could use, for example, Python + lxml for that
07:55 🔗 joepie91 using XPath
07:59 🔗 w0rp +1 for lxml
08:00 🔗 joepie91 arkhive, warning ahead: lxml is dependency hell, probably even more so on Windows
08:00 🔗 joepie91 but once you get it running, and figure out how to use it despite their godawful docs
08:00 🔗 joepie91 it'll work beautifully
08:00 🔗 joepie91 lol
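A minimal sketch of the extraction arkiver asked about, pulling the `content` of `<meta name="DC.title" ...>` out of many files. It swaps in Python's standard-library `xml.etree.ElementTree` for the lxml that joepie91 suggested, purely to avoid the dependency problems mentioned above. It assumes the files are well-formed XML/XHTML (the pastebin sample was not checked). The helper names (`local_name`, `extract_meta`, `extract_to_csv`) are hypothetical, and `DC.date` in the usage note is only a guess at the name used on "the lines with the dates".

```python
import csv
import glob
import sys
import xml.etree.ElementTree as ET


def local_name(tag):
    """Drop a '{namespace}' prefix so <meta> matches with or without the XHTML namespace."""
    # Comments/processing instructions (if the parser keeps them) have non-str tags.
    return tag.rsplit("}", 1)[-1] if isinstance(tag, str) else ""


def extract_meta(root, wanted=("DC.title",)):
    """Return {name: content} for every <meta name="..." content="..."> whose name is in `wanted`."""
    found = {}
    for el in root.iter():  # walks the root element itself and all its descendants
        name = el.get("name")
        if local_name(el.tag) == "meta" and name in wanted:
            found[name] = el.get("content", "")
    return found


def extract_to_csv(pattern, out_path, wanted=("DC.title",)):
    """Parse every file matching `pattern` and write one CSV row per file."""
    with open(out_path, "w", newline="", encoding="utf-8") as out:
        writer = csv.writer(out)
        writer.writerow(["file", *wanted])
        for path in sorted(glob.glob(pattern)):
            try:
                root = ET.parse(path).getroot()
            except ET.ParseError as err:
                # e.g. HTML entities such as &nbsp; are undefined in plain XML
                print(f"skipping {path}: {err}", file=sys.stderr)
                continue
            meta = extract_meta(root, wanted)
            writer.writerow([path, *(meta.get(n, "") for n in wanted)])
```

Usage, with a hypothetical directory name: `extract_to_csv("xml/*.xml", "metadata.csv", ("DC.title", "DC.date"))`. With lxml installed, roughly the same lookup is a single XPath query, `lxml.etree.parse(path).xpath("//*[local-name()='meta'][@name='DC.title']/@content")`, and `lxml.html.parse` is more forgiving if the files turn out not to be well-formed XML.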
08:00 🔗 w0rp Wait a minute, I looked at the XML. > valign="top"
08:00 🔗 w0rp I don't want to know.
08:04 🔗 arkiver joepie91 and w0rp: thank you, I have to go now, but I'll try it out in 4 hours...
08:04 🔗 arkiver if y:D
08:05 🔗 arkiver I'll let you know how it goes then
08:06 🔗 joepie91 I... am pretty sure that's not valid in XHTML
08:06 🔗 joepie91 lol
08:06 🔗 joepie91 let me guess
08:07 🔗 joepie91 "oh, we need to upgrade to XHTML, the newest technology... let me just add / before the end of every single tag"
08:07 🔗 joepie91 arkiver: alright :P
08:07 🔗 joepie91 also, sorry arkhive for dinging you again :(
08:11 🔗 w0rp "This CSS thing is just a fad, and there's no way future browsers will be able to parse it efficiently in the future. We need to base everything on XML. Styles can go on the elements, and if you want to change the style you can transform the style attributes with XSLT. It will be very elegant! You'll be able to use the same parser for everything!"
08:11 🔗 w0rp (I hope that this never happened in reality.)
08:12 🔗 w0rp (Hidden comedy being that it would eventually apply styles via CSS through JavaScript.)
08:28 🔗 joepie91 w0rp: I... would not discount the possibility of that actually having occurred
08:28 🔗 joepie91 :|
08:28 🔗 joepie91 and that is a very disconcerting thought
13:06 🔗 joepie91 ???
13:06 🔗 joepie91 http://news.cnet.com/8301-1023_3-57617497-93/yahoo-tops-the-most-trafficked-web-site-list-for-desktop-in-us/
13:13 🔗 godane i think that's because of the yahoo toolbar or something
13:13 🔗 godane it just doesn't die
13:17 🔗 midas it's because we backup ALL the data
13:17 🔗 rduser http://www.usatoday.com/story/life/people/2014/01/23/justin-bieber-arrested-for-drag-racing-dui-in-miami/4792013/
13:18 🔗 rduser this is the best thursday ever
13:18 🔗 midas rduser: did you grab it yet
13:18 🔗 rduser grab the story? nah
13:18 🔗 rduser it's in my head now.
13:19 🔗 godane i sent it to archivebot
15:08 🔗 Aranje drag racing? now we get to see how much privilege he has. that should autoqualify him for deportation
15:08 🔗 Dud1 at 4am in the morning XD
15:08 🔗 Aranje oh, well at least he was doing it at the right time
15:08 🔗 Aranje lol
15:58 🔗 joepie91 SketchCow: https://www.youtube.com/watch?v=xt7spKKbNo4
15:58 🔗 joepie91 this seems like something worth featuring in the console living room
15:59 🔗 joepie91 but a quick search gave me 0 results :P
16:51 🔗 Baljem hmm. is anybody archiving absolutely god-awful banner ads? http://cdn.adnxs.com/p/95/94/6a/8a/95946a8a70cb9ca37ffd4d66ce6a2da1.gif looks like a doozy ;)
16:52 🔗 Dud1 ow my eyes
16:52 🔗 Baljem tell me about it... sorry, I saw that one on eBay and just had to share
16:54 🔗 joepie91 You attempted to reach cdn.adnxs.com, but instead you actually reached a server identifying itself as a248.e.akamai.net.
16:54 🔗 joepie91 lol
16:54 🔗 joepie91 LOL
16:54 🔗 joepie91 oh god
16:54 🔗 joepie91 one of those
16:54 🔗 joepie91 they're still around?!
16:56 🔗 Baljem the ol' mental filter usually doesn't even notice them these days fnord, but that one is so obnoxious it managed it. I suppose someone's proud...
16:58 🔗 joepie91 hahaha
16:58 🔗 joepie91 Baljem: adblock :)
16:59 🔗 Baljem yeah, it's got to a point that adblock isn't so much a workaround for the odd annoyance, but more essential for the ol' blood pressure
17:00 🔗 Baljem I should probably install it on the office machine...
19:51 🔗 arkhive joepie91: no worries. heh.
19:51 🔗 arkhive I'm taking a SQL intro class
19:51 🔗 arkhive kinda neat
19:52 🔗 arkhive err.. i gotta restart my iMac for some stupid java update but Colloquy IRC is a pain to reconnect (hard to figure out)
23:25 🔗 arkhive I offered this guy 35 USD for it and he counter offered at 45 USD
23:25 🔗 arkhive the shirt
23:26 🔗 arkhive http://www.ebay.com/itm/WebTV-Internet-T-Shirt-Philips-Magnavox-Promo-Clothing-NOS-Brand-New-XL-/390493732777?pt=LH_DefaultDomain_0&hash=item5aeb3d7fa9&autorefresh=true
23:26 🔗 arkhive it's buy it now at 50 USD
23:35 🔗 nico_32 yeeah
23:35 🔗 nico_32 oea.larc.nasa.gov is in the archive