[01:10] hi folks- Eurovision is starting to pop up in my twitter feed- can one of you tell me a little about it?
[01:10] sure man
[01:10] every country in europe gets to decide whose singers are shit
[01:10] true
[01:11] actually shit, or just fashionable to say they are?
[01:11] take the worst cliché of a country
[01:11] dashcloud: wikipedia can probably do a much better job explaining it than I can
[01:12] and amplify them
[01:12] that's the eurovision
[01:13] so, if I'm looking for a talent show, watch American Idol, but if I want to see a bad comedy hour, watch Eurovision?
[01:13] eurovision is actually really entertaining
[01:13] for some value of entertaining
[01:13] they spend a shitpile of money making the fanciest performance they can
[01:13] Yes
[01:13] For a similar reason, the most insane DJ competitions are the same
[01:14] months of prep for a 5 minute routine
[01:15] Mixmaster Mike with the scratch routine
[01:20] http://www.reddit.com/r/funny/comments/24w2l1/fuck_this_girl_in_particular/
[01:20] hu?
[01:23] That's a rave.
[01:23] look interesting
[02:02] an 800 page book of colors of all types, hundreds of years before the pantone color book: http://www.thisiscolossal.com/2014/05/color-book/ (also viewable online!)
[02:08] back in 2007, this is the act that ukraine voted to enter: https://www.youtube.com/watch?v=hfjHJneVonE
[02:14] same year, switzerland entered this song, by the guy behind the chihuahua song: https://www.youtube.com/watch?v=0ydRhwnwk-s
[02:15] i kinda liked israel's entry: https://www.youtube.com/watch?v=424dX16SObQ
[03:05] so i found a way to get video sitemap (maybe) from theguardian.com
[03:11] here we go: http://spiderbytes.theguardian.com/sitemap/sitemap-2013.xml
[03:11] master sitemap for the the sub-sections sitemaps for 2013
[03:15] fun fact i may have don't theguardian.com before or at least i tryed: https://archive.org/search.php?query=collection%3A%22archiveteam-fire%22%20AND%20%28subject%3A%22www.theguardian.com%22%29
[03:35] so i look at how i did before
[03:35] it goes somethng like this url: http://www.theguardian.com/books/2001/dec/30/all
[03:36] the problem with the web sitemap is there not all books urls
[03:36] some go other sections
[03:36] also note that the sitemaps maybe incomplete before 2000
[03:38] they don't even have 1990 sitemap xml when i know there are on the website
[04:55] i getting another N64 promo video from myspleen
[08:47] ohhdemgir: lots of people on the _albums one
[08:48] schbirid: im ordering extra disks tonight
[09:06] midas: yay :))
[09:24] midas, both have been seeding around 40MB/s for the last 15 hours or so
[10:12] yeah 100mbit box is at 90Mbit all day
[10:47] midas, http://www.reddit.com/r/AmateurArchives/comments/24vr5r/rgonewild_history_20092013_torrents/chbq6fh
[10:47] oops...
[11:18] oops.
[11:18] small oops, but still :p
[11:18] ohhdemgir: when will the blacklisted archive be posted ;)
[11:18] XD
[11:18] hahaha
[11:18] that's bad (GREAT!!!!) idea
[11:18] \o/
[11:20] clearly some people already use that as their treasure list, I get pms about it from time to time
[11:22] haha
[11:22] wonder why
[11:22] is the a wget flag to ignore certain size files?
[11:26] also, why? forbidden fruits!!
[11:27] it was sarcasm :p
[11:28] the truth, people want what they're not meant to have
[11:28] as always
[11:32] http://i.imgur.com/PIusERE.jpg
[11:34] hahaha
[11:38] so, Hostdeal Ltd just went belly up
[11:38] website has been archived, but no way to tell how many sites it killed
[11:42] listen to the dude up to around 1 minutes - https://www.youtube.com/watch?v=7Zpc8VIYppc
[12:03] ohhdemgir: whats wrong with this picture? http://i.imgur.com/lNG6XeH.png
[12:03] ;-)
[12:03] XD
[12:04] it's so limited to 100mbit :<
[12:06] RX bytes:26265137418752 (26.2 TB) TX bytes:91754905220702 (91.7 TB)
[12:06] 14:06:09 up 11 days, 20:41, 2 users, load average: 7.49, 7.88, 7.96
[12:06] uptime
[12:06] lol
[12:07] thats pritty badass
[12:09] I get messages now and again from the host with things like "Hey you managed to stay under 300TB this month, well done.."
[12:09] lol
[12:10] nice
[18:05] nice, android pirates are uploading straight to ia now https://archive.org/details/Androtreasure.net_20140507_1722
[18:06] saves us some work
[18:09] http://techcrunch.com/2014/05/07/watch-michael-arringtons-fireside-chat-with-marissa-mayer-here-at-200-pm-edt/
[18:33] lol DFJustin
[18:46] hahaha
[18:46] :)
[19:29] Open Source Software >
[19:29] of course
[19:30] there are strange things on IA
[19:30] https://archive.org/details/RA320
[19:30] someone backup?
[19:33] I was gonna do a tumblr but then I never got around to updating http://weirdshitonarchivedotorg.tumblr.com/
[19:35] the worst thing, i have a folder like that on my storage box
[19:35] downloads folder of mobile device
[19:36] hm, that anal game says it is darked but it isnt
[19:38] i'll mail info@ about that comment line
[19:38] there are 40k items according to google
[19:39] they were darked by mistake and then undarked
[19:41] oh
[19:41] too late, mail sent :)
[19:43] https://www.google.com/search?q=lock_delete_darke_user.py
[20:09] lol, some bug "Uncompressed size: 3307158438050 MB (3467806966336326157 bytes)"
[20:11] zip bomb?
[20:11] seems legit
[20:13] looked at a tar.lzo file with lzmainfo but lzma != lzo apparently :)
[20:15] lzop, yeah
[20:16] yeah
[21:26] I've handed the upcoming /join #livingroom
[22:29] anyone got - https://www.fanfiction.net/
[22:31] iirc we did a crawl of ff.n about a year ago
[22:39] exmic, - http://www.reddit.com/r/DataHoarder/comments/245ij1/start_your_own_rgonewild_archive_automated_data/chc6shy?context=3
[22:39] "Ffnet throttles severely"
[22:40] would be nice to get it again but ain't no one got time the that limiting!!
[22:43] the current ffnet downloader code is waiting for 30s every n requests
[22:44] fuck yeah, im going to archive some internet gold
[22:45] s/30/3/
[22:45] 00:45:01 ⏚ [nico@Gallifrey:/home/nico/Developpement/DeFFNetIzer] master 2 ± grep sleep deffnet.py time.sleep(3.0)
[22:58] Is there a difference between archiving and scraping? I've always thought archiving was preserving everything about a site, and scraping was just keeping the bits you want, but I could be wrong
[23:03] tsp__: nowaday you've to scrape the website because everythings is loaded by ajax and other javascript monstruosities
[23:35] I would say scraping is pulling information off of a site in an automated way, that it wasn't intended for
[23:36] like scraping book titles off of amazon
[23:36] whereas archiving is just saving unmodified copies for later use
[23:37] For example, I want to pull forum posts out of a forum. I don't have any experience with the archiveteam scripts to actually archive it properly, but I just want its posts; I can do that trivially with python and requests. I guess that would be scraping
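
Editor's sketch, not from the log: the two ideas that close the conversation (pausing a few seconds every n requests, and pulling just the posts out of forum pages) could look roughly like the code below. Everything named here is an assumption made for illustration: the `<div class="post">` markup, the `PostExtractor` and `scrape_posts` names, and the `every`/`pause` defaults. It is not the code in deffnet.py or any ArchiveTeam script. The fetch function is passed in so the sketch runs with the standard library alone.

```python
import time
from html.parser import HTMLParser
from typing import Callable, Iterable, List


class PostExtractor(HTMLParser):
    """Collect the text of every <div class="post"> (hypothetical markup)."""

    def __init__(self) -> None:
        super().__init__()
        self.posts: List[str] = []
        self._depth = 0  # div nesting depth inside the current post
        self._chunks: List[str] = []

    def handle_starttag(self, tag, attrs):
        if tag != "div":
            return
        if self._depth:
            self._depth += 1  # nested div inside a post
        elif "post" in (dict(attrs).get("class") or "").split():
            self._depth = 1  # entering a post

    def handle_endtag(self, tag):
        if tag == "div" and self._depth:
            self._depth -= 1
            if self._depth == 0:
                # Post closed: store its text with whitespace collapsed.
                self.posts.append(" ".join("".join(self._chunks).split()))
                self._chunks = []

    def handle_data(self, data):
        if self._depth:
            self._chunks.append(data)


def scrape_posts(
    urls: Iterable[str],
    fetch: Callable[[str], str],
    every: int = 10,      # assumed value for the "n" mentioned in the log
    pause: float = 3.0,   # seconds, matching the time.sleep(3.0) grep output
    sleep: Callable[[float], None] = time.sleep,
) -> List[str]:
    """Fetch each URL, keep only the post text, and pause every `every` requests."""
    posts: List[str] = []
    for count, url in enumerate(urls, start=1):
        parser = PostExtractor()
        parser.feed(fetch(url))
        posts.extend(parser.posts)
        if count % every == 0:
            sleep(pause)
    return posts
```

With requests installed, `fetch` could be something like `lambda url: requests.get(url, timeout=30).text`. This keeps only the post text and discards the rest of the page, which is the "scraping" side of the distinction drawn above; archiving would save the unmodified responses instead.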