[00:14] I am trying to convert á to a, but trying to find and replace á with \xc3 doesn't work.
[00:19] I can get í replaced by replacing \xed, but á won't work.
[00:22] if it's utf-8 you may have to do \xc3\xa1
[00:34] That didn't work.
[00:37] something very interesting: http://mrtg.cbsig.net/rrd/html/
[00:37] we now have traffic data for cbsnews.com videos
[00:47] cool
[02:11] var old_date = "20020225";/*Any video before this date will display legacy real video clips: 20, 80 speeds*/ var cut_date = "20031120";/*Any video equal to or greater than this date will get windows media files*/
[02:11] that's the reason why everything before 20031120 can't be found
[02:30] http://imgur.com/a/PETBA
[02:49] so it looks like the old Real Media files on cbsnews disappeared in fall 2005, i think
[02:50] in early 2005 the Wayback Machine could get them
[03:12] The English language's longest work of literature: Smash Bros fanfiction (https://www.fanfiction.net/s/4112682/)
[03:12] 3,592,814 words in 209 chapters
[03:27] someone cataloged every occurrence of computers showing up in Law & Order: http://www.theverge.com/culture/2014/2/3/5373888/machinery-of-justice-20-years-of-computers-on-law-order
[04:33] dashcloud: more obsessive cataloguing: http://youtu.be/PIGxMENwq1k
[06:35] SketchCow: now it starts: https://archive.org/details/cbsnews.com-video-2003-11-20
[06:35] i'm doing it this way to keep it neat
[06:43] they must have started the online edition of cbsnews at the very beginning of 2005, it looks like
[06:46] I need some help here
[06:47] apparently the WARCs created by the program https://github.com/odie5533/WarcMiddleware are not properly gzipped
[06:47] There should be a quick way to fix some of the code to make the WARCs work in the Wayback Machine
[06:48] but I'm not experienced with coding, so I don't know how to fix the issue
[06:48] specifically, the requests should *not* request gzip-encoded content
[06:48] could someone please take a look at the code and try to find out what needs to be changed?
[06:48] I would be very happy about that
[06:49] and then I can continue my Opera download
[06:56] there should be some sort of magical config in scrapy.cfg or crawltest/settings.py to disable it
[06:57] it might be "COMPRESSION_ENABLED = False"
[06:58] https://i.imgur.com/vBgqBBV.jpg
[07:00] chfoo: yes, hopefully someone can find out what's wrong with the script and how to turn off the gzip
[07:12] chfoo!
[07:12] This one?
[07:12] self.use_gzip = True
[07:12] :D
[07:15] need to go to school... can't test it now
[07:15] will do it when I'm back
[07:20] i'm starting my big upload of ImagineFX DVDs
[07:20] it's about 64 GB
[12:11] xmc: looking at the video you passed along, I see this one in the sidebar: http://www.youtube.com/watch?v=ZPoqNeR3_UA Star Trek TNG Ambient Engine Noise (Idling for 24 hrs) - is that the longest YouTube video ever?
[12:23] dashcloud: http://www.youtube.com/watch?v=YwtX4gW3-xU 36 hours long
[12:24] it's so long there are ads during the vid
[13:30] 3.8T ftp.tu-chemnitz.de
[13:30] 5.0T ftp.uni-erlangen.de
[13:30] 671G ftp.uni-muenster.de
[13:30] 8.8G ftp.warwick.ac.uk
[13:30] 429G gatekeeper.dec.com
[13:32] still not done...
[14:11] ah shit, i forgot to renew archivingyoursh.it
[14:12] ugh, i can't get into the account for it
[14:12] i'll do it tomorrow
[14:26] ovh box?
[15:14] midas: it's about the domain
[15:14] not a server
[15:14] :P
[15:14] GLaDOS: should I remind you tomorrow? not sure how good you are at mental todo lists
[15:14] actually
[15:14] .in 1d GLaDOS: renew archivingyoursh.it
[15:14] joepie91: Okay, will remind on 05 Feb 2014 at 15:14Z
[15:14] :P
[15:14] nothing beats a bot, in the field of todo lists!
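Why \xc3\xa1 didn't match in the [00:14] to [00:34] exchange: in UTF-8, á (U+00E1) is the two bytes \xc3\xa1 and í is \xc3\xad, whereas in Latin-1/Windows-1252 they are the single bytes \xe1 and \xed. Since replacing \xed worked for í, the text was most likely Latin-1 or cp1252, where á would be \xe1. The editor used in the log isn't named, so this is only a Python sketch that decodes with an assumed encoding and strips accents through Unicode decomposition instead of byte-level find-and-replace; the helper names are made up for illustration.

```python
import unicodedata


def strip_accents(text: str) -> str:
    """Remove combining accents, e.g. 'á' -> 'a', 'í' -> 'i'."""
    # NFD splits 'á' (U+00E1) into 'a' + U+0301 COMBINING ACUTE ACCENT.
    decomposed = unicodedata.normalize("NFD", text)
    # Drop the combining marks and keep the base letters.
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def strip_accents_bytes(data: bytes, encoding: str) -> bytes:
    """Decode raw bytes with an assumed encoding, strip accents, re-encode."""
    return strip_accents(data.decode(encoding)).encode(encoding)


# The same character has different byte forms depending on the encoding:
print("á".encode("utf-8"))    # b'\xc3\xa1'  (two bytes)
print("á".encode("latin-1"))  # b'\xe1'      (one byte)
print("í".encode("latin-1"))  # b'\xed'

print(strip_accents_bytes(b"caf\xc3\xa9 \xc3\xa1", "utf-8"))  # b'cafe a'
print(strip_accents_bytes(b"\xe1 \xed", "latin-1"))           # b'a i'
```

Getting the encoding right is the important part: replacing only \xc3 in UTF-8 text leaves a stray \xa1 behind, and searching for \xc3\xa1 in Latin-1 text finds nothing.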
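The [02:11] JavaScript compares dates as zero-padded YYYYMMDD strings, and for that format plain string comparison gives chronological order. A Python sketch mirroring the two cutoffs follows; the function name and return labels are hypothetical, and since the quoted comments don't say what the page does between 20020225 and 20031120, that range is reported as unspecified.

```python
OLD_DATE = "20020225"  # before this: legacy Real Video clips (20/80 speeds)
CUT_DATE = "20031120"  # on or after this: Windows Media files


def video_format(date: str) -> str:
    """Classify a YYYYMMDD date string using the two cutoffs from the snippet."""
    if len(date) != 8 or not date.isdigit():
        raise ValueError(f"expected YYYYMMDD, got {date!r}")
    # Zero-padded YYYYMMDD strings sort chronologically,
    # so ordinary string comparison is enough here.
    if date < OLD_DATE:
        return "legacy real video"
    if date >= CUT_DATE:
        return "windows media"
    # The quoted comments don't cover the range between the two cutoffs.
    return "unspecified"


print(video_format("20020224"))  # legacy real video
print(video_format("20031119"))  # unspecified
print(video_format("20031120"))  # windows media
```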
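On the WarcMiddleware question: Scrapy does have a `COMPRESSION_ENABLED` setting, and setting it to False disables the built-in HttpCompressionMiddleware, the component that adds an Accept-Encoding header advertising gzip/deflate and decompresses responses on the fly. A sketch of the settings change, assuming crawltest/settings.py is the project's settings module. Whether this alone makes the WARCs replay in the Wayback Machine was not tested in the log, and the `self.use_gzip` flag from [07:12] lives in WarcMiddleware's own code and is not covered here.

```python
# crawltest/settings.py (excerpt): a sketch, assuming this is the Scrapy
# project settings module mentioned at [06:56].

# Disable Scrapy's built-in HttpCompressionMiddleware, so it no longer adds an
# Accept-Encoding header asking for gzip/deflate or decompresses responses.
COMPRESSION_ENABLED = False

# Optional: ask servers explicitly for uncompressed bodies. Setting this
# replaces Scrapy's default request headers, so the default Accept and
# Accept-Language values are repeated here.
DEFAULT_REQUEST_HEADERS = {
    "Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8",
    "Accept-Language": "en",
    "Accept-Encoding": "identity",
}
```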
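The [13:30] lines look like `du -sh` output (size, then mirror directory), and [13:32] says the run wasn't finished. A quick way to total such a listing, assuming du's 1024-based suffixes; the result is approximate because du -h rounds each figure.

```python
LISTING = """\
3.8T ftp.tu-chemnitz.de
5.0T ftp.uni-erlangen.de
671G ftp.uni-muenster.de
8.8G ftp.warwick.ac.uk
429G gatekeeper.dec.com
"""

# du -h suffixes are powers of 1024.
UNITS = {"K": 1024**1, "M": 1024**2, "G": 1024**3, "T": 1024**4}


def parse_size(text: str) -> float:
    """Convert a du -h style size such as '3.8T' to bytes."""
    suffix = text[-1].upper()
    if suffix in UNITS:
        return float(text[:-1]) * UNITS[suffix]
    return float(text)  # no suffix: plain bytes


def total_bytes(listing: str) -> float:
    """Sum the first column of every non-empty line."""
    return sum(parse_size(line.split()[0]) for line in listing.splitlines() if line.strip())


total = total_bytes(LISTING)
print(f"{total / 1024**4:.2f} TiB")  # 9.88 TiB
```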
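The reminder at [15:14] turns `.in 1d` into "05 Feb 2014 at 15:14Z", so the command was issued at 15:14 UTC on 4 Feb 2014. The bot's actual syntax and code aren't shown in the log; this sketch assumes a hypothetical grammar of one or more `<number><unit>` pairs such as `1d` or `2h30m`.

```python
import re
from datetime import datetime, timedelta, timezone

# Hypothetical grammar: one or more <number><unit> pairs, e.g. "1d" or "2h30m".
UNIT_SECONDS = {"w": 7 * 86400, "d": 86400, "h": 3600, "m": 60, "s": 1}


def parse_duration(spec: str) -> timedelta:
    """Turn a spec like '1d' or '2h30m' into a timedelta."""
    if not re.fullmatch(r"(?:\d+[wdhms])+", spec):
        raise ValueError(f"unrecognised duration: {spec!r}")
    seconds = sum(int(n) * UNIT_SECONDS[unit]
                  for n, unit in re.findall(r"(\d+)([wdhms])", spec))
    return timedelta(seconds=seconds)


# The command was issued at 15:14 UTC on 4 Feb 2014.
sent = datetime(2014, 2, 4, 15, 14, tzinfo=timezone.utc)
due = sent + parse_duration("1d")
print(due.strftime("%d %b %Y at %H:%MZ"))  # 05 Feb 2014 at 15:14Z
```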
[15:20] lol
[15:30] well, a netsplit would beat it
[15:50] nothing beats graffiti :D
[15:57] SketchCow: i found PDF transcripts of Face the Nation
[18:46] not sure if this was mentioned already: http://chronicle.com/blogs/profhacker/why-not-spare-a-little-bandwidth-for-the-archive-team/55071
[18:56] "It also throttles downloads of the material to limit overloading the dying service."
[18:56] haha
[19:13] goddamnit why did I click the Disqus link
[19:19] yipdw: Disqus is rapidly becoming the IE of comment systems
[19:19] *accidentally clicks IE shortcut on taskbar*
[19:19] OH GOD NO
[19:19] *frantically tries to get out of IE starting*
[19:19] WHY DID I DO THAT
[19:19] etc.
[19:19] yeah
[19:19] luckily Ghostery usually blocks it
[19:20] but in this case I had to get all curious
[19:20] Hooray for Ghostery
[19:32] i somehow broke Disqus on my system but i don't mind at all
[20:33] If a site uses Disqus, I won't comment on that site
[20:33] 'cause it's disqusting
[20:34] Haha, who made the picture @ http://chronicle.com/blogs/profhacker/why-not-spare-a-little-bandwidth-for-the-archive-team/55071
[20:34] it's awesome
[20:43] so what happened with Jason's New York library smackdown, or do we have to wait for the statute of limitations to run out first?
[20:45] what about government-run archives?
[20:45] should we trust those?
[20:46] UK government, great example
[20:46] http://www.nationalarchives.gov.uk/webarchive/
[20:46] or Thailand, not really an archive but it's near a fault line and could be flooded
[20:47] with governments the main concern is deliberate destruction, and there is plenty of precedent for that
[20:57] man
[20:57] lowendtalk seems to be suffering from a bad case of the edits right now
[20:57] topic titles being edited by mods left, right and center
[20:57] to make them more politically correct
[20:57] (changing them into stuff like "misunderstanding blah blah - got refunded")
[20:59] guess it's hurting their relations with the crappy VPS providers
[20:59] even one where "allegations of" was prefixed
[21:03] they should add "politically correct version:" as a prefix ;D
[21:26] ersi: I think someone here made it a long time ago
[21:27] yup looks like it built ok :)
[21:40] ersi: chfoo
[21:41] at least according to archiveteam.org's change tracking
[21:41] it's possible someone else did it
[21:47] oh https://archive.org/details/DigiBarn has started adding materials again
[21:57] ersi: i made it. source file: https://github.com/chfoo/cloaked-octo-nemesis/blob/master/dev-docs/archiveteam_warrior_infrastructure.svg
[22:00] It's awesome.
[22:01] DFJustin: I gave them two days on account of snow
[22:01] Was also waiting to make sure Internet Archive didn't hit them first, I try to avoid muddying the pond
[22:32] aw