[00:05] Vierzon has finished processing! [00:11] *** danneh_ has quit IRC (Ping timeout: 633 seconds) [00:31] awesome [00:35] Yeah, let's try to avoid that nightmare going forward. [00:38] Going to update the Pycon video set, should I upload the chunks into the original item, or create new items for each chunk? [00:46] *** primus104 has quit IRC (Leaving.) [00:59] *** ruukasu has quit IRC (Quit: WeeChat 1.0.1) [01:02] *** ete_ has quit IRC (Remote host closed the connection) [01:03] *** ete_ has joined #archiveteam [01:10] *** abartov has quit IRC (Ping timeout: 258 seconds) [01:16] *** danneh_ has joined #archiveteam [01:19] anyone at ohio state university? [01:22] *** mistym has quit IRC (Remote host closed the connection) [01:30] *** abartov has joined #archiveteam [01:32] *** ruukasu has joined #archiveteam [01:44] *** APerti_ has quit IRC (Ping timeout: 258 seconds) [01:47] *** mistym has joined #archiveteam [01:55] *** thechip has quit IRC (Read error: Connection reset by peer) [02:07] *** abartov has quit IRC (Read error: Operation timed out) [02:17] *** APerti has joined #archiveteam [02:20] *** mistym has quit IRC (Leaving...) [02:23] *** abartov has joined #archiveteam [02:24] *** Fangrille has joined #archiveteam [02:24] *** abartov has quit IRC (Client Quit) [02:32] *** Fangrille has quit IRC (Ping timeout: 335 seconds) [03:27] *** mistym has joined #archiveteam [03:29] *** Emcy has quit IRC (Read error: Connection reset by peer) [03:53] *** w0rp_ has joined #archiveteam [03:53] *** w0rp has quit IRC (Read error: Operation timed out) [03:53] *** w0rp_ is now known as w0rp [04:06] *** APerti_ has joined #archiveteam [04:08] *** APerti has quit IRC (Ping timeout: 370 seconds) [04:22] *** Ymgve has quit IRC () [04:28] so I got word that personal sites under astro.temple.edu will likely go away soon [04:28] how do we go about archiving these? 
[04:29] probably not for some time though [04:57] *** robink has quit IRC (Read error: Connection reset by peer) [05:02] *** aaaaaaaaa has quit IRC (Leaving) [05:06] *** robink has joined #archiveteam [05:12] *** mistym has quit IRC (Remote host closed the connection) [05:48] *** Smiley has quit IRC (Read error: Operation timed out) [05:50] *** mistym has joined #archiveteam [05:55] *** mistym_ has joined #archiveteam [05:58] *** curious_g has joined #archiveteam [05:58] hello [05:59] hey. [05:59] you guys might be interested in this https://news.ycombinator.com/item?id=8720064 [05:59] mit is pulling some ocw stuff [06:01] *** mistym has quit IRC (Read error: Operation timed out) [06:05] Is there a way to check which video file is the original and which are derived from that on archive.org? [06:05] I'd like to save the ones MIT is pulling just in case they get removed from there [06:14] bah i'll just save the lot [06:16] *** Smiley has joined #archiveteam [06:18] curious_g: if you look at the _files.xml file for the item it will say either source="original" or source="derivative" for each file [06:18] thanks [06:20] Should i write a script to do the job? ie python script read xml, save any file without derivative associated with it? [06:20] there's https://pypi.python.org/pypi/internetarchive which would make that a lot easier [06:22] thanks, I'll have a look at that. I was just thinking of using an xml parsing library and mechanize. [06:22] if you use the beta interface (add /v2 to the url) on the download page there's a link to download a zip file of the item, originals only [06:22] Fantastic! [06:23] Shouldn't that be on the wiki page for IA?
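The approach suggested at 06:18 to 06:20 (read the item's `_files.xml` and keep only the entries marked `source="original"`) can be sketched with just the Python standard library, as an alternative to the `internetarchive` package linked above. This is a minimal sketch, not a tested tool: it assumes the metadata file lives at `https://archive.org/download/<identifier>/<identifier>_files.xml` and that each `<file>` element carries a `name` attribute; the item identifier and file names in the demo are hypothetical.

```python
"""Minimal sketch: list the original (non-derived) files of an archive.org item.

Assumptions: the metadata is at
https://archive.org/download/<identifier>/<identifier>_files.xml, and each
<file> element has a name attribute plus source="original" or "derivative".
"""
import urllib.parse
import urllib.request
import xml.etree.ElementTree as ET

BASE = "https://archive.org/download"  # assumed download URL prefix


def original_files(files_xml: str) -> list[str]:
    """Return the names of files marked source="original" in a _files.xml document."""
    root = ET.fromstring(files_xml)
    return [f.get("name") for f in root.iter("file") if f.get("source") == "original"]


def download_urls(identifier: str, names: list[str]) -> list[str]:
    """Build direct download URLs for the given file names (path-quoted)."""
    return [f"{BASE}/{identifier}/{urllib.parse.quote(n)}" for n in names]


def fetch_files_xml(identifier: str) -> str:
    """Fetch <identifier>_files.xml over HTTP (needs network access)."""
    url = f"{BASE}/{identifier}/{identifier}_files.xml"
    with urllib.request.urlopen(url) as resp:
        return resp.read().decode("utf-8")


if __name__ == "__main__":
    # Hypothetical sample document; a real run would use fetch_files_xml(identifier).
    sample = """<files>
      <file name="lecture01.mp4" source="original"><format>MPEG4</format></file>
      <file name="lecture01.ogv" source="derivative"><original>lecture01.mp4</original></file>
      <file name="lecture 02.mp4" source="original"/>
    </files>"""
    for url in download_urls("example-item", original_files(sample)):
        print(url)
```

With the sample above this prints two URLs, one per original file, and skips the derived `.ogv`. Downloading then means fetching each URL, or using the originals-only zip link mentioned at 06:22.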
[06:26] WHAT FORSOOTH, PRITHEE TELL ME THE SECRET WORD [06:26] yahoosucks [06:26] yahoo sucks [06:27] wait, without the space :p [06:27] also, good morning DFJustin [06:27] thanks, i'll add that info to the wiki [06:33] *** Start is now known as StartAway [06:37] updated http://archiveteam.org/index.php?title=Internet_Archive [07:18] *** ete_ has quit IRC (Remote host closed the connection) [07:26] *** primus104 has joined #archiveteam [07:51] this page is very outdated and possibly dangerously so: http://archiveteam.org/index.php?title=Working_with_ARCHIVE.ORG [07:51] if somebody has some time (I don't), could they update that to reflect the current state of things wrt WARC etc? [07:59] Really? http://archiveteam.org/index.php?title=Internet_Archive&action=historysubmit&diff=20976&oldid=20592 [08:00] You just have to click the HTTPS link (and yes it's not obvious) [08:00] *** mistym_ has quit IRC (Remote host closed the connection) [08:02] joepie91: please mark the outdated parts as such [08:03] Or just remove them if they're doing more harm than good, instead of complaining on IRC :) [08:04] *** APerti_ has quit IRC (Read error: Operation timed out) [08:05] *** Nemo_bis sets mode: +oooo aNthraXx arkiver balrog Kenshin [08:05] *** Nemo_bis sets mode: +ooo norbert79 norbert79 tephra [08:05] *** balrog sets mode: +o Lord_Nigh [08:05] *** brayden has quit IRC (Read error: Operation timed out) [08:07] Nemo_bis: the entire page is outdated, and I was unable to find a "this article is outdated" template [08:07] Here is one: [08:07] :''This page is outdated!'' [08:08] which exactly nobody is going to notice and it will thus not be updated [08:08] which is what the 'outdated' templates are normally used for [08:08] because those keep track of what pages they're used on [08:09] *yawn* [08:09] this posting it here because I don't have the time to fix up the entire page [08:09] and most likely *somebody* else in here does [08:11] There needs to be a guide on the wiki for 
home users on how to help you guys out [08:11] i'd give a few hundred gigs of space and transfer quota if i knew how [08:11] joepie91: easier done than said http://archiveteam.org/index.php?title=Working_with_ARCHIVE.ORG&action=historysubmit&diff=20978&oldid=16717 [08:12] I guess "wiki" is still a word which most of the world doesn't understand [08:13] I don't edit wikis much because i know bugger all of the syntax and don't want to fuck things up [08:13] Nemo_bis: any particular reason you're being so bitter? [08:15] curious_g: please don't worry about the syntax, you can't break anything [08:15] ok. also is that warrior thing suitable for someone on adsl? [08:16] curious_g: yes [08:16] okay, i'll look into setting it up when i'm not feeling sick [08:28] *** danneh_ has quit IRC (Quit: Nyan nyan) [08:32] *** APerti has joined #archiveteam [08:33] *** lytv has quit IRC (Read error: Connection reset by peer) [08:35] *** Smiley has quit IRC (Read error: Operation timed out) [08:35] *** T31m_ has joined #archiveteam [08:36] *** xk_id_ has quit IRC (Read error: Operation timed out) [08:37] *** lytv has joined #archiveteam [08:37] *** xk_id has joined #archiveteam [08:40] *** Smiley has joined #archiveteam [08:47] *** T31M has quit IRC (Read error: Operation timed out) [09:09] *** schbirid has joined #archiveteam [09:34] *** APerti has quit IRC (Ping timeout: 272 seconds) [09:42] *** danneh_ has joined #archiveteam [09:47] *** APerti has joined #archiveteam [09:54] *** primus104 has quit IRC (Leaving.) [10:12] *** brayden has joined #archiveteam [10:13] *** APerti has quit IRC (Read error: Operation timed out) [10:15] *** ex-parrot has joined #archiveteam [10:17] *** schbirid has quit IRC (Read error: Operation timed out) [10:32] *** schbirid has joined #archiveteam [10:35] *** Sellyme has joined #archiveteam [11:03] SketchCow: the Wayback Machine archive of the old BSTJ site is pretty incomplete; would there be interest in adding the missing bits?
(or have you been asked not to keep it?) [11:07] *** ruukasu has quit IRC (Quit: WeeChat 1.0.1) [11:13] *** ruukasu has joined #archiveteam [11:34] *** philpem has joined #archiveteam [11:37] *** curious_g has quit IRC (Ping timeout: 240 seconds) [11:40] archive.org itself has a full copy. [11:40] And if you're smart you won't make noise about that, or it'll probably be asked to go down [11:51] SketchCow: the nation's business i think you forgot to move: https://archive.org/search.php?query=subject%3A%22Nation%27s%20Business%22 [11:51] 1090 items [12:15] someone should really archive all the ludum dare entries http://ludumdare.com/ [12:16] i wish they would host them themselves for longevity [12:27] i'm sending it to the archivebot [12:41] *** primus104 has joined #archiveteam [13:03] *** Smiley has quit IRC (Quit: http://www.milkme.co.uk - You'll never understand.) [13:08] *** Smiley has joined #archiveteam [13:30] *** Ymgve has joined #archiveteam [13:34] *** primus104 has quit IRC (Leaving.) [13:38] *** curious_g has joined #archiveteam [13:40] is it ok for me to ask for help connecting to this IRC server here? I can't connect via my IRC client [13:41] curious_g: you can join #archiveteam-bs :) [13:41] should be able to just click the channel name to join [13:41] ok, thanks. 
[13:43] *** sankin has joined #archiveteam [13:47] *** Ctrl-S has joined #archiveteam [13:48] *** curious_g has quit IRC (Quit: Page closed) [13:51] *** sankin has quit IRC (Remote host closed the connection) [13:56] *** sankin has joined #archiveteam [14:52] *** schbirid has quit IRC (Leaving) [15:08] *** primus104 has joined #archiveteam [15:14] *** kyan has quit IRC (Quit: This computer has gone to sleep) [15:21] *** Smiley has quit IRC (Remote host closed the connection) [15:22] *** nertzy has joined #archiveteam [15:23] *** StartAway has quit IRC (Read error: Operation timed out) [15:28] *** Smiley has joined #archiveteam [15:33] *** Emcy has joined #archiveteam [15:35] *** Smiley has quit IRC (Remote host closed the connection) [15:36] *** kyan_ has joined #archiveteam [15:39] *** matthusb_ has quit IRC (ircd.arcti.ca irc.arcti.ca) [15:39] *** matthusb- has joined #archiveteam [15:41] *** nertzy has quit IRC (Quit: This computer has gone to sleep) [15:47] http://newsoffice.mit.edu/2014/lewin-courses-removed-1208 [15:47] https://archive.org/details/MITclassical_mech [15:49] *** Smiley has joined #archiveteam [15:53] *** aaaaaaaaa has joined #archiveteam [16:03] *** kyan_ has quit IRC (Quit: This computer has gone to sleep) [16:05] *** Start has joined #archiveteam [16:09] *** Smiley has quit IRC (Remote host closed the connection) [16:11] *** Smiley has joined #archiveteam [16:22] *** Smiley has quit IRC (Remote host closed the connection) [16:32] *** kyan_ has joined #archiveteam [16:42] *** nertzy has joined #archiveteam [16:53] *** MMovie has joined #archiveteam [16:53] *** Start has quit IRC (Read error: Connection reset by peer) [16:54] *** MMovie1 has quit IRC (Read error: Operation timed out) [16:57] *** Start has joined #archiveteam [16:57] *** nertzy has quit IRC (Quit: This computer has gone to sleep) [17:21] *** MMovie has quit IRC (Ping timeout: 335 seconds) [17:47] SketchCow: I'm currently downloading all metro newspapers, they go back a few years 
[17:47] Metro is a free newspaper, made in multiple countries [17:47] Good news - FOS has two drives. One has hit the lowest it can possibly go considering its uses - 2% usage. [17:47] All the nightmare megawarc projects are done, all the backlogged (on that drive) uploads are done [17:48] do you want me to upload stuff to opensource first, or directly to some collection? [17:48] Opensource [17:48] Congrats on FOS! :) [17:48] I can make the collection [17:48] Later [17:48] SketchCow: ok, I'll do that [17:48] I'm actually going through opensource with my recognizer/classifier. [17:48] First - a TON of Arabic and Urdu [17:48] So that's going on, I'm making subcollections for those. [17:48] Then it's some crazy stuff, those are harder. [17:49] Going through 300.000 unknown files sounds painful and awesome. [17:49] There are probably some great treasures between those [17:51] *** Start has quit IRC (Read error: Operation timed out) [17:54] *** rejon_ has quit IRC (Remote host closed the connection) [18:01] *** GLaDOS has quit IRC (Ping timeout: 272 seconds) [18:01] *** GLaDOS has joined #archiveteam [18:01] *** swebb sets mode: +o GLaDOS [18:05] *** kyan_ is now known as kyan [18:09] *** MMovie has joined #archiveteam [18:15] *** Jogie has quit IRC (Ping timeout: 247 seconds) [18:15] *** Jogie has joined #archiveteam [18:16] *** mistym has joined #archiveteam [18:16] *** Insomnia_ has quit IRC (Ping timeout: 247 seconds) [18:16] *** Insomnia_ has joined #archiveteam [18:17] *** norbert79 has quit IRC (Read error: Operation timed out) [18:17] *** norbert79 has joined #archiveteam [18:17] *** okeuday has quit IRC (Ping timeout: 246 seconds) [18:17] *** slash` has quit IRC (Read error: Operation timed out) [18:17] *** ryan__ has quit IRC (Ping timeout: 246 seconds) [18:17] *** ryan__ has joined #archiveteam [18:18] *** pft has quit IRC (Read error: Operation timed out) [18:18] *** okeuday has joined #archiveteam [18:19] *** dcmorton has quit IRC (Ping timeout: 616 
seconds) [18:19] *** Zebranky_ has joined #archiveteam [18:20] *** lukeman has quit IRC (Ping timeout: 615 seconds) [18:20] *** lukeman has joined #archiveteam [18:20] *** sep332 has quit IRC (Read error: Operation timed out) [18:20] *** slash` has joined #archiveteam [18:21] *** sep332 has joined #archiveteam [18:23] *** dcmorton has joined #archiveteam [18:24] *** Zebranky has quit IRC (Ping timeout: 615 seconds) [18:27] *** pft has joined #archiveteam [18:32] *** Smiley has joined #archiveteam [18:34] *** nertzy has joined #archiveteam [18:44] *** galanviz has joined #archiveteam [18:44] *** galanviz has quit IRC (Client Quit) [18:45] *** nertzy has quit IRC (Quit: This computer has gone to sleep) [18:53] *** RynO has joined #archiveteam [18:54] Any up-to-date, full archives of TPB which was just taken down again? [18:54] (preferably with comments and other info) [19:01] taken down? don't they have some imba unbreakable cloud infra? lol [19:05] https://proxybay.info/ [19:05] is anyone on the admin team providing dumps? [19:06] http://torrentfreak.com/swedish-police-raid-the-pirate-bay-site-offline-141209/ [19:11] again? [19:27] Found this.. latest one I could find [19:27] http://webcache.googleusercontent.com/search?q=cache:E3Sx6VAEWsAJ:thepiratebay.se/torrent/8163015&hl=en&gl=us&strip=1 [19:27] with Google's help, lol [19:29] are there any later dumps of the site? [19:29] newer than the Feb 2013 version [19:29] ? [19:34] *** APerti has joined #archiveteam [19:58] RynO: I've looked pretty extensively, only found 2013/02 [19:58] ;_; [20:07] *** mnx_ has joined #archiveteam [20:07] hello [20:07] WHAT FORSOOTH, PRITHEE TELL ME THE SECRET WORD [20:08] yahoosucks [20:08] Thank you! [20:08] np [20:09] How are the TPB archives? they seem to be down, and although it is likely they will be back, the archives linked on the wiki seem to be way out of date [20:11] *** mistym has quit IRC (Leaving...) 
[20:11] *** schbirid has joined #archiveteam [20:15] *** thechip has joined #archiveteam [20:16] *** mnx_ has quit IRC (Quit: Page closed) [20:17] *** ruukasu has quit IRC (Quit: WeeChat 1.0.1) [20:18] I like how the best way to gain attention here is to take down a torrent site [20:20] *** ruukasu has joined #archiveteam [20:20] things getting taken down is what this channel is about [20:21] I'm just saying that it's particularly funny when torrents are involved [20:21] DHT and all the redundancy stuff? [20:21] much more on the social side [20:30] *** APerti_ has joined #archiveteam [20:32] *** APerti has quit IRC (Ping timeout: 504 seconds) [20:44] *** Start has joined #archiveteam [20:46] *** ruukasuu has joined #archiveteam [20:49] *** ruukasuu has quit IRC (Client Quit) [20:50] *** ruukasu has quit IRC (Ping timeout: 265 seconds) [20:54] *** mistym has joined #archiveteam [20:58] *** assface13 has joined #archiveteam [21:00] *** assface13 has quit IRC (Client Quit) [21:11] *** ruukasu has joined #archiveteam [21:13] *** www2 has joined #archiveteam [21:26] *** Start_ has joined #archiveteam [21:26] *** Start has quit IRC (Read error: Connection reset by peer) [21:27] *** Insomnia1 has joined #archiveteam [21:30] *** Sellyme_ has joined #archiveteam [21:30] *** ex-parro1 has joined #archiveteam [21:31] *** Kazzy_ has joined #archiveteam [21:31] *** Insomnia_ has quit IRC (hub.se efnet.port80.se) [21:31] *** GLaDOS has quit IRC (hub.se efnet.port80.se) [21:31] *** philpem has quit IRC (hub.se efnet.port80.se) [21:31] *** Sellyme has quit IRC (hub.se efnet.port80.se) [21:31] *** ex-parrot has quit IRC (hub.se efnet.port80.se) [21:31] *** Shank__ has quit IRC (hub.se efnet.port80.se) [21:31] *** fresco___ has quit IRC (hub.se efnet.port80.se) [21:31] *** hive-mind has quit IRC (hub.se efnet.port80.se) [21:31] *** RainbowCo has quit IRC (hub.se efnet.port80.se) [21:31] *** lysobit has quit IRC (hub.se efnet.port80.se) [21:31] *** Kazzy has quit IRC (hub.se 
efnet.port80.se) [21:31] *** deathy has quit IRC (hub.se efnet.port80.se) [21:31] *** VonCloud_ has quit IRC (hub.se efnet.port80.se) [21:31] *** lhobas has quit IRC (hub.se efnet.port80.se) [21:31] *** Nemo_bis has quit IRC (hub.se efnet.port80.se) [21:31] *** Muad-Dib has quit IRC (hub.se efnet.port80.se) [21:31] *** parsons has quit IRC (hub.se efnet.port80.se) [21:31] *** fluff has quit IRC (hub.se efnet.port80.se) [21:31] *** Riviera has quit IRC (hub.se efnet.port80.se) [21:31] *** musalbas has joined #archiveteam [21:33] *** beardicus has quit IRC (Ping timeout: 480 seconds) [21:38] *** n00bl337 has joined #archiveteam [21:38] Hello [21:38] tpb is down [21:39] WHAT FORSOOTH, PRITHEE TELL ME THE SECRET WORD [21:40] yahoosucks [21:40] what is your quest [21:45] wanted to know if the perl script can backup from web.archive.org [21:46] http://archiveteam.org/index.php?title=The_Pirate_Bay [21:46] *** Start_ is now known as Start [21:47] *** Kazzy_ is now known as Kazzy [21:47] *** musalbas is now known as lysobit [21:52] *** GLaDOS has joined #archiveteam [21:52] *** swebb sets mode: +o GLaDOS [21:53] *** fluff has joined #archiveteam [21:54] *** Nemo_bis has joined #archiveteam [21:55] *** sankin has quit IRC (Leaving.) [21:56] *** hive-mind has joined #archiveteam [22:13] *** deathy___ has joined #archiveteam [22:26] *** Start is now known as StartAway [22:27] *** n00bl337 has quit IRC (Quit: Page closed) [22:31] *** StartAway has quit IRC (Ping timeout: 265 seconds) [22:37] i just noticed http://www.icq.com/people/1 (increment as you like) [22:37] might be a fun idea to archive in a calm week [22:52] just tested an item https://archive.org/details/20131231_br_brasilia [22:52] actual name is brasília [22:52] but it shows up as Bras?lia [22:52] so I guess I'll be using brasilia [22:53] what encoding did you use? 
[22:55] not sure, I just used í in ias3upload's csv [22:55] so that'd be [22:56] *** s3r3wd has joined #archiveteam [22:57] WHAT FORSOOTH, PRITHEE TELL ME THE SECRET WORD [22:57] yahoosucks [22:57] schbirid: utf8 should be able to do í right? [22:58] s3r3wd: yahoosucks [22:58] s3r3wd: what is your quest [23:00] Just wanted to update a little the wiki. just did. Added some info on the last known the piratebay archive. [23:00] which has everything till 2013-01-09. I'm currently seeding that one, along with two other guys. [23:01] cool [23:02] arkiver: i'd hope so, no idea [23:02] schbirid, arkiver: utf8 should be the standard, and yes it has that character [23:03] also you pasted it in here in utf8 so. [23:03] yeah [23:03] IA is not fully utf8 yet but I'm sure they're going to improve that later [23:05] *** schbirid has quit IRC (Leaving) [23:09] *** Riviera has joined #archiveteam [23:11] *** lysobit has quit IRC (Quit: quit) [23:15] *** lhobas has joined #archiveteam [23:16] *** s3r3wd has quit IRC (Quit: leaving) [23:16] it should work if you edit via the web interface [23:16] using s3 tools is pretty hit or miss [23:18] *** ruukasu has quit IRC (Quit: WeeChat 1.0.1) [23:18] *** lysobit has joined #archiveteam [23:19] I've put in all kinds of stuff like https://archive.org/details/yomigaeru-pc-9801-densetsu [23:20] xmc: looks like it's a problem with the script I write the csv file with [23:21] will fix that tomorrow [23:22] ok [23:25] *** aaaaaaaaa has quit IRC (Leaving) [23:27] *** ruukasu has joined #archiveteam [23:32] *** aaaaaaaaa has joined #archiveteam [23:32] *** Start has joined #archiveteam [23:54] *** GLaDOS has quit IRC (Ping timeout: 272 seconds) [23:55] *** GLaDOS has joined #archiveteam [23:55] *** swebb sets mode: +o GLaDOS
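The Brasília title problem discussed from 22:52 onward ("brasília" showing up as "Bras?lia") is what a lossy encode step produces: the `?` is the replacement character Python and many other tools substitute when a character cannot be represented in the target charset. The sketch below shows that effect and one way to write the upload CSV explicitly as UTF-8. It assumes the metadata script is Python, and the column names (`item`, `file`, `title`) are hypothetical placeholders, not the documented ias3upload CSV layout. It also cannot rule out the separate S3-side issue mentioned at 23:16.

```python
"""Sketch: how "Brasília" can turn into "Bras?lia", and writing the CSV as UTF-8.

The column names below (item, file, title) are hypothetical placeholders,
not the documented ias3upload CSV layout.
"""
import csv

title = "Brasília"

# Encoding to a charset without í, with errors="replace", silently swaps it for "?".
lossy = title.encode("ascii", errors="replace").decode("ascii")
print(lossy)  # Bras?lia

# UTF-8 represents í (U+00ED) as the two bytes 0xC3 0xAD, so nothing is lost.
print(title.encode("utf-8"))  # b'Bras\xc3\xadlia'


def write_metadata_csv(path: str, rows: list[dict]) -> None:
    """Write upload metadata as UTF-8; newline="" is what the csv module expects."""
    with open(path, "w", encoding="utf-8", newline="") as fh:
        writer = csv.DictWriter(fh, fieldnames=["item", "file", "title"])
        writer.writeheader()
        writer.writerows(rows)
```

Passing `encoding="utf-8"` explicitly matters because `open()` otherwise uses a platform-dependent default encoding, which is one plausible way a script ends up writing `?` instead of `í`.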