[00:23] *** nertzy2 has joined #archiveteam
[00:40] *** SimpBrain has quit IRC (Read error: Operation timed out)
[00:42] *** nertzy2 has quit IRC (Quit: This computer has gone to sleep)
[00:48] *** mismatch_ has joined #archiveteam
[01:02] *** logchfoo1 starts logging #archiveteam at Fri Jan 22 01:02:44 2016
[01:02] *** logchfoo1 has joined #archiveteam
[01:15] *** Ghost_of_ has quit IRC (Read error: Operation timed out)
[01:20] *** JesseW has joined #archiveteam
[01:23] *** SimpBrain has joined #archiveteam
[01:53] *** JesseW has quit IRC (Leaving.)
[02:13] *** jspiros has quit IRC (leaving)
[02:13] *** jspiros has joined #archiveteam
[02:30] *** megaminxw has quit IRC (Quit: Leaving.)
[02:30] *** JesseW has joined #archiveteam
[02:48] *** Froggypwn has joined #archiveteam
[03:11] *** Zebranky_ is now known as Zebranky
[03:45] *** kyan has joined #archiveteam
[03:58] *** W1nterFox has joined #archiveteam
[04:03] *** WinterFox has quit IRC (Read error: Operation timed out)
[04:06] *** W1nterFox has quit IRC (Read error: Operation timed out)
[04:38] *** brayden has quit IRC (Quit: Leaving)
[04:39] *** brayden has joined #archiveteam
[04:52] request for assistance: could someone please help me backup the gitorious disk image? it's just shy of 5T and i would like to have more than one copy
[04:53] looking for serious multiyear commitments
[04:53] ultimately i will figure out how to shove it into IA but it's a bit large for one item
[04:53] or something
[04:56] *** megaminxw has joined #archiveteam
[04:57] *** WinterFox has joined #archiveteam
[05:00] *** fie has quit IRC (Read error: Operation timed out)
[05:10] I am glad to physically hold on to a copy, but: 1) I'm also in Seattle, so that helps less with geographic separation ; 2) While I can likely afford to buy 5TBs worth of hard drives, I haven't done so yet.
[05:10] xmc:
[05:11] hi
[05:11] the image is physically on a ceph cluster in san jose, not in my house :P
[05:11] ah, well then me storing one in Seattle might be more useful. :-)
[05:12] yea
[05:12] and it should be easier to get it to IA (because you're going to want to use sneakernet)
[05:13] mmmmaybe
[05:13] why ever not?
[05:14] because that would require traveling and i don't have a place to stay down there and i don't really want to visit the bay area?
[05:15] Just ask the IA folks to drop by the data center with 5 1T drives, plug them in, then pick them back up.
[05:15] when they are full
[05:15] * xmc shrug
[05:20] * xmc email info@
[05:30] *** RichardG has quit IRC (Read error: Connection reset by peer)
[05:33] Is there a simple way to split it?
[05:33] (He asked)
[05:36] well i could split it by username, each username has somewhere between two and many repositories
[05:37] an item by user would make some sense, though clones are stored in the directory of the account it was cloned *from*
[05:37] 5TB drives aren't that expensive these days; when I was bulk-grabbing videos from Blip they were $150 each
[05:39] Also, it looks like oldfriends.co.nz is fully shut down. Should the primary domain be blocked from the ArchiveBot job?
[05:39] They have a secondary images domain which is still returning data
[05:39] At least for now
[05:50] Ha -- I now know of *68* identifiers that IA will show records (to logged in users) of having (shock, horror) "deleted". ;-P
[05:56] JesseW, wut?
[05:56] I assume deleted meaning darked or something?
[05:57] (My understanding from talking to IA was that nothing is ever deleted)
[05:57] hehehehehehehehehehehehehehehehe
[05:58] https://archive.org/history/20_minutes_of_massachusetts
[05:59] JesseW, That's fairly disturbing but all that was there was the meta.xml, the reviews.xml, and an empty dir, apparently
[05:59] I'll have to find where i was told nothing's deleted
[06:00] Frankly I wonder why anyone would go to the trouble of deleting an empty identifier
[06:00] to save 10k of disk space or sthg?
[06:01] * JesseW shrug -- IDK. The ones I've looked at have all been in 2007, so maybe things were different then.
[06:01] Dec 05 14:51:40 We don't even delete SPAM
[06:01] Dec 05 14:51:59 Nothing leaves the archive, not a bit
[06:01] JesseW, Ah, hmm
[06:03] You're all the most fucking adorable things
[06:03] * JesseW bows
[06:03] http://archive.fart.website/bin/irclogger_log/archiveteam-bs?date=2014-12-05,Fri&sel=228#l224
[06:05] * JesseW is mostly amused by the levels of recordkeeping -- even when something is *removed*, it still shows up in whatever jake used to generate the census list.
[06:13] http://archive.fart.website/bin/irclogger_log/archiveteam-bs?date=2014-12-05,Fri&sel=238#l234 -- hm, I wonder how they are now.
[06:14] https://archive.org/details/opensource_media <- 204,816
[06:14] https://archive.org/details/opensource_movies <- 547,723
[06:15] https://archive.org/details/opensource_religionvideo <- 104,421
[06:15] https://archive.org/details/opensource <- 426,941
[06:16] https://archive.org/details/opensource_audio <- 2,059,999 (!!)
[06:17] https://archive.org/details/open_source_software <- 11,368
[06:18] I think that's all of them...
[06:21] good morning
[06:22] Atluxity: morning
[06:25] well, of the 68 I found, all but 3 were deleted in 2006 or 2007. The other 3 were deleted by the archive.org staffer who uploaded them, presumably as a test.
[06:25] Mostly just an interesting curio.
[06:41] *** RichardG has joined #archiveteam
[07:00] what's more concerning are the 48 items that were retrievable in the last census (in March 2015) but now are gone without even any records in archive.org/history/
[07:01] including one that (randomly) was the first in the itemlist, https://archive.org/history/Urdu-Trana-001
[07:03] That's weird.
[07:03] according to the census, it contained 10 mp3s of what was presumably Islamic speeches (from the filenames) and was in the iraq_middleeast, iraq_war and newsandpublicaffairs collections.
[07:03] Could it have been renamed?
[07:04] Hm, let me look in those collections.
[07:05] There are over 36k items there https://archive.org/search.php?query=collection%3A%22newsandpublicaffairs%22%20collection%3A%22iraq_war%22%20collection%3A%22iraq_middleeast%22
[07:05] Yeah, the other two are also too large to look through manually.
[07:05] Hm, the name of the _meta file doesn't match the identifier. Let me look in that.
[07:06] yep, there it is: https://archive.org/metadata/AansoonAurAhoon-MP3
[07:07] Ah, yay. The files are the same? Assuming they are we should probably upload a placeholder item to the other identifier to aid in locating it. Also, is that identifier also listed in the census?
[07:07] how did it get under that other identifier, I wonder?
[07:08] yep, the other identifier is in the census
[07:08] "key=>3439506-18046 prevtask=>382505602 dir=>/30/items/AansoonAurAhoon-MP3 comment=>Running noop's to update collection string in metadata table to match collections in items meta.xml noop=>1 key=3439506-18046&noop=1&.."
[07:08] *** vitzli has joined #archiveteam
[07:09] Heh, I was just going to post that. :-)
[07:09] fixer.php submitted by jake@archive.org (who IIRC did the census?)
around a year ago https://catalogd.archive.org/log/382521508
[07:09] :P
[07:10] yep, I've found a few other fixes jake did after running the census. :-)
[07:10] and I've sent a few more into info@ which have been done now.
[07:11] (FWIW, https://archive.org/details/AansoonAurAhoon-MP3 seems to be music, rather than speeches)
[07:12] Hah this one is cool https://ia802304.us.archive.org/30/items/AansoonAurAhoon-MP3/ek-sitara-tha-main.mp3
[07:13] if you google for the song title, looks like it's associated with some graphic videos
[07:14] pics are good too
[07:16] English song title is "I was a Star", it's in Hindi apparently
[07:16] according to Google Translate
[07:24] wish i knew what the lyrics were. All too many references to "jihad" in the google search results for the title for my taste
[07:24] i like the music tho
[07:27] I just realized that JIHAD is an acronym for "Jesus, I'm Having A Dump"
[07:27] sorry that was like lightyears off topic
[07:27] *how* exactly did you "just realize" that? :-)
[07:28] I got tired of not knowing what I'm doing and so I switched to the IRC client for a little while and it just happened
[07:31] well, you're welcome. :-)
[07:32] it may happen more often as I continue to realize that everything I knew about the GPU is wrong
[08:16] *** JesseW has quit IRC (Leaving.)
[08:31] *** atomotic has joined #archiveteam
[08:33] *** redlob has quit IRC (Quit: ZNC - http://znc.in)
[08:36] *** redlob has joined #archiveteam
[08:51] *** vitzli has quit IRC (Leaving)
[09:15] *** MrRadar has quit IRC (Read error: Operation timed out)
[09:18] *** MrRadar has joined #archiveteam
[09:31] SketchCow: oldfriends has closed. Our grab was a success!
[09:31] There's some older files you can delete from FOS from oldfriends
[09:32] Or instead of that pack them up in a non-WARC archive and upload them to IA, so we have them anyway
[09:32] * kyan likes the second option better
[09:37] delete something?!
thats not how we do it
[09:38] SketchCow: looks like some items didn't get the metadata update: https://archive.org/details/archiveteam_newssites_20160120_0021
[09:47] arkiver: the bot has crashed, and student WiFi here blocks SSH
[10:20] *** dashcloud has quit IRC (Read error: Operation timed out)
[10:24] Is catalogd down?
[10:26] *** dashcloud has joined #archiveteam
[11:18] *** atomotic has quit IRC (Quit: Textual IRC Client: www.textualapp.com)
[12:45] *** K4k_ has joined #archiveteam
[12:47] *** atomotic has joined #archiveteam
[12:49] *** VADemon has joined #archiveteam
[13:00] *** Ghost_of_ has joined #archiveteam
[13:14] *** K4k_ has quit IRC (Read error: Operation timed out)
[13:56] *** atomotic has quit IRC (Quit: My Mac has gone to sleep. ZZZzzz…)
[13:57] *** atomotic has joined #archiveteam
[14:02] *** K4k_ has joined #archiveteam
[14:02] *** K4k_ has quit IRC (Remote host closed the connection!)
[14:02] *** K4k_ has joined #archiveteam
[14:15] *** nertzy2 has joined #archiveteam
[14:24] *** dashcloud has quit IRC (Read error: Operation timed out)
[14:25] *** Ghost_of_ has quit IRC (Quit: Leaving)
[14:27] *** dashcloud has joined #archiveteam
[14:44] *** WinterFox has quit IRC (Remote host closed the connection)
[14:49] Can we requeue some of the items for gamefront?
[14:50] *** nertzy2 has quit IRC (Quit: This computer has gone to sleep)
[14:50] There's 60k items out right now
[14:53] *** nertzy2 has joined #archiveteam
[14:55] *** atomotic has quit IRC (Quit: Textual IRC Client: www.textualapp.com)
[14:56] *** dashcloud has quit IRC (Read error: Operation timed out)
[15:03] *** nertzy2 has quit IRC (Quit: This computer has gone to sleep)
[15:04] *** dashcloud has joined #archiveteam
[15:20] *** K4k_ has quit IRC (Ping timeout: 260 seconds)
[15:31] *** nertzy2 has joined #archiveteam
[15:41] *** nertzy2 has quit IRC (Quit: This computer has gone to sleep)
[15:59] *** K4k_ has joined #archiveteam
[16:03] *** Lord_Nigh sets mode: +o balrog
[16:10] *** godane has quit IRC (Read error: Operation timed out)
[16:13] *** megaminxw has quit IRC (Quit: Leaving.)
[16:31] *** Ghost_of_ has joined #archiveteam
[16:51] *** BlueMaxim has joined #archiveteam
[17:03] *** K4k__ has joined #archiveteam
[17:05] *** K4k_ has quit IRC (Ping timeout: 252 seconds)
[17:15] *** JesseW has joined #archiveteam
[17:18] *** kristian_ has joined #archiveteam
[17:22] *** schbirid has joined #archiveteam
[17:27] *** JesseW has quit IRC (Leaving.)
[17:34] *** z00nx has quit IRC (Ping timeout: 252 seconds)
[17:34] *** z00nx has joined #archiveteam
[17:36] *** rizzzz has quit IRC (Read error: Operation timed out)
[17:40] *** rizzzz has joined #archiveteam
[17:43] *** Atom__ has joined #archiveteam
[17:46] *** Atom-- has quit IRC (Ping timeout: 252 seconds)
[17:56] *** atomotic has joined #archiveteam
[18:03] *** dashcloud has quit IRC (Read error: Operation timed out)
[18:06] *** dashcloud has joined #archiveteam
[18:31] *** Emcy has quit IRC (Ping timeout: 250 seconds)
[19:03] *** K4k has joined #archiveteam
[19:08] *** K4k__ has quit IRC (Read error: Operation timed out)
[19:15] *** dashcloud has quit IRC (Read error: Operation timed out)
[19:20] *** dashcloud has joined #archiveteam
[19:21] *** scyther has joined #archiveteam
[19:35] *** aliz has quit IRC (Ping timeout: 260 seconds)
[19:41] *** atomotic has quit IRC (Quit: My Mac has gone to sleep. ZZZzzz…)
[19:54] *** atomotic has joined #archiveteam
[19:54] *** atomotic has quit IRC (Client Quit)
[19:58] *** dashcloud has quit IRC (Read error: Operation timed out)
[20:02] *** dashcloud has joined #archiveteam
[20:16] *** Ghost_of_ has quit IRC (Quit: Leaving)
[20:16] *** kristian_ has quit IRC (Quit: Leaving)
[20:38] *** JesseW has joined #archiveteam
[20:41] *** JesseW has quit IRC (Client Quit)
[20:47] Great news on Google Code!
[20:47] We can keep the grab running after the shutdown on the 25th
[20:53] Awesome!
[21:17] *** godane has joined #archiveteam
[21:20] *** K4k has quit IRC (Ping timeout: 252 seconds)
[21:41] *** Ghost_of_ has joined #archiveteam
[21:44] *** dashcloud has quit IRC (Read error: Operation timed out)
[21:45] *** dashcloud has joined #archiveteam
[21:55] *** scyther has quit IRC (Quit: Leaving)
[22:05] holy COW those gamefront items are getting BIG
[22:16] *** K4k has joined #archiveteam
[22:22] *** JetBalsa has quit IRC (Read error: Connection reset by peer)
[22:23] *** K4k has quit IRC (Read error: Operation timed out)
[22:25] *** dashcloud has quit IRC (Read error: Operation timed out)
[22:26] if I was banned from gamefront, would I be getting any 200 OK at all?
[22:29] *** dashcloud has joined #archiveteam
[22:35] *** Start has quit IRC (Read error: Connection reset by peer)
[22:35] *** Start has joined #archiveteam
[22:38] *** JesseW has joined #archiveteam
[23:08] *** WinterFox has joined #archiveteam
[23:18] *** K4k has joined #archiveteam
[23:23] *** K4k has quit IRC (Ping timeout: 260 seconds)
[23:26] *** dashcloud has quit IRC (Read error: Operation timed out)
[23:30] *** dashcloud has joined #archiveteam
[23:41] *** nertzy2 has joined #archiveteam
[23:45] *** JesseW has quit IRC (Leaving.)
[23:49] xmc: Archive came to me asking what to do about the guy with gitorious
[23:49] hahaha
[23:49] ok
[23:49] So really, it's all about me. I'm Rome and everything leads to me
[23:49] so i'll make something up and then do it
[23:50] If you could split it into 5 pieces, that would be good.
[23:50] Even if it kind of sucks
[23:50] i could, but it'd be weird
[23:52] i could also split it into 40,000 pieces, one per username
[23:52] eh, i can do it alphabetically or something
[23:52] ok.
[23:58] Work through it.
[23:58] But we'll take it.