[00:00] *** BlueMaxim has joined #archiveteam
[00:09] should have something up tonight
[00:14] *** mistym has quit IRC (Remote host closed the connection)
[00:20] *** dashcloud has quit IRC (Read error: Operation timed out)
[00:27] *** dashcloud has joined #archiveteam
[00:29] *** xk_id has quit IRC (Remote host closed the connection)
[00:33] *** dashcloud has quit IRC (Read error: Operation timed out)
[00:33] *** mistym has joined #archiveteam
[00:39] *** dashcloud has joined #archiveteam
[00:47] *** xk_id has joined #archiveteam
[00:55] *** xk_id has quit IRC (Remote host closed the connection)
[01:00] *** mistym has quit IRC (Remote host closed the connection)
[01:21] *** JesseW has joined #archiveteam
[01:26] *** expr_ has quit IRC (Quit: My Mac has gone to sleep. ZZZzzz…)
[02:01] *** mistym has joined #archiveteam
[02:26] *** mistym has quit IRC (Read error: Operation timed out)
[02:52] *** JesseW has quit IRC (Quit: Leaving.)
[03:04] *** JesseW has joined #archiveteam
[03:10] *** JesseW has quit IRC (Quit: Leaving.)
[03:12] *** mistym has joined #archiveteam
[03:13] *** JesseW has joined #archiveteam
[03:14] *** mistym has quit IRC (Read error: Operation timed out)
[03:19] *** JesseW has quit IRC (Quit: Leaving.)
[03:25] *** xk_id has joined #archiveteam
[03:28] *** dashcloud has quit IRC (Read error: Operation timed out)
[03:34] *** Ravenloft has quit IRC (Remote host closed the connection)
[03:35] *** dashcloud has joined #archiveteam
[03:46] *** maz_ has quit IRC (Read error: Operation timed out)
[03:48] http://www.engadget.com/2015/08/06/apples-website-store-death/
[03:49] apple killed store.apple.com
[04:06] *** Gfy has quit IRC (ircd.choopa.net irc2.choopa.net)
[04:06] *** Gfy_ has joined #archiveteam
[04:11] *** xk_id has quit IRC (Ping timeout: 483 seconds)
[04:14] *** mistym has joined #archiveteam
[04:15] *** dashcloud has quit IRC (Read error: Operation timed out)
[04:21] *** Gfy_ is now known as Gfy
[04:23] *** mistym has quit IRC (Ping timeout: 483 seconds)
[04:26] *** dashcloud has joined #archiveteam
[04:33] *** aaaaaaaaa has quit IRC (Leaving)
[04:40] *** JesseW has joined #archiveteam
[04:40] *** JesseW has quit IRC (Client Quit)
[04:43] *** mistym has joined #archiveteam
[05:04] *** mistym has quit IRC (Ping timeout: 840 seconds)
[05:06] *** mistym has joined #archiveteam
[05:17] *** BlueMaxim has quit IRC (Quit: Leaving)
[05:19] *** JesseW has joined #archiveteam
[05:29] *** habi has joined #archiveteam
[05:32] *** habi has quit IRC (Read error: Operation timed out)
[05:32] *** mistym has quit IRC (Ping timeout: 483 seconds)
[05:32] *** mistym has joined #archiveteam
[05:56] *** habi has joined #archiveteam
[05:58] *** BlueMaxim has joined #archiveteam
[06:05] *** xk_id has joined #archiveteam
[06:16] *** habi has left
[06:41] *** xk_id has quit IRC (Ping timeout: 606 seconds)
[06:45] *** mistym has quit IRC (Remote host closed the connection)
[06:50] *** khaoohs_ has joined #archiveteam
[06:52] *** khaoohs has quit IRC (Read error: Operation timed out)
[06:52] *** atlogbot has quit IRC (Ping timeout: 369 seconds)
[06:53] *** aschmitz has quit IRC (Read error: Operation timed out)
[06:53] *** thefinn93 has quit IRC (Ping timeout: 255 seconds)
[06:54] *** thefinn93 has joined #archiveteam
[06:55] *** dxrt has quit IRC (Ping timeout: 369 seconds)
[06:56] *** swebb has quit IRC (Excess Flood)
[06:56] *** vOYtEC has quit IRC (Ping timeout: 369 seconds)
[06:57] *** JesseW has quit IRC (Quit: Leaving.)
[06:58] *** dserodio has quit IRC (Read error: Operation timed out)
[06:58] *** dxrt has joined #archiveteam
[06:59] *** wp494 has quit IRC (Read error: Operation timed out)
[06:59] *** Laverne has quit IRC (Ping timeout: 369 seconds)
[06:59] *** chazchaz has quit IRC (Ping timeout: 369 seconds)
[07:00] *** achip has quit IRC (Read error: Operation timed out)
[07:00] *** no2penci1 has joined #archiveteam
[07:00] *** wp494 has joined #archiveteam
[07:01] *** achip has joined #archiveteam
[07:02] *** dcmorton has quit IRC (Excess Flood)
[07:02] *** dcmorton has joined #archiveteam
[07:03] *** dserodio has joined #archiveteam
[07:03] *** aschmitz has joined #archiveteam
[07:03] *** atlogbot has joined #archiveteam
[07:03] *** Laverne has joined #archiveteam
[07:03] *** vOYtEC has joined #archiveteam
[07:04] *** chazchaz has joined #archiveteam
[07:05] *** swebb has joined #archiveteam
[07:07] *** thefinn93 has quit IRC (Ping timeout: 186 seconds)
[07:07] *** no2pencil has quit IRC (Read error: Operation timed out)
[07:08] *** thefinn93 has joined #archiveteam
[07:09] *** xk_id has joined #archiveteam
[07:34] *** schbirid has joined #archiveteam
[07:46] *** mistym has joined #archiveteam
[08:00] *** mistym has quit IRC (Read error: Operation timed out)
[08:02] *** dashcloud has quit IRC (Read error: Operation timed out)
[08:05] *** dashcloud has joined #archiveteam
[08:19] *** PurpleSym has joined #archiveteam
[08:44] *** xk_id has quit IRC (Remote host closed the connection)
[08:44] *** dashcloud has quit IRC (Read error: Connection reset by peer)
[08:44] *** dashcloud has joined #archiveteam
[08:49] *** mistym has joined #archiveteam
[08:54] *** mistym has quit IRC (Ping timeout: 252 seconds)
[09:30] S[h]O[r]T: thanks!
[09:30] Start: that sucks :/
[09:50] *** mistym has joined #archiveteam
[09:54] *** PurpleSym has quit IRC (WeeChat 1.1.1)
[09:57] *** mistym has quit IRC (Read error: Operation timed out)
[10:06] *** ersi_ is now known as ersi
[10:33] *** xk_id has joined #archiveteam
[10:40] *** brayden has joined #archiveteam
[10:55] *** superkuh has quit IRC (Read error: Operation timed out)
[11:10] *** superkuh has joined #archiveteam
[12:10] *** brayden has quit IRC (Quit: Leaving)
[12:21] *** BlueMaxim has quit IRC (Read error: Connection reset by peer)
[12:38] *** PurpleSym has joined #archiveteam
[13:30] *** philpem has joined #archiveteam
[13:43] *** dashcloud has quit IRC (Read error: Operation timed out)
[13:48] *** dashcloud has joined #archiveteam
[13:53] *** mistym has joined #archiveteam
[13:58] *** mistym has quit IRC (Ping timeout: 252 seconds)
[14:20] *** mistym has joined #archiveteam
[14:22] *** signius has quit IRC (Ping timeout: 306 seconds)
[14:34] *** signius has joined #archiveteam
[14:43] *** mistym has quit IRC (Remote host closed the connection)
[14:49] *** xk_id has quit IRC (Read error: Operation timed out)
[15:01] *** mistym has joined #archiveteam
[15:35] *** xk_id has joined #archiveteam
[15:58] *** xk_id has quit IRC (Ping timeout: 186 seconds)
[16:00] *** chfoo has quit IRC (Ping timeout: 258 seconds)
[16:02] *** mistym has quit IRC (Remote host closed the connection)
[16:05] *** JesseW has joined #archiveteam
[16:07] *** JesseW has quit IRC (Client Quit)
[16:13] *** chfoo has joined #archiveteam
[16:18] *** JesseW has joined #archiveteam
[16:24] *** JesseW has quit IRC (Leaving.)
[16:43] *** mistym has joined #archiveteam
[16:47] *** chfoo has quit IRC (Ping timeout: 260 seconds)
[16:59] *** chfoo has joined #archiveteam
[17:05] *** kukutz has joined #archiveteam
[17:12] hi.
[17:12] guys, I have a question.
[17:12] we (Yandex - largest search engine in Russia) have unique archive of data and want it to be preserved for the mankind
[17:12] this data is archive from our blog search service, which is now gradually shutting down
[17:13] the service worked for 10 years and indexed russian blogosphere, but main data source was Livejournal.com
[17:13] a few words about the importance. In the period from 2001 to 2010, Livejournal was the biggest social network in Runet. A huge number of writers, journalists, politicians, thinkers and other great people wrote and commented in Livejournal. The influence of these texts on the culture of Russia cannot be overstated
[17:14] now many blogs either have been removed from Livejournal or the accounts have been sold with all the content deleted, so our archive contains unique information that cannot be downloaded from the Livejournal
[17:14] we have about 5-10 terabytes of data - the posts and comments from Livejournal with meta-information: author, date, reply-to and many other fields (we indexed Livejournal via RSS, PubSubHubBub, and other machine-readable formats and protocols)
[17:14] i wrote this to info@archive.org, but there's no answer
[17:15] do you have any ideas how this archive can be preserved online?
[17:19] rest assured that archiveteam will be absolutely interested
[17:19] this is fantastic
[17:20] not my kind of stuff though, so i have no idea
[17:20] if you need to get rid of it, we could provide space to dump it to
[17:20] no, space is not a problem
[17:20] for a nice archival, surely someone will come and say "let me do this"
[17:21] SketchCow = jason, but on holiday i think
[17:27] kukutz, you may also want to email jason@textfiles.com and/or stick around here.
[17:27] kukutz: in what format if the data?
[17:28] is*
[17:28] We and I'm sure Internet Archive are definitely interested in preserving this data.
[17:30] What format is the data in and how much data is it in terms of bytes?
[17:31] "we have about 5-10 terabytes of data"
[17:35] Now it is in some our internal machine-readable format, but we can export it in any common format I think
[17:39] *** SimpBrain has joined #archiveteam
[17:40] *** aaaaaaaaa has joined #archiveteam
[17:40] So if I read it correctly the data (posts, comments, meta-information) was extracted from RSS, PubSubHubHUb, html(?), so not saved in the formats it was extracted from
[17:44] Are the request and response headers saved for this data?
[17:45] If those are saved and data is available in the format is was indexed from we might be able to convert this data to WARCs.
[17:45] WARCs are playable by the Wayback Machine, which means users would be able to easily browse what has been saved by Yandex.
[17:46] kukutz: if you or someone else can document that format, another possibility is to just export that data as-is
[17:46] derivation at that point is deferraable
[17:51] arkiver: nope, request and response headers was not saved, unfortunately
[17:51] yidpw: I'll speak with our team about current format, it's good point, thank you
[17:52] *** SimpBrain has quit IRC (Quit: Leaving)
[17:54] I need to go home, it's late evening here in Russia. When I'll know more about current format, should I return here or wrote to jason@ ? It will be Monday I think
[17:55] kukutz: it's probably okay to do both
[17:56] great, I'll do both :)
[17:56] kukutz: both is fine, jason will pick it up when he reads it and hanging around here is always good as the response is most likely faster through here
[17:56] not fast enough again :p
[17:56] brevity wins
[17:57] rss would be a nice export format, if it preserves all the information that you currently have
[17:57] xml: got it
[17:57] xmc, sorry
[17:58] no worries
[17:59] xmc is like xml
[17:59] if xmc cannot solve your problem, use more
[17:59] unfortunately
[18:08] *** SimpBrain has joined #archiveteam
[18:17] hah
[18:23] kukutz: have a good night! thank you for preserving this data :)
[18:29] *** Wyatts has quit IRC (Remote host closed the connection)
[18:30] *** Wyatts has joined #archiveteam
[18:46] *** mistym_ has joined #archiveteam
[18:55] *** oldcad has joined #archiveteam
[18:55] *** mistym has quit IRC (Ping timeout: 606 seconds)
[19:03] *** kukutz has quit IRC (Quit: This computer has gone to sleep)
[19:14] Not on holiday.
[19:14] Taking care of parent, now back.
[19:38] *** tsp_ has joined #archiveteam
[19:40] What will it take to archive www.kitchensinc.net up to archive.org? It redirects to the real site once it loads. Can the archivebot follow that?
[19:42] *** dashcloud has quit IRC (Read error: Connection reset by peer)
[19:42] *** dashcloud has joined #archiveteam
[19:44] *** mistym_ has quit IRC (Remote host closed the connection)
[19:48] *** HCross has quit IRC (Ping timeout: 252 seconds)
[19:50] *** HCross has joined #archiveteam
[20:09] *** SimpBrain has quit IRC (Quit: Leaving)
[20:14] *** PurpleSym has quit IRC (WeeChat 1.1.1)
[20:26] how big do you think that site is?
[20:26] gigs?
[20:27] uhm, no images?
[20:27] ah, files
[20:28] look good for archivebot
[20:29] queued
[20:29] finished
[20:32] lol
[20:33] it redirects to http://Kitchensinc.jgriffith.com
[20:33] which i just queue
[20:33] d
[20:33] guess it did not take the jgriffith.com domains
[20:33] tight
[20:33] right
[20:34] yeah it redirects with a meta-refresh so archivebot won't see it
[20:35] Atluxity, tsp_: got it, 207 mbytes
[20:43] *** mistym has joined #archiveteam
[20:53] *** mistym has quit IRC (Remote host closed the connection)
[20:59] *** mistym has joined #archiveteam
[21:00] *** dashcloud has quit IRC (Quit: No Ping reply in 180 seconds.)
[21:02] *** dashcloud has joined #archiveteam
[21:14] *** schbirid has quit IRC (Leaving)
[21:23] *** kukutz has joined #archiveteam
[21:27] *** JesseW has joined #archiveteam
[21:42] thanks, missed the backscroll. Apparently the guy just died
[21:43] thats a shame :\
[21:44] *** JesseW has quit IRC (Quit: Leaving.)
[21:44] thanks for bringing the site to our attention
[22:20] *** pwnsrv has quit IRC (Ping timeout: 265 seconds)
[22:22] *** kukutz2 has joined #archiveteam
[22:25] *** Gfy has quit IRC (Ping timeout: 265 seconds)
[22:25] *** kukutz has quit IRC (Ping timeout: 306 seconds)
[22:33] *** BlueMaxim has joined #archiveteam
[22:34] *** Gfy has joined #archiveteam
[22:38] *** pwnsrv has joined #archiveteam
[22:43] *** RedType_ has joined #archiveteam
[22:49] *** kukutz__ has joined #archiveteam
[22:51] *** pwnsrv has quit IRC (hub.se efnet.portlane.se)
[22:51] *** oldcad has quit IRC (hub.se efnet.portlane.se)
[22:51] *** RedType has quit IRC (hub.se efnet.portlane.se)
[22:51] *** yakfish has quit IRC (hub.se efnet.portlane.se)
[22:51] *** bauruine has quit IRC (hub.se efnet.portlane.se)
[22:51] *** kukutz2 has quit IRC (Read error: Operation timed out)
[23:03] *** Rotab has quit IRC (Ping timeout: 198 seconds)
[23:12] *** Deewiant has quit IRC (hub.se irc.du.se)
[23:12] *** Boppen has quit IRC (hub.se irc.du.se)
[23:24] *** Boppen has joined #archiveteam
[23:24] *** Rotab has joined #archiveteam
[23:25] *** Deewiant has joined #archiveteam
[23:33] *** mistym has quit IRC (Remote host closed the connection)
[23:36] *** brayden has joined #archiveteam
[23:42] *** wyatt8740 has quit IRC (Remote host closed the connection)
[23:48] *** JesseW has joined #archiveteam