#archiveteam 2015-08-07,Fri


Time Nickname Message
00:00 🔗 BlueMaxim has joined #archiveteam
00:09 🔗 S[h]O[r]T should have something up tonight
00:14 🔗 mistym has quit IRC (Remote host closed the connection)
00:20 🔗 dashcloud has quit IRC (Read error: Operation timed out)
00:27 🔗 dashcloud has joined #archiveteam
00:29 🔗 xk_id has quit IRC (Remote host closed the connection)
00:33 🔗 dashcloud has quit IRC (Read error: Operation timed out)
00:33 🔗 mistym has joined #archiveteam
00:39 🔗 dashcloud has joined #archiveteam
00:47 🔗 xk_id has joined #archiveteam
00:55 🔗 xk_id has quit IRC (Remote host closed the connection)
01:00 🔗 mistym has quit IRC (Remote host closed the connection)
01:21 🔗 JesseW has joined #archiveteam
01:26 🔗 expr_ has quit IRC (Quit: My Mac has gone to sleep. ZZZzzz…)
02:01 🔗 mistym has joined #archiveteam
02:26 🔗 mistym has quit IRC (Read error: Operation timed out)
02:52 🔗 JesseW has quit IRC (Quit: Leaving.)
03:04 🔗 JesseW has joined #archiveteam
03:10 🔗 JesseW has quit IRC (Quit: Leaving.)
03:12 🔗 mistym has joined #archiveteam
03:13 🔗 JesseW has joined #archiveteam
03:14 🔗 mistym has quit IRC (Read error: Operation timed out)
03:19 🔗 JesseW has quit IRC (Quit: Leaving.)
03:25 🔗 xk_id has joined #archiveteam
03:28 🔗 dashcloud has quit IRC (Read error: Operation timed out)
03:34 🔗 Ravenloft has quit IRC (Remote host closed the connection)
03:35 🔗 dashcloud has joined #archiveteam
03:46 🔗 maz_ has quit IRC (Read error: Operation timed out)
03:48 🔗 Start http://www.engadget.com/2015/08/06/apples-website-store-death/
03:49 🔗 Start apple killed store.apple.com
04:06 🔗 Gfy has quit IRC (ircd.choopa.net irc2.choopa.net)
04:06 🔗 Gfy_ has joined #archiveteam
04:11 🔗 xk_id has quit IRC (Ping timeout: 483 seconds)
04:14 🔗 mistym has joined #archiveteam
04:15 🔗 dashcloud has quit IRC (Read error: Operation timed out)
04:21 🔗 Gfy_ is now known as Gfy
04:23 🔗 mistym has quit IRC (Ping timeout: 483 seconds)
04:26 🔗 dashcloud has joined #archiveteam
04:33 🔗 aaaaaaaaa has quit IRC (Leaving)
04:40 🔗 JesseW has joined #archiveteam
04:40 🔗 JesseW has quit IRC (Client Quit)
04:43 🔗 mistym has joined #archiveteam
05:04 🔗 mistym has quit IRC (Ping timeout: 840 seconds)
05:06 🔗 mistym has joined #archiveteam
05:17 🔗 BlueMaxim has quit IRC (Quit: Leaving)
05:19 🔗 JesseW has joined #archiveteam
05:29 🔗 habi has joined #archiveteam
05:32 🔗 habi has quit IRC (Read error: Operation timed out)
05:32 🔗 mistym has quit IRC (Ping timeout: 483 seconds)
05:32 🔗 mistym has joined #archiveteam
05:56 🔗 habi has joined #archiveteam
05:58 🔗 BlueMaxim has joined #archiveteam
06:05 🔗 xk_id has joined #archiveteam
06:16 🔗 habi has left
06:41 🔗 xk_id has quit IRC (Ping timeout: 606 seconds)
06:45 🔗 mistym has quit IRC (Remote host closed the connection)
06:50 🔗 khaoohs_ has joined #archiveteam
06:52 🔗 khaoohs has quit IRC (Read error: Operation timed out)
06:52 🔗 atlogbot has quit IRC (Ping timeout: 369 seconds)
06:53 🔗 aschmitz has quit IRC (Read error: Operation timed out)
06:53 🔗 thefinn93 has quit IRC (Ping timeout: 255 seconds)
06:54 🔗 thefinn93 has joined #archiveteam
06:55 🔗 dxrt has quit IRC (Ping timeout: 369 seconds)
06:56 🔗 swebb has quit IRC (Excess Flood)
06:56 🔗 vOYtEC has quit IRC (Ping timeout: 369 seconds)
06:57 🔗 JesseW has quit IRC (Quit: Leaving.)
06:58 🔗 dserodio has quit IRC (Read error: Operation timed out)
06:58 🔗 dxrt has joined #archiveteam
06:59 🔗 wp494 has quit IRC (Read error: Operation timed out)
06:59 🔗 Laverne has quit IRC (Ping timeout: 369 seconds)
06:59 🔗 chazchaz has quit IRC (Ping timeout: 369 seconds)
07:00 🔗 achip has quit IRC (Read error: Operation timed out)
07:00 🔗 no2penci1 has joined #archiveteam
07:00 🔗 wp494 has joined #archiveteam
07:01 🔗 achip has joined #archiveteam
07:02 🔗 dcmorton has quit IRC (Excess Flood)
07:02 🔗 dcmorton has joined #archiveteam
07:03 🔗 dserodio has joined #archiveteam
07:03 🔗 aschmitz has joined #archiveteam
07:03 🔗 atlogbot has joined #archiveteam
07:03 🔗 Laverne has joined #archiveteam
07:03 🔗 vOYtEC has joined #archiveteam
07:04 🔗 chazchaz has joined #archiveteam
07:05 🔗 swebb has joined #archiveteam
07:07 🔗 thefinn93 has quit IRC (Ping timeout: 186 seconds)
07:07 🔗 no2pencil has quit IRC (Read error: Operation timed out)
07:08 🔗 thefinn93 has joined #archiveteam
07:09 🔗 xk_id has joined #archiveteam
07:34 🔗 schbirid has joined #archiveteam
07:46 🔗 mistym has joined #archiveteam
08:00 🔗 mistym has quit IRC (Read error: Operation timed out)
08:02 🔗 dashcloud has quit IRC (Read error: Operation timed out)
08:05 🔗 dashcloud has joined #archiveteam
08:19 🔗 PurpleSym has joined #archiveteam
08:44 🔗 xk_id has quit IRC (Remote host closed the connection)
08:44 🔗 dashcloud has quit IRC (Read error: Connection reset by peer)
08:44 🔗 dashcloud has joined #archiveteam
08:49 🔗 mistym has joined #archiveteam
08:54 🔗 mistym has quit IRC (Ping timeout: 252 seconds)
09:30 🔗 arkiver S[h]O[r]T: thanks!
09:30 🔗 arkiver Start: that sucks :/
09:50 🔗 mistym has joined #archiveteam
09:54 🔗 PurpleSym has quit IRC (WeeChat 1.1.1)
09:57 🔗 mistym has quit IRC (Read error: Operation timed out)
10:06 🔗 ersi_ is now known as ersi
10:33 🔗 xk_id has joined #archiveteam
10:40 🔗 brayden has joined #archiveteam
10:55 🔗 superkuh has quit IRC (Read error: Operation timed out)
11:10 🔗 superkuh has joined #archiveteam
12:10 🔗 brayden has quit IRC (Quit: Leaving)
12:21 🔗 BlueMaxim has quit IRC (Read error: Connection reset by peer)
12:38 🔗 PurpleSym has joined #archiveteam
13:30 🔗 philpem has joined #archiveteam
13:43 🔗 dashcloud has quit IRC (Read error: Operation timed out)
13:48 🔗 dashcloud has joined #archiveteam
13:53 🔗 mistym has joined #archiveteam
13:58 🔗 mistym has quit IRC (Ping timeout: 252 seconds)
14:20 🔗 mistym has joined #archiveteam
14:22 🔗 signius has quit IRC (Ping timeout: 306 seconds)
14:34 🔗 signius has joined #archiveteam
14:43 🔗 mistym has quit IRC (Remote host closed the connection)
14:49 🔗 xk_id has quit IRC (Read error: Operation timed out)
15:01 🔗 mistym has joined #archiveteam
15:35 🔗 xk_id has joined #archiveteam
15:58 🔗 xk_id has quit IRC (Ping timeout: 186 seconds)
16:00 🔗 chfoo has quit IRC (Ping timeout: 258 seconds)
16:02 🔗 mistym has quit IRC (Remote host closed the connection)
16:05 🔗 JesseW has joined #archiveteam
16:07 🔗 JesseW has quit IRC (Client Quit)
16:13 🔗 chfoo has joined #archiveteam
16:18 🔗 JesseW has joined #archiveteam
16:24 🔗 JesseW has quit IRC (Leaving.)
16:43 🔗 mistym has joined #archiveteam
16:47 🔗 chfoo has quit IRC (Ping timeout: 260 seconds)
16:59 🔗 chfoo has joined #archiveteam
17:05 🔗 kukutz has joined #archiveteam
17:12 🔗 kukutz hi.
17:12 🔗 kukutz guys, I have a question.
17:12 🔗 kukutz we (Yandex - largest search engine in Russia) have unique archive of data and want it to be preserved for the mankind
17:12 🔗 kukutz this data is archive from our blog search service, which is now gradually shutting down
17:13 🔗 kukutz the service worked for 10 years and indexed russian blogosphere, but main data source was Livejournal.com
17:13 🔗 kukutz a few words about the importance. In the period from 2001 to 2010, Livejournal was the biggest social network in Runet. A huge number of writers, journalists, politicians, thinkers and other great people wrote and commented in Livejournal. The influence of these texts on the culture of Russia cannot be overstated
17:14 🔗 kukutz now many blogs either have been removed from Livejournal or the accounts have been sold with all the content deleted, so our archive contains unique information that cannot be downloaded from the Livejournal
17:14 🔗 kukutz we have about 5-10 terabytes of data: the posts and comments from Livejournal with meta-information: author, date, reply-to and many other fields (we indexed Livejournal via RSS, PubSubHubBub, and other machine-readable formats and protocols)
17:14 🔗 kukutz i wrote this to info@archive.org, but there's no answer
17:15 🔗 kukutz do you have any ideas how this archive can be preserved online?
17:19 🔗 schbirid rest assured that archiveteam will be absolutely interested
17:19 🔗 schbirid this is fantastic
17:20 🔗 schbirid not my kind of stuff though, so i have no idea
17:20 🔗 schbirid if you need to get rid of it, we could provide space to dump it to
17:20 🔗 kukutz no, space is not a problem
17:20 🔗 schbirid for a nice archival, surely someone will come and say "let me do this"
17:21 🔗 schbirid SketchCow = jason, but on holiday i think
17:27 🔗 garyrh kukutz, you may also want to email jason@textfiles.com and/or stick around here.
17:27 🔗 arkiver kukutz: in what format if the data?
17:28 🔗 arkiver is*
17:28 🔗 arkiver We and I'm sure Internet Archive are definitely interested in preserving this data.
17:30 🔗 arkiver What format is the data in and how much data is it in terms of bytes?
17:31 🔗 ersi "we have about 5-10 terabytes of data"
17:35 🔗 kukutz Now it is in some our internal machine-readable format, but we can export it in any common format I think
17:39 🔗 SimpBrain has joined #archiveteam
17:40 🔗 aaaaaaaaa has joined #archiveteam
17:40 🔗 arkiver So if I read it correctly the data (posts, comments, meta-information) was extracted from RSS, PubSubHubHUb, html(?), so not saved in the formats it was extracted from
17:44 🔗 arkiver Are the request and response headers saved for this data?
17:45 🔗 arkiver If those are saved and data is available in the format is was indexed from we might be able to convert this data to WARCs.
17:45 🔗 arkiver WARCs are playable by the Wayback Machine, which means users would be able to easily browse what has been saved by Yandex.
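Why the saved headers matter for a WARC conversion: in a WARC "response" record, the content block is the raw HTTP response (status line, headers, blank line, body), so it cannot be rebuilt faithfully if the headers were never kept. The sketch below builds one uncompressed WARC/1.0 response record using only the Python standard library. The function name `warc_response_record` and the example URL are hypothetical, and a real WARC writer would also add digest fields (e.g. `WARC-Block-Digest`) and usually gzip each record.

```python
import uuid
from datetime import datetime, timezone


def warc_response_record(target_uri: str, http_status_line: str,
                         http_headers: dict, body: bytes) -> bytes:
    """Build one uncompressed WARC/1.0 'response' record (minimal sketch)."""
    # Content block = the raw HTTP response: status line, headers, blank line, body.
    # This is exactly the part that needs the original response headers.
    http_head = http_status_line + "\r\n" + "".join(
        f"{name}: {value}\r\n" for name, value in http_headers.items()) + "\r\n"
    block = http_head.encode("iso-8859-1") + body

    warc_headers = [
        "WARC/1.0",
        "WARC-Type: response",
        f"WARC-Record-ID: <urn:uuid:{uuid.uuid4()}>",
        "WARC-Date: " + datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ"),
        f"WARC-Target-URI: {target_uri}",
        "Content-Type: application/http; msgtype=response",
        f"Content-Length: {len(block)}",  # length of the content block in bytes
    ]
    # WARC header lines, blank line, content block, then two CRLFs ending the record.
    return ("\r\n".join(warc_headers) + "\r\n\r\n").encode("utf-8") + block + b"\r\n\r\n"


record = warc_response_record("http://example.com/post", "HTTP/1.1 200 OK",
                              {"Content-Type": "text/html"}, b"<p>post</p>")
```

Without the original status line and headers, any values filled in here would be guesses, which is why the archive as described would more likely be preserved as a documented export than as WARCs.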
17:46 🔗 yipdw kukutz: if you or someone else can document that format, another possibility is to just export that data as-is
17:46 🔗 yipdw derivation at that point is deferraable
17:51 🔗 kukutz arkiver: nope, request and response headers was not saved, unfortunately
17:51 🔗 kukutz yidpw: I'll speak with our team about current format, it's good point, thank you
17:52 🔗 SimpBrain has quit IRC (Quit: Leaving)
17:54 🔗 kukutz I need to go home, it's late evening here in Russia. When I'll know more about current format, should I return here or wrote to jason@ ? It will be Monday I think
17:55 🔗 yipdw kukutz: it's probably okay to do both
17:56 🔗 kukutz great, I'll do both :)
17:56 🔗 midas kukutz: both is fine, jason will pick it up when he reads it and hanging around here is always good as the response is most likely faster through here
17:56 🔗 midas not fast enough again :p
17:56 🔗 yipdw brevity wins
17:57 🔗 xmc rss would be a nice export format, if it preserves all the information that you currently have
17:57 🔗 kukutz xml: got it
17:57 🔗 kukutz xmc, sorry
17:58 🔗 xmc no worries
17:59 🔗 yipdw xmc is like xml
17:59 🔗 yipdw if xmc cannot solve your problem, use more
17:59 🔗 xmc unfortunately
18:08 🔗 SimpBrain has joined #archiveteam
18:17 🔗 ersi hah
18:23 🔗 arkiver kukutz: have a good night! thank you for preserving this data :)
18:29 🔗 Wyatts has quit IRC (Remote host closed the connection)
18:30 🔗 Wyatts has joined #archiveteam
18:46 🔗 mistym_ has joined #archiveteam
18:55 🔗 oldcad has joined #archiveteam
18:55 🔗 mistym has quit IRC (Ping timeout: 606 seconds)
19:03 🔗 kukutz has quit IRC (Quit: This computer has gone to sleep)
19:14 🔗 SketchCow Not on holiday.
19:14 🔗 SketchCow Taking care of parent, now back.
19:38 🔗 tsp_ has joined #archiveteam
19:40 🔗 tsp_ What will it take to archive www.kitchensinc.net up to archive.org? It redirects to the real site once it loads. Can the archivebot follow that?
19:42 🔗 dashcloud has quit IRC (Read error: Connection reset by peer)
19:42 🔗 dashcloud has joined #archiveteam
19:44 🔗 mistym_ has quit IRC (Remote host closed the connection)
19:48 🔗 HCross has quit IRC (Ping timeout: 252 seconds)
19:50 🔗 HCross has joined #archiveteam
20:09 🔗 SimpBrain has quit IRC (Quit: Leaving)
20:14 🔗 PurpleSym has quit IRC (WeeChat 1.1.1)
20:26 🔗 Atluxity how big do you think that site is?
20:26 🔗 Atluxity gigs?
20:27 🔗 Atluxity uhm, no images?
20:27 🔗 Atluxity ah, files
20:28 🔗 Atluxity look good for archivebot
20:29 🔗 Atluxity queued
20:29 🔗 Atluxity finished
20:32 🔗 Nemo_bis lol
20:33 🔗 xmc it redirects to http://Kitchensinc.jgriffith.com
20:33 🔗 xmc which i just queue
20:33 🔗 xmc d
20:33 🔗 Atluxity guess it did not take the jgriffith.com domains
20:33 🔗 Atluxity tight
20:33 🔗 Atluxity right
20:34 🔗 xmc yeah it redirects with a meta-refresh so archivebot won't see it
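A meta-refresh redirect lives in the page's HTML (`<meta http-equiv="refresh" content="0; url=...">`) rather than in an HTTP 3xx status, so a crawler that only follows HTTP redirects never learns the target; that is why the jgriffith.com site had to be queued by hand. The sketch below pulls the target out of such a tag with Python's standard `html.parser`. The sample markup is an assumption about how the redirect page was written, not a copy of the actual kitchensinc.net page.

```python
import re
from html.parser import HTMLParser
from typing import Optional


class MetaRefreshParser(HTMLParser):
    """Records the target URL of the first <meta http-equiv="refresh"> tag."""

    def __init__(self) -> None:
        super().__init__()
        self.target: Optional[str] = None

    def handle_starttag(self, tag, attrs):
        # HTMLParser already lowercases tag and attribute names.
        if tag != "meta" or self.target is not None:
            return
        attr_map = {name: (value or "") for name, value in attrs}
        if attr_map.get("http-equiv", "").lower() != "refresh":
            return
        # content is "<delay>; url=<target>", e.g. "0; URL=http://example.com/"
        match = re.search(r"url\s*=\s*['\"]?([^'\";]+)",
                          attr_map.get("content", ""), re.IGNORECASE)
        if match:
            self.target = match.group(1).strip()


def meta_refresh_target(html: str) -> Optional[str]:
    parser = MetaRefreshParser()
    parser.feed(html)
    return parser.target


page = ('<html><head><meta http-equiv="Refresh" '
        'content="0; URL=http://Kitchensinc.jgriffith.com"></head></html>')
```

A URL found this way can then be queued as its own job, as was done here; relative targets would additionally need resolving against the page URL (e.g. with `urllib.parse.urljoin`).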
20:35 🔗 xmc Atluxity, tsp_: got it, 207 mbytes
20:43 🔗 mistym has joined #archiveteam
20:53 🔗 mistym has quit IRC (Remote host closed the connection)
20:59 🔗 mistym has joined #archiveteam
21:00 🔗 dashcloud has quit IRC (Quit: No Ping reply in 180 seconds.)
21:02 🔗 dashcloud has joined #archiveteam
21:14 🔗 schbirid has quit IRC (Leaving)
21:23 🔗 kukutz has joined #archiveteam
21:27 🔗 JesseW has joined #archiveteam
21:42 🔗 tsp_ thanks, missed the backscroll. Apparently the guy just died
21:43 🔗 Atluxity thats a shame :\
21:44 🔗 JesseW has quit IRC (Quit: Leaving.)
21:44 🔗 Atluxity thanks for bringing the site to our attention
22:20 🔗 pwnsrv has quit IRC (Ping timeout: 265 seconds)
22:22 🔗 kukutz2 has joined #archiveteam
22:25 🔗 Gfy has quit IRC (Ping timeout: 265 seconds)
22:25 🔗 kukutz has quit IRC (Ping timeout: 306 seconds)
22:33 🔗 BlueMaxim has joined #archiveteam
22:34 🔗 Gfy has joined #archiveteam
22:38 🔗 pwnsrv has joined #archiveteam
22:43 🔗 RedType_ has joined #archiveteam
22:49 🔗 kukutz__ has joined #archiveteam
22:51 🔗 pwnsrv has quit IRC (hub.se efnet.portlane.se)
22:51 🔗 oldcad has quit IRC (hub.se efnet.portlane.se)
22:51 🔗 RedType has quit IRC (hub.se efnet.portlane.se)
22:51 🔗 yakfish has quit IRC (hub.se efnet.portlane.se)
22:51 🔗 bauruine has quit IRC (hub.se efnet.portlane.se)
22:51 🔗 kukutz2 has quit IRC (Read error: Operation timed out)
23:03 🔗 Rotab has quit IRC (Ping timeout: 198 seconds)
23:12 🔗 Deewiant has quit IRC (hub.se irc.du.se)
23:12 🔗 Boppen has quit IRC (hub.se irc.du.se)
23:24 🔗 Boppen has joined #archiveteam
23:24 🔗 Rotab has joined #archiveteam
23:25 🔗 Deewiant has joined #archiveteam
23:33 🔗 mistym has quit IRC (Remote host closed the connection)
23:36 🔗 brayden has joined #archiveteam
23:42 🔗 wyatt8740 has quit IRC (Remote host closed the connection)
23:48 🔗 JesseW has joined #archiveteam
