#archiveteam 2015-02-26,Thu

Time Nickname Message
00:00 🔗 Nertsy has joined #archiveteam
00:02 🔗 Lord_Nigh has joined #archiveteam
00:03 🔗 balrog sets mode: +o Lord_Nigh
00:20 🔗 Selanda has joined #archiveteam
00:24 🔗 Nertsy has quit IRC (Ping timeout: 512 seconds)
00:25 🔗 Nertsy has joined #archiveteam
00:31 🔗 Lord_Nigh has quit IRC (Read error: Connection reset by peer)
00:31 🔗 Lord_Nigh has joined #archiveteam
00:48 🔗 Emcy has quit IRC (Ping timeout: 512 seconds)
00:52 🔗 Emcy has joined #archiveteam
00:57 🔗 cbb2 has joined #archiveteam
00:57 🔗 Nertsy has quit IRC (Read error: Operation timed out)
01:05 🔗 Nertsy has joined #archiveteam
01:06 🔗 Nertsy has quit IRC (Client Quit)
01:10 🔗 Nertsy has joined #archiveteam
01:22 🔗 nwf has joined #archiveteam
01:24 🔗 nwf Hey gang; I just saw the article on slashdot about NEWTON going away. Is there an effort to archive it?
01:25 🔗 nwf (I didn't see anything on the wiki, but I may have missed it.)
01:25 🔗 antonizoo link?
01:25 🔗 Thynix has joined #archiveteam
01:26 🔗 winr5r http://www.newton.dep.anl.gov/
01:26 🔗 winr5r 5 days
01:26 🔗 winr5r uh, three because i can't count
01:26 🔗 Thynix Any plans to archive... yeah ^
01:26 🔗 nwf http://science.slashdot.org/story/15/02/25/2313241/argonne-national-laboratory-shuts-down-online-ask-a-scientist-program or... that. thanks winr5r
01:26 🔗 winr5r well
01:27 🔗 nwf I just started a 'wget -r -np' on an I2 link, but I don't know if it'll finish in time.
01:27 🔗 antonizoo wow that's like only 4 days of notice
01:27 🔗 winr5r it looks like flat html
01:27 🔗 winr5r no ajax asshattery
01:27 🔗 winr5r so should archive easily
01:27 🔗 Thynix that's encouraging
01:27 🔗 winr5r huh, let's see how much disk i have on my VPS
01:27 🔗 winr5r i could probably get this
01:27 🔗 Thynix this actually looks really easy to crawl
01:28 🔗 winr5r yes
01:28 🔗 antonizoo well nwf, next time you want to try archiving on your own, use wget with all the right options (especially Archive Team's WARC dump option): https://github.com/baslqc/baslqc/wiki/Wget#wget-warc
01:28 🔗 Thynix Our Archives > Category > To see all entries
01:28 🔗 nwf antonizoo: Thanks! I'll restart, using that.
01:29 🔗 nwf Hopefully anl.gov forgives me the double-downloading.
01:29 🔗 winr5r nwf: i doubt they'll notice
01:29 🔗 antonizoo i think it'll be fine as long as you have a good enough delay
01:29 🔗 Thynix Should we be concerned about DOSing it? I suppose it's shrugged off a slashdotting already.
01:29 🔗 winr5r Thynix: probably not
01:29 🔗 antonizoo 2 seconds?
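
The options discussed above (recursive, no-parent, WARC output, a polite delay) can be sketched as a small command builder. This is only an illustration: the helper name `build_wget_warc_command` and the WARC file name are hypothetical, and the command is built and printed but not run. The flags themselves are standard wget options.

```python
import shlex

def build_wget_warc_command(url, warc_name, wait_seconds=2, level=None):
    """Build (but do not run) a wget command line for a polite WARC grab."""
    cmd = [
        "wget",
        "--recursive",               # follow links, same as -r
        "--no-parent",               # never ascend above the start URL, same as -np
        "--page-requisites",         # also fetch images/CSS needed to render pages
        f"--wait={wait_seconds}",    # pause between requests to spare the server
        f"--warc-file={warc_name}",  # write a WARC alongside the plain mirror
        "--warc-cdx",                # also write a CDX index of the WARC records
    ]
    if level is not None:
        # Without --level, wget uses its own default recursion depth
        # (documented as 5), so deep sites may need an explicit value.
        cmd.append(f"--level={level}")
    cmd.append(url)
    return cmd

# Hypothetical WARC name; the URL is the site from the log.
command = build_wget_warc_command(
    "http://www.newton.dep.anl.gov/", "newton.dep.anl.gov-20150226"
)
print(shlex.join(command))
```

The 2-second default mirrors the delay floated in the channel; it is a starting point rather than a recommendation for every site.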
01:29 🔗 Thynix So.... behold, a .gov with robust hosting.
01:30 🔗 winr5r Thynix: serving up static file is a solved problem
01:30 🔗 Thynix Is there anything I can run to help?
01:30 🔗 winr5r +s
01:30 🔗 Thynix Or it sounds like you've got it already?
01:30 🔗 winr5r Thynix: you may as well grab a copy
01:30 🔗 winr5r i am!
01:30 🔗 Thynix how is a duplicate useful?
01:31 🔗 Thynix aw crap requires Jessie
01:32 🔗 winr5r Thynix: i think you just asked archive team "why is a redundant copy a good idea"
01:32 🔗 Thynix that was not my intention but it did come off that way
01:34 🔗 antonizoo so... where do we recommend sites for the archivebot to dump
01:34 🔗 Thynix I'm now at the point where I'm not sure what I meant. I'll start a Jessie VM then.
01:34 🔗 antonizoo I would normally dump it myself, but the site is pretty large (no images though)
01:35 🔗 Thynix antonizoo: what site?
01:35 🔗 antonizoo it's 4chandata.org . No images, text only, weird structure
01:35 🔗 winr5r antonizoo: ask in here, there's a couple with archivebot access
01:35 🔗 antonizoo also 4chanArchive.org . They are in danger because the admin ran out of funding and stopped archiving
01:35 🔗 antonizoo whoops, 4chanArchive.net
01:35 🔗 antonizoo and 4chandata.org
01:36 🔗 antonizoo I think a person, (who I assume is the admin ) came in here and offered a massive dump of this site, but I'm not sure if he ever followed through
01:37 🔗 Thynix so once I have a mirror what do I do with it?
01:37 🔗 Thynix oh FAQ
01:38 🔗 Thynix okay next question
01:38 🔗 Thynix say we both have mirrors of the site now - why would we both upload it to archive.org?
01:38 🔗 winr5r Thynix: as long as one of you does, it's fine
01:41 🔗 Ymgve has quit IRC ()
01:41 🔗 antonizoo http://archiveteam.org/index.php?title=4chan#4chandata.2F4chanarchive
01:41 🔗 antonizoo Interestingly, one person a few years back uploaded its original image archive
01:42 🔗 antonizoo now its text is in danger though
01:42 🔗 xk_id_ has joined #archiveteam
01:42 🔗 xk_id has quit IRC (Read error: Connection reset by peer)
01:42 🔗 xk_id_ has quit IRC (Read error: Connection reset by peer)
01:42 🔗 antonizoo I can dump the site myself, 4chandata.org (and there's no images), but it's likely to get big, so anyone feel like putting it in the archivebot?
01:44 🔗 Thynix has quit IRC (http://www.mibbit.com ajax IRC Client)
01:45 🔗 Thynix has joined #archiveteam
01:46 🔗 xk_id has joined #archiveteam
01:46 🔗 antonizoo Also, is there anyone who can accept WARC files to be imported into the Internet Archive (like Archivebot does)? Currently I just upload using http://archive.org/upload
01:46 🔗 antonizoo *Wayback Machine
01:48 🔗 winr5r yes, an IA admin can
01:49 🔗 antonizoo oh, cool
01:50 🔗 antonizoo well, if anyone wants to know, I've already uploaded the WARC dumps via the normal uploader, so just look for the tag: https://archive.org/search.php?query=subject%3A%22Bibliotheca%20Anonoma%22
01:50 🔗 antonizoo the big ones are: https://archive.org/details/studionyami-com_penfifteen-2012-03-05
01:50 🔗 antonizoo fybertech: https://archive.org/details/fybertech
01:52 🔗 xk_id_ has joined #archiveteam
01:52 🔗 xk_id has quit IRC (Read error: Operation timed out)
01:52 🔗 Start has quit IRC (Disconnected.)
01:53 🔗 winr5r excellent
01:53 🔗 Start has joined #archiveteam
01:54 🔗 winr5r i'm grabbing a non-WARC version, might end up rehosting newton because i can
01:54 🔗 nwf Newton is tiny, as it turns out. -- Downloaded: 20388 files, 269M.
01:55 🔗 nwf Now that I've got it, where do I send these warc files?
01:55 🔗 winr5r christ
01:55 🔗 winr5r is now known as winr4r
01:55 🔗 winr4r they can't afford like, $2/month to keep that shit online?
01:56 🔗 winr4r nwf: IA uploader, ask an IA admin to put it in the archive team collection
01:56 🔗 Emcy has quit IRC (Ping timeout: 306 seconds)
01:56 🔗 achip NEWTON is also in archivebot, when that's done it'll get uploaded to IA for processing into the wayback machine
02:04 🔗 cbb2 has quit IRC (cbb2)
02:05 🔗 mistym has quit IRC (Remote host closed the connection)
02:05 🔗 antonizoo hey, my wget WARC instance is going in an infinite loop
02:06 🔗 antonizoo is there any way to continue a wget WARC dump?
02:06 🔗 antonizoo don't really want to do everything again
02:06 🔗 antonizoo it's been a week dumping already
02:06 🔗 antonizoo I'll upload a picture to imgur
02:07 🔗 antonizoo http://i.imgur.com/Ye9XvMH.jpg
02:07 🔗 antonizoo so basically, this wget script is doomed to go down /src/src/src/src/src/... for eternity
02:08 🔗 antonizoo http://195.242.99.71/eos/kareha.pl/1140909402/src/1140909578052.jpg
02:08 🔗 antonizoo offending url link
02:09 🔗 winr4r i recall reading that you can't resume a WARC with wget
02:09 🔗 antonizoo damn, why?
02:09 🔗 winr4r let me make sure of that
02:09 🔗 antonizoo any way I can prevent this kind of infinite loop?
02:10 🔗 antonizoo because it's starting to get insane
02:10 🔗 antonizoo http://i.imgur.com/ZBFRLIy.jpg
02:10 🔗 antonizoo It grows
02:11 🔗 antonizoo when will it end
02:11 🔗 antonizoo is there a wget option that prevents going under an insane amount of directories?
02:11 🔗 achip if you set --level
02:14 🔗 antonizoo Alright. I think I will set --level to 50 as a failsafe. Should we warn people about this sort of bug on the wiki?
02:14 🔗 antonizoo because it basically destroyed a whole week of WARC dumping
02:15 🔗 antonizoo better yet, is there a technical reason why a WARC dump cannot be continued?
02:16 🔗 winr4r antonizoo: i am not totally sure that is the case, i'm trying to find where i read that
02:17 🔗 antonizoo well, wget does remind you that --continue does not work with WARC
02:17 🔗 achip it's not really a bug in wget, it's a bug in that wakaba that just keeps going (this may be best in -bs)
02:18 🔗 primus104 has quit IRC (Leaving.)
02:19 🔗 antonizoo true. So, I'll be setting --level=50 from now on
02:19 🔗 antonizoo so, there's no way a site would have more than, like, 50 directory levels, right?
02:20 🔗 antonizoo I guess 100 to be extra safe?
02:20 🔗 winr4r antonizoo: decide that per-site
02:20 🔗 mistym has joined #archiveteam
02:21 🔗 antonizoo true. though, have you ever heard of a website with excessively long directory paths?
02:21 🔗 achip remember that's not directory levels, that's link levels, so page a linking to page b is one, b to c is another
02:21 🔗 antonizoo oh
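
achip's point, that a level limit counts link hops rather than directories, can be shown with a toy breadth-first crawl. This is a sketch over a made-up link graph (`LINKS` and `crawl` are hypothetical names), not wget's actual code: a thread that sits only two directories deep is still missed when it can only be reached through several index pages.

```python
from collections import deque

# Hypothetical link graph: each page maps to the pages it links to.
LINKS = {
    "/": ["/page/1"],
    "/page/1": ["/page/2", "/thread/a"],
    "/page/2": ["/page/3", "/thread/b"],
    "/page/3": ["/thread/c"],
    "/thread/a": [],
    "/thread/b": [],
    "/thread/c": [],
}

def crawl(start, max_level):
    """Breadth-first crawl that follows at most max_level link hops from start."""
    depth = {start: 0}
    queue = deque([start])
    while queue:
        url = queue.popleft()
        if depth[url] == max_level:
            continue  # level limit reached: do not follow links from this page
        for target in LINKS.get(url, []):
            if target not in depth:
                depth[target] = depth[url] + 1
                queue.append(target)
    return depth

reached = crawl("/", max_level=3)
# "/thread/c" is two directories deep but four link hops away, so it is skipped.
print(sorted(reached))
```

On a board with 200 index pages chained one after another, the last page sits roughly 200 hops from the homepage, which is why the limit has to be chosen per site.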
02:22 🔗 antonizoo anything that will let me escape from this infinite loop condition?
02:22 🔗 antonizoo I will have to redump the site again
02:25 🔗 Specular has joined #archiveteam
02:26 🔗 antonizoo oh god it hungers
02:26 🔗 antonizoo http://i.imgur.com/7YPFAUf.jpg
02:26 🔗 winr4r hungry hungry hippos
02:27 🔗 winr4r in this case, for this site, wouldn't --level be functionally identical to "directory depth of X"
02:28 🔗 antonizoo perhaps
02:29 🔗 antonizoo though I don't want to lose any page
02:29 🔗 antonizoo then again, if we think of a website like a branching tree
02:29 🔗 antonizoo homepage -> threads -> images
02:29 🔗 antonizoo but then again, pages of threads would probably count under that fact, right? This site is massive and has 200 pages of threads
02:30 🔗 antonizoo I just need an option that prevents an infinite /src/src/src/src/src/src/src/ descending
02:30 🔗 kyan Archivebot can do that. If you don't want to use the main bot, you could run your own
02:31 🔗 kyan (if you do that change the user agent though so people don't think you're the main one)
02:31 🔗 antonizoo oh, cool. What is archivebot based on?
02:32 🔗 winr4r antonizoo: also, X60 represent \o/
02:32 🔗 antonizoo this link from the ArchiveTeam wikipage is dead: https://github.com/ArchiveTeam/ArchiveBot/blob/master/INSTALL
02:32 🔗 antonizoo oh wait, the file was moved, not the repo: https://github.com/ArchiveTeam/ArchiveBot/
02:33 🔗 antonizoo yes, the X60 uses Libreboot. I helped to write the guide on installation
02:33 🔗 antonizoo before, they had very little documentation
02:33 🔗 kyan archivebot uses wpull
02:33 🔗 kyan http://i.imgur.com/7YPFAUf.jpg
02:34 🔗 kyan mispaste. https://github.com/chfoo/wpull
02:34 🔗 kyan maybe we should take this to #archiveteam-bs?
02:35 🔗 antonizoo sure
02:35 🔗 antonizoo The URLs are as follows
02:35 🔗 winr4r yes
02:35 🔗 Spring has quit IRC (Read error: Operation timed out)
02:35 🔗 antonizoo oh, yeah, moving to another channel
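
One way to guard against the /src/src/src/... loop described above is to reject any URL whose path repeats the same segment several times in a row. The sketch below is an assumption about how such a filter could look; `has_runaway_path` and its `max_repeats` threshold are hypothetical and not part of wget, wpull or ArchiveBot.

```python
from urllib.parse import urlsplit

def has_runaway_path(url, max_repeats=3):
    """Return True if a path segment occurs max_repeats or more times in a row."""
    segments = [s for s in urlsplit(url).path.split("/") if s]
    run = 1
    for prev, cur in zip(segments, segments[1:]):
        # Extend the current run of identical segments, or start a new one.
        run = run + 1 if cur == prev else 1
        if run >= max_repeats:
            return True
    return False

# The URL from the log has a single /src/ segment and passes the check.
ok_url = "http://195.242.99.71/eos/kareha.pl/1140909402/src/1140909578052.jpg"
# Illustrative looping URL on a placeholder host.
bad_url = "http://example.invalid/eos/kareha.pl/1140909402/src/src/src/src/1.jpg"
print(has_runaway_path(ok_url), has_runaway_path(bad_url))
```

A crawler that accepts a URL rejection rule could apply the same test before queueing a link, which stops the descent without capping legitimate link depth.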
02:44 🔗 dashcloud has quit IRC (Read error: Operation timed out)
02:49 🔗 dashcloud has joined #archiveteam
03:23 🔗 Emcy has joined #archiveteam
03:47 🔗 Emcy_ has joined #archiveteam
03:48 🔗 winr4r Thynix, nwf, have you uploaded your grabs yet?
03:49 🔗 Emcy has quit IRC (Ping timeout: 306 seconds)
04:04 🔗 Emcy_ has quit IRC (Read error: Connection reset by peer)
04:13 🔗 aschmitz has joined #archiveteam
04:21 🔗 signius has quit IRC (Read error: Operation timed out)
04:22 🔗 Emcy has joined #archiveteam
04:34 🔗 signius has joined #archiveteam
04:45 🔗 ruukasu has quit IRC (Remote host closed the connection)
04:49 🔗 ruukasu has joined #archiveteam
04:56 🔗 mistym has quit IRC (Remote host closed the connection)
05:04 🔗 swebb has quit IRC (Ping timeout: 319 seconds)
05:04 🔗 goekesmi has quit IRC (Ping timeout: 319 seconds)
05:04 🔗 Meeh has quit IRC (Ping timeout: 319 seconds)
05:04 🔗 warthurto has quit IRC (Ping timeout: 319 seconds)
05:04 🔗 Zebranky has quit IRC (Ping timeout: 319 seconds)
05:05 🔗 swebb has joined #archiveteam
05:07 🔗 Meeh has joined #archiveteam
05:09 🔗 warthurto has joined #archiveteam
05:12 🔗 goekesmi has joined #archiveteam
05:13 🔗 Zebranky has joined #archiveteam
05:14 🔗 aaaaaaaaa has quit IRC (Leaving)
05:28 🔗 nwf winr4r: It's going up very slowly (Sorry, I had to move from my speedy I2 connection to home, so it's going up over DSL...)
05:28 🔗 mistym has joined #archiveteam
05:42 🔗 winr4r nwf: ah okay
05:43 🔗 nwf (currently at 55 / 157 MB pushed)
05:43 🔗 winr4r oh man, i just got 15mbit up yesterday, it's amazing
05:45 🔗 winr4r anyway, when you create an item, http://wat.lewiscollard.com/archive/archive-www.newton.dep.anl.gov-20150214-converted-links.tar.gz
05:45 🔗 winr4r stick that in there as well, that's a mirror-(almost)-ready version
05:52 🔗 nwf "create an item"?
05:52 🔗 nwf Aw, damn, I just realized I have the wrong thing in Page title because of a typo when I was selecting files. Can that be changed at a later part of the UI?
05:53 🔗 xmc yeah
05:53 🔗 nwf Hooray
05:54 🔗 xmc everything except the item name (url) can be changed
06:09 🔗 sep332 has quit IRC (Read error: Operation timed out)
06:20 🔗 sep332 has joined #archiveteam
07:04 🔗 wp494 FYI: anyone that's in #flogger will be kicked to be aware that the real Blogger channel that we're using is #frogger in around 30 or so minutes
07:05 🔗 wp494 if anyone has objections to me doing this, speak now, or forever hold your peace
07:18 🔗 Atluxity will that project have work to do in tracker soon-ish?
07:22 🔗 Smiley has quit IRC (Remote host closed the connection)
07:22 🔗 Smiley has joined #archiveteam
07:22 🔗 antomati_ has joined #archiveteam
07:25 🔗 antomatic has quit IRC (Ping timeout: 246 seconds)
07:33 🔗 antonizoo has quit IRC (Quit: Konversation terminated!)
07:35 🔗 wp494 alright, kicks will be carried out now
07:40 🔗 SmileyG has joined #archiveteam
07:42 🔗 winr5r has joined #archiveteam
07:42 🔗 tephra_ has joined #archiveteam
07:43 🔗 dugo_ has joined #archiveteam
07:43 🔗 PepsiMax_ has joined #archiveteam
07:43 🔗 lrkj has joined #archiveteam
07:44 🔗 musalbas has joined #archiveteam
07:44 🔗 Smiley has quit IRC (hub.efnet.us irc.colosolutions.net)
07:44 🔗 Zebranky has quit IRC (hub.efnet.us irc.colosolutions.net)
07:45 🔗 lukeman has joined #archiveteam
07:45 🔗 balrog_ has joined #archiveteam
07:46 🔗 Arkiver2 has joined #archiveteam
07:47 🔗 joepie91_ has joined #archiveteam
07:47 🔗 primus104 has joined #archiveteam
07:48 🔗 nico_32_ has joined #archiveteam
07:49 🔗 RedType_ has joined #archiveteam
07:49 🔗 Fusl_ has joined #archiveteam
07:51 🔗 Zebranky_ has joined #archiveteam
07:51 🔗 dashcloud has quit IRC (Read error: Operation timed out)
07:57 🔗 dashcloud has joined #archiveteam
08:04 🔗 sivoais_ has joined #archiveteam
08:05 🔗 tephra has quit IRC (hub.se efnet.port80.se)
08:05 🔗 GLaDOS has quit IRC (hub.se efnet.port80.se)
08:05 🔗 sivoais has quit IRC (hub.se efnet.port80.se)
08:05 🔗 lukeman_ has quit IRC (hub.se efnet.port80.se)
08:05 🔗 Fusl has quit IRC (hub.se efnet.port80.se)
08:05 🔗 dugo has quit IRC (hub.se efnet.port80.se)
08:05 🔗 Famicoman has quit IRC (hub.se efnet.port80.se)
08:05 🔗 RedType has quit IRC (hub.se efnet.port80.se)
08:05 🔗 lysobit has quit IRC (hub.se efnet.port80.se)
08:05 🔗 arkiver has quit IRC (hub.se efnet.port80.se)
08:05 🔗 lrkj_ has quit IRC (hub.se efnet.port80.se)
08:05 🔗 lhobas has quit IRC (hub.se efnet.port80.se)
08:05 🔗 VonGuard_ has quit IRC (hub.se efnet.port80.se)
08:05 🔗 PepsiMax has quit IRC (hub.se efnet.port80.se)
08:05 🔗 balrog has quit IRC (hub.se efnet.port80.se)
08:05 🔗 nico_32 has quit IRC (hub.se efnet.port80.se)
08:05 🔗 Sue_ has quit IRC (hub.se efnet.port80.se)
08:05 🔗 nox has quit IRC (hub.se efnet.port80.se)
08:05 🔗 winr4r has quit IRC (hub.se efnet.port80.se)
08:05 🔗 joepie91 has quit IRC (hub.se efnet.port80.se)
08:05 🔗 Muad-Dib has quit IRC (hub.se efnet.port80.se)
08:05 🔗 russss has quit IRC (hub.se efnet.port80.se)
08:05 🔗 danneh_ has quit IRC (hub.se efnet.port80.se)
08:05 🔗 LittUp has quit IRC (hub.se efnet.port80.se)
08:05 🔗 fresco___ has quit IRC (hub.se efnet.port80.se)
08:05 🔗 deathy has quit IRC (hub.se efnet.port80.se)
08:13 🔗 mistym has quit IRC (Remote host closed the connection)
08:17 🔗 chazchaz has quit IRC (Ping timeout: 369 seconds)
08:20 🔗 musalbas is now known as lysobit
08:20 🔗 Fusl_ is now known as Fusl
08:20 🔗 balrog_ is now known as balrog
08:27 🔗 Sue_ has joined #archiveteam
08:27 🔗 deathy has joined #archiveteam
08:27 🔗 VonGuard_ has joined #archiveteam
08:27 🔗 danneh_ has joined #archiveteam
08:27 🔗 fresco___ has joined #archiveteam
08:27 🔗 GLaDOS has joined #archiveteam
08:27 🔗 LittUp_ has joined #archiveteam
08:27 🔗 Muad-Dib has joined #archiveteam
08:27 🔗 nox has joined #archiveteam
08:27 🔗 lhobas has joined #archiveteam
08:27 🔗 russss has joined #archiveteam
08:27 🔗 xk_id_ has quit IRC (Remote host closed the connection)
08:40 🔗 antomati_ is now known as antomatic
08:46 🔗 nertzy2 has quit IRC (Ping timeout: 512 seconds)
08:51 🔗 PepsiMax_ is now known as PepsiMax
08:52 🔗 nertzy has joined #archiveteam
08:53 🔗 Famicoman has joined #archiveteam
08:56 🔗 primus104 has quit IRC (Leaving.)
08:59 🔗 sep332 has quit IRC (Read error: Operation timed out)
09:22 🔗 sep332 has joined #archiveteam
09:46 🔗 Famicoman has quit IRC (hub.se efnet.port80.se)
09:46 🔗 Sue_ has quit IRC (hub.se efnet.port80.se)
09:46 🔗 deathy has quit IRC (hub.se efnet.port80.se)
09:46 🔗 VonGuard_ has quit IRC (hub.se efnet.port80.se)
09:46 🔗 danneh_ has quit IRC (hub.se efnet.port80.se)
09:46 🔗 fresco___ has quit IRC (hub.se efnet.port80.se)
09:46 🔗 GLaDOS has quit IRC (hub.se efnet.port80.se)
09:46 🔗 LittUp_ has quit IRC (hub.se efnet.port80.se)
09:46 🔗 Muad-Dib has quit IRC (hub.se efnet.port80.se)
09:46 🔗 nox has quit IRC (hub.se efnet.port80.se)
09:46 🔗 lhobas has quit IRC (hub.se efnet.port80.se)
09:46 🔗 russss has quit IRC (hub.se efnet.port80.se)
10:00 🔗 Famicoma1 has joined #archiveteam
10:01 🔗 LittUp has joined #archiveteam
10:01 🔗 danneh_ has joined #archiveteam
10:02 🔗 nox has joined #archiveteam
10:02 🔗 russss has joined #archiveteam
10:02 🔗 deathy has joined #archiveteam
10:03 🔗 Muad-Dib has joined #archiveteam
10:06 🔗 nox has quit IRC (Ping timeout: 260 seconds)
10:06 🔗 danneh_ has quit IRC (Ping timeout: 260 seconds)
10:06 🔗 LittUp has quit IRC (Ping timeout: 260 seconds)
10:16 🔗 LittUp has joined #archiveteam
10:16 🔗 danneh_ has joined #archiveteam
10:18 🔗 nox has joined #archiveteam
10:18 🔗 schbirid has joined #archiveteam
10:50 🔗 is- has joined #archiveteam
11:06 🔗 Arkiver2 is now known as arkiver
11:09 🔗 Ymgve has joined #archiveteam
11:17 🔗 nico_32_ is now known as nico_32
11:20 🔗 BlueMaxim has quit IRC (Read error: Connection reset by peer)
11:20 🔗 Emcy_ has joined #archiveteam
11:26 🔗 Emcy has quit IRC (Ping timeout: 512 seconds)
11:34 🔗 lhobas has joined #archiveteam
11:38 🔗 winr5r is now known as winr4r
11:43 🔗 primus104 has joined #archiveteam
11:52 🔗 primus105 has joined #archiveteam
11:58 🔗 Sue_ has joined #archiveteam
12:00 🔗 primus104 has quit IRC (Read error: Operation timed out)
13:10 🔗 WubTheCap has joined #archiveteam
13:20 🔗 xtr-201 has quit IRC (Ping timeout: 370 seconds)
13:52 🔗 garyrh has quit IRC (Remote host closed the connection)
13:56 🔗 sankin has joined #archiveteam
14:11 🔗 Specular has quit IRC (Read error: Connection reset by peer)
14:17 🔗 Spring has joined #archiveteam
14:41 🔗 garyrh has joined #archiveteam
14:41 🔗 dashcloud has quit IRC (Read error: Operation timed out)
14:48 🔗 dashcloud has joined #archiveteam
15:24 🔗 fche has joined #archiveteam
15:27 🔗 fche hi guys; aware of http://www.newton.dep.anl.gov/ -- "turning off this website March 1, 2015" ?
15:28 🔗 arkiver fche: it's being archived, it's between the other sites here: http://archivebot.com/
15:28 🔗 fche excellent, thanks!
15:28 🔗 fche y'all are doing god's work, so to speak, thanks
15:29 🔗 fche (might want to plop that onto http://archiveteam.org/index.php?title=Deathwatch)
16:03 🔗 Spring asked in the IA channel but no reply, is there some indexed search of archived sites? It occurred to me earlier that many sites have a risk of being effectively 'forgotten' if users aren't aware of the old links, due especially to regular search results which can lead to fewer new mentions in active sites over time for obscure pages with still-relevant content.
16:08 🔗 mistym has joined #archiveteam
16:08 🔗 mistym has quit IRC (Remote host closed the connection)
16:12 🔗 arkiver no
16:12 🔗 arkiver Spring: there is no search function currently in the wayback machine
16:12 🔗 arkiver maybe better with no search function
16:13 🔗 Spring to prevent robots.txt exclusions?
16:13 🔗 arkiver to prevent a lot of pages going dark due to complaints
16:13 🔗 xtr-201 has joined #archiveteam
16:14 🔗 arkiver but no, there's no search function or index of all archived sites in the wayback machine
16:14 🔗 Spring someday they'll hopefully be an unofficial index, so much good content is there
16:14 🔗 Spring *there'll
16:16 🔗 primus105 has quit IRC (Leaving.)
16:17 🔗 mhazinsk index of sites or index of content?
16:17 🔗 mhazinsk the latter sounds prohibitively difficult
16:21 🔗 Spring the content. Would be a huge task but sooner or later it seems some sites will be so obscure as to practically not exist in the future.
16:25 🔗 mistym has joined #archiveteam
16:25 🔗 Spring it's the whole 'if a tree falls in the woods but no one's there to hear it', does the content exist to those who can't find it (not as relevant from an archivist's pov, but more from a user's pov).
16:34 🔗 Nertsy has quit IRC (Ping timeout: 512 seconds)
16:35 🔗 Nertsy has joined #archiveteam
16:43 🔗 aaaaaaaaa has joined #archiveteam
17:17 🔗 Specular has joined #archiveteam
17:19 🔗 chazchaz has joined #archiveteam
17:21 🔗 primus104 has joined #archiveteam
17:24 🔗 Spring has quit IRC (Read error: Operation timed out)
17:24 🔗 Specular_ has joined #archiveteam
17:30 🔗 Specular has quit IRC (Read error: Operation timed out)
17:36 🔗 mistym has quit IRC (Remote host closed the connection)
17:51 🔗 mistym has joined #archiveteam
18:09 🔗 xtr-201 has quit IRC (Ping timeout: 370 seconds)
18:10 🔗 chris_ has joined #archiveteam
18:10 🔗 chris_ is now known as wacky
18:37 🔗 T31M has joined #archiveteam
18:58 🔗 khronon has joined #archiveteam
18:58 🔗 khronon has quit IRC (Client Quit)
19:06 🔗 xmc sets mode: +o swebb
19:06 🔗 swebb sets mode: +o balrog
19:14 🔗 garyrh_ has quit IRC (Quit: Leaving)
19:20 🔗 garyrh_ has joined #archiveteam
19:34 🔗 dugo_ is now known as dugo
19:43 🔗 db48x has joined #archiveteam
20:18 🔗 DFJustin http://www.dhakatribune.com/bangladesh/2015/feb/26/writer-avijit-roy-killed-miscreants-attack
20:18 🔗 DFJustin site seems to already be down :(
20:45 🔗 miljo has quit IRC (leaving)
20:52 🔗 signius has quit IRC (Ping timeout: 306 seconds)
21:03 🔗 xtr-201 has joined #archiveteam
21:04 🔗 signius has joined #archiveteam
21:10 🔗 chfoo has quit IRC (Quit: chfoo)
21:15 🔗 chfoo has joined #archiveteam
21:16 🔗 chfoo has quit IRC (Remote host closed the connection)
21:18 🔗 chfoo has joined #archiveteam
21:21 🔗 mistym has quit IRC (Remote host closed the connection)
21:21 🔗 mashmac2 has joined #archiveteam
21:26 🔗 mashmac2 has quit IRC (Client Quit)
21:29 🔗 BlueMaxim has joined #archiveteam
21:37 🔗 schbirid has quit IRC (Leaving)
21:44 🔗 mistym has joined #archiveteam
21:48 🔗 sankin has quit IRC (Leaving.)
22:03 🔗 Ravenloft has quit IRC (Read error: Operation timed out)
22:06 🔗 SN4T14 has quit IRC (Quit: Leaving)
22:12 🔗 SketchCow People are VERY unhappy about http://newton.dep.anl.gov/
22:12 🔗 SketchCow It's going down end of month, yes Sunday
22:18 🔗 balrog do we know why it's closing?
22:18 🔗 balrog and why they're not leaving it up as an archive?
22:20 🔗 mistym has quit IRC (Remote host closed the connection)
22:36 🔗 mistym has joined #archiveteam
22:42 🔗 SketchCow No idea.
22:43 🔗 yipdw argonne wants it allgone
22:45 🔗 primus_ has quit IRC (Read error: Operation timed out)
22:47 🔗 primus_ has joined #archiveteam
23:00 🔗 aaaaaaaaa my guess is that it is either they are just sick of doing it or Washington Monument syndrome
23:01 🔗 aaaaaaaaa although the second would be a long play.
23:02 🔗 rejon has joined #archiveteam
23:05 🔗 Spring has joined #archiveteam
23:07 🔗 balrog http://mirror.anl.gov/pub/
23:09 🔗 Specular_ has quit IRC (Read error: Operation timed out)
23:19 🔗 Ctrl-S does anyone here know how to do SQL in python the right way?
23:19 🔗 Ctrl-S oops wrong channel
23:28 🔗 danneh_ Ctrl-S: i've heard quite a lot about http://www.sqlalchemy.org/ but never used it, but I'm mostly doing sqlite so I just use the sqlite3 inbuilt
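
For the built-in `sqlite3` module danneh_ mentions, a minimal sketch of the usual safe pattern is to pass values as `?` parameters instead of formatting them into the SQL string. The table and column names here are hypothetical; the file count is the one nwf reported for NEWTON earlier in the log.

```python
import sqlite3

# In-memory database so the example leaves nothing on disk.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE grabs (site TEXT PRIMARY KEY, files INTEGER)")

# "with conn" commits on success and rolls back if an exception is raised.
with conn:
    conn.execute(
        "INSERT INTO grabs (site, files) VALUES (?, ?)",
        ("www.newton.dep.anl.gov", 20388),
    )

# Values go in as parameters, so quotes inside them cannot alter the query.
row = conn.execute(
    "SELECT files FROM grabs WHERE site = ?",
    ("www.newton.dep.anl.gov",),
).fetchone()
conn.close()
print(row)
```

Placeholders keep untrusted input out of the query text, which is the main thing "the right way" tends to mean regardless of whether an ORM such as SQLAlchemy is layered on top.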
