#archiveteam-bs 2017-04-28,Fri


Time Nickname Message
00:24 🔗 Stilett0 has quit IRC (Ping timeout: 246 seconds)
00:28 🔗 Coderjo_ is now known as Coderjo
01:02 🔗 akaibu has joined #archiveteam-bs
01:18 🔗 valesi has quit IRC (Remote host closed the connection)
01:24 🔗 Selavi has joined #archiveteam-bs
01:27 🔗 akaibu don't know if saying it here will matter, but maybe Sketch can forward this to IA. anyway, i've been using archive.org for the last couple of years (be it to extract info from dead sites, compare changes, etc), but a few sites have died and then been domain squatted, with a robots.txt locking out the content of the previous site. this is quite a problem
01:37 🔗 Odd0002 yes it is
01:38 🔗 akaibu (guess i should suggest a fix, but i'm not sure what exactly would work for IA, since it's basically only respecting robots.txt so it doesn't get sued)
01:39 🔗 Odd0002 only provide access during times when robots.txt wasn't active on it?
01:45 🔗 pizzaiolo has quit IRC (Remote host closed the connection)
02:09 🔗 akaibu yea that would likely do it
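Editor's sketch of Odd0002's suggestion above: only hide captures made while a robots.txt exclusion was in force, and keep earlier captures visible. This is a minimal illustration, not how the Wayback Machine works internally. The data model (`captures` as date/URL pairs, `blocked_periods` as start/end dates), the function name `visible_captures` and the sample dates are all assumptions made up for the example.

```python
from datetime import date


def visible_captures(captures, blocked_periods):
    """Return the captures whose date falls outside every blocked period.

    captures: list of (date, url) tuples (hypothetical data model).
    blocked_periods: list of (start, end) dates, end exclusive, during which
    a robots.txt exclusion was in force; end is None if it is still active.
    """
    def is_blocked(day):
        for start, end in blocked_periods:
            if day >= start and (end is None or day < end):
                return True
        return False

    return [(day, url) for day, url in captures if not is_blocked(day)]


# Made-up example: the original site was captured in 2012 and 2014, then the
# domain was squatted and a blocking robots.txt appeared at the start of 2016.
captures = [
    (date(2012, 5, 1), "http://example.com/"),
    (date(2014, 3, 9), "http://example.com/"),
    (date(2016, 8, 20), "http://example.com/"),
]
blocked = [(date(2016, 1, 1), None)]

for day, url in visible_captures(captures, blocked):
    print(day.isoformat(), url)
```

With this approach the squatter's robots.txt only affects captures made after it appeared, so the dead site's earlier content stays reachable.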
02:09 🔗 Stilett0 has joined #archiveteam-bs
02:10 🔗 akaibu again this means nothing if it doesn't get to the IA people and i don't know if sketch actually reads all of the chat
02:11 🔗 Dark_Star has quit IRC (Ping timeout: 633 seconds)
02:15 🔗 Dark_Star has joined #archiveteam-bs
02:36 🔗 Frogging I think Sketch would prefer that you contact the IA directly
02:37 🔗 Frogging since Archive Team is not the Internet Archive
02:47 🔗 akaibu i've... not had good luck with talking with them before
02:53 🔗 JensRex Can someone with tracker access requeue items on the mlkshk project? https://tracker.archiveteam.org/mlkshk/
02:53 🔗 JensRex I've been prodding arkiver for days, but he's afk I guess.
02:53 🔗 JensRex Site shuts down on May 1st.
03:47 🔗 hook54321 akaibu: Have you seen this blog post? https://blog.archive.org/2017/04/17/robots-txt-meant-for-search-engines-dont-work-well-for-web-archives/
03:47 🔗 hook54321 Sounds like they're planning on not respecting robots.txt anymore
03:54 🔗 hook54321 Anyone know what the status is on the search engine for the wayback machine?
04:00 🔗 BlueMaxim has quit IRC (Read error: Operation timed out)
04:22 🔗 Sk1d has quit IRC (Ping timeout: 194 seconds)
04:28 🔗 Sk1d has joined #archiveteam-bs
06:11 🔗 LordNigh2 has joined #archiveteam-bs
06:13 🔗 Lord_Nigh has quit IRC (Ping timeout: 250 seconds)
06:13 🔗 LordNigh2 is now known as Lord_Nigh
08:05 🔗 godane has quit IRC (Leaving.)
08:20 🔗 GE has joined #archiveteam-bs
09:38 🔗 GE has quit IRC (Remote host closed the connection)
09:48 🔗 Honno has joined #archiveteam-bs
10:26 🔗 btfo has quit IRC (Ping timeout: 600 seconds)
10:35 🔗 btfo has joined #archiveteam-bs
11:14 🔗 GE has joined #archiveteam-bs
11:14 🔗 Yoshimura has joined #archiveteam-bs
11:40 🔗 pizzaiolo has joined #archiveteam-bs
12:07 🔗 raphidae has joined #archiveteam-bs
12:14 🔗 odemg has joined #archiveteam-bs
12:26 🔗 ZexaronS has quit IRC (Leaving)
12:35 🔗 pizzaiolo has quit IRC (Ping timeout: 245 seconds)
12:40 🔗 JAA has joined #archiveteam-bs
13:18 🔗 akaibu hook54321: oh nice, i've been meaning to check on the IA blog, thanks for letting me know
13:20 🔗 akaibu while it might not be good for them to respect robots.txt anymore, they should still use it for, say, the sitemaps and the crawl-rate settings
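Editor's sketch of akaibu's point: a crawler can ignore the exclusion rules in robots.txt and still read the file for its sitemap and crawl-delay hints. The example uses Python's standard `urllib.robotparser` (`site_maps()` needs Python 3.8 or later). The robots.txt content, the helper name `crawl_hints` and the example.com URL are made up for illustration; nothing here reflects what the Internet Archive's crawlers actually do.

```python
from urllib.robotparser import RobotFileParser

# Made-up robots.txt of the kind a domain squatter might serve.
ROBOTS_TXT = """\
User-agent: *
Disallow: /
Crawl-delay: 10
Sitemap: http://example.com/sitemap.xml
"""


def crawl_hints(robots_text, user_agent="*"):
    """Extract only the politeness hints; Disallow rules are never consulted."""
    parser = RobotFileParser()
    parser.parse(robots_text.splitlines())
    return {
        # Seconds to wait between requests, or None if not specified.
        "crawl_delay": parser.crawl_delay(user_agent),
        # site_maps() returns None when no Sitemap lines are present.
        "sitemaps": parser.site_maps() or [],
    }


hints = crawl_hints(ROBOTS_TXT)
print(hints["crawl_delay"], hints["sitemaps"])
```

Since `can_fetch()` is never called, the `Disallow: /` line has no effect on what gets crawled, while the delay and sitemap list remain available to the crawler.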
13:23 🔗 akaibu (leaving for now but if you ping me i'll see it in the logs and respond later)
13:23 🔗 akaibu has quit IRC ()
13:39 🔗 JAA has quit IRC (Quit: Page closed)
13:46 🔗 Yoshimura has quit IRC (Quit: http://www.kiwiirc.com/ - A hand crafted IRC client)
14:08 🔗 brayden_ has joined #archiveteam-bs
14:08 🔗 swebb sets mode: +o brayden_
14:14 🔗 brayden has quit IRC (Read error: Operation timed out)
14:32 🔗 brayden__ has joined #archiveteam-bs
14:32 🔗 swebb sets mode: +o brayden__
14:37 🔗 brayden_ has quit IRC (Read error: Operation timed out)
14:40 🔗 JensRex has quit IRC (Remote host closed the connection)
14:41 🔗 JensRex has joined #archiveteam-bs
15:21 🔗 Honno_ has joined #archiveteam-bs
15:22 🔗 odemg has quit IRC (Remote host closed the connection)
15:27 🔗 Honno has quit IRC (Ping timeout: 370 seconds)
15:30 🔗 brayden__ is now known as brayden
15:31 🔗 joepie91 in the category of "typos you really shouldn't make on customs declaration forms from China": https://i.imgur.com/Dgy6Voi.jpg
15:34 🔗 odemg has joined #archiveteam-bs
15:48 🔗 pizzaiolo has joined #archiveteam-bs
16:56 🔗 dan_ has joined #archiveteam-bs
17:09 🔗 brayden_ has joined #archiveteam-bs
17:09 🔗 swebb sets mode: +o brayden_
17:13 🔗 brayden has quit IRC (Read error: Operation timed out)
17:14 🔗 brayden_ is now known as brayden
17:24 🔗 dan_ has quit IRC (Quit: Page closed)
17:27 🔗 odemg has quit IRC (Remote host closed the connection)
17:30 🔗 Honno_ has quit IRC (Ping timeout: 370 seconds)
17:38 🔗 odemg has joined #archiveteam-bs
17:38 🔗 odemg has quit IRC (Connection closed)
18:00 🔗 pizzaiolo has quit IRC (Read error: Connection reset by peer)
18:00 🔗 odemg has joined #archiveteam-bs
18:02 🔗 icedice has joined #archiveteam-bs
18:02 🔗 pizzaiolo has joined #archiveteam-bs
18:07 🔗 GE_ has joined #archiveteam-bs
18:07 🔗 GE_ has quit IRC (Remote host closed the connection)
18:09 🔗 GE has quit IRC (Ping timeout: 255 seconds)
18:16 🔗 Aranje has quit IRC (Ping timeout: 245 seconds)
18:34 🔗 ndiddy joepie91: this is my favorite one http://i.imgur.com/oO2XPp1.jpg
18:34 🔗 joepie91 HAHAHA
18:35 🔗 joepie91 ndiddy: I wonder whether they got screwed over
18:35 🔗 joepie91 :p
18:36 🔗 ndiddy in my experience chinese sellers don't give a shit about customs
18:37 🔗 ndiddy my $50 eeprom programmer was a $5 "gift" on the box
18:40 🔗 godane has joined #archiveteam-bs
18:48 🔗 godane so freenode is having an excess flood
18:48 🔗 godane anyone else having that problem?
18:52 🔗 odemg has quit IRC (Remote host closed the connection)
19:02 🔗 Aranje has joined #archiveteam-bs
19:17 🔗 joepie91 godane: some servers are a bit iffy about lots of joins-on-connect, cycle to a different server
19:18 🔗 joepie91 ndiddy: I actually ran across a seller yesterday that stated in their description that they declare the true value
19:18 🔗 joepie91 and if you wanted a fake value you had to send them a message
19:18 🔗 joepie91 they exist!
19:38 🔗 GE has joined #archiveteam-bs
19:54 🔗 odemg has joined #archiveteam-bs
20:07 🔗 antomati_ has joined #archiveteam-bs
20:07 🔗 swebb sets mode: +o antomati_
20:09 🔗 antomatic has quit IRC (Ping timeout: 250 seconds)
20:11 🔗 antomati_ is now known as antomatic
20:14 🔗 antomati_ has joined #archiveteam-bs
20:14 🔗 swebb sets mode: +o antomati_
20:16 🔗 antomatic has quit IRC (Read error: Operation timed out)
20:17 🔗 godane i'm starting to upload AngryJoeShow youtube channel: https://archive.org/details/angryjoeshow-youtube-channel-2008-10-04
20:17 🔗 antomati_ is now known as antomatic
20:27 🔗 JAA has joined #archiveteam-bs
20:29 🔗 JAA My EpicSki grab phase 1 is going well so far, 430k of 2M done in 7.5 hours, no ban or throttling. :-)
20:41 🔗 Odd0002 no warrior script for it?
20:43 🔗 mls JAA: I have little to no experience in doing manual archive jobs, though count me in if I can be of any assistance
20:53 🔗 mls ndiddy: Perhaps you could contribute your further findings on pkware dcl format on http://fileformats.archiveteam.org
20:53 🔗 mls ^ If any
20:58 🔗 JAA Odd0002: While that might be a bit faster, I think this is sufficient, especially since I can just bomb the server (Fastly, actually) with requests. Note that there is an ArchiveBot job running as well, so I think it's covered well enough.
20:58 🔗 Odd0002 ah
20:59 🔗 JAA Now that the deadline has been extended to 12 May, I actually think that ArchiveBot has a very real chance of finishing in time, but I'm running my grab anyway just in case.
21:00 🔗 JAA An option of prioritising certain URLs in wpull/ArchiveBot would be extremely useful. ArchiveBot's following external links is a very useful feature, but especially in these emergency grabs, it should really prioritise getting the content in immediate danger ASAP and delay the external resources.
21:01 🔗 JAA wpull actually does have a way to specify a priority for a URL internally, but it isn't used anywhere yet.
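Editor's sketch of the prioritisation JAA describes: fetch URLs on the endangered site first and leave external links for later. This is a generic priority queue built on Python's `heapq`. It does not use or mirror wpull's or ArchiveBot's internals, and the class name `UrlQueue`, the two-level priority scheme and the example hosts are assumptions for illustration only.

```python
import heapq
from itertools import count
from urllib.parse import urlsplit


class UrlQueue:
    """Pops URLs on the target host before any external URL."""

    def __init__(self, target_host):
        self.target_host = target_host
        self._heap = []
        # Tie-breaker so URLs with equal priority come out in insertion order.
        self._order = count()

    def push(self, url):
        host = urlsplit(url).hostname
        # Lower number means higher priority: 0 for the site in danger,
        # 1 for everything else (external links, embedded third-party content).
        priority = 0 if host == self.target_host else 1
        heapq.heappush(self._heap, (priority, next(self._order), url))

    def pop(self):
        return heapq.heappop(self._heap)[2]

    def __len__(self):
        return len(self._heap)


queue = UrlQueue("forum.example.com")
queue.push("http://other.example.org/linked-page")
queue.push("http://forum.example.com/thread/1")
queue.push("http://forum.example.com/thread/2")

while queue:
    print(queue.pop())
```

The external link is queued first but fetched last, so a grab cut short by a shutdown deadline would still have spent its time on the target site.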
21:06 🔗 JAA Something else that would be awesome is a way to run a recursive grab with wpull in a distributed manner, i.e. from multiple machines/IPs (cf. a discussion about that from a few days ago, on Sat or Sun I think). In the meantime, I found out that wpull has a --database-uri option, which can be used instead of --database to use any kind of DB supported by SQLAlchemy. I'm pretty sure this could be used to set up a distributed recursive grab, but I have yet to try it out.
21:07 🔗 JAA mls: Thanks. For now, I think I got this covered. I'll let you know if more resources are needed.
21:16 🔗 JAA has quit IRC (Quit: Good night)
21:19 🔗 Honno_ has joined #archiveteam-bs
21:21 🔗 GE has quit IRC (Quit: zzz)
21:23 🔗 Odd0002 by the way dukgo's xmpp server is going down May 18th, but I don't know if there's anything that can even be grabbed from it
21:26 🔗 anjacks0n has joined #archiveteam-bs
21:28 🔗 anjacks0n has quit IRC (Client Quit)
21:30 🔗 godane i'm going after archive.wired.com
21:31 🔗 godane looks like at least i didn't upload it: https://archive.org/search.php?query=subject%3A%22archive.wired.com%22
21:41 🔗 JensRex has quit IRC (Remote host closed the connection)
21:41 🔗 JensRex has joined #archiveteam-bs
21:46 🔗 Honno_ has quit IRC (Ping timeout: 370 seconds)
22:13 🔗 JensRex Less than two days left to finish mlkshk project. I've been trying to get someone to manage the tracker for many days now. Honestly, it's getting extremely frustrating at this point.
22:13 🔗 JensRex It's like I'm the only one who cares about finishing this job.
22:21 🔗 Odd0002 the problem is that I think the website is already missing images
22:22 🔗 odemg has quit IRC (Remote host closed the connection)
22:23 🔗 Odd0002 JensRex: if you go to https://mlkshk.com/popular, the images are mostly all missing
22:23 🔗 JensRex Well, I've had nothing but total radio silence.
22:24 🔗 Odd0002 it's been broken for at least a week now, if not more
22:24 🔗 JensRex I'm not blaming anyone specific. We all have stuff to do IRL, but maybe someone should be around for tracker maintenance.
22:24 🔗 JensRex Also, we worked around the missing image issue.
22:25 🔗 JensRex I was in touch with mlkshk admins, and they provided a workaround that arkiver implemented.
22:35 🔗 tammy_ JAA: lost power last night, went to restart the scrape and I think I'm seeing a page depth loop occurring. https://gist.github.com/anonymous/2b3e0f0e1f606f0ba3b4319889f04987
22:41 🔗 arkiver wooh
22:41 🔗 arkiver loops
23:28 🔗 odemg has joined #archiveteam-bs
23:29 🔗 BlueMaxim has joined #archiveteam-bs
