[00:24] *** Stilett0 has quit IRC (Ping timeout: 246 seconds)
[00:28] *** Coderjo_ is now known as Coderjo
[01:02] *** akaibu has joined #archiveteam-bs
[01:18] *** valesi has quit IRC (Remote host closed the connection)
[01:24] *** Selavi has joined #archiveteam-bs
[01:27] don't know if saying it here will matter, but maybe Sketch can forward this to IA. anyway, i've been using archive.org for the last couple of years (be it to extract info from dead sites, compare changes, etc), but a few sites have died and then been domain squatted, with robots.txt locking out the content of the previous site. this is quite a problem
[01:37] yes it is
[01:38] (guess i should suggest a fix, but i'm not sure what exactly would work for IA, since it's basically only respecting robots.txt so it doesn't get sued)
[01:39] only provide access during times when robots.txt wasn't active on it?
[01:45] *** pizzaiolo has quit IRC (Remote host closed the connection)
[02:09] yea that would likely do it
[02:09] *** Stilett0 has joined #archiveteam-bs
[02:10] again this means nothing if it doesn't get to the IA people and i don't know if sketch actually reads all of the chat
[02:11] *** Dark_Star has quit IRC (Ping timeout: 633 seconds)
[02:15] *** Dark_Star has joined #archiveteam-bs
[02:36] I think Sketch would prefer that you contact the IA directly
[02:37] since Archive Team is not the Internet Archive
[02:47] i've... not had good luck with talking with them before
[02:53] Can someone with tracker access requeue items on the mlkshk project? https://tracker.archiveteam.org/mlkshk/
[02:53] I've been prodding arkiver for days, but he's afk I guess.
[02:53] Site shuts down on May 1st.
[03:47] akaibu: Have you seen this blog post? https://blog.archive.org/2017/04/17/robots-txt-meant-for-search-engines-dont-work-well-for-web-archives/
[03:47] Sounds like they're planning on not respecting robots.txt anymore
[03:54] Anyone know what the status is on the search engine for the wayback machine?
[04:00] *** BlueMaxim has quit IRC (Read error: Operation timed out)
[04:22] *** Sk1d has quit IRC (Ping timeout: 194 seconds)
[04:28] *** Sk1d has joined #archiveteam-bs
[06:11] *** LordNigh2 has joined #archiveteam-bs
[06:13] *** Lord_Nigh has quit IRC (Ping timeout: 250 seconds)
[06:13] *** LordNigh2 is now known as Lord_Nigh
[08:05] *** godane has quit IRC (Leaving.)
[08:20] *** GE has joined #archiveteam-bs
[09:38] *** GE has quit IRC (Remote host closed the connection)
[09:48] *** Honno has joined #archiveteam-bs
[10:26] *** btfo has quit IRC (Ping timeout: 600 seconds)
[10:35] *** btfo has joined #archiveteam-bs
[11:14] *** GE has joined #archiveteam-bs
[11:14] *** Yoshimura has joined #archiveteam-bs
[11:40] *** pizzaiolo has joined #archiveteam-bs
[12:07] *** raphidae has joined #archiveteam-bs
[12:14] *** odemg has joined #archiveteam-bs
[12:26] *** ZexaronS has quit IRC (Leaving)
[12:35] *** pizzaiolo has quit IRC (Ping timeout: 245 seconds)
[12:40] *** JAA has joined #archiveteam-bs
[13:18] hook54321: oh nice, i've been meaning to check on the IA blog, thanks for letting me know
[13:20] while it might not be good for them to respect robots.txt anymore, they should still use it for, say, the sitemaps and the crawl rate settings
[13:23] (leaving for now but if you ping me i'll see it in the logs and respond later)
[13:23] *** akaibu has quit IRC ()
[13:39] *** JAA has quit IRC (Quit: Page closed)
[13:46] *** Yoshimura has quit IRC (Quit: http://www.kiwiirc.com/ - A hand crafted IRC client)
[14:08] *** brayden_ has joined #archiveteam-bs
[14:08] *** swebb sets mode: +o brayden_
[14:14] *** brayden has quit IRC (Read error: Operation timed out)
[14:32] *** brayden__ has joined #archiveteam-bs
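The [13:20] suggestion, ignoring the Disallow rules while still honouring the sitemap and crawl-rate hints, could look roughly like the sketch below. It uses Python's standard `urllib.robotparser`; the sample robots.txt, the example.com URL and the `crawl_hints` helper are made up for illustration and are not anything the Internet Archive is known to run.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt, such as a domain squatter might serve.
SAMPLE_ROBOTS_TXT = """\
User-agent: *
Disallow: /
Crawl-delay: 5
Sitemap: https://example.com/sitemap.xml
"""


def crawl_hints(robots_txt: str, user_agent: str = "*") -> dict:
    """Return only the politeness hints from robots.txt.

    The Disallow rules are parsed but deliberately never consulted here.
    """
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return {
        # None when no Crawl-delay applies to this user agent.
        "crawl_delay": parser.crawl_delay(user_agent),
        # site_maps() returns None when there are no Sitemap lines (Python 3.8+).
        "sitemaps": parser.site_maps() or [],
    }


if __name__ == "__main__":
    print(crawl_hints(SAMPLE_ROBOTS_TXT))
```

A crawler built this way would still sleep for `crawl_delay` seconds between requests and seed its queue from `sitemaps`, so it stays polite to the server even though the blanket `Disallow: /` is not applied.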
[14:32] *** swebb sets mode: +o brayden__
[14:37] *** brayden_ has quit IRC (Read error: Operation timed out)
[14:40] *** JensRex has quit IRC (Remote host closed the connection)
[14:41] *** JensRex has joined #archiveteam-bs
[15:21] *** Honno_ has joined #archiveteam-bs
[15:22] *** odemg has quit IRC (Remote host closed the connection)
[15:27] *** Honno has quit IRC (Ping timeout: 370 seconds)
[15:30] *** brayden__ is now known as brayden
[15:31] in the category of "typos you really shouldn't make on customs declaration forms from China": https://i.imgur.com/Dgy6Voi.jpg
[15:34] *** odemg has joined #archiveteam-bs
[15:48] *** pizzaiolo has joined #archiveteam-bs
[16:56] *** dan_ has joined #archiveteam-bs
[17:09] *** brayden_ has joined #archiveteam-bs
[17:09] *** swebb sets mode: +o brayden_
[17:13] *** brayden has quit IRC (Read error: Operation timed out)
[17:14] *** brayden_ is now known as brayden
[17:24] *** dan_ has quit IRC (Quit: Page closed)
[17:27] *** odemg has quit IRC (Remote host closed the connection)
[17:30] *** Honno_ has quit IRC (Ping timeout: 370 seconds)
[17:38] *** odemg has joined #archiveteam-bs
[17:38] *** odemg has quit IRC (Connection closed)
[18:00] *** pizzaiolo has quit IRC (Read error: Connection reset by peer)
[18:00] *** odemg has joined #archiveteam-bs
[18:02] *** icedice has joined #archiveteam-bs
[18:02] *** pizzaiolo has joined #archiveteam-bs
[18:07] *** GE_ has joined #archiveteam-bs
[18:07] *** GE_ has quit IRC (Remote host closed the connection)
[18:09] *** GE has quit IRC (Ping timeout: 255 seconds)
[18:16] *** Aranje has quit IRC (Ping timeout: 245 seconds)
[18:34] joepie91: this is my favorite one http://i.imgur.com/oO2XPp1.jpg
[18:34] HAHAHA
[18:35] ndiddy: I wonder whether they got screwed over
[18:35] :p
[18:36] in my experience Chinese sellers don't give a shit about customs
[18:37] my $50 eeprom programmer was a $5 "gift" on the box
[18:40] *** godane has joined #archiveteam-bs
[18:48] so freenode is having an excess flood
[18:48] anyone else having that problem?
[18:52] *** odemg has quit IRC (Remote host closed the connection)
[19:02] *** Aranje has joined #archiveteam-bs
[19:17] godane: some servers are a bit iffy about lots of joins-on-connect, cycle to a different server
[19:18] ndiddy: I actually ran across a seller yesterday that stated in their description that they declare the true value
[19:18] and if you wanted a fake value you had to send them a message
[19:18] they exist!
[19:38] *** GE has joined #archiveteam-bs
[19:54] *** odemg has joined #archiveteam-bs
[20:07] *** antomati_ has joined #archiveteam-bs
[20:07] *** swebb sets mode: +o antomati_
[20:09] *** antomatic has quit IRC (Ping timeout: 250 seconds)
[20:11] *** antomati_ is now known as antomatic
[20:14] *** antomati_ has joined #archiveteam-bs
[20:14] *** swebb sets mode: +o antomati_
[20:16] *** antomatic has quit IRC (Read error: Operation timed out)
[20:17] i'm starting to upload AngryJoeShow youtube channel: https://archive.org/details/angryjoeshow-youtube-channel-2008-10-04
[20:17] *** antomati_ is now known as antomatic
[20:27] *** JAA has joined #archiveteam-bs
[20:29] My EpicSki grab phase 1 is going well so far, 430k of 2M done in 7.5 hours, no ban or throttling. :-)
[20:41] no warrior script for it?
[20:43] JAA: I have little to no experience in doing manual archive jobs, though count me in if I can be of any assistance
[20:53] ndiddy: Perhaps you could contribute your further findings on pkware dcl format on http://fileformats.archiveteam.org
[20:53] ^ If any
[20:58] Odd0002: While that might be a bit faster, I think this is sufficient, especially since I can just bomb the server (Fastly, actually) with requests. Note that there is an ArchiveBot job running as well, so I think it's covered well enough.
[20:58] ah
[20:59] Now that the deadline has been extended to 12 May, I actually think that ArchiveBot has a very real chance of finishing in time, but I'm running my grab anyway just in case.
[21:00] An option of prioritising certain URLs in wpull/ArchiveBot would be extremely useful. ArchiveBot's following external links is a very useful feature, but especially in these emergency grabs, it should really prioritise getting the content in immediate danger ASAP and delay the external resources.
[21:01] wpull actually does have a way to specify a priority for a URL internally, but it isn't used anywhere yet.
[21:06] Something else that would be awesome is a way to run a recursive grab with wpull in a distributed manner, i.e. from multiple machines/IPs (cf. a discussion about that from a few days ago, on Sat or Sun I think). In the meantime, I found out that wpull has a --database-uri option, which can be used instead of --database to use any kind of DB supported by SQLAlchemy. I'm pretty sure this could be used to set up a distributed recursive grab, but I have yet to try it out.
[21:07] mls: Thanks. For now, I think I got this covered. I'll let you know if more resources are needed.
[21:16] *** JAA has quit IRC (Quit: Good night)
[21:19] *** Honno_ has joined #archiveteam-bs
[21:21] *** GE has quit IRC (Quit: zzz)
[21:23] by the way dukgo's xmpp server is going down May 18th, but I don't know if there's anything that can even be grabbed from it
[21:26] *** anjacks0n has joined #archiveteam-bs
[21:28] *** anjacks0n has quit IRC (Client Quit)
[21:30] i'm going after archive.wired.com
[21:31] looks like at least i didn't upload it: https://archive.org/search.php?query=subject%3A%22archive.wired.com%22
[21:41] *** JensRex has quit IRC (Remote host closed the connection)
[21:41] *** JensRex has joined #archiveteam-bs
[21:46] *** Honno_ has quit IRC (Ping timeout: 370 seconds)
[22:13] Less than two days left to finish mlkshk project. I've been trying to get someone to manage the tracker for many days now. Honestly, it's getting extremely frustrating at this point.
[22:13] It's like I'm the only one who cares about finishing this job.
[22:21] the problem is that I think the website is already missing images
[22:22] *** odemg has quit IRC (Remote host closed the connection)
[22:23] JensRex: if you go to https://mlkshk.com/popular, the images are mostly all missing
[22:23] Well, I've had nothing but total radio silence.
[22:24] it's been broken for at least a week now, if not more
[22:24] I'm not blaming anyone specific. We all have stuff to do IRL, but maybe someone should be around for tracker maintenance.
[22:24] Also, we worked around the missing image issue.
[22:25] I was in touch with mlkshk admins, and they provided a workaround that arkiver implemented.
[22:35] JAA: lost power last night, went to restart the scrape and I think I'm seeing a page depth loop occurring. https://gist.github.com/anonymous/2b3e0f0e1f606f0ba3b4319889f04987
[22:41] wooh
[22:41] loops
[23:28] *** odemg has joined #archiveteam-bs
[23:29] *** BlueMaxim has joined #archiveteam-bs
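The prioritisation idea from [21:00] can be sketched with a plain heap-based queue. This is only an illustration of the concept, not wpull's or ArchiveBot's actual code: the `PriorityFrontier` class, its two priority levels and the example URLs are all hypothetical.

```python
import heapq
import itertools
from urllib.parse import urlsplit


class PriorityFrontier:
    """URL queue that hands out pages on the endangered site before external links."""

    def __init__(self, target_host: str):
        self.target_host = target_host
        self._heap = []
        # Monotonic counter as tie-breaker: keeps first-in, first-out order
        # among URLs that share the same priority.
        self._counter = itertools.count()
        self._seen = set()

    def add(self, url: str) -> None:
        """Queue a URL once; duplicates are silently ignored."""
        if url in self._seen:
            return
        self._seen.add(url)
        # 0 = content in immediate danger, 1 = external resource that can wait.
        priority = 0 if urlsplit(url).hostname == self.target_host else 1
        heapq.heappush(self._heap, (priority, next(self._counter), url))

    def pop(self) -> str:
        """Return the most urgent URL still in the queue."""
        return heapq.heappop(self._heap)[2]

    def __len__(self) -> int:
        return len(self._heap)


if __name__ == "__main__":
    frontier = PriorityFrontier("www.example.com")
    for url in [
        "https://other.example.org/image.png",
        "https://www.example.com/forum/thread-1",
        "https://www.example.com/forum/thread-2",
    ]:
        frontier.add(url)

    # On-site pages come out first, the external image last.
    while frontier:
        print(frontier.pop())
```

The external links are still fetched eventually; they just sink to the back of the queue, which is the behaviour asked for in an emergency grab with a hard deadline.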