[01:05] *** tfgbd_znc has quit IRC (Read error: Connection reset by peer)
[01:20] *** tfgbd_znc has joined #archiveteam-bs
[01:51] *** dashcloud has quit IRC (Read error: Operation timed out)
[01:56] *** dashcloud has joined #archiveteam-bs
[03:36] *** ndiddy has quit IRC ()
[04:10] *** godane has quit IRC (hub.efnet.us irc.colosolutions.net)
[04:10] *** trs80 has quit IRC (hub.efnet.us irc.colosolutions.net)
[04:10] *** jspiros has quit IRC (hub.efnet.us irc.colosolutions.net)
[04:10] *** SadDM has quit IRC (hub.efnet.us irc.colosolutions.net)
[04:17] *** godane has joined #archiveteam-bs
[04:17] *** jspiros has joined #archiveteam-bs
[04:22] *** Stilett0 has quit IRC (Read error: Connection reset by peer)
[04:22] *** SadDM has joined #archiveteam-bs
[04:23] *** Stilett0 has joined #archiveteam-bs
[04:31] *** SpaffGarg has quit IRC (Ping timeout: 255 seconds)
[04:32] *** SpaffGarg has joined #archiveteam-bs
[04:36] *** SadDM has quit IRC (Read error: Operation timed out)
[04:37] *** Sk1d has quit IRC (Ping timeout: 194 seconds)
[04:39] *** jspiros has quit IRC (Read error: Operation timed out)
[04:43] *** Sk1d has joined #archiveteam-bs
[04:43] *** Zebranky has quit IRC (Ping timeout: 255 seconds)
[04:46] *** Zebranky has joined #archiveteam-bs
[04:46] *** jspiros has joined #archiveteam-bs
[04:51] *** SadDM has joined #archiveteam-bs
[06:53] *** Ravenloft has quit IRC (Ping timeout: 633 seconds)
[06:58] *** trs80 has joined #archiveteam-bs
[07:07] *** schbirid has joined #archiveteam-bs
[07:24] SketchCow: we are up to 1993-12-31 with the tagesschau 20:00 evening news
[07:33] *** schbirid has quit IRC (Quit: Leaving)
[07:53] *** GE has joined #archiveteam-bs
[08:17] *** vap9r has joined #archiveteam-bs
[08:22] *** chrono has joined #archiveteam-bs
[08:23] ay
[08:25] *** odemg has quit IRC (Remote host closed the connection)
[08:31] 0
[08:38] *** j08nY has joined #archiveteam-bs
[09:01] *** schbirid has joined #archiveteam-bs
[09:03] *** brayden has quit IRC (Read
error: Connection reset by peer)
[09:03] *** brayden has joined #archiveteam-bs
[09:06] *** BlueMaxim has quit IRC (Read error: Operation timed out)
[10:16] *** JAA has joined #archiveteam-bs
[10:20] gmane.org seems unwell despite having allegedly been taken over. http://www.archiveteam.org/index.php?title=Gmane doesn't say much. Did the mailing list archives get grabbed at all?
[10:45] *** schbirid2 has joined #archiveteam-bs
[10:46] ranma: If you're using Certbot, you may want to have a look at simp_le (fork by zenhack; the original code by koba is broken and unmaintained). Follows the KISS principle and is more straightforward to set up. It also doesn't require being run as root and doesn't mess with your webserver configuration etc.
[10:49] *** schbirid has quit IRC (Read error: Operation timed out)
[10:50] *** GE has quit IRC (Remote host closed the connection)
[11:22] *** username1 has joined #archiveteam-bs
[11:27] *** schbirid2 has quit IRC (Read error: Operation timed out)
[11:48] *** schbirid2 has joined #archiveteam-bs
[11:52] *** username1 has quit IRC (Read error: Operation timed out)
[11:52] *** schbirid has joined #archiveteam-bs
[11:54] *** schbirid2 has quit IRC (Read error: Operation timed out)
[12:04] *** arkiver2 has joined #archiveteam-bs
[12:05] *** arkiver2 has quit IRC (Client Quit)
[12:10] jtn2: it was taken over by a bunch of fucks who left it to die
[12:16] *** GE has joined #archiveteam-bs
[12:23] *** schbirid has quit IRC (Ping timeout: 255 seconds)
[12:37] *** schbirid has joined #archiveteam-bs
[12:41] The posts on http://home.gmane.org/ didn’t sound that bad, HCross2
[12:43] DFJustin, Jonimus: Spread the ops.
[12:50] *** VeganMars has quit IRC (Read error: No route to host)
[12:50] *** VeganMar- has joined #archiveteam-bs
[13:27] *** Ravenloft has joined #archiveteam-bs
[13:35] *** Honno has joined #archiveteam-bs
[13:59] *** schbirid has quit IRC (Ping timeout: 255 seconds)
[14:11] *** schbirid has joined #archiveteam-bs
[14:13] *** odemg has joined #archiveteam-bs
[14:25] *** odemg has quit IRC (Remote host closed the connection)
[14:36] *** icedice has joined #archiveteam-bs
[15:04] *** schbirid has quit IRC (Ping timeout: 255 seconds)
[15:16] *** schbirid has joined #archiveteam-bs
[15:26] *** icedice2 has joined #archiveteam-bs
[15:33] *** icedice has quit IRC (Read error: Operation timed out)
[15:49] *** schbirid has quit IRC (Ping timeout: 255 seconds)
[16:02] *** schbirid has joined #archiveteam-bs
[16:06] zPlus: (if you read logs): I heard from someone at IA that it's very likely a known bug on their end, related to the version of Java they are using and interactions with SSL.
[16:07] They are planning to fix it, and will see if there's a workaround in the meantime.
[16:23] *** GE has quit IRC (Quit: zzz)
[16:23] *** GE has joined #archiveteam-bs
[16:24] *** GE has quit IRC (Client Quit)
[16:24] *** GE has joined #archiveteam-bs
[16:26] *** Boppen has quit IRC (Ping timeout: 194 seconds)
[16:28] *** Boppen has joined #archiveteam-bs
[16:41] *** fie has quit IRC (Ping timeout: 250 seconds)
[16:51] *** schbirid has quit IRC (Ping timeout: 255 seconds)
[16:59] *** icedice2 has quit IRC (Ping timeout: 245 seconds)
[17:02] *** JAA has quit IRC (Quit: Page closed)
[17:04] *** schbirid has joined #archiveteam-bs
[17:12] *** icedice has joined #archiveteam-bs
[17:14] *** fie has joined #archiveteam-bs
[18:02] Does Scaleway allow users to set root passwords, or is it just SSH keys that are supported?
[18:03] I need a root password in order to be able to log into Ajenti
[18:06] *** bwn has quit IRC (Read error: Operation timed out)
[18:11] just ssh keys
[18:11] but once you've logged in with a key, you can turn the key off and use a password
[18:11] *** leftyfb has joined #archiveteam-bs
[18:20] *** bwn has joined #archiveteam-bs
[18:26] *** schbirid has quit IRC (Ping timeout: 255 seconds)
[18:33] anything that requires you to enable password authentication seems like a poor choice of software to run on a server :)
[18:33] suggests they don't care about security very much...
[18:33] *** icedice has quit IRC (Ping timeout: 245 seconds)
[18:34] --autologin - Will automatically log in the user under which the panel runs. This is a security issue if your system is public.
[18:35] from http://docs.ajenti.org/en/latest/man/run.html - yeah I don't think they're bothered about security if they give you an option like that
[18:38] *** schbirid has joined #archiveteam-bs
[19:01] *** Simpbrain has quit IRC (Read error: Operation timed out)
[19:01] *** Simpbrain has joined #archiveteam-bs
[19:13] *** Jonimus sets mode: +o xmc
[19:17] *** Jonimus sets mode: +o SketchCow
[19:17] *** Jonimus sets mode: +o joepie91
[19:36] *** akaibu has joined #archiveteam-bs
[19:41] Should we get in contact with Tom Fulp (founder/owner of Newgrounds) and see if we could set something up so we can archive them? They are a major part of internet history and basically the springboard of today's animators
[19:42] *** j08nY has quit IRC (Remote host closed the connection)
[19:44] akaibu, huge site, we can watch it, but doing an archive if they aren't going down anytime soon would be a waste of resources.
[19:44] True
[19:45] But maybe Jason can talk to archive.org and set something up directly between newgrounds and archive.org
[19:47] Just a thought
[19:50] *** ndiddy has joined #archiveteam-bs
[20:23] *** GE has quit IRC (Remote host closed the connection)
[20:56] akaibu, huge site, we can watch it, but doing an archive if they aren't going down anytime soon would be a waste of resources.
[20:57] i disagree vehemently
[20:57] a site like newgrounds would be a perfect thing to archive
[20:57] right now
[20:57] without pressure
[20:57] because it is important
[21:00] schbirid, I may have worded that a little strongly. I have no problem with us archiving it. But we do have projects that are 100% going down within the week that should take priority.
[21:07] like mlkshk \o/
[21:10] rocode: that's fine, we should prioritize projects that are on a time limit first, then when time allows we work on stuff like newgrounds
[21:12] *** ndiddy has quit IRC ()
[21:13] Or small sites such as https://vbox7.com which is a somewhat small foreign video sharing site that has quite a few YouTube rejections, I think maybe it could be a one person job but I haven't done any real work on it
[21:14] Don't really know how stable the site is financially or otherwise, so it might be good to grab considering how old it is (mid-to-late 00s)
[21:21] *** schbirid has quit IRC (Quit: Leaving)
[21:34] *** iSO has joined #archiveteam-bs
[21:37] *** JAA has joined #archiveteam-bs
[21:38] (On a discussion on inaccessible URLs - talking on Appshopper): Half the time when you come across an app page, it's either 1) Listed, 2) Delisted, or 3) Listed, but upon clicking a link that sends you to their digital store, actually delisted.
[21:39] Fortunately, all the delisted app pages are still in Appshopper, they're just not accessible at all unless you used that URL (or made a bunch of guesses).
[21:40] On the flipside, the images older app pages are not rendering properly.
[21:41] *images on
[21:43] *** GE has joined #archiveteam-bs
[21:45] *** Honno has quit IRC (Ping timeout: 370 seconds)
[21:46] Example: http://appshopper.com/news/mess
[21:46] There are supposed to be 3 images, but they are not there.
[21:48] My WunderBlogs grab is at about 6k errors left, mostly external resources on dead services but also including about 20 actual blog pages which failed previously for some reason. I'm not sure I can prioritise those in any way, so I guess I'll just have to hope wpull gets to them in time.
[21:49] My concurrency settings aren't working properly unfortunately - I set it to 10 threads, but it only processes 1-2 URLs at once after the first 10 URLs - meaning it's very slow (about 400 URLs per hour).
[21:50] Appshopper appears to fetch the images directly from their digital store's app page (rather than copying and hosting them on their own servers), as right-clicking the broken images shows a different (older) image URL - "http://a1.phobos.apple.com" or "http://a1.phobos.apple.com/us/r1000/038/Purple/5f/04/81/mzl.vjhdzyui.320x480-75.jpg" (the first image).
[21:51] The current image URL link is: https://s2.mzstatic.com/us/r30/Purple111/v4/f8/31/4b/f8314b20-e46f-3dbb-56a6-1b8c687dbc42/screen696x696.jpeg (first image) from http://appshopper.com/games/crossbar-challenge-17
[22:01] What's more interesting on how Appshopper app pages are archived on their website is when clicking on app pages, the URL has a category, followed by a name (I'll talk about this particular part later). The result is: http://appshopper.com/reference/the-night-sky
[22:01] Funny thing is that you can freely swap the category names! http://appshopper.com/games/the-night-sky
[22:02] The app page doesn't take the altered category name into account: to the right of the app page, it still says "Reference".
[22:02] Fortunately, it only works on valid category names, not http://appshopper.com/ujhgvhjhg/the-night-sky - as it will give you "AppShopper.com: Page Not Found".
[22:18] On ID naming scheme: When an app page is archived on Appshopper, to determine the page ID, Appshopper will scan the App name at the time it was found/released and apply it as a name. This link is an example of the ID name and the display name matched (a quirk is that it ignores capital letters): http://appshopper.com/games/gems
[22:24] SketchCow: i got +63gb of local news archive
[22:24] from a youtube channel called NewsActive3
[22:25] However, if there's another app named "Gems", or something extremely similar (as it ignores anything that's not alphabet letters, numbers, text in other languages [that is a LOT harder to manually dig around if you don't know the language well!], and bizarre character letters like the Roman numeral 2 and not just "II" [arguably even harder! Would probably need to automate to find very weird ID names!]), it adds a number.
[22:26] All 1,248 videos of it godane?
[22:26] Example of ID names w/ added number: http://appshopper.com/games/gems-2
[22:27] The ID name is permanent (which is a good thing for archiving)!
[22:28] It doesn't change at all when the app's display name is updated.
[22:28] iSO: i'm still grabbing it
[22:28] that size is about 500 videos
[22:29] Ah~
[22:29] 516 to be exact so far
[22:29] i will put up to FOS cause its very big
[22:30] If that app is submitted with that current name on release (and not as a changed name), the ID name would've been "the-gems-fever-and-the-miners-path-panic-retrogaming-8-bits-pixel-art" (ignores " ' ", and "!", replaces " " with "-")
[22:34] Due to the way they ID w/ numbers attached via archiving, you can easily change the # to find more entries: http://appshopper.com/games/gems-3 http://appshopper.com/games/gems-4 http://appshopper.com/games/gems-5 http://appshopper.com/games/gems-6 http://appshopper.com/games/gems-7 http://appshopper.com/games/gems-8 (This one is under Lifestyle, not Games)
[22:34] There is currently no http://appshopper.com/games/gems-9
[22:40] Mentioning all this technical stuff about Appshopper app pages, a project idea I've had in mind for months is a perpetual (forever-ongoing, similar to URLTeam) project on apps and their constant updates (and maybe price changes [unsure how this part would be handled because the only info altered are just that, price changes]).
[22:43] Appshopper is the only website that fetches and archives iOS app entry data and is accessible to anyone, not behind a paywall.
[22:43] There's /so/ much crap to go through, particularly in the Games section! So, sooooo much.
[22:44] But there are cool hidden gems to be found~
[22:55] WunderBlogs is terrific at handling user input. In addition to the eternal .../www.facebook.com/username/www.facebook.com/username/... et al. loops I mentioned a few days ago, I just discovered a URL of over 36k characters containing what looks like the entire HTML markup for a blog post.
[22:56] I wouldn't be surprised if there were tons of opportunities for stored XSS attacks on that site.
[23:00] *** zino has joined #archiveteam-bs
[23:06] Attempts to stop/hinder archiving is the answer.
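(Note on the Appshopper ID names discussed above: the rule as described in the channel, lowercase the name, drop apostrophes and exclamation marks, turn spaces into hyphens, then append -2, -3, ... for near-duplicates, can be sketched roughly as below. This is only an approximation built from those messages, not Appshopper's actual code; the function names `id_name` and `candidate_urls` are made up for illustration, and the handling of other punctuation or non-Latin text is unknown and not attempted.)

```python
import re

BASE = "http://appshopper.com"  # site root as it appears in the URLs quoted in the log


def id_name(display_name):
    """Approximate the ID name: lowercase, drop ' and !, spaces become hyphens.

    Assumption: only the rules mentioned in the channel are applied.
    """
    name = display_name.lower()
    name = re.sub(r"['!]", "", name)
    return re.sub(r"\s+", "-", name.strip())


def candidate_urls(category, slug, highest):
    """Yield the bare slug URL, then the numbered variants -2 .. -highest.

    Assumption: numbering starts at 2, as in the gems / gems-2 example.
    """
    yield f"{BASE}/{category}/{slug}"
    for n in range(2, highest + 1):
        yield f"{BASE}/{category}/{slug}-{n}"


if __name__ == "__main__":
    # Only prints URLs to try; nothing is fetched here.
    for url in candidate_urls("games", id_name("Gems"), 9):
        print(url)
```

(Since any valid category seems to work in the URL, a single category such as "games" should be enough when probing the numbered variants.)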
[23:07] Like hindering archiving YouTube comments, there used to be an "All Comments" page for each YouTube video, and they yanked it off not long ago.
[23:08] Had to use archive.is - but even then it's partially archived.
[23:11] I doubt it. If they wanted to hinder archiving, I'm sure they'd implement some sort of rate limiting. I've been going at it with 10 threads for days without any issues.
[23:12] Appshopper seems to have a limiter, when I opened 15-20 links at once there.
[23:12] The loop is probably a result of users entering "www.facebook.com/username" into the relevant form field instead of "https://www..."; the 36k-character URL is probably just shitty parsing.
[23:12] After those links were processed, new links you open up just hang temporarily.
[23:14] I'm not sure what else I can do on my proposed project besides more technical info from experience using the Appshopper website, I can only manually archive pages w/ Archive.org.
[23:15] Even with an extension that lets me open multiple links at once.
[23:16] *** BlueMaxim has joined #archiveteam-bs
[23:16] Any idea how large it is?
[23:16] Appshopper?
[23:16] Yeah
[23:16] Unsure~
[23:17] I guess let's take the total apps approved, which is "3,660,443", and multiply that by 10x (very very generous).
[23:17] I say between 2x to 5x.
[23:17] If wanting to find those dumb inaccessible app pages.
[23:18] Perpetually more if apps update.
[23:19] As far as I know, there's no way to access previous versions of the app pages at all.
[23:22] I want to because I wanna know how apps and their damn app descriptions/images have evolved since then.
[23:25] *** Aranje has joined #archiveteam-bs
[23:25] It didn't used to have artwork overlaid with gameplay screenshots, or even visual text!
[23:27] I think your best bet for that is checking the IA Wayback Machine for both Appshopper and the iTunes "preview" pages. Not sure if the latter include images though.
[23:30] I tried checking it starting with Angry Birds, coverage on images appears limited.
[23:30] It starts getting better in later entries.
[23:31] But it's still random.
[23:33] Yeah, that's often the case, especially for pages which weren't archived through a browser.
[23:33] I have other archive project ideas, but that probably requires specialized tools or something that doesn't use Archive Warrior.
[23:34] I know there were attempts that when archiving pages, 1 or 2 images/GIFs are not archived, but sometimes it's just all of them.
[23:35] Regarding the future, it shouldn't be too difficult to write a script which scrapes http://appshopper.com/all/ and archives the corresponding pages continuously (directly, not through the Wayback Machine).
[23:35] That has a page limit of 500.
[23:36] Can be narrowed further with only new http://appshopper.com/all/new
[23:36] Or price changes: http://appshopper.com/all/prices
[23:37] Thing is, those types of pages, that attempt to encompass all the pages they archived, that's only partial.
[23:37] Yeah, but if you do it continuously, that's not really a problem. You'd just grab it often enough.
[23:38] You have to grab links from the categories, that's where truly all of the recorded entries of that specific entry are there.
[23:38] Yeah, the initial grab would need to work differently. I was just talking about keeping up with updates.
[23:38] That's also because there's a delay on how long it'll appear on Appshopper after the app shows up in the digital store.
[23:39] Ah~
[23:39] Also, I can definitely go further than 500 on /all. http://appshopper.com/all/50 for example.
[23:39] (Which should show items 981 to 1000)
[23:40] Ah wait, you meant 500 pages. Nevermind.
[23:40] Yes, that's the max sadly.
[23:40] Past that point, it's considered gone forever, unless there's a price change, or an update.
[23:41] As of right now, the 10000th entry (last on page 500) was updated 3 days ago.
So fetching /all every day would easily be enough.
[23:41] Unless you search it on their website.
[23:41] If to fetch all, there's going to be constant duplicates, and that's a waste of time and energy.
[23:42] I would recommend doing new if to find the new ones.
[23:42] "All" encompasses new apps, updates, and price changes (doesn't show delistings at all).
[23:43] Oh. Yeah, I guess delistings would be a bit more complicated
[23:43] Maybe a periodic grab of the entire site would be easier then.
[23:43] When searching in Appshopper with keywords, it can only grab up to 50 pages.
[23:43] Although that might obviously miss some updates etc.
[23:43] Every 2 days I would be good.
[23:43] *it would
[23:44] Of course, in my POV, I'm talking about the Games category.
[23:44] You mean every app page every two days?
[23:44] Uh, I was thinking when checking the app list.
[23:45] Every app? That would be overkill~
[23:45] Well, then again, there are actually updates that are 1 day apart.
[23:45] I don't remember if it applies on the same day.
[23:46] Yeah, the website isn't very verbose about how the whole thing is working.
[23:46] I also wasn't able to find any ToS.
[23:47] Not even at the bottom of the page?
[23:47] Ugh, I still hate this new layout Appshopper put out, it's slower.
[23:47] Slower for me to sift info.
[23:48] I have to manually click "Show More" to see the full description, and I have to wait for the animation to scroll in-between images.
[23:48] Nope, very minimal "about" page, a contact link, some links for copyright owners wanting to whine, but no ToS.
[23:48] Just damn slow when comparing to the previous layout.
[23:49] iSO: this site doesn't seem particularly large, you could possibly run it through grab-site or wpull and get a good initial grab?
[23:49] Wait, 3 million pages isn't considered large?
[23:49] It would probably need a nudge or two because the category pages are hidden behind a