#archiveteam-bs 2017-04-02,Sun


Time Nickname Message
01:05 🔗 tfgbd_znc has quit IRC (Read error: Connection reset by peer)
01:20 🔗 tfgbd_znc has joined #archiveteam-bs
01:51 🔗 dashcloud has quit IRC (Read error: Operation timed out)
01:56 🔗 dashcloud has joined #archiveteam-bs
03:36 🔗 ndiddy has quit IRC ()
04:10 🔗 godane has quit IRC (hub.efnet.us irc.colosolutions.net)
04:10 🔗 trs80 has quit IRC (hub.efnet.us irc.colosolutions.net)
04:10 🔗 jspiros has quit IRC (hub.efnet.us irc.colosolutions.net)
04:10 🔗 SadDM has quit IRC (hub.efnet.us irc.colosolutions.net)
04:17 🔗 godane has joined #archiveteam-bs
04:17 🔗 jspiros has joined #archiveteam-bs
04:22 🔗 Stilett0 has quit IRC (Read error: Connection reset by peer)
04:22 🔗 SadDM has joined #archiveteam-bs
04:23 🔗 Stilett0 has joined #archiveteam-bs
04:31 🔗 SpaffGarg has quit IRC (Ping timeout: 255 seconds)
04:32 🔗 SpaffGarg has joined #archiveteam-bs
04:36 🔗 SadDM has quit IRC (Read error: Operation timed out)
04:37 🔗 Sk1d has quit IRC (Ping timeout: 194 seconds)
04:39 🔗 jspiros has quit IRC (Read error: Operation timed out)
04:43 🔗 Sk1d has joined #archiveteam-bs
04:43 🔗 Zebranky has quit IRC (Ping timeout: 255 seconds)
04:46 🔗 Zebranky has joined #archiveteam-bs
04:46 🔗 jspiros has joined #archiveteam-bs
04:51 🔗 SadDM has joined #archiveteam-bs
06:53 🔗 Ravenloft has quit IRC (Ping timeout: 633 seconds)
06:58 🔗 trs80 has joined #archiveteam-bs
07:07 🔗 schbirid has joined #archiveteam-bs
07:24 🔗 godane SketchCow: we are up to 1993-12-31 with tagesschau 20 clock evening news
07:33 🔗 schbirid has quit IRC (Quit: Leaving)
07:53 🔗 GE has joined #archiveteam-bs
08:17 🔗 vap9r has joined #archiveteam-bs
08:22 🔗 chrono has joined #archiveteam-bs
08:23 🔗 vap9r ay
08:25 🔗 odemg has quit IRC (Remote host closed the connection)
08:31 🔗 chrono 0
08:38 🔗 j08nY has joined #archiveteam-bs
09:01 🔗 schbirid has joined #archiveteam-bs
09:03 🔗 brayden has quit IRC (Read error: Connection reset by peer)
09:03 🔗 brayden has joined #archiveteam-bs
09:06 🔗 BlueMaxim has quit IRC (Read error: Operation timed out)
10:16 🔗 JAA has joined #archiveteam-bs
10:20 🔗 jtn2 gmane.org seems unwell despite having allegedly been taken over. http://www.archiveteam.org/index.php?title=Gmane doesn't say much. Did mailing archives get grabbed at all?
10:45 🔗 schbirid2 has joined #archiveteam-bs
10:46 🔗 JAA ranma: If you're using Certbot, you may want to have a look at simp_le (fork by zenhack; the original code by koba is broken and unmaintained). Follows the KISS principle and is more straight-forward to set up. It also doesn't require being run as root and doesn't mess with your webserver configuration etc.
10:49 🔗 schbirid has quit IRC (Read error: Operation timed out)
10:50 🔗 GE has quit IRC (Remote host closed the connection)
11:22 🔗 username1 has joined #archiveteam-bs
11:27 🔗 schbirid2 has quit IRC (Read error: Operation timed out)
11:48 🔗 schbirid2 has joined #archiveteam-bs
11:52 🔗 username1 has quit IRC (Read error: Operation timed out)
11:52 🔗 schbirid has joined #archiveteam-bs
11:54 🔗 schbirid2 has quit IRC (Read error: Operation timed out)
12:04 🔗 arkiver2 has joined #archiveteam-bs
12:05 🔗 arkiver2 has quit IRC (Client Quit)
12:10 🔗 HCross2 jtn2: it was taken over by a bunch of fucks who left it to die
12:16 🔗 GE has joined #archiveteam-bs
12:23 🔗 schbirid has quit IRC (Ping timeout: 255 seconds)
12:37 🔗 schbirid has joined #archiveteam-bs
12:41 🔗 PurpleSym The posts on http://home.gmane.org/ didn’t sound that bad, HCross2
12:43 🔗 PurpleSym DFJustin, Jonimus: Spread the ops.
12:50 🔗 VeganMars has quit IRC (Read error: No route to host)
12:50 🔗 VeganMar- has joined #archiveteam-bs
13:27 🔗 Ravenloft has joined #archiveteam-bs
13:35 🔗 Honno has joined #archiveteam-bs
13:59 🔗 schbirid has quit IRC (Ping timeout: 255 seconds)
14:11 🔗 schbirid has joined #archiveteam-bs
14:13 🔗 odemg has joined #archiveteam-bs
14:25 🔗 odemg has quit IRC (Remote host closed the connection)
14:36 🔗 icedice has joined #archiveteam-bs
15:04 🔗 schbirid has quit IRC (Ping timeout: 255 seconds)
15:16 🔗 schbirid has joined #archiveteam-bs
15:26 🔗 icedice2 has joined #archiveteam-bs
15:33 🔗 icedice has quit IRC (Read error: Operation timed out)
15:49 🔗 schbirid has quit IRC (Ping timeout: 255 seconds)
16:02 🔗 schbirid has joined #archiveteam-bs
16:06 🔗 Somebody2 zPlus: (if you read logs): I heard from someone at IA that it's very likely a known bug on their end, related to the version of Java they are using and interactions with SSL.
16:07 🔗 Somebody2 They are planning to fix it, and will see if there's a workaround in the meantime.
16:23 🔗 GE has quit IRC (Quit: zzz)
16:23 🔗 GE has joined #archiveteam-bs
16:24 🔗 GE has quit IRC (Client Quit)
16:24 🔗 GE has joined #archiveteam-bs
16:26 🔗 Boppen has quit IRC (Ping timeout: 194 seconds)
16:28 🔗 Boppen has joined #archiveteam-bs
16:41 🔗 fie has quit IRC (Ping timeout: 250 seconds)
16:51 🔗 schbirid has quit IRC (Ping timeout: 255 seconds)
16:59 🔗 icedice2 has quit IRC (Ping timeout: 245 seconds)
17:02 🔗 JAA has quit IRC (Quit: Page closed)
17:04 🔗 schbirid has joined #archiveteam-bs
17:12 🔗 icedice has joined #archiveteam-bs
17:14 🔗 fie has joined #archiveteam-bs
18:02 🔗 icedice Does Scaleway allow users to set root passwords, or is it just SSH keys that are supported?
18:03 🔗 icedice I need a root password in order to be able to log into Ajenti
18:06 🔗 bwn has quit IRC (Read error: Operation timed out)
18:11 🔗 HCross2 just ssh keys
18:11 🔗 HCross2 but once you've logged in with a key, you can turn the key off and use a password
18:11 🔗 leftyfb has joined #archiveteam-bs
18:20 🔗 bwn has joined #archiveteam-bs
18:26 🔗 schbirid has quit IRC (Ping timeout: 255 seconds)
18:33 🔗 joepie91 anything that requires you to enable password authentication seems like a poor choice of software to run on a server :)
18:33 🔗 joepie91 suggests they don't care about security very much...
18:33 🔗 icedice has quit IRC (Ping timeout: 245 seconds)
18:34 🔗 Kaz --autologin - Will automatically log in the user under which the panel runs. This is a security issue if your system is public.
18:35 🔗 Kaz from http://docs.ajenti.org/en/latest/man/run.html - yeah I don't think they're bothered about security if they give you an option like that
18:38 🔗 schbirid has joined #archiveteam-bs
19:01 🔗 Simpbrain has quit IRC (Read error: Operation timed out)
19:01 🔗 Simpbrain has joined #archiveteam-bs
19:13 🔗 Jonimus sets mode: +o xmc
19:17 🔗 Jonimus sets mode: +o SketchCow
19:17 🔗 Jonimus sets mode: +o joepie91
19:36 🔗 akaibu has joined #archiveteam-bs
19:41 🔗 akaibu Should we get in contact with Tom Fulp(founder/owner of newgrounds) and see if we could set something up so we can archive them? They are a major part of internet history and basically the springboard of today's animators
19:42 🔗 j08nY has quit IRC (Remote host closed the connection)
19:44 🔗 rocode akaibu, huge site, we can watch it, but doing an archive if they aren't going down anytime soon would be a waste of resources.
19:44 🔗 akaibu True
19:45 🔗 akaibu But maybe Jason can talk to archive.org and set something up directly between newgrounds and archive.org
19:47 🔗 akaibu Just a thought
19:50 🔗 ndiddy has joined #archiveteam-bs
20:23 🔗 GE has quit IRC (Remote host closed the connection)
20:56 🔗 schbirid <rocode> akaibu, huge site, we can watch it, but doing an archive if they aren't going down anytime soon would be a waste of resources.
20:57 🔗 schbirid i disagree vehemently
20:57 🔗 schbirid a site like newgrounds would be a perfect thing to archive
20:57 🔗 schbirid right now
20:57 🔗 schbirid without pressure
20:57 🔗 schbirid because it is important
21:00 🔗 rocode schbirid, I may have worded that a little strongly. I have no problem with us archiving it. But we do have projects that are 100% going down within the week that should take priority.
21:07 🔗 pnJay like mlkshk \o/
21:10 🔗 akaibu rocode: that's fine, we should prioritize projects that are on a time limit first then when time allows we work on stuff like newgrounds
21:12 🔗 ndiddy has quit IRC ()
21:13 🔗 akaibu Or small sites such as https://vbox7.com which is a somewhat small foreign Video sharing site that has quite a few YouTube rejections, I think maybe it could be a one person job but I haven't done any real work on it
21:14 🔗 akaibu Don't really know how stable the site is financially or otherwise, so it might be good to grab considering how old it is (mid-late 00's)
21:21 🔗 schbirid has quit IRC (Quit: Leaving)
21:34 🔗 iSO has joined #archiveteam-bs
21:37 🔗 JAA has joined #archiveteam-bs
21:38 🔗 iSO (On the topic of inaccessible URLs, talking about Appshopper): Half the time when you come across an app page, it's either 1) Listed, 2) Delisted, or 3) Listed, but upon clicking a link that sends you to their digital store, actually delisted.
21:39 🔗 iSO Fortunately, all the delisted app pages are still in Appshopper, they're just not accessible at all unless you used that URL (or made a bunch of guesses).
21:40 🔗 iSO On the flipside, the images older app pages are not rendering properly.
21:41 🔗 iSO *images on
21:43 🔗 GE has joined #archiveteam-bs
21:45 🔗 Honno has quit IRC (Ping timeout: 370 seconds)
21:46 🔗 iSO Example: http://appshopper.com/news/mess
21:46 🔗 iSO There are supposed to be 3 images, but there are not there.
21:48 🔗 JAA My WunderBlogs grab is at about 6k errors left, mostly external resources on dead services but also including about 20 actual blog pages which failed previously for some reason. I'm not sure I can prioritise those in any way, so I guess I'll just have to hope wpull gets to them in time.
21:49 🔗 JAA My concurrency settings aren't working properly unfortunately - I set it to 10 threads, but it only processes 1-2 URLs at once after the first 10 URLs - meaning it's very slow (about 400 URLs per hour).
21:50 🔗 iSO Appshopper appears to fetch the images directly from the digital store's app page (rather than copying and hosting them on its own servers), as right-clicking the broken images shows a different (older) image URL: "http://a1.phobos.apple.com" or "http://a1.phobos.apple.com/us/r1000/038/Purple/5f/04/81/mzl.vjhdzyui.320x480-75.jpg" (the first image).
21:51 🔗 iSO The current image URL link is: https://s2.mzstatic.com/us/r30/Purple111/v4/f8/31/4b/f8314b20-e46f-3dbb-56a6-1b8c687dbc42/screen696x696.jpeg (first image) from http://appshopper.com/games/crossbar-challenge-17
22:01 🔗 iSO What's more interesting on how Appshopper app pages are archived on their website is when clicking on app pages, the URL has a category, followed by a name (I'll talk this particular part later). The result is: http://appshopper.com/reference/the-night-sky
22:01 🔗 iSO Funny thing is that you can freely swap the category names! http://appshopper.com/games/the-night-sky
22:02 🔗 iSO The app page doesn't take the altered category name into account: to the right of the app page, it still says "Reference".
22:02 🔗 iSO Fortunately, it only works on valid category names, not http://appshopper.com/ujhgvhjhg/the-night-sky - as it will give you "AppShopper.com: Page Not Found".
22:18 🔗 iSO On ID naming scheme: When an app page is archived on Appshopper, to determine the page ID, Appshopper will scan the App name at the time it was found/released and apply it as a name. This link is an example of the ID name and the display name matched (a quirk is that it ignores capital letters): http://appshopper.com/games/gems
22:24 🔗 godane SketchCow: i got +63gb of local news archive
22:24 🔗 godane from a youtube channel called NewsActive3
22:25 🔗 iSO However, if there's another app named "Gems", or something extremely similar (as it ignores anything that's not alphabet letters, numbers, text in other languages [that is a LOT harder to manually dig around if you don't know the language well!], and bizarre characters like the Roman numeral 2 rather than just "II" [arguably even harder! Would probably need to automate to find very weird ID names!]), it adds a number.
22:26 🔗 iSO All 1,248 videos of it godane?
22:26 🔗 iSO Example of ID names w/ added number: http://appshopper.com/games/gems-2
22:27 🔗 iSO The ID name is permanent (which is a good thing for archiving)!
22:28 🔗 iSO It doesn't change at all when the app's display name is updated.
22:28 🔗 godane iSO: i'm still grabbing it
22:28 🔗 godane that size is about 500 videos
22:29 🔗 iSO Ah~
22:29 🔗 godane 516 to be exact so far
22:29 🔗 godane i will put it up to FOS cause it's very big
22:30 🔗 iSO If that app is submitted with that current name on release (and not as a changed name), the ID name would've been "the-gems-fever-and-the-miners-path-panic-retrogaming-8-bits-pixel-art" (ignores " ' ", and "!", replaces " " with "-")
22:34 🔗 iSO Due to the way they ID w/ numbers attached via archiving, you can easily change the # to find more entries: http://appshopper.com/games/gems-3 http://appshopper.com/games/gems-4 http://appshopper.com/games/gems-5 http://appshopper.com/games/gems-6 http://appshopper.com/games/gems-7 http://appshopper.com/games/gems-8 (This one is under Lifestyle, not Games)
22:34 🔗 iSO There is currently no http://appshopper.com/games/gems-9
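The ID naming scheme iSO describes above can be sketched in Python. This is only an approximation inferred from the observations in this log: the function names `slugify` and `candidate_urls` are hypothetical, the handling is limited to ASCII letters and digits (how Appshopper treats non-Latin scripts and unusual characters is not settled here), and the upper bound on the numeric suffix is a guess the caller supplies.

```python
import re
from typing import List

BASE = "http://appshopper.com"

def slugify(display_name: str) -> str:
    """Approximate the ID name: lowercase, drop punctuation, spaces become hyphens."""
    lowered = display_name.lower()
    # Assumption: only ASCII letters, digits, spaces and hyphens survive.
    kept = re.sub(r"[^a-z0-9 -]", "", lowered)
    # Collapse runs of spaces/hyphens into a single hyphen.
    return re.sub(r"[ -]+", "-", kept).strip("-")

def candidate_urls(category: str, display_name: str, max_suffix: int) -> List[str]:
    """List the unnumbered URL plus the -2 .. -max_suffix variants to try."""
    slug = slugify(display_name)
    urls = [f"{BASE}/{category}/{slug}"]
    urls += [f"{BASE}/{category}/{slug}-{n}" for n in range(2, max_suffix + 1)]
    return urls
```

Calling `candidate_urls("games", "Gems", 9)` would list the plain `gems` URL plus `gems-2` through `gems-9`. Since valid category names appear to be interchangeable in the URL, a single category may be enough for probing even when an entry is filed elsewhere.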
22:40 🔗 iSO Mentioning all this technical stuff about Appshopper app pages, a project idea I've had in mind for months is a perpetual (forever-ongoing, similar to URLTeam) project on apps and their constant updates (and maybe price changes [unsure how this part would be handled because the only info altered are just that, price changes]).
22:43 🔗 iSO Appshopper is the only website that fetches and archives iOS app entry data and is accessible to anyone, not behind a paywall.
22:43 🔗 iSO There's /so/ much crap to go through, particularly in the Games section! So, sooooo much.
22:44 🔗 iSO But there are cool hidden gems to be found~
22:55 🔗 JAA WunderBlogs is terrific at handling user input. In addition to the eternal .../www.facebook.com/username/www.facebook.com/username/... et al. loops I mentioned a few days ago, I just discovered a URL of over 36k characters containing what looks like the entire HTML markup for a blog post.
22:56 🔗 JAA I wouldn't be surprised if there were tons of opportunities for stored XSS attacks on that site.
23:00 🔗 zino has joined #archiveteam-bs
23:06 🔗 iSO Attempts to stop/hinder archiving is the answer.
23:07 🔗 iSO Like hindering archiving YouTube comments, there used to be an "All Comments" page for each YouTube video, and they yanked it off not long ago.
23:08 🔗 iSO Had to use archive.is - but even then it's partially archived.
23:11 🔗 JAA I doubt it. If they wanted to hinder archiving, I'm sure they'd implement some sort of rate limiting. I've been going at it with 10 threads for days without any issues.
23:12 🔗 iSO Appshopper seems to have a limiter, when I opened 15-20 links at once there.
23:12 🔗 JAA The loop is probably a result of users entering "www.facebook.com/username" into the relevant form field instead of "https://www..."; the 36k-character URL probably just shitty parsing.
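JAA's explanation of the loop can be reproduced with Python's standard `urllib.parse.urljoin`: a link without a scheme is resolved as a relative path, so each visit nests it one level deeper. The host `blog.example` and both helper names are hypothetical, and the repeat check is just one possible heuristic for spotting such loops, not something wpull is known to do.

```python
from urllib.parse import urljoin, urlsplit

def follow_schemeless_link(page_url: str, href: str) -> str:
    """Resolve href against the page URL.

    Assumption: pages are served at directory-style URLs (trailing slash),
    which matches the .../username/www.facebook.com/username/... pattern.
    """
    base = page_url if page_url.endswith("/") else page_url + "/"
    return urljoin(base, href)

def has_repeated_segments(url: str, max_repeats: int = 1) -> bool:
    """Flag URLs in which any path segment occurs more than max_repeats times."""
    segments = [s for s in urlsplit(url).path.split("/") if s]
    return any(segments.count(s) > max_repeats for s in set(segments))

href = "www.facebook.com/username"  # entered without "https://"
first = follow_schemeless_link("http://blog.example/member/", href)
second = follow_schemeless_link(first, href)
```

Here `first` stays on the blog's own host instead of pointing at Facebook, and `second` already contains the pair of segments twice, which is the point where a filter like `has_repeated_segments` would start rejecting URLs.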
23:12 🔗 iSO After those links were processed, new links you open up just hang temporarily.
23:14 🔗 iSO I'm not sure what else I can do on my proposed project besides more technical info from experience using the Appshopper website, I can only manually archive pages w/ Archive.org.
23:15 🔗 iSO Even with an extension that lets me open multiple links at once.
23:16 🔗 BlueMaxim has joined #archiveteam-bs
23:16 🔗 JAA Any idea how large it is?
23:16 🔗 iSO Appshopper?
23:16 🔗 JAA Yeah
23:16 🔗 iSO Unsure~
23:17 🔗 iSO I guess let's take the total apps approved, which is "3,660,443", and multiply that by 10x (very very generous).
23:17 🔗 iSO I say between 2x to 5x.
23:17 🔗 iSO If wanting to find those dumb inaccessible app pages.
23:18 🔗 iSO Perpetually more if apps update.
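As a rough worked version of iSO's estimate, the multipliers can be applied to the quoted app count. The figure 3,660,443 comes from the log; the multipliers are iSO's guesses, and the function name is made up for illustration.

```python
APPROVED_APPS = 3_660_443  # "total apps approved" figure quoted above

def estimated_pages(multiplier: int) -> int:
    """Scale the app count by a guessed pages-per-app multiplier."""
    return APPROVED_APPS * multiplier

for m in (2, 5, 10):
    print(f"{m}x: {estimated_pages(m):,} pages")
```

That puts the guess at roughly 7.3 to 18.3 million pages, or about 36.6 million at the deliberately generous 10x.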
23:19 🔗 iSO As far as I know, there's no way to access previous versions of the app pages at all.
23:22 🔗 iSO I want to because I wanna know how apps and their damn app descriptions/images have evolved since then.
23:25 🔗 Aranje has joined #archiveteam-bs
23:25 🔗 iSO It didn't use to have artwork overlaid with gameplay screenshots, or even visual text!
23:27 🔗 JAA I think your best bet for that is checking the IA Wayback Machine for both Appshopper and the iTunes "preview" pages. Not sure if the latter include images though.
23:30 🔗 iSO I tried checking it starting with Angry Birds, coverage on images appears limited.
23:30 🔗 iSO It starts getting better in later entries.
23:31 🔗 iSO But it's still random.
23:33 🔗 JAA Yeah, that's often the case, especially for pages which weren't archived through a browser.
23:33 🔗 iSO I have other archive project ideas, but that probably requires specialized tools or something that doesn't use Archive Warrior.
23:34 🔗 iSO I know there were attempts where, when archiving pages, 1 or 2 images/GIFs were not archived, but sometimes it's just all of them.
23:35 🔗 JAA Regarding the future, it shouldn't be too difficult to write a script which scrapes http://appshopper.com/all/ and archives the corresponding pages continuously (directly, not through the Wayback Machine).
23:35 🔗 iSO That has a page limit of 500.
23:36 🔗 iSO Can be narrowed further with only new http://appshopper.com/all/new
23:36 🔗 iSO Or price changes: http://appshopper.com/all/prices
23:37 🔗 iSO Thing is, those types of pages, that attempt to encompass all the pages they archived, that's only partial.
23:37 🔗 JAA Yeah, but if you do it continuously, that's not really a problem. You'd just grab it often enough.
23:38 🔗 iSO You have to grab links from the categories, that's where truly all of the recorded entries of that specific entry are there.
23:38 🔗 JAA Yeah, the initial grab would need to work differently. I was just talking about keeping up with updates.
23:38 🔗 iSO That's also because there's a delay on how long it'll appear on Appshopper after the app shows up in the digital store.
23:39 🔗 iSO Ah~
23:39 🔗 JAA Also, I can definitely go further than 500 on /all. http://appshopper.com/all/50 for example.
23:39 🔗 JAA (Which should show items 981 to 1000)
23:40 🔗 JAA Ah wait, you meant 500 pages. Nevermind.
23:40 🔗 iSO Yes, that's the max sadly.
23:40 🔗 iSO Past that point, it's considered gone forever, unless there's a price change, or an update.
23:41 🔗 JAA As of right now, the 10000th entry (last on page 500) was updated 3 days ago. So fetching /all every day would easily be enough.
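The paging arithmetic here can be checked with a few lines of Python. The page size of 20 is inferred from JAA's remark that page 50 shows items 981 to 1000; the function names are hypothetical, and whether the first page is also reachable as `/all/1` is an assumption.

```python
from typing import Tuple

BASE = "http://appshopper.com"
ITEMS_PER_PAGE = 20  # inferred: page 50 ends at item 1000
MAX_PAGE = 500       # listing limit mentioned by iSO

def item_range(page: int) -> Tuple[int, int]:
    """Return the (first, last) item positions shown on a listing page."""
    first = (page - 1) * ITEMS_PER_PAGE + 1
    return first, page * ITEMS_PER_PAGE

def listing_url(page: int) -> str:
    """Build the /all listing URL for a page within the 500-page limit."""
    if not 1 <= page <= MAX_PAGE:
        raise ValueError(f"page must be between 1 and {MAX_PAGE}")
    return f"{BASE}/all/{page}"
```

With these numbers the listing exposes at most 500 × 20 = 10,000 entries, so a periodic fetch only keeps up if fewer than 10,000 entries change between runs.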
23:41 🔗 iSO Unless you search it on their website.
23:41 🔗 iSO If you fetch all, there are going to be constant duplicates, and that's a waste of time and energy.
23:42 🔗 iSO I would recommend doing new if you want to find the new ones.
23:42 🔗 iSO "All" encompasses new apps, updates, and price changes (doesn't show delistings at all).
23:43 🔗 JAA Oh. Yeah, I guess delistings would be a bit more complicated
23:43 🔗 JAA Maybe a periodic grab of the entire site would be easier then.
23:43 🔗 iSO When searching in Appshopper with keywords, it can only grab up to 50 pages.
23:43 🔗 JAA Although that might obviously miss some updates etc.
23:43 🔗 iSO Every 2 days I would be good.
23:43 🔗 iSO *it would
23:44 🔗 iSO Of course, in my POV, I'm talking about the Games category.
23:44 🔗 JAA You mean every app page every two days?
23:44 🔗 iSO Uh, I was thinking when checking the app list.
23:45 🔗 iSO Every app? That would be overkill~
23:45 🔗 iSO Well, then again, there are actually updates that are 1 day apart.
23:45 🔗 iSO I don't remember if it applies on the same day.
23:46 🔗 JAA Yeah, the website isn't very verbose about how the whole thing is working.
23:46 🔗 JAA I also wasn't able to find any ToS.
23:47 🔗 iSO Not even at the bottom of the page?
23:47 🔗 iSO Ugh, I still hate this new layout Appshopper put out, it's slower.
23:47 🔗 iSO Slower for me to sift info.
23:48 🔗 iSO I have to manually click "Show More" to see the full description, and I have to wait for the animation to scroll in-between images.
23:48 🔗 JAA Nope, very minimal "about" page, a contact link, some links for copyright owners wanting to whine, but no ToS.
23:48 🔗 iSO Just damn slow when comparing to the previous layout.
23:49 🔗 bwn iSO: this site doesn't seem particularly large, you could possibly run it through grab-site or wpull and get a good initial grab?
23:49 🔗 iSO Wait, 3 million pages isn't considered large?
23:49 🔗 JAA It would probably need a nudge or two because the category pages are hidden behind a <select>.
23:50 🔗 JAA Well, I'd say it's large but not huge.
23:50 🔗 iSO Ah.
23:52 🔗 bwn JAA: hm, grab-site sees the category pages
23:53 🔗 iSO Sigh, all the crappy iOS games, not the F2P ones, that's another league of questionable, it's those games that just change the graphics and change nothing else type of deals.
23:53 🔗 bwn iSO: both grab-site and wpull will take lists of urls for your delisted apps, you'd just need to assemble them
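Assembling such a list could be as simple as the sketch below, which deduplicates URLs and renders them one per line. The function names and any output filename are hypothetical; how the file is then handed to grab-site or wpull should be checked against each tool's own documentation.

```python
from typing import Iterable

def render_url_list(urls: Iterable[str]) -> str:
    """Return unique, non-empty URLs one per line, in first-seen order."""
    unique = dict.fromkeys(u.strip() for u in urls if u.strip())
    return "".join(f"{u}\n" for u in unique)

def write_url_list(urls: Iterable[str], path: str) -> None:
    """Write the rendered list to a plain-text file for a crawler to read."""
    with open(path, "w", encoding="utf-8") as fh:
        fh.write(render_url_list(urls))
```

Keeping first-seen order means the pages collected earliest are also attempted first, which may matter if the site starts throttling partway through.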
23:54 🔗 iSO Another monkey wrench on delisted apps is that there are some of them that do come back, but I'm not sure if that's recorded by Appshopper (unless an update counts).
23:54 🔗 iSO Coming back in the same ID name.
23:55 🔗 iSO Or a price change counts as well.
23:55 🔗 JAA bwn: Interesting. Any clue how it finds them? I don't see a link anywhere.
23:56 🔗 iSO It's probably fetching directly from the digital store website.
23:56 🔗 iSO Hang on,
23:56 🔗 iSO I think I recall a full list on the Apple App Store.
23:58 🔗 iSO I know it's there, I came across it accidentally a couple of times.
