#archiveteam-bs 2020-02-27,Thu


Time Nickname Message
00:03 🔗 BlueMax has joined #archiveteam-bs
00:16 🔗 RichardG has quit IRC (Quit: Keyboard not found, press F1 to continue)
00:20 🔗 RichardG has joined #archiveteam-bs
01:17 🔗 alex73__ has joined #archiveteam-bs
01:29 🔗 JAA SketchCow: Any rough idea when FOS might come back? More like hours or more like days? (Or in other words, do I need to try to find another temporary solution for ArchiveBot or not?)
01:30 🔗 Craigle has quit IRC (se.hub efnet.portlane.se)
01:30 🔗 alex73_ has quit IRC (se.hub efnet.portlane.se)
01:30 🔗 sHATNER has quit IRC (se.hub efnet.portlane.se)
01:30 🔗 Laverne has quit IRC (se.hub efnet.portlane.se)
01:30 🔗 Gfy has quit IRC (se.hub efnet.portlane.se)
01:30 🔗 brayden has quit IRC (se.hub efnet.portlane.se)
01:42 🔗 SketchCow Eeerrp
01:42 🔗 sHATNER_ has joined #archiveteam-bs
01:42 🔗 SketchCow No idea, someone is going to the datacenter now
01:46 🔗 JAA Right, figured as much. Ah well, we'll see.
01:49 🔗 SketchCow This is 30 machines
01:49 🔗 SketchCow I guarantee there's big incentive to get them back up
01:51 🔗 JAA Eww, yeah. I guess FOS isn't at the top of the priority list though.
01:51 🔗 SketchCow If one goes up they'll all go up, it's a rack.
01:51 🔗 JAA Ah, ok.
01:52 🔗 SketchCow People made all this guffy noise about making FOS a redundant component in a network of ArchiveBot uploaders, what happened there?
01:52 🔗 JAA Yep, that's in progress.
01:56 🔗 icedice2 has joined #archiveteam-bs
01:56 🔗 paul2520 has quit IRC (Read error: Operation timed out)
01:56 🔗 Wingy has quit IRC (Read error: Operation timed out)
01:56 🔗 Lord_Nigh has quit IRC (Read error: Operation timed out)
01:56 🔗 fuzzy802 has joined #archiveteam-bs
01:56 🔗 alex73_ has joined #archiveteam-bs
01:57 🔗 twigfoot has quit IRC (Read error: Operation timed out)
01:57 🔗 twigfoot has joined #archiveteam-bs
01:58 🔗 ranma has joined #archiveteam-bs
01:58 🔗 paul2520 has joined #archiveteam-bs
01:58 🔗 Raccoon` has joined #archiveteam-bs
01:58 🔗 Wingy has joined #archiveteam-bs
01:58 🔗 alex73__ has quit IRC (Read error: Connection reset by peer)
01:58 🔗 Laverne has joined #archiveteam-bs
01:58 🔗 Lord_Nigh has joined #archiveteam-bs
01:58 🔗 fuzzy8021 has quit IRC (Read error: Operation timed out)
01:58 🔗 Raccoon has quit IRC (Read error: Connection reset by peer)
01:58 🔗 Raccoon` is now known as Raccoon
01:58 🔗 icedice has quit IRC (Read error: Connection reset by peer)
01:58 🔗 LowLevelM has quit IRC (Read error: Operation timed out)
01:59 🔗 systwi_ has joined #archiveteam-bs
02:00 🔗 Gfy has joined #archiveteam-bs
02:01 🔗 NIC007a83 has quit IRC (Read error: Operation timed out)
02:01 🔗 NIC007a83 has joined #archiveteam-bs
02:02 🔗 ranma_ has quit IRC (Read error: Operation timed out)
02:03 🔗 systwi has quit IRC (Read error: Operation timed out)
02:03 🔗 davis1 has joined #archiveteam-bs
02:06 🔗 fuzzy802 is now known as fuzzy8021
02:07 🔗 colona has quit IRC (Read error: Operation timed out)
02:07 🔗 colona has joined #archiveteam-bs
02:07 🔗 yano_ has joined #archiveteam-bs
02:08 🔗 obskyr has quit IRC (Read error: Operation timed out)
02:09 🔗 obskyr has joined #archiveteam-bs
02:09 🔗 yano has quit IRC (Read error: Operation timed out)
02:12 🔗 systwi_ has quit IRC (Read error: Connection reset by peer)
02:12 🔗 RichardG has quit IRC (ircd.choopa.net irc.mzima.net)
02:12 🔗 Ryz has quit IRC (ircd.choopa.net irc.mzima.net)
02:12 🔗 girst has quit IRC (ircd.choopa.net irc.mzima.net)
02:12 🔗 Terbium has quit IRC (ircd.choopa.net irc.mzima.net)
02:12 🔗 cppchrisc has quit IRC (ircd.choopa.net irc.mzima.net)
02:12 🔗 closure_ has quit IRC (ircd.choopa.net irc.mzima.net)
02:12 🔗 tonsofpcs has quit IRC (ircd.choopa.net irc.mzima.net)
02:12 🔗 nyany_ has quit IRC (ircd.choopa.net irc.mzima.net)
02:12 🔗 svchfoo3 has quit IRC (ircd.choopa.net irc.mzima.net)
02:12 🔗 nico_32 has quit IRC (ircd.choopa.net irc.mzima.net)
02:12 🔗 Datechnom has quit IRC (ircd.choopa.net irc.mzima.net)
02:12 🔗 twigfoot has quit IRC (Read error: Operation timed out)
02:13 🔗 twigfoot has joined #archiveteam-bs
02:13 🔗 systwi has joined #archiveteam-bs
02:15 🔗 atomicthu has quit IRC (Ping timeout: 610 seconds)
02:16 🔗 halt has quit IRC (Ping timeout: 610 seconds)
02:17 🔗 halt has joined #archiveteam-bs
02:17 🔗 atomicthu has joined #archiveteam-bs
02:22 🔗 RichardG has joined #archiveteam-bs
02:22 🔗 Ryz has joined #archiveteam-bs
02:22 🔗 girst has joined #archiveteam-bs
02:22 🔗 Terbium has joined #archiveteam-bs
02:22 🔗 cppchrisc has joined #archiveteam-bs
02:22 🔗 closure_ has joined #archiveteam-bs
02:22 🔗 tonsofpcs has joined #archiveteam-bs
02:22 🔗 nyany_ has joined #archiveteam-bs
02:22 🔗 svchfoo3 has joined #archiveteam-bs
02:22 🔗 nico_32 has joined #archiveteam-bs
02:22 🔗 Datechnom has joined #archiveteam-bs
02:22 🔗 irc.mzima.net sets mode: +o svchfoo3
02:27 🔗 LordNigh2 has joined #archiveteam-bs
02:28 🔗 SketchCow FOS still down, no updates
02:33 🔗 davis1 has quit IRC (se.hub efnet.portlane.se)
02:33 🔗 Gfy has quit IRC (se.hub efnet.portlane.se)
02:33 🔗 Lord_Nigh has quit IRC (se.hub efnet.portlane.se)
02:33 🔗 Laverne has quit IRC (se.hub efnet.portlane.se)
02:45 🔗 SketchCow "update: he's on-site and after resetting the breaker has found some rack pdu problems that require running cables to nearby racks"
02:48 🔗 LordNigh2 is now known as Lord_Nigh
02:56 🔗 SketchCow "starting to power up the hosts"
02:58 🔗 logchfoo1 starts logging #archiveteam-bs at Thu Feb 27 02:58:35 2020
02:58 🔗 logchfoo1 has joined #archiveteam-bs
02:58 🔗 thuban3 has quit IRC (Ping timeout: 255 seconds)
02:58 🔗 phirephly has quit IRC (Read error: Operation timed out)
02:58 🔗 scorche has joined #archiveteam-bs
02:59 🔗 phirephly has joined #archiveteam-bs
02:59 🔗 thuban3 has joined #archiveteam-bs
03:01 🔗 achip has quit IRC (Ping timeout: 255 seconds)
03:06 🔗 VADemon_ has joined #archiveteam-bs
03:06 🔗 yano has joined #archiveteam-bs
03:06 🔗 scorche has quit IRC (Ping timeout: 255 seconds)
03:07 🔗 ndiddy_ has quit IRC (Ping timeout: 255 seconds)
03:07 🔗 SketchCo1 has joined #archiveteam-bs
03:08 🔗 ndiddy2 has joined #archiveteam-bs
03:10 🔗 SketchCo1 FOS is back up.
03:11 🔗 yano_ has quit IRC (hub.efnet.us irc.Prison.NET)
03:11 🔗 Mateon1 has quit IRC (hub.efnet.us irc.Prison.NET)
03:11 🔗 SketchCow has quit IRC (hub.efnet.us irc.Prison.NET)
03:11 🔗 VADemon has quit IRC (hub.efnet.us irc.Prison.NET)
03:11 🔗 superkuh has quit IRC (hub.efnet.us irc.Prison.NET)
03:11 🔗 mundus201 has quit IRC (hub.efnet.us irc.Prison.NET)
03:11 🔗 Somebody2 has quit IRC (hub.efnet.us irc.Prison.NET)
03:11 🔗 JAA :-)
03:11 🔗 mundus20- has joined #archiveteam-bs
03:12 🔗 superkuh_ has joined #archiveteam-bs
03:26 🔗 SketchCo1 And this will change, VERY soon, but I bet the net connection is zippy
03:26 🔗 SketchCo1 is now known as SketchCow
03:29 🔗 Larsenv What's FOS?
03:29 🔗 Larsenv is it a pipeline?
03:30 🔗 fredgido_ has joined #archiveteam-bs
03:30 🔗 robogoat has quit IRC (Read error: Operation timed out)
03:30 🔗 MRX3 has joined #archiveteam-bs
03:30 🔗 fuzzy802 has joined #archiveteam-bs
03:30 🔗 JAA has quit IRC (Read error: Operation timed out)
03:30 🔗 antomatic has joined #archiveteam-bs
03:30 🔗 Mateon1 has joined #archiveteam-bs
03:30 🔗 Smiley has joined #archiveteam-bs
03:31 🔗 Selavi has quit IRC (Read error: Operation timed out)
03:31 🔗 Stiletto has joined #archiveteam-bs
03:31 🔗 Zebranky has quit IRC (Read error: Operation timed out)
03:31 🔗 fuzzy8021 has quit IRC (Read error: Operation timed out)
03:31 🔗 tapedrive has quit IRC (Read error: Operation timed out)
03:31 🔗 sknebel has quit IRC (Read error: Operation timed out)
03:31 🔗 luckcolor has quit IRC (Read error: Operation timed out)
03:31 🔗 Zebranky has joined #archiveteam-bs
03:31 🔗 yano has quit IRC (Read error: Operation timed out)
03:31 🔗 SmileyG has quit IRC (Read error: Operation timed out)
03:31 🔗 tapedrive has joined #archiveteam-bs
03:31 🔗 antomati_ has quit IRC (Read error: Operation timed out)
03:31 🔗 AlsoJAA Larsenv: https://www.archiveteam.org/index.php?title=Fortress_of_Solitude
03:32 🔗 Larsenv interesting
03:32 🔗 mistym has quit IRC (Ping timeout: 260 seconds)
03:32 🔗 cf has quit IRC (Ping timeout: 260 seconds)
03:32 🔗 simon816 has quit IRC (Ping timeout: 260 seconds)
03:32 🔗 kiska18 has quit IRC (Read error: Operation timed out)
03:32 🔗 Stilett0 has quit IRC (Read error: Operation timed out)
03:32 🔗 SynMonger has quit IRC (Ping timeout: 260 seconds)
03:32 🔗 jodizzle has quit IRC (Ping timeout: 260 seconds)
03:32 🔗 benjinsmi has joined #archiveteam-bs
03:32 🔗 eythian has quit IRC (Read error: Operation timed out)
03:33 🔗 cf has joined #archiveteam-bs
03:33 🔗 icedice2 has quit IRC (Ping timeout: 260 seconds)
03:33 🔗 d5f4a3622 has quit IRC (Ping timeout: 260 seconds)
03:33 🔗 benjinss has quit IRC (Read error: Operation timed out)
03:33 🔗 SynMonger has joined #archiveteam-bs
03:34 🔗 sknebel has joined #archiveteam-bs
03:36 🔗 fredgido has quit IRC (Ping timeout: 492 seconds)
03:37 🔗 mistym has joined #archiveteam-bs
03:37 🔗 robogoat has joined #archiveteam-bs
03:38 🔗 Gfy has joined #archiveteam-bs
03:38 🔗 Somebody2 has joined #archiveteam-bs
03:38 🔗 irc.Prison.NET sets mode: +o Somebody2
03:40 🔗 jodizzle has joined #archiveteam-bs
03:40 🔗 fuzzy802 is now known as fuzzy8021
03:45 🔗 eythian has joined #archiveteam-bs
03:45 🔗 Somebody2 has quit IRC (hub.efnet.us irc.Prison.NET)
03:48 🔗 NIC007a83 has quit IRC (Ping timeout: 276 seconds)
03:49 🔗 NIC007a83 has joined #archiveteam-bs
03:54 🔗 scorche has joined #archiveteam-bs
03:54 🔗 Somebody2 has joined #archiveteam-bs
03:54 🔗 irc.Prison.NET sets mode: +o Somebody2
03:54 🔗 achip has joined #archiveteam-bs
03:54 🔗 sknebel_ has joined #archiveteam-bs
03:57 🔗 fuzzy8021 has quit IRC (Read error: Connection reset by peer)
03:58 🔗 ndiddy2 has quit IRC (Read error: Operation timed out)
03:58 🔗 paul2520 has quit IRC (Read error: Operation timed out)
03:58 🔗 balrog has quit IRC (Read error: Operation timed out)
03:58 🔗 jake_test has quit IRC (Read error: Operation timed out)
03:58 🔗 twigfoot has quit IRC (Read error: Operation timed out)
03:58 🔗 fuzzy8021 has joined #archiveteam-bs
03:58 🔗 twigfoot has joined #archiveteam-bs
03:58 🔗 sknebel has quit IRC (Ping timeout: 610 seconds)
03:59 🔗 ndiddy2 has joined #archiveteam-bs
03:59 🔗 kiska has quit IRC (Read error: Operation timed out)
03:59 🔗 Coderjo_ has quit IRC (Read error: Operation timed out)
03:59 🔗 chfoo has quit IRC (Read error: Operation timed out)
04:00 🔗 chfoo has joined #archiveteam-bs
16:39 🔗 logchfoo0 starts logging #archiveteam-bs at Thu Feb 27 16:39:22 2020
16:39 🔗 logchfoo0 has joined #archiveteam-bs
16:59 🔗 atphoenix 11k results per https://www.google.com/search?client=firefox-b-1-d&q=+site:temple.edu+astro.temple.edu . Examples: https://astro.temple.edu/~smatsika/ https://astro.temple.edu/~tue52586/
17:02 🔗 Ryz I've tried finding a directory in regards to https://astro.temple.edu/ to speed this up tremendously; nothing yet :c
17:04 🔗 Ryz atphoenix: Like this one that finally came back after 12-13 months, logchfoo0
17:04 🔗 Ryz *12-13 hours
17:09 🔗 atphoenix another way to find astro pages: https://search.temple.edu/search.php?searchString=astro
17:09 🔗 atphoenix or https://search.temple.edu/search.php?searchString=astro.temple.edu
17:10 🔗 atphoenix that shouldn't be subject to the same constraints as trying to scrape from a Google search
17:10 🔗 Ryz No, not really, it has a pagination limit of 100 pages when I pushed it hard enough :c
17:11 🔗 Ryz It's closer than Google is, though
17:13 🔗 Ryz Here's page 100 - https://search.temple.edu/search.php?searchStart=990&searchString=astro.temple.edu - and page 101 - https://search.temple.edu/search.php?searchStart=1000&searchString=astro.temple.edu - and stuff going far like https://search.temple.edu/search.php?searchStart=10000&searchString=astro.temple.edu is still the same page 101
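[Editor's note] The pagination described above can be sketched as follows. This is only an illustration: the step of 10 results per page is inferred from the quoted URLs (page 100 at searchStart=990, page 101 at searchStart=1000), page 1 is assumed to start at searchStart=0, and the names `search_page_urls`, `RESULTS_PER_PAGE`, and `MAX_PAGE` are made up for this sketch. It only builds URLs and sends no requests.

```python
from urllib.parse import urlencode

BASE = "https://search.temple.edu/search.php"
RESULTS_PER_PAGE = 10  # inferred: page 100 -> searchStart=990, page 101 -> 1000
MAX_PAGE = 100         # pages past this reportedly repeat the same results


def search_page_urls(query, max_page=MAX_PAGE):
    """Build the result-page URLs for one query, up to the observed page cap."""
    urls = []
    for page in range(1, max_page + 1):
        params = {
            "searchStart": (page - 1) * RESULTS_PER_PAGE,  # assumes page 1 is offset 0
            "searchString": query,
        }
        urls.append(f"{BASE}?{urlencode(params)}")
    return urls


if __name__ == "__main__":
    urls = search_page_urls("astro.temple.edu")
    print(len(urls), urls[-1])
```

Because each query is capped at 100 pages, coverage would have to come from running many different queries rather than from paging deeper into a single one.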
17:14 🔗 Ryz Mm, search results being capped like that sucks...
17:19 🔗 thuban3 are there non-user pages? or could the usernames simply be brute-forced?
17:23 🔗 atphoenix this is a tighter search: https://search.temple.edu/search.php?searchString=astro.temple.edu%2F%7E
17:23 🔗 atphoenix added the tilde
17:24 🔗 atphoenix says total results 21k
17:27 🔗 atphoenix goes from page 97 of 2180 here https://search.temple.edu/search.php?searchStart=960&searchString=astro.temple.edu%2F to 98 out of 100 afterwards
17:31 🔗 Ryz Unsure if helpful, but here's the process of how a webpage is uploaded and updated for those users: https://its.temple.edu/uploading-your-dreamweaver-files-astro-web-server
17:31 🔗 atphoenix may also be able to find more by trying stuff like adding letters to the search, just to mix things up. It's not really wildcarding though : astro.temple.edu/z
17:33 🔗 atphoenix https://search.temple.edu/search.php?searchString=astro.temple.edu+class
17:35 🔗 atphoenix change class to course, professor, letters a-z, other relevant keywords etc. as well. Guessing that doing those combinations over the 100 available search results pages should provide decent coverage.
17:43 🔗 DLoader_ has joined #archiveteam-bs
17:54 🔗 DLoader has quit IRC (Ping timeout: 745 seconds)
17:54 🔗 DLoader_ is now known as DLoader
18:06 🔗 NIC007a83 has quit IRC (Ping timeout: 615 seconds)
18:13 🔗 systwi has quit IRC (Give me your HAND, and I'll help you across.)
18:13 🔗 systwi has joined #archiveteam-bs
18:58 🔗 thuban3 JAA what are your thoughts on trying to be smart as opposed to brute-forcing homedirs
18:59 🔗 thuban3 hit rate would be _low_ but with six months do we care?
19:02 🔗 NIC007a83 has joined #archiveteam-bs
19:17 🔗 atphoenix thuban3, from the homedir names I have seen, I do not see a regular pattern. I think it would be better to scrape the temple search page, and scrape Google, using multiple search parameters
19:18 🔗 atphoenix (not necessarily just Google. Scrape from any/all search engines.)
19:21 🔗 atphoenix also, with a potentially very sparse search space, BF would result in many 404s in their logs
19:24 🔗 NIC007a83 has quit IRC (Ping timeout: 745 seconds)
19:26 🔗 thuban3 none appear to be long, the pattern is restricted ([a-z]+[0-9]*), and brute force covers what searches, especially truncated searches, can easily miss
19:26 🔗 thuban3 action on their end is the only thing i would worry about, but JAA would know more about that
19:37 🔗 NIC007a83 has joined #archiveteam-bs
20:18 🔗 atphoenix 8 chars may be the limit. This professor's page is shorter than his actual name, probably to fit into 8 chars. https://astro.temple.edu/~rosenthl/
20:28 🔗 atphoenix With 36 possible chars per position, and 8 positions ... 36^8 = 2,821,109,907,456 combos. Assuming 6 months -> would need to check ~182k per second for 6 months to exhaust the namespace.
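[Editor's note] The arithmetic above can be checked with a short sketch. It assumes a 36-character alphabet (a-z plus 0-9) and treats six months as 180 days; both constants and all function names are illustrative choices, not anything stated by the channel beyond the figures quoted.

```python
ALPHABET_SIZE = 36                      # a-z plus 0-9, as in the message above
NAME_LEN = 8                            # suspected homedir length limit
SIX_MONTHS_S = 180 * 24 * 60 * 60       # assumption: six months taken as 180 days


def namespace_size(alphabet, length):
    """Number of names of exactly `length` characters."""
    return alphabet ** length


def namespace_size_up_to(alphabet, max_len):
    """Number of names of length 1 through `max_len`."""
    return sum(alphabet ** n for n in range(1, max_len + 1))


def required_rate(total, seconds):
    """Checks per second needed to cover `total` names in `seconds`."""
    return total / seconds


if __name__ == "__main__":
    total = namespace_size(ALPHABET_SIZE, NAME_LEN)
    print(f"{total:,} names of exactly {NAME_LEN} chars")
    print(f"{required_rate(total, SIX_MONTHS_S):,.0f} checks per second")
```

With these assumptions the rate comes out close to the ~182k per second quoted; counting shorter names as well raises the total slightly.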
20:28 🔗 OrIdow6 atphoenix: CDX server supports you on how many there are (curl "https://web.archive.org/cdx/search?url=astro.temple.edu/~*&collapse=urlkey" | grep --invert-match " 404 " | cut -d" " -f1 | cut -d"/" -f2 | grep -E --only-matching "~[a-zA-Z0-9_\-]{9,}")
20:29 🔗 OrIdow6 Also, the CDX server gives 3660 unique names, if that's useful for discovery
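[Editor's note] A rough Python equivalent of the username extraction in the pipeline above, for anyone who would rather post-process a saved CDX dump. It assumes the default space-separated CDX field order (urlkey, timestamp, original, mimetype, statuscode, digest, length) and the SURT-style urlkey form `edu,temple,astro)/~name/...`. The sample lines are made up to show the shape of the data (placeholder digest and length, and a hypothetical `exampleuser`); they are not real CDX output.

```python
import re

# Matches the homedir name at the start of a SURT-style urlkey.
USER_RE = re.compile(r"^edu,temple,astro\)/~([a-z0-9_-]+)")


def unique_usernames(cdx_lines):
    """Return sorted unique homedir names, skipping captures with status 404."""
    names = set()
    for line in cdx_lines:
        fields = line.split(" ")
        if len(fields) < 5 or fields[4] == "404":
            continue
        match = USER_RE.match(fields[0])
        if match:
            names.add(match.group(1))
    return sorted(names)


def longer_than(names, limit=8):
    """Names exceeding `limit` characters, to test the suspected 8-char cap."""
    return [n for n in names if len(n) > limit]


# Illustrative lines only: digest and length columns are placeholders.
SAMPLE = [
    "edu,temple,astro)/~boyer 20050209174031 https://astro.temple.edu/~boyer/ text/html 200 DIGEST 0",
    "edu,temple,astro)/~rosenthl/index.html 20050209174031 https://astro.temple.edu/~rosenthl/index.html text/html 200 DIGEST 0",
    "edu,temple,astro)/~exampleuser 20050209174031 https://astro.temple.edu/~exampleuser/ text/html 404 DIGEST 0",
]

if __name__ == "__main__":
    names = unique_usernames(SAMPLE)
    print(names, longer_than(names))
```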
20:30 🔗 atphoenix OrIdow6, I guess asking archive.org is another way to discover additional names.
20:32 🔗 atphoenix A variety of directory searches using possibly relevant keywords (or even just top dictionary words), combined with results from various sources like archive.org known pages should give good coverage.
20:44 🔗 BlueMax has joined #archiveteam-bs
21:25 🔗 BlueMax has quit IRC (Read error: Connection reset by peer)
21:34 🔗 JAA Yeah, doesn't really seem bruteforcable. We'd need a good list of candidates. Most combinations are very unlikely to exist.
21:35 🔗 JAA Regarding scraping search engines, the only one that lets you do that at a reasonable speed is Bing. All others I've tried ban you very quickly.
21:35 🔗 JAA https://github.com/JustAnotherArchivist/little-things/blob/master/bing-scrape
21:46 🔗 atphoenix I found a google scraping service a few weeks back. I think they use some kind of MIPS solution to address Google's limits, and also to deal with regionalization.
21:46 🔗 atphoenix (not free)
21:51 🔗 atphoenix As for temple, there are 4 char users in the directories. It's only 1.7m combos for up to 4 chars. Even 5 chars is plausible at 60m combos. A partial brute may be plausible as a supplement to other means of identifying users. But due to the noise it is probably best to exhaust and gather everything possible using other means first.
21:52 🔗 JAA 60 M is easy.
22:08 🔗 thuban3 you can probably stretch it further as i believe numbers can be found only at the end
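[Editor's note] To see how much the "digits only at the end" observation shrinks the search space, here is a small sketch. It assumes names follow the `[a-z]+[0-9]*` pattern mentioned earlier in the log (at least one letter, then zero or more digits); that pattern is a guess from observed names, and the function names are invented for this example.

```python
import itertools
import string


def count_candidates(length, letters=26, digits=10):
    """Count names of exactly `length` chars: k >= 1 letters, then digits."""
    return sum(letters ** k * digits ** (length - k) for k in range(1, length + 1))


def count_candidates_up_to(max_len):
    """Count names of length 1 through `max_len` under the same pattern."""
    return sum(count_candidates(n) for n in range(1, max_len + 1))


def candidates(length):
    """Yield every name of exactly `length` chars matching [a-z]+[0-9]*."""
    for k in range(1, length + 1):
        for head in itertools.product(string.ascii_lowercase, repeat=k):
            for tail in itertools.product(string.digits, repeat=length - k):
                yield "".join(head + tail)


if __name__ == "__main__":
    for n in range(1, 9):
        print(n, f"{count_candidates(n):,}", f"vs unrestricted {36 ** n:,}")
```

The generator is only practical for short lengths; for longer names the counts alone indicate whether enumeration is worth attempting.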
23:24 🔗 thuban3 (another option might be to find someone with access to the directory (https://directory.temple.edu/) / alumni directory (https://www.alumni.temple.edu/s/705/index.aspx?sid=705&gid=1&pgid=6&cid=41))
23:26 🔗 thuban3 (some of the homedir names seem consistent with the stated alumnus username format. it seems likely that some pages are up under the original usernames of people who are no longer with the university, but i have no idea whether either directory would list them)
23:36 🔗 atphoenix https://directory.temple.edu/ was working for me for maybe 3-4 searches but now it tells me To continue searching from off campus, you must login.
23:37 🔗 atphoenix I did not see any homepage URLs in the results I skimmed over
23:38 🔗 atphoenix I also do not see any homepage URLs under https://education.temple.edu/about/faculty-staff
23:44 🔗 atphoenix there is definitely old stuff out there. 2003: https://astro.temple.edu/~boyer/_baks/index.htm.0080.b558.bak
23:44 🔗 thuban3 does the directory not have emails? i assume homepages would be under the same username
23:46 🔗 atphoenix IA has been over parts of astro http://web.archive.org/web/20050209174031/https://astro.temple.edu/~boyer/
23:47 🔗 atphoenix the homedirs do not match emails. e.g. https://astro.temple.edu/~rosenthl/ vs firstname.lastname for his listed email address
23:48 🔗 thuban3 ah, unusual
23:49 🔗 atphoenix guessing that as astro is old, it follows an 8 char limit, while the email systems either always allow long form or were upgraded later to add long form.
23:52 🔗 thuban3 i would be curious to know whether homedir usernames are based on those of tu's "accessnet" portal or an entirely independent system
23:52 🔗 atphoenix Some universities handled IDs by assigning them but aliases could be elected. And some faculty who got email early on could choose any unused name. I know some faculty at my university had firstname@university.edu
23:58 🔗 thuban3 https://its.temple.edu/email-account it looks like those usernames are in a 'tua[0-9]{5}' format (are there ever other letters?), and the firstname.lastname emails are auto-generated aliases per https://its.temple.edu/email-aliases-creating-and-using
